Abstract
Current explainable AI approaches for Deep Neural Networks (DNNs) primarily aim to understand network behavior by identifying the key input features that influence predictions. However, these methods often fail to identify vulnerable regions of the input that are sensitive to minor perturbations and therefore pose significant security risks. The vulnerability of DNNs is typically studied through adversarial examples, but traditional norm-based algorithms, lacking spatial constraints, distribute perturbations across the entire image and thereby obscure these critical areas. To overcome this limitation, we propose the Vulnerable Region Discovery Attack (VrdAttack), an efficient method that leverages Differential Evolution to generate diverse one-pixel perturbations, enabling the discovery of vulnerable regions and the exposure of pixel-level vulnerabilities in DNNs. Extensive experiments on CIFAR-10 and ImageNet demonstrate that VrdAttack outperforms existing methods in identifying diverse critical weak points in an input, highlighting model-specific vulnerabilities, and revealing the impact of adversarial training on these vulnerable regions.
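To illustrate the general idea of searching for one-pixel perturbations with Differential Evolution, the following is a minimal sketch only, not the paper's VrdAttack implementation: it assumes a hypothetical `predict_proba` interface returning class probabilities for an image batch, encodes a candidate as `(x, y, r, g, b)`, and uses SciPy's `differential_evolution` to minimize the model's confidence in the true class.

```python
# Minimal, illustrative sketch of a one-pixel perturbation search with
# Differential Evolution. The `predict_proba` interface and image/label
# handling are assumptions for this sketch, not VrdAttack itself.
import numpy as np
from scipy.optimize import differential_evolution

def one_pixel_search(image, true_label, predict_proba, img_size=32):
    """Search for a single (x, y, r, g, b) perturbation that lowers the
    model's confidence in the true class.

    image:         H x W x 3 uint8 array
    true_label:    integer class index
    predict_proba: callable mapping an image batch to class probabilities
                   (hypothetical interface assumed here)
    """
    def objective(candidate):
        # candidate = (x, y, r, g, b): apply the one-pixel change
        x, y = int(candidate[0]), int(candidate[1])
        perturbed = image.copy()
        perturbed[y, x] = candidate[2:5]
        # Minimize the probability assigned to the true class
        return predict_proba(perturbed[None])[0, true_label]

    bounds = [(0, img_size - 1), (0, img_size - 1),
              (0, 255), (0, 255), (0, 255)]
    result = differential_evolution(objective, bounds,
                                    maxiter=75, popsize=20, seed=0)
    # Best pixel perturbation found and the remaining true-class confidence
    return result.x, result.fun
```

Running such a search repeatedly with different seeds or diversity constraints would yield multiple candidate pixels, which is the kind of diverse perturbation set the abstract describes for locating vulnerable regions.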
| Original language | English |
|---|---|
| Article number | 163 |
| Journal | Machine Learning |
| Volume | 114 |
| Issue number | 7 |
| Early online date | 6 Jun 2025 |
| DOIs | |
| Publication status | Published - Jul 2025 |
Bibliographical note
Publisher Copyright: © The Author(s) 2025.
Funding
Open Access Publishing Support Fund provided by Lingnan University. This work was supported by NSFC (Grant No. 62250710682), the Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), and the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386).
Keywords
- Diverse adversarial examples
- Deep neural network vulnerability
- Adversarial attacks
- Explainability
- Differential evolution