Abstract
Text-to-Image (T2I) diffusion models exhibit concerning tendencies to generate harmful imagery that perpetuates social biases and stereotypes, posing significant ethical risks in real-world applications. While existing mitigation approaches predominantly employ black-box methodologies through dataset augmentation or constrained fine-tuning, they face critical limitations, including high data acquisition costs and potential exacerbation of stereotypes during model retraining. Inspired by neuroscience principles where neurological dysfunction often stems from aberrant neural activation patterns, we propose a novel framework, StereoClinic, targeting the root cause of stereotype generation through direct neural intervention. Our solution introduces two synergistic components: Diffusion Deep Taylor Decomposition (DDTD) for precisely localizing stereotype-related neurons via Layer-wise Relevance Propagation (LRP) attribution analysis, and Stereotype Neuron Suppression (SNS) implementing targeted activation damping to neutralize bias propagation. Through extensive empirical evaluations across multiple bias dimensions, we demonstrate that our method achieves significant stereotype mitigation without compromising image quality or requiring additional training data. This neuro-inspired approach establishes a new paradigm for model interpretability and ethical alignment in generative AI systems.
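The two-step idea in the abstract (attribute an output to individual neurons, then damp the most responsible ones) can be illustrated on a toy model. The following is a minimal sketch only, not the authors' DDTD or SNS: it assumes a small two-layer ReLU network in NumPy rather than a diffusion model, uses the standard LRP epsilon rule on the output layer, and the function names, weights, `k`, and damping factor `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np

def forward(x, W1, b1, w2, b2, scale=None):
    """Toy two-layer ReLU network; `scale` optionally damps hidden activations."""
    a = np.maximum(0.0, W1 @ x + b1)
    if scale is not None:
        a = a * scale  # element-wise damping of selected neurons
    return a, float(w2 @ a + b2)

def lrp_epsilon_hidden(a, w2, b2, eps=1e-6):
    """LRP epsilon rule on the output layer: relevance of each hidden neuron."""
    z = float(w2 @ a + b2)
    stabiliser = eps if z >= 0 else -eps
    # Each neuron receives its share a_j * w_j / (z + stabiliser) of the output z.
    return a * w2 * (z / (z + stabiliser))

def suppression_mask(relevance, k, alpha):
    """Scale factors: `alpha` for the k most relevant neurons, 1 elsewhere."""
    scale = np.ones_like(relevance)
    top = np.argsort(relevance)[-k:]  # indices of the k largest relevances
    scale[top] = alpha
    return scale

# Hypothetical toy weights, chosen so the arithmetic is easy to follow.
W1 = np.eye(4)
b1 = np.zeros(4)
w2 = np.array([0.5, -1.0, 2.0, 0.25])
b2 = 0.0
x = np.array([1.0, 2.0, 3.0, 4.0])

activations, score = forward(x, W1, b1, w2, b2)
relevance = lrp_epsilon_hidden(activations, w2, b2)
scale = suppression_mask(relevance, k=1, alpha=0.1)
_, damped_score = forward(x, W1, b1, w2, b2, scale=scale)

print("relevance per hidden neuron:", relevance)
print("score before / after damping:", score, damped_score)
```

In this toy setting the neuron with the largest relevance is the only one scaled down, so the output score drops while the remaining activations are left untouched. How the paper localizes neurons inside a diffusion model and how strongly it damps them are not specified in the abstract, so none of those details should be read into this sketch.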
| Original language | English |
|---|---|
| Title of host publication | MM '25: Proceedings of the 33rd ACM International Conference on Multimedia |
| Editors | Cathal GURRIN, Klaus SCHOEFFMANN, Min ZHANG |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 11453-11461 |
| Number of pages | 9 |
| ISBN (Electronic) | 9798400720352 |
| Publication status | Published - 27 Oct 2025 |
| Event | 33rd ACM International Conference on Multimedia - Dublin, Ireland; Duration: 27 Oct 2025 → 31 Oct 2025 |
Conference
| Conference | 33rd ACM International Conference on Multimedia |
|---|---|
| Abbreviated title | MM 2025 |
| Country/Territory | Ireland |
| City | Dublin |
| Period | 27/10/25 → 31/10/25 |
Bibliographical note
Publisher Copyright: © 2025 ACM.
Funding
This work was supported in part by the Key Program of Guangdong Province under Grant 2021QN02X166, and in part by the National Natural Science Foundation of China (Project No. 72031003).
Keywords
- diffusion models
- neural suppression
- stereotypes
- text-to-image