Abstract
AI-Generated Images (AGIs) are increasingly used in multimedia applications, so accurately assessing their quality is essential for improving user experience and optimizing generative models. However, existing AI-Generated Image Quality Assessment (AGIQA) methods struggle to align fine-grained cross-modal semantics or capture diverse quality factors across multiple perceptual levels, which limits their effectiveness. To address these limitations, a Cross-modal Hierarchical Perception Network (CHPNet) is proposed for AGIQA, which simulates the hierarchical visual perception and adaptive decision-making mechanisms of the human brain. CHPNet comprises two key components: a Multi-level Cross-modal Interaction Network (MCINet) and an Adaptive Hierarchical Scoring Network (AHSNet). The MCINet generates multi-level quality-aware features by aligning and fusing visual and textual features at multiple semantic levels. Within MCINet, a Cross-modal Bidirectional Semantic Alignment Module (CBSAM) mitigates the semantic gap between cross-modal features, which improves quality-aware feature extraction. The AHSNet adaptively evaluates the importance of each perceptual level and assigns importance-based weights to compute the final quality score. Extensive experiments on three AGIQA databases demonstrate the effectiveness of the proposed CHPNet. The code is released at https://github.com/NUIST-Videocoding/CHPNet.git.
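The importance-weighted scoring step described for AHSNet can be illustrated with a minimal sketch. The abstract does not say how the weights are computed, so the softmax normalization here is an assumption, and the names `adaptive_hierarchical_score`, `level_scores`, and `importance_logits` are hypothetical and not taken from the released code. In a trained network both inputs would be predicted from the multi-level features; here they are plain numbers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_hierarchical_score(level_scores, importance_logits):
    """Fuse per-level quality scores into one score.

    level_scores: one quality estimate per perceptual level (hypothetical input).
    importance_logits: one unnormalized importance value per level (hypothetical input).
    Assumption: importance is turned into weights with a softmax, so the
    weights are positive and sum to 1. The paper may use a different scheme.
    """
    if len(level_scores) != len(importance_logits):
        raise ValueError("need exactly one importance value per level")
    weights = softmax(importance_logits)
    score = sum(w * s for w, s in zip(weights, level_scores))
    return score, weights


if __name__ == "__main__":
    # Three perceptual levels with made-up scores and importance values.
    score, weights = adaptive_hierarchical_score([3.1, 3.8, 2.9], [0.2, 1.5, -0.3])
    print(score, weights)
```

With equal importance values the result reduces to the plain average of the level scores, while a dominant importance value pulls the final score toward that level's estimate.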
| Original language | English |
|---|---|
| Pages (from-to) | 291-302 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Broadcasting |
| Volume | 72 |
| Issue number | 1 |
| Early online date | 28 Oct 2025 |
| DOIs | |
| Publication status | Published - Mar 2026 |
Bibliographical note
Publisher Copyright: © 1963-2012 IEEE.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 62322116.
Keywords
- AI-generated images
- quality assessment
- cross-modal hierarchical perception network
- multi-level cross-modal interaction network
- adaptive hierarchical scoring network
Fingerprint
Dive into the research topics of 'Assessing AI-Generated Image Quality Using a Cross-Modal Hierarchical Perception Network'. Together they form a unique fingerprint.