Abstract
Evaluating the visual quality of autostereoscopic 3D displays is crucial for quantifying their stereoscopic viewing experience and optimizing display performance. Existing quality evaluation methods primarily predict the visual quality of autostereoscopic 3D displays by indirectly learning display parameter information from image content. However, these methods fail to explicitly model the relationship between display parameters and visual quality, thereby limiting their prediction accuracy. To address this problem, a Multimodal Parameter Perception Network (MPPNet)-based visual quality assessment method is proposed in this paper, which treats display parameters as a textual modality to explicitly establish their relationship with visual quality. To effectively understand the semantic information of display parameter texts, a Contrastive Language-Image Pretraining (CLIP)-based adaptive text encoder is proposed to generate robust semantic representations by capturing both general and domain-specific semantic embeddings. In parallel, a hierarchical vision encoder is adopted to extract visual representations from display images, which simulates human binocular perception by capturing multi-level visual features from the left and right views. To achieve comprehensive cross-modal interaction, a Mamba-based cross-modal fusion module is proposed to fuse textual and visual representations of display parameters by capturing both shallow and deep correlations. Extensive experimental results demonstrate that the proposed MPPNet achieves state-of-the-art performance in evaluating the visual quality of autostereoscopic 3D displays.
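The overall pipeline the abstract describes (display parameters encoded as text, left/right views encoded visually, then cross-modal fusion into a quality score) can be illustrated with a deliberately simplified sketch. Everything below is hypothetical: the parameter names, the toy bucket-count "text encoder", the statistics-based "vision encoder", and the fixed linear readout are stand-ins for illustration only, not the CLIP-based encoder, hierarchical vision encoder, or Mamba-based fusion module of the actual MPPNet.

```python
import numpy as np

EMB_DIM = 8  # toy embedding size, chosen arbitrarily for this sketch

def encode_parameter_text(params: dict) -> np.ndarray:
    # Toy stand-in for a text encoder: bucket "name=value" tokens
    # deterministically by character sum and L2-normalize the counts.
    emb = np.zeros(EMB_DIM)
    for key, value in sorted(params.items()):
        token = f"{key}={value}"
        emb[sum(ord(c) for c in token) % EMB_DIM] += 1.0
    return emb / np.linalg.norm(emb)

def encode_views(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    # Toy stand-in for a multi-level vision encoder over both views:
    # shallow statistics (mean, std) plus deeper gradient/peak statistics.
    feats = []
    for view in (left, right):
        feats += [view.mean(), view.std()]                   # shallow level
        feats += [np.abs(np.diff(view)).mean(), view.max()]  # deeper level
    emb = np.asarray(feats)
    return emb / np.linalg.norm(emb)

def fuse_and_score(text_emb: np.ndarray, vis_emb: np.ndarray) -> float:
    # Toy cross-modal fusion: concatenation plus one interaction term,
    # followed by a fixed sigmoid readout to a quality score in (0, 1).
    fused = np.concatenate([text_emb, vis_emb, [text_emb @ vis_emb]])
    weights = np.linspace(0.1, 1.0, fused.size)
    return float(1.0 / (1.0 + np.exp(-(fused @ weights - weights.sum() / 2))))

# Example: hypothetical display parameters as text, plus two synthetic
# 1-D "views" standing in for the left and right display images.
params = {"lens_pitch_mm": 0.5, "viewing_distance_mm": 600, "num_views": 8}
rng = np.random.default_rng(0)
left = rng.random(64)
right = left + 0.01 * rng.standard_normal(64)  # slightly perturbed right view

score = fuse_and_score(encode_parameter_text(params), encode_views(left, right))
print(score)
```

The point of the sketch is only the data flow: parameters enter as text rather than being inferred from pixels, both views contribute multi-level features, and the two modalities interact before the final quality prediction.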
| Original language | English |
|---|---|
| Title of host publication | MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025 |
| Publisher | Association for Computing Machinery, Inc |
| Pages | 7055-7063 |
| Number of pages | 9 |
| ISBN (Electronic) | 9798400720352 |
| DOIs | |
| Publication status | Published - 27 Oct 2025 |
| Event | 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland Duration: 27 Oct 2025 → 31 Oct 2025 |
Publication series
| Name | MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025 |
|---|
Conference
| Conference | 33rd ACM International Conference on Multimedia, MM 2025 |
|---|---|
| Country/Territory | Ireland |
| City | Dublin |
| Period | 27/10/25 → 31/10/25 |
Bibliographical note
Publisher Copyright: © 2025 ACM.
Funding
This work was supported in part by the National Key R&D Program of China under Grant No. 2023YFB3611500, and in part by the National Natural Science Foundation of China under Grant No. 62322116.
Keywords
- autostereoscopic 3D display
- CLIP-based adaptive text encoder
- display parameter
- Mamba-based cross-modal fusion module
- visual quality assessment