Evaluating Visual Quality of Autostereoscopic 3D Displays via a Multimodal Parameter Perception Network

  • Liqian ZHANG
  • Feng YUAN
  • Haoran XIE
  • Fu Lee WANG
  • Zhaoqing PAN*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › peer-review

Abstract

Evaluating the visual quality of autostereoscopic 3D displays is crucial for quantifying their stereoscopic viewing experience and optimizing display performance. Existing quality evaluation methods predict the visual quality of autostereoscopic 3D displays mainly by indirectly learning display-parameter information from image content. However, these methods fail to explicitly model the relationship between display parameters and visual quality, which limits their prediction accuracy. To address this problem, a Multimodal Parameter Perception Network (MPPNet)-based visual quality assessment method is proposed in this paper, which treats display parameters as a textual modality to explicitly establish their relationship with visual quality. To effectively understand the semantics of display-parameter text, a Contrastive Language-Image Pretraining (CLIP)-based adaptive text encoder is proposed to generate robust semantic representations by capturing both general and domain-specific semantic embeddings. In parallel, a hierarchical vision encoder extracts visual representations from display images, simulating human binocular perception by capturing multi-level visual features from the left and right views. To achieve comprehensive cross-modal interaction, a Mamba-based cross-modal fusion module is proposed to fuse the textual and visual representations by capturing both shallow and deep correlations. Extensive experimental results demonstrate that the proposed MPPNet achieves state-of-the-art performance in evaluating the visual quality of autostereoscopic 3D displays.
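
The following is a minimal PyTorch sketch of the three-part pipeline the abstract describes (adaptive text encoder, hierarchical binocular vision encoder, cross-modal fusion, quality head), included for illustration only. All class names, dimensions, and the toy encoders are assumptions rather than the paper's implementation: a frozen random embedding stands in for the pretrained CLIP text encoder, and a simple gated-mixing block stands in for the Mamba-based fusion module.

```python
# Minimal sketch of the MPPNet pipeline described in the abstract.
# Shapes, names, and the toy encoders are illustrative assumptions only.
import torch
import torch.nn as nn


class AdaptiveTextEncoder(nn.Module):
    """Stand-in for the CLIP-based adaptive text encoder.

    Mixes a frozen 'general' embedding (mimicking pretrained CLIP text
    features) with a trainable 'domain-specific' embedding of the
    display-parameter text.
    """

    def __init__(self, vocab_size=1000, dim=256):
        super().__init__()
        self.general = nn.Embedding(vocab_size, dim)   # frozen, CLIP-like
        self.general.weight.requires_grad = False
        self.domain = nn.Embedding(vocab_size, dim)    # learned on display data
        self.mix = nn.Linear(2 * dim, dim)

    def forward(self, token_ids):                      # (B, T)
        g = self.general(token_ids)
        d = self.domain(token_ids)
        return self.mix(torch.cat([g, d], dim=-1))     # (B, T, dim)


class HierarchicalVisionEncoder(nn.Module):
    """Extracts multi-level features from the left and right views."""

    def __init__(self, dim=256):
        super().__init__()
        self.stage1 = nn.Conv2d(3, dim // 2, 3, stride=2, padding=1)
        self.stage2 = nn.Conv2d(dim // 2, dim, 3, stride=2, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def encode(self, x):                               # (B, 3, H, W)
        f1 = torch.relu(self.stage1(x))                # shallow level
        f2 = torch.relu(self.stage2(f1))               # deep level
        return self.pool(f2).flatten(1)                # (B, dim)

    def forward(self, left, right):
        # Binocular perception: encode both views, then average them.
        return 0.5 * (self.encode(left) + self.encode(right))


class GatedFusion(nn.Module):
    """Placeholder for the Mamba-based cross-modal fusion module.

    A simple gated mix of pooled text and vision features; the real model
    would run a selective state-space (Mamba) scan over the joint sequence.
    """

    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, text_feat, vis_feat):            # (B, T, D), (B, D)
        t = text_feat.mean(dim=1)                      # pool text tokens
        g = torch.sigmoid(self.gate(torch.cat([t, vis_feat], dim=-1)))
        return g * t + (1 - g) * vis_feat              # (B, D)


class MPPNetSketch(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.text = AdaptiveTextEncoder(dim=dim)
        self.vision = HierarchicalVisionEncoder(dim=dim)
        self.fusion = GatedFusion(dim=dim)
        self.head = nn.Linear(dim, 1)                  # predicted quality score

    def forward(self, param_tokens, left_view, right_view):
        fused = self.fusion(self.text(param_tokens),
                            self.vision(left_view, right_view))
        return self.head(fused).squeeze(-1)            # (B,)


if __name__ == "__main__":
    model = MPPNetSketch()
    tokens = torch.randint(0, 1000, (2, 16))           # tokenized parameter text
    left = torch.rand(2, 3, 64, 64)
    right = torch.rand(2, 3, 64, 64)
    print(model(tokens, left, right).shape)            # torch.Size([2])
```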

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 7055-7063
Number of pages: 9
ISBN (Electronic): 9798400720352
DOIs
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 - 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 - 31/10/25

Bibliographical note

Publisher Copyright:
© 2025 ACM.

Funding

This work was supported in part by the National Key R&D Program of China under Grant No. 2023YFB3611500, and in part by the National Natural Science Foundation of China under Grant No. 62322116.

Keywords

  • autostereoscopic 3D display
  • CLIP-based adaptive text encoder
  • display parameter
  • Mamba-based cross-modal fusion module
  • visual quality assessment
