Cross-modality fusion and progressive integration network for saliency prediction on stereoscopic 3D images

Yudong MAO, Qiuping JIANG, Runmin CONG, Wei GAO, Feng SHAO, Sam KWONG

Research output: Journal Publications › Journal Article (refereed) › peer-review

12 Citations (Scopus)


Traditional 2D image-based saliency prediction models perform poorly on stereoscopic 3D (S3D) images because eye movements during free viewing of S3D images have been shown to be guided by both RGB and depth features. This paper studies saliency prediction on S3D images, taking into account the interactions between the RGB and depth modalities. Specifically, we design a novel deep neural network named Cross-modality Fusion and Progressive Integration Network (CFPI-Net) to address this problem. It consists of a Multi-level Cross-modality Feature Fusion (MCFF) module and a Multi-stage Progressive Feature Integration (MPFI) module. The MCFF module first captures hierarchical contextual features from each modality and then fuses the features from the different modalities at each level. The MPFI module comprises multiple cascaded deeply supervised feature integration (DSFI) blocks in which the low-level and high-level cross-modality features are progressively integrated, using the integrated features from the previous stage as guidance. CFPI-Net combines multi-level feature representation, cross-modality feature fusion, and multi-stage progressive feature integration, which together improve performance. Experimental results on two benchmark datasets demonstrate that CFPI-Net outperforms state-of-the-art saliency prediction methods both quantitatively and qualitatively. All results and the relevant code will be made publicly available.
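The data flow described in the abstract (fuse RGB and depth features level by level, then integrate them stage by stage from coarse to fine) can be illustrated with a toy NumPy sketch. This is not the authors' CFPI-Net: the fusion rule (sum plus element-wise product), the averaging used for guidance, the nearest-neighbour upsampling, the random feature pyramids, and every function name below are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse_level(rgb_feat, depth_feat):
    # Assumed fusion rule (not from the paper): sum plus element-wise product.
    return rgb_feat + depth_feat + rgb_feat * depth_feat


def upsample2x(feat):
    # Nearest-neighbour upsampling of a (C, H, W) array along H and W.
    return feat.repeat(2, axis=1).repeat(2, axis=2)


def side_output(feat):
    # Collapse channels into a single map with values in (0, 1).
    return sigmoid(feat.mean(axis=0))


def progressive_integration(fused_levels):
    # fused_levels is ordered fine -> coarse; start from the coarsest level.
    integrated = fused_levels[-1]
    side_maps = [side_output(integrated)]
    for feat in reversed(fused_levels[:-1]):
        # The previous stage's result, upsampled, acts as guidance (assumed: averaging).
        guidance = upsample2x(integrated)
        integrated = 0.5 * (feat + guidance)
        side_maps.append(side_output(integrated))
    return side_maps[-1], side_maps


def toy_pyramid(rng, channels=4, base=16, levels=3):
    # Hypothetical stand-in for backbone features: random arrays at halving resolutions.
    return [rng.standard_normal((channels, base >> i, base >> i)) for i in range(levels)]


rng = np.random.default_rng(0)
rgb_levels = toy_pyramid(rng)
depth_levels = toy_pyramid(rng)
fused = [fuse_level(r, d) for r, d in zip(rgb_levels, depth_levels)]
saliency, side_maps = progressive_integration(fused)
print(saliency.shape, [m.shape for m in side_maps])
```

In a trained network each entry of `side_maps` would presumably be compared against the ground-truth fixation map, which is one common reading of "deeply supervised"; the sketch only produces the maps and does no training.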

Original language: English
Pages (from-to): 2435-2448
Number of pages: 14
Journal: IEEE Transactions on Multimedia
Early online date: 19 May 2021
Publication status: Published - 2022
Externally published: Yes

Bibliographical note

Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grants 61901236, 62071261, and 62002014, in part by the Ningbo Natural Science Foundation under Grant 2019A610097, in part by the Zhejiang Natural Science Foundation under Grant R18F010008, in part by the Fundamental Research Funds for the Provincial Universities of Zhejiang (SJLZ2020003), in part by the Beijing Nova Program (Z201100006820016), in part by the Young Elite Scientist Sponsorship Program of the China Association for Science and Technology (2020QNRC001), and in part by the K.C. Wong Magna Fund at Ningbo University.

Publisher Copyright:
© 1999-2012 IEEE.


Keywords

  • Cross-modality fusion
  • Progressive integration
  • Saliency prediction
  • Stereoscopic 3D image
  • Visual attention

