Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection

Chen ZHANG, Runmin CONG, Qinwei LIN, Lin MA, Feng LI, Yao ZHAO, Sam KWONG

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

66 Citations (Scopus)


The growing availability of depth maps has brought new vitality to salient object detection (SOD), and a large number of RGB-D SOD algorithms have been proposed, mainly concentrating on how to better integrate cross-modality features from the RGB image and the depth map. For cross-modality interaction in the feature encoder, existing methods either treat the RGB and depth modalities indiscriminately or habitually use depth cues only as auxiliary information for the RGB branch. In contrast, we reconsider the status of the two modalities and propose a novel Cross-modality Discrepant Interaction Network (CDINet) for RGB-D SOD, which differentially models the dependence between the two modalities according to the feature representations of different layers. To this end, two components are designed to implement effective cross-modality interaction: 1) the RGB-induced Detail Enhancement (RDE) module leverages the RGB modality to enhance the details of the depth features in the low-level encoder stage; 2) the Depth-induced Semantic Enhancement (DSE) module transfers the object positioning and internal consistency of the depth features to the RGB branch in the high-level encoder stage. Furthermore, we design a Dense Decoding Reconstruction (DDR) structure, which constructs a semantic block by combining multi-level encoder features to upgrade the skip connection in feature decoding. Extensive experiments on five benchmark datasets demonstrate that our network outperforms 15 state-of-the-art methods both quantitatively and qualitatively. Our code is publicly available at:
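The direction of information flow described in the abstract (RGB to depth in low-level stages, depth to RGB in high-level stages) can be illustrated with a minimal sketch. This is not the authors' implementation: the sigmoid-gated residual form, the function names `gate` and `discrepant_encoder`, the `num_low_level` parameter, and the use of plain nested lists as single-channel feature maps are all assumptions made only for illustration. Each stage is also treated independently here, whereas a real encoder would pass enhanced features on to the next stage.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(source, target):
    """Modulate `target` by a sigmoid gate computed from `source`,
    keeping a residual connection (hypothetical interaction form)."""
    return [
        [t + t * sigmoid(s) for s, t in zip(src_row, tgt_row)]
        for src_row, tgt_row in zip(source, target)
    ]


def discrepant_encoder(rgb_feats, depth_feats, num_low_level):
    """Apply a different interaction direction per encoder stage.

    rgb_feats, depth_feats: lists of per-stage 2D feature maps (nested lists).
    num_low_level: number of leading stages treated as low-level (assumed).
    """
    rgb_out, depth_out = [], []
    for stage, (rgb, depth) in enumerate(zip(rgb_feats, depth_feats)):
        if stage < num_low_level:
            # RDE-like direction: RGB features enhance the depth features.
            depth = gate(rgb, depth)
        else:
            # DSE-like direction: depth features enhance the RGB features.
            rgb = gate(depth, rgb)
        rgb_out.append(rgb)
        depth_out.append(depth)
    return rgb_out, depth_out


if __name__ == "__main__":
    rgb_stages = [[[0.0]], [[2.0]]]
    depth_stages = [[[2.0]], [[0.0]]]
    print(discrepant_encoder(rgb_stages, depth_stages, num_low_level=1))
```

The point of the sketch is only the asymmetry: which modality is modified depends on the stage, rather than depth always serving the RGB branch or both being fused symmetrically.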
Original language: English
Title of host publication: Proceedings of the 29th ACM International Conference on Multimedia
Publication status: Published - 2021
Externally published: Yes
Event: The 29th ACM International Conference on Multimedia - Virtual, Online, China
Duration: 20 Oct 2021 to 24 Oct 2021


Conference: The 29th ACM International Conference on Multimedia
City: Virtual, Online

Bibliographical note

This work was supported by the Beijing Nova Program under Grant Z201100006820016, in part by the National Key Research and Development of China under Grant 2018AAA0102100, in part by the National Natural Science Foundation of China under Grant 62002014, Grant U1936212, in part by Elite Scientist Sponsorship Program by the China Association for Science and Technology under Grant 2020QNRC001, in part by General Research Fund-Research Grants Council (GRF-RGC) under Grant 9042816 (CityU 11209819), Grant 9042958 (CityU 11203820), in part by Hong Kong Scholars Program under Grant XJ2020040, in part by CAAI-Huawei MindSpore Open Fund, and in part by China Postdoctoral Science Foundation under Grant 2020T130050, Grant 2019M660438.


  • dense decoding reconstruction
  • discrepant interaction
  • RGB-D images
  • salient object detection

