CIR-Net: Cross-Modality Interaction and Refinement for RGB-D Salient Object Detection

Runmin CONG, Qinwei LIN, Chen ZHANG*, Chongyi LI*, Xiaochun CAO, Qingming HUANG, Yao ZHAO

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

71 Citations (Scopus)


Focusing on how to effectively capture and utilize cross-modality information in the RGB-D salient object detection (SOD) task, we present a convolutional neural network (CNN) model, named CIR-Net, based on novel cross-modality interaction and refinement. For cross-modality interaction, 1) a progressive attention guided integration unit is proposed to fully integrate RGB-D feature representations in the encoder stage, and 2) a convergence aggregation structure is proposed, which feeds the RGB and depth decoding features into the corresponding RGB-D decoding streams via an importance gated fusion unit in the decoder stage. For cross-modality refinement, we insert a refinement middleware structure between the encoder and the decoder, in which the RGB, depth, and RGB-D encoder features are further refined by successively applying a self-modality attention refinement unit and a cross-modality weighting refinement unit. Finally, with the gradually refined features, we predict the saliency map in the decoder stage. Extensive experiments on six popular RGB-D SOD benchmarks demonstrate that our network outperforms state-of-the-art saliency detectors both qualitatively and quantitatively. The code and results can be found from the link of
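The abstract names an importance gated fusion unit that feeds RGB and depth decoding features into the RGB-D stream, but it does not specify the unit's form. As a rough illustration only, the sketch below shows a generic sigmoid-gated fusion over per-channel feature values. The function name `gated_fusion`, the scalar weights `w_rgb` and `w_depth`, and the additive combination are hypothetical assumptions, not CIR-Net's actual unit, which operates on learned convolutional feature maps.

```python
import math
from typing import List

Vector = List[float]


def sigmoid(x: float) -> float:
    """Logistic function mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgbd: Vector, rgb: Vector, depth: Vector,
                 w_rgb: float = 1.0, w_depth: float = 1.0) -> Vector:
    """Hypothetical gated fusion (an assumption, not the paper's unit).

    Each modality receives a per-channel gate in (0, 1) computed from its
    own activation; the gated RGB and depth values are then added to the
    RGB-D value for the same channel.
    """
    if not (len(rgbd) == len(rgb) == len(depth)):
        raise ValueError("feature vectors must have equal length")
    fused = []
    for f, r, d in zip(rgbd, rgb, depth):
        g_r = sigmoid(w_rgb * r)    # importance weight for the RGB channel
        g_d = sigmoid(w_depth * d)  # importance weight for the depth channel
        fused.append(f + g_r * r + g_d * d)
    return fused


if __name__ == "__main__":
    print(gated_fusion([0.2, 0.5], [1.0, -1.0], [0.3, 0.0]))
```

When both modality inputs are zero, their contributions vanish and the RGB-D value passes through unchanged; a larger positive RGB activation opens its gate wider and contributes more to the fused value. A learned version would replace the fixed scalar weights with trainable parameters.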

Original language: English
Pages (from-to): 6800-6815
Number of pages: 16
Journal: IEEE Transactions on Image Processing
Early online date: 26 Oct 2022
Publication status: Published - 2022
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1992-2012 IEEE.

Keywords
  • cross-modality attention
  • cross-modality interaction
  • RGB-D images
  • Salient object detection

