Abstract
Depth information is important for RGB-D Salient Object Detection (SOD), and conventional deep models usually rely on CNN feature extractors. Long-range contextual dependencies, dense modeling in the saliency decoder, and multi-task learning assistance are usually ignored. In this work, we propose a Dual Swin-Transformer-based Mutual Interactive Network (DTMINet), aiming to learn contextualized, dense, and edge-aware features for RGB-D SOD. We adopt the Swin-Transformer as the visual backbone to extract contextualized features. A self-attention-based Cross-Modality Interaction module is proposed to strengthen the visual backbone for cross-modal interaction. In addition, a Gated Modality Attention module is designed for cross-modal fusion. The proposed Dense Saliency Decoder, enhanced with dense connections, progressively merges the multi-level encoding features at different decoding stages. Considering the depth quality issue, a Skip Convolution module is introduced to provide guidance to the RGB modality for the saliency prediction. In addition, we add edge prediction to the saliency predictor to regularize the learning process. Comprehensive experiments on five standard RGB-D SOD benchmark datasets over four evaluation metrics demonstrate the superiority of the proposed method.
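The abstract names a gated module for cross-modal fusion but does not give its formulation. As a rough illustration of the general idea only, the sketch below shows a generic per-channel gated fusion of an RGB feature vector and a depth feature vector. The function name `gated_fusion`, the weights `w_rgb`, `w_depth`, and `bias`, and the gate formula itself are all assumptions made for illustration; they are not taken from the paper and should not be read as the Gated Modality Attention module of DTMINet.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb, depth, w_rgb, w_depth, bias):
    """Fuse two equal-length feature vectors with a learned per-channel gate.

    Hypothetical formulation (an assumption, not the paper's module):
        gate_i  = sigmoid(w_rgb[i] * rgb[i] + w_depth[i] * depth[i] + bias[i])
        fused_i = gate_i * rgb[i] + (1 - gate_i) * depth[i]
    A gate near 1 keeps the RGB feature; a gate near 0 keeps the depth feature.
    """
    lengths = {len(rgb), len(depth), len(w_rgb), len(w_depth), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")

    fused = []
    for r, d, wr, wd, b in zip(rgb, depth, w_rgb, w_depth, bias):
        gate = sigmoid(wr * r + wd * d + b)
        fused.append(gate * r + (1.0 - gate) * d)
    return fused


if __name__ == "__main__":
    rgb_feat = [1.0, 2.0]
    depth_feat = [3.0, 6.0]
    zeros = [0.0, 0.0]
    # With zero weights and bias the gate is 0.5, so the result is the average.
    print(gated_fusion(rgb_feat, depth_feat, zeros, zeros, zeros))
```

One reason a gate of this kind is a plausible fit for the depth quality issue the abstract raises is that it lets a model lean on the RGB features wherever the depth features are unreliable, rather than mixing the two with fixed weights.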
Original language | English |
---|---|
Article number | 126779 |
Pages (from-to) | 126779 |
Journal | Neurocomputing |
Volume | 559 |
Early online date | 17 Sept 2023 |
DOIs | |
Publication status | Published - 28 Nov 2023 |
Bibliographical note
This work is supported by the Key Project of Science and Technology Innovation 2030 of the Ministry of Science and Technology of China (Grant No. 2018AAA0101301), the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA), and in part by the Hong Kong GRF-RGC General Research Fund under Grant 11209819 (CityU 9042816) and Grant 11203820 (9042598).
Publisher Copyright:
© 2023
Keywords
- Dense connection
- Edge supervision
- Gated modality attention
- RGB-D images
- Salient object detection
- Self-attention
- Swin-transformer
Fingerprint
Dive into the research topics of 'Dual Swin-transformer based mutual interactive network for RGB-D salient object detection'. Together they form a unique fingerprint.
- Adaptive Dynamic Range Enhancement Oriented to High Dynamic Display (面向高動態顯示的自適應動態範圍增強)
  KWONG, S. T. W. (PI), KUO, C.-C. J. (CoI), WANG, S. (CoI) & ZHANG, X. (CoI)
  Research Grants Council (HKSAR)
  1/01/21 → 31/12/24
  Project: Grant Research
- Intelligent Ultra High Definition Video Encoder Optimization for Future Versatile Video Coding (用于未来多功能视频编码的智能超高清视频编码器优化)
  KWONG, S. T. W. (PI), ZHOU, M. (CoI), KUO, C.-C. J. (CoI) & WANG, S. (CoI)
  Research Grants Council (HKSAR)
  1/01/20 → 30/06/23
  Project: Grant Research