ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection

Junhao LIN, Lei ZHU, Jiaxing SHEN, Huazhu FU, Qing ZHANG, Liansheng WANG

Research output: Journal Publications › Journal Article (refereed) › peer-review

3 Citations (Scopus)

Abstract

With the rapid development of depth sensors, more and more RGB-D videos can be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, existing salient object detection (SOD) works focus only on either static RGB-D images or RGB videos, ignoring the collaboration of RGB-D and video information. In this paper, we first collect a new annotated RGB-D video SOD (ViDSOD-100) dataset, which contains 100 videos with a total of 9,362 frames acquired from diverse natural scenes. All frames in each video are manually annotated with high-quality saliency annotations. Moreover, we propose a new baseline model, named attentive triple-fusion network (ATF-Net), for RGB-D video salient object detection. Our method aggregates the appearance information from an input RGB image, the spatio-temporal information from an estimated motion map, and the geometry information from the depth map by devising three modality-specific branches and a multi-modality integration branch. The modality-specific branches extract representations of the different inputs, while the multi-modality integration branch combines the multi-level modality-specific features by introducing encoder feature aggregation (MEA) modules and decoder feature aggregation (MDA) modules. Experimental results on both our newly introduced ViDSOD-100 dataset and the well-established DAVSOD dataset demonstrate the superior performance of the proposed ATF-Net, both quantitatively and qualitatively, surpassing current state-of-the-art techniques across various domains, including RGB-D saliency detection, video saliency detection, and video object segmentation. We shall release our data, results, and code upon publication of this work.
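
To make the three-branch design described in the abstract concrete, below is a minimal PyTorch-style sketch of per-modality encoders (RGB, motion, depth) whose same-level features are fused by a shared aggregation module before saliency prediction. All module names, layer choices, and channel sizes (TripleFusionNet, EncoderAggregation, channels=64) are illustrative assumptions for exposition only, not the authors' released ATF-Net code or its actual MEA/MDA implementation.

```python
# Minimal sketch of three-branch RGB/motion/depth fusion for saliency prediction.
# Names and layer sizes are hypothetical; this is not the ATF-Net implementation.
import torch
import torch.nn as nn


class EncoderAggregation(nn.Module):
    """Fuses same-level features from the RGB, motion, and depth branches
    (a simple stand-in for the paper's encoder feature aggregation idea)."""

    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, rgb_feat, motion_feat, depth_feat):
        # Concatenate along the channel dimension, then project back down.
        return self.fuse(torch.cat([rgb_feat, motion_feat, depth_feat], dim=1))


class TripleFusionNet(nn.Module):
    """Toy three-branch network: per-modality encoders plus a shared fusion head."""

    def __init__(self, channels=64):
        super().__init__()

        def branch(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        self.rgb_branch = branch(3)     # appearance cues from the RGB frame
        self.motion_branch = branch(2)  # spatio-temporal cues from an estimated motion map
        self.depth_branch = branch(1)   # geometry cues from the depth map
        self.aggregate = EncoderAggregation(channels)
        self.predict = nn.Conv2d(channels, 1, kernel_size=1)  # per-pixel saliency logits

    def forward(self, rgb, motion, depth):
        fused = self.aggregate(
            self.rgb_branch(rgb),
            self.motion_branch(motion),
            self.depth_branch(depth),
        )
        return self.predict(fused)


# Usage: with rgb (B,3,H,W), motion (B,2,H,W), depth (B,1,H,W),
# TripleFusionNet()(rgb, motion, depth) returns (B,1,H,W) saliency logits.
```

The sketch only shows single-level fusion; the paper additionally aggregates multi-level encoder and decoder features (MEA/MDA modules), which is omitted here for brevity.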
Original language: English
Article number: 11
Pages (from-to): 5173-5191
Number of pages: 19
Journal: International Journal of Computer Vision
Volume: 132
Issue number: 11
Early online date: 4 Jun 2024
DOIs
Publication status: Published - Nov 2024

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Funding

This work is supported by the Guangzhou Municipal Science and Technology Project (Grant No. 2023A03J0671), the InnoHK funding launched by the Innovation and Technology Commission, Hong Kong SAR, the Guangzhou Industrial Information and Intelligent Key Laboratory Project (No. 2024A03J0628), the Guangzhou-HKUST(GZ) Joint Funding Program (No. 2024A03J0618), the Ministry of Science and Technology of the People's Republic of China (STI2030-Major Projects 2021ZD0201900), the National Natural Science Foundation of China (Grant No. 62371409), and the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG2-GC-2023-007).

Keywords

  • Neural networks
  • RGB-D video dataset
  • Salient object detection
