FSCDiff : Frequency-Spatial Entangled Conditional Diffusion model for Underwater Salient Object Detection

  • Hua LI
  • Gaowei LIN
  • Zhiyuan LI
  • Sam KWONG
  • Runmin CONG*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings · Conference paper (refereed) · Referred Conference Paper · peer-review

Abstract

Salient object detection (SOD) plays a crucial role in image understanding and visual guidance. However, due to the complexity of underwater environments, the accuracy of underwater salient object detection is often low. To improve the accuracy and robustness of underwater salient object detection, different from the existing spatial domain aware RGB-D methods that rely on pixel-level probabilities, we propose a novel Fourier-Spatial Entangled Conditional Diffusion model (FSCDiff) for underwater salient object detection. The FSCDiff aims to address the insufficient representation and boundary shift issues in underwater salient object detection by leveraging Fourier-domain information and the powerful multi-step iterative generation capability of diffusion models. The FSCDiff framework consists of two key components: the Dual-Domain Entanglement Enhancement Block (DTEB) and the Stable Time-step Mask Prediction Module (STMP). DTEB utilizes Fourier-spatial entanglement learning to fully exploit the Fourier and spatial domain information of RGB images and depth maps, thereby optimizing feature representation. STMP takes advantage of the excellent multi-step iterative mechanism of diffusion models to enhance the accuracy and robustness of the segmentation results. Comprehensive experimental results indicate that our FSCDiff method outperforms the state-of-the-art approaches on the USOD10K and USOD datasets. The source code is available at: https://github.com/lgwplay/FSCDiff.

Original language: English
Title of host publication: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Publisher: Association for Computing Machinery, Inc
Pages: 8379-8388
Number of pages: 10
ISBN (Electronic): 9798400720352
Publication status: Published - 27 Oct 2025
Event: 33rd ACM International Conference on Multimedia, MM 2025 - Dublin, Ireland
Duration: 27 Oct 2025 to 31 Oct 2025

Publication series

Name: MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025

Conference

Conference: 33rd ACM International Conference on Multimedia, MM 2025
Country/Territory: Ireland
City: Dublin
Period: 27/10/25 to 31/10/25

Bibliographical note

Publisher Copyright:
© 2025 ACM.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62201179, Grant 62461018 and Grant 62471278; in part by the Hainan Provincial Natural Science Foundation of China under Grant No. 625YXQN594; in part by the Innovation Platform for "New Star of South China Sea" of Hainan Province under Grant No. NHXXRCXM202306; in part by the Taishan Scholar Project of Shandong Province under Grant tsqn202306079; and in part by the Research Grants Council of the Hong Kong Special Administrative Region, China under Grant STG5/E-103/24-R.

Keywords

  • diffusion model
  • Fourier frequency
  • underwater salient object detection
