TY - JOUR
T1 - A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels
AU - CONG, Runmin
AU - QIN, Qi
AU - ZHANG, Chen
AU - JIANG, Qiuping
AU - WANG, Shiqi
AU - ZHAO, Yao
AU - KWONG, Sam
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2023/2
Y1 - 2023/2
N2 - Fully-supervised salient object detection (SOD) methods have made great progress, but they often rely on a large number of pixel-level annotations, which are time-consuming and labour-intensive to obtain. In this paper, we focus on a new weakly-supervised SOD task under hybrid labels, where the supervision labels include a large number of coarse labels generated by a traditional unsupervised method and a small number of real labels. To address the issues of label noise and quantity imbalance in this task, we design a new pipeline framework with three sophisticated training strategies. In terms of the model framework, we decouple the task into a label refinement sub-task and a salient object detection sub-task, which cooperate with each other and are trained alternately. Specifically, the R-Net is designed as a two-stream encoder-decoder model equipped with a Blender with Guidance and Aggregation Mechanisms (BGA), aiming to rectify the coarse labels into more reliable pseudo-labels, while the S-Net is a replaceable SOD network supervised by the pseudo-labels generated by the current R-Net. Note that only the trained S-Net is needed for testing. Moreover, to guarantee the effectiveness and efficiency of network training, we design three training strategies: an alternate iteration mechanism, a group-wise incremental mechanism, and a credibility verification mechanism. Experiments on five SOD benchmarks show that our method achieves competitive performance against weakly-supervised/unsupervised methods, both qualitatively and quantitatively. The code and results can be found at https://rmcong.github.io/proj_Hybrid-Label-SOD.html.
KW - blender
KW - group-wise incremental mechanism
KW - hybrid labels
KW - Salient object detection
KW - weakly supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85137886257&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2022.3205182
DO - 10.1109/TCSVT.2022.3205182
M3 - Journal Article (refereed)
SN - 1051-8215
VL - 33
SP - 534
EP - 548
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
IS - 2
ER -