Abstract
We explore how to alleviate the label-hungry problem in 3D instance segmentation under a semi-supervised setting. To leverage unlabeled data to boost model performance, we present a novel Two-Way Inter-label Self-Training framework named TWIST. It exploits the inherent correlations between the semantic understanding and the instance information of a scene. Specifically, we consider two kinds of pseudo labels, for semantic- and instance-level supervision. Our key design is to provide object-level information for denoising pseudo labels and to use their correlation for two-way mutual enhancement, thereby iteratively improving pseudo-label quality. TWIST attains leading performance on both ScanNet and S3DIS compared to recent 3D pre-training approaches, and can be combined with them to further enhance performance, e.g., +4.4% AP50 on the 1%-label ScanNet data-efficient benchmark. Code is available at https://github.com/dvlab-research/TWIST.
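As a rough illustration of what "object-level information for denoising pseudo labels" could look like, the sketch below applies a majority vote over semantic pseudo labels inside each instance proposal. It is a generic simplification under stated assumptions, not the TWIST procedure itself (see the linked repository for that): the function name, the `IGNORE` marker, and the `conf_thresh` value are all hypothetical, and only the instance-to-semantic direction of the two-way scheme is shown.

```python
from collections import Counter, defaultdict

IGNORE = -1  # hypothetical marker: point has no proposal / no pseudo label


def denoise_semantic_pseudo_labels(sem_labels, sem_conf, proposal_ids,
                                   conf_thresh=0.9):
    """Majority-vote semantic pseudo labels inside each instance proposal.

    sem_labels:   predicted class index per point
    sem_conf:     prediction confidence per point, in [0, 1]
    proposal_ids: instance proposal index per point, or IGNORE if none
    conf_thresh:  assumed confidence cut-off for trusting a prediction
    """
    # Collect votes from confident points only, grouped by proposal.
    votes = defaultdict(Counter)
    for label, conf, pid in zip(sem_labels, sem_conf, proposal_ids):
        if pid != IGNORE and conf >= conf_thresh:
            votes[pid][label] += 1

    denoised = []
    for label, conf, pid in zip(sem_labels, sem_conf, proposal_ids):
        if pid != IGNORE and votes[pid]:
            # Every point in a proposal inherits the proposal's majority class.
            denoised.append(votes[pid].most_common(1)[0][0])
        elif conf >= conf_thresh:
            # Outside any proposal, keep confident predictions unchanged.
            denoised.append(label)
        else:
            # Low confidence and no object-level evidence: leave unlabeled.
            denoised.append(IGNORE)
    return denoised
```

In a self-training loop, labels produced this way would supervise the next training round on unlabeled scenes, with points marked `IGNORE` excluded from the loss.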
Original language | English |
---|---|
Title of host publication | Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 |
Publisher | IEEE Computer Society |
Pages | 1090-1099 |
Number of pages | 10 |
ISBN (Electronic) | 9781665469463 |
Publication status | Published - 2022 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2022 IEEE.
Funding
This work is supported in part by the Hong Kong Research Grant Council - Early Career Scheme (Grant No. 27209621), HKU Startup Fund, HKU Seed Fund for Basic Research, SmartMore donation fund, and project MMT-p2-21 of the Shun Hing Institute of Advanced Engineering, CUHK.
Keywords
- 3D from multi-view and sensors
- Recognition: detection, categorization, retrieval
- Self- & semi- & meta- & unsupervised learning