You Only Need One Thing One Click: Self-Training for Weakly Supervised 3D Scene Understanding

Zhengzhe LIU, Xiaojuan QI, Chi-Wing FU*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Book Chapter › Research › peer-review

Abstract

3D scene understanding, e.g., point cloud semantic and instance segmentation, often requires large-scale annotated training data, but point-wise labels are clearly too tedious to prepare. While some recent methods propose to train a 3D network with small percentages of point labels, we take this approach to an extreme and propose "One Thing One Click," meaning that the annotator only needs to label one point per object. To leverage these extremely sparse labels in network training, we design a novel self-training approach, in which we iteratively conduct training and label propagation, facilitated by a graph propagation module. Also, we adopt a relation network to generate a per-category prototype to enhance the pseudo-label quality and guide the iterative training. Further, our method is compatible with 3D instance segmentation when equipped with a point-clustering strategy. Experimental results on both ScanNet-v2 and S3DIS show that our self-training approach, with extremely sparse annotations, outperforms all existing weakly supervised methods for 3D semantic and instance segmentation by a large margin, and our results are also comparable to those of fully supervised counterparts. Code and models are available at https://github.com/liuzhengzhe/One-Thing-One-Click.
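The core idea of the abstract, iteratively spreading sparse "one click" labels to unlabeled points via a similarity graph, can be illustrated with a minimal sketch. All names here (`propagate_labels`, the toy graph layout) are hypothetical illustrations, not the authors' actual implementation, which uses a learned graph propagation module and relation-network prototypes.

```python
def propagate_labels(edges, seeds, num_nodes, rounds=3):
    """Toy label propagation: spread sparse 'one click' seed labels
    to unlabeled nodes via similarity-weighted neighbor voting.

    edges: dict node -> list of (neighbor, similarity) pairs
    seeds: dict node -> label (the one-click annotations)
    Returns a dense pseudo-label list (None where still unresolved).
    """
    labels = [seeds.get(i) for i in range(num_nodes)]
    for _ in range(rounds):
        updated = labels[:]
        for node in range(num_nodes):
            if labels[node] is not None:
                continue  # seeds and already-propagated labels are kept
            # vote among labeled neighbors, weighted by edge similarity
            votes = {}
            for nb, sim in edges.get(node, []):
                if labels[nb] is not None:
                    votes[labels[nb]] = votes.get(labels[nb], 0.0) + sim
            if votes:
                updated[node] = max(votes, key=votes.get)
        labels = updated
    return labels


# Toy example: a chain of four super-voxels 0-1-2-3, with one click on
# node 0 ("chair") and one on node 3 ("table").
edges = {
    0: [(1, 1.0)],
    1: [(0, 1.0), (2, 0.5)],
    2: [(1, 0.5), (3, 1.0)],
    3: [(2, 1.0)],
}
pseudo = propagate_labels(edges, {0: "chair", 3: "table"}, 4)
print(pseudo)  # ['chair', 'chair', 'table', 'table']
```

In the actual method, the propagated pseudo labels would then be used to retrain the 3D network, and the improved predictions refine the next round of propagation; this sketch only shows the propagation half of that loop.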
Original language: English
Title of host publication: Deep Learning for 3D Vision: Algorithms and Applications
Editors: Xiaoli LI, Xulei YANG, Hao SU
Publisher: World Scientific Publishing Co.
Chapter: 3
Pages: 57-89
Number of pages: 33
ISBN (Electronic): 9789811286490, 9789811286506
ISBN (Print): 9789811286483
DOIs
Publication status: Published - Sept 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2024 by World Scientific Publishing Co. Pte. Ltd.
