Abstract
Lifting multi-view 2D instance segmentation to a radiance field has proven effective for enhancing 3D understanding. Existing methods for addressing multi-view inconsistency in 2D segmentations either use linear assignment for end-to-end lifting, which yields inferior results, or adopt a two-stage solution, which is limited by complex pre- or post-processing. In this work, we design a new object-aware lifting approach that realizes an end-to-end lifting pipeline based on the 3D Gaussian representation, so that we can jointly learn Gaussian-level point features and a global object-level codebook across multiple views. To start, we augment each Gaussian point with an additional Gaussian-level feature, learned using a contrastive loss, to encode instance information. Importantly, we introduce a learnable object-level codebook to account for individual objects in the scene for explicit object-level understanding, and we associate the encoded object-level features with the Gaussian-level point features for segmentation predictions. Further, we formulate an association learning module and a noisy label filtering module for effective and robust codebook learning. We conduct experiments on three benchmarks: the LERF-Masked, Replica, and Messy Rooms datasets. Both qualitative and quantitative results show that our approach clearly outperforms existing methods in terms of segmentation quality and time efficiency.
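To make the idea of associating Gaussian-level features with an object-level codebook more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the paper's association module: the cosine similarity, the softmax, the `temperature` value, the function names, and the toy data are all hypothetical choices made for this example.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def associate(features, codebook, temperature=0.1):
    """Soft-assign each point feature to a codebook entry (assumed scheme).

    features: list of N feature vectors, one per Gaussian point (hypothetical layout).
    codebook: list of K object-level code vectors, one per candidate object.
    Returns N rows of K probabilities; each row sums to 1.
    """
    codes = [normalize(c) for c in codebook]
    probs = []
    for feat in features:
        f = normalize(feat)
        # Cosine similarity to every code, sharpened by the temperature.
        logits = [sum(a * b for a, b in zip(f, c)) / temperature for c in codes]
        # Numerically stable softmax over the K codebook entries.
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


# Toy data (assumed): two object codes and four point features near them.
codebook = [[1.0, 0.0], [0.0, 1.0]]
features = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]

probs = associate(features, codebook)
# Hard instance id per point: index of the most probable codebook entry.
labels = [max(range(len(p)), key=p.__getitem__) for p in probs]
```

In this toy setup the first two points are assigned to the first code and the last two to the second. In a learned system the codebook entries and point features would be optimized jointly rather than fixed by hand.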
| Original language | English |
|---|---|
| Title of host publication | Proceedings : 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) |
| Pages | 3656-3665 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798331543648 |
| DOIs | |
| Publication status | Published - Jun 2025 |
| Event | 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 - Nashville, United States. Duration: 11 Jun 2025 → 15 Jun 2025 |
Publication series
| Name | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
|---|---|
| Publisher | IEEE Computer Society |
| ISSN (Print) | 1063-6919 |
Conference
| Conference | 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2025 |
|---|---|
| Country/Territory | United States |
| City | Nashville |
| Period | 11/06/25 → 15/06/25 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Funding
This work is supported by the InnoHK Clusters of the Hong Kong SAR Government via the Hong Kong Centre for Logistics Robotics; and the Research Grants Council of the Hong Kong Special Administrative Region, China, under Project CUHK 14200824.
Keywords
- 3d scene segmentation
- end-to-end
- gaussian splatting
Fingerprint
Dive into the research topics of 'Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting'. Together they form a unique fingerprint.