Abstract
Deoccluding the hidden portions of objects in a scene is a formidable task, particularly when addressing real-world scenes. In this paper, we present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, a foundation model for object-level scene deocclusion. Leveraging the rich prior of pre-trained models, we first design the parallel variational autoencoder, which produces a full-view feature map that simultaneously encodes multiple complete objects, and the visible-to-complete latent generator, which learns to implicitly predict the full-view feature map from the partial-view feature map and text prompts extracted from the incomplete objects in the input image. To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning, avoiding tedious annotation of amodal masks and occluded regions. At inference, we devise a layer-wise deocclusion strategy to improve efficiency while maintaining deocclusion quality. Extensive experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the art by a large margin. Our method can also be extended to cross-domain scenes and novel categories that are not covered by the training set. Further, we demonstrate the applicability of PACO to single-view 3D scene reconstruction and object recomposition.
| Original language | English |
|---|---|
| Title of host publication | Proceedings : SIGGRAPH 2024 Conference Papers |
| Editors | Andres BURBANO, Denis ZORIN, Wojciech JAROSZ |
| Publisher | Association for Computing Machinery, Inc |
| Number of pages | 11 |
| ISBN (Electronic) | 9798400705250 |
| Publication status | Published - 13 Jul 2024 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2024 ACM.
Funding
This work is supported by the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 14201921).
Keywords
- c.
- completion-w.
- image recomposition
- object
- scene deocclusion