CosCAD: Cross-Modal CAD Model Retrieval and Pose Alignment from a Single Image

Zhikun WEN, Honghua CHEN, Zhe ZHU, Zeyong WEI, Liangliang NAN, Mingqiang WEI*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Book Chapter › Research › peer-review

Abstract

We introduce CosCAD, a novel framework for CAD model retrieval and pose alignment from a single image. Unlike previous methods that rely solely on image data and are sensitive to occlusion, CosCAD leverages cross-modal contrastive learning to integrate image, CAD model, and text features into a shared representation space. This improves retrieval accuracy, even when visual cues are ambiguous or objects are partially occluded. To enhance retrieval efficiency, we introduce Tri-Indexed Quantized Graph Search, which accelerates CAD retrieval using an optimized indexing structure. For pose alignment, we combine image and geometric features of CAD models to predict object rotation and scale, using an attention-based method to capture spatial correlations within the scene. This improves multi-object location estimation and 9-DoF pose alignment. Experimental results demonstrate that CosCAD outperforms existing methods such as ROCA and SPARC in both CAD model retrieval and pose estimation, while offering more than 6x speedup in retrieval for large datasets, underscoring its potential for interactive environments and autonomous systems.
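The cross-modal contrastive idea described in the abstract can be illustrated with a minimal sketch. It assumes an InfoNCE-style loss averaged over the three modality pairs (image/CAD, image/text, CAD/text); the abstract does not state the exact loss CosCAD uses, so the loss form, the temperature value, and the function names `info_nce` and `tri_modal_loss` are illustrative assumptions rather than the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.07):
    """Mean InfoNCE loss: anchors[i] should match positives[i];
    every other positives[j] in the batch acts as a negative."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def tri_modal_loss(image, cad, text, temperature=0.07):
    """Assumed objective: symmetric InfoNCE averaged over the three modality pairs."""
    pairs = [(image, cad), (image, text), (cad, text)]
    losses = [
        info_nce(x, y, temperature) + info_nce(y, x, temperature)
        for x, y in pairs
    ]
    return sum(losses) / (2 * len(pairs))


# Toy batch of two objects with 2-D embeddings (hypothetical values).
image_emb = [[1.0, 0.0], [0.0, 1.0]]
cad_aligned = [[0.9, 0.1], [0.1, 0.9]]
cad_swapped = [[0.1, 0.9], [0.9, 0.1]]
text_emb = [[1.0, 0.1], [0.1, 1.0]]

loss_aligned = tri_modal_loss(image_emb, cad_aligned, text_emb)
loss_swapped = tri_modal_loss(image_emb, cad_swapped, text_emb)
```

In this toy batch the loss is lower when each image embedding sits next to its own CAD and text embeddings than when the CAD embeddings are swapped. That is the property that lets a partially occluded image still retrieve the right CAD model through a shared representation space.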
Original language: English
Title of host publication: Computational Visual Media: 13th International Conference, CVM 2025, Hong Kong SAR, China, April 19–21, 2025, Proceedings, Part I
Editors: Piotr DIDYK, Junhui HOU
Publisher: Springer
Chapter: 19
Pages: 367-387
Number of pages: 21
ISBN (Electronic): 9789819658091
ISBN (Print): 9789819658084
Publication status: Published - 2025
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 15663
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Funding

This work was supported by the National Natural Science Foundation of China (Nos. T2322012 and 62172218) and the Guangdong Basic and Applied Basic Research Foundation (No. 2022A1515010170).
