Abstract
There is a prevailing trend towards fusing multi-modal information for 3D object detection (3OD). However, the design of multi-modal fusion networks has not adequately addressed three challenges: computational efficiency, plug-and-play capability, and accurate feature alignment. In this paper, we present PointSee, a lightweight, flexible, and effective multi-modal fusion solution. It supports various 3OD networks through semantic feature enhancement of point clouds (e.g., LiDAR or RGB-D data) paired with scene images. Going beyond existing 3OD practice, PointSee consists of a hidden module (HM) and a seen module (SM). HM decorates point clouds with 2D image information in an offline fusion manner, so existing 3OD networks need minimal or even no adaptation. SM further enriches the point clouds by acquiring representative point-wise semantic features, which improves the performance of existing 3OD networks. Besides the new PointSee architecture, we propose a simple yet efficient training strategy that mitigates potentially inaccurate regressions from 2D object detection networks. Extensive experiments on popular outdoor and indoor benchmarks show quantitative and qualitative improvements of PointSee over thirty-five state-of-the-art methods.
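To illustrate what "decorating point clouds with 2D image information" can mean, here is a minimal sketch of a generic point-decoration step in the style of PointPainting. It is not PointSee's actual hidden module. Several inputs are assumptions chosen for the example: the function name `decorate_points`, a 3x4 camera projection matrix `proj`, and a per-pixel score map `score_map`, which could, for instance, be rasterized from 2D detection boxes. The sketch projects each 3D point into the image and appends the 2D scores at that pixel as extra point channels. Because the output is still a point cloud, only with wider feature vectors, a downstream 3D detector needs little or no change. This is the property the abstract attributes to offline fusion.

```python
import numpy as np


def decorate_points(points, proj, score_map):
    """Append per-pixel 2D scores to each 3D point (hypothetical helper).

    points:    (N, 3) xyz coordinates, assumed to be in the camera/reference frame.
    proj:      (3, 4) camera projection matrix (assumed calibration).
    score_map: (H, W, C) per-pixel 2D scores produced from the scene image.
    Returns an (N, 3 + C) array. Points that fall outside the image, or lie
    behind the camera, receive all-zero scores.
    """
    n = points.shape[0]
    h, w, c = score_map.shape
    # Lift to homogeneous coordinates and project onto the image plane.
    homo = np.hstack([points, np.ones((n, 1))])
    uvw = homo @ proj.T  # (N, 3): u*z, v*z, z
    depth = uvw[:, 2]
    in_front = depth > 1e-6
    # Substitute a dummy depth for points behind the camera to avoid dividing by zero.
    safe_depth = np.where(in_front, depth, 1.0)
    u = np.floor(uvw[:, 0] / safe_depth).astype(int)
    v = np.floor(uvw[:, 1] / safe_depth).astype(int)
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # Gather the 2D scores for points that project inside the image.
    scores = np.zeros((n, c), dtype=score_map.dtype)
    scores[valid] = score_map[v[valid], u[valid]]
    return np.hstack([points, scores])


if __name__ == "__main__":
    proj = np.array([[1.0, 0.0, 2.0, 0.0],
                     [0.0, 1.0, 2.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])
    score_map = np.zeros((4, 4, 2))
    score_map[2, 2] = [0.9, 0.1]
    pts = np.array([[0.0, 0.0, 1.0], [10.0, 10.0, 1.0], [0.0, 0.0, -1.0]])
    print(decorate_points(pts, proj, score_map))
```

In this toy setup, only the first point projects inside the 4x4 image, so it is the only point that receives non-zero scores. The second point lands outside the image, and the third lies behind the camera.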
Original language | English |
---|---|
Pages (from-to) | 6291-6308 |
Number of pages | 18 |
Journal | IEEE Transactions on Visualization and Computer Graphics |
Volume | 30 |
Issue number | 9 |
Early online date | 10 Nov 2023 |
DOIs | |
Publication status | Published - Sept 2024 |
Bibliographical note
Publisher Copyright: © 1995-2012 IEEE.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grants T2322012, 62172218, and 62032011, in part by the Shenzhen Science and Technology Program under Grants JCYJ20220818103401003 and JCYJ20220530172403007, in part by the National Defense Basic Scientific Research Program of China under Grant JCKY2020605C003, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515010170, and in part by the General Research Fund of the Hong Kong Research Grants Council under Grant 15218521.
Keywords
- 3D object detection
- Data augmentation
- Feature extraction
- Object detection
- Point cloud compression
- PointSee
- Proposals
- Semantics
- Three-dimensional displays
- feature enhancement
- multi-modal fusion