Abstract
Inaccurate detections remain a critical bottleneck in 3D multi-object tracking (MOT). Recent detection fusion-based methods use camera detections as a supplementary cue to reduce false LiDAR detections and compensate for objects that LiDAR misses. However, their unidirectional camera-LiDAR correction lacks a feedback mechanism, precluding iterative mutual refinement between modalities for more robust LiDAR-based tracking. Inspired by the coarse-to-fine strategy in two-stage object detection, we introduce CrossTracker, a novel two-stage framework for online multi-modal 3D MOT. CrossTracker first constructs coarse camera and LiDAR trajectories independently, then performs trajectory fusion using both current and historical frames, without requiring future data. This ensures more robust mutual refinement between modalities. Specifically, CrossTracker comprises three core modules: i) the multi-modal modeling (M3) module, which fuses data from images, point clouds, and even planar geometry derived from images to establish a robust tracking constraint; ii) the coarse trajectory generation (C-TG) module, which independently generates coarse trajectories for both modalities using the M3 constraint; and iii) the trajectory fusion (TF) module, which applies mutual refinement between coarse LiDAR and camera trajectories through cross correction to ensure robust LiDAR trajectories. Extensive experiments show that CrossTracker outperforms 19 state-of-the-art methods, highlighting its effectiveness in leveraging the synergistic strengths of camera and LiDAR sensors for robust multi-modal 3D MOT.
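The two-stage idea in the abstract (build coarse per-modality trajectories first, then fuse them using current and past frames only) can be illustrated with a deliberately simplified sketch. This is not the authors' implementation: the function names, the greedy nearest-neighbour linking, the distance gate, and the `min_support` rule are all assumptions made for illustration, detections are reduced to 2D points in a shared ground-plane frame, and only one direction of correction (camera trajectories corroborating LiDAR trajectories) is shown rather than the mutual refinement the paper describes.

```python
from math import hypot


def build_coarse_tracks(frames, gate=2.0):
    """Stage 1 (toy): link detections frame by frame with greedy nearest neighbour.

    frames: list of per-frame detection lists, each detection an (x, y) tuple.
    Returns a list of tracks, each a dict mapping frame index -> (x, y).
    """
    tracks = []
    for t, detections in enumerate(frames):
        unmatched = list(detections)
        for track in tracks:
            # Only extend tracks that were observed in the previous frame.
            if t - 1 not in track or not unmatched:
                continue
            px, py = track[t - 1]
            nearest = min(unmatched, key=lambda d: hypot(d[0] - px, d[1] - py))
            if hypot(nearest[0] - px, nearest[1] - py) <= gate:
                track[t] = nearest
                unmatched.remove(nearest)
        # Every leftover detection starts a new coarse track.
        tracks.extend({t: d} for d in unmatched)
    return tracks


def support(lidar_track, camera_track, gate=2.0):
    """Count frames where both tracks exist and lie within `gate` of each other."""
    shared = lidar_track.keys() & camera_track.keys()
    return sum(
        hypot(lidar_track[t][0] - camera_track[t][0],
              lidar_track[t][1] - camera_track[t][1]) <= gate
        for t in shared
    )


def fuse_tracks(lidar_tracks, camera_tracks, min_support=2, gate=2.0):
    """Stage 2 (toy): keep LiDAR tracks corroborated by some camera track.

    Only frames already present in the tracks are used, so no future data
    is needed (hypothetical rule, not the paper's cross correction).
    """
    return [
        lt for lt in lidar_tracks
        if any(support(lt, ct, gate) >= min_support for ct in camera_tracks)
    ]


# Made-up example: one real object plus a spurious LiDAR detection at (10, 10).
lidar_frames = [[(0.0, 0.0), (10.0, 10.0)], [(1.0, 0.0)], [(2.0, 0.0)]]
camera_frames = [[(0.2, 0.0)], [(1.1, 0.0)], [(2.1, 0.0)]]

lidar_tracks = build_coarse_tracks(lidar_frames)
camera_tracks = build_coarse_tracks(camera_frames)
fused = fuse_tracks(lidar_tracks, camera_tracks)
```

In this toy input the spurious single-frame LiDAR track has no camera trajectory near it, so the fusion step drops it, while the three-frame track survives. Working at the trajectory level rather than per detection is what lets evidence from earlier frames inform the decision.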
| Original language | English |
|---|---|
| Pages (from-to) | 2191-2206 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Volume | 36 |
| Issue number | 2 |
| Early online date | 22 Aug 2025 |
| Publication status | Published - Feb 2026 |
Bibliographical note
Publisher Copyright: © 2025 IEEE.
Funding
This work was supported by the National Defense Basic Scientific Research Program of China (No. JCKY2020605C003), a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (No. UGC/FDS16/E03/24), the Hong Kong Metropolitan University Research Grant (No. RD/2024/1.16), and the Changzhou City Science and Technology Project Applied Basic Research (No. CJ20241078).
Keywords
- CrossTracker
- cross correction
- multi-modal 3D MOT
- trajectory fusion
- two-stage solution
Fingerprint
Dive into the research topics of 'CrossTracker: Robust Multi-Modal 3D Multi-Object Tracking via Cross Correction'. Together they form a unique fingerprint.