
CrossTracker: Robust Multi-Modal 3D Multi-Object Tracking via Cross Correction

  • Lipeng GU
  • Xuefeng YAN
  • Weiming WANG
  • Honghua CHEN
  • Dingkun ZHU
  • Liangliang NAN
  • Mingqiang WEI

Research output: Journal Publications > Journal Article (refereed) > peer-review

Abstract

Inaccurate detections remain a critical bottleneck in 3D multi-object tracking (MOT). Recent detection fusion-based methods use camera detections as a supplementary cue to reduce false detections and compensate for detections missed by LiDAR. However, their unidirectional camera-to-LiDAR correction lacks a feedback mechanism, which precludes the iterative mutual refinement between modalities needed for more robust LiDAR-based tracking. Inspired by the coarse-to-fine strategy in two-stage object detection, we introduce CrossTracker, a novel two-stage framework for online multi-modal 3D MOT. CrossTracker first constructs coarse camera and LiDAR trajectories independently, then performs trajectory fusion using both current and historical frames, without requiring future data. This ensures more robust mutual refinement between modalities. Specifically, CrossTracker comprises three core modules: i) the multi-modal modeling (M3) module, which fuses data from images, point clouds, and even planar geometry derived from images to establish a robust tracking constraint; ii) the coarse trajectory generation (C-TG) module, which independently generates coarse trajectories for both modalities using the M3 constraint; and iii) the trajectory fusion (TF) module, which applies mutual refinement between coarse LiDAR and camera trajectories through cross correction to ensure robust LiDAR trajectories. Extensive experiments show that CrossTracker outperforms 19 state-of-the-art methods, highlighting its effectiveness in leveraging the synergistic strengths of camera and LiDAR sensors for robust multi-modal 3D MOT.
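The two-stage flow described in the abstract (independent coarse trajectories per modality, then fusion over current and past frames only) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the distance gate, the minimum track length, and the nearest-neighbour association are all assumptions made for illustration, and camera detections are assumed to be already expressed in the same ground-plane coordinates as the LiDAR ones. The M3 constraint is replaced here by a plain Euclidean distance.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Track:
    track_id: int
    points: list = field(default_factory=list)  # one (x, y) centre per matched frame


def build_coarse_tracks(frames, gate=2.0):
    """Stage 1 stand-in (hypothetical): greedy nearest-neighbour linking.

    `frames` is a list of per-frame detection lists, processed in order,
    so only current and past frames are ever used.
    """
    tracks = []
    for detections in frames:
        unmatched = list(detections)
        for track in tracks:
            if not unmatched:
                break
            nearest = min(unmatched, key=lambda d: dist(d, track.points[-1]))
            if dist(nearest, track.points[-1]) <= gate:
                track.points.append(nearest)
                unmatched.remove(nearest)
        # Every detection left over starts a new coarse track.
        for det in unmatched:
            tracks.append(Track(len(tracks), [det]))
    return tracks


def cross_correct(lidar_tracks, camera_tracks, gate=2.0, min_len=2):
    """Stage 2 stand-in (hypothetical): each modality checks the other.

    A short LiDAR track with no nearby camera track is treated as a false
    detection; a sustained camera track with no nearby LiDAR track is
    treated as an object that LiDAR missed.
    """
    def supported(track, others):
        return any(dist(track.points[-1], o.points[-1]) <= gate for o in others)

    kept = [t for t in lidar_tracks
            if len(t.points) >= min_len or supported(t, camera_tracks)]
    recovered = [c for c in camera_tracks
                 if len(c.points) >= min_len and not supported(c, lidar_tracks)]
    return kept, recovered
```

Calling `build_coarse_tracks` once per modality and passing both results to `cross_correct` yields the LiDAR tracks that survive the check plus the camera-only tracks flagged as likely LiDAR misses. A real system would replace the distance gate with a learned or geometric affinity and would handle track termination, which this sketch omits.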

Original language: English
Pages (from-to): 2191-2206
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 36
Issue number: 2
Early online date: 22 Aug 2025
Publication status: Published - Feb 2026

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Funding

This work was supported by the National Defense Basic Scientific Research Program of China (No. JCKY2020605C003), a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (No. UGC/FDS16/E03/24), the Hong Kong Metropolitan University Research Grant (No. RD/2024/1.16), and the Changzhou City Science and Technology Project Applied Basic Research (No. CJ20241078).

Keywords

  • CrossTracker
  • cross correction
  • multi-modal 3D MOT
  • trajectory fusion
  • two-stage solution
