Abstract
Monocular depth estimation (MDE) remains a fundamental yet incompletely solved problem in computer vision. Current MDE methods often produce blurred or even indistinct depth boundaries, degrading the quality of vision-based intelligent transportation systems. This paper presents an edge-enhanced vision transformer bins network for monocular depth estimation, termed eViTBins. eViTBins has three core modules that together predict monocular depth maps with exceptional smoothness, accuracy, and fidelity to scene structures and object edges. First, a multi-scale feature fusion module is proposed to circumvent the loss of depth information at various levels during depth regression. Second, an image-guided edge-enhancement module is proposed to accurately infer depth values around image boundaries. Third, a vision transformer-based depth discretization module is introduced to capture the global depth distribution. Meanwhile, unlike most MDE models that rely on high-performance GPUs, eViTBins is optimized for seamless deployment on edge devices, such as the NVIDIA Jetson Nano and the Google Coral SBC, making it well suited to real-time intelligent transportation system applications. Extensive experimental evaluations corroborate the superiority of eViTBins over competing methods, notably in preserving depth edges and global depth representations.
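The abstract does not spell out how the depth discretization module turns bins into depth values. As a rough illustration of the general idea behind adaptive depth bins, the sketch below follows a common formulation from the adaptive-bins literature, which is an assumption here and not necessarily the eViTBins design: predicted bin widths are normalized to cover a depth range, and each pixel's depth is the probability-weighted sum of the bin centers. The function names, the depth range, and the toy numbers are all hypothetical.

```python
import math


def bin_centers(raw_widths, d_min, d_max):
    """Normalize positive widths to span [d_min, d_max] and return bin centers.

    Assumption: a network would predict `raw_widths` per image.
    """
    total = sum(raw_widths)
    widths = [(d_max - d_min) * w / total for w in raw_widths]
    centers = []
    left = d_min
    for w in widths:
        centers.append(left + w / 2.0)  # midpoint of the current bin
        left += w                       # move to the next bin's left edge
    return centers


def softmax(logits):
    """Numerically stable softmax over a list of per-bin scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def pixel_depth(logits, centers):
    """Depth of one pixel: expected bin center under the softmax probabilities."""
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, centers))


# Toy example: three bins over a hypothetical 0 to 8 m range.
centers = bin_centers([1.0, 1.0, 2.0], d_min=0.0, d_max=8.0)
depth = pixel_depth([0.0, 0.0, 0.0], centers)  # uniform scores average the centers
```

Because the final depth is a smooth blend of bin centers and not a hard bin assignment, this kind of formulation avoids the visible quantization steps that a plain per-pixel classification over bins would produce.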
Original language | English |
---|---|
Pages (from-to) | 20320-20334 |
Number of pages | 15 |
Journal | IEEE Transactions on Intelligent Transportation Systems |
Volume | 25 |
Issue number | 12 |
Early online date | 23 Oct 2024 |
Publication status | Published - 2024 |
Bibliographical note
Publisher Copyright: © 2000-2011 IEEE.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant T2322012, Grant 62172218, and Grant 62032011; in part by the Shenzhen Science and Technology Program under Grant JCYJ20220818103401003 and Grant JCYJ20220530172403007; and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515010170.
Keywords
- Edge-enhanced vision transformer
- adaptive depth bins
- edge AI
- monocular depth estimation
- traffic monitoring
- unmanned aerial vehicle