This paper investigates crowd counting in the frequency domain, a novel direction compared to the traditional spatial-domain view. By transforming the density map into the frequency domain and using the properties of the characteristic function, we propose a novel method that is simple, effective, and efficient. The theoretical analysis yields an implementation-friendly loss function that requires only standard tensor operations during training. We prove that our loss function is an upper bound of the pseudo sup norm metric between the ground-truth and predicted density maps (over all of their sub-regions), and demonstrate its efficacy and efficiency against other loss functions. The experimental results also show that it is competitive with the state of the art on five benchmark data sets: ShanghaiTech A & B, UCF-QNRF, JHU++, and NWPU.
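The idea of comparing density maps through their characteristic functions can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`characteristic_function`, `frequency_grid`, `chf_l1_loss`), the square frequency grid, and the averaging over frequencies are all assumptions made for illustration. The sketch treats a density map as a discrete measure, evaluates its characteristic function at a small set of frequencies, and takes the mean absolute difference between prediction and ground truth.

```python
import cmath


def characteristic_function(density, freqs):
    """Evaluate the discrete characteristic function of a 2-D density map.

    density: list of rows (H x W) of non-negative floats.
    freqs: list of (t_y, t_x) frequency pairs.
    Returns one complex value per frequency:
        phi(t) = sum over pixels of density[y][x] * exp(i * (t_y * y + t_x * x)).
    """
    values = []
    for t_y, t_x in freqs:
        total = 0j
        for y, row in enumerate(density):
            for x, mass in enumerate(row):
                if mass:  # skip empty pixels
                    total += mass * cmath.exp(1j * (t_y * y + t_x * x))
        values.append(total)
    return values


def frequency_grid(radius, step):
    """Hypothetical helper: square grid of frequencies in [-radius, radius]^2."""
    n = int(round(radius / step))
    ticks = [k * step for k in range(-n, n + 1)]
    return [(t_y, t_x) for t_y in ticks for t_x in ticks]


def chf_l1_loss(pred, target, freqs):
    """Mean absolute difference of the two characteristic functions.

    Assumption: an L1-style comparison over a bounded frequency set; the
    paper's exact normalization and frequency range may differ.
    """
    phi_pred = characteristic_function(pred, freqs)
    phi_target = characteristic_function(target, freqs)
    return sum(abs(p - q) for p, q in zip(phi_pred, phi_target)) / len(freqs)


if __name__ == "__main__":
    target = [[0, 1, 0], [0, 0, 0], [0, 0, 1]]      # two annotated heads
    pred_shift = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]  # one head shifted by a pixel
    freqs = frequency_grid(0.3, 0.1)
    print(chf_l1_loss(target, target, freqs))       # identical maps
    print(chf_l1_loss(pred_shift, target, freqs))   # mislocalized prediction
```

Two properties are visible in this sketch. At zero frequency the characteristic function equals the total count, so a count error always contributes to the loss. At non-zero frequencies the phase depends on position, so a prediction with the right count but the wrong locations is still penalized.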
|Title of host publication||Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition|
|Number of pages||10|
|Publication status||Published - 2022|
|Event||2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR - New Orleans, United States (Duration: 19 Jun 2022 → 24 Jun 2022)|
|Name||Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition|
|Conference||2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR|
|Period||19/06/22 → 24/06/22|
Bibliographical note
Funding Information:
Acknowledgements. This work was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Proj. No. CityU 11212518), and a Strategic Research Grant from City University of Hong Kong (Proj. No. 7005665).
© 2022 IEEE.
- Scene analysis and understanding
- Vision applications and systems