Fine-Grained Visual Categorization: A Spatial-Frequency Feature Fusion Perspective

Min WANG*, Peng ZHAO, Xin LU, Fan MIN, Xizhao WANG

*Corresponding author for this work

Research output: Journal Publications / Journal Article (refereed) / peer-review

5 Citations (Scopus)

Abstract

Fine-grained visual categorization is challenging owing to high intra-class variance and low inter-class variance. Classical approaches rely on pre-trained models or on numerous fine annotations. In this paper, we observe that spatial and frequency information provide distinct views of an image, and we propose a new spatial-frequency feature fusion (SFFF) perspective to address this problem. Specifically, we design a heterogeneous feature extraction loss function, construct a global and local fusion SFFF network, and propose an importance-sparsity selection strategy. For feature extraction, we focus on a frequency-domain feature learning network that extracts fine-grained features complementary to the spatial ones. For feature selection, we use importance ranking and sparse regularization to constrain the spatial-frequency features. For feature fusion, we design a spatial-frequency loss and an inter-layer switching strategy to achieve local-global collaboration. Comparative experiments were performed on popular fine-grained datasets (CUB200-2011, Stanford Cars, Stanford Dogs, and FGVC-Aircraft) and on the classic dataset CIFAR100. Comparisons with more than 40 state-of-the-art fine-grained categorization methods confirm the effectiveness and strong performance of SFFF. Ablation studies and visualizations are provided to help explain our approach.
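The abstract names the components but not their exact form. As a rough, hypothetical illustration of the ideas only (not the SFFF implementation), the Python sketch below assumes a JPEG-style blockwise 8×8 DCT for the frequency view, a space-to-depth regrouping of the same blocks for the spatial view, fusion by simple channel concatenation, and an L1-penalized gate per channel as a stand-in for importance ranking with sparse regularization. All function names, the gate values, and the weight `lam` are illustrative assumptions.

```python
# Hypothetical toy sketch of spatial/frequency views and importance-sparsity
# selection; it is NOT the SFFF authors' code or architecture.
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out


def block_channels(image, b, transform):
    """Tile an H x W image into b x b blocks (ragged edges are dropped),
    apply `transform` to each block, and gather entry (u, v) of every block
    into channel u * b + v. Returns b * b channels, one value per block."""
    h, w = len(image), len(image[0])
    channels = [[] for _ in range(b * b)]
    for i in range(0, h - h % b, b):
        for j in range(0, w - w % b, b):
            block = transform([row[j:j + b] for row in image[i:i + b]])
            for u in range(b):
                for v in range(b):
                    channels[u * b + v].append(block[u][v])
    return channels


def spatial_view(image, b=8):
    # Space-to-depth: pixel (u, v) of each block becomes channel u * b + v.
    return block_channels(image, b, lambda blk: blk)


def frequency_view(image, b=8):
    # DCT coefficient (u, v) of each block becomes channel u * b + v.
    return block_channels(image, b, dct2)


def importance_sparsity_select(features, gates, k, lam=0.01):
    """Hypothetical selection step: keep the k channels with the largest
    |gate|, scale them by their gate, and return an L1 penalty
    lam * sum(|gate|) that a training loop could add to its loss to push
    unimportant gates toward zero."""
    if len(features) != len(gates):
        raise ValueError("exactly one gate per channel is required")
    ranked = sorted(range(len(gates)), key=lambda i: abs(gates[i]), reverse=True)
    keep = sorted(ranked[:k])
    selected = [[gates[i] * x for x in features[i]] for i in keep]
    penalty = lam * sum(abs(g) for g in gates)
    return keep, selected, penalty


if __name__ == "__main__":
    image = [[float((x * y) % 7) for y in range(16)] for x in range(16)]
    # Fusion by channel concatenation: 64 spatial + 64 frequency channels,
    # each holding one value per 8 x 8 block (4 blocks for a 16 x 16 image).
    fused = spatial_view(image) + frequency_view(image)
    gates = [1.0 / (1 + i) for i in range(len(fused))]  # placeholder gates
    keep, selected, penalty = importance_sparsity_select(fused, gates, k=8)
    print(len(fused), keep, round(penalty, 4))
```

Because both views yield b × b channels with one value per block, they align channel for channel and can be concatenated directly. In a trained network the gates would be learned parameters rather than the fixed placeholders used here.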

Original language: English
Pages (from-to): 2798-2812
Number of pages: 15
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 6
Early online date: 8 Dec 2022
Publication status: Published - Jun 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1991-2012 IEEE.

Keywords

  • deep fusion
  • Fine-grained recognition
  • frequency domain learning
  • training from scratch
  • weakly supervised learning
