Learning Hierarchical and Spatial Semantic Correlations for Enhanced Multi-Scale Feature Compression

  • Liqian ZHANG
  • Zhaoqing PAN*
  • Li LI
  • Fu Lee WANG
  • Sam KWONG

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Multi-scale feature compression plays a crucial role in machine vision by reducing transmission and storage overhead while preserving task performance. However, existing methods neglect the distinct roles that features at different scales play in recognizing objects of varying sizes, and they fail to fully capture the hierarchical semantic correlation across these scales. Moreover, they arrange feature representations into an image in the original channel order and compress it using traditional codecs, which limits the exploitation of the spatial semantic correlation. To address these issues, a Hierarchical and Spatial Semantic Correlations (HSSC)-based multi-scale feature compression method is proposed in this paper, which jointly models hierarchical and spatial semantic correlations to improve compression efficiency. Specifically, to remove task-irrelevant redundancy in multi-scale features, a Semantic-guided Feature Transform Network (SFTNet) is developed, which leverages hierarchical semantic correlation to generate compact representations. In SFTNet, a semantic-guided cascaded fusion module is designed to reduce the data volume of multi-scale features by fusing task-relevant information based on intra- and inter-scale semantic correlations, while an adaptive feature reconstruction module is built to reconstruct multi-scale features by adaptively enhancing semantics for machine task analysis. To further eliminate redundancy in the compact representations, a spatial semantic-based channel reordering strategy is proposed, which enhances spatial semantic correlation by reordering feature channels based on their local and global semantic similarities. Experimental results demonstrate that the proposed HSSC-based method outperforms state-of-the-art multi-scale feature compression methods.
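
The idea of reordering channels before packing them into an image can be illustrated with a minimal sketch. This is not the authors' strategy: the abstract does not specify how local and global semantic similarities are computed, so the sketch assumes plain cosine similarity between flattened channels and a greedy nearest-neighbor ordering as stand-ins. The function names `cosine`, `greedy_channel_order`, and `tile_channels` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length flattened channels."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def greedy_channel_order(channels):
    """Return a channel order in which each channel is followed by the
    most similar channel not yet placed (assumed stand-in for the paper's
    similarity-based reordering; starts from channel 0)."""
    if not channels:
        return []
    order = [0]
    remaining = set(range(1, len(channels)))
    while remaining:
        last = channels[order[-1]]
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda i: cosine(last, channels[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def tile_channels(channels, order, height, width, cols):
    """Pack channels (each height*width, row-major) into one 2-D image,
    `cols` tiles per row, following `order`. Unused slots stay zero."""
    rows = math.ceil(len(order) / cols)
    image = [[0.0] * (cols * width) for _ in range(rows * height)]
    for slot, ch in enumerate(order):
        r0 = (slot // cols) * height
        c0 = (slot % cols) * width
        for y in range(height):
            for x in range(width):
                image[r0 + y][c0 + x] = channels[ch][y * width + x]
    return image


# Toy data: four 2x2 channels, flattened row-major (illustrative only).
demo = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.1, 0.0],
]
demo_order = greedy_channel_order(demo)
demo_image = tile_channels(demo, demo_order, height=2, width=2, cols=2)
```

In this toy case similar channels end up in adjacent tiles, which is the property a conventional image or video codec could then exploit through its spatial prediction tools.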

Original language: English
Number of pages: 14
Journal: IEEE Transactions on Broadcasting
Early online date: 11 Dec 2025
Publication status: E-pub ahead of print - 11 Dec 2025

Bibliographical note

Publisher Copyright:
© 1963-2012 IEEE.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62322116.

Keywords

  • Multimedia communication
  • video coding
  • multi-scale feature compression
  • semantic-guided feature transform
  • spatial semantic-based channel reordering
