Abstract
Multi-scale feature compression plays a crucial role in machine vision by reducing transmission and storage overhead while preserving task performance. However, existing methods neglect the distinct roles of features at different scales for recognizing objects of varying sizes and fail to fully capture the hierarchical semantic correlation across these scales. Moreover, they arrange feature representations into an image based on their original channel order and compress it using traditional codecs, which limits the exploitation of the spatial semantic correlation. To address these issues, a Hierarchical and Spatial Semantic Correlations (HSSC)-based multi-scale feature compression method is proposed in this paper, which jointly models hierarchical and spatial semantic correlations to improve compression efficiency. Specifically, to remove task-irrelevant redundancy in multi-scale features, a Semantic-guided Feature Transform Network (SFTNet) is developed, which leverages hierarchical semantic correlation to generate compact representations. In SFTNet, a semantic-guided cascaded fusion module is designed to reduce the data volume of multi-scale features by fusing task-relevant information based on intra- and inter-scale semantic correlations, while an adaptive feature reconstruction module is built to reconstruct multi-scale features by adaptively enhancing semantics for machine task analysis. To further eliminate redundancy in compact representations, a spatial semantic-based channel reordering strategy is proposed, which enhances spatial semantic correlation by reordering feature channels based on their local and global semantic similarities. Experimental results demonstrate that the proposed HSSC-based method outperforms state-of-the-art multi-scale feature compression methods.
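To make the channel reordering idea concrete, the following is a minimal sketch of similarity-driven channel ordering, not the method proposed in the paper. It assumes cosine similarity as a stand-in for semantic similarity and a simple greedy nearest-neighbor ordering; the abstract does not specify the actual local and global measures, and the names `cosine`, `greedy_channel_order`, and `invert` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two flattened channels."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_channel_order(channels):
    """Return a permutation that places similar channels next to each other.

    Assumption: cosine similarity stands in for 'semantic similarity';
    the measures used in the paper are not given in the abstract.
    """
    n = len(channels)
    if n == 0:
        return []
    sim = [[cosine(channels[i], channels[j]) for j in range(n)] for i in range(n)]
    # Start from the channel with the highest total similarity (a 'global' cue).
    start = max(range(n), key=lambda i: sum(sim[i]))
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        last = order[-1]
        # Append the unplaced channel closest to the last one (a 'local' cue).
        nxt = max(sorted(remaining), key=lambda j: sim[last][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

def invert(order):
    """Inverse permutation, so a decoder could restore the original order."""
    inv = [0] * len(order)
    for new_pos, old_idx in enumerate(order):
        inv[old_idx] = new_pos
    return inv

# Toy example: channels 0 and 2 are alike, as are channels 1 and 3.
channels = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.9, 0.1, 1.0, 0.0],
    [0.1, 1.0, 0.0, 0.9],
]
order = greedy_channel_order(channels)
reordered = [channels[i] for i in order]
inv = invert(order)
restored = [reordered[inv[i]] for i in range(len(channels))]
```

In this toy case the similar channels end up adjacent, which is the property a conventional codec could exploit once the channels are tiled into an image. The permutation would have to be available to the decoder to undo the reordering, as `invert` illustrates.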
| Original language | English |
|---|---|
| Number of pages | 14 |
| Journal | IEEE Transactions on Broadcasting |
| Early online date | 11 Dec 2025 |
| Publication status | E-pub ahead of print - 11 Dec 2025 |
Bibliographical note
Publisher Copyright: © 1963-2012 IEEE.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 62322116.
Keywords
- Multimedia communication
- video coding
- multi-scale feature compression
- semantic-guided feature transform
- spatial semantic-based channel reordering