Multi-Domain Spatial-Temporal Redundancy Mining for Efficient Learned Video Compression

Feng YUAN, Zhaoqing PAN, Jianjun LEI, Bo PENG, Fu Lee WANG, Sam KWONG

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Conditional Coding-based Learned Video Compression (CC-LVC) has become an important paradigm in learned video compression because it can effectively exploit spatial-temporal redundancies within a large context space. However, existing CC-LVC methods cannot accurately model motion information or efficiently mine contextual correlations in complex regions with non-rigid motions and non-linear deformations. To address these problems, this paper proposes an efficient CC-LVC method that mines spatial-temporal dependencies across multiple motion domains and receptive domains to improve video coding efficiency. To accurately model complex motions and generate precise temporal contexts, a Multi-domain Motion modeling Network (MMNet) is proposed to capture robust motion information from both the spatial and frequency domains. Moreover, a multi-domain context refinement module is developed to discriminatively highlight frequency-domain temporal contexts and adaptively fuse multi-domain temporal contexts, which effectively mitigates inaccuracies in temporal contexts caused by motion errors. To compress video signals efficiently, a Multi-scale Long Short-range Decorrelation Module (MLSDM)-based context codec is proposed, in which the MLSDM is designed to learn long short-range spatial-temporal dependencies and channel-wise correlations across varying receptive domains. Extensive experimental results show that the proposed method significantly outperforms VTM 17.0 and other state-of-the-art learned video compression methods in terms of both PSNR and MS-SSIM.
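
To make the frequency-decomposition idea in the abstract concrete, below is a minimal PyTorch sketch of a multi-domain context refinement step: it splits a temporal context feature into low- and high-frequency components and fuses them with learned per-pixel weights. This is an illustrative sketch under stated assumptions, not the authors' implementation; all class, layer, and parameter names (FrequencyContextRefinement, low_branch, gate, etc.) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrequencyContextRefinement(nn.Module):
    """Hypothetical sketch of multi-domain context refinement:
    decompose a temporal context into low-/high-frequency components,
    process each domain, then fuse them adaptively."""

    def __init__(self, channels: int):
        super().__init__()
        self.low_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.high_branch = nn.Conv2d(channels, channels, 3, padding=1)
        # Gate predicts per-pixel fusion weights for the two domains.
        self.gate = nn.Conv2d(2 * channels, 2, kernel_size=1)

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # Low-frequency component: blur by average pooling then upsampling
        # (assumes even spatial dimensions for exact shape recovery).
        low = F.interpolate(F.avg_pool2d(context, 2),
                            scale_factor=2, mode="bilinear",
                            align_corners=False)
        high = context - low  # high-frequency residual
        low = self.low_branch(low)
        high = self.high_branch(high)
        # Softmax over the two domain weights, applied per pixel.
        w = torch.softmax(self.gate(torch.cat([low, high], dim=1)), dim=1)
        return w[:, 0:1] * low + w[:, 1:2] * high

# Usage: refine a 64-channel temporal context feature map.
ctx = torch.randn(1, 64, 128, 128)
refined = FrequencyContextRefinement(64)(ctx)
print(refined.shape)  # torch.Size([1, 64, 128, 128])
```

The soft per-pixel gate mirrors the abstract's "adaptive fusion" of multi-domain contexts: regions where motion errors corrupt high-frequency detail can lean on the low-frequency branch, and vice versa.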
Original language: English
Pages (from-to): 808-820
Number of pages: 13
Journal: IEEE Transactions on Broadcasting
Volume: 71
Issue number: 3
Early online date: 23 Jul 2025
Publication status: Published - Sept 2025

Bibliographical note

Publisher Copyright:
© 1963-2012 IEEE.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62322116.

Keywords

  • Learned video compression
  • conditional coding
  • frequency decomposition
  • multi-scale long short-range decorrelation
  • visual state space block
