Abstract
The conditional coding paradigm is widely used in learned video compression because of its superior ability to capture redundancies within a large context space. However, existing Conditional coding-based Learned Video Compression (C-LVC) methods ignore the fact that predicted motion vectors usually carry large uncertainty due to complex motion, occlusion, etc., which consequently decreases the accuracy of the generated temporal contexts. In addition, existing C-LVC methods are weak at mining the diverse dependencies within the context space, which are closely related to coding efficiency. To address these issues, this paper proposes an efficient temporal redundancy mining method to improve the coding efficiency of C-LVC. To generate accurate temporal contexts, a Long Short-Term Motion Aggregation (LSTMA) model is proposed, in which an LSTMA-based motion estimation module captures both the current and the aggregated long short-term motion information to reduce the uncertainty of the predicted motion vectors. Based on this dual motion information, an LSTMA-based temporal context mining module exploits the aggregated long short-term motion information to increase the accuracy of the generated temporal contexts. To fully eliminate spatial-temporal redundancies in a video, a Global-Local Information Decorrelation Module (GLIDM)-based context codec is proposed, in which the GLIDM is built from the visual state space block (namely VMamba), the residual block, and the squeeze-and-excitation block to effectively capture long-range and short-range spatial-temporal dependencies as well as channel-wise dependencies. Experimental results demonstrate that the proposed method effectively improves the coding performance of C-LVC and outperforms other state-of-the-art LVC methods.
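Of the three components named for the GLIDM, the squeeze-and-excitation block is the standard mechanism for the channel-wise dependencies mentioned above. The minimal NumPy sketch below illustrates only that generic mechanism (squeeze by global average pooling, excite through a bottleneck MLP, then gate each channel); the function name, the reduction ratio, and the weight shapes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def squeeze_excite(features, w1, w2):
    """Generic squeeze-and-excitation channel reweighting (illustrative sketch).

    features : (C, H, W) feature map
    w1       : (C // r, C) bottleneck reduction weights (hypothetical shapes)
    w2       : (C, C // r) bottleneck expansion weights
    """
    # Squeeze: global average pool over the spatial dimensions -> (C,)
    z = features.mean(axis=(1, 2))
    # Excite: bottleneck MLP, ReLU then sigmoid, giving a per-channel gate in (0, 1)
    s = np.maximum(w1 @ z, 0.0)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))
    # Reweight each channel of the feature map by its gate
    return features * gate[:, None, None]
```

Because the gate lies in (0, 1), the block can only attenuate channels, emphasizing informative ones relative to the rest; in the GLIDM this channel gating would complement the long-range (VMamba) and short-range (residual block) spatial-temporal modeling.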
| Original language | English |
|---|---|
| Number of pages | 15 |
| Journal | IEEE Transactions on Circuits and Systems for Video Technology |
| Early online date | 27 Nov 2025 |
| DOIs | |
| Publication status | Published - 2025 |
Bibliographical note
Publisher Copyright: © 1991-2012 IEEE.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 62322116.
Keywords
- Learned video compression
- conditional coding
- long short-term motion aggregation
- global-local decorrelation
- visual state space model
- vmamba