Abstract
Just Recognizable Difference (JRD) represents the minimum visual difference that is detectable by machine vision, which can be exploited to promote machine vision-oriented visual signal processing. In this paper, we propose a Deep Transformer-based JRD (DT-JRD) prediction model for Video Coding for Machines (VCM), where the accurately predicted JRD can be used to reduce the coding bit rate while maintaining the accuracy of machine tasks. Firstly, we model the JRD prediction as a multi-class classification and propose a DT-JRD prediction model that integrates an improved embedding, a content and distortion feature extraction, a multi-class classification, and a novel learning strategy. Secondly, inspired by the perception property that machine vision exhibits a similar response to distortions near JRD, we propose an asymptotic JRD loss by using Gaussian Distribution-based Soft Labels (GDSL), which significantly extends the number of training labels and relaxes classification boundaries. Finally, we propose a DT-JRD-based VCM to reduce the coding bits while maintaining the accuracy of object detection. Extensive experimental results demonstrate that the mean absolute error of the predicted JRD by the DT-JRD is 5.574, outperforming the state-of-the-art JRD prediction model by 13.1%. Coding experiments show that compared with the VVC, the DT-JRD-based VCM achieves an average of 29.58% bit rate reduction while maintaining the object detection accuracy.
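The soft-label idea in the abstract can be illustrated with a minimal sketch. It is not the authors' asymptotic JRD loss, whose exact form is not given here; it only shows a plain cross-entropy against a Gaussian-shaped target distribution. The function names, the class count of 64, and the value of `sigma` are assumptions chosen for illustration.

```python
import math


def gaussian_soft_label(true_class, num_classes, sigma=2.0):
    """Hypothetical helper: a probability vector centred on true_class.

    Classes near the ground-truth JRD class receive non-zero weight, so a
    near-miss prediction is penalised less than a distant one.
    """
    weights = [
        math.exp(-((k - true_class) ** 2) / (2.0 * sigma ** 2))
        for k in range(num_classes)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(q * (x - log_z) for x, q in zip(logits, soft_label))


def one_hot_logits(peak, num_classes, height=5.0):
    """Hypothetical helper: logits that favour a single class."""
    logits = [0.0] * num_classes
    logits[peak] = height
    return logits


if __name__ == "__main__":
    num_classes = 64  # assumed number of JRD classes, for illustration only
    target = gaussian_soft_label(10, num_classes)
    near = soft_cross_entropy(one_hot_logits(11, num_classes), target)
    far = soft_cross_entropy(one_hot_logits(40, num_classes), target)
    print(f"loss when predicting class 11: {near:.4f}")
    print(f"loss when predicting class 40: {far:.4f}")
```

With a one-hot target, both wrong predictions above would incur the same loss; with the Gaussian target, the prediction adjacent to the true class is penalised less, which is the relaxation of classification boundaries the abstract refers to.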
| Original language | English |
|---|---|
| Pages (from-to) | 114-127 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 28 |
| Early online date | 6 Oct 2025 |
| Publication status | Published - 2026 |
Bibliographical note
Publisher Copyright: © 1999-2012 IEEE.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62172400, in part by the Shenzhen Natural Science Foundation under Grant JCYJ20240813180503005, in part by the Shenzhen Key Science and Technology Program under Grant JCYJ20241202124415021, and in part by the Zhejiang “Pioneer and Leading +X” Science and Technology Project under Grant 2025C01035.
Keywords
- Just Recognizable Distortion
- Object Detection
- Video Coding for Machines
- Vision Transformer
Fingerprint
Dive into the research topics of 'DT-JRD: Deep Transformer-based Just Recognizable Difference Prediction Model for Video Coding for Machines'. Together they form a unique fingerprint.