DT-JRD: Deep Transformer-based Just Recognizable Difference Prediction Model for Video Coding for Machines

  • Junqi LIU
  • Yun ZHANG*
  • Xiaoqi WANG
  • Long XU
  • Sam KWONG

*Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

Abstract

Just Recognizable Difference (JRD) represents the minimum visual difference that is detectable by machine vision, which can be exploited to promote machine vision-oriented visual signal processing. In this paper, we propose a Deep Transformer-based JRD (DT-JRD) prediction model for Video Coding for Machines (VCM), where the accurately predicted JRD can be used to reduce the coding bit rate while maintaining the accuracy of machine tasks. Firstly, we model the JRD prediction as a multi-class classification and propose a DT-JRD prediction model that integrates an improved embedding, a content and distortion feature extraction, a multi-class classification, and a novel learning strategy. Secondly, inspired by the perception property that machine vision exhibits a similar response to distortions near JRD, we propose an asymptotic JRD loss by using Gaussian Distribution-based Soft Labels (GDSL), which significantly extends the number of training labels and relaxes classification boundaries. Finally, we propose a DT-JRD-based VCM to reduce the coding bits while maintaining the accuracy of object detection. Extensive experimental results demonstrate that the mean absolute error of the predicted JRD by the DT-JRD is 5.574, outperforming the state-of-the-art JRD prediction model by 13.1%. Coding experiments show that compared with the VVC, the DT-JRD-based VCM achieves an average of 29.58% bit rate reduction while maintaining the object detection accuracy.
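The idea of Gaussian Distribution-based Soft Labels described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation of the asymptotic JRD loss, so everything below is an assumption made for illustration: the number of JRD classes (`NUM_CLASSES`), the spread (`SIGMA`), the helper names, and the use of a plain soft-label cross-entropy. It only shows how a soft target centered on the true JRD class penalizes a near-miss prediction less than a distant one.

```python
import math

def gaussian_soft_label(true_class, num_classes, sigma):
    """Build a soft target: Gaussian weights centered on true_class, normalized to sum to 1."""
    weights = [math.exp(-((k - true_class) ** 2) / (2.0 * sigma ** 2))
               for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]

def soft_cross_entropy(logits, soft_label):
    """Cross-entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(p * (x - log_z) for p, x in zip(soft_label, logits))

# Hypothetical settings, not taken from the paper.
NUM_CLASSES = 64
SIGMA = 2.0

target = gaussian_soft_label(true_class=30, num_classes=NUM_CLASSES, sigma=SIGMA)

# Two toy predictions: one peaked next to the true class, one far from it.
near_logits = [5.0 if k == 31 else 0.0 for k in range(NUM_CLASSES)]
far_logits = [5.0 if k == 50 else 0.0 for k in range(NUM_CLASSES)]

loss_near = soft_cross_entropy(near_logits, target)
loss_far = soft_cross_entropy(far_logits, target)
print(f"near-miss loss: {loss_near:.3f}, far-miss loss: {loss_far:.3f}")
```

With a one-hot label both predictions would be equally wrong; with the soft label the near-miss receives the lower loss, which is one way to read the "relaxed classification boundaries" mentioned in the abstract.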

Original language: English
Pages (from-to): 114-127
Number of pages: 14
Journal: IEEE Transactions on Multimedia
Volume: 28
Early online date: 6 Oct 2025
DOIs
Publication status: Published - 2026

Bibliographical note

Publisher Copyright:
© 1999-2012 IEEE.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62172400, in part by the Shenzhen Natural Science Foundation under Grant JCYJ20240813180503005, in part by the Shenzhen Key Science and Technology Program under Grant JCYJ20241202124415021, and in part by the Zhejiang “Pioneer and Leading +X” Science and Technology Project under Grant 2025C01035.

Keywords

  • Just Recognizable Distortion
  • Object Detection
  • Video Coding for Machines
  • Vision Transformer
