Abstract
The human visual system has limited sensitivity to small distortions in an image or video; the minimum perceptual threshold is the so-called Just Noticeable Difference (JND). JND modeling is challenging because it depends strongly on visual content and the underlying perceptual factors are not fully understood. In this paper, we propose deep-learning-based JND and perceptual quality prediction models that predict the Satisfied User Ratio (SUR) and Video-Wise JND (VWJND) of compressed videos across different resolutions and coding parameters. First, SUR prediction is formulated as a regression problem suited to deep learning tools. Second, the Video-Wise Spatial SUR (VW-SSUR) method is proposed to predict the SUR of compressed video, considering mainly spatial distortion. Third, we further propose the Video-Wise Spatial-Temporal SUR (VW-STSUR) method, which improves SUR prediction accuracy by exploiting both spatial and temporal information; two fusion schemes, one at the quality-score level and one at the feature level, are investigated. Finally, key factors are analyzed, including key frame and patch selection, cross-resolution prediction, and computational complexity. Experimental results demonstrate that the proposed VW-SSUR method outperforms state-of-the-art schemes in both SUR and VWJND prediction. Moreover, the proposed VW-STSUR further improves accuracy over VW-SSUR and conventional JND models, achieving a mean SUR prediction error of 0.049 and a mean VWJND prediction error of 1.69 in quantization parameter and 0.84 dB in peak signal-to-noise ratio.
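To make the SUR→VWJND relationship concrete: once a model predicts the SUR at several quantization parameters (QPs), the VWJND can be read off as the QP where the SUR curve crosses a satisfaction threshold (0.75 is a common choice in SUR studies; the function name and toy values below are illustrative, not from the paper). A minimal sketch:

```python
import numpy as np

def vwjnd_from_sur(qps, surs, threshold=0.75):
    """Estimate Video-Wise JND as the QP where the predicted
    Satisfied User Ratio (SUR) curve crosses the threshold.

    Assumes SUR decreases monotonically as QP grows; linear
    interpolation between the two bracketing QPs gives a
    sub-integer QP estimate. Illustrative helper, not the
    paper's implementation.
    """
    qps = np.asarray(qps, dtype=float)
    surs = np.asarray(surs, dtype=float)
    for i in range(1, len(qps)):
        if surs[i] < threshold <= surs[i - 1]:
            # fraction of the way from qps[i-1] to qps[i] at which
            # the SUR curve hits the threshold
            t = (surs[i - 1] - threshold) / (surs[i - 1] - surs[i])
            return qps[i - 1] + t * (qps[i] - qps[i - 1])
    return qps[-1]  # SUR never drops below threshold in the tested range

# Toy SUR curve predicted at four QPs
print(vwjnd_from_sur([27, 32, 37, 42], [0.95, 0.80, 0.60, 0.40]))  # → 33.25
```

Under this formulation, the SUR prediction error of the regression model translates directly into a VWJND error in QP units, which is why the paper reports both metrics.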
Original language | English |
---|---|
Pages (from-to) | 1197-1212 |
Number of pages | 16 |
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Volume | 32 |
Issue number | 3 |
Early online date | 28 Apr 2021 |
DOIs | |
Publication status | Published - Mar 2022 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 1991-2012 IEEE.
Funding
This work was supported in part by the Guangdong International Science and Technology Cooperative Research Project under Grant 2018A050506063, in part by the Shenzhen Science and Technology Program under Grant JCYJ20180507183823045 and Grant JCYJ20200109110410133, in part by the National Natural Science Foundation of China under Grant 61971203, in part by the Membership of Youth Innovation Promotion Association, Chinese Academy of Sciences under Grant 2018392, and in part by the Special Funds for the Construction of an Innovative Province of Hunan under Grant 2020GK2028.
Keywords
- deep learning
- just noticeable difference
- satisfied user ratio
- video quality assessment