Spatiotemporal Feature Hierarchy-Based Blind Prediction of Natural Video Quality via Transfer Learning

Weizhi XIAN, Mingliang ZHOU*, Bin FANG, Xingran LIAO, Cheng JI, Tao XIANG, Weijia JIA

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

8 Citations (Scopus)


In this paper, we propose a no-reference (NR) video quality assessment (VQA) method based on a pyramidal spatiotemporal feature hierarchy (PSFH) and transfer learning. First, we generate simulated videos with a generative adversarial network (GAN)-based image restoration model. The residual maps between the distorted frames and the simulated frames, which can capture rich information, are used as one input of the quality regression network. Second, we use 3D convolution operations to construct a PSFH network with five stages, in which the spatiotemporal features are fused stage by stage with the shared features transferred from the pretrained image restoration model. Third, guided by the transferred knowledge, each stage generates multiple feature mapping layers that encode different semantics and degradation information using 3D convolution layers and gated recurrent units (GRUs). Finally, fully connected (FC) networks produce five approximate perceptual quality scores and one precise prediction score. The whole model is trained with a loss function that combines the pseudo-Huber loss and a Pearson linear correlation coefficient (PLCC) loss to improve robustness and prediction accuracy. In extensive experiments, the proposed method achieves outstanding results compared with other state-of-the-art methods. Both the source code and the models are available online.
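
The combined training loss mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration: the PLCC term is written as 1 - PLCC, the two terms are summed with a hypothetical `weight` parameter, and `delta` is a hypothetical pseudo-Huber scale. The function names are also invented for this sketch and are not taken from the authors' released code.

```python
import math


def pseudo_huber(pred, target, delta=1.0):
    """Mean pseudo-Huber loss: delta^2 * (sqrt(1 + (r / delta)^2) - 1).

    Behaves like a squared error for small residuals r and like an
    absolute error for large ones, which limits the effect of outliers.
    """
    total = 0.0
    for p, t in zip(pred, target):
        r = p - t
        total += delta ** 2 * (math.sqrt(1.0 + (r / delta) ** 2) - 1.0)
    return total / len(pred)


def plcc(pred, target, eps=1e-8):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    # eps guards against division by zero when one list is constant.
    return cov / (math.sqrt(var_p * var_t) + eps)


def combined_loss(pred, target, delta=1.0, weight=1.0):
    """Assumed combination: pseudo-Huber + weight * (1 - PLCC)."""
    return pseudo_huber(pred, target, delta) + weight * (1.0 - plcc(pred, target))


if __name__ == "__main__":
    predicted = [3.1, 2.0, 4.4, 1.2]   # hypothetical predicted quality scores
    subjective = [3.0, 2.2, 4.5, 1.0]  # hypothetical subjective scores
    print(combined_loss(predicted, subjective))
```

Under these assumptions, the pseudo-Huber term penalizes the absolute error of each predicted score, while the correlation term rewards predictions that rise and fall linearly with the subjective scores across a batch, even when their scale is slightly off.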

Original language: English
Pages (from-to): 130-143
Number of pages: 14
Journal: IEEE Transactions on Broadcasting
Issue number: 1
Publication status: Published - Mar 2023
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1963-2012 IEEE.


Keywords

  • 3D convolution
  • generative adversarial network
  • pyramidal spatiotemporal feature
  • transfer learning
  • video quality assessment

