A Blind Video Quality Assessment Method via Spatiotemporal Pyramid Attention

Wenhao SHEN, Mingliang ZHOU*, Xuekai WEI, Heqiang WANG, Bin FANG, Cheng JI, Xu ZHUANG, Jason WANG, Jun LUO, Huayan PU, Xiaoxu HUANG, Shilong WANG, Huajun CAO, Yong FENG, Tao XIANG, Zhaowei SHANG

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review


As social media communication develops, reliable multimedia quality evaluation indicators have become a prerequisite for improving user experience services. In this paper, we propose a multiscale spatiotemporal pyramid attention (SPA) block for constructing a blind video quality assessment (VQA) method to evaluate the perceptual quality of videos. First, we extract motion information from the video frames at different temporal scales to form a feature pyramid, which provides a feature representation with multiple visual perceptions. Second, we propose an SPA module that extracts multiscale spatiotemporal information at various temporal scales and models cross-scale dependencies. Finally, quality estimation is completed by passing the features extracted by a network of multiple stacked spatiotemporal pyramid blocks through a regression network to determine the perceived quality. The experimental results demonstrate that our method is on par with state-of-the-art approaches. The source code is available online at https://github.com/Land5cape/SPBVQA.
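The three steps in the abstract (multiscale motion extraction, cross-scale weighting, and regression to a score) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the use of frame differences as "motion information", the strides (1, 2, 4), and the softmax pooling that stands in for the attention and regression stages are all assumptions made for illustration. The actual SPA block is a learned deep network; see the linked repository for it.

```python
import math

def motion_energy(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def temporal_motion_pyramid(frames, strides=(1, 2, 4)):
    """Return {stride: [motion energy between frames t and t + stride]}.

    Assumes len(frames) is larger than the largest stride, so that
    every level of the pyramid contains at least one value.
    """
    pyramid = {}
    for s in strides:
        pyramid[s] = [motion_energy(frames[t], frames[t + s])
                      for t in range(len(frames) - s)]
    return pyramid

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(pyramid):
    """Toy cross-scale pooling (hypothetical stand-in for learned attention):
    weight each scale's mean motion energy by a softmax over those means."""
    means = [sum(v) / len(v) for v in pyramid.values()]
    weights = softmax(means)
    return sum(w * m for w, m in zip(weights, means))

# Synthetic 8-frame "video": 4 pixels per frame, brightness rising by 1 per frame.
frames = [[float(t)] * 4 for t in range(8)]
pyramid = temporal_motion_pyramid(frames)
score = attention_pool(pyramid)
```

In this synthetic clip the motion energy at each level equals the stride, so coarser temporal scales receive larger softmax weights. A real blind VQA model would replace the hand-written pooling with learned attention and a trained regression head.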

Original language: English
Pages (from-to): 251-264
Number of pages: 14
Journal: IEEE Transactions on Broadcasting
Issue number: 1
Early online date: 28 Dec 2023
Publication status: Published - Mar 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1963-2012 IEEE.


Keywords

  • deep neural network
  • no-reference
  • spatiotemporal attention
  • video quality assessment

