CoSTA: Co-training spatial-temporal attention for blind video quality assessment

Fengchuang XING, Yuan-Gen WANG*, Weixuan TANG, Guopu ZHU, Sam KWONG

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review


Self-attention-based Transformers have achieved great success in many computer vision tasks. However, their application to blind video quality assessment (VQA) has not yet been fully explored. Evaluating the quality of in-the-wild videos is challenging due to the absence of a pristine reference and the presence of shooting distortions. This paper presents a Co-trained Space-Time Attention network for the blind VQA problem, termed CoSTA. Specifically, we first build CoSTA by alternately concatenating divided space-time attention blocks. Then, to facilitate the training of CoSTA, we design a vectorized regression loss that encodes the mean opinion score (MOS) into a probability vector and embeds a special token as a learnable variable of MOS, leading to a better fit of the human rating process. Finally, to address the data-hungry nature of Transformers, we propose to co-train the spatial and temporal attention weights using both images and videos. Extensive experiments are conducted on the de facto in-the-wild video datasets, including LIVE-Qualcomm, LIVE-VQC, KoNViD-1k, YouTube-UGC, LSVQ, LSVQ-1080p, and DVL2021. Experimental results demonstrate the superiority of the proposed CoSTA over the state of the art. The source code is publicly available at
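
The abstract describes the architecture and training recipe only at a high level. For concreteness, below is a minimal PyTorch sketch of one divided space-time attention block of the kind the abstract describes (temporal attention across frames, followed by spatial attention across patches, stacked alternately). The class name, token layout, and dimensions are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class DividedSpaceTimeBlock(nn.Module):
    """One divided space-time attention block (illustrative sketch only).

    Temporal self-attention runs across frames at each spatial location,
    then spatial self-attention runs across patches within each frame,
    followed by a standard MLP. Dimensions are placeholders.
    """

    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.norm_t = nn.LayerNorm(dim)
        self.attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_s = nn.LayerNorm(dim)
        self.attn_s = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_f = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        # x: (batch, frames, patches, dim) -- patch tokens without the class token.
        b, t, p, d = x.shape

        # Temporal attention: each spatial position attends over the T frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt = self.norm_t(xt)
        xt, _ = self.attn_t(xt, xt, xt)
        x = x + xt.reshape(b, p, t, d).permute(0, 2, 1, 3)

        # Spatial attention: each frame's patches attend over one another.
        xs = x.reshape(b * t, p, d)
        xs = self.norm_s(xs)
        xs, _ = self.attn_s(xs, xs, xs)
        x = x + xs.reshape(b, t, p, d)

        # Position-wise feed-forward network with a residual connection.
        return x + self.mlp(self.norm_f(x))


# Toy usage: 2 clips, 8 frames, 196 patches, 768-dim tokens.
tokens = torch.randn(2, 8, 196, 768)
out = DividedSpaceTimeBlock()(tokens)
print(out.shape)  # torch.Size([2, 8, 196, 768])
```

Under this factorization the per-block attention cost drops from O((T·P)^2) for joint space-time attention to roughly O(T·P·(T+P)), and a single image can be treated as a T = 1 clip, which is one natural way the spatial attention weights could be co-trained on image data alongside videos.
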
Original language: English
Article number: 124651
Journal: Expert Systems with Applications
Early online date: 2 Jul 2024
Publication status: E-pub ahead of print - 2 Jul 2024

Bibliographical note

Publisher Copyright:
© 2024 Elsevier Ltd


Keywords

  • Co-training
  • In-the-wild videos
  • Self-attention
  • Transformer
  • Video quality assessment

