Abstract
Action Quality Assessment (AQA) aims to evaluate and score human actions in videos accurately. Existing approaches typically extract features from the input video and regress a quality score from those features. However, representations derived from a single branch often lack the diversity and flexibility needed to capture the complexity of human actions. This work addresses these limitations by introducing a multi-branch architecture designed to capture a broad spectrum of video dynamics at varying levels of granularity. Specifically, in the flow-guided branch we enhance the video representation by integrating optical flow with video features; this combination of multimodal features provides richer context about global motion. Meanwhile, the moment-focused branch extracts frame-specific features and constructs two distinct quality-based representations, each focusing on different moments, which enables adaptive aggregation of clues. Furthermore, the detail-aware branch leverages multiscale deep embeddings from a hierarchical convolutional neural network to capture fine-grained spatial information, which is useful when objects undergo complex spatial changes. Finally, a post-fusion strategy merges the outputs of all branches to produce a comprehensive assessment of action quality. Experimental evaluations on three benchmark datasets, FineDiving, MTL-AQA, and AQA-7, demonstrate the superiority of our model in providing reliable assessments of action quality.
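The abstract does not specify how each branch is implemented or how the post-fusion step weights the branches. The sketch below is a minimal, hypothetical illustration of the general multi-branch, late-fusion idea in plain Python, under loudly stated assumptions: the flow-guided branch concatenates per-frame video and optical-flow features and mean-pools them over time; the moment-focused branch uses two query vectors for softmax attention pooling, so each representation emphasizes different frames; the detail-aware branch concatenates embeddings taken from several CNN stages; and post-fusion is a weighted average of the branch scores. All function names, the linear scoring heads, and the fusion weights are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average per-frame vectors over time (T x D -> D)."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def attention_pool(frames: Sequence[Sequence[float]], query: Sequence[float]) -> Vector:
    """Weight each frame by softmax(frame . query) and sum (T x D -> D)."""
    weights = softmax([dot(f, query) for f in frames])
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def flow_guided_score(video, flow, head: Sequence[float]) -> float:
    """Hypothetical flow-guided branch: concatenate video and optical-flow
    features per frame, mean-pool over time, then apply a linear head."""
    fused = [list(v) + list(f) for v, f in zip(video, flow)]
    return dot(mean_pool(fused), head)


def moment_focused_score(video, query_a, query_b, head) -> float:
    """Hypothetical moment-focused branch: two attention-pooled representations,
    each emphasizing different frames via its own query, scored and averaged."""
    rep_a = attention_pool(video, query_a)
    rep_b = attention_pool(video, query_b)
    return 0.5 * (dot(rep_a, head) + dot(rep_b, head))


def detail_aware_score(stage_embeddings, head) -> float:
    """Hypothetical detail-aware branch: concatenate pooled embeddings taken
    from several CNN stages (multiscale), then apply a linear head."""
    multiscale = [x for stage in stage_embeddings for x in stage]
    return dot(multiscale, head)


def post_fusion(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Late fusion (assumed): weighted average of the per-branch scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


if __name__ == "__main__":
    video = [[1.0, 2.0], [3.0, 4.0]]  # T=2 frames, D=2 toy video features
    flow = [[0.0, 1.0], [1.0, 0.0]]   # matching toy optical-flow features
    stages = [[0.5, 0.5], [1.0]]      # toy embeddings from two CNN stages
    s1 = flow_guided_score(video, flow, [0.1, 0.1, 0.1, 0.1])
    s2 = moment_focused_score(video, [1.0, 0.0], [0.0, -1.0], [0.2, 0.2])
    s3 = detail_aware_score(stages, [1.0, 1.0, 1.0])
    print(post_fusion([s1, s2, s3], [1.0, 1.0, 1.0]))
```

Late fusion keeps each branch independent until the final step, so a branch can be swapped or re-weighted without retraining the others; in a learned model the heads, queries, and fusion weights would be trained parameters rather than the fixed toy values used here.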
| Original language | English |
|---|---|
| Pages (from-to) | 8776-8789 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 27 |
| Early online date | 9 Sept 2025 |
| DOIs | |
| Publication status | Published - 2025 |
Bibliographical note
Publisher Copyright: © 1999-2012 IEEE.
Funding
This work was supported in part by the Hong Kong Innovation and Technology Commission through InnoHK Project CIMDA, in part by the Hong Kong General Research Fund under Grant 11209819 and Grant 11203820, in part by the Key Project of Science and Technology Innovation 2030 under Grant 2018AAA0101301, in part by ARG - CityU Applied Research under Grant 9667255, and in part by Start-up under Grant SUG-007/2425.
Keywords
- Action quality assessment
- Multi-branch modeling
- Multi-modal learning
Fingerprint
Dive into the research topics of 'Comprehensive Action Quality Assessment Through Multi-Branch Modeling'. Together they form a unique fingerprint.