Abstract
Assessing the quality of a player's performance, such as in diving events, requires precise measurement of subtle action details and overall fluidity. Existing methods rely primarily on appearance information from RGB frames, often neglecting the motion information that could contribute to a more comprehensive assessment. To address this limitation, this paper introduces a novel Multi-Modality Network for Action Quality Assessment (AQA). The proposed method first employs a self-attention-based module to foster interaction between optical-flow and appearance cues, facilitating the extraction of discriminative features from each modality. Subsequently, a pairwise cross-attention mechanism is designed to comprehensively capture subtle differences via both intra-modality and inter-modality relationships between the query and exemplar videos. Finally, to enhance robustness and achieve accurate score prediction, an adaptive clip aggregation module weighs the reliability of each clip based on multi-modal difference features. Experimental results on two benchmarks, FineDiving and MTL-AQA, validate the effectiveness of the proposed model.
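The core idea behind the pairwise cross-attention step can be illustrated with a minimal sketch: features from the query video attend over features from the exemplar video so that subtle per-clip differences are expressed relative to the reference performance. The function name, feature shapes, and dimensions below are illustrative assumptions, not the paper's actual implementation, which additionally models intra-modality relationships and fuses optical-flow and RGB streams.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feat, exemplar_feat):
    """Hypothetical inter-modality/inter-video cross-attention sketch.

    query_feat:    (T_q, d) clip-level features from the query video.
    exemplar_feat: (T_e, d) clip-level features from the exemplar video.
    Returns (T_q, d): each query clip re-expressed as a weighted
    combination of exemplar clips, exposing pairwise differences.
    """
    d = query_feat.shape[-1]
    # Scaled dot-product attention scores between the two videos.
    scores = query_feat @ exemplar_feat.T / np.sqrt(d)
    attn = softmax(scores, axis=-1)          # rows sum to 1
    attended = attn @ exemplar_feat          # (T_q, d)
    # A difference feature of the kind an aggregation module could weigh.
    return query_feat - attended

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))   # 4 query clips, 8-dim features (assumed)
e = rng.standard_normal((6, 8))   # 6 exemplar clips
diff = cross_attention(q, e)      # (4, 8) multi-modal difference features
```

In a full model, one such difference tensor per modality (RGB and optical flow) would then be passed to the adaptive clip aggregation module, which down-weights unreliable clips before score regression.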
Original language | English |
---|---|
Title of host publication | 2024 IEEE 26th International Workshop on Multimedia Signal Processing (MMSP) |
Publisher | IEEE |
Number of pages | 6 |
ISBN (Electronic) | 9798350387254 |
DOIs | |
Publication status | E-pub ahead of print - 2 Nov 2024 |
Event | 2024 IEEE 26th International Workshop on Multimedia Signal Processing (MMSP) - West Lafayette, IN, USA |
Duration | 2 Oct 2024 → 4 Oct 2024 |
Conference
Conference | 2024 IEEE 26th International Workshop on Multimedia Signal Processing (MMSP) |
---|---|
Period | 2/10/24 → 4/10/24 |