
Reveal Fluidity Behind Frames: A Multi-Modality Framework for Action Quality Assessment

Research output: Book Chapters | Papers in Conference Proceedings | Conference paper (refereed) | Research | Peer-reviewed

Abstract

Assessing the quality of a player's performance, such as in diving events, requires precise measurement of subtle action details and overall fluidity. Existing methods primarily utilize appearance information from RGB frames, often neglecting crucial motion information that could contribute to a more comprehensive assessment. In response to this limitation, this paper introduces a novel Multi-Modality Network for Action Quality Assessment (AQA). The proposed method first employs a self-attention based module to foster interaction between optical flow and appearance clues, facilitating the extraction of discriminative features from each modality. Subsequently, a pairwise cross-attention mechanism is designed to comprehensively capture subtle differences via both intra-modality and inter-modality relationships between the query and exemplar video. Finally, to enhance the robustness and achieve accurate score prediction, an adaptive clip aggregation module is introduced to weigh the reliability of each patch based on multi-modal difference features. Experimental results on two benchmarks, FineDiving and MTL-AQA, validate the effectiveness of the proposed model.
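
As an illustration only, the following minimal Python sketch shows two of the ideas named in the abstract in their most generic form: scaled dot-product cross-attention from query-video features to exemplar-video features, and a softmax-weighted aggregation of per-unit scores. It is not the authors' implementation. The function names, the absence of learned projections, and the use of plain reliability logits are assumptions made for this sketch; the modules described in the abstract are learned networks operating on RGB and optical-flow features.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_feats, exemplar_feats):
    """Generic scaled dot-product cross-attention (hypothetical helper).

    Each query feature vector attends over all exemplar feature vectors.
    No learned query/key/value projections are applied; this is a
    simplification assumed for the sketch.
    """
    dim = len(query_feats[0])
    attended = []
    for q in query_feats:
        # Similarity of this query vector to every exemplar vector.
        logits = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in exemplar_feats
        ]
        weights = softmax(logits)
        # Weighted sum of exemplar vectors, one output dimension at a time.
        attended.append([
            sum(w * k[j] for w, k in zip(weights, exemplar_feats))
            for j in range(dim)
        ])
    return attended


def aggregate_scores(unit_scores, reliability_logits):
    """Softmax-weighted average of per-unit scores (hypothetical helper).

    The reliability logits stand in for whatever the learned aggregation
    module would predict from multi-modal difference features.
    """
    if len(unit_scores) != len(reliability_logits):
        raise ValueError("scores and logits must have the same length")
    weights = softmax(reliability_logits)
    return sum(w * s for w, s in zip(weights, unit_scores))
```

With equal reliability logits, `aggregate_scores` reduces to a plain mean of the scores; a much larger logit for one unit makes that unit's score dominate the result.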
Original language: English
Title of host publication: 2024 IEEE 26th International Workshop on Multimedia Signal Processing, MMSP 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 6
ISBN (Electronic): 9798350387254
Publication status: Published - 2024
Event: 26th IEEE International Workshop on Multimedia Signal Processing, MMSP 2024 - West Lafayette, United States
Duration: 2 Oct 2024 – 4 Oct 2024

Publication series

Name: 2024 IEEE 26th International Workshop on Multimedia Signal Processing, MMSP 2024

Conference

Conference: 26th IEEE International Workshop on Multimedia Signal Processing, MMSP 2024
Country/Territory: United States
City: West Lafayette
Period: 2/10/24 – 4/10/24

Bibliographical note

Publisher Copyright:
© 2024 IEEE.

Funding

This work was supported in part by the Hong Kong Innovation and Technology Commission (InnoHK Project CIMDA).

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 4 - Quality Education
  2. SDG 11 - Sustainable Cities and Communities

Keywords

  • Action quality assessment
  • Attention mechanism
  • Multi-modal learning
