
Decoupled Motion Expression Video Segmentation

  • Hao FANG
  • Runmin CONG
  • Xiankai LU
  • Xiaofei ZHOU
  • Sam KWONG
  • Wei ZHANG

Research output: Book Chapters | Papers in Conference Proceedings | Conference paper (refereed) | Research | Peer-review

Abstract

Motion expression video segmentation aims to segment objects based on input motion descriptions. Compared with traditional referring video object segmentation, it focuses on motion and multi-object expressions, making it more challenging. Previous works tackled it by simply injecting text information into a video instance segmentation (VIS) model. However, this requires retraining the entire model, which makes optimization difficult. In this work, we propose DMVS, a simple framework built on an existing query-based VIS model that decouples the task into video instance segmentation and motion expression understanding. First, we use a frozen video instance segmenter to extract object-specific contexts and convert them into frame-level and video-level queries. Second, we let the two levels of queries interact with static and motion cues, respectively, to further encode visually enhanced motion expressions. Furthermore, we propose a novel query initialization strategy that uses video queries guided by classification priors to initialize motion queries, greatly reducing the difficulty of optimization. Without bells and whistles, DMVS achieves state-of-the-art performance on the MeViS dataset at a lower training cost. Extensive experiments verify the effectiveness and efficiency of our framework.
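To make the decoupled design described in the abstract a little more concrete, here is a toy, stdlib-only Python sketch of two of its ideas: turning frame-level object queries from a frozen segmenter into video-level queries, and initializing motion queries from video queries using classification scores as a prior before they attend to the motion expression. This is not the DMVS implementation. The mean pooling over frames, the top-k selection by foreground score, the parameter-free single-head cross-attention, the toy data, and all function names (`pool_video_queries`, `init_motion_queries`, `cross_attend`) are hypothetical choices made only for illustration.

```python
import math
from typing import List

Vector = List[float]


def pool_video_queries(frame_queries: List[List[Vector]]) -> List[Vector]:
    """Hypothetical: average per-frame object queries [T][N][D] over time to get video-level queries [N][D]."""
    num_frames = len(frame_queries)
    num_objects = len(frame_queries[0])
    dim = len(frame_queries[0][0])
    return [
        [sum(frame_queries[t][n][d] for t in range(num_frames)) / num_frames for d in range(dim)]
        for n in range(num_objects)
    ]


def init_motion_queries(video_queries: List[Vector], class_scores: List[float], k: int) -> List[Vector]:
    """Hypothetical classification prior: copy the k video queries with the highest foreground scores."""
    ranked = sorted(range(len(video_queries)), key=lambda i: class_scores[i], reverse=True)
    return [list(video_queries[i]) for i in ranked[:k]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries: List[Vector], tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention from queries to text tokens, plus a residual add.
    No learned projections: keys and values are the token embeddings themselves."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, tok)) for tok in tokens])
        attended = [sum(w * tok[d] for w, tok in zip(weights, tokens)) for d in range(dim)]
        updated.append([qd + ad for qd, ad in zip(q, attended)])
    return updated


# Toy data: 2 frames, 3 object queries per frame, 2-dimensional embeddings.
frame_queries = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[3.0, 0.0], [0.0, 3.0], [0.5, 0.5]],
]
class_scores = [0.9, 0.1, 0.6]                # hypothetical per-object foreground scores
expression_tokens = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical motion-expression word embeddings

video_queries = pool_video_queries(frame_queries)
motion_queries = init_motion_queries(video_queries, class_scores, k=2)
motion_queries = cross_attend(motion_queries, expression_tokens)
print(video_queries)  # [[2.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
print(motion_queries)
```

In a real model the queries would come from the frozen segmenter, the expression tokens from a language encoder, and the attention layers would have learned projections; the point of the sketch is only that motion queries start from object-aware video queries rather than from random initialization.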
Original language: English
Title of host publication: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Pages: 13821-13831
Publication status: Published - 13 Aug 2025
Event: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025 - Music City Center, Nashville, United States
Duration: 11 Jun 2025 to 15 Jun 2025
https://cvpr.thecvf.com/

Conference

Conference: The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025
Abbreviated title: CVPR 2025
Country/Territory: United States
City: Nashville
Period: 11/06/25 to 15/06/25

