TY - JOUR
T1 - Concept-Level Semantic Transfer and Context-Level Distribution Modeling for Few-Shot Segmentation
AU - LUO, Yuxuan
AU - CHEN, Jinpeng
AU - CONG, Runmin
AU - IP, Horace Ho Shing
AU - KWONG, Sam
N1 - Publisher Copyright: © 1991-2012 IEEE.
PY - 2025/3/24
Y1 - 2025/3/24
AB - Few-shot segmentation (FSS) methods aim to segment objects using only a few pixel-level annotated samples. Current approaches either derive a generalized class representation from support samples to guide the segmentation of query samples, which often discards crucial spatial contextual information, or rely heavily on spatial affinity between support and query samples, without adequately summarizing and utilizing the core information of the target class. Consequently, the former struggles with fine-detail accuracy, while the latter tends to produce errors in overall localization. To address these issues, we propose a novel FSS framework, CCFormer, which balances the transmission of core semantic concepts with the modeling of spatial context, improving both macro- and micro-level segmentation accuracy. Our approach introduces three key modules: (1) the Concept Perception Generation (CPG) module, which leverages pre-trained category perception capabilities to capture high-quality core representations of the target class; (2) the Concept-Feature Integration (CFI) module, which injects the core class information into both support and query features during feature extraction; and (3) the Contextual Distribution Mining (CDM) module, which utilizes a Brownian Distance Covariance matrix to model the spatial-channel distribution between support and query samples, preserving the fine-grained integrity of the target. Experimental results on the PASCAL-5i and COCO-20i datasets demonstrate that CCFormer achieves state-of-the-art performance, with visualizations further validating its effectiveness. Our code is available at github.com/lourise/ccformer.
KW - Few-shot Learning
KW - Few-shot Segmentation
KW - Semantic Segmentation
UR - http://www.scopus.com/inward/record.url?scp=105001258140&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2025.3554013
DO - 10.1109/TCSVT.2025.3554013
M3 - Journal Article (refereed)
SN - 1051-8215
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
ER -