
GCA-DETR: Global-context-aware-based detection transformer

  • Zhenzhe HECHEN
  • Mingliang ZHOU
  • Xuekai WEI
  • Jun LUO
  • Sam KWONG*
  • *Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

Abstract

Detection Transformer (DETR)-based models leverage key modules such as self-attention, cross-attention, and feedforward networks to extract discriminative features from individual images. Despite their strong performance, these approaches mainly emphasize feature learning driven by single-image statistics, whereas dataset-level semantic regularities shared across training samples have yet to be fully exploited. To address this limitation, we present a Global-Context-Aware-based DEtection TRansformer (GCA-DETR). First, to capture global feature distributions, GCA-DETR uses a global-distribution-extraction (GDE) module. This module extracts per-image data distributions and preserves an aggregated global distribution throughout training, thereby providing a dataset-level contextual prior beyond individual images for subsequent feature modelling. During evaluation, the global distribution remains fixed. Second, a global-distribution-fusion (GDF) module adjusts the distributions of candidate target tokens in the decoder by leveraging the preserved global context, facilitating more effective global reasoning. Comprehensive experiments demonstrate that the proposed GCA-DETR achieves significant improvements on COCO 2017 while requiring only minimal additional computational overhead. The code is available at https://github.com/sleevewind/GCA-DETR.
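
As a rough illustration of the idea described in the abstract, the sketch below keeps a running global estimate of per-channel token statistics during training, freezes it at evaluation, and uses it to re-centre a candidate token. It is not the authors' implementation: the abstract does not say how the global distribution is represented or aggregated, so the choice of mean and variance, the exponential-moving-average update, the `momentum` value, and the names `GlobalStats`, `update`, and `fuse` are all assumptions made for illustration only.

```python
from typing import List, Sequence


class GlobalStats:
    """Hypothetical running per-channel mean/variance shared across images.

    Assumption: the "global distribution" is summarised by a mean and a
    variance per channel, aggregated with an exponential moving average.
    """

    def __init__(self, num_channels: int, momentum: float = 0.1) -> None:
        self.mean = [0.0] * num_channels
        self.var = [1.0] * num_channels
        self.momentum = momentum  # assumed EMA rate, not taken from the paper
        self.training = True

    def update(self, tokens: Sequence[Sequence[float]]) -> None:
        """Fold one image's per-channel statistics into the global estimate."""
        if not self.training:
            return  # evaluation: the global statistics stay fixed
        n = len(tokens)
        m = self.momentum
        for c in range(len(self.mean)):
            column = [t[c] for t in tokens]
            mu = sum(column) / n
            var = sum((x - mu) ** 2 for x in column) / n
            self.mean[c] = (1 - m) * self.mean[c] + m * mu
            self.var[c] = (1 - m) * self.var[c] + m * var

    def fuse(self, token: Sequence[float], eps: float = 1e-5) -> List[float]:
        """Re-centre and re-scale one token with the global statistics.

        Assumption: plain normalisation stands in for the GDF module here.
        """
        return [
            (x - mu) / (v + eps) ** 0.5
            for x, mu, v in zip(token, self.mean, self.var)
        ]
```

In use, `update` would be called once per training image with that image's token features, `training` would be set to `False` before evaluation, and `fuse` would then be applied to each candidate token. The released code at the repository linked above is the authoritative reference for how GDE and GDF actually work.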
Original language: English
Article number: 123468
Journal: Information Sciences
DOIs
Publication status: E-pub ahead of print - 4 Apr 2026

Keywords

  • DETR
  • Global distribution extraction
  • Global distribution fusion
