Token Calibration for Transformer-based Domain Adaptation

  • Xiaowei FU
  • Shiyu YE
  • Chenxu ZHANG
  • Fuxiang HUANG
  • Xin XU
  • Lei ZHANG*

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain by learning domain-invariant representations. Motivated by the recent success of Vision Transformers (ViTs), several UDA approaches have adopted ViT architectures to exploit fine-grained patch-level representations; these methods are collectively referred to as Transformer-based Domain Adaptation (TransDA), in contrast to CNN-based approaches. However, we make a key observation in TransDA: due to inherent domain shifts, patches (tokens) from different semantic categories across domains may exhibit abnormally high similarities, which can mislead the self-attention mechanism and degrade adaptation performance. To address this, we propose a novel Patch-Adaptation Transformer (PATrans), which first identifies similarity-anomalous patches and then adaptively suppresses their negative impact on domain alignment, i.e., token calibration. Specifically, we introduce a Patch-Adaptation Attention (PAA) mechanism to replace the standard self-attention mechanism, which consists of a weight-shared triple-branch mixed attention mechanism and a patch-level domain discriminator. The mixed attention integrates self-attention and cross-attention to enhance intra-domain feature modeling and inter-domain similarity estimation. Meanwhile, the patch-level domain discriminator quantifies the anomaly probability of each patch, enabling dynamic reweighting to mitigate the impact of unreliable patch correspondences. Furthermore, we introduce a contrastive attention regularization strategy, which leverages category-level information in a contrastive learning framework to promote class-consistent attention distributions. Extensive experiments on four benchmark datasets demonstrate that PATrans attains significant improvements over existing state-of-the-art UDA methods (e.g., 89.2% accuracy on VisDA-2017). Code is available at: https://github.com/YSY145/PATrans.
Original language: English
Pages (from-to): 57-68
Number of pages: 12
Journal: IEEE Transactions on Image Processing
Volume: 35
Early online date: 1 Jan 2025
DOIs
Publication status: Published - 2026

Bibliographical note

Publisher Copyright:
© 1992-2012 IEEE.

Funding

This work was partially supported by National Natural Science Fund of China under Grants 92570110 and 62271090, Chongqing Natural Science Fund under Grant CSTB2024NSCQ-JQX0038, National Key R&D Program of China under Grant 2021YFB3100800 and National Youth Talent Project.

Keywords

  • Unsupervised Domain Adaptation
  • Vision Transformer
  • Attention Mechanism
  • Contrastive Learning
