Abstract
Unsupervised Domain Adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain by learning domain-invariant representations. Motivated by the recent success of Vision Transformers (ViTs), several UDA approaches have adopted ViT architectures to exploit fine-grained patch-level representations; these are collectively referred to as Transformer-based Domain Adaptation (TransDA), as distinct from CNN-based methods. However, we make a key observation in TransDA: due to inherent domain shifts, patches (tokens) from different semantic categories across domains may exhibit abnormally high similarities, which can mislead the self-attention mechanism and degrade adaptation performance. To address this, we propose a novel Patch-Adaptation Transformer (PATrans), which first identifies similarity-anomalous patches and then adaptively suppresses their negative impact on domain alignment, i.e., token calibration. Specifically, we introduce a Patch-Adaptation Attention (PAA) mechanism to replace standard self-attention; it consists of a weight-shared triple-branch mixed attention mechanism and a patch-level domain discriminator. The mixed attention integrates self-attention and cross-attention to enhance intra-domain feature modeling and inter-domain similarity estimation. Meanwhile, the patch-level domain discriminator quantifies the anomaly probability of each patch, enabling dynamic reweighting that mitigates the impact of unreliable patch correspondences. Furthermore, we introduce a contrastive attention regularization strategy that leverages category-level information within a contrastive learning framework to promote class-consistent attention distributions. Extensive experiments on four benchmark datasets demonstrate that PATrans achieves significant improvements over state-of-the-art UDA methods (e.g., 89.2% accuracy on VisDA-2017). Code is available at: https://github.com/YSY145/PATrans.
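The dynamic-reweighting idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the anomaly probabilities are assumed to be supplied by some patch-level discriminator (here passed in as a plain array), and the column-wise down-weighting with row renormalization is an assumed realization of "suppressing unreliable patch correspondences".

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def patch_adaptation_attention(Q, K, V, anomaly_prob):
    """Self-attention in which each key/value patch is down-weighted by
    its anomaly probability, then rows are renormalized.

    anomaly_prob: shape (N,), estimated probability that a patch's
    cross-domain similarity is unreliable (in the paper this comes from
    a patch-level domain discriminator; here it is supplied directly).
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (N, N) patch-pair similarities
    attn = softmax(scores, axis=-1)
    attn = attn * (1.0 - anomaly_prob)[None, :]   # suppress anomalous patches
    attn = attn / attn.sum(axis=-1, keepdims=True)  # renormalize each row
    return attn @ V, attn

# Toy usage: one patch (index 2) is flagged as likely anomalous.
rng = np.random.default_rng(0)
N, d = 6, 8
X = rng.normal(size=(N, d))
p = np.zeros(N)
p[2] = 0.9
out, attn = patch_adaptation_attention(X, X, X, p)
```

Patches flagged with high anomaly probability contribute less attention mass, so unreliable cross-domain similarities have a smaller effect on the aggregated representation.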
| Original language | English |
|---|---|
| Pages (from-to) | 57-68 |
| Number of pages | 12 |
| Journal | IEEE Transactions on Image Processing |
| Volume | 35 |
| Early online date | 1 Jan 2025 |
| DOIs | |
| Publication status | Published - 2026 |
Bibliographical note
Publisher Copyright: © 1992-2012 IEEE.
Funding
This work was partially supported by the National Natural Science Foundation of China under Grants 92570110 and 62271090, the Chongqing Natural Science Foundation under Grant CSTB2024NSCQ-JQX0038, the National Key R&D Program of China under Grant 2021YFB3100800, and the National Youth Talent Project.
Keywords
- Unsupervised Domain Adaptation
- Vision Transformer
- Attention Mechanism
- Contrastive Learning