U-MLLA: A Cognitive-Inspired Enhancement of Linear Attention for Medical Image Segmentation

  • Yufeng JIANG
  • Zongxi LI*
  • Xiangyan CHEN
  • Haoran XIE
  • Jing CAI*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings > Conference paper (refereed) > Referred Conference Paper > peer-review

Abstract

Medical image segmentation is fundamental to computer-assisted diagnosis but faces challenges across diverse imaging modalities. Linear attention mechanisms succeed on natural images but are limited in medical segmentation because they model spatial dependencies and tissue heterogeneity insufficiently. Research indicates that successful dense prediction requires balanced permutation variance, strong inductive capabilities, and precise absolute position information. Current linear attention approaches satisfy the first two requirements but critically lack the third, which significantly impairs medical segmentation, where spatial localization is essential. To address these limitations, we propose U-MLLA, which integrates U-Net with mamba-like linear attention (MLLA) to capture multiscale features and context. We further introduce complementary conditional positional encoding and absolute positional encoding (APE) to compensate for the position information deficit in linear attention. Experiments show that U-MLLA provides robust features and that the complementary strategies significantly improve multi-organ and tumor segmentation. APE particularly excels on complex structures that require precise boundary delineation. This cognitively inspired architecture adapts 93% of ImageNet-1k weights and increases effectiveness. In comprehensive evaluations across six challenging datasets (e.g., FLARE22, AMOS22CT/MR, ACDC) and 24 tasks, U-MLLA achieves state-of-the-art performance with an average Dice similarity coefficient (DSC) of 88.32%, outperforming nnUNetV2-2D and SwinUNetR by 4.37% and 1.98%, respectively. These results highlight U-MLLA’s potential for clinical applications that require precise anatomical delineation, where APE is essential for maintaining spatial context and differentiating similar structures. The code is available at https://github.com/csyfjiang/U-MLLA.
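The abstract's claim that linear attention lacks absolute position information can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository linked above for that). It assumes a generic kernelized linear attention with an `elu(x) + 1` feature map, random projection weights, and a random table standing in for a learned APE; the names `feature_map`, `linear_attention`, and `ape` are hypothetical. Without position encoding, reordering the input tokens simply reorders the outputs, so the layer cannot tell where a token sits. Adding a per-position embedding breaks that symmetry.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a positive feature map often used in kernelized linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(x, wq, wk, wv):
    """Generic linear attention over n tokens of width d (illustrative only)."""
    q = feature_map(x @ wq)            # (n, d)
    k = feature_map(x @ wk)            # (n, d)
    v = x @ wv                         # (n, d)
    kv = k.T @ v                       # (d, d) summary, computed once for all queries
    norm = q @ k.sum(axis=0)           # (n,) per-query normalizer, always positive
    return (q @ kv) / norm[:, None]

rng = np.random.default_rng(0)
n, d = 16, 8
x = rng.normal(size=(n, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = np.roll(np.arange(n), 1)        # a fixed, non-identity reordering of tokens

# Without positional encoding: permuting inputs just permutes outputs.
out = linear_attention(x, wq, wk, wv)
out_perm = linear_attention(x[perm], wq, wk, wv)
equivariant_without_ape = bool(np.allclose(out[perm], out_perm))

# With an absolute positional embedding (random stand-in for a learned table),
# the same content at a different position yields a different output.
ape = rng.normal(size=(n, d))
out_ape = linear_attention(x + ape, wq, wk, wv)
out_ape_perm = linear_attention(x[perm] + ape, wq, wk, wv)
equivariant_with_ape = bool(np.allclose(out_ape[perm], out_ape_perm))

print(equivariant_without_ape, equivariant_with_ape)
```

In this sketch the key/value summary `kv` is a sum over tokens, which is why token order is invisible until a position-dependent term is added to the input.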

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 8th Chinese Conference, PRCV 2025, Proceedings
Editors: Josef KITTLER, Hongkai XIONG, Weiyao LIN, Jian YANG, Xilin CHEN, Jiwen LU, Jingyi YU, Weishi ZHENG
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 76-90
Number of pages: 15
ISBN (Electronic): 9789819556342
ISBN (Print): 9789819556335
Publication status: Published - 21 Jan 2026
Event: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025 - Shanghai, China
Duration: 15 Oct 2025 to 18 Oct 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 16284 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2025
Country/Territory: China
City: Shanghai
Period: 15/10/25 to 18/10/25

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2026.

Funding

This work was supported by the General Research Fund (GRF 15104323) and the RGC Theme-based Research Scheme (project no. T45-401/22-N). Additional support was provided by Lingnan University through the Faculty Research Grant (No. SDS24A2, SDS24A8, SDS24A12, and SDS24A19), Direct Grant (DR25E8), and Lam Woo Research Fund (LWP20040), as well as by the Hong Kong Research Grants Council through the Faculty Development Scheme (Project No. UGC/FDS16/E10/23).

Keywords

  • Linear Attention in Vision
  • Medical Image Segmentation
  • Position Encoding
  • Semantic Segmentation
  • UNet
