
Text-Guided Sparse Mixture of Experts for Degradation-Robust Multi-Modal Image Fusion

  • Cheng FENG
  • Yunchao ZHANG
  • Rongchen LING
  • Luoluo WANG

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

Abstract

Image fusion integrates multi-source image data into a single, information-rich image and has extensive applications in fields such as environmental monitoring and battlefield reconnaissance. Although existing methods achieve favorable multi-modal image fusion in interference-free scenarios and aid the presentation of scene information, they exhibit significant limitations when source images are degraded by extreme conditions such as low light, haze, rain and snow, motion blur, and strong noise. To tackle these challenges, this paper proposes a novel degradation-robust multi-modal image fusion framework guided by supplementary text instructions. First, a low-rank adaptation (LoRA) mechanism is employed for targeted fine-tuning of the CLIP vision-language model, enabling it to accurately capture scene degradation characteristics such as low light, noise, and motion blur and to generate corresponding semantic descriptions. Second, a heterogeneous mixture-of-experts feature extraction backbone is designed in which these degradation descriptions dynamically regulate the heterogeneous expert modules, and a mutual information loss is introduced to enforce expert sparsity during multi-task processing, avoiding redundant computation. Finally, all modules undergo collaborative training-testing optimization to enhance the controllability of the fusion process and improve the model's generalization capability. Experiments on four mainstream benchmark datasets (LLVIP, MSRS, M3FD, ROAD) demonstrate that the proposed text-semantic-guided sparse mixture-of-experts mechanism offers significant advantages over state-of-the-art methods in both fusion performance and degradation correction. This work provides new insights into multi-modal image fusion in complex degraded environments and is expected to advance applications such as remote sensing and medical imaging.
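To make the routing idea concrete, below is a minimal sketch (in PyTorch, which the abstract does not specify) of how a text embedding of the degradation description could bias sparse top-k expert selection. All names and shapes here (TextGuidedSparseMoE, num_experts, top_k, the dense expert evaluation) are illustrative assumptions, not the authors' implementation; the heterogeneous expert designs and the mutual information sparsity loss are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextGuidedSparseMoE(nn.Module):
    """Hypothetical text-conditioned sparse top-k MoE layer (not the paper's code)."""

    def __init__(self, dim: int, text_dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # The paper uses heterogeneous expert modules; plain MLPs stand in for brevity.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(dim, num_experts)          # routing from image tokens
        self.text_gate = nn.Linear(text_dim, num_experts)  # bias from degradation text

    def forward(self, x: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) image feature tokens.
        # text_emb: (batch, text_dim), e.g. a CLIP embedding of the generated
        # degradation caption ("low light", "motion blur", ...).
        logits = self.router(x) + self.text_gate(text_emb).unsqueeze(1)
        weights, idx = logits.topk(self.top_k, dim=-1)     # sparse: keep top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # For clarity every expert is evaluated densely; a real sparse MoE would
        # dispatch each token only to its selected experts.
        for e, expert in enumerate(self.experts):
            mask = idx == e                                # (batch, tokens, top_k)
            if mask.any():
                w = (weights * mask).sum(-1, keepdim=True) # weight assigned to expert e
                out = out + w * expert(x)
        return out

# Usage with made-up shapes:
layer = TextGuidedSparseMoE(dim=256, text_dim=512)
tokens = torch.randn(2, 196, 256)   # e.g. infrared/visible feature tokens
caption = torch.randn(2, 512)       # CLIP embedding of the degradation description
print(layer(tokens, caption).shape) # torch.Size([2, 196, 256])
```

Adding the text embedding to the router logits is one plausible reading of "degradation descriptions dynamically regulate the expert modules": the caption shifts which experts fire, so different degradation types can activate different sparse expert subsets.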
Original language: English
Title of host publication: Proceedings: 2025 International Conference on Virtual Reality and Visualization: ICVRV 2025
Publisher: IEEE
Pages: 447-452
Number of pages: 6
ISBN (Electronic): 9798331556297
ISBN (Print): 9798331556303
DOIs
Publication status: Published - 2025
Externally published: Yes
Event: 2025 International Conference on Virtual Reality and Visualization (ICVRV) - Bogota, Colombia
Duration: 19 Dec 2025 – 21 Dec 2025

Conference

Conference: 2025 International Conference on Virtual Reality and Visualization (ICVRV)
Country/Territory: Colombia
City: Bogota
Period: 19/12/25 – 21/12/25

Keywords

  • Multi-Modal Image Fusion
  • Degradation Correction
  • Multi-Task Sparse Model
  • Multi-Task Incompatibility
