Abstract
Image fusion integrates multi-source image data to generate more information-rich composite images, and has extensive applications in fields such as environmental monitoring and battlefield reconnaissance. Although existing techniques achieve favorable multi-modal image fusion in interference-free scenarios and aid the presentation of scene information, they exhibit significant limitations when confronted with low-quality source images degraded by extreme conditions, including low light, haze, rain and snow, motion blur, and strong noise. To tackle these challenges, this paper proposes a novel anti-degradation multi-modal image fusion framework guided by supplementary text instructions. First, a low-rank fine-tuning mechanism is employed to adapt the CLIP vision-language model, enabling accurate capture of scene degradation features such as low light, noise, and motion blur, and facilitating the generation of semantic descriptions. Second, a heterogeneous mixture-of-experts feature extraction backbone is designed, which leverages the scene degradation descriptions to dynamically regulate heterogeneous expert modules; a mutual information loss function is introduced to optimize expert sparsity during multi-task processing, thereby avoiding redundant computation. Finally, the modules undergo training-testing collaborative optimization to enhance the controllability of the fusion process and improve the generalization capability of the model. Experiments on four mainstream benchmark datasets (LLVIP, MSRS, M3FD, ROAD) demonstrate that the proposed text-semantic-guided sparse mixture-of-experts mechanism offers significant advantages over state-of-the-art methods in both fusion quality and degradation correction.
This work provides new insights for multi-modal image fusion in complex degraded environments and is expected to promote technological advancement in fields such as remote sensing and medical imaging.
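The text-guided sparse expert routing described above can be illustrated with a minimal sketch (this is not the authors' implementation; the function names, shapes, and the top-k gate are illustrative assumptions): a gate conditioned on the degradation-text embedding scores the experts, only the top-k experts are evaluated, and their outputs are combined with renormalized softmax weights.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def text_guided_sparse_moe(feat, text_emb, gate_w, experts, k=2):
    """Route an image feature through the k experts ranked highest by a
    gate conditioned on the text (degradation-description) embedding.

    feat     : image feature vector, length d
    text_emb : embedding of the degradation description, length t
    gate_w   : t x n gating matrix (n = number of experts)
    experts  : list of n callables, each mapping a length-d vector to one
    """
    n = len(gate_w[0])
    # Gating logits from the text embedding: logits = text_emb @ gate_w
    logits = [sum(text_emb[j] * gate_w[j][i] for j in range(len(text_emb)))
              for i in range(n)]
    top = sorted(range(n), key=lambda i: logits[i])[-k:]  # top-k expert ids
    weights = softmax([logits[i] for i in top])           # renormalize over top-k
    out = [0.0] * len(feat)
    for w, i in zip(weights, top):   # only the selected experts are evaluated
        y = experts[i](feat)
        out = [o + w * v for o, v in zip(out, y)]
    return out

# Hypothetical example: a "low-light" text embedding routes past expert 0
experts = [lambda x: [v for v in x],      # e.g. a denoising expert
           lambda x: [2 * v for v in x],  # e.g. a low-light enhancement expert
           lambda x: [3 * v for v in x]]  # e.g. a deblurring expert
fused = text_guided_sparse_moe([1.0, 1.0], [1.0, 0.0],
                               [[0.0, 2.0, 1.0], [0.0, 0.0, 0.0]],
                               experts, k=2)
```

In the actual framework the gate and experts would be learned networks operating on feature maps, and the paper's mutual information loss would act as the regularizer that keeps this expert selection sparse across tasks.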
| Original language | English |
|---|---|
| Title of host publication | Proceedings: 2025 International Conference on Virtual Reality and Visualization: ICVRV 2025 |
| Publisher | IEEE |
| Pages | 447-452 |
| Number of pages | 6 |
| ISBN (Electronic) | 9798331556297 |
| ISBN (Print) | 9798331556303 |
| DOIs | |
| Publication status | Published - 2025 |
| Externally published | Yes |
| Event | 2025 International Conference on Virtual Reality and Visualization (ICVRV) - Bogota, Colombia |
| Duration | 19 Dec 2025 → 21 Dec 2025 |
Conference
| Conference | 2025 International Conference on Virtual Reality and Visualization (ICVRV) |
|---|---|
| Country/Territory | Colombia |
| City | Bogota |
| Period | 19/12/25 → 21/12/25 |
Keywords
- Multi-Modal Image Fusion
- Degradation Correction
- Multi-Task Sparse Model
- Multi-Task Incompatibility
Title

Text-Guided Sparse Mixture of Experts for Degradation-Robust Multi-Modal Image Fusion