Abstract
Recent breakthroughs in multimodal large language models (MLLMs) have endowed AI systems with unified perception, reasoning, and natural-language interaction across text, image, and video modalities. Meanwhile, unmanned aerial vehicle (UAV) swarms are increasingly deployed in dynamic, safety-critical missions that demand rapid situational awareness and autonomous adaptation. This paper explores potential solutions for integrating MLLMs with UAV swarms to enhance intelligence and adaptability across diverse tasks. Specifically, we first outline the fundamental architectures and functions of UAVs and MLLMs. Then, we present a comprehensive framework for an MLLM-enabled UAV swarm system and discuss the opportunities it offers. Next, we demonstrate the capabilities of the proposed framework through a forest firefighting case study that includes both simulation and real-world experiments. Finally, we discuss the challenges and future research directions for MLLM-enabled UAV swarms.
| Original language | English |
|---|---|
| Number of pages | 9 |
| Journal | IEEE Wireless Communications |
| Publication status | Published - 16 Dec 2025 |
Bibliographical note
Publisher Copyright: © 2002-2012 IEEE.
Funding
This work was supported in part by the National Natural Science Foundation of China under Grant 62501194 and Grant 62171160, in part by the Guangdong Provincial Key Laboratory (2024) under Grant 2024KSYS023, in part by the China Postdoctoral Science Foundation under Grant GZC20252782, in part by the Shenzhen Science and Technology Program under Grant KJZD20240903100022029, and in part by the Major Key Project of PCL under Grant PCL2024A07.
Keywords
- UAV swarm
- forest fire protection
- multimodal large language models