Multimodal Large Language Models-Enabled UAV Swarm: Towards Efficient and Intelligent Autonomous Aerial Systems

  • Yuqi PING*
  • Tianhao LIANG*
  • Huahao DING
  • Guangyu LEI
  • Junwei WU
  • Xuan ZOU
  • Kuan SHI
  • Rui SHAO
  • Chiya ZHANG
  • Weizheng ZHANG
  • Weijie YUAN
  • Tingting ZHANG
  • *Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

Recent breakthroughs in multimodal large language models (MLLMs) have endowed AI systems with unified perception, reasoning, and natural-language interaction across text, image, and video modalities. Meanwhile, unmanned aerial vehicle (UAV) swarms are increasingly deployed in dynamic, safety-critical missions that demand rapid situational awareness and autonomous adaptation. This paper explores potential solutions for integrating MLLMs with UAV swarms to enhance intelligence and adaptability across diverse tasks. Specifically, we first outline the fundamental architectures and functions of UAVs and MLLMs. Then, we present a comprehensive framework for an MLLM-enabled UAV swarm system and discuss the opportunities it offers. Next, we demonstrate the capabilities of the proposed framework through a forest firefighting case study that includes both simulation and real-world experiments. Finally, we discuss the challenges and future research directions for MLLM-enabled UAV swarms.

Original language: English
Number of pages: 9
Journal: IEEE Wireless Communications
Publication status: Published - 16 Dec 2025

Bibliographical note

Publisher Copyright:
© 2002-2012 IEEE.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62501194 and Grant 62171160, in part by Guangdong Provincial Key Laboratory (2024) under Grant 2024KSYS023, in part by China Postdoctoral Science Foundation under Grant GZC20252782, in part by Shenzhen Science and Technology Program under Grant KJZD20240903100022029, and in part by the Major Key Project of PCL under Grant PCL2024A07.

Keywords

  • UAV swarm
  • forest fire protection
  • multimodal large language models
