DMA-YOLO: multi-scale object detection method with attention mechanism for aerial images

Ya-Ling LI, Yong FENG*, Ming-Liang ZHOU, Xian-cai XIONG, Yong-heng WANG, Bao-hua QIANG

*Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

2 Citations (Scopus)


Unmanned aerial vehicles are increasingly popular due to their ease of operation, low noise, and portability. However, existing object detection methods perform poorly on small targets in aerial images, where objects may be densely arranged or sparsely distributed. To tackle this issue, we enhanced the general object detection method YOLOv5 and introduced a multi-scale detection method called Detach-Merge Attention YOLO (DMA-YOLO). Specifically, we proposed a Detach-Merge Convolution (DMC) module and embedded it into the backbone network to maximize feature retention. Furthermore, we embedded the Bottleneck Attention Module (BAM) into the detection head to suppress interference from complex background information without significantly increasing computational complexity. To represent and process multi-scale features more effectively, we added an extra detection head and restructured the neck network as a Bi-directional Feature Pyramid Network (BiFPN). Finally, we adopted SCYLLA-IoU (SIoU) as the loss function to speed up the convergence of our model and improve the precision of detection results. A series of experiments on the VisDrone2019 and UAVDT datasets demonstrates the effectiveness of DMA-YOLO. Code is available at
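The abstract names BiFPN as the neck structure but does not describe how features are fused. As an illustration only, the sketch below shows the "fast normalized fusion" rule from the original BiFPN design (EfficientDet), in which each input feature gets a learnable non-negative weight and the weights are normalized to sum to roughly one. Whether DMA-YOLO uses exactly this weighting is an assumption; the function name `fast_normalized_fusion`, the `eps` default, and the use of plain Python lists in place of feature maps are all hypothetical choices made for this example, not taken from the authors' code.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-length feature vectors using BiFPN-style normalized weights.

    features: list of equal-length lists of floats (stand-ins for feature maps
              that have already been resized to a common resolution).
    weights:  one scalar per feature (learnable in a real network).
    eps:      small constant that avoids division by zero.
    """
    if not features:
        raise ValueError("at least one input feature is required")
    if len(features) != len(weights):
        raise ValueError("one weight per input feature is required")
    length = len(features[0])
    if any(len(f) != length for f in features):
        raise ValueError("all input features must have the same length")

    # ReLU keeps every weight non-negative, so the fusion is a convex-like mix.
    clipped = [max(0.0, w) for w in weights]
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    # Weighted sum of the inputs, element by element.
    return [
        sum(w * f[i] for w, f in zip(normalized, features))
        for i in range(length)
    ]


if __name__ == "__main__":
    # Two toy "feature maps" from different pyramid levels (assumed values).
    shallow = [1.0, 2.0]
    deep = [3.0, 4.0]
    print(fast_normalized_fusion([shallow, deep], [1.0, 1.0]))
```

Compared with softmax-normalized weights, this rule avoids exponentials, which is the efficiency argument given for it in the BiFPN design; a negative weight is simply clipped to zero, so that input drops out of the fusion.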

Original language: English
Pages (from-to): 4505-4518
Number of pages: 14
Journal: Visual Computer
Issue number: 6
Publication status: Published - Jun 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2023.


Keywords

  • Aerial images
  • Attention mechanism
  • Object detection
  • YOLOv5

