Abstract
Human action recognition (HAR) is a fundamental component of ubiquitous computing, yet its widespread application is hindered by privacy concerns. Specifically, high-accuracy models typically require cloud-based processing that exposes sensitive visual data, while privacy-preserving on-device models suffer from limited reasoning capacity and frequent hallucinations. To resolve this conflict, we introduce multiagent debate for HAR (MAD-HAR), a novel framework designed for strictly local environments. MAD-HAR uses a lightweight vision–language model (VLM) with a granular prompt to convert visual inputs into semantic captions, anonymizing the data before inference. To mitigate reasoning failures, a heterogeneous ensemble of N = 7 small and medium language model agents (ranging from 8B to 14B parameters) engages in a structured multiround debate. Rather than outputting simple labels, agents are prompted to generate structured rationales that explicitly justify their logic, and they use collaborative critique to override hallucinations. We evaluate our approach on public benchmarks. Preliminary experiments guided the selection of the VLM backbone, while extensive main and ablation studies suggest that scaling to a seven-agent pool with rationale-driven debate yields higher-order reasoning. Experimental results show that MAD-HAR significantly improves macro-F1 while maximizing consensus and yielding consistent net error rectification.
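The debate stage described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the agents here are hypothetical stub functions rather than local language models, the caption is assumed to have already been produced by the VLM, and majority voting with early stopping on unanimity is an assumed aggregation rule that the abstract does not specify. All names (`Answer`, `debate`, `stubborn`, `conformist`) are introduced for illustration only.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Answer:
    label: str       # predicted action label
    rationale: str   # short justification shared with peers


# Hypothetical agent interface: (caption, peer answers from last round) -> Answer.
Agent = Callable[[str, List[Answer]], Answer]


def majority(answers: List[Answer]) -> Tuple[str, float]:
    """Return the most common label and the fraction of agents backing it."""
    counts = Counter(a.label for a in answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


def debate(caption: str, agents: List[Agent], max_rounds: int = 3) -> Tuple[str, float]:
    """Run up to max_rounds rounds; stop early once all agents agree (assumed rule)."""
    # Round 1: every agent answers independently from the caption alone.
    answers = [agent(caption, []) for agent in agents]
    for _ in range(max_rounds - 1):
        _, agreement = majority(answers)
        if agreement == 1.0:
            break
        # Later rounds: each agent sees the other agents' labels and rationales.
        answers = [
            agent(caption, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return majority(answers)


def stubborn(label: str) -> Agent:
    """Stub agent that always returns the same label."""
    def agent(caption: str, peers: List[Answer]) -> Answer:
        return Answer(label, f"caption contains cues for '{label}'")
    return agent


def conformist(initial: str) -> Agent:
    """Stub agent that adopts the most common peer label once it sees peers."""
    def agent(caption: str, peers: List[Answer]) -> Answer:
        if not peers:
            return Answer(initial, "first impression")
        top = Counter(p.label for p in peers).most_common(1)[0][0]
        return Answer(top, "persuaded by peer rationales")
    return agent


if __name__ == "__main__":
    pool = [stubborn("running")] * 4 + [conformist("walking")] * 3
    print(debate("a person moves quickly along a track", pool))
```

In a real deployment each stub would be replaced by a call to a locally hosted model that receives the caption and the peers' rationales in its prompt; only the caption, never the raw image, would reach the agents.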
| Original language | English |
|---|---|
| Article number | 8714926 |
| Number of pages | 13 |
| Journal | IET Software |
| Volume | 2026 |
| Issue number | 1 |
| Early online date | 10 Apr 2026 |
| Publication status | Published - 2026 |
Bibliographical note
Publisher Copyright: © 2026 Xuecheng Zhou et al. IET Software published by John Wiley & Sons Ltd.
Funding
This study was funded by the Research Grants Council Theme-based Research Scheme (Grant T43-513/23-N), the National Natural Science Foundation of China (Grant 62372486), the Guangxi Key Research and Development Program (Grant AB24010160), and the Guangdong Provincial Pearl River Talents Program (Grants 2023QN10X579 and 2024QN11X183).
Keywords
- human action recognition
- multiagent debate
- on-device LLM
- vision–language models
Fingerprint
Dive into the research topics of 'MAD-HAR: Privacy-Preserving On-Device Human Action Recognition via Multiagent LLM Debate'. Together they form a unique fingerprint.