MAD-HAR: Privacy-Preserving On-Device Human Action Recognition via Multiagent LLM Debate

  • Xuecheng ZHOU
  • Zepeng GU
  • Chenyu ZUO
  • Shan JIANG*
  • Mingjin ZHANG
  • Jianguo CHEN

*Corresponding author for this work

Research output: Journal Publications, Journal Article (refereed), peer-review

Abstract

Human action recognition (HAR) is a fundamental component of ubiquitous computing, yet its wide-ranging applications are hindered by privacy concerns. Specifically, high-accuracy models typically require cloud-based processing that compromises sensitive visual data, while privacy-preserving on-device models suffer from limited reasoning capacity and frequent hallucinations. To resolve this conflict, we introduce multiagent debate for HAR (MAD-HAR), a novel framework designed for strictly local environments. MAD-HAR uses a lightweight vision–language model (VLM) with a granular prompt to convert visual inputs into semantic captions, anonymizing the data before inference. To mitigate reasoning failures, a heterogeneous ensemble of N = 7 small and medium language model agents (ranging from 8B to 14B parameters) engages in a structured multiround debate. Rather than outputting simple labels, agents are prompted to generate structured rationales that explicitly justify their logic, using collaborative critique to override hallucinations. We evaluate our approach on public benchmarks. Preliminary experiments guided the selection of the optimal VLM backbone, while extensive main and ablation studies suggest that scaling to a seven-agent pool with rationale-driven debate synthesizes higher-order reasoning. Experimental results show that MAD-HAR significantly improves macro-F1 while maximizing consensus and yielding consistent net error rectification.
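
As a rough illustration of the multiround debate described in the abstract, the following Python sketch runs a pool of agents over a single caption. It is not the authors' implementation: the agent interface, the stub agents, the unanimity stopping rule, the round limit, and the majority-vote aggregation are all assumptions made for illustration, since the abstract does not specify them. In a real system each agent would wrap a locally hosted language model.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical interface: an agent maps (caption, peers' previous answers)
# to a (label, rationale) pair. Not taken from the paper.
Answer = Tuple[str, str]
Agent = Callable[[str, List[Answer]], Answer]


def majority(answers: List[Answer]) -> Tuple[str, float]:
    """Return the most common label and its vote share (ties: first seen)."""
    counts = Counter(label for label, _ in answers)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(answers)


def debate(caption: str, agents: List[Agent], max_rounds: int = 3):
    """Run a simple debate: independent answers, then revision rounds."""
    # Round 0: every agent answers without seeing its peers.
    answers = [agent(caption, []) for agent in agents]
    for _ in range(max_rounds):
        _, share = majority(answers)
        if share == 1.0:  # assumed stopping rule: unanimous agreement
            break
        # Each agent revises after reading the other agents' rationales.
        answers = [
            agent(caption, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    return majority(answers)[0], answers


def make_stub_agent(initial: str, stubborn: bool = False) -> Agent:
    """Stand-in for a language model agent, used only to exercise the loop."""
    def agent(caption: str, peers: List[Answer]) -> Answer:
        if stubborn or not peers:
            return initial, f"the caption suggests '{initial}'"
        peer_label, _ = majority(peers)
        return peer_label, f"peer rationales support '{peer_label}'"
    return agent


# Seven agents, matching N = 7 in the abstract; the labels are made up.
pool = [make_stub_agent("drinking", stubborn=True) for _ in range(4)]
pool += [make_stub_agent("eating") for _ in range(3)]
final_label, final_answers = debate("a person lifts a cup to their mouth", pool)
```

Here the three stub agents that initially answer "eating" switch after one round because most of their peers argue for "drinking". A real deployment would replace the stubs with prompts that ask each model to critique the peer rationales before revising its answer.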

Original language: English
Article number: 8714926
Number of pages: 13
Journal: IET Software
Volume: 2026
Issue number: 1
Early online date: 10 Apr 2026
Publication status: Published - 2026

Bibliographical note

Publisher Copyright:
Copyright © 2026 Xuecheng Zhou et al. IET Software published by John Wiley & Sons Ltd.

Funding

This study was funded by the Research Grants Council Theme-based Research Scheme (Grant T43-513/23-N), the National Natural Science Foundation of China (Grant 62372486), the Guangxi Key Research and Development Program (Grant AB24010160), and the Guangdong Provincial Pearl River Talents Program (Grants 2023QN10X579 and 2024QN11X183).

Keywords

  • human action recognition
  • multiagent debate
  • on-device LLM
  • vision–language models
