TY - JOUR
T1 - Distributed Semi-Supervised Learning with Consensus Consistency on Edge Devices
AU - CHEN, Hao-Rui
AU - YANG, Lei
AU - ZHANG, Xinglin
AU - SHEN, Jiaxing
AU - CAO, Jiannong
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/2
Y1 - 2024/2
AB - Distributed learning has been increasingly studied in edge computing, enabling edge devices to learn a model collaboratively without exchanging their private data. However, existing approaches assume that the private data owned by edge devices are fully labeled, whereas in reality massive amounts of private data are unlabeled and remain unexploited, which leads to suboptimal performance. To overcome this limitation, we study a new practical problem, Distributed Semi-Supervised Learning (DSSL), to learn models collaboratively with mixed private labeled and unlabeled data on each device. We also propose a novel method, DistMatch, which exploits private unlabeled data by self-training on each device with the help of models received from neighboring devices. DistMatch generates pseudo-labels for unlabeled data by properly averaging the predictions of these received models. Furthermore, to avoid self-training on incorrect pseudo-labels, DistMatch employs a consensus consistency loss that retains only pseudo-labels with high consensus and forces the output of the trained model to be consistent with them. Extensive evaluations on our self-developed testbed indicate that the proposed method outperforms all baselines on commonly used image classification benchmark datasets.
KW - Consistency regularization
KW - distributed machine learning
KW - semi-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85179829931&partnerID=8YFLogxK
U2 - 10.1109/TPDS.2023.3340707
DO - 10.1109/TPDS.2023.3340707
M3 - Journal Article (refereed)
SN - 1045-9219
VL - 35
SP - 310
EP - 323
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
IS - 2
ER -