Abstract
This work addresses the dual challenges of enhancing training efficiency and protecting data privacy in Vertical Federated Learning (VFL) through secure synthetic dataset generation. VFL typically involves an active party with labels collaborating with a passive party possessing features of the same set of samples. Traditional VFL methods, however, rely on training with entire datasets of sensitive real data, leading to two primary issues: 1) reduced training efficiency due to large dataset sizes, a concern exacerbated in cryptography-based training methods; and 2) potential privacy leakage at the sample level during training. To mitigate these issues, we introduce the Vertical Federated Dataset Condensation (VFDC) method. VFDC employs a novel mixed protection mechanism, integrating class-wise secure aggregation, differential privacy and repetitive initialization, to securely match the distributions of real and synthetic data. Empirical evaluations on six real-world datasets validate VFDC’s efficacy in generating small synthetic data for VFL, achieving a superior utility-privacy-efficiency trade-off during federated training.
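The distribution-matching idea named in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' VFDC implementation: the secure aggregation protocol and repetitive initialization are omitted, the Gaussian noise scale is not calibrated to any formal differential privacy budget, and the names `classwise_noisy_means`, `matching_loss`, `clip_norm`, and `noise_std` are hypothetical. The sketch only shows how per-class statistics of real features, rather than individual samples, could serve as the target that a small synthetic set is fitted to.

```python
import numpy as np


def classwise_noisy_means(features, labels, num_classes,
                          clip_norm=1.0, noise_std=0.1, rng=None):
    """Per-class mean of L2-clipped feature rows, plus Gaussian noise.

    Assumption: a real VFL deployment would compute these sums under a
    secure aggregation protocol so that labels and per-sample features
    are never exchanged in the clear; here everything runs in one process.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Clip each row to bound the influence of any single sample.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    clipped = features * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    means = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            means[c] = clipped[mask].mean(axis=0)
    # Illustrative noise only; noise_std is not tied to an (epsilon, delta).
    return means + rng.normal(0.0, noise_std, size=means.shape)


def matching_loss(real_means, syn_features, syn_labels, num_classes):
    """Sum over classes of squared distance between real and synthetic means."""
    loss = 0.0
    for c in range(num_classes):
        mask = syn_labels == c
        if mask.any():
            diff = real_means[c] - syn_features[mask].mean(axis=0)
            loss += float(diff @ diff)
    return loss
```

A synthetic set would then be updated, for example by gradient descent on `matching_loss`, so that only aggregated and perturbed class statistics of the real data influence it.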
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2024, Proceedings |
Editors | Albert Bifet, Jesse Davis, Tomas Krilavičius, Meelis Kull, Eirini Ntoutsi, Indrė Žliobaitė |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 212-229 |
Number of pages | 18 |
ISBN (Print) | 9783031703409 |
DOIs | |
Publication status | Published - 2024 |
Event | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2024 - Vilnius, Lithuania. Duration: 9 Sept 2024 → 13 Sept 2024 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 14941 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2024 |
---|---|
Country/Territory | Lithuania |
City | Vilnius |
Period | 9/09/24 → 13/09/24 |
Bibliographical note
Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
Keywords
- Dataset condensation
- Privacy protection
- Training efficiency
- Vertical federated learning