Secure Dataset Condensation for Privacy-Preserving and Efficient Vertical Federated Learning

Dashan GAO*, Canhui WU, Xiaojin ZHANG, Xin YAO, Qiang YANG

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings; Conference paper (refereed); Refereed Conference Paper; peer-review

Abstract

This work addresses the dual challenges of enhancing training efficiency and protecting data privacy in Vertical Federated Learning (VFL) through secure synthetic dataset generation. VFL typically involves an active party with labels collaborating with a passive party possessing features of the same set of samples. Traditional VFL methods, however, rely on training with entire datasets of sensitive real data, leading to two primary issues: 1) reduced training efficiency due to large dataset sizes, a concern exacerbated in cryptography-based training methods; and 2) potential privacy leakage at the sample level during training. To mitigate these issues, we introduce the Vertical Federated Dataset Condensation (VFDC) method. VFDC employs a novel mixed protection mechanism, integrating class-wise secure aggregation, differential privacy and repetitive initialization, to securely match the distributions of real and synthetic data. Empirical evaluations on six real-world datasets validate VFDC’s efficacy in generating small synthetic data for VFL, achieving a superior utility-privacy-efficiency trade-off during federated training.
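As a rough illustration of what "matching the distributions of real and synthetic data" with class-wise aggregates and added noise can look like, the sketch below computes clipped, noise-perturbed per-class feature means and a squared-distance loss against the class means of a synthetic set. It is not the VFDC algorithm: the function names, the clipping step, and the `noise_std` parameter are assumptions made for this example, the noise is not calibrated to any formal differential privacy budget, and everything runs in a single process with no secure aggregation between parties.

```python
import math
import random


def clip_row(row, clip_norm):
    """Scale a feature vector so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(v * v for v in row))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in row]


def class_means(rows, labels, num_classes):
    """Per-class mean of the rows; a class with no rows gets a zero vector."""
    dim = len(rows[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for row, label in zip(rows, labels):
        counts[label] += 1
        for j, v in enumerate(row):
            sums[label][j] += v
    return [[s / max(counts[c], 1) for s in sums[c]] for c in range(num_classes)]


def noisy_class_means(features, labels, num_classes,
                      clip_norm=1.0, noise_std=0.1, seed=0):
    """Class-wise means of clipped features with Gaussian noise added.

    noise_std is a hypothetical placeholder, not a calibrated privacy parameter.
    """
    rng = random.Random(seed)
    clipped = [clip_row(row, clip_norm) for row in features]
    means = class_means(clipped, labels, num_classes)
    return [[m + rng.gauss(0.0, noise_std) for m in mean] for mean in means]


def matching_loss(real_means, synthetic, synthetic_labels, num_classes):
    """Squared distance between real and synthetic class means, summed over classes."""
    syn_means = class_means(synthetic, synthetic_labels, num_classes)
    return sum(
        (s - r) ** 2
        for syn, real in zip(syn_means, real_means)
        for s, r in zip(syn, real)
    )
```

In a condensation loop, the synthetic rows would be updated to reduce `matching_loss`. In the vertical setting described in the abstract, the labels and the features are held by different parties, which is the gap that secure aggregation would need to cover and that this sketch leaves out.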

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, ECML PKDD 2024, Proceedings
Editors: Albert Bifet, Jesse Davis, Tomas Krilavičius, Meelis Kull, Eirini Ntoutsi, Indrė Žliobaitė
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 212-229
Number of pages: 18
ISBN (Print): 9783031703409
Publication status: Published - 2024
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2024 - Vilnius, Lithuania
Duration: 9 Sept 2024 to 13 Sept 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14941 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2024
Country/Territory: Lithuania
City: Vilnius
Period: 9/09/24 to 13/09/24

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Keywords

  • Dataset condensation
  • Privacy protection
  • Training efficiency
  • Vertical federated learning
