Surviving in Diverse Biases: Unbiased Dataset Acquisition in Online Data Market for Fair Model Training

Jiashi GAO, Ziwei WANG, Xiangyu ZHAO, Xin YAO, Xuetao WEI*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › peer-review

Abstract

Online data markets have emerged as a valuable source of diverse datasets for training machine learning (ML) models. However, datasets from different data providers may exhibit varying levels of bias with respect to certain sensitive attributes in the population (such as race, sex, age, and marital status). Recent dataset acquisition research has focused on maximizing accuracy improvements for downstream model training, ignoring the negative impact of biases in the acquired datasets, which can lead to an unfair model. Can a consumer obtain an unbiased dataset from datasets with diverse biases? In this work, we propose a fairness-aware data acquisition framework (FAIRDA) to acquire high-quality datasets that maximize both accuracy and fairness for the consumer's local classifier training while remaining within a limited budget. Given that the biases of data commodities remain opaque to consumers, data acquisition in FAIRDA employs explore-exploit strategies. Based on whether exploration and exploitation are conducted sequentially or alternately, we introduce two algorithms: the knowledge-based offline data acquisition algorithm (KDA) and the reward-based online data acquisition algorithm (RDA). Each algorithm is tailored to specific customer needs, with the former offering an advantage in computational efficiency and the latter an advantage in robustness. We conduct experiments to demonstrate the effectiveness of the proposed data acquisition framework in steering consumers toward fairer model training compared to existing baselines under varying market settings.
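To make the explore-exploit idea concrete, the sketch below shows a minimal, budget-limited acquisition loop in the spirit of a reward-based online strategy such as RDA, assuming an epsilon-greedy bandit over data providers and a reward that mixes validation accuracy with a demographic-parity gap. The provider prices, the callables buy_batch and train_and_eval, the trade-off weight alpha, and the choice of fairness metric are illustrative assumptions, not the paper's exact algorithm.

```python
# Illustrative sketch only: epsilon-greedy, budget-limited data acquisition that
# scores each purchase by a weighted mix of validation accuracy and a
# demographic-parity gap. Names and parameters are hypothetical assumptions.
import random
from typing import Callable, Dict, List, Tuple

def online_acquisition(
    providers: Dict[str, float],                            # provider id -> price per batch
    buy_batch: Callable[[str], Tuple[list, list, list]],    # returns (X, y, sensitive) for a provider
    train_and_eval: Callable[[list], Tuple[float, float]],  # returns (accuracy, dp_gap) on held-out data
    budget: float,
    alpha: float = 0.5,      # trade-off between accuracy and fairness in the reward
    epsilon: float = 0.2,    # exploration rate
) -> List[tuple]:
    """Explore random providers with prob. epsilon, otherwise exploit the best average reward."""
    acquired: List[tuple] = []
    avg_reward = {p: 0.0 for p in providers}
    counts = {p: 0 for p in providers}

    while budget >= min(providers.values()):
        affordable = [p for p, price in providers.items() if price <= budget]
        if random.random() < epsilon:
            choice = random.choice(affordable)                     # explore
        else:
            choice = max(affordable, key=lambda p: avg_reward[p])  # exploit
        X, y, s = buy_batch(choice)
        budget -= providers[choice]
        acquired.append((X, y, s))

        # Retrain the local classifier on everything acquired so far and measure
        # accuracy and the demographic-parity gap (lower gap = fairer).
        accuracy, dp_gap = train_and_eval(acquired)
        reward = alpha * accuracy + (1 - alpha) * (1 - dp_gap)

        # Incremental update of the provider's average reward.
        counts[choice] += 1
        avg_reward[choice] += (reward - avg_reward[choice]) / counts[choice]
    return acquired
```

In this sketch the reward is recomputed after every purchase, so providers whose batches worsen fairness see their estimated value drop and are selected less often as the budget is spent; the sequential (offline) counterpart would instead separate an exploration phase from a single exploitation phase.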
Original language: English
Title of host publication: Proceedings of the Seventh AAAI/ACM Conference on AI, Ethics, and Society (AIES 2024)
Editors: Sanmay DAS, Brian Patrick GREEN, Kush VARSHNEY, Marianna GANAPINI, Andrea RENDA
Publisher: AAAI Press
Pages: 451-462
Volume: 7
ISBN (Electronic): 9781577358923
Publication status: Published - 16 Oct 2024

Funding

This work was supported by Key Programs of Guangdong Province under Grant 2021QN02X166. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the funding parties.
