Optimal and Efficient Binary Questioning for Accelerated Annotation

Franco MARCHESONI-ACLAND, Jean-Michel MOREL, Josselin KHERROUBI, Gabriele FACCIOLO

Research output: Journal Publications, Journal Article (refereed), peer-review

Abstract

Even though data annotation is extremely important for interpretability, research, and development of artificial intelligence solutions, annotating data remains costly. Research efforts such as active learning or few-shot learning alleviate the cost by increasing sample efficiency, yet the problem of annotating data more quickly has received comparatively little attention. Leveraging a predictor has been shown to reduce annotation cost in practice but has not been theoretically considered. We ask the following question: to annotate a binary classification dataset with N samples, can the annotator answer fewer than N yes/no questions? Framing this question-and-answer (Q&A) game as an optimal encoding problem, we find a positive answer given by the Huffman encoding of the possible labelings. Unfortunately, the algorithm is computationally intractable even for small dataset sizes. As a practical method, we propose to minimize a cost function a few steps ahead, similarly to lookahead minimization in optimal control. This solution is analyzed, compared with the optimal one, and evaluated using several synthetic and real-world datasets. The method allows a significant improvement (23–86%) in the annotation efficiency of real-world datasets.
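The encoding view in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes a hypothetical predictor that outputs an independent probability p[i] that sample i is positive, enumerates all 2^N labelings, and computes the expected depth of a Huffman tree built over them. Each internal node of the tree is read as one yes/no question of the form "is the true labeling in this subset?", so the expected depth stands in for the expected number of questions. The function names and the independence assumption are illustrative only.

```python
import heapq
import itertools
from math import prod


def labeling_probabilities(p):
    """Probability of each of the 2**N labelings.

    Assumption (not from the source): labels are independent and
    p[i] is the predicted probability that sample i is positive.
    """
    return [
        prod(pi if y else 1.0 - pi for pi, y in zip(p, labels))
        for labels in itertools.product((0, 1), repeat=len(p))
    ]


def expected_huffman_questions(weights):
    """Expected leaf depth of a Huffman tree over the given weights.

    The expected depth equals the sum of the weights of all merged
    (internal) nodes, so the tree itself never has to be stored.
    """
    heap = list(weights)
    heapq.heapify(heap)
    expected = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)  # two least likely groups of labelings
        b = heapq.heappop(heap)
        expected += a + b        # every leaf below this merge gets one level deeper
        heapq.heappush(heap, a + b)
    return expected


if __name__ == "__main__":
    confident = [0.9, 0.9, 0.9, 0.9]      # hypothetical predictor outputs
    uninformative = [0.5, 0.5, 0.5]
    print(expected_huffman_questions(labeling_probabilities(confident)))
    print(expected_huffman_questions(labeling_probabilities(uninformative)))
```

With an uninformative predictor (all probabilities 0.5) every labeling is equally likely and the expected count equals N, while a confident predictor brings it below N. The enumeration over 2^N labelings also shows why the abstract calls the exact approach intractable beyond small N.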
Original language: English
Pages (from-to): 14336-14343
Number of pages: 8
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 39
Issue number: 13
Publication status: Published - 11 Apr 2025
Externally published: Yes
Event: 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025 - Philadelphia, United States
Duration: 25 Feb 2025 to 4 Mar 2025

Bibliographical note

Publisher Copyright:
Copyright © 2025, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Funding

This work was supported by a CIFRE grant from ANRT. It was also partially financed by ANII Uruguay. Centre Borelli is also with Université Paris Cité, SSA and INSERM.
