Normalized vs Diplomatic Annotation: A Case Study of Automatic Information Extraction from Handwritten Uruguayan Birth Certificates

Natalia BOTTAIOLI*, Solène TARRIDE, Jérémy ANGER, Seginus MOWLAVI, Marina GARDELLA, Antoine TADROS, Gabriele FACCIOLO, Rafael Grompone VON GIOI, Christopher KERMORVANT, Jean Michel MOREL, Javier PRECIOZZI

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

2 Citations (Scopus)

Abstract

This study evaluates the recently proposed Document Attention Network (DAN) for extracting key-value information from handwritten Spanish-language Uruguayan birth certificates. We investigate two annotation strategies for automatically transcribing handwritten documents, fine-tuning DAN with minimal training data and annotation effort. Experiments were conducted on two datasets that contain the same images (201 scans of birth certificates written by more than 15 different writers) but use different annotation methods. Our findings indicate that normalized annotation is more effective for fields that can be standardized, such as dates and places of birth. In contrast, diplomatic annotation performs much better for fields containing names and surnames, which cannot be standardized.

Original language: English
Title of host publication: Document Analysis and Recognition: ICDAR 2024 Workshops, Proceedings
Editors: Harold MOUCHÈRE, Anna ZHU
Publisher: Springer, Cham
Pages: 40-54
Number of pages: 15
ISBN (Electronic): 9783031706455
ISBN (Print): 9783031706448
Publication status: Published - 2024
Externally published: Yes

Publication series

Name: Lecture Notes in Computer Science
Volume: 14935 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Bibliographical note

Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.

Funding

The research that led to the results presented in this publication was partly supported by the Agencia Nacional de Investigación e Innovación (ANII) and the France 2030 CollabNext project.

Keywords

  • Automatic information extraction
  • Birth certificates transcription
  • Handwritten text recognition
  • Normalized and diplomatic annotation

