Advancing automatic assessment of target-language quality in interpreter training with large language models: insights from explainable AI

  • Xiaoman WANG*
  • Binhua WANG*

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

2 Citations (Scopus)

Abstract

Assessment of target-language quality in interpreting is one of the most important aspects of interpreter training. It is time- and effort-consuming, yet it has been under-explored in previous studies, particularly regarding the use of AI technologies to facilitate automatic assessment. This study investigates the capability of LLMs, specifically GPT and Claude, to facilitate automatic assessment of target-language quality in interpreting. We conducted a descriptive analysis of the scores generated by the LLMs and correlated them with human evaluations. We also examined the models' rating processes and criteria by comparing the revisions they made to enhance target-language quality. Our analysis of the differences between human evaluations and LLM scores, together with the LLMs' feedback on their scoring rationale, suggests that LLMs can be applied to assess target-language quality in interpreting. The study revealed that while human and automatic assessments are generally aligned, discrepancies occur in approximately 7.6% of cases. These discrepancies often involve differences in sentence structure, complexity, vocabulary, register and flow, underscoring divergent perceptions of quality between humans and LLMs. The study indicates the potential of AI technology to supplement traditional human evaluation.
Original language: English
Pages (from-to): 465-485
Number of pages: 21
Journal: Interpreter and Translator Trainer
Volume: 19
Issue number: 3-4
Early online date: 17 Jul 2025
DOIs
Publication status: Published - 2025
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2025 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.

Keywords

  • Automatic assessment of interpreting
  • automatic metrics
  • explainable AI
  • large language models
  • target-language quality
