Benchmarking international construction contract knowledge of large language models

  • Hanzhao XIA
  • Yongqiang CHEN
  • Xuan QI
  • Yu WANG*

*Corresponding author for this work

Research output: Journal Publications; Journal Article (refereed); peer-review

Abstract

Large language models (LLMs) exhibit strong capabilities across multiple domains. However, their knowledge of international construction contracts and reliability in performing tasks related to this domain remain largely unexplored. This study introduces a multi-level benchmark comprising 1,131 questions designed to assess knowledge memorization, understanding, and application, supported by hybrid evaluation metrics. Testing 14 representative models reveals three key findings: 1) the effect of few-shot learning on accuracy is inconsistent, indicating its uncertain effectiveness; 2) while LLMs are capable of effectively answering questions requiring expertise in international construction contracts, they struggle with foundational knowledge elements such as concepts and factual details; 3) LLMs demonstrate relative strengths in relevance, professionalism, and clarity but exhibit significant shortcomings in accuracy, completeness, and referencing. This study provides a structured evaluation framework for both model selection and performance enhancement, while also establishing a foundation for future research in intelligent contract management systems by identifying strengths and weaknesses of current LLMs.
Original language: English
Article number: 131754
Number of pages: 23
Journal: Expert Systems with Applications
Volume: 317
Early online date: 3 Mar 2026
Publication status: E-pub ahead of print - 3 Mar 2026

Bibliographical note

Publisher Copyright:
© 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

Funding

This work was supported by the National Natural Science Foundation of China (Award No. 72031008).

Keywords

  • International construction contracts
  • Large language models
  • Model evaluation
