Abstract
Large language models (LLMs) exhibit strong capabilities across many domains. However, their knowledge of international construction contracts and their reliability on tasks in this domain remain largely unexplored. This study introduces a multi-level benchmark comprising 1,131 questions designed to assess knowledge memorization, understanding, and application, supported by hybrid evaluation metrics. Testing 14 representative models reveals three key findings: 1) few-shot learning has an inconsistent effect on accuracy, so its effectiveness cannot be assumed; 2) while LLMs can effectively answer questions requiring expertise in international construction contracts, they struggle with foundational knowledge elements such as concepts and factual details; 3) LLMs demonstrate relative strengths in relevance, professionalism, and clarity but exhibit significant shortcomings in accuracy, completeness, and referencing. By identifying the strengths and weaknesses of current LLMs, this study provides a structured evaluation framework for model selection and performance enhancement, and establishes a foundation for future research on intelligent contract management systems.
| Field | Value |
|---|---|
| Original language | English |
| Article number | 131754 |
| Number of pages | 23 |
| Journal | Expert Systems with Applications |
| Volume | 317 |
| Early online date | 3 Mar 2026 |
| DOIs | |
| Publication status | E-pub ahead of print - 3 Mar 2026 |
Bibliographical note
Publisher Copyright: © 2026 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Funding
This work was supported by the National Natural Science Foundation of China (Award No. 72031008).
Keywords
- International construction contracts
- Large language models
- Model evaluation