Combining Fine-Tuning and LLM-Based Agents for Intuitive Smart Contract Auditing with Justifications

  • Wei MA
  • Daoyuan WU*
  • Yuqiang SUN
  • Tianwen WANG
  • Shangqing LIU
  • Jian ZHANG
  • Yue XUE
  • Yang LIU

* Corresponding author for this work

Research output: Book Chapters / Papers in Conference Proceedings › Conference paper (refereed) › peer-reviewed

2 Citations (Scopus)

Abstract

Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent research has shown that large language models (LLMs) have potential in auditing smart contracts, but the state of the art indicates that even GPT-4 can achieve only 30% precision (when both the decision and the justification are correct). This is likely because off-the-shelf LLMs were primarily pre-trained on a general text/code corpus and not fine-tuned on the specific domain of Solidity smart contract auditing. In this paper, we propose iAudit, a general framework that combines fine-tuning and LLM-based agents for intuitive smart contract auditing with justifications. Specifically, iAudit is inspired by the observation that expert human auditors first perceive what could be wrong and then perform a detailed analysis of the code to identify the cause. As such, iAudit employs a two-stage fine-tuning approach: it first tunes a Detector model to make decisions and then tunes a Reasoner model to generate causes of vulnerabilities. However, fine-tuning alone faces challenges in accurately identifying the optimal cause of a vulnerability. Therefore, we introduce two LLM-based agents, the Ranker and the Critic, which iteratively select and debate the most suitable cause of vulnerability based on the output of the fine-tuned Reasoner model. To evaluate iAudit, we collected a balanced dataset with 1,734 positive and 1,810 negative samples to fine-tune iAudit. We then compared it with traditional fine-tuned models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) as well as prompt-learning-based LLMs (GPT-4, GPT-3.5, and CodeLlama-13b/34b). On a dataset of 263 real smart contract vulnerabilities, iAudit achieves an F1 score of 91.21% and an accuracy of 91.11%. The causes generated by iAudit achieved a consistency of about 38% compared to the ground-truth causes.

Original language: English
Title of host publication: Proceedings - 2025 IEEE/ACM 47th International Conference on Software Engineering, ICSE 2025
Publisher: IEEE Computer Society
Pages: 1742-1754
Number of pages: 13
ISBN (Electronic): 9798331505691
ISBN (Print): 9798331505707
DOIs
Publication status: Published - Jul 2025
Externally published: Yes
Event: 47th IEEE/ACM International Conference on Software Engineering - Ottawa, Canada
Duration: 26 Apr 2025 - 6 May 2025

Conference

Conference: 47th IEEE/ACM International Conference on Software Engineering
Abbreviated title: ICSE 2025
Country/Territory: Canada
City: Ottawa
Period: 26/04/25 - 06/05/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Funding

This research/project is supported by the National Research Foundation, Singapore, and the Cyber Security Agency under its National Cybersecurity R&D Programme (NCRP25-P04-TAICeN); the National Research Foundation, Singapore, and DSO National Laboratories under the AI Singapore Programme (AISG Award No: AISG2-GC-2023-008); and NRF Investigatorship NRF-NRFI06-2020-0001. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore, or the Cyber Security Agency of Singapore. Daoyuan Wu was also partially supported by an HKUST grant.

Keywords

  • Smart Contract
  • Vulnerability Detection
  • Fine-tuning
  • LLM
  • Agent
