Combining Fine-Tuning and LLM-Based Agents for Intuitive Smart Contract Auditing with Justifications

  • Wei MA
  • Daoyuan WU*
  • Yuqiang SUN
  • Tianwen WANG
  • Shangqing LIU
  • Jian ZHANG
  • Yue XUE
  • Yang LIU

* Corresponding author for this work

Research output: Book Chapters / Papers in Conference Proceedings › Conference paper (refereed) › peer-reviewed

2 Citations (Scopus)

Abstract

Smart contracts are decentralized applications built atop blockchains like Ethereum. Recent research has shown that large language models (LLMs) have potential in auditing smart contracts, but the state of the art indicates that even GPT-4 can achieve only 30% precision (when both the decision and the justification are correct). This is likely because off-the-shelf LLMs were primarily pre-trained on a general text/code corpus and not fine-tuned on the specific domain of Solidity smart contract auditing. In this paper, we propose iAudit, a general framework that combines fine-tuning and LLM-based agents for intuitive smart contract auditing with justifications. Specifically, iAudit is inspired by the observation that expert human auditors first perceive what could be wrong and then perform a detailed analysis of the code to identify the cause. As such, iAudit employs a two-stage fine-tuning approach: it first tunes a Detector model to make decisions and then tunes a Reasoner model to generate causes of vulnerabilities. However, fine-tuning alone faces challenges in accurately identifying the optimal cause of a vulnerability. Therefore, we introduce two LLM-based agents, the Ranker and the Critic, which iteratively select and debate the most suitable cause of vulnerability based on the output of the fine-tuned Reasoner model. To evaluate iAudit, we collected a balanced dataset with 1,734 positive and 1,810 negative samples to fine-tune iAudit. We then compared it with traditional fine-tuned models (CodeBERT, GraphCodeBERT, CodeT5, and UnixCoder) as well as prompt-learning-based LLMs (GPT-4, GPT-3.5, and CodeLlama-13b/34b). On a dataset of 263 real smart contract vulnerabilities, iAudit achieves an F1 score of 91.21% and an accuracy of 91.11%. The causes generated by iAudit achieved a consistency of about 38% compared to the ground-truth causes.

Original language: English
Title of host publication: Proceedings - 2025 IEEE/ACM 47th International Conference on Software Engineering, ICSE 2025
Publisher: IEEE Computer Society
Pages: 1742-1754
Number of pages: 13
ISBN (Electronic): 9798331505691
ISBN (Print): 9798331505707
DOIs
Publication status: Published - Jul 2025
Externally published: Yes
Event: 47th IEEE/ACM International Conference on Software Engineering - Ottawa, Canada
Duration: 26 Apr 2025 - 6 May 2025

Conference

Conference: 47th IEEE/ACM International Conference on Software Engineering
Abbreviated title: ICSE 2025
Country/Territory: Canada
City: Ottawa
Period: 26/04/25 - 06/05/25

Bibliographical note

Publisher Copyright:
© 2025 IEEE.

Funding

This research/project is supported by the National Research Foundation, Singapore, and the Cyber Security Agency under its National Cybersecurity R&D Programme (NCRP25-P04-TAICeN); the National Research Foundation, Singapore, and DSO National Laboratories under the AI Singapore Programme (AISG Award No: AISG2-GC-2023-008); and NRF Investigatorship NRF-NRFI06-2020-0001. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore, or the Cyber Security Agency of Singapore. Daoyuan Wu was also partially supported by an HKUST grant.

Keywords

  • Smart Contract
  • Vulnerability Detection
  • Fine-tuning
  • LLM
  • Agent
