TY - JOUR
T1 - MANDO-LLM: Heterogeneous Graph Transformers with Large Language Models for Smart Contract Vulnerability Detection
AU - NGUYEN, Nhat-Minh
AU - NGUYEN, Hoang H.
AU - THANH, Long Le
AU - AHMADI, Zahra
AU - DOAN, Thanh-Nam
AU - WU, Daoyuan
AU - JIANG, Lingxiao
PY - 2025/12/3
Y1 - 2025/12/3
N2 - Detecting vulnerabilities in smart contracts is vital for the security and reliability of decentralized apps. To facilitate vulnerability detection, contract codes, including bug patterns, are represented as heterogeneous graphs with various nodes and edges, like control-flow and function-call graphs. However, existing graph learning techniques struggle with large, complex graphs. This paper presents MANDO-LLM, a novel framework that combines heterogeneous graph transformers (HGTs) with large language models (LLMs) for detecting vulnerabilities in smart contracts represented as heterogeneous contract graphs built upon control-flow and call graphs. MANDO-LLM uses LLMs to capture code features from control-flow and call data, customizes HGTs to learn embeddings with specific node-edge meta relations, and employs classifiers for vulnerability detection in Solidity code at both contract and line levels. Our evaluation shows that MANDO-LLM significantly outperforms existing methods on real-world large-scale imbalanced datasets, with F1-score improvements from 0.59% to 80.72% at the contract level. It is also one of the first effective methods for identifying line-level vulnerabilities, with performance boosts ranging from 3.09% to over 95% across different vulnerability types. MANDO-LLM’s versatility allows easy retraining for various vulnerabilities without needing manually defined patterns.
AB - Detecting vulnerabilities in smart contracts is vital for the security and reliability of decentralized apps. To facilitate vulnerability detection, contract codes, including bug patterns, are represented as heterogeneous graphs with various nodes and edges, like control-flow and function-call graphs. However, existing graph learning techniques struggle with large, complex graphs. This paper presents MANDO-LLM, a novel framework that combines heterogeneous graph transformers (HGTs) with large language models (LLMs) for detecting vulnerabilities in smart contracts represented as heterogeneous contract graphs built upon control-flow and call graphs. MANDO-LLM uses LLMs to capture code features from control-flow and call data, customizes HGTs to learn embeddings with specific node-edge meta relations, and employs classifiers for vulnerability detection in Solidity code at both contract and line levels. Our evaluation shows that MANDO-LLM significantly outperforms existing methods on real-world large-scale imbalanced datasets, with F1-score improvements from 0.59% to 80.72% at the contract level. It is also one of the first effective methods for identifying line-level vulnerabilities, with performance boosts ranging from 3.09% to over 95% across different vulnerability types. MANDO-LLM’s versatility allows easy retraining for various vulnerabilities without needing manually defined patterns.
UR - https://doi.org/10.1145/3765751
U2 - 10.1145/3765751
DO - 10.1145/3765751
M3 - Journal Article (refereed)
SN - 1049-331X
JO - ACM Transactions on Software Engineering and Methodology
JF - ACM Transactions on Software Engineering and Methodology
ER -