TY - JOUR
T1 - Pre-trained Model-based Actionable Warning Identification: A Feasibility Study
AU - GE, Xiuting
AU - FANG, Chunrong
AU - ZHANG, Quanjun
AU - WU, Daoyuan
AU - YU, Bowen
AU - ZHENG, Qirui
AU - GUO, An
AU - LIN, Shangwei
AU - ZHAO, Zhihong
AU - LIU, Yang
AU - CHEN, Zhenyu
PY - 2025/11/18
Y1 - 2025/11/18
N2 - Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of Static Code Analyzers (SCAs). Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still suffer from restricted performance because they rely directly on a limited number of labeled warnings to build a classifier. Very recently, Pre-Trained Models (PTMs), which have been trained on billions of text/code tokens and have demonstrated substantial success in various code-related tasks, could potentially address the above problem. Nevertheless, the performance of PTMs on AWI has not been systematically investigated, leaving a gap in understanding their pros and cons. In this paper, we are the first to explore the feasibility of applying various PTMs to AWI. By conducting an extensive evaluation on 12K+ warnings involving four commonly used SCAs (i.e., SpotBugs, Infer, CppCheck, and CSA) and three typical programming languages (i.e., Java, C, and C++), we (1) investigate the overall PTM-based AWI performance compared to the state-of-the-art ML-based AWI approach, (2) analyze the impact of three primary aspects (i.e., data preprocessing, model training, and model prediction) in the typical PTM-based AWI workflow, and (3) identify the reasons for the current underperformance of PTMs on AWI, thereby obtaining a series of findings. Based on these findings, we further provide several potential directions for enhancing PTM-based AWI.
AB - Actionable Warning Identification (AWI) plays a pivotal role in improving the usability of Static Code Analyzers (SCAs). Currently, Machine Learning (ML)-based AWI approaches, which mainly learn an AWI classifier from labeled warnings, are notably common. However, these approaches still suffer from restricted performance because they rely directly on a limited number of labeled warnings to build a classifier. Very recently, Pre-Trained Models (PTMs), which have been trained on billions of text/code tokens and have demonstrated substantial success in various code-related tasks, could potentially address the above problem. Nevertheless, the performance of PTMs on AWI has not been systematically investigated, leaving a gap in understanding their pros and cons. In this paper, we are the first to explore the feasibility of applying various PTMs to AWI. By conducting an extensive evaluation on 12K+ warnings involving four commonly used SCAs (i.e., SpotBugs, Infer, CppCheck, and CSA) and three typical programming languages (i.e., Java, C, and C++), we (1) investigate the overall PTM-based AWI performance compared to the state-of-the-art ML-based AWI approach, (2) analyze the impact of three primary aspects (i.e., data preprocessing, model training, and model prediction) in the typical PTM-based AWI workflow, and (3) identify the reasons for the current underperformance of PTMs on AWI, thereby obtaining a series of findings. Based on these findings, we further provide several potential directions for enhancing PTM-based AWI.
U2 - 10.1145/3777369
DO - 10.1145/3777369
M3 - Journal Article (refereed)
SN - 1049-331X
JO - ACM Transactions on Software Engineering and Methodology
JF - ACM Transactions on Software Engineering and Methodology
ER -