Online cross-project approach with project-level similarity for just-in-time software defect prediction

Cong TENG, Liyan SONG, Xin YAO

Research output: Journal Publications › Journal Article (refereed) › peer-review

Abstract

The use of additional Other Project (OP) data has been shown to be effective for online Just-In-Time Software Defect Prediction (JIT-SDP). However, state-of-the-art online Cross-Project (CP) methods such as All-In-One (AIO) and Filtering, which operate at the data level, have difficulty balancing the diversity and validity of the selected OP data, which can harm predictive performance. AIO may select unrelated OP data, resulting in a lack of validity, while Filtering tends to select OP data that closely resemble Target Project (TP) data, leading to a lack of diversity. To address this validity-vs-diversity challenge, a promising approach is to select OP data online at the project level. This approach selects instructive other projects that are similar to the TP and can positively impact predictive performance, achieving better data validity than AIO while maintaining higher diversity than Filtering. To accomplish this, we propose a project-level Cross-Project method with Similarity (CroPS), which employs appropriate project-level similarity metrics to identify instructive other projects for model updating over time. CroPS applies a specified threshold to decide which other projects are selected at any given moment. Furthermore, we propose an ensemble-like framework called Multi-threshold CroPS (Multi-CroPS), which incorporates multiple threshold options for selecting other projects and emphasizes the importance of defect-inducing changes. Experimental results based on 23 open-source projects validate the effectiveness of our project-level metrics for calculating similarities between projects. The results also demonstrate that CroPS significantly enhances predictive performance while reducing computational cost compared to existing data-level CP approaches. Moreover, Multi-CroPS achieves significantly better performance than state-of-the-art CP approaches, including our CroPS.
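The threshold-based, project-level selection described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity metrics CroPS uses, so everything concrete below is an assumption made for illustration only: each project is summarized by the mean of its change feature vectors, similarity is taken to be cosine similarity, and the function names (`mean_vector`, `select_other_projects`, `multi_threshold_selection`) are hypothetical. The last function only mirrors the idea of keeping several threshold options side by side; it does not reproduce how Multi-CroPS combines them or weights defect-inducing changes.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero profile as dissimilar to everything
    return dot / (norm_a * norm_b)


def mean_vector(changes):
    """Assumed project-level summary: the per-feature mean over all changes."""
    n = len(changes)
    dim = len(changes[0])
    return [sum(change[i] for change in changes) / n for i in range(dim)]


def select_other_projects(target_changes, other_projects, threshold):
    """Return names of other projects whose profile is similar enough to the target.

    target_changes: list of feature vectors seen so far for the target project.
    other_projects: dict mapping project name -> list of feature vectors.
    threshold: minimum similarity required for a project to be selected.
    """
    target_profile = mean_vector(target_changes)
    selected = []
    for name, changes in other_projects.items():
        similarity = cosine_similarity(target_profile, mean_vector(changes))
        if similarity >= threshold:
            selected.append(name)
    return selected


def multi_threshold_selection(target_changes, other_projects, thresholds):
    """Run the selection once per threshold, keeping each result separately."""
    return {
        t: select_other_projects(target_changes, other_projects, t)
        for t in thresholds
    }
```

In an online setting, a function like `select_other_projects` would be re-run as new target-project changes arrive, so the set of selected projects can change over time. A lower threshold admits more projects (closer to AIO), while a higher one admits fewer.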
Original language: English
Article number: 158
Journal: Empirical Software Engineering
Volume: 29
Issue number: 6
Early online date: 1 Oct 2024
Publication status: Published - 1 Nov 2024

Funding

This work was supported by the State Key Laboratory of Robotics (Grant No. 2023-O11), the National Natural Science Foundation of China (Grant Nos. 62002148 and 62250710682), the Harbin Institute of Technology Talent Start-up Project (Grant No. AUGA5710010924), the Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), and the Research Institute of Trustworthy Autonomous Systems (RITAS).

Keywords

  • Cross-project
  • Just-in-time software defect prediction
  • Online learning
  • Project-level similarity
  • Verification latency
