Abstract
The adoption of additional Other Project (OP) data has been shown to be effective for online Just-In-Time Software Defect Prediction (JIT-SDP). However, state-of-the-art online Cross-Project (CP) methods that operate at the data level, such as All-In-One (AIO) and Filtering, have difficulty balancing the diversity and validity of the selected OP data, which can negatively impact predictive performance. AIO may select unrelated OP data, resulting in a lack of validity, while Filtering tends to select OP data that closely resemble Target Project (TP) data, leading to a lack of diversity. A promising way to address this validity-vs-diversity challenge is an online project-level OP selection methodology. Such an approach selects instructive other projects that are similar to the TP and can positively impact predictive performance, achieving better data validity than AIO while maintaining higher diversity than Filtering. To accomplish this, we propose a project-level Cross-Project method with Similarity (CroPS), which employs appropriate project-level similarity metrics to identify instructive other projects for model updating over time. CroPS applies a specified threshold to decide which other projects are selected at any given moment. Furthermore, we propose an ensemble-like framework called Multi-threshold CroPS (Multi-CroPS), which incorporates multiple threshold options for selecting other projects and emphasizes the importance of defect-inducing changes. Experimental results on 23 open-source projects validate the effectiveness of our project-level metrics for calculating similarities between projects. The results also demonstrate that CroPS significantly enhances predictive performance while reducing computational cost compared to existing data-level CP approaches. Moreover, Multi-CroPS achieves significantly better performance than state-of-the-art CP approaches, including CroPS.
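The threshold-based project selection described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the project-level similarity metrics, so the cosine similarity over per-project feature summaries used here is an assumption chosen only for illustration, and the names `cosine_similarity`, `select_projects`, and `select_for_thresholds` are hypothetical rather than taken from the paper.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors.

    ASSUMPTION: a stand-in metric; the abstract does not state which
    project-level similarity metrics CroPS actually uses.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero summary as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_projects(target_summary, other_summaries, threshold):
    """Names of other projects whose similarity to the target meets the threshold."""
    return sorted(
        name
        for name, summary in other_summaries.items()
        if cosine_similarity(target_summary, summary) >= threshold
    )


def select_for_thresholds(target_summary, other_summaries, thresholds):
    """One selection per threshold, loosely mirroring the multi-threshold idea."""
    return {
        t: select_projects(target_summary, other_summaries, t)
        for t in thresholds
    }


# Hypothetical per-project feature summaries (illustrative values only).
target = [1.0, 0.0]
others = {"A": [1.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 1.0]}

selections = select_for_thresholds(target, others, [0.5, 0.9])
print(selections)
```

In an online setting, a selection like this would be recomputed as new data arrives, so the set of contributing projects can change over time. A lower threshold admits more projects (more diversity), while a higher one keeps only the closest matches (more validity).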
Original language | English |
---|---|
Article number | 158 |
Journal | Empirical Software Engineering |
Volume | 29 |
Issue number | 6 |
Early online date | 1 Oct 2024 |
Publication status | Published - 1 Nov 2024 |
Funding
This work was supported by the State Key Laboratory of Robotics (Grant No. 2023-O11), the National Natural Science Foundation of China (Grant Nos. 62002148 and 62250710682), the Harbin Institute of Technology Talent Start-up Project (Grant No. AUGA5710010924), the Guangdong Provincial Key Laboratory (Grant No. 2020B121201001), the Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07X386), and the Research Institute of Trustworthy Autonomous Systems (RITAS).
Keywords
- Cross-project
- Just-in-time software defect prediction
- Online learning
- Project-level similarity
- Verification latency