Context: Ensembles of learning machines and locality are considered two important topics for the next research frontier on Software Effort Estimation (SEE).
Objectives: We aim at (1) evaluating whether existing automated ensembles of learning machines generally improve SEEs given by single learning machines and which of them would be more useful; (2) analysing the adequacy of different locality approaches; and gaining insight into (3) how to improve SEE and (4) how to evaluate and choose machine learning (ML) models for SEE.
Method: A principled experimental framework is used for the analysis and to provide insights that are not based simply on intuition or speculation. A comprehensive experimental study is performed of several automated ensembles, single learning machines and locality approaches that present features potentially beneficial for SEE. Additionally, an analysis of feature selection and regression trees (RTs) and an investigation of two tailored forms of combining ensembles and locality are performed to provide further insight into improving SEE.
Results: Bagging ensembles of RTs are shown to perform well: they are highly ranked in terms of performance across different data sets, are frequently among the best approaches for each data set, and rarely perform considerably worse than the best approach for any data set. They are recommended over other learning machines should an organisation have no resources to perform experiments to choose a model. Even though RTs have been shown to be more reliable locality approaches, other approaches such as k-Means and k-Nearest Neighbours can also perform well, in particular for more heterogeneous data sets.
Conclusion: Combining the power of automated ensembles and locality can lead to competitive results in SEE. By analysing such approaches, we provide several insights that can be used by future research in the area.
© 2012 Elsevier B.V. All rights reserved.
Bibliographical note: The authors would like to thank all the participants of PROMISE’11, and especially Dr. Tim Menzies and Prof. Martin Shepperd, for the fruitful discussions and suggestions. We would also like to thank Dr. Peter Coxhead and Dr. Rami Bahsoon for their help and advice, and the anonymous reviewers for their constructive comments, which have helped to improve the quality of this paper significantly. This work was supported by EPSRC Grants (Nos. EP/D052785/1 and EP/J017515/1) and by the European Commission through its FP7 Grant (No. 270428).
- Empirical validation
- Ensembles of learning machines
- Software effort estimation