Abstract
Background: Software effort estimation (SEE) is a task of strategic importance in software management. Recently, some studies have attempted to use ensembles of learning machines for this task.

Aims: We aim to (1) evaluate whether readily available ensemble methods generally improve on the SEE given by single learning machines, and which of them would be more useful; and to gain insight into (2) how to improve SEE and (3) how to choose machine learning (ML) models for SEE.

Method: A principled and comprehensive statistical comparison of three ensemble methods and three single learners was carried out using thirteen data sets. Feature selection and ensemble diversity analyses were performed to gain insight into how to improve SEE based on the approaches singled out. In addition, a risk analysis was performed to investigate robustness to outliers. The insight provided by the paper is therefore based on principled experiments, not just intuition or speculation.

Results: None of the compared methods is consistently the best, even though regression trees and bagging using multilayer perceptrons (MLPs) are more frequently among the best. These two approaches usually perform similarly. Regression trees place more important features in higher levels of the trees, suggesting that feature weights are important when using ML models for SEE. The analysis of bagging with MLPs suggests that a self-tuning ensemble diversity method may help improve SEE.

Conclusions: Ideally, principled experiments should be done on an individual basis to choose a model. If an organisation has no resources for that, regression trees seem to be a good choice because of their simplicity. The analysis also suggests approaches to improve SEE.

Copyright © 2011 ACM.
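For readers unfamiliar with the two approaches the abstract singles out, the following is a minimal, standard-library-only sketch of a single regression tree versus a bagging ensemble. It is not the paper's experimental setup. The paper bags MLPs, whereas this sketch swaps in a one-split regression tree (a "stump") as the base learner to stay self-contained. The function names `fit_stump` and `fit_bagging` and the toy size/effort values are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single numeric feature."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that both sides of the split are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # All x values are identical: fall back to predicting the mean.
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_bagging(xs, ys, n_learners=25, seed=0):
    """Train base learners on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    learners = []
    for _ in range(n_learners):
        # Bootstrap sample: n indices drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        learners.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(f(x) for f in learners)

# Hypothetical toy data: project size (feature) and effort (target).
sizes = [1, 2, 3, 4, 10, 11, 12, 13]
efforts = [5, 6, 5, 6, 20, 21, 20, 21]

stump = fit_stump(sizes, efforts)
bagged = fit_bagging(sizes, efforts)
```

Calling `stump(2)` or `bagged(2)` returns an effort estimate for a project of size 2. Because each base learner sees a different bootstrap sample, the ensemble members disagree slightly, which is the kind of diversity the abstract's ensemble analysis refers to.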
Original language | English |
---|---|
Title of host publication | ACM International Conference Proceeding Series |
Publication status | Published - 20 Sept 2011 |
Externally published | Yes |
Keywords
- Ensembles of learning machines
- Machine learning
- Software cost/effort estimation