Which models of the past are relevant to the present? A software effort estimation approach to exploiting useful past models

Leandro L. MINKU, Xin YAO

Research output: Journal Publications > Journal Article (refereed) > peer-review

24 Citations (Scopus)


Software Effort Estimation (SEE) models can be used for decision support by software managers to determine the effort required to develop a software project. They are created based on data describing projects completed in the past. Such data could include past projects from within the company that we are interested in (within-company, i.e., WC projects) and/or from other companies (cross-company, i.e., CC projects). In particular, the use of CC data has been investigated in an attempt to overcome limitations caused by the typically small size of WC datasets. However, software companies operate in non-stationary environments, where changes may affect the typical effort required to develop software projects. Our previous work showed that both WC and CC models of the past can become more or less useful over time, i.e., they can sometimes be helpful and sometimes misleading. So, how can we know if and when a model created from past data represents the current projects being estimated well? We propose an approach called Dynamic Cross-company Learning (DCL) to dynamically identify which WC or CC past models are most useful for making predictions for a given company at present. DCL automatically emphasizes the predictions given by these models in order to improve predictive performance. Our experiments comparing DCL against existing WC and CC approaches show that DCL is successful in improving SEE by emphasizing the most useful past models. A thorough analysis of DCL’s behaviour is provided, strengthening its external validity. © 2016, The Author(s).
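The idea of emphasizing the past models that currently predict best can be illustrated with a small sketch. This is not the authors' DCL algorithm: the abstract does not specify the weighting rule, so the multiplicative penalty below, the `beta` parameter, and the class and method names are all assumptions made for illustration only. Each past model is treated as a plain function from project features to an effort estimate.

```python
from typing import Callable, List, Sequence

# A "past model" is assumed to be any function mapping project features
# to an effort estimate (hypothetical interface, not from the paper).
Model = Callable[[Sequence[float]], float]


class WeightedPastModels:
    """Illustrative ensemble that emphasizes the currently most useful models."""

    def __init__(self, models: List[Model], beta: float = 0.5) -> None:
        if not models:
            raise ValueError("at least one model is required")
        if not 0.0 < beta <= 1.0:
            raise ValueError("beta must be in (0, 1]")
        self.models = models
        self.beta = beta  # assumed penalty factor applied to non-winning models
        # Start with equal weights that sum to 1.
        self.weights = [1.0 / len(models)] * len(models)

    def predict(self, features: Sequence[float]) -> float:
        """Return the weighted average of all models' estimates."""
        return sum(w * m(features) for w, m in zip(self.weights, self.models))

    def update(self, features: Sequence[float], actual_effort: float) -> None:
        """Re-weight the models once a project's true effort is known."""
        errors = [abs(m(features) - actual_effort) for m in self.models]
        winner = errors.index(min(errors))
        # Shrink every weight except that of the most accurate model.
        self.weights = [
            w if i == winner else w * self.beta
            for i, w in enumerate(self.weights)
        ]
        # Renormalize so the weights sum to 1 again.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
```

In use, `update` would be called whenever a project from the company of interest is completed, so that models built from WC or CC data gain or lose influence as the environment changes, and `predict` would be called for each new project to be estimated.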
Original language: English
Pages (from-to): 499-542
Number of pages: 44
Journal: Automated Software Engineering
Issue number: 3
Early online date: 28 Dec 2016
Publication status: Published - Sept 2017
Externally published: Yes

Bibliographical note

This work was supported by an EPSRC Grant (No. EP/J017515/1).


Keywords

  • Cross-company learning
  • Machine learning
  • Model-based software effort estimation
  • Non-stationary environments
  • Online learning

