Can cross-company data improve performance in software effort estimation?

Leandro L. MINKU, Xin YAO

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

37 Citations (Scopus)


Background: There has been a long debate in the software engineering literature concerning how useful cross-company (CC) data are for software effort estimation (SEE) in comparison to within-company (WC) data. Studies indicate that models trained on CC data obtain either similar or worse performance than models trained solely on WC data.

Aims: We investigate whether CC data can help to improve performance, and under what conditions.

Method: The work builds on the fact that SEE is a class of online learning tasks that operate in changing environments, something most work so far has neglected. We conduct an analysis based on the performance of different approaches that use CC and WC data. These are: (1) an approach not designed for changing environments, (2) approaches designed for changing environments, and (3) a new online learning approach able to identify when CC data are helpful or detrimental.

Results: Interesting features of data sets commonly used in the SEE literature are revealed, showing that different subsets of CC data can be beneficial or detrimental depending on the moment in time. The newly proposed approach is able to benefit from this, successfully using CC data to improve performance over WC models.

Conclusions: This work not only shows that CC data can help to increase performance for SEE tasks, but also demonstrates that the online nature of software prediction tasks should be exploited, which is an important issue for future work.

Copyright © 2012 ACM.
Original language: English
Title of host publication: ACM International Conference Proceeding Series
Number of pages: 10
Publication status: Published - 21 Sept 2012
Externally published: Yes


  • Chronological split
  • Concept drift
  • Cross-company estimation models
  • Ensembles of learning machines
  • Online learning
  • Software effort estimation

