Abstract
This paper proposes a profiling method for SELDI-TOF and MALDI-TOF MS data that integrates a novel peak detection method based on a modified smoothed non-linear energy operator, correlation-based peak selection, and Bayesian additive regression trees. The peak detection and classification performance of the proposed approach is validated on two publicly available MS data sets: MALDI-TOF simulation data and high-resolution SELDI-TOF ovarian cancer data. The results compare favorably with three state-of-the-art peak detection algorithms and four machine-learning algorithms. For the high-resolution ovarian cancer data set, our method found seven biomarkers (m/z windows) and achieved accuracies of 97.30% and 99.10% at the 25th and 75th percentiles, respectively, over 50 independent cross-validation samples, which is significantly better than other profiling and dimension-reduction methods. The results show that the method can find parsimonious sets of biologically meaningful biomarkers with better accuracy than existing methods. Supporting Information material and MATLAB/R scripts to implement the methods described in the article are available at: http://www.cs.bham.ac.uk/szh/SourceCodefor-Proteomics.zip. © 2009 Wiley-VCH Verlag GmbH & Co. KGaA.
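As a rough illustration of the peak detection idea named in the abstract, the sketch below applies a standard smoothed non-linear energy operator to a one-dimensional intensity trace. It is not the authors' modified operator, whose details the abstract does not give. The operator ψ[n] = x[n]² − x[n−1]·x[n+1] is the usual discrete form. The triangular (Bartlett-style) window, its width, the mean-based threshold, and all function names are assumptions made for this example.

```python
import math


def neo(x):
    """Discrete non-linear energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1].

    The two end samples have no neighbour on one side and are left at 0.
    """
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def triangular_window(width):
    """Symmetric triangular window of odd length, normalised to sum to 1."""
    half = width // 2
    w = [1.0 - abs(i - half) / (half + 1) for i in range(width)]
    total = sum(w)
    return [v / total for v in w]


def smooth(values, window):
    """Same-length convolution; samples beyond the edges count as zero."""
    half = len(window) // 2
    out = []
    for n in range(len(values)):
        acc = 0.0
        for k, wk in enumerate(window):
            j = n + k - half
            if 0 <= j < len(values):
                acc += wk * values[j]
        out.append(acc)
    return out


def detect_peaks(x, width=5, scale=3.0):
    """Indices of local maxima of the smoothed NEO above scale * mean.

    The threshold rule (scale times the mean smoothed energy) is an
    assumption for this sketch, not a rule taken from the paper.
    """
    s = smooth(neo(x), triangular_window(width))
    threshold = scale * sum(s) / len(s)
    return [n for n in range(1, len(s) - 1)
            if s[n] > threshold and s[n] >= s[n - 1] and s[n] > s[n + 1]]


def synthetic_trace(length=81, centres=(20, 60), sigma=2.0):
    """Hypothetical noise-free trace: a sum of Gaussian peaks."""
    return [sum(math.exp(-((i - c) ** 2) / (2.0 * sigma ** 2)) for c in centres)
            for i in range(length)]


if __name__ == "__main__":
    print(detect_peaks(synthetic_trace()))
```

On the noise-free synthetic trace, the detected indices coincide with the two Gaussian centres. Real spectra would also need baseline correction and a noise-aware threshold, neither of which this sketch attempts.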
Original language | English |
---|---|
Pages (from-to) | 4176-4191 |
Number of pages | 16 |
Journal | Proteomics |
Volume | 9 |
Issue number | 17 |
Early online date | 31 Aug 2009 |
Publication status | Published - Sept 2009 |
Externally published | Yes |
Keywords
- Bioinformatics
- Cancer diagnosis
- Machine learning
- MS
- Peak detection