Semisupervised classification (SSC) uses both labeled and unlabeled data to classify unseen instances. Because large amounts of unlabeled data are typically available, SSC algorithms must be able to handle large-scale data sets. Recently, various ensemble algorithms have been introduced that improve generalization performance compared with single classifiers. However, existing ensemble methods cannot handle typical large-scale data sets. We propose efficient cluster-based boosting (ECB), a multiclass SSC algorithm with cluster-based regularization that avoids generating decision boundaries in high-density regions. A semisupervised selection procedure reduces time and space complexity by selecting only the most informative unlabeled instances for the training of each base learner. We provide evidence that ECB achieves good performance with small amounts of selected data and a relatively small number of base learners. Our experiments confirm that ECB scales to large data sets while delivering generalization comparable to that of state-of-the-art methods. © 2012 IEEE.
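The abstract describes cluster-based regularization and per-learner selection of unlabeled data only at a high level. As a rough illustration of the general idea, and not the ECB algorithm itself, the sketch below clusters labeled and unlabeled points together, gives each unlabeled point the majority label of the labeled points in its cluster (so labels stay constant inside a dense cluster instead of a boundary cutting through it), and keeps only a fixed `budget` of the most confident unlabeled points for training one base learner. The names `kmeans` and `select_unlabeled`, the majority-fraction confidence score, and the budget are hypothetical choices for this example; ECB's actual regularizer, selection criterion, and boosting loop are defined in the paper and are not reproduced here.

```python
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assign


def select_unlabeled(labeled, unlabeled, k, budget):
    """Hypothetical cluster-then-label selection (not ECB's criterion).

    labeled:   list of (point, label) pairs
    unlabeled: list of points
    Returns at most `budget` (point, pseudo_label) pairs, most confident first.
    """
    points = [p for p, _ in labeled] + unlabeled
    assign = kmeans(points, k)
    n_lab = len(labeled)

    # Count labeled votes per cluster.
    votes = {}
    for (_, y), c in zip(labeled, assign[:n_lab]):
        votes.setdefault(c, Counter())[y] += 1

    candidates = []
    for p, c in zip(unlabeled, assign[n_lab:]):
        if c not in votes:
            continue  # no labeled evidence in this cluster, so skip the point
        label, count = votes[c].most_common(1)[0]
        confidence = count / sum(votes[c].values())  # majority fraction
        candidates.append((confidence, p, label))

    # Stable sort: ties keep their original order.
    candidates.sort(key=lambda t: t[0], reverse=True)
    return [(p, label) for _, p, label in candidates[:budget]]
```

For example, with two well-separated groups where one cluster's labeled points disagree (two `"b"`, one `"a"`), the unanimous cluster's points are selected first:

```python
labeled = [((0.0, 0.0), "a"), ((10.0, 10.0), "b"), ((0.2, 0.1), "a"),
           ((10.1, 9.8), "b"), ((9.9, 10.2), "a")]
unlabeled = [(0.1, 0.3), (10.2, 10.1), (0.3, 0.2), (9.7, 9.9)]
print(select_unlabeled(labeled, unlabeled, k=2, budget=3))
# [((0.1, 0.3), 'a'), ((0.3, 0.2), 'a'), ((10.2, 10.1), 'b')]
```

Training each base learner on such a capped subset, rather than on all unlabeled data, is what keeps per-learner time and memory bounded in this sketch.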
Journal: IEEE Transactions on Neural Networks and Learning Systems
Early online date: 21 Mar 2018
Published: Nov 2018
Bibliographical note: This work was supported in part by the National Key Research and Development Program of China under Grant 2016YFB1000905 and in part by the National Natural Science Foundation of China under Grant 91546116 and Grant 91746209.
Keywords:
- cluster-based regularization
- ensemble learning
- multiclass classification
- semisupervised classification