Abstract
Semi-supervised learning (SSL) addresses the problem of improving classifiers' performance by exploiting prior knowledge from unlabeled data. In recent years, many SSL methods have been developed to integrate unlabeled data into classifiers based on either the manifold or the cluster assumption. In particular, graph-based approaches, which follow the manifold assumption, have achieved promising performance in many real-world applications. However, most of them work well only on small-scale data sets and lack probabilistic outputs. In this paper, a scalable graph-based SSL framework based on a sparse Bayesian model is proposed by defining a graph-based sparse prior. Using the traditional Bayesian inference technique, a sparse Bayesian SSL algorithm (SBS$^2$L) is obtained, which can remove irrelevant unlabeled samples and make probabilistic predictions for out-of-sample data. Moreover, to scale SBS$^2$L to large-scale data sets, an incremental SBS$^2$L (ISBS$^2$L) is derived. The key idea of ISBS$^2$L is to employ an incremental strategy that sequentially selects the unlabeled samples that contribute to learning, instead of using all available unlabeled samples directly. As a result, ISBS$^2$L has lower time and space complexity than previous SSL algorithms that use all unlabeled samples. Extensive experiments on various data sets verify that our algorithms achieve comparable classification effectiveness and efficiency with much better scalability. Finally, a generalization error bound is derived based on robustness analysis. © 2012 IEEE.
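The abstract describes SBS$^2$L and ISBS$^2$L only at a high level, so the following is a minimal, hypothetical Python sketch of the two ingredients it mentions: a Bayesian kernel classifier whose prior is built from a graph over labeled plus selected unlabeled points (giving probabilistic out-of-sample predictions), and an incremental loop that adds unlabeled samples in small batches instead of using the whole pool. Every concrete choice here (RBF kernel, k-NN Laplacian, Gaussian likelihood with a closed-form posterior, uncertainty-based batch selection, and the parameters `gamma`, `lam`, `sigma2`, `alpha0`, `batch`) is an illustrative assumption, not the paper's formulation or its sparse Bayesian inference procedure.

```python
import numpy as np
from scipy.stats import norm


def rbf_kernel(A, B, gamma=1.0):
    """RBF similarities between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)


def knn_laplacian(X, k=5, gamma=1.0):
    """Unnormalised Laplacian of a symmetrised k-NN similarity graph on X."""
    W = rbf_kernel(X, X, gamma)
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(len(X))[:, None], idx] = True
    W = np.where(mask | mask.T, W, 0.0)
    return np.diag(W.sum(axis=1)) - W


def fit_graph_bayes(X_l, y_l, X_u, sigma2=0.1, lam=1.0, alpha0=1e-2, gamma=1.0):
    """Gaussian posterior over kernel weights under a graph-based prior.

    The prior precision combines a diagonal shrinkage term (a crude stand-in
    for the sparse prior in the abstract) with a Laplacian smoothness term
    built on the labelled points plus the currently selected unlabelled ones.
    """
    X_all = np.vstack([X_l, X_u]) if len(X_u) else X_l
    Phi_l = rbf_kernel(X_l, X_all, gamma)              # design matrix, labelled rows
    Phi_a = rbf_kernel(X_all, X_all, gamma)
    L = knn_laplacian(X_all, gamma=gamma)
    prior_prec = alpha0 * np.eye(len(X_all)) + lam * Phi_a.T @ L @ Phi_a
    Sigma = np.linalg.inv(Phi_l.T @ Phi_l / sigma2 + prior_prec)
    mu = Sigma @ Phi_l.T @ y_l / sigma2                # y_l encoded as +/-1
    return mu, Sigma, X_all


def predict_proba(X, mu, Sigma, X_basis, sigma2=0.1, gamma=1.0):
    """Probabilistic out-of-sample prediction P(y = +1 | x) via a probit squash."""
    Phi = rbf_kernel(X, X_basis, gamma)
    mean = Phi @ mu
    var = sigma2 + np.einsum("ij,jk,ik->i", Phi, Sigma, Phi)
    return norm.cdf(mean / np.sqrt(var))


def incremental_ssl(X_l, y_l, X_pool, n_rounds=4, batch=5):
    """Grow the unlabelled basis in small batches (the incremental flavour the
    abstract attributes to ISBS^2L), keeping the pool samples on which the
    current model is least confident instead of using the whole pool at once."""
    selected = np.empty((0, X_l.shape[1]))
    for _ in range(n_rounds):
        mu, Sigma, basis = fit_graph_bayes(X_l, y_l, selected)
        p = predict_proba(X_pool, mu, Sigma, basis)
        order = np.argsort(np.abs(p - 0.5))            # most uncertain first
        pick, X_pool = X_pool[order[:batch]], X_pool[order[batch:]]
        selected = np.vstack([selected, pick])
        if len(X_pool) == 0:
            break
    return fit_graph_bayes(X_l, y_l, selected)
```

Under these illustrative assumptions, each refit inverts a matrix whose side equals the number of labeled points plus the selected unlabeled ones rather than the full unlabeled pool, which mirrors the time- and space-complexity argument the abstract makes for ISBS$^2$L over methods that use all unlabeled samples.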
Original language | English |
---|---|
Article number | 8027086 |
Pages (from-to) | 2758-2771 |
Number of pages | 14 |
Journal | IEEE Transactions on Knowledge and Data Engineering |
Volume | 29 |
Issue number | 12 |
Early online date | 7 Sept 2017 |
DOIs | |
Publication status | Published - 1 Dec 2017 |
Externally published | Yes |
Funding
The authors are grateful to the associate editor and the anonymous reviewers for their constructive comments. This work is supported in part by the National Key Research and Development Program of China (Grant No. 2016YFB1000905), the National Natural Science Foundation of China (Grant Nos. 61673363, 91546116, 61329302 and 61503357), and the Science and Technology Innovation Committee Foundation of Shenzhen (Grant Nos. ZDSYS201703031748284, and JCYJ20170307105521943). Xin Yao was also supported by a Royal Society Wolfson Research Merit Award.
Keywords
- Graph-based methods
- Incremental learning
- Large-scale data sets
- Semi-supervised learning
- Sparse Bayesian model