Scalable graph-based semi-supervised learning through sparse Bayesian model

Bingbing JIANG, Huanhuan CHEN, Bo YUAN, Xin YAO

Research output: Journal Publications › Journal Article (refereed) › peer-review

52 Citations (Scopus)


Semi-supervised learning (SSL) addresses the problem of improving a classifier's performance by exploiting prior knowledge contained in unlabeled data. In recent years, many SSL methods have been developed that integrate unlabeled data into classifiers based on either the manifold assumption or the cluster assumption. In particular, graph-based approaches, which follow the manifold assumption, have achieved promising performance in many real-world applications. However, most of them work well only on small-scale data sets and do not provide probabilistic outputs. In this paper, a scalable graph-based SSL framework built on a sparse Bayesian model is proposed by defining a graph-based sparse prior. Using standard Bayesian inference, a sparse Bayesian SSL algorithm (SBS²L) is obtained, which can remove irrelevant unlabeled samples and make probabilistic predictions for out-of-sample data. Moreover, to scale SBS²L to large data sets, an incremental variant, ISBS²L, is derived. The key idea of ISBS²L is to employ an incremental strategy that sequentially selects the subset of unlabeled samples that contributes to learning, instead of directly using all available unlabeled samples. ISBS²L has lower time and space complexities than previous SSL algorithms that use all unlabeled samples. Extensive experiments on various data sets verify that our algorithms achieve comparable classification effectiveness and efficiency with much better scalability. Finally, a generalization error bound is derived based on robustness analysis.
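The abstract names the ingredients of SBS²L but not its update equations. As a rough, hedged illustration of how those ingredients fit together (a kernel expansion with one basis function per labeled or unlabeled sample, a sparsity-inducing ARD prior, a graph-based smoothness term, pruning of irrelevant samples, and probabilistic out-of-sample prediction), here is a minimal NumPy sketch of a Laplacian-regularized, relevance-vector-style regressor. It is an assumption-laden stand-in, not the authors' SBS²L: it uses a Gaussian likelihood on ±1 labels rather than a classification likelihood, treats the graph term as a fixed penalty, and every name (`fit`, `predict`, `knn_laplacian`) and hyperparameter (`width`, `k`, `beta`, `gamma`, `alpha_max`) is hypothetical. Because the graph penalty does not depend on the ARD precisions α, the usual RVM re-estimate α_j = (1 − α_j Σ_jj)/μ_j² still applies.

```python
# Hypothetical sketch, NOT the authors' SBS2L: a Laplacian-regularized RVM-style regressor.
import math
import numpy as np


def rbf_kernel(A, B, width):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * width**2))


def knn_laplacian(X, k, width):
    """Unnormalized Laplacian L = D - W of a symmetrized kNN graph with heat-kernel weights."""
    S = rbf_kernel(X, X, width)
    S_off = S.copy()
    np.fill_diagonal(S_off, -np.inf)           # never pick a point as its own neighbour
    W = np.zeros_like(S)
    for i, nbrs in enumerate(np.argsort(-S_off, axis=1)[:, :k]):
        W[i, nbrs] = S[i, nbrs]
    W = np.maximum(W, W.T)                     # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def fit(X, y_l, width=1.0, k=5, beta=100.0, gamma=1.0, n_iter=200, alpha_max=1e6):
    """X holds all samples; its first len(y_l) rows are the labeled ones (targets in {-1, +1})."""
    n_l = len(y_l)
    K = rbf_kernel(X, X, width)                # one basis function per labeled OR unlabeled sample
    M = K.T @ knn_laplacian(X, k, width) @ K   # graph smoothness f^T L f with f = K w, i.e. w^T M w
    Phi = K[:n_l]                              # basis functions evaluated on the labeled rows only
    alpha = np.ones(X.shape[0])                # ARD precision, one per basis function
    keep = np.arange(X.shape[0])

    def posterior(idx):
        # Gaussian posterior over the weights of the currently kept basis functions
        P = Phi[:, idx]
        H = np.diag(alpha[idx]) + beta * P.T @ P + gamma * M[np.ix_(idx, idx)]
        Sigma = np.linalg.inv(H)
        return beta * Sigma @ P.T @ y_l, Sigma

    for _ in range(n_iter):
        mu, Sigma = posterior(keep)
        g = np.maximum(1.0 - alpha[keep] * np.diag(Sigma), 1e-12)  # how well-determined each weight is
        alpha[keep] = g / (mu**2 + 1e-12)                           # MacKay-style re-estimate
        survivors = keep[alpha[keep] < alpha_max]                   # alpha -> inf means w_j -> 0
        if survivors.size == 0:
            break
        keep = survivors                                            # prune "irrelevant" samples
    mu, Sigma = posterior(keep)
    return {"basis": X[keep], "keep": keep, "mu": mu, "Sigma": Sigma, "beta": beta, "width": width}


def predict(model, X_new):
    """Predictive mean, variance and a probit-style P(y=+1) for out-of-sample points."""
    Phi_new = rbf_kernel(X_new, model["basis"], model["width"])
    mean = Phi_new @ model["mu"]
    quad = np.einsum("ij,jk,ik->i", Phi_new, model["Sigma"], Phi_new)
    var = 1.0 / model["beta"] + np.maximum(quad, 0.0)  # clip tiny negative round-off
    p_pos = np.array([0.5 * (1.0 + math.erf(m / math.sqrt(2.0 * v))) for m, v in zip(mean, var)])
    return mean, var, p_pos


# Usage: two blobs, 3 labeled points per class placed first, the other 54 points unlabeled.
rng = np.random.default_rng(0)
left = rng.normal([-2.0, 0.0], 0.5, size=(30, 2))
right = rng.normal([2.0, 0.0], 0.5, size=(30, 2))
X = np.vstack([left[:3], right[:3], left[3:], right[3:]])
y_l = np.array([-1.0] * 3 + [1.0] * 3)
model = fit(X, y_l)
mean, var, p_pos = predict(model, np.array([[-2.0, 0.0], [2.0, 0.0]]))
print(len(model["keep"]), "of", len(X), "samples kept as basis functions")
print(mean, var, p_pos)
```

This sketch starts from all unlabeled samples and prunes them, so each iteration inverts a matrix whose size is the number of kept samples, which is O(n³) in the first pass. That is the cost the abstract says ISBS²L avoids by adding unlabeled samples incrementally; the incremental selection itself is not modeled here.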
Original language: English
Article number: 8027086
Pages (from-to): 2758-2771
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Issue number: 12
Early online date: 7 Sept 2017
Publication status: Published - 1 Dec 2017
Externally published: Yes

Bibliographical note

The authors are grateful to the associate editor and the anonymous reviewers for their constructive comments. This work is supported in part by the National Key Research and Development Program of China (Grant No. 2016YFB1000905), the National Natural Science Foundation of China (Grant Nos. 61673363, 91546116, 61329302 and 61503357), and the Science and Technology Innovation Committee Foundation of Shenzhen (Grant Nos. ZDSYS201703031748284 and JCYJ20170307105521943). Xin Yao was also supported by a Royal Society Wolfson Research Merit Award.


  • Graph-based methods
  • Incremental learning
  • Large-scale data sets
  • Semi-supervised learning
  • Sparse Bayesian model

