In classification tasks, active learning is often used to select out a set of informative examples from a big unlabeled dataset. The objective is to learn a classification pattern that can accurately predict labels of new examples by using the selection result which is expected to contain as few examples as possible. The selection of informative examples also reduces the manual effort for labeling, data complexity, and data redundancy, thus improves learning efficiency. In this paper, a new active learning strategy with pool-based settings, called inconsistency-based active learning, is proposed. This strategy is built up under the guidance of two classical works: (1) the learning philosophy of query-by-committee (QBC) algorithm; and (2) the structure of the traditional concept learning model: from-general-to-specific (GS) ordering. By constructing two extreme hypotheses of the current version space, the strategy evaluates unlabeled examples by a new sample selection criterion as inconsistency value, and the whole learning process could be implemented without any additional knowledge. Besides, since active learning is favorably applied to support vector machine (SVM) and its related applications, the strategy is further restricted to a specific algorithm called inconsistency-based active learning for SVM (I-ALSVM). By building up a GS structure, the sample selection process in our strategy is formed by searching through the initial version space. We compare the proposed I-ALSVM with several other pool-based methods for SVM on selected datasets. The experimental result shows that, in terms of generalization capability, our model exhibits good feasibility and competitiveness. © 2012 Elsevier Ltd.
Bibliographical noteThis work is partially supported by City University of Hong Kong Research Grant 9610025, and the National Natural Science Foundation of China under the Grant 71171080 .
- Active learning
- Concept learning
- Sample selection
- Support vector machine