Clustering is traditionally viewed as an unsupervised method for data analysis. However, several recent studies have shown that limited prior instance-level knowledge can significantly improve the performance of clustering algorithms. This paper proposes a semi-supervised clustering algorithm termed the Probabilistic and Graphical Model based Genetic Algorithm Driven Clustering with Instance-level Constraints (Cop-CGA). In Cop-CGA, all prior knowledge about pairs of instances that should or should not be placed in the same group is represented as a graph, and all candidate clustering solutions are sampled from this graph by assigning instances to a given number of groups in different orders. We illustrate how to design Cop-CGA to guarantee that all candidate solutions satisfy the given constraints, and we demonstrate the usefulness of background knowledge for genetic-algorithm-driven clustering through experiments on several real data sets with artificial hard constraints. One advantage of Cop-CGA is that both positive and negative instance-level constraints can be easily incorporated. Moreover, the performance of Cop-CGA is not sensitive to the order in which instances are assigned to groups. © 2008 IEEE.
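To make the idea of sampling constraint-satisfying candidates concrete, the following is a minimal sketch and not the Cop-CGA procedure itself, which the abstract does not specify. It assumes positive constraints are must-link pairs and negative constraints are cannot-link pairs, merges must-linked instances with a union-find, and then greedily assigns the merged components to groups in a given order. All names (`must_link_components`, `sample_assignment`, `satisfies`) are hypothetical, and the greedy step may return `None` when it cannot find a feasible group.

```python
import random


def must_link_components(n, must_link):
    """Merge instances joined by must-link pairs (union-find); return a root per instance."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in must_link:
        parent[find(a)] = find(b)
    return [find(i) for i in range(n)]


def sample_assignment(n, k, must_link, cannot_link, order=None, rng=None):
    """Sample one candidate labelling that respects the constraints, or None.

    Assumption: a simple greedy assignment stands in for the paper's sampling step.
    """
    rng = rng or random.Random(0)
    comp = must_link_components(n, must_link)

    # Cannot-link edges between must-link components form the constraint graph.
    conflicts = {c: set() for c in set(comp)}
    for a, b in cannot_link:
        ca, cb = comp[a], comp[b]
        if ca == cb:
            return None  # contradictory constraints
        conflicts[ca].add(cb)
        conflicts[cb].add(ca)

    order = list(order) if order is not None else rng.sample(range(n), n)
    group_of = {}
    for i in order:
        c = comp[i]
        if c in group_of:
            continue  # component already placed via another member
        banned = {group_of[d] for d in conflicts[c] if d in group_of}
        allowed = [g for g in range(k) if g not in banned]
        if not allowed:
            return None  # greedy choice failed for this order
        group_of[c] = rng.choice(allowed)
    return [group_of[comp[i]] for i in range(n)]


def satisfies(labels, must_link, cannot_link):
    """Check that a labelling honours every must-link and cannot-link pair."""
    return (all(labels[a] == labels[b] for a, b in must_link)
            and all(labels[a] != labels[b] for a, b in cannot_link))
```

In a genetic-algorithm setting, a population could be built by calling `sample_assignment` with different orders or random seeds, so that every individual already satisfies the constraints before any fitness evaluation.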
|Title of host publication|2008 IEEE Congress on Evolutionary Computation, CEC 2008|
|Publication status|Published - 2008|