Scalable model-based cluster analysis using clustering features

Huidong JIN, Kwong Sak LEUNG, Man Leung WONG, Zong Ben XU

Research output: Journal Publications › Journal Article (refereed) › peer-review

23 Citations (Scopus)

Abstract

We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. Both systems first summarize the data into sub-clusters and then generate Gaussian mixtures from the sub-clusters' clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model, and it provably converges. Experiments show that our clustering systems run one or two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.
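
The summarize-then-fit idea in the abstract can be illustrated with a minimal Python sketch. It assumes BIRCH-style clustering features (item count, per-attribute sum, and per-attribute sum of squares, the last chosen here to suit the independent-attribute model) and a simplified EM loop in which all items of a sub-cluster share one membership vector. It is not the authors' EMACF implementation, and the names `CF`, `avg_log_density`, and `em_on_cfs` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """Clustering feature of one sub-cluster (assumed BIRCH-style layout)."""
    n: int     # number of data items
    ls: list   # per-attribute sum of values
    ss: list   # per-attribute sum of squared values

    @classmethod
    def from_points(cls, points):
        dims = range(len(points[0]))
        return cls(len(points),
                   [sum(p[d] for p in points) for d in dims],
                   [sum(p[d] ** 2 for p in points) for d in dims])

    def merge(self, other):
        # Clustering features are additive, so sub-clusters merge cheaply.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  [a + b for a, b in zip(self.ss, other.ss)])


def avg_log_density(cf, mu, var):
    """Mean log-density of a sub-cluster's items under a diagonal Gaussian,
    computed from the clustering feature alone (illustrative assumption)."""
    total = 0.0
    for s1, s2, m, v in zip(cf.ls, cf.ss, mu, var):
        # Mean of (x - m)^2 over the items = E[x^2] - 2*m*E[x] + m^2.
        sq_dev = s2 / cf.n - 2 * m * s1 / cf.n + m * m
        total += -0.5 * math.log(2 * math.pi * v) - sq_dev / (2 * v)
    return total


def em_on_cfs(cfs, weights, mus, variances, iters=50, min_var=1e-6):
    """Simplified EM over sub-cluster summaries; returns new parameters."""
    weights = list(weights)
    mus = [list(m) for m in mus]
    variances = [list(v) for v in variances]
    total_n = sum(cf.n for cf in cfs)
    dims = range(len(mus[0]))
    for _ in range(iters):
        # E-step: one membership vector per sub-cluster, normalized in
        # log space for numerical stability.
        resp = []
        for cf in cfs:
            logs = [math.log(w) + avg_log_density(cf, m, v)
                    for w, m, v in zip(weights, mus, variances)]
            top = max(logs)
            exps = [math.exp(x - top) for x in logs]
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: re-estimate each component from membership-weighted sums.
        for k in range(len(weights)):
            nk = sum(r[k] * cf.n for r, cf in zip(resp, cfs))
            if nk < 1e-12:
                continue  # leave an effectively empty component unchanged
            weights[k] = nk / total_n
            for d in dims:
                mean = sum(r[k] * cf.ls[d] for r, cf in zip(resp, cfs)) / nk
                mean_sq = sum(r[k] * cf.ss[d] for r, cf in zip(resp, cfs)) / nk
                mus[k][d] = mean
                # Floor the variance so a tight sub-cluster cannot collapse it.
                variances[k][d] = max(min_var, mean_sq - mean * mean)
    return weights, mus, variances
```

Calling `em_on_cfs(cfs, weights, mus, variances)` with a list of `CF` objects and initial parameters returns updated mixture weights, means, and variances. Each iteration touches only the summaries, not the raw data items, so its cost scales with the number of sub-clusters rather than the number of items.
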
Original language: English
Pages (from-to): 637-649
Number of pages: 13
Journal: Pattern Recognition
Volume: 38
Issue number: 5
Publication status: Published - 1 May 2005

Bibliographical note

The authors thank the anonymous referees for their valuable comments, which strengthened the paper, and T. Zhang, R. Ramakrishnan, M. Livny, and V. Ganti for the BIRCH source code.

Funding

The work was partially supported by RGC Grant CUHK 4212/01E of Hong Kong, Lingnan University direct grant (RES-021/200), RGC Grant LU 3009/02E of Hong Kong, and the Nature Science Foundation Project (No. 10371097) of China.

Keywords

  • Cluster analysis
  • Clustering feature
  • Convergence
  • Data mining
  • Expectation maximization
  • Gaussian mixture model
  • Scalable
