Scalable model-based cluster analysis using clustering features

Huidong JIN, Kwong Sak LEUNG, Man Leung WONG, Zong Ben XU

Research output: Journal Publications, Journal Article (refereed)

12 Citations (Scopus)

Abstract

We present two scalable model-based clustering systems based on a Gaussian mixture model with independent attributes within clusters. They first summarize the data into sub-clusters and then generate Gaussian mixtures from the sub-clusters' clustering features using a new algorithm, EMACF. EMACF approximates the aggregate behavior of each sub-cluster of data items in the Gaussian mixture model, and it provably converges. Experiments show that the clustering systems run one or two orders of magnitude faster than the traditional EM algorithm with little loss of accuracy.
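
The two-stage idea in the abstract (summarize first, then fit the mixture to the summaries) can be illustrated with a minimal Python sketch. It is not the paper's EMACF. It assumes a BIRCH-style clustering feature (count, per-attribute linear sum, per-attribute sum of squares), a simple grid for forming sub-clusters, and a simplified E-step that scores each sub-cluster by its average log-density under each diagonal Gaussian. The function names `clustering_features`, `avg_log_density`, and `em_on_features` are hypothetical.

```python
import math
import random


def clustering_features(points, cell):
    """Summarize points into grid-cell sub-clusters (assumed summarization).

    Each sub-cluster is stored as (n, linear_sum, square_sum), per attribute.
    """
    cfs = {}
    for p in points:
        key = tuple(math.floor(x / cell) for x in p)
        n, ls, ss = cfs.get(key, (0, [0.0] * len(p), [0.0] * len(p)))
        cfs[key] = (n + 1,
                    [a + x for a, x in zip(ls, p)],
                    [a + x * x for a, x in zip(ss, p)])
    return list(cfs.values())


def avg_log_density(cf, mean, var):
    """Average log-density of a sub-cluster's points under a diagonal
    Gaussian, computed from the clustering feature alone."""
    n, ls, ss = cf
    total = 0.0
    for l, s, m, v in zip(ls, ss, mean, var):
        mean_sq_dev = s / n - 2.0 * m * l / n + m * m  # mean of (x - m)^2
        total += -0.5 * math.log(2.0 * math.pi * v) - mean_sq_dev / (2.0 * v)
    return total


def em_on_features(cfs, init_means, n_iter=50, min_var=1e-6):
    """Fit a diagonal Gaussian mixture using only clustering features."""
    k, dim = len(init_means), len(init_means[0])
    total_n = sum(cf[0] for cf in cfs)
    weights = [1.0 / k] * k
    means = [list(m) for m in init_means]
    variances = [[1.0] * dim for _ in range(k)]
    for _ in range(n_iter):
        # E-step: one responsibility vector per sub-cluster, not per point.
        resp = []
        for cf in cfs:
            logs = [math.log(max(weights[j], 1e-300))
                    + avg_log_density(cf, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: weighted sums of the clustering features.
        for j in range(k):
            nj = sum(r[j] * cf[0] for r, cf in zip(resp, cfs))
            if nj <= 0.0:
                continue  # component received no mass; keep old parameters
            weights[j] = nj / total_n
            for d in range(dim):
                ls = sum(r[j] * cf[1][d] for r, cf in zip(resp, cfs))
                ss = sum(r[j] * cf[2][d] for r, cf in zip(resp, cfs))
                means[j][d] = ls / nj
                variances[j][d] = max(ss / nj - means[j][d] ** 2, min_var)
    return weights, means, variances


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(500)]
    feats = clustering_features(data, cell=0.5)
    print(len(data), "points summarized into", len(feats), "sub-clusters")
    print(em_on_features(feats, [[1.0, 1.0], [9.0, 9.0]]))
```

Each EM iteration in this sketch costs time proportional to the number of sub-clusters rather than the number of data items, which is the general source of the speed-up that summary-based approaches aim for. The grid size and initial means are illustrative choices only.
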
Original language: English
Pages (from-to): 637-649
Number of pages: 13
Journal: Pattern Recognition
Volume: 38
Issue number: 5
DOI: 10.1016/j.patcog.2004.07.012
Publication status: Published - 1 May 2005

Keywords

  • Cluster analysis
  • Clustering feature
  • Convergence
  • Data mining
  • Expectation maximization
  • Gaussian mixture model
  • Scalable

Cite this

JIN, Huidong ; LEUNG, Kwong Sak ; WONG, Man Leung ; XU, Zong Ben. / Scalable model-based cluster analysis using clustering features. In: Pattern Recognition. 2005 ; Vol. 38, No. 5. pp. 637-649.
@article{01ec27d183e5411b91423784eec5a7f8,
title = "Scalable model-based cluster analysis using clustering features",
keywords = "Cluster analysis, Clustering feature, Convergence, Data mining, Expectation maximization, Gaussian mixture model, Scalable",
author = "JIN, Huidong and LEUNG, {Kwong Sak} and WONG, {Man Leung} and XU, {Zong Ben}",
year = "2005",
month = "5",
day = "1",
doi = "10.1016/j.patcog.2004.07.012",
language = "English",
volume = "38",
pages = "637--649",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Ltd",
number = "5",

}

TY  - JOUR
T1  - Scalable model-based cluster analysis using clustering features
AU  - JIN, Huidong
AU  - LEUNG, Kwong Sak
AU  - WONG, Man Leung
AU  - XU, Zong Ben
PY  - 2005/5/1
Y1  - 2005/5/1
KW  - Cluster analysis
KW  - Clustering feature
KW  - Convergence
KW  - Data mining
KW  - Expectation maximization
KW  - Gaussian mixture model
KW  - Scalable
UR  - http://commons.ln.edu.hk/sw_master/2385
U2  - 10.1016/j.patcog.2004.07.012
DO  - 10.1016/j.patcog.2004.07.012
M3  - Journal Article (refereed)
VL  - 38
SP  - 637
EP  - 649
JO  - Pattern Recognition
JF  - Pattern Recognition
SN  - 0031-3203
IS  - 5
ER  -