An Expectation-Maximization Algorithm Working on Data Summary

Huidong JIN, Kwong Sak LEUNG, Man Leung WONG

Research output: Book Chapters | Papers in Conference Proceedings, Conference paper (refereed), Research



Scalable cluster analysis addresses the problem of processing large data sets with limited resources, e.g., memory and computation time. A data summarization or sampling procedure is an essential step of most scalable algorithms: it forms a compact representation of the data, on which traditional clustering algorithms can then process large data sets efficiently. However, there is little work on how to perform cluster analysis effectively on data summaries. In this paper, starting from the principle of the general expectation-maximization algorithm, we propose a model-based clustering algorithm that makes better use of these data summaries. The proposed EMACF (Expectation-Maximization Algorithm on Clustering Features) algorithm explicitly employs data summary features, including weight, mean, and variance. We prove that EMACF converges to a local maximum likelihood value. The computation time of EMACF is linear in the number of data summaries rather than the number of data items, so EMACF can be integrated with any efficient data summarization procedure to construct a scalable clustering algorithm.
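To make the notion of a data summary concrete, the following is a minimal sketch of how a set of data items might be reduced to (weight, mean, variance) triples. It is not the summarization procedure of the paper, and it does not implement EMACF: the grid-based grouping, the per-dimension variance, the definition of weight as the fraction of items in a group, and the names `summarize` and `cell_size` are all assumptions made for illustration.

```python
from collections import defaultdict


def summarize(points, cell_size):
    """Reduce points to one (weight, mean, variance) triple per non-empty grid cell.

    Hypothetical helper: the grid partition and the per-dimension variance
    are illustrative assumptions, not the procedure used in the paper.
    """
    # Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(x // cell_size) for x in p)
        cells[key].append(p)

    total = len(points)
    summaries = []
    for key in sorted(cells):
        members = cells[key]
        n = len(members)
        dim = len(members[0])
        # Per-dimension mean of the items in this cell.
        mean = [sum(p[d] for p in members) / n for d in range(dim)]
        # Per-dimension (population) variance of the items in this cell.
        var = [sum((p[d] - mean[d]) ** 2 for p in members) / n for d in range(dim)]
        # Weight is taken here as the fraction of all items in this cell.
        summaries.append((n / total, mean, var))
    return summaries


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (12.0, 12.0)]
    for weight, mean, var in summarize(data, cell_size=5.0):
        print(weight, mean, var)
```

An algorithm that iterates over such triples does work proportional to the number of summaries rather than the number of original items, which is the property the abstract attributes to EMACF.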
Original language: English
Title of host publication: Proceedings of the Second International Workshop on Intelligent Systems Design and Applications
Publisher: Dynamic Publishers, Inc.
ISBN (Print): 9780964039803
Publication status: Published - 2002
Event: Proceedings of the Second International Workshop on Intelligent Systems Design and Applications
Duration: 1 Jan 2002 to 1 Jan 2002


Conference: Proceedings of the Second International Workshop on Intelligent Systems Design and Applications

