Principal component analysis for distributed data sets with updating

Zheng Jian BAI*, Raymond H. CHAN, Franklin T. LUK

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings, Conference paper (refereed), Research, peer-review

53 Citations (Scopus)

Abstract

Identifying patterns in large data sets is a key requirement in data mining, and principal component analysis (PCA) is a powerful technique for this purpose. PCA-based clustering algorithms are effective when the data sets reside in a single location. In applications where the large data sets are physically far apart, moving huge amounts of data to one location can be impractical or even impossible. A way around this problem was proposed in [10], where truncated singular value decompositions (SVDs) are computed locally and used to reduce communication costs. Unfortunately, truncated SVDs introduce local approximation errors that can accumulate and degrade the accuracy of the final PCA. In this paper, we introduce a new method that computes the PCA without incurring local approximation errors. We also consider how to update the PCA when new data arrive at the various locations.
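The abstract does not spell out the new method, so the following is only an illustrative sketch of one standard way to obtain an exact distributed PCA, not necessarily the authors' algorithm. It assumes each site can send its row count n_i, its column mean, and the triangular factor R_i of a QR decomposition of its centered block (at most d×d numbers for d features). These summaries combine without any truncation error, and the same merge step absorbs newly arriving data. The function names `local_summary`, `merge_summaries` and `pca_from_summary` are hypothetical.

```python
import numpy as np


def local_summary(X):
    """Summarize one site's data block X (n_i rows, d columns).

    Returns (n_i, mean_i, R_i), where R_i is the triangular factor of a
    reduced QR decomposition of the centered block, so that
    R_i.T @ R_i equals (X - mean_i).T @ (X - mean_i) up to rounding.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    R = np.linalg.qr(X - mean, mode="r")
    return X.shape[0], mean, R


def merge_summaries(summaries):
    """Combine (n, mean, R) summaries exactly, with no truncation.

    The global scatter matrix equals the sum of the local scatter matrices
    plus the between-site term sum_i n_i (mean_i - mean)(mean_i - mean)^T,
    so we stack every R_i with the rows sqrt(n_i) * (mean_i - mean) and
    re-factor the stack with QR.
    """
    ns = np.array([s[0] for s in summaries], dtype=float)
    means = np.array([s[1] for s in summaries])
    total = ns.sum()
    mean = ns @ means / total
    rows = [s[2] for s in summaries]
    rows.append(np.sqrt(ns)[:, None] * (means - mean))
    R = np.linalg.qr(np.vstack(rows), mode="r")
    return int(total), mean, R


def pca_from_summary(summary, k):
    """Return the top-k principal directions (as rows) and their variances."""
    n, _, R = summary
    # R.T @ R is the global scatter matrix, so the right singular vectors
    # of R are the principal directions and s**2 / (n - 1) the variances.
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt[:k], s[:k] ** 2 / (n - 1)


# Usage: three sites summarize locally; only the summaries are combined.
rng = np.random.default_rng(0)
sites = [rng.standard_normal((n, 4)) for n in (100, 150, 50)]
global_summary = merge_summaries([local_summary(X) for X in sites])
components, variances = pca_from_summary(global_summary, k=2)

# Updating: new rows arrive at one site; merge their summary into the old one.
new_rows = rng.standard_normal((40, 4))
updated_summary = merge_summaries([global_summary, local_summary(new_rows)])
```

Because a merged summary has the same form (count, mean, triangular factor) as a local one, merging is associative. Updating after new data arrive is therefore just one more merge, and no raw rows ever leave their site.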

Original language: English
Title of host publication: Advanced Parallel Processing Technologies: 6th International Workshop, APPT 2005, Proceedings
Editors: Jiannong CAO, Wolfgang NEJDL, Ming XU
Publisher: Springer Berlin Heidelberg
Pages: 471-483
Number of pages: 13
ISBN (Electronic): 9783540321071
ISBN (Print): 9783540296393
Publication status: Published, 2005
Externally published: Yes
Event: 6th International Workshop on Advanced Parallel Processing Technologies, APPT 2005, Hong Kong, China
Duration: 27 Oct 2005 to 28 Oct 2005

Publication series

Name: Lecture Notes in Computer Science
Volume: 3756
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 6th International Workshop on Advanced Parallel Processing Technologies, APPT 2005
Country/Territory: China
City: Hong Kong
Period: 27/10/05 to 28/10/05
