Autoencoding Keyword Correlation Graph for Document Clustering

Billy CHIU, Sunil Kumar SAHU, Derek THOMAS, Neha SENGUPTA, Mohammady MAHDY

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed) › Research › peer-review

14 Citations (Scopus)

Abstract

Document clustering requires a deep understanding of the complex structure of long text, in particular its intra-sentential (local) and inter-sentential (global) features. Existing representation learning models do not fully capture these features. To address this, we present a novel graph-based representation for document clustering that builds a graph autoencoder (GAE) on a Keyword Correlation Graph. The graph is constructed with topical keywords as nodes and multiple local and global features as edges. The GAE aggregates the two sets of features by learning a latent representation that can jointly reconstruct them. Clustering is then performed on the learned representations, using the vector dimensions as features for inducing document classes. Extensive experiments on two datasets show that the features learned by our approach achieve better clustering performance than other existing features, including term frequency-inverse document frequency (TF-IDF) and average embedding.
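
The pipeline described in the abstract can be illustrated with a rough sketch. The abstract does not say how keywords are selected, how edge features are computed, or how the GAE is configured, so everything below is an assumption made for illustration: sentence-level co-occurrence stands in for the edge features, the encoder is a single untrained linear graph-convolution layer with hand-picked weights, and the decoder is a plain inner product. The function names (`build_keyword_graph`, `normalize_adjacency`, `encode`, `decode`) are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def build_keyword_graph(sentences, keywords):
    """Weight matrix: entry [i][j] counts sentences containing keywords i and j.

    Co-occurrence is an assumed stand-in for the paper's edge features.
    """
    n = len(keywords)
    adj = [[0.0] * n for _ in range(n)]
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        present = [i for i, kw in enumerate(keywords) if kw in tokens]
        for i, j in combinations(present, 2):
            adj[i][j] += 1.0
            adj[j][i] += 1.0
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def encode(norm_adj, features, weights):
    """One linear graph-convolution layer: Z = norm_adj @ features @ weights."""
    return matmul(matmul(norm_adj, features), weights)


def decode(z):
    """Inner-product decoder: sigmoid(Z Z^T) as reconstructed edge scores."""
    z_t = [list(col) for col in zip(*z)]
    logits = matmul(z, z_t)
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in logits]


# Toy document: keywords and sentences are made up for this example.
keywords = ["graph", "cluster", "keyword", "document"]
sentences = [
    "the graph links each keyword",
    "we cluster every document",
    "a keyword describes a document",
]

adj = build_keyword_graph(sentences, keywords)
norm = normalize_adjacency(adj)

# One-hot node features and fixed (untrained) 4x2 weights, chosen arbitrarily.
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6], [0.2, 0.1]]

z = encode(norm, features, weights)   # latent node representations
recon = decode(z)                     # reconstructed edge scores
```

In a trained model the weights would be optimized so that `recon` matches the observed edges, and the resulting latent vectors would then be passed to a clustering algorithm; both steps are omitted here.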
Original language: English
Title of host publication: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Editors: Dan JURAFSKY, Joyce CHAI, Natalie SCHLUTER, Joel TETREAULT
Publisher: Association for Computational Linguistics (ACL)
Pages: 3974–3981
Number of pages: 8
ISBN (Electronic): 9781952148255
Publication status: Published - Jul 2020
Externally published: Yes
Event: The 58th Annual Meeting of the Association for Computational Linguistics
Duration: 5 Jul 2020 to 10 Jul 2020
https://virtual.acl2020.org/

Conference

Conference: The 58th Annual Meeting of the Association for Computational Linguistics
Abbreviated title: ACL2020
Period: 5/07/20 to 10/07/20