Abstract
In terms of the generative process, the Gamma-Gamma-Poisson Process (G2PP) is equivalent to the nonparametric topic model of the Hierarchical Dirichlet Process (HDP). Because estimating parameters in HDP is computationally expensive, a parallel G2PP was developed to generate topics efficiently via multi-threading. Unfortunately, this parallel model requires the number of topics to be predefined. To address this issue, we first propose a Topic Self-Adaptive Model (TSAM) for nonparametric and parallel topic discovery. In TSAM, a monitor-executor mechanism is developed to manage global topic information using a hierarchical structure of threads. Building on the apparatus of copulas, we further extend TSAM to TSAMcop for coherent topic modeling by exploiting a copula-guided parallel Gibbs sampling algorithm. Extensive experiments validate the effectiveness of both TSAM and TSAMcop.
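The abstract relies on G2PP and HDP sharing a generative process. As a rough illustration only, the Python sketch below samples a finite K-topic truncation of a Gamma-Gamma-Poisson construction: global topic weights r_k ~ Gamma(gamma0/K, 1/c0), document topic rates theta_jk ~ Gamma(r_k, 1/c), and per-topic word counts n_jk ~ Poisson(theta_jk). It is not the authors' parallel G2PP, TSAM or TSAMcop code. The truncation level K and the parameter names gamma0, c0, c and eta are hypothetical choices made for this sketch. Normalising independent Gamma(r_k, scale) draws gives Dirichlet(r_1, ..., r_K) proportions, which is roughly the finite analogue of an HDP document-level draw and is why the two processes line up.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw from Poisson(lam) using Knuth's multiplication method.

    Large rates are split into chunks of at most 500; this is valid because a
    sum of independent Poisson variables is Poisson with the summed rate, and
    it keeps exp(-chunk) from underflowing.
    """
    total = 0
    while lam > 0:
        chunk = min(lam, 500.0)
        lam -= chunk
        limit = math.exp(-chunk)
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        total += k
    return total


def dirichlet(rng: random.Random, alphas: list[float]) -> list[float]:
    """Normalised independent Gamma(alpha_k, 1) draws are Dirichlet(alphas)."""
    draws = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def topic_proportions(theta: list[float]) -> list[float]:
    """Normalise a document's gamma topic rates; distributed as Dirichlet(r)."""
    total = sum(theta)
    if total == 0:
        raise ValueError("all topic rates are zero")
    return [t / total for t in theta]


def sample_truncated_g2pp(num_docs: int, num_topics: int, vocab_size: int,
                          gamma0: float = 5.0, c0: float = 1.0,
                          c: float = 0.05, eta: float = 0.1, seed: int = 0):
    """Hypothetical truncated Gamma-Gamma-Poisson topic model sampler."""
    rng = random.Random(seed)
    # Global level: r_k ~ Gamma(gamma0 / K, scale = 1 / c0); as K grows this
    # approximates a gamma-process draw over an unbounded set of topics.
    r = [rng.gammavariate(gamma0 / num_topics, 1.0 / c0)
         for _ in range(num_topics)]
    # Topic-word distributions: phi_k ~ Dirichlet(eta, ..., eta).
    phi = [dirichlet(rng, [eta] * vocab_size) for _ in range(num_topics)]
    docs = []
    for _ in range(num_docs):
        # Document level: theta_jk ~ Gamma(r_k, scale = 1 / c).
        theta = [rng.gammavariate(rk, 1.0 / c) if rk > 0 else 0.0 for rk in r]
        # Count level: n_jk ~ Poisson(theta_jk) words assigned to topic k.
        counts = [poisson(rng, t) for t in theta]
        words = []
        for k, n in enumerate(counts):
            # Each of the n_jk tokens is a word id drawn from phi_k.
            words.extend(rng.choices(range(vocab_size), weights=phi[k], k=n))
        docs.append({"theta": theta, "counts": counts, "words": words})
    return r, phi, docs
```

For example, `sample_truncated_g2pp(num_docs=3, num_topics=50, vocab_size=20)` returns the global weights, the topic-word distributions and three synthetic documents, and `topic_proportions(docs[0]["theta"])` gives that document's normalised topic mixture. The sketch is deliberately serial with a fixed truncation. The self-adaptive topic count, the monitor-executor thread hierarchy and the copula-guided Gibbs sampler that the abstract attributes to TSAM and TSAMcop are not modelled here.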
Original language | English |
---|---|
Title of host publication | Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023 |
Pages | 3823-3824 |
Number of pages | 2 |
ISBN (Electronic) | 9798350322279 |
DOIs | |
Publication status | Published - 26 Jul 2023 |
Event | 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, United States; Duration: 3 Apr 2023 → 7 Apr 2023; https://icde2023.ics.uci.edu/ |
Publication series
Name | Proceedings - International Conference on Data Engineering |
---|---|
Volume | 2023-April |
ISSN (Print) | 1084-4627 |
Conference
Conference | 2023 IEEE 39th International Conference on Data Engineering (ICDE) |
---|---|
Country/Territory | United States |
City | Anaheim |
Period | 3 Apr 2023 → 7 Apr 2023 |
Internet address | https://icde2023.ics.uci.edu/ |
Bibliographical note
Publisher Copyright: © 2023 IEEE.
Funding
This work was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS16/E01/19), the National Natural Science Foundation of China (61972426), the Direct Grant (DR23B2) and the Faculty Research Grant (DB23A3) of Lingnan University, Hong Kong, a grant from the Research Grants Council of the HKSAR, China (Project: CityU 11507219), and a grant from the City University of Hong Kong SRG (Project: 7005780).
Keywords
- copulas
- parallel Gibbs sampling
- topic modelling