In terms of its generative process, the Gamma-Gamma-Poisson Process (G2PP) is equivalent to the Hierarchical Dirichlet Process (HDP), a nonparametric topic model. Because estimating the parameters of HDP is computationally expensive, a parallel G2PP was developed to generate topics efficiently via multi-threading. Unfortunately, that parallel model requires the number of topics to be predefined. To address this issue, we first propose a Topic Self-Adaptive Model (TSAM) for nonparametric and parallel topic discovery. In TSAM, a monitor-executor mechanism manages the global topic information using a hierarchical structure of threads. Building on copulas, we further extend TSAM to TSAMcop for coherent topic modeling by means of a copula-guided parallel Gibbs sampling algorithm. Extensive experiments validate the effectiveness of both TSAM and TSAMcop.
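As background to the stated G2PP/HDP equivalence, one standard fact that connects gamma-based and Dirichlet-based constructions is that independent Gamma variables with a common scale, once normalized, follow a Dirichlet distribution. The sketch below is only a finite-dimensional illustration of that fact and is not the model from the paper; the function names, the `shapes` values, and the sample size are assumptions chosen for the example.

```python
import random


def normalized_gamma_weights(shapes, rng):
    """Draw independent Gamma(shape, scale=1) variables and normalize them.

    The normalized vector is distributed as Dirichlet(shapes).
    """
    draws = [rng.gammavariate(a, 1.0) for a in shapes]
    total = sum(draws)
    return [d / total for d in draws]


def mean_weights(shapes, n_samples=20000, seed=0):
    """Monte Carlo estimate of the mean of the normalized weights."""
    rng = random.Random(seed)
    sums = [0.0] * len(shapes)
    for _ in range(n_samples):
        for k, w in enumerate(normalized_gamma_weights(shapes, rng)):
            sums[k] += w
    return [s / n_samples for s in sums]


# Hypothetical shape parameters for three topics (illustrative only).
shapes = [2.0, 1.0, 1.0]
empirical = mean_weights(shapes)
# A Dirichlet(shapes) vector has mean shapes[k] / sum(shapes).
expected = [a / sum(shapes) for a in shapes]
```

Comparing `empirical` with `expected` shows the normalized gamma draws matching the Dirichlet mean up to Monte Carlo error.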
|Title of host publication||Proceedings - 2023 IEEE 39th International Conference on Data Engineering, ICDE 2023|
|Number of pages||2|
|Publication status||Published - 26 Jul 2023|
|Event||2023 IEEE 39th International Conference on Data Engineering (ICDE) - Anaheim, United States; Duration: 3 Apr 2023 → 7 Apr 2023|
|Name||Proceedings - International Conference on Data Engineering|
Bibliographical note
Funding Information:
*Corresponding author. This work was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS16/E01/19), the National Natural Science Foundation of China (61972426), the Direct Grant (DR23B2) and the Faculty Research Grant (DB23A3) of Lingnan University, Hong Kong, a grant from the Research Grants Council of the HKSAR, China (Project: CityU 11507219), and a grant from the City University of Hong Kong SRG (Project: 7005780).
© 2023 IEEE.
- parallel Gibbs sampling
- topic modelling