Mixed counting models that use the negative binomial distribution as the prior can model over-dispersed and hierarchically dependent random variables well; they have therefore attracted much attention in mining dispersed document topics. However, existing parameter inference methods such as Monte Carlo sampling are quite time-consuming. In this paper, we propose two efficient neural mixed counting models, i.e., the Negative Binomial-Neural Topic Model (NB-NTM) and the Gamma Negative Binomial-Neural Topic Model (GNB-NTM), for dispersed topic discovery. Neural variational inference algorithms are developed to infer model parameters by using the reparameterization of the Gamma distribution and the Gaussian approximation of the Poisson distribution. Experiments on real-world datasets indicate that our models outperform state-of-the-art baseline models in terms of perplexity and topic coherence. The results also validate that both NB-NTM and GNB-NTM can produce explainable intermediate variables by generating dispersed proportions of document topics.
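The two distributional facts the abstract relies on can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the parameter values, and the clipping at zero are assumptions made for illustration. It shows that a negative binomial count can be drawn as a Gamma-Poisson mixture, which yields a variance larger than the mean (over-dispersion), and that a Poisson draw with a moderately large rate can be approximated by a Gaussian with matching mean and variance, written as mean plus scaled standard normal noise.

```python
import numpy as np


def sample_nb_via_gamma_poisson(r, p, size, rng):
    """Draw NB(r, p) counts as a Gamma-Poisson mixture (hypothetical helper).

    lam ~ Gamma(shape=r, scale=p / (1 - p)), then x ~ Poisson(lam).
    The mean is r*p/(1-p) and the variance is r*p/(1-p)**2, so the
    variance always exceeds the mean.
    """
    lam = rng.gamma(shape=r, scale=p / (1.0 - p), size=size)
    return rng.poisson(lam)


def gaussian_approx_poisson(lam, rng):
    """Approximate Poisson(lam) by N(lam, lam) (hypothetical helper).

    The sample is written as lam + sqrt(lam) * eps with eps ~ N(0, 1),
    so the randomness is separated from the rate. Clipping at zero is
    an illustrative choice to keep the pseudo-counts non-negative.
    """
    eps = rng.standard_normal(np.shape(lam))
    return np.maximum(lam + np.sqrt(lam) * eps, 0.0)


rng = np.random.default_rng(0)

# Assumed illustrative parameters: r = 2, p = 0.8 gives mean 8, variance 40.
counts = sample_nb_via_gamma_poisson(r=2.0, p=0.8, size=200_000, rng=rng)
nb_mean, nb_var = counts.mean(), counts.var()

# Assumed illustrative rate of 50, large enough for the approximation to be close.
lam = np.full(200_000, 50.0)
approx = gaussian_approx_poisson(lam, rng)
approx_mean, approx_var = approx.mean(), approx.var()

print(f"NB sample mean {nb_mean:.2f}, variance {nb_var:.2f}")
print(f"Gaussian approximation mean {approx_mean:.2f}, variance {approx_var:.2f}")
```

The Gaussian approximation degrades for small rates, where the Poisson distribution is strongly skewed and the clipping starts to matter, so the rate of 50 here is a deliberately easy case.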
|Title of host publication||Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics|
|Editors||Dan JURAFSKY, Joyce CHAI, Natalie SCHLUTER, Joel TETREAULT|
|Publisher||Association for Computational Linguistics (ACL)|
|Number of pages||11|
|Publication status||Published - Jul 2020|
|Event||58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 - Virtual, Online, United States. Duration: 5 Jul 2020 → 10 Jul 2020|
|Name||Proceedings of the Annual Meeting of the Association for Computational Linguistics|
Bibliographical note
The first two authors contributed equally to this work, which was finished when Jiemin Wu was a final-year undergraduate student.
We are grateful to the reviewers for their constructive comments and suggestions on this study. This work has been supported by the National Natural Science Foundation of China (61972426), Guangdong Basic and Applied Basic Research Foundation (2020A1515010536), HKIBS Research Seed Fund 2019/20 (190-009), the Research Seed Fund (102367), and LEO Dr David P. Chan Institute of Data Science of Lingnan University, Hong Kong. This work has also been supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS16/E01/19), Hong Kong Research Grants Council through a General Research Fund (project no. PolyU 1121417), and by the Hong Kong Polytechnic University through a start-up fund (project no. 980V).