Abstract
First Story Detection (FSD) aims to identify the first story reporting a previously unreported emerging event, which is essential to practical applications in news analysis, intelligence gathering, and national security. Compared to information retrieval, text clustering, text classification, and other subject-based tasks, FSD is event-based and thus faces the challenging issues of multiple events on the same subject and the evolution of events. To tackle these challenges, several schemes that exploit temporal information, named entities, and topic modeling have been proposed for FSD. In this paper, we present a new term weighting scheme called LGT, which jointly models the Local element, Global element, and Topical association of each story. An unsupervised algorithm based on LGT is then devised and applied to FSD. We evaluate four feature reduction strategies and test our LGT scheme on an online model. Experiments show that our approach yields better results than existing baseline schemes on both retrospective and online FSD.
Original language | English |
---|---|
Pages (from-to) | 42-52 |
Number of pages | 11 |
Journal | Neurocomputing |
Volume | 254 |
Early online date | 3 Mar 2017 |
Publication status | Published - 6 Sept 2017 |
Externally published | Yes |
Bibliographical note
This paper is an extension of our previous work. Compared to that work, we have added the following new content: (1) the LGT is elaborated and compared theoretically in Section 3.2.1; (2) three existing strategies and a newly proposed nonparametric method of feature reduction are included in Section 3.2.2; (3) the experimental part is further extended by analyzing the performance with different topic numbers in Section 4.4.1; (4) four feature reduction methods are evaluated and compared on two datasets in Section 4.4.2; and (5) we have added more discussion of related studies in Section 2, and made many improvements to the introduction, method, experimental analysis, conclusion, and future research directions.
Funding
The work described in this paper was fully supported by the National Natural Science Foundation of China (61502545, 61472453, U1401256, U1501252), the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase), the Fundamental Research Funds for the Central Universities under Project 46000-31610009, a grant from the Soft Science Research Project of Guangdong Province (No. 2014A030304013), a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS11/E06/14) and the Internal Research Grant (RG 66/2016-2017) of The Education University of Hong Kong.
Keywords
- Feature reduction
- First story detection
- Latent Dirichlet allocation
- Polysemous
- Synonymous