Abstract
Neural bag-of-words (NBOW) models have achieved great success in text classification. They compute a sentence or document representation with simple mathematical operations, such as summing or averaging the word embeddings of the sequence elements. As a result, NBOW models have few parameters and a low computational cost. Intuitively, accounting for the importance of each word and for word-order information should help produce an informative sentence or document representation for text classification. However, NBOW models hardly consider these two factors when generating a representation. Meanwhile, term weighting schemes, which assign relatively high weights to important words, have performed well in traditional bag-of-words models, yet they are still seldom used in neural models. In addition, n-grams capture word-order information within a short context. In this paper, we propose the weighted word embedding model (WWEM), a variant of the NBOW model that introduces term weighting schemes and n-grams. Our model generates an informative sentence or document representation that takes into account both the importance of words and word-order information. We compare the proposed model with other popular neural models on five text classification datasets. The experimental results show that it achieves comparable or even superior performance.
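The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' WWEM implementation: the function names, the toy two-dimensional embeddings, the bigram embedding, and the weight values below are all assumptions made up for illustration. The sketch only shows how a plain NBOW average differs from a weighted average that also includes n-gram units.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def ngrams(tokens: Sequence[str], n: int) -> List[str]:
    """Return contiguous n-grams, joined with '_' (e.g. 'not_good')."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weighted_average(
    units: Sequence[str],
    embeddings: Dict[str, Vector],
    weights: Dict[str, float],
    default_weight: float = 1.0,
) -> Vector:
    """Weighted mean of the embeddings of all units that have an embedding.

    With an empty `weights` dict this reduces to the plain NBOW average.
    """
    known = [u for u in units if u in embeddings]
    if not known:
        raise ValueError("no unit has an embedding")
    dim = len(embeddings[known[0]])
    total = [0.0] * dim
    weight_sum = 0.0
    for unit in known:
        w = weights.get(unit, default_weight)
        weight_sum += w
        for k, value in enumerate(embeddings[unit]):
            total[k] += w * value
    if weight_sum == 0.0:
        raise ValueError("weights sum to zero")
    return [t / weight_sum for t in total]


# Toy embeddings (hypothetical values, including one for the bigram).
toy_embeddings = {
    "not": [0.0, 1.0],
    "good": [1.0, 0.0],
    "not_good": [-1.0, 0.0],
}
tokens = ["not", "good"]

# Plain NBOW: unweighted average over unigrams only.
plain = weighted_average(tokens, toy_embeddings, weights={})

# Weighted variant: add bigrams and give the bigram a higher (assumed) weight.
units = tokens + ngrams(tokens, 2)
weighted = weighted_average(units, toy_embeddings, weights={"not_good": 2.0})

print(plain)     # [0.5, 0.5]
print(weighted)  # [-0.25, 0.25]
```

In this toy case the plain average cannot distinguish "not good" from "good not", while the bigram unit carries the word-order signal and its higher weight shifts the representation. In practice the weights would come from a term weighting scheme rather than being set by hand.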
Original language | English |
---|---|
Title of host publication | Database Systems for Advanced Applications - 24th International Conference, DASFAA 2019, Proceedings |
Editors | Guoliang LI, Jun YANG, Joao GAMA, Juggapong NATWICHAI, Yongxin TONG |
Place of Publication | Switzerland |
Publisher | Springer Nature Switzerland AG |
Pages | 419-434 |
Number of pages | 16 |
ISBN (Electronic) | 9783030185763 |
ISBN (Print) | 9783030185756 |
Publication status | Published - 2019 |
Externally published | Yes |
Event | 24th International Conference on Database Systems for Advanced Applications, Chiang Mai, Thailand. Duration: 22 Apr 2019 → 25 Apr 2019. https://dasfaa2019.eng.cmu.ac.th/ |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11446 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 24th International Conference on Database Systems for Advanced Applications |
---|---|
Abbreviated title | DASFAA 2019 |
Country/Territory | Thailand |
Period | 22/04/19 → 25/04/19 |
Bibliographical note
This work was supported by the Fundamental Research Funds for the Central Universities, SCUT (No. 2017ZD048, D2182480), the Tiptop Scientific and Technical Innovative Youth Talents of Guangdong special support program (No. 2015-TQ01X633), the Science and Technology Planning Project of Guangdong Province (No. 2017B050506004), and the Science and Technology Program of Guangzhou International Science & Technology Cooperation Program (No. 201704030076). The research described in this paper has been supported by a collaborative research grant from the Hong Kong Research Grants Council (project No. C1031-18G).
Keywords
- N-grams
- Neural bag-of-words models
- Term weighting schemes
- Text classification