A weighted word embedding model for text classification

Haopeng REN, ZeQuan ZENG, Yi CAI*, Qing DU, Qing LI, Haoran XIE

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed)

12 Citations (Scopus)


Neural bag-of-words (NBOW) models have achieved great success in text classification. They compute a sentence or document representation through simple mathematical operations, such as summing or averaging the word embeddings of the sequence elements; consequently, NBOW models have few parameters and a low computation cost. Intuitively, accounting for both the importance of each word and word-order information is beneficial for obtaining an informative sentence or document representation. However, NBOW models hardly consider either factor when generating a representation. Meanwhile, term weighting schemes, which assign relatively high weights to important words, have performed well in traditional bag-of-words models, yet they are still seldom used in neural models. In addition, n-grams capture word-order information within a short context. In this paper, we propose the weighted word embedding model (WWEM), a variant of the NBOW model that introduces term weighting schemes and n-grams. Our model generates informative sentence or document representations by considering both the importance of words and word-order information. We compare our proposed model with other popular neural models on five text classification datasets. The experimental results show that our model achieves comparable or even superior performance.
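The core idea described in the abstract can be illustrated with a minimal sketch: a weighted average of word embeddings (rather than a plain mean), concatenated with an average over n-gram windows to inject local word-order information. The vocabulary, embedding matrix, and term weights below are toy stand-ins, and the exact weighting schemes and composition used in the paper may differ.

```python
import numpy as np

# Toy vocabulary and randomly initialized embeddings (one row per word).
rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}
dim = 4
embeddings = rng.normal(size=(len(vocab), dim))

# Hypothetical term weights: important words get higher values,
# in the spirit of TF-IDF-style weighting schemes.
weights = {"the": 0.1, "movie": 0.8, "was": 0.2, "great": 1.0}

def sentence_repr(tokens, n=2):
    """Weighted average of unigram embeddings, concatenated with an
    average over n-gram windows that captures short-context word order."""
    vecs = np.stack([embeddings[vocab[t]] for t in tokens])
    w = np.array([weights[t] for t in tokens])
    # Weighted mean: important words contribute more to the representation.
    unigram = (w[:, None] * vecs).sum(axis=0) / w.sum()
    # n-gram component: mean vector of each window of n consecutive words.
    grams = [vecs[i:i + n].mean(axis=0) for i in range(len(tokens) - n + 1)]
    ngram = np.mean(grams, axis=0) if grams else np.zeros(dim)
    return np.concatenate([unigram, ngram])

rep = sentence_repr(["the", "movie", "was", "great"])
print(rep.shape)  # (8,)
```

In this sketch the final representation is twice the embedding dimension (unigram part plus bigram part); a classifier such as a softmax layer would then be trained on top of it.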
Original language: English
Title of host publication: Database Systems for Advanced Applications - 24th International Conference, DASFAA 2019, Proceedings
Editors: Guoliang LI, Jun YANG, Joao GAMA, Juggapong NATWICHAI, Yongxin TONG
Place of publication: Switzerland
Publisher: Springer Nature Switzerland AG
Number of pages: 16
ISBN (Electronic): 9783030185763
ISBN (Print): 9783030185756
Publication status: Published - 2019
Externally published: Yes
Event: 24th International Conference on Database Systems for Advanced Applications - Chiang Mai, Thailand
Duration: 22 Apr 2019 - 25 Apr 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11446 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 24th International Conference on Database Systems for Advanced Applications
Abbreviated title: DASFAA 2019

Bibliographical note

This work was supported by the Fundamental Research Funds for the Central Universities, SCUT (No. 2017ZD048, D2182480), the Tiptop Scientific and Technical Innovative Youth Talents of Guangdong special support program (No. 2015-TQ01X633), the Science and Technology Planning Project of Guangdong Province (No. 2017B050506004), the Science and Technology Program of Guangzhou International Science & Technology Cooperation Program (No. 201704030076). The research described in this paper has been supported by a collaborative research grant from the Hong Kong Research Grants Council (project No. C1031-18G).


Keywords

  • N-grams
  • Neural bag-of-words models
  • Term weighting schemes
  • Text classification


