Weighted N-grams CNN for Text Classification

Zequan ZENG, Yi CAI*, Fu Lee WANG, Haoran XIE, Junying CHEN

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings › Conference paper (refereed)

Abstract

Text categorization can, to a large extent, solve the problem of information clutter, and it also gives information retrieval a more efficient search strategy and more effective search results. In recent years, Convolutional Neural Networks (CNNs) have been widely applied to this task. However, most existing CNN models have difficulty extracting longer n-gram features. The standard CNN model extracts n-gram features through convolution filters of fixed window size, so its parameters increase as the n-gram features get longer. Meanwhile, term weighting schemes, which assign reasonable weight values to words, have performed very well in traditional bag-of-words models. Intuitively, taking the weight of each word in an n-gram feature into account may benefit text classification. In this paper, we propose a model called the weighted n-grams CNN model. It is a variant of CNN that introduces a weighted n-grams layer, whose parameters are initialized by term weighting schemes. By adding only a fixed number of parameters, the model can generate weighted n-gram features of any length. We compare our proposed model with other popular and recent CNN models on five text classification datasets. The experimental results show that our proposed model achieves comparable or even superior performance.
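The abstract does not give the layer's equations, so the sketch below is only a hypothetical reading of the parameter argument, not the authors' implementation. It assumes that each word carries one trainable scalar weight initialized from smoothed IDF (one possible term weighting scheme) and that a weighted n-gram feature is the weight-scaled sum of the n word embeddings in a window. It compares this with the parameter count of a standard convolution filter bank, which grows linearly with the window size. The function names `conv_param_count`, `idf_weights`, and `weighted_ngrams`, the toy corpus, and the embedding values are all illustrative assumptions.

```python
import math
from collections import Counter


def conv_param_count(window, embed_dim, num_filters):
    """Parameters of one standard 1-D convolution filter bank over word
    embeddings: each filter spans `window` words x `embed_dim`, plus a bias."""
    return window * embed_dim * num_filters + num_filters


def idf_weights(docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, giving one scalar per word.
    Assumption: IDF stands in for 'a term weighting scheme'."""
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}


def weighted_ngrams(tokens, embeddings, weights, n):
    """Hypothetical weighted n-gram features: for each window of n tokens,
    sum the token embeddings scaled by each token's weight. The only extra
    parameters are the per-word weights, so their count does not depend on n."""
    dim = len(next(iter(embeddings.values())))
    features = []
    for start in range(len(tokens) - n + 1):
        vec = [0.0] * dim
        for tok in tokens[start:start + n]:
            w = weights.get(tok, 1.0)  # unseen words default to weight 1.0
            for k, x in enumerate(embeddings[tok]):
                vec[k] += w * x
        features.append(vec)
    return features


if __name__ == "__main__":
    docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
    weights = idf_weights(docs)
    emb = {"good": [1.0, 0.0], "bad": [-1.0, 0.0],
           "movie": [0.0, 1.0], "plot": [0.0, 0.5]}
    sentence = ["good", "movie", "plot", "good", "movie"]
    for n in (2, 3, 5):
        feats = weighted_ngrams(sentence, emb, weights, n)
        # window size, number of n-gram features, conv parameters for that window
        print(n, len(feats), conv_param_count(n, 300, 100))
    # prints:
    # 2 4 60100
    # 3 3 90100
    # 5 1 150100
```

Under these assumptions, with 300-dimensional embeddings and 100 filters, each extra word of window width costs the convolution bank another 30,000 parameters, while the weighted layer adds one scalar per vocabulary word regardless of n.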
Original language: English
Title of host publication: Information Retrieval Technology - 15th Asia Information Retrieval Societies Conference, AIRS 2019, Proceedings
Editors: Fu Lee WANG, Haoran XIE, Wai LAM, Aixin SUN, Lun-Wei KU, Tianyong HAO, Wei CHEN, Tak-Lam WONG, Xiaohui TAO
Publisher: Springer, Cham
Chapter: 14
Pages: 158-169
Number of pages: 12
ISBN (Electronic): 9783030428358
ISBN (Print): 9783030428341
DOIs: https://doi.org/10.1007/978-3-030-42835-8_14
Publication status: E-pub ahead of print, 27 Feb 2020
Event: The 15th Asia Information Retrieval Societies Conference, Open University of Hong Kong, Hong Kong
Duration: 7 Nov 2019 to 9 Nov 2019
http://airs2019.ouhk.edu.hk/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12004 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: The 15th Asia Information Retrieval Societies Conference
Abbreviated title: AIRS2019
Country: Hong Kong
City: Hong Kong
Period: 7/11/19 to 9/11/19
Internet address: http://airs2019.ouhk.edu.hk/

Bibliographical note

This work was supported by the Fundamental Research Funds for the Central Universities, SCUT (No. 2017ZD048, D2182480), the Science and Technology Planning Project of Guangdong Province (No. 2017B050506004), and the Science and Technology Programs of Guangzhou (No. 201704030076, 201707010223, 201802010027, 201902010046).

Keywords

  • CNN model
  • Text classification
  • Weighted n-grams features


Citation

ZENG, Z., CAI, Y., WANG, F. L., XIE, H., & CHEN, J. (2020). Weighted N-grams CNN for Text Classification. In F. L. WANG, H. XIE, W. LAM, A. SUN, L-W. KU, T. HAO, W. CHEN, T-L. WONG, & X. TAO (Eds.), Information Retrieval Technology - 15th Asia Information Retrieval Societies Conference, AIRS 2019, Proceedings (pp. 158-169). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12004 LNCS). Springer, Cham. https://doi.org/10.1007/978-3-030-42835-8_14