A novel hierarchical discourse model for scientific article and it's efficient Top-K resampling-based text classification approach

Min GAO, Chun-Hua CHEN*, Zhi-Han GAO, Wei-Long CHEN, Yuan REN, Sam KWONG, Zhi-Hui ZHAN*

*Corresponding author for this work

Research output: Book Chapters | Papers in Conference Proceedings, Conference paper (refereed), Research, peer-review

Abstract

Scientific articles contain rich knowledge that can significantly assist scientific research, but it is difficult to extract this knowledge precisely because the discourse structure of scientific articles is complex. To provide researchers in a specific academic domain with more accurate research knowledge, it is necessary to study the discourse structure of that domain's scientific articles and to propose an approach that automatically annotates discourse information in them. Unfortunately, few works have studied the discourse structure of domain scientific articles or the corresponding automatic discourse annotation. To fill this gap, we take scientific articles in the wastewater-based epidemiology domain as a case study of how to annotate discourse information automatically and efficiently. This paper makes three contributions. First, we propose a hierarchical discourse model with two layers to cover all potential discourses in scientific articles from various domains. Specifically, the first layer defines four core discourse concepts that describe the main process of a scientific study and can be applied to scientific articles across domains. The second layer defines a fine-grained, domain-specific structure that can accurately describe the entire research content of a specific domain. Second, based on the proposed model, we build a corpus of 100 annotated scientific articles in the wastewater-based epidemiology domain. Third, based on the model and the corpus, we propose a simple yet efficient Top-K resampling-based approach to train a more effective classifier for automatic annotation. Extensive experiments verify the effectiveness and efficiency of the proposed hierarchical discourse model and the Top-K resampling-based classification approach.
Original language: English
Title of host publication: Proceedings of the 2022 IEEE International Conference on Systems, Man and Cybernetics (SMC)
Publisher: IEEE
Pages: 774-781
Number of pages: 8
ISBN (Print): 9781665452588
DOIs
Publication status: Published - Oct 2022
Externally published: Yes
Event: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022 - Prague, Czech Republic
Duration: 9 Oct 2022 to 12 Oct 2022

Conference

Conference: 2022 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2022
Country/Territory: Czech Republic
City: Prague
Period: 9/10/22 to 12/10/22

Funding

This work was supported in part by the National Key Research and Development Program of China under Grant 2019YFB2102102, in part by the National Natural Science Foundation of China (NSFC) under Grant 62176094 and Grant 61873097, in part by the Key-Area Research and Development of Guangdong Province under Grant 2020B010166002, and in part by the Guangdong Natural Science Foundation Research Team under Grant 2018B030312003.

Keywords

  • automatic annotation
  • discourse
  • scientific articles
  • text classification
