Abstract
Domain-specific knowledge graphs usually require deeper and more accurate knowledge. Existing academic knowledge graphs mainly focus on authors, abstracts, keywords, and citations, which help to explore the themes of papers and to analyze relationships between papers. However, this content is summary information: it reveals only shallow meaning and does not capture the core of a scientific paper. Mathematical models, which existing knowledge graphs ignore, are what authors really want to express in their papers. Knowledge from mathematical models makes it possible to use knowledge graphs for mathematical derivation, not just literal reasoning. To model this knowledge, we propose M2R, a knowledge graph construction framework from Mathematical Models to the Resource Description Framework (RDF). Because mathematical models are usually described by formulae, we first identify formula positions according to pre-defined rules and locate the contexts that explain the variables in the formulae. Next, we extract the formulae and their contexts from PDF papers as images and apply optical character recognition (OCR) to recognize the image contents. Then, regular expressions designed from sentence patterns are used to extract variable symbols and their explanations. Finally, each formula is treated as a relation between its variables, yielding triples whose subjects and objects are variables and whose predicates are formulae. Similar triples are fused to generate the final knowledge graph. Experimental results show that the precision of formula extraction is up to 76.97%. In addition, a case study shows that M2R can effectively extract formulae and their related variables and construct a knowledge graph of the mathematical models in scientific papers.
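The abstract names the final pipeline steps but not their exact rules. The following is a minimal sketch of those steps: pattern-based extraction of variable explanations, turning a formula into variable-to-variable triples, and fusing duplicates. It assumes the OCR output is already available as plain text. Several details are assumptions and do not come from M2R's published rules. The sentence pattern ("<symbol> is/denotes/represents [the] <explanation>") is assumed, as is the variable syntax (one letter with an optional `_subscript`). The hypothetical `example.org` namespace, the use of `rdfs:label` for explanations, and whitespace-normalized deduplication (a simple stand-in for "fusing similar triples") are likewise assumptions.

```python
import re
from urllib.parse import quote

# Hypothetical namespace for illustration only; M2R's actual IRI scheme is not stated.
BASE = "http://example.org/m2r/"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"

# Assumed sentence pattern: "<symbol> is/denotes/represents [the] <explanation>",
# where the explanation ends at a comma, semicolon, period, " and ", or end of text.
EXPLANATION = re.compile(
    r"\b([A-Za-z](?:_\w+)?)\s+(?:is|denotes|represents)\s+(?:the\s+)?"
    r"([^,;.]+?)(?=\s+and\s|[,;.]|$)"
)
# Assumed variable syntax in OCR'd formulae: a single letter, optionally "_subscript".
SYMBOL = re.compile(r"\b[A-Za-z](?:_\w+)?\b")


def extract_explanations(context: str) -> dict:
    """Map each variable symbol to the explanation found in its context text."""
    return {sym: expl.strip() for sym, expl in EXPLANATION.findall(context)}


def formula_triples(formula: str) -> list:
    """Treat 'lhs = rhs' as relations (lhs_var, formula, rhs_var)."""
    if "=" not in formula:
        return []
    lhs, rhs = formula.split("=", 1)
    normalized = re.sub(r"\s+", "", formula)  # whitespace-insensitive predicate
    return [
        (s, normalized, o)
        for s in SYMBOL.findall(lhs)
        for o in SYMBOL.findall(rhs)
        if s != o
    ]


def fuse(triples: list) -> list:
    """Simplified fusion: drop duplicate triples, keeping first-seen order."""
    return list(dict.fromkeys(triples))


def to_ntriples(triples: list, explanations: dict) -> list:
    """Serialize triples and variable labels as N-Triples lines."""
    lines = [
        f"<{BASE}var/{quote(s)}> <{BASE}formula/{quote(p, safe='')}> <{BASE}var/{quote(o)}> ."
        for s, p, o in triples
    ]
    for sym, expl in explanations.items():
        literal = expl.replace("\\", "\\\\").replace('"', '\\"')  # escape for N-Triples
        lines.append(f'<{BASE}var/{quote(sym)}> {RDFS_LABEL} "{literal}" .')
    return lines


if __name__ == "__main__":
    context = "where F is the force, m denotes the mass and a is the acceleration."
    explanations = extract_explanations(context)
    # The same formula recognized twice with different spacing fuses into one set of triples.
    triples = fuse(formula_triples("F = m * a") + formula_triples("F=m*a"))
    print("\n".join(to_ntriples(triples, explanations)))
```

Treating the left-hand-side variable as the subject and each right-hand-side variable as an object is one plausible reading of "subjects and objects are the variables". A real system would also need to handle formulae with several variables on both sides, as well as OCR noise.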
Original language | English |
---|---|
Title of host publication | Web and Big Data: 6th International Joint Conference, APWeb-WAIM 2022, Nanjing, China, November 25–27, 2022, Proceedings, Part II |
Editors | Bohan LI, Lin YUE, Chuanqi TAO, Xuming HAN, Diego CALVANESE, Toshiyuki AMAGASA |
Publisher | Springer, Cham |
Chapter | 18 |
Pages | 225-238 |
Number of pages | 14 |
ISBN (Print) | 9783031251979 |
DOIs | |
Publication status | Published - 10 Feb 2023 |
Event | 6th International Joint Conference on Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM), APWeb-WAIM 2022 - Nanjing, China. Duration: 25 Nov 2022 → 27 Nov 2022 |
Publication series
Name | Lecture Notes in Computer Science |
---|---|
Volume | 13422 |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 6th International Joint Conference on Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM), APWeb-WAIM 2022 |
---|---|
Country/Territory | China |
City | Nanjing |
Period | 25/11/22 → 27/11/22 |
Bibliographical note
Funding Information: This work was supported in part by the National Natural Science Foundation of China under Grant No. 61602149, and in part by the Fundamental Research Funds for the Central Universities, China under Grant No. B210202078.
Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
- Formulae
- Knowledge graph construction
- Mathematical models
- Scientific papers
- Variables