TY - GEN
T1 - M2R : From Mathematical Models to Resource Description Framework
AU - ZOU, Chenxin
AU - LI, Xiaodong
AU - WU, Pangjing
AU - XIE, Haoran
N1 - This work was supported in part by the National Natural Science Foundation of China under Grant No. 61602149, and in part by the Fundamental Research Funds for the Central Universities, China under Grant No. B210202078.
PY - 2023/02/10
Y1 - 2023/02/10
AB - Domain-specific knowledge graphs usually have requirements for deeper and more accurate knowledge. Existing knowledge graphs in academics mainly focus on authors, abstracts, keywords, and citations, which help explore themes of papers and analyze relationships between different papers. However, these contents are summarizations and only reveal shallow meanings, not involving cores of scientific papers. Mathematical models, ignored by existing knowledge graphs, are what authors really want to express through papers. Knowledge from mathematical models makes it possible to use knowledge graphs for mathematical derivation, not just literal reasoning. To model this knowledge, we propose a knowledge graph construction framework, named M2R, from Mathematical Models to Resource Description Framework. Mathematical models are usually described in formulae. We first identify formula positions according to pre-defined rules and find out contexts explaining variables in the formulae. Next, we split the formulae and related contexts from PDF papers in the form of images, and employ optical character recognition to identify image contents. Then, regular expressions designed based on sentence patterns are used to extract variable symbols and variable explanations. Finally, the formulae are regarded as relations between the variables to form triples whose subjects and objects are the variables, and predicates are the formulae. Similar triples are fused to generate a final knowledge graph. Experimental results demonstrate that precision of the formula extraction is up to 76.97%. Besides, a convincing case study shows that we can effectively extract formulae and related variables, and construct a knowledge graph about mathematical models of scientific papers.
U2 - 10.1007/978-3-031-25198-6_18
DO - 10.1007/978-3-031-25198-6_18
M3 - Conference paper (refereed)
SN - 9783031251979
T3 - Lecture Notes in Computer Science
SP - 225
EP - 238
BT - Web and Big Data : 6th International Joint Conference, APWeb-WAIM 2022, Nanjing, China, November 25–27, 2022, Proceedings, Part II
A2 - LI, Bohan
A2 - YUE, Lin
A2 - TAO, Chuanqi
A2 - HAN, Xuming
A2 - CALVANESE, Diego
A2 - AMAGASA, Toshiyuki
PB - Springer, Cham
ER -