TY - GEN

T1 - M2R: From Mathematical Models to Resource Description Framework

AU - ZOU, Chenxin

AU - LI, Xiaodong

AU - WU, Pangjing

AU - XIE, Haoran

N1 - This work was supported in part by the National Natural Science Foundation of China under Grant No. 61602149, and in part by the Fundamental Research Funds for the Central Universities, China under Grant No. B210202078.

PY - 2023/2/10

Y1 - 2023/2/10

N2 - Domain-specific knowledge graphs usually have requirements for deeper and more accurate knowledge. Existing knowledge graphs in academics mainly focus on authors, abstracts, keywords, and citations, which help explore themes of papers and analyze relationships between different papers. However, these contents are summarizations and only reveal shallow meanings, not involving cores of scientific papers. Mathematical models, ignored by existing knowledge graphs, are what authors really want to express through papers. Knowledge from mathematical models makes it possible to use knowledge graphs for mathematical derivation, not just literal reasoning. To model this knowledge, we propose a knowledge graph construction framework, named M2R, from Mathematical Models to Resource Description Framework. Mathematical models are usually described in formulae. We first identify formula positions according to pre-defined rules and find out contexts explaining variables in the formulae. Next, we split the formulae and related contexts from PDF papers in the form of images, and employ optical character recognition to identify image contents. Then, regular expressions designed based on sentence patterns are used to extract variable symbols and variable explanations. Finally, the formulae are regarded as relations between the variables to form triples whose subjects and objects are the variables, and predicates are the formulae. Similar triples are fused to generate a final knowledge graph. Experimental results demonstrate that precision of the formula extraction is up to 76.97%. Besides, a convincing case study shows that we can effectively extract formulae and related variables, and construct a knowledge graph about mathematical models of scientific papers.

U2 - 10.1007/978-3-031-25198-6_18

DO - 10.1007/978-3-031-25198-6_18

M3 - Conference paper (refereed)

SN - 9783031251979

T3 - Lecture Notes in Computer Science

SP - 225

EP - 238

BT - Web and Big Data: 6th International Joint Conference, APWeb-WAIM 2022, Nanjing, China, November 25–27, 2022, Proceedings, Part II

A2 - LI, Bohan

A2 - YUE, Lin

A2 - TAO, Chuanqi

A2 - HAN, Xuming

A2 - CALVANESE, Diego

A2 - AMAGASA, Toshiyuki

PB - Springer, Cham

ER -