TY - GEN
T1 - A bibliometric analysis of natural language processing in medical research
AU - CHEN, Xieling
AU - XIE, Haoran
AU - WANG, Fu Lee
AU - LIU, Ziqing
AU - XU, Juan
AU - HAO, Tianyong
N1 - Publication of this article was supported by grants from National Natural Science Foundation of China (No.61772146), Research Grants Council of Hong Kong Special Administrative Region, China (UGC/FDS11/E04/16), and Innovative School Project in Higher Education of Guangdong Province (No.YQ2015062).
PY - 2018/3/22
Y1 - 2018/3/22
N2 - Background: Natural language processing (NLP) has played an increasingly significant role in advancing medicine, and a wealth of research on NLP methods and applications for medical information processing is available. An in-depth analysis is therefore needed to understand recent developments in the NLP-empowered medical research field. However, few studies examining the research status of this field could be found. This study therefore aims to quantitatively assess the academic output of NLP in the medical research field. Methods: We conducted a bibliometric analysis of NLP-empowered medical research publications retrieved from PubMed for the period 2007-2016. The analysis focused on three aspects. First, the characteristics of the literature distribution were obtained with a statistical analysis method. Second, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution were examined using an affinity propagation clustering method. Results: A total of 1405 NLP-empowered medical research publications were published during the 10-year period, with an average annual growth rate of 18.39%. The 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications. A moderately significant correlation between countries' publication counts and GDP per capita was revealed. Joshua C. Denny was the most productive author, and Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. Ten major thematic areas were identified, including Computational biology, Terminology mining, Information extraction, Text classification, Social medium as data source, and Information retrieval. Conclusions: This study presents a bibliometric analysis of NLP-empowered medical research publications that uncovers the recent research status. The results can assist relevant researchers, especially newcomers, in understanding the development of the field systematically, seeking scientific cooperation partners, optimizing research topic choices, and monitoring new scientific or technological activities.
AB - Background: Natural language processing (NLP) has played an increasingly significant role in advancing medicine, and a wealth of research on NLP methods and applications for medical information processing is available. An in-depth analysis is therefore needed to understand recent developments in the NLP-empowered medical research field. However, few studies examining the research status of this field could be found. This study therefore aims to quantitatively assess the academic output of NLP in the medical research field. Methods: We conducted a bibliometric analysis of NLP-empowered medical research publications retrieved from PubMed for the period 2007-2016. The analysis focused on three aspects. First, the characteristics of the literature distribution were obtained with a statistical analysis method. Second, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution were examined using an affinity propagation clustering method. Results: A total of 1405 NLP-empowered medical research publications were published during the 10-year period, with an average annual growth rate of 18.39%. The 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications. A moderately significant correlation between countries' publication counts and GDP per capita was revealed. Joshua C. Denny was the most productive author, and Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. Ten major thematic areas were identified, including Computational biology, Terminology mining, Information extraction, Text classification, Social medium as data source, and Information retrieval. Conclusions: This study presents a bibliometric analysis of NLP-empowered medical research publications that uncovers the recent research status. The results can assist relevant researchers, especially newcomers, in understanding the development of the field systematically, seeking scientific cooperation partners, optimizing research topic choices, and monitoring new scientific or technological activities.
KW - Bibliometrics
KW - Medical
KW - Natural language processing
KW - Scientific collaboration
KW - Statistical characteristics
KW - Thematic discovery and evolution
UR - http://www.scopus.com/inward/record.url?scp=85044184084&partnerID=8YFLogxK
U2 - 10.1186/s12911-018-0594-x
DO - 10.1186/s12911-018-0594-x
M3 - Conference paper (refereed)
C2 - 29589569
AN - SCOPUS:85044184084
T3 - BMC Medical Informatics and Decision Making
BT - BMC Medical Informatics and Decision Making (Vol. 18, Supp. 1, 2018) : Proceedings from the 3rd China Health Information Processing Conference (CHIP 2017)
A2 - TANG, Buzhou
A2 - CHEN, Qingcai
A2 - HAO, Tianyong
A2 - HE, Zhe
PB - BioMed Central Ltd.
T2 - 3rd China Health Information Processing Conference
Y2 - 24 November 2017 through 25 November 2017
ER -