Abstract
In recent years, the means of disease diagnosis and treatment have improved remarkably with the continuous development of science and technology. Researchers have spent tremendous time and effort building models that support medical practitioners in decision-making. However, one of the greatest challenges remains identifying the connections between different diseases. This study aims to discover the relationships between diseases and symptoms in order to predict potential diseases for patients. Treating this as a multi-label classification problem, the study proposes a new multi-disease prediction model that learns from NHANES, an extensive health-related dataset, and MEDLINE, a corpus of medical domain knowledge. A heterogeneous information graph is first constructed and then populated with medical domain knowledge discovered from MEDLINE. The knowledge graph is analysed to clarify the relevancy between nodes in positive or negative space, which helps to access the correlation amongst multiple diseases and their symptoms. A multi-label disease prediction model is then developed using the medical domain knowledge graph. Empirical experiments are conducted to evaluate the proposed model. The experimental results show that the proposed model outperforms state-of-the-art related works representing the mainstream approaches to multi-label classification. This study contributes to the medical community a novel model for multi-disease prediction and represents a new endeavour in multi-label classification using knowledge graphs.
Original language | English |
---|---|
Article number | 107662 |
Number of pages | 15 |
Journal | Knowledge-Based Systems |
Volume | 235 |
Early online date | 2 Nov 2021 |
Publication status | Published - 10 Jan 2022 |
Bibliographical note
Publisher Copyright: © 2021 Elsevier B.V.
Funding
The work was conducted with approval from the Human Research Ethics Committee of the University of Southern Queensland, Australia (Approval ID: H18REA049). The authors acknowledge the use of the National Health and Nutrition Examination Survey (NHANES) in the study and especially thank the Centers for Disease Control and Prevention of the U.S. Department of Health and Human Services for making the dataset available for research purposes. Thanks also go to the U.S. National Library of Medicine for allowing the use of MEDLINE. The authors also thank numerous colleagues, including Dr. Leonard Tan and Dr. Patrick Delaney, for their time helping to proofread the paper.
Keywords
- Disease prediction
- Knowledge graph
- MEDLINE
- Medicine domain knowledge
- Multi-label classification
- NHANES