Application of machine learning algorithms for clinical predictive modeling and missing data imputation

  • Zeyi FAN (Speaker)

Activity: Talks or Presentations > Other Invited Talks or Presentations


According to the World Health Organization (WHO; 2019), the number of people aged 60 years and older was 1 billion in 2019, and this number is projected to increase to 1.4 billion by 2030 and 2.1 billion by 2050. The rapid aging of populations around the world presents an unprecedented set of challenges: increased health expenditure for individuals, medical resource planning for hospitals, and a potential fiscal burden for governments. In particular, hospitalization and nursing home use are the most expensive services used by older persons and may cause economic burdens. Therefore, it would be cost-effective to design a model that identifies high-risk patients and assists them in the decision-making process of choosing the right treatment.

Multimodal data streams have been shown to yield more accurate predictions. Therefore, in the first study, we propose a multimodal machine learning model for early risk stratification of mortality in geriatric patients. The model will use an improved graph convolution network to handle the highly multi-relational nature of the data and adjust the weights of clinical features based on a patient’s current health condition and demographics. The features extracted from medical codes, clinical biomarkers, and demographics are then passed through a fusion classifier for the final prediction.

In addition, missing data is a common problem for prediction models. Some clinical entries are unmeasured or unknown for some patients, and others may be available only at irregular intervals. In the second study, we propose a machine learning model for missing data imputation using the MIMIC-III data set. The model calculates similarities between patients based on their medical codes and demographics and uses these similarities to impute missing values. The design principles are intended to inform future efforts to better model missing clinical data and to support optimal knowledge derivation from clinical data.
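
The similarity-based imputation idea in the second study can be illustrated with a minimal sketch. It is not the model presented in the talk: the Jaccard similarity over code sets, the age/sex similarity, the mixing weight `alpha`, the `age_scale` constant, the record fields, and the weighted-mean rule are all assumptions chosen for illustration, and the patient records are made up.

```python
def code_similarity(codes_a, codes_b):
    """Jaccard similarity between two sets of medical codes (assumed measure)."""
    a, b = set(codes_a), set(codes_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def demographic_similarity(p, q, age_scale=50.0):
    """Similarity in [0, 1] from age gap and sex match (hypothetical choice)."""
    age_sim = max(0.0, 1.0 - abs(p["age"] - q["age"]) / age_scale)
    sex_sim = 1.0 if p["sex"] == q["sex"] else 0.0
    return 0.5 * (age_sim + sex_sim)


def impute(patients, target, feature, alpha=0.5):
    """Similarity-weighted mean of `feature` over patients who have it observed.

    Returns None when no other patient provides a usable value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for other in patients:
        if other is target or other.get(feature) is None:
            continue
        weight = (alpha * code_similarity(target["codes"], other["codes"])
                  + (1.0 - alpha) * demographic_similarity(target, other))
        weighted_sum += weight * other[feature]
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


# Made-up records; the code strings are illustrative placeholders.
patients = [
    {"codes": {"4280", "5849"}, "age": 80, "sex": "F", "lactate": 2.0},
    {"codes": {"4280"}, "age": 75, "sex": "F", "lactate": 4.0},
    {"codes": {"4280", "5849"}, "age": 80, "sex": "F", "lactate": None},
]
estimate = impute(patients, patients[2], "lactate")
```

In this toy example the first patient matches the target on every code and demographic field, so the estimate is pulled closer to that patient's value than to the second patient's.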
Period: 15 May 2023
Event title: Postgraduate Seminar Series
Event type: Public Lecture