Missing value imputation through shorter interval selection driven by Fuzzy C-Means clustering

Hufsa KHAN, Xizhao WANG, Han LIU*

*Corresponding author for this work

Research output: Journal Publications › Journal Article (refereed) › peer-review

26 Citations (Scopus)

Abstract

Missing data is a common and pivotal problem that generally causes a serious decrease in data quality, which makes effective handling of missing data necessary. In this paper, we propose a missing value imputation approach driven by Fuzzy C-Means clustering that aims to improve classification accuracy by referring only to the known feature values of selected instances. In particular, the missing values of each instance are imputed by selecting a shorter interval, based on the cluster membership values that fall within a certain threshold limit for each feature; a short interval is expected to improve imputation effectiveness and yield more accurate estimates than a long interval. We evaluate our method by comparing it with state-of-the-art imputation methods on UCI datasets. The experimental results demonstrate that the proposed approach performs comparably to or better than those state-of-the-art imputation methods.
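The abstract does not give the algorithm in detail, so the following is only a rough sketch of the general idea and not the authors' procedure. It assumes that Fuzzy C-Means is fitted on the complete instances, that an incomplete instance is assigned memberships from its observed features only, and that a missing feature is filled with the mean of the known values inside the shortest candidate interval. The function names (`fuzzy_c_means`, `memberships`, `impute_shorter_interval`), the `threshold` parameter, and the fallback rules are all hypothetical choices made for illustration.

```python
import numpy as np

def memberships(X, centers, m=2.0):
    """Standard FCM membership of each row of X in each cluster center."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                      # avoid division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means on complete data; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def impute_shorter_interval(X, c=2, m=2.0, threshold=0.5):
    """Assumed scheme: fill each NaN from the shortest per-cluster interval."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = ~missing.any(axis=1)
    Xc = X[complete]
    centers, Uc = fuzzy_c_means(Xc, c=c, m=m)
    out = X.copy()
    for i in np.flatnonzero(~complete):
        obs = ~missing[i]
        # Memberships of the incomplete row, using observed features only.
        u_i = memberships(X[i:i + 1, obs], centers[:, obs], m)[0]
        candidates = np.flatnonzero(u_i >= threshold)
        if candidates.size == 0:               # fallback: strongest cluster
            candidates = np.array([u_i.argmax()])
        for j in np.flatnonzero(missing[i]):
            best_vals, best_len = None, np.inf
            for k in candidates:
                # Known values of feature j among strong members of cluster k.
                vals = Xc[Uc[:, k] >= threshold, j]
                if vals.size and vals.max() - vals.min() < best_len:
                    best_vals, best_len = vals, vals.max() - vals.min()
            if best_vals is None:              # fallback: global feature mean
                best_vals = Xc[:, j]
            out[i, j] = best_vals.mean()
    return out

# Small illustrative dataset with two well-separated groups.
X_demo = np.array([
    [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
    [10.0, 10.0], [10.2, 9.9], [9.9, 10.1],
    [0.1, np.nan], [np.nan, 10.0],
])
filled = impute_shorter_interval(X_demo, c=2, threshold=0.5)
print(filled)
```

In this sketch, restricting the estimate to strong members of one cluster is what keeps the interval short: the value is drawn from a narrow range of similar instances rather than from the full range of the feature. The threshold of 0.5 is an arbitrary illustrative value.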

Original language: English
Article number: 107230
Journal: Computers and Electrical Engineering
Volume: 93
Early online date: 1 Jun 2021
Publication status: Published - Jul 2021
Externally published: Yes

Keywords

  • Fuzzy C-Means clustering
  • Incomplete data processing
  • Missing value handling
  • Missing value imputation
