Abstract
Missing data is a common and pivotal problem that generally causes a serious decrease in data quality, so it must be handled effectively. In this paper, we propose a missing value imputation approach driven by Fuzzy C-Means clustering that improves classification accuracy by referring only to the known feature values of selected instances. In particular, the missing values of each instance are imputed from a short interval, selected for each feature according to cluster membership values within a given threshold limit; a short interval is expected to make imputation more effective and to yield more accurate estimates than a long one. We evaluate our method against state-of-the-art imputation methods on UCI datasets. The experimental results demonstrate that the proposed approach performs comparably to or better than those methods.
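The abstract describes the method only at a high level, so the following Python sketch is one possible reading of the idea, not the authors' algorithm. It assumes that (a) Fuzzy C-Means is fitted on the complete instances, (b) an incomplete instance is assigned to a cluster using its observed features only, (c) the "selected instances" are the complete instances whose membership in that cluster reaches a threshold, and (d) each missing value is set to the midpoint of the interval spanned by those instances' values for that feature. The names `memberships`, `fuzzy_c_means`, and `impute_fcm`, and the parameters `c`, `m`, and `threshold`, are hypothetical and do not come from the paper.

```python
import numpy as np


def memberships(X, centers, m=2.0):
    """Standard FCM membership matrix for points X given cluster centers."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                      # avoid division by zero
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain Fuzzy C-Means on complete rows; returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)          # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U_new = memberships(X, centers, m)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def impute_fcm(X, c=2, m=2.0, threshold=0.6):
    """Impute NaNs from a membership-filtered interval (assumed reading)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete = ~missing.any(axis=1)
    Xc = X[complete]                           # needs at least c complete rows
    centers, U = fuzzy_c_means(Xc, c=c, m=m)
    out = X.copy()
    for i in np.where(~complete)[0]:
        obs = ~missing[i]
        # Membership of the incomplete row, using observed features only.
        u_i = memberships(X[i, obs][None, :], centers[:, obs], m)[0]
        k = int(u_i.argmax())
        # Assumption: keep complete rows that belong strongly to cluster k.
        sel = U[:, k] >= threshold
        if not sel.any():                      # fall back to the strongest member
            sel = U[:, k] == U[:, k].max()
        for j in np.where(missing[i])[0]:
            vals = Xc[sel, j]
            # Assumption: impute with the midpoint of the selected interval.
            out[i, j] = 0.5 * (vals.min() + vals.max())
    return out
```

Under these assumptions, raising `threshold` keeps fewer instances, so the interval `[vals.min(), vals.max()]` can only stay the same or shrink, which mirrors the abstract's preference for a short interval over a long one.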
Original language | English |
---|---|
Article number | 107230 |
Journal | Computers and Electrical Engineering |
Volume | 93 |
Early online date | 1 Jun 2021 |
DOIs | |
Publication status | Published - Jul 2021 |
Externally published | Yes |
Keywords
- Fuzzy C-Means clustering
- Incomplete data processing
- Missing value handling
- Missing value imputation