The k-nearest neighbor method classifies a pattern by evaluating its distance to each pattern in the training set. The edited version of this method applies the same classifier to a subset of the complete training set, from which some training patterns are excluded in order to reduce the classification error rate. In recent works, genetic algorithms have been successfully applied to determine which patterns must be included in the edited subset. In this paper we propose a novel implementation of a genetic algorithm for designing edited k-nearest neighbor classifiers. It includes a novel fitness function based on the mean square error, a novel clustered crossover technique, and a fast smart mutation scheme. To evaluate the performance of the proposed method, results are reported for the breast cancer, diabetes, and letter recognition databases from the UCI machine learning benchmark repository. Both error rate and computational cost are considered in the analysis. The results show the improvement achieved by the proposed editing method. © 2008 World Scientific Publishing Company.
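The idea of editing can be illustrated with a minimal sketch, under stated assumptions: a binary mask (the kind of chromosome a genetic algorithm could evolve) marks which training patterns are kept, and the classifier votes only among the kept patterns. The names `knn_predict` and `error_rate`, the toy data, and the use of the plain validation error rate as the quantity to minimize are all illustrative assumptions; the abstract does not specify the paper's mean square error based fitness function, crossover, or mutation operators, so none of them is reproduced here.

```python
from collections import Counter

def knn_predict(train_X, train_y, mask, x, k=1):
    """Majority vote among the k nearest training patterns kept by mask."""
    # Keep only the patterns whose mask bit is 1 (the edited subset).
    kept = [(p, label) for p, label, keep in zip(train_X, train_y, mask) if keep]
    # Squared Euclidean distance gives the same neighbor ordering as Euclidean.
    kept.sort(key=lambda pl: sum((a - b) ** 2 for a, b in zip(pl[0], x)))
    votes = Counter(label for _, label in kept[:k])
    return votes.most_common(1)[0][0]

def error_rate(train_X, train_y, mask, val_X, val_y, k=1):
    """Fraction of validation patterns misclassified (stand-in fitness, an assumption)."""
    wrong = sum(knn_predict(train_X, train_y, mask, x, k) != y
                for x, y in zip(val_X, val_y))
    return wrong / len(val_X)

# Toy one-dimensional data; the last training pattern is deliberately mislabeled.
train_X = [(0.0,), (0.1,), (0.2,), (0.9,), (1.0,), (1.1,), (0.15,)]
train_y = [0, 0, 0, 1, 1, 1, 1]
val_X = [(0.16,), (0.95,)]
val_y = [0, 1]

full_mask = [1, 1, 1, 1, 1, 1, 1]    # unedited classifier: every pattern kept
edited_mask = [1, 1, 1, 1, 1, 1, 0]  # edited classifier: mislabeled pattern dropped

full_error = error_rate(train_X, train_y, full_mask, val_X, val_y)
edited_error = error_rate(train_X, train_y, edited_mask, val_X, val_y)
print(full_error, edited_error)
```

In this toy case, dropping the single mislabeled pattern removes the only validation error, which is the effect editing aims for. A genetic algorithm would search over such masks instead of fixing one by hand.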
Bibliographical note
This work has been partially funded by the Comunidad de Madrid/Universidad de Alcalá (CCG07-UAH/TIC-1572), the Spanish Ministry of Education and Science (TEC2006-13883-C04-04/TCM), and the Fund for Foreign Scholars in University Research and Teaching Programs (Grant no. B07033) in China.
- Evolutionary algorithms
- Genetic algorithms
- Machine learning
- Nearest neighbour classifiers