In this research, we implemented a parallel EP on consumer-level graphics processing units (GPUs) and proposed indirect indexing and several other optimization techniques to maximize efficiency. The parallel EP is a hybrid of the master-slave and fine-grained models. Competition and selection are performed by the CPU (i.e., the master), while fitness evaluation, mutation, and reproduction are performed by the GPU, which is essentially a massively parallel machine with shared memory. Unlike other fine-grained parallel computers such as the MasPar, a GPU allows each processor to communicate not only with nearby processors but with any other processor. Hence, more flexible fine-grained EAs can be implemented on a GPU. We conducted experiments comparing our parallel EP on a GPU with an ordinary EP on a CPU. The speed-up factor of our parallel EP ranges from 1.25 to 5.02 when the population size is large enough. Moreover, the execution time grows sub-linearly with the population size. Thus, our parallel EP is expected to be very useful for solving difficult problems that require huge population sizes. For future work, we plan to implement a parallel genetic algorithm on a GPU and compare it with the approach reported in this paper.
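
The division of labor summarized above can be illustrated with a minimal, CPU-only sketch. It is not the GPU implementation reported in this paper: the per-individual stage is only simulated with ordinary Python loops, and indirect indexing is not shown. The objective function `sphere`, the self-adaptive Gaussian mutation, the tournament size `q`, and all function names are assumptions chosen for illustration.

```python
import math
import random


def sphere(x):
    """Hypothetical test objective (minimization): sum of squares."""
    return sum(v * v for v in x)


def mutate(individual, rng):
    """Create one offspring using self-adaptive Gaussian mutation (an assumption)."""
    x, eta = individual
    n = len(x)
    tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
    tau_prime = 1.0 / math.sqrt(2.0 * n)
    common = rng.gauss(0.0, 1.0)  # one draw shared by all components
    new_x, new_eta = [], []
    for xj, ej in zip(x, eta):
        new_x.append(xj + ej * rng.gauss(0.0, 1.0))
        new_eta.append(ej * math.exp(tau_prime * common + tau * rng.gauss(0.0, 1.0)))
    return new_x, new_eta


def data_parallel_stage(parents, rng):
    """Stand-in for the GPU side: every individual is processed independently."""
    offspring = [mutate(p, rng) for p in parents]
    population = parents + offspring
    fitness = [sphere(x) for x, _ in population]
    return population, fitness


def master_stage(population, fitness, mu, q, rng):
    """Stand-in for the CPU side: tournament competition followed by selection."""
    wins = []
    for f in fitness:
        opponents = [rng.randrange(len(population)) for _ in range(q)]
        wins.append(sum(1 for o in opponents if f <= fitness[o]))
    # Rank by number of wins, breaking ties in favor of lower objective value.
    ranked = sorted(range(len(population)), key=lambda i: (-wins[i], fitness[i]))
    return [population[i] for i in ranked[:mu]]


def run_ep(mu=20, dim=5, generations=50, q=10, seed=0):
    """Run the sketch and return (initial best, final best) objective values."""
    rng = random.Random(seed)
    parents = [([rng.uniform(-5.0, 5.0) for _ in range(dim)], [1.0] * dim)
               for _ in range(mu)]
    initial_best = min(sphere(x) for x, _ in parents)
    for _ in range(generations):
        population, fitness = data_parallel_stage(parents, rng)
        parents = master_stage(population, fitness, mu, q, rng)
    final_best = min(sphere(x) for x, _ in parents)
    return initial_best, final_best
```

In this sketch, each iteration of `data_parallel_stage` touches only one individual, which is the kind of work that maps naturally onto many GPU processors, whereas `master_stage` compares individuals against one another and is kept on the master. Calling `run_ep()` returns the best objective value before and after evolution; because the best individual always wins every tournament comparison, the final value cannot be worse than the initial one.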