Abstract
Identifying and segmenting the various kinds of highly profitable customers is a critical task for telecom enterprises. However, the continual growth in the dimensionality and volume of data makes traditional approaches inefficient or even infeasible. To overcome these problems, this paper proposes a novel, statistically motivated parallel large sum submatrix biclustering algorithm based on Spark MapReduce (SP-PLSS). Unlike traditional approaches, SP-PLSS is driven by a newly proposed bicluster model and clusters customer samples and consumer attributes simultaneously, so it can finely identify and segment highly profitable customers who share similar upscale purchasing behavior on a small fraction of the attributes. Furthermore, by implementing the MapReduce framework on the Spark platform, SP-PLSS significantly improves efficiency and scalability when handling large datasets. Extensive experiments on a real-world telecom consumption dataset and on large synthetic datasets show that, compared with competing algorithms, SP-PLSS can provide operators with a comparatively advanced, scalable, and feasible solution for identifying and segmenting highly profitable telecom customers, with superior clustering results.
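To make the idea of a "large sum submatrix" bicluster concrete, the sketch below searches a customers × attributes matrix for k rows and l columns whose entries have a large total sum, alternating between picking the best rows for the current columns and the best columns for the current rows. This is a generic alternating heuristic chosen for illustration, not the SP-PLSS algorithm or its statistically motivated bicluster score, which the abstract does not specify. The column-sum step imitates a MapReduce pattern (per-partition partial sums, then an element-wise reduce) in plain Python; the names `large_sum_submatrix`, `column_sums`, `partition`, and the parameters `k`, `l`, and `n_parts` are hypothetical.

```python
import random
from functools import reduce


def partition(rows, n_parts):
    # Hypothetical stand-in for Spark partitions: split row indices into n_parts chunks.
    return [rows[i::n_parts] for i in range(n_parts)]


def column_sums(matrix, row_idx, n_parts=4):
    """Column sums over the selected rows, computed in a map/reduce style."""
    ncols = len(matrix[0])

    def map_partition(part):
        # Map: each partition produces a partial column-sum vector for its rows.
        acc = [0.0] * ncols
        for r in part:
            for j, v in enumerate(matrix[r]):
                acc[j] += v
        return acc

    partials = [map_partition(p) for p in partition(row_idx, n_parts)]
    # Reduce: add the partial vectors element-wise.
    return reduce(lambda a, b: [x + y for x, y in zip(a, b)], partials)


def row_sums(matrix, col_idx):
    # Sum of each row restricted to the selected columns.
    return [sum(row[j] for j in col_idx) for row in matrix]


def top_k(values, k):
    # Indices of the k largest values (ties keep their original index order).
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def large_sum_submatrix(matrix, k, l, n_iter=20, seed=0):
    """Alternating search for a k x l submatrix with a large total sum (illustrative only)."""
    rng = random.Random(seed)
    cols = sorted(rng.sample(range(len(matrix[0])), l))  # random initial columns
    rows = []
    for _ in range(n_iter):
        new_rows = sorted(top_k(row_sums(matrix, cols), k))
        new_cols = sorted(top_k(column_sums(matrix, new_rows), l))
        if new_rows == rows and new_cols == cols:
            break  # converged: neither step changes the selection
        rows, cols = new_rows, new_cols
    score = sum(matrix[r][c] for r in rows for c in cols)
    return rows, cols, score


if __name__ == "__main__":
    # Toy data: 10 customers x 8 attributes with one planted high-value block.
    data = [[0.0] * 8 for _ in range(10)]
    for r in (2, 3, 4):
        for c in (1, 5):
            data[r][c] = 5.0
    print(*large_sum_submatrix(data, k=3, l=2))  # prints: [2, 3, 4] [1, 5] 30.0
```

Each alternating step can only keep or increase the submatrix sum, so the search stops at a local optimum; in a real Spark job, the per-partition map and the element-wise reduce would run across the cluster rather than in a Python list.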
Original language | English |
---|---|
Article number | 8656467 |
Pages (from-to) | 28696-28711 |
Number of pages | 16 |
Journal | IEEE Access |
Volume | 7 |
Publication status | Published - 1 Mar 2019 |
Externally published | Yes |
Bibliographical note
This work was supported in part by the National Natural Science Foundation of China under Grant 81871433 and Grant 71371063, and in part by the Basic Research Project of Knowledge Innovation Program in Shenzhen under Grant JCYJ20150324140036825.
Keywords
- Biclustering
- cloud computing
- clustering effectiveness evaluation
- Spark
- MapReduce
- market segmentation
- parallel computing