Potential Off-Grid User Prediction System Based on Spark
LI Xuebing1,3, SUN Ying1,2, ZHUANG Fuzhen1,2, HE Jia1,2, ZHANG Zhao1,2, ZHU Shijun4, and HE Qing1,2
( 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China;
2. University of Chinese Academy of Sciences, Beijing 100049, China;
3. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China;
4. ZTE Corporation, Shenzhen 518057, China )
As competition among communication operators intensifies, accurately predicting potential off-grid users becomes increasingly important. Solving this problem requires attention to the effectiveness of the learning algorithm, the efficiency of data processing, and other factors. In this paper, taking a practical application point of view, we propose a Spark-based prediction system for potential off-grid customers that covers data pre-processing, feature selection, model building, and effective display of results. In addition, we use the Spark parallel framework to parallelize gcForest, a novel decision tree ensemble approach, so that it can be applied to practical problems such as off-grid prediction. Experiments on two real-world datasets demonstrate that the proposed system can handle large-scale data for the off-grid user prediction problem and that the proposed parallel gcForest achieves satisfactory performance.
data mining; off-grid prediction; Spark; parallel computing; deep forest