Geo-Distributed Machine Learning: Framework and Technology Exceeding LAN Speed

Published: 2020-10-22 Authors: LI Zonghang, YU Hongfang, WANG Yi

 

 

 
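LI Zonghang1, YU Hongfang1,3, WANG Yi2,3
(1. University of Electronic Science and Technology of China, Chengdu 611731, China; 2. Southern University of Science and Technology, Shenzhen 518055, China; 3. Peng Cheng Laboratory, Shenzhen 518055, China)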
 
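Abstract: This paper proposes GeoMX, a software framework for geo-distributed machine learning that optimizes communication through both its communication architecture and its compressed-transmission mechanism. Accordingly, a hierarchical parameter server (HiPS) architecture and a bi-directional sparse gradient transmission (BiSparse) technique are designed to reduce, respectively, the number and the size of the gradient flows sent over the wide area network. Deployed on data centers distributed across a wide area network, GeoMX achieves up to 4 times the training efficiency of MXNET running inside a single data center, with almost no loss of accuracy.
Keywords: big data; artificial intelligence; geo-distributed machine learning; gradient sparsification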
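To illustrate why a hierarchical parameter server can cut the number of gradient flows crossing the wide area network, here is a minimal sketch in plain Python. It is not GeoMX code: the function names, the two data centers `dc_a` and `dc_b`, and the assumption that the global server sits outside every data center are all hypothetical choices made for the illustration.

```python
from typing import Dict, List, Tuple

Gradient = List[float]


def sum_gradients(grads: List[Gradient]) -> Gradient:
    """Element-wise sum of equally sized gradient vectors."""
    return [sum(values) for values in zip(*grads)]


def flat_aggregate(workers_by_dc: Dict[str, List[Gradient]]) -> Tuple[Gradient, int]:
    """Flat layout (assumed): every worker sends its gradient over the WAN."""
    all_grads = [g for grads in workers_by_dc.values() for g in grads]
    wan_flows = len(all_grads)  # one WAN flow per worker
    return sum_gradients(all_grads), wan_flows


def hierarchical_aggregate(workers_by_dc: Dict[str, List[Gradient]]) -> Tuple[Gradient, int]:
    """Two-level layout (assumed): sum inside each data center over the LAN,
    then send a single partial sum per data center over the WAN."""
    partial_sums = [sum_gradients(grads) for grads in workers_by_dc.values()]
    wan_flows = len(partial_sums)  # one WAN flow per data center
    return sum_gradients(partial_sums), wan_flows


# Hypothetical toy setup: two data centers with two and three workers.
workers_by_dc = {
    "dc_a": [[1.0, 2.0], [3.0, 4.0]],
    "dc_b": [[0.5, 0.5], [1.5, 1.5], [2.0, 2.0]],
}

flat_total, flat_flows = flat_aggregate(workers_by_dc)
hier_total, hier_flows = hierarchical_aggregate(workers_by_dc)
print(flat_total, flat_flows)
print(hier_total, hier_flows)
```

Both layouts produce the same summed gradient. In this toy setup the flat layout needs one WAN flow per worker, while the two-level layout needs one per data center.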


Geo-Distributed Machine Learning: Framework and Technology Exceeding LAN Speed
 
LI Zonghang1, YU Hongfang1,3, WANG Yi2,3
(1. University of Electronic Science and Technology of China, Chengdu 611731, China; 2. Southern University of Science and Technology, Shenzhen 518055, China; 3. Peng Cheng Laboratory, Shenzhen 518055, China)
 
Abstract: A software framework called GeoMX is proposed for geo-distributed machine learning. GeoMX improves communication efficiency through both its communication architecture and its compression scheme: a hierarchical parameter server (HiPS) architecture reduces the number of gradient flows transmitted over the wide area network (WAN), and a bi-directional sparsification (BiSparse) technique reduces their size. In the experiments, GeoMX is deployed on multiple data centers distributed across a WAN, while MXNET is deployed in a single data center on a local area network (LAN). The results show that GeoMX is up to 4 times faster than MXNET with little loss of accuracy.
Keywords: big data; artificial intelligence; geo-distributed machine learning; gradient sparsification
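The idea of sparsifying gradients in both directions can be sketched as follows. The abstract does not say how BiSparse selects the entries to send, so this example assumes top-k selection by magnitude, a common choice in gradient sparsification. The helper names `top_k_sparsify` and `densify` and the toy gradients are hypothetical, not part of GeoMX or MXNET.

```python
from typing import Dict, List


def top_k_sparsify(dense: List[float], k: int) -> Dict[int, float]:
    """Keep the k entries with the largest magnitude as {index: value}."""
    ranked = sorted(range(len(dense)), key=lambda i: abs(dense[i]), reverse=True)
    return {i: dense[i] for i in ranked[:k]}


def densify(sparse: Dict[int, float], size: int) -> List[float]:
    """Expand {index: value} back into a dense vector padded with zeros."""
    out = [0.0] * size
    for i, v in sparse.items():
        out[i] = v
    return out


# Hypothetical toy gradients from two workers, keeping k = 2 entries.
worker_grads = [
    [0.1, -2.0, 0.0, 0.5],
    [1.5, 0.2, -0.1, 0.25],
]
size, k = 4, 2

# Push direction: each worker sends only its k largest-magnitude entries.
pushed = [top_k_sparsify(g, k) for g in worker_grads]

# The server sums the sparse contributions it received.
aggregated = [0.0] * size
for sparse in pushed:
    for i, v in sparse.items():
        aggregated[i] += v

# Pull direction: the server sparsifies the aggregate again before
# sending it back, so the return traffic is also k entries.
pulled = top_k_sparsify(aggregated, k)
update = densify(pulled, size)
print(pushed, aggregated, update)
```

With k = 2 out of 4 entries, each direction carries half of the values of a dense transfer, plus their indices. Practical schemes often also accumulate the dropped entries locally and add them to later gradients; that step is omitted here.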

 

 
