Optimization Techniques of Network Communication in Distributed Deep Learning Systems

Published: 2020-10-22 Authors: DONG Dezun, OUYANG Shuo

 

DONG Dezun, OUYANG Shuo
(National University of Defense Technology, Changsha 410073, China)

Abstract: To address the problem of customizing and optimizing the full protocol stack for network communication in distributed deep learning systems (DDLS), this paper proposes a classification of network communication optimization techniques for DDLS. From the perspective of the layers of the network protocol stack, it analyzes the key techniques of communication traffic scheduling and network communication execution. Taking a top-down view, it discusses several basic approaches to relieving the communication bottleneck of distributed deep learning at the algorithm level and at the network level, and outlines the opportunities and challenges for future work.
Keywords: distributed deep learning systems; communication optimization; full protocol stack

 
