Review of Parallel Deep Neural Network

Uploaded 2018-06-27 · PDF · 23 pages · 877.32 KB

CHINESE JOURNAL OF COMPUTERS (计算机学报), published online 2018

Review of Parallel Deep Neural Network

ZHU Hu-Ming 1), LI Pei 1), JIAO Li-Cheng 1), YANG Shu-Yuan 1), HOU Biao 1)

1) (Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, International Research Center of Intelligent Perception and Computation, International Collaboration Joint Lab in Intelligent Perception and Computation, Xidian University, Xi'an 710071)

Abstract (translated from the Chinese): Neural networks are one of the core research topics in artificial intelligence. Over their seventy-year history, neural networks have undergone a major transformation from shallow to deep architectures. Deep neural networks improve their feature-extraction and data-fitting capability by increasing model depth, and show clear advantages over shallow models on problems such as natural language processing, autonomous driving, and image analysis. As training data grow and models become more complex, the cost of training deep neural networks keeps rising, and parallelization has become an urgent need for keeping their applications timely. In recent years, the hardware architectures of computing platforms have iterated rapidly and their computing power has grown dramatically; in particular, the fast development of multi-core, many-core, and distributed heterogeneous computing platforms provides the hardware foundation for parallelizing deep neural networks. On the other hand, an increasingly rich set of parallel programming frameworks bridges the gap between computing devices and the parallelization of deep neural networks. This paper first introduces the background of deep neural networks and their common computational models. It then compares multi-core processors, many-core processors, and heterogeneous computing devices in terms of power consumption, computing capability, and the difficulty of developing parallel algorithms, and surveys parallel programming frameworks in terms of the programming languages and hardware devices they support and their programming difficulty. Taking AlexNet as an example, it briefly illustrates how the two approaches of model parallelism and data parallelism are carried out. Next, it compares popular open-source deep-learning software in terms of supported hardware, parallel interfaces, and parallel modes, and experimentally compares and analyzes the parallel performance of convolutional neural networks on multi-core CPUs and GPUs. Finally, it discusses future trends and open challenges of parallel deep neural networks.

Keywords: deep neural network; parallel computing; heterogeneous computing; model parallelism; data parallelism
CLC classification: TP311

Abstract: Neural networks are one of the main research fields in artificial intelligence. Over the past seventy years, their development has passed through two important stages: shallow neural networks and deep neural networks. Shallow neural networks tend to overfit when applied to very complex problems. Compared with shallow networks, deep neural networks have clear advantages in feature extraction and data fitting as they become deeper, and they have been widely applied in industry, for example in natural language processing, autonomous driving, and image analysis. However, the cost of training deep neural networks grows with the size of the training data and the complexity of the models, and parallelization has become a necessity for reducing training time. The rapid architectural evolution of computing platforms and the leap in their computing capability, especially the development of multi-core and many-core computing devices and the emergence of distributed heterogeneous computing technology, provide suitable hardware resources for accelerating deep neural networks. A parallel programming model allows users to specify different kinds of parallelism that can easily be mapped to parallel hardware architectures and that facilitate the expression of parallel algorithms; parallel programming frameworks, in turn, evolve rapidly to meet the performance demands of high-performance computing. Choosing an appropriate combination of software and hardware, and fully exploiting both fine-grained and coarse-grained parallelism for efficient neural network acceleration, remain challenging tasks. This paper starts with an introduction to neural network models, optimization algorithms for minimizing the cost function, the most popular open-source software frameworks, and research progress in both academia and industry. It then discusses hardware platforms and parallel programming models for deep neural network applications. Hardware such as multi-core central processing units (CPUs), graphics processing units (GPUs), many-integrated-core (MIC) processors, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs) is summarized from three aspects: power consumption, computing power, and the challenges of developing parallel algorithms. Parallel programming models, whose open-source and commercial implementations include CUDA (compute unified device architecture), OpenCL (open computing language), OpenMP (open multi-processing), MPI (message passing interface), and Spark, are compared with respect to programming language, available hardware, and the difficulty of parallelizing neural networks. In addition, the principles of model parallelism and data parallelism for deep neural networks are described, and their use to parallelize AlexNet is illustrated. A comparison of six open-source deep-learning software systems (Caffe, TensorFlow, MXNet, CNTK, Torch, and Theano) is then presented in terms of parallelization strategies, supported hardware, parallel modes, and so on. Next, state-of-the-art work on parallel neural networks is reviewed, and multi-level parallel methods for the training and inference process on different computing devices are summarized. Parallel convolutio…

This work was supported by the National Natural Science Foundation of China (61303032), the National Key Basic Research Program of China (973 Program, 2013CB329402), the Major Research Plan of the National Natural Science Foundation of China (91438201, 91438103), and the Ministry of Education "…and Innovative Research Team Development Program" (IRT_15R53). ZHU Hu-Ming, male, born in 1978, Ph.D., associate professor; his research interests include high-performance computing and its applications, and large-scale parallel machine learning. CCF member. Mobile: 13759917628. E-mail: . LI Pei, female, born in 1992, M.S. candidate; her research interest is high-performance computing. JIAO Li-Cheng, male, born in 1959, Ph.D., professor, Ph.D. supervisor, senior member of the China Computer Federation (CCF); his research interests include intelligent perception and image understanding. YANG Shu-Yuan, female, born in 1978, Ph.D., professor, Ph.D. supervisor; her research interests include intelligent signal and image processing, and machine learning. HOU Biao, male, born in 1974, Ph.D., professor, Ph.D. supervisor; his research interest is synthetic aperture radar image processing.
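The abstract contrasts model parallelism and data parallelism using AlexNet. As a minimal, framework-free sketch of the two schemes on a toy fully connected layer (all names, shapes, and the two-worker setup here are invented for illustration and are not taken from the paper), data parallelism replicates the weights and splits the batch, while model parallelism splits the weights and replicates the batch:

```python
import numpy as np

# Toy layer y = x @ W with squared loss, used only to illustrate the two
# parallelization schemes described in the survey. Real systems (e.g. the
# AlexNet setup the paper discusses) run the shards on separate devices
# and exchange gradients/activations over an interconnect; here the
# "workers" are simulated sequentially.

rng = np.random.default_rng(0)
n_samples, n_in, n_out = 8, 4, 6
X = rng.standard_normal((n_samples, n_in))
Y = rng.standard_normal((n_samples, n_out))
W = rng.standard_normal((n_in, n_out))

def grad(Xb, Yb, W):
    """Gradient of the mean squared error 0.5*||Xb @ W - Yb||^2 w.r.t. W."""
    return Xb.T @ (Xb @ W - Yb) / len(Xb)

n_workers = 2

# --- Data parallelism: replicate W, split the batch across workers,
# then combine the per-worker gradients (synchronous SGD step).
shards_X = np.array_split(X, n_workers)
shards_Y = np.array_split(Y, n_workers)
local_grads = [grad(xs, ys, W) for xs, ys in zip(shards_X, shards_Y)]
# Weight each shard by its size so the combined result equals the
# full-batch gradient exactly.
weights = [len(xs) / n_samples for xs in shards_X]
g_data_parallel = sum(w * g for w, g in zip(weights, local_grads))
assert np.allclose(g_data_parallel, grad(X, Y, W))

# --- Model parallelism: split W column-wise across workers; every worker
# sees the whole batch but computes only its slice of the output, and the
# slices are concatenated afterwards.
W_parts = np.array_split(W, n_workers, axis=1)
y_parts = [X @ Wp for Wp in W_parts]  # each worker's partial output
y_model_parallel = np.concatenate(y_parts, axis=1)
assert np.allclose(y_model_parallel, X @ W)
```

In the data-parallel scheme the per-worker gradients would normally be combined with an all-reduce; in the model-parallel scheme the concatenation corresponds to an all-gather of partial activations.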
