Pengcheng-PanGu: Large-Scale Autoregressive Pre-trained Chinese Language Model and Its Applications
Published: 2022-04-08  Authors: ZENG Wei, SU Teng, WANG Hui, TIAN Yonghong, GAO Wen

Pengcheng-PanGu: Large-Scale Autoregressive Pre-trained Chinese Language Model with Auto-Parallel Computation and Its Application

ZENG Wei1,2, SU Teng3, WANG Hui1, TIAN Yonghong1,2, GAO Wen1
(1. Pengcheng Laboratory, Shenzhen 518055, China; 2. Peking University, Beijing 100871, China; 3. Huawei Technologies Co., Ltd., Hangzhou 310052, China)

Abstract: This paper presents Pengcheng-PanGu, the world's first fully open-source large-scale autoregressive pre-trained Chinese language model, with up to 200 billion parameters. Pengcheng-PanGu was trained on the Pengcheng Cloud Brain II, using 1.1 TB of high-quality Chinese data collected from a wide range of domains. The training parallelism strategy is implemented with the auto-parallel capability of MindSpore, an all-scenario artificial intelligence computing framework, which composes five parallelism dimensions to scale the training task efficiently to 4 096 processors. Experimental results demonstrate the superior capability of Pengcheng-PanGu on a variety of Chinese natural language understanding and generation tasks under few-shot and zero-shot settings. Building on this, Pengcheng-PanGu has also achieved strong results in applications such as large-model compression, prompt tuning, multi-task learning, and continual learning.
Keywords: large-scale pre-trained language models; Pengcheng Cloud Brain II; large-scale distributed training; Chinese language understanding and generation; prompt tuning
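The abstract's five-dimensional parallelism works by composing several parallel axes (for example data, operator-level model, and pipeline parallelism) so that the product of their degrees exactly covers the device cluster. The sketch below illustrates only that bookkeeping in plain Python; the axis names and the 16 × 8 × 32 factorization are hypothetical, chosen so the product equals the 4 096 processors mentioned above, and are not taken from the paper or from MindSpore's API.

```python
from math import prod

def build_device_mesh(total_devices, axes):
    """Check that the per-axis parallel degrees exactly cover the cluster,
    then return a rank -> per-axis-coordinate mapping."""
    sizes = list(axes.values())
    if prod(sizes) != total_devices:
        raise ValueError(f"axes {axes} cover {prod(sizes)} devices, "
                         f"but the cluster has {total_devices}")
    mesh = []
    for rank in range(total_devices):
        coord, rest = {}, rank
        # Peel off coordinates from the innermost (last-listed) axis outward.
        for name, size in reversed(list(axes.items())):
            coord[name] = rest % size
            rest //= size
        mesh.append(coord)
    return mesh

# Hypothetical factorization of the 4 096 processors: 16-way data
# parallelism x 8-way operator (tensor) parallelism x 32 pipeline stages.
mesh = build_device_mesh(4096, {"data": 16, "operator": 8, "pipeline": 32})
print(len(mesh))   # 4096 ranks, one per processor
print(mesh[0])     # rank 0 sits at coordinate 0 on every axis
```

In a real system such a mesh determines which ranks exchange gradients (same model shard, different data slices) and which exchange activations (adjacent pipeline stages); a framework with auto-parallel support searches over factorizations like this one automatically.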
