A Survey on the Evolution of Large Language Model Algorithms

Published: 2024-04-25  Authors: ZHU Xuanpeng, YAO Haidong, LIU Jun, XIONG Xiankui





Abstract: Large language models based on the Transformer architecture exhibit powerful capabilities and represent a major step toward artificial general intelligence (AGI). The evolution of large language model architectures and algorithms follows two technical routes: improving inference efficiency and improving model capability. This paper describes the mainstream technical solutions and ideas along both routes. Methods for improving inference efficiency include distributed inference, computation optimization, memory access optimization, and quantization. Model capability is improved mainly by introducing new architectures, such as the mixture of experts (MoE) model and the state space model (SSM).

Keywords: large language model; Transformer; attention
