A Survey on the Evolution of Large Language Model Algorithms

Published: 2024-04-25  Authors: ZHU Xuanpeng, YAO Haidong, LIU Jun, XIONG Xiankui





Abstract: Large language models based on the Transformer architecture exhibit powerful capabilities and represent a major step toward artificial general intelligence (AGI). The evolution of large language model architectures and algorithms follows two technical routes: improving inference efficiency and improving model capability. This paper describes the mainstream technical solutions and ideas along both routes. Methods for improving inference efficiency include distributed inference, computation optimization, memory access optimization, and quantization. Model capability is improved mainly by introducing new architectures, such as the mixture of experts (MoE) model and the state space model (SSM).

Keywords: large language model; Transformer; attention
