
Published: 2024-04-25   Authors: ZHU Xuanpeng (朱炫鹏), YAO Haidong (姚海东), LIU Jun (刘隽), XIONG Xiankui (熊先奎)

Abstract: Large language models (LLMs) based on the Transformer architecture have shown powerful capabilities and represent major progress toward artificial general intelligence (AGI). The evolution of LLM architectures and algorithms follows two technical paths: improving inference efficiency and improving model capability. This paper describes the mainstream technical solutions and ideas for both paths. Methods for improving inference efficiency include distributed inference, computation optimization, memory access optimization, and quantization. To improve model capability, new architectures such as mixture of experts (MoE) and state space models (SSMs) are introduced.

Keywords: large language model; transformer; attention
