Abstract: Guided by scaling laws, modern large models continue to expand, and parameter counts approaching the trillion scale have created a series of difficulties across algorithms, hardware, and engineering. The inherent computational inefficiency of the Transformer architecture has become an increasingly prominent bottleneck, prompting researchers to reflect deeply on the path toward artificial general intelligence (AGI). On one hand, improvements to the existing autoregressive Transformer architecture are being pursued along two main avenues: algorithmic refinements such as improved attention mechanisms, low-precision quantization, and parameter sharing, and engineering advances such as cluster system optimization and hardware system upgrades. On the other hand, next-generation computational paradigms for large AI models are evolving away from next-token prediction as their core mechanism, along two distinct paths: first, architectures that predict at higher levels of abstraction, such as diffusion models and joint-embedding predictive architectures; and second, approaches built from the first principles of physics and the characteristics of computational substrates, including dynamical, thermodynamic, and energy-based models. Meanwhile, combining novel computational paradigms with novel computational substrates holds the potential to fundamentally overcome the traditional separation between AI algorithms and hardware, offering an efficient path toward AGI.
Keywords: large language model; computational paradigm; artificial intelligence