A Dedicated Hardware Architecture for Large Models Based on Integrated Compute-in-Memory Chiplets

Published: 2024-04-25    Authors: 何斯琪, 穆琛, 陈迟晓

Abstract: Artificial intelligence (AI) large models, represented by ChatGPT, are growing exponentially in both parameter scale and system compute demand. This paper studies dedicated hardware architectures for large models, providing a detailed analysis of the bandwidth bottleneck encountered when deploying large models and of its significant impact on current data centers. To address this bottleneck, a solution based on integrated compute-in-memory (CIM) chiplets is proposed, aiming to relieve data-transfer pressure while improving the energy efficiency of large-model inference. Furthermore, the co-design of model lightweighting and in-memory compression under the CIM architecture is explored, enabling dense mapping of sparse networks onto CIM hardware and thereby significantly improving storage density and computational energy efficiency.

Keywords: large language model; compute-in-memory; chiplet; in-memory compression

