Toward that goal, researchers from MIT and the MIT-IBM Watson AI Lab developed a rapid prediction tool that tells data center operators how much power will be consumed by running a particular AI workload on a certain processor or AI accelerator chip.

Their method produces reliable power estimates in a few seconds, unlike traditional modeling techniques that can take hours or even days to yield results. Moreover, their prediction tool can be applied to a wide range of hardware configurations — even emerging designs that haven’t been deployed yet.

Data center operators could use these estimates to effectively allocate limited resources across multiple AI models and processors, improving energy efficiency. In addition, this tool could allow algorithm developers and model providers to assess potential energy consumption of a new model before they deploy it.

“The AI sustainability challenge is a pressing question we have to answer. Because our estimation method is fast, convenient, and provides direct feedback, we hope it makes algorithm developers and data center operators more likely to think about reducing energy consumption,” says Kyungmi Lee, an MIT postdoc and lead author of a paper on this technique.

She is joined on the paper by Zhiye Song, an electrical engineering and computer science (EECS) graduate student; Eun Kyung Lee and Xin Zhang, research managers at IBM Research and the MIT-IBM Watson AI Lab; Tamar Eilam, IBM Fellow, chief scientist of sustainable computing at IBM Research, and a member of the MIT-IBM Watson AI Lab; and senior author Anantha P. Chandrakasan, MIT provost, Vannevar Bush Professor of Electrical Engineering and Computer Science, and a member of the MIT-IBM Watson AI Lab. The research is being presented this week at the IEEE International Symposium on Performance Analysis of Systems and Software.

Inside a data center, thousands of powerful graphics processing units (GPUs) perform operations to train and deploy AI models. The power consumption of a particular GPU will vary based on its configuration and the workload it is handling.

Many traditional methods used to predict energy consumption involve breaking a workload into individual steps and emulating how each module inside the GPU is being utilized one step at a time. But AI workloads like model training and data preprocessing are extremely large and can take hours or even days to simulate in this manner.

“As an operator, if I want to compare different algorithms or configurations to find the most energy-efficient manner to proceed, if a single emulation is going to take days, that is going to become very impractical,” Lee says.

To speed up the prediction process, the MIT researchers sought to use less-detailed information that could be estimated faster. They found that AI workloads often have many repeatable patterns. They could use these patterns to generate the information needed for reliable but quick power estimation.
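To illustrate why repeatable patterns help, here is a toy sketch (not EnergAIzer itself) contrasting a step-by-step walk over a workload trace with an estimate that costs each distinct kernel once and multiplies by its repeat count. The kernel names, the per-kernel energy values, and the 48-layer trace are all hypothetical.

```python
from collections import Counter

# Hypothetical per-kernel energy costs in joules; illustrative numbers only.
KERNEL_ENERGY_J = {"matmul": 0.020, "softmax": 0.003, "layernorm": 0.002}


def emulate_step_by_step(trace):
    """Visit every step of the trace in order, as a step-by-step emulator would.

    In a real emulator each step is expensive to model; here it is a lookup.
    """
    total = 0.0
    for kernel in trace:
        total += KERNEL_ENERGY_J[kernel]
    return total


def estimate_from_patterns(trace):
    """Cost each distinct kernel once, then scale by how often it repeats."""
    counts = Counter(trace)
    return sum(KERNEL_ENERGY_J[kernel] * n for kernel, n in counts.items())


# A transformer-like trace: the same block of kernels repeated for 48 layers.
layer = ["layernorm", "matmul", "softmax", "matmul", "layernorm", "matmul"]
trace = layer * 48

print(emulate_step_by_step(trace))    # loops over all 288 steps
print(estimate_from_patterns(trace))  # does work for only 3 distinct kernels
```

Both functions return the same total. The second one's cost grows with the number of distinct kernels rather than the length of the trace, which is the kind of saving that repeated structure makes possible.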

In many cases, algorithm developers write programs to run as efficiently as possible on a GPU. For instance, they use well-structured optimizations to distribute the work across parallel processing cores and move chunks of data around in the most efficient manner.
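As a hypothetical illustration of the "regular structure" such optimizations create, the sketch below counts the output tiles, scheduling waves, and bytes moved for a tiled matrix multiply directly from its loop structure, without emulating the GPU. The tile sizes, the 108-core count, and the no-cache-reuse assumption are illustrative choices, not details from the paper.

```python
import math


def tile_schedule(m, n, k, tile_m, tile_n, num_cores, bytes_per_elem=2):
    """Count the work in a tiled (m x k) @ (k x n) matrix multiply.

    Each output tile of size tile_m x tile_n is assigned to one core. Tiles
    that cannot all run at once are processed in successive "waves".
    """
    tiles = math.ceil(m / tile_m) * math.ceil(n / tile_n)
    waves = math.ceil(tiles / num_cores)
    # Each tile reads a tile_m x k slice of A and a k x tile_n slice of B,
    # then writes one tile_m x tile_n block of C (no cache reuse assumed).
    bytes_per_tile = (tile_m * k + k * tile_n + tile_m * tile_n) * bytes_per_elem
    return {"tiles": tiles, "waves": waves, "bytes_moved": tiles * bytes_per_tile}


# Hypothetical GPU with 108 parallel cores and 16-bit (2-byte) values.
schedule = tile_schedule(4096, 4096, 4096, tile_m=128, tile_n=128, num_cores=108)
print(schedule)  # {'tiles': 1024, 'waves': 10, 'bytes_moved': 2181038080}
```

Real kernels reuse data through caches and shared memory, so actual memory traffic is lower. The point is only that these counts follow from a few integers rather than from a cycle-by-cycle simulation.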

“These optimizations that software developers use create a regular structure, and that is what we are trying to leverage,” explains Lee.

The researchers developed a lightweight estimation model, called EnergAIzer, that captures the power usage pattern of a GPU from those optimizations.

But while their estimation was fast, the researchers found that it didn’t take all energy costs into account. For instance, every time a GPU runs a program, there is a fixed energy cost required for setting up and configuring that program. Then each time the GPU runs an operation on a chunk of data, an additional energy cost must be paid.
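The two costs described above can be captured in a simple additive model: a fixed launch cost per program plus a cost per chunk of data processed. The sketch below is a hypothetical illustration with made-up energy values, not the correction used in EnergAIzer. It shows why the same amount of work can cost more when it is split across many program launches.

```python
def program_energy(num_chunks, launch_energy_j, per_chunk_energy_j):
    """Fixed cost to set up and configure one program, plus a cost per data chunk."""
    return launch_energy_j + num_chunks * per_chunk_energy_j


# Hypothetical costs: 1 mJ to launch a program, 0.1 mJ per chunk processed.
LAUNCH_J = 1e-3
PER_CHUNK_J = 1e-4

# Same 10,000 chunks of work, split two different ways.
many_small = sum(program_energy(10, LAUNCH_J, PER_CHUNK_J) for _ in range(1000))
one_fused = program_energy(10_000, LAUNCH_J, PER_CHUNK_J)

print(f"1000 launches: {many_small:.3f} J")  # 2.000 J
print(f"1 launch:      {one_fused:.3f} J")   # 1.001 J
```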

Due to fluctuations in the hardware or conflicts in accessing or moving data, a GPU might not be able to use all available bandwidth, slowing operations down and drawing more energy over time.
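A toy model makes the "more energy over time" effect concrete. It assumes, hypothetically, that moving each byte costs a fixed amount of energy regardless of speed, while a constant static power is drawn for as long as the transfer lasts. Under that assumption, using only part of the peak bandwidth stretches the transfer and raises total energy. All parameter values are illustrative.

```python
def transfer_energy(bytes_moved, peak_bw_bytes_per_s, utilization,
                    static_power_w, energy_per_byte_j):
    """Return (time_s, energy_j) for moving data at a fraction of peak bandwidth."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    time_s = bytes_moved / (peak_bw_bytes_per_s * utilization)
    # Per-byte energy does not depend on speed; static power is paid for
    # as long as the transfer takes, so lower utilization costs more.
    energy_j = static_power_w * time_s + energy_per_byte_j * bytes_moved
    return time_s, energy_j


# Hypothetical: 2 GB over a 2 TB/s link, 100 W static power, 0.1 nJ per byte.
for util in (1.0, 0.5):
    t, e = transfer_energy(2e9, 2e12, util, static_power_w=100, energy_per_byte_j=1e-10)
    print(f"utilization {util:.0%}: {t * 1e3:.1f} ms, {e:.2f} J")
```

Halving the usable bandwidth doubles the time and the static-energy term, while the per-byte term stays fixed.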

To include these additional costs and variances, the researchers gathered real measurements from GPUs to generate correction terms they applied to their estimation model.
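The article does not describe the form of the correction terms. As one plausible sketch, assume a linear correction (measured ≈ scale × estimate + offset) fitted by ordinary least squares to pairs of raw model estimates and real GPU measurements. The calibration data below is invented for illustration.

```python
def fit_linear_correction(estimates, measurements):
    """Least-squares fit of measured ~= scale * estimate + offset."""
    n = len(estimates)
    if n < 2 or n != len(measurements):
        raise ValueError("need at least two (estimate, measurement) pairs")
    mean_x = sum(estimates) / n
    mean_y = sum(measurements) / n
    sxx = sum((x - mean_x) ** 2 for x in estimates)
    if sxx == 0:
        raise ValueError("estimates must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(estimates, measurements))
    scale = sxy / sxx
    offset = mean_y - scale * mean_x
    return scale, offset


def corrected_estimate(raw_estimate_j, scale, offset):
    """Apply the fitted correction to a new raw estimate."""
    return scale * raw_estimate_j + offset


# Hypothetical calibration data: raw model estimates vs. measured GPU energy (J).
raw = [1.0, 2.0, 3.0, 4.0]
measured = [1.6, 2.7, 3.8, 4.9]

scale, offset = fit_linear_correction(raw, measured)
print(f"scale={scale:.2f}, offset={offset:.2f} J")  # scale=1.10, offset=0.50 J
print(corrected_estimate(5.0, scale, offset))
```

Here the offset loosely plays the role of a fixed overhead the raw model missed, and a scale above 1 reflects costs that grow with the amount of work.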

“This way, we can get a fast estimation that is also very accurate,” she says.

In the end, a user can provide their workload information, like the AI model they want to run and the number and length of user inputs to process, and EnergAIzer will output an energy consumption estimation in a matter of seconds.
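For a rough idea of how a workload description maps to an energy figure, the hypothetical sketch below takes a model size plus the number and length of inputs, and applies the common rule of thumb of about 2 FLOPs per parameter per token for a transformer forward pass. The `Workload` fields and the joules-per-FLOP value are assumptions for illustration. This is not EnergAIzer's input format or method.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description a user might supply."""
    num_params: float        # model size, e.g. 7e9 parameters
    num_requests: int        # number of user inputs to process
    tokens_per_request: int  # average input length in tokens


def estimate_energy_j(workload, joules_per_flop):
    # Rule of thumb for a transformer forward pass: about 2 FLOPs per
    # parameter per token (ignores attention and memory traffic).
    flops = 2 * workload.num_params * workload.num_requests * workload.tokens_per_request
    return flops * joules_per_flop


w = Workload(num_params=7e9, num_requests=100, tokens_per_request=512)
print(f"{estimate_energy_j(w, joules_per_flop=1e-12):.1f} J")  # 716.8 J
```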

The user can also change the GPU configuration or adjust the operating speed to see how such design choices impact the overall power consumption.
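To show how an operating-speed choice can change total energy, here is a textbook-style toy model, not the one in the paper. It assumes compute-bound work whose runtime scales as 1/f, and a supply voltage that scales with frequency, so dynamic power (roughly C·V²·f) grows about as f³. The clock speeds and power figures are hypothetical.

```python
def energy_at_frequency(work_cycles, freq_ghz, base_freq_ghz,
                        base_dynamic_power_w, static_power_w):
    """Toy energy model for compute-bound work at a chosen clock speed.

    Assumes runtime scales as 1/f and that voltage scales with frequency,
    so dynamic power (~ C * V^2 * f) grows roughly as f^3.
    """
    ratio = freq_ghz / base_freq_ghz
    power_w = static_power_w + base_dynamic_power_w * ratio ** 3
    time_s = work_cycles / (freq_ghz * 1e9)
    return power_w * time_s


# Hypothetical GPU: 1.5 GHz base clock, 200 W dynamic + 50 W static power.
for f in (1.2, 1.5, 1.8):
    e = energy_at_frequency(1.5e9, f, 1.5, base_dynamic_power_w=200, static_power_w=50)
    print(f"{f} GHz: {e:.1f} J")
```

With these numbers, slowing the clock saves energy. If static power dominated, running slower could cost more instead, because the static power is paid over a longer runtime. That tradeoff is why a quick estimate across settings is useful.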

When the researchers tested EnergAIzer using real AI workload information from actual GPUs, it could estimate the power consumption with only about 8 percent error, which is comparable to traditional methods that can take hours to produce results.
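The article does not say which error metric produced the "about 8 percent" figure. Mean absolute percentage error is one common choice for this kind of comparison, sketched below with made-up values that are not results from the paper.

```python
def mean_absolute_percentage_error(predicted, measured):
    """Average of |predicted - measured| / |measured|, as a percentage."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [abs(p - m) / abs(m) for p, m in zip(predicted, measured)]
    return 100 * sum(errors) / len(errors)


# Made-up example values (watts), not results from the paper.
measured = [100.0, 200.0, 250.0]
predicted = [108.0, 188.0, 270.0]
print(f"{mean_absolute_percentage_error(predicted, measured):.1f}%")  # 7.3%
```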

Their method could also be used to predict the power consumption of future GPUs and emerging device configurations, as long as the hardware doesn’t change drastically in a short amount of time.

In the future, the researchers want to test EnergAIzer on the newest GPU configurations and scale the model up so it can be applied to many GPUs that are collaborating to run a workload.

“To really make an impact on sustainability, we need a tool that can provide a fast energy estimation solution across the stack, for hardware designers, data center operators, and algorithm developers, so they can all be more aware of power consumption. With this tool, we’ve taken one step toward that goal,” Lee says.

This research was funded, in part, by the MIT-IBM Watson AI Lab.