To improve data center efficiency, multiple storage devices are often pooled together over a network so many applications can share them. But even with pooling, significant device capacity remains underutilized due to performance variability across the devices.

MIT researchers have now developed a system that boosts the performance of storage devices by handling three major sources of variability simultaneously. Their approach delivers significant speed improvements over traditional methods that tackle only one source of variability at a time.

The system uses a two-tier architecture, with a central controller that makes big-picture decisions about which tasks each storage device performs, and local controllers for each machine that rapidly reroute data if that device is struggling.

The method, which can adapt in real time to shifting workloads, does not require specialized hardware. When the researchers tested this system on realistic tasks like AI model training and image compression, it nearly doubled the performance delivered by traditional approaches. By intelligently balancing the workloads of multiple storage devices, the system can increase overall data center efficiency.

“There is a tendency to want to throw more resources at a problem to solve it, but that is not sustainable in many ways. We want to be able to maximize the longevity of these very expensive and carbon-intensive resources,” says Gohar Chaudhry, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique. “With our adaptive software solution, you can still squeeze a lot of performance out of your existing devices before you need to throw them away and buy new ones.”

Solid-state drives (SSDs) are high-performance digital storage devices that allow applications to read and write data. For instance, an SSD can store vast datasets and rapidly send data to a processor for machine-learning model training.   

Pooling multiple SSDs together so many applications can share them improves efficiency, since not every application needs to use the entire capacity of an SSD at a given time. But not all SSDs perform equally, and the slowest device can limit the overall performance of the pool.

These inefficiencies arise from variability in SSD hardware and the tasks they perform.

To utilize this untapped SSD performance, the researchers developed Sandook, a software-based system that tackles three major forms of performance-hampering variability simultaneously. “Sandook” is an Urdu word that means “box,” to signify “storage.”

One type of variability is caused by differences in the age, amount of wear, and capacity of SSDs that may have been purchased at different times from multiple vendors.

The second type of variability is due to the mismatch between read and write operations occurring on the same SSD. To write new data to the device, the SSD must erase some existing data. This process can slow down data reads, or retrievals, happening at the same time.

The third source of variability is garbage collection, a process of gathering and removing outdated data to free up space. This process, which slows SSD operations, is triggered at random intervals that a data center operator cannot control.

“I can’t assume all SSDs will behave identically through my entire deployment cycle. Even if I give them all the same workload, some of them will be stragglers, which hurts the net throughput I can achieve,” Chaudhry explains.

To handle all three sources of variability, Sandook utilizes a two-tier structure. A global scheduler optimizes the distribution of tasks for the overall pool, while faster schedulers on each SSD react to urgent events and shift operations away from congested devices.
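
The two-tier idea can be illustrated with a minimal Python sketch. It is not Sandook's implementation: the `Device` class, the queue-depth congestion signal, the `CONGESTION_LIMIT` threshold, and the round-robin global policy are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    queue_depth: int = 0  # outstanding requests (assumed congestion signal)

CONGESTION_LIMIT = 8  # hypothetical threshold, not taken from the article

def global_assign(devices, request_id):
    """Slow tier: spread requests across the pool (plain round-robin here)."""
    return devices[request_id % len(devices)]

def local_route(target, devices):
    """Fast tier: divert away from a congested device to the least-loaded one."""
    if target.queue_depth < CONGESTION_LIMIT:
        return target
    return min(devices, key=lambda d: d.queue_depth)

def submit(devices, request_id):
    """Apply the global decision, then let the local check override it."""
    chosen = local_route(global_assign(devices, request_id), devices)
    chosen.queue_depth += 1
    return chosen.name

pool = [Device("ssd0"), Device("ssd1", queue_depth=8), Device("ssd2")]
placements = [submit(pool, i) for i in range(3)]
print(placements)
```

In this toy run, the request that the global tier would have sent to the congested `ssd1` is rerouted by the fast local check, while the other placements follow the global plan.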

The system overcomes delays from read-write interference by rotating which SSDs an application can use for reads and writes. This reduces the chance that reads and writes happen simultaneously on the same machine.
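
One way to picture this rotation is a schedule that hands the write role to a different SSD in each time slice. The sketch below is only an illustration: the function name `roles_for_epoch`, the notion of fixed epochs, and the `writers` parameter are hypothetical, and it assumes data is placed so that reads can be served from the SSDs not currently taking writes, which the article does not describe.

```python
def roles_for_epoch(ssds, epoch, writers=1):
    """Return (write_set, read_set) for an epoch; the write role rotates through the pool."""
    n = len(ssds)
    # Indices of the SSDs that accept writes during this epoch.
    write_idx = {(epoch * writers + k) % n for k in range(writers)}
    write_set = [s for i, s in enumerate(ssds) if i in write_idx]
    read_set = [s for i, s in enumerate(ssds) if i not in write_idx]
    return write_set, read_set

ssds = ["ssd0", "ssd1", "ssd2", "ssd3"]
schedule = [roles_for_epoch(ssds, e) for e in range(3)]
for epoch, (w, r) in enumerate(schedule):
    print(epoch, "writes:", w, "reads:", r)
```

Because no SSD appears in both sets within an epoch, reads and writes from this application never land on the same device at the same time.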

Sandook also profiles the typical performance of each SSD. It uses this information to detect when garbage collection is likely slowing operations down. Once detected, Sandook reduces the workload on that SSD by diverting some tasks until garbage collection is finished.
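
A simple way to express profile-based detection is to compare recent latency against a per-device baseline. The following sketch assumes a median-latency profile and a fixed `slowdown_factor`; both are illustrative choices, since the article does not say which metrics or thresholds Sandook uses.

```python
from statistics import median

def build_profile(latencies_us):
    """Summarize a device's typical latency (microseconds) as its median."""
    return median(latencies_us)

def likely_in_gc(baseline_us, recent_us, slowdown_factor=2.0):
    """Flag probable garbage collection when recent latency far exceeds the baseline."""
    return median(recent_us) > slowdown_factor * baseline_us

baseline = build_profile([100, 105, 98, 110, 102])
normal = likely_in_gc(baseline, [101, 99, 104])
slowed = likely_in_gc(baseline, [400, 350, 500])
print(baseline, normal, slowed)
```

The median keeps a single slow request from triggering a false alarm, at the cost of reacting a little later than a maximum-based check would.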

“If that SSD is doing garbage collection and can’t handle the same workload anymore, I want to give it a smaller workload and slowly ramp things back up. We want to find the sweet spot where it is still doing some work, and tap into that performance,” Chaudhry says.
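
The "smaller workload, then slowly ramp back up" behavior resembles the multiplicative-decrease, additive-increase pattern used in network congestion control. The sketch below borrows that pattern purely as an illustration; the article does not state Sandook's actual policy, and `floor`, `cut`, and `step` are hypothetical parameters.

```python
def adjust_share(share, in_gc, floor=0.2, cut=0.5, step=0.1):
    """Return the device's new workload share (1.0 = its full normal load)."""
    if in_gc:
        # Back off quickly, but keep a floor so the device still does some work.
        return max(floor, share * cut)
    # Recover gradually once the slowdown has passed.
    return min(1.0, share + step)

share = 1.0
history = []
for in_gc in [True, True, False, False]:
    share = adjust_share(share, in_gc)
    history.append(round(share, 2))
print(history)
```

The floor reflects the quoted goal of finding a point where the SSD "is still doing some work" rather than being taken out of rotation entirely.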

The SSD profiles also allow Sandook's global controller to assign workloads in a weighted fashion that considers the characteristics and capacity of each device.
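
Weighted assignment can be sketched as splitting a batch of requests in proportion to per-device weights. In this illustration the weights, device names, and the largest-remainder rounding are assumptions; how Sandook derives weights from its profiles is not described in the article.

```python
def weighted_split(total_requests, weights):
    """Split requests across devices in proportion to their (assumed) weights."""
    total_w = sum(weights.values())
    exact = {name: total_requests * w / total_w for name, w in weights.items()}
    alloc = {name: int(x) for name, x in exact.items()}
    leftover = total_requests - sum(alloc.values())
    # Give the remaining requests to the devices with the largest fractional parts.
    by_fraction = sorted(exact, key=lambda name: exact[name] - alloc[name], reverse=True)
    for name in by_fraction[:leftover]:
        alloc[name] += 1
    return alloc

plan = weighted_split(100, {"new_ssd": 3.0, "mid_ssd": 2.0, "worn_ssd": 1.0})
print(plan)
```

A device with a higher weight (for example, a newer or less worn SSD) receives a proportionally larger slice, and the rounding step guarantees that every request is assigned exactly once.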

Because the global controller sees the overall picture and the local controllers react on the fly, Sandook can simultaneously manage forms of variability that happen over different time scales. For instance, delays from garbage collection occur suddenly, while latency caused by wear and tear builds up over many months.

The researchers tested Sandook on a pool of 10 SSDs and evaluated the system on four tasks: running a database, training a machine-learning model, compressing images, and storing user data. Sandook boosted the throughput of each application by between 12 and 94 percent when compared to static methods, and improved the overall utilization of SSD capacity by 23 percent.

The system enabled SSDs to achieve 95 percent of their theoretical maximum performance, without the need for specialized hardware or application-specific updates.

“Our dynamic solution can unlock more performance for all the SSDs and really push them to the limit. Every bit of capacity you can save really counts at this scale,” Chaudhry says.

In the future, the researchers want to incorporate new protocols available on the latest SSDs that give operators more control over data placement. They also want to leverage the predictability in AI workloads to increase the efficiency of SSD operations.

“Flash storage is a powerful technology that underpins modern datacenter applications, but sharing this resource across workloads with widely varying performance demands remains an outstanding challenge. This work moves the needle meaningfully forward with an elegant and practical solution ready for deployment, bringing flash storage closer to its full potential in production clouds,” says Josh Fried, a software engineer at Google and incoming assistant professor at the University of Pennsylvania, who was not involved with this work.

This research was funded, in part, by the National Science Foundation, the U.S. Defense Advanced Research Projects Agency, and the Semiconductor Research Corporation.