A new method developed by MIT researchers can accelerate a privacy-preserving artificial intelligence training method by about 81 percent. This advance could enable a wider array of resource-constrained edge devices, like sensors and smartwatches, to deploy more accurate AI models while keeping user data secure.

The MIT researchers boosted the efficiency of a technique known as federated learning, which involves a network of connected devices that work together to train a shared AI model.

In federated learning, the model is broadcast from a central server to wireless devices. Each device trains the model using its local data and then transfers model updates back to the server. Data are kept secure because they remain on each device.
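
As a rough illustration of one such training round, the sketch below runs a toy linear model on two devices and averages their results on the server. The function names (`local_update`, `federated_round`), the model, and the learning rate are hypothetical choices made for this example; they are not taken from the researchers' paper.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def local_update(weights: List[float], data: List[Sample], lr: float = 0.1) -> List[float]:
    """Train a toy model y = w * x + b on one device's private data.

    Only the resulting weights leave the device; `data` never does.
    """
    w, b = weights
    for x, y in data:
        err = (w * x + b) - y  # prediction error on this sample
        w -= lr * err * x      # gradient step for w
        b -= lr * err          # gradient step for b
    return [w, b]


def federated_round(global_weights: List[float],
                    device_datasets: List[List[Sample]]) -> List[float]:
    """Broadcast the model, collect every device's update, and average them."""
    updates = [local_update(list(global_weights), d) for d in device_datasets]
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(global_weights))]


# Two devices, each holding a single private sample.
new_weights = federated_round([0.0, 0.0], [[(1.0, 2.0)], [(1.0, 4.0)]])
```

Note that `federated_round` cannot return until every device has reported, which is the synchronous behavior the rest of the article is concerned with.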

But not all devices in the network have enough memory, computational capability, and connectivity to store, train, and transfer the model back and forth with the server in a timely manner. This causes delays that worsen training performance.

The MIT researchers developed a technique to overcome these memory constraints and communication bottlenecks. Their method is designed to handle a heterogeneous network of wireless devices with varied limitations.

This new approach could make it more feasible for AI models to be used in high-stakes applications with strict security and privacy standards, like health care and finance.

“This work is about bringing AI to small devices where it is not currently possible to run these kinds of powerful models. We carry these devices around with us in our daily lives. We need AI to be able to run on these devices, not just on giant servers and GPUs, and this work is an important step toward enabling that,” says Irene Tenison, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this technique.

Many federated learning approaches assume all devices in the network have enough memory to train the full AI model, and stable connectivity to transmit updates back to the server quickly.

But these assumptions fall short with a network of heterogeneous devices, like smartwatches, wireless sensors, and mobile phones. These edge devices have limited memory and computational power, and often face intermittent network connectivity.

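The central server usually waits to receive model updates from all devices, then averages them to complete the training round. This process repeats until training is complete. Because the server waits for every device, a single slow or poorly connected device delays the whole round.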

“This lag time can slow down the training procedure or even cause it to fail,” Tenison says.

To overcome these limitations, the MIT researchers developed a new framework called FTTE (Federated Tiny Training Engine) that reduces the memory and communication overhead needed by each mobile device.

Their framework involves three main innovations.

First, rather than broadcasting the entire model to all devices, FTTE sends a smaller subset of model parameters instead, reducing the memory requirement for each device. Parameters are internal variables the model adjusts during training.

FTTE uses a special search procedure to identify parameters that will maximize the model's accuracy while staying within a certain memory budget. That limit is set based on the most memory-constrained device.
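
The article does not describe the search procedure itself. Purely to make the idea of a memory budget concrete, the sketch below uses a simple greedy heuristic as a stand-in: it assumes each block of parameters has a hypothetical importance score and memory cost, sets the budget to the smallest device's memory, and picks blocks by score per unit of memory. The function name, the scores, and the greedy rule are all assumptions for illustration, not FTTE's actual method.

```python
from typing import Dict, List, Tuple


def select_parameters(blocks: Dict[str, Tuple[float, int]],
                      device_memory: List[int]) -> List[str]:
    """Greedily choose parameter blocks that fit the tightest device.

    blocks maps a block name to (importance score, memory cost > 0).
    Both values are hypothetical inputs; how to estimate importance
    is outside the scope of this sketch.
    """
    budget = min(device_memory)  # the most memory-constrained device sets the limit
    ranked = sorted(blocks.items(),
                    key=lambda kv: kv[1][0] / kv[1][1],  # score per unit of memory
                    reverse=True)
    chosen: List[str] = []
    used = 0
    for name, (_score, cost) in ranked:
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen


# Made-up blocks and device memory sizes, in arbitrary units.
blocks = {"conv1": (0.2, 40), "conv2": (0.5, 50), "fc": (0.9, 30)}
subset = select_parameters(blocks, device_memory=[100, 256, 512])
```

A greedy ratio rule like this is not guaranteed to find the best subset; it only shows how one budget, fixed by the weakest device, constrains what every device is asked to train.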

Second, the server updates the model using an asynchronous approach. Rather than waiting for responses from all devices, the server accumulates incoming updates until it reaches a fixed capacity, then proceeds with the training round.

Third, the server weights updates from each device based on when it received them. In this way, older updates don’t contribute as much to the training process. These outdated data can hold the model back, slowing the training process and reducing accuracy.
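
The second and third ideas can be sketched together. In the hypothetical `BufferedServer` below, the server buffers incoming updates until a fixed capacity is reached, then takes a weighted average in which an update computed against an older version of the model counts for less. The class name and the specific weighting rule, 1 / (1 + staleness), are assumptions chosen for this example; the article does not give the formula FTTE uses.

```python
from typing import List, Tuple


class BufferedServer:
    """Toy semi-asynchronous server with staleness-weighted averaging."""

    def __init__(self, weights: List[float], capacity: int) -> None:
        self.weights = list(weights)
        self.capacity = capacity  # number of updates that triggers a round
        self.round = 0            # current version of the global model
        self.buffer: List[Tuple[List[float], int]] = []

    def receive(self, update: List[float], base_round: int) -> bool:
        """Store one device's weights; aggregate once the buffer is full.

        base_round is the model version the device started from.
        Returns True if this update completed a training round.
        """
        self.buffer.append((update, base_round))
        if len(self.buffer) < self.capacity:
            return False
        self._aggregate()
        return True

    def _aggregate(self) -> None:
        # Assumed rule: weight = 1 / (1 + staleness), so older updates count less.
        coeffs = [1.0 / (1 + self.round - base) for _, base in self.buffer]
        total = sum(coeffs)
        self.weights = [
            sum(c * u[i] for c, (u, _) in zip(coeffs, self.buffer)) / total
            for i in range(len(self.weights))
        ]
        self.buffer.clear()
        self.round += 1


server = BufferedServer([0.0], capacity=2)
server.receive([1.0], base_round=0)
server.receive([3.0], base_round=0)  # buffer full: round 0 completes
server.receive([4.0], base_round=1)  # fresh update
server.receive([1.0], base_round=0)  # stale update, counted at half weight
```

Because the server proceeds as soon as the buffer fills, fast devices are not left waiting, while a late update from a slow device is still used, just with reduced influence.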

“We use this semi-asynchronous approach because we want to involve the least powerful devices in the training process so they can contribute their data to the model, but we don’t want the more powerful devices in the network to stay idle for a long time and waste resources,” Tenison says.

The researchers tested their framework in simulations with hundreds of heterogeneous devices and a variety of models and datasets. On average, FTTE enabled the training procedure to reach completion 81 percent faster than standard federated learning approaches.

Their method reduced the on-device memory overhead by 80 percent and the communication payload by 69 percent, while attaining accuracy close to that of other techniques.

“Because we want the model to train as fast as possible to save the battery life of these resource-constrained devices, we do have a tradeoff in accuracy. But a small drop in accuracy could be acceptable in some applications, especially since our method performs so much faster,” she says.

FTTE also demonstrated effective scalability and delivered higher performance gains for larger groups of devices.

In addition to these simulations, the researchers tested FTTE on a small network of real devices with varying computational capabilities.

“Not everyone has the latest Apple iPhone. In many developing countries, for instance, users might have less powerful mobile phones. With our technique, we can bring the benefits of federated learning to these settings,” she says.

In the future, the researchers want to study how their method could be used to increase the personalized performance of AI models on each device, rather than focusing on the average performance of the model. They also want to conduct larger experiments on real hardware.

This work was funded, in part, by a Takeda PhD Fellowship.