Evaluating the ethics of autonomous systems

Artificial intelligence is increasingly being used to help optimize decision-making in high-stakes settings. For instance, an autonomous system can identify a power distribution strategy that minimizes costs while keeping voltages stable.

But while these AI-driven outputs may be technically optimal, are they fair? What if a low-cost power distribution strategy leaves disadvantaged neighborhoods more vulnerable to outages than higher-income areas?

To help stakeholders quickly pinpoint potential ethical dilemmas before deployment, MIT researchers developed an automated evaluation method that balances the interplay between measurable outcomes, like cost or reliability, and qualitative or subjective values, such as fairness.

The system separates objective evaluations from user-defined human values, using a large language model (LLM) as a proxy for humans to capture and incorporate stakeholder preferences.

The adaptive framework selects the best scenarios for further evaluation, streamlining a process that typically requires costly and time-consuming manual effort. These test cases can show situations where autonomous systems align well with human values, as well as scenarios that unexpectedly fall short of ethical criteria.

Fan is joined on the paper by lead author Anjali Parashar, a mechanical engineering graduate student; Yingke Li, an AeroAstro postdoc; and others at MIT and Saab. The research will be presented at the International Conference on Learning Representations.

In a large system like a power grid, evaluating the ethical alignment of an AI model's recommendations in a way that considers all objectives is especially difficult.

Most testing frameworks rely on pre-collected data, but labeled data on subjective ethical criteria are often hard to come by. In addition, because ethical values and AI systems are both constantly evolving, static evaluation methods based on written codes or regulatory documents require frequent updates.

Fan and her team approached this problem from a different perspective. Drawing on their prior work evaluating robotic systems, they developed an experimental design framework to identify the most informative scenarios, which human stakeholders would then evaluate more closely.

Their two-part system, called Scalable Experimental Design for System-level Ethical Testing (SEED-SET), incorporates quantitative metrics and ethical criteria. It can identify scenarios that effectively meet measurable requirements and align well with human values, and vice versa.

“We don’t want to spend all our resources on random evaluations. So, it is very important to guide the framework toward the test cases we care the most about,” Li says.

Importantly, SEED-SET does not need pre-existing evaluation data, and it adapts to multiple objectives.

For instance, a power grid may have several user groups, including a large rural community and a data center. While both groups may want low-cost and reliable power, each group's priorities from an ethical perspective may vary widely.

These ethical criteria may not be well-specified, so they can’t be measured analytically.

The power grid operator wants to find the most cost-effective strategy that best meets the subjective ethical preferences of all stakeholders.

SEED-SET tackles this challenge by splitting the problem into two, following a hierarchical structure. An objective model considers how the system performs on tangible metrics like cost. Then a subjective model that considers stakeholder judgements, like perceived fairness, builds on the objective evaluation.
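The hierarchical split described above can be illustrated with a small sketch. This is not the researchers' implementation: the `Scenario` fields, the per-unit costs, and the `fairness_weight` parameter are all hypothetical stand-ins chosen to show the structure, in which a subjective layer sees only the output of an objective layer.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    # Hypothetical scenario: fraction of available power sent to each user group.
    rural_share: float
    datacenter_share: float


def objective_metrics(s: Scenario) -> dict[str, float]:
    """Toy objective layer: tangible metrics derived from the scenario."""
    # Made-up per-unit delivery costs; a real system would run a grid simulation.
    cost = 100.0 * s.rural_share + 60.0 * s.datacenter_share
    # Reliability of the worst-served group.
    reliability = min(s.rural_share, s.datacenter_share)
    return {"cost": cost, "reliability": reliability}


def subjective_score(metrics: dict[str, float], fairness_weight: float) -> float:
    """Toy subjective layer: built on the objective metrics, not the raw scenario."""
    return (fairness_weight * metrics["reliability"]
            - (1.0 - fairness_weight) * metrics["cost"] / 100.0)


balanced = objective_metrics(Scenario(0.5, 0.5))
skewed = objective_metrics(Scenario(0.2, 0.8))

# A group that weights fairness heavily prefers the balanced split ...
print(subjective_score(balanced, 0.8) > subjective_score(skewed, 0.8))
# ... while a cost-focused group prefers the cheaper, skewed split.
print(subjective_score(balanced, 0.1) < subjective_score(skewed, 0.1))
```

Because the objective metrics are computed once per scenario, the same results can be re-scored for different stakeholder groups without re-running the expensive part of the evaluation.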

“The objective part of our approach is tied to the AI system, while the subjective part is tied to the users who are evaluating it. By decomposing the preferences in a hierarchical fashion, we can generate the desired scenarios with fewer evaluations,” Parashar says.

To perform the subjective assessment, the system uses an LLM as a proxy for human evaluators. The researchers encode the preferences of each user group into a natural language prompt for the model.

The LLM uses these instructions to compare two scenarios, selecting the preferred design based on the ethical criteria.
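A pairwise comparison of this kind might look roughly like the sketch below. The prompt wording, the `build_prompt` and `compare` helpers, and the `stub_judge` function are assumptions made for illustration; the article does not give the actual prompts or model interface, so a simple keyword rule stands in for the LLM call.

```python
from typing import Callable


def build_prompt(group_preferences: str, scenario_a: str, scenario_b: str) -> str:
    """Encode one user group's preferences and two scenarios as a text prompt."""
    return (
        f"Stakeholder preferences: {group_preferences}\n"
        f"Scenario A: {scenario_a}\n"
        f"Scenario B: {scenario_b}\n"
        "Which scenario better satisfies the preferences? Answer with A or B."
    )


def compare(judge: Callable[[str], str], group_preferences: str,
            scenario_a: str, scenario_b: str) -> str:
    """Ask the judge (any text-in, text-out callable) for the preferred scenario."""
    reply = judge(build_prompt(group_preferences, scenario_a, scenario_b))
    choice = reply.strip().upper()[:1]
    if choice not in ("A", "B"):
        raise ValueError(f"Unexpected judge reply: {reply!r}")
    return choice


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM: picks B only if B's description says 'evenly'."""
    b_line = next(line for line in prompt.splitlines()
                  if line.startswith("Scenario B:"))
    return "B" if "evenly" in b_line else "A"


winner = compare(
    stub_judge,
    "Outages should be shared evenly across neighborhoods.",
    "Outages concentrated in low-income areas",
    "Outages spread evenly across all areas",
)
print(winner)
```

Passing the judge in as a callable keeps the comparison logic independent of any particular model, so the stub could be swapped for a real LLM client without changing `compare`.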

“After seeing hundreds or thousands of scenarios, a human evaluator can suffer from fatigue and become inconsistent in their evaluations, so we use an LLM-based strategy instead,” Parashar explains.

SEED-SET uses the selected scenario to simulate the overall system (in this case, a power distribution strategy). These simulation results guide its search for the next best candidate scenario to test.
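The loop of simulating a scenario and letting the result steer the next choice can be sketched as follows. The article does not describe how SEED-SET actually picks the next candidate, so this example substitutes a deliberately simple heuristic (test the untested candidate closest to the best one seen so far). The `simulate` function and its single `load_shift` parameter are hypothetical.

```python
def simulate(load_shift: float) -> float:
    """Hypothetical simulator: scores a one-parameter strategy, peaking at 0.7."""
    return -(load_shift - 0.7) ** 2


def adaptive_search(candidates: list[float], budget: int) -> dict[float, float]:
    """Evaluate up to `budget` candidates, each chosen using earlier results."""
    evaluated: dict[float, float] = {}
    current = candidates[0]
    for _ in range(budget):
        evaluated[current] = simulate(current)
        # Use the results so far to decide where to look next.
        best = max(evaluated, key=evaluated.get)
        untested = [c for c in candidates if c not in evaluated]
        if not untested:
            break
        current = min(untested, key=lambda c: abs(c - best))
    return evaluated


candidates = [i / 10 for i in range(11)]
results = adaptive_search(candidates, budget=8)
print(max(results, key=results.get))
```

With a budget of eight simulations the search reaches the peak at 0.7 and never spends evaluations on the candidates beyond it, which is the kind of saving over random or exhaustive testing that an adaptive strategy is meant to provide.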

In the end, SEED-SET intelligently selects the most representative scenarios that either satisfy or fall short of the objective metrics and ethical criteria. In this way, users can analyze the performance of the AI system and adjust its strategy.

For instance, SEED-SET can pinpoint cases of power distribution that prioritize higher-income areas during periods of peak demand, leaving underprivileged neighborhoods more prone to outages.

To test SEED-SET, the researchers evaluated realistic autonomous systems, like an AI-driven power grid and an urban traffic routing system. They measured how well the generated scenarios aligned with ethical criteria.

The system generated more than twice as many optimal test cases as the baseline strategies in the same amount of time, while uncovering many scenarios other approaches overlooked.

“As we shifted the user preferences, the set of scenarios SEED-SET generated changed drastically. This tells us the evaluation strategy responds well to the preferences of the user,” Parashar says.

To measure how useful SEED-SET would be in practice, the researchers will need to conduct a user study to see if the scenarios it generates help with real decision-making.

In addition to running such a study, the researchers plan to explore the use of more efficient models that can scale up to larger problems with more criteria, such as evaluating LLM decision-making.

This research was funded, in part, by the U.S. Defense Advanced Research Projects Agency.