Solving “Whac - Aegis AI

In today's hospitals and clinics, a dermatologist may use an artificial intelligence model to classify skin lesions, assessing whether a lesion is at risk of developing into cancer or is benign. But if the model is biased toward certain skin tones, it could fail to identify a high-risk patient.

Perhaps one of the best known and most persistent challenges that AI research continues to reckon with is bias. Bias is often discussed in relation to training data, but model architecture can also contain and amplify bias, negatively influencing model performance in real-world settings. In high-stakes medical scenarios, the very real consequences of poor performance have made bias into a quintessential safety issue.

A new paper from researchers at MIT, Worcester Polytechnic Institute, and Google, accepted to the 2026 International Conference on Learning Representations, proposes a novel debiasing approach called “Weighted Rotational DebiasING” (WRING) that can be applied to vision language models (VLMs), like OpenAI's OpenCLIP.

VLMs are multimodal models that can understand and interpret different data modalities, such as video, image, and text, simultaneously. While debiasing approaches for VLMs do exist, the most commonly used approach, known as “projection debiasing,” leads to what has been termed the “Whac-A-Mole dilemma,” an empirical observation that was formally introduced to AI research in 2023.

Projection debiasing is a post-processing approach that removes undesirable, biased information from model embeddings by “projecting” the biased subspace out of the model's representation space of relationships, thereby cutting out the bias. But this approach has its drawbacks.
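
To make the idea concrete, here is a minimal sketch of projecting a single direction out of an embedding. It assumes the bias is captured by one known direction in a toy three-dimensional space; the vectors and the `project_out` helper are hypothetical illustrations, not code from the paper.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def project_out(embedding, bias_direction):
    """Remove the component of `embedding` that lies along `bias_direction`."""
    d = normalize(bias_direction)
    coeff = dot(embedding, d)
    return [e - coeff * a for e, a in zip(embedding, d)]

# Hypothetical toy values: a 3-D embedding and an assumed bias direction.
embedding = [0.6, 0.8, 0.5]
bias_direction = [0.0, 1.0, 0.0]

debiased = project_out(embedding, bias_direction)
print(debiased)  # [0.6, 0.0, 0.5]
```

After the projection, the embedding has no component left along the assumed bias direction, which is what “cutting out” the bias means geometrically.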

“When you do that, you inadvertently squish everything around,” says Walter Gerych, the paper's first author, who conducted this research last year as a postdoc at MIT. “All the other relationships that the model learns change when you do that.”

While projection debiasing stops the model from acting on the bias that has been projected out, it can end up amplifying or creating other biases, hence the Whac-A-Mole dilemma. According to Ghassemi, the unintended amplification of model biases is “both a technical and practical challenge. For instance, when debiasing a VLM that retrieves images of clinical staff, if racial bias is removed, it could have the unintended consequence of amplifying gender bias.”
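
A toy illustration of this side effect, under the assumption that similarity between embeddings is measured by cosine similarity: projecting out one direction can change how similar two embeddings are to each other, even though that relationship was never the target. The vectors below are hypothetical and chosen only to make the effect visible; they are not data from the paper.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def project_out(v, unit_direction):
    """Remove the component of `v` along a unit-length direction."""
    coeff = dot(v, unit_direction)
    return [a - coeff * d for a, d in zip(v, unit_direction)]

bias_direction = [0.0, 1.0, 0.0]  # assumed bias direction, already unit length
a = [1.0, 2.0, 0.5]               # hypothetical embedding
b = [0.5, -2.0, 1.0]              # hypothetical embedding

before = cosine(a, b)
after = cosine(project_out(a, bias_direction), project_out(b, bias_direction))
print(f"similarity before: {before:.3f}, after: {after:.3f}")
# similarity before: -0.571, after: 0.800
```

Two embeddings that started out dissimilar end up close together once the second coordinate is removed, a small-scale version of relationships shifting as a by-product of debiasing.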

WRING works by moving certain coordinates within the high-dimensional space of a model (the ones that appear to be responsible for bias) to a different angle, so the model can no longer distinguish between different groups within a certain concept. This changes the representation within a specific space while leaving the model's other relationships intact. And like projection debiasing, WRING is a post-processing approach, which means it can be applied “on the fly” to a pre-trained VLM.
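
The article does not spell out the WRING algorithm, so the following is not an implementation of it. It is only a sketch of the general geometric point that a rotation confined to a chosen plane changes the coordinates in that plane while leaving the other coordinates and the vector's length untouched, in contrast to a projection, which discards a component. The function name, vector, and angle are all assumptions made for illustration.

```python
import math

def rotate_in_plane(v, i, j, theta):
    """Rotate `v` by `theta` radians in the plane spanned by coordinates i and j."""
    out = list(v)
    c, s = math.cos(theta), math.sin(theta)
    out[i] = c * v[i] - s * v[j]
    out[j] = s * v[i] + c * v[j]
    return out

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))

# Hypothetical 4-D embedding; coordinates 0 and 1 stand in for the ones
# assumed to carry the unwanted information.
embedding = [0.6, 0.8, 0.5, -0.3]
rotated = rotate_in_plane(embedding, 0, 1, math.pi / 2)

print(rotated[2:] == embedding[2:])                     # True: other coordinates untouched
print(abs(norm(rotated) - norm(embedding)) < 1e-12)     # True: length preserved
```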

“People already spent a lot of resources, a lot of money, training these huge models, and we don’t really want to go in and modify something during training because then you have to start from scratch,” Gerych explains. “[WRING is] very efficient. It doesn’t require more training of the model and it's minimally invasive.”

In their results, the researchers found that WRING significantly reduced bias for a target concept without increasing bias in other areas. But for now, the approach is somewhat limited to Contrastive Language-Image Pre-training (CLIP) models, a type of VLM that connects images to language for search or classification.
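
For readers unfamiliar with how a CLIP-style model connects images to language, here is a rough sketch of the classification step: the image and each candidate caption are embedded in the same space, and the caption whose embedding is most similar to the image embedding wins. The embeddings below are made-up toy numbers rather than outputs of a real model, and `classify` is a hypothetical helper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def classify(image_embedding, text_embeddings):
    """Return the caption whose embedding is most similar to the image embedding."""
    return max(text_embeddings,
               key=lambda label: cosine(image_embedding, text_embeddings[label]))

# Toy stand-ins for what an image encoder and a text encoder would produce.
image_embedding = [0.9, 0.1, 0.2]
text_embeddings = {
    "a photo of a doctor": [1.0, 0.0, 0.0],
    "a photo of a nurse": [0.0, 1.0, 0.0],
}

print(classify(image_embedding, text_embeddings))  # a photo of a doctor
```

Because a post-processing method such as projection debiasing or WRING operates on embeddings like these, it can be applied without retraining the encoders that produce them.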

“Extending this for ChatGPT-style, generative language models, is the reasonable next step for us,” says Gerych.

This work was supported, in part, by a National Science Foundation CAREER Award, AI2050 Award Early Career Fellowship, Sloan Research Fellow Award, the Gordon and Betty Moore Foundation Award, and MIT-Google Computing Innovation Award.