Artificial intelligence is already proving it can accelerate drug development and improve our understanding of disease. But to turn AI into novel treatments we need to get the latest, most powerful models into the hands of scientists.
The problem is that most scientists aren’t machine-learning experts. Now the company OpenProtein.AI is helping scientists stay on the cutting edge of AI with a no-code platform that gives them access to powerful foundation models and a suite of tools for designing proteins, predicting protein structure and function, and training models.
The company, founded by Tristan Bepler PhD ’20 and former MIT associate professor Tim Lu PhD ’07, is already equipping researchers in pharmaceutical and biotech companies of all sizes with its tools, including internally developed foundation models for protein engineering. OpenProtein.AI also offers its platform to scientists in academia for free.
“It’s a really exciting time right now because these models can not only make protein engineering more efficient — which shortens development cycles for therapeutics and industrial uses — they can also enhance our ability to design new proteins with specific traits,” Bepler says. “We’re also thinking about applying these approaches to non-protein modalities. The big picture is we’re creating a language for describing biological systems.”
Bepler came to MIT in 2014 as part of the Computational and Systems Biology PhD Program, studying under Bonnie Berger, MIT’s Simons Professor of Applied Mathematics. It was there that he realized how little we understand about the molecules that make up the building blocks of biology.
“We hadn’t characterized biomolecules and proteins well enough to create good predictive models of what, say, a whole genome circuit will do, or how a protein interaction network will behave,” Bepler recalls. “It got me interested in understanding proteins at a more fine-grained level.”
Bepler began exploring ways to predict the chains of amino acids that make up proteins by analyzing evolutionary data. This was before Google released AlphaFold, a powerful prediction model for protein structure. The work led to one of the first generative AI models for understanding and designing proteins — what the team calls a protein language model.
“I was really excited about the classical framework of proteins and the relationships between their sequence, structure, and function. We don’t understand those links well,” Bepler says. “So how could we use these foundation models to skip the ‘structure’ component and go straight from sequence to function?”
“This was around the time when the idea of integrating AI with biology was starting to pick up,” Lu recalls. “Tristan helped us build better computational models for biologic design. We also realized there’s a disconnect between the most cutting-edge tools available and the biologists, who would love to use these things but don’t know how to code. OpenProtein came from the idea of broadening access to these tools.”
Bepler had worked at the forefront of AI as part of his PhD. He knew the technology could help scientists accelerate their work.
“We started with the idea to build a general-purpose platform for doing machine-learning-in-the-loop protein engineering,” Bepler says. “We wanted to build something that was user friendly because machine-learning ideas are kind of esoteric. They require implementation, GPUs, fine-tuning, designing libraries of sequences. Especially at that time, it was a lot for biologists to learn.”
OpenProtein’s platform, in contrast, features an intuitive web interface for biologists to upload data and conduct protein engineering work with machine learning. It features a range of open-source models, including PoET, OpenProtein’s flagship protein language model.
PoET, short for Protein Evolutionary Transformer, was trained on protein groups to generate sets of related proteins. Bepler and his collaborators showed it could generalize about evolutionary constraints on proteins and incorporate new information on protein sequences without retraining, allowing other researchers to add experimental data to improve the model.
“Researchers can use their own data to train models and optimize protein sequences, and then they can use our other tools to analyze those proteins,” Bepler says. “People are generating libraries of protein sequences in silico [on computers] and then running them through predictive models to get validation and structural predictors. It’s basically a no-code front-end, but we also have APIs for people who want to access it with code.”
The models help researchers design proteins faster, then decide which ones are promising enough for further lab testing. Researchers can also input proteins of interest, and the models can generate new ones with similar properties.
Since its founding, OpenProtein’s team has continued to add tools to its platform for researchers regardless of their lab size or resources.
“We’ve tried really hard to make the platform an open-ended toolbox,” Bepler says. “It has specific workflows, but it’s not tied specifically to one protein function or class of proteins. One of the great things about these models is they are very good at understanding proteins broadly. They learn about the whole space of possible proteins.”
Enabling the next generation of therapies
The large pharmaceutical company Boehringer Ingelheim began using OpenProtein’s platform in early 2025. Recently, the companies announced an expanded collaboration that will see OpenProtein’s platform and models embedded into Boehringer Ingelheim’s work as it engineers proteins to treat diseases like cancer and autoimmune or inflammatory conditions.
Last year, OpenProtein also released a new version of its protein language model, PoET-2, that outperforms much larger models while using a small fraction of the computing resources and experimental data.
“We really want to solve the question of how we describe proteins,” Bepler says. “What’s the meaningful, domain-specific language of protein constraints we use as we generate them? How can we bring in more evolutionary constraints? How can we describe an enzymatic reaction a protein carries out such that a model can generate sequences to do that reaction?”
Moving forward, the founders are hoping to make models that factor in the changing, interconnected nature of protein function.
“The area I am excited about is going beyond protein binding events to use these models to predict and design dynamic features, where the protein has to engage two, three, or four biological mechanisms at the same time, or change its function after binding,” says Lu, who currently serves in an advisory role for the company.
As progress in AI races forward, OpenProtein continues to see its mission as giving scientists the best tools to develop new treatments faster.
“As work gets more complex, with approaches incorporating things like protein logic and dynamic therapies, the existing experimental toolsets become limiting,” Lu says. “It’s really important to create open ecosystems around AI and biology. There’s a risk that AI resources could get so concentrated that the average researcher can’t use them. Open access is super important for the scientific field to make progress.”