Building AI models that understand chemical principles

Among all of the possible chemical compounds, it’s estimated that between 10²⁰ and 10⁶⁰ may hold potential as small-molecule drugs.

Evaluating each of those compounds experimentally would be far too time-consuming for chemists. So, in recent years, researchers have begun using artificial intelligence to help identify compounds that could make good drug candidates.

“It’s a very general approach that could be applied to any application of organic molecules, but the primary application that we think about is small-molecule drug discovery,” he says.

The intersection of AI and science

As a high school student in Dublin, Ohio, Coley participated in Science Olympiad competitions and graduated from high school at the age of 16. He then headed to Caltech, where he chose chemical engineering as a major because it offered a way to combine his interests in science and math.

During his undergraduate years, he also pursued an interest in computer science, working in a structural biology lab using the Fortran programming language to help solve the crystal structures of proteins. After graduating from Caltech, he decided to keep going in chemical engineering and came to MIT in 2014 to start a PhD.

Advised by professors Klavs Jensen and William Green, Coley worked on ways to optimize automated chemical reactions. His work focused on combining machine learning and cheminformatics (the application of computational methods to analyze chemical data) to plan reaction pathways that could make new drug molecules. He also worked on designing hardware that could be used to perform those reactions automatically.

Part of that work was done through a DARPA-funded program called Make-It, which was focused on using machine learning and data science to improve the synthesis of medicines and other useful compounds from simple building blocks.

“That was my real entry point into thinking about cheminformatics, thinking about machine learning, and thinking about how we can use models to understand how different chemicals can be made and what reactions are possible,” Coley says.

Coley began applying for faculty jobs while still a graduate student, and accepted an offer from MIT at age 25. He received a mix of advice for and against taking a job at the same school where he went to graduate school, and eventually decided that a position at MIT was too enticing to turn down.

“MIT is a very special place in terms of the resources and the fluidity across departments. MIT seemed to be doing a really good job supporting the intersection of AI and science, and it was a vibrant ecosystem to stay in,” he says. “The caliber of students, the enthusiasm of the students, and just the incredible strength of collaborations definitely outweighed any potential concerns of staying in the same place.”

Coley deferred the faculty position for one year to do a postdoc at the Broad Institute, where he sought more experience in chemical biology and drug discovery. There, he worked on ways to identify small molecules, from billions of candidates in DNA-encoded libraries, that might have binding interactions with mutated proteins associated with diseases.

After returning to MIT in 2020, he built his lab group with the mission of deploying AI not only to synthesize existing compounds with therapeutic potential, but also to design new molecules with desirable properties and new ways to make them. Over the past few years, his lab has developed a variety of computational approaches to tackle those goals.

“We try to think about how to best pair a challenge in chemistry with a potential computational solution. And often that pairing motivates the development of new methods,” Coley says. One model his lab has developed, known as ShEPhERD, was trained to evaluate potential new drug molecules based on how they will interact with target proteins, given the molecules’ three-dimensional shapes. This model is now being used by pharmaceutical companies to help them discover new drugs.

“We’re trying to give more of a medicinal chemistry intuition to the generative model, so the model is aware of the right criteria and considerations,” Coley says.

In another project, Coley’s lab developed a generative AI model called FlowER, which can be used to predict the reaction products that will result from combining different chemical inputs.

In designing that model, the researchers built in an understanding of fundamental physical principles, such as the law of conservation of mass. They also compelled the model to consider the feasibility of the intermediate steps that need to take place on the pathway from reactants to products. These constraints, the researchers found, improved the accuracy of the model’s predictions.
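To make the idea of a conservation-of-mass constraint concrete, here is a minimal Python sketch that checks whether a proposed reaction keeps the same number of each kind of atom on both sides. It is only an illustration of the principle, not FlowER’s implementation: the function names are hypothetical, and the parser assumes flat molecular formulas with no parentheses, charges, or isotope labels.

```python
import re
from collections import Counter

# An element symbol (capital letter plus optional lowercase letter) and an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def atom_counts(formula: str) -> Counter:
    """Count atoms in a flat formula such as 'C2H6O' (hypothetical helper)."""
    counts = Counter()
    pos = 0
    for match in _TOKEN.finditer(formula):
        # Reject any characters the pattern skipped over.
        if match.start() != pos:
            raise ValueError(f"cannot parse formula: {formula!r}")
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        pos = match.end()
    if pos != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def side_counts(formulas: list[str]) -> Counter:
    """Total atom counts for one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += atom_counts(formula)
    return total


def conserves_mass(reactants: list[str], products: list[str]) -> bool:
    """True if every element appears equally often on both sides."""
    return side_counts(reactants) == side_counts(products)


def imbalance(reactants: list[str], products: list[str]) -> dict[str, int]:
    """Per-element difference (products minus reactants) for unbalanced elements."""
    left, right = side_counts(reactants), side_counts(products)
    return {
        element: right[element] - left[element]
        for element in sorted(set(left) | set(right))
        if right[element] != left[element]
    }


if __name__ == "__main__":
    reactants = ["C2H4O2", "C2H6O"]  # acetic acid + ethanol
    print(conserves_mass(reactants, ["C4H8O2", "H2O"]))  # True: ethyl acetate + water
    print(imbalance(reactants, ["C4H8O2"]))  # {'H': -2, 'O': -1}: water is missing
```

A model that respects this constraint would never propose the second product set, because it loses two hydrogens and one oxygen. Note that the check only counts atoms; it says nothing about whether the intermediate steps along the pathway are chemically feasible, which the researchers treated as a separate consideration.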

“Thinking about those intermediate steps, the mechanisms involved, and how the reaction evolves is something that chemists do very naturally. It’s how chemistry is taught, but it’s not something that models inherently think about,” Coley says. “We’ve spent a lot of time thinking about how to make sure that our machine-learning models are grounded in an understanding of reaction mechanisms, in the same way an expert chemist would be.”

“Through these many different research threads, we hope to advance the frontier of AI in chemistry,” Coley says.