A Celebrated AI Has Learned a New Trick: How to Do Chemistry

Andriy Onufriyenko/Getty

One AI is thinking like a chemist.

Artificial intelligence has changed the way science is done by allowing researchers to analyze the massive amounts of data modern scientific instruments generate. It can find a needle in a million haystacks of information and, using deep learning, it can learn from the data itself. AI is accelerating advances in gene hunting, medicine, drug design and the creation of organic compounds.

Deep learning uses algorithms, often neural networks trained on large amounts of data, to extract information from new data. Unlike traditional computing, which follows step-by-step instructions, it learns patterns from examples. That makes deep learning far less transparent than traditional computer programming, and it leaves important questions open: What has the system learned? What does it know?

As a chemistry professor, I like to design tests with at least one difficult question that stretches students’ knowledge, to see whether they can combine different ideas and synthesize new concepts. We devised such a question for AlphaFold, the poster child of AI advocates, which has largely solved the protein-folding problem.

Protein folding

Proteins are present in all living organisms. They provide the cells with structure, catalyze reactions, transport small molecules, digest food and do much more. They are made up of long chains of amino acids like beads on a string. But for a protein to do its job in the cell, it must twist and bend into a complex three-dimensional structure, a process called protein folding. Misfolded proteins can lead to disease.

In his chemistry Nobel acceptance speech in 1972, Christiaan Anfinsen postulated that it should be possible to calculate the three-dimensional structure of a protein from the sequence of its building blocks, the amino acids.

Just as the order and spacing of the letters in this article give it meaning, the order of the amino acids determines a protein’s identity and shape, which in turn determine its function.

Within milliseconds of leaving the ribosome, an amino acid chain (left) folds into its lowest-energy 3D shape (right), which the protein needs in order to function. Marc Zimmer, CC BY-ND

Because its amino acid building blocks are inherently flexible, a typical protein could in principle adopt an estimated 10 to the power of 300 different forms. That is a massive number, far more than the roughly 10 to the power of 80 atoms in the observable universe. Yet within milliseconds, every protein in an organism folds into its own specific shape: the lowest-energy arrangement of all the chemical bonds that make up the protein. Change just one of the hundreds of amino acids typically found in a protein, and it may misfold and stop working.

AlphaFold

For 50 years, computer scientists tried to solve the protein-folding problem, with little success. Then in 2016, DeepMind, an AI subsidiary of Google parent Alphabet, launched its AlphaFold program. It used the protein databank, which contains the experimentally determined structures of over 150,000 proteins, as its training set.

In less than five years, AlphaFold had beaten the protein-folding problem, or at least its most useful part: determining a protein’s structure from its amino acid sequence. AlphaFold does not explain how proteins fold so quickly and accurately. Still, it was a major win for AI, not only because it earned enormous scientific prestige but also because it was a major scientific advance that could affect everyone’s lives.

Today, thanks to programs like AlphaFold2 and RoseTTAFold, researchers like me can predict the three-dimensional structure of a protein from its amino acid sequence in an hour or two, at no cost. Before AlphaFold2, we typically had to crystallize each protein and solve its structure by X-ray crystallography, a process that took months and cost tens of thousands of dollars per structure.

We now also have access to the AlphaFold Protein Structure Database, where DeepMind has deposited predicted 3D structures for nearly all the proteins found in humans, mice and more than 20 other species. To date it has predicted more than a million structures, and it plans to add another 100 million this year alone. Knowledge of proteins has skyrocketed. The structures of half of all known proteins are likely to be documented by the end of 2022, among them many new, unique structures associated with useful new functions.

Thinking like a chemist

AlphaFold2 was not designed to predict how proteins would interact with one another, yet it has been able to model how individual proteins combine to form large complex units composed of multiple proteins. We had a challenging question for AlphaFold – had its structural training set taught it some chemistry? Could it tell whether amino acids would react with one another – a rare yet important occurrence?

I am a computational chemist interested in fluorescent proteins. These are proteins found in hundreds of marine organisms like jellyfish and coral. Their glow can be used to illuminate and study diseases.

Neurons expressing fluorescent proteins reveal the brain structures of two fruit fly larvae. Wen Lu and Vladimir I. Gelfand, Feinberg School of Medicine, Northwestern University

There are 578 fluorescent proteins in the protein databank, of which 10 are “broken” and don’t fluoresce. A fluorescent protein glows only after three of its own amino acids react with one another to form the light-emitting part of the molecule. Proteins rarely attack themselves like this, a process called autocatalytic post-translational modification, and it is very difficult to predict which proteins will react with themselves and which won’t.

Only a chemist with substantial knowledge of fluorescent proteins could look at an amino acid sequence and pick out the proteins able to undergo the chemical transformations that make them fluorescent. When we presented AlphaFold2 with the sequences of 44 fluorescent proteins that are not in the protein databank, it folded the working fluorescent proteins differently from the broken ones.

AlphaFold2 can take the amino acid sequence of fluorescent proteins (letters at the top) and predict their 3D barrel shapes (middle). This isn’t surprising. What is totally unexpected is that it can also predict which fluorescent proteins are ‘broken’ and can’t fluoresce. Marc Zimmer, CC BY-ND

The result stunned us: AlphaFold2 had learned some chemistry. It had figured out which amino acids in fluorescent proteins do the chemistry that makes them glow. We suspect that the protein databank training set and multiple sequence alignments enable AlphaFold2 to “think” like chemists and look for the amino acids required to react with one another to make the protein fluorescent.

That a folding program learned some chemistry from its training set also has wider implications. If we ask the right questions, what else might other deep learning algorithms reveal? Could facial recognition algorithms find hidden markers for diseases? Could algorithms designed to predict consumer spending patterns also detect a propensity for minor theft or deception? And most important, is this capability, along with similar leaps in ability in other AI systems, desirable?

The Conversation

Marc Zimmer, Professor of Chemistry, Connecticut College

This article is republished from The Conversation under a Creative Commons license. Read the original article.
