This year’s Nobel Prize in Chemistry is all about proteins, the chemical building blocks and tools that virtually all living creatures use to function and thrive. The human body builds proteins by following instructions encoded in its DNA and uses them for countless essential processes.

To produce proteins with specific functions, genes must first pass through transcription and translation steps. This flow of information from DNA to RNA to protein is known as the central dogma of molecular biology. For many years, scientists lacked the tools to predict what shape a protein would take from its genetic sequence alone, or to design new proteins themselves.

Now, thanks to computational tools and artificial intelligence (AI) developed by this year’s winners, Demis Hassabis, John Jumper, and David Baker, scientists can predict protein structures and design new proteins with remarkable precision.

The importance of proteins

Proteins are the body’s classic overachievers. These building blocks of life form the physical foundation of hormones, antibodies, transporters, and enzymes, which carry out processes necessary to sustain life, such as replicating DNA, enabling communication between cells, and breaking down food during digestion. Ever the workaholics, these molecules are essential to the structure, function, and regulation of our bodies. A protein takes shape when amino acids (its building blocks, whose sequence is encoded by our genes) are strung together in a particular order, which dictates how the protein folds and, consequently, its function.

If you’re a fan of breathing, you’ll need properly folded hemoglobin, the oxygen-carrying protein found in red blood cells. Changes to a protein’s structure can have disastrous repercussions, as in sickle cell anemia, where a genetic mutation substitutes a single amino acid in hemoglobin for another. The altered hemoglobin forces red blood cells into a rigid sickle shape instead of their typical disc shape, a severe structural defect that keeps blood from circulating freely and can sometimes cause a stroke when blood flow to the brain is blocked.

Using AI to solve Levinthal’s paradox

Predicting a protein’s structure from its amino acid sequence, however, runs into a problem known as Levinthal’s paradox. Consider a protein made of 100 amino acids, each able to adopt three different conformations: that gives 3¹⁰⁰, or roughly 5 × 10⁴⁷, possible structures. Working through every one of them by brute-force calculation would take longer than the age of our universe (about 14 billion years), and far longer than the time it takes a cell to create a protein, which is a matter of minutes.
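The brute-force arithmetic behind Levinthal’s paradox can be reproduced in a few lines. This is a minimal sketch, assuming, as in the example above, that each of the 100 amino acids independently adopts one of three conformations, and that a computer could check a hypothetical 10¹³ conformations per second. That sampling rate and the helper names `total_structures` and `brute_force_years` are illustrative assumptions, not figures or functions from the prize announcement.

```python
# Levinthal-style back-of-the-envelope estimate (illustrative only).
# Assumptions: 100 residues and 3 conformations per residue (as in the text),
# plus a HYPOTHETICAL sampling rate of 10**13 conformations per second.

RESIDUES = 100
CONFORMATIONS_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 10**13              # hypothetical, assumed rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # Julian year
AGE_OF_UNIVERSE_YEARS = 1.38e10          # approximate


def total_structures(residues: int, per_residue: int) -> int:
    """Count every combination if each residue chooses its conformation independently."""
    return per_residue ** residues


def brute_force_years(n_structures: int, rate: float) -> float:
    """Years needed to check every structure, one after another, at a fixed rate."""
    return n_structures / rate / SECONDS_PER_YEAR


n = total_structures(RESIDUES, CONFORMATIONS_PER_RESIDUE)
years = brute_force_years(n, SAMPLES_PER_SECOND)

print(f"possible structures: {n:.2e}")
print(f"brute-force search: {years:.1e} years")
print(f"multiples of the universe's age: {years / AGE_OF_UNIVERSE_YEARS:.1e}")
```

Under these assumptions the search takes on the order of 10²⁷ years. Even if the assumed sampling rate were raised by several orders of magnitude, the estimate would remain vastly longer than the age of the universe, which is the point of the paradox.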

This paradox raises the question: how can scientists accurately derive a protein’s structure from its amino acid sequence alone? Many researchers tackled the problem, but despite their herculean efforts, the best prediction software reached only about 40 per cent accuracy at most. It seemed the issue would remain unresolved until a new player arrived on the scene.

Demis Hassabis is the CEO of Google DeepMind, an interdisciplinary AI research lab. Together with his colleague, physicist John Jumper, he developed AlphaFold, an AI system that allows scientists to predict the three-dimensional coordinates of all the atoms in a given protein. The team trained artificial neural networks on the large collection of protein structures that had already been determined experimentally, along with related sequences from other organisms, so that the system learned how patterns in an amino acid sequence relate to the shape the protein folds into. The output, in this case, is a predicted protein structure.

Designing proteins and real-life implications

Seventeen years before the birth of AlphaFold, biochemist David Baker and his lab at the University of Washington created the Rosetta program, software capable of predicting protein structures with some success. Eventually, Baker and his team decided to reverse the process, feeding desired protein structures into the software to determine the amino acid sequences needed to produce them. With this approach, proteins can be engineered starting from a desired shape, giving scientists the power to create novel proteins with new functions.

AlphaFold and Rosetta have also enabled scientists to tackle pressing problems beyond basic protein science. In waste management, for example, plastic-decomposing enzymes have been developed with the help of these tools, which greatly shorten the search for the right amino acid sequence. Treating and recycling plastic waste is essential to protecting the environment, and such enzymes can efficiently break plastics down into reusable building blocks that can then be used to synthesize new plastics.

Researchers can also use this technology to create more effective vaccines by attaching immunogens, virus-like proteins that provoke an immune response, to designed particles. The immune system learns to recognize these immunogens and can then protect the body from that virus in the future. By maximizing the immunogens’ visibility, these particles help our immune system more effectively keep us from becoming sick. Scientists have used the Rosetta software to design such precise, large structures, enhancing the body’s response to respiratory viruses.

Life itself would be inconceivable without proteins. The 2024 Nobel Prize in Chemistry rightfully recognizes the far-reaching benefits to humanity of predicting and designing protein structures. The combined achievements of these scientists are a testament to the critical role of computational methods and machine learning in revolutionizing the future of scientific discovery.