AI predicts the shape of nearly every protein known to science

In 2020, an artificial intelligence lab called DeepMind unveiled technology that can predict the shape of proteins, the microscopic mechanisms that govern the behavior of the human body and all other living things.

A year later, the lab shared a tool called AlphaFold with scientists and released predicted shapes for over 350,000 proteins, including all proteins expressed by the human genome. This immediately changed the course of biological research. If scientists can identify the shapes of proteins, they can accelerate the understanding of disease, create new drugs, and explore the mysteries of life on Earth in other ways.

Now DeepMind has published predictions for almost every protein known to science. On Thursday, the London-based lab, owned by the same parent company as Google, said it had added more than 200 million predictions to an online database freely available to scientists around the world.

With this new release, the scientists behind DeepMind hope to accelerate research into obscure organisms and open up a new field called metaproteomics.

“Now scientists can explore this entire database and look for patterns — correlations between species and evolutionary patterns that might not have been obvious until now,” Demis Hassabis, CEO of DeepMind, said in a telephone interview.

Proteins start as chains of chemical compounds, then twist and fold into three-dimensional shapes that determine how these molecules bind to others. If scientists can pinpoint the shape of a particular protein, they can decipher how it works.

This knowledge is often vital to understanding and fighting disease. For example, bacteria resist antibiotics by expressing certain proteins. If scientists can understand how these proteins work, they can begin to counter antibiotic resistance.

In the past, determining the shape of a protein required extensive experimentation using X-rays, microscopes, and other instruments on a lab bench. Now, given the string of chemical compounds that make up a protein, AlphaFold can predict its shape.

The technology isn’t perfect. But it can predict the shape of a protein with an accuracy comparable to physical experiments about 63 percent of the time, according to independent benchmark tests. With a prediction in hand, scientists can verify its accuracy relatively quickly.

Clement Verba, a UC San Francisco researcher who uses the technology to understand the coronavirus and prepare for similar pandemics, said the technology has “recharged” that work, often saving months of experimentation time. Others have used this tool to fight gastroenteritis, malaria, and Parkinson’s disease.

The technology has also accelerated research beyond the human body, including efforts to improve the health of honey bees. The expanded DeepMind database can help an even larger community of scientists achieve similar benefits.

Like Dr. Hassabis, Dr. Verba believes the database will provide new ways to understand how proteins behave across species. He also sees it as a way to train a new generation of scientists. Not all researchers are versed in this kind of structural biology; a database of all known proteins lowers the bar to entry. “It could bring structural biology to the masses,” Dr. Verba said.