AI-based predictions of the three-dimensional structures of nearly every cataloged protein known to science have been made by DeepMind and the EMBL European Bioinformatics Institute (EMBL-EBI). The catalog is freely and openly accessible to the scientific community, via the AlphaFold Protein Structure Database.
Both organizations hope the expanded database will continue to increase our understanding of biology, helping countless other scientists in their work as they strive to address global challenges.
This major milestone marks the expansion of the database by approximately 200 times. It has grown from almost a million protein structures to more than 200 million and now covers almost every organism on Earth whose genome has been sequenced. Predicted structures for a wide range of species, including plants, bacteria, animals and other organisms, are now included in the expanded database. This opens new avenues of research in the life sciences that will impact global challenges including sustainability, food insecurity and neglected diseases.
Now a predicted structure will be available for virtually every protein sequence in the UniProt protein database. This release will also open up new avenues of research, including supporting bioinformatics and computational work by allowing scientists to potentially spot patterns and trends in the database.
“AlphaFold now offers a 3D view of the protein universe,” said Edith Heard, CEO of EMBL. “The popularity and growth of the AlphaFold database is testament to the success of the collaboration between DeepMind and EMBL. This gives us a glimpse of the power of multidisciplinary science.”
“We have been surprised by how quickly AlphaFold has become an essential tool for hundreds of thousands of scientists in laboratories and universities around the world,” said Demis Hassabis, founder and CEO of DeepMind. “From fighting disease to tackling plastic pollution, AlphaFold has already enabled incredible impact on some of our biggest global challenges. Our hope is that this expanded database will help countless other scientists in their important work and open up whole new avenues of scientific discovery.”
An indispensable tool for scientists
DeepMind and EMBL-EBI launched the AlphaFold database in July 2021. At that time, it contained over 350,000 protein structure predictions, including the entire human proteome. Subsequent updates added UniProtKB/Swiss-Prot and 27 new proteomes, 17 of which represent neglected tropical diseases that continue to devastate the lives of more than a billion people around the world.
More than 1,000 scientific papers have cited the database, and in just over a year more than 500,000 researchers from over 190 countries have accessed it to view more than two million structures.
The team has also seen researchers build on AlphaFold to create and adapt tools such as Foldseek and Dali, which allow users to search for entries similar to a given protein. Others have adopted the fundamental machine learning ideas behind AlphaFold, using them as the backbone of a host of new algorithms in this space, or applying them to areas such as RNA structure prediction or the development of new models for designing proteins.
Impact and future of AlphaFold and the database
AlphaFold has also shown its impact in areas such as improving our ability to fight plastic pollution, gaining insight into Parkinson’s disease, improving the health of honey bees, understanding how ice forms, tackling neglected diseases such as Chagas disease and leishmaniasis, and exploring human evolution.
“We started AlphaFold in the hope that other teams could learn and build on the progress we’ve made, and it’s been exciting to see this happen so quickly. Many other AI research organizations have now entered the field and are building on the advances of AlphaFold to create new breakthroughs. This is truly a new era in structural biology, and AI-based methods are going to drive incredible progress,” said John Jumper, Research Scientist and AlphaFold Lead at DeepMind.
“AlphaFold has sent ripples through the molecular biology community. In the past year alone, there have been over a thousand scientific papers on a wide range of research topics that utilize AlphaFold structures; I’ve never seen anything like it,” said Sameer Velankar, Team Leader at EMBL-EBI’s Protein Data Bank in Europe. “And that’s just the impact of a million predictions; imagine the impact of having over 200 million freely accessible protein structure predictions in the AlphaFold database.”
DeepMind and EMBL-EBI will continue to periodically update the database, with the aim of improving features and functionality in response to user feedback. Access to structures will continue to be fully open, under a CC-BY 4.0 license, and bulk downloads will be available via Google Cloud Public Datasets.