We know almost a billion protein structures. Scientists from Krakow have created a unique tool.

- Scientists from Sano and the Jagiellonian University have proven that the world's largest databases of computer models of protein structures complement each other
- The project created an interactive web tool for visually browsing huge amounts of data
- "We believe that our platform will contribute to the creation of better diagnostic and therapeutic methods," added Dr. Paweł Szczerbiak from Sano
"In just two years, the number of known protein structures has increased from 200,000 to nearly a billion. Our goal was not only to organize this data, but to create a tool that will help us better understand protein biology," said Dr. Tomasz Kościołek, co-author of the paper and leader of the Structural and Functional Genomics team at the Sano Center for Computational Medicine, quoted in the release.
Proteins are the basic building blocks and functional components of every cell. They are responsible for a vast number of processes, from digestion and cellular respiration to immunity and tissue repair. Their function depends on their spatial shape, or structure, which arises from the folding of a chain of amino acids.
Understanding what these structures look like and what functions they have is crucial in biology and medicine – for example, when designing new drugs that must precisely match specific proteins in the body.
Innovation in protein data analysis
Scientists from Sano and the Jagiellonian University have demonstrated that the world's largest databases of computer models of protein structures, including the renowned AlphaFold database (the developers of AlphaFold were honored with the 2024 Nobel Prize in Chemistry), complement each other. Combined into a single system, they create a coherent framework for analyzing protein structures and functions.
The project created an interactive web tool that allows visual browsing of vast amounts of data. Users can intuitively analyze the relationship between protein shape and its role in the body.
A new approach to discovering protein functions
According to the release, this is the next stage of the work of the team that published research on microbiome proteins in the prestigious journal Nature Communications in 2023. The scientists have now integrated data from the three largest sources of predicted protein structures: AlphaFold, ESMAtlas, and the Microbiome Immunity Project.
Researchers have shown that, despite their diverse sources, these data are organized in a logical manner—proteins with similar functions often share similar structures and occur in similar regions of the data space. This phenomenon is called "locality of function."
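The idea of "locality of function" can be illustrated with a minimal sketch. This is not the researchers' method: the coordinates, function labels, and the helper name neighbor_agreement below are hypothetical toy values chosen only to show how one might check whether nearby points in a data space tend to share a function.

```python
import math


def neighbor_agreement(points, labels, k=2):
    """Mean fraction of each point's k nearest neighbors that share its label."""
    total = 0.0
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to the current point.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        nearest = others[:k]
        total += sum(labels[j] == labels[i] for j in nearest) / k
    return total / len(points)


# Hypothetical 2D coordinates standing in for positions in a structure space.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]

# Labels that follow the two spatial groups, versus labels that ignore them.
clustered = ["lipid transport"] * 3 + ["hydrolase"] * 3
mixed = ["lipid transport", "hydrolase"] * 3

score_clustered = neighbor_agreement(points, clustered)
score_mixed = neighbor_agreement(points, mixed)
print(score_clustered, score_mixed)
```

In this toy setup, a score close to 1 means that neighbors usually share a function, which is what locality of function would predict; labels assigned without regard to position give a noticeably lower score.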
The research used artificial intelligence methods such as deepFRI, which can predict a protein's function even when its sequence is dissimilar to previously known examples. This made it possible to identify previously unknown protein variants that may be involved in, for example, lipid transport or biochemical reactions in organisms living in extreme conditions.
Access for the entire scientific community
"We believe that our platform will contribute to the creation of better diagnostic and therapeutic methods, and will also help understand what molecular life looks like at the level of protein structures," added Dr. Paweł Szczerbiak from Sano, the lead author of the paper.
According to the authors, the new platform allows researchers around the world to use this data for scientific purposes, from microbiome studies and protein evolution to designing new drugs and therapies. Thanks to its visual approach, the tool can also support education and a better understanding of the complexities of molecular biology.
The project was created in international cooperation with the Flatiron Institute (USA) and the Open Molecular Software Foundation (USA), and the authors include researchers from the Jagiellonian University in Krakow.