SciLifeLab AI Seminar Series: Arne Elofsson
June 9 @ 14:00 – 15:00 CEST
SciLifeLab Data Centre hosts a seminar series on the topic of applied AI in life science research. The SciLifeLab AI Seminar Series combines scientific highlights from SciLifeLab-affiliated researchers and invited experts on the general topic of AI applications in life science. The series is held virtually on Zoom, and videos are published openly after the seminars on the SciLifeLab YouTube channel.
Contact: Prof. Ola Spjuth, AI coordinator, SciLifeLab Data Centre.
Using deep learning and coevolution to predict protein-protein interactions
Stockholm University and Science for Life Laboratory
In the last decade, the accuracy of de novo protein structure prediction for individual proteins has improved markedly through the use of co-evolution and deep learning to harvest information from large multiple sequence alignments. CASP14 showed that the best method can predict the structure of essentially all proteins. In principle, this information can also be used to extract information about protein-protein interactions, but success has so far been limited to a handful of proteins. However, most earlier studies have not used the latest improvements in contact-based prediction, in which deep learning predicts the distances between residue pairs. Here, we first show that, using one of the best residue-residue contact prediction methods (trRosetta), it is possible to simultaneously predict the structures of two proteins and their interaction for some protein pairs, even when the structures of the monomers are not known.
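When two sequences are concatenated and run through a single-chain predictor, the interaction signal sits in the off-diagonal block of the predicted map. The sketch below shows that idea. It assumes the predictor's output has already been reduced to an L x L contact-probability matrix for the concatenated sequence A+B (for example by summing distance bins below some cutoff). The function name `interchain_contacts` and the 0.5 threshold are illustrative choices, not part of trRosetta.

```python
import numpy as np


def interchain_contacts(contact_prob: np.ndarray, len_a: int, threshold: float = 0.5):
    """Hypothetical helper: list predicted contacts between chain A and chain B.

    contact_prob is assumed to be a symmetric (len_a + len_b) x (len_a + len_b)
    matrix of contact probabilities for the concatenated sequence A+B.
    Returns (i, j, p) tuples where i indexes chain A and j indexes chain B
    (both 0-based within their own chain), sorted by decreasing probability.
    """
    # Rows 0..len_a-1 are chain A and columns len_a.. are chain B, so this
    # off-diagonal block holds only the inter-chain predictions.
    block = contact_prob[:len_a, len_a:]
    idx_a, idx_b = np.nonzero(block >= threshold)
    pairs = [(int(i), int(j), float(block[i, j])) for i, j in zip(idx_a, idx_b)]
    # Most confident inter-chain contacts first.
    pairs.sort(key=lambda t: t[2], reverse=True)
    return pairs
```

Intra-chain contacts (the diagonal blocks) are ignored here, so a strong monomer fold does not by itself produce an apparent interaction.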
Second, we apply this method to a standard protein-protein docking dataset and find that most of the protein pairs are not docked correctly. Using alternative alignment methods to generate the multiple sequence alignments makes it possible to dock more proteins accurately. The average performance is comparable to that of alternative docking methods, whether template-based or based on shape complementarity, even though the fold-and-dock pipeline requires no structural information for the individual proteins. The results are, however, complementary: some methods succeed on some pairs and others on other pairs.
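The joint multiple sequence alignment is what a fold-and-dock pipeline feeds to the predictor, and how its rows are paired affects the result. The abstract does not say which pairing strategy was used; the sketch below assumes one common approach, pairing the first hit per species from each single-chain alignment. The input format (a list of `(species, aligned_sequence)` tuples with the query first) and the function name `pair_alignments` are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed input format: [(species_id, aligned_sequence), ...], query first.
Alignment = List[Tuple[str, str]]


def pair_alignments(msa_a: Alignment, msa_b: Alignment) -> List[str]:
    """Hypothetical sketch: build a concatenated (paired) MSA by species.

    Row 0 is the concatenated query. Each further row joins the first hit
    of a species in msa_a with the first hit of the same species in msa_b;
    species present in only one alignment are dropped.
    """
    paired = [msa_a[0][1] + msa_b[0][1]]
    # Keep only the first (assumed best-ranked) hit per species in B.
    best_b: Dict[str, str] = {}
    for species, seq in msa_b[1:]:
        best_b.setdefault(species, seq)
    used = set()
    for species, seq in msa_a[1:]:
        if species in best_b and species not in used:
            paired.append(seq + best_b[species])
            used.add(species)
    return paired
```

Pairing by species is only one option; swapping in a different pairing rule changes which co-evolutionary signal reaches the predictor, which is one way alternative alignments can change docking success.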
When analysing the differences between successful and unsuccessful fold-and-dock protein pairs, we find that the current method produces artefacts when the interacting proteins are homologous to each other. This bottleneck affects approximately one-third of the protein pairs in our benchmark set. Further, we find that one-third (?) of the proteins have too few sequences in the joint alignment. For the remaining third (?), however, we cannot find a good explanation for why the docking failed.
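Whether a joint alignment has "too few sequences" is often judged by its effective number of sequences (Neff) rather than its raw row count, because near-duplicate rows add little co-evolutionary information. The abstract does not state which measure was used, so the sketch below assumes the common scheme of weighting each sequence by the inverse of the number of sequences at or above 80% identity to it. For simplicity, gap characters count as matches, and the all-pairs comparison is only practical for modest alignment sizes.

```python
import numpy as np


def effective_sequences(msa: list[str], identity_cutoff: float = 0.8) -> float:
    """Sketch: Neff of an aligned MSA (all rows must have equal length).

    Each row gets weight 1 / (number of rows, itself included, with
    fractional identity >= identity_cutoff); Neff is the sum of weights.
    """
    arr = np.array([list(s) for s in msa])  # shape (n_seqs, n_columns)
    # Pairwise fraction of identical columns (gaps compared like residues).
    ident = (arr[:, None, :] == arr[None, :, :]).mean(axis=2)
    neighbours = (ident >= identity_cutoff).sum(axis=1)
    weights = 1.0 / neighbours
    return float(weights.sum())
```

For example, two identical rows plus one unrelated row give a Neff of 2.0 rather than 3, reflecting that the duplicate contributes no new information.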
Finally, we introduce a novel scoring function, PconsDock, that can be used to evaluate the quality of a docked protein-protein pair. This simple scoring scheme is highly accurate: it separates 98% of the correct and incorrect protein pairs.