Discriminating physiological from non-physiological interfaces in structures of protein complexes: A community-wide study

Reliably scoring and ranking candidate models of protein complexes and assigning their oligomeric state from the structure of the crystal lattice represent outstanding challenges. A community-wide effort was launched to tackle these challenges. The latest resources on protein complexes and interfaces were exploited to derive a benchmark dataset consisting of 1677 homodimer protein crystal structures, including a balanced mix of physiological and non-physiological complexes. The non-physiological complexes in the benchmark were selected to bury a similar or larger interface area than their physiological counterparts, making it more difficult for scoring functions to differentiate between them. Next, 252 functions for scoring protein-protein interfaces, previously developed by 13 groups, were collected and evaluated for their ability to discriminate between physiological and non-physiological complexes. Two combined predictors were then created: a simple consensus score generated using the best-performing score of each of the 13 groups, and a cross-validated Random Forest (RF) classifier. Both approaches showed excellent performance, with areas under the Receiver Operating Characteristic (ROC) curve of 0.93 and 0.94, respectively, outperforming the individual scores developed by the different groups. Additionally, AlphaFold2 engines recalled the physiological dimers with significantly higher accuracy than the non-physiological set, lending support to the reliability of our benchmark dataset annotations. Optimizing the combined power of interface scoring functions and evaluating it on challenging benchmark datasets appears to be a promising strategy.
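
As a rough illustration of how a consensus of several interface scoring functions can be evaluated by ROC area, the sketch below averages rank-normalized scores and computes the AUC by pairwise comparison. This is not the study's method: the abstract does not say how the consensus was formed, so the rank-averaging scheme, the function names (`rank_normalize`, `consensus_score`, `roc_auc`), the convention that higher scores mean "more likely physiological", and the toy numbers are all assumptions made for illustration.

```python
from typing import Sequence


def rank_normalize(values: Sequence[float]) -> list[float]:
    """Map raw scores to [0, 1] by rank; tied values share their mean rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [r / (n - 1) if n > 1 else 0.5 for r in ranks]


def consensus_score(score_table: Sequence[Sequence[float]]) -> list[float]:
    """Average the rank-normalized scores; one row per scoring function."""
    normalized = [rank_normalize(row) for row in score_table]
    n_items = len(normalized[0])
    return [sum(row[i] for row in normalized) / len(normalized)
            for i in range(n_items)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(random positive outscores random negative); ties count 0.5."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data (invented): 1 = physiological, 0 = non-physiological interface.
labels = [1, 1, 1, 0, 0, 0]
# Three hypothetical scoring functions, higher = more likely physiological.
score_table = [
    [0.9, 0.4, 0.7, 0.5, 0.2, 0.1],
    [0.8, 0.9, 0.3, 0.4, 0.6, 0.1],
    [0.6, 0.7, 0.9, 0.8, 0.2, 0.3],
]

individual_aucs = [roc_auc(labels, row) for row in score_table]
consensus_auc = roc_auc(labels, consensus_score(score_table))

if __name__ == "__main__":
    print("individual AUCs:", individual_aucs)
    print("consensus AUC:", consensus_auc)
```

Rank normalization is used here only so that scores on different scales can be averaged. In this toy set each function misranks a different complex, so the averaged score separates the two classes better than any single function does.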

Bibliographic Details
Main Authors: Schweke, Hugo; Xu, Qifang; Tauriello, Gerardo; Pantolini, Lorenzo; Schwede, Torsten; Cazals, Frédéric; Lhéritier, Alix; Fernández-Recio, Juan; Rodríguez-Lumbreras, Luis A.; Schueler-Furman, Ora; Varga, Julia K.; Jiménez-García, Brian; Réau, Manon F.; Bonvin, Alexandre M. J. J.; Savojardo, Castrense; Martelli, Pier-Luigi; Casadio, Rita; Tubiana, Jérôme; Wolfson, Haim J.; Oliva, Romina; Barradas-Bautista, Didier; Ricciardelli, Tiziana; Cavallo, Luigi; Venclovas, Česlovas; Olechnovič, Kliment; Guerois, Raphael; Andreani, Jessica; Martin, Juliette; Wang, Xiao; Terashi, Genki; Sarkar, Daipayan; Christoffer, Charles; Aderinwale, Tunde; Verburgt, Jacob; Kihara, Daisuke; Marchand, Anthony; Correia, Bruno E.; Duan, Rui; Qiu, Liming; Xu, Xianjin; Zhang, Shuang; Zou, Xiaoqin; Dey, Sucharita; Dunbrack, Roland L.; Levy, Emmanuel D.; Wodak, Shoshana J.
Other Authors: National Institutes of Health (US)
Format: article (library)
Language: English
Published: Wiley-VCH 2023-09
Subjects: Crystal contacts, Homodimers, Potential energy, Protein interactions, Protein structure
Online Access: http://hdl.handle.net/10261/347702
http://dx.doi.org/10.13039/100000001
http://dx.doi.org/10.13039/501100011033
http://dx.doi.org/10.13039/501100002809
http://dx.doi.org/10.13039/501100000781
http://dx.doi.org/10.13039/501100000780
http://dx.doi.org/10.13039/100000002
http://dx.doi.org/10.13039/501100003977
http://dx.doi.org/10.13039/501100003973
https://api.elsevier.com/content/abstract/scopus_id/85162927714