Computational protein design

Collaborators: Thomas Simonson and David Mignon
Laboratory: Laboratoire de Biologie Structurale de la Cellule
Institution: École Polytechnique

Protein design aims to create new proteins or modify existing ones to achieve a specific function. Computational approaches are valuable here because they help rationalize predictions and guide experimental testing. Computational protein design (CPD) has prompted significant methodological efforts and achieved notable successes, such as the creation of a protein with a new fold and the engineering of enzyme active sites. The main difficulty of CPD lies in the astronomical number of possible sequences and conformations, on the order of (20 × 10)¹⁰⁰ for a protein of 100 amino acids (20 amino acid types, each with roughly 10 side chain conformations, at every position). Another key element for the success of CPD is the energy function used to evaluate and select sequences and conformations.
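To give a sense of scale, the estimate above can be turned into a power of ten. The short sketch below assumes that the factor 10 stands for an approximate number of side chain conformations per amino acid type; the variable names are purely illustrative.

```python
import math

n_positions = 100          # protein length used in the text
n_types = 20               # natural amino acid types
n_conformations = 10       # assumed rough number of conformations per type

# log10 of (20 * 10)^100, computed in log space to avoid huge integers
log10_size = n_positions * math.log10(n_types * n_conformations)
print(f"search space is about 10^{log10_size:.0f} combinations")
```

Under these assumptions the space holds about 10^230 combinations, far beyond what exhaustive enumeration could cover.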

The laboratory's structural bioinformatics team has been working on CPD for around 20 years and has developed a software package called Proteus (https://proteus.polytechnique.fr), following a physics-inspired approach. It is based on an atomic model of the protein structure and a molecular mechanics energy function. An important aspect is the treatment of the solvent, which is represented by a dielectric continuum through a generalized Born term, supplemented by a term proportional to the solvent accessible surface area. The main features of the implementation are: 1) the protein backbone is kept fixed, 2) the conformational space of the side chains is reduced to a discrete library of rotamers, 3) the energy function is decomposed into pairwise interaction terms. The first step is to calculate a matrix of interaction energies between each pair of rotamers. In the second step, the sequence and conformation space is explored with an optimization algorithm. Energy evaluations are fast in this second step because the energy matrix has been precalculated. Proteus can handle a wide variety of problems. It has been applied, among others, to side chain prediction, mutant stability prediction, fold recognition, pKa prediction, redesign of full protein sequences, and enzyme active site engineering.
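The two-step scheme can be illustrated with a minimal sketch. This is not the Proteus implementation: the energies are random toy values rather than a molecular mechanics function, Metropolis Monte Carlo is used as one possible optimization algorithm, and all function names and parameters are hypothetical. Each "rotamer" index here stands for one combination of amino acid type and side chain conformation.

```python
import math
import random

def build_energy_matrix(n_positions, n_rotamers, rng):
    """Step 1: precompute self and pair energies (toy random values)."""
    self_e = [[rng.uniform(-1.0, 1.0) for _ in range(n_rotamers)]
              for _ in range(n_positions)]
    pair_e = {}
    for i in range(n_positions):
        for j in range(i + 1, n_positions):
            pair_e[(i, j)] = [[rng.uniform(-1.0, 1.0) for _ in range(n_rotamers)]
                              for _ in range(n_rotamers)]
    return self_e, pair_e

def total_energy(state, self_e, pair_e):
    """Full energy of a rotamer assignment as a sum of self and pair terms."""
    n = len(state)
    e = sum(self_e[i][state[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_e[(i, j)][state[i]][state[j]]
    return e

def delta_energy(state, pos, new_rot, self_e, pair_e):
    """Energy change when one position switches rotamer: table lookups only."""
    old_rot = state[pos]
    de = self_e[pos][new_rot] - self_e[pos][old_rot]
    for other, rot in enumerate(state):
        if other == pos:
            continue
        if pos < other:
            table = pair_e[(pos, other)]
            de += table[new_rot][rot] - table[old_rot][rot]
        else:
            table = pair_e[(other, pos)]
            de += table[rot][new_rot] - table[rot][old_rot]
    return de

def monte_carlo(self_e, pair_e, n_steps, kT, rng):
    """Step 2: explore assignments with Metropolis Monte Carlo."""
    n_positions, n_rotamers = len(self_e), len(self_e[0])
    state = [rng.randrange(n_rotamers) for _ in range(n_positions)]
    energy = total_energy(state, self_e, pair_e)
    best_state, best_energy = list(state), energy
    for _ in range(n_steps):
        pos = rng.randrange(n_positions)
        new_rot = rng.randrange(n_rotamers)
        de = delta_energy(state, pos, new_rot, self_e, pair_e)
        # Metropolis criterion: always accept downhill, sometimes uphill
        if de <= 0.0 or rng.random() < math.exp(-de / kT):
            state[pos] = new_rot
            energy += de
            if energy < best_energy:
                best_state, best_energy = list(state), energy
    return best_state, best_energy

rng = random.Random(0)
self_e, pair_e = build_energy_matrix(n_positions=6, n_rotamers=4, rng=rng)
best_state, best_energy = monte_carlo(self_e, pair_e, n_steps=5000, kT=0.5, rng=rng)
print(best_state, round(best_energy, 3))
```

The point of the sketch is the cost structure: once the matrix exists, each trial move needs only a handful of lookups, which is what makes long explorations of sequence and conformation space affordable.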

In this project, I contributed mainly to methodological developments: work on the energy function, on solvation models and their pairwise decomposition, and on the implementation of the energy matrix calculation. I also applied these models to various CPD problems, evaluating their performance and the contribution of their individual components.

sidechain prediction of 2IGD core — **2IGD**

sidechain prediction of 1KF5 core — **1KF5**

sequence design of SH3 domain cores — Examples of full protein sequence design, taken from Gaillard & Simonson [2017]. Only the core positions are shown. Sequences are displayed in logo form. Predicted sequences are compared to the native sequences and to the Pfam profile of SH3 domains. Predictions were obtained with the Proteus program, using an MMGBSA-type energy function with the all-atom AMBER 99SB force field, an internal dielectric constant of 8, and pairwise decomposable GB and SA terms.

transition state model of amino acid adenylation — A)

transition state model of L-Tyr adenylation — A)

References

Transition state-based computational enzyme design.
T. Gaillard, T. Simonson
in S. M. Kahn, F. Pazos (Eds.), Methods in Molecular Biology: Protein Design and Evolution.
Humana, New York, 2026.
doi:10.1007/978-1-0716-4828-5_11

Improved Physics-Based Single-Position Protein Sequence Redesign with a Residue-Pairwise Generalized Born Model.
T. Gaillard^*
Journal of Physical Chemistry B, 2025, 129, 10699-10710.
doi:10.1021/acs.jpcb.5c03662

Physics-Based Computational Protein Design: An Update.
D. Mignon, K. Druart, E. Michael, V. Opuu, S. Polydorides, F. Villa, T. Gaillard, N. Panel, G. Archontis, T. Simonson^*
Journal of Physical Chemistry A, 2020, 124, 10637-10648.
doi:10.1021/acs.jpca.0c07605

Adaptive Landscape Flattening Allows the Design of Both Enzyme:Substrate Binding and Catalytic Power.
V. Opuu, G. Nigro, T. Gaillard, E. Schmitt, Y. Mechulam, T. Simonson^*
PLOS Computational Biology, 2020, 16, e1007600.
doi:10.1371/journal.pcbi.1007600

Full protein sequence redesign with an MMGBSA energy function.
T. Gaillard^*, T. Simonson^*
Journal of Chemical Theory and Computation, 2017, 13, 4932-4943.
doi:10.1021/acs.jctc.7b00202

Protein side chain conformation predictions with an MMGBSA energy function.
T. Gaillard^*, N. Panel, T. Simonson^*
Proteins, 2016, 84, 803-819.
doi:10.1002/prot.25030

Pairwise decomposition of an MMGBSA energy function for computational protein design.
T. Gaillard^*, T. Simonson^*
Journal of Computational Chemistry, 2014, 35, 1371-1387.
doi:10.1002/jcc.23637

Computational protein design: the Proteus software and selected applications.
T. Simonson^*, T. Gaillard, D. Mignon, M. Schmidt am Busch, A. Lopes, N. Amara, S. Polydorides, A. Sedano, K. Druart, G. Archontis
Journal of Computational Chemistry, 2013, 34, 2472-2484.
doi:10.1002/jcc.23418

Computational protein design in the genomic era.
T. Gaillard, M. Schmidt am Busch, A. Lopes, D. Mignon, T. Simonson
Actes des Journées Ouvertes en Biologie, Informatique et Mathématiques, 2013, 151-158.
1-4 July 2013, Toulouse, France.

The inverse protein folding problem: protein design and structure prediction in the genomic era.
M. Schmidt am Busch, A. Lopes, D. Mignon, T. Gaillard, T. Simonson
in J. Zeng, R. Zhang, H. Treutlein (Eds.), Quantum Simulations of Materials and Biological Systems.
Springer Verlag, New York, 2012.

Thomas Gaillard
