Thomas Gaillard

Computational protein design

Collaborators: Thomas Simonson and David Mignon
Laboratory: Laboratoire de Biologie Structurale de la Cellule
Institution: École Polytechnique

Protein design aims to create new proteins or modify existing ones to achieve a specific function. Computational approaches are valuable in protein design, helping to rationalize predictions and guide experimental testing. Computational protein design (CPD) has prompted significant methodological efforts and achieved spectacular successes, such as the creation of a protein with a new fold or the engineering of enzyme active sites. The main difficulty of CPD lies in the astronomical number of possible sequences and conformations, on the order of (20 × 10)^100 for a protein with 100 amino acids. Another key element for the success of CPD is the energy function used to evaluate and select sequences and conformations.
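To give a sense of scale, the estimate above can be worked out explicitly. The reading of the two factors is an assumption, not stated in the text: 20 is taken as the number of amino acid types per position and 10 as a rough number of side chain conformations per type.

```latex
\[
(20 \times 10)^{100} = 200^{100} = 2^{100} \times 10^{200}
\approx 1.27 \times 10^{30} \times 10^{200}
\approx 1.3 \times 10^{230}
\]
```

Under this reading, exhaustive enumeration is out of reach even for a small protein, which is why the exploration relies on an optimization algorithm rather than a complete search.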

The laboratory's structural bioinformatics team has been working on CPD for around 20 years and has developed a software package called Proteus (https://proteus.polytechnique.fr), following a physics-inspired approach. It is based on an atomic model of the protein structure and a molecular mechanics energy function. An important aspect is the treatment of the solvent, represented by a dielectric continuum with a generalized Born term, supplemented by a term proportional to the solvent accessible surface area. The particularities of the implementation are: 1) the protein backbone is kept fixed, 2) the conformational space of the side chains is reduced to a discrete library of rotamers, 3) the energy function is decomposed into pairwise interactions. The first step is to calculate a matrix of interactions between each pair of rotamers. In the second step, the sequence and conformation space is explored with an optimization algorithm. Energy evaluations are fast in this second step because the energy matrix has been precalculated. Proteus can handle a wide variety of problems. It has been applied, among others, to side chain prediction, mutant stability prediction, fold recognition, pKa prediction, redesign of full protein sequences, and enzyme active site engineering.
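The two-step scheme described above can be illustrated with a minimal sketch. This is not the Proteus code: the energy values are random placeholders, Metropolis Monte Carlo is used only as one possible optimization algorithm, and all function names (`build_energy_matrix`, `delta_energy`, `monte_carlo`) are hypothetical. Each "rotamer" index stands for one amino acid type and side chain conformation at a position.

```python
import math
import random

def build_energy_matrix(n_pos, n_rot, rng):
    """Toy stand-in for step 1: random one-body and pairwise rotamer energies.
    A real calculation would evaluate the energy function on atomic models."""
    single = [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)] for _ in range(n_pos)]
    pair = {}
    for i in range(n_pos):
        for j in range(i + 1, n_pos):
            pair[(i, j)] = [[rng.uniform(-1.0, 1.0) for _ in range(n_rot)]
                            for _ in range(n_rot)]
    return single, pair

def pair_energy(pair, i, ri, j, rj):
    """Look up the precalculated interaction of rotamer ri at i with rj at j."""
    if i < j:
        return pair[(i, j)][ri][rj]
    return pair[(j, i)][rj][ri]

def total_energy(state, single, pair):
    """Full energy of a state: sum of one-body terms and all pair terms."""
    n = len(state)
    e = sum(single[i][state[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair[(i, j)][state[i]][state[j]]
    return e

def delta_energy(state, i, new_r, single, pair):
    """Energy change when position i switches to new_r: only table lookups."""
    old_r = state[i]
    d = single[i][new_r] - single[i][old_r]
    for j in range(len(state)):
        if j != i:
            d += (pair_energy(pair, i, new_r, j, state[j])
                  - pair_energy(pair, i, old_r, j, state[j]))
    return d

def monte_carlo(single, pair, n_steps, kT, rng):
    """Step 2 (illustrative): Metropolis Monte Carlo over rotamer assignments."""
    n_pos, n_rot = len(single), len(single[0])
    state = [rng.randrange(n_rot) for _ in range(n_pos)]
    energy = total_energy(state, single, pair)
    best_state, best_energy = list(state), energy
    for _ in range(n_steps):
        i = rng.randrange(n_pos)
        new_r = rng.randrange(n_rot)
        d = delta_energy(state, i, new_r, single, pair)
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if d <= 0 or rng.random() < math.exp(-d / kT):
            state[i] = new_r
            energy += d
            if energy < best_energy:
                best_state, best_energy = list(state), energy
    return best_state, best_energy

if __name__ == "__main__":
    rng = random.Random(0)
    single, pair = build_energy_matrix(n_pos=8, n_rot=5, rng=rng)
    best_state, best_energy = monte_carlo(single, pair, n_steps=5000, kT=0.5, rng=rng)
    print(best_state, round(best_energy, 3))
```

The point of the sketch is the cost of a move: because the energy is pairwise decomposable and the matrix is stored, changing one position requires only a number of lookups proportional to the number of positions, with no recomputation of atomic interactions.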

In this project, I contributed more specifically to methodological developments, particularly work on the energy function, solvation models, their decomposition into pairs, and the implementation of energy matrix calculation. In addition, I conducted applications of these models to various CPD problems, with an evaluation of their performance and of the contribution of their components.

[Figure panels: 2IGD, side chain prediction of the core; 1KF5, side chain prediction of the core]
Examples of protein side chain reconstructions, taken from Gaillard et al. [2016]. Predicted conformations are in red, experimental structures in blue. Predictions were obtained with the Proteus program, using an energy function of MM(ϵ) type, with the all-atom AMBER 99SB force field and a dielectric constant of 2 for the Coulomb term.
[Figure: sequence design of SH3 domain cores]
Examples of full protein sequence design, taken from Gaillard & Simonson [2017]. The core positions are shown. Sequences are indicated in logo form. Predicted sequences are compared to the native sequences and to the Pfam profile of SH3 domains. Predictions have been obtained with the Proteus program, using an energy function of MMGBSA type, with the all-atom AMBER 99SB force field, an internal dielectric constant of 8 and pairwise decomposable GB and SA terms.
[Figure panels: A) transition state model of amino acid adenylation; B) transition state model of L-Tyr adenylation; C) enzyme design]
Example of enzyme design, taken from Gaillard & Simonson [2026]. A) Transition state model of the amino acid adenylation reaction, obtained by quantum chemistry calculations. B) Transition state model of L-Tyr adenylation. TyrRS amino acids close to the ligand are shown. Carbons of amino acids allowed to mutate are in magenta, others are in green. The magnesium ion and the oxygen of a water molecule are represented as spheres. C) Results of TyrRS stereospecificity design. Sequences are indicated in logo form for the four mutated positions: native sequence, predicted sequences for design in favor of L-Tyr, predicted sequences for design in favor of D-Tyr. When the design is in favor of L-Tyr, the native sequence DYQQ is retrieved. Predictions have been obtained with the Proteus program, using an energy function of MMGBSA type, with the all-atom AMBER 99SB force field, an internal dielectric constant of 8 and pairwise decomposable GB and SA terms. The adaptive landscape flattening method was used to generate the sequences. The two states considered were the complexes with L and D transition states.
References