Structural bioinformatics practical

TP AspRS
In silico directed mutagenesis of the aspartyl-tRNA synthetase

The objective of the TP is to study by molecular modeling the specific recognition between the aspartyl-tRNA synthetase and its substrate Asp. We will try to evaluate the specificity by comparing the binding of the ligands Asp and Asn. We will then seek to identify and model mutations in the active site that could favor the binding of Asn instead of Asp. This is a first step towards an engineering of the genetic code.

AspRS

Introduction

Aminoacyl-tRNA synthetases (aaRS) are an enzyme family implicated in protein synthesis. They are involved in translation by allowing the binding of an amino acid to its transfer RNA. They are very specific of the amino acid concerned and corresponding transfer RNA. Thus there exists one for each amino acid.

We will be interested particularly in the aspartyl-tRNA synthetase (AspRS), the goal being to perform point mutations of this enzyme in order to reduce its affinity for its natural ligand aspartate and favor its binding to asparagine.

For that, we will consider the problem in terms of protein sequences and structures. The study includes three steps:

Study of the sequences of some aaRS, including AspRS, in order to highlight the distinctive features of the AspRS.
Inspection of the AspRS structure in order to identify amino acids in the active site that would be good candidates for a mutagenesis.
Mutagenesis and affinity calculations by molecular modeling.

This analysis will lead to propose judicious mutations of the active site, allowing to modify the AspRS specificity by favoring Asn binding over Asp.

Protocol

A) Analysis of aaRS sequences

Retrieve the E. coli AspRS sequence in the UniProt (http://www.uniprot.org) bank
Obtain homologous sequences: BLAST search

The E. coli AspRS has three domains: the tRNA anticodon binding domain, the catalytic site domain, and a third one inserted into the catalytic site domain.

Launch a BLAST search. What types of proteins are found?

Identify important residues: make a multiple sequence alignment

Make a multiple alignment of the previously obtained sequences.
Try different formats for display and coloring by amino acid.
Repeat the operation by refining the choice of sequences.

Identify strongly conserved regions that can correspond to the active site. Choose some positions that seem distinctive of the AspRS and of Asp binding.

Which strategy have you employed? Which mutations do you propose to modify the AspRS affinity for aspartate and asparagine?

B) Structural analysis: inspection of the AspRS structure

Visualize with PyMOL the structure of AspRS and its ligand.
Locate the active site region, with the help of the sequence analysis.
Examine the active site to refine the choice of amino acids to mutate.

With the informations previously obtained, propose judicious mutations to modify the AspRS affinity for aspartate and asparagine. We will try to test several of them in the next step of modeling.

Does the active site examination lead you to modify your mutation propositions made from the sequences?

Can we use the structure to verify the sequence alignment?

C) Molecular modeling study

This is the most ambitious and complex part of the TP. There are two steps:

Estimate the AspRS affinity difference for Asp and Asn.
Mutate one or more residues in the active site and estimate again the affinity difference.

We will follow this protocol with the XPLOR program:

Inspect the files available:

tp_asprs.tar.gz

asprs.seq: sequence of the AspRS protein
asprs.pdb: experimental structure of the AspRS protein
asprs.xplor.pdb: experimental structure of the AspRS protein formatted for XPLOR
asp.xplor.pdb: structure of the Asp ligand formatted for XPLOR
amber.rtf: topology file for XPLOR
isolated_aa.rtf: additional topology file for isolated aa
amber.prm: parameter file for XPLOR
build.inp: model building of the protein:ligand complex
minimize.inp: energy minimization of the complex
energy.inp: energy calculation of the complex
run.sh: script to drive the calculations

Compare the files asprs.xplor.pdb and asprs.pdb

The PDB files must comply with a particular format to be readable by XPLOR. The segment name has to be written on 4 characters in columns 73-76. We also note that the 3-letter code of histidines has been changed from HIS to HIE. There indeed exists 3 possible protonation states for histidines and it is necessary to tell XPLOR which state is chosen among HID, HIE, or HIP (see the topology file amber.rtf for the definition of these states).

Does the HIE protonation state chosen for all histidines seem reasonable to you? If in doubt, try other protonation states and evaluate the impact on the results.

Build a model of the AspRS:Asp complex with XPLOR

xplor < build.inp > build.out

Minimize the energy of the complex to improve its geometry

xplor < minimize.inp > minimize.out

Estimate the energy of the AspRS:Asp complex, and then of each partner alone

xplor < energy.inp > energy.out

What is the affinity of the AspRS protein for the Asp ligand?

Edit the PDB file of the ligand asp.xplor.pdb to change Asp into Asn. We will simply replace one of the carboxylate oxygens of the Asp sidechain by a nitrogen (corresponding to the Asn NH₂ group). It will then be easy with XPLOR to position the two missing hydrogens.

What is the affinity of the AspRS for Asn?

Experimentally, the wild-type enzyme binds Asp considerably stronger than Asn, with an association free energy difference of more than 7 kcal/mol. Do you find the same tendency?

Mutagenesis of the AspRS: choose a mutation among the candidates previously identified.

A simple mutation (for example Asp→Asn or Gln→Glu) can be realized by editing the PDB file, as explained for the ligand.

A more complex mutation can be performed with the SCWRL program. The mutation choice (for example R10K) is done by replacing in the asprs.seq file the one-letter code of the native amino acid in lowercase by the code of the amino acid chosen for the mutation in uppercase (for example replacing the lowercase ``r'' in position 10 by an uppercase ``K'').

We then launch the SCWRL program as follows:

scwrl -s asprs.seq -i asprs.wt.pdb -o asprs.pdb > scwrl.out

Compare the mutated structure obtained asprs.pdb with the native structure asprs.wt.pdb.

It is recommended to work in a separate folder for each mutant.

Perform the affinity calculations for the mutated enzyme.

In order to use the structure mutated by SCWRL with XPLOR, it must be ensured that it is correctly formatted. For that, we will use the pdb2xplor program as follows:

pdb2xplor asprs.pdb A PROT > asprs.xplor.pdb

What are the affinities for Asp and Asn obtained with the mutated protein?

Have you succeeded in inverting the specificity?

Interpret structurally the effect of the mutations.

Which improvements could we bring to the model or the protocol?
Was the AspRS the most judicious target for this engineering?

TP Trp-cage
Structure and stability of Trp-cage

The objective of the TP is to study the structure and stability of a small protein, the Trp-cage.

Trp-cage folded ↔ Trp-cage unfolded

Introduction

Trp-cage is a small artificial protein of 20 amino acids, which has been designed to fold easily. Its amino acid sequence is NLYIQWLKDGGPSSGRPPPS. The protein folding problem is one of the most important challenges of structural bioinformatics. It consists in predicting the three-dimensional structure of a protein from the sequence information alone.

We will employ the methods of molecular mechanics to model the Trp-cage.

The dynamics of the folded protein at equilibrium will be studied first.
Then the unfolding of the Trp-cage will be studied by simulating the denaturation of the folded protein.
At last, the most difficult part of the TP will consist, starting from a linear unfolded conformation of the Trp-cage, to try to fold the protein by calculation without any experimental information.

Protocol

A) Dynamics at equilibrium of the folded Trp-cage

Inspect the files available:

tp_trp-cage.tar.gz

folded.pdb: experimental (NMR) structure of the folded Trp-cage
unfolded.pdb: linear unfolded structure of the Trp-cage
amber.rtf: topology file for XPLOR
amber.prm: parameter file for XPLOR
build.inp: model building and energy minimization
md.inp: molecular dynamics at 300K
traj2mpdb.inp: conversion of the trajectory to the multiple PDB format
analyze.inp: analysis of the trajectory
run.sh: script to drive the calculations

Model building

xplor < build.inp > build.out

This script builds a model of the Trp-cage with XPLOR and performs an energy minimization to improve the geometry.

Inspect the output file and visualize the structures produced.

Molecular dynamics

xplor < md.inp > md.out

This script performs a molecular dynamics of the Trp-cage during 20ps, assigning random initial velocities and then maintaining the temperature at 300K.

Inspect the output file and track the energy and temperature as a function of time.

Visualization of the trajectory

xplor < traj2mpdb.inp > traj2mpdb.out

This script converts the format of the trajectory produced from DCD to multiple PDB.

We can then visualize the trajectory with PyMOL by loading it as follows:

load md.multi.pdb, multiplex=0

Analysis of the trajectory

xplor < analyze.inp > analyze.out

This script reads the produced trajectory (md.dcd) and performs structural or energetic calculations at each step. The results are written in a text file (md.dat). Represent them graphically.

The analyses included in the script are only examples, it is your task to add more relevant ones with the help of the XPLOR documentation.

Which descriptors of the structure, dynamics and stability of the Trp-cage have you studied?
Is the simulation time sufficient to obtain converged results? Extend the simulation if necessary.
Does the Trp-cage simulation show a stable structure?
Indicate possible deformations with respect to the experimental structure.

B) Unfolding the Trp-cage

Design a protocol to denature the Trp-cage by molecular dynamics and implement it by adapting the XPLOR scripts. Describe the procedure followed.
Can we define a limit between the folded and unfolded states?
Can we observe different stages in the unfolding?

C) Folding the Trp-cage

Adapt the XPLOR scripts and repeat the first steps of A) to build a model of the unfolded Trp-cage (replace ``folded'' by ``unfolded'' in scripts) and simulate it by molecular dynamics.
Develop a protocol to fold the Trp-cage. Do not hesitate to draw inspiration from the literature. Describe the strategy followed.
To what extent have you succeeded in folding the protein?
Can we observe intermediate states during folding?
What improvements could be made to the model?

TD/TP Modeller
Homology modeling of mimivirus tyrosyl-tRNA synthetase

Mimivirus is a giant DNA virus. It is larger than many bacteria, and can itself be infected by other viruses. It was discovered that mimivirus possesses certain genes for proteins involved in translation, absent in other viruses that use the host cell's machinery to multiply. These discoveries have fuelled debate about the boundary between living and inert matter.

Mimivirus

Homology modeling aims at building a model of the unknown structure of a target protein, knowing its sequence and the structure of another template protein of homologous sequence. The method can be decomposed into four steps:

template selection
target-template alignment
model building
model evaluation

The aim of this work is to propose the best possible structural model (criteria to be defined) of mimivirus tyrosyl-tRNA synthetase (its structure is assumed to be unknown) by homology modelling with the Modeller program.

Retrieving the sequence

Retrieve the sequence of mimivirus tyrosyl-tRNA synthetase in FASTA format in UniProt (http://www.uniprot.org) database.

Template selection

Carefully select a structure to be used as a guide for homology modeling (of course, we won't use the structure of mimivirus tyrosyl-tRNA synthetase, which we assume to be unknown). Retrieve this structure in PDB format.

Conversion of the sequence format

Convert query sequence from FASTA to PIR format (http://salilab.org/modeller/manual, File formats, Alignment file (PIR)) with which Modeller works. Example of a sequence in PIR format:

>P1;TvLDH
sequence:TvLDH::::::::
MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKA
AFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPEN
FSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKI
GHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVV
EGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG*

Inspect the files available

td-tp_modeller.tar.gz

The Modeller program is launched as follows:

modeller file.py

Target-template alignment

Align the query sequence with the sequence of the selected guide structure by adapting the align2d.py script.

modeller align2d.py

The alignment produced is written in PIR, PAP, and FASTA formats. Examine these files.

Model building

Model by homology the target structure by adapting the model-single.py script.

modeller model-single.py

A summary including the PDB file names of the models produced as well as the value of the Modeller energy function and the DOPE score for each model can be found at the end of the output file (model-single.log). Examine the models produced with PyMOL.

Model evaluation

Evaluate the models generated in the previous step by adapting the evaluate_model.py script.

modeller evaluate_model.py

This script allows a more detailed evaluation of the models produced by calculating the DOPE score for each position of the alignment. Plot the DOPE score as a function of position (columns 1 and 42) for the different models.

How to define the best model produced?

For further information:

Modeller tutorial: http://salilab.org/modeller/tutorial/basic.html

Modeller manual: http://salilab.org/modeller/manual

TD/TP AutoDock
Molecular docking of Indinavir to HIV-1 protease

td-tp_autodock.tar.gz

lock and key

AutoDock is a suite of tools for molecular docking. Docking involves predicting how ligands, such as substrates or potential drugs, bind to a receptor with a known three-dimensional structure.

The AutoDock docking procedure comprises two main steps:

generation of receptor interaction maps with the AutoGrid program
docking the ligand to the receptor using the AutoDock program

We will also be using a graphical interface called AutoDockTools (ADT) to facilitate docking preparation and visualize the results.

The method will be illustrated with the docking of Indinavir, an HIV-1 protease inhibitor, used as an antiretroviral in the treatment of AIDS.

Ligand preparation

AutoDock needs to know the charge and type of each atom, as well as a list of the freely rotating bonds present in the ligand.

Launch ADT

adt

Load the ligand

Ligand → Input → Open... → PDB files → "ind.pdb"

AutoDockTools reads the ligand and performs the following steps: calculation of Gasteiger-type atomic charges, fusion of non-polar hydrogens, assigning atomic types, detecting the number of torsional degrees of freedom.

Atomic types and charges are used in the molecular mechanics terms of AutoDock scoring function. The number of torsional degrees of freedom of the ligand determines its flexibility and is also used in the calculation of the entropy penalty of association.

Detect the root atom

Ligand → Torsion Tree → Detect Root...

The smallest rigid group in the molecule includes this atom and all the atoms connected to it by bonds without free rotation.

Choose torsions

Ligand → Torsion Tree → Choose Torsions...

Bonds without free rotation are shown in red, those that could undergo a rotation but marked as inactive appear in purple, while those marked as active appear in green. Leave the default definition, which corresponds to 14 degrees of freedom.

Ligand → Torsion Tree → Set Number of Torsions...

Reduce the number of degrees of freedom to 6 ("fewest atoms") to speed up the calculation.

Save the ligand

Ligand → Output → Save as PDBQT... → "ind.pdbqt"

Receptor preparation

Grid → Macromolecule → Open... → PDB files → "hsg1.pdb"

AutoDockTools reads the receptor and, as with the ligand, performs the steps of calculating Gasteiger-type atomic charges, merging non-polar hydrogens and assigning atomic types.

Save the receptor as "hsg1.pdbqt".

Preparation of the grid maps

It is necessary to generate an interaction map for each atomic type of ligand, plus one for electrostatics and one for desolvation.

Read the atomic types of the ligand

Grid → Set Map Types → Choose Ligand... → "ind" → Select Ligand

Choose the size and position of the grid

Grid → Grid Box...

Choose 60, 60 and 66 for the number of grid points in x, y and z directions; and 2.5, 6.5 and -7.5 for the x, y and z coordinates of the grid center position.

File → Close saving current

Save the grid parameter file (.gpf)

Grid → Output → Save GPF... → "hsg1.gpf"

The .gpf file contains the parameters for the AutoGrid program.

Generate the grid maps

Run → Run AutoGrid... → Launch

The AutoGrid program is executed with the previously generated parameter file (the corresponding command line is autogrid4 -p hsg1.gpf -l hsg1.glg).

The maps produced are written to the current directory and have the .map extension.

Docking

Load the receptor

Docking → Macromolecule → Set Rigid Filename... → "hsg1.pdbqt"

Load the ligand

Docking → Ligand → Choose... → "ind" → Select Ligand → Accept

Choose the docking parameters

Docking → Search Parameters... → Genetic Algorithm...

Reduce the number of evaluations of the genetic algorithm to 250000 ("short") then click on Accept.

Save the docking parameter file (.dpf)

Docking → Output → Lamarckian GA... → "ind.dpf"

The .dpf file contains the parameters for the AutoDock program. We have chosen the Lamarckian genetic algorithm as conformational sampling method.

Launch the docking

Run → Run Autodock... → Launch

The AutoDock program is executed with the previously generated parameter file (the corresponding command line is autodock4 -p ind.dpf -l ind.dlg).

Analysis of results

Reinitialize ADT

Edit → Delete → Delete All Molecules

Read the docking results

Analyze → Dockings → Open... → "ind.dlg"

Display the receptor

Analyze → Macromolecule → Open...

Visualize the conformations

Analyze → Conformations → Play, ranked by energy...

A control bar appears, allowing you to browse the conformations found by docking. Conformation 0 is the starting one.

Change options by clicking on the "&" button on the bar.

Select "Show Info" and examine the energy terms displayed.

Browse through the conformations.

Change ligand and receptor representation and coloring modes. For example, you could display the molecular surface of the receptor to visualize shape complementarity with the ligand.