
Homology Modeling

Lu Chih-Hao

Why study protein structure?

• Proteins play crucial functional roles in all biological processes: enzymatic catalysis, signaling messengers …

• Function depends on 3D structure.

• Protein sequences are easy to obtain; structures are difficult to determine.

Where to find the data?

• Protein Data Bank (PDB)

– http://www.rcsb.org/pdb/

– More than 65,500 protein structures

• Text files contain the coordinates of each heavy atom, from the first residue to the last
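
As an illustration of reading those coordinates, here is a minimal sketch that pulls heavy-atom positions out of PDB-format text. It assumes the standard fixed-column ATOM record layout; the names `Atom`, `parse_heavy_atoms`, and `_demo_line` are hypothetical helpers introduced only for this example, and the demo coordinates are made up.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_heavy_atoms(pdb_text: str) -> List[Atom]:
    """Collect non-hydrogen atoms from the ATOM records of PDB-format text."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, TER, ...
        name = line[12:16].strip()       # columns 13-16: atom name
        element = line[76:78].strip()    # columns 77-78: element symbol
        # Skip hydrogens; fall back to the atom name if the element is blank.
        if element == "H" or (not element and name.startswith("H")):
            continue
        atoms.append(Atom(
            name=name,
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        ))
    return atoms


def _demo_line(serial, name4, res, chain, res_seq, x, y, z, element):
    """Format one fixed-width ATOM record (demo helper, not a full PDB writer)."""
    occ, temp = 1.0, 0.0
    return (f"ATOM  {serial:>5} {name4:<4} {res:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{temp:6.2f}          {element:>2}")


demo = "\n".join([
    _demo_line(1, " N  ", "LYS", "A", 1, 3.287, 10.092, 10.329, "N"),
    _demo_line(2, " CA ", "LYS", "A", 1, 2.445, 10.457, 9.182, "C"),
    _demo_line(3, " H  ", "LYS", "A", 1, 4.100, 10.500, 10.900, "H"),
])
heavy = parse_heavy_atoms(demo)
```

In practice a maintained parser is preferable, since real files also contain alternate locations, multiple models, and HETATM records that this sketch ignores.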

PDB Statistics

TIM barrel

How to determine the protein structure?

• By experimentation

– X-ray

– NMR (nuclear magnetic resonance spectroscopy)

• Sequence-Structure gap

Protein Structure Prediction

• The primary sequence already contains all the information necessary to define the 3D structure.

• Methods for predicting 3D protein structure fall into three main categories (Rost & O’Donoghue, 1997): (1) homology modeling; (2) fold recognition (threading); (3) ab initio techniques.

• Homology modeling is currently the most accurate method to predict protein 3D structure (Tramontano, 1998).

Protein Structure Prediction

[Flowchart: Sequence → Sequence homology to known fold? → Homology modeling; otherwise Threading → Match found? → otherwise Ab initio]

Sequence identity implies structural similarity

[Figure: sequence identity plotted against the number of residues aligned (0 to 250), with the "safe zone" marked. (B. Rost, Columbia, New York)]

Sequence similarity implies structural similarity?
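
Both quantities on the plot, the number of residues aligned and the sequence identity, can be read off a pairwise alignment. The sketch below assumes identity is defined as identical pairs divided by aligned (non-gap) pairs; other denominators are also in use, so treat this definition and the function name `percent_identity` as assumptions for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (number of aligned residue pairs, percent identity over those pairs)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap do not count as aligned pairs
        aligned += 1
        if a.upper() == b.upper():
            identical += 1
    if aligned == 0:
        return 0, 0.0
    return aligned, 100.0 * identical / aligned


# Toy input: two ten-residue fragments compared without gaps.
n_aligned, identity = percent_identity("KQFTKCELSQ", "KVFGRCELAA")
```

A short alignment needs a much higher identity than a long one before structural similarity can be inferred, which is why the pair of numbers matters rather than the identity alone.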

Homology Modeling

• Basis

– Structure is much more conserved than sequence during evolution

• Limited applicability

– A large number of proteins and ORFs have no similarity to proteins with known structure

KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSRNICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE

Use as template

8lyz 1alc

KVFGRCELAAAMKRHGLDNYRGYSLGNWVCAAKFESNFNTQATNRNTDGSTDYGILQINSRWWCNDGRTPGSRNLCNIPCSALLSSDITASVNCAKKIVSDGNGMNAWVAWRNRCKGTDVQAWIRGCRL

Share similar sequence

Homologous

What is Homology Modeling?

Target Template

Structure prediction by homology modeling

[Figure: workflow in four steps, Step 1 to Step 4]

Homology detection and template selection

• Homology detection

– To detect the fold of a probe sequence from a library of known target folds.

• The three types of sequence-based methods:

– Pairwise sequence-sequence comparison: FASTA, BLAST

– Sequence-profile comparison: PSI-BLAST, IMPALA, HMMER, SAM

– Profile-profile comparison: prof_sim, COMPASS
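
To make the difference between a sequence and a profile concrete, here is a toy sketch of sequence-profile scoring. It is not how PSI-BLAST, IMPALA, HMMER, or SAM work internally: it assumes a gapless multiple alignment, a uniform background, and a flat pseudocount, and the names `build_profile` and `score_sequence` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(msa, pseudocount=1.0):
    """Per-column log-odds scores (log2) against a uniform background."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in msa if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_sequence(profile, seq):
    """Ungapped score of a sequence with the same length as the profile."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, seq))


# Toy family of three aligned four-residue fragments.
profile = build_profile(["KCEL", "KCEL", "RCEL"])
family_like = score_sequence(profile, "RCEL")
unrelated = score_sequence(profile, "WWWW")
```

Because each column carries its own scores, a residue that is tolerated at one position can be penalized at another, which a single substitution matrix in a pairwise comparison cannot express.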

Sequence-Sequence comparison

BLAST, FASTA, SSEARCH

Profile-Sequence comparison

PSI-BLAST

PSI-BLAST Overview

Sequence-Profile comparison

RPS-BLAST, IMPALA, HMMER, SAM

Profile-Profile comparison

prof_sim, COMPASS

Method_1: 1lmb3 <-> 1pou, shift = 9.34, σ = 39.62
LEDARRLKAIYEKKKNELGLSQESVADKMGMGQSGVGALFNGINALNAYNAALLAKILKVSVEEFSPSIAREIYEMYEA
HHHHHHHHHHHHHHHHHCCCChhhhhhhhccchhhhhhhhccccccchhhhhhhhhhhccchhhcchhhhhhhhhhhhh
||||||||||||||||||||| ++++++++ + ++++++++++++ ++++++++
000000000000000000000 99999999 X XXXXXXXXXXXX XXXXXXXX
HHHHHHHHHHHHHHHHHHCCC---------cchhhhhhhhhcccccc---chhhhhhhcccccccchhhhhhhhhhhhh
LEELEQFAKTFKQRRIKLGFT---------QGDVGLAMGKLYGNDFS---QTTISRFEALNLSFKNMCKLKPLLEKWLN

Method_2: 1lmb3 <-> 1pou, shift = 0.67, σ = 60.78
LEDARRLKAIYEKKKNELGLS----QESVADKMG--MGQSGVGALFN-GINALNAYNAALLAKILKVSVEEFS
HHHHHHHHHHHHHHHHHCCCC----hhhhhhhhc--cCHHHHHHHHC-cccccchhhhhhhhhhhccchhhcc
||||||||||||||||||||| ---- |||||||||| -- ++++++++ ++
000000000000000000000 4444 0000000000 11 11111111 44
HHHHHHHHHHHHHHHHHHCCCcchhhhhhhhhcccccCCHHHHHHHCccccccchhhhhhhhhhh---hhhcc
LEELEQFAKTFKQRRIKLGFTQGDVGLAMGKLYGNDFSQTTISRFEALNLSFKNMCKLKPLLEKW---LNDAE

The importance of the sequence alignment

SCR: structure conserved region

SVR: structure variable region

Backbone generation

• Rigid-body assembly

– Building the model core

Construction of loops might be done by:

Ab initio methods: without any prior knowledge. Empirical scoring functions are used to check a large number of conformations and evaluate each of them.

Wedemeyer & Scheraga, J. Comput. Chem. 20, 819-844 (1999)

[Figure: data → clustered data → library]

Construction of loops might be done by:

Using a database of loops that appear in known structures. The loops can be categorized by their length or sequence.

Scan the database and search for protein fragments with the correct number of residues and the correct end-to-end distance.

Loop length: the method breaks down for loops longer than 9 residues.
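
The database scan described above can be sketched as a simple filter. The fragment records, the Cα-only representation, the 1 Å default tolerance, and the function name `find_loop_candidates` are all assumptions made for this illustration; real loop libraries also check the geometry of the anchor residues and clashes with the rest of the model.

```python
import math


def find_loop_candidates(fragments, n_residues, anchor_start, anchor_end,
                         tolerance=1.0):
    """Return ids of fragments with the requested length whose end-to-end
    Calpha distance matches the gap between the two anchor points.

    fragments: list of dicts with keys "id" and "ca_coords" (hypothetical layout).
    """
    target = math.dist(anchor_start, anchor_end)
    hits = []
    for frag in fragments:
        ca = frag["ca_coords"]
        if len(ca) != n_residues:
            continue  # wrong number of residues
        deviation = abs(math.dist(ca[0], ca[-1]) - target)
        if deviation <= tolerance:
            hits.append((deviation, frag["id"]))
    # Best geometric match first.
    return [frag_id for _, frag_id in sorted(hits)]


# Made-up three-entry library for demonstration.
library = [
    {"id": "f1", "ca_coords": [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0)]},
    {"id": "f2", "ca_coords": [(0, 0, 0), (3.0, 2.0, 0), (6.0, 0, 0)]},
    {"id": "f3", "ca_coords": [(0, 0, 0), (3.8, 0, 0)]},
]
strict = find_loop_candidates(library, 3, (0, 0, 0), (0, 6.2, 0), tolerance=0.5)
loose = find_loop_candidates(library, 3, (0, 0, 0), (0, 6.2, 0), tolerance=2.0)
```

The number of fragments that satisfy both constraints shrinks quickly as loops get longer, which is consistent with the method breaking down for long loops.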

Loop Modeling: A database approach

Target: 2bj7A

Predicted model with long loop: GDT_TS = 45.96

Without loop: GDT_TS = 60.48

Errors in Homology Modeling

a) Side chain packing

b) Distortions and shifts

c) No template

[Figure labels: Template, Model, True structure]

Errors in Homology Modeling

d) Misalignments

e) Incorrect template

(Marti-Renom et al., 2000)

[Figure labels: Template, Model, True structure]

PROCHECK, Verify3D, Prosa, Anolea, Bala …

PROCHECK

http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html

Verify3D

• Verify3D analyzes the compatibility of an atomic model (3D) with its own amino acid sequence (1D).

Luethy et al., 1992

ProQ Server

• ProQ is a neural network-based predictor

– Predicts the quality of a protein model from structural features.

Arne Elofsson's group: http://www.sbc.su.se/~bjorn/ProQ/

Quality      LGscore   MaxSub
Correct      > 1.5     > 0.1
Good         > 3       > 0.5
Very good    > 5       > 0.8
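
The cutoffs above translate directly into a lookup. This sketch labels each score separately, because the slide does not say how the two measures should be combined; the function name `quality_label` and the `None` return for scores below the lowest cutoff are assumptions.

```python
# Cutoffs as listed on the slide, strongest class first.
LGSCORE_CUTOFFS = [(5.0, "very good"), (3.0, "good"), (1.5, "correct")]
MAXSUB_CUTOFFS = [(0.8, "very good"), (0.5, "good"), (0.1, "correct")]


def quality_label(score, cutoffs):
    """Return the best class whose threshold the score strictly exceeds,
    or None when the score is not above the lowest cutoff."""
    for threshold, label in cutoffs:
        if score > threshold:
            return label
    return None


lg_label = quality_label(4.2, LGSCORE_CUTOFFS)
maxsub_label = quality_label(0.85, MAXSUB_CUTOFFS)
```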

Modeling accuracy

(Marti-Renom et al., 2000)

Utility of Structural Information

(PS)2: protein structure prediction server

Consensus strategy

• The idea of consensus analysis is to gather predictions from a set of different methods.

• The performance of consensus methods is significantly higher than that of individual methods.

3d-shotgun (Fischer D., 2003)

3d-jury (Ginalski K et al., 2003)

Pmodeller (Bjorn W et al., 2003)

Structure prediction by homology modeling

[Figure: workflow in four steps, Step 1 to Step 4]

Figure 1. Overview of the protein structure prediction server, (PS)2.

Overview of the (PS)2 method

Step 1: Template search/selection by the consensus of PSI-BLAST and IMPALA

Step 2: Target-template alignment by the consensus of T-Coffee, PSI-BLAST, and IMPALA

Step 3: Model building by MODELLER, and structure evaluation and visualization by CHIME and Raster3D

[Figure legend, panels (b) and (c): aligned path of PSI-BLAST; aligned path of T-Coffee; aligned path of IMPALA; final aligned path]

Scores in (b):

9: aligned in 1st cycle

7: aligned in 2nd cycle

5: aligned in 3rd cycle

3: aligned in 4th cycle

4 and 2: unfeasible solutions

Input: target and template sequences

Output: target-template aligned sequences

Step 1: Initialize all entries of the alignment matrix to 0. Align the target and template sequences using PSI-BLAST, IMPALA, and T-Coffee.

Step 2: Sum aligned scores of these three alignments for each position with different scoring weights.

Step 3: Take the positions with the highest score as the aligned points to build the final target-template alignment (e.g., the highest score is 9 for the 1st cycle in (b)).

Step 4: Identify the unfeasible positions. ( 4 and 2 in (b))

Step 5: Change the scores of unfeasible positions and the aligned points to 0.

Step 6: Repeat Steps 3 to 5 until all entries are 0.

Step 7: Output the path with the aligned points as the target-template alignment.
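
One possible reading of Steps 1 to 7 is sketched below. Several details are assumptions, not taken from the slides: the weights 4, 3, and 2 and their assignment to methods are invented for the demo, an "unfeasible" position is interpreted as any pair that cannot lie on the same monotonic path as an already chosen point, ties are broken toward the smaller indices, and a sparse dictionary stands in for the matrix (removing an entry corresponds to setting it to 0).

```python
def consensus_alignment(alignments, weights):
    """Greedy consensus of several target-template alignments.

    alignments: dict mapping method name -> list of (target_idx, template_idx)
    weights:    dict mapping method name -> scoring weight (hypothetical values)
    Returns the consensus aligned pairs in sequence order.
    """
    # Steps 1-2: entries start at 0 (absent); add each method's weight
    # to every pair that the method aligns.
    scores = {}
    for method, pairs in alignments.items():
        for pair in pairs:
            scores[pair] = scores.get(pair, 0) + weights[method]

    path = []
    # Steps 3-6: pick the best remaining entry, then clear it together with
    # every entry that is no longer feasible, until nothing is left.
    while scores:
        best = max(scores, key=lambda p: (scores[p], -p[0], -p[1]))
        path.append(best)
        i, j = best
        scores = {
            (a, b): s for (a, b), s in scores.items()
            # Feasible pairs lie strictly before or strictly after the chosen
            # point in BOTH sequences; same row/column or crossing pairs go.
            if (a < i and b < j) or (a > i and b > j)
        }
    # Step 7: report the aligned points as one path.
    return sorted(path)


demo_alignments = {
    "PSI-BLAST": [(0, 0), (1, 1), (2, 2), (3, 4)],
    "IMPALA":    [(0, 0), (1, 1), (2, 3), (3, 4)],
    "T-Coffee":  [(0, 0), (1, 2), (2, 3), (3, 4)],
}
demo_weights = {"PSI-BLAST": 4, "IMPALA": 3, "T-Coffee": 2}  # assumed values
consensus = consensus_alignment(demo_alignments, demo_weights)
```

In this toy input the pairs supported by all three methods are fixed first, and the lower-scoring pairs (1, 2) and (2, 2) are discarded because they conflict with points chosen earlier.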


Alignment method

http://predictioncenter.org/

CASP3 servers registered:

1. 3D-PSSM (Sternberg) sternber@icrf.icnet.uk
2. Karplus karplus@cse.ucsc.edu
3. frsvr (Fischer) dfischer@cs.bgu.ac.il
4. pscan (Eloffson) arne@bimbo.biokemi.su.se
5. BASIC (Godzik) adam@scripps.edu
6. GenTHREADER jones@globin.bio.warwick.ac.uk
7. Valentina di Francesco valedf@tigr.org
8. TOPITS (Rost) Burkhard.Rost@EMBL-Heidelberg.de
9. Bork

CASP8 servers registered:

$$\mathrm{GDT\_TS}\ (\%) = \frac{100}{4N} \sum_{d \in \{1, 2, 4, 8\}} \mathrm{GDT}_d$$

- N is the total number of residues of the target (native structure)
- GDT_d is the number of aligned residues whose Cα-atom distance between the target and predicted model is less than d
- d is 1, 2, 4, or 8 Å.
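
The score defined above, the percentage of target residues within d Å averaged over the four cutoffs, can be sketched as follows. The sketch assumes the per-residue Cα distances come from one fixed superposition of model and native structure; evaluation programs typically search for a separate best superposition at each cutoff, so this simplification can underestimate the score. The function name `gdt_ts` and the demo distances are made up.

```python
def gdt_ts(distances, n_target, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT_TS in percent.

    distances: Calpha-Calpha distances (in angstroms) for the aligned residues
    n_target:  total number of residues in the target (native structure)
    """
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    # Sum of GDT_d over all cutoffs: residues closer than d for each d.
    total = sum(sum(1 for dist in distances if dist < d) for d in cutoffs)
    return 100.0 * total / (len(cutoffs) * n_target)


# Five-residue toy target: 1, 2, 3, and 4 residues fall under the
# 1, 2, 4, and 8 angstrom cutoffs respectively.
score = gdt_ts([0.5, 1.5, 3.0, 6.0, 10.0], n_target=5)
```

Residues of the target that the model does not cover contribute nothing to the counts but still count in N, so an incomplete model is penalized.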

Model Evaluation

• Performance evaluation

– Comparing the 47 CM targets to evaluate the performance with the other groups in CASP6.

• GDT_TS Score

[Figure: models of target T0264 (1wde) compared with the native structure]

• PSI-BLAST model: GDT_TS = 64.97

• IMPALA model: GDT_TS = 63.32

• T-Coffee model: GDT_TS = 65.14

• (PS)2 model: GDT_TS = 67.22

Other labels in the figure: GDT_TS = 66.00; aligned rates of 91.00 %, 100 %, and 100 %; residue numbers 10, 272, 6, and 294.

Figure 3. Comparison of (PS)2 with PSI-BLAST, IMPALA, and T-Coffee in prediction accuracy (global / local GDT_TS scores) on target T0264.

Figure 4. Comparison of (PS)2 models with all automated servers in CASP6.

[Figure 4 axis, Targets: T0199_1, T0222_1, T0223_1, T0226_1, T0229_1, T0229_2, T0233_1, T0233_2, T0235_1, T0247_1, T0247_2, T0247_3, T0264_1, T0264_2, T0268_1, T0268_2, T0269_1, T0269_2, T0279_1, T0279_2, T0280_1]

Table 1. Comparison with the other groups in CASP6

Group            (PS)2   RBTA    ESYP    3DJR    MGTH    3DJS    PROS    PMO5    PRCM    PCO5    PCOB
Average GDT_TS   65.89   64.92   63.14   62.54   61.27   61.08   58.11   57.93   57.62   56.37   37.57

• Cases

T0269, template 1prxA, (PS)2 model, GDT_TS: 85.76

T0269, template 1qq2A, ESYP model, GDT_TS: 78.48

http://ps2.life.nctu.edu.tw
