©CMBI 2009 Alignment & Secondary Structure You have learned about: Data & databases Tools Amino Acids Protein Structure Today we will discuss: Aligning sequences After this: You know how to perform structural alignments You are ready to apply this knowledge in your bioinformatics research project!

Upload: hugo-potter

Post on 04-Jan-2016


©CMBI 2009

Alignment & Secondary Structure

You have learned about:

Data & databases

Tools

Amino Acids

Protein Structure

Today we will discuss: Aligning sequences

After this:

You know how to perform structural alignments

You are ready to apply this knowledge in your bioinformatics research project!


Why align sequences?

The problem:

There are lots of sequences with unknown structure and/or function

There are a few sequences with known structure and/or function

Alignment can help:

• If sequences align well, they are likely to be similar

• If they are similar, then they very likely share structural and/or functional aspects

• If one of them has a known structure/function, then the alignment gives us insight into structural and/or functional aspects of the aligned sequence(s)

TRANSFER OF INFORMATION!


Sequence Alignment (1)

A sequence alignment is a representation of a whole series of evolutionary events, which left traces in the sequences.

Things that are more likely to happen during evolution should be most prominently observed in your alignment.

The purpose of a sequence alignment is to line up all residues in the sequence that were derived from the same residue position in the ancestral gene or protein.


Sequence Alignment (2)

gap = insertion or deletion (indel)



Structural alignment

To carry over information from a well studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that represents the protein structures today, a structural alignment.

The implicit meaning of placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment is that they are at the equivalent position in the 3D structures of the corresponding proteins!!


Examples

1) the 3 active site residues H, D, S, of the serine protease we saw earlier

2) Cysteine bridges (disulfide bridges):

STCTKGALKLPVCRK
TSCTEG--RLPGCKR


Transfer of information

Such information can be:

Phosphorylation sites
Glycosylation sites
Stabilizing mutations
Membrane anchors
Ion binding sites
Ligand binding residues
Cellular localization

Typically what one finds in the feature (FT) records of Swissprot!


Significance of alignment

One can only transfer information if the similarity between the two sequences is sufficiently high.

Schneider (group of Sander) determined the “threshold curve” for transferring structural information from one known protein structure to another protein sequence:

If the sequences are > 80 aa long, then >25% sequence identity is enough to reliably transfer structural information.

If the sequences are smaller in length, a higher percentage of identity is needed.

Structure is much more conserved than sequence!
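
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the simplified statement on this slide (more than 80 aligned residues and more than 25% identity), not the full threshold curve. The function names, and the choice to count only gap-free columns as the alignment length, are assumptions made for this example.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of identical residues over the gap-free columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


def can_transfer_structure(aligned_a, aligned_b):
    """Apply the simplified rule from the slide.

    Returns True/False when more than 80 residues are aligned.
    Returns None for shorter alignments: a higher identity is needed there,
    but the slide does not say how much, so no verdict is given.
    """
    aligned_length = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"
    )
    if aligned_length <= 80:
        return None
    return percent_identity(aligned_a, aligned_b) > 25.0


# Usage with the short cysteine-bridge example from earlier in the deck:
# far too short for the rule, so the function declines to decide.
print(can_transfer_structure("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"))
```

Returning None for short alignments is a deliberate choice here: it keeps the sketch from inventing a cut-off that the slide does not give.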


Significance of alignment (2)


Aligning sequences by hand

Most information that enters the alignment procedure comes from the physico-chemical properties of the amino acids.

Examples: which is the better alignment (left or right)?

1) CPISRTWASIFRCW    CPISRTWASIFRCW
   CPISRT---LFRCW    CPISRTL---FRCW

2) CPISRTRASEFRCW    CPISRTRASEFRCW
   CPISRTK---FRCW    CPISRT---KFRCW
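
One way to make the left/right comparison concrete is to score each candidate with a crude physico-chemical similarity. The sketch below is not how an alignment program scores (those use substitution matrices). The residue classes and the scores 2/1/0 are assumptions chosen only for this illustration.

```python
# Hypothetical grouping of amino acids by physico-chemical character.
CLASSES = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQ",
    "special": "CGP",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def score(top, bottom):
    """Score an alignment: 2 per identical pair, 1 per same-class pair, 0 otherwise."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            continue  # gap columns are not scored in this sketch
        if a == b:
            total += 2
        elif CLASS_OF[a] == CLASS_OF[b]:
            total += 1
    return total


examples = {
    "1 left": ("CPISRTWASIFRCW", "CPISRT---LFRCW"),
    "1 right": ("CPISRTWASIFRCW", "CPISRTL---FRCW"),
    "2 left": ("CPISRTRASEFRCW", "CPISRTK---FRCW"),
    "2 right": ("CPISRTRASEFRCW", "CPISRT---KFRCW"),
}
for name, (top, bottom) in examples.items():
    print(name, score(top, bottom))
```

With this scheme the left alignment scores higher in both examples, because it pairs L with I (both aliphatic) and K with R (both positively charged) rather than L with W or K with E.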


Aligning sequences by hand (2)

Procedure of aligning depends on information available:

1) Use “only” identity of amino acid and its physico-chemical properties. This is more or less what alignment programs do.

2) Also use explicitly the secondary structure preference of the amino acids.
Example: aligning 2 helices when sequence identity is low

3) Use 3D information if one or more of the structures in the alignment are known.

In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.


Helix


Positional preferences in helices (1)

Dataset of good helices from PDB files

Count all Asp residues in & before helices

Identify preferential positions for Asp residues

       -4   -3   -2   -1    1    2    3    4    5   total
        -    -    -    -    H    H    H    H    H
ASP    98  110  121  260   98  197  167   49   86    1186

Column 1: position 1 in helix


Positional preferences in helices (2)

Fill this table for all 20 amino acids

Use this information when aligning helices that have a low percentage of sequence identity

       -4   -3   -2   -1    1    2    3    4    5   total
        -    -    -    -    H    H    H    H    H
ALA   143  148   99   58  189  205  187  241  268    1538
CYS    24   31   29   22   14   17   18   33   17     205
ASP    98  110  121  260   98  197  167   49   86    1186
GLU    91  100   71   71  152  287  269   70  147    1258
(…)
TRP    29   25   29   14   30   26   28   30   29     240
TYR    66   65   75   33   58   44   56   72   48     517

Column 1: position 1 in helix
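
The counts in this table can be turned into preferences with a short script. The sketch below uses only the rows shown on the slide. Normalising each count by a uniform expectation (row total divided by nine positions) is an assumption made for this illustration, not necessarily the normalisation used in the course.

```python
POSITIONS = [-4, -3, -2, -1, 1, 2, 3, 4, 5]

# Counts copied from the slide; only the rows shown there are included.
COUNTS = {
    "ALA": [143, 148, 99, 58, 189, 205, 187, 241, 268],
    "CYS": [24, 31, 29, 22, 14, 17, 18, 33, 17],
    "ASP": [98, 110, 121, 260, 98, 197, 167, 49, 86],
    "GLU": [91, 100, 71, 71, 152, 287, 269, 70, 147],
    "TRP": [29, 25, 29, 14, 30, 26, 28, 30, 29],
    "TYR": [66, 65, 75, 33, 58, 44, 56, 72, 48],
}


def preferences(counts):
    """Observed count divided by the count expected if all positions were equally likely."""
    expected = sum(counts) / len(counts)
    return [count / expected for count in counts]


def preferred_position(residue):
    """Position with the highest count (the first one if there is a tie)."""
    counts = COUNTS[residue]
    return POSITIONS[counts.index(max(counts))]


for residue in ("ASP", "GLU", "ALA"):
    print(residue, preferred_position(residue))
# prints: ASP -1, GLU 2, ALA 5
```

A value well above 1 in `preferences` marks a favoured position. For Asp the peak lies just before the helix starts, at position -1.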


Aligning 2 sequences when sequence identity is low

Helix 1: S G V S P D Q L A A L K L I L E L A L K

Helix 2: G T S L E T A L L M Q I A Q K L I A G


Aligning 2 helices when sequence identity is low

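Helix 1: S G V S P D Q L A A L K L I L E L A L K

[Figure: helix position numbers placed under the residues of helix 1; row layout not recoverable: -1 -4 -4 -1 -4 -1 3 -2 1 1 -2 2 -3 -2 -3 2 5 1 2 2 1 5 4 -2 3 4 3 3 4 1 5 4 4 5 5 5]

Helix 2: G T S L E T A L L M Q I A Q K L I A G

[Figure: helix position numbers placed under the residues of helix 2; row layout not recoverable: -4 -1 -1 -2 2 -1 1 -2 -3 3 1 3 3 2 1 4 3 4 5 4 5 5]

Final alignment:

S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G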


Protein threading

The word threading implies that one drags the sequence (ACDEFG...) step by step through each location on the template
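
The "dragging" idea can be sketched as a loop over offsets. This is a toy illustration, not a real threading method: the template is reduced to a hypothetical string of buried (b) and exposed (e) positions, the hydrophobic residue set is an assumption, and no gaps are allowed.

```python
# Hypothetical set of residues treated as hydrophobic in this sketch.
HYDROPHOBIC = set("AVLIMFWC")


def thread_scores(sequence, template_env):
    """Slide `sequence` along the template and score every gap-free placement.

    `template_env` holds one character per template position:
    'b' for buried, 'e' for exposed. A residue scores +1 when its
    character fits the environment (hydrophobic in buried, other residues
    in exposed) and -1 when it does not.
    """
    scores = {}
    for offset in range(len(template_env) - len(sequence) + 1):
        total = 0
        for i, residue in enumerate(sequence):
            buried = template_env[offset + i] == "b"
            total += 1 if (residue in HYDROPHOBIC) == buried else -1
        scores[offset] = total
    return scores


def best_offset(sequence, template_env):
    """Offset with the highest score (the first one if there is a tie)."""
    scores = thread_scores(sequence, template_env)
    return max(scores, key=scores.get)


print(thread_scores("LKLK", "eebebeee"))
print(best_offset("LKLK", "eebebeee"))  # prints 2
```

Real threading programs use much richer scoring and allow insertions and deletions; the loop over offsets is the only part this sketch shares with them.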



Use of 3D structure info (1)

If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside:

Where does the Arg in structure 2 go?

(and what will CLUSTAL choose?)



Use of 3D structure info (2)

     1   2   3   4   5   6   7   8   9   10  11
A    ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1   VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
B2   VAL CYS ARG --- --- --- THR PRO GLU ALA ILE
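
To see what a purely sequence-based criterion makes of the two candidates, the table can be converted to one-letter codes and the identities counted. The sketch below is a minimal illustration; the helper names are made up for this example and the mapping covers only the residues that occur in the table.

```python
# Three-letter to one-letter codes, restricted to the residues in the table.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "CYS": "C", "GLU": "E", "GLY": "G", "ILE": "I",
    "LEU": "L", "PRO": "P", "SER": "S", "THR": "T", "VAL": "V", "---": "-",
}

ROWS = {
    "A": "ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL",
    "B1": "VAL CYS ARG THR PRO --- --- --- GLU ALA ILE",
    "B2": "VAL CYS ARG --- --- --- THR PRO GLU ALA ILE",
}


def to_one_letter(row):
    """Convert a space-separated row of three-letter codes to a one-letter string."""
    return "".join(THREE_TO_ONE[code] for code in row.split())


def identities(a, b):
    """Number of columns in which both sequences carry the same residue."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


aligned = {name: to_one_letter(row) for name, row in ROWS.items()}
for name in ("B1", "B2"):
    print(name, aligned[name], identities(aligned["A"], aligned[name]))
# prints: B1 VCRTP---EAI 5
#         B2 VCR---TPEAI 4
```

Counting identities alone favours B1 by one residue. Whether B1 or B2 places the residues at structurally equivalent positions cannot be decided from these counts; that is where the 3D information comes in.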


An even more real example

     1   2   3   4   5   6   7   8   9   10  11
A    ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1   VAL CYS ARG THR PRO --- --- --- GLU ALA ILE
B2   VAL CYS ARG --- --- --- THR PRO GLU ALA ILE

Per position (A, B1, B2): IVV CCC RRR LT- PP- G-- S-T A-P EEE AAA VII


We have seen that alignments ...

1) Are crucial for being able to transfer information

2) Can be optimized by using secondary structure preferences (e.g. helix positioning)

3) Can be optimized by using 3D structure info


Multiple sequence alignments

If we have more than two related sequences aligned, the alignment is called a multiple sequence alignment (MSA).

MSAs can:

1) reveal structural information (e.g. cys-bridges, calcium binding sites)

2) validate PROSITE search results

3) confirm or improve pair-wise sequence alignments (Course Day 6)


MSA and cysteine bridges

Multiple sequence alignments can reveal structural information:

  1   2     3     4
ASCTRGCIKLPTCKKMGRCTGY
STCTKGALKLPVCRKMGKSSAY
ATSTHGCMKLPCSRRFGKCSSY
TSCTEGCLRLPGCKRFGRCTSY
TTCTKGLLKLPGCKRFGKSSAY
ASSTKGCMKLPVSRRFGRCTAY
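
One way to read bridge partners from such an alignment is to look for cysteine columns that are present or absent together. The sketch below applies that heuristic to the six sequences of this slide. The heuristic itself (identical sets of sequences, at least two of them) and the function names are assumptions made for this illustration.

```python
from itertools import combinations

MSA = [
    "ASCTRGCIKLPTCKKMGRCTGY",
    "STCTKGALKLPVCRKMGKSSAY",
    "ATSTHGCMKLPCSRRFGKCSSY",
    "TSCTEGCLRLPGCKRFGRCTSY",
    "TTCTKGLLKLPGCKRFGKSSAY",
    "ASSTKGCMKLPVSRRFGRCTAY",
]


def cys_columns(msa):
    """Map each 1-based column that contains a Cys to the set of sequences that have it."""
    columns = {}
    for row, sequence in enumerate(msa):
        for column, residue in enumerate(sequence, start=1):
            if residue == "C":
                columns.setdefault(column, set()).add(row)
    return columns


def co_occurring_pairs(msa, min_sequences=2):
    """Pairs of Cys columns found in exactly the same sequences."""
    columns = cys_columns(msa)
    pairs = []
    for (col1, rows1), (col2, rows2) in combinations(sorted(columns.items()), 2):
        if rows1 == rows2 and len(rows1) >= min_sequences:
            pairs.append((col1, col2))
    return pairs


print(co_occurring_pairs(MSA))  # prints [(3, 13), (7, 19)]
```

In this alignment the cysteines in columns 3 and 13 always occur together, as do those in columns 7 and 19, which suggests that each pair forms a bridge. The lone cysteine in column 12 of the third sequence is ignored by the `min_sequences` filter.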


MSA to validate PROSITE results (1)

You have seen the PROSITE database of protein patterns before.

PROSITE glycosylation pattern:

N-{P}-[ST]-{P} where N is the glycosylation site.

PROSITE syntax: A-[BC]-X-D(2,5)-{EFG}-H

Means:

A          A
[BC]       B or C
X          Anything
D(2,5)     2-5 D's
{EFG}      Not E, F or G
H          H
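
The syntax elements listed above map almost one-to-one onto regular expressions. The sketch below handles only those elements (single residues, [..], {..}, X and repeat counts). The function name is made up for this example, and other PROSITE features such as terminus anchors are deliberately left out.

```python
import re

# One pattern element, optionally followed by a repeat such as (2) or (2,5).
ELEMENT = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression string."""
    parts = []
    for element in pattern.split("-"):
        core, low, high = ELEMENT.fullmatch(element).groups()
        if core.upper() == "X":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # anything except the listed residues
        # A single residue or a [..] group is already valid regex syntax.
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(core)
    return "".join(parts)


print(prosite_to_regex("A-[BC]-X-D(2,5)-{EFG}-H"))  # prints A[BC].D{2,5}[^EFG]H
print(prosite_to_regex("N-{P}-[ST]-{P}"))           # prints N[^P][ST][^P]
```

The resulting string can be passed to `re.search` or `re.finditer` to scan a protein sequence.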


MSA to validate PROSITE results (2)

The chance of finding N-{P}-[ST]-{P} is rather high.

So how can you be sure? Look at the MSA of related sequences!

ASLRNASTVVTIGDTITGNLTLASYHW
GSIKNGSSVITLPGTMEGNLSTTTYHY
ATLRNASTVMEINGTITGDLTLASFHW
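
The check suggested here can be sketched by scanning every sequence of the alignment for the pattern and counting how many sequences carry a hit in each column. The regular expression below is a hand translation of N-{P}-[ST]-{P}. The function name is made up for this example, and the code assumes a gap-free alignment so that sequence positions and alignment columns coincide.

```python
import re
from collections import Counter

MSA = [
    "ASLRNASTVVTIGDTITGNLTLASYHW",
    "GSIKNGSSVITLPGTMEGNLSTTTYHY",
    "ATLRNASTVMEINGTITGDLTLASFHW",
]

# N-{P}-[ST]-{P} as a regex; the lookahead also reports overlapping hits.
SITE = re.compile(r"(?=N[^P][ST][^P])")


def site_columns(msa):
    """Count, per 1-based column, how many sequences have a predicted site starting there."""
    counts = Counter()
    for sequence in msa:
        for match in SITE.finditer(sequence):
            counts[match.start() + 1] += 1
    return counts


for column, hits in sorted(site_columns(MSA).items()):
    print(column, hits, "of", len(MSA))
# prints: 5 3 of 3
#         13 1 of 3
#         19 2 of 3
```

Only the hit at column 5 is found in all three sequences, which makes it the most credible of the predicted sites. The hit at column 13 occurs in a single sequence and deserves more caution.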



What you have learned today

• A good sequence alignment is necessary for carrying over information between proteins.

• Putting amino acids below each other in a sequence alignment implies that you predict that they are at equivalent positions in both proteins.

• If the aligned sequences are > 80 aa long, then >25% sequence identity is enough to reliably transfer structural information.


You are now capable of…

• Applying these lessons to the practical exercises

• Performing your own bioinformatics research project!

Take home lesson:

Please remember to always use all structural information available to you to optimize a sequence alignment. This can be real 3D data, but can also be “just” your own knowledge about the properties and preferences of the amino acids.


MSA for improvement of pair-wise alignments

CWPVAASYGR
CWPT---YGR
CWPTA-SYGR
CWPTLGLFGR