Post on 15-Jan-2016


Page 1: Virtual modelling of proteins Jacek Leluk Interdyscyplinarne Centrum Modelowania Matematycznego i Komputerowego, Uniwersytet Warszawski

Virtual modelling of proteins

Jacek Leluk

Interdyscyplinarne Centrum Modelowania Matematycznego i Komputerowego, Uniwersytet Warszawski

Page 2:

Main functions of proteins (selected):

Enzymes

Immunoglobulins

Transport factors (e.g. hemoglobin)

Hormones, neurotransmitters

Structural and storage proteins

Contractile proteins (muscles, flagella)

Page 3:

Protein – a polymer of amino acids.

Proteins consist of one or more chains.

Some proteins also contain other components (sugars, lipids, nucleotides, metal ions, other compounds...); such proteins are called proteids.

The basic unit of a protein is the amino acid. There are 20 biogenic (genetically encoded) amino acids.

Page 4:

Amino acids

Amino acid – an organic compound that contains an amino group and an acidic group (usually a carboxyl group).

[Figure: general formula of an amino acid and the formula of alanine]

Page 5:

Amino acid – polypeptide – protein

Page 6:

Protein chain folding

Page 7:

Diversity of proteins

Glucagon, ROP protein, insulin

Page 8:

Diversity of proteins

Light-harvesting protein from purple bacteria

Page 9:

Sequence – structure – function

Originally, the central dogma of molecular biology assumed a very strict relationship between genetic information, protein structure and function:

1 gene → 1 sequence → 1 structure → 1 function

This dogma is still valid today, but in a less strict form: the relationships are not strictly one-to-one.

For example, a protein with a given sequence may adopt different secondary and tertiary structures.

Page 10:

Sequence – structure – function

All information about a protein's structure (and its function as well) is contained in its amino acid sequence, which is unique to each protein.

To apply these relationships in protein modelling, we first have to learn to read and understand the information "written" in the amino acid sequence.

How well we currently understand this "writing" depends on the complexity of the protein; prediction accuracy ranges from 20% to 80%.

Page 11:

What do we have?

Biomolecular databases (genomic, protein and bibliographic)

Tools for theoretical analysis of biomolecules

Labs for experimental verification of the results

Knowledge (theories, hypotheses, theoretical models)

Page 12:

Regular types of structure (secondary structure)

α-helix

Page 13:

Regular types of structure (secondary structure)

β-chain (β-sheet)

Page 14:

3D protein structures

Structure-function relationship

Sea anemone toxin

Snake toxin

Page 15:

3D protein structures

Structure-function relationship

Bacterial RNase

Mammalian RNase

RNase inhibitor (inhibits both RNases)

Page 16:

Errors (mutations) and resulting implications

Sickle cell anemia

Sickle cell anemia – a genetic disease caused by a single amino acid substitution in the hemoglobin β-chain (one residue out of 146). Hemoglobin S has Val instead of Glu at position 6 of the β-chain. The homozygous form (HbSS) is severe and can be lethal; heterozygotes (HbAS) are at most mildly anemic but resistant to malaria.

Normal hemoglobin – β-chain

VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH

Hemoglobin S – β-chain

VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH
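The single substitution can be located mechanically by comparing the two β-chain sequences position by position. The following is a minimal Python sketch, not part of the original slides; the helper name `point_substitutions` is hypothetical, and the approach assumes both sequences have the same length (no insertions or deletions).

```python
# Beta-chain sequences as given on the slide (one-letter amino acid codes).
NORMAL_BETA = "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"
SICKLE_BETA = "VHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH"


def point_substitutions(reference, variant):
    """Return (position, reference_residue, variant_residue) for every mismatch.

    Positions are 1-based. Hypothetical helper: it only handles sequences of
    equal length, i.e. substitutions without insertions or deletions.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (pos, ref, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


if __name__ == "__main__":
    for pos, ref, var in point_substitutions(NORMAL_BETA, SICKLE_BETA):
        print(f"position {pos}: {ref} -> {var}")
```

For the two sequences above, the comparison reports one difference: E (Glu) replaced by V (Val) at position 6 of 146.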

Page 17:

Mutations and resulting implications

Sickle cell anemia

Hemoglobin: normal and altered

Page 18:

Mutations and resulting implications

Sickle cell anemia

Page 19:

Mutations and resulting implications

Sickle cell anemia

Page 20:

Glucagon (pig) – hormone, 29 amino acids

HSQGTFTSDYSKYLDSRRAQDFVQWLMNT

Glucagon (synthetic) – hormone, 29 amino acids

HSQGTFTSDYSKYLDSKKAQEFVQWLMNT

Page 21:

"Glucacon" modelling

Glucagon (pig) – HSQGTFTSDYSKYLDSRRAQDFVQWLMNT

Glucagon (synth.) – HSQGTFTSDYSKYLDSKKAQEFVQWLMNT

Hydrophobic amino acids: L, I, V, F, M, Y, (W)

Glucacon – LAALIAAVAAAIAAVLRRIAEVLAIVAAL
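To see how the designed sequence differs from natural glucagon in its hydrophobic pattern, the residues from the slide's hydrophobic list can be located in each sequence. This is a rough Python sketch under stated assumptions: the function name `hydrophobic_positions` is hypothetical, and W is included in the hydrophobic set although the slide lists it only in parentheses.

```python
# Hydrophobic residues as listed on the slide: L, I, V, F, M, Y, (W).
# Assumption: W is counted as hydrophobic here.
HYDROPHOBIC = set("LIVFMYW")

SEQUENCES = {
    "Glucagon (pig)": "HSQGTFTSDYSKYLDSRRAQDFVQWLMNT",
    "Glucagon (synth.)": "HSQGTFTSDYSKYLDSKKAQEFVQWLMNT",
    "Glucacon": "LAALIAAVAAAIAAVLRRIAEVLAIVAAL",
}


def hydrophobic_positions(sequence, hydrophobic=HYDROPHOBIC):
    """Return the 1-based positions of hydrophobic residues in a sequence."""
    return [pos for pos, aa in enumerate(sequence, start=1) if aa in hydrophobic]


if __name__ == "__main__":
    for name, seq in SEQUENCES.items():
        positions = hydrophobic_positions(seq)
        share = len(positions) / len(seq)
        print(f"{name}: {len(positions)} of {len(seq)} residues ({share:.0%}) at {positions}")
```

With this residue set, both glucagon sequences contain 9 hydrophobic residues out of 29, while the designed sequence contains 13, spread along the whole chain.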

Page 22:

"Glucacon" design – results

Glucagon (pig) – HSQGTFTSDYSKYLDSRRAQDFVQWLMNT

Glucagon (synth.) – HSQGTFTSDYSKYLDSKKAQEFVQWLMNT

Glucacon – LAALIAAVAAAIAAVLRRIXEVLAIVAAL

Page 23:

Can we "improve" Nature at the molecular level?

What for?

Our goal is to gain knowledge about natural mechanisms and then to apply this knowledge to our needs, not to alter the evolved mechanisms that occur naturally.

Page 24:

Role and significance of theoretical protein modelling and design

Saving time

Saving money

Saving work and materials

Increasing our knowledge

Supporting experimental work

Page 25:

The value of virtual protein design

Page 26:

Multiple sequence alignment of 52 Bowman-Birk type proteinase inhibitors, prepared with the genetic semihomology algorithm. Conserved and typical residues are shown in white letters on a black background. A gray background indicates semihomologous amino acids.

3 10 20 30 40 50 60
P01055    ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP
P01057    ESSKPCCDECACTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS
P01056    QSSKPCCBHCACTKSIPPQCRCTDLRLDSCHSACKSCICTLSIPAQCV-CBBIBDFCYEP-CKS
P01058    ESSKPCCDQCSCTKSMPPKCRCSDIRLNSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
P01059    ESSKPCCDLCTCTKSIPPQCHCNDMRLNSCHSACKSCICALSEPAQCF-CVDTTDFCYKS-CHN
P01063    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
P17734    QSSKPCCRQCACTKSIPPQCRCSQVRLNSCHSACKSCACTFSIPAQCF-CGBIBBFCYKP-CKS
P81483    -SSKPCCBHCACTKSIPPQCRCSBLRLNSCHSECKGCICTFSIPAQCI-CTDTNNFCYEP-CKS
P81484    -SSKPCCBHCACTKSIPPQCRCSBLRLNSCHSECKGCICTFSIPAQCI-CTDTNNFCYEP-CKS
P16343    ESSKPCCSSC-CTRSRPPQCQCTDVRLNSCHSACKSCMCTFSDPGMCS-CLDVTDFCYKP-CKS
P01064    EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
P82469    -SSGPCCDRCRCTKSEPPQCQCQDVRLNSCHSACEACVCSHSMPGLCS-CLDITHFCHEP-CKS
P01061    ESSHPCCDLCLCTKSIPPQCQCADIRLDSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
P01062    ESSEPCCDSCDCTKSIPPECHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
P01060    QSSPPCCBICVCTASIPPQCVCTBIRLBSCHSACKSCMCTRSMPGKCR-CLBTTBYCYKS-CKS
1BBI:     ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP
1D6R:I    ---KPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CK-
1DF9:C    ESSEPCCDSCDCTKSIPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
1PI2:     EYSKPCCDLCMCTRSMPPQCSCED-RINSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
1PBI:A    DVKSACCDTCLCTKSNPPTCRCVDVGET-CHSACLSCICAYSNPPKCQ-CFDTQKFCYKQ-CHN
AAB4719   ESSKPCCDQCTCTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS
TISYC2    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
JC2225    ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
TIZB2     ESSKPCCDQC-CTKSMPPKCRCSDIRLDSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
JC2073    ESSKPCCDECKCTKSEPPQCQCVDTRLESCHSACKLCLCALSFPAKCR-CVDTTDFCYKP-CKS
JC2072    ESSKPCCDECKCTKSEPPQCQCVDTRLESCHSACKLCLCALSFPAKCR-CVDTTDFCYKP-CKS
0506164   ESSKPCCDQC-CTKSMPPKCRCSDIRLDSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS
0401177   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
763679A   ESSKPCCDLCMCTASMPPQCHCADIRLNSCHSACDRCACTRSMPGQCR-CLDTTDFCYKP-CKS
TISYD2    EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
0907248   ESSEPCCDSCRCTKSIPPQCHCADIRLNSCHSACKSCMCTRSMPGKCR-CLDTDDFCYKP-CES
1102213   ESSEPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCH-CLDTHDFCHKP-CKS
1102213   ESSEPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
0404180   EYSKPCCDLCMCTRSMPPQCSCEDIRLNSCHSDCKSCMCTRSQPGQCR-CLDTNDFCYKP-CKS
TIZB1B    ESSHPCCDLCLCTKSIPPQCQCADIRLDSCHSACKSCMCTRSMPGQCH-CLDTHDFCHKP-CKS
TIMB      ESSEPCCDSCDCTKSKPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCYKP-CES
TIZB1P    ESSHPCCDLCLCTKSIPPQCQCADIRLNSCHSACKSCMCTRSMPGQCR-CLDTHDFCHKP-CKS
JC1066    ESSEPCCDSCDCTKSKPPQCHCANIRLNSCHSACKSCICTRSMPGKCR-CLDTDDFCTKP-CES
Q41066    DVKSACCDTCLCTKSDPPTCRCVDVGET-CHSACDSCICALSYPPQCQ-CFDTHKFCYKA-CHN
P80321    STTTACCDFCPCTRSIPPQCQCTDVREK-CHSACKSCLCTLSIPPQCH-CYDITDFCYPS-CR-
Q41065    DVKSACCDTCLCTKSNPPTCRCVDVRET-CHSACDSCICAYSNPPKCQ-CFDTHKFCYKA-CHN
P81705    --TSACCDKCFCTKSNPPICQCRDVGET-CHSACKFCICALSYPAQCH-CLDQNTFCYDK-CDS
P56679    DVKSACCDTCLCTKSNPPTCRCVDVGET-CHSACLSCICAYSNPPKCQ-CFDTQKFCYKA-CHN
P16346    --TTACCNFCPCTRSIPPQCRCTDIGET-CHSACKTCLCTKSIPPQCH-CADITNFCYPK-CN-
P01065    DVKSACCDTCLCTRSQPPTCRCVDVGER-CHSACNHCVCNYSNPPQCQ-CFDTHKFCYKA-CHS
P24661    DVKSACCDTCLCTKSEPPTCRCVDVGER-CHSACNSCVCRYSNPPKCQ-CFDTHKFCYKS-CHN
P07679    KRPWECCDIAMCTRSIPPICRCVDKVDR-CSDACKDCEETEDN--RHV-CFDTYIGDPGPTCHD
P19860    ERPWKCCDLQTCTKSIPAFCRCRDLLEQ-CSDACKECGKVRDSDPPRYICQDVYRGIPAPMCHE
P22737    ERPWKCCDLQTCTKSIPAFCRCRDLLEQ-CSDACKECGKVRDSDPPRYICQDVYRGIPAPMCHE
220645    ES-EGCCDRCICTKSMPPQCHCHDVRLDSCHSDCETCICTRSYPAQCR-CADTTDFCYKP-C-S
P09864    TRPWKCCDRAICTKSFPPMCRCMDMVEQ-CAATCKKCGPATSDSSRRV-CEDXY-----------
P09863    KRPWKCCDQAVCTRSIPPICRCMDQVFE-CPSTCKACGPSVGDPSRRV-CQDQYV----------
CONSENSUS ESSKPCCDXCXCTKSIPPQCRCXDXRLNSCHSACKSCXCTRSXPXQCX-CXDTXDFCYKP-CKS
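Conserved columns in an alignment like this one can be found with a simple column-by-column comparison. The Python sketch below is an illustration only: it uses three rows of the alignment (P01055, P01057, P01058) and a plain majority vote, which is a stand-in and not the genetic semihomology algorithm named in the caption. The function names `conserved_columns` and `majority_consensus` are hypothetical.

```python
from collections import Counter

# Three aligned rows copied from the alignment (gaps are written as '-').
ROWS = {
    "P01055": "ESSKPCCDQCACTKSNPPQCRCSDMRLNSCHSACKSCICALSYPAQCF-CVDITDFCYEP-CKP",
    "P01057": "ESSKPCCDECACTKSIPPQCRCTDVRLNSCHSACSSCVCTFSIPAQCV-CVDMKDFCYAP-CKS",
    "P01058": "ESSKPCCDQCSCTKSMPPKCRCSDIRLNSCHSACKSCACTYSIPAKCF-CTDINDFCYEP-CKS",
}


def conserved_columns(rows, residue):
    """Return 1-based alignment columns where every row has `residue`."""
    return [
        col
        for col, chars in enumerate(zip(*rows.values()), start=1)
        if all(c == residue for c in chars)
    ]


def majority_consensus(rows):
    """Build a consensus by strict majority vote per column; 'X' if none.

    This is a simple stand-in, not the genetic semihomology algorithm.
    """
    consensus = []
    for chars in zip(*rows.values()):
        residue, count = Counter(chars).most_common(1)[0]
        consensus.append(residue if count > len(chars) / 2 else "X")
    return "".join(consensus)


if __name__ == "__main__":
    print("Conserved Cys columns:", conserved_columns(ROWS, "C"))
    print("Majority consensus:  ", majority_consensus(ROWS))
```

In these three rows all 14 cysteine columns are fully conserved, which illustrates why such residues stand out in the highlighted alignment.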