1001 stories of protein folding ming li school of computer science university of waterloo cs882,...

1001 Stories of Protein Folding

Ming Li, School of Computer Science

University of Waterloo

CS882, Fall 2006

By the time I finish telling these protein stories, I hope we will know better how to fold them by computer.

Prelude: Why should you care?

Through 3 billion years of evolution, nature has created an enormous number of protein structures for different biological functions. Understanding these structures is key to proteomics. Fast computation of protein structures is one of the most important unsolved problems in science today. It is much more important than, for example, the P≠NP conjecture.

We now have a real chance to solve it.

This course:

I do ½ of the course, so that we understand everything about proteins.

You do ½ of the course, presenting all methods for protein folding: 50% of the marks.

You do a final project designing your own method for folding proteins: 50% of the marks.

Proteins – the life story

Proteins are building blocks of life. In a cell, 70% is water and 15%-20% are proteins.

Examples:
hormones – regulate metabolism
structures – hair, wool, muscle, …
antibodies – immune response
enzymes – chemical reactions

Sickle-cell anemia: the hemoglobin protein is made of 4 chains, 2 alphas and 2 betas. A single mutation from Glu to Val occurs at residue 6 of the beta chain. The trait is recessive. Homozygotes suffer severe, historically often fatal, disease, but heterozygotes have resistance to malaria; hence the mutation had an evolutionary advantage in Africa.

[Figure: DNA double helix with complementary base pairs (A-T, C-G); DNA → mRNA by transcription, mRNA → protein by translation]

Human: 3 billion bases, 30k genes. E. coli: 5 million bases, 4k genes.

mRNA is written with 4 bases (A, C, G, U); proteins are written with 20 amino acids.

Codon: three nucleotides encode one amino acid. There are 64 codons (61 coding codons plus 3 stop codons) but only 20 amino acids, so some amino acids have more than one codon.

cDNA: made from mRNA by reverse transcription.
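To make the codon arithmetic concrete, here is a minimal Python sketch of transcription and translation. It assumes the standard genetic code (encoded compactly in U, C, A, G order per codon position) and takes the DNA coding strand as input; `transcribe`, `translate` and `CODON_TABLE` are hypothetical names for illustration, not a library API. It also checks the sickle-cell codon change GAG (Glu) → GUG (Val).

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed), listed in U, C, A, G order for each of the
# three codon positions, first position varying slowest. "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand_dna: str) -> str:
    """mRNA has the coding-strand sequence, with U in place of T."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons three nucleotides at a time, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    counts = Counter(CODON_TABLE.values())
    print(len(CODON_TABLE), len(counts) - 1)      # 64 20
    print(counts["L"], counts["M"], counts["*"])  # 6 1 3
    # Sickle-cell: codon 6 of the beta chain, GAG (Glu) -> GUG (Val).
    print(CODON_TABLE["GAG"], CODON_TABLE["GUG"])  # E V
    print(translate(transcribe("ATGGCCTAA")))     # MA
```

The degeneracy is visible directly: Leu has 6 codons, Met only 1, and 3 codons are stops.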

Proteins are built from 20 amino acids and fold in space into functional shapes.

Several polypeptide chains can assemble into more complex structures, e.g. the four chains of hemoglobin.

What happened in sickle-cell anemia

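Mutating Glu to valine creates a hydrophobic patch on the surface of the beta chain, which makes deoxygenated hemoglobin molecules stick together into fibers that deform red blood cells.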


Hemoglobin

Amino acids stories

There are about 500 amino acids in nature. Only 20 (22, counting selenocysteine and pyrrolysine) are used in proteins.

The first amino acid was discovered in asparagus in 1806, hence its name, asparagine. All 20 amino acids found in proteins had been discovered by 1935.

Traces of glycine, alanine, etc. were found in a meteorite that fell in Australia in 1969 (the Murchison meteorite). This led to the conjecture that life has an extraterrestrial origin.

20 Amino acids – the boring part

Polar amino acids: Serine, Threonine, Tyrosine, Histidine, Cysteine, Asparagine, Glutamine, Tryptophan

Hydrophobic amino acids: Alanine, Valine, Phenylalanine, Proline, Methionine, Isoleucine, Leucine

Charged amino acids: Aspartic acid, Glutamic acid, Lysine, Arginine

Simplest amino acid: Glycine

Polar: the molecule has one positively and one negatively charged end, e.g. H2O is polar; oil is non-polar.

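The grouping above can be used as a lookup table. This sketch encodes the slide's four groups with one-letter codes (the dictionary names are hypothetical) and counts residue classes in a sequence. As an assumed example it uses the first 8 residues of the hemoglobin beta chain, VHLTPEEK, and its sickle-cell variant with Glu 6 → Val.

```python
from collections import Counter

# The slide's grouping, written with one-letter codes.
AA_CLASSES = {
    "polar": set("STYHCNQW"),
    "hydrophobic": set("AVFPMIL"),
    "charged": set("DEKR"),
    "glycine": set("G"),
}
# Invert to a per-residue lookup: residue letter -> class name.
CLASS_OF = {aa: cls for cls, members in AA_CLASSES.items() for aa in members}


def class_counts(sequence: str) -> Counter:
    """Count how many residues of each class the sequence contains."""
    return Counter(CLASS_OF[aa] for aa in sequence.upper())


if __name__ == "__main__":
    normal = "VHLTPEEK"   # beta chain residues 1-8 (assumed example)
    sickle = "VHLTPVEK"   # Glu 6 -> Val
    print(class_counts(normal))
    print(class_counts(sickle))
```

The sickle variant trades one charged residue for one hydrophobic residue, which is the surface hydrophobic patch described for sickle-cell anemia.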

Why do proteins fold? Some philosophy

Taken by itself, the folded structure of a protein is thermodynamically less favorable, because folding reduces the disorder (entropy) of the chain. So, why do proteins fold? One of the most important factors driving folding is the interaction of polar and nonpolar side chains with the environment. Nonpolar (water-hating) side chains tend to push themselves to the inside of a protein, while polar (water-loving) side chains tend to place themselves on the outside of the molecule. In addition, other noncovalent interactions, including electrostatic and van der Waals interactions, make the folded protein slightly more stable than the unfolded one.

When oil, a nonpolar, hydrophobic substance, is placed in water, the two push each other away.

Since proteins have nonpolar side chains, they behave in a watery environment much like oil in water. The nonpolar side chains are pushed to the interior of the protein, where they avoid water molecules, giving the protein a globular shape. The polar side chains behave quite differently: they place themselves on the outside of the protein, where they can interact with water molecules by forming hydrogen bonds. Burying the nonpolar side chains frees the surrounding water and so increases its entropy; this compensates for the decrease in the chain's own entropy on folding, while the hydrogen bonds between polar side chains and water further stabilize the folded state.
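A standard toy model of this "hydrophobic inside" argument, swapped in here and not taken from these slides, is Dill's HP lattice model: each residue is H (hydrophobic) or P (polar), the chain lies on a square lattice, and every pair of H residues that touch on the lattice but are not chain neighbours lowers the energy by 1. The function name `hp_energy` and the coordinate format are assumptions for illustration.

```python
# Toy 2D HP lattice model (an assumed, simplified illustration):
# energy = -1 for each pair of H residues that are lattice neighbours
# but not consecutive in the chain.

def hp_energy(sequence: str, coords: list[tuple[int, int]]) -> int:
    if len(sequence) != len(coords):
        raise ValueError("need exactly one lattice point per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is self-intersecting")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must be lattice neighbours")

    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only at the +x and +y neighbours so each contact counts once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = position.get(neighbour)
            if j is not None and abs(i - j) > 1 and sequence[j] == "H":
                energy -= 1
    return energy


if __name__ == "__main__":
    folded = [(0, 0), (1, 0), (1, 1), (0, 1)]    # the two H ends touch
    extended = [(0, 0), (1, 0), (2, 0), (3, 0)]  # straight chain
    print(hp_energy("HPPH", folded), hp_energy("HPPH", extended))
```

The compact conformation that brings the two hydrophobic residues together has the lower energy, mirroring the idea that folding buries nonpolar side chains.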

One-letter labels and how to remember them

If only one amino acid begins with a letter, that letter is used:

C = Cys = Cysteine
H = His = Histidine
I = Ile = Isoleucine
M = Met = Methionine
S = Ser = Serine
V = Val = Valine

Otherwise the letter is assigned to the more frequent one:

A = Ala = Alanine
G = Gly = Glycine
L = Leu = Leucine
P = Pro = Proline
T = Thr = Threonine

The losers try phonetically:
F = Phe = Phenylalanine
R = Arg = Arginine
Y = Tyr = Tyrosine
W = Trp = Tryptophan (double ring)

When everything fails:
D = Asp = Aspartic acid
N = Asn = Asparagine
E = Glu = Glutamic acid
Q = Gln = Glutamine
K = Lys = Lysine
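The table above is a simple two-way mapping. This sketch stores it as a dictionary and converts between hyphen-separated three-letter names and one-letter strings; the function names and the hyphen separator are assumptions for illustration.

```python
# The 20 standard amino acids: three-letter code -> one-letter code.
THREE_TO_ONE = {
    # unique first letter
    "Cys": "C", "His": "H", "Ile": "I", "Met": "M", "Ser": "S", "Val": "V",
    # the more frequent one gets the letter
    "Ala": "A", "Gly": "G", "Leu": "L", "Pro": "P", "Thr": "T",
    # phonetic
    "Phe": "F", "Arg": "R", "Tyr": "Y", "Trp": "W",
    # when everything fails
    "Asp": "D", "Asn": "N", "Glu": "E", "Gln": "Q", "Lys": "K",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter: str, sep: str = "-") -> str:
    """'Met-Ala-Trp' -> 'MAW' (case-insensitive residue names)."""
    return "".join(THREE_TO_ONE[res.capitalize()] for res in three_letter.split(sep))


def to_three_letter(one_letter: str, sep: str = "-") -> str:
    """'MAW' -> 'Met-Ala-Trp'."""
    return sep.join(ONE_TO_THREE[aa] for aa in one_letter.upper())


if __name__ == "__main__":
    print(to_one_letter("Val-His-Leu-Thr-Pro-Glu-Glu-Lys"))  # VHLTPEEK
    print(to_three_letter("MAW"))                            # Met-Ala-Trp
```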

They really look all the same:

One amino acid: amino acids differ only in the side chain R.

Many amino acids connected into a polypeptide chain.


The amino acids are connected to form polypeptide chains, read from the N-terminus to the C-terminus.

The peptide bond is planar and rigid, with known bond distances and angles.

A water molecule (H2O) is lost when each peptide bond forms.
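Because one water molecule is lost per peptide bond, a chain of n residues has n − 1 peptide bonds, and its mass follows directly (a worked formula, using average molar masses):

```latex
M_{\text{peptide}} = \sum_{i=1}^{n} M(\text{aa}_i) - (n-1)\, M(\mathrm{H_2O}),
\qquad M(\mathrm{H_2O}) \approx 18.02\ \mathrm{g/mol}

% Example: the dipeptide Gly-Gly, with M(\text{Gly}) \approx 75.07\ \mathrm{g/mol}
M_{\text{Gly-Gly}} \approx 2 \times 75.07 - 1 \times 18.02 = 132.12\ \mathrm{g/mol}
```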

They could have been different

L-form vs D-form: looking down the H-Cα bond from H, the L-form reads CO, R, N clockwise ("CORN"). The D-form reads them counterclockwise, i.e. N, R, CO clockwise ("NRCO").

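All amino acids that occur in proteins are in the L-form (except glycine, which has no L/D distinction).

It is unclear why the D-form was not chosen.

In non-biological synthesis, the L- and D-forms are produced with equal chance.

In functioning proteins, only the L-form occurs.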

The L- and D-forms are mirror images of each other.
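The CORN rule can be turned into a sign test on 3D coordinates. The sketch below is derived from the mnemonic as stated above: with vectors taken from Cα, "CO, R, N clockwise when viewed from H" is equivalent to the signed volume CO · (R × N) being positive. The function name `handedness` and the idealized coordinates are hypothetical; real use would read atom positions (N, Cα, C', Cβ) from a structure file.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def handedness(ca, n, c_carbonyl, r):
    """Return 'L' or 'D' for a residue from Cα, N, C' (CO) and side-chain R positions.

    Derived from the CORN rule: the signed volume CO . (R x N), with all vectors
    measured from Cα, is positive for the L-form and negative for the D-form.
    """
    co, rr, nn = _sub(c_carbonyl, ca), _sub(r, ca), _sub(n, ca)
    volume = _dot(co, _cross(rr, nn))
    if volume == 0:
        raise ValueError("substituents are coplanar; handedness undefined")
    return "L" if volume > 0 else "D"


if __name__ == "__main__":
    # Idealized geometry (hypothetical): H points to +z, the other three
    # substituents lie below, with CO -> R -> N clockwise seen from +z.
    ca = (0.0, 0.0, 0.0)
    co, r, n = (0.0, 1.0, -0.33), (0.87, -0.5, -0.33), (-0.87, -0.5, -0.33)
    print(handedness(ca, n, co, r))  # L
    print(handedness(ca, r, co, n))  # D (R and N swapped: the mirror image)
```

Swapping any two substituents gives the mirror image, which flips the sign of the volume and hence the label.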

Story of cysteines

Two cysteine residues in different (non-adjacent) parts of a protein sequence can be oxidized to form a disulfide bridge; this is the end product of air oxidation:

2 cysteines + ½ O2 = 2 linked cysteines + H2O

Disulfide bridges have two functions:
stabilizing the fold of a single protein;
linking two chains (e.g. linking the A and B chains in insulin).
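When a protein has several cysteines, one question for structure prediction is which ones pair up. Assuming every cysteine takes part in exactly one bridge (an illustrative simplification), the possible pairing patterns can be enumerated recursively; for 2k cysteines there are (2k − 1)!! = 1 · 3 · 5 ⋯ (2k − 1) of them. The function name and the residue positions below are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple


def disulfide_pairings(cysteines: Sequence[int]) -> Iterator[List[Tuple[int, int]]]:
    """Yield every way to pair all cysteine positions into disulfide bridges.

    Assumes every cysteine is bonded; with an odd count nothing is yielded.
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], list(cysteines[1:])
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in disulfide_pairings(remaining):
            yield [(first, partner)] + tail


if __name__ == "__main__":
    # Hypothetical cysteine positions in a sequence.
    for pattern in disulfide_pairings([3, 10, 25, 40]):
        print(pattern)
    print(len(list(disulfide_pairings(range(6)))))  # 15 = 5 * 3 * 1
```

The count grows quickly (8 cysteines already give 105 patterns), which is why the pairing cannot simply be guessed.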

Disulfide bond between two cysteines:

Cysteine side chain: Cα-CH2-SH

Note: We will not study amino acids one by one, but we will study their structures when we meet them. In the figures, the red bond connects to Cα.

The Φ and Ψ angles

Φ is the torsion angle about the N-Cα bond.

Ψ is the torsion angle about the Cα-C’ bond.

The side chain (attached at Cα) is not involved.

These two angles per residue determine the backbone structure.
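Given atom coordinates, Φ and Ψ are ordinary dihedral (torsion) angles: Φ of residue i uses C’(i−1), N(i), Cα(i), C’(i), and Ψ uses N(i), Cα(i), C’(i), N(i+1). The sketch below uses the common atan2 formula with the IUPAC sign convention (clockwise positive); the function name `dihedral` and the test points are illustrative assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle in degrees, between -180 and 180, about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Usage on a backbone (coordinates from a structure file, not shown):
#   phi_i = dihedral(C_prev, N_i, CA_i, C_i)
#   psi_i = dihedral(N_i, CA_i, C_i, N_next)

if __name__ == "__main__":
    p0, p1, p2 = (1, 0, 0), (0, 0, 0), (0, 1, 0)
    print(dihedral(p0, p1, p2, (1, 1, 0)))   # eclipsed (cis): 0
    print(dihedral(p0, p1, p2, (0, 1, 1)))   # -90
    print(dihedral(p0, p1, p2, (-1, 1, 0)))  # trans: 180
```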


The Ramachandran plot

L-amino acids cannot form long left-handed helices, but Gly (and also Asn and Asp, whose side chains can form hydrogen bonds with the main chain) can form short left-handed helices.

Red: good. Yellow: OK. White: forbidden.

(This applies to all residues except glycine.)

The story of Glycine

Glycine has no side chain (just H), so it can adopt Φ and Ψ angles in all four quadrants of the Ramachandran plot.

Thus, it frequently occurs in turn regions of proteins, where any other residue would be sterically hindered.

Glycine side chain: Cα-H

Staggered carbon atoms for side chains

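Ethane: CH3-CH3

Eclipsed (aligned): too crowded.

Staggered: most favorable; rotating by 120° gives an equivalent staggered position.

Valine: conformation (b) is the most favorable, since it is the least crowded.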
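The preference for staggered positions can be pictured with a simple three-fold torsion potential, V(θ) = (V3/2)(1 + cos 3θ), where θ is the H-C-C-H dihedral. This is a textbook approximation rather than anything from the slides, and V3 ≈ 12 kJ/mol is an assumed value close to the measured ethane barrier.

```python
import math


def torsion_energy(theta_deg: float, v3: float = 12.0) -> float:
    """Three-fold torsion potential in kJ/mol (assumed barrier v3 ~ 12 kJ/mol).

    theta_deg is the H-C-C-H dihedral: 0 = eclipsed, 60 = staggered.
    """
    return 0.5 * v3 * (1 + math.cos(math.radians(3 * theta_deg)))


if __name__ == "__main__":
    for theta in range(0, 361, 60):
        print(theta, round(torsion_energy(theta), 2))
```

The minima fall at 60°, 180° and 300° (staggered) and the maxima at 0°, 120° and 240° (eclipsed), so neighbouring favorable positions are exactly 120° apart.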

