A too simple model for protein folding Ethan Bolker Mathematics and Computer Science UMass Boston Clark University April 14, 2004

A too simple model for protein folding

Ethan Bolker

Mathematics and Computer Science

UMass Boston

Clark University

April 14, 2004

Preliminaries

• Problem source: biology teaching need

• Analysis mixes biology, cs, mathematics (= applied mathematics)

• Ongoing help from Bogdan Calota

• See www.cs.umb.edu/~eb/folding

How life works

• DNA (gene) makes RNA

• RNA makes polypeptide

• Polypeptide folds into protein

• Proteins interact (biochemistry)

• Cells … organisms … communities …

• Natural selection makes gene mix evolve

Virtual teaching laboratories

• For Brian White (Biology, UMass Boston)

• Virtual Genetics Laboratory (VGL)
  – Mendelian genetics
  – http://intro.bio.umb.edu/VGL/index.htm
  – Science, April 16, 2004

• GenExplorer
  – the central dogma
  – www.cs.umb.edu/genex/

• Watch this space …

Polypeptide → protein

• Polypeptide: sequence of amino acids
  chemical (biological) activity depends on three-dimensional configuration (folding)

• Protein: polypeptide folded into active shape

• Given the sequence, what’s the shape?
  – Wet lab
    • lots of chemistry
    • x-ray crystallography
    • (newer tools)
  – Virtual lab
    • compute shape from chemical principles
    • need supercomputer or grid

folding@home

www.stanford.edu/group/pandegroup/folding/

For beginning biologists

• Problem: give students hands-on experience showing how sequence determines shape

• Solution: very simple model
  – amino acid = disk in the plane, with hydrophobic index $h_i$ expressing its wish to avoid a wet environment
  – fold polypeptide on hex grid to minimize energy

$$\text{energy} = \sum_{\text{acids } i} (\#\,\text{exposed edges of acid } i)\; h_i$$
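
The energy formula above can be sketched in a few lines of Python. This is an illustration, not the applet's code: it assumes axial coordinates for the hex grid, and it assumes that an edge counts as "exposed" whenever the neighbouring cell holds no acid (so chain neighbours also shield each other). The names `HEX_DIRS` and `energy` are made up for this sketch.

```python
# Axial coordinates (q, r) on a hex grid: each cell has these six neighbours.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def energy(cells, h):
    """Sum over acids of (# exposed edges) * hydrophobic index.

    cells: list of (q, r) positions, one per acid, in chain order.
    h:     list of hydrophobic indices, same length as cells.
    Assumption: an edge is exposed when the cell across it is empty.
    """
    occupied = set(cells)
    total = 0.0
    for (q, r), hi in zip(cells, h):
        exposed = sum((q + dq, r + dr) not in occupied for dq, dr in HEX_DIRS)
        total += exposed * hi
    return total


if __name__ == "__main__":
    # Three acids packed in a triangle: each one keeps 4 exposed edges.
    triangle = [(0, 0), (1, 0), (0, 1)]
    print(energy(triangle, [1.0, -1.0, 0.5]))
```

With this convention, burying an acid with a positive index lowers the energy, and leaving an acid with a negative index on the outside lowers it too.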

folding@umb

51882 possible configurations

(5279 modulo dihedral group symmetry)

minimum energy -131.17

minimum occurs once

topology
0: [2, 7]
1: [ ]
2: [0, 7]
3: [ ]
4: [ ]
5: [ ]
6: [ ]
7: [0, 2]

folding@umb

51882 possible configurations (5279 modulo dihedral group symmetry)

minimum energy -13.161

minimum occurs twice (second - obvious - answer has same topology)

Brute force search

• Try all nonintersecting walks of length n on plane grid of hexagons: 1, 6, 30, 138, 618, 2730, 11946, 51882, 224130, 964134, 4133166, …

• Sequence # A001334 in the Online Encyclopedia of Integer Sequences www.research.att.com/~njas/sequences/

• No closed form expression

• Growth rate obviously $O(5^n)$, actual $4.25^n$

• To count foldings, divide by 12 (symmetry)
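
The first few terms of that sequence can be reproduced with a short depth-first enumeration. The sketch below is an illustration in Python, not the talk's software; it assumes axial coordinates on the hex grid, and `count_walks` is a name invented here.

```python
# Axial coordinates (q, r) on a hex grid: each cell has these six neighbours.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def count_walks(n):
    """Count nonintersecting walks with n steps starting from a fixed cell."""

    def extend(head, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dq, dr in HEX_DIRS:
            nxt = (head[0] + dq, head[1] + dr)
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)  # backtrack
        return total

    origin = (0, 0)
    return extend(origin, {origin}, n)


if __name__ == "__main__":
    for n in range(7):
        print(n, count_walks(n))
```

Each step after the first has at most five choices (no immediate reversal), which is where the $O(5^n)$ bound comes from; walks that would run into themselves are pruned, so the true count grows more slowly.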

A (random) chain of length 17

• Five of the 11 minimum energy foldings

• All 11 show same 8-acid cool ring, hot core

• Essentially the same topology

• 12 hour computation

Open questions (statistical)

• How many minima?

• What is the energy distribution
  – for one polypeptide, over all foldings?
  – of minima, over all polypeptides of fixed length?

• Do all minima for a polypeptide have the same topology? (several possible definitions for topology)

• Do approximate minima have same topology? (several possible definitions for approximate)

Which amino acid universe?

Random polypeptides – acids chosen

• $h_i$ uniformly distributed in [-1, 1]

• $h_i$ = (1, -1) with probability (p, 1-p)

• from (Ala, Arg, … , Tyr, Val) with
  – measured hydrophobic indices
  – measured probabilities of occurrence
  (the natural universe)
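
The three universes can be sketched as three small samplers. This is a hypothetical Python illustration: the function names are invented here, and the natural universe takes its table of indices and probabilities as arguments because the measured values are not given in these slides.

```python
import random


def uniform_chain(n, rng):
    """n hydrophobic indices drawn uniformly from [-1, 1]."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def two_valued_chain(n, p, rng):
    """n indices, each +1 with probability p and -1 with probability 1 - p."""
    return [1.0 if rng.random() < p else -1.0 for _ in range(n)]


def table_chain(n, indices, probabilities, rng):
    """n indices drawn from a caller-supplied table.

    indices and probabilities are parallel lists; for the natural universe
    they would hold measured values, which this sketch does not include.
    """
    return rng.choices(indices, weights=probabilities, k=n)


if __name__ == "__main__":
    rng = random.Random(2255)  # a fixed seed makes a chain reproducible
    print(uniform_chain(5, rng))
    print(two_valued_chain(5, 0.3, rng))
    print(table_chain(5, [-0.5, 0.2, 0.9], [0.2, 0.5, 0.3], rng))  # toy table
```

Seeding the generator is what lets one chain be folded repeatedly with different parameters.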

Digression

• How do you interpolate visually between red and green?

• in RGB space, a dark muddy yellow (½, ½, 0) is halfway

• in HSB space, yellow is halfway

• Application uses cubic interpolation to adjust contrast near the midpoint
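
The two midpoints can be computed directly with Python's standard `colorsys` module (its HSV model is the same as HSB). This is a small illustration with invented helper names; it averages hue naively, which is fine between red and green but ignores hue wrap-around in general.

```python
import colorsys

RED = (1.0, 0.0, 0.0)
GREEN = (0.0, 1.0, 0.0)


def midpoint_rgb(c1, c2):
    """Average two colors channel by channel in RGB space."""
    return tuple((a + b) / 2 for a, b in zip(c1, c2))


def midpoint_hsb(c1, c2):
    """Average two colors in HSB space, then convert back to RGB.

    Assumption: the hues are averaged directly, without wrap-around.
    """
    hsb1 = colorsys.rgb_to_hsv(*c1)
    hsb2 = colorsys.rgb_to_hsv(*c2)
    mid = tuple((a + b) / 2 for a, b in zip(hsb1, hsb2))
    return colorsys.hsv_to_rgb(*mid)


if __name__ == "__main__":
    print("RGB midpoint:", midpoint_rgb(RED, GREEN))
    print("HSB midpoint:", midpoint_hsb(RED, GREEN))
```

The RGB midpoint has only half the brightness of either endpoint, while the HSB midpoint keeps full saturation and brightness and lands on yellow.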

Cubic interpolation

// Map a range of hydrophobic indices h to a continuum of
// colors between RED and GREEN in HSB space.
//
// First map h linearly to x between 0.0 and 1.0 so that we
// can form convex combinations. To get better visual effect
// replace x by
//     f(x) = ax^3 + bx^2 + cx
//     color(x) = f(x)*RED + (1-f(x))*GREEN
// f(0) = 0 means color(0) = GREEN. Then find a, b and c so that
// f(1) = 1, f(1/2) = 1/2 and f'(1/2) = k (to be determined). Then
// color(1) = RED and color(1/2) = 1/2 (RED+GREEN) = YELLOW.
//
// When k = 1, f(x) = x is linear, not cubic (check the algebra).
// That works well for the natural table. But for the virtual table it
// provides too little contrast near the center. k = ½ flattens out the
// cubic at its inflection point there and seems to be just about right.
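
Solving the three conditions f(1) = 1, f(1/2) = 1/2 and f'(1/2) = k gives a = 4(1 - k), b = -6(1 - k), c = 3 - 2k, so k = 1 yields f(x) = x and k = ½ yields f(x) = 2x^3 - 3x^2 + 2x. The Python sketch below checks that algebra and maps an index to a color. It is an illustration, not the applet's code: the function names are invented, and it assumes the convex combination is taken on the hue alone, with full saturation and brightness.

```python
import colorsys


def cubic_coefficients(k):
    """Coefficients (a, b, c) of f(x) = a x^3 + b x^2 + c x with
    f(1) = 1, f(1/2) = 1/2 and f'(1/2) = k."""
    return 4.0 * (1.0 - k), -6.0 * (1.0 - k), 3.0 - 2.0 * k


def contrast_curve(x, k):
    """Evaluate f(x) by Horner's rule."""
    a, b, c = cubic_coefficients(k)
    return ((a * x + b) * x + c) * x


def color_for(h, h_min, h_max, k=0.5):
    """Map hydrophobic index h to an RGB triple between GREEN and RED.

    Assumption: RED has hue 0 and GREEN has hue 1/3, and only the hue
    is interpolated.
    """
    x = (h - h_min) / (h_max - h_min)  # linear map onto [0, 1]
    t = contrast_curve(x, k)           # color = t*RED + (1 - t)*GREEN
    hue = (1.0 - t) / 3.0
    return colorsys.hsv_to_rgb(hue, 1.0, 1.0)


if __name__ == "__main__":
    print(cubic_coefficients(0.5))
    for h in (-1.0, 0.0, 1.0):
        print(h, color_for(h, -1.0, 1.0))
```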

Open questions (biological)

• Nature isn’t random: naturally occurring polypeptides are not a random selection from the natural universe

• Which shapes can occur as the minimum energy configurations of polypeptides?
  – which are beautiful? (polypeptide tangrams)
  – which are interesting? (designer drugs)

(I like cool rings, Brian White likes hot cores)

Folding algorithms

• Conjecture: finding a minimum energy folding is NP-complete

• Look for an approximate algorithm
  – polynomial time
  – close to true minimum with high probability
  – not stochastic

• Conjecture: no local algorithm will do

Incremental Folding

int lookahead
int step ≤ lookahead
while there are acids to place
    explore all positions for the next lookahead acids
        that minimize the energy of the configuration so far
    place the first step of those lookahead acids
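
One way to turn this pseudocode into something runnable is sketched below in Python. It is an interpretation under stated assumptions, not the talk's software: the hex grid uses axial coordinates, an edge is "exposed" when the cell across it is empty, ties between equally good extensions go to the first one found, and a chain that boxes itself in raises an error. All names are invented for the sketch.

```python
# Axial coordinates (q, r) on a hex grid: each cell has these six neighbours.
HEX_DIRS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def energy(cells, h):
    """Sum over placed acids of (# exposed edges) * hydrophobic index."""
    occupied = set(cells)
    return sum(
        hi * sum((q + dq, r + dr) not in occupied for dq, dr in HEX_DIRS)
        for (q, r), hi in zip(cells, h)
    )


def extensions(cells, count):
    """Yield every nonintersecting way to append `count` more cells."""
    if count == 0:
        yield []
        return
    q, r = cells[-1]
    occupied = set(cells)
    for dq, dr in HEX_DIRS:
        nxt = (q + dq, r + dr)
        if nxt not in occupied:
            for rest in extensions(cells + [nxt], count - 1):
                yield [nxt] + rest


def incremental_fold(h, lookahead, step):
    """Place the acids with indices h, looking `lookahead` acids ahead
    and committing `step` of them per round."""
    if not 1 <= step <= lookahead:
        raise ValueError("need 1 <= step <= lookahead")
    cells = [(0, 0)]
    while len(cells) < len(h):
        depth = min(lookahead, len(h) - len(cells))
        best = min(
            extensions(cells, depth),
            key=lambda ext: energy(cells + ext, h),
            default=None,
        )
        if best is None:
            raise RuntimeError("chain is trapped; no extension fits")
        cells += best[:step]  # commit only the first `step` acids
    return cells


if __name__ == "__main__":
    h = [1.0, -1.0, 1.0, 1.0, -1.0, 0.5]
    for lookahead, step in [(1, 1), (3, 2), (5, 5)]:
        folded = incremental_fold(h, lookahead, step)
        print(lookahead, step, energy(folded, h))
```

In the last call the lookahead covers the whole chain, so a single round examines every folding, which is the brute force search.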

Incremental Folding

• lookahead = step = 1 is greedy

• lookahead = step = n is brute force

• time = $O\!\left(\frac{n}{\text{step}} \cdot 4.x^{\text{lookahead}}\right)$

• linear in n, but exponential in lookahead

50 acids, randomly chosen from natural universe

seed 2255

minimum energy -352.38

lookahead 8, step 1

time 139 seconds

50 acids, randomly chosen from natural universe

seed 2255

minimum energy -338.42

lookahead 8, step 4

time 29 seconds

50 acids, randomly chosen from natural universe

seed 2255

minimum energy -351.54

lookahead 8, step 5

time 27 seconds

50 acids, randomly chosen from natural universe

seed 2255

minimum energy -343.98

lookahead 8, step 7

time 15 seconds

brute force folding for one random chain of length 17

incremental: step sensitivity, lookahead 7

[Plot: energy (axis from 0 to -18) against step (1 to 7), with the brute force result marked for comparison]

incremental: lookahead sensitivity, step 5

[Plot: energy (axis from 0 to -18) against lookahead (1 to 11), with the brute force result marked for comparison; the numeric point labels did not survive extraction]

Incremental Folding

• Topology highly sensitive to step

• Energy not monotone with step or lookahead

• Can always be fooled

• May be realistic biologically

• Suffices for teaching goal

More geometry

• Square grid folding is faster: $O(2.x^{\text{lookahead}})$ instead of $O(4.x^{\text{lookahead}})$

• But not nearly as pretty

Folding in space

• Cubic grid has same folding complexity as hex grid in plane since each cell has six neighbors

• 3D analogue of hex grid is spherical close packing
  – oranges at the market
  – layers of hexagonally close packed planes
  – cell is a rhombic dodecahedron
  – each sphere has 12 neighbors
  – folding complexity $O(10.x^n)$
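
The neighbor counts behind these growth rates are easy to check by enumeration. The Python sketch below is an illustration with invented names; it assumes the close packing is the face-centred cubic arrangement, whose 12 neighbor offsets are the permutations of (±1, ±1, 0), and it counts short nonintersecting walks on each grid.

```python
from itertools import permutations, product

SQUARE = [(1, 0), (-1, 0), (0, 1), (0, -1)]
HEX = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]  # axial coords
CUBIC = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
# Assumption: face-centred cubic close packing, offsets (±1, ±1, 0) permuted.
CLOSE_PACKED = sorted(
    {p for s1, s2 in product((1, -1), repeat=2) for p in permutations((s1, s2, 0))}
)


def count_walks(dirs, n):
    """Count nonintersecting n-step walks on the grid with neighbor offsets dirs."""

    def extend(head, visited, remaining):
        if remaining == 0:
            return 1
        total = 0
        for d in dirs:
            nxt = tuple(a + b for a, b in zip(head, d))
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, visited, remaining - 1)
                visited.remove(nxt)  # backtrack
        return total

    origin = (0,) * len(dirs[0])
    return extend(origin, {origin}, n)


if __name__ == "__main__":
    grids = {"square": SQUARE, "hex": HEX, "cubic": CUBIC, "close packed": CLOSE_PACKED}
    for name, dirs in grids.items():
        print(name, len(dirs), [count_walks(dirs, n) for n in range(4)])
```

A grid with m neighbors allows at most m - 1 choices per step after the first, so the square, six-neighbor and 12-neighbor grids are bounded by 3^n, 5^n and 11^n respectively; self-intersections bring the actual rates down toward the 2.x, 4.x and 10.x figures quoted in the slides.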

Packing spheres

H. Steinhaus, Mathematical Snapshots

Foldings in space

energy 37.8, time 18 seconds, explored 752057 chains

energy 15.6, time 0 seconds, explored 8185 chains

Summary

• The customer is satisfied

• You can play with the applet

• The software needs work

• All the interesting questions are still open