Introductory biophysics, A.Y. 2016-17
9. Proteins and their structure
Edoardo Milotti, Dipartimento di Fisica, Università di Trieste
Lysozyme, a simple protein enzyme
from D. C. Phillips, PNAS 57 (1967) 483
Attacking Bacteria
Lysozyme protects us from the ever-present danger of bacterial infection. It is a small enzyme that attacks the protective cell walls of bacteria.
Bacteria build a tough skin of carbohydrate chains, interlocked by short peptide strands, that braces their delicate membrane against the cell's high osmotic pressure. Lysozyme breaks these carbohydrate chains, destroying the structural integrity of the cell wall. The bacteria burst under their own internal osmotic pressure.
The First Antibiotic
Alexander Fleming discovered lysozyme during a deliberate search for medical antibiotics. Over a period of years, he added everything that he could think of to bacterial cultures, looking for anything that would slow their growth.
He discovered lysozyme by chance. One day, when he had a cold, he added a drop of mucus to the culture and, much to his surprise, it killed the bacteria. He had discovered one of our own natural defenses against infection. Unfortunately, lysozyme is a large molecule that is not particularly useful as a drug. It can be applied topically, but cannot rid the entire body of disease, because it is too large to travel between cells. Fortunately, Fleming continued his search, finding a true antibiotic drug five years later: penicillin.
(from http://www.rcsb.org/pdb/101/motm.do?momID=9 )
Proteins are ubiquitous and carry out many functions in living organisms.
The lowest-level view of proteins is that they are linear heteropolymers. The individual monomers in these linear chains are amino acids.
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
carboxyl group, proton donor (acid)
amino group, proton acceptor (base)
side chain
In chemistry, zwitterions act as acids and bases at the same time.
Amino acids
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
Peptide bond
When an amino acid shares two peptide bonds it has effectively lost the equivalent of one water molecule, and the whole structure is referred to as the “amino acid residue”.
Therefore its molecular weight in the peptide chain corresponds to the molecular weight of the free amino acid minus the molecular weight of water.
The R group is called the “side chain”.
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
Mass is often given in D = dalton = atomic mass unit.
The residue masses are given for the neutral residues. For molecular masses of the parent amino acids, add 18.0 D, the molecular mass of H2O, to the residue masses.
For side chain masses, subtract 56.0 D, the formula mass of a peptide group, from the residue masses.
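The two mass rules above (add 18.0 D for the parent amino acid, subtract 56.0 D for the side chain) can be sketched in a few lines of Python. The small `RESIDUE_MASS` table and the function names are illustrative assumptions, not part of the slides; the masses are standard average residue masses and the table is deliberately incomplete.

```python
# Average residue masses in daltons (one-letter codes).
# Partial table for illustration only; values are standard average masses.
RESIDUE_MASS = {"G": 57.05, "A": 71.08, "S": 87.08, "V": 99.13, "L": 113.16}

WATER_MASS = 18.0          # D, molecular mass of H2O (value used on the slide)
PEPTIDE_GROUP_MASS = 56.0  # D, formula mass of a peptide group (slide value)


def amino_acid_mass(code):
    """Mass of the free (parent) amino acid: residue mass plus one water."""
    return RESIDUE_MASS[code] + WATER_MASS


def side_chain_mass(code):
    """Mass of the side chain (R group): residue mass minus the peptide group."""
    return RESIDUE_MASS[code] - PEPTIDE_GROUP_MASS


def peptide_mass(sequence):
    """Mass of a linear peptide: the sum of its residue masses plus one water,
    which accounts for the H and OH left on the two chain ends."""
    return sum(RESIDUE_MASS[c] for c in sequence) + WATER_MASS


if __name__ == "__main__":
    print(amino_acid_mass("G"))   # free glycine
    print(side_chain_mass("A"))   # methyl side chain of alanine
    print(peptide_mass("GAS"))    # tripeptide Gly-Ala-Ser
```

As a sanity check, the side chain of glycine comes out at about 1 D, the mass of its single hydrogen atom.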
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
Xaa = unknown
Disulfide bond
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
Proteins as polypeptide chains
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
Note that DNA codes for the linear chain, but post-translational modifications produce additional configuration changes in the protein.
Newly synthesized polypeptides in the Endoplasmic Reticulum undergo five principal modifications before they reach their final destinations:
1. Formation of disulfide bonds (in the rough ER)
2. Proper folding (in the rough ER)
3. Addition and processing of carbohydrates (Golgi apparatus and ER)
4. Specific proteolytic cleavages (Golgi apparatus and ER)
5. Assembly into multimeric proteins (in the rough ER)
Figure labels: 1. Nucleus; 2. Nuclear pore; 3. Rough endoplasmic reticulum (RER); 4. Smooth endoplasmic reticulum (SER); 5. Ribosome on the rough ER; 6. Proteins that are transported; 7. Transport vesicle; 8. Golgi apparatus; 9. Cis face of the Golgi apparatus; 10. Trans face of the Golgi apparatus; 11. Cisternae of the Golgi apparatus
(from https://en.wikipedia.org/wiki/Endoplasmic_reticulum)
The Golgi apparatus
The Endoplasmic Reticulum
Folding of the hemagglutinin (HA) precursor polypeptide HA0 and formation of an HA0 trimer within the ER.
How do proteins fold into their final shape? The ER has an important role in this process.
Many reduced, denatured proteins can spontaneously refold into their native state in vitro. In most cases such refolding requires hours to reach completion, yet new secretory proteins generally fold into their proper conformation in the ER lumen within minutes after their synthesis. The ER contains several proteins that accelerate the folding of newly synthesized proteins within the ER lumen. Protein disulfide isomerase (PDI) is one such folding catalyst; the chaperone Hsc70 is another. Like cytosolic Hsc70, this ER chaperone transiently binds to proteins and prevents them from misfolding or forming aggregates, thereby enhancing their ability to fold into the proper conformation. Two other ER proteins, the homologous lectins calnexin and calreticulin, bind to certain carbohydrates attached to newly made proteins and aid in protein folding. (from Molecular Cell Biology. 4th edition. Lodish H, Berk A, Zipursky SL, et al. New York: W. H. Freeman; 2000.)
Modified residues (edited from http://www.uniprot.org/help/mod_res )
Chemical modifications include phosphorylation, methylation, acetylation, amidation, formation of pyrrolidone carboxylic acid, isomerization, hydroxylation, sulfation, flavin-binding, cysteine oxidation and nitrosylation. Here we consider only the three most common modifications.
1. Phosphorylation
Phosphorylation refers to the transfer of a phosphate group to an amino acid. It is a key mechanism for signaling in both eukaryotic and prokaryotic cells. It can occur on a number of cytoplasmic and nuclear residues, i.e. on the hydroxyl group of serine, threonine or tyrosine, on the nitrogen of arginine, histidine or lysine, on the carboxyl group of aspartate, or on the sulfhydryl group of cysteine.
2. Methylation
Cytoplasmic and nuclear proteins can be enzymatically modified in several ways by the addition of methyl groups. Methylation reactions occurring on carboxyl groups can be reversible and modulate the activity of the target protein, while those on nitrogen atoms at the N-terminus and on side chains are usually irreversible.
Figure labels: Phosphate group, Methyl group
3. Acetylation
N-terminal acetylation
N-terminal acetylation is one of the most common post-translational modifications in eukaryotes, but it is rare in prokaryotes. It refers to the addition of an acetyl group to the alpha-amino group of the first residue of a protein, often after the cleavage of the initiator methionine. The most commonly acetylated residues are glycine, alanine, serine or threonine. This reaction occurs in the cytosol. Methionine residues can also be modified if the next residue is an aspartate, glutamate, leucine, isoleucine, tryptophan, phenylalanine or asparagine residue. Note that the modified position may not correspond to the first amino acid of the displayed sequence if N-terminal acetylation occurs after proteolytic processing of the chain.
Internal acetylation
Internal acetylation is the addition of an acetyl group to the side chain of a lysine residue. In eukaryotes, it generally takes place in the nucleus and affects mainly, but not exclusively, histones. It also occurs in prokaryotes. Lysine acetylation can compete with methylation on the same residue, in which case both modifications are described as ‘alternate’.
Figure label: Acetyl group
from P. Echenique, “Introduction to protein folding for physicists”, Contemp. Phys. 48 (2007) 81
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
Ramachandran plot
Angle ω is usually 180°, because of the partial double bond.
Angle distributions for 81,234 non-Gly, non-Pro, non-prePro residues, with backbone B-factor < 30, from the 500-structure high-resolution database, along with validation contours for favored and allowed regions. From S. C. Lovell et al., “Structure Validation by Cα Geometry: phi, psi and Cβ Deviation”, Proteins 50 (2003) 437
Linus Carl Pauling
Born: 28 February 1901, Portland, OR, USA
Died: 19 August 1994, Big Sur, CA, USA
Nobel Prize in Chemistry in 1954 “for his research into the nature of the chemical bond and its application to the elucidation of the structure of complex substances”
Linus Pauling and Robert Corey (A) and Herman Branson (B). Pauling’s deep understanding of chemical structure and bonding, his retentive memory for details, and his creative flair were all factors in the discovery of the alpha-helix. The wooden helix between Pauling and Corey has a scale of 1 inch per Å, an enlargement of 254,000,000 times (from D. Eisenberg, “The discovery of the alpha-helix and beta-sheet, the principal structural features of proteins”, PNAS 100 (2003) 11207)
Hydrogen bonds between the N-H and C=O groups of every fourth amino acid lead to folding into an alpha-helix.
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
Helical structures are reinforced by the establishment of hydrogen bonds between widely spaced chemical groups.
Ramachandran plot, the quantity and quality of protein structures accurately determined by X-ray crystallography is such that it is possible to finally verify and slightly modify the exact shapes and positions of the allowed regions in the Ramachandran plot. That is the subject of this study.
2. Methods
We selected a non-redundant set of 1042 protein subunits from the PDB of 3 January 2002. The structures were all determined by X-ray diffraction to a resolution of 2.0 Å or higher, refined to R ≤ 0.20 and had less than 30% sequence homology (Dunbrack, 2001). The first and last amino acids will not provide φ and ψ torsion angles. For 9940 amino acids (mainly at the beginning or end of the sequences) there were no coordinates given in the PDB. We also removed 9669 amino acids with temperature factor B > 40 Å². This left us with 237 384 amino acids for which we could calculate torsion angles.
The amino acids were divided into three main groups according to their secondary structure as specified by DSSP (Kabsch & Sander, 2001). There were 88 874 in HELIX (α-helix and 3₁₀-helix) and 52 068 in SHEET (Table 1). The remaining 96 442 amino acids for which atomic coordinates were given in the PDB but were not specified as HELIX or SHEET by DSSP are called Random coil in this investigation, although this group of course includes turns and other well defined structures. Ramachandran plots for each of the 20 amino acids were made for HELIX, SHEET, Random coil and All, where All includes all the different secondary structures.
The Ramachandran plot was split into four regions for detailed analysis of the conformations. Amino acids with torsion angles in the range −180 < φ < 0°, −100 < ψ < 45° were considered to be in the α-helical region. Amino acids with torsion angles in the range −180 < φ < −45°, 45 < ψ < 225° were considered to be in the β-sheet region. The area 0 < φ < 180°, −90 < ψ < 90° is here called the turn region. The remaining area of the Ramachandran plot represents 36% of the area but contained only 1.9% of the amino acids and was not studied further. The bridging region between α-helical and β-sheet conformation is often given as −135 < φ < −45°, −25 < ψ < 15°, but we found that this region should really be considered α-helical and the bordering region should be moved upwards to −160 < φ < −65°, 45 < ψ < 90°.
Amino acids in SHEET were subdivided into six groups according to their contacts with adjacent strands (Fig. 2). These were (i) parallel with one partner, (ii) antiparallel with one partner, (iii) parallel with two partners, (iv) antiparallel with two partners, (v) parallel with one and antiparallel with another partner and (vi) amino acids without partner.
The amino acids in HELIX were subdivided in several ways. There were 1907 three-residue-long helices and 1023 four-residue-long helices and these were treated separately. For the other α-helices, we looked specifically at the first and last amino acids. In order to find the ideal α-helical conformation, all but the first two and last two amino acids in helices ≥ 5 amino acids long were pooled.
Amino acids in Random coil were first subdivided into the four areas of the Ramachandran plot: α-helical, β-sheets, turns and others. Within each of these areas, the stretches of random
(Hovmöller et al., “Conformations of amino acids in proteins”, Acta Cryst. (2002). D58, 768–776)
Figure 1. The classical version of the Ramachandran plot for (a) alanine (but often taken as typical for all non-glycines) and (b) glycine according to Ramachandran & Sasisekharan (1968). The fully allowed regions are shaded; the partially allowed regions are enclosed by a solid line. The connecting regions enclosed by the dashed lines are permissible with slight flexibility of bond angles. These plots were arrived at by computer modelling. Although some overall features of these plots are correct, the details differ from the experimentally observed Ramachandran plots for alanine (see Fig. 5) and (c) all 19 non-glycines and (d) glycine. The most remarkable differences are that most regions show a 45° slope rather than being parallel to any of the axes, the β-sheet region is split into two distinct maxima and the two most populated regions for glycine seen in (d) were predicted to be only just permissible as shown in (b). There are five areas in the glycine plot; two with ψ ≈ 0° and three with ψ ≈ 180°. [(a) and (b) Reproduced from Creighton (1996) with permission.]
Figure 2. The definitions of six types of β-strands.
Plot labels: beta sheets, alpha helices
Left: allowed regions based on steric clashes according to Ramachandran’s original analysis for alanine; right: observed angles based on about 1000 experimentally determined protein structures, non-glycine residues. (from Hovmöller et al., “Conformations of amino acids in proteins”, Acta Cryst. D58: 768–776.)
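The four-region split of the Ramachandran plot quoted from Hovmöller et al. can be turned into a small classifier. This is only a sketch: the function name `ramachandran_region` is hypothetical, angles are assumed to be given in degrees within [-180, 180], and the β-sheet window that extends to ψ = 225° is handled by unwrapping ψ values below -135°.

```python
def ramachandran_region(phi, psi):
    """Classify a (phi, psi) pair, in degrees within [-180, 180], into the
    four regions used by Hovmoller et al. (2002)."""
    # alpha-helical region: -180 < phi < 0, -100 < psi < 45
    if -180 < phi < 0 and -100 < psi < 45:
        return "alpha-helical"
    # The beta-sheet window runs up to psi = 225, i.e. it wraps past +180:
    # psi values between -180 and -135 correspond to 180..225.
    psi_unwrapped = psi + 360 if psi < -135 else psi
    if -180 < phi < -45 and 45 < psi_unwrapped < 225:
        return "beta-sheet"
    # turn region: 0 < phi < 180, -90 < psi < 90
    if 0 < phi < 180 and -90 < psi < 90:
        return "turn"
    # everything else (36% of the plot area, 1.9% of the residues)
    return "other"


if __name__ == "__main__":
    for angles in [(-60, -45), (-120, 130), (-120, -170), (60, 40), (60, 180)]:
        print(angles, ramachandran_region(*angles))
```

Typical α-helix angles such as (-60°, -45°) and typical β-strand angles such as (-120°, 130°) fall in the expected regions, which is a quick way to check the boundaries.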
BMC Structural Biology 2005, 5:14 http://www.biomedcentral.com/1472-6807/5/14
we consider the variation of the inter-atomic distance with respect to the φ'-ψ' angles. We compare the observed variation to the variation generated from a model that uses canonical backbone geometry. We divide these interactions into 3 categories: the φ' dependent, ψ' dependent and φ'-ψ' co-dependent distances.
For some of the interactions, the results for glycine are identical to that of the generic Ramachandran plot [3]. For brevity, we omit the analysis of these interactions and summarize the results. The excluded horizontal strip -30° < ψ' < 30°, due to the N···H(i+1) steric interaction in the glycine steric map (Figure 2A), does not exist in the observed distribution (Figure 1A). Similarly, the O(i-1)···C steric clash in the original glycine steric map, which excludes a vertical strip centered on φ' = 0° (Figure 2A), does not exist in the observed distribution (Figure 1A). We ignore the effect of the N···H(i+1) and O(i-1)···C steric clashes. The diagonal boundaries of the observed distribution are defined by the φ'-ψ' co-dependent steric interactions O(i-1)···O and O(i-1)···N(i+1). In Figure 3A, we show the fit of these steric interactions to the data.
Figure 1. Backbone conformations of glycine and pre-proline. Backbone schematic (left) and observed Ramachandran plot (right) of (A) glycine and (B) pre-proline. Taken from the data-set of Lovell et al. (2003). The clustered regions are labeled on the Ramachandran plots.
from P. Echenique, “Introduction to protein folding for physicists”, Contemp. Phys. 48 (2007) 81
http://molvisual.chem.ucsb.edu/prot_struc.html
from D. Voet and J. G. Voet, “Biochemistry, 4th ed.”, Wiley 2011
The structural motifs of proteins
• Primary structure: amino acid sequence of polypeptide chain
• Secondary structure: spatial arrangement of polypeptide backbone (without regard to side chains)
• Tertiary structure: three-dimensional structure of complete chain
• Quaternary structure: arrangement of subunits
Image from the Irving Geis Collection, Howard Hughes Medical Institute.
Irving Geis (1908-1997) was a great scientific artist of the 20th century.
His innovations, particularly in depicting the structures of biological macromolecules such as DNA, earned him an international reputation. Many of his illustrations appeared in Scientific American, including a painting of the first protein crystal structure, of myoglobin, published in 1961.
http://www.hhmi.org/news/geis.html
http://pdb101.rcsb.org/geis-archive/about
Structure of human hemoglobin. It is a tetramer composed of two α and two β subunits. The protein’s α and β subunits are shown in red and blue, and the iron-containing heme groups are shown in green.
The heme B group (an iron atom is strongly held at the center of a porphyrin ring)
Porphyrins are a group of heterocyclic organic compounds, composed of four pyrrole subunits interconnected at their α carbon atoms via methine bridges (=CH−).
The figure shows the structure of porphin, the simplest porphyrin.
The shape of proteins is important in many ways. In sickle cell disease, hemoglobin is malformed and it clusters into strands inside red blood cells.
Sickle hemoglobin (HbS), a structural variant of normal adult hemoglobin, results from a single amino acid substitution at position 6 of the beta globin molecule (β6 Glu→Val).
Sickle cell disease is a dangerous form of anemia; however, in its mild form it is known to confer some resistance against malaria.
In 1949, it was suggested that the Darwinian paradox of high frequencies of genetic blood disorders could result from a selective advantage conferred by such disorders in protecting against Plasmodium falciparum (a protozoan parasite) malaria infection in heterozygotes.
This balancing selection, commonly referred to as the ‘malaria hypothesis’, was originally suggested to explain the geographical correspondence between the distribution of thalassemia and malaria in the Mediterranean region, and was later confirmed in many locations including Sardinia, Melanesia and Kenya. At the same time, a similar relationship between HbS and malaria was independently discovered in Africa.
HbS allele frequency calculated within each endemicity area allowed us to quantify the statistical strength of such differences, taking into account the inherent uncertainty of the predicted HbS allele frequencies (see Methods). Differences in areal means between endemicity regions were calculated for 100 unique realizations of the HbS allele frequency map generated by the Bayesian model (Fig. 4 and Supplementary Fig. S2). When combined, these realizations produced predictive probability distributions for the difference in areal mean HbS allele frequency between each successive endemicity class (see Table 1 and Methods).
These geostatistical measures provide the first quantitative evidence for a geographical link between the global distribution of HbS and malaria endemicity. At the global level, we found clear
Map legends: malaria endemicity (malaria free, epidemic, hypoendemic, mesoendemic, hyperendemic, holoendemic); HbS allele frequency (%), in ten classes from 0 to 18.18; HbS data points (presence, absence).
Figure 1 | Global distribution of the sickle cell gene. (a) Distribution of the data points. Red dots represent the presence and blue dots the absence of the HbS gene. The regional subdivisions were informed by Weatherall and Clegg [19], and are as follows: the Americas (light grey), Africa, including the western part of Saudi Arabia, and Europe (medium grey) and Asia (dark grey); (b) Raster map of HbS allele frequency (posterior median) generated by a Bayesian model-based geostatistical framework. The Jenks optimized classification method was used to define the classes [45]; (c) The historical map of malaria endemicity [29] was digitized from its source using the method outlined in Hay et al. [44] The classes are defined by parasite rates (PR2−10, the proportion of 2- up to 10-year olds with the parasite in their peripheral blood): malaria free, PR2−10 = 0; epidemic, PR2−10 ≈ 0; hypoendemic, PR2−10 < 0.10; mesoendemic, PR2−10 ≥ 0.10 and < 0.50; hyperendemic, PR2−10 ≥ 0.50 and < 0.75; holoendemic, PR0−1 ≥ 0.75 (this class was measured in 0- up to 1-year olds) [29,30].
Figure 2 | Map of the uncertainty of the HbS allele frequency prediction. Interval between the 2.5 and 97.5% quantiles (95% probability) of the per-pixel predicted allele frequency using a continuous scale (low: 0.00, high: 0.47).
From Piel et al., “Global distribution of the sickle cell gene and geographical confirmation of the malaria hypothesis”, Nature Comm. 1:104 (2010), DOI: 10.1038/ncomms1104
from J. S. Richardson, “Early ribbon drawings of proteins”, Nature Structural Biology 7 (2000) 624
Chymotrypsinogen (1CHG), a proteolytic enzyme: sticks
Chymotrypsinogen (1CHG): ribbon
Chymotrypsinogen (1CHG): cartoon
http://pdb101.rcsb.org/learn/resource/what-is-a-protein-video
The shape of proteins
The shape of proteins changes, and these conformational changes are important both in mechanical actions like those of ATP-synthase, helicase or dynein, and in the enzymatic activity of proteins.
X-ray diffraction studies
X-ray diffraction is the most important method for protein structure determination.
High luminosity femtosecond X-ray sources provide an extension of conventional X-ray methods.
Proteins are synthesized by ribosomes
1·10^6 – 2.5·10^6 ribosomes/cell (human)
1·10^5 ribosomes/cell (bacteria)
speed: 5–40 bonds/s
error rate ≈ 10^-5 – 10^-6
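To get a feeling for these numbers, the sketch below estimates how long a single ribosome needs to build one chain and how many wrong residues to expect in it. The function names are illustrative, the error rate is assumed to be a per-residue probability, and the 130-residue example borrows the chain length that a later slide uses for lysozyme.

```python
def synthesis_time_range(n_residues, min_rate=5.0, max_rate=40.0):
    """Return (fastest, slowest) synthesis time in seconds for one chain.

    A chain of n residues contains n - 1 peptide bonds; the rates are the
    5-40 bonds/s quoted on the slide.
    """
    bonds = n_residues - 1
    return bonds / max_rate, bonds / min_rate


def expected_errors(n_residues, error_rate):
    """Expected number of wrong residues per chain, assuming the error rate
    is an independent per-residue probability."""
    return n_residues * error_rate


if __name__ == "__main__":
    fastest, slowest = synthesis_time_range(130)
    print(f"130 residues: between {fastest:.1f} s and {slowest:.1f} s")
    print("expected errors per chain:", expected_errors(130, 1e-5))
```

With these figures a small protein is completed in seconds to tens of seconds, and the vast majority of chains contain no error at all.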
Starring in this movie and never before seen together on the same screen:
Paul Berg as himself
30S and 50S as the ribosomal subunits
mRNA as the information source
initiation and termination factors as the regulatory subunits
amino acids as the essential building blocks
and GTP starring as The Energy Source
http://cmgm.stanford.edu/movie/
More about ribosomes in the near future...
...now the destruction mechanism: proteins are destroyed just as they are built by cells...
Proteins are destroyed by the ubiquitin-proteasome mechanism
ubiquitin molecules attach to the protein to be discarded (especially to lysine residues)
the proteasome destroys the ubiquitin-tagged proteins
Can we predict protein structure?
Proteins have a very large number of configurations.
This leads to the well-known Levinthal’s paradox: a protein with $N$ amino acids, and an average of, say, 3 possible equilibrium positions per amino acid, has $\sim 3^{N} \approx 10^{0.5N}$ possible folded configurations.
For a small protein like lysozyme, $N = 130$, and therefore $10^{0.5N} = 10^{65}$ configurations.
Trying 1 configuration/ns would mean about $3 \cdot 10^{16}$ configurations/year, and searching all the configurations to find the ground state would thus take about $3 \cdot 10^{48}$ years.
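The estimate can be reproduced with a few lines of arithmetic. The sketch below works with base-10 logarithms to avoid huge numbers; the function name and the default parameters are illustrative choices, not part of the lecture. It uses the exact value $\log_{10} 3 \approx 0.477$ rather than the rounded 0.5, so it gives about $10^{62}$ configurations and about $10^{45.5}$ years; the conclusion is unchanged.

```python
import math

def levinthal_estimate(n_residues, states_per_residue=3, rate_per_second=1e9):
    """Return (log10 of the number of configurations, log10 of the search time in years).

    rate_per_second = 1e9 corresponds to trying 1 configuration per nanosecond.
    """
    log10_configs = n_residues * math.log10(states_per_residue)
    seconds_per_year = 365.25 * 24 * 3600
    log10_rate_per_year = math.log10(rate_per_second * seconds_per_year)
    return log10_configs, log10_configs - log10_rate_per_year

log10_configs, log10_years = levinthal_estimate(130)
print(f"configurations ~ 10^{log10_configs:.1f}")
print(f"search time    ~ 10^{log10_years:.1f} years")
```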
Levinthal’s paradox is relevant:
• for natural protein folding: how does a protein fold in just a few microseconds?
• for computational protein folding: how can we efficiently find the ground state configuration of proteins?
Keys to understanding: 1. the Flory-Huggins theory
The Flory-Huggins theory was developed to model the thermodynamics of solutions of homopolymers. With some modifications it yields useful information on heteropolymers like proteins as well.
The theory is useful in pinpointing some basic concepts that are important in our understanding of protein folding.
We start with a simple lattice model, where the lattice sites can be occupied either by solvent molecules or by solute molecules, which are of comparable size.
(from P. J. Flory, “Principles of Polymer Chemistry”, Cornell University Press, 1953)
$n_1$ molecules of solvent
$n_2$ molecules of solute
$N = n_1 + n_2$ lattice sites
The number of configurations is
$$\Omega = \frac{N!}{n_1!\,n_2!}$$
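Using Stirling's approximation, $\ln n! \approx n \ln n - n$:
$$\Omega = \frac{N!}{n_1!\,n_2!}$$
$$\begin{aligned}
\ln\Omega &\approx (N \ln N - N) - (n_1 \ln n_1 - n_1 + n_2 \ln n_2 - n_2) \\
&= N \ln N - n_1 \ln n_1 - n_2 \ln n_2 \\
&= (n_1 + n_2)\ln(n_1 + n_2) - n_1 \ln n_1 - n_2 \ln n_2 \\
&= -n_1 \ln\frac{n_1}{n_1 + n_2} - n_2 \ln\frac{n_2}{n_1 + n_2} \\
&= -N (v_1 \ln v_1 + v_2 \ln v_2)
\end{aligned}$$
Volume fractions or concentrations:
$$v_i = \frac{n_i}{n_1 + n_2} = \frac{n_i}{N}$$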
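As a quick numerical check of the Stirling step, the sketch below compares the exact $\ln\Omega$ (computed with the log-gamma function) with the approximate form $-N(v_1 \ln v_1 + v_2 \ln v_2)$. The values $n_1 = 7000$ and $n_2 = 3000$ are arbitrary illustrative choices.

```python
import math

def ln_omega_exact(n1, n2):
    """Exact ln[N! / (n1! n2!)] with N = n1 + n2, via lgamma(n + 1) = ln(n!)."""
    return math.lgamma(n1 + n2 + 1) - math.lgamma(n1 + 1) - math.lgamma(n2 + 1)

def ln_omega_stirling(n1, n2):
    """Stirling form: -N (v1 ln v1 + v2 ln v2), with v_i = n_i / N."""
    n = n1 + n2
    v1, v2 = n1 / n, n2 / n
    return -n * (v1 * math.log(v1) + v2 * math.log(v2))

n1, n2 = 7000, 3000  # illustrative values
exact = ln_omega_exact(n1, n2)
approx = ln_omega_stirling(n1, n2)
rel_err = abs(exact - approx) / exact
print(f"exact = {exact:.2f}, Stirling = {approx:.2f}, relative error = {rel_err:.2e}")
```

The relative error shrinks as the lattice grows, which is why the approximation is harmless for macroscopic numbers of molecules.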
Therefore the entropy of mixing is
$$S_m = k_B \ln\Omega = -k_B N (v_1 \ln v_1 + v_2 \ln v_2)$$
This holds for small molecules, with a size comparable to that of the solvent, but what happens for larger molecules, like those of artificial polymers, or proteins?
Now a single solute molecule has a size which is $x$ times as large as that of a solvent molecule, therefore the total number of lattice sites is
$$N = n_1 + x n_2$$
In order to find the total number of ways in which we can fill the lattice with the solute molecules, we consider an incremental construction.
We assume that $i$ solute molecules have already been placed somewhere in the lattice, and we build the next molecule segment by segment.
Initially there are
$$N - ix$$
vacant lattice sites, therefore this is also the number of possible ways of placing the first segment.
Let $z$ be the lattice coordination number, i.e., the number of sites adjacent to any given site; then the number of free adjacent sites is initially $z$, and afterwards it is $z-1$.
In the square lattice, $z = 4$.
Initial site: the next segment can be placed in one of the $z$ neighbors.
Now only $z-1$ sites are available to add more segments to the molecule.
We must also remember that sites may already be filled with segments from other molecules (or with previous segments of the same molecule).
Let $f_i$ be the probability that a site adjacent to the position of the first segment is already filled by a previous molecule (we neglect self-interaction).
Then the probability that a given adjacent site is not filled is $(1-f_i)$, and we find the number of ways of placing the $(i+1)$-th molecule
$$\begin{aligned}
\nu_{i+1} &= (N - ix)\, z (1 - f_i)\, (z-1)^{x-2} (1 - f_i)^{x-2} \\
&= (N - ix)\, z (z-1)^{x-2} (1 - f_i)^{x-1}
\end{aligned}$$
This means that for $n_2$ identical molecules, the total number of arrangements is
$$\Omega = \frac{1}{n_2!} \prod_i \nu_i$$
It is quite difficult to find the probability $f_i$, and we take a simple approximation instead:
$$f_i \approx \frac{x i}{N}; \qquad 1 - f_i \approx \frac{N - x i}{N}$$
i.e., $f_i$ is roughly equal to the volume fraction occupied by the molecules that are already present.
Then
$$\begin{aligned}
\nu_{i+1} &= (N - ix)\, z (z-1)^{x-2} (1 - f_i)^{x-1} \\
&\approx \frac{(N - ix)^{x}}{N^{x-1}} (z-1)^{x-1} \\
&= (N - ix)^{x} \left(\frac{z-1}{N}\right)^{x-1} \\
&\approx \frac{(N - ix)!}{[N - (i+1)x]!} \left(\frac{z-1}{N}\right)^{x-1}
\end{aligned}$$
In the second line we take $z \approx z-1$. The last step is certainly OK for $x \ll N - ix$.
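$$\begin{aligned}
\Omega = \frac{1}{n_2!} \prod_{i=1}^{n_2} \nu_i &\approx \frac{1}{n_2!} \prod_{i=1}^{n_2} \frac{[N - (i-1)x]!}{(N - ix)!} \left(\frac{z-1}{N}\right)^{x-1} \\
&= \frac{1}{n_2!} \left(\frac{z-1}{N}\right)^{n_2(x-1)} \left\{ \frac{N!}{(N-x)!}\, \frac{(N-x)!}{(N-2x)!} \cdots \frac{[N - (n_2-1)x]!}{(N - n_2 x)!} \right\} \\
&= \frac{1}{n_2!} \left(\frac{z-1}{N}\right)^{n_2(x-1)} \frac{N!}{(N - n_2 x)!} \\
&= \frac{N!}{n_1!\, n_2!} \left(\frac{z-1}{N}\right)^{n_2(x-1)}
\end{aligned}$$
Finally we find the number of configurations (last line), where $n_1 = N - x n_2$.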
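The telescoping step can be checked numerically. The sketch below sums $\ln\nu_i$ chain by chain, using $\nu_{i+1} \approx (N-ix)^x \left(\frac{z-1}{N}\right)^{x-1}$, and compares the result with the closed form $\frac{N!}{n_1!\,n_2!}\left(\frac{z-1}{N}\right)^{n_2(x-1)}$. The function names and the parameter values ($n_1 = 8000$, $n_2 = 100$, $x = 20$, $z = 4$) are illustrative assumptions.

```python
import math

def ln_omega_product(n1, n2, x, z):
    """ln Omega from the chain-by-chain product of nu_i, divided by n2!."""
    n_sites = n1 + x * n2
    total = 0.0
    for i in range(n2):  # i chains have already been placed
        total += x * math.log(n_sites - i * x)
        total += (x - 1) * math.log((z - 1) / n_sites)
    return total - math.lgamma(n2 + 1)

def ln_omega_closed(n1, n2, x, z):
    """ln of the closed form N!/(n1! n2!) * ((z-1)/N)^(n2 (x-1))."""
    n_sites = n1 + x * n2
    return (math.lgamma(n_sites + 1) - math.lgamma(n1 + 1) - math.lgamma(n2 + 1)
            + n2 * (x - 1) * math.log((z - 1) / n_sites))

n1, n2, x, z = 8000, 100, 20, 4  # illustrative values
product_form = ln_omega_product(n1, n2, x, z)
closed_form = ln_omega_closed(n1, n2, x, z)
rel_diff = abs(product_form - closed_form) / abs(closed_form)
print(f"product = {product_form:.2f}, closed form = {closed_form:.2f}, rel. diff = {rel_diff:.2e}")
```

The small residual difference comes from replacing $(N-ix)^x$ with a ratio of factorials, which is accurate as long as $x \ll N - ix$.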
The configurational entropy (mixing with solvent + chain disorder) is
$$\begin{aligned}
S_c = k_B \ln\Omega &= k_B \ln\left[ \frac{N!}{n_1!\, n_2!} \left(\frac{z-1}{N}\right)^{n_2(x-1)} \right] \\
&= k_B \left[ N \ln N - N - n_1 \ln n_1 + n_1 - n_2 \ln n_2 + n_2 + n_2(x-1)\ln(z-1) - n_2(x-1)\ln N \right]
\end{aligned}$$
Since the non-logarithmic terms are
$$-N + n_1 + n_2 = -(n_1 + x n_2) + n_1 + n_2 = -(x-1) n_2$$
and the logarithmic terms are
$$\begin{aligned}
& N \ln N - n_1 \ln n_1 - n_2 \ln n_2 + n_2(x-1)\ln(z-1) - n_2(x-1)\ln N \\
&= (n_1 + x n_2)\ln N - n_1 \ln n_1 - n_2 \ln n_2 + n_2(x-1)\ln(z-1) - x n_2 \ln N + n_2 \ln N \\
&= n_1 \ln N - n_1 \ln n_1 + n_2 \ln N - n_2 \ln n_2 + n_2(x-1)\ln(z-1) \\
&= -n_1 \ln\frac{n_1}{N} - n_2 \ln\frac{n_2}{N} + n_2(x-1)\ln(z-1)
\end{aligned}$$
we find
$$\begin{aligned}
S_c &= k_B \left[ -n_1 \ln\frac{n_1}{N} - n_2 \ln\frac{n_2}{N} + n_2(x-1)\ln(z-1) - n_2(x-1) \right] \\
&= k_B \left[ -n_1 \ln\frac{n_1}{n_1 + x n_2} - n_2 \ln\frac{n_2}{n_1 + x n_2} + n_2(x-1)\ln\frac{z-1}{e} \right]
\end{aligned}$$
We can picture this as a two-step process, where we first disorient the molecules of solute (the polymer), and then we mix them with the solvent.
We find the disorientation entropy by setting the number of solvent molecules to zero, i.e., $n_1 = 0$ and
$$S_{\text{disorientation}} = k_B n_2 \left[ \ln x + (x-1)\ln\frac{z-1}{e} \right] \approx k_B n_2 (x-1)\ln\frac{z-1}{e}$$
(when $x$ is large the first term is negligible).
We expect this entropy to give a major contribution to the entropy of melting of the polymer.
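The entropy of mixing can be obtained by subtraction
$$\begin{aligned}
S_m &= S_c - S_{\text{disorientation}} \\
&= k_B \left( -n_1 \ln\frac{n_1}{n_1 + n_2 x} - n_2 \ln\frac{x n_2}{n_1 + n_2 x} \right) \\
&= k_B \left( -n_1 \ln v_1 - n_2 \ln v_2 \right)
\end{aligned}$$
which has the same form as the result we found earlier, with the volume fractions now given by $v_1 = n_1/(n_1 + x n_2)$ and $v_2 = x n_2/(n_1 + x n_2)$.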
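The subtraction can be verified numerically. The sketch below evaluates the three entropies in units of $k_B$ and checks that $S_c - S_{\text{disorientation}}$ equals $-n_1 \ln v_1 - n_2 \ln v_2$. The function names and the parameter values are illustrative assumptions.

```python
import math

def s_config(n1, n2, x, z):
    """Configurational entropy S_c / k_B of n2 chains of x segments plus n1 solvent molecules."""
    n_sites = n1 + x * n2
    s = -n2 * math.log(n2 / n_sites) + n2 * (x - 1) * math.log((z - 1) / math.e)
    if n1 > 0:  # the solvent term vanishes when there is no solvent
        s -= n1 * math.log(n1 / n_sites)
    return s

def s_disorientation(n2, x, z):
    """Disorientation entropy / k_B: the n1 = 0 limit of S_c."""
    return n2 * (math.log(x) + (x - 1) * math.log((z - 1) / math.e))

def s_mixing(n1, n2, x):
    """Mixing entropy / k_B in terms of the volume fractions v1 and v2."""
    n_sites = n1 + x * n2
    v1, v2 = n1 / n_sites, x * n2 / n_sites
    return -(n1 * math.log(v1) + n2 * math.log(v2))

n1, n2, x, z = 500, 5, 100, 4  # illustrative values
by_subtraction = s_config(n1, n2, x, z) - s_disorientation(n2, x, z)
direct = s_mixing(n1, n2, x)
print(f"S_c - S_dis = {by_subtraction:.4f} k_B, S_m = {direct:.4f} k_B")
```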
In order to find the Gibbs free energy of mixing we need both entropy and enthalpy, and we still have to evaluate the contact energy between molecules.
Putting molecules in solution means changing the number of contact terms.
If the contact energies between solvent and solute molecules are $w_{11}$, $w_{22}$, $w_{12}$, then the average energy change from one like to one unlike contact is
$$\Delta w_{12} = w_{12} - \frac{1}{2}(w_{11} + w_{22})$$
and if there are $p_{12}$ such contacts the total enthalpy of mixing is
$$\Delta H_m = \Delta w_{12}\, p_{12}$$
so we are left with the problem of estimating $p_{12}$.
The number of contacts between solvent and solute can be estimated from the following considerations
• The total number of contacts between a polymer molecule and all of its neighbors is $z-2$ per chain unit plus two additional ones for the terminal units, making a total of $(z-2)x + 2 \approx zx$
• The total number of contacts between a solvent molecule and its neighbors is just $z$
• $p_{12}$ is given by the average number of unlike neighboring sites and this is equal to the number of neighboring sites times the probability that a site is occupied by an unlike molecule
• the probability that a particular site is occupied by a molecule is approximately equal to the volume fraction
$$p_{12} \approx z x n_2 v_1 = z n_1 v_2$$
$$\Delta H_m = \Delta w_{12}\, p_{12} = z \Delta w_{12} n_1 v_2 = k_B T \chi_{12} n_1 v_2$$
where $\chi_{12} = z \Delta w_{12}/(k_B T)$ is the Flory interaction parameter.
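Finally, the Gibbs free energy of mixing is
$$\begin{aligned}
\Delta G_m &= \Delta H_m - T \Delta S_m \\
&= k_B T \chi_{12} n_1 v_2 - k_B T (-n_1 \ln v_1 - n_2 \ln v_2) \\
&= k_B T (n_1 \ln v_1 + n_2 \ln v_2 + \chi_{12} n_1 v_2)
\end{aligned}$$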
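To see how the interaction parameter competes with the entropy of mixing, the sketch below evaluates $\Delta G_m$ in units of $k_B T$ for two values of $\chi_{12}$. The function name, the lattice composition and the values $\chi_{12} = 0$ and $\chi_{12} = 3$ are illustrative assumptions, not values from the lecture.

```python
import math

def delta_g_mix(n1, n2, x, chi, kT=1.0):
    """Flory-Huggins free energy of mixing: kT (n1 ln v1 + n2 ln v2 + chi n1 v2)."""
    n_sites = n1 + x * n2
    v1, v2 = n1 / n_sites, x * n2 / n_sites
    return kT * (n1 * math.log(v1) + n2 * math.log(v2) + chi * n1 * v2)

n1, n2, x = 500, 5, 100  # illustrative: equal volume fractions v1 = v2 = 0.5
g_athermal = delta_g_mix(n1, n2, x, chi=0.0)   # no contact-energy penalty
g_repulsive = delta_g_mix(n1, n2, x, chi=3.0)  # unfavourable unlike contacts
print(f"chi = 0: dG/kT = {g_athermal:.1f}")
print(f"chi = 3: dG/kT = {g_repulsive:.1f}")
```

With $\chi_{12} = 0$ only the entropic terms survive and mixing is favourable; a sufficiently large positive $\chi_{12}$ makes $\Delta G_m$ positive, so that mixing is no longer favoured.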
Keys to understanding: 2. the hydrophobic effect
Consider again the Gibbs free energy expression for mixing
$$\Delta G_m = \Delta H_m - T \Delta S_m$$
Now note that
• Polar amino acids form hydrogen bonds with water.
• Nonpolar amino acids do not form hydrogen bonds, and water molecules have “less freedom”.
Configurations of liquid water molecules near hydrophobic cavities in molecular-dynamics simulations. The blue and white particles represent the oxygen (O) and hydrogen (H) atoms, respectively, of the water molecules. The dashed lines indicate hydrogen bonds (that is, O-H within 35° of being linear and O-to-O bonds of no more than 0.35 nm in length). The space-filling size of the hydrophobic (red) particle in a is similar to that of a methane molecule. The hydrophobic cluster in b contains 135 methane-like particles that are hexagonally close-packed to form a roughly spherical unit of radius larger than 1 nm. In both cases, the water molecules shown are those that are within 0.8 nm of at least one methane-like particle. For the single cavity pictured in a, each water molecule can readily participate in four hydrogen bonds. (Owing to thermal motions, hydrogen bonding in liquid water is disordered.) Water molecules in a are typical of the bulk liquid where most molecules participate in four hydrogen bonds. The water molecules shown in b, however, are not typical of the bulk. Here, the cluster is sufficiently large that hydrogen bonds cannot simply go around the hydrophobic region. In this case, water molecules near the hydrophobic cluster have typically three or fewer hydrogen bonds.
from D. Chandler: “Interfaces and the driving force of hydrophobic assembly”, Nature 437 (2005) 640
A free water molecule can form hydrogen bonds in 4 directions, and protons can occupy 2 out of 4 positions, i.e., there is a total of 6 states.
A water molecule close to a hydrophobic surface can form hydrogen bonds in (roughly) 3 directions only, and therefore there is a total of 3 states.
Entropy difference (per mole):
$$\Delta S = N_A (k_B \ln 6 - k_B \ln 3) = R \ln 2 \approx 1.37\ \text{cal K}^{-1}\,\text{mol}^{-1}$$
$$T \Delta S \approx 0.41\ \text{kcal mol}^{-1}\ \text{at } 300\ \text{K}$$
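The counting argument and the resulting numbers can be reproduced directly. In the sketch below the states are counted as the number of ways of placing 2 protons on the available bond directions, which is one reading of the argument above; the variable names are illustrative.

```python
import math

R_CAL = 1.987  # gas constant in cal K^-1 mol^-1

# Number of states = ways of choosing 2 proton positions among the bond directions.
states_bulk = math.comb(4, 2)     # free water molecule: 4 directions -> 6 states
states_surface = math.comb(3, 2)  # near a hydrophobic surface: 3 directions -> 3 states

delta_s = R_CAL * math.log(states_bulk / states_surface)  # cal K^-1 mol^-1
temperature = 300.0                                       # K
t_delta_s = temperature * delta_s / 1000.0                # kcal mol^-1

print(f"Delta S   = {delta_s:.2f} cal/(K mol)")
print(f"T Delta S = {t_delta_s:.2f} kcal/mol at {temperature:.0f} K")
```

For comparison, $k_B T$ at 300 K corresponds to about 0.6 kcal/mol, so the entropic cost per interfacial water molecule is a sizeable fraction of the thermal energy.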
A larger hydrophobic surface leads to a larger loss of water entropy: this means that the total exposed surface must be minimized.
The Gibbs free energy change is negative when hydrophobic particles aggregate and thus minimize the total surface.
Computational efforts proceed mostly by “brute force calculations”.
At the moment several small proteins have been “solved”; however, larger proteins still do not yield to computational attacks.
(figures from K. A. Dill and J. L. MacCallum, “The Protein-Folding Problem, 50 Years On”, Science 338 (2012) 1042)
“... we are still missing a “folding mechanism.” By mechanism, we mean a narrative that explains how the time evolution of a protein’s folding to its native state derives from its amino acid sequence and solution conditions. A mechanism is more than just the sequences of events followed by any one given protein in experiments or in computed trajectories. ... ”
from K. A. Dill and J. L. MacCallum, “The Protein-Folding Problem, 50 Years On”, Science 338 (2012) 1042
• We have little experimental knowledge of protein-folding energy landscapes.
• We cannot consistently predict the structures of proteins to high accuracy.
• We do not have a quantitative microscopic understanding of the folding routes or transition states for arbitrary amino acid sequences.
• We cannot predict a protein’s propensity to aggregate, which is important for aging and folding diseases.
• We do not have algorithms that accurately give the binding affinities of drugs and small molecules to proteins.
• We do not understand why a cellular proteome does not precipitate, because of the high density inside a cell.
• We know little about how folding diseases happen, or how to intervene.
• Despite their importance, we still know relatively little about the structure, function, and folding of membrane proteins.
• We know little about the ensembles and functions of intrinsically disordered proteins, even though nearly half of all eukaryotic proteins contain large disordered regions. This is sometimes called the “protein nonfolding problem” or “unstructural biology.”
from K. A. Dill and J. L. MacCallum, “The Protein-Folding Problem, 50 Years On”, Science 338 (2012) 1042
Positive spin-offs
• Growth of protein structure databases
• Advances in computing technology
• Improvements in biomolecular forcefields
• New sociological structure in the scientific enterprise
• New materials: sequence-specific foldable polymers (foldamers)
• Advances in protein folding diseases
Evolution of accuracy over the history of CASP, spanning 18 years.
(Critical Assessment of protein Structure Prediction, or CASP, is a community-wide, worldwide experiment for protein structure prediction taking place every two years since 1994)
Peptoids are synthetic, foldable, protein-inspired polymers that have various applications. Shown here are peptoids that were designed as chains of alternating hydrophobic (gray) and either positively (blue) or negatively (red) charged side chains that spontaneously form a thin 2D structure called molecular paper. (from K. A. Dill and J. L. MacCallum, “The Protein-Folding Problem, 50 Years On”, Science 338 (2012) 1042)
http://tedxtalks.ted.com/video/The-protein-folding-problem-a-m;TEDxSBU
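SAWs (Self-Avoiding Walks)
Random walks: $4^{n}$ walks
Nonreversing walks: $4 \cdot 3^{n-1}$ walks
Self-avoiding walks: ??? (must be $< 3^{n}$ and $> 2^{n}$)
1000-step walks
Numerical work indicates that the number of SAWs is $\approx 2.638^{n}$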
Table 3. Numbers of self-avoiding walks.
 n   c_n
 0   1
 1   4
 2   12
 3   36
 4   100
 5   284
 6   780
 7   2172
 8   5916
 9   16268
10   44100
11   120292
12   324932
13   881500
14   2374444
15   6416596
16   17245332
17   46466676
18   124658732
19   335116620
20   897697164
21   2408806028
22   6444560484
23   17266613812
24   46146397316
25   123481354908
26   329712786220
27   881317491628
28   2351378582244
29   6279396229332
30   16741957935348
31   44673816630956
32   119034997913020
33   317406598267076
34   845279074648708
35   2252534077759844
36   5995740499124412
37   15968852281708724
38   42486750758210044
39   113101676587853932
from A. R. Conway et al. (1993) J. Phys. A: Math. Gen. 26 1519
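The first entries of the table can be reproduced by direct enumeration. The sketch below is a plain depth-first search on the square lattice, not the (much more efficient) method used for the published table; the function name is an illustrative choice, and the enumeration is limited to $n \le 10$ because the cost grows exponentially.

```python
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def count_saws(n):
    """Count the n-step self-avoiding walks on the square lattice that start at the origin."""
    visited = {(0, 0)}

    def extend(x, y, remaining):
        if remaining == 0:
            return 1
        total = 0
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in visited:       # self-avoidance: never revisit a site
                visited.add(nxt)
                total += extend(nxt[0], nxt[1], remaining - 1)
                visited.remove(nxt)      # backtrack
        return total

    return extend(0, 0, n)

counts = [count_saws(n) for n in range(11)]
for n, c in enumerate(counts):
    print(n, c)
```

The ratio of successive counts decreases slowly with $n$, in line with the asymptotic growth $\approx 2.638^{n}$ quoted above.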
Mean-squared end-to-end distance grows as a linear function of $n$ for random and nonreversing walks, but for self-avoiding walks it seems to grow as $n^{3/2}$.
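The faster-than-linear growth is already visible for short walks. The sketch below computes the exact mean-squared end-to-end distance of self-avoiding walks on the square lattice by enumeration and extracts an effective exponent between $n = 5$ and $n = 10$. The function name and the choice of these two lengths are illustrative; for such short walks the effective exponent is only a rough indication of the asymptotic value.

```python
import math

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def mean_sq_end_to_end(n):
    """Exact <R^2> averaged over all n-step self-avoiding walks on the square lattice."""
    visited = {(0, 0)}
    totals = [0, 0]  # [sum of R^2 over walks, number of walks]

    def extend(x, y, remaining):
        if remaining == 0:
            totals[0] += x * x + y * y
            totals[1] += 1
            return
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt not in visited:
                visited.add(nxt)
                extend(nxt[0], nxt[1], remaining - 1)
                visited.remove(nxt)

    extend(0, 0, n)
    return totals[0] / totals[1]

r2 = {n: mean_sq_end_to_end(n) for n in (1, 2, 5, 10)}
# Effective exponent a in <R^2> ~ n^a, estimated from two walk lengths.
effective_exponent = math.log(r2[10] / r2[5]) / math.log(10 / 5)
print(r2)
print(f"effective exponent between n = 5 and n = 10: {effective_exponent:.2f}")
```

For an ordinary random walk the same quantity is exactly $n$, i.e. the exponent is 1.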