macromolecular modeling with rosetta - stanford universityrhiju/das_baker_ann... · macromolecular...

23
Macromolecular Modeling with Rosetta Rhiju Das 1 and David Baker 1,2 1 Department of Biochemistry, University of Washington, and 2 Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195; email: [email protected], [email protected] Annu. Rev. Biochem. 2008. 77:363–82 First published online as a Review in Advance on April 14, 2008 The Annual Review of Biochemistry is online at biochem.annualreviews.org This article’s doi: 10.1146/annurev.biochem.77.062906.171838 Copyright c 2008 by Annual Reviews. All rights reserved 0066-4154/08/0707-0363$20.00 Key Words protein design, phasing, proteins, RNA, protein structure prediction Abstract Advances over the past few years have begun to enable prediction and design of macromolecular structures at near-atomic accuracy. Progress has stemmed from the development of reasonably accurate and efficiently computed all-atom potential functions as well as effec- tive conformational sampling strategies appropriate for searching a highly rugged energy landscape, both driven by feedback from struc- ture prediction and design tests. A unified energetic and kinematic framework in the Rosetta program allows a wide range of molec- ular modeling problems, from fibril structure prediction to RNA folding to the design of new protein interfaces, to be readily investi- gated and highlights areas for improvement. The methodology en- ables the creation of novel molecules with useful functions and holds promise for accelerating experimental structural inference. Emerg- ing connections to crystallographic phasing, NMR modeling, and lower-resolution approaches are described and critically assessed. 363 Click here for quick links to Annual Reviews content online, including: Other articles in this volume Top cited articles Top downloaded articles • Our comprehensive search Further ANNUAL REVIEWS Annu. Rev. Biochem. 2008.77:363-382. Downloaded from arjournals.annualreviews.org by Stanford University Robert Crown Law Lib. on 01/04/09. For personal use only.

Upload: others

Post on 03-Jan-2020

9 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

Macromolecular Modelingwith RosettaRhiju Das1 and David Baker1,2

1Department of Biochemistry, University of Washington, and 2Howard HughesMedical Institute, University of Washington, Seattle, Washington 98195;email: [email protected], [email protected]

Annu. Rev. Biochem. 2008. 77:363–82

First published online as a Review in Advance onApril 14, 2008

The Annual Review of Biochemistry is online atbiochem.annualreviews.org

This article’s doi:10.1146/annurev.biochem.77.062906.171838

Copyright c© 2008 by Annual Reviews.All rights reserved

0066-4154/08/0707-0363$20.00

Key Words

protein design, phasing, proteins, RNA, protein structureprediction

AbstractAdvances over the past few years have begun to enable predictionand design of macromolecular structures at near-atomic accuracy.Progress has stemmed from the development of reasonably accurateand efficiently computed all-atom potential functions as well as effec-tive conformational sampling strategies appropriate for searching ahighly rugged energy landscape, both driven by feedback from struc-ture prediction and design tests. A unified energetic and kinematicframework in the Rosetta program allows a wide range of molec-ular modeling problems, from fibril structure prediction to RNAfolding to the design of new protein interfaces, to be readily investi-gated and highlights areas for improvement. The methodology en-ables the creation of novel molecules with useful functions and holdspromise for accelerating experimental structural inference. Emerg-ing connections to crystallographic phasing, NMR modeling, andlower-resolution approaches are described and critically assessed.

363

Click here for quick links to Annual Reviews content online, including:

• Other articles in this volume• Top cited articles• Top downloaded articles• Our comprehensive search

FurtherANNUALREVIEWS

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 2: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

Contents

INTRODUCTION. . . . . . . . . . . . . . . . . 364KEY INGREDIENTS OF

MOLECULAR MODELING . . . . 364Energy Function . . . . . . . . . . . . . . . . . 364Conformational Sampling . . . . . . . . 367

A UNIFIED FRAMEWORK FORBIOMOLECULAR MODELINGAND DESIGN . . . . . . . . . . . . . . . . . . 368Same Ingredients

for Multiple Problems . . . . . . . . . 368Similar Challenges Across

Multiple Problems . . . . . . . . . . . . 372Toward RNA and Other

Heteropolymers? . . . . . . . . . . . . . . 373CAN MOLECULAR MODELING

BE A PRACTICAL TOOL?. . . . . . 374The Phase Problem

in X-ray Crystallography. . . . . . . 375High-Resolution NMR Structures

from Limited Data . . . . . . . . . . . . 375High-Throughput Techniques . . . . 375From Low-Resolution Maps to

High-Resolution Models? . . . . . 377Achieving Confident Models . . . . . . 377

MACROMOLECULARMODELING FOR THECOMMUNITY. . . . . . . . . . . . . . . . . . 378

INTRODUCTION

Biomolecules have evolved the fascinatingproperty of folding into unique tertiary struc-tures dictated by their chemical sequence.The prediction of these structures from se-quence alone and the design of new func-tional molecules are classic problems in bio-physics. Although general solutions to theseformidable problems have not been achieved,recent years have seen much progress. In2004, the principles and methods imple-mented in the Rosetta algorithm for de novoprotein structure modeling were summarized(1), and since then, this method has led toa handful of blind predictions with back-

bone accuracies better than 2 A (see, e.g.,Figure 1a,b ) (2–4). The first goal of thepresent review is to illustrate how, in theseintervening four years, the same basic ingredi-ents have proven successful in a wide range ofbiomolecular modeling problems beyond denovo protein structure prediction. The sec-ond goal is to outline how, perhaps in thenext four years, these approaches may ma-ture from largely academic pursuits into use-ful tools for characterizing and manipulatingmolecular systems. After briefly summarizingthe main ingredients of the Rosetta method-ology, we describe how these principles canbe generally put into practice in applicationsranging from loop modeling, to protein andligand docking with backbone and side chainflexibility, to RNA folding. We end the re-view by highlighting new connections of thesehigh-resolution molecular modeling effortsto experimental methods as well as currentchallenges in bringing these hybrid computa-tional/experimental approaches into wide use.

KEY INGREDIENTS OFMOLECULAR MODELING

Macromolecular structure prediction and de-sign are based on the premise that theobserved conformations of folded macro-molecules are almost always the lowest free-energy states (see, however, Reference 5).Hence, structure prediction is generally theproblem of finding the lowest-energy struc-ture given the sequence of a biopolymer, anddesign is the problem of finding the lowest-energy sequence for the target structure. Crit-ical to both molecular modeling problemsare a reasonably accurate free-energy functionand a sampling method capable of locating theminima of this function for the biomolecularsystem under study.

Energy Function

The hallmark features of the folded structuresof macromolecules are the burial of nonpolar

364 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 3: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

a b

Figure 1Blind de novo predictions of protein structure can achieve high resolution. (a) Rosetta prediction for theCritical Assessment of Techniques for Protein Structure Prediction (CASP)6 target, T0281, agrees withthe subsequently released crystal structure shown with rainbow coloring from the N terminus (blue) tothe C terminus (red ) (1whz) with a backbone accuracy of 1.6 A over 70 residues (3). (b) Rosetta predictionfor the CASP7 target, T0283, (white) agrees with the subsequently released crystal structure also shownin rainbow coloring (2hh6) with a backbone accuracy of 1.4 A over 90 residues (4). A nine-residueC-terminal helix that makes contacts with crystal neighbors is not shown. These panels and the followingfigures were prepared in PyMOL (Delano Scientific, Palo Alto, CA).

groups away from water; the close, nearlyvoid-free packing of buried groups and atoms;and the formation of intramolecular hydrogenbonds by nearly all buried polar atoms (6, 7).The first feature is a direct consequence of thehydrophobic effect, recognized by Kauzmann(8) many years ago as the dominant drivingforce in protein folding. The second featurereflects van der Waals interactions betweenburied atoms and, perhaps more importantly,the strong size dependence of the free-energycost of forming a cavity in solvent to accom-modate the protein. The third feature followsfrom the significant free-energy cost of strip-ping water molecules from polar groups uponfolding, which must be compensated by theformation of new hydrogen bonds within theprotein or nucleic acid molecule. Recognitionof these features, especially hydrogen bond-

ing, was fundamentally important for the ear-liest predictions of the basic secondary struc-ture motifs in proteins by Pauling and col-leagues (as reviewed in Reference 9) and, later,in nucleic acids by Watson & Crick (10).

A successful free-energy function mustcapture to some extent these dominant con-tributions to macromolecular stability. InRosetta, atom-atom interactions are effi-ciently computed using a Lennard-Jonespotential to describe packing, an implicit sol-vation model to describe the hydrophobiceffect and the electrostatic desolvation cost as-sociated with burial of polar atoms, and an ex-plicit hydrogen-bonding potential to describehydrogen bonding (1) (see Figure 2c). As wehave discussed elsewhere (11–13), an explicittreatment of hydrogen bonding has advan-tages over the classical electrostatic models

www.annualreviews.org • Macromolecular Modeling with Rosetta 365

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 4: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

a b c

Hydrophobic residuesPositively charged residuesNegatively charged residuesPolar residues

Hydrogenbonds

Nonpolaratoms

Figure 2Schematic of Rosetta molecular modeling in the context of de novo structure prediction. (a) “Snapshot”of low-resolution fragment assembly: conformation after five nine-residue fragment insertions of thesequence of the phage 434 repressor protein. All backbone heavy atoms are simulated and illustrated hereas a ribbon (rainbow). Side chains are represented in the simulation by interaction centers (spheres), withan energy function favoring burial of hydrophobic residues ( gray) and the exposure of positively charged(dark blue), negatively charged (red ), or other polar ( green) residues (21). (b) Final low-energyconformation produced by fragment assembly. (c) All-atom model produced after high-resolutionrefinement. Pair-wise Lennard-Jones and solvation terms give attractive interactions between nonpolaratoms ( gray); hydrogen bonds ( green dotted lines) are also assigned attractive energies (1). For clarity,hydrogen atoms are not shown.

employed in most molecular mechanics po-tentials in that the orientation dependence ismore correctly modeled. Furthermore, long-range electrostatic interactions, which arenotoriously difficult to compute accuratelyowing to induced polarization effects, arestrongly damped (12). An insightful compari-son of the Rosetta and other energy functionsused by different laboratories has been carriedout recently (14).

Bonded interactions are treated in Rosettafor the most part with bond lengths and an-gles fixed at their ideal values. The remain-ing degrees of freedom are the bond torsionangles, and the associated torsional poten-tials are perhaps the most difficult aspect ofany force field to represent accurately ow-ing to the influence of inherently quantummechanical effects, which cannot be rigor-

ously decomposed into independent classi-cal contributions. These potentials are mod-eled empirically in Rosetta on the basis oftorsion angle distributions observed in high-resolution crystal structures. This procedureis far from optimal because of the doublecounting of effects already captured in thenonbonded interaction terms and also on aes-thetic grounds. Overall, the rigorous determi-nation of bonded interactions continues to bea formidable challenge (see, e.g., the discus-sion in Reference 15).

The resulting energy function, encodingthe basic physics of molecular interactions, isnecessarily approximate. For example, the ex-plicit structure of solvent, long-range electro-statics, and residual dynamics in the moleculehave been ignored. Another striking omissionis the massive entropy change of the molecule

366 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 5: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

upon attaining an ordered structure; we haveassumed, to a first approximation, that theconformational entropies of different well-packed protein conformations are similar.

Nevertheless, it is important to recognizethat success in prediction and design prob-lems neither requires nor is an indicator ofexceptionally high accuracy of the assumedenergy function. Rather, the success of struc-ture prediction partly stems from the verylarge energy gap that must exist between theexperimentally observed conformation of afolded polymer and the vast majority of non-native conformations. In the context of pro-tein structure prediction, a molecule being“folded” indicates that, at equilibrium, it hasa very high probability of being in a single na-tive state. If this probability exceeds 99.9%,the free-energy gap between the native stateand the ensemble of nonnative states mustbe at least �G = kBT log (0.999/0.001) =4 kcal/mol, by the Boltzmann relation. In-deed, this free-energy gap is typically mea-sured at 3–10 kcal/mol (16). However, becauseof the huge decrease in entropy accompany-ing folding, the gap in energy (rather than freeenergy) must be much larger. Experimentaland theoretical estimates of conformationalentropy suggest that the energy of the nativestate must be on the order of 100 kcal/mollower than that of any member of the en-semble of unfolded states (∼1.4 kcal/molper residue) (see, e.g., References 17 and18). Assuming the native state can be lo-cated with reasonable confidence if the er-ror in the energy function is on the orderof 10% of the energy gap, this allowed er-ror can thus be many kcal/mol. [This argu-ment is oversimplified; the accuracy actuallyrequired for structure prediction is somewhatgreater, given that there may be a small num-ber of alternative conformations with energieswithin a few kcal/mol of the native structure(19).] Although errors in the range of sev-eral kcal/mol do not appear to compromisestructure prediction, they do present prob-lems for applications requiring the quantita-tive, high-accuracy estimation of free-energy

differences. Furthermore, the challenges as-sociated with estimating conformational en-tropy changes make the accurate computationof absolute free energies of folding or bindingexceptionally difficult.

Regardless of what approximations aremade in the assumed all-atom energy func-tion, a crucial aspect of discriminating nativestructures and viable designs is the mainte-nance of contacting atoms at their observedcharacteristic spacings (3, 20). Unfortunately,the short-range repulsion required to enforcethese contact distances results in an excep-tionally rugged energy landscape, with highbarriers close to even the deepest minima. Tomake the sampling problem more tractable,it is useful to construct a smoother versionof this all-atom potential whereby degreesof freedom associated with the rapid fluctua-tions are effectively “integrated out.” For ex-ample, in Rosetta, the initial phase in manycalculations is a search on a smoothed en-ergy landscape where the side chain degreesof freedom are represented as soft interac-tion centers (21) (see also Figure 2a,b). Forproteins, the dominant driving forces in thisrepresentation are the nonspecific burial ofhydrophobic residues and the pairing of β-strands into sheets, with a smaller contribu-tion from specific but averaged out interac-tions between side chain centers. For nucleicacids, the forces in this smoothed represen-tation are coarse-grained potentials favoringbase pairing and base stacking (see Reference22).

Conformational Sampling

The first stage in the search for the globalminimum involves locating a large numberof local minima in the coarse-grained lower-resolution potential (Figure 2a,b). The ne-cessity of exploring as many local minimaas possible is a consequence of coarse-grainsmoothing, which necessarily introduces largeerrors owing to the missing critical contribu-tion of interatomic packing to the true freeenergy. Indeed, given the approximate nature

www.annualreviews.org • Macromolecular Modeling with Rosetta 367

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 6: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

of the low-resolution energy function, it is im-portant wherever possible to bias the searchwith any additional information available. Forexample, the Rosetta low-resolution structureprediction for protein and RNA molecules isbased on a picture of folding in which localchain segments sample from distributions oflocal conformations that are relatively low inenergy given their sequences. Folding takesplace when a combination of local confor-mations is sampled that makes possible low-energy tertiary interactions. This flickeringbetween different local structures is modeledby assuming that the distribution of statessampled by a sequence segment in isolationis reasonably well approximated by the distri-bution of structures observed for the sequencein prior crystal structures.

The second stage in the search for the low-est free-energy minimum starts from each ofthe alternative minima identified in the initiallow-resolution search and adds back the miss-ing atomic detail (Figure 2c). In the proteinfolding case, this is carried out by first per-forming a simulated annealing search throughcombinations of discrete amino acid rotamers.In protein design calculations, the process isidentical, except all rotamers of all amino acidsare considered at each position rather thanjust the rotamers for a particular native se-quence. Then, to further optimize the ge-ometry, Rosetta employs a multistep MonteCarlo minimization procedure composed ofan assortment of torsion angle perturbations,with each perturbation followed by efficientone-at-a-time rotamer optimization and thenby continuous gradient-based minimizationof side chain and backbone torsion angles (1).Beyond the side chain optimization protocoldescribed above, the conformational pertur-bations include small changes to the back-bone torsion angles (see, e.g., Reference 1)and shifts of the relative rigid-body orienta-tions of multiple domains. The implemen-tation of these moves for general modelingproblems and the present capabilities and lim-itations of this challenging search procedureare discussed in the next section.

A UNIFIED FRAMEWORK FORBIOMOLECULAR MODELINGAND DESIGN

The development over the past few years ofan all-atom energy function and of an ef-fective high-resolution search procedure hasexpanded molecular modeling work well be-yond the de novo prediction of the structuresof globular, soluble proteins. The same ba-sic ingredients—and, indeed, the same coresoftware routines—are now used to constructcomparative models of large proteins, to pre-dict protein-protein interfaces, to design newproteins, and even to explore polymers be-yond proteins. In this section, we briefly de-scribe how these seemingly diverse molecularmodeling problems can be tackled within thesame framework.

Same Ingredientsfor Multiple Problems

Quite generally, any prediction or design chal-lenge can be formulated as a global optimiza-tion problem with suitable degrees of freedomand constraints. The setup for all such prob-lems is similar. First, the kinematic rules thatthe atoms follow upon changing any inter-nal torsional or rigid-body degree of freedomare defined. Second, where relevant, alterna-tive sets of discrete states for distinct subtreesare specified (different side chain rotamers ateach sequence position, for prediction prob-lems, or rotamers for all or a selected subsetof amino acids, for design problems). Third,the values of the internal degrees of freedomare initialized, potentially with informationfrom existing templates. Finally, a scheduleof conformational moves is set into motion;this protocol and the associated energy func-tion can be quite similar across many differentproblems.

The main difference between differentclasses of modeling problems is typically thefirst component, the definition of kinemat-ics. In packages that simulate molecular dy-namics, a file listing the simulated atoms and

368 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 7: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

their connections typically supplies this keyinformation. Within Rosetta, these kinematicrules are encoded in an appropriate tree-like representation, as depicted in Figure 3[a protein-specific implementation has beendescribed in References 23 and 24; tree-based representations are also used in sev-eral other programs (25–28)]. During eachRosetta Monte Carlo move, a subset of theinternal degrees of freedom of the molecule,such as the backbone torsion angles of severalrandomly selected residues, is changed. Atomsthat are marked as “ancestors” of the changedresidues in the atom tree remain fixed, andthe affected atoms and their “descendants”are translated and rotated to appropriatelypropagate the change. In addition to changesin torsion angles, the relative rigid-body ori-entations of two domains can be changed, andsuch moves are propagated across noncova-lent connections encoded in the atom tree;see the thin, colored lines in Figure 3. Fur-ther, the current atom-tree framework allowsbond lengths and bond angles to deviate from

starting values, although this feature has notyet been widely explored.

As an illustration, the atom-tree diagramsfor the low-resolution phase of de novo mod-eling and for rebuilding loops in comparativemodels are shown in Figure 3a,b. De novoprediction uses a simple atom tree, with eachbackbone atom connected to its neighbor;internal torsion angles are initialized to uni-form values to be changed during the simu-lation. For loop modeling, backbone move-ments need to be carried out in the rebuiltsegments (colored lines) without perturbingthe rest of the structure, so temporary chainbreaks are introduced into the atom tree [thingray lines in Figure 3b (29–31, 42)]. Afterthis setup, both de novo and loop rebuildingsimulations proceed in a similar fashion, us-ing essentially the same move sets except foradditional loop closure steps in the latter case.

In the above framework, tackling a newproblem is generally as simple as creatinga new, appropriate atom tree. The powerof this approach is further illustrated by

−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−→Figure 3A unified framework for tackling multiple molecular modeling problems. Each panel depicts a problemin biomolecule structure prediction or design (left) and a diagram of the atom-tree representation usedby Rosetta (right). (a) De novo structure prediction for the phage 434 repressor protein (PDB code: 1r69)(2, 82). (b) Loop modeling carried out during comparative modeling of the CASP7 target T0331,pyridoxamine 5′-phosphate oxidase-related protein (2hhz) (4). (c) Protein-protein docking for a host andviral major histocompatibility complex receptor,1p7q (83), with full flexibility of all degrees of freedom(backbone, side chain, rigid body). (d ) Protein-protein docking with backbone flexibility limited to ahinge region (red ) and to a loop (blue) (24). (e) Symmetric folding and docking to model the coiled-coiltrimerization motif of coronin 1, 2akf (84; I. Andre, R. Das, D. Baker, unpublished results).( f ) Symmetry-constrained modeling of the sequence NNQQNY from prion Sup35 (9, 35).( g) Small-molecule docking of a steroid with an antibody, 2dbl (36, 85). (h) RNA fold prediction for apseudoknot with two known Watson-Crick pairings, 1l2x (22, 86). (i ) Redesign of a protein-proteininterface between colicin E7 DNase and the Im7 immunity protein, 1ujz (40). ( j ) Design of a novelretroaldolase enzyme into scaffold 1a53 (42, 87). (k) Redesign of an interface between a homingendonuclease and its DNA-binding site for altered cleavage specificity, 2fld (43). In all panels, coloredsegments represent degrees of freedom that are sampled; gray segments are held fixed during modeling.Thick lines represent torsional degrees of freedom in the polymer; arrows give the backbone direction(N terminus to C terminus for proteins; 5′ to 3′ for nucleic acids), and small white gaps representtemporary chain breaks required to maintain an acyclic tree representation. Each thin line represents sixrigid-body degrees of freedom for the translation and rotation between the connected portions of theatom tree. Faded colors in (e) and ( f ) represent torsional and rigid-body degrees of freedom that arecloned from segments in dark colors. Side chain rotameric degrees of freedom in ( g)-(l ) are representedas solid circles; open circles represent the expansion of the sampled rotamer set to include all amino acidtypes.

www.annualreviews.org • Macromolecular Modeling with Rosetta 369

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 8: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

flexible backbone protein-protein docking.Backbone conformational changes occur fre-quently upon protein binding; thus the fixed-backbone approximation used in most currentdocking algorithms typically precludes high-resolution prediction (32, 33). The most gen-

eral representation of this problem allows allinternal and rigid-body degrees of freedom tovary (Figure 3c), but this entails the searchof a huge conformational space. Instead, witha combination of flexible and rigid atom-treesegments, it is straightforward to supplement

a Protein structure prediction b Loop modeling

c Protein docking (fully flexible) d Protein docking (partly flexible)

e Symmetric complexes f Fibril modeling

HingeLoop

Figure 3(Continued )

370 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 9: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

the rigid-body and side chain degrees of free-dom with movement in specific loops or hingeregions (Figure 3d ) (24, 34). Alternatively, inspecial cases, a comprehensive search of all de-grees of freedom can be carried out if symme-try constraints are available. Figure 3e showsthe atom tree corresponding to the problem ofmodeling the structure of a three-helix coiledcoil with cyclical symmetry (35; I. Andre, R.Das, D. Baker, unpublished results); a differ-

ent atom-tree topology (Figure 3f ) describesthe problem of modeling extended fibrils (35)and is used to model a variety of disease-associated amyloid structures. Moves carriedout on the first protein chain are copied to theother chains, including torsion angle changesas well as shifts in the overall translation androtation of the chain (thin colored connec-tions in Figure 3). Without developing alarge amount of additional code, the energy

g Small-molecule docking h RNA folding

i Protein designj Protein-protein interface design

k Enzyme design l Protein-DNA interface design

www.annualreviews.org • Macromolecular Modeling with Rosetta 371

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 10: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

function used in de novo folding and a hybridof the folding and docking conformationalsearch procedures can be transferred directlyto this atom tree for both low-resolution andhigh-resolution phases.

Other molecular modeling problems aretreated in Rosetta by following the samesteps of defining an atom tree, initial con-ditions, accessible rotamers, and a scheduleof Monte Carlo moves affecting the flexibledegrees of freedom. Modeling the bindingof small molecules to proteins is very simi-lar to the problem of protein-protein dock-ing (Figure 3g) (36). Modeling of membraneproteins follows a protocol similar to themodeling of soluble proteins, albeit withdifferent low-resolution and high-resolutionenergy functions (37). Searches of nucleic acidconformations are also possible. Indeed, forRNA folds in which some Watson-Crick basepairings are known, this extra pairing infor-mation can be retained throughout the sim-ulation with the appropriate atom tree andinitial rigid-body transformations betweenpaired residues (Figure 3h) (22, 23).

For a seemingly different molecular mod-eling application, the engineering of newproteins, the same framework is used. Indesign calculations, the simulated annealingmove used to optimize side chain rotamersin structure prediction is simply expandedto include rotamers from all amino acidtypes (Figure 3i ). If desired, such “fixed-backbone” design moves can be alternatedwith structure refinement—a strategy used tosuccessfully design a protein with a novel α/β-fold to atomic resolution (20)—within a singleprotocol. Protein design methods have beenused recently to design a sequence that canswitch between two very different folded con-formations (38), and further highlights andcurrent challenges in protein design have beenreviewed (39).

Beyond the design of globular proteins, theenergy functions and search procedures de-veloped for protein-protein docking, small-molecule docking, and protein/nucleic acidinteractions can be converted into analo-

gous design applications. Figure 3j illustratesprotein-protein interface design, which cou-ples refinement of rigid-body orientation withoptimization of the amino acids at the inter-face to create new pairs of interacting pro-teins (40). Figure 3k shows the atom tree fordesigning an enzyme by optimizing interac-tions with a model for the reaction transi-tion state (41, 42); and Figure 3l illustratesprotein-DNA interface design, recently usedto reengineer endonuclease cleavage speci-ficity for gene therapy and other applications(43).

Similar Challenges AcrossMultiple Problems

An advantage of using the same overall frame-work to approach this wide range of prob-lems is the tremendous amount of feedbackthat can be compiled to diagnose limitationsand to improve the underlying free-energyfunction and sampling methodologies. In theRosetta development community, there havebeen numerous occasions in which improve-ments in either the energy function or sam-pling methodology driven by findings in onearea have led to payoffs in another area. Forexample, one crucial step that enabled bothimproved discrimination of native proteinstructures in prediction problems (44) and thecreation of well-ordered protein folds in de-sign problems (20, 37, 45) was to expand theatomic radii of the Rosetta energy function,decreasing the probability of artificially tightside chain packing. At the same time, havinga unified framework clearly exposes the twoformidable issues that still hamper the molec-ular prediction and design problems that canbe currently modeled with Rosetta: the lim-its of conformational search and inaccuraciesin the treatment of polar interactions in theenergy function.

With current conformational samplingstrategies and available computational power,every prediction problem discussed above be-comes intractable at some point. The confor-mational space grows exponentially with the

372 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 11: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

number of degrees of freedom, roughly rep-resented by the total length of colored seg-ments in each atom-tree diagram in Figure 3.For example, the low-resolution phase of denovo structure prediction can only find foldswithin the 2-A “radius of convergence” ofthe all-atom refinement procedure for pro-tein lengths less than ∼100 residues and, eventhen, only in favorable cases (2, 4). Simi-larly, the ability of high-resolution compar-ative modeling to improve over existing tem-plates drops dramatically for protein domainswith lengths greater than ∼200 residues (4).Problems that additionally require modelingrigid-body degrees of freedom along withtorsional degrees of freedom, e.g., protein-protein docking or protein/small-moleculedocking, have only yielded high-resolutionmodels in cases where the protein backboneremains close to a known template (46, 47) orthe number of flexible residues (35; I. Andre,R. Das, D. Baker, unpublished data) is lessthan ∼30. For larger problems, the confor-mational space that needs to be searched canbe significantly contracted through the use ofeven limited experimental data, a frontier dis-cussed in detail below.

In addition to making some de novo mod-eling problems essentially intractable, thecurrent difficulty of conformational searchgreatly limits the throughput and availabil-ity of high-resolution modeling. In the recentCritical Assessment of Techniques for ProteinStructure Prediction (CASP)7 trials, high-resolution de novo structure prediction sim-ply could not be carried out by the automatedRobetta server owing to the two-day limitson automated predictions. More practically,in silico screening of small-molecule inhibitorsfor proteins at high resolution requires a fewhundred computer hours to be expended oneach case. Given this daunting computationalexpense, large-scale high-resolution screen-ing of thousands of small molecules with mul-tiple protein targets allowing for flexibility ofboth the ligand and the protein—as would bedesirable for most drug design applications—appears currently out of reach.

CASP: CriticalAssessment ofTechniques forProtein StructurePrediction

Even though conformational samplinglimits most areas of high-resolution molec-ular modeling, there are known defects inthe underlying energy function. As discussedabove, quantitative estimates of free-energychanges that involve large changes in molec-ular order, e.g., for folding of whole proteindomains or for the ordering of large loops,are currently intractable owing to the diffi-culty of estimating conformational entropy.Of further worry are intrinsic problems withtreating polar interactions. For example, ap-plication of aggressive optimization methodsto small proteins is beginning to reveal non-native structures assigned lower all-atom en-ergies than native structures for a subset ofcases, including the chymotrypsin inhibitorand trp cage (R. Das & D. Baker, unpublisheddata). The native structures in these cases ex-hibit solvent-exposed hydrogen bonds, someinvolving charged residues; the energetics ofthese interactions may require taking intoaccount explicit solvent, long-range electro-static interactions and/or atomic polarizabil-ity. These types of polar interactions are moreprevalent at active sites of proteins as hotspot interactions in protein-protein interfaces(48) and as mediators of protein/nucleic acidinteractions (49). We have correspondinglyfound that prediction and design of polar in-teractions have been more challenging thanproblems for systems stabilized primarily bynonpolar interactions. Ongoing efforts to ex-plicitly model discrete water molecules (50)and to develop polarizable force fields shouldcontribute to improving prediction and de-sign of challenging, highly polar systems (51).

Toward RNA and OtherHeteropolymers?

Although most of the work described abovehas focused on protein molecules, other poly-mers with intricate structures, notably RNAmolecules, have important roles in modern-day viruses and cells and may have domi-nated biology in its primordial stages (52).More broadly, a vast range of nonbiological

www.annualreviews.org • Macromolecular Modeling with Rosetta 373

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 12: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

heteropolymers with functional structurescan now potentially be created and evolved,thanks to advances in combinatorial chemistry(as reviewed in Reference 53). Even thoughrandom sequences of these other polymersare not expected to have the energy gaps thataid current protein modeling, molecules un-der selection, either in vivo or in vitro, mayevolve these properties in order to robustlycarry out their selected function and thus bedescribable by the same basic ingredients de-veloped for protein modeling in Rosetta.

Thus, naturally occurring RNA moleculesthat attain structure without protein cofac-tors, such as a recently discovered class of“riboswitches” (54), are potentially excellenttargets. Initial studies of automated structureprediction with fragment assembly of RNAguided by a low-resolution energy functionsuggest that a fairly comprehensive conforma-tional search can be carried out for sequencesof less than 30 residues (22). For larger RNAs,the secondary structure (Watson-Crick basepairing pattern) of these molecules can typi-cally be inferred from energy-based or phylo-genetic analysis, dramatically decreasing theamount of conformational space that needsto be searched (55). By analogy with thehistory of protein structure prediction, animportant next step is the development ofa high-resolution energy function that pro-vides a quantitative and reasonably accurateaccounting of base stacking and hydrogenbonds. It will be interesting to see whethersuch energy terms will be accurate enough,and whether functional RNAs have evolvedlarge enough energy gaps, to allow compu-tational discrimination of functional struc-tures from nonnative structures. The seemingprevalence of kinetic traps and alternativestructures in the folding of large functionalRNAs (56) suggests that the discrimina-tion problem may not be as simple as forproteins.

Looking further in the future, nonnaturalpolymers such as nucleic acids with alterna-tive bases (see, e.g., References 57 and 58),peptide nucleic acids (59), or peptoids [N-

linked polyglycine (60)] may gain in impor-tance. Long cooperatively folding sequencesof these new heteropolymers may be evolvablein vitro to become enzymes, drugs, or nan-otechnology scaffolds with useful propertiesorthogonal to existing biopolymers. The denovo prediction of these new molecules’ sec-ondary and tertiary structures—without priorextensive crystallographic information on se-quences of the same polymer—will providepowerful tests of our understanding of molec-ular biophysics. The de novo prediction anddiscovery of the fundamental secondary andtertiary structure motifs for these new classesof heteropolymers—investigations that paral-lel the classic work on proteins by Paulingand colleagues or on nucleic acids by Watsonand Crick—will be exciting studies. We ex-pect that carrying out calculations on thesenew polymers will benefit greatly from the ba-sic insights and powerful methods developedfor protein structure prediction.

CAN MOLECULAR MODELINGBE A PRACTICAL TOOL?

The advent of a unified approach to multiplemolecular modeling problems (Figure 3)suggests that computational approachesmay soon have a broad impact on biologyand medicine. Applications of computationaldesign to molecular engineering are being ex-plored widely and can be validated by testingwhether the sequences carry out their desiredfunction. However, for high-resolution struc-ture prediction problems, high-resolutionvalidation will typically not be available.How much can a practicing investigator trustany theoretical model? As described above,limitations of modeling produced by the ap-proximate energy function and limited con-formational search will likely bedevil structurepredictions for many years to come. Nev-ertheless, molecular modeling can provide areliable basis to guide and inform research if itcan be coupled creatively and rigorously to ex-perimental methods. Although these are earlydays, several initial investigations suggest

374 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 13: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

that hybrids between structure predictionand both high-resolution and low-resolutionexperimental approaches can accelerate theinference of trustworthy structural models.

The Phase Problemin X-ray Crystallography

One particularly fruitful interface betweenhigh-resolution macromolecular modelingand high-resolution structure determinationinvolves determining the phase estimates re-quired for converting diffraction data intoelectron density maps. Although such phasesare typically inferred from experiments onheavy-atom derivatives of proteins, morerapid solutions can be obtained by molecu-lar replacement if a model with high struc-tural similarity to the crystallized protein isavailable (61, 62). In the absence of prior suf-ficiently accurate structures, high-resolutioncomparative modeling is now able to bol-ster the success rate for molecular replace-ment in favorable cases in which the structuresof even quite distant homologs are available(see, e.g., References 63–65). Recent workhas also raised the exciting possibility of “denovo phasing” of diffraction data for proteinswith de novo models (Figure 4a), althoughthe successes so far have involved easy targetswith significant noncrystallographic symme-try (66) or high solvent content crystals andfew molecules in the asymmetric unit (65, 66).A largely unexplored frontier is the use ofdiffraction data earlier in the modeling; forexample, likelihood scores for data withoutphases or with only weak phase information(67) may provide extra energy terms for re-finement. Such enhancements will likely beimportant for making high-resolution com-putational modeling a practical addition to thearsenal available for crystallographic phasing.

High-Resolution NMR Structuresfrom Limited Data

Beyond applications in crystallographic phas-ing, molecular modeling may have even more

to contribute to high-resolution structuralinference on the basis of NMR spectra. Inaddition to potentially increasing the rateof resonance assignment (68–70), energy-based refinement now appears capable ofconsistently improving the backbone accu-racy and core side chain packing of medium-resolution NMR models (Figure 4b) (65). Forsmall proteins, very high-throughput high-resolution structure determination using onlychemical shift data during fragment assemblyis another exciting prospect (71, 72). Estab-lishing tests of refinement accuracy, e.g., usingcorrelations of final structures to atom-atomcontact data that are set aside during mod-eling or using packing and buried hydrogenbonding quality measures discussed below, isa critical challenge that needs to be addressedbefore these approaches to structural infer-ence can come into wide use.

High-Throughput Techniques

As is apparent from accurate reconstruc-tions on the basis of limited NMR chemicalshift measurements (71, 72), high-resolutionmolecular modeling can benefit from evensmall quantities of external data to constrainthe conformational space that needs to besearched. It is thus exciting to note majorprogress in “lightweight,” high-throughputexperimental approaches that can potentiallyyield such constraints for every molecule ormolecular complex that can be expressed andpurified. Hydroxyl radical footprinting (73)and hydrogen-deuterium exchange (74) re-quire small amounts of sample and samplepreparation times. Via mass spectrometryreadout, these methods give rapid informa-tion on burial of side chains and backbonehydrogen bonds, respectively (Figure 4c).Effective use of this burial data to guide mod-eling, for example, as a score in de novofragment assembly, has not been demon-strated and remains an interesting challenge.Distance constraints from multiplexed cross-linking technologies (see, e.g., Reference75) would be more easily incorporated into

www.annualreviews.org • Macromolecular Modeling with Rosetta 375

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 14: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

structure predictions; the combined effortsof several groups will hopefully make high-throughput cross-linking a widely applica-ble method in the near future. If such ap-

proaches for residue-level structural informa-tion can be carried out rapidly and generallyfor protein domains and protein-protein in-teractions, these data will be of great use for

a b

c d

NMR modelLowest energy structureHigh-resolution crystal structure

Buried side chainsSide chains exposed to solvent

376 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 15: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

rapid validation of structure predictions; usingthese data to actually guide high-resolutionstructural inference is a nascent and excitingfrontier.

From Low-Resolution Mapsto High-Resolution Models?

Perhaps the most rapidly growing new sourceof low-resolution information on biomolecu-lar complexes is cryoelectron microscopy (76);this technique presents outstanding opportu-nities for all-atom molecular modeling meth-ods. In the best of these maps, nucleic aciddouble helices and protein α-helices and β-sheets become clearly identifiable, and in-dividual β-strands and connections betweensecondary structure elements can sometimesbe inferred. Several groups are developingmethods for automated or manual annota-tion of map features and connectivity (see,e.g., Reference 76). A natural next step is thehigh-resolution modeling of side chains andoptimization of backbone coordinates, fol-lowed by energy-based refinement and selec-tion (Figure 4d ) (see Reference 77). Althoughmaps typically involve large complexes, theconformational space that needs to be sam-pled can be greatly reduced if the secondarystructure elements can be approximately lo-cated. For high-resolution refinement meth-ods to be credible, strategies to avoid artifactsfrom map noise, from inaccuracies in the all-

atom energy function, and from incorrect orambiguous secondary structure assignmentsneed to be developed. If these challengescan be met, all-atom modeling may dramati-cally extend the resolution of cryoelectron mi-croscopy from maps at subnanometer resolu-tion (4–8 A) to models of protein domains andprotein-protein interactions approaching thehigh resolution offered by crystallography andNMR.

Achieving Confident Models

We have outlined several exciting connectionsof experimental methods to molecular model-ing. In each case, use of actual data bolsters thecredibility of the final models. Still, how canone be sure that artifacts are not introducedby the incorrect energy terms or incompleteconformational searches in the modeling pro-cedure? For example, molecular replacementphases for crystallographic data can introducesignificant model bias, unless care is taken tocross-validate final coordinates with diffrac-tion data not used during map refinement.Perhaps a more troubling scenario would bea high-resolution model derived from low-resolution cryoelectron map density because,generally, no independent data would be avail-able to confirm model coordinates in atomicdetail.

In the structure prediction field, assess-ing confidence in molecular models is nota precise science. In our own experience,

←−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−Figure 4Connecting high-resolution molecular modeling to experimental structural inference.(a) Crystallographic phasing from a de novo model. Electron density map (2mFo-DFc; 2σ contour) forT0283 diffraction data (see Figure 1b), phased by molecular replacement with the Rosetta model andrefined automatically, agrees with coordinates deposited in the Protein Data Bank using experimentallyderived phases (rainbow sticks) (65). (b) Improving the accuracy of an NMR ensemble. Rebuilding andrefinement of NMR models (red ) (88) guided by the Rosetta energy function yields the lowest-energystructure ( green) in better agreement with the high-resolution crystal structure of the same protein (darkblue) (65, 89). (c) Illustration of hydroxyl radical footprint information for hen egg lysozyme (1e8l) (90).Experimental data indicate which phenylalanine and tryptophan side chains are buried (black) or exposedto solvent (white) and may be useful in guiding structure prediction. (d ) Medium-resolution cryoelectronmicroscopy maps may provide sufficient information for determining high-resolution models. This is asimulated map for a capsid protein from the rice dwarf virus (see Reference 91) overlaid on ahigh-resolution structure (92).

www.annualreviews.org • Macromolecular Modeling with Rosetta 377

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 16: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

convergence of independent prediction runsto similar lowest-energy structures is gener-ally a hallmark of accuracy (78), but thereare exceptions, such as the highly polar chy-motrypsin inhibitor case mentioned above.Interestingly, the tools commonly used todouble-check the accuracy of NMR or crys-tal structures lose discrimination in assess-ing models produced by de novo molecularmodeling. For example, completely inaccu-rate models from Rosetta all-atom structurepredictions typically pass the tests for sidechain clashes, side chain rotamers, and back-bone configurations in the MolProbity pack-age, essentially by construction (65; R. Das, D.Baker, S. Raman, unpublished results). Onenew possibility derives from the near univer-sality of the fundamental features of foldedmacromolecular structures described above;we have found that statistical measures forthe overall atomic packing and extent of hy-drogen bonding of buried polar atoms areable to discriminate quite well between nativestructures and incorrect models (W. Sheffler,unpublished results). It is exceptionally diffi-cult to produce an incorrect protein structuremodel with perfectly packed core side chainsand no unsatisfied buried polar atoms. Thedevelopment of robust metrics for evaluat-ing model accuracy is an active area of cur-rent research (see, e.g., References 79 and 80).Nevertheless, we would expect no investiga-tor to believe molecular models, even thoseestimated to have high accuracy, unless theyare confronted with independent experimen-tal tests. We therefore propose two steps thatthe molecular modeling field needs to takebefore de novo rebuilding and all-atom re-finement tools are taken seriously and usedwidely.

First, investigators need to develop proce-dures that always set aside a fraction of theavailable data that will not be utilized dur-ing modeling and will thus give a reasonablyindependent measure of accuracy at the endof modeling. The obvious paradigm here isthe computation, publication, and discussionof Rfree values (81) for diffraction data that is

now carried out universally by the crystallog-raphy community. It is not difficult to imag-ine similarly rigorous protocols, dividing datainto training and test sets, for de novo mod-eling methods guided by NMR chemical shiftdata or mass spectrometric data on residueburial. Envisioning the details of such a cross-validation approach for high-resolution mod-eling with cryoelectron microscopy maps ismore difficult and remains an important chal-lenge for the field.

Second, progress in experimentally cou-pled structural inference may be best cat-alyzed through blind trials, much as theCASP and the Critical Assessment of Pre-diction of Interactions (CAPRI) have pro-moted rapid progress in “experiment-blind”structure prediction. Perhaps the sequencesof a subset of CASP targets with impendingcrystal structures could be given to predic-tors along with the available X-ray diffrac-tion data but without experimental phases.For NMR cases, partial NMR chemical shiftdata—but not additional atom-atom con-tact data from the nuclear Overhauser ef-fect or residual dipolar coupling data—mightbe made available. Independent assessors canthen check whether the resulting structuralmodels superimpose well on subsequentlyreleased structures solved with traditionalstructure determination methods that requiremore data, time, and experimental expense.Success in blind trials would lend much cred-ibility to these new molecular modeling ap-proaches that have the potential to increasethe resolution and efficiency of structuralinference.

MACROMOLECULARMODELING FOR THECOMMUNITY

The advances described in this revieware the collective work of many sci-entists and research groups throughoutthe world who are collaborating on fur-ther developing and improving the ap-proaches described in this review. All of the

378 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 17: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

resulting code is available freely to academicusers (http://www.rosettacommons.org/).One of the current goals of this Rosetta de-velopment team is the creation of a stream-lined version of the software that will allowthe implementation of new protocols usingnew atom trees, via intuitive interaction with

a graphical user interface as well as via simplescripts. We expect such a user-friendly toolkitfor prediction and design to not only accel-erate molecular modeling research for expertRosetta users but also to help bring thesetools into wider use throughout the biologicalcommunity.

SUMMARY POINTS

1. Many of the basic physical principles underlying biomolecular structure and inter-actions appear reasonably well understood. Success in molecular modeling requiresfairly accurate energy functions which embody these principles as well as effectivesampling methodologies for identifying very low-energy structures (in predictionproblems) and very low-energy sequences (in design problems).

2. The energy function and sampling methodologies in the Rosetta program provide aunified framework for the prediction and design of macromolecular structures andinteractions, with recent examples ranging from fibril structure prediction to RNAfolding to the design of new enzyme catalysts.

3. In favorable cases, de novo structure prediction can approach atomic resolution, pro-tein structure models can be refined to higher resolution, and novel proteins with newand useful functions can be designed.

4. Combination of high-resolution molecular modeling with experimental structuralmethods has the potential to improve crystallographic phasing, NMR structural in-ference, and lower-resolution approaches that make use of high-throughput massspectrometric data and cryoelectron microscopy maps.

FUTURE ISSUES

1. Improvements in conformational search strategies are required to go beyond thesmallest biomolecules and the simplest modeling problems, along with making theseapproaches more widely available to investigators with access to modest computationalpower.

2. Major improvements in the treatment of polar interactions are critical for the accurateprediction and design of enzyme active sites, solvent-exposed loops, and protein/nucleic acid interfaces.

3. Feedback from applications of high-resolution molecular modeling to an ever widen-ing array of prediction and design problems, including the modeling of long het-eropolymers, such as RNA and polypeptoids, will continue to provide rigorous testsof the underlying physical principles and stimulate continued improvement of com-putational methods.

4. Precise statistical assessment measures and blind validation trials are important nextsteps for promoting experimentally coupled molecular modeling into a practical toolfor accelerated structural inference.

www.annualreviews.org • Macromolecular Modeling with Rosetta 379

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 18: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

DISCLOSURE STATEMENT

The authors are not aware of any biases that might be perceived as affecting the objectivity ofthis review.

ACKNOWLEDGMENT

It it impossible for us to adequately thank the many dozens of wonderful scientists who havecontributed to the development of the Rosetta macromolecular modeling program. Importantcontributions have been made and continue to be made by developers in the research groups ofBrian Kuhlman (University of North Carolina), Jeff Gray ( Johns Hopkins University), TanjaKortemme (University of California, San Francisco), Jens Meiler (Vanderbilt University),Ora Furman-Schueler (Hebrew University), Phil Bradley (Fred Hutchinson Cancer ResearchCenter), Rich Bonneau (New York University), and Carol Rohl (Merck). We thank membersof the Baker group for comments on this manuscript. The writing of this work was supportedby the National Institute of General Medical Sciences, National Institutes of Health and theHoward Hughes Medical Institute (D.B.) and a Jane Coffin Childs fellowship (R.D.).

LITERATURE CITED

1. Rohl CA, Strauss CE, Misura KM, Baker D. 2004. Methods Enzymol. 383:66–932. Bradley P, Misura KM, Baker D. 2005. Science 309:1868–713. Bradley P, Malmstrom L, Qian B, Schonbrun J, Chivian D, et al. 2005. Proteins 61(Suppl.

7):128–344. Das R, Qian B, Raman VS, Vernon R, Thompson J, et al. 2007. Proteins. 69:S118–285. Baker D, Sohl JL, Agard DA. 1992. Nature 356:263–656. Baldwin RL. 2007. J. Mol. Biol. 371:283–3017. Dill KA. 1990. Biochemistry 29:7133–558. Kauzmann W. 1959. Adv. Protein Chem. 14:1–639. Nelson R, Sawaya MR, Balbirnie M, Madsen AØ, Riekel C, et al. 2005. Nature 435:773–78

10. Watson JD, Crick FH. 1953. Nature 171:737–3811. Morozov AV, Kortemme T, Tsemekhman K, Baker D. 2004. Proc. Natl. Acad. Sci. USA

101:6946–5112. Schueler-Furman O, Wang C, Bradley P, Misura K, Baker D. 2005. Science 310:638–4213. Kortemme T, Morozov AV, Baker D. 2003. J. Mol. Biol. 326:1239–5914. Boas FE, Harbury PB. 2007. Curr. Opin. Struct. Biol. 17:199–20415. Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, et al. 1995. J. Am. Chem. Soc.

117:5179–9716. Plaxco KW, Simons KT, Baker D. 1998. J. Mol. Biol. 277:985–9417. Thompson JB, Hansma HG, Hansma PK, Plaxco KW. 2002. J. Mol. Biol. 322:645–5218. Sullivan DC, Kuntz ID. 2004. Biophys. J. 87:113–2019. Englander SW. 2000. Annu. Rev. Biophys. Biomol. Struct. 29:213–3820. Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. 2003. Science

302:1364–6821. Simons KT, Kooperberg C, Huang E, Baker D. 1997. J. Mol. Biol. 268:209–2522. Das R, Baker D. 2007. Proc. Natl. Acad. Sci. USA 104:14664–6923. Bradley P, Baker D. 2006. Proteins 65:922–2924. Wang C, Bradley P, Baker D. 2007. J. Mol. Biol. 373:503–19

380 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 19: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

25. Karplus K, Karchin R, Draper J, Casper J, Mandel-Gutfreund Y, et al. 2003. Proteins53(Suppl. 6):491–96

26. Abagyan R, Totrov M, Kuznetsov D. 1994. J. Comput. Chem. 15:488–50627. Rice LM, Brunger AT. 1994. Proteins 19:277–9028. Schwieters CD, Clore GM. 2001. J. Magn. Reson. 152:288–30229. Rohl CA, Strauss CE, Chivian D, Baker D. 2004. Proteins 55:656–7730. Chivian D, Baker D. 2006. Nucleic Acids Res. 34:e11231. Misura KM, Chivian D, Rohl CA, Kim DE, Baker D. 2006. Proc. Natl. Acad. Sci. USA

103:5361–6632. Janin J. 2005. Protein Sci. 14:278–8333. Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, et al. 2003. J. Mol. Biol.

331:281–9934. Wollacott AM, Zanghellini A, Murphy P, Baker D. 2007. Protein Sci. 16:165–7535. Andre I, Bradley P, Wang C, Baker D. 2007. Proc. Natl. Acad. Sci. USA 104:17656–6136. Meiler J, Baker D. 2006. Proteins 65:538–4837. Barth P, Schonbrun J, Baker D. 2007. Proc. Natl. Acad. Sci. USA 104:15682–8738. Ambroggio XI, Kuhlman B. 2006. J. Am. Chem. Soc. 128:1154–6139. Butterfoss GL, Kuhlman B. 2006. Annu. Rev. Biophys. Biomol. Struct. 35:49–6540. Kortemme T, Joachimiak LA, Bullock AN, Schuler AD, Stoddard BL, Baker D. 2004.

Nat. Struct. Mol. Biol. 11:371–7941. Zanghellini A, Jiang L, Wollacot A, Cheng G, Meiler J, et al. 2006. Protein Sci. 15:2785–9442. Jiang L, Althoff EA, Clemente FR, Doyle L, Rothlisberger D, et al. 2008. Science 319:

1387–9143. Ashworth J, Havranek JJ, Duarte CM, Sussman D, Monnat RJ Jr, et al. 2006. Nature

441:656–5944. Misura KM, Baker D. 2005. Proteins 59:15–2945. Dantas G, Corrent C, Reichow SL, Havranek JJ, Eletr ZM, et al. 2007. J. Mol. Biol.

366:1209–2146. Wang C, Schueler-Furman O, Andre I, London N, Fleishman SJ, et al. 2007. Proteins

69:758–6347. Schueler-Furman O, Wang C, Baker D. 2005. Proteins 60:187–9448. Kortemme T, Baker D. 2002. Proc. Natl. Acad. Sci. USA 99:14116–2149. Havranek JJ, Duarte CM, Baker D. 2004. J. Mol. Biol. 344:59–7050. Jiang L, Kuhlman B, Kortemme T, Baker D. 2005. Proteins 58:893–90451. Ren P, Ponder JW. 2002. J. Comput. Chem. 23:1497–50652. Gesteland RF, Cech TR, Atkins JF. 2006. The RNA World: The Nature of Modern RNA

Suggests a Prebiotic RNA World. Cold Spring Harbor, NY: Cold Spring Harb. Lab.53. Wrenn SJ, Harbury PB. 2007. Annu. Rev. Biochem. 76:331–4954. Winkler WC, Breaker RR. 2003. ChemBioChem 4:1024–3255. Shapiro BA, Yingling YG, Kasprzak W, Bindewald E. 2007. Curr. Opin. Struct. Biol. 17:157–

6556. Treiber DK, Williamson JR. 1999. Curr. Opin. Struct. Biol. 9:339–4557. Liu H, Gao J, Lynch SR, Saito YD, Maynard L, Kool ET. 2003. Science 302:868–7158. Benner SA, Battersby TR, Eschgfaller B, Hutter D, Kodra JT, et al. 1998. Pure Appl. Chem.

70:263–6659. Buchardt O, Egholm M, Berg RH, Nielsen PE. 1993. Trends Biotechnol. 11:384–8660. Kirshenbaum K, Barron AE, Goldsmith RA, Armand P, Bradley EK, et al. 1998. Proc.

Natl. Acad. Sci. USA 95:4303–8

www.annualreviews.org • Macromolecular Modeling with Rosetta 381

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 20: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

ANRV345-BI77-16 ARI 6 May 2008 15:16

61. Rossmann MG, Blow DM. 1962. Acta Crystallogr. 15:24–3162. Read RJ. 2001. Acta Crystallogr. D 57:1373–8263. Schwarzenbacher R, Godzik A, Grzechnik SK, Jaroszewski L. 2004. Acta Crystallogr. D

60:1229–3664. Moult J, Fidelis K, Rost B, Hubbard T, Tramontano A. 2005. Proteins 61(Suppl. 7):3–765. Qian B, Raman VS, Das R, Bradley P, Mccoy AJ, et al. 2007. Nature 450:259–6466. Strop P, Brzustowicz MR, Brunger AT. 2007. Acta Crystallogr. D 63:188–9667. Dauter Z. 2002. Curr. Opin. Struct. Biol. 12:674–7868. Meiler J, Baker D. 2003. Proc. Natl. Acad. Sci. USA 100:15404–969. Bowers PM, Strauss CE, Baker D. 2000. J. Biomol. NMR 18:311–1870. Rohl CA. 2005. Methods Enzymol. 394:244–6071. Cavalli A, Salvatella X, Dobson CM, Vendruscolo M. 2007. Proc. Natl. Acad. Sci. USA

104:9615–2072. Shen Y, Lange O, Delaglio F, Rossi P, Aramini JM, et al. 2008. Proc. Natl. Acad. Sci. USA

105:4685–9073. Takamoto K, Chance MR. 2006. Annu. Rev. Biophys. Biomol. Struct. 35:251–7674. Woods VL Jr, Hamuro Y. 2001. J. Cell Biochem. Suppl. (Suppl. 37):89–9875. Sinz A. 2006. Mass Spectrom. Rev. 25:663–8276. Chiu W, Baker ML, Jiang W, Dougherty M, Schmid MF. 2005. Structure 13:363–7277. Topf M, Baker ML, Marti-Renom MA, Chiu W, Sali A. 2006. J. Mol. Biol. 357:1655–6878. Bonneau R, Strauss CE, Rohl CA, Chivian D, Bradley P, et al. 2002. J. Mol. Biol. 322:65–7879. Wallner B, Elofsson A. 2006. Protein Sci. 15:900–1380. Cozzetto D, Kryshtafovych A, Ceriani M, Tramontano A. 2007. Proteins 69(Suppl. 8):175–

8381. Brunger AT. 1997. Methods Enzymol. 277:366–9682. Mondragon A, Subbiah S, Almo SC, Drottar M, Harrison SC. 1989. J. Mol. Biol. 205:189–

20083. Willcox BE, Thomas LM, Bjorkman PJ. 2003. Nat. Immunol. 4:913–1984. Kammerer RA, Kostrewa D, Progias P, Honnappa S, Avila D, et al. 2005. Proc. Natl. Acad.

Sci. USA 102:13891–9685. Arevalo JH, Taussig MJ, Wilson IA. 1993. Nature 365:859–6386. Egli M, Minasov G, Su L, Rich A. 2002. Proc. Natl. Acad. Sci. USA 99:4302–787. Hennig M, Darimont BD, Jansonius JN, Kirschner K. 2002. J. Mol. Biol. 319:757–6688. Andersen KV, Poulsen FM. 1993. J. Biomol. NMR 3:271–8489. van Aalten DM, Milne KG, Zou JY, Kleywegt GJ, Bergfors T, et al. 2001. J. Mol. Biol.

309:181–9290. Schwalbe H, Grimshaw SB, Spencer A, Buck M, Boyd J, et al. 2001. Protein Sci. 10:677–8891. Zhou ZH, Baker ML, Jiang W, Dougherty M, Jakana J, et al. 2001. Nat. Struct. Biol.

8:868–7392. Nakagawa A, Miyazaki N, Taka J, Naitow H, Ogawa A, et al. 2003. Structure 11:1227–38

382 Das · Baker

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 21: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

AR345-FM ARI 7 May 2008 14:43

Annual Review ofBiochemistry

Volume 77, 2008Contents

Prefatory Chapters

Discovery of G Protein SignalingZvi Selinger � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �1

Moments of DiscoveryPaul Berg � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 14

Single-Molecule Theme

In singulo Biochemistry: When Less Is MoreCarlos Bustamante � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 45

Advances in Single-Molecule Fluorescence Methodsfor Molecular BiologyChirlmin Joo, Hamza Balci, Yuji Ishitsuka, Chittanon Buranachai,and Taekjip Ha � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 51

How RNA Unfolds and RefoldsPan T.X. Li, Jeffrey Vieregg, and Ignacio Tinoco, Jr. � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 77

Single-Molecule Studies of Protein FoldingAlessandro Borgia, Philip M. Williams, and Jane Clarke � � � � � � � � � � � � � � � � � � � � � � � � � � � � �101

Structure and Mechanics of Membrane ProteinsAndreas Engel and Hermann E. Gaub � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �127

Single-Molecule Studies of RNA Polymerase: Motoring AlongKristina M. Herbert, William J. Greenleaf, and Steven M. Block � � � � � � � � � � � � � � � � � � � �149

Translation at the Single-Molecule LevelR. Andrew Marshall, Colin Echeverría Aitken, Magdalena Dorywalska,and Joseph D. Puglisi � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �177

Recent Advances in Optical TweezersJeffrey R. Moffitt, Yann R. Chemla, Steven B. Smith, and Carlos Bustamante � � � � � �205

Recent Advances in Biochemistry

Mechanism of Eukaryotic Homologous RecombinationJoseph San Filippo, Patrick Sung, and Hannah Klein � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �229

v

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 22: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

AR345-FM ARI 7 May 2008 14:43

Structural and Functional Relationships of the XPF/MUS81Family of ProteinsAlberto Ciccia, Neil McDonald, and Stephen C. West � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �259

Fat and Beyond: The Diverse Biology of PPARγ

Peter Tontonoz and Bruce M. Spiegelman � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �289

Eukaryotic DNA Ligases: Structural and Functional InsightsTom Ellenberger and Alan E. Tomkinson � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �313

Structure and Energetics of the Hydrogen-Bonded Backbonein Protein FoldingD. Wayne Bolen and George D. Rose � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �339

Macromolecular Modeling with RosettaRhiju Das and David Baker � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �363

Activity-Based Protein Profiling: From Enzyme Chemistryto Proteomic ChemistryBenjamin F. Cravatt, Aaron T. Wright, and John W. Kozarich � � � � � � � � � � � � � � � � � � � � � �383

Analyzing Protein Interaction Networks Using Structural InformationChristina Kiel, Pedro Beltrao, and Luis Serrano � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �415

Integrating Diverse Data for Structure Determinationof Macromolecular AssembliesFrank Alber, Friedrich Förster, Dmitry Korkin, Maya Topf, and Andrej Sali � � � � � � � �443

From the Determination of Complex Reaction Mechanismsto Systems BiologyJohn Ross � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �479

Biochemistry and Physiology of Mammalian SecretedPhospholipases A2

Gerard Lambeau and Michael H. Gelb � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �495

Glycosyltransferases: Structures, Functions, and MechanismsL.L. Lairson, B. Henrissat, G.J. Davies, and S.G. Withers � � � � � � � � � � � � � � � � � � � � � � � � � � �521

Structural Biology of the Tumor Suppressor p53Andreas C. Joerger and Alan R. Fersht � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �557

Toward a Biomechanical Understanding of Whole Bacterial CellsDylan M. Morris and Grant J. Jensen � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �583

How Does Synaptotagmin Trigger Neurotransmitter Release?Edwin R. Chapman � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �615

Protein Translocation Across the Bacterial Cytoplasmic MembraneArnold J.M. Driessen and Nico Nouwen � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �643

vi Contents

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.

Page 23: Macromolecular Modeling with Rosetta - Stanford Universityrhiju/Das_Baker_Ann... · Macromolecular Modeling with Rosetta Rhiju Das1 and David Baker1,2 ... be generally put into practice

AR345-FM ARI 7 May 2008 14:43

Maturation of Iron-Sulfur Proteins in Eukaryotes: Mechanisms,Connected Processes, and DiseasesRoland Lill and Ulrich Mühlenhoff � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �669

CFTR Function and Prospects for TherapyJohn R. Riordan � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �701

Aging and Survival: The Genetics of Life Span Extensionby Dietary RestrictionWilliam Mair and Andrew Dillin � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �727

Cellular Defenses against Superoxide and Hydrogen PeroxideJames A. Imlay � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �755

Toward a Control Theory Analysis of AgingMichael P. Murphy and Linda Partridge � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �777

Indexes

Cumulative Index of Contributing Authors, Volumes 73–77 � � � � � � � � � � � � � � � � � � � � � � � �799

Cumulative Index of Chapter Titles, Volumes 73–77 � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �803

Errata

An online log of corrections to Annual Review of Biochemistry articles may be foundat http://biochem.annualreviews.org/errata.shtml

Contents vii

Ann

u. R

ev. B

ioch

em. 2

008.

77:3

63-3

82. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by S

tanf

ord

Uni

vers

ity R

ober

t Cro

wn

Law

Lib

. on

01/0

4/09

. For

per

sona

l use

onl

y.