molecular modeling andsimulation
TRANSCRIPT
Tamar Schlick
Molecular Modelingand Simulation
An Interdisciplinary Guide
2nd edition
4?) Springer
Contents
About the Cover v
Book URLs ix
Preface xi
Prelude xix
Table of Contents xxi
List of Figures xxxiii
List of Tables xxxix
Acronyms, Abbreviations, and Units xli
1 Biomolecular Structure and Modeling: Historical Perspective 1
1.1 A Multidisciplinary Enterprise 2
1.1.1 Consilience 2
1.1.2 What is Molecular Modeling? 3
1.1.3 Need For Critical Assessment 5
1.1.4 Text Overview 6
1.2 The Roots of Molecular Modeling in Molecular Mechanics.
8
1.2.1 The Theoretical Pioneers 8
1.2.2 Biomolecular Simulation Perspective 11
xxii Contents
1.3 Emergence of Biomodeling from Experimental Progress
in Proteins and Nucleic Acids 14
1.3.1 Protein Crystallography 14
1.3.2 DNA Structure 17
1.3.3 The Technique of X-ray Crystallography 18
1.3.4 The Technique of NMR Spectroscopy 20
1.4 Modern Era of Technological Advances 22
1.4.1 From Biochemistry to Biotechnology 22
1.4.2 PCR and Beyond 23
1.5 Genome Sequencing 25
1.5.1 Projects Overview: From Bugs to Baboons 25
1.5.2 The Human Genome 30
2 Biomolecular Structure and Modeling: Problem and Application
Perspective 41
2.1 Computational Challenges in Structure and Function .... 41
2.1.1 Analysis of the Amassing Biological Databases... 41
2.1.2 Computing Structure From Sequence 46
2.2 Protein Folding - An Enigma 46
2.2.1 'Old' and 'New' Views 46
2.2.2 Folding Challenges 48
2.2.3 Folding by Dynamics Simulations? 49
2.2.4 Folding Assistants 50
2.2.5 Unstructured Proteins 52
2.3 Protein Misfolding - A Conundrum 53
2.3.1 Prions and Mad Cows 53
2.3.2 Infectious Protein? 53
2.3.3 Other Possibilities 54
2.3.4 Other Misfolding Processes 55
2.3.5 Deducing Function From Structure 56
2.4 From Basic to Applied Research 57
2.4.1 Rational Drug Design: Overview 58
2.4.2 A Classic Success Story: AIDS Therapy 58
2.4.3 Other Drugs and Future Prospects 65
2.4.4 Gene Therapy - Better Genes 67
2.4.5 Designed Compounds and Foods 69
2.4.6 Nutrigenomics 72
2.4.7 Designer Materials 74
2.4.8 Cosmeceuticals 74
3 Protein Structure Introduction 77
3.1 The Machinery of Life 77
3.1.1 From Tissues to Hormones 77
3.1.2 Size and Function Variability 78
3.1.3 Chapter Overview 79
Contents xxiii
3.2 The Amino Acid Building Blocks 82
3.2.1 Basic C" Unit 82
3.2.2 Essential and Nonessential Amino Acids 83
3.2.3 Linking Amino Acids 85
3.2.4 The Amino Acid Repertoire: From Flexible
Glycine to Rigid Proline 85
3.3 Sequence Variations in Proteins 89
3.3.1 Globular Proteins 90
3.3.2 Membrane and Fibrous Proteins 90
3.3.3 Emerging Patterns from Genome Databases 92
3.3.4 Sequence Similarity 92
3.4 Protein Conformation Framework 97
3.4.1 The Flexible <j> and i/j and Rigid u> Dihedral Angles .97
3.4.2 Rotameric Structures 99
3.4.3 Ramachandran Plots 99
3.4.4 Conformational Hierarchy 103
4 Protein Structure Hierarchy 105
4.1 Structure Hierarchy 106
4.2 Helices: A Common Secondary Structural Element 106
4.2.1 Classic a-Helix 106
4.2.2 3iq and 7T Helices 107
4.2.3 Left-Handed a-Helix 109
4.2.4 Collagen Helix 110
4.3 /3-Sheets: A Common Secondary Structural Element 110
4.4 Turns and Loops 110
4.5 Formation of Supersecondary and Tertiary Structure 113
4.5.1 Complex 3D Networks 113
4.5.2 Classes in Protein Architecture 113
4.5.3 Classes are Further Divided into Folds 114
4.6 a-Class Folds 114
4.6.1 Bundles 114
4.6.2 Folded Leafs 115
4.6.3 Hairpin Arrays 115
4.7 /3-Class Folds 115
4.7.1 Anti-Parallel 0 Domains 116
4.7.2 Parallel and Antiparallel Combinations 116
4.8 a/0 and a + /3-Class Folds 117
4.8.1 a/0 Barrels 117
4.8.2 Open Twisted a/0 Folds 118
4.8.3 Leucine-Rich a/0 Folds 118
4.8.4 a+0 Folds 118
4.8.5 Other Folds 118
4.9 Number of Folds 118
4.9.1 Finite Number? 119
xxiv Contents
4.10 Quaternary Structure 119
4.10.1 Viruses 119
4.10.2 From Ribosomes to Dynamic Networks 123
4.11 Protein Structure Classification 126
5 Nucleic Acids Structure Minitutorial 129
5.1 DNA, Life's Blueprint 130
5.1.1 The Kindled Field of Molecular Biology 130
5.1.2 Fundamental DNA Processes 132
5.1.3 Challenges in Nucleic Acid Structure 133
5.1.4 Chapter Overview 134
5.2 The Basic Building Blocks of Nucleic Acids 135
5.2.1 Nitrogenous Bases 135
5.2.2 Hydrogen Bonds 136
5.2.3 Nucleotides 137
5.2.4 Polynucleotides 137
5.2.5 Stabilizing Polynucleotide Interactions 140
5.2.6 Chain Notation 140
5.2.7 Atomic Labeling 141
5.2.8 Torsion Angle Labeling 142
5.3 Nucleic Acid Conformational Flexibility 142
5.3.1 The Furanose Ring 143
5.3.2 Backbone Torsional Flexibility 145
5.3.3 The Glycosyl Rotation 148
5.3.4 Sugar/Glycosyl Combinations 148
5.3.5 Basic Helical Descriptors 150
5.3.6 Base-Pair Parameters 151
5.4 Canonical DNA Forms 155
5.4.1 B-DNA 156
5.4.2 A-DNA 157
5.4.3 Z-DNA 160
5.4.4 Comparative Features 161
6 Topics in Nucleic Acids Structure: DNA Interactions
and Folding 163
6.1 Introduction 164
6.2 DNA Sequence Effects 165
6.2.1 Local Deformations 165
6.2.2 Orientation Preferences in Dinucleotide Steps ....166
6.2.3 Orientation Preferences in Dinucleotide StepsWith Flanking Sequence Context: Tetranucleotide
Studies 169
6.2.4 Intrinsic DNA Bending in A-Tracts 169
6.2.5 Sequence Deformability Analysis Continues 173
Contents xxv
6.3 DNA Hydration and Ion Interactions 174
6.3.1 Resolution Difficulties 175
6.3.2 Basic Patterns 176
6.4 DNA/Protein Interactions 180
6.5 Cellular Organization of DNA .. ,182
6.5.1 Compaction of Genomic DNA 182
6.5.2 Coiling of the DNA Helix Itself 184
6.5.3 Chromosomal Packaging of Coiled DNA 185
6.6 Mathematical Characterization of DNA Supercoiling 195
6.6.1 DNA Topology and Geometry 195
6.7 Computational Treatments ofDNA Supercoiling 197
6.7.1 DNA as a Flexible Polymer 198
6.7.2 Elasticity Theory Framework 199
6.7.3 Simulations of DNA Supercoiling 200
7 Topics in Nucleic Acids Structure: Noncanonical Helices
and RNA Structure 205
7.1 Introduction 205
7.2 Variations on a Theme 206
7.2.1 Hydrogen Bonding Patterns in Polynucleotides . . .206
7.2.2 Hybrid Helical/Nonhelical Forms 210
7.2.3 Unusual Forms: Overstretched and Understretched
DNA 214
7.3 RNA Structure and Function 216
7.3.1 DNA's Cousin Shines 216
7.3.2 RNA Chains Fold Upon Themselves 216
7.3.3 RNA's Diversity 217
7.3.4 Non-Coding and Micro-RNAs 221
7.3.5 RNA at Atomic Resolution 222
7.4 Current Challenges in RNA Modeling 225
7.4.1 RNA Folding 225
7.4.2 RNA Motifs 225
7.4.3 RNA Structure Prediction 226
7.5 Application of Graph Theory to Studies of RNA Structure
and Function 229
7.5.1 Graph Theory 229
7.5.2 RNA-As-Graphs (RAG) Resource 230
8 Theoretical and Computational Approaches to Biomolecular
Structure 237
8.1 The Merging of Theory and Experiment 238
8.1.1 Exciting Times for Computationalists! 238
8.1.2 The Future of Biocomputations 240
8.1.3 Chapter Overview 240
xxvi Contents
8.2 Quantum Mechanics (QM) Foundations of Molecular Mechan¬
ics (MM) 241
8.2.1 The Schrodinger Wave Equation 241
8.2.2 The Born-Oppenheimer Approximation 242
8.2.3 Ab Initio QM 242
8.2.4 Semi-Empirical QM 244
8.2.5 Recent Advances in Quantum Mechanics 244
8.2.6 From Quantum to Molecular Mechanics 247
8.3 Molecular Mechanics: Underlying Principles 251
8.3.1 The Thermodynamic Hypothesis 251
8.3.2 Additivity 252
8.3.3 Transferability 254
8.4 Molecular Mechanics: Model and Energy Formulation. ...
256
8.4.1 Configuration Space 258
8.4.2 Functional Form 259
8.4.3 Some Current Limitations 262
9 Force Fields 265
9.1 Formulation of the Model and Energy 266
9.2 Normal Modes 267
9.2.1 Quantifying Characteristic Motions 267
9.2.2 Complex Biomolecular Spectra 269
9.2.3 Spectra As Force Constant Sources 269
9.2.4 In-Plane and Out-of-Plane Bending 271
9.3 Bond Length Potentials 272
9.3.1 Harmonic Term 273
9.3.2 Morse Term 274
9.3.3 Cubic and Quartic Terms 275
9.4 Bond Angle Potentials 276
9.4.1 Harmonic and Trigonometric Terms 277
9.4.2 Cross Bond Stretch / Angle Bend Terms 278
9.5 Torsional Potentials 281
9.5.1 Origin of Rotational Barriers 281
9.5.2 Fourier Terms 281
9.5.3 Torsional Parameter Assignment 282
9.5.4 Improper Torsion 2869.5.5 Cross Dihedral/Bond Angle and Improper/Improper
Dihedral Terms 2879.6 The van der Waals Potential 288
9.6.1 Rapidly Decaying Potential 2889.6.2 Parameter Fitting From Experiment 2899.6.3 Two Parameter Calculation Protocols 289
9.7 The Coulomb Potential 2919.7.1 Coulomb's Law: Slowly Decaying Potential 2919.7.2 Dielectric Function 2929.7.3 Partial Charges 294
Contents xxvii
9.8 Parameterization 295
9.8.1 A Package Deal 295
9.8.2 Force Field Comparisons 295
9.8.3 Force Field Performance 297
10 Nonbonded Computations 299
10.1 A Computational Bottleneck 301
10.2 Approaches for Reducing Computational Cost 302
10.2.1 Simple Cutoff Schemes 302
10.2.2 Ewald and Multipole Schemes 303
10.3 Spherical Cutoff Techniques 304
10.3.1 Technique Categories 304
10.3.2 Guidelines for CutoffFunctions 305
10.3.3 General Cutoff Formulations 306
10.3.4 Potential Switch 307
10.3.5 Force Switch 308
10.3.6 Shift Functions 309
10.4 The Ewald Method 311
10.4.1 Periodic Boundary Conditions 311
10.4.2 Ewald Sum and Crystallography 314
10.4.3 Mathematical Morphing of a ConditionallyConvergent Sum 316
10.4.4 Finite-Dielectric Correction 320
10.4.5 Ewald Sum Complexity 320
10.4.6 Resulting Ewald Summation 321
10.4.7 Practical Implementation: Parameters, Accuracy,and Optimization 322
10.5 The Multipole Method 324
10.5.1 Basic Hierarchical Strategy 324
10.5.2 Historical Perspective 329
10.5.3 Expansion in Spherical Coordinates 330
10.5.4 Biomolecular Implementations 332
10.5.5 Other Variants 333
10.6 Continuum Solvation 333
10.6.1 Need for Simplification! 333
10.6.2 Potential of Mean Force 334
10.6.3 Stochastic Dynamics 335
10.6.4 Continuum Electrostatics 338
11 Multivariate Minimization in Computational Chemistry 345
11.1 Ubiquitous Optimization: From Enzymes to Weather to Eco¬
nomics 347
11.1.1 Algorithmic Sophistication Demands Basic
Understanding 347
11.1.2 Chapter Overview 347
xxviii Contents
11.2 Optimization Fundamentals 348
11.2.1 Problem Formulation 348
11.2.2 Independent Variables 349
11.2.3 Function Characteristics 349
11.2.4 Local and Global Minima 351
11.2.5 Derivatives of Multivariate Functions 353
11.2.6 The Hessian of Potential Energy Functions 353
11.3 Basic Algorithmic Components 356
11.3.1 Greedy Descent 356
11.3.2 Line-Search-Based Descent Algorithm 359
11.3.3 Trust-Region-Based Descent Algorithm 361
11.3.4 Convergence Criteria 362
11.4 The Newton-Raphson-Simpson-FourierMethod 364
11.4.1 The One-Dimensional Version of Newton's Method.
364
11.4.2 Newton's Method for Minimization 367
11.4.3 The Multivariate Version of Newton's Method....
368
11.5 Effective Large-Scale Minimization Algorithms 369
11.5.1 Quasi-Newton (QN) 370
11.5.2 Conjugate Gradient (CG) 372
11.5.3 Truncated-Newton (TN) 374
11.5.4 Simple Example 376
11.6 Available Software 378
11.6.1 Popular Newton and CG 378
11.6.2 CHARMM's ABNR 379
11.6.3 CHARMM's TN 379
11.6.4 Comparative Performance on Molecular Systems . . 379
11.7 Practical Recommendations 380
11.8 Future Outlook 383
12 Monte Carlo Techniques 385
12.1 MC Popularity 386
12.1.1 A Winning Combination 386
12.1.2 From Needles to Bombs 387
12.1.3 Chapter Overview 387
12.1.4 Importance of Error Bars 388
12.2 Random Number Generators 388
12.2.1 What is Randoml 388
12.2.2 Properties of Generators 389
12.2.3 Linear Congruential Generators (LCG) 392
12.2.4 Other Generators 396
12.2.5 Artifacts 400
12.2.6 Recommendations 401
12.3 Gaussian Random Variates 403
12.3.1 Manipulation of Uniform Random Variables 403
12.3.2 Normal Variates in Molecular Simulations 403
Contents xxix
12.3.3 Odeh/Evans Method 404
12.3.4 Box/Muller/Marsaglia Method 405
12.4 Means for Monte Carlo Sampling 406
12.4.1 Expected Values 406
12.4.2 Error Bars 409
12.4.3 Batch Means 410
12.5 Monte Carlo Sampling 411
12.5.1 Density Function 411
12.5.2 Dynamic and Equilibrium MC: Ergodicity,Detailed Balance 411
12.5.3 Statistical Ensembles 412
12.5.4 Importance Sampling: Metropolis Algorithmand Markov Chains 413
12.6 Monte Carlo Applications to Molecular Systems 418
12.6.1 Ease of Application 418
12.6.2 Biased MC 419
12.6.3 Hybrid MC 420
12.6.4 Parallel Tempering and Other MC Variants 421
13 Molecular Dynamics: Basics 425
13.1 Introduction: Statistical Mechanics by Numbers 426
13.1.1 Why Molecular Dynamics? 426
13.1.2 Background 427
13.1.3 Outline of MD Chapters 428
13.2 Laplace's Vision of Newtonian Mechanics 429
13.2.1 The Dream Becomes Reality 42913.2.2 Deterministic Mechanics 432
13.2.3 Neglect of Electronic Motion 432
13.2.4 Critical Frequencies 433
13.2.5 Hybrid Quantum/Classical Mechanics Treatments. .
435
13.3 The Basics: An Overview 435
13.3.1 Following the Equations ofMotion 435
13.3.2 Perspective on MD Trajectories 436
13.3.3 Initial System Settings 437
13.3.4 Sensitivity to Initial Conditions and
Other Computational Choices 440
13.3.5 Simulation Protocol 442
13.3.6 High-Speed Implementations 443
13.3.7 Analysis and Visualization 445
13.3.8 Reliable Numerical Integration 445
13.3.9 Computational Complexity 446
13.4 The Verlet Algorithm 448
13.4.1 Position and Velocity Propagation 449
13.4.2 Leapfrog, Velocity Verlet, and Position Verlet....
451
13.5 Constrained Dynamics 453
xxx Contents
13.6 Various MD Ensembles 455
13.6.1 Need for Other Ensembles 455
13.6.2 Simple Algorithms 456
13.6.3 Extended System Methods 459
14 Molecular Dynamics: Further Topics 463
14.1 Introduction 464
14.2 Symplectic Integrators 465
14.2.1 Symplectic Transformation 466
14.2.2 Harmonic Oscillator Example 467
14.2.3 Linear Stability 467
14.2.4 Timestep-Dependent Rotation in Phase Space .... 469
14.2.5 Resonance Condition for Periodic Motion 470
14.2.6 Resonance Artifacts 471
14.3 Multiple-Timestep (MTS) Methods 472
14.3.1 Basic Idea 472
14.3.2 Extrapolation 473
14.3.3 Impulses 474
14.3.4 Vulnerability of Impulse Splitting to Resonance
Artifacts 475
14.3.5 Resonance Artifacts in MTS 476
14.3.6 Limitations of Resonance Artifacts on Speedup;Possible Cures 478
14.4 Langevin Dynamics 479
14.4.1 Many Uses 479
14.4.2 Phenomenological Heat Bath 480
14.4.3 The Effect of 7 480
14.4.4 Generalized Verlet for Langevin Dynamics 482
14.4.5 The LN Method 482
14.5 Brownian Dynamics (BD) 487
14.5.1 Brownian Motion 487
14.5.2 Brownian Framework 489
14.5.3 General Propagation Framework 491
14.5.4 Hydrodynamic Interactions 491
14.5.5 BD Propagation Scheme: Cholesky vs. ChebyshevApproximation 494
14.6 Implicit Integration 496
14.6.1 Implicit vs. Explicit Euler 497
14.6.2 Intrinsic Damping 498
14.6.3 Computational Time 498
14.6.4 Resonance Artifacts 49914.7 Enhanced Sampling Methods 503
14.7.1 Overview 503
14.7.2 Harmonic-Analysis Based Techniques 50314.7.3 Other Coordinate Transformations 505
Contents xxxi
14.7.4 Coarse Graining Models 507
14.7.5 Biasing Approaches 508
14.7.6 Variations in MD Algorithm and Protocol 509
14.7.7 Other Rigorous Approaches for DeducingMechanisms, Free Energies, and Reaction Rates
...511
14.8 Future Outlook 513
14.8.1 Integration Ingenuity 513
14.8.2 Current Challenges 514
IS Similarity and Diversity in Chemical Design 519
15.1 Introduction to Drug Design 520
15.1.1 Chemical Libraries 520
15.1.2 Early Drug Development Work 521
15.1.3 Molecular Modeling in Rational Drug Design .... 523
15.1.4 The Competition: Automated Technology 524
15.1.5 Chapter Overview 526
15.2 Problems in Chemical Libraries 526
15.2.1 Database Analysis 526
15.2.2 Similarity and Diversity Sampling 527
15.2.3 Bioactivity Relationships 529
15.3 General Problem Definitions 532
15.3.1 TheDataset 532
15.3.2 The Compound Descriptors 534
15.3.3 Characterizing Biological Activity 535
15.3.4 The Target Function 536
15.3.5 Scaling Descriptors 536
15.3.6 The Similarity and Diversity Problems 538
15.4 Data Compression and Cluster Analysis 540
15.4.1 Data Compression Based on Principal Component
Analysis (PCA) 540
15.4.2 Data Compression Based on the Singular Value
Decomposition (SVD) 542
15.4.3 Relation Between PCA and SVD 544
15.4.4 Data Analysis via PCA or SVD and Distance
Refinement 545
15.4.5 Projection, Refinement, and Clustering Example . . .546
15.5 Future Perspectives 551
Epilogue 555
Appendices 556
A Molecular Modeling Sample Syllabus 557
B Article Reading List 559