
PHENIX and neutrons

Neutrons in Biology Santa Fe, NM

October 26, 2009

Pavel Afonine

Computational Crystallography Initiative, Physical Biosciences Division

Lawrence Berkeley National Laboratory, Berkeley CA, USA

PHENIX project

Collaboration between several groups:

  Los Alamos National Lab: Tom Terwilliger, Li-Wei Hung (SOLVE / RESOLVE); Paul Langan et al. (tools for neutron crystallography, MNC)

  Cambridge University, UK: Randy Read et al. (PHASER)

  Duke University: Jane & David Richardson et al. (MolProbity, hydrogens)

  Lawrence Berkeley National Lab: Paul Adams et al. (CCI Apps: phenix.refine, eLBOW, Xtriage, …)

Paul Adams – project director

PHENIX and Neutron crystallography

Macromolecular Neutron Crystallography Consortium (MNC)

Lawrence Berkeley National Lab (LBNL): Paul Adams, Pavel Afonine

Los Alamos National Lab: Paul Langan, Marat Mustyakimov, Benno Schoenborn

http://mnc.lanl.gov/

The PHENIX project: facts

  PHENIX is a new package for automated structure solution that handles both X-ray and neutron data

  PHENIX is not a pipeline made of existing programs, but a highly integrated software package

  Library based development (Python, C++) and new algorithms

  Designed to be used by both novices and experienced users

  Long-term development and support

Why Automation?

•  Automation can increase efficiency and reduce human error

•  Can speed up the process and can help reduce errors

•  Makes difficult cases more feasible for experts

•  Routine structure solution cases are accessible to a wider group of structural biologists

•  Software can try more possibilities than we are typically willing to bother with

•  Multiple trials or use of different parameters can be used to estimate uncertainties

•  What is required:
   –  Software carrying out individual steps
   –  Integration between the steps (collaboration between developers)
   –  Algorithms to decide which is best from a list of possible results
   –  Strategies for structure determination and decision-making


PHENIX: principal tools

Complete set of tools for crystallographic structure determination: from experimental data to PDB deposited structure

Running PHENIX

  Running PHENIX programs:
-  GUI: easy for beginners; the guided process leaves less chance of errors
-  Command line: convenient for scripting multiple and large-scale tasks

  Command-line tools are still easy to run:

  AutoBuild (from starting phases to a complete and refined model):
phenix.autobuild data=scale.mtz model=mr.pdb seq_file=correct.seq

  LigandFit (automatically find and build ligands into density):
phenix.ligandfit data=nsf.mtz model=noligand.pdb ligand=atp.pdb

  AutoMR (molecular replacement with Phaser + AutoBuild = refined model):
phenix.automr nsf-d2.mtz nsf.pdb

  phenix.refine (highly automated structure refinement; X-ray, neutron):
phenix.refine nsf-d2.mtz nsf.pdb

  phenix.xtriage (complete data analysis):
phenix.xtriage porin_fp.mtz
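To illustrate the point about scripting many tasks from the command line, here is a minimal Python sketch that builds one phenix.refine command per dataset, using only the positional form shown above (data file, then model file). The second file pair and the helper names are hypothetical, and the jobs are only printed unless `dry_run` is switched off.

```python
import shutil
import subprocess


def build_refine_command(data_file, model_file):
    """Return the argument list for one phenix.refine run (data file, model file)."""
    return ["phenix.refine", data_file, model_file]


def run_all(commands, dry_run=True):
    """Print every command; launch it only if dry_run is False and PHENIX is installed."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run and shutil.which(cmd[0]) is not None:
            subprocess.run(cmd, check=False)


# Hypothetical list of (data, model) pairs; replace with real files.
jobs = [
    ("nsf-d2.mtz", "nsf.pdb"),
    ("dataset2.mtz", "model2.pdb"),
]

commands = [build_refine_command(data, model) for data, model in jobs]
run_all(commands)  # dry run: prints the two command lines
```

Keeping command construction separate from execution makes it easy to inspect a batch before submitting it.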

PHENIX: new GUI

Neutron structure determination

  An X-ray derived structure is always available and is used as a starting model for neutron structure determination, therefore the main software needs are:

-  model preparation (add H or D, exchangeable H/D; library files for ligands),
-  model refinement and completion (adding DOD, OD or O),
-  model validation

Neutron Crystallography Challenges

  Crystallographic software is designed and optimized to work with X-ray data:

-  Manual work is required to customize the software to work with neutron data
   •  define scattering tables
   •  change libraries to adapt handling of H, D or H/D atoms
   •  add D or exchangeable H/D (and not only H): manual file editing
-  Cannot use all the features of the software (for example, TLS)
-  Statistics for PDB deposition (REMARK 3 formatted header)

  Methodological challenges:

-  Build H, D or H/D into the model, including water and ligands
-  Optimize fit of water (DOD) into the density
-  Optimize fit of rotatable X-H/D into the density
-  Adding H and/or D increases the number of refinable parameters: need to use X-ray and neutron data simultaneously (joint XN refinement)
-  Cancellation effect makes X-H species poorly defined in density. This needs to be addressed in refinement
-  Constrained occupancy refinement of H/D sites
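The last two challenges can be made concrete with a small sketch. It assumes an exchangeable site is described by a single D occupancy q, with the H occupancy fixed at 1 - q, and uses approximate coherent neutron scattering lengths (about -3.74 fm for H and +6.67 fm for D). The clamping to [0, 1] and the function names are illustrative assumptions, not a description of how any particular program implements the constraint.

```python
B_H = -3.74  # approximate coherent neutron scattering length of H, fm
B_D = 6.67   # approximate coherent neutron scattering length of D, fm


def constrained_occupancies(q_d):
    """Clamp the D occupancy to [0, 1] and derive H so that occ(H) + occ(D) = 1."""
    q_d = min(1.0, max(0.0, q_d))
    return 1.0 - q_d, q_d


def net_scattering_length(q_d):
    """Occupancy-weighted scattering length of a shared H/D site, in fm."""
    occ_h, occ_d = constrained_occupancies(q_d)
    return occ_h * B_H + occ_d * B_D


# D fraction at which the negative H and positive D contributions cancel.
q_cancel = -B_H / (B_D - B_H)

print(constrained_occupancies(1.3))   # out-of-range input is clamped
print(round(q_cancel, 3))             # D fraction giving zero net scattering
print(net_scattering_length(1.0))     # fully deuterated site
```

Under these assumptions a site that is roughly one third deuterated contributes almost nothing to a neutron map, which is one way such sites become poorly defined in density.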

Solutions in PHENIX

  Automated tools to add H, D and/or exchangeable H/D sites and create consistent library (CIF) files for ligands

  Refinement using neutron data alone or both X-ray and neutron data

Joint refinement: $T_{\mathrm{JOINT}} = E_{\mathrm{XRAY}} \cdot w_{XC} + E_{\mathrm{NEUTRON}} \cdot w_{NC} + w_{C} \cdot E_{\mathrm{GEOM}}$

  Constrained occupancy refinement of H/D sites is done completely automatically

  Automatic adjustment of H/D positions into density map by local real space search (as part of refinement)

  Ready-to-PDB-deposit output files: complete refinement statistics in REMARK 3 for both X-ray and neutron data. Example: PDB code 2R24.

  Automatic building of DOD (O and D atoms) into neutron maps (development version).
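The joint refinement target quoted in this list is a weighted sum of three terms. The sketch below evaluates it for made-up numbers; the function and argument names are hypothetical, and the values are for illustration only.

```python
def joint_target(e_xray, e_neutron, e_geom, w_xc, w_nc, w_c):
    """T_JOINT = E_XRAY * w_XC + E_NEUTRON * w_NC + w_C * E_GEOM."""
    return e_xray * w_xc + e_neutron * w_nc + w_c * e_geom


# Made-up energies and weights, chosen only to show the arithmetic.
t_joint = joint_target(e_xray=2.0, e_neutron=3.0, e_geom=4.0,
                       w_xc=0.5, w_nc=0.25, w_c=1.0)
print(t_joint)

# Setting the X-ray weight to zero leaves a neutron-plus-geometry target.
t_neutron_only = joint_target(2.0, 3.0, 4.0, w_xc=0.0, w_nc=0.25, w_c=1.0)
print(t_neutron_only)
```

In this form, refinement against neutron data alone is the special case in which the X-ray weight is zero.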

phenix.refine

  Highly automated, state-of-the-art structure refinement tool of PHENIX

  Active development mainly at Lawrence Berkeley National Laboratory:

-  Paul Adams
-  Pavel Afonine
-  Nathaniel Echols
-  Ralf Grosse-Kunstleve
-  Nigel Moriarty
-  Peter Zwart

+  valuable scientific support by many others (Tom Terwilliger, Randy Read, Sasha Urzhumtsev, Vladimir Lunin, …)

+  other developers: Marat Mustyakimov, …

Automation of structure refinement

  How refinement used to be done in the past … and often still is nowadays

  Clearly, modern software should do all these steps automatically

  PHENIX is making good progress toward this goal

Acta Cryst. (2002). D58, 2009-2017, Yousef et al.

Automation of structure refinement

  Recognize any input data file format: CNS/Xplor, MTZ, SHELX, SCALA, …

  Robust processing of PDB files

  Decide refinement strategy based on inputs: iso/aniso, twinning

  Combine multiple steps into one (water picking, TLS, SA, etc.)

  Make each step robust

  Integrate validation tools to minimize human error and reduce the time needed for structure solution

  Output complete information

  Preserve as much information as reasonable
-  Do not discard riding hydrogens
-  Keep a complete footprint of the restraints used

phenix.refine: single program for a very broad range of resolutions

Resolution ranges: low, medium and high, subatomic

- Group ADP refinement
- Rigid body refinement
- Torsion angle dynamics
- Restrained refinement (xyz, ADP: isotropic, anisotropic, mixed)
- Automatic water picking
- Automatic NCS restraints
- Simulated annealing
- Occupancies (individual, group, automatic constraints for alternative conformations)
- TLS refinement
- Use hydrogens at any resolution
- Refinement with twinned data
- X-ray, neutron, joint X-ray + neutron
- Built-in water picking and refinement
- Bond density model
- Unrestrained refinement
- FFT or direct
- Explicit hydrogens

Refine any part of a model with any strategy: all in one run

+ Automatic water picking
+ Simulated annealing
+ Add and use hydrogens

Refinement flowchart

Input: PDB model, any data format (CNS, SHELX, MTZ, …)

-  Input data and model processing
-  Refinement strategy selection
-  Bulk-solvent, anisotropic scaling, twinning parameters refinement
-  Ordered solvent (add / remove)
-  Target weights calculation
-  Coordinate refinement (rigid body, individual; minimization or simulated annealing)
-  ADP refinement (TLS, group, individual iso / aniso)
-  Occupancy refinement (individual, group)

Repeated several times

Output: refined model, various maps, structure factors, complete statistics, PDB file ready for deposition; files for COOT, O, PyMol

Automatic Water Picking

Water picking is done within refinement:

-  remove “bad” water, judged by:
   •  2mFo-DFc (peak height)
   •  distances
   •  map CC (2mFo-DFc, Fc)
   •  B-factors and anisotropy
   •  occupancy
-  add new water, judged by:
   •  mFo-DFc
   •  distances
-  refine water’s ADP before adding to model
-  Neutron refinement: automatically add and optimize D atoms
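The “remove bad water” criteria above can be sketched as a simple filter. All cutoff values and field names below are hypothetical and chosen only for illustration (they are not PHENIX defaults), and the anisotropy criterion is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Water:
    name: str
    peak_2mfo_dfc: float  # 2mFo-DFc map value at the site, in sigma
    map_cc: float         # map CC between 2mFo-DFc and Fc
    b_iso: float          # isotropic B-factor, A^2
    occupancy: float
    nearest_atom: float   # distance to the nearest model atom, A


# Hypothetical cutoffs for illustration only; not PHENIX defaults.
MIN_PEAK, MIN_CC, MAX_B, MIN_OCC = 1.0, 0.7, 80.0, 0.1
MIN_DIST, MAX_DIST = 1.8, 3.2


def is_bad_water(w):
    """True if the water fails any of the map, B-factor, occupancy or distance checks."""
    return (
        w.peak_2mfo_dfc < MIN_PEAK
        or w.map_cc < MIN_CC
        or w.b_iso > MAX_B
        or w.occupancy < MIN_OCC
        or not (MIN_DIST <= w.nearest_atom <= MAX_DIST)
    )


waters = [
    Water("HOH 1", 2.1, 0.92, 25.0, 1.0, 2.8),  # passes every check
    Water("HOH 2", 0.4, 0.35, 95.0, 1.0, 2.9),  # weak density, high B
    Water("HOH 3", 1.8, 0.88, 30.0, 1.0, 4.5),  # too far from the model
]

kept = [w.name for w in waters if not is_bad_water(w)]
print(kept)
```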

phenix.refine protocol

-  Input data and model processing
-  Refinement strategy selection
-  Bulk-solvent, anisotropic scaling, twinning parameters refinement
-  Ordered solvent (water picking)
-  Target weights calculation
-  Coordinate refinement (rigid body, individual; minimization or SA)
-  ADP refinement (TLS, group, individual iso / aniso)
-  Occupancy refinement (individual, group)

N macro-cycles

Output: refined model, various maps, structure factors, complete statistics, PDB file ready for deposition

Local and global real-space refinement

  Update Fmodel and re-compute 2mFobs-DFmodel map
  Real-space refine whole model into 2mFobs-DFmodel

  Compute 2mFobs-DFmodel, mFobs-DFmodel, Fmodel maps

  for residue in residues:
      compute start map- and CC-values for residue
      for rotamer in rotamers:
          if criteria1: residue = rotamer
      real-space refine residue: residue_refined
      if criteria1: residue = residue_refined
      update structure with residue

  Validate changes: compute 2mFobs-DFmodel, mFobs-DFmodel and Fmodel
  for residue in residues:
      if criteria2: restore original residue (discard change)
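The per-residue loop of the real-space refinement step can be sketched in runnable form. This is a toy under stated assumptions: each residue is reduced to a single torsion angle, `score` stands in for the map and CC values, `refine` stands in for real-space refinement, “criteria1” is taken to mean “the score improved”, and “criteria2” is taken to mean “the score got worse than at the start”. None of these names or choices come from PHENIX itself.

```python
def fit_residues(residues, rotamers, score, refine):
    """One local pass: rotamer search, refinement, then validation of the changes."""
    original = dict(residues)
    result = dict(residues)
    for name, conf in residues.items():
        best, best_score = conf, score(name, conf)  # start map/CC value
        for candidate in rotamers[name]:
            s = score(name, candidate)
            if s > best_score:                      # criteria1 (assumed)
                best, best_score = candidate, s
        refined = refine(name, best)
        if score(name, refined) > best_score:       # criteria1 again
            best = refined
        result[name] = best                         # update structure
    for name in result:                             # validate changes
        if score(name, result[name]) < score(name, original[name]):  # criteria2 (assumed)
            result[name] = original[name]           # discard change
    return result


# Toy data: the "density" is best fitted at these hypothetical angles.
targets = {"SER 10": 60.0, "LEU 11": 178.0}


def score(name, chi):
    """Stand-in for a map CC: higher is better, zero is a perfect fit."""
    return -abs(chi - targets[name])


def refine(name, chi):
    """Stand-in for real-space refinement: move halfway toward the best fit."""
    return chi + 0.5 * (targets[name] - chi)


residues = {"SER 10": -60.0, "LEU 11": 175.0}
rotamers = {name: [-60.0, 60.0, 180.0] for name in residues}

fitted = fit_residues(residues, rotamers, score, refine)
print(fitted)
```

In the toy run, the first residue is fixed by the rotamer search alone, while the second is improved first by a rotamer and then by the refinement step.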

Results evaluation

  Automation requires quick and simple tools for assessing whether a model is good in general.

  More sophisticated tools can be used for thorough model validation (such as MolProbity).

  “Is this model good?” The answer depends on many details, such as resolution. A typical question on crystallographic bulletin boards:

•  “I got Rwork=20% and Rfree=25%, is it a good result?”

-  Yes, it is likely a good result if the data resolution is around 2.5 Å.

-  No, it is a very bad result if the data resolution is 1.0 Å or higher.

•  One can ask similar questions about other parameters, such as bond/angle RMSDs, average B-factors, etc.

POLYGON: Crystallographic Model Quality at a Glance

  Say you are refining a structure at 1.0 Å resolution and the R-factors are Rwork = 18% and Rfree = 23%.

-  Are these values good? Is refinement completed?

  Go to the PDB and plot the histograms of Rwork, Rfree and Rfree-Rwork for all similar structures:

Rwork at 0.9-1.1 Å (range: number of structures)
0.10 - 0.12: 68
0.12 - 0.14: 94
0.14 - 0.16: 73
0.16 - 0.18: 17

[Figure: POLYGON plots for a good model and a likely incorrect model]

Acta Cryst. (2009). D65, 297-300
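The histogram comparison can be sketched with the four Rwork bins quoted above. The function below is only an illustration of the idea, not the POLYGON implementation: it counts the structures in bins lying entirely at or below a given Rwork, and it assumes the four listed bins are the whole histogram.

```python
# Rwork histogram for structures at 0.9-1.1 A, as listed on the slide:
# (lower edge, upper edge, number of structures)
RWORK_BINS = [
    (0.10, 0.12, 68),
    (0.12, 0.14, 94),
    (0.14, 0.16, 73),
    (0.16, 0.18, 17),
]


def fraction_with_lower_rwork(rwork, bins=RWORK_BINS):
    """Fraction of structures in bins lying entirely at or below the given Rwork."""
    total = sum(count for _, _, count in bins)
    below = sum(count for _, upper, count in bins if upper <= rwork)
    return below / total


# The example from the slide: Rwork = 18% at 1.0 A resolution.
print(fraction_with_lower_rwork(0.18))
# A better-refined model for comparison.
print(round(fraction_with_lower_rwork(0.14), 3))
```

Within these bins, a model with Rwork = 18% sits at the very edge of the distribution, which suggests that refinement is probably not complete.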

R-factors (all models in PDB at resolution < 1 Å)

[Plot: R-factor (%) versus resolution (Å)]

PDB code (year)   R-work published, %   R-work phenix.refine, %
2ppn (2007)       20.9                  11.7
1g2y (2000)       19.5                  12.3
1zlb (2005)       16.8                  12.0
2g6f (2006)       18.4                  12.9
2elg (2007)       23.2                  13.0
1aho (1997)       16.3                   9.6
1zf5 (2005)       29.0                  16.9

  There are ~25 models out of 324 that have suspiciously high or very high R-factors.

- For most of them the R-factors can be decreased to values typical for this resolution (~10-15%) in one phenix.refine run.

  Automated software with integrated validation would immediately flag these models as suspicious.
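A minimal sketch of such a flag, using the seven R-work pairs listed above and the upper end of the typical range (~15%) as the cutoff. The cutoff choice and the function name are assumptions made for illustration.

```python
# PDB code -> (published R-work, R-work after phenix.refine), in percent.
RWORK = {
    "2ppn": (20.9, 11.7),
    "1g2y": (19.5, 12.3),
    "1zlb": (16.8, 12.0),
    "2g6f": (18.4, 12.9),
    "2elg": (23.2, 13.0),
    "1aho": (16.3, 9.6),
    "1zf5": (29.0, 16.9),
}

TYPICAL_MAX = 15.0  # upper end of the ~10-15% range quoted for this resolution


def flagged(column):
    """Codes whose R-work in the given column (0 = published, 1 = re-refined) exceeds the cutoff."""
    return sorted(code for code, values in RWORK.items() if values[column] > TYPICAL_MAX)


print(flagged(0))  # models flagged as published
print(flagged(1))  # models still flagged after re-refinement
```

With this cutoff all seven published values are flagged, and only 1zf5 remains above the cutoff after re-refinement.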

Under-refined models or why automation is important

  Structure from PDB: 1eic (resolution = 1.4 Å; deposition year: 2000)

PUBLISHED: Rwork = 20% Rfree = 25%

  Clear problems:
-  No ‘riding’ H atoms
-  All atoms are isotropic

  Potential problems:
-  Suboptimal weights, refinement not converged, incomplete solvent model

  Fixing the model with PHENIX:
-  Add and refine H as riding model
-  Update ordered solvent
-  Refine atoms as anisotropic (except H and water)
-  Optimize X-ray/restraints weights

FINAL MODEL: Rwork = 14% Rfree = 17%

  All this could be done by the software automatically, preventing deposition of under-refined models into the PDB

All neutron structures deposited in PDB with data available

1: sum of exchangeable H/D does not add up to 1
2: severe geometry problems
3: minor geometry problems
4: twinning
5: bad or missing information in PDB file header
6: H/D exchange is not modeled or incomplete
7: free-R flags might be bad or absent
8: I/F mismatch in input data file (F reported as I, for example)
9: negative occupancies
10: atoms with unknown scattering type

Re-refinement of neutron structures deposited in PDB

Conclusion

•  PHENIX provides a complete set of tools for automated structure solution, refinement and validation

•  PHENIX provides complete support for neutron structure solution

•  Using PHENIX tools requires a minimal amount of manual work, which minimizes related errors

•  Highly integrated validation tools make it possible to spot problems at earlier stages

•  PDB-deposited structures need to be revisited and remediated. This should be done in close contact with the original authors.

PHENIX distribution

•  Academic releases every several months

•  Nightly builds

•  Supported on:

–  Linux (e.g. RedHat, Fedora)

–  Mac OS X

–  Some limited support on Windows

•  Extensive documentation

http://www.phenix-online.org

Acknowledgments

•  Computational Crystallography Initiative
-  Paul Adams
-  Nat Echols
-  Ralf Grosse-Kunstleve
-  Nigel Moriarty
-  Nicholas Sauter
-  Peter Zwart

•  Los Alamos National Laboratory
-  Tom Terwilliger
-  Li-Wei Hung
-  Paul Langan
-  Marat Mustyakimov

•  Cambridge University
-  Randy Read
-  Airlie McCoy
-  Laurent Storoni
-  Gabor Bunkoczi
-  Robert Oeffner

•  Duke University
-  Jane Richardson
-  David Richardson
-  Ian Davis
-  Vincent Chen
-  Jeff Headd

•  Funding:
-  NIH/NIGMS: P01GM063210, P50GM062412, P01GM064692, R01GM071939, PN2EY016525
-  Lawrence Berkeley Laboratory
-  PHENIX Industrial Consortium

  All PHENIX users

  Non-PHENIX colleagues for scientific support, very fruitful collaboration and constant valuable feedback: Sacha Urzhumtsev, Vladimir Lunin, Dale Tronrud, Dusan Turk

Workshop tomorrow !!!
