What’s new and Automation developments in CCP4

Page 1: What’s new and Automation developments in CCP4

12th April 2007

What’s new and Automation developments in CCP4

Ronan Keegan

CCP4, STFC Daresbury Laboratory, U.K.

Page 2: What’s new and Automation developments in CCP4

Quick Overview

• Brief introduction to CCP4

• New programs and features in CCP4

• Upcoming features in version 6.1

• Automation projects
  – MrBUMP – automated Molecular Replacement
  – Other automation projects

Page 3: What’s new and Automation developments in CCP4

What is CCP4?

• Collaborative Computational Project Number 4

• Set up in the late 1970s to support collaboration between researchers working on protein crystallography software in the UK, and to assemble a comprehensive collection of software to satisfy the computational requirements of the relevant UK groups.

• Many functions:
  – Support and distribution of the CCP4 suite of programs for protein crystallography (PX)
  – Education: workshops, university visits, summer schools, study weekend
  – Maintaining the CCP4 bulletin board and website

• Academic users can use the suite for free; commercial users pay a licence fee.

Page 4: What’s new and Automation developments in CCP4

CCP4 Organisational Structure

[Organisation diagram. Labels shown:]

• DL CCP4 Group: core developments & activities

• Project Leader

• Steering Committees

• Exec

• STAB

• WG 1, WG 2

• Funded Developers, Associated Developers, Occasional Contributors

• Core projects, e.g. CCP4mg, mmdb, PIMS, Automation, BIOXHIT …

• Major programs, e.g. Mosflm, Refmac, Scala, Phaser, Clipper, Coot …

• Lots of other useful software, e.g. PDBExtract

Page 5: What’s new and Automation developments in CCP4

[Chart: Downloads by Month. x-axis: Month (Apr to Mar); y-axis: Downloads (0 to 3500); series: Source, Windows, Linux, OS X, IRIX, OSF1, SunOS, Total]

Page 6: What’s new and Automation developments in CCP4

Downloads by type

[Pie chart; segment labels and values as extracted, paired in order of appearance:]

• source: 2474

• windows: 9361

• linux: 7168

• os x: 1512

• irix: 199

• osf: 9

• sunos: 5

Page 7: What’s new and Automation developments in CCP4

New programs and features in CCP4

• New packages in CCP4 6.0:
  – CCP4mg: molecular graphics
  – Coot: graphical toolkit for model building, model completion and validation
  – Phaser: molecular replacement (version 1.3.3)
  – Chainsaw: MR model preparation
  – Pirate: statistical phase improvement
  – Superpose: secondary structure alignment
  – BP3: heavy atom phasing and refinement
  – Chooch: anomalous scattering factors from raw fluorescence spectra
  – New features in CCP4i

Page 8: What’s new and Automation developments in CCP4

CCP4mg

• The aim is to provide a molecular graphics program that is fully compatible with the CCP4 environment and programs.

• Features:
  – Displays molecules with simple, flexible selection tools and a variety of display styles and colouring schemes.
  – A simple graphical interface to select the atoms to display, the colour scheme and the display style.
  – Surfaces and electrostatic potential calculations.
  – Displays maps with a 'continuous crystal' and real-time update of contouring level.

Page 9: What’s new and Automation developments in CCP4

• Superposes two or more protein structures automatically. Also performs structure analysis: secondary structure, solvent accessible surface area, hydrogen bonds, close contacts.

• Writes 'snapshot' images and creates movies. Also creates POV-Ray input files and PostScript files.

• Runs on Linux, Windows (2000, NT and XP) and Mac OS X.

Page 10: What’s new and Automation developments in CCP4

• Normal mode analysis

• CCP4mg can currently perform approximate normal mode calculations using two elastic network models:
  – Only one atom per residue (CA) is considered
  – All force constants are assumed to be the same
  – Gaussian Network and Anisotropic Network methods are employed
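As a rough illustration of the elastic network idea (one CA atom per residue, one shared force constant), the sketch below builds the connectivity (Kirchhoff) matrix used by a Gaussian Network Model. It is not CCP4mg's code: the function name and the 7.0 Å contact cutoff are assumptions made for the example. The normal modes would then come from an eigen-decomposition of this matrix (for instance with `numpy.linalg.eigh`), which is left out to keep the sketch dependency-free.

```python
import math


def kirchhoff_matrix(ca_coords, cutoff=7.0):
    """Build a Gaussian Network Model connectivity matrix from CA coordinates.

    ca_coords: list of (x, y, z) tuples, one per residue, in Angstrom.
    cutoff: contact distance in Angstrom (7.0 is an assumed, illustrative value).

    Off-diagonal entries are -1 for residue pairs in contact, 0 otherwise.
    Each diagonal entry is the number of contacts of that residue, so every
    row sums to zero. A single uniform spring constant is implied.
    """
    n = len(ca_coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma
```

The Anisotropic Network method uses the same contact map but a 3N x 3N Hessian, so that it yields directions of motion as well as amplitudes.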

Page 11: What’s new and Automation developments in CCP4

Coot

• Coot is for model building, model completion and validation.

• It displays maps and models and allows model manipulations such as idealization, real space refinement, manual rotation/translation, rigid-body fitting, ligand search, solvation, mutations and rotamers. It also provides Ramachandran plots.

• File formats handled: PDB, mmCIF, MTZ, phases (.phs) and others.

• Most of its functions are also accessible for scripting.

http://www.ysbl.york.ac.uk/~emsley/coot/

Page 12: What’s new and Automation developments in CCP4

Coot

Page 13: What’s new and Automation developments in CCP4

Phaser

• Phaser is a program for phasing macromolecular crystal structures with maximum likelihood methods. Version 1.3.3 in CCP4 6.0.2 supports the molecular replacement method. The next version will include the experimental phasing method.

• Features:
  – brute-force rotation and translation searches
  – FFT-based fast rotation and translation searches
  – correction for anisotropic diffraction
  – search for multiple molecules in multiple space groups

http://www-structmed.cimr.cam.ac.uk/phaser/

Page 14: What’s new and Automation developments in CCP4

Pirate & Superpose

• Pirate:
  – Pirate is a new statistical phase improvement program.
  – It classifies the electron density map by sparseness/denseness and order/disorder, with the aim of obtaining superior results to conventional solvent-mask-based methods without requiring knowledge of the solvent content.
  – Currently available for Linux and Mac OS X.

• Superpose:
  – Superpose aligns two structures by matching graphs built on the protein's secondary-structure elements, followed by an iterative three-dimensional alignment of protein backbone C-alpha atoms.

Page 15: What’s new and Automation developments in CCP4

BP3

• BP3 is a new program for obtaining phase information from S/MIR(AS) and/or S/MAD experiments by multivariate likelihood estimation.

• It refines heavy and/or anomalously scattering atom parameters along with error parameters to generate phase information.

Page 16: What’s new and Automation developments in CCP4

Chooch

• Program to determine which wavelengths to use for a MAD experiment.

• Determines values of anomalous scattering factors from raw fluorescence spectra and pinpoints the positions of the f'' maximum and the f' minimum.

• Command line driven with all options controlled by switches.

• Optional PGPLOT visual output.

• Publication quality PS output generated on request.
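The wavelength choice described above can be sketched in a few lines, assuming the f' and f'' curves have already been derived from the fluorescence scan (that derivation is the hard part and is not shown). The function name and the tuple layout are hypothetical and are not Chooch's interface; the only physics used is the standard energy-to-wavelength conversion, wavelength (Å) = 12398.42 / energy (eV).

```python
HC_EV_ANGSTROM = 12398.42  # h*c in eV*Angstrom, converts photon energy to wavelength


def pick_mad_wavelengths(scan):
    """Pick 'peak' and 'inflection' wavelengths from tabulated scattering factors.

    scan: list of (energy_eV, f_prime, f_double_prime) tuples (hypothetical layout).
    Returns a dict of wavelengths in Angstrom:
      'peak'       at the energy where f'' is largest,
      'inflection' at the energy where f' is most negative.
    """
    if not scan:
        raise ValueError("scan must contain at least one point")
    peak = max(scan, key=lambda point: point[2])
    inflection = min(scan, key=lambda point: point[1])
    return {
        "peak": HC_EV_ANGSTROM / peak[0],
        "inflection": HC_EV_ANGSTROM / inflection[0],
    }
```

A remote wavelength, far from the absorption edge, is usually collected as well; choosing it is a matter of beamline practice rather than of the scan.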

Page 17: What’s new and Automation developments in CCP4

Chainsaw

• Molecular replacement model preparation utility that mutates a template PDB file according to a sequence alignment.

• Features:
  – examines the sequence alignment between target and template and modifies the template PDB file by pruning non-conserved residues back to the gamma atom
  – more atoms are preserved than in a polyalanine model, but parts of the model which are unlikely to be present in the crystal structure, and thus would only degrade the signal, are pruned.

Figure: 1mr6 used as a template for 1tgx (38% sequence identity). From left to right: unmodified template, chainsaw template, polyalanine template.
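The pruning rule described above can be illustrated with a small sketch. This is not Chainsaw's implementation: the function name, the data layout (one list of atom names per template residue) and the exact set of atoms kept are assumptions for the example. Conserved residues are kept whole, non-conserved residues are cut back to the gamma position, and template residues aligned to a gap in the target are dropped.

```python
BACKBONE = {"N", "CA", "C", "O"}
# Atoms kept for a non-conserved residue: backbone, CB and gamma-position atoms.
UP_TO_GAMMA = BACKBONE | {"CB", "CG", "CG1", "CG2", "OG", "OG1", "SG"}


def prune_template(template_aln, target_aln, residues):
    """Prune template residues according to a pairwise sequence alignment.

    template_aln, target_aln: aligned one-letter sequences of equal length,
        with '-' marking gaps.
    residues: list of atom-name lists, one per template residue, in order.
    Returns a new list of atom-name lists for the residues that are kept.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    pruned = []
    index = 0  # position in `residues`
    for template_res, target_res in zip(template_aln, target_aln):
        if template_res == "-":
            continue  # insertion in the target: the template has no atoms here
        atoms = residues[index]
        index += 1
        if target_res == "-":
            continue  # no counterpart in the target: drop the residue
        if template_res == target_res:
            pruned.append(list(atoms))  # conserved: keep every atom
        else:
            pruned.append([a for a in atoms if a in UP_TO_GAMMA])
    return pruned
```

A real tool would also rename the pruned residues and write the result back out as a PDB file; both steps are omitted here.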

Page 18: What’s new and Automation developments in CCP4

New features in CCP4i

• Interfaces for new programs:
  – Phaser
  – Pirate/Clipper
  – BP3
  – Chainsaw
  – CCP4mg launcher
  – CRANK
  – Shelx_C/D/E

Page 19: What’s new and Automation developments in CCP4

New features in CCP4i

• Database search and sort

• Project shortcuts

• Customise job database view

• Help shortcuts

Page 20: What’s new and Automation developments in CCP4

CCP4 6.1 and beyond

• Version 6.1 in 6-12 months' time

• New programs for 6.1:
  – Rapper: protein modelling, automated conformer generation
  – Rampage: generates Ramachandran plots for structure validation
  – Buccaneer: chain tracing
  – Pointless: determines space/Laue group from unmerged data
  – Oasis
  – Crunch2
  – Afro
  – Clipper2 libraries
  – Automation scripts
    • MrBUMP
    • XIA2

Page 21: What’s new and Automation developments in CCP4

iMosflm

• New, improved Mosflm graphical user interface.

• More user-friendly than the old one.

Page 22: What’s new and Automation developments in CCP4

Updates to popular CCP4 programs

• Acorn
  – ab initio procedure for the determination of protein structure using atomic resolution data, or data artificially extended to atomic resolution, and for finding sub-structures from anomalous or isomorphous differences.

• Truncate (Uboat)
  – New, improved version written in C++.
  – In the longer term there will be new tests for twinning, anisotropy corrections and the ability to handle unmerged data (useful if radiation damage occurs), but these won't be in the initial release.

• Phaser 2.0/2.1
  – Will include experimental phasing

• Refmac 5.3/6.0
  – The latest version of Refmac; it will supersede version 5.2.x in the CCP4 6.0.x series.

Page 23: What’s new and Automation developments in CCP4

CCP4 6.1 and beyond

• Plans for CCP4i
  – CCP4i Classic reworked
  – CCP4i Auto: automation scripts

• CCP4i database
  – New database handler
  – Allow for greater flexibility and control of jobs
  – Job/DB viewer program built on top of the DB (more about this later)

Page 24: What’s new and Automation developments in CCP4

CCP4 6.1 and beyond

• Long term plans
  – Better integration between CCP4i, CCP4mg and Coot
  – More intuitive interfaces to programs
  – More automation

Page 25: What’s new and Automation developments in CCP4

CCP4 Automation

• Reasons
  – Higher throughput at synchrotron beamlines
  – Crystallography is increasingly becoming a tool for researchers in other fields. Not all have the time to learn how to use the complex set of programs for solving structures. Users prefer to concentrate on the biology.

Page 26: What’s new and Automation developments in CCP4


Page 27: What’s new and Automation developments in CCP4

MrBUMP - Molecular Replacement with Bulk Model Preparation

Page 28: What’s new and Automation developments in CCP4

Aim of MrBUMP

• Automated framework for Molecular Replacement

• Particular emphasis on generating a variety of search models

• Wraps Phaser, Molrep and Acorn

• Uses a variety of helper applications (e.g. Chainsaw) and bioinformatics tools (e.g. FASTA, Mafft)

• Uses on-line databases (e.g. PDB, SCOP)

• Can make use of computational cluster resources to speed up the processing

• In favourable cases, gives a “one-button” solution

• In unfavourable cases, suggests likely search models for manual investigation

Page 29: What’s new and Automation developments in CCP4

Pipeline

[Flow diagram:] Target MTZ & Sequence → Target Details → Template Search → Model Preparation → Molecular Replacement & Refinement → Check scores and exit or select the next model
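The final step of the pipeline, "check scores and exit or select the next model", amounts to a simple driver loop. The sketch below shows only that control flow; `run_pipeline`, `solve` and `is_success` are hypothetical names standing in for the MR-plus-refinement stage and the score check, and none of them is part of MrBUMP itself.

```python
def run_pipeline(models, solve, is_success):
    """Try search models in order and stop at the first acceptable result.

    models: iterable of prepared search models, best candidates first.
    solve: callable(model) -> score; stands in for MR followed by refinement.
    is_success: callable(score) -> bool; stands in for the score check.
    Returns (model, score) for the first success, or (None, None) if no
    model passes the check.
    """
    for model in models:
        score = solve(model)
        if is_success(score):
            return model, score  # exit as soon as a solution is found
    return None, None
```

For example, with scores taken to be final Rfree values, `run_pipeline(["a", "b", "c"], rfree_of, lambda r: r < 0.35)` stops at the first model whose refinement ends below 0.35 and never runs the remaining ones.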

Page 30: What’s new and Automation developments in CCP4

Template Search

• Sequence-based search (FASTA)

• Secondary-structure-based search (SSM)

• Domain search (SCOP)

• Identification of possible multimers (PQS & PISA)

• Users can also enter their own templates by ID or from locally held files.

Page 31: What’s new and Automation developments in CCP4

Model Preparation

• Search models can be prepared for MR in several ways:
  – Chainsaw: non-conserved residues are pruned (sequence provided)
  – Molrep: pruning of non-conserved side-chains (internal sequence alignment)
  – Polyalanine: all side-chain atoms beyond the CB atom are pruned
  – PDBclip: models are not modified

• An ensemble of the best models is also created for Phaser
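Of the preparation modes listed above, the polyalanine one is the simplest to show in code. The sketch below filters PDB ATOM records, reading the atom name from columns 13-16 of the fixed-width PDB format, and keeps only backbone and CB atoms. The function name is hypothetical, and unlike a full tool it does not rename residues to ALA.

```python
POLYALA_ATOMS = {"N", "CA", "C", "O", "CB"}


def to_polyalanine(pdb_lines):
    """Strip side-chain atoms beyond CB from PDB ATOM records.

    pdb_lines: iterable of PDB-format lines.
    The atom name is taken from columns 13-16 (string slice 12:16).
    Lines that are not ATOM records are passed through unchanged.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() not in POLYALA_ATOMS:
            continue  # side-chain atom beyond CB: drop it
        kept.append(line)
    return kept
```

Glycine residues, which have no CB, simply keep their four backbone atoms.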

Page 32: What’s new and Automation developments in CCP4

Molecular Replacement & Refinement

• For each search model, MR is done with Molrep or Phaser or both.

• MR programs run mostly with defaults.

• MrBUMP provides LABIN columns, MW of target, sequence identity of search model, number of copies to search for, number of clashes tolerated.

• Molrep / Phaser are allowed to set resolution limits and weights.

• After MR, models are passed to Refmac for restrained refinement.

• Results are classified from the refinement:
  – “success”: final Rfree < 0.35, or final Rfree < 0.5 and dropped by 20%
  – “marginal”: final Rfree < 0.48, or final Rfree < 0.52 and dropped by 5%
  – “failure”: otherwise
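The classification can be written as a small function. It reads the slide's thresholds as follows: the stricter pair (0.35, or 0.5 with a 20% drop) marks a “success”, the looser pair (0.48, or 0.52 with a 5% drop) marks a “marginal” result, and anything else is a “failure”. Two things are assumptions rather than statements from the slide: that the drop is measured relative to the initial Rfree, and that a drop of exactly 20% or 5% counts. The function name is hypothetical.

```python
def classify_result(initial_rfree, final_rfree):
    """Classify an MR/refinement result from initial and final Rfree.

    Assumption: 'dropped by N%' is read as a relative drop,
    (initial - final) / initial >= N/100.
    """
    if initial_rfree <= 0:
        raise ValueError("initial_rfree must be positive")
    drop = (initial_rfree - final_rfree) / initial_rfree
    if final_rfree < 0.35 or (final_rfree < 0.5 and drop >= 0.20):
        return "success"
    if final_rfree < 0.48 or (final_rfree < 0.52 and drop >= 0.05):
        return "marginal"
    return "failure"
```

If the drop were meant as an absolute change in Rfree, only the `drop` line would need to change.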

Page 33: What’s new and Automation developments in CCP4

MrBUMP and cluster computing

• MrBUMP is usually run on a desktop from CCP4i or the command line.

• However, MrBUMP can take advantage of a compute cluster to farm out the Molecular Replacement jobs.

• Currently Sun Grid Engine enabled clusters are supported, but support will be added for other types of queuing system (e.g. LSF, Condor) if there is enough demand.

• Job control: all nodes terminate when one finds a solution.

• Current (known) cluster installations at Daresbury, Diamond and University of Dundee.
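The job-control rule above (stop everything once one job finds a solution) can be imitated on a single machine with threads. This is only an analogy: it does not talk to Sun Grid Engine or any other queuing system, and `farm_out` and `run_job` are hypothetical names. Jobs that have not started are skipped once a solution exists, and long-running jobs are expected to poll the shared stop flag and return early.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def farm_out(jobs, run_job, max_workers=4):
    """Run jobs in parallel and return the first solution found, or None.

    jobs: list of job descriptions (for example, search model names).
    run_job: callable(job, stop_event) -> result or None. A long job should
        check stop_event.is_set() periodically and give up when it is set.
    """
    stop = threading.Event()
    lock = threading.Lock()
    solution = []

    def worker(job):
        if stop.is_set():
            return  # another job has already succeeded: skip this one
        result = run_job(job, stop)
        if result is not None:
            with lock:
                if not solution:
                    solution.append(result)  # keep only the first solution
            stop.set()  # tell every other job to terminate

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(worker, jobs))  # wait for all workers, surface errors
    return solution[0] if solution else None
```

On a real cluster the stop signal would typically be a job deletion request sent to the scheduler rather than an in-process flag.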

Page 34: What’s new and Automation developments in CCP4

MrBUMP on the Grid

• Currently under development

• Large parameter space searches: submit many jobs to U.K. computational grid resources using recently developed e-Science tools (MCS, AgentX, Rcommands, SRB)

• Goals:
  – To improve the performance/success rate of the method
  – Possibly extract useful biological information
  – Make a grid-enabled version available to users

Page 35: What’s new and Automation developments in CCP4

MrBUMP Output

• Currently produces a long log file listing search results, model preparation steps, summaries from each MR and refinement job, and relevant references for the programs used.

• This is not ideal, as there is a lot of information to trawl through. A summary of results is now provided at the end of the log file.

• Future versions will provide results in marked-up web page format for more clarity.

Page 36: What’s new and Automation developments in CCP4

MrBUMP Output – CCP4i dbviewer

Page 37: What’s new and Automation developments in CCP4

MrBUMP pre-release

• Beta version first released in January 2006 (current version is 0.3.3)

• Currently supported on Linux and Mac OS X; a Windows version will be available when MrBUMP is included in the suite.

• Will be included in the next release of CCP4 (version 6.1)

• MrBUMP paper to be published in Acta Cryst. D in April 2007

• First citations in Obiero et al., Acta Cryst. (2006). F62, 757-760; El Omari et al., Acta Cryst. (2006). F62, 949-953

http://www.ccp4.ac.uk/MrBUMP/

Page 38: What’s new and Automation developments in CCP4

New features

• Run Acorn after refinement for phase improvement (high resolution data)

• Support for searching in enantiomorphic space groups

• Users can now specify template models by PDB ID or add local PDB files

• “Generate models only” option

• XML output

• Additional multiple alignment programs supported: T-Coffee and ProbCons

Page 39: What’s new and Automation developments in CCP4

Future versions

• Improvements to multimeric search models (using PISA)

• Supplement multiple alignment with additional sequences and/or structural information

• Model completion and/or re-building

• Target complexes

• Improved output presentation

Page 40: What’s new and Automation developments in CCP4

Conclusions

• Test cases and the examples demonstrated the utility of trying a range of search models, a protocol that can only be attempted adequately by automation.

• MrBUMP is not meant to compete with careful analysis of the data and model by an experienced crystallographer. However, it may succeed in difficult cases by finding a combination of models and protocols that would not otherwise have been tried.

• In more straightforward cases the advantage is simply one of convenience.

Page 41: What’s new and Automation developments in CCP4

CCP4 Automation - BALBES

• Authors: Garib Murshudov, Alexei Vagin, Fei Long (YSBL)

• Built around Molrep (MR and model preparation), Refmac and Sfcheck

• Model preparation is based on a custom database derived from the PDB

• The best model is derived from the database and used in Molrep.

• Protocols:
  – Simple molecular replacement
  – Domains iterated with refinement
  – Use of tertiary structure if available
  – Completion of MR using phased MR and refinement

• Released early 2007

Page 42: What’s new and Automation developments in CCP4

XIA2 Automated Data Reduction

• xia2 is a new automated data reduction system designed to work from raw diffraction data and a little metadata, and produce usefully reduced data in a form suitable for immediately starting phasing and structure solution.

• Pre-release version is currently available.

http://www.ccp4.ac.uk/xia/

Page 43: What’s new and Automation developments in CCP4

XIA2

BEGIN PROJECT TM1553

BEGIN CRYSTAL 13185

BEGIN AA_SEQUENCE

MHKMWPSDSNDHRVTRRNVIIFSSLLLGSLAILLALLLIRTKDQYYELRDFALGTSVRIVVSSQKINPRTIAEAILEDMKRITYKFSFTDERSVVKKINDHPNEWVEVDEETYSLIKAACAFAELTDGAFDPTVGRLLELWGFTGNYENLRVPSREEIEEALKHTGYKNVLFDDKNMRVMVKNGVKIDLGGIAKGYALDRARQIALSFDENATGFVEAGGDVRIIGPKFGKYPWVIGVKDPRGDDVIDYIYLKSGAVATSGDYERYFVVDGVRYHHILDPSTGYPARGVWSVTIIAEDATTADALSTAGFVMAGKDWRKVVLDFPNMGAHLLIVLEGGAIERSETFKLFERE

END AA_SEQUENCE

BEGIN HA_INFO
ATOM SE
NUMBER_PER_MONOMER 5
END HA_INFO

BEGIN WAVELENGTH INFL
WAVELENGTH 0.97950
F' -12.1
F'' 5.8
END WAVELENGTH INFL

BEGIN WAVELENGTH LREM
WAVELENGTH 1.00000
F' -2.5
F'' 0.5
END WAVELENGTH LREM

BEGIN SWEEP INFL
WAVELENGTH INFL
BEAM 109.0 105.0
IMAGE 13185_2_E1_001.img
DIRECTORY /data/jcsg/als1/8.2.1/20050121/collection/TM1553/13185/
END SWEEP

BEGIN SWEEP LREM
WAVELENGTH LREM
BEAM 109.0 105.0
IMAGE 13185_2_E2_001.img
DIRECTORY /data/jcsg/als1/8.2.1/20050121/collection/TM1553/13185/
END SWEEP

END CRYSTAL 13185

END PROJECT TM1553

• Requires image data plus an input specification script with target and experiment data:
  – Sequence
  – Number of heavy atoms
  – Wavelength
  – Location of image data
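The nested BEGIN/END layout of the input script shown on this slide can be read with a small stack-based parser. The sketch below is an illustration of the file structure only and is not xia2's own reader; the function name and the dictionary layout are assumptions.

```python
def parse_blocks(lines):
    """Parse nested BEGIN/END blocks into a tree of dictionaries.

    Each block becomes a dict with keys:
      'kind'    the word after BEGIN (e.g. 'PROJECT', 'SWEEP'),
      'name'    any remaining words on the BEGIN line,
      'records' the whitespace-split lines directly inside the block,
      'blocks'  the nested child blocks.
    The returned root has kind 'ROOT' and holds the top-level blocks.
    """
    root = {"kind": "ROOT", "name": "", "records": [], "blocks": []}
    stack = [root]
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "BEGIN":
            if len(tokens) < 2:
                raise ValueError("BEGIN without a block kind")
            block = {
                "kind": tokens[1],
                "name": " ".join(tokens[2:]),
                "records": [],
                "blocks": [],
            }
            stack[-1]["blocks"].append(block)
            stack.append(block)
        elif tokens[0] == "END":
            # The END line must name the kind of the innermost open block.
            if len(stack) == 1 or tokens[1:2] != [stack[-1]["kind"]]:
                raise ValueError("unmatched END: " + raw.strip())
            stack.pop()
        else:
            stack[-1]["records"].append(tokens)
    if len(stack) != 1:
        raise ValueError("unclosed BEGIN block: " + stack[-1]["kind"])
    return root
```

Matching END lines on the block kind alone accepts both `END WAVELENGTH INFL` and the shorter `END SWEEP` form that appears in the example.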

Page 44: What’s new and Automation developments in CCP4


Through your favourite phasing pipeline…

Page 45: What’s new and Automation developments in CCP4

CCP4 Automation - HAPPy – Heavy Atom Phasing in Python

• What it is:
  – Automated experimental phasing pipeline
  – Replaces and expands on the capabilities of the CHART package

• What it will do:
  – Take integrated and merged experimental data amplitudes (post-TRUNCATE), de-twinned and consistently indexed.
  – Determine the heavy atom structure and phase probabilities.
  – Optimize the density map to give an interpretable map.
  – Build the structure.

• First release will handle SAD data only; MAD, MIR and MIRAS modes will follow later.

http://www.ccp4.ac.uk/HAPPy

Page 46: What’s new and Automation developments in CCP4

Acknowledgements:

• Core Group (Daresbury):
  – Martyn Winn, Charles Ballard, Peter Briggs, Francois Remacle, Norman Stein, Wendy Yang, Maeri Howard.

• CCP4MG (York):
  – Liz Potterton, Stuart McNicholas

• Coot (Oxford & York):
  – Paul Emsley, Kevin Cowtan

• Program Developers (York, Cambridge, Diamond & Leiden University):
  – Garib Murshudov, Alexei Vagin, Fei Long, Randy Read, Airlie McCoy, Harry Powell, Gwyndaf Evans, Phil Evans, Eleanor Dodson, Nick Furnham, Steve Ness.

• BBSRC for their funding

• And many others…