Direct Use of Phase Information in Refmac
Abingdon, 18.3.2008
University of Leiden
P. Skubák
REFINEMENT WITHOUT PRIOR PHASE INFORMATION
SAD EXPERIMENT
PHASING and DENSITY MODIFICATION
REFINEMENT and MODEL BUILDING
|F|
|F+|, |F-|
$|F| = \tfrac{1}{2} ( |F^{+}| + |F^{-}| )$
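The averaging of the Friedel pair on this slide can be sketched in a few lines of Python. The function names and the sample amplitudes are illustrative assumptions, not part of Refmac; the point is only that the mean amplitude no longer carries the anomalous difference.

```python
def mean_amplitude(f_plus, f_minus):
    """Return |F| = (|F+| + |F-|) / 2 for one reflection."""
    return 0.5 * (f_plus + f_minus)


def anomalous_difference(f_plus, f_minus):
    """Return |F+| - |F-|, the signal that the averaging removes."""
    return f_plus - f_minus


# Hypothetical Friedel pairs (|F+|, |F-|) in arbitrary units.
friedel_pairs = [(105.0, 95.0), (48.0, 52.0)]

mean_amplitudes = [mean_amplitude(fp, fm) for fp, fm in friedel_pairs]
anomalous_differences = [anomalous_difference(fp, fm) for fp, fm in friedel_pairs]

print(mean_amplitudes)        # [100.0, 50.0]
print(anomalous_differences)  # [10.0, -4.0]
```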
REFINEMENT WITH INDIRECT PRIOR PHASE INFORMATION
SAD EXPERIMENT
PHASING and DENSITY MODIFICATION
REFINEMENT and MODEL BUILDING
|F|, $P_e(\varphi)$
|F+|, |F-|
$P_e(\varphi) = \exp( A \cos\varphi + B \sin\varphi + C \cos 2\varphi + D \sin 2\varphi )$
$|F| = \tfrac{1}{2} ( |F^{+}| + |F^{-}| )$
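As a minimal sketch of how the four coefficients A, B, C, D encode a phase distribution, the Python below evaluates the exponential form on a regular phase grid, normalises it numerically and reports the centroid. The grid size, the function names and the example coefficients are assumptions made for illustration.

```python
import math


def hl_unnormalised(phase, a, b, c, d):
    """exp(A cos(phi) + B sin(phi) + C cos(2 phi) + D sin(2 phi))."""
    return math.exp(a * math.cos(phase) + b * math.sin(phase)
                    + c * math.cos(2.0 * phase) + d * math.sin(2.0 * phase))


def hl_distribution(a, b, c, d, n=360):
    """Return (phases, probabilities) normalised to integrate to 1 over 0..2 pi."""
    step = 2.0 * math.pi / n
    phases = [i * step for i in range(n)]
    weights = [hl_unnormalised(p, a, b, c, d) for p in phases]
    norm = step * sum(weights)
    return phases, [w / norm for w in weights]


def centroid(phases, probabilities):
    """Return (figure of merit, best phase) of the distribution."""
    step = 2.0 * math.pi / len(phases)
    x = step * sum(p * math.cos(phi) for phi, p in zip(phases, probabilities))
    y = step * sum(p * math.sin(phi) for phi, p in zip(phases, probabilities))
    return math.hypot(x, y), math.atan2(y, x)


# Hypothetical coefficients: a unimodal distribution centred on phase 0.
phases, probabilities = hl_distribution(2.0, 0.0, 0.0, 0.0)
fom, best_phase = centroid(phases, probabilities)

# All coefficients zero gives a flat distribution, i.e. no phase information.
flat_phases, flat_probabilities = hl_distribution(0.0, 0.0, 0.0, 0.0)
flat_fom, _ = centroid(flat_phases, flat_probabilities)
```

With all coefficients equal to zero the distribution is flat, which corresponds to refinement without prior phase information.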
REFINEMENT WITH DIRECT PRIOR PHASE INFORMATION
SAD EXPERIMENT
PHASING and DENSITY MODIFICATION
REFINEMENT and MODEL BUILDING
|F+|, |F-|, heavy atom model
|F+|, |F-|
$|F| = \tfrac{1}{2} ( |F^{+}| + |F^{-}| )$
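Rice refinement target
$P(|F_o|, \varphi_o, |F_c|, \varphi_c)$
integration over all $\varphi_o$:
$P(|F_o|, |F_c|, \varphi_c) = \int_0^{2\pi} P(|F_o|, \varphi_o, |F_c|, \varphi_c) \, d\varphi_o$
division by $P(|F_c|, \varphi_c)$:
$P(|F_o| ; |F_c|, \varphi_c) = P(|F_o|, |F_c|, \varphi_c) / P(|F_c|, \varphi_c)$
conditional probability distribution $P(|F_o| ; |F_c|, \varphi_c)$
maximum likelihood refinement target with no prior phase information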
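The phase integration above can be checked numerically. The sketch below assumes the usual acentric complex Gaussian error model for Fo around D·Fc with a variance parameter (called `variance` here; the name and all numerical values are assumptions), starts directly from the conditional density so that the division step is already absorbed, integrates over the observed phase on a grid, and compares the result with the closed-form Rice distribution.

```python
import cmath
import math


def bessel_i0(x):
    """Modified Bessel function I0 from its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def joint_density(fo_amp, fo_phase, fc, luzzati_d, variance):
    """Density of Fo in polar coordinates, given the complex Fc (acentric case)."""
    diff = cmath.rect(fo_amp, fo_phase) - luzzati_d * fc
    return fo_amp / (math.pi * variance) * math.exp(-abs(diff) ** 2 / variance)


def integrate_over_phase(fo_amp, fc, luzzati_d, variance, n=360):
    """Rectangle-rule integration of the density over the unknown phase of Fo."""
    step = 2.0 * math.pi / n
    return step * sum(joint_density(fo_amp, i * step, fc, luzzati_d, variance)
                      for i in range(n))


def rice(fo_amp, fc_amp, luzzati_d, variance):
    """Closed-form acentric Rice distribution P(|Fo| ; |Fc|)."""
    argument = 2.0 * fo_amp * luzzati_d * fc_amp / variance
    exponent = -(fo_amp ** 2 + (luzzati_d * fc_amp) ** 2) / variance
    return 2.0 * fo_amp / variance * math.exp(exponent) * bessel_i0(argument)


# Hypothetical reflection: |Fo| = 80, Fc = 100 exp(0.7 i), D = 0.9, variance = 400.
fc = cmath.rect(100.0, 0.7)
numeric = integrate_over_phase(80.0, fc, 0.9, 400.0)
closed_form = rice(80.0, 100.0, 0.9, 400.0)
minus_log_likelihood = -math.log(closed_form)
```

The result no longer depends on the phase of Fc, which is why the Rice target uses the calculated amplitude only. The quantity minimised in refinement would be the sum of such `minus_log_likelihood` terms over all reflections.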
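MLHL refinement target
$P(|F_o|, \varphi_o, |F_c|, \varphi_c)$
weighted integration over all $\varphi_o$:
$P(|F_o|, |F_c|, \varphi_c) = \int_0^{2\pi} P(|F_o|, \varphi_o, |F_c|, \varphi_c) \, P_e(\varphi_o) \, d\varphi_o$
division by $P(|F_c|, \varphi_c)$:
$P(|F_o| ; |F_c|, \varphi_c) = P(|F_o|, |F_c|, \varphi_c) / P(|F_c|, \varphi_c)$
conditional probability distribution $P(|F_o| ; |F_c|, \varphi_c)$
maximum likelihood refinement target indirectly incorporating prior phase information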
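A minimal numerical sketch of the weighted integration: the same complex Gaussian model for Fo is integrated over the observed phase, but each phase is weighted by the normalised experimental distribution built from four Hendrickson-Lattman style coefficients. The error model, the parameter names and the example values are assumptions for illustration, not the Refmac implementation.

```python
import cmath
import math


def joint_density(fo_amp, fo_phase, fc, luzzati_d, variance):
    """Density of Fo in polar coordinates, given the complex Fc (acentric case)."""
    diff = cmath.rect(fo_amp, fo_phase) - luzzati_d * fc
    return fo_amp / (math.pi * variance) * math.exp(-abs(diff) ** 2 / variance)


def hl_weight(phase, hl):
    """Unnormalised prior exp(A cos + B sin + C cos 2 + D sin 2) at one phase."""
    a, b, c, d = hl
    return math.exp(a * math.cos(phase) + b * math.sin(phase)
                    + c * math.cos(2.0 * phase) + d * math.sin(2.0 * phase))


def mlhl_likelihood(fo_amp, fc, luzzati_d, variance, hl, n=360):
    """Integrate the density over the Fo phase, weighted by the normalised prior."""
    step = 2.0 * math.pi / n
    phases = [i * step for i in range(n)]
    weights = [hl_weight(p, hl) for p in phases]
    norm = step * sum(weights)
    return step * sum(joint_density(fo_amp, p, fc, luzzati_d, variance) * w / norm
                      for p, w in zip(phases, weights))


# Hypothetical reflection with calculated phase 0.7 rad.
fc = cmath.rect(100.0, 0.7)
uniform = mlhl_likelihood(80.0, fc, 0.9, 400.0, (0.0, 0.0, 0.0, 0.0))
agreeing = mlhl_likelihood(80.0, fc, 0.9, 400.0,
                           (3.0 * math.cos(0.7), 3.0 * math.sin(0.7), 0.0, 0.0))
opposing = mlhl_likelihood(80.0, fc, 0.9, 400.0,
                           (-3.0 * math.cos(0.7), -3.0 * math.sin(0.7), 0.0, 0.0))
```

In this sketch a prior that agrees with the calculated phase raises the likelihood relative to the flat prior and a prior that opposes it lowers it, so the target now depends on the calculated phase as well as the amplitude.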
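SAD refinement target
$P(|F_o^-|, \varphi_o^-, |F_o^+|, \varphi_o^+, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$
integration over all $\varphi_o^-$, $\varphi_o^+$:
$P(|F_o^-|, |F_o^+|, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+) = \int_0^{2\pi} \int_0^{2\pi} P(|F_o^-|, \varphi_o^-, |F_o^+|, \varphi_o^+, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+) \, d\varphi_o^- \, d\varphi_o^+$
division by $P(|F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$:
$P(|F_o^-|, |F_o^+| ; |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+) = P(|F_o^-|, |F_o^+|, |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+) / P(|F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$
conditional probability distribution $P(|F_o^-|, |F_o^+| ; |F_c^-|, \varphi_c^-, |F_c^+|, \varphi_c^+)$
maximum likelihood refinement target directly incorporating prior phase information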
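The double phase integration can be illustrated with a deliberately simplified stand-in for the SAD function. The sketch assumes that the two observed structure factors follow a bivariate complex normal distribution around given conditional means, with a hypothetical 2x2 Hermitian covariance (`var_plus`, `var_minus`, `cov`). How Refmac actually builds these quantities from the protein and heavy atom models is not shown in the slides and is not reproduced here.

```python
import cmath
import math


def bessel_i0(x):
    """Modified Bessel function I0 from its power series."""
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def rice(amp, mean_amp, variance):
    """Closed-form acentric Rice distribution for one amplitude."""
    argument = 2.0 * amp * mean_amp / variance
    return (2.0 * amp / variance
            * math.exp(-(amp ** 2 + mean_amp ** 2) / variance) * bessel_i0(argument))


def pair_density(z_plus, z_minus, mean_plus, mean_minus, var_plus, var_minus, cov):
    """Bivariate complex normal density with covariance [[var+, cov], [cov*, var-]]."""
    d1, d2 = z_plus - mean_plus, z_minus - mean_minus
    det = var_plus * var_minus - abs(cov) ** 2
    quad = (var_minus * abs(d1) ** 2 + var_plus * abs(d2) ** 2
            - 2.0 * (d1.conjugate() * cov * d2).real) / det
    return math.exp(-quad) / (math.pi ** 2 * det)


def sad_likelihood(amp_plus, amp_minus, mean_plus, mean_minus,
                   var_plus, var_minus, cov, n=120):
    """Integrate the pair density over both unknown observed phases."""
    step = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        z_plus = cmath.rect(amp_plus, i * step)
        for j in range(n):
            z_minus = cmath.rect(amp_minus, j * step)
            total += pair_density(z_plus, z_minus, mean_plus, mean_minus,
                                  var_plus, var_minus, cov)
    # amp_plus * amp_minus is the Jacobian of the two polar coordinate changes.
    return amp_plus * amp_minus * step * step * total


# Hypothetical conditional means and (co)variances for one Friedel pair.
mean_plus, mean_minus = cmath.rect(90.0, 0.7), cmath.rect(85.0, 0.9)
independent = sad_likelihood(82.0, 78.0, mean_plus, mean_minus, 400.0, 400.0, 0j)
correlated = sad_likelihood(82.0, 78.0, mean_plus, mean_minus, 400.0, 400.0, 300 + 0j)
rice_product = rice(82.0, 90.0, 400.0) * rice(78.0, 85.0, 400.0)
```

When the covariance between the two observations is set to zero, the double integral factorises into a product of two Rice distributions, which is a convenient sanity check on the integration.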
SAD distribution P( |Fo+|, |Fo-| ; Ac, Bc, AHc, BHc ) (strong prior phase information)
Rice distribution P( |Fo| ; Ac, Bc ) (no prior phase information)
SAD distribution (weak prior phase information)
SAD REFINEMENT TARGET USE IN REFMAC
iterated automated model building with SAD function refinement
substructure refinement and scaling
refinement of models in the final stages
MODEL BUILDING WITH SAD REFINEMENT
automated model building programs do not (yet) support the SAD target, so workarounds are needed for testing:
the heavy atom parameters file is input to the model building program separately, by a script which also calls Refmac with the extra keywords needed for SAD refinement
this workaround is used in CRANK for the ARP/wARP+Refmac_sad implementation
better integration of ARP/wARP with Refmac SAD is on the way
[Figure: fraction of ARP/wARP built residues to total number of residues; panels: resolution lower than 2.4 Å, resolution higher than 2.4 Å]
SAD SUBSTRUCTURE REFINEMENT & SCALING IN REFMAC
(VERY PRELIMINARY RESULTS)
being tested on ~200 JCSG datasets using the CRANK package with the pipeline: Refmac5_sad for scaling, Solomon for DM and Refmac5_sad for model building
average phase error after Refmac phasing: 75.4 deg
70 runs finished, of which 22 with successful model building
similar results (67 runs finished, of which 25 with successful model building) achieved with the same pipeline using BP3 instead of Refmac5_sad for phasing
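For orientation, a mean phase error of this kind can be computed as sketched below. The slides do not say whether the reported average is weighted (for example by amplitude or figure of merit), so the optional `weights` argument and the function names are assumptions.

```python
def phase_difference_deg(phase_a, phase_b):
    """Smallest absolute difference between two phases, in degrees (0..180)."""
    diff = abs(phase_a - phase_b) % 360.0
    return min(diff, 360.0 - diff)


def mean_phase_error_deg(phases, reference_phases, weights=None):
    """Optionally weighted mean of the phase differences to a reference set."""
    if weights is None:
        weights = [1.0] * len(phases)
    total = sum(w * phase_difference_deg(p, r)
                for p, r, w in zip(phases, reference_phases, weights))
    return total / sum(weights)


# Two hypothetical reflections: differences of 20 and 60 degrees.
example_error = mean_phase_error_deg([350.0, 100.0], [10.0, 40.0])
print(example_error)  # 40.0
```

Uniformly random acentric phases would average 90 degrees by this measure, which gives a scale for the 75.4 deg quoted above.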
SAD REFINEMENT – CLOSE TO FINAL MODEL
(R and R-free in %, phase error in degrees)

Dataset (resolution)     Target   R       R-free   ph. err.
Thionein (1.7 Å)         Rice     22.79   26.37    18.68
                         SAD      23.09   26.94    16.2
                         MLHL     23.9    27.18    16.51
Transhydr. (2.4 Å)       Rice     24.91   31.98    29.19
                         SAD      25.94   31.24    27.99
                         MLHL     26.17   31.78    31.02
Lysozyme (1.6 Å)         Rice     21.73   27.68    17.19
                         SAD      22.29   27.21    17.1
                         MLHL     22.03   27.34    17.26
Thioester. (1.8 Å)       Rice     29.31   35.52    32.38
                         SAD      29.76   35.08    32.04
                         MLHL     29.53   34.99    32.07
AEP (2.55 Å)             Rice     19.14   25.77    29.26
                         SAD      20.18   25.69    29.12
                         MLHL     19.82   25.56    28.64
Ferredoxin (0.9 Å)       Rice     22.01   23.82    15.92
                         SAD      22.31   24.03    15.96
                         MLHL     22.18   23.81    16.31
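The R and R-free columns follow the standard crystallographic R-factor definition, R = Σ| |Fo| - k|Fc| | / Σ|Fo|, evaluated on the working and free reflections respectively. The sketch below uses made-up amplitudes and a hypothetical free-set flag list; it is not the Refmac code and ignores scaling details.

```python
def r_factor(f_obs, f_calc, scale=1.0):
    """Return sum(| |Fo| - k |Fc| |) / sum(|Fo|) for the given reflections."""
    numerator = sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(f_obs)


# Hypothetical amplitudes; the last reflection belongs to the free set.
f_obs = [100.0, 50.0, 25.0, 80.0]
f_calc = [90.0, 55.0, 25.0, 84.0]
is_free = [False, False, False, True]

work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
free = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]

r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
r_free = r_factor([fo for fo, _ in free], [fc for _, fc in free])

print(f"R = {100.0 * r_work:.2f} %, R-free = {100.0 * r_free:.2f} %")
```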
SIRAS EXPERIMENT
DIRECT USE OF PRIOR PHASES
SIRAS X-RAY EXPERIMENT
PHASING and DENSITY MODIFICATION
REFINEMENT and MODEL BUILDING
|FN|, |FD+|, |FD-|
|FN|, |FD+|, |FD-|, substructure model
$P(|F_{oN}|, |F_{oD}^-|, |F_{oD}^+| ; |F_{cN}|, \varphi_{cN}, |F_{cD}^-|, \varphi_{cD}^-, |F_{cD}^+|, \varphi_{cD}^+)$
SIRAS IMPLEMENTATION REQUIREMENTS AND TODO
numerical approximations to the 3-dimensional SIRAS integral – done for the function and first derivatives evaluation
second derivatives of SIRAS function should be calculated and used in minimisation too
modelling of non-isomorphism:
more models in Refmac with restraints between them and their parts
SIRAS VERY PRELIMINARY RESULTS
– number of protein residues correctly built:

Dataset         Rice   MLHL   SIRAS   total
GerE            31     12     76      384
Thioesterase    508    523    521     572
GCN5P           111    112    110     116
Elastase        238    235    236     245
Ribonuclease    175    183    185     192
Soxy            26     91     147     246

– results from Refmac5D, not modelling non-isomorphism (sharing the protein part between the native and derivative models), heavy atom refinement outside of Refmac, only first derivatives etc.
Plans for the coming months
run and analyze massive JCSG tests on both Refmac SAD substructure refinement and scaling and protein model building with iterative Refmac SAD refinement
analyze the SAD target improvements for close to final models
better integration of SAD with model building programs
anisotropic ATP's refinement for SAD target
simultaneous refinement of occupancies and ATP's for all targets
more models in Refmac (input, output, refinement etc)
geometry restraints between more models
SIRAS target implementation and testing for substructure refinement and scaling and protein model building
target for joint refinement of protein and ligand: $P(|F_{oP}|, |F_{oPL}| ; |F_{cP}|, \varphi_{cP}, |F_{cPL}|, \varphi_{cPL})$
Refmac5 code organisation
I. Original Refmac5 code files (Fortran)
II. Modified Refmac5 code files (Fortran)
III. Bridge code files – layer between Refmac5 and SAD function itself (C/C++)
IV. SAD function code files (C/C++)
SAD/SIRAS function implementation
standalone C++ template class with double or single precision
general likelihood function for 1 or 2 observed structure factors and N model structure factors (includes, among others, SAD, SIR and Rice functions for both centric and acentric cases)
possibility to define arbitrary covariance matrices for different experiments/situations, with real or complex terms
calculation of the function value and of first and second derivatives with respect to the calculated structure factors and Luzzati D parameters
Gaussian integration over unknown observed phases
use of tabulated sin, cos, exp and modified Bessel I0, I1 functions to increase the evaluation speed
use of the LAPACK package for calculation of eigenvalues of covariance matrices
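The idea of tabulating functions for speed can be sketched as follows. The slides only state that tabulated values are used; the linear interpolation, table size and class name below are assumptions chosen for illustration, shown in Python rather than the C++ of the actual implementation.

```python
import math


class TabulatedFunction:
    """Precompute a function on a regular grid and interpolate linearly."""

    def __init__(self, func, lower, upper, n_intervals):
        self.lower = lower
        self.n_intervals = n_intervals
        self.step = (upper - lower) / n_intervals
        self.values = [func(lower + i * self.step) for i in range(n_intervals + 1)]

    def __call__(self, x):
        position = (x - self.lower) / self.step
        # Clamp the interval index; arguments outside the table are
        # extrapolated linearly from the first or last interval.
        index = min(max(int(position), 0), self.n_intervals - 1)
        fraction = position - index
        left, right = self.values[index], self.values[index + 1]
        return left + fraction * (right - left)


# Table of cos on one full period with 4096 intervals.
fast_cos = TabulatedFunction(math.cos, 0.0, 2.0 * math.pi, 4096)

# Largest deviation from math.cos on a fine test grid inside the table range.
max_error = max(abs(fast_cos(0.001 * k) - math.cos(0.001 * k)) for k in range(6283))
```

For linear interpolation the error is bounded by step² / 8 times the largest second derivative, so the table size can be chosen from the accuracy the likelihood evaluation needs.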
Tasks performed by bridge layer
passing the calls and parameters between Refmac5 part and likelihood function part in both directions
place of instantiation and “life” of likelihood class
transformation of derivatives with respect to structure factor amplitudes and phases (polar coordinates) into derivatives with respect to the real and imaginary parts of the structure factors (as used by Refmac5)
role in reading and writing of substructure files
plausibility and/or correctness checks of some input and output likelihood function parameters
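The polar-to-Cartesian transformation of derivatives listed above is a direct application of the chain rule. In the sketch below the function name and the toy target are assumptions; the analytic result is checked against central finite differences.

```python
import math


def polar_to_cartesian_gradient(amplitude, phase, d_amplitude, d_phase):
    """Convert (dL/d|F|, dL/dphi) into (dL/dA, dL/dB) for F = A + iB."""
    cos_p, sin_p = math.cos(phase), math.sin(phase)
    d_real = d_amplitude * cos_p - d_phase * sin_p / amplitude
    d_imag = d_amplitude * sin_p + d_phase * cos_p / amplitude
    return d_real, d_imag


def toy_target(real, imag):
    """Hypothetical target L = |F|^2 cos(phi), used only to test the transform."""
    amplitude = math.hypot(real, imag)
    phase = math.atan2(imag, real)
    return amplitude ** 2 * math.cos(phase)


real, imag = 3.0, 4.0
amplitude, phase = math.hypot(real, imag), math.atan2(imag, real)

# Derivatives of the toy target in polar coordinates.
d_amplitude = 2.0 * amplitude * math.cos(phase)
d_phase = -amplitude ** 2 * math.sin(phase)

analytic = polar_to_cartesian_gradient(amplitude, phase, d_amplitude, d_phase)

# Central finite differences in Cartesian coordinates for comparison.
h = 1e-6
numeric = (
    (toy_target(real + h, imag) - toy_target(real - h, imag)) / (2.0 * h),
    (toy_target(real, imag + h) - toy_target(real, imag - h)) / (2.0 * h),
)
```

The division by the amplitude means the transform is undefined for a zero structure factor, which a real implementation would have to guard against.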
Tasks performed by modified Refmac5 files
input, output and availability in code of observed |F+|, |F-| columns (via standard CCP4 libraries to read and write mtz files)
input, output and availability in code of substructure parameters (standard pdb file format and new internally used refmac5 format for both input and output)
gathering and precomputation of all information required as input by SAD function
calling of the SAD function, passing all required input information (via bridge functions)
replacement and/or modification of all original Refmac5 subroutines requiring different treatment with SAD function
harvesting of input keywords specific for SAD refinement
all original tasks performed by these files