Progress report on Crank: Experimental phasing
Biophysical Structural Chemistry, Leiden University, The Netherlands
Crank developments available in CCP4 6.1
• “Greatly enhanced”: better tested
• Underlying programs haven’t changed (much), but Crank has been almost completely rewritten from the version in CCP4 6.0.2
• Better ccp4i interface
• Support for more programs (PIRATE, BUCCANEER, RESOLVE, COOT)
• Faster substructure detection
– Using BP3 to (quickly) check trials, and looking at deviations between different CRUNCH2 trials, significantly decreases the time required for successful substructure detection.
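One way to quantify "deviations between different CRUNCH2 trials" is to count how many sites two trial substructures share once the origin and hand ambiguities are allowed for: if independent trials keep converging on the same sites, the solution is probably right and further trials can be skipped. The sketch below is hypothetical and is not Crank's or BP3's actual procedure; the function names, the 1 Å tolerance, and the restriction to space group P1 with an orthogonal cell (so that any translation is an allowed origin shift) are all assumptions.

```python
import numpy as np

def periodic_distances(frac_a, frac_b, cell):
    """Distances (Å) between every site in frac_a and every site in frac_b.

    Assumption: orthogonal cell, so the minimum image can be taken in
    fractional coordinates and then scaled by the cell edges."""
    diff = frac_a[:, None, :] - frac_b[None, :, :]
    diff -= np.round(diff)  # wrap each component into [-0.5, 0.5]
    return np.linalg.norm(diff * cell, axis=-1)

def shared_sites(sites_a, sites_b, cell, tol=1.0):
    """Largest number of sites of A lying within tol Å of some site of B,
    over both hands and all P1 origin shifts mapping a site of A onto a
    site of B. No one-to-one assignment is enforced (sketch only)."""
    best = 0
    for hand in (1.0, -1.0):  # original hand and its enantiomorph
        a = hand * sites_a
        for i in range(len(a)):
            for j in range(len(sites_b)):
                shifted = a + (sites_b[j] - a[i])  # put site i of A on site j of B
                d = periodic_distances(shifted, sites_b, cell)
                best = max(best, int(np.count_nonzero(d.min(axis=1) < tol)))
    return best
```

A job could, for example, stop launching trials once two of them share most of their sites; the exact threshold would be a tuning choice, not something given on the slide.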
Speeding up CRUNCH2: results showing improvement

Protein              Resol. (Å)  Anom. atoms  Exp.        Time old (min)  Time new (min)
subtilisin           1.77        3 Ca         SAD         28.91           6.33
Carboxyl proteinase  1.8         9 Br         SAD-peak    247.99          19.19
gere                 2.75        12 Se        MAD p/i/h   5.77            1.42
cyanase              2.41        40 Se        MAD p/i/h   31.72           8.18
thioesterase         1.81        20 Br        SAD-peak    6.32            1.95
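The size of the improvement is easier to see as a speed-up factor (old time divided by new time). The snippet below only does this arithmetic on the timings in the table; the helper name is illustrative.

```python
# Timings in minutes copied from the table above: (old, new).
timings = {
    "subtilisin": (28.91, 6.33),
    "Carboxyl proteinase": (247.99, 19.19),
    "gere": (5.77, 1.42),
    "cyanase": (31.72, 8.18),
    "thioesterase": (6.32, 1.95),
}

def speedup(old, new):
    """Ratio of old to new run time; a value above 1 means the new version is faster."""
    return old / new

for name, (old, new) in timings.items():
    print(f"{name:20s} {speedup(old, new):5.1f}x")
```

This gives speed-ups of 4.6x, 12.9x, 4.1x, 3.9x and 3.2x respectively, with the largest gain for the slowest (Carboxyl proteinase) case.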
Preliminary substructure detection results from JCSG test cases
• 144 mostly MAD Se-Met data sets
• Defaults only: the only inputs were the number of Se-Met residues per monomer (the number of monomers was guessed), the MTZ files, and f′ and f″.
• Some data sets had f″ < 1 (solved by MR)
• Some data sets had X-PLOR files incorrectly labelled as MTZ files.
• DISCLAIMER: the first log files were produced and analyzed yesterday after dinner (until 4 a.m.).
AFRO/CRUNCH2 vs SHELXC/D (both run in Crank)

                 CRUNCH2  SHELXD
100 % found      104      72
0 % found        15       15
Input error      25       25
My input error   0        32
Total            144      144
Of the 79 jobs in common, CRUNCH2 was faster in 20 jobs, while SHELXD was faster in 59.
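The counts add up to 144 for each program, and a rough per-program success rate can be obtained by leaving out the jobs that failed because of input errors. This is only arithmetic on the table (the dictionary keys and helper name are illustrative), and, as the slides themselves point out, the comparison is not a fair one.

```python
# Outcome counts copied from the table above.
counts = {
    "CRUNCH2": {"100 % found": 104, "0 % found": 15, "input error": 25, "my input error": 0},
    "SHELXD": {"100 % found": 72, "0 % found": 15, "input error": 25, "my input error": 32},
}

def success_rate(c):
    """Fraction of jobs without any input error in which the full substructure was found."""
    ran = c["100 % found"] + c["0 % found"]
    return c["100 % found"] / ran

for program, c in counts.items():
    print(f"{program}: {sum(c.values())} jobs, "
          f"{success_rate(c):.1%} solved among jobs without input errors")

# Timing comparison over the 79 jobs both programs had in common.
print(f"CRUNCH2 faster in {20 / 79:.1%}, SHELXD faster in {59 / 79:.1%}")
```

This prints 87.4 % (104/119) for CRUNCH2 and 82.8 % (72/87) for SHELXD, and 25.3 % versus 74.7 % for the timing comparison.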
Comparison not fair
• The same algorithm for identifying a solution with BP3 can also be used in SHELXD.
• SHELXD uses much better FA values (i.e. it uses the MAD data; at the moment, AFRO just uses ΔF from the data set with the greatest anomalous signal).
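The current AFRO behaviour described above, taking ΔF from the single data set with the greatest anomalous signal, could be sketched as follows. Measuring "anomalous signal" as the mean of |ΔF|/σ(ΔF) is an assumption here, as are the function names and the input layout (a dict mapping a data set name to arrays of F+, F−, σ(F+) and σ(F−)); the example numbers are made up.

```python
import numpy as np

def anomalous_signal(f_plus, f_minus, sig_plus, sig_minus):
    """Mean |dF| / sigma(dF) over reflections (assumed measure of anomalous signal)."""
    delta = np.abs(f_plus - f_minus)
    sigma = np.sqrt(sig_plus**2 + sig_minus**2)  # independent errors on F+ and F-
    return float(np.mean(delta / sigma))

def pick_dataset(datasets):
    """Name of the data set with the largest anomalous signal."""
    return max(datasets, key=lambda name: anomalous_signal(*datasets[name]))

# Hypothetical two-wavelength example: (F+, F-, sigma(F+), sigma(F-)).
datasets = {
    "peak": (np.array([100.0, 200.0]), np.array([90.0, 185.0]), np.ones(2), np.ones(2)),
    "remote": (np.array([100.0, 200.0]), np.array([98.0, 197.0]), np.ones(2), np.ones(2)),
}
print(pick_dataset(datasets))
```

Using only one wavelength in this way discards the information in the other MAD data sets, which is the point made about SHELXD's better FA values.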
Improving FA values
• An early step in solving a structure by SAD/MAD or SIRAS is to determine FA values.
• FA is the structure factor amplitude of the substructure; FA values are the input to direct methods and/or Patterson programs (e.g. SHELXD or CRUNCH2).
Current FA estimation
• In most programs, FA is currently estimated by ||F+| - |F-|| for SAD data.
• Direct method programs are very sensitive to FA values.
• Improving the estimates can improve the hit rates of direct methods and solve substructures that could not previously be solved.
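Why ||F+| - |F-|| is only a rough estimate: write the anomalous contribution as iF″ with F″ = d·exp(iφA) (d proportional to |FA| for a single type of anomalous scatterer) and the rest of the structure factor as F_N = N·exp(iφN). Then |F+|² - |F-|² = 4Nd·sin(φN - φA), so |F+| - |F-| ≈ 2d·sin(φN - φA) when N ≫ d. The estimate therefore depends on an unknown phase difference and vanishes when the two phases coincide. The symbols N, d, φN and φA are introduced here for illustration; the sketch checks the relation numerically.

```python
import numpy as np

def bijvoet_amplitudes(n_amp, phi_n, d_amp, phi_a):
    """|F+| and |F-| for F+ = F_N + iF'' and (F-)* = F_N - iF''."""
    f_n = n_amp * np.exp(1j * phi_n)    # everything except the f'' contribution
    f_dd = d_amp * np.exp(1j * phi_a)   # F'', phase follows the substructure
    return abs(f_n + 1j * f_dd), abs(f_n - 1j * f_dd)

def fa_delta_f(f_plus, f_minus):
    """Conventional SAD estimate of FA (up to a scale factor): ||F+| - |F-||."""
    return abs(f_plus - f_minus)

# Same substructure contribution (d = 2), different phase differences.
for dphi_deg in (90, 30, 0):
    fp, fm = bijvoet_amplitudes(100.0, np.radians(dphi_deg), 2.0, 0.0)
    print(dphi_deg, round(float(fa_delta_f(fp, fm)), 2))
```

The output is 4.0, 2.0 and 0.0: an identical substructure contribution yields very different FA estimates, which is what a multivariate estimate tries to improve on.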
Multivariate SAD equation
$$E(|F_A|, |F^+|, |F^-|) = \iiiint |F_A|\, P(|F_A|, \alpha_A \mid |F^+|, \alpha^+, |F^-|, \alpha^-)\, d|F_A|\, d\alpha_A\, d\alpha^+\, d\alpha^-$$
• Giacovazzo previously proposed multivariate FA estimation, with an implementation assuming Bijvoet phases are equal.
• An equation can be obtained without the equal-phase assumption, requiring only one numerical integration.
• The equation has been implemented; it reduces to Giacovazzo’s equation if the Bijvoet phases are equal.
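The expectation above is an integral of |FA| weighted by a probability density. As a toy illustration of evaluating such an expectation with a single 1-D numerical integration, the sketch below uses the acentric Wilson distribution P(|F|) = (2|F|/Σ)·exp(-|F|²/Σ) as a stand-in density. It is not the multivariate SAD distribution from the slide; it is chosen because its mean, √(πΣ)/2, is known in closed form and can be used to check the quadrature. Function names are hypothetical.

```python
import numpy as np

def acentric_wilson_pdf(f, sigma):
    """Acentric Wilson distribution P(|F|) = (2|F|/Sigma) exp(-|F|^2/Sigma)."""
    return 2.0 * f / sigma * np.exp(-f**2 / sigma)

def expected_amplitude(pdf, f_max, n_points=20001, **params):
    """E(|F|) = integral of |F| P(|F|) d|F| over [0, f_max], trapezoidal rule."""
    f = np.linspace(0.0, f_max, n_points)
    y = f * pdf(f, **params)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(f)))

estimate = expected_amplitude(acentric_wilson_pdf, f_max=10.0, sigma=1.0)
exact = np.sqrt(np.pi * 1.0) / 2.0  # closed form for the acentric case
print(estimate, exact)
```

The upper limit only needs to be large enough for the density to have decayed; with Σ = 1, exp(-100) at |F| = 10 is negligible.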
Covariance matrix properties
• The covariance matrix takes into account the experimental sigmas and the correlations between F+, F- and FA.
• Problem: the covariance matrix also depends on the (overall) substructure occupancy and B-factor.
• Solution: obtain a multivariate likelihood estimate for the unknown parameters.
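To see why the occupancy and B-factor matter, note that in a Wilson-type model the expected anomalous intensity of the substructure at resolution d is n·(q·f″)²·exp(-2B(sinθ/λ)²) = n·(q·f″)²·exp(-B/(2d²)) for n sites with common occupancy q and isotropic B-factor B. A term of this kind would enter the FA-related elements of the covariance matrix alongside the experimental sigmas. The helper below is a hypothetical, simplified parametrization, not the one used in Crank.

```python
import math

def substructure_anomalous_variance(n_sites, occupancy, f_double_prime, b_factor, d):
    """Expected |F''_A|^2 (electrons^2) at resolution d (Å) for n_sites identical
    anomalous scatterers with a common occupancy and isotropic B-factor (Å^2)."""
    s = 1.0 / (2.0 * d)  # sin(theta)/lambda
    return n_sites * (occupancy * f_double_prime) ** 2 * math.exp(-2.0 * b_factor * s * s)

# Hypothetical example: 10 sites, f'' = 4 e, B = 20 Å^2, at 2 Å resolution.
print(substructure_anomalous_variance(10, 1.0, 4.0, 20.0, 2.0))
print(substructure_anomalous_variance(10, 0.5, 4.0, 20.0, 2.0))
```

Halving the occupancy divides the term by four, and a larger B-factor suppresses it more strongly at high resolution, so neither parameter can be ignored when the covariance matrix is built.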
Refining overall substructure parameters
• An initial guess of the number of substructure atoms per monomer is obtained from the user.
• An initial guess of the B-factor is obtained from a likelihood estimate of the overall B-factor of the data set.
• Result: the refinement is stable and maximizes the correlation with the final calculated E’s.
• Another possible application: use the refined overall occupancy and B-factor to estimate the anomalous signal.
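A toy version of refining overall parameters by likelihood: simulate acentric amplitudes whose expected intensity follows Σ(d) = k·exp(-B/(2d²)), then recover the scale k (standing in for the number of sites times occupancy squared) and B by maximizing the acentric Wilson likelihood. For fixed B the best k has the closed form k = mean(F²/g) with g = exp(-B/(2d²)), so only B needs a 1-D grid search. The model, the simulated data, the grid and the function names are all assumptions for illustration; the slides describe a multivariate likelihood instead.

```python
import numpy as np

def neg_log_likelihood(f_obs, inv_d2, k, b):
    """Acentric Wilson -log L (constants dropped) with Sigma = k exp(-B/(2 d^2))."""
    sigma = k * np.exp(-0.5 * b * inv_d2)
    return float(np.sum(np.log(sigma) + f_obs**2 / sigma))

def refine_scale_and_b(f_obs, inv_d2, b_grid):
    """Profile likelihood: closed-form k for each trial B, best B by grid search."""
    best = None
    for b in b_grid:
        g = np.exp(-0.5 * b * inv_d2)
        k = float(np.mean(f_obs**2 / g))  # maximizes the likelihood for this B
        nll = neg_log_likelihood(f_obs, inv_d2, k, b)
        if best is None or nll < best[0]:
            best = (nll, k, float(b))
    return best[1], best[2]

# Simulated (hypothetical) data: 20000 reflections between 10 and 1.5 Å.
rng = np.random.default_rng(0)
inv_d2 = rng.uniform(1.0 / 10.0**2, 1.0 / 1.5**2, size=20000)
true_k, true_b = 2.0, 20.0
sigma_true = true_k * np.exp(-0.5 * true_b * inv_d2)
f_obs = np.sqrt(rng.exponential(sigma_true))  # acentric: |F|^2 is exponential

k_hat, b_hat = refine_scale_and_b(f_obs, inv_d2, np.arange(0.0, 60.0, 0.1))
print(f"k = {k_hat:.2f}, B = {b_hat:.1f}")
```

Profiling out k keeps the search one-dimensional; a real implementation would more likely use a gradient-based optimizer over all parameters.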
Test cases: Correlations with final calculated E’s
Data set      Resol. (Å)  Anom. atom  f″ (e)  Corr ΔE  Corr E_multi
Ferredoxin    0.94        Fe          1.25    0.252    0.338
Thioesterase  2.5         Se          5.3     0.529    0.549
Lyso 180      1.64        S           0.56    0.324    0.348
Lyso 135      1.64        S           0.56    0.262    0.319
DNA 360       1.5         P           0.43    0.517    0.540
DNA 90        1.5         P           0.43    0.422    0.478
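The multivariate estimate gives a higher correlation with the final E’s in every test case. The gain per case can be tabulated with simple arithmetic on the table; the dictionary layout is illustrative.

```python
# Correlations copied from the table above: (Corr with Delta-E, Corr with E_multi).
results = {
    "Ferredoxin": (0.252, 0.338),
    "Thioesterase": (0.529, 0.549),
    "Lyso 180": (0.324, 0.348),
    "Lyso 135": (0.262, 0.319),
    "DNA 360": (0.517, 0.540),
    "DNA 90": (0.422, 0.478),
}

gains = {name: round(multi - delta, 3) for name, (delta, multi) in results.items()}
for name, gain in gains.items():
    print(f"{name:13s} +{gain:.3f}")
```

The gains range from +0.020 (Thioesterase) to +0.086 (Ferredoxin).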