about the basic postulate of the structural …about the basic postulate of the structural...
TRANSCRIPT
About the basic postulate of the structural crystallography and
the phase problem
Carmelo Giacovazzo Istituto di Cristallografia, CNR,
Bari, Italy
Macromolecular Crystallography School Madrid 2017
Let us answer the following questions:
crystal structure ⇒ ?
⇒ crystal structure ?
As a consequence :
{ }2 F
{ }2 F
)]( 2exp[
)](2exp[
)2exp()2exp(
11
2
1,
11
2
∑∑
∑
∑∑
=≠=
=
==
−+=
−=
−=
N
jijiji
N
jj
N
jijiji
N
jjj
N
jjj
ifff
iff
ififF
rrh
rrh
hrhrh
π
π
ππ
{ } ( )rρ⇔2F
If we accept that only one structure exists with a given set of interatomic vectors, then the set of
interatomic vectors
Then only one structure model should exist, with calculated structure factor moduli fitting the set
|Fobs|. That is , with sufficiently small
{ } { } ( )rrr ρ⇔⇔− 2) Fji
∑∑ −
=h
h||
|||||
obscalcobs
cryst FFFR
BUT, is it possible construct
different electron density maps with the same |F|?
If that is possible , we should have different model with the same
amplitudes!!!!!
The basic postulate of the structural crystallography Only one chemically sound crystal structure exists compatible with the experimental diffraction data. This is a postulate and chemically sound is a necessary attribute.
It is on the basis of this postulate that we can accept that the about
one million of structures deposited in the DataBanks are
really and correctly solved.
Structures deposited in the CSDB up to 01/01/2012 For each range of Rcryst , Nstr and % are the number
of structures and percentage, respectively
ΔRcryst Nstr % 0.01-0.03 62774 10.5 0.03-0.04 122706 20.6 0.04-0.05 135525 22.7 0.05-0.07 163269 27.4 0.07-0.09 60651 10.2 0.09-0.10 13370 2.2 0.10- 0.15 18353 3.1 0.15- …. 3835 0.6
For the practical daily work the basic postulate of the structural crystallography transforms into:
in a diffraction experiment the phase
information is not lost, it is only hidden in the diffraction amplitudes.
Accordingly, any phasing approach is nothing else but a
method for recovering the hidden phases from the set of diffraction amplitudes.
Is the amount of information stored in the diffraction amplitudes always sufficient to
define the structure?
Since the data resolution defines the quantity of information provided by the diffraction experiment, we try to understand how this
quantity varies with the resolution.
Crystal with P1 symmetry. N= number of non-H atoms in the unit cell.
The diffraction experiment provides data up to dmin .
Nref= number of measurable reflections
Here V=k N , where k is usually between 15.5 and 18.5 for a small or medium size molecule ;
for a protein, owing to the presence of the solvent, k may be significantly larger, up to about 40 and more.
kNd
VV
N measmeas
ref 3min3
4π=Φ=
Φ= ∗
∗
∗
Qinf=number of measured symmetry independent reflections/number of structural
parameters Since 4N= number of structural parameters
necessary for defining the structure, then
By including the Friedel Law k
dQ 3
mininf
6π
≈
kdN
kNdN
NQ ref
3min
3min
inf34
13
44
ππ=⋅==
Qinf in P1 some values of dmin and of k.
dmin k=17 k=25 k=35
0.4 139 204 286
0.6 41 60.6 84.7 0.8 17.4 25.6 35.7
1.0 8.9 13.1 18.3
1.4 3.2 4.8 13.7 1.8 1.5 2.2 3.1 2.2 0.8 1.2 1.7 2.5 0.6 0.8 1.2 3.0 0.3 0.5 0.7 3.5 0.2 0.3 0.4
4.0 0.1 0.2 0.3
Ab initio small molecules
• Tests were made on 35 mostly organic structures
• ranging from Nasym=12 to Nasym =200. • Four protocols were used: • Prot. 1: standard DM; • Prot. 2 : standard DM+PSI-0 triplets; • Prot.3: standard DM+ PSI-0 triplets +
resolution bias correction . • Prot. 4: standard DM+ PSI-0 triplets +
resolution bias correction + SFE..
We positively answered the two basic questions:
crystal structure ⇒ ?
⇒ crystal structure ?
Two other questions are however necessary for starting the phasing approach
{ }2 F
{ }2 F
The third question: structure
{ }?ϕ⇔
( ) ( )
( ) ( )
( )( ) ( )00
'
'0
'
10
'0
11
2exp2exp
2exp
2exp2exp
)(2exp2exp
hXhXhX
hrhX
rXhhr
hhh
h
h
πϕππ
ππ
ππ
−=−=→
=
=
+==
∑
∑∑
=
==
h
j
N
jj
j
N
jjj
N
jj
iFiFFFi
ifi
ififF
O
O’
rj’ rj
The fourth basic question
How can we derive the phases from the diffraction moduli ? This seems contradictory: indeed
The phase values depend on the origin chosen by the
user, the moduli are independent of the user .
The moduli are structure invariants ,
the phases are not structure invariants.
Evidently, from the moduli we can derive information only on those combinations of phases ( if they exist)
which are structure invariants.
The simplest invariant : the triplet invariant Use the relation F’h = Fh exp ( -2πihX0) to check that the invariant Fh FkF-h-k does not depend on the origin. The sum (ϕh + ϕ k+ ϕ-h-k ) is called triplet phase invariant
( )
khkh
kh
kh'
kh'kh
XkhkXX
−−
−−
−−
=+
−−=
FFFiF
iFiFFFF
])(2exp[
2exp)2exp(
0
00'
πππ h
.
Structure invariants
Any invariant satisfies the condition that the sum of the indices is zero:
doublet invariant : Fh F-h = | Fh|2
triplet invariant : Fh Fk F-h-k
quartet invariant :Fh Fk Fl F-h-k-l
quintet invariant : Fh Fk Fl Fm F-h-k-l-m
……………….
Additional structure invariants when a model is availvable
Doublet invariant : Fh F-ph = | Fh F-ph |exp(iϕh-ϕph) very important in edm techniques
triplet invariants : Fh Fk F-h+k, Fh Fk F-ph+k Fh Fpk F-h+k , Fph Fk F-h+k
Fph Fpk F-h-k , Fph Fk F-ph+k Fh Fpk F-ph+k , Fph Fpk F-ph+k …………………………………………………………….
The prior information we can use for deriving the phase estimates may be so summarised:
1) atomicity: the electron density is concentrated in atoms:
2) positivity of the electron density: ρ( r ) > 0 ⇒ f > 0
3) uniform distribution of the atoms in the unit cell.
( ) ( )∑=
−=N
jjaj
1
rrr ρρ
r1 r2
ρa1
ρa2
The Wilson statistics • Under the above conditions Wilson ( 1942,1949)
derived the structure factor statistics. The main results where:
• (1) • Eq.(1) is : • a) resolution dependent (fj varies with θ ), • b) temperature dependent: • From eq.(1) the concept of normalized structure factor
arises:
∑ =>=< Nj jfF 1
22|| h
)/sinexp( 220 λθjjj Bff −=
2/11
2)/(∑ == Nj jfFE hh
The Wilson Statistics • |E|-distributions:
and in both the cases. The statistics may be used to evaluate the average themel factor and the absolute scale factor.
)||exp(||2|)(| 21 EEEP −=
)2/||exp(2|)(| 21 EEP −=
π
1|| 2>=< E
∑∑
−==
=jjj
N
jjj iBfifF hrhr
plot Wilson The
h πλθπ 2expsinexp 2exp
2
20
1
A ∑
− jj ifB hrπ
λθ 2expsinexp 0
2
2
s2 Fh0
( )22022 2exp BsFKFKFhhobsh −==
A
( ) ( )202202 2exp 2exp BsKBsFKF shobsh −Σ=−><>=<
20
2
2lnln BsKF
s
obsh −=
Σ
<
y x
Partition of a two-dimensional lattice in resolution shells, to approximately satisfy the following conditions: a) the variations of f in each shell is negligible; b) the number of lattice points in each shell is about constant
Typical < R2> curve versus d (Å) for proteins ( ______ ) Typical < R2> curve versus d (Å) for nucleic acid structures (----------).
The Cochran formula
Φh,k =ϕh + ϕk + ϕ-h-k = ϕh + ϕk - ϕh+k P(Φhk) ≈[2π I0]-1exp(G cos Φhk) where G = 2 | Eh Ek Eh+k |/N1/2 Accordingly:
ϕh + ϕk - ϕh+k ≈ 0 ∝ G = 2 | Eh Ek Eh+k |/N1/2 ϕh - ϕk - ϕh-k ≈ 0 ∝ G = 2 | Eh Ek Eh-k |/N1/2 ϕh ≈ ϕk + ϕh-k ∝ G = 2 | Eh Ek Eh-k |/N1/2
The tangent formula A reflection can enter into several triplets.Accordingly ϕh ≈ ϕk1 + ϕh-k1 = θ1 with P1(ϕh) ∝ G1 = 2| Eh Ek1 Eh-k1 |/N1/2
ϕh ≈ ϕk2 + ϕh-k2 = θ2 with P2(ϕh) ∝ G2 = 2| Eh Ek2 Eh-k2 |/N1/2 ……………………………………………………………………………………………………….
ϕh ≈ ϕkn + ϕh-kn = θn with Pn(ϕh) ∝ Gn = 2| Eh Ekn Eh-kn |/N1/2 Then P(ϕh) ≈ ∏j Pj(ϕh) ≈ L-1 ∏j exp [Gj cos (ϕh - θj )] = L-1 exp [α cos (ϕh - θh )] where
( ) 2/122 ,cossin
tan BTBT
GG
jj
jj +===∑∑
hh αθθ
θ
A geometric interpretation of α
The random starting approach To apply the tangent formula we need to know one or more pairs ( ϕk + ϕh-k ). Where to find such an information? The most simple approach is the random starting approach. Random phases are associated to a chosen set of reflections. The tangent formula should drive these phases to the correct values. The procedure is cyclic ( up to convergence). How to recognize the correct solution? Figures of merit can or cannot be applied
Tangent cycles
• φ1 φ’1 φ’’1 ……………. φc1
• φ2 φ’2 φ’’2 ……………. φc
2
• φ3 φ’3 φ’’3 …………….. φc3
• ……………………………………………………………………..
• φn φ’n φ’’n………………. φcn
•
Ab initio Standard Direct Methods ( SDM)
• As usual , random phases are assigned to the
NLARGE reflections ( those with the largest normalized structure factor amplitudes), and then a tangent formula is applied.
• Phase extension is made by edm-VLD-edm. • A suitable FOM recognizes the correct solution.
Direct Methods • Traditional DM are based on the H-K tangent
formula
h
h
khkkhkkhk
khkkhkkhkh B
TEEww
EEww
j jjjjjj
j jjjjjj =+
+=∑
∑
−−−
−−−
)cos(||
)sin(||tan
φφ
φφφ
2/1222/1 )(||2 hhhh BTENeq += −α
MDM approach • Each trial solution is explored as potential correct solution.
• That however requires a much higher efficiency of the
phasing approach, otherwise cpu time is horribly wasted. • • The folllowing modifications have been introduced: • 1) for triplet invariants the P10 instead of the Cochran
formula is used; • 2) weighting scheme of the tangent formula has been
deeply changed; • 3) psi-0 triplets and negative quartets are actively used ; • 4) misplaced models are translated in the correct positions
via RELAX procedure
• 100 small size structures • SDM VLD MDM • Nr.of solved all all all • <t>sec 14 14 7 • 85 medium size structures • Nr. of solved 80 83 all • <t>min 3.9 10 7 • 35 Proteins ( atom. Resol.) • Nr of solved 16 22 25 • <t>h 0.5 0.5 3.5 (0.4)