about the basic postulate of the structural …about the basic postulate of the structural...

About the basic postulate of the structural crystallography and

the phase problem

Carmelo Giacovazzo Istituto di Cristallografia, CNR,

Bari, Italy

[email protected]

Macromolecular Crystallography School Madrid 2017

Let us answer the following questions:

crystal structure ⇒ ?

⇒ crystal structure ?

As a consequence :

{ }2 F

{ }2 F

)]( 2exp[

)](2exp[

)2exp()2exp(

11

2

1,

11

2

∑∑

∑

∑∑

=≠=

=

==

−+=

−=

−=

N

jijiji

N

jj

N

jijiji

N

jjj

N

jjj

ifff

iff

ififF

rrh

rrh

hrhrh

π

π

ππ

{ } ( )rρ⇔2F

If we accept that only one structure exists with a given set of interatomic vectors, then the set of

interatomic vectors

Then only one structure model should exist, with calculated structure factor moduli fitting the set

|Fobs|. That is , with sufficiently small

{ } { } ( )rrr ρ⇔⇔− 2) Fji

∑∑ −

=h

h||

|||||

obscalcobs

cryst FFFR

BUT, is it possible construct

different electron density maps with the same |F|?

If that is possible , we should have different model with the same

amplitudes!!!!!

The basic postulate of the structural crystallography Only one chemically sound crystal structure exists compatible with the experimental diffraction data. This is a postulate and chemically sound is a necessary attribute.

It is on the basis of this postulate that we can accept that the about

one million of structures deposited in the DataBanks are

really and correctly solved.

Structures deposited in the CSDB up to 01/01/2012 For each range of Rcryst , Nstr and % are the number

of structures and percentage, respectively

ΔRcryst Nstr % 0.01-0.03 62774 10.5 0.03-0.04 122706 20.6 0.04-0.05 135525 22.7 0.05-0.07 163269 27.4 0.07-0.09 60651 10.2 0.09-0.10 13370 2.2 0.10- 0.15 18353 3.1 0.15- …. 3835 0.6

For the practical daily work the basic postulate of the structural crystallography transforms into:

in a diffraction experiment the phase

information is not lost, it is only hidden in the diffraction amplitudes.

Accordingly, any phasing approach is nothing else but a

method for recovering the hidden phases from the set of diffraction amplitudes.

Is the amount of information stored in the diffraction amplitudes always sufficient to

define the structure?

Since the data resolution defines the quantity of information provided by the diffraction experiment, we try to understand how this

quantity varies with the resolution.

Crystal with P1 symmetry. N= number of non-H atoms in the unit cell.

The diffraction experiment provides data up to dmin .

Nref= number of measurable reflections

Here V=k N , where k is usually between 15.5 and 18.5 for a small or medium size molecule ;

for a protein, owing to the presence of the solvent, k may be significantly larger, up to about 40 and more.

kNd

VV

N measmeas

ref 3min3

4π=Φ=

Φ= ∗

∗

∗

Qinf=number of measured symmetry independent reflections/number of structural

parameters Since 4N= number of structural parameters

necessary for defining the structure, then

By including the Friedel Law k

dQ 3

mininf

6π

≈

kdN

kNdN

NQ ref

3min

3min

inf34

13

44

ππ=⋅==

Qinf in P1 some values of dmin and of k.

dmin k=17 k=25 k=35

0.4 139 204 286

0.6 41 60.6 84.7 0.8 17.4 25.6 35.7

1.0 8.9 13.1 18.3

1.4 3.2 4.8 13.7 1.8 1.5 2.2 3.1 2.2 0.8 1.2 1.7 2.5 0.6 0.8 1.2 3.0 0.3 0.5 0.7 3.5 0.2 0.3 0.4

4.0 0.1 0.2 0.3

Ab initio small molecules

• Tests were made on 35 mostly organic structures

• ranging from Nasym=12 to Nasym =200. • Four protocols were used: • Prot. 1: standard DM; • Prot. 2 : standard DM+PSI-0 triplets; • Prot.3: standard DM+ PSI-0 triplets +

resolution bias correction . • Prot. 4: standard DM+ PSI-0 triplets +

resolution bias correction + SFE..

We positively answered the two basic questions:

crystal structure ⇒ ?

⇒ crystal structure ?

Two other questions are however necessary for starting the phasing approach

{ }2 F

{ }2 F

The third question: structure

{ }?ϕ⇔

( ) ( )

( ) ( )

( )( ) ( )00

'

'0

'

10

'0

11

2exp2exp

2exp

2exp2exp

)(2exp2exp

hXhXhX

hrhX

rXhhr

hhh

h

h

πϕππ

ππ

ππ

−=−=→

=

=

+==

∑

∑∑

=

==

h

j

N

jj

j

N

jjj

N

jj

iFiFFFi

ifi

ififF

O

O’

rj’ rj

The fourth basic question

How can we derive the phases from the diffraction moduli ? This seems contradictory: indeed

The phase values depend on the origin chosen by the

user, the moduli are independent of the user .

The moduli are structure invariants ,

the phases are not structure invariants.

Evidently, from the moduli we can derive information only on those combinations of phases ( if they exist)

which are structure invariants.

The simplest invariant : the triplet invariant Use the relation F’h = Fh exp ( -2πihX0) to check that the invariant Fh FkF-h-k does not depend on the origin. The sum (ϕh + ϕ k+ ϕ-h-k ) is called triplet phase invariant

( )

khkh

kh

kh'

kh'kh

XkhkXX

−−

−−

−−

=+

−−=

FFFiF

iFiFFFF

])(2exp[

2exp)2exp(

0

00'

πππ h

.

Structure invariants

Any invariant satisfies the condition that the sum of the indices is zero:

doublet invariant : Fh F-h = | Fh|2

triplet invariant : Fh Fk F-h-k

quartet invariant :Fh Fk Fl F-h-k-l

quintet invariant : Fh Fk Fl Fm F-h-k-l-m

……………….

Additional structure invariants when a model is availvable

Doublet invariant : Fh F-ph = | Fh F-ph |exp(iϕh-ϕph) very important in edm techniques

triplet invariants : Fh Fk F-h+k, Fh Fk F-ph+k Fh Fpk F-h+k , Fph Fk F-h+k

Fph Fpk F-h-k , Fph Fk F-ph+k Fh Fpk F-ph+k , Fph Fpk F-ph+k …………………………………………………………….

The prior information we can use for deriving the phase estimates may be so summarised:

1) atomicity: the electron density is concentrated in atoms:

2) positivity of the electron density: ρ( r ) > 0 ⇒ f > 0

3) uniform distribution of the atoms in the unit cell.

( ) ( )∑=

−=N

jjaj

1

rrr ρρ

r1 r2

ρa1

ρa2

The Wilson statistics • Under the above conditions Wilson ( 1942,1949)

derived the structure factor statistics. The main results where:

• (1) • Eq.(1) is : • a) resolution dependent (fj varies with θ ), • b) temperature dependent: • From eq.(1) the concept of normalized structure factor

arises:

∑ =>=< Nj jfF 1

22|| h

)/sinexp( 220 λθjjj Bff −=

2/11

2)/(∑ == Nj jfFE hh

The Wilson Statistics • |E|-distributions:

and in both the cases. The statistics may be used to evaluate the average themel factor and the absolute scale factor.

)||exp(||2|)(| 21 EEEP −=

)2/||exp(2|)(| 21 EEP −=

π

1|| 2>=< E

∑∑

−==

=jjj

N

jjj iBfifF hrhr

plot Wilson The

h πλθπ 2expsinexp 2exp

2

20

1

A ∑

− jj ifB hrπ

λθ 2expsinexp 0

2

2

s2 Fh0

( )22022 2exp BsFKFKFhhobsh −==

A

( ) ( )202202 2exp 2exp BsKBsFKF shobsh −Σ=−><>=<

20

2

2lnln BsKF

s

obsh −=

Σ

<

y x

Partition of a two-dimensional lattice in resolution shells, to approximately satisfy the following conditions: a) the variations of f in each shell is negligible; b) the number of lattice points in each shell is about constant

Typical < R2> curve versus d (Å) for proteins ( ______ ) Typical < R2> curve versus d (Å) for nucleic acid structures (----------).

The tangent formula A reflection can enter into several triplets.Accordingly ϕh ≈ ϕk1 + ϕh-k1 = θ1 with P1(ϕh) ∝ G1 = 2| Eh Ek1 Eh-k1 |/N1/2

ϕh ≈ ϕk2 + ϕh-k2 = θ2 with P2(ϕh) ∝ G2 = 2| Eh Ek2 Eh-k2 |/N1/2 ……………………………………………………………………………………………………….

ϕh ≈ ϕkn + ϕh-kn = θn with Pn(ϕh) ∝ Gn = 2| Eh Ekn Eh-kn |/N1/2 Then P(ϕh) ≈ ∏j Pj(ϕh) ≈ L-1 ∏j exp [Gj cos (ϕh - θj )] = L-1 exp [α cos (ϕh - θh )] where

( ) 2/122 ,cossin

tan BTBT

GG

jj

jj +===∑∑

hh αθθ

θ

A geometric interpretation of α

The random starting approach To apply the tangent formula we need to know one or more pairs ( ϕk + ϕh-k ). Where to find such an information? The most simple approach is the random starting approach. Random phases are associated to a chosen set of reflections. The tangent formula should drive these phases to the correct values. The procedure is cyclic ( up to convergence). How to recognize the correct solution? Figures of merit can or cannot be applied

Tangent cycles

• φ1 φ’1 φ’’1 ……………. φc1

• φ2 φ’2 φ’’2 ……………. φc

2

• φ3 φ’3 φ’’3 …………….. φc3

• ……………………………………………………………………..

• φn φ’n φ’’n………………. φcn

•

Ab initio Standard Direct Methods ( SDM)

• As usual , random phases are assigned to the

NLARGE reflections ( those with the largest normalized structure factor amplitudes), and then a tangent formula is applied.

• Phase extension is made by edm-VLD-edm. • A suitable FOM recognizes the correct solution.

Direct Methods • Traditional DM are based on the H-K tangent

formula

h

h

khkkhkkhk

khkkhkkhkh B

TEEww

EEww

j jjjjjj

j jjjjjj =+

+=∑

∑

−−−

−−−

)cos(||

)sin(||tan

φφ

φφφ

2/1222/1 )(||2 hhhh BTENeq += −α

MDM approach • Each trial solution is explored as potential correct solution.

• That however requires a much higher efficiency of the

phasing approach, otherwise cpu time is horribly wasted. • • The folllowing modifications have been introduced: • 1) for triplet invariants the P10 instead of the Cochran

formula is used; • 2) weighting scheme of the tangent formula has been

deeply changed; • 3) psi-0 triplets and negative quartets are actively used ; • 4) misplaced models are translated in the correct positions

via RELAX procedure

• 100 small size structures • SDM VLD MDM • Nr.of solved all all all • <t>sec 14 14 7 • 85 medium size structures • Nr. of solved 80 83 all • <t>min 3.9 10 7 • 35 Proteins ( atom. Resol.) • Nr of solved 16 22 25 • <t>h 0.5 0.5 3.5 (0.4)

about the basic postulate of the structural …about the basic postulate of the structural...

Documents