[methods in enzymology] macromolecular crystallography part b volume 277 || [3] noncrystallographic...



Page 1: [Methods in Enzymology] Macromolecular Crystallography Part B Volume 277 || [3] Noncrystallographic symmetry averaging in phase refinement and extension

18 PHASES [31

The treatment just given uses probabilistic model building (as discussed in Bricogne 1) through the device of maximum-entropy placement distributions for these fragments. Besides the phase hypotheses previously used in conjunction with the random atom model (and which will now modulate these placement distributions, causing much more intense extrapolation and thus increasing the sensitivity of the LLG), we may now use a hierarchy of structural hypotheses that vary the types and numbers of known fragments to be included in the "soup" of random ingredients according to the knowledge accumulated in the field of macromolecular architecture (rotamers, building blocks, tertiary fold classifications, etc.).

Sequential Bayesian inference can be conducted as described in Bricogne, 1 using these new statistical criteria with built-in stereochemistry in place of the old ones, testing increasingly complex structural hypotheses (prompted by expert knowledge) against simpler ones in a quantitative and fully automatic way, and merging phase determination and map interpretation into a unique sequential decision process.

Outlook

The supremely Bayesian viewpoint just reached provides the natural conceptual foundation, as well as all the detailed numerical procedures, on which to build a genuine expert system for knowledge-based structure determination, capable of consulting all relevant structural information to compensate for the relative paucity of diffraction data that distinguishes the macromolecular from the "small moiety" situation. Structure-factor statistics with built-in stereochemistry constitute the "missing link" that had so far precluded the possibility of even contemplating ab initio phasing at typical macromolecular (i.e., well below atomic) resolutions.

[3] Noncrystallographic Symmetry Averaging in Phase Refinement and Extension

By F. M. D. VELLIEUX and RANDY J. READ

Introduction

A well-known method of noise reduction in image processing is the addition of multiple, independent images of the same object to generate an improved, averaged image. 1 Using this method, one can increase the

1 W. O. Saxton, "Computer Techniques for Image Processing in Electron Microscopy," p. 230. Academic Press, New York, 1978.

Copyright © 1997 by Academic Press All rights of reproduction in any form reserved.

METHODS IN ENZYMOLOGY, VOL. 277 0076-6879/97 $25.00


[3] NONCRYSTALLOGRAPHIC SYMMETRY AVERAGING 19

signal-to-noise (S/N) ratio by a factor of $N^{1/2}$ when N independent images are being averaged. 2 The technique has been used with great success in the processing of transmission electron microscopy images, where object recognition and alignment are performed before the averaging operation. 3
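The $N^{1/2}$ gain can be checked numerically. The following is a minimal sketch, assuming Gaussian noise of unit standard deviation on a signal taken as zero; the function name `averaged_noise_sd` and all parameter values are illustrative and do not come from the chapter.

```python
import random
import statistics

def averaged_noise_sd(n_images, n_pixels=5000, sigma=1.0, seed=0):
    """Standard deviation of the noise left after averaging n_images noisy copies."""
    rng = random.Random(seed)
    residuals = []
    for _ in range(n_pixels):
        # The true signal is taken as 0, so every sample is pure noise.
        samples = [rng.gauss(0.0, sigma) for _ in range(n_images)]
        residuals.append(sum(samples) / n_images)
    return statistics.pstdev(residuals)

for n in (1, 4, 16):
    sd = averaged_noise_sd(n)
    print(f"N = {n:2d}: noise sd = {sd:.3f}, expected about {1.0 / n ** 0.5:.3f}")
```

Because the signal is unchanged by averaging while the noise falls roughly as $1/N^{1/2}$, the S/N ratio rises by about $N^{1/2}$, provided the images are independent.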

A similar situation is encountered frequently in macromolecular crystallography, where the presence of multiple copies of the same object in the asymmetrical unit is known as "noncrystallographic symmetry." The potential uses of such redundancy in helping to generate improved electron-density maps have long been known. 4

Background and Definitions

Crystallographic symmetry is global and applies to the whole contents of the crystal. By contrast, noncrystallographic symmetry (n.c.s.) is local, and a given n.c.s. operation applies only to a single set of objects within the entire crystal. Thus, n.c.s. operations, which relate objects to each other, can be applied only locally in a coordinate system used to describe such objects, i.e., molecular models. Hence, a Cartesian coordinate system, most conveniently expressed in angstroms, is typically used for this purpose.

Two separate types of n.c.s. exist (Fig. 1): (1) proper symmetry, in which the set of n.c.s. operations forms a closed group so that they apply equally to all objects in the assembly. For example, with an n-fold rotation axis, the n.c.s. operation that superimposes object 1 onto object 2 is identical to that superimposing object i onto object i + 1 or object n onto object 1; and (2) improper "symmetry," in which the set of n.c.s. operations applying to object 1 (i.e., superimposing object 1 onto object 2, etc.) is different from that applying to object 2, to object 3, etc. Note that both types of n.c.s. may be encountered simultaneously within an asymmetrical unit, e.g., a dimer with proper twofold symmetry plus a monomer. The overall redundancy is threefold, but in practical terms it is a case of improper symmetry.

An n.c.s. operation may be expressed, in matrix notation, as

$$\mathbf{x}_n = [R_n]\,\mathbf{x}_1 + \mathbf{t}_n$$

where $[R_n]$ is a rotation matrix, and $\mathbf{t}_n$ is the associated translation component.
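As an illustration of the distinction between proper and improper operations, the sketch below applies $[R]\mathbf{x} + \mathbf{t}$ to a point in Cartesian angstroms. The axis direction, the coordinates, and the 7 angstrom screw translation are hypothetical values chosen for the example, and the helper names are not from the chapter.

```python
import math

def rotation_about_z(angle_deg):
    """3x3 rotation matrix for a rotation of angle_deg about the z axis."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_ncs(rot, trans, x):
    """Return [R] x + t for a 3x3 rotation matrix and 3-vectors (Cartesian angstroms)."""
    return [sum(rot[i][j] * x[j] for j in range(3)) + trans[i] for i in range(3)]

# Hypothetical proper threefold axis along z through the origin (t = 0).
R3 = rotation_about_z(120.0)
t3 = [0.0, 0.0, 0.0]
x1 = [10.0, 0.0, 5.0]
x2 = apply_ncs(R3, t3, x1)
x3 = apply_ncs(R3, t3, x2)
x_back = apply_ncs(R3, t3, x3)   # the third application returns to x1: closed group

# Hypothetical improper operation: 180 degree rotation plus a 7 angstrom shift along the axis.
R2 = rotation_about_z(180.0)
t2 = [0.0, 0.0, 7.0]
y2 = apply_ncs(R2, t2, x1)
y_twice = apply_ncs(R2, t2, y2)  # does not return to x1: displaced by 14 angstroms along z
```

In the proper case the same operator maps every object onto the next and eventually back onto the first. In the improper case repeated application never closes, which is why a separate operator is needed for each object.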

2 I. A. M. Kuo and R. M. Glaeser, Ultramicroscopy 1, 53 (1975).

3 M. van Heel, in "Crystallography in Molecular Biology" (D. Moras, J. Drenth, B. Strandberg, D. Suck, and K. Wilson, eds.), p. 89. NATO ASI Series. Plenum, New York, 1987.

4 H. Muirhead, J. M. Cox, L. Mazzarella, and M. F. Perutz, J. Mol. Biol. 28, 117 (1967).

FIG. 1. The two types of noncrystallographic symmetry. (a) Proper local symmetry: example of a threefold local symmetry. (b) Improper local symmetry: example of an improper twofold screw symmetry.

The sort of n.c.s. that can be useful for symmetry averaging may arise from (1) the fortuitous presence in the asymmetrical unit of multiple copies of the same object, these being identical molecules, subunits, or even domains, and (2) the existence of several crystal forms of the same molecule (polymorphism).

Reciprocal Space Analysis of Electron-Density Modification Procedures

Density modification originates from the notion that one might use prior knowledge about electron-density distributions as a source of phase information to improve current estimates of phases. This idea was introduced, in terms of noncrystallographic redundancy, by Rossmann and Blow, 5 who addressed the problem of the detection of identical subunits in the asymmetrical unit. Later, Main and Rossmann 6 expressed the constraints due to the presence of n.c.s. as equations relating structure factors in reciprocal space (these being referred to hereafter as the "molecular-replacement equations"). Main 7 then incorporated into these equations the additional constraint of having featureless solvent regions. Crowther 8,9 reformulated the molecular-replacement equations in a linear form. Bricogne proved the equivalence between the molecular-replacement equations and the procedure of real-space averaging of electron densities of the objects related by n.c.s., 10 and described some algorithms and computer programs required for this purpose. 11 After this, n.c.s. averaging rapidly became a useful tool not only to improve current phase estimates, but also to generate new phase estimates to a higher resolution than the current resolution limit (phase extension).

5 M. G. Rossmann and D. M. Blow, Acta Crystallogr. 15, 24 (1962).

6 P. Main and M. G. Rossmann, Acta Crystallogr. 21, 67 (1966).

By their simplicity, the molecular-replacement equations have the advantage of providing insight into the mechanism of phase improvement, and are therefore discussed below.

If a crystal is composed of N identical objects related by noncrystallographic symmetry, plus featureless solvent regions, its electron-density map will be unchanged when averaging and solvent flattening are carried out. Because of errors in both the initial experimental phases and the observed amplitudes, a map computed with these generally will not satisfy this constraint. However, an iterative procedure of modifying the density, back-transforming it, and computing a new map with the back-transformed phases will produce a set of phases that satisfies this constraint reasonably well. By setting $\rho(\mathbf{x}) = \rho_{\mathrm{avg}}(\mathbf{x})$, and taking the Fourier transform of each side, one obtains the molecular-replacement equations in reciprocal space. These equations express the n.c.s. constraint in reciprocal space and explain the propagation of phase information that leads to phase improvement and extension. For clarity, the derivation that follows is given for space group P1.

The equations are simpler if the (presumably constant) density in the solvent regions is zero. This can be arranged by subtracting $\rho_s$ from all density values in the entire map. The contrast between protein and solvent regions is unchanged, and the only change in the corresponding structure factors is in the $F_{000}$ term, which is reduced by $V\rho_s$. In the following, we use

$$\rho'(\mathbf{x}) = \rho(\mathbf{x}) - \rho_s = \rho'_{\mathrm{avg}}(\mathbf{x}) = \frac{1}{N}\sum_{m=1}^{N}\sum_{n=1}^{N}\rho'(\mathbf{x}_{mn})\,M_m(\mathbf{x}) \tag{1}$$

7 P. Main, Acta Crystallogr. 23, 50 (1967).

8 R. A. Crowther, Acta Crystallogr. 22, 758 (1967).

9 R. A. Crowther, Acta Crystallogr. B25, 2571 (1969).

10 G. Bricogne, Acta Crystallogr. A30, 395 (1974).

11 G. Bricogne, Acta Crystallogr. A32, 832 (1976).


where $M_m(\mathbf{x})$ is the mask function for object m, defined as 1 within the object and 0 outside, and $\mathbf{x}_{mn}$ is $\mathbf{x}$ transformed by the n.c.s. operation that superimposes object m onto object n, i.e.,

$$\mathbf{x}_{mn} = [R_{mn}]\,\mathbf{x} + \mathbf{t}_{mn} \tag{2}$$

When both sides of Eq. (1) are Fourier transformed, the product inside the double summation becomes a convolution in reciprocal space, as is seen in the following. The Fourier transform of $M_m(\mathbf{x})$ is defined as $U G_m(\mathbf{s})$, where U is the volume of the envelope, so that $G_m$ is the interference function; it is continuous because it does not obey crystallographic symmetry, hence the use of the reciprocal space vector $\mathbf{s}$ instead of a reciprocal lattice vector (h, k, l).

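$$F_{\mathbf{p}} = \int_{\mathrm{cell}} \rho'(\mathbf{x})\exp(2\pi i\,\mathbf{p}\cdot\mathbf{x})\,d\mathbf{x} = \int_{\mathrm{cell}} \rho'_{\mathrm{avg}}(\mathbf{x})\exp(2\pi i\,\mathbf{p}\cdot\mathbf{x})\,d\mathbf{x} \tag{3}$$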

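The expression for $\rho'_{\mathrm{avg}}(\mathbf{x})$ is substituted from Eq. (1), while replacing $\rho'(\mathbf{x}_{mn})$ and $M_m(\mathbf{x})$ with their Fourier transforms.

$$F_{\mathbf{p}} = \int_{\mathrm{cell}} \frac{1}{N}\sum_{m=1}^{N}\sum_{n=1}^{N}\frac{1}{V}\sum_{\mathbf{h}} F_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{x}_{mn})\,\frac{U}{V}\int_{\mathbf{s}} G_m(\mathbf{s})\exp(-2\pi i\,\mathbf{s}\cdot\mathbf{x})\,d\mathbf{s}\,\exp(2\pi i\,\mathbf{p}\cdot\mathbf{x})\,d\mathbf{x} \tag{4}$$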

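Now we substitute for $\mathbf{x}_{mn}$ from Eq. (2):

$$F_{\mathbf{p}} = \int_{\mathrm{cell}} \frac{1}{N}\sum_{m=1}^{N}\sum_{n=1}^{N}\frac{1}{V}\sum_{\mathbf{h}} F_{\mathbf{h}}\exp(-2\pi i\,(\mathbf{h}[R_{mn}])\cdot\mathbf{x})\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{t}_{mn})\,\frac{U}{V}\int_{\mathbf{s}} G_m(\mathbf{s})\exp(-2\pi i\,\mathbf{s}\cdot\mathbf{x})\,d\mathbf{s}\,\exp(2\pi i\,\mathbf{p}\cdot\mathbf{x})\,d\mathbf{x} \tag{5}$$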

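The terms in $\mathbf{x}$ are collected into the integral over the unit cell volume:

$$F_{\mathbf{p}} = \frac{U}{NV}\sum_{m=1}^{N}\sum_{n=1}^{N}\frac{1}{V}\sum_{\mathbf{h}} F_{\mathbf{h}}\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{t}_{mn})\int_{\mathbf{s}} G_m(\mathbf{s})\int_{\mathrm{cell}}\exp(2\pi i\,(\mathbf{p}-\mathbf{h}[R_{mn}]-\mathbf{s})\cdot\mathbf{x})\,d\mathbf{x}\,d\mathbf{s} \tag{6}$$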

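The integral over the unit cell disappears unless $\mathbf{s} = \mathbf{p} - \mathbf{h}[R_{mn}]$, in which case it is equal to V:

$$F_{\mathbf{p}} = \frac{U}{NV}\sum_{\mathbf{h}} F_{\mathbf{h}}\sum_{m=1}^{N}\sum_{n=1}^{N} G_m(\mathbf{p}-\mathbf{h}[R_{mn}])\exp(-2\pi i\,\mathbf{h}\cdot\mathbf{t}_{mn}) \tag{7}$$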

FIG. 2. The G function for a spherical molecular envelope. (Adapted from Rossmann and Blow. 5) The abscissa is the dimensionless quantity $|[R_n]\mathbf{h} - \mathbf{p}|\,r$. The first quantity is the distance in reciprocal space by which the transformed point misses its target; the second is the radius of the molecular envelope.

Equation (7), also previously given by Arnold and Rossmann, 12 expresses the value of each structure factor $F_{\mathbf{p}}$ as the weighted sum, over the whole reciprocal space, of all structure factors including itself. The value taken by the weighting or interference function G will depend on the position and shape of the envelope of a single object, as well as on the n.c.s. operators $[R_{mn}]$ and $\mathbf{t}_{mn}$. Note that a similar equation may be derived when the only redundancy of information in the asymmetrical unit is the presence of large featureless regions of solvent. In this case of density modification by solvent flattening alone, the integral given in Eq. (6) is evaluated only at position $\mathbf{p} - \mathbf{h}$. Thus the conclusions drawn from this analysis are also applicable, with adaptation, to density modification by solvent flattening alone.

The shape of the G function, shown for a spherical envelope in Fig. 2, suggests that significant interference between structure factors will occur only in the first positive peak of this function, i.e., at or near the origin; the argument of the function must be small. For density modification by n.c.s. averaging, we therefore must have $\mathbf{h}[R_{mn}] - \mathbf{p} \approx 0$, i.e., $\mathbf{h} \approx [R_{mn}]^{T}\mathbf{p}$, where $[R_{mn}]^{T}$ is the transpose (and also the inverse) of the rotation matrix $[R_{mn}]$. The implication of these molecular-replacement equations in the case of n.c.s. averaging can be visualized in Fig. 3.

12 E. Arnold and M. G. Rossmann, Proc. Natl. Acad. Sci. U.S.A. 83, 5489 (1986).

FIG. 3. Relationships among structure factors caused by the presence of a noncrystallographic proper threefold axis of symmetry. G is the volume in reciprocal space where the G function takes significant values.

This has implications for the uses that may be made of n.c.s. averaging: the relationships among structure factors caused by the presence of n.c.s. do not extend over the whole of the reciprocal space, but are limited to the regions, related to each other by the rotation components of the n.c.s. operators, where significant interference among structure factors takes place. Thus, the presence of n.c.s. relates distant regions of reciprocal space, with the value of $F_{\mathbf{p}}$ being given by a small number of structure factors, including itself. Hence, n.c.s. averaging may be used not only to improve experimental electron-density maps at constant resolution, but also to assign map-inversion phases to unphased $|F_o|$ values in a procedure of phase completion or of phase extension. Furthermore, a consequence of these molecular-replacement equations is that some of the estimates for the missing reflections calculated by Fourier inversion of an averaged electron-density map have nonrandom values. Therefore, these can be used in a procedure of data completion.

The range over which map-inversion data computed from an averaged electron-density map can be used may be derived by considering the volume over which the G function takes significant values, i.e., within its first positive peak. As seen previously, $\mathbf{h}$ must transform close to $\mathbf{p}$ by the n.c.s. rotation operation for significant interference to take place between $F_{\mathbf{h}}$ and $F_{\mathbf{p}}$, i.e., $\mathbf{h} \approx [R_{mn}]^{T}\mathbf{p}$. To understand roughly the propagation of phase information, one may assume a spherical object of radius r. Significant values of the G function are achieved when $|[R_{mn}]\mathbf{h} - \mathbf{p}|\,r < 0.7$, i.e., when d < 0.7/r, where d is the distance between $[R_{mn}]\mathbf{h}$ and $\mathbf{p}$. If we assume reciprocal-space parameters 1/a, the number of reciprocal lattice points to be found within the volume of interest in any one direction is given by n < 0.7a/r. Such calculations may be used after map-inversion data have been obtained from the averaged map to decide, a posteriori, the validity of the estimates of structure factors provided by the averaging procedure. 13 The resolution of the G function (hence the degree to which phase information propagates) depends on the level of detail in the specification of the envelope. Because the envelope will not, in general, be spherical, the G function also will not be spherically symmetrical.
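The 0.7 criterion can be related to the first zero of the interference function. The sketch below assumes the standard normalized Fourier transform of a uniform sphere, $G = 3(\sin x - x\cos x)/x^{3}$ with $x = 2\pi d r$, which is consistent with the spherical envelope of Fig. 2 but is not written out in the text. The cell edge a = 100 angstroms and radius r = 25 angstroms are hypothetical values.

```python
import math

def g_sphere(hr):
    """Interference function G for a uniform sphere; hr = |[R]h - p| * r (dimensionless)."""
    x = 2.0 * math.pi * hr
    if abs(x) < 1e-6:
        return 1.0  # limit of 3 (sin x - x cos x) / x^3 as x -> 0
    return 3.0 * (math.sin(x) - x * math.cos(x)) / x ** 3

def first_zero(lo=0.5, hi=1.0, tol=1e-10):
    """Bisection for the first zero of g_sphere, bracketed between lo and hi."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g_sphere(lo) * g_sphere(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

hr0 = first_zero()        # end of the first positive peak, close to 0.7
a, r = 100.0, 25.0        # hypothetical cell edge and envelope radius, in angstroms
n_max = 0.7 * a / r       # lattice points within reach in any one direction
print(f"first zero at d*r = {hr0:.4f}; n < {n_max:.1f}")
```

Under these assumptions the first positive peak ends near d r = 0.715, and for the hypothetical cell fewer than three lattice points in any one direction contribute significantly.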

Practical Considerations

Detecting Noncrystallographic Symmetry

Noncrystallographic symmetry may arise in several ways and can be suspected or detected according to these criteria.

1. More than one crystal form of the same molecule has been grown.

2. Biochemical data indicate that a crystallized macromolecule is composed of several identical subunits.

3. Solvent-content calculations 14 indicate the presence of more than one molecule or subunit in the asymmetrical unit. This may be verified by measuring the density of the crystals. 15

4. Self-rotation function calculations may indicate the presence of multiple copies of the same object in the asymmetrical unit. Note, however, that such Patterson superimposition calculations provide information only on the rotation component of a local axis of symmetry. Any information about possible translation components (e.g., local screw axes) is unavailable in the absence of a molecular-search model. Also, the only information that can be derived concerns the orientation of the local axis of symmetry, but not, in general, its location.

5. It frequently is found that an n.c.s. axis runs parallel to a crystallographic axis of symmetry of the same, or higher, symmetry. 16 In this case, the rotation peak in a self-rotation function is hidden under an origin peak due to the crystallographic symmetry. Nonetheless, the n.c.s. is betrayed by the presence of a nonorigin peak in the native Patterson function, which arises because the combination of crystallographic symmetry and n.c.s. generates translational pseudosymmetry. From the position of this peak one can deduce the position of the n.c.s. rotation axis relative to a crystallographic rotation axis, as well as its screw component.

6. Cross-rotation function calculations followed by translation function calculations in the case of multiple copies of the same object in the asymmetrical unit will reveal, when successful, both the orientation and location of the local axes of symmetry. These can be derived by superimposing the properly positioned objects onto each other.

7. When the presence of n.c.s. is suspected and the initial phasing is carried out using a heavy-atom method, information about the orientation and position of the local axis of symmetry often can be derived from the known location of the heavy-atom scatterers. Note, however, that (a) the presence of improper n.c.s. (e.g., a translation component along the rotation axis) will confuse the interpretation of heavy-atom positions in terms of a local axis of symmetry; (b) local proper twofold symmetry will require the presence of at least two pairs of heavy-atom sites to define both the orientation and position of the local symmetry axis (Fig. 4); and (c) all heavy-atom sites will not necessarily obey the local symmetry axis, perhaps because of crystal packing effects: some of the equivalent sites may be inaccessible to the heavy-atom reagent. Also, a heavy-atom site may be located at a special position, i.e., on an N-fold n.c.s. rotation axis.

8. Direct observation of multiple copies of the same object (molecule, subunit, or domain) in an experimental electron-density map: Even when the resemblance of regions of density cannot be detected visually, the n.c.s. operations can be determined by the use of domain rotation 17 and phased translation functions, 17,18 as applied in the structure determination of pertussis toxin. 19

13 F. M. D. A. P. Vellieux, J. F. Hunt, S. Roy, and R. J. Read, J. Appl. Crystallogr. 28, 347.

14 B. W. Matthews, J. Mol. Biol. 33, 491 (1968).

15 E. M. Westbrook, J. Mol. Biol. 103, 659 (1976).

16 X. Wang and J. Janin, Acta Crystallogr. D49, 505 (1993).

17 P. M. Colman, H. Fehlhammer, and K. Bartels, in "Crystallographic Computing Techniques" (F. R. Ahmed, K. Huml, and B. Sedlacek, eds.), p. 248. Munksgaard, Copenhagen, 1976.

Refinement of Noncrystallographic Symmetry Operators

To apply the technique of noncrystallographic symmetry averaging successfully in real space, accurate n.c.s. operators must be available. This usually means that one must refine the initial estimates for these operators. Such refinement may be performed in the following manner.

When atomic coordinates are available (either in the form of a molecular-replacement solution to the phase problem, or as a result of partial map interpretation), refinement may be accomplished readily by rigid-body refinement 20-22 of the independent models. Otherwise (in the case of an initial electron-density map obtained from heavy-atom phases), such refinement may be carried out by locating the site, in terms of rotation and translation parameters, of maximum correlation among regions related by the n.c.s. 11

Note that the initial estimates of n.c.s. operators provided by molecular-replacement calculations usually still will require refinement: molecular replacement proceeds by Patterson superimposition operations. A characteristic of Patterson maps is that they contain broader features than the corresponding electron-density maps 23; a lesser precision is obtained by superimposing Patterson maps than by superimposing electron densities.
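The idea of locating the operator of maximum correlation can be illustrated on a deliberately simplified case. The sketch below is a toy, assuming a one-dimensional map and an operator that is a pure translation; a real refinement searches rotation and translation parameters together in three dimensions. The function names and the toy densities are hypothetical.

```python
def correlation(a, b):
    """Linear correlation coefficient between two equally long lists of density values."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a flat region carries no information about the match
    return cov / (var_a * var_b) ** 0.5

def refine_shift(rho, env1, candidates):
    """Return the candidate translation (in grid points) with the highest correlation."""
    n = len(rho)
    reference = [rho[x] for x in env1]
    scored = []
    for shift in candidates:
        related = [rho[(x + shift) % n] for x in env1]
        scored.append((correlation(reference, related), shift))
    return max(scored)[1]

# Toy 1D map: the same motif at grid points 3..10 and 16..23 (true shift 13).
motif = [1.0, 3.0, 2.0, 5.0, 4.0, 2.0, 3.0, 1.0]
rho = [0.5] * 32
for k, value in enumerate(motif):
    rho[3 + k] = value
    rho[16 + k] = value
best = refine_shift(rho, range(3, 11), range(9, 18))  # search around an initial estimate
```

The search range stands in for the neighborhood of an initial estimate, for example one obtained from heavy-atom positions.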

Envelope Requirements

Because n.c.s. is, by definition, only local and applies to a single set of objects within the entire crystal, the operation of averaging electron density over objects related by n.c.s. requires important information about the position and shape of the objects whose density must be averaged. This information, known as an envelope (Fig. 5), is essential for n.c.s. averaging: Without this information, averaging using the n.c.s. operators may average density belonging to objects to which the operators do not apply. This will lead to deterioration of the electron density in some areas of the map.

18 R. J. Read and A. J. Schierbeek, J. Appl. Crystallogr. 21, 490 (1988).

19 P. E. Stein, A. Boodhoo, G. D. Armstrong, S. A. Cockle, M. H. Klein, and R. J. Read, Structure 2, 45 (1994).

20 J. L. Sussman, S. R. Holbrook, G. M. Church, and S. H. Kim, Acta Crystallogr. A33, 800 (1977).

21 A. T. Brünger, Acta Crystallogr. A46, 46 (1990).

22 E. E. Castellano, G. Oliva, and J. Navaza, J. Appl. Crystallogr. 25, 281 (1992).

23 G. H. Stout and L. H. Jensen, "X-Ray Structure Determination: A Practical Guide," p. 280. John Wiley & Sons, New York, 1989.

FIG. 4. Determination of n.c.s. operators from heavy-atom positions. (a) Proper local twofold symmetry, with one pair of heavy-atom positions: the location and orientation of the local axis of symmetry are not defined. (b) Proper local twofold symmetry, with two pairs of heavy-atom sites: the location and orientation of the local symmetry axis are defined uniquely. (c) Proper local threefold symmetry, with three heavy-atom sites: both location and orientation of the local axis of symmetry are defined by the heavy-atom positions.

Because electron density is computed as discrete points on a regular grid, the envelope also is generated on a similar grid.

FIG. 5. Example of an envelope encompassing two subunits in the case of a proper twofold axis of symmetry.

The envelope may be obtained in several ways. When an atomic model is available, one can assign all points near to atomic positions to be part of the envelope. When an initial model is not available, one can trace the envelope with a mouse, 11 although this operation is time consuming and tedious. Alternatively, applying "envelope-free" averaging (i.e., without an envelope) to an experimental map may be used to define the envelope. The lack of envelope information will lead to application of the "wrong" n.c.s. operations to parts of the density. The density in these regions will deteriorate, whereas it will improve in the regions where the known n.c.s. operations apply. This procedure requires some knowledge of the volume where the n.c.s. operations apply, because the map to be averaged should cover all points within this volume. An alternative to the simple averaging of electron density to define an envelope is the computation of a "local correlation map," where the resulting map contains correlation coefficients between regions related by n.c.s. 13,19,24 The envelope then can be obtained readily by the procedure of automatic envelope definition developed by Wang 25 and Leslie. 26
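The first route, assigning grid points near atomic positions to the envelope, might be sketched as follows. This is a minimal illustration, assuming an orthogonal cell sampled on a regular grid with the same spacing along each axis; the function name, the radius, and the single test atom are hypothetical.

```python
def envelope_from_atoms(atoms, grid_shape, spacing, radius):
    """Return a nested-list mask: 1 for grid points within `radius` of any atom, else 0.

    atoms      -- list of (x, y, z) Cartesian coordinates in angstroms
    grid_shape -- (nx, ny, nz) number of grid points along each axis
    spacing    -- grid spacing in angstroms (orthogonal cell assumed)
    radius     -- distance cutoff in angstroms
    """
    nx, ny, nz = grid_shape
    r2 = radius * radius
    mask = [[[0] * nz for _ in range(ny)] for _ in range(nx)]
    reach = int(radius / spacing) + 1
    for ax, ay, az in atoms:
        ci, cj, ck = round(ax / spacing), round(ay / spacing), round(az / spacing)
        for i in range(ci - reach, ci + reach + 1):
            for j in range(cj - reach, cj + reach + 1):
                for k in range(ck - reach, ck + reach + 1):
                    d2 = ((i * spacing - ax) ** 2 + (j * spacing - ay) ** 2
                          + (k * spacing - az) ** 2)
                    if d2 <= r2:
                        # Wrap indices so the mask respects lattice translations.
                        mask[i % nx][j % ny][k % nz] = 1
    return mask

# One hypothetical atom in a 10 x 10 x 10 grid with 1 angstrom spacing.
mask = envelope_from_atoms([(5.0, 5.0, 5.0)], (10, 10, 10), 1.0, 2.0)
n_inside = sum(sum(sum(row) for row in plane) for plane in mask)
```

A generous radius gives an envelope that is larger than strictly necessary, which is the safer direction for a starting envelope.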

A perfect envelope can be obtained only when a complete atomic model has been obtained, i.e., when the structure has been solved. Hence, during structure determination, only imperfect envelopes can be generated at first. These should encompass the objects completely so as not to exclude regions of the macromolecules from the averaging process. Thus these envelopes must be larger than absolutely necessary. A consequence of this is that the envelope may extend into a region corresponding to another object, related to the first object either by crystal lattice translations, space group symmetry, or improper n.c.s. If these regions of overlap between envelopes are not taken care of properly, then some areas of the map will be averaged using the incorrect n.c.s. operator, leading to map deterioration. Thus, one algorithm used to take care of these regions of overlap between envelopes is simply to assign each point in the overlapping regions as belonging to the nearest, nonoverlapped object, as is shown in Fig. 6.

24 B. Rees, A. Bilwes, J.-P. Samama, and D. Moras, J. Mol. Biol. 214, 281 (1990).

25 B. C. Wang, Methods Enzymol. 115, 90 (1985).

26 A. G. W. Leslie, Acta Crystallogr. A43, 134 (1987).

FIG. 6. The envelope overlap problem: case of an overlap region between two objects. The correction to be applied to the envelope assigns points in the overlap region to the nearest nonoverlapped grid points.

For the creation of the envelope, one must distinguish the cases of proper and improper noncrystallographic symmetries. In the case of proper n.c.s. with N identical objects, the set of n.c.s. operations relating these N objects forms a closed group; it is not necessary to distinguish among the N objects to generate the envelope. A single envelope encompassing all objects is sufficient, and will carry information as to whether a point belongs to one of the N objects, or is part of the solvent region. For improper noncrystallographic symmetry, in which a different n.c.s. operator applies to each of the N objects, one must distinguish the envelope for each object. The envelope corresponding to each individual object must be created separately, then merged together, with all overlap regions dealt with properly. The resulting envelope then will convey information about a point belonging to the solvent region, to object 1, to object 2, etc. In all cases, envelope overlaps due to space group symmetry and crystal lattice translations must be taken care of.
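Merging separately created envelopes into one labelled envelope, with overlap points given to the nearest nonoverlapped object, might look like the sketch below. It is a toy on a one-dimensional periodic grid; the function name and the two example masks are hypothetical, and ties are broken in favor of the lower grid index, which is an arbitrary choice.

```python
def merge_envelopes(masks):
    """Merge per-object masks (lists of 0/1 on the same 1D grid) into one labelled envelope.

    Returns a list with 0 for solvent and m (1-based) for object m. Points claimed by
    more than one mask go to the object owning the nearest unambiguous point.
    """
    n = len(masks[0])
    labels = [0] * n
    overlap = []
    for i in range(n):
        owners = [m + 1 for m, mask in enumerate(masks) if mask[i]]
        if len(owners) == 1:
            labels[i] = owners[0]
        elif len(owners) > 1:
            overlap.append((i, owners))
    resolved = list(labels)
    for i, owners in overlap:
        best = None
        for j in range(n):
            if labels[j] in owners:
                # Periodic distance, because the grid covers a whole unit cell.
                d = min(abs(i - j), n - abs(i - j))
                if best is None or d < best[0]:
                    best = (d, labels[j])
        if best is not None:
            resolved[i] = best[1]
    return resolved

# Two hypothetical envelopes on a 12-point grid, overlapping at points 5 and 6.
mask1 = [1 if 1 <= i <= 6 else 0 for i in range(12)]
mask2 = [1 if 5 <= i <= 10 else 0 for i in range(12)]
labelled = merge_envelopes([mask1, mask2])
```

Point 5 lies closer to the unambiguous part of object 1 and point 6 closer to that of object 2, so the overlap is split between them. A point whose claiming objects have no unambiguous points at all is left as solvent in this sketch.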

Iterative Noncrystallographic Symmetry Averaging

To perform iterative noncrystallographic symmetry averaging, one must execute the following steps sequentially:

1. Map calculation [preferably by fast Fourier transform (FFT) methods]

2. Reduction of the map to the crystallographic asymmetrical unit

3. Symmetry averaging and solvent flattening

4. Generation of the unit cell from the averaged map

5. Computation of $F_c$ values from the averaged map (preferably by FFT methods)

6. Scaling of the $|F_c|$ values to the $|F_o|$ values

7. Computation of weights, phase combination, and data extension

During iterative map improvement, possibly including phase extension, the overall strategy can be summarized as follows: the initial map to be averaged must be the best possible map that may be computed. In the case of a structure determination by heavy-atom methods, this means a figure-of-merit weighted $m|F_o| \exp(i\alpha_{best})$ map. When a molecular model is available for the computation of a starting set of phases, two types of map coefficients may be used to compute the initial map: either Sigmaa 27 map coefficients of the form $(2m|F_o| - D|F_c|) \exp(i\alpha_{model})$ for acentric data and $m|F_o| \exp(i\alpha_{model})$ for centric reflections, or unweighted $(3|F_o| - 2|F_c|) \exp(i\alpha_{model})$ coefficients for acentric reflections and $(2|F_o| - |F_c|) \exp(i\alpha_{model})$ for centric data. Averaging is carried out then, starting with a few cycles during which only the reflections carrying an initial phase are used. After this, data completion, phase and amplitude extension, or extension of the resolution may be performed. Customarily one refines the phases with a few cycles of the averaging procedure in between extension or completion steps. Because the phase or amplitude estimates provided by this method are initially nonrandom but not yet optimal, they are improved to better estimates by such iterations.

The parts of this cycle dealing with map calculation and map inversion by FFT procedures need not be discussed here, apart from mention of the amplitude coefficients to be used. These can be simply $m|F_o| \exp(i\alpha_{calc})$, or probably better coefficients of the form $m(2|F_o| - |F_c|) \exp(i\alpha_{calc})$ for acentric reflections and $m|F_o| \exp(i\alpha_{calc})$ for centric data, where $m$ is usually computed using the Sim weighting scheme 28,29 (although other weighting schemes are also being used; see below).

Several algorithms are available to perform the averaging of densities and solvent flattening. Perhaps the best known of these, for historical reasons, is the double-sorting technique developed both by Bricogne 11 and Johnson 30 at a time when computers were not as large or powerful. Being cumbersome and difficult to implement, and not making full use of the capacity of present-day computers, the double-sorting technique is not discussed further.

27 R. J. Read, Acta Crystallogr. A42, 140 (1986).
28 G. A. Sim, Acta Crystallogr. 12, 813 (1959).
29 G. A. Sim, Acta Crystallogr. 13, 511 (1960).
30 J. E. Johnson, Acta Crystallogr. B34, 576 (1978).


Two other algorithms used for this step keep both the whole map to be averaged and the envelope in computer memory. In the first of these, all density to be averaged can be placed first in an artificial box, in which the averaging procedure is carried out before replacing the averaged density into its original crystal form. 31 We describe the second of these as it is implemented in the program suite DEMON/ANGEL. 13 Both the unaveraged map, corresponding to one complete crystallographic asymmetrical unit, and the envelope are kept in memory. The averaged electron-density map also will correspond to one asymmetrical unit and typically is generated on a grid corresponding to one-third of the high-resolution limit. The program sequentially scans each point of this output map. Each of these points is mapped to the envelope. When the point belongs to the solvent region, its density may be kept at its original value, or replaced by a fixed density, possibly corresponding to the previously computed average solvent density (solvent flattening). For a point belonging to an object, its Cartesian angstrom coordinates within the envelope are computed. The n.c.s. operators then are applied in turn to these Cartesian coordinates. Each n.c.s.-equivalent point (also expressed in Cartesian angstroms) is mapped to the unit cell in fractional coordinates, then folded to the asymmetrical unit. The resulting fractional coordinates then are transformed into nonintegral grid points of the input map, and interpolation is performed to obtain the unaveraged electron density at this, possibly nonintegral, grid point of the input map. Several types of interpolation can be used. An eight-point linear interpolation requires an input map computed using a grid corresponding to at most one-fifth to one-sixth of the high-resolution limit. Alternatively, an 11-point nonlinear interpolation, or a 64-point interpolation by means of cubic splines may be used, using an input map computed with a grid size of ca. one-third to one-fifth of the resolution limit. The average density over the N n.c.s. equivalents is then computed, and this value is written to the output map.
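The interpolation and averaging steps just described can be sketched as follows. This is an illustrative fragment only, not the DEMON/ANGEL implementation: the function names are hypothetical, the input map is assumed to cover a full P1 unit cell (so the folding to the asymmetrical unit is reduced to a periodic wrap of grid indices), and the n.c.s.-equivalent positions are assumed to have been converted to fractional coordinates already.

```python
import math


def trilinear(rho, x, y, z):
    """Eight-point linear interpolation at the nonintegral grid point (x, y, z).

    rho is a nested list rho[i][j][k] sampled over one full unit cell, so the
    grid indices wrap around periodically.
    """
    nx, ny, nz = len(rho), len(rho[0]), len(rho[0][0])
    x0, y0, z0 = math.floor(x), math.floor(y), math.floor(z)
    fx, fy, fz = x - x0, y - y0, z - z0
    value = 0.0
    for dx, wx in ((0, 1.0 - fx), (1, fx)):
        for dy, wy in ((0, 1.0 - fy), (1, fy)):
            for dz, wz in ((0, 1.0 - fz), (1, fz)):
                value += (wx * wy * wz *
                          rho[(x0 + dx) % nx][(y0 + dy) % ny][(z0 + dz) % nz])
    return value


def average_point(rho, fractional_equivalents):
    """Average the interpolated density over N n.c.s.-equivalent positions."""
    nx, ny, nz = len(rho), len(rho[0]), len(rho[0][0])
    total = 0.0
    for u, v, w in fractional_equivalents:
        total += trilinear(rho, u * nx, v * ny, w * nz)
    return total / len(fractional_equivalents)


# Toy 4 x 4 x 4 map whose density is linear in the grid indices.
toy_map = [[[i + 2 * j + 3 * k for k in range(4)] for j in range(4)]
           for i in range(4)]
averaged = average_point(toy_map, [(0.25, 0.0, 0.0), (0.5, 0.25, 0.0)])
```

Linear interpolation reproduces a locally linear density exactly, which is why the text asks for a finer input grid when this scheme is used: between coarse grid points real density is far from linear.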

This procedure also has been adapted for the averaging of densities between crystal forms, where the input and output maps do not correspond to the same crystal forms. The program still keeps only one input map plus one envelope in memory, but an additional input map, already containing density corresponding to a second crystal form, also is provided to the program. This second input map is on the same grid as the output map, which will contain, after the averaging operation, the averaged density between the two crystal forms.

31 M. G. Rossmann, R. McKenna, L. Tong, D. Xia, J.-B. Dai, H. Wu, H.-K. Choi, and R. E. Lynch, J. Appl. Crystallogr. 25, 166 (1992).


One must realize that, in the case of averaging between crystal forms, an additional step of scaling the protein densities within the protein regions of the envelope should be carried out, especially if the X-ray data have been left on arbitrary scales. It is preferable to place all data sets used to compute the different maps on an absolute scale, but this is difficult to achieve accurately because of the limited resolution. In addition, electron-density maps for different crystal forms will have different nominal or effective resolution limits, either because the data have been collected or phased to different resolutions or because the figures of merit fall off differently. Furthermore, the FFT programs used to compute the electron-density maps usually generate maps on an arbitrary scale.

Two types of scaling of electron-density maps can be performed, the simplest being linear scaling. If the original densities in the protein regions of crystal forms 1 and 2 are called $\rho_1$ and $\rho_2$, respectively, then the linear scaling algorithm will determine the values of two parameters $A$ and $B$ that will minimize $\sum(\rho_2' - \rho_1)^2$ with $\rho_2' = A\rho_2 + B$. The second type of scaling, which may be performed subsequently, is by means of histogram mapping, 13,32,33 in which the histogram of densities of $\rho_2$ is made equal to that of $\rho_1$ (or both are matched to a theoretical histogram). This can be used either if the data sets are not on the same absolute scale, or if the two densities to be averaged do not have the same effective resolution (Fig. 7).
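Both kinds of scaling can be illustrated with a short sketch. The linear fit below is the ordinary least-squares solution for $A$ and $B$ and assumes that the two lists hold densities at corresponding (superimposed) protein grid points. The histogram step is a simplified rank-order mapping that assumes both lists have the same length; it stands in for, and is not, the binned histogram-matching procedures of the cited programs. All function names are hypothetical.

```python
def linear_scale(rho_1, rho_2):
    """Return (A, B) minimizing sum((A * rho_2 + B - rho_1)**2)."""
    n = len(rho_1)
    mean_1 = sum(rho_1) / n
    mean_2 = sum(rho_2) / n
    variance_2 = sum((q - mean_2) ** 2 for q in rho_2)
    covariance = sum((q - mean_2) * (p - mean_1) for p, q in zip(rho_1, rho_2))
    a = covariance / variance_2
    b = mean_1 - a * mean_2
    return a, b


def histogram_match(rho_1, rho_2):
    """Give rho_2 the density histogram of rho_1 by rank-order mapping.

    The k-th smallest value of rho_2 is replaced by the k-th smallest value
    of rho_1 (equal list lengths are assumed in this sketch).
    """
    order = sorted(range(len(rho_2)), key=lambda i: rho_2[i])
    target = sorted(rho_1)
    matched = [0.0] * len(rho_2)
    for rank, index in enumerate(order):
        matched[index] = target[rank]
    return matched


a, b = linear_scale([1.0, 3.0, 5.0, 7.0], [0.0, 1.0, 2.0, 3.0])
rescaled = [a * q + b for q in [0.0, 1.0, 2.0, 3.0]]
matched = histogram_match([10.0, 30.0, 20.0], [0.3, 0.1, 0.2])
```

Linear scaling changes only the offset and spread of the densities, whereas histogram mapping also changes the shape of their distribution, which is what allows it to compensate for a difference in effective resolution.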

After FFT inversion of the averaged electron density, proper treatment of the map-inversion amplitudes involves the computation and application of a scale and isotropic temperature factor to bring the $|F_c|$ values onto the same scale as the $|F_o|$ values. There is one major reason for this requirement: the electron-density map from which the density values are fetched to calculate the averaged map usually is computed from figure-of-merit weighted structure-factor amplitudes. These normally fall off with increasing resolution, so that a direct inversion of this map would lead to $|F_c|$ values having additional fall-off. This fall-off can be corrected by the application of an artificial isotropic temperature factor. In addition, Bricogne 11 suggested that the use of interpolation during the averaging process causes a loss of spectral power that amplifies at high resolution, also causing the need for correction with an artificial temperature factor. However, the improvement in density obtained by averaging can be sufficient to mask this effect, especially when a single interpolation step is performed as opposed to algorithms that use multiple interpolation steps.
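One simple way to obtain such a scale and artificial temperature factor is a straight-line fit of $\ln(|F_o|/|F_c|)$ against $1/(4d^2)$, assuming the model $|F_o| \approx k\,|F_c| \exp[-B/(4d^2)]$, where $d$ is the resolution of the reflection. The sketch below makes that assumption, fits individual reflections rather than resolution bins, and uses hypothetical function names; it is not taken from any particular program.

```python
import math


def fit_scale_and_b(f_obs, f_calc, d_spacings):
    """Least-squares fit of ln(|Fo|/|Fc|) = ln(k) - B / (4 d^2).

    Returns (k, B). A negative B sharpens |Fc|, compensating for fall-off.
    """
    xs = [1.0 / (4.0 * d * d) for d in d_spacings]
    ys = [math.log(fo / fc) for fo, fc in zip(f_obs, f_calc)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    scale = math.exp(mean_y - slope * mean_x)
    return scale, -slope


def apply_scale(f_calc, d_spacings, scale, b_factor):
    """Bring |Fc| onto the scale of |Fo| with the fitted k and B."""
    return [scale * fc * math.exp(-b_factor / (4.0 * d * d))
            for fc, d in zip(f_calc, d_spacings)]


# Synthetic example: |Fc| falls off too fast by B = -10 and is low by k = 2.
d_values = [8.0, 5.0, 4.0, 3.5, 3.0]
fc_values = [100.0, 80.0, 60.0, 40.0, 20.0]
fo_values = [2.0 * fc * math.exp(10.0 / (4.0 * d * d))
             for fc, d in zip(fc_values, d_values)]
k, b = fit_scale_and_b(fo_values, fc_values, d_values)
fc_scaled = apply_scale(fc_values, d_values, k, b)
```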

After scaling of the map-inversion structure-factor amplitudes, the next

32 K. Y. J. Zhang and P. Main, Acta Crystallogr. A46, 41 (1990). 33 V. Y. Lunin, Acta Crystallogr. D49, 90 (1993).


step is computation of weights for the next iteration to downweight the map-inversion phases as a function of their probability. Three types of weighting scheme are used, none of which is entirely satisfactory. This is because these statistical weighting schemes derive phase probabilities by comparing the scaled $|F_c|$ values to the $|F_o|$ values. Because density modification procedures always rapidly bring good agreement between calculated and observed structure-factor amplitudes, the resulting weights are usually overestimated. This can be judged by test calculations in which errors are deliberately introduced into the averaging process: the figure-of-merit values still will increase (not shown). The three weighting schemes that are being used are as follows.

1. Sim weighting, 28,29 in which the averaged map-inversion amplitudes are assumed to be derived from a partial structure: The weights are calculated as $w = I_1(X)/I_0(X)$, where $I_1$ and $I_0$ are the modified Bessel functions of order 1 and 0, respectively, and $X = 2|F_o||F_c| / \langle ||F_o| - |F_c||^2 \rangle$.

2. Rayment weighting, 34 in which the probability of correctness of a phase is estimated from the pairwise comparison of structure-factor amplitudes, the weight being computed as $w = \exp[-(|F_c| - |F_o|)/|F_o|]$.

3. Sigmaa weighting, 27 where map-inversion amplitudes are assumed to be derived from a partial structure containing errors.
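As an illustration of the first scheme only, the Sim weight can be evaluated with the standard library alone by using the integral representation $I_n(x) = (1/\pi)\int_0^{\pi} \exp(x\cos t)\cos(nt)\,dt$. In this sketch the average in the denominator of $X$ is taken over whatever list of reflections is passed in (in practice this would be a resolution bin); the function names and that choice are assumptions.

```python
import math


def bessel_ratio(x, steps=2000):
    """I1(x) / I0(x) for x >= 0 by midpoint integration over [0, pi].

    The integrand is multiplied by exp(-x) in both numerator and denominator,
    which cancels in the ratio and avoids overflow for large x.
    """
    h = math.pi / steps
    numerator = 0.0
    denominator = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        e = math.exp(x * (math.cos(t) - 1.0))
        numerator += e * math.cos(t)
        denominator += e
    return numerator / denominator


def sim_weights(f_obs, f_calc):
    """Sim-type weights w = I1(X)/I0(X), X = 2|Fo||Fc| / <(|Fo| - |Fc|)^2>."""
    mean_square = sum((fo - fc) ** 2 for fo, fc in zip(f_obs, f_calc)) / len(f_obs)
    return [bessel_ratio(2.0 * fo * fc / mean_square)
            for fo, fc in zip(f_obs, f_calc)]


weights = sim_weights([10.0, 1.0], [8.0, 2.0])
```

The sketch fails with a division by zero if every $|F_c|$ equals its $|F_o|$ exactly, a reminder of the point made above: as the amplitudes are driven into agreement, $X$ grows and the weights approach 1 whether or not the phases are correct.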

When data extension is also performed by replacing missing reflections with map-inversion amplitudes and phases, the weight given to the newly introduced reflections may be given as a fraction of the average weight in the resolution bin of the reflection being introduced.

At this stage, one may want to combine the phases obtained from the averaged map with the initial, experimental phases. This step is simply carried out by summing the Hendrickson-Lattman coefficients 35,36 defining the two phase-probability distributions. This may be required, e.g., if the envelope is suspected to truncate part of the protein so that, if solvent flattening is also carried out during averaging, the truncated protein density will reappear in the next map. One word of caution must be professed against this possible phase-combination step: probability distributions can be combined in this manner only if they refer to independent events. Obviously, the probability distribution for the phases obtained by Fourier transform of an averaged electron-density map is not independent from

34 I. Rayment, Acta Crystallogr. A39, 102 (1983). 35 W. A. Hendrickson and E. E. Lattman, Acta Crystallogr. B26, 136 (1970). 36 W. A. Hendrickson, Acta Crystallogr. B27, 1472 (1971).


FIG. 7. Effects of map scaling by histogram-mapping techniques. (a) Twofold averaged electron density for residue Glu-7 of a monoclinic crystal form of hen egg white lysozyme (Protein Data Bank access code: 1LYM), computed to a nominal resolution of 2.5 Å. (b) Threefold averaged electron density obtained starting from the averaged map shown in (a). The map that has been added is computed using data from the triclinic crystal form of hen egg white lysozyme (PDB access code: 1LZT) to a nominal resolution of 2.0 Å, and linear scaling has been carried out to minimize the r.m.s. difference in the protein regions of the two maps. (c) Same as (b), except that a step of scaling by histogram mapping has been performed to match the histogram of densities in the protein regions of the 2.5-Å resolution map to that of the higher resolution map. Electron densities are contoured at the 1.0σ level.

that of the phases used to generate the initial electron-density distribution. Phase combination therefore will result in the overestimation of the quality of the resulting, combined phases. This effect can be alleviated by using an "inspired relative weighting" scheme, 11 in which damping factors are applied to both of the phase-probability distributions, with $P_{comb}(\alpha) = P(\alpha_{initial})^{w_i} \cdot P(\alpha_{averaging})^{w_a}$.
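In the Hendrickson-Lattman representation each distribution has the form $P(\alpha) \propto \exp(A\cos\alpha + B\sin\alpha + C\cos 2\alpha + D\sin 2\alpha)$, so raising a distribution to a power $w$ multiplies its four coefficients by $w$, and multiplying two distributions adds their coefficients. The sketch below illustrates this; the function names, the one-degree sampling, and the example coefficients are assumptions for illustration.

```python
import math


def combine_hl(hl_initial, hl_averaged, w_initial=1.0, w_averaged=1.0):
    """Damped combination: coefficients (A, B, C, D) are scaled, then summed."""
    return tuple(w_initial * p + w_averaged * q
                 for p, q in zip(hl_initial, hl_averaged))


def centroid_phase_and_fom(hl, steps=360):
    """Centroid phase (degrees) and figure of merit of an HL distribution."""
    a, b, c, d = hl
    angles = [2.0 * math.pi * k / steps for k in range(steps)]
    exponents = [a * math.cos(t) + b * math.sin(t)
                 + c * math.cos(2.0 * t) + d * math.sin(2.0 * t)
                 for t in angles]
    top = max(exponents)                      # subtract to avoid overflow
    probabilities = [math.exp(e - top) for e in exponents]
    norm = sum(probabilities)
    cos_mean = sum(p * math.cos(t) for p, t in zip(probabilities, angles)) / norm
    sin_mean = sum(p * math.sin(t) for p, t in zip(probabilities, angles)) / norm
    phase = math.degrees(math.atan2(sin_mean, cos_mean)) % 360.0
    return phase, math.hypot(cos_mean, sin_mean)


initial = (1.0, 0.0, 0.0, 0.0)     # unimodal, centred on 0 degrees
averaged = (0.0, 1.0, 0.0, 0.0)    # unimodal, centred on 90 degrees
phase_full, fom_full = centroid_phase_and_fom(combine_hl(initial, averaged))
phase_damped, fom_damped = centroid_phase_and_fom(
    combine_hl(initial, averaged, w_initial=0.5, w_averaged=0.5))
```

Damping both distributions equally leaves the combined phase unchanged but lowers the figure of merit, which is the intended correction for the overestimated quality of the combined phases.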

Alternatively, an ad hoc weighting function 13 can be derived if we assume that both phase sets should contribute equally to the resulting combined phase set: an initial phase-combination step is carried out, and the average cosines of the phase difference between the combined phase and each of the two starting phases are computed in resolution bins. From


these, difference cosine values are obtained in resolution bins as $\Delta = \langle\cos(\alpha_{combined} - \alpha_{initial})\rangle - \langle\cos(\alpha_{combined} - \alpha_{averaged})\rangle$. The scale factor to be applied to the Hendrickson-Lattman coefficients of the initial phase set is then computed as $S = 1.0 - \Delta^{1/2}$ if $\Delta$ is positive, or as $S = 1.0/(1.0 - \Delta^{1/2})$ if $\Delta$ is negative. Such scale factors obviously can be applied in a bin-to-bin smoothly varying fashion, and should be applied only once to the set of coefficients corresponding to the initial phase set, during the initial phase-combination step.
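A sketch of this scale-factor rule for a single resolution bin is given below. The function name is hypothetical, and for negative $\Delta$ the square root is taken of the magnitude of $\Delta$; that reading is an assumption, made because the square root of a negative number would not give a real scale factor.

```python
import math


def hl_scale_factor(mean_cos_initial, mean_cos_averaged):
    """Scale factor S for the initial Hendrickson-Lattman coefficients.

    mean_cos_initial  = <cos(alpha_combined - alpha_initial)> in the bin
    mean_cos_averaged = <cos(alpha_combined - alpha_averaged)> in the bin
    """
    delta = mean_cos_initial - mean_cos_averaged
    root = math.sqrt(abs(delta))   # assumption: magnitude used when delta < 0
    if delta >= 0.0:
        return 1.0 - root          # initial phases dominate: damp them
    return 1.0 / (1.0 - root)      # averaged phases dominate: boost initial


s_damp = hl_scale_factor(0.90, 0.65)
s_boost = hl_scale_factor(0.65, 0.90)
s_equal = hl_scale_factor(0.80, 0.80)
```

When the combined phases resemble the initial phases more closely than the averaged ones, $S$ is below 1 and the initial set is damped; in the opposite case $S$ exceeds 1. The sketch does not guard against $\Delta = -1$, where the second expression is undefined.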

Detection and Effect of Errors

As in any crystallographic procedure, errors may occur. These can have disastrous consequences. Therefore it is important to understand how these errors may arise, and to know what their effect will be and how one may attempt to correct them. A problem should be suspected when one fails to improve the electron density or to extend the resolution using n.c.s. averaging. The following are potential sources of error.

Insufficient Quality of Measured Amplitudes. Because these amplitudes constrain the density modification procedure, results will be better if measurements are accurate.

Inaccurate Definition of Noncrystallographic Symmetry Operators. The general rule is, use only well-refined n.c.s. operators. Two types of error can be made, either in the rotation or translation component of the operator. These will have different consequences: an error in the rotation matrix [R] will lead to a progressive deterioration of map quality from the axis of rotation toward the surface of the object (Fig. 8). By contrast, an error made in the location of the rotation axis will lead to an overall deterioration of the map (Fig. 8).

Use of Erroneous Envelopes. The envelopes are usually larger than absolutely necessary. The use of far too large envelopes will limit the power of the method because the phase improvement from the flattening of solvent regions will be reduced. Envelopes that truncate part of the protein will lead to disappearance of protein density if solvent is being flattened. It is similarly bad if the envelope is in the wrong place (Fig. 9). Manual envelope editing may improve such erroneous envelopes.

FIG. 8. The effect of errors made in the rotation matrix [R] or in the translation component [t]. (a) The assumed rotation is 180°, whereas the correct rotation is 175°/185°; (b) error in the location of a proper local twofold axis of symmetry.

FIG. 9. Effect of placing the envelope in the incorrect location. Case of an improper twofold operation.

Use of Invalid Estimates of Structure Factors. The relationships among structure factors caused by n.c.s. do not extend over the whole reciprocal space, but in practice concern only reciprocal lattice points located in small regions related by the rotation components of the n.c.s. operators. Hence, one should not increase resolution too fast during phase extension. Errors also would arise if map-inversion structure factors were used in place of missing data when large volumes of reciprocal space are missing systematically. Here (Fig. 10), only valid estimates should be used to replace the missing data; data should be completed stepwise. Also remember that in this work four sets of phases are compatible with the measured structure-factor amplitudes: those corresponding to the positive structure, with phases $\alpha$; the phase set of the Babinet opposite of the structure, i.e., its negative image, with phases $\alpha + 180^\circ$; the phase set of the enantiomeric structure, with phases $-\alpha$; and the phase set of the Babinet opposite of the enantiomer, with phases $-\alpha + 180^\circ$. If one proceeds incautiously when different regions of reciprocal space are poorly tied together by the n.c.s. equations, a mixture of these different phase sets may arise (see also below).
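The fourfold ambiguity can be checked numerically on a toy structure. The sketch below uses three point scatterers with invented fractional coordinates and weights (all hypothetical), builds the negative image by changing the sign of the weights and the enantiomer by inverting the coordinates through the origin, and compares amplitudes and phases for one reflection.

```python
import cmath
import math


def structure_factor(atoms, hkl):
    """F(hkl) for point scatterers given as ((x, y, z), weight)."""
    h, k, l = hkl
    return sum(w * cmath.exp(2j * math.pi * (h * x + k * y + l * z))
               for (x, y, z), w in atoms)


def phase_deg(f):
    """Phase of a complex structure factor in degrees, in [0, 360)."""
    return math.degrees(cmath.phase(f)) % 360.0


def angular_difference(a, b):
    """Smallest absolute difference between two angles in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


positive = [((0.10, 0.20, 0.30), 6.0),
            ((0.37, 0.05, 0.61), 8.0),
            ((0.52, 0.78, 0.14), 7.0)]
negative = [(xyz, -w) for xyz, w in positive]                    # Babinet opposite
enantiomer = [(tuple(-c for c in xyz), w) for xyz, w in positive]
negative_enantiomer = [(xyz, -w) for xyz, w in enantiomer]

hkl = (1, 2, 3)
f_pos = structure_factor(positive, hkl)
f_neg = structure_factor(negative, hkl)
f_ena = structure_factor(enantiomer, hkl)
f_neg_ena = structure_factor(negative_enantiomer, hkl)
alpha = phase_deg(f_pos)
```

All four structure factors have the same amplitude, so the measured data alone cannot distinguish among the four phase sets; only consistency across reciprocal space keeps a refinement on one of them.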

Selected Applications of Method

From an educational point of view, only applications that have met with problems are discussed in this section.

Improvement of Heavy-Atom Maps

A difficult structure determination is that of aldose reductase, described by Tête-Favier et al. 37 The procedure that eventually succeeded is depicted

37 F. Tête-Favier, J.-M. Rondeau, A. Podjarny, and D. Moras, Acta Crystallogr. D49, 246 (1993).


[Figure 10 labels: resolution-limiting sphere; G sphere; data collection rotation axis]

FIG. 10. Example of a nonrandom distribution of missing data: case of a missing cusp. In such a case, the introduction of missing data should be carried out stepwise, taking into account the volume of reciprocal space where the G function takes nonnegligible values.

in Fig. 11, and can be summarized as follows: the monomeric enzyme aldose reductase crystallizes in space group P1 with four molecules in the asymmetric unit, related to each other by pseudo 222 local symmetry. Initial SIR phases were computed using data from a single mercury derivative, and had a mean phase error of 70°. Self-rotation function calculations showed the presence of local 222 symmetry, although, as seen previously, these computations do not allow a test for translation components along the local twofold axes. The local symmetry was assumed to be 222 at this stage, and this was supported by the locations of the heavy-atom sites. The molecular envelope was generated from the SIR map by computing a correlation function-based map, 24 leading to an inflated solvent content of 45%. The initial SIR map, computed with data from 12- to 3.5-Å resolution, was subjected to averaging using the proper 222 n.c.s. operators, which included phase combination with the initial phases. This led to a phase set having a phase error of 90°, and gave an averaged map of lower quality than the initial map. In a second attempt, the envelope was judged as having been too small and therefore was enlarged by adding a 2-Å-thick layer at the surface of the previous envelope, giving a solvent content reduced to 17%. The averaging procedure was repeated with the assumed 222 operators, this time with inclusion of all unphased or unmeasured reflections between 83- and 10-Å resolution. The resulting 3.5-Å resolution phase set had a phase error of 55° and the resulting averaged map showed convincing density for 80% of the main-chain atoms and for 60% of the side-chain atoms. This allowed construction of an initial molecular model. Attempts to extend the phases further to 3.07-Å resolution failed, giving an average phase error of 74° between 3.5 and 3.07 Å. Next the n.c.s. operators were optimized by rigid-body refinement of the four independent monomers. This led to an improved definition of the n.c.s. as pseudo $2_12_12_1$ in which the three twofold axes do not intersect. The envelope was recomputed on a monomer-by-monomer basis, and averaging was carried out as described previously. This led to a phase set having a mean phase error of 43°. Heavy-atom rerefinement versus the phases obtained from the averaging procedure led to improved SIR phases, which were used for a final procedure of map averaging in which the resolution was extended successfully to 3.07 Å. The average phase error to 3.5 Å had dropped to 38°, and that between 3.5- and 3.07-Å resolution was 44°, giving a map in which the complete model could be interpreted.

[Figure 11 flow chart: SIR phasing (70° error); averaging using 222 symmetry, 45% solvent mask, incomplete data (90° error); mask improvement and complete data; averaging using 222 symmetry, 17% solvent mask, extended data (55° error); exact symmetry; averaging using pseudo 222 symmetry, 17% solvent mask, extended data (43° error)]

FIG. 11. Flow chart describing the three different attempts at real-space refinement in the structure determination of aldose reductase. (Reproduced with permission from Tête-Favier et al. 37)

Removing Initial Model Bias

In the structure determination of the avidin-biotin complex, 38,39 Livnah et al. used a partially refined avidin model (provided by A. Pähler and W. A. Hendrickson) as a search model for molecular replacement. The molecular-replacement solution revealed two copies of the molecule in the asymmetrical unit, and gave an initial R factor of 46% in the resolution range 15-3.0 Å. After initial model rebuilding, the 3.0-Å resolution $2|F_o| - |F_c|$ electron-density map showed unclear electron density for several of the loop regions connecting β strands in the structure. Phase extension by twofold averaging improved the density in these regions. The molecular envelope was defined from the molecular model by assigning all points within 4.0 Å from any atomic coordinates to the protein region. The averaging procedure was initiated from a Sim weighted 28,29 $2|F_o| - |F_c|$ electron-density map computed using model phases in the resolution range 15-4.0 Å. During the real-space averaging, no phase combination was carried out, and 15 phase-extension steps extended the resolution from 4.0 to 3.0 Å. The resulting 3.0-Å resolution averaged map showed convincing density for the loop regions (Fig. 12) and allowed construction of the correct molecular model. Phase extension by twofold real-space averaging successfully removed the bias from the initial phasing model.
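The envelope definition used here, all grid points within 4.0 Å of any atom, is easy to sketch. The fragment below is an illustration under simplifying assumptions (an orthogonal cell, Cartesian coordinates aligned with the cell axes, hypothetical function names); it is not the procedure of any specific program.

```python
import math
from itertools import product


def envelope_from_atoms(atoms, cell, grid, radius=4.0):
    """Return the set of grid indices lying within `radius` (Å) of any atom.

    atoms: list of (x, y, z) in Å; cell: (a, b, c) in Å, assumed orthogonal;
    grid: (nx, ny, nz). Indices are wrapped periodically into the unit cell.
    """
    steps = [length / n for length, n in zip(cell, grid)]
    # One extra grid step covers the offset between an atom and its nearest node.
    reach = [int(math.ceil(radius / s)) + 1 for s in steps]
    protein = set()
    for atom in atoms:
        centre = [int(round(p / s)) for p, s in zip(atom, steps)]
        ranges = [range(c - r, c + r + 1) for c, r in zip(centre, reach)]
        for index in product(*ranges):
            dist2 = sum((i * s - p) ** 2 for i, s, p in zip(index, steps, atom))
            if dist2 <= radius * radius:
                protein.add(tuple(i % n for i, n in zip(index, grid)))
    return protein


def solvent_fraction(protein, grid):
    """Fraction of grid points not assigned to the protein region."""
    return 1.0 - len(protein) / (grid[0] * grid[1] * grid[2])


mask = envelope_from_atoms([(10.0, 10.0, 10.0)], (20.0, 20.0, 20.0), (20, 20, 20))
edge_mask = envelope_from_atoms([(1.0, 10.0, 10.0)], (20.0, 20.0, 20.0), (20, 20, 20))
```

A larger radius gives a more generous envelope at the cost of a smaller solvent fraction, which is the trade-off discussed under "Use of Erroneous Envelopes".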

Data Completion

As seen previously, n.c.s. averaging may be used to generate meaningful estimates for missing reflections. An example is the structure determination of glycosomal glyceraldehyde-phosphate dehydrogenase from Trypanosoma brucei. 40,41 Very few crystals of this enzyme were grown using the little protein material purified from the glycosomes. Unfortunately these

38 O. Livnah, E. A. Bayer, M. Wilcheck, and J. L. Sussman, Proc. Natl. Acad. Sci. U.S.A. 90, 5076 (1993).
39 O. Livnah, Ph.D. thesis. The Weizmann Institute of Science, Rehovot, Israel, 1992.
40 F. M. D. Vellieux, J. Hajdu, C. L. M. J. Verlinde, H. Groendijk, R. J. Read, T. J. Greenhough, J. W. Campbell, K. H. Kalk, J. A. Littlechild, H. C. Watson, and W. G. J. Hol, Proc. Natl. Acad. Sci. U.S.A. 90, 2355 (1993).
41 F. M. D. Vellieux, J. Hajdu, and W. G. J. Hol, Acta Crystallogr. D51, 575.


FIG. 12. Electron-density maps before and after the averaging procedure, in the loop region between residues 102 and 110. (Reproduced with permission from Livnah. 39) Electron densities are contoured at the 1.0σ level. (a) The initial 8- to 3-Å resolution $2F_o - F_c$ map, together with the initial, incorrect tracing of the loop. (b) The map obtained at completion of the averaging and phase-extension procedure, together with the correct model. Also shown in dashed lines is the Cα tracing of the initial, incorrect model for this loop.


[Figure 13 plot; horizontal axis: Resolution (Å), with ticks at 6.24, 5.42, 4.86, 4.44, 4.12, 3.85, 3.63, 3.45, and 3.29]

FIG. 13. Completeness of the Laue data set, used in the procedure of data completion by sixfold averaging, as a function of the resolution.


crystals also were extremely sensitive to X-ray radiation 42: although the initial diffraction pattern recorded using monochromatic X-ray radiation at the Daresbury synchrotron source extends to ca. 2.8 Å, it rapidly decays beyond 5 Å. With the scarcity of available crystals, the crystal-efficient Laue method was used to collect a 3.2-Å native data set with only two crystals, whereas a monochromatic data collection strategy would have required ca. 20 crystals. The data obtained after processing of the four Laue film packs were only 37% complete in the resolution range of 7.0 to 3.2 Å (Fig. 13). The structure was solved by molecular replacement, using as a search model the homologous enzyme from Bacillus stearothermophilus, 43 which has 52% sequence identity to the glycosomal enzyme. This revealed the presence, in the asymmetrical unit, of one and a half tetrameric molecules, thereby indicating sixfold redundancy. Noncrystallographic symmetry operators were refined by maximizing the correlation coefficient between $|F_c|^2$ values and $|F_o|^2$ values, using the program BRUTE. 44 A model was built into the initial electron density to generate the envelope used during density modification. This was necessary owing to the presence of large surface insertions in the T. brucei enzyme subunit with respect to the bacterial enzyme. The initial map then was improved in an iterative procedure of 36 cycles of 6-fold averaging, in which the 63% missing reflections were introduced gradually as map-inversion amplitudes and phases. This procedure succeeded, largely because the missing data were randomly distributed throughout reciprocal space. The resulting averaged electron-density map, which had been computed with the 37% amplitudes from the Laue data set, plus the 63% missing reflections estimated during the averaging procedure, was of good quality. A complete molecular model was built that has been refined to an R factor of 17.6% using noncrystallographic symmetry restraints. Using the refined model phases as a reference set, one can analyze the results of the averaging procedure both for the observed and for the unobserved, missing data (Fig. 14). These scatter plots indicate, for the observed data, that the phases obtained from molecular replacement form a cloud along the main diagonal, thereby indicating their nonrandomness. This cloud shrinks along the diagonal during the averaging procedure, thereby indicating an improvement in the quality of the phases. For the unobserved reflections, a much more diffuse but similar cloud is also observed for the phases computed from the molecular replacement model (these phases were obviously not used for the computation of the initial map). Similarly, the cloud has collapsed along the main diagonal at completion of the averaging procedure. The phase estimates provided by the procedure for the missing reflections also are nonrandom. Similarly, using additional Laue data collected after completion of the averaging procedure, one can demonstrate that the structure-factor amplitudes estimated by this procedure also are nonrandom (not shown). Hence, all missing data were gradually replaced by nonrandom estimated values, thereby improving map quality.

42 R. J. Read, R. K. Wierenga, H. Groendijk, A. Lambeir, F. R. Opperdoes, and W. G. J. Hol, J. Mol. Biol. 194, 573 (1987).
43 T. Skarżyński, P. C. E. Moody, and A. J. Wonacott, J. Mol. Biol. 193, 171 (1987).
44 M. Fujinaga and R. J. Read, J. Appl. Crystallogr. 20, 517 (1987).

Parts of Structure Not Obeying Assumed Symmetry

In their study of the three-dimensional structure of the homotetrameric enzyme flavocytochrome b2, Xia et al. 45 have described an interesting situation of twofold n.c.s., in which one of the enzyme domains does not obey the n.c.s. Each monomer is made up of two domains, a 411-residue flavin-binding domain and an external, 100-residue cytochrome b2 domain. The crystals of space group $P3_221$ contain a proper molecular fourfold axis coincident with the crystallographic twofold axis, thus the asymmetrical unit contains half a tetramer. In the solvent-flattened multiple isomorphous replacement (MIR) map, one of the two cytochrome b2 domains shows no interpretable density and is disordered, whereas both flavin-binding domains obey the twofold n.c.s. (Fig. 15). Although density modification by means of twofold averaging was not necessary to solve the structure, one could have improved the electron density by this procedure. This would have required an envelope showing the location of the two flavin-binding domains (where twofold averaging could have been performed), the ordered cytochrome b2 domain (where the density should have been left unmodified), and the remaining regions (where solvent flattening could have been applied).

45 Z.-X. Xia, N. Shamala, P. H. Bethge, L. W. Lim, H. D. Bellamy, N. H. Xuong, F. Lederer, and F. S. Mathews, Proc. Natl. Acad. Sci. U.S.A. 84, 2629 (1987).

Refinement of Phases to Phase Sets Other Than That of Positive Structure

The most spectacular examples of the use of n.c.s. averaging are probably observed in the structure determinations of icosahedral viruses; these determinations make use of the high redundancy of information due to the high-order n.c.s. of the viral capsid. In procedures of phase extension applied to viral particle crystal data, several groups have observed that the technique can converge on phase sets other than those of the correct, positive structure.

In the case of the canine parvovirus structure determination, Tsao et al. 46 initiated phasing from a shell of uniform density, of inner radius 85 and outer radius 128 Å, which had been positioned according to the results of self-rotation function calculations. Phasing was initiated at 20-Å resolution, with 10 cycles of 60-fold averaging using |Fo| values and calculated phases, after which the unobserved reflections were replaced by map inversion-calculated data in an additional 10 cycles of averaging at constant resolution. Phase extension was then carried out to 11-Å resolution in steps of one reciprocal lattice point at a time, then further to 9 Å, but the agreement between observed and map-inversion amplitudes dropped steadily as the resolution was increased and the procedure was judged not to be converging. A difference Fourier map from a partial K2PtBr6 derivative data set was computed and averaged over the 60 equivalent noncrystallographic asymmetrical units. This showed a single negative peak (Fig. 16), indicating that the set of phases generated by the averaging procedure had provided a Babinet opposite of the correct solution. This had been caused by the incorrect estimation both of the diameter of the hollow phasing shell and of the particle center. The SIR phasing at 8-Å resolution, followed by gradual phase extension to 3.25 Å, interspersed with refinement of the particle center, provided an interpretable 3.4-Å resolution electron-density map corresponding to the enantiomeric structure.
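The agreement between observed and map-inversion amplitudes mentioned above is commonly monitored with an R factor. The sketch below assumes the conventional definition R = Σ| |Fo| - k|Fc| | / Σ|Fo| with a single overall scale k = Σ|Fo| / Σ|Fc|; the function name and the four amplitudes are hypothetical, and the scaling scheme actually used by Tsao et al. is not stated in the text.

```python
import numpy as np

def map_inversion_r_factor(f_obs, f_calc):
    """R factor between observed amplitudes and map-inversion amplitudes.

    A single overall scale factor is applied to the calculated amplitudes
    (an assumption; other scaling schemes are possible).
    """
    f_obs = np.asarray(f_obs, dtype=float)
    f_calc = np.abs(np.asarray(f_calc))       # accept complex F or amplitudes
    k = f_obs.sum() / f_calc.sum()            # overall scale factor
    return float(np.abs(f_obs - k * f_calc).sum() / f_obs.sum())

# Hypothetical amplitudes for four reflections; illustration only.
f_obs = [100.0, 50.0, 25.0, 25.0]
f_calc = [45.0, 30.0, 10.0, 15.0]

r = map_inversion_r_factor(f_obs, f_calc)   # 0.15
print(f"R = {r:.3f}")
```

Computing this statistic separately for each resolution shell after every extension step is one way to detect the steady drop in agreement described above.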

46 J. Tsao, M. S. Chapman, H. Wu, M. Agbandje, W. Keller, and M. G. Rossmann, Acta Crystallogr. B48, 75 (1992).


[Figure 14: four phase scatter plots, panels (a) to (d), with both axes running from 0.0 to 360.0 degrees. Axis label in panels (a) and (c): molecular replacement phase, after EMX (degrees). Axis label in panels (b) and (d): phase from density averaging, cycle 36 (degrees).]

FIG. 14. Phase scatter plots comparing several phase sets to the phases computed from the refined molecular model. (Reproduced with permission from Vellieux et al. 41) (a) Phase set computed from the initial molecular-replacement model, for the observed reflections only. (b) Phase set obtained at completion of the iterative averaging procedure, for the observed reflections only. (c) As in (a), for the unobserved, missing reflections. (d) As in (b), for the unobserved, missing reflections.


FIG. 15. A slab of five superimposed sections of the 6-Å resolution electron-density map perpendicular to the crystallographic twofold axis, showing the molecular fourfold symmetry. Two cytochrome domains are indicated by solid lines, and the expected positions of the fourfold related cytochrome domains are indicated by dashed lines. (Reproduced with permission from Xia et al. 45)

Another, similar example is the structure determination of bacteriophage MS2, 47 in which the fivefold axes of icosahedral symmetry are aligned with the rhombohedral axes of the R32 space group of the crystals. Failure to interpret heavy-atom data prompted the authors to solve the structure by the molecular-replacement method, using a low-resolution search model made up of a truncated southern bean mosaic virus (SBMV) structure. 48 In view

47 K. Valegård, L. Liljas, K. Fridborg, and T. Unge, Acta Crystallogr. B47, 949 (1991).

48 C. Abad-Zapatero, S. S. Abdel-Meguid, J. E. Johnson, A. G. W. Leslie, I. Rayment, M. G. Rossmann, D. Suck, and T. Tsukihara, Nature (London) 286, 33 (1980).


[Figure 16: contoured map section with axes x and y; scale marked from 0 to 40.]

FIG. 16. A section of the skew-averaged difference-Fourier map calculated for the K2PtBr6 derivative, using the phases extended to 9-Å resolution from the hollow shell model. The large negative peak corresponds to the Pt site. (Reproduced with permission from Tsao et al. 46)

of the failure of an R factor search to provide the correct location of the model, the model was placed in the cell in a position compatible with the observed symmetries and where model clashes were prevented. A starting electron-density map was computed with measured data from 35 to 13 Å and with calculated structure factors used in place of all missing reflections to a lower resolution limit of 300 Å. It was subjected to phase extension


to 3.4 Å by 30-fold n.c.s. averaging. Although the statistics obtained from the averaging procedure were good, giving a map inversion R factor of 20.4%, the resulting map was not interpretable. Reanalysis of the heavy-atom data using the phases generated by the averaging procedure showed heavy-atom sites as negative peaks in difference Fourier maps, thus indicating that phases had, again, refined to the wrong sign. Starting from isomorphous replacement phases, phase extension from 6 to 3.3 Å then provided an interpretable map. Comparison of the phase set computed from the refined molecular model to that generated from the initial averaging procedure (Fig. 17) shows that, while most of the phases had refined to the Babinet hand of the correct solution, some had refined to the enantiomorphic phases

[Figure 17: contoured histogram with one axis labeled SBMV and the other labeled MIR, each running from 0 to 360 degrees.]

FIG. 17. A contoured histogram of phase correspondence gives a comparison of phase angles between the phase set after phase extension from the SBMV-MS2 model and the final phase set based on isomorphous replacement, in the resolution range of 15 to 9 Å. (Reproduced with permission from Valegård et al. 47)


whereas others had refined to its Babinet opposite, thus leading to the generation of a totally uninterpretable map.
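The relation between the positive structure, its Babinet opposite, and its enantiomorph can be checked numerically. The sketch below uses a hypothetical one-dimensional periodic density of 32 grid points (an illustration only, with the F(000) term excluded and no anomalous scattering): inverting the sign of the density shifts every phase by 180 degrees, whereas inverting the density through the origin changes the sign of every phase. In both cases the amplitudes are unchanged, which is consistent with good amplitude-based statistics accompanying an uninterpretable map.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = rng.random(32)                      # hypothetical positive 1D density

# Structure factors of the positive structure, F(000) dropped.
F = np.fft.fft(rho)[1:]

# Babinet opposite: density of inverted sign.
F_babinet = np.fft.fft(-rho)[1:]

# Enantiomorph: rho(-x) on the periodic grid (index n -> -n mod N).
rho_inverted = np.roll(rho[::-1], 1)
F_enant = np.fft.fft(rho_inverted)[1:]

def wrapped_deg(angle_rad):
    """Convert radians to degrees wrapped into [-180, 180)."""
    return (np.degrees(angle_rad) + 180.0) % 360.0 - 180.0

# Phase shift of the Babinet set relative to the positive structure.
babinet_shift = np.abs(np.degrees(np.angle(F_babinet / F)))
# Sum of phases, enantiomorph plus positive structure (zero modulo 360).
enant_sum = wrapped_deg(np.angle(F_enant * F))

print(babinet_shift.min(), babinet_shift.max())
print(np.abs(np.sin(np.radians(enant_sum))).max())
```

A heavy-atom difference Fourier computed with Babinet phases is the negative of the correct map, so a true site appears as a negative peak.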

Acknowledgments

We thank Dr. Fontecilla-Camps for critical reading of the manuscript. This is publication number 256 of the Institut de Biologie Structurale Jean-Pierre Ebel.

[4] Combining Constraints for Electron-Density Modification

By KAM Y. J. ZHANG, KEVIN COWTAN, and PETER MAIN

Introduction

The crystallographic phase problem is indeterminate given only the structure-factor amplitudes. It is only through the knowledge of the chemical or physical properties of the electron density that the phases can be retrieved. The characteristic features of electron density can often be expressed as mathematical constraints on the phases. The phasing power of a constraint depends on the number of density points affected and the magnitudes of changes imposed on the electron density. It also depends on the physical nature and accuracy of the constraint and on how rigorously the constraint is applied. The phasing power also increases with the number of independent constraints employed.

The phase problem can be solved ab initio under favorable circumstances by exploiting one or several constraints, such as positivity and atomicity used in direct methods for small molecules, and noncrystallographic symmetry averaging for spherical viruses. Some of the constraints successfully used in the ab initio phasing for small molecules are tabulated in Table I (atomic resolution). The constraints at our disposal are, however, generally insufficient for an ab initio solution to the phase problem for macromolecules. Nevertheless, these constraints could be used to refine the experimental phases or to extend them to the full resolution of the native data. Table I (nonatomic resolution) lists those constraints and how they are used for macromolecular phase improvement.
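How real-space constraints act on the phases can be sketched as a single modification cycle: compute a map from the observed amplitudes and the current phases, impose the constraints on the density, transform back, and keep only the new phases. The code below is a minimal one-dimensional sketch under stated assumptions, not the procedure of any particular program; the function name, the 64-point grid, and the solvent mask are hypothetical, and only two constraints (solvent flatness and positivity) are combined.

```python
import numpy as np

def density_modification_cycle(f_obs_amp, phases, solvent_mask):
    """One cycle of constraint application.

    The observed amplitudes are retained throughout; only the phases
    are updated from the modified density.
    """
    rho = np.fft.ifft(f_obs_amp * np.exp(1j * phases)).real
    rho_mod = rho.copy()
    rho_mod[solvent_mask] = rho[solvent_mask].mean()   # solvent flattening
    np.maximum(rho_mod, 0.0, out=rho_mod)              # positivity
    return np.angle(np.fft.fft(rho_mod))

# Hypothetical test density: flat (zero) solvent, positive "protein".
rng = np.random.default_rng(1)
n = 64
solvent_mask = np.zeros(n, dtype=bool)
solvent_mask[:24] = True
rho_true = np.where(solvent_mask, 0.0, rng.random(n))

f_true = np.fft.fft(rho_true)
f_obs_amp = np.abs(f_true)

# A density that already satisfies both constraints is left unchanged.
new_phases = density_modification_cycle(f_obs_amp, np.angle(f_true), solvent_mask)
f_new = f_obs_amp * np.exp(1j * new_phases)
print(np.abs(f_new - f_true).max())
```

In this toy case the solvent constraint touches 24 of the 64 grid points, whereas positivity alters only points that fall below zero; this illustrates how the number of density points affected differs from one constraint to another.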

The density modification methods that are discussed in this chapter seek to improve the experimental phases by exploiting and combining constraints derived from some characteristic features of the electron-density map. The more constraints that are employed, the lower the resolution

Copyright © 1997 by Academic Press All rights of reproduction in any form reserved.

METHODS IN ENZYMOLOGY, VOL. 277 0076-6879/97 $25.00