Crystallography. Structural resolution

Composition of two scattered waves. A = resultant amplitude; I = resultant intensity (~ A²)
(a) totally in phase (the total effect is the sum of both waves)
(b) with a certain difference of phase (they add, but not totally)
(c) in antiphase, that is, 180° out of phase (for equal amplitudes the resultant amplitude is zero)
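The three cases above can be reproduced numerically by adding two waves as complex numbers. The following is only a minimal sketch: the function name `combine` and the unit amplitudes are assumptions chosen for illustration, and the intensity is taken as exactly A² (ignoring any proportionality constant).

```python
import numpy as np

def combine(a1, a2, phase_diff_deg):
    """Resultant amplitude A and intensity I (= A^2) of two waves of equal wavelength."""
    delta = np.deg2rad(phase_diff_deg)
    # The first wave is the phase reference; the second is shifted by delta
    resultant = a1 + a2 * np.exp(1j * delta)
    amplitude = abs(resultant)
    return amplitude, amplitude ** 2

cases = [("(a) in phase", 0.0),
         ("(b) partial phase difference", 90.0),
         ("(c) in antiphase", 180.0)]
for label, delta in cases:
    A, I = combine(1.0, 1.0, delta)
    print(f"{label}: A = {A:.3f}, I = {I:.3f}")
```

Any phase difference between 0° and 180° gives an amplitude between the two extremes, which is why the measured intensity alone does not reveal how the individual waves were shifted.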

Between the two spaces mentioned (direct and reciprocal) there is a holistic relationship: every detail of one space affects the whole of the other, and vice versa. Mathematically speaking, this relationship is a Fourier transform that cannot be evaluated directly, since the diffraction experiment does not provide one of the fundamental magnitudes of the equation: the relative phases (Φ) of the diffracted beams.

Holistic relationship between direct and reciprocal spaces

Left: Holistic relationship between direct space (left) and reciprocal space (right). Every detail of the direct space (left) depends on the total information contained in the reciprocal space (right), and vice versa... Every detail of the reciprocal space (right) depends on the total information contained in the direct space (left).

Right: Graphical representation of the phase difference between two waves (the relative phase between waves).

The diagram below, with the help of the following paragraph, summarizes what the resolution of a crystalline structure through X-ray diffraction implies ...

Atoms, ions, and molecules are packed into units (unit cells) that are stacked in three dimensions to form a crystal in the space that we call direct or real space. The diffraction effects of the crystal can be represented as points of a mathematical lattice that we call the reciprocal lattice. The diffraction intensities, that is, the blackening of these points of the reciprocal lattice, represent the moduli of some fundamental vector quantities, which we call structure factors. If we get to know not only the moduli of these vectors (obtained from the intensities), but also their relative orientations (that is, their relative phases), we will be able to obtain the value of the electron density function at each point of the unit cell, thus providing the positions of the atoms that make up the crystal.

Outline on basic crystallographic concepts: direct and reciprocal spaces. The issue is to obtain information on the left side (direct space) from the diffraction experiment (reciprocal space).

Formula 1. Function defining the electron density in a point of the unit cell given by the coordinates (x, y, z)

F(hkl) represents the resultant diffracted beams of all atoms contained in the unit cell in a given direction. These magnitudes (actually waves), one for each diffracted beam, are known as structure factors. Their moduli are directly related to the diffracted intensities.

h, k, l are the Miller indices of the diffracted beams (the reciprocal points) and Φ(hkl) represents the phases of the structure factors. V represents the volume of the unit cell. The function is limited by the extent to which the diffraction pattern is observed: the number of observed structure factors is finite, and therefore the synthesis will only be approximate and may show some truncation effects.
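For reference, the electron density function described by these symbols is usually written as shown below. This is the standard textbook form, given here as an assumption about the original formula; it takes Φ(hkl) in radians and x, y, z as fractional coordinates.

```latex
\rho(x,y,z) \;=\; \frac{1}{V} \sum_{h}\sum_{k}\sum_{l} \,|F(hkl)|\,
\exp\!\left\{-2\pi i\,(hx + ky + lz) + i\,\Phi(hkl)\right\}
```

The sum runs over all measured reflections, which is where the truncation effects mentioned above come from.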

Left: Appearance of a zone of the electron density map of a protein crystal, before it is interpreted.
Right: The same electron density map after its interpretation in terms of a peptidic fragment.

The equation above (Formula 1) represents the Fourier transform between the real or direct space (where the atoms are, represented by the function ρ) and the reciprocal space (the X-ray pattern) represented by the structure factor amplitudes and their phases. Formula 1 also shows the holistic character of diffraction, because in order to calculate the value of the electron density in a single point of coordinates (xyz) it is necessary to use the contributions of all structure factors produced by the crystal diffraction.
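The holistic character of the synthesis, and the effect of a finite number of terms, can be explored with a one-dimensional toy model. This is a minimal sketch, not a crystallographic program: the three point atoms, their scattering powers and the cut-off values `h_max` are hypothetical, and the structure factors are computed from the known model only so that amplitudes and phases are both available.

```python
import numpy as np

# Hypothetical 1D "crystal": fractional coordinates and scattering powers of point atoms
x_atoms = np.array([0.15, 0.40, 0.72])
f_atoms = np.array([6.0, 8.0, 16.0])

def structure_factor(h):
    """F(h) = sum_j f_j exp(2*pi*i*h*x_j) for the toy model."""
    return np.sum(f_atoms * np.exp(2j * np.pi * h * x_atoms))

def density(x, h_max, cell_length=1.0):
    """Fourier synthesis using every reflection with |h| <= h_max."""
    hs = np.arange(-h_max, h_max + 1)
    F = np.array([structure_factor(h) for h in hs])
    amplitudes, phases = np.abs(F), np.angle(F)
    # Each density value needs the amplitude AND the phase of every reflection
    terms = amplitudes[:, None] * np.exp(
        -2j * np.pi * hs[:, None] * x[None, :] + 1j * phases[:, None])
    return terms.sum(axis=0).real / cell_length

x = np.linspace(0.0, 1.0, 1000, endpoint=False)
rho_low = density(x, h_max=5)    # few terms: broad peaks and strong ripples
rho_high = density(x, h_max=30)  # more terms: sharper peaks
print("highest peak at x =", x[np.argmax(rho_high)])
```

Plotting `rho_low` and `rho_high` against `x` shows how adding reflections sharpens the peaks at the atomic positions, while the ripples around them are the truncation effects of a finite series.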

Vector representation of a structure factor

The structure factors F(hkl) are waves and can therefore be represented as vectors by their amplitudes, |F(hkl)|, and their phases, Φ(hkl), measured relative to a common origin of phases.
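In practice this vector is conveniently handled as a complex number. The sketch below uses an arbitrary amplitude and phase (both hypothetical values) and the helper name `as_vector` is an assumption for illustration.

```python
import cmath
import math

def as_vector(amplitude, phase_deg):
    """Structure factor as a complex number from its modulus and phase (degrees)."""
    return cmath.rect(amplitude, math.radians(phase_deg))

F = as_vector(10.0, 60.0)
print("components:", F.real, F.imag)            # projections on the real and imaginary axes
print("amplitude :", abs(F))                    # modulus |F(hkl)|
print("phase     :", math.degrees(cmath.phase(F)))  # phase recovered in degrees
```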

When the unit cell is centrosymmetric, for each atom at coordinates (x,y,z) there is an identical one located at (-x,-y,-z). This implies that Friedel's law holds in its strongest form [F(h,k,l) = F(-h,-k,-l)] and that the expression of the electron density (Formula 1) is simplified, becoming Formula 1.1. The phases of the structure factors are also simplified, being restricted to 0° or 180°...

Formula 1.1. Electron density function in a point of coordinates (x, y, z) in a centrosymmetric unit cell.
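The simplification can be followed in one step. Assuming the usual exponential form of the electron density synthesis, and writing each centrosymmetric structure factor as a real number with a sign s(hkl) = +1 (phase 0°) or -1 (phase 180°), the sine parts of the reflections hkl and -h,-k,-l cancel in pairs and only cosines remain. The notation s(hkl) is introduced here for illustration and is not taken from the original formula.

```latex
\rho(x,y,z) \;=\; \frac{1}{V} \sum_{h}\sum_{k}\sum_{l} \, s(hkl)\,|F(hkl)|\,
\cos\!\left[2\pi\,(hx + ky + lz)\right],
\qquad s(hkl) = \pm 1
```

The phase problem is thus reduced to finding one sign per reflection.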

It is important to realize that the quantity and quality of information provided by the electron density function, ρ, is very dependent on the quantity and quality of the data used in the formula: the structure factors F(hkl) (amplitudes and phases!). We will see later on that the amplitudes of the structure factors are directly obtained from the diffraction experiment.

If your browser is Java enabled, as a practical exercise on Fourier transforms we recommend visiting the following links:

the applet by Steffen Weber...

or, even better, the Java applet kindly provided by Nicholas Schöni and Gervais Chapuis (École Polytechnique Fédérale de Lausanne, Switzerland), which you can download (free of any virus) from the link shown and run on your own computer. This applet calculates the Fourier transform of a two-dimensional density function ρ(x), yielding the complex magnitude G(S) in reciprocal space. The applet is also able to calculate the inverse Fourier transform of G(S). The density function can be either periodic or non-periodic. Numerous tools, including drawing tools, can be applied in order to understand the role of amplitudes and phases, which are of particular importance in diffraction phenomena. As an illustration, the Patterson function of a periodic structure can be simulated.

Formula 2. Structure factor for each diffracted beam. This equation is the Fourier transform of the electron density (Formula 1).
The expression takes into account the scattering factors ƒ of all j atoms contained in the crystal unit cell.
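As a minimal sketch of how such a sum over atoms is evaluated, the following code assumes the standard form F(hkl) = Σ_j ƒ_j exp[2πi(hx_j + ky_j + lz_j)] with fractional coordinates. The atoms and their scattering factors are hypothetical, and the ƒ_j are taken as constants (their dependence on the Bragg angle and on thermal vibration is ignored).

```python
import numpy as np

def structure_factor(hkl, frac_coords, f):
    """F(hkl) = sum_j f_j * exp[2*pi*i*(h*x_j + k*y_j + l*z_j)]."""
    hkl = np.asarray(hkl, dtype=float)
    frac_coords = np.asarray(frac_coords, dtype=float)
    phase_angles = 2.0 * np.pi * (frac_coords @ hkl)
    return np.sum(np.asarray(f, dtype=float) * np.exp(1j * phase_angles))

# Hypothetical two-atom cell without a centre of symmetry
coords = np.array([[0.10, 0.20, 0.30],
                   [0.35, 0.15, 0.40]])
f = np.array([6.0, 8.0])
F = structure_factor((1, 2, 1), coords, f)
print("general cell    :", abs(F), np.degrees(np.angle(F)))

# Adding the inverted copy of every atom makes the cell centrosymmetric
centro_coords = np.vstack([coords, -coords])
centro_f = np.concatenate([f, f])
F_centro = structure_factor((1, 2, 1), centro_coords, centro_f)
print("centrosymmetric :", F_centro.real, F_centro.imag)  # imaginary part vanishes
```

In the centrosymmetric case the imaginary part cancels, so the phase can only be 0° or 180°.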

Formula 3. Relationship between the amplitude of the structure factors |F(hkl)| and their intensities I(hkl)

K is a factor that puts the experimental structure factors (F_rel), measured on a relative scale (which depends on the power of the X-ray source, the crystal size, etc.), onto an absolute scale, which is to say, the scale of the calculated (theoretical) structure factors (those we would obtain from the real structure, if it were known, using Formula 2 above). As the structure is unknown at this stage, this factor can be roughly estimated from the experimental data by means of the so-called Wilson plot.

Wilson plot

I_rel represents the average intensity (on a relative scale) collected in a given interval of θ (the Bragg angle); f_j are the atomic scattering factors in that angular range, and λ is the X-ray wavelength.

By plotting the magnitudes shown in the left figure (green dots), a straight line is obtained from which the following information can be derived:

The value of the y-axis intercept is the natural (Napierian) logarithm of C, a magnitude related to the scale factor K (= 1 / √C) described above.
The slope is equivalent to -2B, where B is the isotropic overall atomic thermal vibration factor.
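The straight-line fit can be sketched as follows. The code assumes the usual Wilson relation ln(⟨I_rel⟩ / Σ f_j²) = ln C - 2B sin²θ/λ², which matches the intercept and slope described above; the data are synthetic and noise-free, generated from hypothetical values of C and B, so this only illustrates the bookkeeping.

```python
import numpy as np

def wilson_fit(s2, mean_intensity, sum_f2):
    """Return (K, B) from a linear fit of ln(<I_rel>/sum f_j^2) against sin^2(theta)/lambda^2."""
    y = np.log(mean_intensity / sum_f2)
    slope, intercept = np.polyfit(s2, y, 1)
    B = -slope / 2.0              # slope = -2B
    C = np.exp(intercept)         # intercept = ln C
    K = 1.0 / np.sqrt(C)          # scale factor K = 1 / sqrt(C)
    return K, B

# Synthetic resolution shells (hypothetical values)
s2 = np.linspace(0.02, 0.25, 12)          # sin^2(theta)/lambda^2 in A^-2
sum_f2 = np.full_like(s2, 5000.0)         # sum of f_j^2 per shell, taken constant here
true_C, true_B = 4.0, 20.0
mean_I = true_C * sum_f2 * np.exp(-2.0 * true_B * s2)

K, B = wilson_fit(s2, mean_I, sum_f2)
print(f"K = {K:.3f}, B = {B:.2f} A^2")
```

With real data the points scatter around the line, especially at low resolution, which is why the text describes the estimate as rough.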

A is an absorption factor, which can be estimated from the dimensions and composition of the crystal.

L is known as the Lorentz factor, responsible for correcting the different angular velocities with which the reciprocal points cross the surface of Ewald's sphere. For four-circle goniometers this factor can be calculated as 1/sin 2θ, where θ is the Bragg angle of the reflections.

p is the polarization factor, which corrects for the polarization of the incident beam and is given by the expression (1+cos²2θ)/2, where θ again represents the Bragg angle of the reflections (the reciprocal points).
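These corrections can be applied to a measured intensity as sketched below. The expressions for L and p are those quoted above; the way they are combined, I_rel proportional to A·L·p·|F_rel|², is an assumption about the form of Formula 3, and the default A = 1 (no absorption correction) is a placeholder.

```python
import numpy as np

def lorentz_factor(theta_deg):
    """L = 1 / sin(2*theta), as quoted for four-circle goniometers."""
    return 1.0 / np.sin(2.0 * np.deg2rad(theta_deg))

def polarization_factor(theta_deg):
    """p = (1 + cos^2(2*theta)) / 2."""
    return (1.0 + np.cos(2.0 * np.deg2rad(theta_deg)) ** 2) / 2.0

def relative_amplitude(intensity, theta_deg, absorption=1.0):
    """|F_rel| from a measured intensity, assuming I = A * L * p * |F_rel|^2."""
    L = lorentz_factor(theta_deg)
    p = polarization_factor(theta_deg)
    return np.sqrt(intensity / (absorption * L * p))

theta = 22.5  # Bragg angle in degrees (hypothetical reflection)
print("L =", lorentz_factor(theta))
print("p =", polarization_factor(theta))
print("|F_rel| =", relative_amplitude(250.0, theta))
```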

THE PHASE PROBLEM

However, in order to calculate the electron density (ρ(xyz) in Formula 1, above), and therefore to know the atomic positions inside the unit cell, we also need to know the phases of the different diffracted beams (Φ(hkl) in Formula 1 above). But, unfortunately, this valuable information is lost during the diffraction experiment (there is no experimental technique available to measure the phases!). Thus, we must face the so-called phase problem if we want to solve Formula 1.

The phase problem can be very easily understood if we compare the diffraction experiment (as a procedure to see the internal structure of crystals) with a conventional optical microscope...

Illustration on the phase problem. Comparison between an optical microscope and the "impossible" X-ray microscope. There are no optical lenses able to combine diffracted X-rays to produce a zoomed image of the crystal contents (atoms and molecules).

Formula 5. In the presence of anomalous scattering, the atomic scattering factor, ƒ₀, has to be modified by adding two new terms, one real and one imaginary.
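Written out with the symbols used in this section, the modified scattering factor takes the following form, where ƒ' is the real correction and ƒ'' the imaginary one (both depend on the X-ray energy):

```latex
f \;=\; f_{0} + f' + i\,f''
```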

The advanced reader should also read the section about the phenomenon of anomalous dispersion.

The ƒ' and ƒ'' corrections vs. X-ray energy (see below for the case of Cu Kα) can be calculated taking into account some theoretical considerations...

Real and imaginary components of the Selenium scattering factor vs. the energy of the incident X-rays. The vertical line indicates the wavelength for CuKα.

For X-ray energy values where resonance exists, ƒ' decreases dramatically, while the value of ƒ'' increases. This has practical importance considering that many heavy atoms used in crystallography show absorption edges at energies (wavelengths) which can be easily obtained with synchrotron radiation. Diffraction data collected in these conditions will show a normal component, mainly due to the light atoms (nitrogen, carbon and hydrogen), and an anomalous part produced by the heavy atoms, which will produce a global change in the phase of each reflection. All this leads to an intensity change between those reflections known as Friedel pairs (pairs of reflections which under normal conditions should have the same amplitudes and phases of identical magnitude but opposite sign). The detectable change in intensity between these reflection pairs (Friedel pairs) is what we call anomalous diffraction.

The MAD method, developed by Hendrickson and Kahn, involves measuring diffraction data from the protein crystal (containing a strong anomalous scatterer) using X-rays of different energies (wavelengths): one that maximizes ƒ'', another that minimizes ƒ', and a third at an energy distinct from these two. By combining these diffraction data sets, and specifically by analyzing the differences between them, it is possible to calculate the distribution of amplitudes and phases generated by the anomalous scatterers. These phases can then be used, as a first approximation, to calculate an electron density map for the whole protein.

In general, there is currently no need to introduce individual atoms as anomalous scatterers in protein crystals. It is relatively easy to obtain recombinant proteins in which methionine residues are replaced by selenomethionine. The selenium (and even sulfur) atoms of methionine (or cysteine) behave as suitable anomalous scatterers for carrying out a MAD experiment.

The MAD method presents some advantages vs. the MIR technique:

As the MAD technique uses data collected from a single crystal, the problems derived from lack of isomorphism, common in the MIR method, do not apply.

While the normal atomic scattering factor (ƒ₀) decreases markedly with the scattering angle, its anomalous component (ƒ' + iƒ'') is practically independent of that angle, so that the relative anomalous signal increases at high resolution, which is to say, at high Bragg angles. Thus, the phase estimates obtained by MAD are generally better at high resolution. With the MIR method, by contrast, the lack of isomorphism is larger at high resolution, and therefore the high-resolution intensities (beyond about 3.5 Angstrom) are not suitable for phasing.

Argand diagram showing the scattering contribution from an anomalous scatterer in a matrix of normal scatterers. This effect implies that Friedel's law fails. Image taken from "Crystallography 101".

Fp represents the contribution from the normal scatterers to the structure factor (of indices hkl).
Fa and Fa'' represent the real (ƒ₀ + ƒ') and imaginary (ƒ'') parts, respectively, of the scattering factor from the anomalous scatterers.
-Fp, -Fa and -Fa'' represent the same as Fp, Fa and Fa'', but for the reflection with indices -h, -k, -l.
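The breakdown of Friedel's law shown in the Argand diagram can be checked numerically. The sketch below assumes the standard structure factor sum over atoms and a scattering factor of the form ƒ₀ + ƒ' + iƒ'' for the anomalous scatterer. The cell contents and the values of ƒ₀, ƒ' and ƒ'' are hypothetical, chosen only to make the effect visible; they are not tabulated values.

```python
import numpy as np

def structure_factor(hkl, frac_coords, f):
    """F(hkl) = sum_j f_j * exp[2*pi*i*(h*x_j + k*y_j + l*z_j)], f_j possibly complex."""
    hkl = np.asarray(hkl, dtype=float)
    phase_angles = 2.0 * np.pi * (np.asarray(frac_coords, dtype=float) @ hkl)
    return np.sum(np.asarray(f, dtype=complex) * np.exp(1j * phase_angles))

# Hypothetical non-centrosymmetric cell: two normal scatterers and one anomalous scatterer
coords = np.array([[0.10, 0.20, 0.30],
                   [0.60, 0.05, 0.25],
                   [0.35, 0.15, 0.40]])   # last atom is the anomalous scatterer
f0 = np.array([6.0, 7.0, 34.0])

def friedel_amplitudes(f_prime, f_double_prime, hkl=(1, 2, 1)):
    """Amplitudes |F(hkl)| and |F(-h,-k,-l)| for given anomalous corrections."""
    f = f0.astype(complex)
    f[2] += f_prime + 1j * f_double_prime
    minus_hkl = tuple(-i for i in hkl)
    return (abs(structure_factor(hkl, coords, f)),
            abs(structure_factor(minus_hkl, coords, f)))

print("no anomalous scattering  :", friedel_amplitudes(0.0, 0.0))
print("with anomalous scattering:", friedel_amplitudes(-8.0, 4.0))
```

Without the imaginary term the two amplitudes coincide; with ƒ'' different from zero they differ by a small amount, which is the signal exploited in anomalous diffraction.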

The anomalous behavior of the atomic scattering factor produces only small differences between the intensities (and therefore between the amplitudes of the structure factors) of the reflections that are related by a centre of symmetry or a mirror plane (such as, for instance, I(h,k,l) vs. I(-h,-k,-l), or I(h,k,l) vs. I(h,-k,l)). Therefore, to estimate these small differences between the experimental intensities, additional precautions must be taken. It is recommended that reflections expected to show these differences are collected on the same diffraction image or, alternatively, that after each collected image the crystal is rotated 180 degrees and a new image is collected. Moreover, since ƒ' and ƒ'' change with very small variations in X-ray energy, it is necessary to have good control of the energy values (wavelengths). Therefore, it is essential to use a synchrotron radiation facility, where wavelengths can be tuned easily.

The advanced reader should also have a look at the web pages on anomalous scattering prepared by Bernhard Rupp, as well as the practical summary prepared by George M. Sheldrick.

MR (Molecular Replacement)

If we know the structural model of a protein with a homologous amino acid sequence, the phase problem can be solved by using the methodology known as molecular replacement (MR). The known structure of the homologous protein is regarded as the protein to be determined and serves as a first model to be subsequently refined. This procedure is obviously based on the observation that proteins with similar peptide sequences show a very similar folding. The problem in this case is transferring the molecular structure of the known protein from its own crystal structure to a new crystal packing of the protein with an unknown structure. The positioning of the known molecule into the unit cell of the unknown protein requires determining its correct orientation and position within the unit cell. Both operations, rotation and translation, are calculated using the so-called rotation and translation functions (see below).

Scheme of the molecular replacement (MR) method.
The molecule with known structure (A) is rotated through the [R] operation and shifted through T to bring it over the position of the unknown molecule (A’).
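In terms of coordinates, the two operations amount to a matrix product followed by a vector addition. The following is a minimal sketch with a hypothetical two-atom "molecule", a rotation of 90° about the z axis and an arbitrary shift; Cartesian coordinates are assumed and the function name `place_model` is illustrative.

```python
import numpy as np

def place_model(coords, R, T):
    """Apply the rotation [R] and then the translation T to every atom (rows of coords)."""
    return coords @ R.T + T

angle = np.deg2rad(90.0)
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([5.0, 0.0, 2.0])

model = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])   # hypothetical atoms of the known molecule (A)
placed = place_model(model, R, T)     # candidate position of the unknown molecule (A')
print(placed)
```

The difficulty of molecular replacement lies in finding [R] and T, not in applying them, which is the role of the rotation and translation functions.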

The rotation function. If we consider the case of two identical molecules, oriented in a different way, then the Patterson function will contain three sets of vectors. The first one will contain the Patterson vectors of one of the molecules, i.e. all interatomic vectors within molecule one (also called eigenvectors or self-vectors). The second set will contain the same vectors but for the second molecule, identical to the first one, but rotated due to their different orientation. The third set of vectors will be the interatomic cross vectors between the two molecules. While the eigenvectors are confined to the volume occupied by the molecule, the cross vectors will extend beyond this limit. If both molecules (known and unknown) are very similar in structure, the rotation function R(α,β,γ) tries to bring the Patterson vectors of one of the molecules into coincidence with those of the other, until they are in good agreement. This methodology was first described by Rossmann and Blow.

R(α,β,γ) = ∫_U P₁(u) · P₂(u_r) du

Formula 6. Rotation function

P₁ is the Patterson function and P₂ is the rotated Patterson function; u is a vector of the Patterson map, u_r is the same vector after rotation, and U is the volume of the Patterson map in which the interatomic vectors are calculated.
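The idea behind the rotation function can be illustrated with a two-dimensional toy search. This sketch is not the Rossmann and Blow implementation: the Patterson maps are replaced by sets of self-vectors smeared with Gaussians of hypothetical width `sigma`, so that the overlap integral reduces to a double sum, and a single rotation angle stands in for (α,β,γ). The model coordinates and the true angle are invented for the test.

```python
import numpy as np

def rotation_matrix(angle_deg):
    a = np.deg2rad(angle_deg)
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

def self_vectors(coords):
    """All interatomic vectors within one molecule (origin peak excluded)."""
    diff = coords[:, None, :] - coords[None, :, :]
    mask = ~np.eye(len(coords), dtype=bool)
    return diff[mask]

def overlap(vectors_1, vectors_2, sigma=0.15):
    """Integral of the product of two Gaussian-smeared vector sets (up to a constant)."""
    d2 = np.sum((vectors_1[:, None, :] - vectors_2[None, :, :]) ** 2, axis=-1)
    return np.sum(np.exp(-d2 / (4.0 * sigma ** 2)))

# Hypothetical known molecule and the same molecule in an "unknown" orientation
model = np.array([[0.0, 0.0], [1.5, 0.2], [2.1, 1.4], [0.7, 2.3], [-0.9, 1.1]])
true_angle = 40.0
target = model @ rotation_matrix(true_angle).T

p_target = self_vectors(target)   # plays the role of P1
p_model = self_vectors(model)     # plays the role of P2 before rotation

# A Patterson map is centrosymmetric, so in 2D only 0..180 degrees need to be scanned
angles = np.arange(0.0, 180.0, 1.0)
scores = np.array([overlap(p_target, p_model @ rotation_matrix(a).T) for a in angles])
best = angles[np.argmax(scores)]
print("best rotation angle:", best)
```

The score peaks when the rotated self-vectors of the model coincide with those of the target, which is the agreement the rotation function looks for.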

The quality of the solutions of these functions is expressed by the correlation coefficient between both Patterson functions: the experimental one and the calculated one (with the known protein). A high correlation coefficient between these functions is equivalent to a good agreement between the experimental diffraction pattern and the diffraction pattern calculated with the known protein structure. Once the known protein structure is properly oriented and translated (within the unit cell of the unknown protein), an electron density map is calculated using these atomic positions and the experimental structure factors. It is worth consulting the article published on this methodology by Eleanor Dodson.
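A correlation coefficient of this kind can be computed as sketched below. The code uses the ordinary Pearson coefficient from NumPy, which is an assumption about the exact definition used by molecular replacement programs, and the two maps are random arrays standing in for experimental and calculated Patterson functions.

```python
import numpy as np

def correlation_coefficient(p_obs, p_calc):
    """Pearson correlation between two maps sampled on the same grid."""
    return np.corrcoef(np.ravel(p_obs), np.ravel(p_calc))[0, 1]

rng = np.random.default_rng(0)
p_calc = rng.random((8, 8))          # stand-in for the calculated Patterson map
p_good = 2.0 * p_calc + 0.5          # same pattern on a different scale: good solution
p_poor = rng.random((8, 8))          # unrelated pattern: poor solution

cc_good = correlation_coefficient(p_good, p_calc)
cc_poor = correlation_coefficient(p_poor, p_calc)
print(f"good solution: {cc_good:.3f}, poor solution: {cc_poor:.3f}")
```

Because the coefficient is insensitive to the overall scale, it can be evaluated before the experimental data have been placed on an absolute scale.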

The advanced reader may also find it valuable to consult an article that, despite having been published in 2010, has not lost its validity as a description of the different methodologies for determining the relative phases of the diffracted beams.