3DNA: a versatile, integrated software system for the analysis, rebuilding and visualization of three-dimensional nucleic-acid structures

Xiang-Jun Lu & Wilma Olson

This section provides technical details of some of the algorithms used in 3DNA. It is meant for those who really want to know "what's inside that black box".

Least-squares fitting procedures with illustrated examples

Introduction

The least-squares fitting procedures presented below make use of well known mathematics. Indeed, the methods are so well known and widely used that it is somewhat difficult to locate the original references. As part of our comparison of nucleic acid structure analysis programs (1, 2), we came across a variety of least-squares fitting procedures. This webpage gives a detailed description, with step-by-step examples, of our implementation of two least-squares fitting algorithms based on a covariance matrix and its eigensystem.

Standard vs experimental bases

Three analysis schemes, CompDNA (3), Curves (4, 5), and RNA (6-8), use least-squares procedures to fit a standard base with an embedded reference frame to an observed base structure. CompDNA (3) and Curves (4, 5) take advantage of the conventional approach of McLachlan (9), while RNA (6-8) implements a closed-form solution of absolute orientation using unit quaternions first introduced by Horn (10). The two algorithms are mathematically equivalent since the unit quaternion can be transformed to the rotation matrix given by McLachlan (9). The Horn method (10), however, is more straightforward and more generally applicable: it can be applied even when one or both of the structures are perfectly planar, whereas the McLachlan approach (9) fails in that case.

Here we use the ideal adenine geometry derived from the high-resolution crystal structures of model nucleosides, nucleotides, and bases (11). The x-, y-, and z-coordinates of the standard base, taken from the Nucleic Acid Database (NDB), are listed below in the columns labeled sx, sy, and sz, respectively. s_(average) is the geometric center of the base.

              sx      sy      sz
  1  N9      0.213   0.660   1.287
  2  C4      0.250   2.016   1.509
  3  N3      0.016   2.995   0.619
  4  C2      0.142   4.189   1.194
  5  N1      0.451   4.493   2.459
  6  C6      0.681   3.485   3.329
  7  N6      0.990   3.787   4.592
  8  C5      0.579   2.170   2.844
  9  N7      0.747   0.934   3.454
 10  C8      0.520   0.074   2.491
------------------------------------
s_(average): 0.4589  2.4803  2.3778

We similarly describe the coordinates of one of the adenine bases (the fifth residue in the sequence strand) from the high resolution (1.4 Å) self-complementary d(CGCGAATTCGCG) dodecamer duplex determined by Williams and co-workers (12). The experimental coordinates (NDB entry: bdl084) are listed below in the columns labeled ex, ey, and ez. The geometric center is e_(average). Note that the atomic serial numbers from the NDB (first column) have been rearranged so that the atoms are in the same order as the above ideal base.

              ex      ey      ez
 91  N9     16.461  17.015  14.676
100  C4     15.775  18.188  14.459
 99  N3     14.489  18.449  14.756
 98  C2     14.171  19.699  14.406
 97  N1     14.933  20.644  13.839
 95  C6     16.223  20.352  13.555
 96  N6     16.984  21.297  12.994
 94  C5     16.683  19.056  13.875
 93  N7     17.918  18.439  13.718
 92  C8     17.734  17.239  14.207
------------------------------------
e_(average): 16.1371 19.0378 14.0485

We collect the two sets of coordinates in the 10 x 3 matrices S and E corresponding respectively to the standard and experimental bases. We then construct the 3 x 3 covariance matrix (C) (13) between S and E using the following formula:

        1             1
 C = ------- [S' E - --- S' i i' E]
      N - 1           N

   =
      0.2782    0.2139   -0.1601
     -1.4028    1.9619   -0.2744
      1.0443    0.9712   -0.6610

Here N, the number of atoms in each base, is 10, and i is an N x 1 column vector consisting of only ones. S' and i' are the transpose of matrix S and column vector i, respectively.
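
The covariance formula above can be sketched in plain Python as follows. This is a minimal illustration and not the 3DNA source code: the helper names `centroid` and `covariance` and the list-of-rows layout are assumptions made for this example. It relies on the fact that subtracting (1/N) S' i i' E is equivalent to centering both coordinate sets on their geometric centers before forming the product.

```python
# Standard (S) and experimental (E) adenine coordinates from the tables above.
S = [
    [0.213, 0.660, 1.287], [0.250, 2.016, 1.509], [0.016, 2.995, 0.619],
    [0.142, 4.189, 1.194], [0.451, 4.493, 2.459], [0.681, 3.485, 3.329],
    [0.990, 3.787, 4.592], [0.579, 2.170, 2.844], [0.747, 0.934, 3.454],
    [0.520, 0.074, 2.491],
]
E = [
    [16.461, 17.015, 14.676], [15.775, 18.188, 14.459], [14.489, 18.449, 14.756],
    [14.171, 19.699, 14.406], [14.933, 20.644, 13.839], [16.223, 20.352, 13.555],
    [16.984, 21.297, 12.994], [16.683, 19.056, 13.875], [17.918, 18.439, 13.718],
    [17.734, 17.239, 14.207],
]


def centroid(X):
    """Geometric center of an N x 3 list of coordinates (hypothetical helper)."""
    n = len(X)
    return [sum(row[k] for row in X) / n for k in range(3)]


def covariance(S, E):
    """C = [S'E - (1/N) S' i i' E] / (N - 1), computed from centered coordinates."""
    n = len(S)
    s_avg, e_avg = centroid(S), centroid(E)
    C = [[0.0] * 3 for _ in range(3)]
    for s, e in zip(S, E):
        for j in range(3):
            for k in range(3):
                C[j][k] += (s[j] - s_avg[j]) * (e[k] - e_avg[k])
    return [[C[j][k] / (n - 1) for k in range(3)] for j in range(3)]


C = covariance(S, E)
for row in C:
    print(" ".join(f"{x:10.4f}" for x in row))
```

Note that C is not symmetric in general, because row index j refers to the standard base and column index k to the experimental base.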

From the nine elements of C, we subsequently generate the 4 x 4 real symmetric matrix M using the expression:

      -                                                       -
     | c11+c22+c33     c23-c32       c31-c13        c12-c21    |
 M = |   c23-c32     c11-c22-c33     c12+c21        c31+c13    |
     |   c31-c13       c12+c21     -c11+c22-c33     c23+c32    |
     |   c12-c21       c31+c13       c23+c32      -c11-c22+c33 |
      -                                                       -

   =
      1.5792   -1.2456    1.2044    1.6167
     -1.2456   -1.0228   -1.1890    0.8842
      1.2044   -1.1890    2.3447    0.6968
      1.6167    0.8842    0.6968   -2.9011
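
As a sketch of this step, the following snippet assembles M from the nine elements of C exactly as laid out in the expression above. The function name `build_M` is hypothetical, and the input is the rounded C printed earlier, so the last digit of some entries may differ from the table.

```python
def build_M(C):
    """Assemble the 4 x 4 real symmetric matrix M from the 3 x 3 matrix C."""
    (c11, c12, c13), (c21, c22, c23), (c31, c32, c33) = C
    return [
        [c11 + c22 + c33, c23 - c32, c31 - c13, c12 - c21],
        [c23 - c32, c11 - c22 - c33, c12 + c21, c31 + c13],
        [c31 - c13, c12 + c21, -c11 + c22 - c33, c23 + c32],
        [c12 - c21, c31 + c13, c23 + c32, -c11 - c22 + c33],
    ]


# Covariance matrix as printed above (rounded to four decimals).
C = [
    [0.2782, 0.2139, -0.1601],
    [-1.4028, 1.9619, -0.2744],
    [1.0443, 0.9712, -0.6610],
]
M = build_M(C)
for row in M:
    print(" ".join(f"{x:10.4f}" for x in row))
```

By construction the diagonal elements of M sum to zero, which is a quick consistency check on any implementation.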

The largest eigenvalue of matrix M is 4.0335, and its corresponding unit eigenvector (qi, i = 0-3) is:

[ q0   q1    q2    q3 ] = [ 0.6135   -0.2878    0.7135    0.1780 ]
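
One way to obtain this eigenpair without an external library is shifted power iteration, sketched below. This is a substitute chosen for brevity and is not necessarily the eigensolver used in 3DNA; a general symmetric eigenvalue routine would serve equally well. The function name `largest_eigenpair`, the iteration count, and the starting vector are assumptions of this example. The shift is needed because the eigenvalues of M sum to zero, so the most negative eigenvalue can be as large in magnitude as the most positive one.

```python
import math

# Matrix M as printed above (rounded to four decimals).
M = [
    [1.5792, -1.2456, 1.2044, 1.6167],
    [-1.2456, -1.0228, -1.1890, 0.8842],
    [1.2044, -1.1890, 2.3447, 0.6968],
    [1.6167, 0.8842, 0.6968, -2.9011],
]


def largest_eigenpair(M, iterations=500):
    """Most positive eigenvalue and unit eigenvector of a symmetric matrix."""
    n = len(M)
    # The Frobenius norm bounds every |eigenvalue|, so M + shift*I has no
    # negative eigenvalues and power iteration converges to the top one.
    shift = math.sqrt(sum(x * x for row in M for x in row))
    v = [1.0] * n  # assumed not orthogonal to the sought eigenvector
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Mv[i] for i in range(n))  # Rayleigh quotient
    if v[0] < 0:  # q and -q give the same rotation; report q0 >= 0
        v = [-x for x in v]
    return eigenvalue, v


eigenvalue, q = largest_eigenpair(M)
print(f"{eigenvalue:.4f}", " ".join(f"{x:8.4f}" for x in q))
```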

The rotation matrix R is deduced from the qi as:

      -                                                           -
     | q0q0+q1q1-q2q2-q3q3    2(q1q2-q0q3)        2(q1q3+q0q2)     |
 R = |    2(q2q1+q0q3)     q0q0-q1q1+q2q2-q3q3    2(q2q3-q0q1)     |
     |    2(q3q1-q0q2)        2(q3q2+q0q1)     q0q0-q1q1-q2q2+q3q3 |
      -                                                           -
   =
     -0.0817   -0.6291    0.7730
     -0.1923    0.7710    0.6072
     -0.9779   -0.0990   -0.1839
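
The conversion from the unit eigenvector to R can be written directly from the expression above. The sketch below uses the rounded qi values printed earlier, so the result agrees with the tabulated R only to about the fourth decimal; the function name `rotation_from_quaternion` is hypothetical.

```python
def rotation_from_quaternion(q):
    """3 x 3 rotation matrix R from the unit eigenvector (q0, q1, q2, q3)."""
    q0, q1, q2, q3 = q
    return [
        [q0*q0 + q1*q1 - q2*q2 - q3*q3, 2*(q1*q2 - q0*q3), 2*(q1*q3 + q0*q2)],
        [2*(q2*q1 + q0*q3), q0*q0 - q1*q1 + q2*q2 - q3*q3, 2*(q2*q3 - q0*q1)],
        [2*(q3*q1 - q0*q2), 2*(q3*q2 + q0*q1), q0*q0 - q1*q1 - q2*q2 + q3*q3],
    ]


q = [0.6135, -0.2878, 0.7135, 0.1780]
R = rotation_from_quaternion(q)
for row in R:
    print(" ".join(f"{x:10.4f}" for x in row))
```

Because the qi form a unit vector, the rows of R are mutually orthogonal and of unit length.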

Following coordinate transformation with matrix R, the origin of the standard base is found to be displaced from the experimental structure by:

 o = e_(average) - s_(average) R' = [15.8969 15.7701 15.1802]

The least-squares fitted coordinates (F) of the standard base atoms on the experimental structure are then given by:

 F = S R' + i o
   =
     16.4592   17.0194   14.6699
     15.7747   18.1925   14.4586
     14.4899   18.4519   14.7542
     14.1729   19.6974   14.4070
     14.9343   20.6404   13.8420
     16.2222   20.3472   13.5569
     16.9832   21.2875   12.9925
     16.6829   19.0585   13.8760
     17.9183   18.4437   13.7219
     17.7335   17.2396   14.2062

Here S is the (N x 3) matrix of original coordinates of the standard base, and as noted above, i is an N x 1 column vector consisting of only ones.
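
The two expressions for o and F can be sketched as follows, using the rounded R printed above. The helper names `centroid` and `rotate` are assumptions of this example, and small differences in the last decimal relative to the tabulated values are expected because R is rounded.

```python
S = [
    [0.213, 0.660, 1.287], [0.250, 2.016, 1.509], [0.016, 2.995, 0.619],
    [0.142, 4.189, 1.194], [0.451, 4.493, 2.459], [0.681, 3.485, 3.329],
    [0.990, 3.787, 4.592], [0.579, 2.170, 2.844], [0.747, 0.934, 3.454],
    [0.520, 0.074, 2.491],
]
E = [
    [16.461, 17.015, 14.676], [15.775, 18.188, 14.459], [14.489, 18.449, 14.756],
    [14.171, 19.699, 14.406], [14.933, 20.644, 13.839], [16.223, 20.352, 13.555],
    [16.984, 21.297, 12.994], [16.683, 19.056, 13.875], [17.918, 18.439, 13.718],
    [17.734, 17.239, 14.207],
]
R = [
    [-0.0817, -0.6291, 0.7730],
    [-0.1923, 0.7710, 0.6072],
    [-0.9779, -0.0990, -0.1839],
]


def centroid(X):
    """Geometric center of an N x 3 list of coordinates."""
    n = len(X)
    return [sum(row[k] for row in X) / n for k in range(3)]


def rotate(p, R):
    """Row vector p multiplied by R' (the transpose of R)."""
    return [sum(R[j][k] * p[k] for k in range(3)) for j in range(3)]


# o = e_(average) - s_(average) R'
o = [e - r for e, r in zip(centroid(E), rotate(centroid(S), R))]

# F = S R' + i o  (the same translation o is added to every rotated atom)
F = [[r + t for r, t in zip(rotate(p, R), o)] for p in S]

print("o =", " ".join(f"{x:.4f}" for x in o))
for row in F:
    print(" ".join(f"{x:10.4f}" for x in row))
```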

The difference matrix (D) between E, the (N x 3) matrix of original coordinates of the experimental base, and the fitted coordinates F, together with the root-mean-square (RMS) deviation between the two structures, are found to be:

 D = E - F
   =
      0.0018   -0.0044    0.0061
      0.0003   -0.0045    0.0004
     -0.0009   -0.0029    0.0018
     -0.0019    0.0016   -0.0010
     -0.0013    0.0036   -0.0030
      0.0008    0.0048   -0.0019
      0.0008    0.0095    0.0015
      0.0001   -0.0025   -0.0010
     -0.0003   -0.0047   -0.0039
      0.0005   -0.0006    0.0008

 RMS deviation = 0.0054
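
The text does not spell out the RMS formula. The sketch below assumes that the RMS deviation is the square root of the sum of all squared elements of D divided by N, which reproduces the value printed above from the tabulated E and F.

```python
import math

# Experimental coordinates (E) and fitted coordinates (F) as tabulated above.
E = [
    [16.461, 17.015, 14.676], [15.775, 18.188, 14.459], [14.489, 18.449, 14.756],
    [14.171, 19.699, 14.406], [14.933, 20.644, 13.839], [16.223, 20.352, 13.555],
    [16.984, 21.297, 12.994], [16.683, 19.056, 13.875], [17.918, 18.439, 13.718],
    [17.734, 17.239, 14.207],
]
F = [
    [16.4592, 17.0194, 14.6699], [15.7747, 18.1925, 14.4586],
    [14.4899, 18.4519, 14.7542], [14.1729, 19.6974, 14.4070],
    [14.9343, 20.6404, 13.8420], [16.2222, 20.3472, 13.5569],
    [16.9832, 21.2875, 12.9925], [16.6829, 19.0585, 13.8760],
    [17.9183, 18.4437, 13.7219], [17.7335, 17.2396, 14.2062],
]

# D = E - F, element by element
D = [[e - f for e, f in zip(e_row, f_row)] for e_row, f_row in zip(E, F)]

# Assumed definition: sqrt(sum of squared atomic displacements / N)
rms = math.sqrt(sum(d * d for row in D for d in row) / len(D))
print(f"RMS deviation = {rms:.4f}")
```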

It should be noted that if the standard base is already defined in terms of its reference frame, as in the RNA (6-8) analysis, the vector o and matrix R represent the best-fitted coordinate frame of the experimental base. Moreover, the three axes of the frame given by R are guaranteed to be orthogonal.

Base normal

Rather than fit a standard base to experimental coordinates, the CEHS (14, 15), FREEHELIX (16), and NUPARM (17, 18) analyses perform a least-squares fitting of a plane to a set of atoms (19) in order to define the base and base-pair normals. The covariance matrix based on the N x 3 matrix of experimental Cartesian coordinates E is diagonalized to find the vector normal to the best plane. Specifically, C is obtained using the above formula with S substituted by E. The normal vector then lies along the eigenvector that corresponds to the smallest eigenvalue. Note that the coefficient 1/(N-1) in the formula for calculating C has no effect on the eigenvectors and only determines the magnitudes of the eigenvalues. The well known procedure from Blow (20) corresponds to a covariance matrix without this coefficient.

Using the above adenine base from the high resolution dodecamer duplex as an example, the covariance matrix C is found to be:

 C =
     1.6680   -0.5015   -0.3253
    -0.5015    2.0670   -0.5840
    -0.3253   -0.5840    0.3061

The smallest eigenvalue of C, 8.26e-5, indicates that the base is almost perfectly planar. The corresponding unit eigenvector, which defines the base normal, is:

 Base normal: 0.2737    0.3224    0.9062
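
A minimal sketch of the plane fit is given below. Instead of a full diagonalization it applies power iteration to (trace(C) I - C), which has the same eigenvectors as C but whose largest eigenvalue corresponds to the smallest eigenvalue of C. This substitution is made only to keep the example free of external libraries; it is not taken from CEHS, FREEHELIX, or NUPARM. The function name `base_normal`, the iteration count, and the starting vector are assumptions, and the overall sign of the normal is arbitrary.

```python
import math

# Experimental adenine coordinates from the table above.
E = [
    [16.461, 17.015, 14.676], [15.775, 18.188, 14.459], [14.489, 18.449, 14.756],
    [14.171, 19.699, 14.406], [14.933, 20.644, 13.839], [16.223, 20.352, 13.555],
    [16.984, 21.297, 12.994], [16.683, 19.056, 13.875], [17.918, 18.439, 13.718],
    [17.734, 17.239, 14.207],
]


def base_normal(X, iterations=500):
    """Return (smallest eigenvalue, unit normal) for the covariance matrix of X."""
    n = len(X)
    avg = [sum(row[k] for row in X) / n for k in range(3)]
    C = [[sum((row[j] - avg[j]) * (row[k] - avg[k]) for row in X) / (n - 1)
          for k in range(3)] for j in range(3)]
    trace = C[0][0] + C[1][1] + C[2][2]
    v = [1.0, 1.0, 1.0]  # assumed not to lie within the best plane
    for _ in range(iterations):
        # Multiply v by (trace * I - C), then renormalize
        w = [trace * v[i] - sum(C[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Cv = [sum(C[i][j] * v[j] for j in range(3)) for i in range(3)]
    smallest = sum(v[i] * Cv[i] for i in range(3))  # Rayleigh quotient
    return smallest, v


smallest, normal = base_normal(E)
print(f"smallest eigenvalue = {smallest:.2e}")
print("Base normal:", " ".join(f"{x:.4f}" for x in normal))
```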

It is also worth noting that the best-fitted global helical axis can be found with the same algorithm, even though FREEHELIX implements the two calculations as separate subroutines with identical functionality.

References

  1. X. J. Lu & W. K. Olson (1998). ``Resolving the Discrepancies Among Nucleic Acid Conformational Analyses.'' J. Mol. Biol. 285(4), 1563-1575.
  2. X. J. Lu, M. S. Babcock & W. K. Olson (1998). ``Mathematical Overview of Nucleic Acid Analysis Programs.'' J. Biomol. Struct. Dynam. 16(4), 838-843.
  3. A. A. Gorin, V. B. Zhurkin & W. K. Olson (1995). ``B-DNA Twisting Correlates with Base-pair Morphology.'' J. Mol. Biol. 247, 34-48.
  4. R. Lavery & H. Sklenar (1988). ``The Definition of Generalized Helicoidal Parameters and of Axis Curvature for Irregular Nucleic Acids.'' J. Biomol. Struct. Dynam. 6, 63-91.
  5. R. Lavery & H. Sklenar (1989). ``Defining the Structure of Irregular Nucleic Acids: Conventions and Principles.'' J. Biomol. Struct. Dynam. 6, 655-667.
  6. M. S. Babcock & W. K. Olson (1994). ``The Effect of Mathematics and Coordinate System on Comparability and `Dependencies' of Nucleic Acid Structure Parameters.'' J. Mol. Biol. 237, 98-124.
  7. M. S. Babcock, E. P. D. Pednault & W. K. Olson (1994). ``Nucleic Acid Structure Analysis: Mathematics for Local Cartesian and Helical Structure Parameters That Are Truly Comparable Between Structures.'' J. Mol. Biol. 237, 125-156.
  8. E. P. D. Pednault, M. S. Babcock & W. K. Olson (1993). ``Nucleic Acids Structure Analysis: A Users Guide to a Collection of New Analysis Programs.'' J. Biomol. Struct. Dynam. 11, 597-628.
  9. A. D. McLachlan (1979). ``Least Squares Fitting of Two Structures.'' J. Mol. Biol., 128, 74-79.
  10. B. K. P. Horn (1987). ``Closed-form Solution of Absolute Orientation Using Unit Quaternions.'' J. Opt. Soc. Am. A, 4, 629-642.
  11. L. Clowney, S. C. Jain, A. R. Srinivasan, J. Westbrook, W. K. Olson & H. M. Berman (1996). ``Geometric Parameters in Nucleic Acids: Nitrogenous Bases.'' J. Am. Chem. Soc., 118, 509-518.
  12. X. Shui, L. McFail-Isom, G. G. Hu & L. D. Williams (1998). ``The B-DNA Dodecamer at High Resolution Reveals a Spine of Water on Sodium.'' Biochemistry, 37, 8341-8355.
  13. T. P. E. Auf der Heyde (1990). ``Analyzing Chemical Data in More Than Two Dimensions: A Tutorial on Factor and Cluster Analysis.'' J. Chem. Educ., 67, 461-469.
  14. M. A. El Hassan & C. R. Calladine (1995). ``The Assessment of the Geometry of Dinucleotide Steps in Double-Helical DNA: A New Local Calculation Scheme.'' J. Mol. Biol. 251, 648-664.
  15. X. J. Lu, M. A. El Hassan & C. A. Hunter (1997). ``Structure and Conformation of Helical Nucleic Acids: Analysis Program (SCHNAaP).'' J. Mol. Biol. 273, 668-680.
  16. R. E. Dickerson (1998). ``DNA Bending: The Prevalence of Kinkiness and the Virtues of Normality.'' Nucl. Acids Res. 26, 1906-1926.
  17. D. Bhattacharyya & M. Bansal (1989). ``A Self-Consistent Formulation for Analysis and Generation of Non-Uniform DNA Structures.'' J. Biomol. Struct. Dynam. 6, 635-653.
  18. M. Bansal, D. Bhattacharyya & B. Ravi (1995). ``NUPARM and NUCGEN: Software for Analysis and Generation of Sequence Dependent Nucleic Acid Structures.'' Comput. Appl. Biosci. 11, 281-287.
  19. V. Schomaker, J. Waser, R. E. Marsh & G. Bergman (1959). ``To Fit a Plane or a Line to a Set of Points by Least Squares.'' Acta Cryst., 12, 600-604.
  20. D. M. Blow (1960). ``To Fit a Plane to a Set of Points by Least Squares.'' Acta Cryst., 13, 168.