Overview of the SIGNATURE Program

Four distinct tasks are performed by the SIGNATURE program (cf. figure).
The purpose of the first task, the signature equation, is to calculate the
list of molecular fragments and interfragment bonds that constitute the models.
Roughly, the signature equation consists of matching qualitative structural data with
quantitative structural data in order to compute an exhaustive and
non-overlapping list of molecular fragments and interfragment
bonds. In mathematical terms, the signature equation is an
integer linear programming (ILP) problem, where the unknowns
are the numbers of each molecular fragment and of each interfragment
bond. Quantitative structural data are not exact values; there is a
standard deviation associated with each datum. It is the task
of the expert using the SIGNATURE program to input these standard
deviations. Furthermore, if the molecular formula of the studied
compound is unknown, the user of the program inputs the average
number of atoms. Most of the time, there are several lists of
molecular fragments and interfragment bonds that correspond to
the given sets of 2D data and standard deviations. The goal of
the signature equation is to determine the "best" list, i.e.,
the list that minimizes the deviation between the model and the
2D quantitative data. Once a list of molecular fragments and
interfragment bonds is determined, a structural formula can be
obtained by connecting the fragments with the corresponding
interfragment bonds. At that stage, the structure to be
constructed is much like a jigsaw puzzle; one knows the
pieces of the puzzle and the ways these pieces are connected
together. Generally, several structural formulas can be constructed.
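The signature equation can be illustrated on a toy case. The sketch below is not SIGNATURE's solver: the source describes an integer linear program, whereas this hypothetical example simply enumerates every small non-negative integer combination of three made-up fragments (C6H5, CH2, CH3) and keeps the one whose predicted observables deviate least from the data, each deviation being weighted by its standard deviation. The observables (aromatic carbons, aliphatic carbons, hydrogens), the data values and the function name `solve_signature` are all assumptions. Interfragment bonds are not independent unknowns here; they are derived from the free valences under two necessary conditions: every free valence is used by exactly one bond, and there are enough bonds to connect all fragments.

```python
from itertools import product

# Hypothetical fragment library (illustrative values, not SIGNATURE's own):
# name -> (aromatic C, aliphatic C, H atoms, free valences)
FRAGMENTS = {
    "C6H5": (6, 0, 5, 1),
    "CH2": (0, 1, 2, 2),
    "CH3": (0, 1, 3, 1),
}
N_OBS = 3  # the first three entries are the observables compared with data


def solve_signature(data, sigma, max_count=5):
    """Brute-force stand-in for the ILP: return (deviation, counts, bonds)
    for the feasible fragment list that best matches the data."""
    names = list(FRAGMENTS)
    best = None
    for counts in product(range(max_count + 1), repeat=len(names)):
        n_frag = sum(counts)
        valences = sum(n * FRAGMENTS[f][3] for n, f in zip(counts, names))
        if n_frag == 0 or valences % 2:
            continue  # each free valence must be used by exactly one bond
        bonds = valences // 2
        if bonds < n_frag - 1:
            continue  # too few interfragment bonds to connect every fragment
        model = [sum(n * FRAGMENTS[f][j] for n, f in zip(counts, names))
                 for j in range(N_OBS)]
        # Squared deviations weighted by the expert-supplied standard deviations.
        dev = sum(((m - d) / s) ** 2 for m, d, s in zip(model, data, sigma))
        if best is None or dev < best[0]:
            best = (dev, dict(zip(names, counts)), bonds)
    return best


# Hypothetical measured data (aromatic C, aliphatic C, H) and standard deviations.
dev, counts, bonds = solve_signature((6.2, 1.9, 10.3), (0.5, 0.5, 1.0))
print(counts, bonds)  # {'C6H5': 1, 'CH2': 1, 'CH3': 1} 2
```

Exhaustive enumeration grows exponentially with the number of fragment types, so it only suits a handful of fragments; for realistic libraries the same model would go to an ILP solver, as the text indicates.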
The second task, the structure generation, determines how many structural
formulas have to be constructed. When the studied compound contains a
small number of fragments, it is possible to use a deterministic technique,
and therefore to construct all the structural formulas that correspond
to the list of fragments and interfragment bonds computed by the signature
equation. The SIGNATURE program provides a deterministic algorithm
that generates all the structural formulas.
The algorithm is based on the symmetries of the fragments.
However, as already mentioned, deterministic techniques cannot solve
the problem of structure elucidation for large molecular compounds.
In such cases, one has to use stochastic structure generation.
The purpose of stochastic structure
generation is to approximate the number of possible structural formulas,
and to generate a sample of these formulas that statistically
represents the entire population of possibilities. The stochastic
technique used by the SIGNATURE program to approximate the number
of possible structural formulas is based on the Knuth algorithm.
Although the Knuth algorithm was devised for other purposes, it can be
used to compute an unbiased estimator of the number of possible
structural formulas. The sample of structural formulas is then
generated using one of several stochastic techniques: Random Sampling, Monte Carlo,
Simulated Annealing, and Genetic Algorithm.
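Knuth's estimator sizes a search tree without exploring it: a single random root-to-leaf probe, multiplied through by the number of children met at each level, is an unbiased estimate of the number of complete leaves, and averaging many probes reduces the variance. In the sketch below the search tree is a hypothetical stand-in for the structure-generation tree (binary strings of length 10 with no two consecutive 1s, whose exact count is 144), and the names `knuth_probe`, `estimate_count` and `children` are illustrative, not SIGNATURE's.

```python
import random


def knuth_probe(root, children, is_complete, rng):
    """One Knuth probe: follow a uniformly random root-to-leaf path and return
    the product of the branching factors met on the way (0 for a dead end)."""
    node, weight = root, 1
    while True:
        kids = children(node)
        if not kids:
            return weight if is_complete(node) else 0
        weight *= len(kids)
        node = rng.choice(kids)


def estimate_count(root, children, is_complete, samples=20000, seed=0):
    """Average of independent probes: an unbiased estimate of the number of
    complete leaves (here standing in for complete structural formulas)."""
    rng = random.Random(seed)
    total = sum(knuth_probe(root, children, is_complete, rng)
                for _ in range(samples))
    return total / samples


# Hypothetical toy tree: binary strings of length N with no two adjacent 1s.
N = 10


def children(s):
    if len(s) == N:
        return []
    return [s + "0"] if s.endswith("1") else [s + "0", s + "1"]


print(estimate_count("", children, lambda s: len(s) == N))  # close to 144
```

Probes that end in an incomplete leaf contribute 0, which is what keeps the estimator unbiased when some partial structures cannot be completed.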
All the structural formulas generated are constructed in a
three-dimensional space. During the generation process, the
expert using the system inputs the sample size and can impose
structural constraints, such as avoiding the formation of
double bonds or forcing the generator to build five- or six-membered rings.
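Of the stochastic techniques listed, Simulated Annealing is easy to sketch generically. The function below applies the Metropolis acceptance rule with a geometric cooling schedule and treats a user-imposed structural constraint as a feasibility test that rejects a candidate move outright. The state space, cost, neighbor move and constraint in the usage lines are hypothetical (an integer in 0..50 whose optimum 37 is forbidden) and merely stand in for real structural formulas; the parameter values are assumptions as well.

```python
import math
import random


def anneal(start, cost, neighbor, feasible, t0=10.0, alpha=0.995,
           steps=5000, seed=0):
    """Simulated annealing with the Metropolis rule; candidate moves that
    violate the user-imposed constraint are rejected outright."""
    rng = random.Random(seed)
    state, c = start, cost(start)
    best, best_c = state, c
    t = t0
    for _ in range(steps):
        cand = neighbor(state, rng)
        if feasible(cand):
            dc = cost(cand) - c
            # Accept improvements always, worse moves with probability exp(-dc/t).
            if dc <= 0 or rng.random() < math.exp(-dc / t):
                state, c = cand, c + dc
                if c < best_c:
                    best, best_c = state, c
        t *= alpha  # geometric cooling schedule
    return best, best_c


# Hypothetical toy problem: integers 0..50, optimum 37, but 37 is forbidden.
def step(x, rng):
    return min(50, max(0, x + rng.choice((-1, 1))))


print(anneal(0, lambda x: (x - 37) ** 2, step, lambda x: x != 37))  # (36, 1)
```

Rejecting infeasible moves keeps every visited state valid; an alternative design adds a penalty term to the cost instead, which lets the search pass through invalid states.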
Once the sample of models is constructed, the third task,
the 3D simulations, submits each model to molecular orbital
calculations or molecular simulations. After the optimized
3D models are produced, 3D physical properties are calculated
for the models and compared to the corresponding 3D analytical data.
The 3D physical properties are the density, the pore volume
distribution, the surface area, and the fractal dimension of
the surface. The methods employed by the SIGNATURE program to
simulate the three-dimensional physical characteristics are based
on finite element theory.
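Of the 3D properties, density has the simplest definition, and computing it for a model placed in a simulation box shows the unit bookkeeping involved. This is only the definition of density, not the finite-element methods the SIGNATURE program uses; the atom list, box volume and function name are assumptions, and the masses are standard atomic weights. One g/mol per cubic angstrom equals 1/(N_A * 1e-24 cm^3), about 1.66054 g/cm^3.

```python
# 1 g/mol per cubic angstrom = 1 / (N_A * 1e-24 cm^3) ~= 1.66054 g/cm^3
G_PER_CM3_PER_AMU_PER_A3 = 1.66054

# Standard atomic weights for the elements used in this example.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def density(atoms, box_volume_a3):
    """Density (g/cm^3) of a model containing `atoms` in a box of the given
    volume in cubic angstroms."""
    mass = sum(ATOMIC_MASS[a] for a in atoms)
    return mass * G_PER_CM3_PER_AMU_PER_A3 / box_volume_a3


# Hypothetical check: one water molecule per 29.9 A^3 gives roughly 1.00 g/cm^3.
print(density(["H", "H", "O"], 29.9))
```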
Finally, the fourth task statistically analyzes the sample.
If the sample was generated by Random Sampling, the optimal
sample size needed for statistical significance can be determined.
Furthermore, the results computed on the sample can be extrapolated
to the entire population of possible models.
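The source does not give the formula SIGNATURE uses for the optimal sample size. A common textbook choice, shown here purely as an assumption, is the number of models n needed for the sample mean of a computed property to fall within a margin E of the population mean at a given confidence level, n0 = (z * sigma / E)^2, with a finite-population correction when the population size N is known, for instance from the Knuth estimate. Here sigma is the spread of the property across models, which in practice must itself be estimated from a pilot sample; `sample_size` is a hypothetical name.

```python
import math
from statistics import NormalDist


def sample_size(sigma, margin, confidence=0.95, population=None):
    """Number of models needed for the sample mean of a property to lie
    within +/- margin of the population mean at the given confidence
    (normal approximation, simple random sampling)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    n0 = (z * sigma / margin) ** 2
    if population is not None:
        # Finite-population correction for a population of known size.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


# Hypothetical values: property spread 0.1, tolerated error 0.02.
print(sample_size(0.1, 0.02))                   # 97
print(sample_size(0.1, 0.02, population=1000))  # 88
```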
For more information, e-mail:
Jean-Loup Faulon
January 12, 1996