Chemical Structure and Reaction Network Generation
Project overview
Chemical structure generators are computational tools that construct chemical
structural formulas within a collection of chemical and physical constraints.
Applications of these tools are in structural elucidation, molecular
design, chemical kinetics, and synthesis design. Due to the relevance
of these problems in chemistry, generators have been a prolific field
of research since the late 1960s. The problem of structure generation
is equivalent to the combinatorial problem of enumerating and sampling
graphs. However, known graph theoretical techniques cannot directly
be applied because molecules are not simple graphs but multigraphs
colored by the elements of the periodic table. Hence, specific
algorithms have to be designed in the context of chemistry. We
are currently investigating three problems related to molecular graph
generation: isomorphism, counting and enumeration, and sampling.
- Faulon J.L., Visco D, Roe D. Enumerating Molecules.
In: Reviews in Computational Chemistry Vol. 21, Lipkowitz K.
Edt., Wiley-VCH, 2005. (link
to amazon.com)
Molecular graph isomorphism.
Because molecules are a particular type of graphs, isomorphism need
to be redefined. The good news is that isomorphism, automorphism
partitioning, and cannonical labeling can be solved in polynomial-time
because molecules are bounded valence graphs. We have written
the first polynomial time algorithm for automorphism partitioning of planar
molecular graphs. More recently (2004) we developed a canonization algorithm
that is not limited to planar or bounded valence graphs, this algorithm perform
well compare to Nauty even
for difficult graphs such as projective planes.
- Faulon J.L., Collins M., Carr R.D. The
Signature Molecular Descriptor. 4. Canonizing Molecules Using
Extended Valence Sequence, J. Chem. Inf. Comput. Sci., 44, 427-436,
2004. [PMID: 15032522]. (.pdf manuscript).
- Faulon, J. L. , Automorphism Partitioning, and
Canonical Labeling Can Be Solved in Polynomial- Time for Molecular
Graphs, J. Chem. Inf. Comput. Sci., 1998, 38, 432-444. (link
to journal)
Counting and enumerating
molecules. The time-complexity of enumerating molecules remains
unknown. However, there are indications that the enumeration
problem should be polynomial-time per output (polynomial-delay) although
all published algorithms are not polynomial-delay. First, the general
problem of enumerating simple graphs has a polynomial-time complexity
per output. Secondly, it is known that specific types of molecules (for
example, molecular trees) can be enumerated with polynomial-delay algorithms.
We are studying the time-complexity of enumerating molecules, and its
implications for structural elucidation and chemical kinetics (cf. slides).
- Faulon J.L., Brown W.M., Martin S Reverse
engineering chemical structures from molecular descriptors:
how many solutions? J. Comput Aided Mol Des. 2005 in press
[PMID: 16267694] (.pdf
manuscript).
- Faulon J. L., Visco D. Churchwell, C. J., The
Signature Molecular Descriptor. 2. Enumerating Molcules from
their Extended Valence Sequences, J. Chem. Inf. Comput.
Sci., 43 (3), 721 -734, 2003. [PMID: 12767130]. (link
to journal)
- Faulon, J.L., On Using the Graph-Equivalent Classes
for the Structure Elucidation of Large Molecules,
J. Chem. Inf. Comput. Sci., 1992, 32, 338-348.
(.pdf)
Sampling molecules.
Stochastic algorithms to sample chemical structures: random sampling,
Monte Carlo sampling, and genetic algorithms have been developped. These
algorithms have been applied to infer samples of structures reprentative
of petroleum feedstock in the context of a CRADA between Sandia and Chevron,
and to model polymeric materials. A comparative study of the performances
of unlabeled sampling techniques versus labeled sampling techniques demonstrated
that isomorph-free algorithms are not necessary to search structures matching
specific properties. While sampling unlabeled molecules necessitates
the use of isomorph-free algorithms, sampling labeled molecules does not
require checking for isomorphism. Although both problems can theoretically
be solved polynomially, much faster algorithms can be designed in the
labeled case.
- Faulon, J. L., Stochastic Generator of Chemical
Structure. (4) Building Polymeric Systems with Specified
Properties, J. Comput. Chem., 2001, 22, 580-590. (link
to journal)
- Faulon, J. L., Stochastic Generator of Chemical
Structure. (2) Using Simulated Annealing to Search the
Space of Constitutional Isomers, J. Chem. Inf. Comput. Sci., 1996,
36, 731-740. (link
to journal). NOTE: This algorithm has been inplemented
into CDK
library maintained by C. Steinbeck.
- Faulon, J. L., Stochastic Generator of Chemical
Structure. (1) Application to the Structure Elucidation
of Large Molecules, J. Chem. Inf. Comput. Sci., 1994, 34,
1204-1218. (.pdf)
Practical
uses of sampling molecules to elucidate natural products.
Reaction network generation.
Reaction network generators are computational tools developed for
chemical kinetics applications in synthesis design, combustion and
petroleum refining, and cellular signaling networks. Network generators
either construct networks from a given set of reactants and elementary
reaction steps (RGB generators), or build networks from a set
of reactants and a set of products (RGR generators). Most of the formal
network generators are making use of the Ugi et al. methodology, which
in turn is based on graph theory. The caveat of Ugi et al. methodology
is the computational scaling: the generated networks grow exponentially
with the problem size. We have developed the first network generator
where generation and network reduction are performed in parallel.
More precisely, we propose three algorithms that perform network reduction
during the generation process. These algorithms achieve polynomial
scaling while maintaining good results for thermal cracking of hydrocarbons.
The algorithms are being implemented in BioNetGen,
a cell signaling network generator (cf. also Faeder, et al. (2005) Rule-based
modeling of biochemical networks. Complexity 10:22-41).
- Faulon, J. L. ; Sault, A. G. Stochastic Generator
of Chemical Structure (3) Reaction Network, J. Chem. Inf.
Comput. Sci, 41, 894 -908, 2001. [PMID: 11500106].
(link to journal)
Page maintained by Jean-Loup M.
Faulon