|
Journal Publications![]() A Convergence Analysis of Unconstrained and Bound Constrained Evolutionary Pa ttern Search W E Hart. Evolutionary Computation. (to appear). Abstract: We present and analyze a class of evolutionary algorithms for unconstrained and bound constrained optimization on $\reals^n$: evolutionary pattern search algorithms (EPSAs). EPSAs adaptively modify the step size of the mutation operator in response to the success of previous optimization steps. The design of EPSAs is inspired by recent analyses of pattern search methods. We show that EPSAs can be cast as stochastic pattern search methods, and we use this observation to prove that EPSAs have a probabilistic, weak stationary point convergence theory. This convergence theory is distinguished by the fact that the analysis does not approximation the stochastic process of EPSAs, and hence it exactly characterizes their convergence properties.
Abstract: Crystal lattices are infinite periodic graphs that occur naturally in a variety of geometries and which are of fundamental importance in polymer science. Discrete models of protein folding use crystal lattices to define the space of protein conformations. Because various crystal lattices provide discretizations of the same physical phenomenon, it is reasonable to expect that there will exist ``invariants'' across lattices related to fundamental properties of the protein folding process. This paper considers whether performance-guaranteed approximability is such an invariant for HP lattice models. We define a master approximation algorithm that has provable performance guarantees provided that a specific sublattice exists within a given lattice. We describe a broad class of crystal lattices that are approximable, which further suggests that approximability is a general property of HP lattice models.
Abstract: We describe a proof of NP-hardness for a lattice protein folding model whose instances contain protein sequences defined with a fixed, finite alphabet that contains $\num$ amino acid types. This lattice model represents a protein's conformation as a self-avoiding path that is embedded on the 3D cubic lattice. A contact potential is used to determine the energy of a sequence in a given conformation; a pair of amino acids contributes to the conformational energy only if they are adjacent on the lattice. This result overcomes a significant weakness of previous intractability results, which do not examine protein folding models that have a finite alphabet of amino acids together with physically interesting conformations.
Abstract: Sequential stopping rules are described for several stochastic algorithms that estimate the global minimum of a function. Stopping rules are described for pure random search and stratified random search. These stopping rules use an estimate of the probability measure of the $\epsilon$-close points to terminate these algorithms when a specified confidence has been achieved. Numerical results indicate that these stopping rules require fewer samples and are more reliable than the previous stopping rules for these algorithms. These stopping rules can also be applied to multistart local search and stratified multistart local search. Numerical results on a standard test set demonstrate that these stopping rules can perform as well as Bayesian stopping rules for multistart local search. Further, these stopping rules have a number of practical advantages over the Bayesian stopping rules.
Abstract: Automatic measurement of blood vessel tortuosity is a useful capability for automatic ophthalmological diagnostic tools. We describe a suite of automated tortuosity measures for blood vessel segments extracted from RGB retinal images. The tortuosity measures were evaluated in two classification tasks: (1) classifying the tortuosity of blood vessel segments and (2) classifying the tortuosity of blood vessel networks. These tortuosity measures were able to achieve a classification rate of 91\% for the first problem and 95\% on the second problem, which confirms that they capture much of the ophthalmologists' notion of tortuosity. Finally, we discuss how the accuracy of these measures can be influence by the method used to extract the blood vessel segments.
Abstract: A novel and robust automated docking method that predicts the bound conformations of flexible ligands to macromolecular targets has been developed and tested, in combination with a new scoring function that estimates the free energy change upon binding. Interestingly, this method applies a Lamarckian model of genetics, in which environmental adaptations of an individual's phenotype are reverse transcribed into its genotype and become heritable traits (sic). We consider three search methods, Monte Carlo simulated annealing, a traditional genetic algorithm, and the Lamarckian genetic algorithm, and compare their performance in dockings of seven protein-ligand test systems having known three-dimensional structure. We show that both the traditional and Lamarckian genetic algorithms can handle ligands with more degrees of freedom than the simulated annealing method used in earlier versions of AutoDock, and that the Lamarckian genetic algorithm is the most efficient, reliable, and successful of the three. The empirical free energy function was calibrated using a set of 30 structurally known protein-ligand complexes with experimentally determined binding constants. Linear regression analysis of the observed binding constants in terms of a wide variety of structure-derived molecular properties was performed. The final model had a residual standard error of 9.11 kJ mol^{-1} (2.177 kcal mol^{-1}) and was chosen as the new energy function. The new search methods and empirical free energy function are available in AutoDock, version 3.0.
Abstract: This paper considers the protein energy minimization problem for lattice and off-lattice protein folding models that explicitly represent side chains. Lattice models of proteins have proven useful tools for reasoning about protein folding in unrestricted continuous space through analogy. This paper provides the first illustration of how rigorous algorithmic analyses of lattice models can lead to rigorous algorithmic analyses of off-lattice models. We consider two side chain models: a lattice model that generalizes the HP model (Dill 85) to explicitly represent side chains on the cubic lattice, and a new off-lattice model, the HP Tangent Spheres Side Chain model (HP-TSSC), that generalizes this model further by representing the backbone and side chains of proteins with tangent spheres. We describe algorithms with mathematically guaranteed error bounds for both of these models. In particular, we describe a linear time performance guaranteed approximation algorithm for the HP side chain model that constructs conformations whose energy is better than 86\% of optimal in a face centered cubic lattice, and we demonstrate how this provides a better than 70\% performance guarantee for the HP-TSSC model. Our analysis provides a mathematical methodology for transferring performance guarantees on lattices to off-lattice models. These results partially answer the open question of Ngo {\it et al.} (1994) concerning the complexity of protein folding models that include side chains.
Abstract: This paper addresses the robustness of intractability arguments for simplified models of protein folding that use lattices to discretize the space of conformations that a protein can assume. We present two generalized NP-hardness results. The first concerns the intractability of protein folding independent of the lattice used to define the discrete protein folding model. We consider a previously studied model and prove that for any reasonable lattice the protein structure prediction problem is NP-hard. The second hardness result concerns the intractability of protein folding for a class of energy formulas that contains a broad range of mean force potentials whose form is similar to commonly used pair potentials (e.g., the Lennard-Jones potential). We prove that protein structure prediction is NP-hard for any energy formula in this class. These are the first robust intractability results that identify sources of computational complexity of protein structure prediction that transcend particular problem formulations.
Conference Publications![]() A Performance Analysis of Evolutionary Pattern Search with Generalized Mutation Steps, W E Hart, K O Hunter. Proc Conf Evolutionary Computation. (to appear) Abstract: Evolutionary pattern search algorithms (EPSAs) are a class of evolutionary algorithms (EAs) that have stationary-point convergence guarantees on a broad class of nonconvex continuous problems. We have analyzed the empirical performance of EPSAs. This paper revisits that analysis and extends it to a more general model of mutation. We evaluate experimentally how the choice of the set of mutation offsets affects optimization performance for EPSAs. In addition, we compare EPSAs to self-adaptive EAs with respect to robustness and rate of optimization. All experiments employ a suite of test functions representing a range of modality and number of multiple minima.
Abstract: Evolutionary algorithms have been successfully applied to a variety of molecular structure prediction problems. In this paper we reconsider the design of genetic algorithms that have been applied to a simple protein structure prediction problem. Our analysis considers the impact of several algorithmic factors for this problem: the conformational representation, the energy formulation and the way in which infeasible conformations are penalized. Further we empirically evaluate the impact of these factors on a small set of polymer sequences. Our analysis leads to specific recommendations for both GAs as well as other heuristic methods for solving PSP on the HP model.
Abstract: In this paper we evaluate the design of the hybrid EAs that are currently used to perform flexible ligand binding in the AutoDock docking software. Hybrid evolutionary algorithms (EAs) incorporate specialized operators that exploit domain-specific features to accelerate an EA's search. We consider hybrid EAs that use an integrated local search operator to refine individuals within each iteration of the search. We evaluate several factors that impact the efficacy of a hybrid EA, and we propose new hybrid EAs that provide more robust convergence to low-energy docking configurations than the methods currently available in AutoDock.
Abstract: Evolutionary programs (EPs) and evolutionary pattern search algorithms (EPSAs) are two general classes of evolutionary methods for optimizing on continuous domains. The relative performance of these methods has been evaluated on standard global optimization test functions, and these results suggest that EPSAs more robustly converge to near-optimal solutions than EPs. In this paper we evaluate the relative performance of EPSAs and EPs on a real-world application: flexible ligand binding in the Autodock docking software. We compare the performance of these methods on a suite of docking test problems. Our results confirm that EPSAs and EPs have comparable performance, and they suggest that EPSAs may be more robust on larger, more complex problems.
Abstract: The Department of Energy (DOE) national laboratories have one of the longest and most consistent histories of supercomputer use. We summarize the architecture of DOE's new supercomputers that are being built for the Accelerated Strategic Computing Initiative (ASCI). We then argue that in the near future scaled-down versions of these supercomputers with petaflop-per-weekend capabilities could become widely available to hundreds of research and engineering departments. The availability of such computational resources will allow simulation of physical phenomena to become a full-fledged third branch of scientific exploration, along with theory and experimentation. We describe the ASCI and other supercomputer applications at Sandia National Laboratories, and discuss which lessons learned from Sandia's long history of supercomputing can be applied in this new setting.
Abstract: This paper presents an experimental evaluation of evolutionary pattern search algorithms (EPSAs). This analysis provides a first step towards understanding of how the broad theory of EPSAs needs to be applied to define practical EPSAs. We evaluate the effect that different types of mutation operators have on the frequency with which an EPSA could update its step length, as well as the effects of different crossover rates. Additionally, we describe a stopping rule for EPSAs that empirically terminates them near a stationary point. The empirical performance of EPSAs confirms that they can perform nonlocal optimization on standard test optimization problems.
Abstract: Single-level parallel optimization approaches, those in which either the simulation code executes in parallel or the optimization algorithm invokes multiple simultaneous single-processor analyses, have been investigated previously and been shown to be effective in reducing the time required to compute optimal solutions. However, these approaches have clear performance limitations which point to the need for multiple levels of parallelism in order to achieve peak parallel performance. Managing multiple simultaneous instances of massively parallel simulations is a challenging software undertaking, especially if the implementation is to be flexible, extensible, and general-purpose. This paper focuses on the design for multilevel parallelism as implemented within the DAKOTA iterator toolkit. Various parallel programming models are discussed, although emphasis is given to a master-slave implementation using the Message Passing Interface (MPI). A mathematical analysis is given on achieving peak efficiency in multilevel parallelism by selecting the most effective processor partitioning schemes. This analysis is verified in some computational experiments.
Abstract: Automatic measurement of blood vessel tortuosity is a useful capability for automatic ophthalmological diagnostic tools. We describe a suite of automated tortuosity measures for blood vessel segments extracted from RGB retinal images. The tortuosity measures were evaluated in two classification tasks: (1) classifying the tortuosity of blood vessel segments and (2) classifying the tortuosity of blood vessel networks. These tortuosity measures were able to achieve a classification rate of 91\% for the first problem and 95\% on the second problem, which confirms that they capture much of the ophthalmologists' notion of tortuosity.
Abstract: This paper describes resource management in a massively parallel, distributed-memory implementation of a branch-and-bound method for mixed integer programming (MIP) problems. Our parallel branch-and-bound algorithm is decomposed into a variety of interacting but largely asynchronous tasks. These tasks perform the algorithm's basic computations and maintain various kinds of global information or attempt to balance workloads. The optimal amount of parallelism and data centralization varies from task to task. The organization of our application code is intended to accomodate this variability, while making flexible and full use of the CPU. In particular, each processor uses an apparently novel scheduling mechanism to maintain a balance between the fraction of time devoted to compute-intensive but interruptable tasks --- computing bounds to subproblems and heuristically searching for better feasible solutions. While described in the specific context of mixed integer programming, we expect our techniques to apply to massively parallel implementations of other complicated, practical branch-and-bound procedures.
Abstract: Inverse protein folding concerns the identification of an amino acid sequence that folds to a given structure. Sequence design problems attempt to avoid the apparant difficulty of inverse protein folding by defining an energy that can be minimized to find protein-like sequences. We evaluate the practical relevance of two sequence design problems by analyzing their computational complexity. We show that the canonical method of sequence design is intractable, and describe approximation algorithms for this problem. We also describe an efficient algorithm that exactly solves the grand canonical method. Our analysis shows how sequence design problems can fail to reduce the difficulty of the inverse protein folding problem, and highlights the need to analyze these problems to evaluate their practical relevance.
Abstract: This paper considers the protein structure prediction problem for lattice and off-lattice protein folding models that explicitly represent side chains. Lattice models of proteins have proven extremely useful tools for reasoning about protein folding in unrestricted continuous space through analogy. This paper provides the first illustration of how rigorous algorithmic analyses of lattice models can lead to rigorous algorithmic analyses of off-lattice models. We consider two side chain models: a lattice model that generalizes the HP model (Dill 85) to explicitly represent side chains on the cubic lattice, and a new off-lattice model, the HP Tangent Spheres Side Chain model (HP-TSSC), that generalizes this model further by representing the backbone and side chains of proteins with tangent spheres. We describe algorithms for both of these models with mathematically guaranteed error bounds. In particular, we describe a linear time performance guaranteed approximation algorithm for the HP side chain model that constructs conformations whose energy is better than 86% of optimal in a face centered cubic lattice, and we demonstrate how this provides a 70% performance guarantee for the HP-TSSC model. This is the first algorithm in the literature for off-lattice protein structure prediction that has a rigorous performance guarantee. Our analysis of the HP-TSSC model builds off of the work of Dancik and Hannenhalli who have developed a 16/30 approximation algorithm for the HP model on the hexagonal close packed lattice. Further, our analysis provides a mathematical methodology for transferring performance guarantees on lattices to off-lattice models. These results partially answer the open question of Karplus et al. (1994) concerning the complexity of protein folding models that include side chains.
Abstract: This paper presents a convergence theory for evolutionary pattern search algorithms (EPSAs). EPSAs are self-adapting evolutionary algorithms that modify the step size of the mutation operator in response to the success of previous optimization steps. Previously, we have proven a stationary point convergence theory for EPSAs for which the step size is not allowed to increase. The present analysis generalizes this analysis to prove a convergence theory for EPSAs that are allowed to both increase and decrease the step size. This convergence theory is based on an extension of the convergence theory for generalized pattern search methods.
Abstract: Molecular docking software makes computational predictions of the interaction of molecules. This can be useful, for example, in evaluating the binding of candidate drug molecules to a target molecule from a virus. In the Autodock docking software (Morris et al. 1996), a physical model is used to evaluate the energy of candidate docking configurations, and heuristic search is used to minimize this energy. Previous versions of Autodock used simulated annealing to do this heuristic search. We evaluate the use of the genetic algorithm with local search in Autodock. We investigate several GA-local search (GA-LS) hybrids and compare results with those obtained from simulated annealing. This comparison is done in terms of optimization success, and absolute success in finding the true physical docked configuration. We used these results to test the energy function and evaluate the success of the application.
Abstract: This paper defines a class of evolutionary algorithms called evolutionary pattern search algorithms (EPSAs) and analyzes their convergence properties. This class of algorithms is closely related to evolutionary programming, evolution strategie and real-coded genetic algorithms. EPSAs are self-adapting evolutionary algorithms that modify the step size of the mutation operator in response to the success of previous optimization steps. The rule used to adapt the step size can be used to provide a stationary point convergence theory for EPSAs on any continuous function. This convergence theory is based on an extension of the convergence theory for generalized pattern search methods.
Abstract: The benefits of applying optimization to computational models are well known, but their range of widespread application to date has been limited. This effort attempts to extend the disciplinary areas to which optimization algorithms may be readily applied through the development and application of advanced optimization strategies capable of handling the computational difficulties associated with complex simulation codes. Towards this goal, a flexible software framework is under continued development for the application of optimization techniques to broad classes of engineering applications, including those with high computational expense and nonsmooth, nonconvex design space features. Object-oriented software design with C++ has been employed as a tool in providing a flexible, extensible, and robust multidisciplinary toolkit that establishes the protocol for interfacing optimization with computationally-intensive simulations. In this paper, demonstrations of advanced optimization strategies using the software are presented in the hybridization and parallel processing research areas. Performance of the advanced strategies is compared with a benchmark nonlinear programming optimization.
Abstract:
This work gives a proof of convergence for a randomized learning
algorithm that describes how anoles (lizards found in the Carribean) learn
a
Abstract: Crystal lattices are infinite periodic graphs that occur naturally in a variety of geometries and which are of fundamental importance in polymer science. Discrete models of protein folding use crystal lattices to define the space of protein conformations. Because various crystal lattices provide discretizations of the same physical phenomenon, it is reasonable to expect that there will exist "invariants" across lattices that define fundamental properties of the protein folding process; an invariant defines a property that transcends particular lattice formulations. This paper identifies two classes of invariants, defined in terms of sublattices that are related to the design of algorithms for the structure prediction problem. The first class of invariants is used to define a master approximation algorithm for which provable performance guarantees exist. This algorithm can be applied to generalizations of the hydrophobic-hydrophilic model that have lattices other than the cubic lattice, including most of the crystal lattices commonly used in protein folding lattice models. The second class of invariants applies to a related lattice model. Using these invariants, we show that for this model the structure prediction problem is intractable across a variety of three-dimensional lattices. It turns out that these two classes of invariants are respectively sublattices of the two- and three-dimensional square lattice. As the square lattices are the standard lattices used in empirical protein folding studies, our results provide a rigorous confirmation of the ability of these lattices to provide insight into biological phenomenon. Our results are the first in the literature that identify algorithmic paradigms for the protein structure prediction problem that transcend particular lattice formulations.
Abstract: This paper examines the effects of relaxed synchronization on both the numerical and parallel efficiency of parallel genetic algorithms (GAs). We describe a coarse-grain geographically structured parallel genetic algorithm. Our experiments provide preliminary evidence that asynchronous versions of these algorithms have a lower run time than synchronous GAs. Our analysis shows that this improvement is due to (1) decreased synchronization costs and (2) high numerical efficiency (e.g. fewer function evaluations) for the asynchronous GAs. This analysis includes a critique of the utility of traditional parallel performance measures for parallel GAs.
Abstract: This paper theoretically compares the performance of simulated annealing and evolutionary algorithms. Our main result is that under mild conditions a wide variety of evolutionary algorithms can be shown to have greater performance than simulated annealing after a sufficiently large number of function evaluations. This class of EAs includes variants of evolutionary strategie and evolutionary programming, the canonical genetic algorithm, as well as a variety of genetic algorithms that have been applied to combinatorial optimization problems. The proof of this result is based on a performance analysis of a very general class of stochastic optimization algorithms, which has implications for the performance of a variety of other optimization algorithms.
Abstract: We present performance-guaranteed approximation algorithms for the protein folding problem in the hydrophobic-hydrophilic model, Dill (1985). Our algorithms are the first approximation algorithms in the literature with guaranteed performance for this model, Dill (1994). The hydrophobic-hydrophilic model abstracts the dominant force of protein folding: the hydrophobic interaction. The protein is modeled as a chain of amino acids of length n that are of two types: H (hydrophobic, i.e., nonpolar) and P (hydrophilic, i.e., polar). Although this model is a simplification of more complex protein folding models, the protein folding structure prediction problem is notoriously difficult for this model. Our algorithms have linear (3n) or quadratic time and achieve a three-dimensional protein conformation that has a guaranteed free energy no worse than 3/8 of optimal. This result answers the open problem of Ngo, Marks and Karplus (1994) about the possible existence of an efficient approximation algorithm with guaranteed performance for protein structure prediction in any well-studied model of protein folding. By achieving speed and near-optimality simultaneously, our algorithms rigorously capture salient features of the recently proposed framework of protein folding by Sali, Shakhnovich and Karplus (1994). Equally important, the final conformations of our algorithms have significant secondary structure (anti-parallel sheets, beta sheets, compact hydrophobic core). Furthermore, hypothetical folding pathways can be described for our algorithms that fit within the framework of diffusion-collision protein folding proposed by Karplus and Weaver (1979). Computational limitations of algorithms that compute the optimal conformation have restricted their applicability to short sequences (length less than or equal to 90). Because our algorithms trade computational accuracy for speed, they can construct near-optimal conformations in linear time for sequences of any size.
Abstract: The developmental mechanisms transforming genotypic to phenotypic forms are typically omitted in formulations of genetic algorithms (GAs) in which these two representational spaces are identical. We argue that a careful analysis of developmental mechanisms is useful when understanding the success of several standard GA techniques, and can clarify the relationships between more recently proposed enhancements. We provide a framework which distinguishes between two developmental mechanisms - learning and maturations - which also showing several common effects on GA search. This framework is used to analyze how maturation and local search can change the dynamics of the GA. We observe that in some contexts, maturation and local search can be incorporated into the fitness evaluation, but illustrate reasons for considering them separately. Further, we identify contexts in which maturation and local search can be distinguished from the fitness evaluation.
Abstract: We describe a method of registering retinal images automatically. Control poitns are automatically identified in each image from blood vessel segments extracted from both images. The location of the optic nerve is used to check the spatial similarity of control point pairs. The control point pairs are ranked with a similarity assessment that calculates a correlation of image intensity around each control point. Using a model of an idealized registration, we calculate the expected scaling factor between the images. Control point pairs that differ from this expected scaling factor are eliminated, with a bias against pairs with a low similarity assessment. Accurate registration is report in 22 out of 23 image pairs. The registration error is related to errors from the methods used to extract the vascular tree and to identify the location of the optic nerve.
Abstract: The Genetic Algorithm (GA) is generally portrayed as a search procedure which can optimize pseudo-boolean functions based on a limited sample of the function's values. There have been many attempts to analyze the computational behavior of the GA. For the most part, these attempts have tacitly assumed that the algorithmic parameters of the GA (e.g. population size, choice of genetic operators, etc.) can be isolated from the characteristics of the class of functions being optimized. In the following, we demonstrate why this assumption is inappropriate. We consider the class, F, of all deterministic pseudo-boolean functions whose values range over the integers. We then consider the Genetic Algorithm as a combinatorial optimization problem o ver {0,1}^l and demonstrate that the computational problem it attempts to solve is NP-hard relative to this class of functions. Using standard performance measures, we also give evidence that the Genetic Algorithm will not be able to efficiently approximate this optimization problem. These results imply that there does not exist a fixed set of algorithmic parameters which enable the GA to optimize an arbitrary function in F. We conclude that theoretical and experimental analyses of the GA which do not specify the class of functions being optimized can make few claims regarding the efficiency of the genetic algorithm for an arbitrary fitness function. When analyzing the computational complexity of the Genetic Algorithm, classes (or distributions) of functions should be analyzed relative to the algorithmic parameters chosen for the GA.
Unrefereed Publications![]() Adaptive Global Optimization with Local Search, W E Hart. PhD thesis, UCSD, June 1994. Abstract: This dissertation examines the performance of genetic algorithm (GA) hybrids that use local search to solve global optimization problems. GAs are a class of adaptive global sampling methods that take many cues from mechanisms observed in natural evolution. GAs maintain a population of solutions that are used to generate new solutions in the search space. GA hybrids using local search (GA-LS hybrids) are motivated by the apparent need to employ both a global and local search strategy to provide an effective global optimization method. Previous experimental results have found that GA-LS hybrids not only find better solutions than the GA, but also optimize more efficiently. To improve the efficiency of GA-LS hybrids, I propose and experimentally validate methods that selectively apply local search to solutions in the GA's population. First, local search is randomly applied with a fixed frequency. Experiments with this method illustrate a trade-off between the refinement performed by local search and the reliability of the competitive search performed by the GA. Next, I describe two classes of adaptive methods. Distribution-based adaptive methods use redundancy in the population to avoid performing unnecessary local searches. Fitness-based adaptive methods use the fitness information in the population to bias the local search towards individuals that have better fitness. An experimental analysis indicates that these adaptive methods can significantly improve the efficiency of GA-LS hybrids. This dissertation explores implications of these results for parallel GAs. In particular, a MIMD design for geographically structured genetic algorithms (GSGAs) is described. GSGAs were initially developed for SIMD architectures, where it is difficult to selectively apply local search. An analytic and experimental analysis of MIMD GSGAs demonstrates that they scale well for large problems. GA-LS hybrids are used to solve global optimization problems in several application domains. First, GA-LS hybrids are used to find the weights of a neural network to solve the six-bit symmetry problem. Next, they are used to solve a simple 19 atom molecular conformation problem. Finally, they are applied to a drug docking problem. When compared to simulated annealing, GA-LS hybrids not only find better solutions and optimize more efficiently.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|