A wide range of hybrid global optimization algorithms has been developed. We summarize several hybrid algorithms that illustrate why hybridization can be useful.

Mixed Integer Nonlinear Programming (MINLP) refers to mathematical programming algorithms that can optimize both continuous and integer variables, in the presence of nonlinearities in the objective function and/or constraints. The general form of a MINLP is:

Z = min C(y, x) w.r.t. y, x
s.t. h(y, x) = 0
     g(y, x) <= 0
     y in {0,1}^m,  x in R^n

Engineering design problems are often MINLP problems, since they involve the selection of a configuration or topology (which components should be included in the design) as well as the design parameters of those components, e.g., size, weight, etc. The inclusion of a component can be represented by the binary variables, whereas the design parameters are continuous. Usually, the 0-1 variables appear in a linear form and the continuous variables exhibit the nonlinearities.
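For concreteness, here is a toy MINLP of this component-selection flavor (our own illustrative example, not one from the literature), solved by brute-force enumeration of the binaries; for each fixed y the continuous part separates, so it can be minimized in closed form by clipping:

```python
from itertools import product

# Toy MINLP: each binary y_i switches a "component" on; the continuous
# size x_i is forced to 0 when the component is absent.
#   min  (x1-1)^2 + (x2-2)^2 + 3*y1 + 2*y2   (fixed costs 3 and 2)
#   s.t. 0 <= x_i <= ub_i * y_i,  y_i in {0,1}
TARGET = (1.0, 2.0)   # unconstrained optima of the quadratic terms
FIXED  = (3.0, 2.0)   # fixed cost of including each component
UB     = (4.0, 4.0)   # upper bounds on the sizes

def solve_continuous(y):
    # For fixed y the problem separates: clip each target into [0, ub_i * y_i].
    x = tuple(min(max(t, 0.0), u * yi) for t, u, yi in zip(TARGET, UB, y))
    cost = (sum((xi - t) ** 2 for xi, t in zip(x, TARGET))
            + sum(c * yi for c, yi in zip(FIXED, y)))
    return x, cost

# Enumerate all 0-1 combinations and keep the cheapest design.
best = min((solve_continuous(y) + (y,) for y in product((0, 1), repeat=2)),
           key=lambda r: r[1])
x_opt, z_opt, y_opt = best
print(y_opt, x_opt, z_opt)   # -> (0, 1) (0.0, 2.0) 3.0
```

Including only the second component (y = (0,1)) wins here: its fixed cost is lower and its quadratic term can be driven to zero. Enumeration is exponential in the number of binaries, which is exactly why the branch and bound and decomposition methods below matter.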

MINLP problems are NP-complete and until recently have been considered extremely difficult. However, with current problem structuring methods and computer technology, they are now solvable [Grossmann, 1990]. Major algorithms for solving the MINLP problem include: branch and bound [Beale, 1977], generalized Benders decomposition (GBD) [Geoffrion, 1972], and outer approximation (OA) [Duran and Grossmann, 1986].

The branch and bound method of solution is an extension of B&B for mixed integer programming. The method starts by relaxing the integrality requirements of the 0-1 variables, thus forming a continuous NLP problem. Then, a tree enumeration is performed in which a subset of the integer variables is successively fixed at each node. The solution of the corresponding NLP at each node provides a lower bound for the optimal MINLP objective function value. The lower bound is used to direct the search, with nodes expanded in either a breadth-first or depth-first enumeration depending on the value of the lower bound. The major disadvantage of the branch and bound method is that it may require the solution of a large number of NLP subproblems (depending upon the number of integer variables).

Generalized Benders decomposition and outer approximation solve the MINLP by an iterative process. The problem is decomposed into an NLP subproblem (which has the integer values fixed) and an MILP master problem. The NLP subproblems optimize the continuous variables and provide an upper bound to the MINLP solution, while the MILP master problems have the role of predicting a new lower bound for the MINLP solution, as well as new integer variables for each iteration. The search terminates when the predicted lower bound equals or exceeds the current upper bound. The process is shown in Figure 1 [Grossmann, 1990].

The main difference between GBD and OA is in the definition of the MILP master problem. In the GBD algorithm, the MILP master problem is given by a dual representation of a continuous space, while in the OA method, it is given by a primal approximation. In general, the OA method requires fewer iterations and thus the solution of fewer NLP subproblems, but the MILP problems require more computation as compared with GBD. For more details, see Grossmann [1990]. To meet sufficient conditions for convergence, all three solution methods require that the MINLP satisfy some form of convexity conditions.
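The bound-crossing iteration can be sketched on a small convex toy problem (our own example; a real implementation would call an MILP solver for the master, which we stand in for here with brute force over y and a grid on x):

```python
import math

# Outer-approximation sketch on a convex toy MINLP:
#   min f(x, y) = (x - 2)^2 + 3y   s.t.  x <= 4y,  0 <= x <= 4,  y in {0, 1}
# NLP subproblems (y fixed) give upper bounds; the master problem, built
# from linearizations of f, gives lower bounds.

def f(x, y):
    return (x - 2.0) ** 2 + 3.0 * y

def nlp_subproblem(y):
    # Closed form: minimize (x - 2)^2 over 0 <= x <= min(4, 4y).
    x = min(2.0, 4.0 * y)
    return x, f(x, y)

def master(cuts):
    # min eta s.t. eta >= each linearization of f, x <= 4y, y in {0, 1}.
    # Solved here by enumeration over y and a fine grid on x.
    best_eta, best_y = math.inf, None
    for y in (0, 1):
        x_hi = min(4.0, 4.0 * y)
        for i in range(401):
            x = x_hi * i / 400.0
            eta = max(fk + gk * (x - xk) + 3.0 * (y - yk)
                      for xk, yk, fk, gk in cuts)
            if eta < best_eta:
                best_eta, best_y = eta, y
    return best_eta, best_y

cuts, ub, y = [], math.inf, 0                 # start from an initial integer guess
for _ in range(20):
    x, val = nlp_subproblem(y)                # upper bound from NLP with y fixed
    ub = min(ub, val)
    cuts.append((x, y, val, 2.0 * (x - 2.0)))  # linearize f at (x, y)
    lb, y_next = master(cuts)                 # new lower bound and integer point
    if lb >= ub - 1e-6:                       # bounds have crossed: done
        break
    y = y_next
print(y, ub)   # -> 1 3.0
```

Two iterations suffice here: the first cut sends the master to y = 1, whose NLP subproblem yields the upper bound 3.0, and the second master matches that bound, terminating the loop exactly as described above.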

In terms of algorithm performance and computational experience with MINLP, very large problems (hundreds of 0-1 variables, thousands of continuous variables, and thousands of constraints) have been solved [Sahinidis and Grossmann, 1989]. No single solution method seems to be consistently superior in all applications.

Engineering design "synthesis" problems are a major application of MINLP algorithms. In engineering design, one often wants to know how to select a topology or configuration and the parameters of the system to meet the specified goals and functional requirements. Synthesis is defined as "the selection of a configuration or topology, as well as its design parameters ... one has to determine on the one hand which components should integrate a system and how they should be connected; on the other hand, one has to determine the sizes and parameters of the components. The former clearly implies making discrete decisions, while the latter implies making a choice from among a continuous space." [Grossmann, 1990] Examples of synthesis problems include the design of structures in civil engineering, of process flowsheets in chemical engineering, of VLSI circuits in electrical engineering, of servomechanisms in mechanical engineering, etc. To formulate a synthesis problem, a superstructure is postulated that has all the possible alternatives that are candidates for a feasible design embedded in it. The discrete variables are the decision variables for the components in the superstructure to include in the optimal structure, and the continuous variables are the values of the parameters for the included components.

A solver for MINLP developed by Grossmann and Viswanathan is now available as part of the GAMS modeling package. For details, see the GAMS software web site.

Beale, E. M. L. (1977). "Integer Programming," in *The
State of the Art in Numerical Analysis* (D. Jacobs, ed.) Academic
Press: London, 409-448.

Duran, M.A. and I.E. Grossmann (1986). "An Outer Approximation
Algorithm for a Class of Mixed Integer Nonlinear Programs,"
*Mathematical Programming* **36**, 307-339.

Geoffrion, A.M. (1972). "Generalized Benders Decomposition,"
*Journal of Optimization Theory and Applications* **10**(4),
237-260.

Grossmann, I. E. (1990). "Mixed-Integer Nonlinear Programming
Techniques for the Synthesis of Engineering Systems," *Research
in Engineering Design* **1**, 205-228.

Sahinidis, N. and I.E. Grossmann (1989). "MINLP Model for Scheduling in Continuous Parallel Production Lines." Presented at AIChE meeting, San Francisco, CA.

Simulated annealing was designed for combinatorial optimization, usually implying that the decision variables are discrete. A variant of simulated annealing called tree annealing was developed by Bilbro and Snyder [1991] to globally minimize continuous functions.

In simulated annealing, one is given an initial point *x* and randomly generates a new candidate solution, *y*. If *f(y)* < *f(x)*, then *y* replaces *x* and the process repeats. If *f(y)* > *f(x)*, *y* replaces *x* with a probability based on the Metropolis criterion and the progress of the algorithm. In tree annealing, candidate subspaces are accepted or rejected by evaluating the function at representative points. Tree annealing gradually narrows the width of the subspace to close in around the global minimum. However, tree annealing does not find the global minimizer *x** exactly: it finds an arbitrarily small interval where that minimum lies. The tree annealing algorithm is very similar to simulated annealing: given a point *x* associated with the current search interval and a point *y* associated with a candidate search interval, *y* replaces *x* depending upon *f(x)*, *f(y)*, and the progress of the algorithm. Tree annealing stores information in a binary tree to keep track of which subintervals have been explored. Each node in the tree represents one of two subintervals defined by the parent node. Initially the tree consists of one parent and two child nodes. As better subintervals are found, the path down the tree that leads to these intervals gets deeper, and the nodes along this path define smaller and smaller subspaces. The tree annealing algorithm is as follows:

1. Randomly choose an initial point *x* over the search interval S0.

2. Randomly travel down the tree to an arbitrary terminal node *i*, and generate a candidate point *y* over the subspace defined by Si.

3. If *f(y)* < *f(x)*, replace *x* with *y*.

4. Otherwise, compute P = exp(-(f(y)-f(x))/T). If P > R, where R is a random number uniformly distributed between 0 and 1, replace x with y.

5. If y replaced x, decrease T slightly and update the tree.

6. Repeat steps 2-5 until T < Tmin.
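The steps above can be sketched in one dimension as follows (our own toy code, not Bilbro and Snyder's implementation; the objective f(x) = (x - 0.7)^2 and all parameter values are illustrative):

```python
import math
import random

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi     # subinterval this node represents
        self.left = self.right = None
        self.n = 1                    # number of nodes in this subtree

def descend(root):
    """Walk from the root to a terminal node, branching with probability
    nL/(nL + nR) left and nR/(nL + nR) right."""
    node, path = root, [root]
    while node.left is not None:
        p_left = node.left.n / (node.left.n + node.right.n)
        node = node.left if random.random() < p_left else node.right
        path.append(node)
    return node, path

def split(leaf, path):
    """Turn a terminal node into a parent of two half-width children,
    updating the subtree counts along the path."""
    mid = (leaf.lo + leaf.hi) / 2.0
    leaf.left, leaf.right = Node(leaf.lo, mid), Node(mid, leaf.hi)
    for node in path:
        node.n += 2

def tree_anneal(f, lo=0.0, hi=1.0, T=1.0, Tmin=1e-3, cool=0.99, max_iter=20000):
    root = Node(lo, hi)
    split(root, [root])               # initially one parent and two children
    x, fx = random.uniform(lo, hi), 0.0   # step 1: initial point over S0
    fx = f(x)
    best_x, best_f = x, fx
    for _ in range(max_iter):
        if T < Tmin:
            break
        leaf, path = descend(root)    # step 2: candidate from a terminal node
        y = random.uniform(leaf.lo, leaf.hi)
        fy = f(y)
        if fy < best_f:
            best_x, best_f = y, fy
        # steps 3-4: accept improvements always, worse points with Metropolis probability
        if fy < fx or math.exp(-(fy - fx) / T) > random.random():
            x, fx = y, fy             # step 5: grow the tree and cool on acceptance
            split(leaf, path)
            T *= cool
    return best_x, best_f

random.seed(1)
best_x, best_f = tree_anneal(lambda x: (x - 0.7) ** 2)
print(best_x, best_f)
```

Because accepted candidates enlarge the subtree around their subinterval, later samples concentrate in the regions that have produced good points, which is exactly the deepening behavior described in the notes below.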

1. Although tree annealing compares the function evaluation at two points x and y, these points are samples of subspaces.

2. In order to have the tree deeper (higher nodal resolution) in the vicinity of the minimum, it is necessary to add nodes to the tree. This is done in the following way: at any non-terminal node, the probability of going left or right is nL/(nL + nR) or nR/(nL + nR) where nL is the number of subnodes to the left and nR is the number of subnodes to the right. For example, in Figure 2, node A has one node to its right and five nodes to the left, so nR =1 and nL= 5. For node B, nR =1 and nL= 3.

To generate a candidate *y*, start at the root node and descend
the tree making left/right decisions based on the probabilities
of going left or right at a given node until a terminal node is
reached. Then, generate a point uniformly over the subspace defined
by the terminal node. Descending the tree implies that the size
of the subspaces is decreasing. If a path is continually travelled,
that implies that the algorithm is converging on the terminal
node in that path. To expand the tree, if *y* is accepted,
turn the terminal node containing y into a non-terminal node with
two children. At this point, the number of right and left nodes
at each node will have changed and the tree will need to be updated.

3. The probability of accepting *y* at a given point is now governed by a modified Metropolis criterion:

P = *g(x)p(y)/g(y)p(x)*

where *p(x)* = exp(-*f(x)*/T) and *g(x)* = (1/Vx) prod_i p_i, where i denotes each node in the path from the root node to the node associated with x, p_i is the branch probability nL/(nL + nR) or nR/(nL + nR) taken at node i, and Vx represents the volume of the subspace defined by the node associated with x. Thus, g(x) approximates the distribution of y over an interval of arbitrary size.
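The two quantities in the modified criterion are straightforward to compute; a small sketch (function names are ours):

```python
import math

def g(branch_probs, volume):
    # g(x) = (1/Vx) * product of branch probabilities p_i along the path
    # from the root to the terminal node containing x.
    density = 1.0 / volume
    for p in branch_probs:
        density *= p
    return density

def accept_prob(fx, fy, gx, gy, T):
    # Modified Metropolis ratio P = g(x) p(y) / (g(y) p(x)),
    # with p(x) = exp(-f(x)/T).
    return (gx * math.exp(-fy / T)) / (gy * math.exp(-fx / T))

# e.g. a leaf reached via branch probabilities 0.5 then 0.75, of width 0.25:
gx = g([0.5, 0.75], 0.25)   # -> 1.5
```

Dividing by g(y) compensates for the non-uniform way candidates are generated: points in heavily explored (deep, narrow) subspaces are proposed more often, and the correction keeps that from biasing the acceptance decision.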

Tree annealing is a method for finding the global minimum of nonconvex functions. It is an extension of simulated annealing in that it can handle continuous variables. It is not guaranteed to converge, and often the rate of convergence is very slow. It is recommended to use tree annealing in the first part of optimization to find the region of the global optimum, then use gradient descent or another local method to home in on the actual value. Given enough time, a sufficiently low final temperature, and a slow cooling rate, tree annealing should at least converge to a local minimum, and will often converge to a global minimum, especially for piecewise functions with discontinuities.

Tree annealing has been used in applications such as signal processing and spectral analysis, neural nets, data fitting, etc. These problems involve fitting parameters to noisy data, and often it is difficult to find an optimal set of parameters via conventional means.

There is no publicly available software that we are aware of; however, Laura Painton has written an implementation of tree annealing in MATLAB. It has an efficient pointer structure for creating dynamic trees in matrix form. For more information, contact Laura.

Bilbro, G.L. and W.E. Snyder (1991). "Optimization of functions
with many minima," *IEEE Transactions on Systems, Man,
and Cybernetics* **21**(4), 840-849.

Durst, S. (1992) "Tree annealing vs. Gradient Descent Methods." Term project for Numerical Optimization course, Dept. of Electrical Engineering, Carnegie Mellon University.

Global Optimization Survey - Main Page

Sandia National Laboratories

This page maintained by wehart@cs.sandia.gov Last modified: March 10, 1997