# Hybrid Methods

A wide range of hybrid global optimization algorithms has been developed. We summarize several hybrid algorithms that illustrate why hybridization can be useful.

# MINLP

## Overview

Mixed Integer Nonlinear Programming (MINLP) refers to mathematical programming algorithms that can optimize both continuous and integer variables, in a context of nonlinearities in the objective function and/or constraints. The general form of a MINLP is:

```
Z = min C(y, x)   w.r.t. y, x

s.t.   h(y, x)  =  0
       g(y, x) <=  0

       y in {0,1}^m,   x in R^n
```
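Because y ranges over a finite set, a small instance can be solved by enumerating y and optimizing x for each choice. The following sketch is a hypothetical toy problem (the data and names are invented for illustration) that mirrors the general form above:

```python
# Toy MINLP in the general form above (hypothetical example):
#   min  (x1 - 1)^2 + (x2 - 2)^2 + 3*y1 + 2*y2     (fixed costs 3, 2)
#   s.t. 0 <= x_i <= 4*y_i   (component i usable only if selected)
#        y in {0,1}^2, x in R^2
# For fixed y the continuous subproblem separates and has a closed form,
# so brute-force enumeration of y solves the whole problem.
from itertools import product

TARGETS = (1.0, 2.0)   # a_i: desired continuous settings
COSTS = (3.0, 2.0)     # c_i: fixed cost of including component i

def solve_subproblem(y):
    """Optimal continuous x and objective value for fixed binary y."""
    x, value = [], 0.0
    for yi, a, c in zip(y, TARGETS, COSTS):
        xi = min(max(a, 0.0), 4.0 * yi)   # clip a_i into [0, 4*y_i]
        x.append(xi)
        value += (xi - a) ** 2 + c * yi
    return value, tuple(x)

best = min(solve_subproblem(y) + (y,) for y in product((0, 1), repeat=2))
z_star, x_star, y_star = best
print(y_star, x_star, z_star)   # -> (0, 1) (0.0, 2.0) 3.0
```

Note how the binary choice and the continuous sizing interact: component 1 is cheaper to leave out (mismatch cost 1 < fixed cost 3), while component 2 is worth including.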

Engineering design problems are often MINLP problems, since they involve the selection of a configuration or topology (which components should be included in the design) as well as the design parameters of those components, e.g., size, weight, etc. The inclusion of a component can be represented by binary variables, whereas the design parameters are continuous. Usually, the 0-1 variables appear in a linear form and the continuous variables exhibit the nonlinearities.

MINLP problems are NP-complete and until recently have been considered extremely difficult. However, with current problem structuring methods and computer technology, they are now solvable [Grossmann, 1990]. Major algorithms for solving the MINLP problem include: branch and bound [Beale, 1977], generalized Benders decomposition (GBD) [Geoffrion, 1972], and outer approximation (OA) [Duran and Grossmann, 1986].

The branch and bound method of solution is an extension of B&B for mixed integer programming. The method starts by relaxing the integrality requirements of the 0-1 variables, thus forming a continuous NLP problem. Then, a tree enumeration is performed in which a subset of the integer variables is successively fixed at each node. The solution of the corresponding NLP at each node provides a lower bound for the optimal MINLP objective function value. The lower bound is used to direct the search, expanding the nodes in either a breadth-first or depth-first enumeration. The major disadvantage of the branch and bound method is that it may require the solution of a large number of NLP subproblems (depending upon the number of integer variables).
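The node-and-bound bookkeeping can be sketched on a toy problem in which the NLP relaxation has a closed form (hypothetical data; a real implementation would call an NLP solver at each node):

```python
# Branch and bound sketch for a toy problem (hypothetical, not a full
# MINLP solver): minimize f(y) = sum_i (y_i - c_i)^2 with y in {0,1}^n.
# Relaxing y_i to [0,1] gives y_i = clip(c_i, 0, 1), whose value is a
# valid lower bound at each node; branching fixes one free variable.
import heapq

C = [0.3, 0.7, 0.45]  # assumed problem data

def relax(fixed):
    """Solve the relaxation with some variables fixed; return
    (lower bound, relaxed solution)."""
    y = [fixed.get(i, min(max(C[i], 0.0), 1.0)) for i in range(len(C))]
    return sum((yi - ci) ** 2 for yi, ci in zip(y, C)), y

best_val, best_y = float("inf"), None
heap = [(relax({})[0], 0, {})]           # nodes ordered by lower bound
counter = 0
while heap:
    lb, _, fixed = heapq.heappop(heap)
    if lb >= best_val:                   # prune: bound cannot improve
        continue
    bound, y = relax(fixed)
    free = [i for i in range(len(C)) if i not in fixed]
    if not free:                         # all integer: candidate solution
        if bound < best_val:
            best_val, best_y = bound, y
        continue
    i = free[0]                          # branch on the first free variable
    for v in (0.0, 1.0):
        child = dict(fixed); child[i] = v
        counter += 1
        heapq.heappush(heap, (relax(child)[0], counter, child))
print(best_y, round(best_val, 4))       # -> [0.0, 1.0, 0.0] 0.3825
```

The best-first ordering of the heap plays the role of the lower-bound-directed search described above; once an incumbent is found, nodes whose bound exceeds it are pruned without solving further subproblems.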

Generalized Benders decomposition and outer approximation solve the MINLP by an iterative process. The problem is decomposed into an NLP subproblem (which has the integer values fixed) and an MILP master problem. The NLP subproblems optimize the continuous variables and provide an upper bound to the MINLP solution, while the MILP master problems have the role of predicting a new lower bound for the MINLP solution, as well as new integer variables for each iteration. The search terminates when the predicted lower bound equals or exceeds the current upper bound. The process is shown in Figure 1 [Grossmann, 1990].
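The iterative upper/lower-bounding process can be sketched on a tiny convex toy problem (hypothetical data; brute-force enumeration stands in for the MILP master, and the cuts are GBD-style linear supports of the subproblem value function):

```python
# Schematic of the GBD/OA iteration on a tiny convex toy problem:
#   min  x^2 + c.y   s.t.  x >= D0 - d.y,  x >= 0,  y in {0,1}^3
from itertools import product

C = [1.0, 0.8, 0.2]        # fixed costs of the binary choices
D = [2.0, 1.5, 1.0]        # how much each choice reduces the demand
D0 = 3.0
YS = list(product((0, 1), repeat=3))

def subproblem(y):
    """NLP with y fixed: objective value (an upper bound), x, demand."""
    demand = D0 - sum(di * yi for di, yi in zip(D, y))
    x = max(0.0, demand)
    return x * x + sum(ci * yi for ci, yi in zip(C, y)), x, demand

cuts = []                  # linear supports of v(y) = max(0, D(y))^2
y_hat = (0, 0, 0)
upper = float("inf")
while True:
    ub_y, x_hat, dem = subproblem(y_hat)
    upper = min(upper, ub_y)
    if dem > 0:            # Benders-style cut from the subproblem solution
        cuts.append((2 * dem, -dem * dem))   # v(y) >= 2*dem*D(y) - dem^2
    def estimate(y):       # master's lower estimate of the objective at y
        dy = D0 - sum(di * yi for di, yi in zip(D, y))
        v_lb = max([0.0] + [a * dy + b for a, b in cuts])
        return sum(ci * yi for ci, yi in zip(C, y)) + v_lb
    lower, y_hat = min((estimate(y), y) for y in YS)
    if lower >= upper - 1e-9:   # bounds have crossed: optimal
        break
print(y_hat, round(upper, 4))   # -> (1, 0, 1) 1.2
```

Each pass adds one cut, tightens the lower bound, and proposes the next integer assignment; the loop stops exactly when the predicted lower bound meets the incumbent upper bound, as in Figure 1.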

The main difference between GBD and OA is in the definition of the MILP master problem. In the GBD algorithm, the MILP master problem is given by a dual representation of a continuous space, while in the OA method, it is given by a primal approximation. In general, the OA method requires fewer iterations and thus the solution of fewer NLP subproblems, but the MILP problems require more computation as compared with GBD. For more details, see Grossmann [1990]. To meet sufficient conditions for convergence, all three solution methods require that the MINLP satisfy some form of convexity conditions.

In terms of algorithm performance and computational experience with MINLP, very large problems (hundreds of 0-1 variables, thousands of continuous variables, and thousands of constraints) have been solved [Sahinidis and Grossmann, 1989]. No single solution method has proven consistently superior in all applications.

## Application Domains

MINLP problems involve the simultaneous optimization of discrete and continuous variables. These problems often arise in engineering domains, where one is trying to simultaneously optimize the structure and parameters of a system. This is difficult because the optimal topology depends upon the parameter levels and vice versa, so a simple de-coupling approach does not work: it is generally not possible to isolate the two and optimize each separately. Finally, the complexity of these problems depends upon the form of the objective function. In the past, solution techniques typically depended upon objective functions that were single-attribute and linear (e.g., minimize cost). However, real problems often require multi-attribute objectives such as minimizing costs while maximizing safety and/or reliability, ensuring feasibility, and meeting scheduled deadlines. In these cases, the goal is to optimize over a set of performance indices which may be combined in a nonlinear objective function.

Engineering design "synthesis" problems are a major application of MINLP algorithms. In engineering design, one often wants to know how to select a topology or configuration and the parameters of the system to meet the specified goals and functional requirements. Synthesis is defined as "the selection of a configuration or topology, as well as its design parameters ... one has to determine on the one hand which components should integrate a system and how they should be connected; on the other hand, one has to determine the sizes and parameters of the components. The former clearly implies making discrete decisions, while the latter implies making a choice from among a continuous space." [Grossmann, 1990] Examples of synthesis problems include the design of structures in civil engineering, of process flowsheets in chemical engineering, of VLSI circuits in electrical engineering, of servomechanisms in mechanical engineering, etc. To formulate a synthesis problem, a superstructure is postulated that has all the possible alternatives that are candidates for a feasible design embedded in it. The discrete variables are the decision variables for the components in the superstructure to include in the optimal structure, and the continuous variables are the values of the parameters for the included components.

## Software

A solver for MINLP developed by Grossmann and Viswanathan is now available as part of the GAMS modeling package. For details, see the GAMS web site.

## References


Beale, E. M. L. (1977). "Integer Programming," in The State of the Art in Numerical Analysis (D. Jacobs, ed.) Academic Press: London, 409-448.

Duran, M.A. and I.E. Grossmann (1986). "An Outer Approximation Algorithm for a Class of Mixed Integer Nonlinear Programs," Mathematical Programming 36, 307-339.

Geoffrion, A.M. (1972). "Generalized Benders Decomposition," Journal of Optimization Theory and Applications 10(4) 237-260.

Grossmann, I. E. (1990). "Mixed-Integer Nonlinear Programming Techniques for the Synthesis of Engineering Systems," Research in Engineering Design 1, 205-228.

Sahinidis, N. and I.E. Grossmann (1989). "MINLP Model for Scheduling in Continuous Parallel Production Lines." Presented at AIChE meeting, San Francisco, CA.


# Tree Annealing

## Overview

Simulated annealing was designed for combinatorial optimization, which usually implies that the decision variables are discrete. A variant of simulated annealing called tree annealing was developed by Bilbro and Snyder [1991] to globally minimize continuous functions.

In simulated annealing, one is given an initial point x and randomly generates a new candidate solution y. If f(y) < f(x), then y replaces x and the process repeats. If f(y) > f(x), y replaces x with a probability that is based on the Metropolis criterion and the progress of the algorithm.

In tree annealing, candidate subspaces are accepted or rejected by evaluating the function at representative points, and the width of the accepted subspace gradually narrows around the global minimum. Tree annealing therefore does not find the global minimizer x* exactly: it finds an arbitrarily small interval in which that minimum lies. The algorithm itself is very similar to simulated annealing: given a point x associated with the current search interval and a point y associated with a candidate search interval, y replaces x depending upon f(x), f(y), and the progress of the algorithm.

Tree annealing stores information in a binary tree to keep track of which subintervals have been explored. Each node in the tree represents one of two subintervals defined by its parent node. Initially the tree consists of one parent and two child nodes. As better subintervals are found, the path down the tree that leads to these intervals gets deeper, and the nodes along this path define smaller and smaller subspaces. The tree annealing algorithm is as follows:

1. Randomly choose an initial point x over the search interval S0.

2. Randomly travel down the tree to an arbitrary terminal node i, and generate a candidate point y over the subspace defined by Si.

3. If f(y) < f(x), replace x with y and go to step 5.

4. Compute P = exp(-(f(y) - f(x))/T). If P > R, where R is a random number uniformly distributed between 0 and 1, replace x with y.

5. If y replaced x, decrease T slightly and update the tree.

6. Repeat steps 2-5 until T < Tmin.
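The steps above can be sketched as follows. This is a minimal 1-D illustration with an assumed test function and cooling schedule, not Bilbro and Snyder's reference implementation; it uses the plain Metropolis test of steps 3-4 and tracks the best accepted point:

```python
# Minimal 1-D tree annealing sketch. Leaves of a binary tree partition
# the search interval; descent probabilities follow the leaf counts
# nL/(nL + nR), and an accepted candidate splits its leaf, deepening
# the tree around promising regions.
import math, random

def f(x):                       # assumed test function, minimum at x = 1.2
    return (x - 1.2) ** 2

class Node:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.left = self.right = None
        self.leaves = 1         # number of terminal nodes in this subtree

def descend(node):
    """Step 2: walk to a terminal node, going left with probability
    nL/(nL + nR); return the leaf and its ancestors."""
    path = []
    while node.left is not None:
        path.append(node)
        p_left = node.left.leaves / (node.left.leaves + node.right.leaves)
        node = node.left if random.random() < p_left else node.right
    return node, path

def split(leaf, path):
    """Turn an accepted leaf into a parent of two half-intervals."""
    mid = (leaf.lo + leaf.hi) / 2.0
    leaf.left, leaf.right = Node(leaf.lo, mid), Node(mid, leaf.hi)
    leaf.leaves = 2
    for anc in path:            # one leaf became two: update ancestors
        anc.leaves += 1

random.seed(0)
root = Node(0.0, 2.0)
split(root, [])                 # initial tree: one parent, two children
x = random.uniform(0.0, 2.0)    # step 1
best_x, T = x, 1.0
while T > 1e-3:
    leaf, path = descend(root)
    y = random.uniform(leaf.lo, leaf.hi)
    # steps 3-4: accept downhill moves, uphill with Metropolis probability
    if f(y) < f(x) or random.random() < math.exp(-(f(y) - f(x)) / T):
        x = y
        split(leaf, path)       # step 5: deepen the tree around y
        if f(x) < f(best_x):
            best_x = x
    T *= 0.999                  # slow cooling
print(round(best_x, 3))
```

With this seed and cooling schedule the best accepted point lands close to the true minimizer; the deepest leaves of the final tree give the small interval in which the minimum lies.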

This algorithm is very similar to simulated annealing; however, there are some important things to note:

1. Although tree annealing compares the function evaluation at two points x and y, these points are samples of subspaces.

2. In order to have the tree deeper (higher nodal resolution) in the vicinity of the minimum, it is necessary to add nodes to the tree. This is done in the following way: at any non-terminal node, the probability of going left or right is nL/(nL + nR) or nR/(nL + nR) where nL is the number of subnodes to the left and nR is the number of subnodes to the right. For example, in Figure 2, node A has one node to its right and five nodes to the left, so nR =1 and nL= 5. For node B, nR =1 and nL= 3.

To generate a candidate y, start at the root node and descend the tree making left/right decisions based on the probabilities of going left or right at a given node until a terminal node is reached. Then, generate a point uniformly over the subspace defined by the terminal node. Descending the tree implies that the size of the subspaces is decreasing. If a path is continually travelled, that implies that the algorithm is converging on the terminal node in that path. To expand the tree, if y is accepted, turn the terminal node containing y into a non-terminal node with two children. At this point, the number of right and left nodes at each node will have changed and the tree will need to be updated.

3. The probability of accepting y at a given point is now governed by a modified Metropolis algorithm. The modified criterion is:

P = g(x)p(y) / (g(y)p(x))

where p(x) = exp(-f(x)/T) and g(x) = (1/Vx) * prod(pi), in which i indexes the nodes on the path from the root to the node associated with x, pi = n(L or R)/(nL + nR) is the branching probability taken at node i, and Vx is the volume of the subspace defined by the terminal node associated with x. Thus, g(x) approximates the sampling distribution of a point over an interval of arbitrary size.
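As a small numerical check (all values hypothetical), the modified criterion can be computed directly:

```python
# Modified acceptance probability P = g(x)p(y) / (g(y)p(x)), evaluated
# for hypothetical path probabilities, subspace volumes, and function
# values.
import math

def g(volume, path_probs):
    """Approximate sampling density of a point: (1/V) * prod(p_i)."""
    prod = 1.0
    for p in path_probs:
        prod *= p
    return prod / volume

def accept_prob(f_x, f_y, g_x, g_y, T):
    p = lambda f: math.exp(-f / T)   # p(x) = exp(-f(x)/T)
    return (g_x * p(f_y)) / (g_y * p(f_x))

g_x = g(0.5, [0.5, 0.5])          # x's leaf: volume 0.5, depth-2 path
g_y = g(0.25, [0.5, 0.5, 0.5])    # y's leaf: volume 0.25, depth-3 path
P = accept_prob(f_x=1.0, f_y=0.5, g_x=g_x, g_y=g_y, T=1.0)
print(round(P, 4))   # -> 1.6487
```

Here the density correction cancels (g(x) = g(y) = 0.5), so P reduces to the ordinary Metropolis ratio exp((f(x) - f(y))/T); the correction matters when x and y sit in leaves of different depth and volume.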

## Application Domains

Tree annealing is a method for finding the global minimum of nonconvex functions. It extends simulated annealing to handle continuous variables. It is not guaranteed to converge, and the rate of convergence is often very slow. It is recommended to use tree annealing in the first part of the optimization to find the region of the global optimum, and then use gradient descent or another local method to home in on the actual value. Given enough time, a sufficiently low final temperature, and a slow cooling rate, tree annealing should at least converge to a local minimum, and will often converge to a global minimum, especially for piecewise functions with discontinuities.
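This two-phase strategy can be sketched as follows (hypothetical example; a coarse grid scan stands in for the tree annealing phase, and golden-section search plays the role of the local method):

```python
# Two-phase global optimization sketch: a coarse global scan locates
# the basin of the global minimum, then golden-section search refines it.
import math

def f(x):                       # two local minima; the global one near x = -1
    return (x * x - 1.0) ** 2 + 0.1 * (x + 1.0)

# Phase 1: coarse scan over [-2, 2] (tree annealing would go here)
xs = [-2.0 + 0.1 * k for k in range(41)]
x0 = min(xs, key=f)

# Phase 2: golden-section search on a bracket around the coarse winner
lo, hi = x0 - 0.2, x0 + 0.2
phi = (math.sqrt(5.0) - 1.0) / 2.0
for _ in range(60):
    a, b = hi - phi * (hi - lo), lo + phi * (hi - lo)
    if f(a) < f(b):
        hi = b                  # minimum lies in [lo, b]
    else:
        lo = a                  # minimum lies in [a, hi]
x_star = (lo + hi) / 2.0
print(round(x_star, 3))   # -> -1.012
```

The coarse phase only needs to land in the right basin; the local phase then resolves the minimizer to machine precision, which a stochastic global method alone would reach very slowly.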

Tree annealing has been used in applications such as signal processing and spectral analysis, neural nets, data fitting, etc. These problems involve fitting parameters to noisy data, and it is often difficult to find an optimal set of parameters via conventional means.

## Software

There is no publicly available software that we are aware of; however, Laura Painton has a MATLAB implementation of tree annealing. It has an efficient pointer structure for creating dynamic trees in matrix form. For more information, contact Laura.

## References

Bilbro, G.L. and W.E. Snyder (1991). "Optimization of functions with many minima," IEEE Transactions on Systems, Man, and Cybernetics 21(4), 840-849.

Durst, S. (1992) "Tree annealing vs. Gradient Descent Methods." Term project for Numerical Optimization course, Dept. of Electrical Engineering, Carnegie Mellon University.
