Mixed Integer Nonlinear Programming (MINLP) refers to mathematical programming problems that optimize both continuous and integer variables in the presence of nonlinearities in the objective function and/or constraints. The general form of a MINLP is:
Z = min C(y, x)   with respect to y, x
s.t.  h(y, x)  =  0
      g(y, x) <=  0
      y in {0,1}^m,   x in R^n
Engineering design problems are often MINLP problems, since they involve the selection of a configuration or topology (which components should be included in the design) as well as the design parameters of those components (size, weight, etc.). The inclusion of a component can be represented by a binary variable, whereas the design parameters are continuous. Usually the 0-1 variables appear in a linear form, while the continuous variables exhibit the nonlinearities.
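To make this structure concrete, consider a hypothetical two-component sizing problem (the costs, capacities, and demand below are invented for illustration; this is a minimal sketch of the formulation, not of any particular solution method):

```python
import math

# Hypothetical two-component design problem in MINLP form.
# y[i] = 1 if component i is included (binary); x[i] = its size (continuous).

FIXED_COST = [3.0, 2.0]   # fixed charge paid only if a component is included
CAPACITY   = [2.0, 1.2]   # maximum size of each component
DEMAND     = 2.0          # total size the design must provide

def objective(y, x):
    # C(y, x): the 0-1 variables enter linearly, the continuous ones nonlinearly.
    fixed     = sum(FIXED_COST[i] * y[i] for i in range(2))
    operating = sum(math.sqrt(x[i]) for i in range(2))   # nonlinear sizing cost
    return fixed + operating

def feasible(y, x, tol=1e-9):
    # g(y, x) <= 0: meet the demand, and size a component only if it is included.
    if sum(x) < DEMAND - tol:
        return False
    return all(-tol <= x[i] <= CAPACITY[i] * y[i] + tol for i in range(2))

# Brute-force check over the four 0-1 assignments, with sizes pinned at capacity
# just to expose the structure; a real solver would optimize x as well.
for y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = [CAPACITY[i] * y[i] for i in range(2)]
    if feasible(y, x):
        print("y =", y, " x =", x, " cost =", round(objective(y, x), 3))
```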
MINLP problems are NP-hard, and until recently were considered extremely difficult. However, with current problem structuring methods and computer technology, they are now solvable [Grossmann, 1990]. Major algorithms for solving the MINLP problem include branch and bound [Beale, 1977], generalized Benders decomposition (GBD) [Geoffrion, 1972], and outer approximation (OA) [Duran and Grossmann, 1986].
The branch and bound method is an extension of branch and bound for mixed integer linear programming. The method starts by relaxing the integrality requirements on the 0-1 variables, forming a continuous NLP problem. A tree enumeration is then performed, in which a subset of the integer variables is successively fixed at each node. The solution of the corresponding NLP at each node provides a lower bound for the optimal MINLP objective value, and this bound is used to direct the search, expanding nodes in a breadth-first or depth-first enumeration according to the value of the bound. The major disadvantage of branch and bound is that it may require the solution of a large number of NLP subproblems (depending upon the number of integer variables).
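As a sketch of the method (in Python, on an invented toy problem; the model, tolerances, and most-fractional branching rule are all illustrative choices, and scipy.optimize.minimize stands in for a real NLP solver):

```python
import numpy as np
from scipy.optimize import minimize

# Toy MINLP: min (x - 2)^2 + 3*y1 + 2*y2
#            s.t. x <= 2*y1 + 1.2*y2,  x >= 0,  y1, y2 in {0, 1}.
# Decision vector z = [x, y1, y2]; the bounds on y1, y2 encode the branching.

def solve_relaxation(y_bounds):
    """NLP subproblem with the 0-1 variables relaxed to the given intervals."""
    cons = [{"type": "ineq", "fun": lambda z: 2*z[1] + 1.2*z[2] - z[0]}]
    z0 = [0.0] + [0.5 * (lo + hi) for lo, hi in y_bounds]
    return minimize(lambda z: (z[0] - 2)**2 + 3*z[1] + 2*z[2], z0,
                    bounds=[(0, None)] + list(y_bounds),
                    constraints=cons, method="SLSQP")

best_obj, best_z = float("inf"), None
stack = [[(0.0, 1.0), (0.0, 1.0)]]         # root node: both y's relaxed to [0, 1]

while stack:
    y_bounds = stack.pop()                 # depth-first tree enumeration
    res = solve_relaxation(y_bounds)
    if not res.success or res.fun >= best_obj - 1e-9:
        continue                           # node infeasible, or pruned by bound
    frac = [abs(v - round(v)) for v in res.x[1:]]
    if max(frac) < 1e-6:                   # integral: a feasible MINLP point
        best_obj, best_z = res.fun, res.x
        continue
    i = int(np.argmax(frac))               # branch on the most fractional y_i
    for fixed in (0.0, 1.0):
        child = list(y_bounds)
        child[i] = (fixed, fixed)
        stack.append(child)

print("optimum", round(best_obj, 4), "at [x, y1, y2] =", np.round(best_z, 4))
```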
Generalized Benders decomposition and outer approximation solve the MINLP by an iterative process. The problem is decomposed into an NLP subproblem (with the integer values fixed) and an MILP master problem. The NLP subproblems optimize the continuous variables and provide an upper bound on the MINLP solution, while the MILP master problems predict a new lower bound on the MINLP solution, as well as new integer values to try at each iteration. The search terminates when the predicted lower bound equals or exceeds the current upper bound. The process is shown in Figure 1 [Grossmann, 1990].
The main difference between GBD and OA is in the definition of the MILP master problem. In the GBD algorithm, the MILP master problem is given by a dual representation of the continuous space, while in the OA method it is given by a primal approximation. In general, the OA method requires fewer iterations, and thus the solution of fewer NLP subproblems, but its MILP master problems require more computation than those of GBD. For more details, see Grossmann [1990]. To meet sufficient conditions for convergence, all three solution methods require that the MINLP satisfy some form of convexity conditions.
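The bounding loop shared by GBD and OA can be illustrated with a small outer-approximation sketch. The toy convex problem below is invented, and because it has a single 0-1 variable, the MILP master problem is "solved" by enumerating the binary values and solving a small LP (scipy.optimize.linprog) for each; a real implementation would call an MILP solver instead:

```python
from scipy.optimize import linprog, minimize_scalar

# Toy convex MINLP: min (x - 3)^2 + 2*y  s.t. x + y >= 1, 0 <= x <= 4, y in {0, 1}
f  = lambda x, y: (x - 3.0)**2 + 2.0*y
fx = lambda x: 2.0*(x - 3.0)              # df/dx at the linearization point
FY = 2.0                                  # df/dy (constant for this objective)

cuts = []                                 # linearizations (xk, yk, f(xk,yk), fx(xk))
UB, LB, y_cur = float("inf"), -float("inf"), 1

while LB < UB - 1e-6:
    # NLP subproblem: fix y and optimize x; a feasible solution updates the UB.
    lo = max(0.0, 1.0 - y_cur)
    res = minimize_scalar(lambda x: f(x, y_cur), bounds=(lo, 4.0), method="bounded")
    UB = min(UB, res.fun)
    cuts.append((res.x, y_cur, f(res.x, y_cur), fx(res.x)))

    # MILP master over (x, eta): min eta  s.t.  eta >= each linearization of f
    # and x + y >= 1, solved here by enumerating y and solving an LP per value.
    best = None
    for y in (0, 1):
        A_ub, b_ub = [[-1.0, 0.0]], [y - 1.0]   # -x <= y - 1, i.e. x + y >= 1
        for xk, yk, fk, gk in cuts:             # eta >= fk + gk(x-xk) + FY(y-yk)
            A_ub.append([gk, -1.0])
            b_ub.append(gk*xk - fk - FY*(y - yk))
        lp = linprog([0.0, 1.0], A_ub=A_ub, b_ub=b_ub,
                     bounds=[(0.0, 4.0), (None, None)], method="highs")
        if lp.success and (best is None or lp.fun < best[0]):
            best = (lp.fun, y)
    LB, y_cur = best                            # new lower bound and new integer point

print("optimal value", round(UB, 6), "with y =", y_cur)
```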
In terms of algorithm performance and computational experience with MINLP, very large problems (hundreds of 0-1 variables, thousands of continuous variables, and thousands of constraints) have been solved [Sahinidis and Grossmann, 1989]. No single solution method has proven consistently superior across applications.
Application Domains
MINLP problems involve the simultaneous optimization of discrete and continuous variables. They often arise in engineering domains, where one is trying to simultaneously optimize the system structure and its parameters. This is difficult because the optimal topology depends upon the parameter levels and vice versa, so a simple decoupling approach does not work: it is usually not possible to isolate the two sets of decisions and optimize each separately. The complexity of these problems also depends upon the form of the objective function. In the past, solution techniques typically assumed objective functions that were single-attribute and linear (e.g., minimize cost). Real problems, however, often require multi-attribute objectives, such as minimizing cost while maximizing safety and/or reliability, ensuring feasibility, and meeting scheduled deadlines. In these cases, the goal is to optimize over a set of performance indices which may be combined in a nonlinear objective function.
Engineering design "synthesis" problems are a major
application of MINLP algorithms. In engineering design, one often
wants to know how to select a topology or configuration and the
parameters of the system to meet the specified goals and functional
requirements.
Synthesis is defined as "the selection of a configuration or topology, as well as its design parameters ... one has to determine on the one hand which components should integrate a system and how they should be connected; on the other hand, one has to determine the sizes and parameters of the components. The former clearly implies making discrete decisions, while the latter implies making a choice from among a continuous space." [Grossmann, 1990]
Examples of synthesis problems include the design of structures in civil engineering, process flowsheets in chemical engineering, VLSI circuits in electrical engineering, servomechanisms in mechanical engineering, and so on. To formulate a synthesis problem, a superstructure is postulated that embeds all the alternatives that are candidates for a feasible design. The discrete variables then decide which components of the superstructure to include in the optimal structure, and the continuous variables set the parameter values of the included components.
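In algebraic form, one common encoding of a superstructure (illustrative notation, not taken verbatim from Grossmann) attaches a 0-1 selection variable y_i and a linking constraint to each candidate component i:

```latex
\begin{aligned}
\min_{y,\,x}\quad & \sum_i \left( c_i\, y_i + f_i(x_i) \right)
      && \text{fixed charges linear, sizing costs nonlinear} \\
\text{s.t.}\quad  & g(x) \le 0               && \text{performance requirements} \\
                  & 0 \le x_i \le U_i\, y_i  && \text{size component $i$ only if selected} \\
                  & y_i \in \{0,1\}
\end{aligned}
```

so that y_i = 0 forces x_i = 0, effectively deleting component i from the design.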
Software
A MINLP solver developed by Grossmann and Viswanathan (DICOPT) is now available as part of the GAMS modeling package. For details, see the GAMS Software Site.
References
Beale, E. M. L. (1977). "Integer Programming," in The State of the Art in Numerical Analysis (D. Jacobs, ed.) Academic Press: London, 409-448.
Duran, M.A. and I.E. Grossmann (1986). "An Outer Approximation Algorithm for a Class of Mixed Integer Nonlinear Programs," Mathematical Programming 36, 307-339.
Geoffrion, A.M. (1972). "Generalized Benders Decomposition," Journal of Optimization Theory and Applications 10(4), 237-260.
Grossmann, I. E. (1990). "Mixed-Integer Nonlinear Programming Techniques for the Synthesis of Engineering Systems," Research in Engineering Design 1, 205-228.
Sahinidis, N. and I.E. Grossmann (1989). "MINLP Model for Scheduling in Continuous Parallel Production Lines." Presented at the AIChE meeting, San Francisco, CA.
Miscellaneous Links
None.
Simulated annealing was designed for combinatorial optimization, usually implying that the decision variables are discrete. A variant of simulated annealing called tree annealing was developed by Bilbro and Snyder [1991] to globally minimize continuous functions.
In simulated annealing, one is given an initial point x and randomly generates a new candidate solution y. If f(y) < f(x), then y replaces x and the process repeats. If f(y) > f(x), y replaces x with a probability based on the Metropolis criterion and the progress of the algorithm.

In tree annealing, candidate subspaces are accepted or rejected by evaluating the function at representative points, and the width of the retained subspace is gradually narrowed to close in around the global minimum. Tree annealing therefore does not find the global minimizer x* exactly: it finds an arbitrarily small interval in which that minimum lies. The algorithm is otherwise very similar to simulated annealing: given a point x associated with the current search interval and a point y associated with a candidate search interval, y replaces x depending upon f(x), f(y), and the progress of the algorithm. Tree annealing stores its state in a binary tree that keeps track of which subintervals have been explored. Each node in the tree represents one of the two subintervals defined by its parent node; initially the tree consists of one parent and two child nodes. As better subintervals are found, the path down the tree that leads to them gets deeper, and the nodes along this path define smaller and smaller subspaces. The tree annealing algorithm is as follows:
1. Randomly choose an initial point x over the search interval S0.
2. Randomly travel down the tree to an arbitrary terminal node i, and generate a candidate point y over the subspace Si defined by that node.
3. If f(y) < f(x), replace x with y. (Steps 3 and 4 are the standard annealing acceptance test; see the sketch after this list.)
4. Otherwise, compute P = exp(-(f(y) - f(x))/T). If P > R, where R is a random number uniformly distributed between 0 and 1, replace x with y.
5. If y replaced x, decrease T slightly and update the tree. Repeat from step 2 until T < Tmin.
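Steps 3 and 4 together are the usual simulated annealing accept/reject rule; as a minimal Python sketch (function and argument names are invented):

```python
import math, random

def accept(f_x, f_y, T):
    """Steps 3-4: accept an improving candidate outright; accept a worse one
    with Metropolis probability P = exp(-(f(y) - f(x)) / T)."""
    if f_y < f_x:                       # step 3: downhill move, always accept
        return True
    P = math.exp(-(f_y - f_x) / T)      # step 4: uphill move
    return P > random.random()          # R ~ uniform(0, 1)
```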
1. Although tree annealing compares the function evaluation at two points x and y, these points are samples of subspaces.
2. In order to make the tree deeper (higher nodal resolution) in the vicinity of the minimum, it is necessary to add nodes to the tree. This is done in the following way: at any non-terminal node, the probability of going left is nL/(nL + nR) and the probability of going right is nR/(nL + nR), where nL is the number of subnodes to the left and nR is the number of subnodes to the right. For example, in Figure 2, node A has one node to its right and five nodes to the left, so nR = 1 and nL = 5. For node B, nR = 1 and nL = 3.
To generate a candidate y, start at the root node and descend the tree, making left/right decisions based on these probabilities, until a terminal node is reached; then generate a point uniformly over the subspace defined by that terminal node. Descending the tree implies that the size of the subspaces is decreasing, and if one path is travelled again and again, the algorithm is converging on the terminal node of that path. To expand the tree, if y is accepted, turn the terminal node containing y into a non-terminal node with two children. At that point the number of right and left subnodes at each node along the path will have changed, and the tree must be updated.
3. The acceptance of y is now governed by a modified Metropolis criterion:

P = g(x) p(y) / [g(y) p(x)]

where p(x) = exp(-f(x)/T) and g(x) = (1/Vx) * prod_i p_i. Here i runs over the nodes on the path from the root node to the node associated with x, p_i equals nL/(nL + nR) or nR/(nL + nR) according to the branch taken at node i, and Vx is the volume of the subspace defined by the node associated with x. Thus g(x) approximates the distribution of y over an interval of arbitrary size.
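Putting the pieces together, here is a compact one-dimensional sketch in Python. It is illustrative only: the midpoint splits, the geometric cooling schedule, the iteration cap, and the overflow-safe form of the acceptance ratio are implementation choices not specified in the description above.

```python
import math
import random

class Node:
    """A node of the search tree; it covers the interval [lo, hi)."""
    def __init__(self, lo, hi, parent=None):
        self.lo, self.hi, self.parent = lo, hi, parent
        self.left = self.right = None
        self.n_sub = 1                     # terminal nodes in this subtree

    def split(self):
        """Turn this terminal node into a parent of two half-intervals."""
        mid = 0.5 * (self.lo + self.hi)
        self.left = Node(self.lo, mid, self)
        self.right = Node(mid, self.hi, self)
        node = self
        while node is not None:            # propagate subtree counts to the root
            node.n_sub = node.left.n_sub + node.right.n_sub
            node = node.parent

def descend(root):
    """Walk down to a terminal node, branching left with probability
    nL/(nL + nR), so better-explored (deeper) regions attract more samples."""
    node = root
    while node.left is not None:
        nL, nR = node.left.n_sub, node.right.n_sub
        node = node.left if random.random() < nL / (nL + nR) else node.right
    return node

def g_of(node):
    """g = (1/V) * product of the branch probabilities from the root down."""
    g = 1.0 / (node.hi - node.lo)
    child, parent = node, node.parent
    while parent is not None:
        nL, nR = parent.left.n_sub, parent.right.n_sub
        g *= (nL if child is parent.left else nR) / (nL + nR)
        child, parent = parent, parent.parent
    return g

def tree_annealing(f, lo, hi, T=10.0, T_min=1e-3, cooling=0.99, max_iter=100000):
    root = Node(lo, hi)
    root.split()                           # start with one parent and two children
    x = random.uniform(lo, hi)
    x_node = root.left if x < root.left.hi else root.right
    for _ in range(max_iter):
        if T <= T_min:
            break
        y_node = descend(root)
        y = random.uniform(y_node.lo, y_node.hi)
        # Modified Metropolis test P = g(x) p(y) / (g(y) p(x)), rearranged as a
        # ratio so that exp(-f/T) cannot overflow or underflow on its own.
        P = (g_of(x_node) / g_of(y_node)) * math.exp(min(700.0, (f(x) - f(y)) / T))
        if P > random.random():
            y_node.split()                 # refine the tree around the new point
            x_node = y_node.left if y < y_node.left.hi else y_node.right
            x = y
            T *= cooling                   # cool only after an accepted move
    return x, f(x)

# Multimodal test function; its global minimum lies near x = -0.5.
print(tree_annealing(lambda x: x*x + 10*math.sin(3*x) + 10, -10.0, 10.0))
```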
Application Domains
Tree annealing is a method for finding the global minimum of nonconvex functions. It extends simulated annealing in that it can handle continuous variables. It is not guaranteed to converge, and the rate of convergence is often very slow, so a good strategy is to use tree annealing in the first part of an optimization to find the region of the global optimum, and then use gradient descent or some other local method to home in on the actual value. Given enough time, a sufficiently low final temperature, and a slow cooling rate, tree annealing should at least converge to a local minimum, and will often converge to a global minimum, especially for piecewise functions with discontinuities.
Tree annealing has been used in applications such as signal processing and spectral analysis, neural nets, and data fitting. These problems involve fitting parameters to noisy data, and it is often difficult to find an optimal set of parameters by conventional means.
Software
There is no publicly available software that we are aware of; however, Laura Painton has an implementation of tree annealing written in MATLAB, with an efficient pointer structure for creating dynamic trees in matrix form. For more information, contact Laura.
References
Bilbro, G.L. and W.E. Snyder (1991). "Optimization of functions with many minima," IEEE Transactions on Systems, Man, and Cybernetics 21(4), 840-849.
Durst, S. (1992). "Tree Annealing vs. Gradient Descent Methods." Term project for a Numerical Optimization course, Dept. of Electrical Engineering, Carnegie Mellon University.
Miscellaneous Links
None.
Last modified: March 10, 1997