MICS Robust Discrete Optimization Project
This work was funded in part by the US Department of Energy, MICS Division.
A discrete optimization problem involves the selection of a best
alternative from among a finite set of feasible solutions as
defined by an objective function. Typically, these problems
involve resource allocation. For example, one might want to pack a
set of boxes into a minimum number of trucks, expend a limited budget
adding edges to a network or increasing edge capacities so as to
maximize the overall transport capacity of a network, or schedule a
set of operations on a set of parallel machines so as to minimize the
last completion time. Examples of application areas of particular
interest to Sandia National Laboratories and the Department of Energy
include: scheduling for stockpile evaluation and management,
production planning, nonproliferation, transportation (routing,
packing, logistics), infrastructure analysis and design, energy,
environment, and tools for massively parallel computing such as domain
decomposition and meshing.
When a discrete optimization problem arises in practice, the solution
method of choice will depend on all of the following factors:
- The underlying combinatorial problem
- Side constraints
- Data properties (for the specific application/customer)
- Development time (need it next week or three years from now?)
- End platform (a PC or a supercomputer?)
- Required run time (close to real time? overnight? weeks?)
- Required degree of optimality
Because of this, no problem that arises in practice is ever exactly
the same as one that has been tackled before. Still, we would like
solutions that are fast, close to optimal, and robust with respect to
changes in the data.
Complexity analysis guides our expectations for how quickly and how
well we can solve an optimization problem. In general the problems we
most want to solve are NP-complete. In short, NP-completeness
corresponds to formal intractability: any algorithm that optimally
solves an NP-complete problem for all instances will require an
unacceptable amount of compute time in the worst case (provided P is a
strict subset of NP). The qualifiers in this statement (optimally, all
instances, worst case) are the loopholes. Thus, in practice, one must
concentrate on approximation algorithms, algorithms that assume input
structure of interest, average-case behavior, or optimal solutions to
particular problems of interest.
The basic methods used to solve NP-complete (or harder) optimization
problems are:
- General Stochastic Heuristics: These include techniques such as
simulated annealing, genetic algorithms, and tabu search. They have
very short development times and can produce good solutions in
practice, but they give no performance or running-time guarantees and
are likely to be outperformed in the long run by combinatorial methods
that explicitly exploit problem structure. See our Survey of Global
Optimization Methods for more information on general (global)
optimization techniques, and William Hart's Home Page for references
and web sites involving genetic algorithms.
- Operations Research (OR) Techniques: Techniques such as
mixed-integer programming can give optimal solutions to specific
instances of a problem. In general, OR solutions are excellent
engineering solutions. They can be (and frequently are) optimal or
provably close to optimal for a specific instance or set of instances
of interest, but they carry no guarantees for other instances and can
run unacceptably slowly in the worst case. Sometimes OR solutions
involve Lagrangian relaxation or simply problem-specific heuristics.
Customer-specific research stops when the customer is satisfied. It is
frequently hard to extrapolate results to other settings from a
publication based on this kind of research because the data is
proprietary (hence not published). For a much more balanced and
complete description of operations research, see Michael Trick's
Operations Research Page.
- Instance-specific combinatorial heuristics: Still no guarantee on
performance or running time, but such heuristics can be quite clever
and incorporate a lot of direct experience from those solving the problem
by hand prior to automation. Such heuristics can work well in practice
for specific types of data.
- Approximation algorithms: These are algorithms that run in
polynomial time and have a performance guarantee. That is, for any
input instance (perhaps satisfying some property), the approximation
algorithm will return a solution within a specified factor of optimal.
This is maximally robust, in that the guarantee remains in effect no
matter how the data set changes. However, the worst case is usually
pathological, and therefore the guarantee is rarely close enough to
optimal to satisfy a customer a priori. The algorithm itself may
perform much better than its worst-case bounds on practical problems
(see the sketch below).
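As a concrete illustration of the approximation-algorithm flavor, here
is a minimal sketch (not project code; the function names are ours) of
Graham's longest-processing-time (LPT) list scheduling for the
parallel-machine makespan problem mentioned in the introduction. LPT
is guaranteed to return a schedule within a factor of 4/3 - 1/(3m) of
optimal on m machines, yet it often does far better on practical data:

```python
import heapq

def lpt_makespan(jobs, m):
    """Graham's LPT rule: sort jobs longest-first, then repeatedly
    assign the next job to the currently least-loaded machine.
    Returns (makespan, assignment), where assignment pairs each job
    (in longest-first order) with its machine index."""
    loads = [(0, i) for i in range(m)]  # min-heap of (load, machine)
    heapq.heapify(loads)
    assignment = []
    for p in sorted(jobs, reverse=True):
        load, machine = heapq.heappop(loads)
        assignment.append((p, machine))
        heapq.heappush(loads, (load + p, machine))
    return max(load for load, _ in loads), assignment

if __name__ == "__main__":
    makespan, _ = lpt_makespan([7, 5, 4, 3, 3, 2, 2], m=3)
    print(makespan)  # 9, which happens to be optimal for this instance
```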
Our long-term goal is to develop suites of algorithms for hard
combinatorial optimization problems and to develop a "cookbook"
suggesting the method of choice for a problem based upon the seven
criteria listed above.
Incrementally, we are pursuing this goal in several ways:
- Development of approximation algorithms for specific combinatorial
optimization problems. Cindy Phillips' publications give a sampling of
these results.
- (Nearly) Exact Methods: development of PICO, a massively-parallel
integer and combinatorial optimization engine. The first phase is a
branch-and-bound-based mixed-integer programming engine. Any
NP-complete problem can be expressed as a mixed-integer program, so
this is a general tool. PICO will give optimal solutions to specific
instances; it therefore has no acceptable worst-case running-time
guarantee, but it can be stopped early with a feasible solution and an
instance-specific lower bound (a minimal sketch of this anytime
behavior appears after this list).
- We may develop problem-specific heuristics for customers as needed.
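To make "stopped early with a feasible solution and an
instance-specific lower bound" concrete, the following is a minimal
branch-and-bound sketch for the 0/1 knapsack problem. It is a stand-in
for PICO's far more general MIP engine, not PICO itself, and all names
are illustrative. At any point the search holds an incumbent and a
relaxation bound, which together certify an optimality gap even if a
node limit stops the search early. (For this maximization problem the
incumbent is a lower bound and the LP relaxation an upper bound; for a
minimization MIP the roles are reversed.)

```python
def knapsack_bb(values, weights, capacity, node_limit=10_000):
    """Tiny branch-and-bound for 0/1 knapsack with anytime behavior."""
    n = len(values)
    # Order items by value density; greedy over this order solves the
    # LP (fractional) relaxation, which supplies the pruning bound.
    order = sorted(range(n), key=lambda i: values[i] / weights[i],
                   reverse=True)

    def lp_bound(k, cap, val):
        for i in order[k:]:
            if weights[i] <= cap:
                cap -= weights[i]
                val += values[i]
            else:
                return val + values[i] * cap / weights[i]
        return val

    best = 0                      # incumbent (always feasible)
    stack = [(0, capacity, 0)]    # (next item index, room left, value)
    explored = 0
    while stack and explored < node_limit:
        k, cap, val = stack.pop()
        explored += 1
        best = max(best, val)
        if k == n or lp_bound(k, cap, val) <= best:
            continue              # prune: this subtree cannot improve
        i = order[k]
        stack.append((k + 1, cap, val))                      # skip item i
        if weights[i] <= cap:                                # take item i
            stack.append((k + 1, cap - weights[i], val + values[i]))
    return best, lp_bound(0, capacity, 0)  # incumbent, root upper bound

print(knapsack_bb([10, 13, 7, 8], [3, 4, 2, 3], capacity=7))
# (23, 23.5): the incumbent 23 is provably within 0.5 of optimal
```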
PICO can be used to solve optimization problems at both ends of the
spectrum with respect to platform:
- It can be used as a direct tool for huge one-time computations
such as resource allocation for environmental clean-up or analysis
of infrastructure vulnerabilities.
- It can be used to look at the structure of (near) optimal solutions
to provide guidance for better heuristics on weaker platforms.
- It can be used to benchmark heuristics for applications on weaker
platforms.
The latter application involves the use of
Experimental Algorithmics, a relatively new effort to apply
rigorous experimental design techniques to computer experiments. See
Journal of Experimental Algorithmics
or publications such as
C.C. McGeoch, "Experimental studies of algorithms", Mathematical Systems
Theory, Vol. 29, No. 3-4, Jan/Feb 1996.
Given some statistical properties of anticipated data (either from
user knowledge or from analysis of data sets), if one can design a
(random) generator producing data with these properties, one can then
analyze the behavior of a heuristic by comparing its solutions to
optimal solutions generated by PICO. The resulting analysis can give a
statistically rigorous evaluation of the heuristic's performance for
the given type of data. A sketch of this methodology follows.
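For illustration only (PICO itself is not shown, and the distribution
parameters are placeholders), here is a minimal sketch: draw random
instances from a generator with chosen statistical properties, solve
each exactly (brute force here, at sizes where that is feasible; PICO
plays this role at realistic sizes), and summarize the
heuristic-to-optimal ratio across trials:

```python
import itertools
import random
import statistics

def optimal_makespan(jobs, m):
    # Exact optimum by exhausting all machine assignments; this is
    # only feasible for tiny instances and stands in for PICO here.
    best = float("inf")
    for assign in itertools.product(range(m), repeat=len(jobs)):
        loads = [0] * m
        for p, machine in zip(jobs, assign):
            loads[machine] += p
        best = min(best, max(loads))
    return best

def lpt_makespan(jobs, m):
    # Greedy LPT heuristic: longest job first, to least-loaded machine.
    loads = [0] * m
    for p in sorted(jobs, reverse=True):
        loads[loads.index(min(loads))] += p
    return max(loads)

def evaluate(trials=200, n=8, m=3, seed=0):
    rng = random.Random(seed)
    ratios = []
    for _ in range(trials):
        # Placeholder generator: i.i.d. uniform job lengths. A real
        # study would match the statistics of the application data.
        jobs = [rng.randint(1, 20) for _ in range(n)]
        ratios.append(lpt_makespan(jobs, m) / optimal_makespan(jobs, m))
    return statistics.mean(ratios), max(ratios)

print(evaluate())  # (mean ratio, worst observed ratio) for this data model
```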
A Cross-disciplinary Approach
This project applies heuristics, approximation algorithms, and exact
methods synergistically. We seek provable bounds where possible. For
applications where we have data, we will seek exact or heuristic
methods, depending upon the needs of the user. When finding exact
solutions with PICO, we will generally need to tune the search for a
specific problem to obtain excellent solutions for critical problems.
We will integrate heuristic and approximation methods to guide the
enumerative search within PICO.
We plan to work in four fundamental areas:
- The Pantex Scheduling Problem
- Polyhedral Combinatorics
- Experimental Algorithmics
- Online Algorithms
We will develop new (LP-based) approximation
algorithms and/or heuristics for the Pantex scheduling problem. These
algorithms must allow initially-fixed jobs, that is, jobs whose start
time and facility type have been predetermined and cannot be moved.
This is to allow interaction with branch-and-bound search strategies
where some integer variables are fixed. Fixed jobs can also model
facility maintenance or scheduled worker vacation.
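Purely as a toy sketch of the initially-fixed-jobs requirement (a
simple greedy, not the LP-based algorithms we will develop; all names
are ours), the scheduler below treats predetermined jobs as immovable
busy intervals and places the remaining jobs around them. The same
representation covers facility maintenance and worker vacation:

```python
from bisect import insort

def earliest_start(busy, p):
    """Earliest start for a job of length p on a machine whose
    occupied intervals `busy` are sorted, disjoint (start, end) pairs."""
    t = 0
    for start, end in busy:
        if start - t >= p:          # job fits in the gap before `start`
            return t
        t = max(t, end)
    return t                         # otherwise, after the last interval

def schedule_around_fixed(jobs, fixed, m):
    """Greedy: place each job, longest first, at the earliest feasible
    time on any machine; `fixed` maps machine -> immovable intervals."""
    busy = {i: sorted(fixed.get(i, [])) for i in range(m)}
    placed = []
    for p in sorted(jobs, reverse=True):
        machine = min(range(m),
                      key=lambda i: earliest_start(busy[i], p))
        s = earliest_start(busy[machine], p)
        insort(busy[machine], (s, s + p))
        placed.append((machine, s, s + p))
    return placed, max(end for _, _, end in placed)

# Machine 0 is down (e.g., maintenance) during [2, 5); machine 1 is free.
print(schedule_around_fixed(jobs=[4, 2, 1], fixed={0: [(2, 5)]}, m=2))
```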
In polyhedral combinatorics, we will develop new methods for
cutting-plane generation for general mixed-integer programs. These
methods will improve PICO's default performance on new applications.
We may also develop cutting planes for specific combinatorial
problems, focusing on broad families and base problem classes.
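As background on what general-purpose cut generation means, one
classical technique (the Gomory fractional cut, shown here for
concreteness; it is not necessarily among the new methods we will
pursue) reads a cut directly off any optimal simplex-tableau row with
a fractional right-hand side. For a pure integer program with
nonnegative integer variables, a tableau row

\[
x_B + \sum_{j \in N} \bar{a}_j x_j = \bar{b},
\qquad \bar{b} \notin \mathbb{Z},
\]

yields the valid inequality

\[
\sum_{j \in N} \bigl(\bar{a}_j - \lfloor \bar{a}_j \rfloor\bigr)\, x_j
\;\ge\; \bar{b} - \lfloor \bar{b} \rfloor,
\]

which every integer-feasible point satisfies but the current
fractional optimum (with all nonbasic x_j = 0) violates.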
In experimental algorithmics, we will explore
performance-measurement metrics for massively-parallel solution of
combinatorial optimization problems. The "traditional" measure,
MFlops, is clearly not the answer; the amount of parallel work is not
fixed due to stochastic elements of heuristic methods, subproblem
scheduling, and message delivery. That is, whether or not a
subproblem is solved depends upon the best feasible solution found so
far and whether a processor has knowledge of that solution. We will
also, as necessary, do research on problem generation for
combinatorial problems (e.g., scheduling). Statistically significant
experimental analysis often requires more data than can be generated
using only data sets obtained from a real application. Therefore, we
need methods to generate random instances of constrained problems with
particular statistical properties.
We will continue fundamental work on online algorithms (e.g. for
scheduling and paging).
We will closely
collaborate with the MICS Global Optimization
Project, especially for the development of stochastic methods to
integrate into PICO.
We sometimes assist with specific combinatorial optimization problems
related to meshing.
We are now official members of the Pantex Process Model (PPM) Team,
which is providing production planning/scheduling support for Pantex
operations. We have negotiated
a scheduling/planning model which we will solve exactly using the PICO
(Parallel Integer and Combinatorial Optimization) mixed-integer
programming code. These negotiations have suggested several future
model refinements. We can generate problems in a format appropriate
for PICO and have shown that a full-sized problem with over half a
million binary variables and tens of thousands of rational variables
can be bounded by CPLEX on a single processor. Thus the problem size
does not violate the current model of solution by PICO (sequential
bounding).
We have developed a new model of preemptive scheduling motivated by
human data processing and by physical systems with similar memory. We
have shown that the problem is NP-complete and that greedy algorithms
which perform well under cost-free preemption can perform quite badly
in this model. However, there are also instances where the best
schedule in this new model is a factor of sqrt(n) better than the best
nonpreemptive schedule.
We began a new effort in constrained graph matching in support of the
meshing effort. CUBIT currently leaves voids ("holes" in the mesh
where no elements are placed). Members of the meshing group proposed a
new iterative particle-based method for meshing. Each iteration
requires the solution of a constrained matching problem. We have
delivered branch-and-cut-based code to solve the matching problem
arising from 2D meshing. This code closes meshes that could not be
closed with previous methods. We have started preliminary work on 3D
extensions.
In an online setting, one must make resource-allocation decisions
without knowledge of future tasks/input. We investigated the power of
statistical information in online decision making. We developed
succinct characteristics of memory access patterns which can be
exploited in a page-replacement strategy to greatly improve paging
performance in computer systems. The small amount of data needed for
this algorithm can itself be collected online.
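The specific access-pattern characterization is not reproduced here.
Purely as a toy illustration of statistics-informed replacement (an
LFU-style stand-in, not the project's algorithm; all names are ours),
the sketch below evicts the resident page with the lowest access
count, where the counts are exactly the kind of small,
online-collectable data described above:

```python
from collections import defaultdict

class StatisticalPager:
    """Toy LFU-style replacement: evict the resident page with the
    lowest access count, with counts gathered online per request."""
    def __init__(self, frames):
        self.frames = frames
        self.resident = set()
        self.count = defaultdict(int)   # online access statistics
        self.faults = 0

    def access(self, page):
        self.count[page] += 1
        if page in self.resident:
            return
        self.faults += 1
        if len(self.resident) >= self.frames:
            victim = min(self.resident, key=lambda p: self.count[p])
            self.resident.remove(victim)
        self.resident.add(page)

pager = StatisticalPager(frames=2)
for p in "ababacabad":
    pager.access(p)
print(pager.faults)  # 5 faults; the hottest page "a" is never evicted
```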
We have taken several steps toward better understanding the structure
of the traveling-salesman problem. This problem can be expressed as an
integer linear program; the standard formulation is sketched below.
The feasible region is a polytope (in variable space) for both the
integer solutions and the fractional relaxation (which can be computed
efficiently). We made progress on calculating the integrality gap (the
difference between optimal solutions over the two polytopes) and on
generating cutting planes, which can be used to iteratively drive the
fractional polytope toward the integer one.
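For reference, the standard (textbook) ILP formulation of the
symmetric traveling-salesman problem on a graph G = (V, E) with edge
costs c_e uses one binary variable x_e per edge:

\[
\begin{aligned}
\min\;& \sum_{e \in E} c_e x_e \\
\text{s.t.}\;& \sum_{e \in \delta(v)} x_e = 2 && \forall v \in V,\\
& \sum_{e \in \delta(S)} x_e \ge 2 && \forall S \subset V,\; \emptyset \neq S \neq V,\\
& x_e \in \{0, 1\} && \forall e \in E.
\end{aligned}
\]

Relaxing x_e in {0,1} to 0 <= x_e <= 1 gives the fractional polytope
mentioned above (optimizable in polynomial time via separation), and
the subtour constraints over sets S are exactly the kind of cutting
planes that drive it toward the integer polytope.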
We also developed new techniques for finding violated inequalities for
combinatorial optimization problems in general. A previously known
method for finding cutting planes (violated constraints) solves a
linear program. We showed that by expanding the feasible region,
either by allowing approximate solutions to the LP or by relaxing the
original combinatorial problem, one can frequently find cutting planes
much faster.
Publications: 4 journal, 2 conference, 8 (nearly) submitted