Summary of Complexity Analysis for Discrete Optimization
Discrete Optimization
A discrete optimization problem involves the selection of a best
alternative from among a finite set of feasible solutions as
defined by an objective function. Typically, these problems
involve resource allocation. For example, one might want to pack a
set of boxes into a minimum number of trucks, expend a limited budget
adding edges to a network or increasing edge capacities so as to
maximize the overall [Btransport capacity of a network, or schedule a
set of operations on a set of parallel machines so as to minimize the
last completion time.
Computational Complexity
Complexity analysis helps to
set our expectations by defining what is and is not possible
computationally in the worst case. Suppose we wished to construct a
black box that would accept any input instance for a discrete
optimization problem and would output and optimal answer.
Computational complexity is the analysis of how long you can expect
this computation to run, as a function of input size, and how good an
answer you can expect. As a technical point, usually optimization
problems are phrased as decision problems: "is there a solution with
value at most B?" If one can efficiently answer the decision problem,
then one can answer the optimization question using binary search.
Though feasible sets are finite, they are not in general small. They
are almost always at least exponential in the size of the problem
input. Checking all possible solutions will not in general be a
viable option. For example, consider the knapsack problem where we
are given a set of objects each with a weight and a value. We wish to
pack a knapsack with a weight capacity with as valuable a set of
objects as possible. If we could generate and check a billion
possible solutions per second, then naively generating all possible
choices for a set of 65 items would require a thousand years.
Knapsack problems can be solved very quickly in practice using more
sophisticated techniques.
For practical purposes, there are two main classes of problems: P and
NP-complete.
The Class P
The class P stands for "polynomial-time" solvable. The black box for
these problems can give an optimal answer in time polynomial in the
input size. This is generally considered to be the class of
tractable problems. Examples of useful optimization problems
solvable in P are: minimum spanning tree, maximum matching, maximum
(network) flow, shortest paths, and linear programming.
The Class NP and NP-Completeness
The class NP stands for "Nondeterministic Polynomial" time. It means
that if you can guess a solution, then you can verify it
is polynomial time (i.e. efficiently). Here is were the
decision/optimization point is important. Given a hypothetical
solution to a decision problem, if one can efficiently check that all
constraints are met (i.e. the solution if feasible) and compute the
objective function to compare with the bound, then the problem is in
NP. Now generating the solution is a different question entirely,
much like the distinction between generating a mathematical proof and
verifying it's correctness. The class NP is generally believed to be
much bigger than the class P (i.e. P is a strict subset of NP), though
it has not yet been proven formally. It is arguably the most
important open problem in theoretical computer science.
A problem is NP-complete if it can be used as a black box to
efficiently solve any other problem in NP. In essence, and
NP-complete problem is as hard as any problem in NP. Right now, the
best known solution for any NP-complete problem requires exponential
time (as a function of input size) in the worst case. That is, it is
not much better than generating all possible solutions and checking
each one.
Thus NP-completeness corresponds to formal intractability: any
algorithm that optimally solves and NP-complete problem for
all instances will require an unacceptable amount of compute
time in the worst case (provided P is a strict subset of NP)
The items in boldface in this statement are the loopholes. In
practice, to get theoretically good algorithms, one must concentration
on approximation algorithms, algorithms which assume input structure
of interest (e.g. graph algorithms restricted to planar graphs, graphs
that can be drawn in the plane without edge crossings), average-case
behavior, or getting optimal solutions to particular problems of
interest.
A Heirarchy of Approximability
When considering approximation algorithms for NP-hard problems, there
is a heirarchy of what is theoretically possible:
Fully-Polynomial-Time Approximation Schemes
The best case is a fully-polynomial-time approximation scheme
(FPTAS) Given an input instance and an error parameter e, such an
algorithm will return a solution guaranteed to be within a factor of
(1 + e) times optimal and run in time that is polynomial in the input
size and 1/e. Thus there is a graceful tradeoff between running time
and solution quality. Such problems are only weakly
NP-complete. Frequently there exist pseudopolynomial-time
algorithms for such problems. These are algorithms that guarantee an
optimal answer and run in time polynomial in the unput size and the
largest number input. Thus if all the numbers input are small, these
algorithms are polynomial-time. Example: knapsack. If all the
weights are small (polynomial in the number of items), this can be
solved in polynomial time. For large input numbers, there is a FPTAS.
Polynomial-Time Approximation Schemes
The next best possibility is a polynomial-time approximation scheme
(PTAS), which is the same as a FPTAS except that the running time
can be exponential in 1/e, where e is the error parameter. Example:
bin packing.
Approximation (Only) Within A Constant Factor
The next level are those problems that cannot be approximated within a
certain constant, but can be approximated within a higher contant.
There is a general class called max-SNP-complete problems,
which is similar to NP-completeness in the sense that all problems
within the class have a certain lower bound on the difficulty of
approximation. The best constant known is still pretty small,
however. For specific problems, one can prove stronger
nonapproximability results. For example, assigning a set of jobs to
unrelated parallel machines cannot be approximated to within a factor
better than 3/2 (unless P = NP), but can be approximated within a
factor of 2.
Approximation (Only) Within a Logarithmic Factor
This is similar to the previous class except that a problem in this class
cannot be approximated to within c log n for some constant c, where n is
the input size, and can be approximated within c'log n for a different,
larger constant c'. Example: set cover.
Not Approximable within a Polynomial Factor
This class is essentially not approximable. Example: clique.
References
The definitive reference for NP-completeness is
still:
M.R. Garey and D.S. Johnson, Computers and Intractability: A
Guide to the Theory of NP-completeness, W.H. Freeman and Company,
New York, 1979.
Any more modern introductory algorithms text will
contain a more succinct discussion. See for example:
T.H. Cormen,
C.E. Leiserson, and R.L. Rivest, Introduction to Algorithms,
MIT Press, Cambridge, MA, and McGraw-Hill, New York, 1990.
For a more detailed description of complexity classes for approximability,
references, and descriptions of all the example problems given above, and
much more, see Crescenzi and Kann's
Compendium of NP optimization problems.
This page maintained by Cynthia Phillips caphill@cs.sandia.gov
Last Modified: October 21, 1997