Parallel Scientific Computing
With a number of different collaborators, I have devised new parallel
algorithms for a variety of problems in scientific computing.
These include many-body calculations, solid dynamics with contacts,
and some radiation transport methods. These projects are sketched
below. I have also done a fair amount of work on
parallel linear algebra.
Parallel Radiation Transport
Discrete ordinates methods for the simulation of radiation transport
lead to a kernel computation involving a set of "sweeps" through the
mesh, one for each angle in the ordinate set. These sweeps are a
challenge to parallelize effectively for unstructured grids. The
first paper below describes our work on this problem.
To perform these sweeps (and also in
many other situations), it is necessary to decompose a directed graph into
strongly connected components. The classic serial algorithm for this problem
is not amenable to parallelism. In the second paper we propse and
analyze an alternative approach which is more parallelizable. The
third paper describes an implementation of this algorithm and its
performance on two parallel computers.
Grid Transfer
A number of computational procedures employ multiple grids on which
solutions are computed. For example, in multi-physics simulations a
primary grid may be used to compute mechanical deformation of an
object while a secondary grid is used for thermal conduction
calculations. When modeling coupled thermo-mechanical effects,
solution data must be interpolated back and forth between the grids
each timestep. On a parallel machine, this grid transfer operation
can be challenging if the two grids are decomposed to processors
differently for reasons of computational efficiency. If the grids
move or adapt separately, the complexity of the operation is
compounded. With colleagues, I have developed a grid transfer algorithm
suitable for massively parallel codes which use multiple grids. It
uses a rendezvous technique wherein a third decomposition is used to
search for elements in one grid that contain nodal points of the
other. This has the advantage of enabling the grid transfer to be
load-balanced separately from the remainder of the computations.
Dynamic Load Balancing
In many applications, computational requirements vary during the course
of a calculation. To run such calculations effectively on a parallel
computer, tasks must be periodically redistributed among processors
to keep the workload balanced. A number of different algorithms have
been proposed for this dynamic load balancing problem. The first paper
below is a critical review of the algorithmic options and the software
challenges. The second paper, somewhat older, describes an attempt
to parallelize a graph partition approach.
Dynamic load balancing is fundamentally harder than static partitioning.
Algorithms and implementations must run quickly in parallel without
consuming much memory. Interfaces need to be subroutine calls instead
of file transfers. And since the data is already distributed, the new
partition should be similar to the current one to minimize the amount
of data that needs to be redistributed. With Karen Devine and a number of
other collaborators, we are working to build a tool called
Zoltan
to address this plethora of challenges. Zoltan is designed to be extensible
to different applications, algorithms and parallel architectures. For example,
it is data-structure neutral - an object oriented interface separates the
application data structures from those of Zoltan. The tool will also support
a very general model of a parallel computer, facilitating the use of
heterogeneous machines.
Many Body Calculations
My work on parallel algorithms for many body calculations has focused
on problems with short-range forces in which the computational effort
scales linearly with the number of particles (eg. molecular dynamics).
Although the techniques I developed with
Steve Plimpton
apply to any
direct simulation, problems with long-range forces are more efficiently
addressed by any of several approximate methods.
My primary contribution was the force decomposition algorithm
in which we divide the computation among processors in an unusual way
to reduce the communication cost. A nice survey of of parallel approaches
for molecular dynamics calculations can be found in the first paper by Steve
Plimpton. The others introduce and extend the force decomposition idea.
Parallel Contact Detection
Imagine simulating a car crash. You would probably choose to use
Lagrangian grids which deform with the physics as the car distorts.
When the fender impacts the radiator, there will be new forces that
need to be modeled. In the simulation, this will correspond to
the mesh intersecting itself. Detecting these contacts is an
essential component of solid dynamics simulations, but it has
proved difficult to parallelize effectively. Previous parallelization
efforts have generally used a single decomposition of the mesh
which is used for both the finite element analysis and the contact
detection. However, this can lead to severe load imbalance and
large volumes of communication.
We have instead adopted an approach in which we use two different
decompositions for the two computational steps. We use a static,
graph-based decomposition for the finite element calculation, and
a dynamic, geometric decomposition for the contact detection.
Although there is some cost associated with communicating between
the two decompositions, a careful implementation keeps this small.
And once we've paid this cost, we get nearly perfect load balance
allowing for the first algorithms which scales to hundreds and
even thousands of processors. This algorithm is implemented in Sandia's
PRONTO code. The parallel algorithms in PRONTO are described in greater
detail in the first two papers below. The capabilities of the code
are presented in the third paper, which has earned parallel PRONTO
a status as finalist for the 1997 Gordon Bell Prize.
Smoothed Particle Hydrodynamics
When simulating phenomena involving high strains, meshes can become
highly distorted and break down. To avoid these problems, one can use
a gridless methodology known as smoothed particle hydrodynamics .
Despite its name, this approach can model solid objects as well as fluids.
Building on the ideas we devised for parallel contact detection, we have
developed a parallelization of smoothed particle hydrodynamics.
Return to Bruce's home page