Zoltan on CPlant

Erik Boman, Karen Devine, Bruce Hendrickson, Matthew St. John, Courtenay Vaughan
Parallel Computing Sciences Department 9226, MS1111

Highlights

Zoltan Overview

The Zoltan library is a suite of dynamic load-balancing and parallel partitioning algorithms for parallel applications.  Its object-oriented design separates the load-balancing data structures from those of an application, allowing many different applications to use Zoltan with no constraints on their data structures.  The design is both flexible and extensible.  Thus, it is easy for application developers to access the latest-and-greatest technology in the library, and easy for algorithm researchers to add and compares algorithms in the library.  Zoltan also provides migration-help tools that perform communication necessary to move objects from their old processors to their new processors after a new decomposition is computed.

Zoltan will play an important role on CPlant and other heterogeneous computing architectures.  Since the characteristics of the processors available to an application are not known until the processors are assigned, traditional static partitioning (as in Chaco/nem_slice) is not sufficient.  A dynamic partition that accounts for the obtained processors' computing power, memory, and network connections is needed.  Zoltan can be used to compute this dynamic partition.

The heterogeneous computing model we are implementing in Zoltan will enable load-balancing for heterogeneous processors and networks.  In the model, a heterogeneous computing system is represented hierarchically, as shown in the figure below.  The root of the hierarchy describes the entire computer, including its network topology and speed. Child nodes represent components of the computer.  Leaf nodes represent individual processors.  In FY99-FY00, we will implement the heterogeneous model, modify existing algorithms to fit the model, and investigate new algorithms that account for processor speeds, network speeds, and available memory.

Results on CPlant

Zoltan has been tested in MPSalsa, an unstructured finite element application for simulating chemically reacting flows.  The following example demonstrates the low overhead involved in using Zoltan for this finite element application.  In the example, we do not advocate any particular load-balancing method or load-balancing approach for MPSalsa; the experiments are meant only to evaluate the performance of Zoltan in a real application.

In the example, a catalytic partial oxidation reactor is modeled.  In the figure below, the blue region represents the reactor wall, where only heat transfer is modeled.  The yellow region represents the volume of the reactor, where fluid flow, heat transfer, and mass transfer with 22 chemical species and 77 reversible gas-phase reactions (known as the "whole enchilada" in MPSalsa) are modeled.  The red region represents a portion of the reactor wall where a catalyst causes surface reactions to take place; fluid flow, heat transfer, mass transfer, and surface reactions are modeled.

The time per finite element node for the matrix fill varies significantly between the three regions.  The matrix-fill time per node in the reactor wall is approximately 0.006 seconds.  Within the volume of the reactor, it is approximately eight times greater, due to the gas-phase reactions and the greater number of unknowns per node.  On the reacting surface, the fill-time per node is roughly two orders of magnitude greater still, due to the non-linear solve required at each reacting surface node.  In general, these computation times are not known and, thus, cannot be used by nem_slice, a static partitioning tool used as a pre-processor in MPSalsa.  These disparate fill times, however, cause the matrix-fill operation in MPSalsa to be highly imbalanced for a static partition computed by a multi-level graph partitioning algorithm.

We ran MPSalsa on 50 processors of CPlant.  During the first matrix fill in MPSalsa, we computed the matrix-fill time for each node.  We then called Zoltan to recompute a partition, weighting the nodes by their matrix-fill times.  Matrix fills using the new partition required 51% less computation time than those using the original partition.  The total load-balancing time in Zoltan was 0.07 seconds.  The time for data migration was 0.69 seconds.  The sum of these two times is far less that the time saved by using the new partition.  For complete results, see the table below.

In MPSalsa, the implementation of the load-balancing callback functions required by Zoltan took less than 200 lines of C code, showing that Zoltan is easy to use by an application.  The data migration callback functions required significantly more code, due to the complexity of the data structures in MPSalsa.  However, all communication for migrating data (nodes, elements, faces, etc.) between processors was done by Zoltan's migration-help tools.  No additional communication routines were needed in MPSalsa to establish the new partition.
 

Results using Zoltan with MPSalsa on 50 CPlant processors

Load-Balancing with Weighted RCB
     - Build the RCB data structures in Zoltan 0.00001 seconds
     - RCB algorithm 0.06 seconds
     - Build Zoltan's return arguments 0.01 seconds
Total Load-Balancing Time 0.07 seconds

Data Migration 
     - Migration of nodes, element, faces, etc. with Zoltan's migration-help tools 0.55 seconds
     - Rebuilding MPSalsa's solution vector, matrix, etc. 0.14 seconds
Total Data-Migration Time 0.69 seconds

Matrix-Fill Time in MPSalsa
     - Before load-balancing 59.4 seconds
     - After load-balancing 28.9 seconds
Reduction in Matrix-Fill Time 51%
 

For more information

Return to Cplant Home Page

Updated: August 29, 1999
Karen Devine
kddevin@cs.sandia.gov