Title: Communication Support for Adaptive Computation
Author: Ali Pinar and Bruce Hendrickson
Status: In Proc. 10th SIAM Conf. Parallel Processing for Scientific Computing, March '01

Abstract:

In this work we address two problems associated with redistributing data amongst processors. The first problem is that of determining the inter--processor communication pattern necessary to perform a calculation like matrix--vector multiplication. Consider the situation when a calculation is first described or when it is repartitioned after dynamic load balancing. Processors do not know what communication operations to perform to enable the matrix--vector multiplication to proceed. Assuming the matrix is partitioned by rows, looking at its own domain allows each processor can determine what it wants to receive, but it does not know which processor owns these desired data. We propose a distributed directory algorithm to efficiently determine the communication pattern (i.e., what a processor needs to receive from and send to every other processor). Our experiments show that the proposed algorithm performs efficiently on large numbers of processors.

The second problem is that of actually migrating data in the case of limited memory. Although a number of algorithms and software tools have been developed to repartition the work amongst processors, the mechanics of actually moving large amounts of data has received much less attention. If sufficient memory is available, each processor can allocate space for its incoming data, post asynchronous receives, and then send its outgoing data. Memory for outgoing data can be deallocated only after the send is complete. This requires each processor to simultaneously have space for both its outgoing and its incoming data, which is not always possible. To overcome this problem, instead of sending all the data at once, we can send in phases. In each phase only a fraction of the data is migrated, so less memory is required to receive messages. After each phase, processors can free up the memory of the data they have sent. That memory is now available for the next communication phase. We discuss efficient algorithms to exchange messages in minimum number of phases. We also discuss practical issues about implementation of these algorithms to exchange data under memory constraints.

Download full paper.