http://www.sandia.gov/~bahendr/paperstrix Calculations on Massively Parallel Computers
Authors: Bruce Hendrickson and David Womble
Status: In SIAM J. Sci. Stat. Comput., 15(5):1201-1226, 1994.

Abstract:

Dense linear systems of equations are quite common in science and engineering, arising in boundary element methods, least squares problems and other settings. Massively parallel computers will be necessary to solve the large systems required by scientists and engineers, and scalable parallel algorithms for the linear algebra applications must be devised for these machines. A critical step in these algorithms is the mapping of matrix elements to processors. In this paper, we study the use of the torus-wrap mapping in general dense matrix algorithms, from both theoretical and practical viewpoints. We prove that, under reasonable assumptions, this assignment scheme leads to dense matrix algorithms that achieve (to within a constant factor) the lower bound on interprocessor communication. We also show that the torus-wrap mapping allows algorithms to exhibit less idle time, better load balancing and less memory overhead than the more common row and column mappings. Finally, we discuss practical implementation issues, such as compatibility with BLAS levels 1, 2, and 3, and present the results of implementations of several dense matrix algorithms. These theoretical and experimental results are compared with those obtained from more traditional mappings.

Download full paper.