Sandia National Laboratories

Parallel Performance of Seismic Imaging


Two types of parallelism are implemented in Salvo: frequency parallelism and spatial parallelism. The processor cube to the right illustrates the 2D decomposition for spatial parallelism, with the frequency decomposition along the third dimension. Salvo can use frequency parallelism, spatial parallelism, or a mixture of the two.

Frequency parallelism takes advantage of the fact that each frequency is solved independently of the others, so no communication is necessary until the final image is generated. Because communication is so limited, high parallel efficiencies can be obtained. However, frequency parallelism is limited by the number of frequencies retained for the solution: if only 500 frequencies are used, at most 500 processors can be used for the solution.
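The processor limit described above can be illustrated with a minimal sketch. The round-robin assignment and the function names below are hypothetical and chosen only for illustration; the text does not say how Salvo actually distributes frequencies.

```python
def assign_frequencies(n_freq, n_proc):
    """Deal frequency indices round-robin to processors.

    Hypothetical scheme for illustration only; not Salvo's actual
    distribution. Returns one list of frequency indices per processor.
    """
    return [list(range(p, n_freq, n_proc)) for p in range(n_proc)]


def busy_processors(n_freq, n_proc):
    """Count processors that receive at least one frequency."""
    return sum(1 for work in assign_frequencies(n_freq, n_proc) if work)


if __name__ == "__main__":
    # With 500 frequencies, any processors beyond the 500th receive no work.
    for n_proc in (128, 500, 600):
        print(n_proc, "processors requested,",
              busy_processors(500, n_proc), "busy")
```

Because each list of frequencies can be processed with no knowledge of the others, the only communication such a scheme would need is the final summation of the partial images.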

To circumvent this limitation, spatial parallelism can be used, allowing more processors to be utilized. Additionally, if the system architecture has limited memory, spatial parallelism reduces the problem size per processor. The drawback of spatial parallelism is lower parallel efficiency: because the wavefield data is spatially decomposed, the tridiagonal solves must be performed in parallel. However, one saving feature of f-x migration is that there are many tridiagonal solves to complete, so a pipeline technique can be used.
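To see why having many tridiagonal solves helps, consider an idealized pipeline model. It is an assumption made here for illustration, not a description of Salvo's implementation. Each processor along one direction of the decomposition is a pipeline stage, each independent tridiagonal system passes through the stages in order, and every stage takes one time unit. Communication cost and the back-substitution sweep are ignored.

```python
def pipeline_schedule(n_systems, n_stages):
    """Return, for each time step, the active (stage, system) pairs.

    Idealized model: stage s can work on system m at time step m + s.
    """
    steps = []
    for t in range(n_systems + n_stages - 1):
        active = [(s, t - s) for s in range(n_stages)
                  if 0 <= t - s < n_systems]
        steps.append(active)
    return steps


def pipeline_efficiency(n_systems, n_stages):
    """Fraction of stage-time slots that do useful work in the model."""
    steps = pipeline_schedule(n_systems, n_stages)
    busy = sum(len(active) for active in steps)
    return busy / (n_stages * len(steps))


if __name__ == "__main__":
    # Hypothetical sizes: 101 independent systems, varying pipeline depth.
    for n_stages in (2, 4, 8):
        print(n_stages, "stages:", pipeline_efficiency(101, n_stages))
```

In this model the pipeline takes n_systems + n_stages - 1 steps, so the idle time spent filling and draining it is amortized over the number of systems. With a single tridiagonal system, only one stage would be busy at any time.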


Frequency Parallelism

Small Frequency Problem

To test the frequency parallelism, a small problem was run. The grid was 101x101 and 256 frequencies were migrated. Shown below are the runtimes, MFLOPS/processor, and parallel efficiencies for the Intel Paragon at Sandia National Laboratories. The peak rating for single-precision operations is ~75 MFLOPS/processor, so we have obtained ~25% of peak. This test problem is of fixed size and is not scaled with the number of processors. We therefore obtain very high efficiencies, which drop off as we approach 128 processors: since each processor has less work, overhead and other inefficiencies begin to dominate. This is simply the effect of Amdahl's Law.
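The fixed-size behavior described above can be sketched with the standard definitions of parallel efficiency and Amdahl's Law. The serial fraction used in the example is a made-up value chosen only to show the trend; it is not a measurement from these runs.

```python
def parallel_efficiency(t_serial, t_parallel, n_proc):
    """Efficiency = serial runtime / (processors * parallel runtime)."""
    return t_serial / (n_proc * t_parallel)


def amdahl_speedup(serial_fraction, n_proc):
    """Amdahl's Law: speedup for a fixed-size problem."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_proc)


def amdahl_efficiency(serial_fraction, n_proc):
    """Parallel efficiency implied by Amdahl's Law."""
    return amdahl_speedup(serial_fraction, n_proc) / n_proc


if __name__ == "__main__":
    serial_fraction = 0.001  # hypothetical value, for illustration only
    for n_proc in (1, 2, 4, 8, 16, 32, 64, 128):
        print(n_proc, amdahl_efficiency(serial_fraction, n_proc))
```

Even a small non-parallel fraction makes the efficiency fall steadily as processors are added, because the parallel work per processor shrinks while the fixed overhead does not.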

A large problem was also tested, with 101x101 grid points and 2048 frequencies. Shown below are the results of this run. The trends are similar to those described above.


Spatial Parallelism

To test the spatial parallelism, a scaled problem was used. Each processor had a spatial domain of 101x101 grid points and 32 frequencies to migrate. Thus, for a spatial decomposition of 8x8 (64 compute nodes), the spatial domain was approximately 801x801 grid points. The parallel efficiencies are lower, as expected, but are substantially higher than those of parallel algorithms that try to parallelize single tridiagonal solves. It should be noted that once the pipeline is fully constructed (i.e., it has a beginning node, one or more middle nodes, and an ending node [9 nodes]), the parallel efficiencies are nearly flat at ~66%.
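The scaled problem size can be reproduced with a short calculation. The sketch below assumes that neighboring subdomains share a single boundary grid point in each direction. That assumption is not stated in the text, but it is consistent with the quoted figure of approximately 801x801 for an 8x8 decomposition.

```python
def global_points(local_points, n_subdomains, shared=1):
    """Global grid points along one axis for a row of subdomains.

    `shared` is the number of boundary points neighboring subdomains
    have in common (assumed to be 1 here; not confirmed by the source).
    """
    return n_subdomains * (local_points - shared) + shared


if __name__ == "__main__":
    # Grow the decomposition while keeping 101x101 points per processor.
    for n in (1, 2, 4, 8):
        side = global_points(101, n)
        print(f"{n}x{n} decomposition ({n * n} nodes): {side}x{side} grid")
```

Because the work per processor stays fixed as nodes are added, the efficiencies from this scaled test are not directly comparable with those from the fixed-size frequency tests.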


[Mail to:] Curtis C. Ober

Last modified: October 9, 1998

