# CS 442 Programming Assignment 4 Particle Simulations

Due: Tuesday, October 28

Goals:

1. Introduce new types of parallelism for particles.

2. Explore trade--offs between 1D and 2D data decompositions.

3. Use MPI communicators and collective communication routines.

Background:

While watching Melrose Place, it suddenly dawns on you that you've been ignoring the impact of wild animals in your Gila vegetation simulations. The largest herbivores in the Gila are elk, so you decide to simulate their movements. The Gila elk have evolved a strategy that maximizes their chance of survival by balancing two opposing forces: they must be dispersed enough to not compete for food, but be close enough to attract potential mates. Not unlike their Melrose counterparts, the elk achieve this balance via one of nature's great spectacles --- elaborate bugling rituals. Once a day each elk bugles to announce his/her location and listens for the return calls from other elk. Each elk then moves so as to be far enough from the others to have sufficient vegetation, but close enough to be within bugling range for the next day. Voluminous research has revealed that this can be modeled by having elk i move as if it feels a (vector) force due to elk j that is the derivative of a simple formula for their repulsive/attractive energy of interaction: (see class handout for mathematical representation). Unfortunately, whenever an elk leaves the Gila and wanders onto neighboring private land, it is immediately shot by eager Lincoln County ranchers.

Assignment:

You will write a parallel program to simulate the 2-d movements of the Gila elk. Serial C and Fortran codes which perform this task are available; use them as a starting point. The input parameters are the number of elk (N), the size of the circular wilderness region (RADIUS), the number of timesteps (NSTEPS), and a random number seed (SEED). Essentially, the code computes the force interactions between all N^2 pairs of elk every timestep and integrates Newton's equations of motion forward in time. The specifications for the initial randomized positions/velocities of the elk, the force calculation, the time integrator, and the rancher-enforced boundary conditions are more easily described in the code than in words. The output of a simulation is the number of remaining elk, their average speed, and the position of a particular elk.

Your parallel code should implement the force-decomposition particle algorithm described in class using the MPI collective communication routines AllGather and Reduce_Scatter. It is not necessary to implement Newton's 3rd law. Running on P processors, the code should accept PX and PY (PX*PY=P) as input and give the same answers for any PX by PY decomposition of the force matrix, including PX=1, PY=P or PX=PY=sqrt(P) or PX=P, PY=1. Your code should give the same answers running on any P including P = non-power-of-2 and P=1.

At the end of a run, your code should print-out NALIVE = the number of elk remaining inside the RADIUS, the average speed of the NALIVE elk, and the x,y coordinates of elk NOUTPUT. It should also print-out the total CPU time (of just the timestep loop) and the time spent in communication. The difference is presumably computation time.

What you hand in:

1. A program listing of your code, commented in the sections where you made additions or changes.

2. Output for 2 problems, each run on various numbers of processors, including (if your machine supports it) some runs on non-power-of-two P. Also, for problem (1) show some output for runs with the same P, but where you vary PX and PY, including the cases where PX=1 and where PY=1.

(1) Fixed-size problem:

N = 120, RADIUS = 50.0, NTIMES = 50, SEED = 123456, NOUTPUT = 50

(2) Scaled-size problem:

N = 30*P, RADIUS = 50.0, NTIMES = 50, SEED = 654321, NOUTPUT = 25

As before, we are looking to see that you get the correct answers and that (for problem 1) your results do not depend on P or on PX or PY.

3. A simple table of timings for the 2 problems for different P and PX,PY values that lists the total run-time, computation time, communication time, and overall speed-up and parallel efficiency.

4. A simple log-log plot of total CPU time vs. P for both of these problems. Include curves that show what ``perfect'' speed-up would be, along with data points for your actual timings. Also, for problem (1), a plot of total and communication time vs. PX where PX ranges from 1 to P. Tell us why you think the plots show what they do.

5. Brief (!) answers to the Open Questions below.

Optional:

The serial codes include a "dump" function that writes snapshots of the elk coordinates to a "dump.out" file. This file may be visualized with a program called "xmovie" that resides in /home/u/sjplimp/bin on the Sandia network and in /home/sjplimp on laguna at the ARC. Note: these are Sun SPARC executables. With the usual X-window permissions in place, typing "(path)/xmovie -2d dump.out" will pop up an animated X window of the elk movement. When doing timing runs with your parallel code, you should comment out the calls to "dump" or simply delete the function. However, you are welcome to parallelize the "dump" routine if you wish to experiment with parallel I/O and visualize the results of your parallel runs.

Open Questions:

1. If you simulated millions of elk for thousands of timesteps would you expect to get exactly the same answers running on any P? Why or why not?

2. Consider two runs where PX = 1, PY = P and PX = P, PY = 1. Do you expect the communication cost of the two runs to be the same? Why or why not?

3. How might you modify the code to exploit Newton's 3rd law and cut the cost of computation in half? Would this incur any extra communication?

4. Is your code's performance affected by the initial density distribution of elk? I.e. if some regions of the wilderness have very high density, will your code run slower?

5. (Optional) If Melrose Place were filmed in the Gila, speculate on the response of Lincoln County ranchers.