
Running a Parallel Application

A user launches a parallel application by running yod on a service node. The general command syntax is:

yod {yod-options} executable-path-name {application arguments}

OR

yod {yod-options} load-file-name

The yod-options specify such things as the number of nodes required or a list of the specific nodes requested. (The details can be found in Chapter 5.) If the application consists of more than one command line (a heterogeneous load), the executables and their arguments can be listed in a text load file, and this load file's name is then the last argument to yod.
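For a heterogeneous load, yod must recover one (executable, arguments) pair per command line in the load file. The sketch below illustrates that parsing, assuming a simple one-command-line-per-line format; the actual load file syntax is defined in Chapter 5, and the paths here are hypothetical.

```python
# Sketch of parsing a heterogeneous load file, assuming a simple
# one-command-line-per-line format (the real yod load-file syntax
# may differ; see Chapter 5).

def parse_load_file(text):
    """Return a list of (executable, args) tuples, one per command line."""
    commands = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # skip blanks and comments
            continue
        parts = line.split()
        commands.append((parts[0], parts[1:]))
    return commands

# Hypothetical load file with two command lines (two executables):
example = """
# two executables sharing one parallel application
/home/user/solver -n 100
/home/user/viewer
"""
print(parse_load_file(example))
```

Each tuple corresponds to one command line of the heterogeneous load, which is also the unit the PCT subgroups are built around in step 7 below.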

The following sequence of events occurs:

  1. yod creates a node request for the bebopd. Depending on the yod-options, this will be a simple request for some number of compute nodes or a more complex request involving specific lists of nodes.
  2. yod sets up the portals required for communication with the bebopd, with the compute partition PCTs, and with the user application processes.
  3. yod determines the location of the bebopd from the cplant-host file, sends the bebopd a request for compute nodes, and waits for a reply.
  4. If the yod job is part of a PBS job, the bebopd at this point ensures that the PBS job is not requesting more nodes than it has been allocated by the batch scheduling system.
  5. If the requested nodes are not available, yod informs the user and exits. Otherwise, yod receives from the bebopd a list of the compute nodes reserved for it and the identifying information for the PCTs on those nodes, and a numeric job ID identifying the parallel application. A PCT that has been allocated to a yod job but has not yet heard from yod will be listed as pending allocation on the pingd display.
  6. yod sends an initial message to each PCT notifying it that it will be hosting a parallel application. yod must act quickly: an allocated PCT's reservation expires if it does not hear from yod within 60 seconds.
  7. The PCTs form a temporary group for the purpose of hosting the parallel application. After successfully forming a group, they can engage in global communication operations like broadcast and barrier synchronization. The root PCT (the one on the node hosting the application's rank 0 process) notifies yod when group formation is complete. In the case of heterogeneous load (more than one command line, and possibly more than one executable, in the parallel application), the PCTs also form subgroups based on which command line in the load file they are hosting. Each subgroup has a root: the PCT hosting the lowest-ranked member of the subgroup.
  8. Normally, since compute nodes are diskless, the PCT stores the executable image in RAM disk. At this point the PCTs check whether they have sufficient room in RAM disk to store the executable. If any PCT is unable to reserve local storage for the user's executable file, the PCTs notify yod that the executable will need to be copied to a file system from which the PCTs can read it. (This kind of load is much slower than a load where the user's program is sent to the PCTs and stored in RAM disk.) In that case yod copies the executable file to the file system specified by the variable PARALLEL_FILE_SYSTEM, or to /enfs/tmp if that variable is not defined.
  9. yod puts the argument data in a buffer of the root PCT. The root PCT broadcasts this data to the group. (In the case of heterogeneous load, yod puts the argument data for each executable in a buffer of the subgroup root PCT for that executable. These root PCTs then broadcast this data to their subgroups.)
  10. yod puts the user's environment variables in a buffer of the root PCT. The root PCT broadcasts this data.
  11. The PCTs need to know group ID information for the application owner in order to limit file IO operations. If the owner is a member of more than a few groups, this list of groups is sent up to the root PCT for broadcast to the other PCTs.
  12. If the user wants the PCT to run the application under the debugging tool strace, yod sends the strace options up now.
  13. If the PCTs were able to reserve space in RAM disk for the executable, yod searches for the user executable(s) using the user's PATH environment variable if a fully qualified path name was not provided to yod. yod reads the executable image into a buffer and then transfers it to a buffer of the root PCT for broadcast to the other PCTs. Otherwise, the PCTs access the executable from the file system specified by the variable PARALLEL_FILE_SYSTEM, or from /enfs/tmp if that variable is not defined.
  14. In the compute partition, each PCT sets up the environment for the user process and then forks and execs it. The user process begins executing special initialization code linked with it by the Cplant compilation scripts, but does not proceed to user code. This initialization code sets up the process to use portals and sets up the information required to perform file IO through a remote server.
  15. Each PCT sends a synchronization message to yod at this point.
  16. When yod has received all these synchronization messages, it knows that the user process has begun on every node, but has not yet entered user code. yod sends a message to the root PCT to be fanned out to all PCTs indicating that the application processes should continue to the entry point of user code.
  17. Each PCT instructs the user process to proceed to user code.
  18. The PCTs collect a map of the portal process IDs of the new processes and yod gets this map from the root PCT.
  19. yod serves as a front end to the running application. It processes the application's IO calls (opens, reads, writes, etc.) and forwards signals that the user sends to yod: SIGUSR1 and SIGUSR2 are passed through to the application processes. Interrupting yod causes it to forward a SIGTERM to the user application processes; if the application ignores this, interrupting yod again causes it to forward a SIGKILL.
  20. When a user process terminates, the PCT sends a completion message to yod.
  21. If the user ran yod with the -g option and an application process terminated with a signal, yod requests a stack trace of the faulting application process from its PCT.
  22. When yod has received a completion message from every node, it sends an all done message to the root PCT for broadcast and displays each node's completion message with the user process's exit code or terminating signal. If stack traces are available, yod displays them after each node's completion message. yod then logs the node usage to a log file and exits.
  23. When a PCT receives an all done message, it notes that it is now available and will report itself as available when queried by the bebopd.
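The subgroup formation in step 7 can be sketched as follows. The function name and data shapes are illustrative, not the actual PCT interfaces; the rule it encodes comes from the step itself: PCTs hosting the same command line form a subgroup, and each subgroup's root is the PCT hosting the lowest-ranked member.

```python
# Sketch of the PCT grouping logic from step 7. PCTs hosting the same
# command line of the load file form a subgroup; each subgroup's root
# is the PCT hosting the lowest-ranked process in that subgroup.

def form_subgroups(assignments):
    """assignments maps process rank -> command-line index in the load file.
    Returns {command_line_index: (root_rank, sorted member ranks)}."""
    groups = {}
    for rank, cmd in assignments.items():
        groups.setdefault(cmd, []).append(rank)
    return {cmd: (min(ranks), sorted(ranks)) for cmd, ranks in groups.items()}

# Four nodes, two command lines in the load file: ranks 0-1 run the
# first executable, ranks 2-3 the second.
print(form_subgroups({0: 0, 1: 0, 2: 1, 3: 1}))
```

The root PCT of the whole application is the one hosting rank 0, which by this rule is also the root of the first subgroup.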

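The load-path decision in steps 8 and 13 amounts to a fallback: broadcast the image through the root PCT when every PCT can reserve RAM-disk space, otherwise fall back to the file system named by PARALLEL_FILE_SYSTEM, or to /enfs/tmp when that variable is unset. A minimal sketch of that decision, with illustrative function and variable names:

```python
import os

# Sketch of the load-path decision from steps 8 and 13: if every PCT
# reserved RAM-disk space, yod broadcasts the image via the root PCT;
# otherwise yod copies it to a shared file system the PCTs can read.

DEFAULT_PARALLEL_FS = "/enfs/tmp"

def choose_load_path(pct_reservations, env=os.environ):
    """pct_reservations: one boolean per PCT, True if it reserved
    RAM-disk space for the executable image."""
    if all(pct_reservations):
        return ("broadcast", None)      # image fanned out from the root PCT
    fs = env.get("PARALLEL_FILE_SYSTEM", DEFAULT_PARALLEL_FS)
    return ("filesystem", fs)           # PCTs read the image from fs

print(choose_load_path([True, True, True]))
print(choose_load_path([True, False, True], env={}))
```

A single PCT without RAM-disk space is enough to force the slower file-system load for the whole application, which matches the "if any PCT is unable" wording of step 8.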

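The signal-forwarding policy of step 19 is an escalation rule: SIGUSR1 and SIGUSR2 pass through unchanged, the first interrupt becomes a SIGTERM, and a second interrupt becomes a SIGKILL. The class below sketches that rule; it is illustrative, not yod's actual implementation.

```python
import signal

# Sketch of yod's signal-forwarding policy from step 19: SIGUSR1 and
# SIGUSR2 pass straight through to the application, the first interrupt
# is forwarded as SIGTERM, and a second interrupt escalates to SIGKILL.

class SignalForwarder:
    def __init__(self):
        self.interrupted_once = False

    def forward(self, sig):
        """Return the signal yod would forward to the application,
        or None if the signal is not forwarded."""
        if sig in (signal.SIGUSR1, signal.SIGUSR2):
            return sig                      # passed through unchanged
        if sig == signal.SIGINT:
            if self.interrupted_once:
                return signal.SIGKILL       # second interrupt: force kill
            self.interrupted_once = True
            return signal.SIGTERM           # first interrupt: ask nicely
        return None

fwd = SignalForwarder()
print(fwd.forward(signal.SIGINT).name)   # first interrupt
print(fwd.forward(signal.SIGINT).name)   # second interrupt
```

The two-stage escalation gives an application that catches SIGTERM a chance to clean up before the uncatchable SIGKILL arrives.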
Lee Ann Fisk 2001-06-25