A user launches a parallel application by running yod on a service
node. The general command syntax is:
yod {yod-options} executable-path-name {application arguments}
OR
yod {yod-options} load-file-name
The yod-options specify such things as the number of nodes
required or a list of the node numbers of the actual nodes requested.
(The details can be found in Chapter 5.)
If the application consists of more than one command line (heterogeneous
load), then the executables and their
arguments can be listed in a textual load file, and this load file name
is then the last argument to yod.
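The two command forms above can be sketched as a small helper that assembles the argument vector. This is only an illustration, not part of yod: the function names and the file names in the usage lines are hypothetical, and the yod-options are passed through as opaque strings because their details are covered in Chapter 5.

```python
def yod_argv_for_executable(yod_options, executable, app_args=()):
    """Build: yod {yod-options} executable-path-name {application arguments}."""
    return ["yod", *yod_options, executable, *app_args]


def yod_argv_for_load_file(yod_options, load_file):
    """Build: yod {yod-options} load-file-name.

    The load file name is the last argument; the application arguments
    are listed inside the load file, not on the command line.
    """
    return ["yod", *yod_options, load_file]


# Hypothetical file names, for illustration only.
single = yod_argv_for_executable(["-g"], "./a.out", ["input.dat"])
hetero = yod_argv_for_load_file([], "myjob.load")
```

Either list could be handed to a process launcher such as Python's subprocess.run on a service node.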
The following sequence of events occurs:
- yod creates a node request for the bebopd. Depending on the
yod-options, this will be a simple request for some number
of compute nodes, or a more complex request involving specific
lists of nodes.
- yod sets up portals required for communication with the bebopd,
with the compute partition PCTs, and with the user application
processes.
- yod determines the location of the bebopd from the cplant-host
file, sends the bebopd a request for compute nodes, and waits for
a reply.
- If the yod job is part of a PBS job, the bebopd at this point
ensures that the PBS job is not requesting more nodes than it
has been allocated by the batch scheduling system.
- If the requested nodes are not available, yod informs the user
and exits. Otherwise, yod receives from the
bebopd a list of
the compute nodes reserved for it and the identifying information
for the PCTs on those nodes, and a numeric job ID identifying the
parallel application. A PCT that has been allocated to a
yod job but has not yet heard from yod will be listed as
pending allocation on the pingd display.
- yod sends an initial message to each PCT notifying it that
it will be hosting a parallel application. yod must do this promptly,
because a PCT reservation expires if the allocated PCT does not hear
from yod within 60 seconds.
- The PCTs form a temporary group for the purpose of hosting the
parallel application. After successfully forming a group, they
can engage in global communication operations like broadcast and
barrier synchronization. The root PCT (the one on the node
hosting the application rank 0 process) notifies yod when this
group formation is complete. In the case of heterogeneous load
(more than one command line and possibly more than one executable
in the parallel application) the PCTs
also form subgroups based on which command line in the
load file they are hosting. Each of these has a root, the PCT
hosting the lowest ranked member in the subgroup.
- Normally, since compute nodes are diskless, the PCT stores the
executable image in RAM disk.
At this point the PCTs check whether they have sufficient room
in RAM disk to store the executable. If any PCT is unable to
reserve local storage for the user's executable file,
the PCTs notify yod that the executable will need
to be copied to a file system from which the PCTs can read it.
(This kind of load is much slower than a load where the user's
program is sent to the PCT and stored in RAM disk.) yod then
copies the executable file to the file system
specified by the variable PARALLEL_FILE_SYSTEM, or to
/enfs/tmp if that variable is not defined.
- yod puts the argument data in a buffer of the root PCT.
The root PCT broadcasts this data to the group.
(In the case of heterogeneous load, yod puts the argument data for
each executable in a buffer of the subgroup root PCT for that
executable. These root PCTs then broadcast this data to their
subgroups.)
- yod puts the user's environment variables
in a buffer of the root PCT. The root PCT broadcasts this data.
- The PCTs need to know group ID information for the application
owner in order to limit file IO operations. If the owner is a
member of more than a few groups, this list of groups is sent
up to the root PCT for broadcast to the other PCTs.
- If the user wants the PCT to run the application under the
debugging tool strace, yod sends the strace
options up now.
- If the PCTs were able to reserve space in RAM disk
for the executable,
yod searches for the user executable(s) using the user's PATH
environment variable if a fully qualified path name was not provided
to yod.
yod reads the executable image into a buffer, and
then puts its contents in a buffer of the root PCT for broadcast
to the other PCTs.
Otherwise, the PCTs access the executable from the file system
specified by the variable PARALLEL_FILE_SYSTEM, or from
/enfs/tmp if that variable is not defined.
- In the compute partition, each PCT sets up the environment for the
user process and then forks and execs it. The user process begins
executing special initialization code linked with it by the Cplant
compilation scripts, but does not proceed to user code.
This initialization code sets up the process to use portals, and
sets up the information required to perform file IO through a
remote server.
- Each PCT sends a synchronization message to yod at this point.
- When yod has received all these synchronization messages, it knows that
the user process has begun on every node, but has not yet entered user
code.
yod sends a message to the root PCT to be fanned out to
all PCTs indicating that the application processes should continue
to the entry point of user code.
- Each PCT instructs the user process to proceed to user code.
- The PCTs collect a map of the portal process IDs of the new processes and
yod gets this map from the root PCT.
- yod serves as a front end to the running application.
It processes user application IO calls (opens, reads, writes, etc.),
and forwards signals sent by the user to yod. SIGUSR1 and SIGUSR2
are forwarded to the application processes. Interrupting yod will
cause yod to forward a SIGTERM to the user application processes.
If this is ignored by the application, interrupting yod again will
cause yod to forward a SIGKILL.
- When a user process terminates, the PCT sends a completion message
to yod.
- If the user ran yod with the -g option and an application
process was terminated by a signal, yod requests a stack trace
of the faulting application process from the PCT.
- When yod has received a completion message from every node, it sends
an all done message to the root PCT for broadcast and displays
the completion messages with user process exit code or terminating
signal. If stack traces are available yod will display them after
each node's completion message. yod then logs the node usage to
a log file and exits.
- When a PCT receives an all done message, it notes that it is
now available and will report itself as available when queried by the
bebopd.
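Two of the decisions in the sequence above are easy to misread in prose, so here is a minimal Python sketch of them: the choice of file system when the executable cannot be held in RAM disk, and the way yod maps signals it receives onto signals forwarded to the application. This is not yod's source code. The names staging_file_system and SignalForwarder are hypothetical, the sketch assumes a POSIX system, and it simplifies the interrupt rule by sending SIGKILL on any second interrupt without checking whether the application ignored the SIGTERM.

```python
import os
import signal

# Fallback location named in the text for when the variable is not defined.
DEFAULT_PARALLEL_FS = "/enfs/tmp"


def staging_file_system(env=None):
    """Return where the executable is copied when RAM disk space is short.

    Uses PARALLEL_FILE_SYSTEM if it is defined, otherwise /enfs/tmp.
    """
    env = os.environ if env is None else env
    return env.get("PARALLEL_FILE_SYSTEM", DEFAULT_PARALLEL_FS)


class SignalForwarder:
    """Hypothetical model of yod's signal forwarding policy."""

    def __init__(self):
        self.interrupts_seen = 0

    def signal_to_forward(self, received):
        """Map a signal received by yod to the one sent to the application.

        Returns None for signals this section does not describe.
        """
        # SIGUSR1 and SIGUSR2 are passed through unchanged.
        if received in (signal.SIGUSR1, signal.SIGUSR2):
            return received
        # The first interrupt becomes SIGTERM, a later one becomes SIGKILL.
        if received == signal.SIGINT:
            self.interrupts_seen += 1
            if self.interrupts_seen == 1:
                return signal.SIGTERM
            return signal.SIGKILL
        return None


forwarder = SignalForwarder()
first = forwarder.signal_to_forward(signal.SIGINT)   # signal.SIGTERM
second = forwarder.signal_to_forward(signal.SIGINT)  # signal.SIGKILL
```

Keeping the interrupt count in one small object makes the escalation from SIGTERM to SIGKILL explicit and keeps the policy separate from whatever mechanism actually delivers the signal to the PCTs.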
Lee Ann Fisk
2001-06-25