Cplant Components

Components

In addition to the functional decomposition of the machine, Cplant replicates the system software components that provide a scalable, high-performance user environment.

High Performance Message Passing

A flexible, high-performance data movement layer is needed to support both application-level communication, such as MPI, and system-level communication, such as the traffic between the compute node daemons and the launcher. Much of the work on the Intel MPP machines focused on providing a communication layer that could deliver the highest possible percentage of network resources to these applications. The result of this work is Portals, the data movement layer supported on the Intel TFLOPS machine.

Parallel Job Launcher

The parallel job launcher works with the other system software components to start, monitor, and stop parallel jobs. The launcher contacts the allocator to obtain the type and number of nodes that the user requests. It then contacts the compute node daemon processes to create the run-time environment necessary for supporting a parallel job. The launcher process is responsible for sending and receiving UNIX standard I/O requests, and it works with the compute node daemons to propagate any UNIX signals that it receives out to the compute node application processes.
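The sequence above can be sketched as a small in-process simulation. This is only an illustration of the ordering of steps: the class and function names (FakeAllocator, FakeDaemon, launch, propagate_signal) are hypothetical and are not the yod, bebopd, or PCT interfaces, and standard I/O forwarding is left out.

```python
# Illustrative sketch only: these classes and functions are hypothetical,
# not the yod, bebopd, or PCT interfaces.
import signal


class FakeAllocator:
    """Stands in for the allocator: hands out free node IDs."""

    def __init__(self, free_nodes):
        self.free_nodes = list(free_nodes)

    def request(self, count):
        if count > len(self.free_nodes):
            raise RuntimeError("not enough free nodes")
        granted = self.free_nodes[:count]
        self.free_nodes = self.free_nodes[count:]
        return granted


class FakeDaemon:
    """Stands in for one compute node daemon."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.env = None
        self.signals = []

    def setup(self, executable, rank, nprocs):
        # Record the run-time environment the launcher asked for.
        self.env = {"executable": executable, "rank": rank, "nprocs": nprocs}

    def deliver_signal(self, signum):
        # A real daemon would forward this to the application process.
        self.signals.append(signum)


def launch(allocator, daemons, executable, nprocs):
    """Get nodes from the allocator, then have each daemon build the environment."""
    nodes = allocator.request(nprocs)
    for rank, node_id in enumerate(nodes):
        daemons[node_id].setup(executable, rank, nprocs)
    return nodes


def propagate_signal(daemons, nodes, signum):
    """Fan a signal received by the launcher out to every node in the job."""
    for node_id in nodes:
        daemons[node_id].deliver_signal(signum)


daemons = {n: FakeDaemon(n) for n in range(4)}
job_nodes = launch(FakeAllocator(range(4)), daemons, "./a.out", nprocs=3)
propagate_signal(daemons, job_nodes, signal.SIGTERM)
```

In this sketch the fourth node is never contacted, because the job asked for only three processes.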

The parallel job launcher for Cplant is called yod, as it was on the Paragon and TFLOPS machines (the name is a HAL/IBM-style permutation of xnc, the launcher on the nCUBE). See the man page for yod for more details.
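The naming joke can be checked mechanically: advancing each letter of a word by one position in the alphabet turns HAL into IBM, and the same shift turns xnc into yod. The helper below is a hypothetical name written only for this illustration.

```python
def shift_letters(word, offset):
    """Shift each ASCII letter by `offset` positions, wrapping within the alphabet."""
    shifted = []
    for ch in word:
        if ch.isascii() and ch.isalpha():
            base = ord("a") if ch.islower() else ord("A")
            shifted.append(chr((ord(ch) - base + offset) % 26 + base))
        else:
            shifted.append(ch)  # leave anything that is not a letter alone
    return "".join(shifted)


print(shift_letters("HAL", 1))  # IBM
print(shift_letters("xnc", 1))  # yod
```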

Compute Node Daemon Process

The compute node daemon process works with the launcher and the allocator to manage an individual compute node. When it starts, it contacts the allocator to make its node available for use, and the allocator contacts it when the node becomes part of a parallel job. The launcher process contacts the compute node daemons to build the environment in which the user process will execute. This environment includes the executable image, the user's shell environment, the number of processes in the job, and the rank of the particular process in the job. The compute node daemons involved in launching a job typically work together so that a spanning tree can be used to broadcast common information, such as the executable image, to all participating nodes efficiently. Each daemon then launches the application process and monitors it as a child process. It also propagates UNIX signals to and from the child process to facilitate job shutdown and cleanup.
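To illustrate why a spanning tree helps, the sketch below simulates a broadcast in which each rank forwards the payload to a fixed number of children. The binary tree shape, the rank numbering, and the function names are assumptions made for the example; the text above does not say which tree the daemons actually build.

```python
# Illustrative sketch: a binary tree rooted at rank 0 is an assumption,
# not necessarily the tree shape the compute node daemons use.

def children(rank, nprocs, fanout=2):
    """Ranks that `rank` forwards to in a fanout-ary tree rooted at rank 0."""
    first = rank * fanout + 1
    return [c for c in range(first, first + fanout) if c < nprocs]


def broadcast(nprocs, payload, fanout=2):
    """Simulate the broadcast; return what each rank received and the tree depth."""
    received = {0: payload}  # rank 0 receives the payload from the launcher
    frontier, depth = [0], 0
    while frontier:
        next_frontier = []
        for rank in frontier:
            for child in children(rank, nprocs, fanout):
                received[child] = received[rank]  # forward one level down
                next_frontier.append(child)
        if next_frontier:
            depth += 1
        frontier = next_frontier
    return received, depth


received, depth = broadcast(nprocs=8, payload=b"executable image")
```

With eight ranks and a fanout of two, every rank holds the payload after three forwarding levels, whereas a launcher that contacted each node directly would perform eight separate sends itself.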

The compute node daemon process for Cplant is the Process Control Thread, or PCT. On the machines running a lightweight kernel, the PCT was a kernel thread. On Cplant, it is a heavyweight UNIX process. See the man page for PCT for more details.

Compute Node Allocator

The compute node allocator works with the launcher and the compute node daemon to manage the resources of the entire machine. It keeps track of which nodes are free and which are in use. On the Paragon and TFLOPS machines, the allocator had knowledge about the physical topology of the machine and could make intelligent decisions about the placement of application processes on physical nodes.

The compute node allocator for Cplant is bebopd (better engineered bag of PCs daemon). The bebopd has been designed to adapt better to a cluster environment, where there is a greater need for dynamic configuration. It is currently a single daemon process running in the service partition, but it will eventually become a distributed service spread throughout the service partition. See the man page for bebopd for more details.
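The bookkeeping an allocator performs, tracking which nodes are free and which are in use, can be sketched as follows. NodeTable and its methods are hypothetical names chosen for this example and do not describe the bebopd implementation; node selection here is simply lowest ID first, with no knowledge of topology.

```python
# Illustrative sketch only: NodeTable and its methods are hypothetical names,
# not the bebopd interface.

class NodeTable:
    """Tracks which compute nodes are free and which belong to a job."""

    def __init__(self):
        self.free = set()
        self.in_use = {}  # node ID -> job ID

    def register(self, node_id):
        """A compute node daemon announces that its node is available."""
        if node_id not in self.in_use:
            self.free.add(node_id)

    def allocate(self, job_id, count):
        """Reserve `count` nodes for a job, or return None if too few are free."""
        if count > len(self.free):
            return None
        nodes = sorted(self.free)[:count]
        for node_id in nodes:
            self.free.remove(node_id)
            self.in_use[node_id] = job_id
        return nodes

    def release(self, job_id):
        """Return all of a finished job's nodes to the free pool."""
        freed = sorted(n for n, j in self.in_use.items() if j == job_id)
        for node_id in freed:
            del self.in_use[node_id]
            self.free.add(node_id)
        return freed


table = NodeTable()
for node_id in range(4):
    table.register(node_id)

first = table.allocate("job-a", 3)   # takes nodes 0, 1, 2
second = table.allocate("job-b", 2)  # only one node is left, so this fails
table.release("job-a")
third = table.allocate("job-b", 2)   # succeeds after the release
```

Registration is a separate call so that nodes can join the pool at any time, which loosely mirrors the need for dynamic configuration mentioned above.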

Compute Node Status Tool

The compute node status tool is used to view the status of the machine, such as which compute nodes are available.

The compute node status tool for Cplant is pingd, which simply pings the bebopd to discover status information. It currently prints a single line of text for each available compute node, but in the future it will be enhanced to be more scalable and more similar to the showmesh utility on the Paragon and TFLOPS machines. See the man page for pingd for more details.
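A status tool of this kind reduces to one query to the allocator followed by one formatted line per node. The sketch below assumes a hypothetical query_status call, made-up state names, and a made-up line format; the real pingd output is described in its man page.

```python
# Illustrative sketch only: FakeAllocator, query_status, the state names, and
# the line format are all hypothetical, not the pingd or bebopd interfaces.

class FakeAllocator:
    """Stands in for the allocator that the status tool queries."""

    def __init__(self, node_states):
        self._node_states = dict(node_states)

    def query_status(self):
        # Return a copy so the caller cannot modify the allocator's table.
        return dict(self._node_states)


def report(allocator):
    """Ask the allocator for its node table and format one line per node."""
    states = allocator.query_status()
    return [f"node {node_id:4d}  {state}"
            for node_id, state in sorted(states.items())]


lines = report(FakeAllocator({2: "busy", 0: "free", 1: "free"}))
for line in lines:
    print(line)
```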