Cplant is a cluster of workstations dedicated to running message passing parallel applications. It was designed from the early stages of the project to scale to thousands of nodes and to provide message passing and file I/O performance sufficient for the applications that had previously run on the lab's massively parallel supercomputers.
The machines running user parallel codes (the compute nodes) are space shared: only one user's process runs on a given compute node at a time. An application must run to completion on all of its nodes before another user may claim any of those compute nodes for a new job.
Application message passing is normally handled with calls to an MPI library [7]. The MPI library is layered over Cplant portals [4]. Portals are user-level data structures understood by the operating system. They were designed to reduce message passing latency: the user libraries set up data structures that permit the operating system to copy incoming messages directly into the user's buffers.
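The C sketch below illustrates this idea in simplified form. It is a conceptual sketch only: the names user_buffer_desc and portal_attach_buffer are hypothetical stand-ins, not the actual portals interface, and the registration call is stubbed locally so the example compiles and runs.

    /*
     * Conceptual sketch only; user_buffer_desc and portal_attach_buffer are
     * hypothetical names, not the real portals API.  The point illustrated:
     * the user library registers a buffer descriptor with the operating
     * system ahead of time, so a matching incoming message can be deposited
     * directly into user memory without an intermediate copy.
     */
    #include <stddef.h>
    #include <stdio.h>

    struct user_buffer_desc {
        void   *start;       /* user-space address where the payload should land */
        size_t  length;      /* size of the buffer in bytes                       */
        long    match_bits;  /* criteria used to match incoming messages          */
    };

    /* Stub standing in for the kernel registration call.  In a real system
     * this would hand the descriptor to the OS; here it merely records it
     * so the example is self-contained.                                     */
    static struct user_buffer_desc registered[8];

    static int portal_attach_buffer(int portal_index, struct user_buffer_desc *desc)
    {
        if (portal_index < 0 || portal_index >= 8)
            return -1;
        registered[portal_index] = *desc;
        printf("portal %d: buffer of %zu bytes registered\n",
               portal_index, desc->length);
        return 0;
    }

    int main(void)
    {
        static char recv_buf[4096];
        struct user_buffer_desc desc = {
            .start      = recv_buf,
            .length     = sizeof recv_buf,
            .match_bits = 0x1,   /* e.g. a tag the sender is expected to use */
        };

        /* Once registered, a matching incoming message can be copied by the
         * OS straight into recv_buf; the user library does no extra copy.   */
        return portal_attach_buffer(4 /* hypothetical portal index */, &desc);
    }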
Applications are also linked with a second message passing library, the Cplant server library (likewise layered over portals), described in [5]. Before user code begins, the application uses the server library to set up the group information required by MPI; it also uses the library for I/O to the yod proxy process and for other miscellaneous tasks. The application's use of the server library is transparent to the programmer, although nothing prevents programmers from calling the library directly if they wish. Message passing can also be performed with direct calls to the portals library, although we know of no one doing this.
The Cplant daemons and utilities described herein communicate over the Cplant server library. The library allows the various entities involved to send brief control messages, to get and put large blocks of data, and to form temporary groups for collective communication operations. In subsequent pages of this document, when we refer to putting data into another entity's memory, or getting data into our own memory from another utility or daemon, we are referring to server library operations.
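As a rough illustration of how these operations might combine, the C sketch below shows a short control message advertising a large block, followed by the receiver pulling that block into its own memory with a get. The function names srvr_send_ctl_msg and srvr_get_block are invented for illustration, not the server library's actual interface, and the transfer is simulated locally so the example compiles and runs.

    /*
     * Hypothetical sketch of a control-message / bulk-transfer exchange.
     * srvr_send_ctl_msg and srvr_get_block are illustrative names only;
     * the stubs simulate the transfer with a local copy.
     */
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        int     msg_type;   /* small, fixed-size control message            */
        void   *src_addr;   /* address of the large block being advertised  */
        size_t  src_len;    /* length of that block                         */
    } ctl_msg;

    static ctl_msg last_msg;   /* stands in for a message delivered to the peer */

    /* "Send" a brief control message telling the peer where the data lives. */
    static void srvr_send_ctl_msg(int dest_node, const ctl_msg *m)
    {
        (void)dest_node;
        last_msg = *m;
    }

    /* "Get" the advertised block into our own memory (simulated with memcpy). */
    static void srvr_get_block(int src_node, void *dst)
    {
        (void)src_node;
        memcpy(dst, last_msg.src_addr, last_msg.src_len);
    }

    int main(void)
    {
        char payload[] = "large block of data (e.g. an executable image)";
        char local[sizeof payload];

        /* Sender side: advertise the block with a short control message. */
        ctl_msg m = { .msg_type = 1, .src_addr = payload, .src_len = sizeof payload };
        srvr_send_ctl_msg(/*dest_node=*/7, &m);

        /* Receiver side: pull the block directly into local memory. */
        srvr_get_block(/*src_node=*/0, local);
        printf("received: %s\n", local);
        return 0;
    }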
The server library has fault recovery features that allow a group of PCTs to carry on if one of the PCTs crashes during an application load. The library routines report which PCT malfunctioned, and system administrators can view log files to obtain this information. After a failed load, yod also displays to the user which PCT failed in the group operation. The user should report this to the system administrators so they can investigate further.
There are two utilities and two daemons involved in node allocation, application load, and application status queries.
In this chapter we will provide a general overview of the use of these four elements from the standpoint of a user running a job and from the standpoint of an administrator setting up and maintaining a running Cplant. Subsequent chapters will provide details on the interfaces of the daemons and the configuration elements that affect their behavior.