Building, Installing, and Running Cplant(TM) Software


Version 2.0


James Otto


Contents

1  Introduction
2  System Architectures
3  Building Cplant Software
    3.1  Overview of the Cplant Build
    3.2  Obtaining Cplant
    3.3  Unarchiving Cplant and Editing Makefile-common
    3.4  Editing top/Makefile-common
    3.5  The Linux Kernel for Cplant
    3.6  Configuring cplant.h
    3.7  Building Cplant Components
4  Installing Cplant Software on the Admin Node - the Cplant VM
    4.1  Default Archives, Myrinet Routes, and the Cplant Map File
5  Configuring the VM and Running the Cplant PROTOtype
6  Building and Running Cplant MPI Applications
7  Setting Up a Cplant Compile Environment
8  Supporting Documentation

1  Introduction

This manual is a guide for the system administrator. It describes how to compile the code in the Cplant release, how to install the result in the Cplant Virtual Machine (VM) format, and how to perform a prototypical run of the Cplant software. Although Cplant is designed to be scalable and to run on thousands of compute nodes, it is simple and instructive to run the software on a minimal set of nodes to learn in principle how its components work. That is the goal of the prototype run described here. A minimal setup requires about 4 computers: an Admin node, which exports a file system of configuration scripts and binary files; a Service node, where jobs are launched; and 2 Compute nodes, which run the job processes. In addition, a minimal Myrinet setup is recommended: 3 NICs, for the Service and Compute nodes, and a small switch to interconnect them.

2  System Architectures

Cplant software has been developed at Sandia National Labs in Albuquerque to run on the DEC Alpha architecture under the Linux OS. It has also been ported to the Intel x86 (through Pentium 4) processor architecture running Linux. Application build support is provided for GNU, Compaq Alpha-Linux, and Compaq Tru64 compilers.

As for specific Linux versions, a subtree of Cplant patches for specific Linux kernels is provided in the Cplant release. The latest patches are for the 2.2.18 and 2.4.7 kernels. However, the set of required kernel changes is minimal and is easily extended to other kernel versions.

Recent development has Cplant running on Linux 2.2 and 2.4 version kernels. Cplant software support for SMP nodes is also forthcoming.

Currently, Cplant supports the use of Myrinet as its main message passing medium. Accordingly, the message passing layer, Portals 3, has been implemented on top of the Cplant-specific Myrinet driver/packetization/flow-control layer. The Cplant release provides tools (Lanai compilers) and code for building a Cplant version of the Myrinet control program (MCP - the program that runs on the Myrinet interface card) on a Linux Alpha workstation and on a Linux Intel PC. Currently, the Cplant MCP works with Lanai versions 4.x, 7.x, and 9.x Myrinet cards.

Cplant also supports messaging over the packet interface exported by standard Linux device drivers. So, for example, one can run Cplant over raw ethernet or over GM for Myrinet. This is currently the only messaging path available for Cplant on the IA64 architecture. For the Readme on this subject, send mail to the author at jsotto@sandia.gov.

3  Building Cplant Software

3.1  Overview of the Cplant Build

In what follows we will use $CPLANT to refer to the point in the directory tree where the Cplant release is unarchived. We will also on many occasions refer to subdirectories located under a directory named $ARCH-OS. The latter is our shorthand for a directory name based on a pair of variables ($CPLANT_ARCH and $OS) defined dynamically in Makefile-common. Common values for $ARCH-OS in practice are ``alpha-linux'' and ``i386-linux''.

An overview of the steps in the basic Cplant build is as follows:

    - obtain the Cplant release and unarchive it (Sections 3.2, 3.3)
    - edit top/Makefile-common (Section 3.4)
    - obtain, patch, and configure the Linux kernel for Cplant (Section 3.5)
    - configure include/config/cplant.h (Section 3.6)
    - build the Cplant components using the Make wrapper (Section 3.7)

These steps are then followed by installing the pooled binaries on an admin node in the Cplant VM format, configuring the VM, and running Cplant.
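
For orientation, here is a condensed sketch of the build commands detailed in the following subsections (editor invocations and exact ordering are illustrative):

    % cd $(HOME)/Cplant
    % tar xvf cplant-tar-archive            (unpack the release; see 3.3)
    % vi top/Makefile-common                (set LINUX_COMPILE, LINUX24; see 3.4)
    [obtain, patch, and configure the Linux kernel; see 3.5]
    % vi top/include/config/cplant.h        (set CPLANT_PATH if needed; see 3.6)
    % cd top
    % ./Make basic                          (build system software and tests; see 3.7)
    % ./Make basic install                  (collect binaries under $CPLANT/$ARCH-OS)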


3.2  Obtaining Cplant

At this time Cplant source has been designated as an open source software product by Sandia National Laboratories. For copyright and redistribution limitations see the accompanying copyright notice. The Cplant software distribution is available from http://www.cs.sandia.gov/cplant.

3.3  Unarchiving Cplant and Editing Makefile-common

The customary location for unarchiving the Cplant release is under the user's home directory at:


    $(HOME)/Cplant


by doing something like:

   % cd $(HOME)
   % mkdir Cplant
   % cd Cplant
   % cp cplant-tar-archive .
   % tar xvf cplant-tar-archive

which will result in the creation of a Cplant source tree at

    $(HOME)/Cplant/top.SUFFIX

where SUFFIX is a possibly null string. For the purposes of this document we will assume SUFFIX is null so that the Cplant source subtrees are located under

    $(HOME)/Cplant/top

With this assumption, built binaries for Cplant will ultimately (after doing a ``./Make install'' - see below) reside in


    $(HOME)/Cplant/$ARCH-OS


3.4  Editing top/Makefile-common

Before attempting this build, some configuring may have to be performed by editing $CPLANT/top/Makefile-common. In particular, the default value of the application compilation flag, LINUX_COMPILE, is compaq, which assumes installation of the Linux version of fort, Compaq's Fortran compiler. This flag may instead be set to gnu if a build using g77 is preferred; this is typical, for instance, for an x86 build.


Another common edit to Makefile-common is the LINUX24 variable, which should be set to ``yes'' if the cluster nodes will run a 2.4 version of the Linux kernel.
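
For example, a quick check of these two settings for a GNU-based build against a 2.4 kernel might look as follows (the exact assignment syntax in Makefile-common may differ from this sketch):

    % cd $CPLANT/top
    % grep -E '^LINUX_COMPILE|^LINUX24' Makefile-common
    LINUX_COMPILE = gnu
    LINUX24 = yes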


3.5  The Linux Kernel for Cplant

Cplant code is tied closely to the Linux kernel. Therefore, a prerequisite to building Cplant is obtaining, patching, and configuring an appropriate version of the Linux kernel. The resulting Linux header files will be included in much of the Cplant source code. A complete build of the Cplant-patched Linux kernel is not necessary in order to build Cplant software (although it is required in order to run Cplant software).

Currently, Cplant modifies a small amount of code in relatively current versions of the Linux kernel. Available kernel patches are located in $CPLANT/compute/OS/linux-patches/linux-x.y.z.

The following procedure describes how to set up the Linux kernel. A similar description can be found in the README file in the specific patch subdirectory.


    % cd $CPLANT/top/compute/OS/linux-patches/linux-2.2.10

    % cp /wherever/linux-2.2.10.tar.gz ./ 

    [copy the matching stock linux kernel archive 
     from sunsite, for example, to the working 
     directory.]

    % tar xvfz linux-2.2.10.tar.gz  

    (unarchive the kernel source creating ./linux tree)

    % cp -a axp-cplant/* ./linux

    (Cplant has a mini version of the kernel tree as a
     patch -- copy those files over the stock kernel files)

    % cd linux
 
    % make config  

    (configure the kernel for your hardware. 
     optionally, copy a Cplant config file to .config and do
                                           % make oldconfig )
    % make dep

    % make            (build kernel)

    % make bootpfile  (build bootp-able kernel for the alpha)
    or perhaps
    % make bzImage    (build compressed image for the x86)

    % cp arch/alpha/boot/bootpfile ../ 
    or 
    % cp arch/i386/boot/bzImage ../ 

    (copy kernel image to the patches directory)


Finally, make a link to the just-configured Linux headers in the appropriate location in the Cplant tree:


    % cd $CPLANT/top/compute/OS

    % ln -s linux-patches/linux-2.2.10/linux linux 


3.6  Configuring cplant.h

Prior to a build of Cplant system software an edit of


   $CPLANT/top/include/config/cplant.h


may be required in order to change the value of CPLANT_PATH (or the default value, ``/cplant'' may be used). The value of CPLANT_PATH specifies a directory prefix that is compiled into C code for Cplant run-time utilities. It is also pasted into Cplant start-up scripts during the build process. It is a location (possibly a mount point or symbolic link) common to Cplant compute nodes where the subtree of the Cplant virtual machine (VM) begins - that is, it's the location where nodes in the cluster proper locate their Cplant binaries.

In contrast to this, when the Cplant binaries are eventually installed on an Administration node (see below), they will reside, organized as a VM, in the default location /cplant/vm/default. The subdirectory /cplant/vm/default/nfs-cplant then is the point that should correspond to CPLANT_PATH (default= /cplant) on the cluster nodes (i.e., the Service and Compute nodes - the nodes on the Myrinet network). For example, the Admin node usually exports /cplant/vm/default/nfs-cplant via NFS, and the cluster nodes mount it as /cplant.
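
As a hedged illustration of that arrangement (the hostname ``admin'' and the export options here are placeholders, not Cplant requirements), the admin node might export the VM subtree and each cluster node might mount it as /cplant:

    [on the admin node, add to /etc/exports and re-export]
    /cplant/vm/default/nfs-cplant   *(ro,no_root_squash)
    % exportfs -a

    [on each Service and Compute node]
    % mount admin:/cplant/vm/default/nfs-cplant /cplant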

3.7  Building Cplant Components

Although it is possible to build individual Cplant components by going to the appropriate subdirectory of the Cplant source tree and doing make, the preferred approach for building subsets of components is to use the Make utility found in $CPLANT/top. This utility is a wrapper for make. It allows one to specify that a build should take place with respect to an indicated subtree of the Cplant source tree. For example, in a moment, we'll use ``Make basic'' to perform a build of Cplant's basic system and test code.

The basic tree is used most often because it encompasses nearly everything, aside from the Linux kernel, required to install and test a functioning Cplant: Linux modules, system load utilities, Myrinet Control Programs (MCPs), Myrinet startup and diagnostic utilities, Cplant startup scripts, and simple test applications. What it does not build is the Mpich library over Portals 3, or the MPI-based test applications that use the ported Mpich library (but see the MPI section below, where Make mpi is invoked).

Prior to doing a basic build, it is important that configured Linux header files have been established (as described above in the section on building the Cplant/Linux kernel), that $CPLANT/top/compute/OS/linux has been created as a link which points to the directory that contains the header files - normally


    $CPLANT/top/compute/OS/linux-patches/linux-x.y.z/linux,


and that CPLANT_PATH has been configured in


    $CPLANT/top/include/config/cplant.h,


as described earlier.

Then to build Cplant system software and basic tests do:


    % cd $CPLANT/top
    % ./Make basic 


To collect the built binaries then do:


    % ./Make basic install 


This will collect binaries in a tree rooted at $CPLANT/$ARCH-OS (as determined by Makefile-common). In the case of an Alpha/Linux build this would be $CPLANT/alpha-linux:


    % cd $CPLANT/alpha-linux
    % ls
   system build

    % cd system
    % ls
   bin     etc.in   mcps     nfsroot   vm
   cplant  kernels  misc     sbin
   etc     man      modules  test


To delete the basic binaries from their original (preinstalled) locations do:


    % ./Make basic clean


Later, we will return to discuss building tests of MPI messaging (again utilizing Make). Next we discuss the automated installation of system software, i.e., creation of the Cplant VM (virtual machine).


4  Installing Cplant Software on the Admin Node - the Cplant VM


Installation overview:


This section describes transferring Cplant binaries to an administration node and setting up the Virtual Machine (VM) file structure on that host. The Virtual Machine is a collection of Cplant binaries intended to run on a section of Cplant hardware - normally, one divides up the available hardware and runs an individually configured instance of Cplant software (a VM) on each division. Each division, or VM, has its own set of Service nodes, its own set of Compute nodes, its own instance of the runtime utilities, bebopd, etc.

The first step is to archive the Cplant binaries which, depending on which builds have taken place so far, may exist in one or several subdirectories of $CPLANT. Archive the Cplant installed binaries, while leaving out ``top'':


      % cd $CPLANT/$ARCH-OS   (i.e., just above system)
      % tar cvf system.tar system


Then copy and unarchive system.tar to an ``admin'' node (which, in the case of a PROTOtype build, might just be a machine that exports a file system to a small set of similar test machines). A customary location for the binaries, and the obligatory location of the VM trees, on the admin node is /cplant/vm. On the admin node:


     % cd /somewhere  (/cplant/vm, perhaps) 

     % tar xvf system.tar


This replicates the Cplant installed binaries (the ``system'' tree) on the admin node.

The next step requires the existence/writability of the directory /cplant/vm (even if it was not used in the previous step). This is the hardcoded location under which the default Cplant VM tree is installed using the vm.pl script in the ``system/vm'' directory on the admin node:


     % cd system/vm
     % ./vm.pl PROTO default


This will install files into /cplant/vm/default. In addition, files in /cplant/vm/default/etc.in with the .PROTO extension will get copied into /cplant/vm/default/etc - the location of the Cplant configuration files. That is, we just set up a VM called ``default'' and gave it a partial PROTOtype configuration.

Note that repeated invocations of vm.pl create a new /cplant/vm/default directory with the previous directory backed up to the same location, but with an added process-id suffix.

4.1  Default Archives, Myrinet Routes, and the Cplant Map File

The normal location in the VM tree for Myrinet routing files used by the Cplant startup scripts is


  /cplant/vm/default/nfs-cplant/routes , 


and a set of default Myrinet routes is generated in this location by the invocation of ``vm.pl PROTO...''.

New non-prototype installations based on Myrinet, on the other hand, require the generation of Myrinet route files. Some notes on generating Myrinet routes using Myricom's GM are included in the Cplant release tree as


  $CPLANT/top/support/cplant/Documentation/HowToMapMyrinet


Also see the Cplant website.

Regardless of the method used to generate Myrinet routes, a Cplant map file must be generated. This file is used to generate a hostname-to-nodeID mapping for use by the job launcher yod and the query utility pingd in displaying information about Cplant jobs. This map file links node IDs to hostnames by virtue of the position of the hostname string in the file. That is, a given hostname maps to node ID i if it is the entry at line i+1 in the map file.

Default versions of the Cplant map can be found in the VM in /cplant/vm/default/etc.in. These files are named cplant-map.target-name (for example, cplant-map.PROTO) and are copied, minus the extension, to the default VM by vm.pl. Therefore, the file /cplant/vm/default/etc/cplant-map should now exist:


   % cat /cplant/vm/default/etc/cplant-map
   c-0.SU-0
   c-1.SU-0
   c-2.SU-0
   c-3.SU-0
   c-4.SU-0
   c-5.SU-0
   c-6.SU-0
   c-7.SU-0

which shows a list of compute node names. Because of the implicit ordering provided by this listing we observe that in our PROTOtype system, the node with hostname c-0.SU-0 will have node ID 0, the node with hostname c-1.SU-0 will have node ID 1, and so on.
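
As a quick illustration of the line-to-ID rule, pulling out a single line of the map file shows the hostname for a given node ID (line i+1 corresponds to node ID i):

    % awk 'NR==3' /cplant/vm/default/etc/cplant-map     (line 3, i.e. node ID 2)
    c-2.SU-0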

It is required at this time that these specific host names be used for all the nodes in the prototype cluster. Note that the same naming scheme is used with respect to the Myrinet route files:

   % ls /cplant/vm/default/routes
   c-0.SU-0
   c-1.SU-0
   c-2.SU-0
   c-3.SU-0
   c-4.SU-0
   c-5.SU-0
   c-6.SU-0
   c-7.SU-0

Based on the contents of these files, the same ordering should be used when connecting the nodes to consecutive ports of the Myrinet switch: c-0.SU-0 connected to port 0, c-1.SU-0 connected to port 1, and so on. If only a few nodes are being set up (3 is the recommended minimum, allowing 1 Service node and 2 Compute nodes), it is only required that the relative port connections be maintained. For example, an installation with just hosts c-4.SU-0, c-6.SU-0, and c-7.SU-0 could connect to ports 0, 2, and 3 respectively. Still, it is recommended that one include c-0 in the setup since it is configured by default as the service node.

5  Configuring the VM and Running the Cplant PROTOtype

In this section we describe the final configuration of the Cplant VM and the running of Cplant software on a small prototype system. The goal is to describe in principle how Cplant works, as a starting point from which a full-scale installation may be extrapolated.

Configuration of the default VM is done principally in the etc directory: /cplant/vm/default/nfs-cplant/etc (on the admin node). This is the location for the Cplant start-up scripts on the admin node. Recall that for the cluster nodes we assume that /cplant points to the above ``nfs-cplant'' location. Thus, on the compute (and service) nodes startup happens in the directory /cplant/etc.

Accordingly, Cplant software is started on a compute node by giving the /cplant/etc/cplant script the start option:


    % /cplant/etc/cplant start


Likewise, Cplant software can be ``brought down'' by doing:


    % /cplant/etc/cplant stop


However, before giving these commands, some additional work needs to be done to configure the individual cluster nodes. In particular, we need to assign the nodes their roles in the cluster. To do this we use the following files (on the admin node):

      /cplant/vm/default/nfs-cplant/etc/vm-config,  
      /cplant/vm/default/nfs-cplant/etc/types
      /cplant/vm/default/nfs-cplant/etc/cplant-host

Each node attempts to match its hostname with a line in vm-config. For example, vm-config for the PROTO installation has:


    c-0.SU-0:ptl-config= user-env=bebopd
    c-1.SU-0:ptl-config= user-env=pct
    c-2.SU-0:ptl-config= user-env=pct
     .
     .
     .


which says that node c-0 is being configured as a Service node (it runs the node allocation daemon, bebopd) and the other nodes are being configured as Compute nodes (they run a process control thread, or pct). According to vm-config, the way that the bebopd and the pct get started on these nodes is by running the user-env script with the corresponding argument. Also according to this vm-config, all nodes run the ptl-config script without any arguments. Although vm-config takes precedence, alternative node configuration can be performed using the (default) types file; see the script


  /cplant/vm/default/nfs-cplant/etc/get-cfg


for the details on obtaining node configurations from these two files.

The other important configuration file we have to worry about is cplant-host, in the same etc directory. This file is used to advertise to the nodes in general which node has the role of the ``bebopd'' node. To do this, one simply enters the node ID of that node on a line of the file:

      bebopd  0        # node zero is the Service/Bebopd node,

which is typical for the PROTOtype installation.

Getting the whole Cplant process started is usually done at boot time by having the /cplant/etc/cplant script run as an addition to the family of standard Linux startup scripts (i.e., an appropriate symbolic link to this script is added to /etc/rc.d/rc3.d).
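
A hedged example of wiring this in on a cluster node (the exact link name and runlevel directory depend on the Linux distribution in use):

    % ln -s /cplant/etc/cplant /etc/rc.d/rc3.d/S99cplant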

Based on this overview, once the VM has been installed, Myrinet route files have been established in the VM's ``routes'' directory, and a cplant-map file has been created in the VM's ``etc'' directory, the steps for running a Cplant prototype are:

Designate a service node along with one or more compute nodes with hostnames c-0.SU-0, c-1.SU-0, etc. On these nodes provide access to


   /cplant/vm/default/nfs-cplant 


via the path designated by CPLANT_PATH - probably just ``/cplant'', but see the source code's


   $CPLANT/top/include/config/cplant.h.


Make sure that hostname designations match the role specified for each node in


    /cplant/vm/default/nfs-cplant/etc/vm-config.


On the admin node, edit


/cplant/vm/default/nfs-cplant/etc/cplant-host


to advertise the node id of the service node, i.e., the node running the node allocator bebopd. The PROTOtype default is node 0.


If using Myrinet and the PROTO route files, connect these nodes to a Myrinet switch, preferably with node c-x.SU-0 connected to port x (in correspondence with the PROTO routing files).


Boot service and compute nodes into the Cplant/Linux kernel built previously.


Start Cplant on the service node (c-0.SU-0) by doing /cplant/etc/cplant start


Start Cplant on each compute node by doing /cplant/etc/cplant start

(or by including this in the set of Linux startup scripts).


On each compute node, check that node IDs have been assigned properly by doing /cplant/sbin/getNid.


On each node, test Myrinet connectivity and basic card integrity by doing a low-level Myrinet vping of neighbors: /cplant/sbin/vping node-id, where ``node-id'' is 0, 1, ....
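
For example, from a compute node one might verify its own node ID and then vping the service node (node ID 0); output is omitted here since its format varies:

    % /cplant/sbin/getNid        (report this node's Cplant node ID)
    % /cplant/sbin/vping 0       (low-level Myrinet ping of node ID 0)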


Check integrity of the run-time utilities: use ps aux to check for a running bebopd on the service node and for pcts on the compute nodes. Then on the service node do /cplant/sbin/pingd. pingd should query bebopd and output a list of running pcts:


  Awaiting status from bebopd...
  Awaiting pct list from bebopd

           node             Job ID  SPID/rank  
  ------------------------- ------ -----------
   0 (    SU-000 n-00    )
   1 (    SU-000 n-01    )
   2 (    SU-000 n-02    )

  Total: 3
  Total busy: 0
  Total free: 3


Run prototype Cplant test application via yod on the service node:


  /cplant/bin/yod -sz size /cplant/test/tshello, 


where size is the number of processes in this ``parallel'' application. yod should print ``hello'' for each of the size processes.

Notice that the location of the Cplant application binary (tshello here) is specific to the Service node only. Application binaries only need to be available on a single node, the one where users log in to run their jobs; the Cplant runtime takes care of distributing the binary to the set of Compute nodes for that job. In this particular instance, the application binary resides in the Cplant VM, but that is only because tshello is one of the standard test apps that are part of the basic build.
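
For instance, a user application that exists only in a home directory on the Service node (the path below is hypothetical) could be launched the same way, with the runtime shipping the binary to the allocated compute nodes:

    % /cplant/bin/yod -sz 4 $(HOME)/myapps/myprog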


Here is the output from tshello:

  Contacting node allocation daemon...
  3 nodes are allocated to your job ID 121.
  Awaiting synchronization of compute nodes before 
  beginning user code.
  Application processes begin user code.

    hello from compute node 0 / 114
    hello from compute node 1 / 112
    hello from compute node 2 / 111

   Node name     Rank  Node  SPID   Elapsed  Exit Code  
---------------- ----  ----  -----  -------  --------- 
  SU-001 n-03       0    11    236  0:00:02       0   
  SU-001 n-04       1    12    232  0:00:02       0  
  SU-001 n-05       2    13    230  0:00:02       0 


And here is the source code for tshello, one of the test applications that gets compiled as part of the basic build:


  #include "stdio.h"
  #include "puma.h"

  int main()
  {
    printf("hello from compute node %d / %d\n", 
                                       _my_nid, _my_ppid);
    return 0;
  }


Note that in the case of printing to stdout, the action is forwarded to yod, which actually does the printing. In fact, this is what happens for normal file I/O as well: the request is forwarded to yod, which performs the action with respect to a file local to the service node. Support for ways of handling I/O other than serially through yod (in particular, fyod and enfs) is built into Cplant's application libraries. These alternate paths are invoked by prefixing filenames with an I/O protocol designation. Since yod is the default I/O path, in this case the prefix is the null prefix. The source code for Cplant's I/O protocol switch is located in the directory


   $CPLANT/top/compute/lib/apps


Cplant's existing I/O support, and how to add new support, is described in detail in a separate Cplant manual.

6  Building and Running Cplant MPI Applications

Included in the basic build of Cplant system components (described above) is the build of a handful of non-MPI test applications. Here we describe how to build some MPI-based test applications. Simple MPI tests and benchmarks are contained in the Cplant release under


   $CPLANT/top/compute/test/current/mpich-VERSION


In addition to linking with libraries that are compiled under a basic build, these tests require linking with the Portals-based MPI library. This library, along with the MPI test apps, can be built with


    % cd $CPLANT/top
    % ./Make mpi  
    [or ./Make usrsw -- to get NAS benchmarks]


Note that before attempting this build some configuring may have to be performed by editing $CPLANT/top/Makefile-common. In particular, the default value of the application compilation flag, LINUX_COMPILE, is compaq, which assumes installation of the Linux version of fort, Compaq's Fortran compiler. This flag may instead be set to gnu if a build using g77 is preferred.

The directory


     $CPLANT/top/compute/test/current/mpi/simple 


contains a number of simple MPI tests that do not require command line options. The built programs reside in the $ARCH-OS subdirectory of simple (a common Cplant build convention) and should be copied to the test subdirectory of nfs-cplant in the VM on the admin node. ring, for example, sends data around a logical ring of nodes; it can be run on any number of processes:


   %  /cplant/bin/yod -sz 2 /cplant/test/ring

   Contacting node allocation daemon...
   2 nodes are allocated to your job ID 122.
   Awaiting synchronization of compute nodes before 
   beginning user code.
   Application processes begin user code.

   process 1: number of processes= 2
   process 0: number of processes= 2
   token= 2000.000000l, matches TIMES_AROUND*num_nodes 
                                      (things look ok).
   doing long send...
   ...passed long send test.

   Node name     Rank  Node  SPID   Elapsed  Exit Code
---------------- ----  ----  -----  -------  ---------
  SU-001 n-03       0    11    238  0:00:03       0  
  SU-001 n-04       1    12    234  0:00:03       0 


ring tests both the MPI short (messages under roughly 8 KB) and long send protocols.
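
As noted above, the built test binaries must be copied into the VM's test directory on the admin node before yod can find them under /cplant/test; for example (a sketch assuming the default VM and that the binaries have already been transferred to the admin node):

    % cp ring /cplant/vm/default/nfs-cplant/test/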

The directory


  $CPLANT/top/compute/test/current/mpi/nas 


contains a number of NAS benchmarks, which are for the most part numerical solvers that have been parallelized using MPI. For these tests, the job size (number of processes/nodes) is compiled into the program executable. See config/suite.def there for an example of how one indicates which tests of which size are to be built. The built tests reside in the bin directory in that tree. All the NAS benchmarks are run without command line options.
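
For reference, entries in config/suite.def name a benchmark, a problem class, and a process count, one per line; the sketch below follows the standard NAS NPB 2.x format, so consult the comments in suite.def itself for the authoritative syntax:

    % cat config/suite.def
    # benchmark  class  nprocs
    cg  A  4
    mg  A  4
    ep  A  8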


7  Setting Up a Cplant Compile Environment

Compilation of applications contained within the Cplant release is performed via scripts that are wrappers for common compilers. As part of the basic build, a set of compile scripts is provided in $CPLANT/top/compute/bin/$ARCH-OS. These scripts are used to compile the test applications during the basic build.

Setting up a compile environment for Cplant applications is a simple matter of copying the pertinent compile scripts from the above location, making some minor modifications, and seeing that the appropriate libraries and header files are installed. Lists of required libraries are contained in the compile scripts. Normally, these libraries will be found with the installed binaries (on the build host) under


   $CPLANT/$ARCH-OS/build/lib ,


for example, after a build and ``make install''. Similarly, required header files would be found after a build in


   $CPLANT/$ARCH-OS/build/include .


As an alternative, ``Make compile'' can be used to construct an appropriate set of compiler wrappers for Cplant. After doing ``Make compile install'' the results (cc, f77, f90) would be found in

   $CPLANT/$ARCH-OS/build/bin .


Typically, one needs to place these files along with the Cplant libraries in a designated directory tree location on a compile server. Once the location is chosen, it needs to be edited into the compile wrappers as a required shell variable.
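
A hedged example of using the installed wrappers from a compile server, assuming the tree has been copied to the hypothetical location /usr/local/cplant and the wrappers' install-location shell variable has been edited to point there:

    % /usr/local/cplant/bin/cc  -o myapp  myapp.c      (compile and link a C application)
    % /usr/local/cplant/bin/f77 -o myfapp myfapp.f     (compile and link a Fortran 77 application)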

8  Supporting Documentation

We list some resources, aside from the accompanying Cplant manuals, which may be helpful:

    The Cplant web site: http://www.cs.sandia.gov/cplant

    Notes included in the release tree under
    $CPLANT/top/support/cplant/Documentation (for example, HowToMapMyrinet)

