NAME

mpirun.cit - start up MPI jobs, leveraging the CIToolkit


MODULE

myrinet


SYNOPSIS

mpirun.cit --np <n> "command" [ --gm-np <n> <command> ] [ --help | --gm-h ] [ --db <dbname> | --nodb ] [ --gm-f <conf_file> ] [ --gm-nf <node_conf_file> ] [ --gm-w <seconds> ] [ --gm-kill <seconds> ] [ --gm-v | --debug <n> ] [ --gm-d <n> | --totalview ] [ --gm-copy | --gm-nocopy ] [ --nostatus ] [ --cd [<dir>] ] [ --tombstone ]


DESCRIPTION

This script launches MPI jobs, leveraging the CIToolkit configuration database to determine if there is a "leader" hierarchy to use to farm out the work.

The other major difference between this and other versions of mpirun is in how the executable is located and the current working directory on the compute node is determined. If no path to the executable is given, mpirun.cit searches for it on the admin node, and then uses the same path on the compute nodes. If a full path is given, it is assumed to be the path on the compute nodes, not the admin node.
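The lookup rule above can be sketched in shell. This is an illustration only, not code from mpirun.cit: the function name resolve_exe is hypothetical, the search on the admin node is assumed to be an ordinary PATH lookup, and any argument containing a slash is treated as "a path was given".

```shell
# Hypothetical helper, not part of mpirun.cit: decide which path to run.
resolve_exe() {
    case $1 in
        */*)
            # A path was given: use it unchanged on the compute nodes.
            printf '%s\n' "$1" ;;
        *)
            # Bare name: look it up locally (assumed here to be a PATH
            # search) and reuse the resulting path on the compute nodes.
            command -v "$1" ;;
    esac
}

resolve_exe /cluster/rte/bin/mpi_init   # prints the path unchanged
resolve_exe sh                          # prints wherever sh is found locally
```

A consequence worth noting: a bare command name only works if the compute nodes have the executable at the same path as the admin node, unless --gm-copy is used.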

The working dir on the compute nodes is the directory of the executable, unless the --cd option is used. Also, if the --gm-copy option is used, then the default working dir will be /tmp.
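As a rough sketch of how these rules combine with the --cd option described under OPTIONS, the following shell function picks a working directory. The function name compute_workdir and its argument layout are hypothetical and exist only for illustration; the first argument is taken to be the already-resolved path of the executable.

```shell
# Hypothetical helper, not part of mpirun.cit.
# Usage: compute_workdir <exe_path> <copy: yes|no> <cd_mode: none|cwd|dir> [<dir>]
compute_workdir() {
    exe=$1
    copy=$2
    cd_mode=$3
    dir=${4:-}
    case $cd_mode in
        dir)
            # --cd <dir>: the explicit directory wins.
            printf '%s\n' "$dir" ;;
        cwd)
            # --cd with no argument: use the current working directory.
            pwd ;;
        *)
            if [ "$copy" = yes ]; then
                # --gm-copy without --cd: default to /tmp.
                printf '%s\n' /tmp
            else
                # Default: the directory that holds the executable.
                dirname "$exe"
            fi ;;
    esac
}

compute_workdir /cluster/rte/bin/xhpl no none    # /cluster/rte/bin
compute_workdir myprog yes none                  # /tmp
compute_workdir myprog yes dir /cluster          # /cluster
```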


OPTIONS

Options can be specified on the command line, or through the environment variable MPIENV.

--np <n> "command" run command (MPI program) on first <n> nodes listed in gmpi conf file. The quotes are not necessary unless you are passing options to your command that are the same as options to mpirun.cit.

--gm-np <n> "command" same as --np.

--gm-f <conf_file> specify the path to the gmpi conf configuration file (on the current node). If not specified, use GMPICONF env var, or look in GMConf.pm.

--gm-nf <node_conf_file> specify the path to the gmpi conf configuration file ON THE COMPUTE NODES. If not given, use NODE_GMPICONF env var, or default to the path being used on the current node.

--gm-w <n> wait n secs between starting each process

--gm-kill <n> n secs after first process exits, kill all other processes.

--db <dbstring> cluster database to use.

--nodb don't connect to the database. Don't try to determine leaders. Just connect directly to all the compute nodes from the current node.

--debug <n> print debugging information to STDERR. n=1..5, higher numbers produce more verbose output.

--gm-v be verbose. equivalent to --debug 1.

--help, --gm-h Print this manpage.

--gm-d <n> runs the debugger gdb on processor n. (NOT FULLY TESTED...)

--totalview, --tv runs the TotalView MPI debugger on all nodes. (NOT FULLY TESTED...)

--nostatus Don't check the status of compute nodes before running rsh to start up the job. This greatly speeds up launch times, but may make launches less reliable.

--gm-copy Copy the executable to each node, instead of assuming it is already there. Also copies the gmpi conf file.

--gm-nocopy Opposite of --gm-copy (default)

--cd [<dir>] Use <dir> as the working directory on the compute nodes, instead of the dir of the executable (or /tmp if --gm-copy is specified.) If --cd is specified with no argument, the current working dir is used.

--tombstone Run gdb_tombstone_wrapper to dump a stack trace on process crash. Usually it is easier to use than other debugging tools, and it will help you answer most debugging questions.


NOTES

mpirun.cit supports starting multiple binaries by specifying --np multiple times on the command line. Follow the option each time with the number of processors, the command path, and the command arguments. For example, the following command will run echo -n hello on the first five nodes in the gmpi conf file and foo_cmd on the next two nodes:

  # mpirun.cit --np 5 "echo -n hello" --np 2 foo_cmd

For compatibility with other versions of mpirun, you do not have to follow the number of processors with the command, as long as the command to run is somewhere on the command line. After all the options are parsed, any remaining arguments on the command line are added to the --np specification that is missing its command. So the following, though strange, should do the same as above, with additional verbose output:

  # mpirun.cit echo -n --np 5 --np 2 foo_cmd --gm-v hello

The environment variables are set on each compute node as follows:

  Standard shell variables are set.
  GMPI_CONF contains the path to the gmpi conf config file
  GMPI_OPTS contains "mX,nY"
    (where X is the rank and Y is the total number of nodes)
  DISPLAY, PBS_CHECKPOINT_FILE, PBS_CHECKPOINT_RESTART,
    LD_LIBRARY_PATH, and GMPI_SHMEM_FILE are all
    exported, if they exist on the admin node.
  Variables listed in the GMPIENVVAR environment variable
    (separated by ":") are exported, if they exist on the
    admin node.
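
A short shell sketch of how these variables might be used. The variable names placed in GMPIENVVAR are arbitrary examples, and parse_gmpi_opts is a hypothetical helper that assumes GMPI_OPTS holds exactly the "mX,nY" form described above and nothing else.

```shell
# On the admin node: ask for two extra variables to be exported.
# The names are examples only; list whatever your program needs.
GMPIENVVAR="OMP_NUM_THREADS:MY_APP_MODE"
export GMPIENVVAR

# Hypothetical helper for a wrapper script on a compute node:
# split a value of the form "mX,nY" into rank and node count.
parse_gmpi_opts() {
    opts=$1
    rank=${opts%%,*}     # "m3,n8" -> "m3"
    rank=${rank#m}       # "m3"    -> "3"
    total=${opts##*,}    # "m3,n8" -> "n8"
    total=${total#n}     # "n8"    -> "8"
    printf 'rank=%s total=%s\n' "$rank" "$total"
}

parse_gmpi_opts "m3,n8"   # prints: rank=3 total=8
```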


FILES

The default path to the gmpi conf file is specified in the GMPICONF environment variable, or in $CLUSTER_CONFIG/GMConf.pm
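
The lookup order for the conf file, as described here and under --gm-f and --gm-nf, can be sketched as follows. Both function names are hypothetical, and FALLBACK_CONF is a stand-in for whatever value GMConf.pm supplies; this sketch does not read GMConf.pm itself.

```shell
# Hypothetical helpers, not part of mpirun.cit.

# Conf path on the current node.
# $1 is the value passed to --gm-f, or empty if the option was not given.
resolve_gmpi_conf() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "${GMPICONF:-}" ]; then
        printf '%s\n' "$GMPICONF"
    else
        # Stand-in for the default that GMConf.pm would provide.
        printf '%s\n' "${FALLBACK_CONF:-}"
    fi
}

# Conf path on the compute nodes.
# $1 is the value passed to --gm-nf, $2 the value passed to --gm-f.
resolve_node_gmpi_conf() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "${NODE_GMPICONF:-}" ]; then
        printf '%s\n' "$NODE_GMPICONF"
    else
        # Fall back to the path in use on the current node.
        resolve_gmpi_conf "$2"
    fi
}
```

An explicit option always wins over the environment, so a one-off run with a different conf file does not require unsetting GMPICONF or NODE_GMPICONF first.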


EXAMPLES

Run 'hostname' on the first four nodes in the gmpi conf file:

  # mpirun.cit --np 4 hostname

Start a simple mpi job that lives in /cluster/rte/bin on the computes:

  # mpirun.cit --np 4 "/cluster/rte/bin/mpi_init"

Run xhpl on the computes, with /home/xhplconfigs as the current dir, so that it can find the HPL.dat file:

  # mpirun.cit --np 4 "/cluster/rte/bin/xhpl" --cd /home/xhplconfigs

Copy 'myprog' to /tmp on each node and run it in /cluster:

  # mpirun.cit --gm-copy --np 4 myprog --cd /cluster


SEE ALSO

mpirun_drvr, mpirun