bebopd -- Cplant node allocation daemon
bebopd [-D] [-S [1|0]] [-L [1|0]] [-daemon] [-alternative] [-r optional-file-name]
[-help] [-PBSsupport] [-PBSupdate] [-PBSinteractive numNodes]
The bebopd daemon runs in the service partition. It is the point in
the Cplant where knowledge of compute node status resides. It has the following
interfaces:
- PCTs
- The bebopd receives messages from the compute node PCTs when they start
and end, and when an application terminates. If the bebopd is restarted,
it contacts the PCTs to identify itself to them. The bebopd sends status
queries as needed to the PCTs and maintains the responses.
- yod
- The bebopd accepts yod requests on behalf of users wishing to run a parallel
application. The bebopd attempts to allocate the requested nodes to the
job, and assigns a numeric job ID to the application.
- pingd
- The bebopd also accepts pingd requests for updates from the compute partition,
and returns to pingd a list of compute node status information. It accepts
requests from pingd to send a SIGTERM or a SIGKILL to an application, to kill
PCTs, or to note that a PCT it believed to be running is gone. The bebopd may
also receive requests from pingd to turn PBSsupport or PBSupdate on or off,
or to change the number of nodes reserved for interactive (i.e. non-PBS)
use.
- PBS server
- When the bebopd is run in PBSupdate mode, it updates
the PBS server whenever the number of live compute nodes changes. That
is, it uses the PBS qmgr client to keep the resources_available.size and
resources_max.size attributes of the PBS server accurate.
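The PBSupdate behaviour can be pictured with a minimal sketch. It only builds qmgr command lines and does not run them. The helper name qmgr_size_commands is hypothetical, and the assumption that both attributes are set to the live node count is ours, not something this page states; the exact commands bebopd issues may differ.

```python
from typing import List


def qmgr_size_commands(live_nodes: int) -> List[List[str]]:
    """Build qmgr invocations that set both size attributes on the PBS server.

    Assumption: both attributes track the number of live compute nodes.
    """
    if live_nodes < 0:
        raise ValueError("live_nodes must be non-negative")
    attributes = ("resources_available.size", "resources_max.size")
    # Each command is an argv list, suitable for subprocess.run(cmd, check=True).
    return [
        ["qmgr", "-c", f"set server {attr} = {live_nodes}"]
        for attr in attributes
    ]
```

Passing each returned list to subprocess.run on a host with the PBS client installed would apply the update; keeping command construction separate from execution makes the logic easy to check without a PBS server.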
The bebopd as designed today exists as a single process on
one node of the service partition.
The plan is to run bebopd as a distributed service across the service
partition, both in the interest of fault tolerance and to improve response
time to yod and pingd users.
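As a rough illustration of the bookkeeping described for the PCT and yod interfaces, the following toy model tracks which nodes are free and which belong to a job. It is a sketch under our own assumptions, not the bebopd's data structures: the class NodeTable, its method names, the lowest-numbered-first allocation order, and job IDs counting up from 1 are all hypothetical.

```python
from itertools import count
from typing import Dict, Optional, Set, Tuple


class NodeTable:
    """Toy record of compute node status (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.free: Set[int] = set()            # nodes with a live, idle PCT
        self.jobs: Dict[int, Set[int]] = {}    # job ID -> nodes allocated to it
        self._job_ids = count(1)               # assumption: IDs count up from 1

    def pct_started(self, node: int) -> None:
        # A PCT reported in: the node becomes available.
        self.free.add(node)

    def pct_ended(self, node: int) -> None:
        # A PCT went away: stop offering the node (busy nodes are not handled here).
        self.free.discard(node)

    def allocate(self, num_nodes: int) -> Optional[Tuple[int, Set[int]]]:
        # A yod request: grant the nodes and a numeric job ID, or refuse.
        if num_nodes > len(self.free):
            return None
        nodes = set(sorted(self.free)[:num_nodes])
        self.free -= nodes
        job_id = next(self._job_ids)
        self.jobs[job_id] = nodes
        return job_id, nodes

    def app_terminated(self, job_id: int) -> None:
        # The application ended: its nodes become free again.
        self.free |= self.jobs.pop(job_id, set())
```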
- -alternative
- Every portals process has a portal ID. It is this ID
that the portals module uses when dispatching received messages to processes.
For testing purposes we may want to run another bebopd on the same node.
This argument causes the bebopd to request an unused portal ID from the
portals module. The bebopd will display its alternative portal ID on
startup.
- -D
- This option causes the bebopd to output information about
what it is doing. Repeating the -D option on the command line increases
the amount of information.
- -S [0|1]
- The bebopd outputs warnings
and errors, and, if the -D option is used, status information. The 0 switch
turns off all output from the bebopd to stderr. The 1 switch turns it on.
By default, the bebopd does not write to stderr.
- -L [0|1]
- The bebopd outputs warnings and errors, and, if the
-D option is used, status information. The 0 switch turns off all output
from the bebopd to the log file. The 1 switch turns it on. By default,
the bebopd writes to the log file.
- -r optional-file-name
- This option
specifies that the bebopd is being restarted. The bebopd always saves a
file (CRsaved_pct_list in the same directory as the bebopd registry file)
containing a list of active PCTs when it exits. When bebopd restarts, it
reads in this file and contacts the PCTs for their status. If an optional-file-name
is given, the bebopd will look there for the PCT list instead of in the
CRsaved_pct_list file.
- -help
- This option displays the list of bebopd
options.
- -daemon
- This option runs the bebopd in the background. The
default is to run the bebopd as a foreground process.
- -PBSsupport
- -PBSupdate
- PBS (Portable Batch System) on Cplant requires support from the bebopd.
The bebopd is running in PBSsupport mode if it is keeping track of the
number of live compute nodes in the machine and policing PBS users to
ensure they use no more nodes than they were allocated. The bebopd is
running in PBSupdate mode if in addition it sends updates to the PBS server
whenever the number of live compute nodes changes. These two arguments
can be used to turn on PBSsupport or to turn on PBSupdate. Since PBSupdate
implies PBSsupport, turning on PBSupdate automatically turns on PBSsupport.
- -PBSinteractive numNodes
- The bebopd can reserve numNodes nodes for
interactive use. PBS will not be able to schedule these nodes for batch
jobs.
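The way the options above combine can be summarised in a short sketch. It is not the bebopd's own argument parser: the names BebopdOptions, parse_options and pct_list_path are hypothetical, the registry file path is a caller-supplied placeholder, and treating a bare -S or -L as "on" is an assumption. The defaults (no stderr output, log file output on, foreground) and the rule that -PBSupdate implies -PBSsupport follow the descriptions above.

```python
import os
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BebopdOptions:
    debug_level: int = 0                # number of -D flags given
    to_stderr: bool = False             # -S: off by default
    to_log: bool = True                 # -L: on by default
    daemon: bool = False                # -daemon: foreground by default
    pbs_support: bool = False
    pbs_update: bool = False
    interactive_nodes: int = 0          # -PBSinteractive numNodes
    restarting: bool = False            # -r was given
    restart_file: Optional[str] = None  # optional-file-name after -r


def parse_options(argv: List[str]) -> BebopdOptions:
    """Walk argv once; options not modelled here (-alternative, -help) are skipped."""
    opts = BebopdOptions()
    i = 0
    while i < len(argv):
        arg = argv[i]
        nxt = argv[i + 1] if i + 1 < len(argv) else None
        if arg == "-D":
            opts.debug_level += 1       # repeating -D raises verbosity
        elif arg in ("-S", "-L"):
            on = True                   # assumption: a bare switch means "on"
            if nxt in ("0", "1"):
                on = nxt == "1"
                i += 1
            if arg == "-S":
                opts.to_stderr = on
            else:
                opts.to_log = on
        elif arg == "-daemon":
            opts.daemon = True
        elif arg == "-PBSsupport":
            opts.pbs_support = True
        elif arg == "-PBSupdate":
            opts.pbs_update = True
            opts.pbs_support = True     # PBSupdate implies PBSsupport
        elif arg == "-PBSinteractive":
            if nxt is None:
                raise ValueError("-PBSinteractive needs numNodes")
            opts.interactive_nodes = int(nxt)
            i += 1
        elif arg == "-r":
            opts.restarting = True
            if nxt is not None and not nxt.startswith("-"):
                opts.restart_file = nxt
                i += 1
        i += 1
    return opts


def pct_list_path(opts: BebopdOptions, registry_file: str) -> str:
    """Pick the PCT list to read on restart: the -r file name if given,
    else CRsaved_pct_list next to the registry file."""
    if opts.restart_file:
        return opts.restart_file
    return os.path.join(os.path.dirname(registry_file), "CRsaved_pct_list")
```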
Errors and warnings are logged to /var/log/cplant on the node hosting
the bebopd.
On receiving a SIGUSR1 or SIGUSR2, the bebopd will write to the log
file its identifying information and what routine it is in. On receiving
a SIGHUP, the bebopd will close and reopen its log file, list identifying
information to the log file, and re-read the site file.
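An administrator's helper for sending these signals might look like the sketch below. The function names are hypothetical, and how the bebopd's process ID is obtained is left to the caller since this page does not say. The kill parameter defaults to os.kill and exists only so the helpers can be exercised without a running daemon.

```python
import os
import signal
from typing import Callable

Kill = Callable[[int, int], None]


def request_status_line(pid: int, kill: Kill = os.kill) -> None:
    # SIGUSR1 (or SIGUSR2): log identifying information and the current routine.
    kill(pid, signal.SIGUSR1)


def request_reload(pid: int, kill: Kill = os.kill) -> None:
    # SIGHUP: close and reopen the log file, then re-read the site file.
    kill(pid, signal.SIGHUP)
```

The effect of either call would then be visible in /var/log/cplant on the node hosting the bebopd.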
- /etc/local/saved_pct_list
- This file lists all PCTs that were active
when the last bebopd terminated.
- /etc/local/site
- This file defines
site-specific information that may be required by the bebopd.
- /var/log/cplant
- This is the log file where Cplant daemons and utilities log status.
pingd
yod
pct
site
Let us know if you locate any bugs (cplant-help@cs.sandia.gov).