Next: Cplant runtime components
Up: PBS for Cplant
Previous: PBS for Cplant
  Contents
  Index
The software components of a PBS system are:
- the PBS server
The PBS server (pbs_server) is the heart of the PBS system. A
single server runs on one of the Cplant service nodes in a virtual
machine. The server's behavior is governed by attributes, which
may be set by authorized users with the qmgr program.
Users submit requests to the server to add jobs to a queue, delete jobs,
alter jobs, and so on.
- the PBS scheduler
The PBS scheduler (pbs_sched) is a single daemon process that evaluates
jobs in the queues and selects those that will run next. PBS
sites that wish to alter their scheduling policy may do so by
rewriting the scheduler.
- the PBS MOM processes
The PBS MOM (machine oriented miniserver) process (pbs_mom)
is the process that starts the user's job script
and ensures that it completes within its allotted time. We run
one PBS MOM on each service node on which we wish to run job
scripts.
- PBS client programs
To submit a job to a queue, users run qsub. To check the
status of jobs in a queue, users run qstat. To change the
operating parameters (called attributes) of the PBS server,
administrators run qmgr. These are all client programs that
send requests to the PBS server. A complete list of these may be
found at [1].
The life-cycle of a PBS job through these components looks like this:
- A user runs qsub to submit a job script to PBS. The script
contains commands to run yod to start Cplant parallel applications. The user
requests a number of nodes (the size request) for a duration
of time (the walltime request), either on the qsub
command line or as directives within the job script file.
- The PBS server adds the job to a queue.
- The PBS server sends a message to the PBS scheduler asking it to
evaluate the queue. It does this every time a job is submitted,
every time a job completes, and whenever a certain interval
of time elapses (defined as the scheduler_iteration attribute
of the server). It will also initiate a scheduling cycle whenever
an administrator enters the qmgr command set server scheduling=true.
- The scheduler requests all the queue information from the server, and
requests the server's attribute values as well. The server's attributes
provide information such as how many compute nodes are available to PBS,
how many have been allocated, the maximum number of jobs any user can
have running at one time, and so on.
- If possible, the scheduler chooses jobs to run. It also chooses the
service node on which each job should run. It does this by contacting
the MOM on each node and requesting the system load of the service node.
The scheduler assigns the PBS job to the least loaded service node.
- The scheduler sends the server a message telling it which jobs to
run and what service nodes to run them on.
- For each job the scheduler wishes to place into execution, the server
contacts the MOM on the service node and tells it to run
the job.
- The MOM process starts the PBS job. It redirects the job's stdout and
stderr streams to files in /tmp/pbs/working/spool.
- The MOM monitors the job for termination. When it terminates, the
MOM copies the stdout and stderr files to the directory from which
the user submitted the job.
Then the MOM sends an obituary to the server.
- If the PBS job does not terminate within its allotted time, the
MOM sends a SIGTERM to its parallel applications. After a while
it sends the parallel applications a SIGKILL and then kills the
job script. Then it copies over the stdout and stderr files,
and sends an obituary to the server.
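
Two steps of this life-cycle can be illustrated with a short Python sketch: the scheduler's choice of the least loaded service node, and the MOM's escalation from SIGTERM to SIGKILL when a job overruns its walltime. This is not PBS code. The function names, the dictionary of load values, and the grace period are hypothetical, and the sketch signals a single child process, whereas the MOM described above signals the job's parallel applications and then kills the job script. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def pick_least_loaded(loads):
    """Return the service node with the lowest reported load.

    `loads` maps a node name to the load value its MOM reported
    (a hypothetical data shape, not the PBS wire format).
    """
    return min(loads, key=loads.get)


def run_with_walltime(argv, walltime, grace):
    """Run argv; on walltime expiry send SIGTERM, later SIGKILL.

    Returns the child's exit status. On POSIX, a negative value -N
    means the child was terminated by signal N.
    """
    child = subprocess.Popen(argv)
    try:
        # Normal case: the job finishes within its allotted time.
        return child.wait(timeout=walltime)
    except subprocess.TimeoutExpired:
        # Walltime exceeded: ask the job to terminate first.
        child.send_signal(signal.SIGTERM)
    try:
        return child.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Still running after the grace period: force termination.
        child.kill()
        return child.wait()


if __name__ == "__main__":
    # Hypothetical node names and load values.
    print(pick_least_loaded({"service-0": 1.4, "service-1": 0.3}))

    # A stand-in "job" that would run far longer than its walltime.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    status = run_with_walltime(sleeper, walltime=0.5, grace=2.0)
    print("terminated by SIGTERM:", status == -signal.SIGTERM)
```

Sending SIGTERM before SIGKILL gives an application the chance to clean up, while the later SIGKILL, which cannot be caught or ignored, bounds how long an overrunning job can hold its nodes.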
Lee Ann Fisk
2001-06-25