- -alloc
- Choosing -alloc was useful for compute node debugging
before cgdb and Totalview were available.
It displays the nodes
on which your application has been started and waits for you to press
a key before allowing the processes in your parallel application
to proceed out of system code and into user code. At this point you
could log in to
a compute node and attach a debugger to your application to
catch it before it proceeds to main.
Since users are discouraged from logging into compute nodes,
it is better to use -attach and cgdb instead.
Also see the -bt option of yod.
- -attach
- This option is essentially the same as -alloc. It is intended
to hold the application processes once they have started executing
at an instruction prior to user code (prior to main). You can
at this point start cgdb to attach a debugger to a process. See
the cgdb man page for more help on debugging compute node processes.
- -bt
- This option causes yod to display a stack trace for
user processes that terminate abnormally.
yod normally displays a completion message for each process in the
parallel application, listing its exit code or terminating
signal, if any. If the completion message indicates that an
application process was terminated by a
signal and you wish to investigate, you may rebuild the application
with debugging symbols (for example, by compiling with -g) and re-run it
with the -bt option of
yod. The PCT will then attach a debugger to the process,
collect the stack trace when it faults, and send the stack
trace to yod for display. This capability may be expanded
some day to allow for interactive debugging of node processes from the
service node.
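The completion message distinguishes a normal exit code from a terminating signal. On a Unix-like system that distinction comes from the wait status of each child process. The sketch below illustrates it under a few assumptions: the parent process stands in for yod and the PCT, and run_and_report, describe_status, and the message wording are all hypothetical, not yod's own code.

```python
import os
import signal


def describe_status(rank, status):
    """Turn a raw wait status into a completion line (wording is hypothetical)."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        return f"rank {rank}: terminated by signal {sig} ({signal.Signals(sig).name})"
    if os.WIFEXITED(status):
        return f"rank {rank}: exited with code {os.WEXITSTATUS(status)}"
    return f"rank {rank}: unexpected wait status {status:#x}"


def run_and_report(actions):
    """Fork one child per action, wait for each, and return messages in rank order.

    Each action(rank) is a stand-in for one application process.
    """
    rank_of = {}
    for rank, action in enumerate(actions):
        pid = os.fork()
        if pid == 0:  # child: run the stand-in application, then exit
            code = 0
            try:
                action(rank)
            except BaseException:
                code = 1
            os._exit(code)
        rank_of[pid] = rank
    messages = []
    for pid, rank in rank_of.items():  # dict keeps fork (rank) order
        _, status = os.waitpid(pid, 0)
        messages.append(describe_status(rank, status))
    return messages
```

A process that calls os._exit(3) is reported with exit code 3. A process killed by a signal is reported with that signal's number and name. In the -bt case, a signal report like this is the one worth investigating.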
- -D
- Turn on debugging of the application load. The steps in
the load protocol are displayed as the application load progresses.
Application process file I/O requests are displayed as yod
receives them.
- -d {info-type}
- Selected types of information can be displayed by yod using this
option. Two of the most common requests are -d io and
-d iomore, which display I/O requests sent from the application
processes to yod. The full list of info-type values follows.
- io
- Display application IO requests to yod.
- iomore
- Display details of dispatch of IO requests.
- memory
- Display all buffers allocated by yod.
- load
- Display the steps of the load protocol.
- loadmore
- Display the load protocol in great detail.
- alloc
- Display nodes allocated to the job.
- hetero
- Display the heterogeneous load information.
- pbs
- Display the PBS information.
- environ
- Display yod's environment variables.
- rtl
- Display the Fortran run-time library messages from compute nodes.
- progress
- Display the progress of the application through load and termination.
- failure
- Display all launch failure information.
- debug
- Display efforts to obtain debugging data for application processes.
- bebopd
- Display yod interactions with bebopd.
- comm
- Display information about portals setup.
- phase1
- Display information only until the application starts.
- phase2
- Display information only after the application starts.
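A set of named info-type selectors like the list above can be modeled as bit flags. In that model, phase1 and phase2 act as time filters rather than as categories of their own. The sketch below is hypothetical: Dbg, parse_selectors, and should_log are illustrations, not yod's internals. Its behavior when both phase1 and phase2 are given (nothing is shown) is also a guess.

```python
import enum


class Dbg(enum.Flag):
    """One flag per info-type value listed above (hypothetical encoding)."""
    IO = enum.auto()
    IOMORE = enum.auto()
    MEMORY = enum.auto()
    LOAD = enum.auto()
    LOADMORE = enum.auto()
    ALLOC = enum.auto()
    HETERO = enum.auto()
    PBS = enum.auto()
    ENVIRON = enum.auto()
    RTL = enum.auto()
    PROGRESS = enum.auto()
    FAILURE = enum.auto()
    DEBUG = enum.auto()
    BEBOPD = enum.auto()
    COMM = enum.auto()
    PHASE1 = enum.auto()
    PHASE2 = enum.auto()


def parse_selectors(names):
    """Combine info-type names (e.g. from repeated -d options) into one flag set."""
    flags = Dbg(0)
    for name in names:
        flags |= Dbg[name.upper()]  # raises KeyError for an unknown info-type
    return flags


def should_log(flags, category, app_started):
    """Decide whether a message in `category` is shown, honoring phase1/phase2."""
    if not (flags & category):
        return False
    if Dbg.PHASE1 in flags and app_started:
        return False
    if Dbg.PHASE2 in flags and not app_started:
        return False
    return True
```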
- -Log
- This option causes the compute node application load protocol
steps to be logged to /var/log/cplant on the compute node.
- -sleep {where}
- Developers debugging the Cplant system software may want to attach a debugger
to a Cplant application before it reaches user code. The where argument
selects one of four points at which the processes are held for 60 seconds.
- 1
- right after the fork
- 2
- just before the exec
- 3
- right after entering system startup code
- 4
- at the end of system startup code, just before proceeding to main
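Hold points 1 and 2 sit in the launching process, between fork() and exec(), so they are easy to picture in a small program. The following is a minimal sketch assuming a Unix-like host. launch_with_hold and its hold_seconds parameter are hypothetical names; the default of 60 matches the text. Points 3 and 4 are not modeled, because they live inside the application's own startup code.

```python
import os
import time

HOLD_AFTER_FORK = 1   # like "-sleep 1"
HOLD_BEFORE_EXEC = 2  # like "-sleep 2"


def launch_with_hold(argv, where=None, hold_seconds=60.0):
    """Fork and exec argv, optionally pausing the child so a debugger can attach.

    Returns the child's PID to the parent; the child never returns.
    """
    pid = os.fork()
    if pid != 0:
        return pid  # parent: the caller waits on the child
    try:
        if where == HOLD_AFTER_FORK:
            time.sleep(hold_seconds)  # window to attach to this PID
        # ... per-process setup (environment, file descriptors) would go here ...
        if where == HOLD_BEFORE_EXEC:
            time.sleep(hold_seconds)  # last chance before the new image loads
        os.execvp(argv[0], argv)
    finally:
        os._exit(127)  # reached only if exec failed
```

During the hold, the parent knows the child's PID, so a debugger can be attached to it by PID before the program image is replaced.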
- -strace {path-name}
- Yet another debugging tool. path-name
should be a directory that is mounted writable on the compute
node. This option causes the PCT to run the application process under
strace, which lists all system calls (and their arguments) made by the
application process. By default, only the rank 0 process is traced. The
strace output goes to a file in directory path-name.
The file name contains the Cplant job ID and the rank of the process
being traced.
- -straceoptions {option-list}
- The PCT will invoke strace with the options you specify
in the quoted string option-list.
You must use the -strace option with this option.
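The sketch below shows roughly how -strace and -straceoptions fit together by building the command line a launcher might run for one traced process. It relies on strace's -o option to name the output file and uses shlex.split to break the quoted option-list into separate arguments. The function strace_argv is hypothetical. The strace.job<ID>.rank<N> file-name pattern is also an assumption, since the text only says the name contains the job ID and the rank.

```python
import os
import shlex


def strace_argv(app_argv, out_dir, job_id, rank, option_list=""):
    """Return the argv for running one application process under strace.

    The output file-name pattern is hypothetical; it only embeds job ID and rank.
    """
    out_file = os.path.join(out_dir, f"strace.job{job_id}.rank{rank}")
    return ["strace", *shlex.split(option_list), "-o", out_file, *app_argv]
```

For example, an option-list of "-f -tt" produces strace -f -tt -o <path-name>/strace.job<ID>.rank0 followed by the application's own arguments.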
- -stracenodes {rank-list}
- The PCT will invoke strace on the
processes with the ranks given in the rank-list.
The format for the rank-list
is the same as the format for a node list. By default, strace is invoked
only on the rank 0 process. You must use the -strace
option with this option.
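The node-list format is described elsewhere in this man page. This sketch assumes it is a comma-separated list of single numbers and inclusive ranges written lo..hi (for example 0,4..7), so treat that syntax and both function names as assumptions. The sketch also encodes the documented default of tracing only rank 0.

```python
def parse_rank_list(text, nprocs):
    """Expand e.g. "0,4..7" into {0, 4, 5, 6, 7}; every rank must lie in [0, nprocs)."""
    ranks = set()
    for item in text.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        lo_s, sep, hi_s = item.partition("..")
        lo = int(lo_s)
        hi = int(hi_s) if sep else lo
        if not 0 <= lo <= hi < nprocs:
            raise ValueError(f"rank entry {item!r} is out of range for {nprocs} processes")
        ranks.update(range(lo, hi + 1))
    return ranks


def ranks_to_trace(rank_list, nprocs):
    """Ranks to run under strace: rank 0 only unless a rank-list was given."""
    return {0} if rank_list is None else parse_rank_list(rank_list, nprocs)
```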
- -timing
- Interested in how long the different stages of application
load are taking? This option times them and displays
the results in seconds. (If our name were mpirun instead
of yod, we would display them in minutes!)
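Per-stage timing of the kind this option reports can be as simple as wrapping each load step and recording the elapsed wall-clock seconds. This is a hypothetical sketch: timed_stage, report, and the stage names in the test are made up and do not describe how yod measures its load.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_stage(name, results):
    """Record the wall-clock seconds spent inside the with-block as results[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[name] = time.perf_counter() - start


def report(results):
    """Format the recorded stages in seconds, in the order they ran."""
    return [f"{name}: {secs:.3f} s" for name, secs in results.items()]
```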