Included in the myrinet diagnostic tools for Cplant
are a set of ``ping'' utilities (vroute, vrouted,
vping, do-ping, get-ping, req), aroute,
rroute,
Perl scripts (mach2.pl, all2.pl, one2all.pl,
su.pl) for doing global connectivity tests based on these,
and a route crawler
(troute, getroute, crawl), useful in diagnosing
problems with myrinet switches.
The source code for all the utilities and scripts described here
can be found in the $CPLANT/top/compute/OS/Myrinet/vroute
directory of the Cplant source release. In the compute
partition of a running Cplant, the utilities get installed to
$CPLANT_PATH/sbin.
Cplant software has been developed at Sandia National Labs
in Albuquerque to run on the DEC Alpha
architecture running the Linux OS. It has also been ported
to the Intel x86 (Pentium and Pentium II) processor architecture
running the Linux OS, although little effort is spent at Sandia in
maintaining the Intel port.
Currently, Cplant supports the use of myrinet as its main
message passing medium. Accordingly, the message passing
layer, Portals, has been implemented on top of the Cplant-
specific myrinet driver/packetization layer (encapsulated
in the rtscts module, rtscts.mod).
The Cplant release provides tools (Lanai compilers) and code for
building a
Cplant version of the myrinet control
program (MCP - the program that runs on the myrinet interface
card) on a Linux Alpha workstation and on a Linux Intel PC.
Currently, the Cplant MCP works with Lanai versions 4.x, 7.x, and 9.x
myrinet cards.
Because Sandia's Cplant installations rely primarily on myrinet
for messaging, Cplant developers have created and
maintained a set of myrinet diagnostic tools that work specifically
in the context of the rtscts module. The interaction that these
tools have with the Cplant stack is largely limited to traps
into the kernel to initiate the sending of rtscts protocol messages.
Received messages may also generate host interrupts during which
accounting structures may be updated. Tools may trap to examine
these structures, or they may map the myrinet card and examine
the structures directly from user space. In the latter case
superuser privileges are required to run the tools.
There are Cplant diagnostic tools for both
1-way and 2-way myrinet pings at the rtscts level.
The distinction between 1-way and 2-way tools is that
in 2-way tests the receiver generally returns an
ack to the sender. Note that since myrinet is
source routed, and normally little effort is spent
on keeping routes symmetric, the send and ack routes
generally differ.
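The main ping tools are vping, req, do-ping and get-ping, aroute,
rroute, and troute, each described in its own subsection of Section 3.
Note that some of these utilities require superuser privileges
to run, since they map the Lanai board.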
The most used of the myrinet diagnostic tools, vping,
can be used to send an rtscts PING protocol packet directly from
the node where the tool is run to another specified node. The other
node is specified by its node ID:
The originator times out on the ack after a base time period and
attempts the ping again after doubling the timeout. The default
number of retries (roughly 4) can be reduced or
extended with the optional argument max_retries ≥ 0. Because this utility
maps the Lanai card and uses the Lanai clock, which is accurate to
1/2 microsecond, for timing purposes, its timing is extremely accurate (as
opposed to relying on usleep(), which may be inaccurate by orders of
magnitude).
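The retry policy just described can be sketched as follows. A base timeout doubles after each unanswered attempt, and max_retries retries follow the first attempt. This is a minimal Python illustration, not vping's source. send_ping and wait_for_ack are hypothetical stand-ins for the rtscts trap and the Lanai-clock ack wait. The base timeout value is left to the caller.

```python
from typing import Callable, Optional


def ping_with_retries(
    send_ping: Callable[[], None],
    wait_for_ack: Callable[[float], bool],
    base_timeout: float,
    max_retries: int = 4,  # approximation of vping's default, per the text
) -> Optional[int]:
    """Ping with a doubling timeout; return the attempt index that got an ack, or None.

    send_ping and wait_for_ack are hypothetical stand-ins, not Cplant APIs.
    Assumption: max_retries counts retries after the first attempt.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    timeout = base_timeout
    for attempt in range(max_retries + 1):  # first attempt plus retries
        send_ping()
        if wait_for_ack(timeout):
            return attempt
        timeout *= 2  # double the timeout before trying again
    return None
```

With max_retries = 0 the sketch makes exactly one attempt. This matches the constraint max_retries ≥ 0.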
Closely related to vping, the 2-way ping utility req
uses a proxy node (normally a service node) in the myrinet network
to request that one node ping another node and report the result
back to the proxy. The intention here has been to design a
relatively unobtrusive test: logging on to a compute node in
order to run vping potentially robs the running compute
node of valuable processor cycles. This expense can be minimized
by running the test on a proxy node and accessing the test
nodes only at the lowest protocol layers.
The test is run from the proxy node as:
This pair of utilities can be used to send and detect a
1-way rtscts PING. They are run directly from the sender
and target. On the sender, one does:
The aroute utility, like rroute
and troute, is a ping tool that works using route specification.
The a in aroute stands for ack: the tool sends an
APING protocol message
out along a specified path and waits for an ack, reporting on the result.
If a destination node receives an APING, it records the sender's node-id
and sends an ack to that node using the corresponding route from its
route table.
The sender polls for receipt of the ack on a flag in Lanai shared
memory. Based on the Lanai clock it also times out on receipt of the ack
after a hardcoded interval (originally 2 seconds). If the sender receives
an ack it reports the node-id of the replying node.
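The sender-side wait can be pictured as a busy-poll against a clock deadline. In the Python sketch below, read_ack_flag and read_clock are hypothetical stand-ins. They represent reading the ack flag in Lanai shared memory and reading the Lanai clock. The 2-second default mirrors the original hardcoded interval mentioned above.

```python
from typing import Callable, Optional


def poll_for_ack(
    read_ack_flag: Callable[[], Optional[int]],
    read_clock: Callable[[], float],
    timeout: float = 2.0,
) -> Optional[int]:
    """Busy-poll an ack flag until it is set or the deadline passes.

    Hypothetical sketch: read_ack_flag returns the replying node-id once
    the ack has arrived (None before that); read_clock stands in for the
    Lanai clock. Returns the node-id, or None on timeout.
    """
    deadline = read_clock() + timeout
    while read_clock() < deadline:
        node_id = read_ack_flag()
        if node_id is not None:
            return node_id
    return None
```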
aroute is invoked as:
Like aroute, rroute
is a ping tool that works using route specification. It is very
similar to aroute, the difference being that rroute
sends the specified route along with the ping protocol message
(an RPING message) for use in the ack. If a destination node
receives an RPING, it uses the payload to send an ack along the reverse
route.
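One way to picture the reverse route is sketched below in Python. The sketch assumes the common myrinet switch route-byte encoding, in which a byte of the form 0b10xxxxxx carries a signed 6-bit relative port offset. Under that assumption, the reverse route visits the same switches in the opposite order, with each offset negated. This is consistent with the symmetric crawl route 0x84 0x81 0x81 0x80 0xbf 0xbf 0xbc given in Section 4. The text does not say whether rroute computes the reversal on the sender or on the destination.

```python
def decode_offset(route_byte: int) -> int:
    """Signed 6-bit port offset of a switch route byte (assumed form 0b10xxxxxx)."""
    if (route_byte & 0xC0) != 0x80:
        raise ValueError(f"not a switch route byte: {route_byte:#04x}")
    offset = route_byte & 0x3F
    return offset - 0x40 if offset & 0x20 else offset  # sign-extend 6 bits


def encode_offset(offset: int) -> int:
    """Inverse of decode_offset (same assumed encoding)."""
    if not -32 <= offset <= 31:
        raise ValueError(f"offset out of range: {offset}")
    return 0x80 | (offset & 0x3F)


def reverse_route(route: list[int]) -> list[int]:
    """Route back to the sender: switches in reverse order, offsets negated."""
    return [encode_offset(-decode_offset(b)) for b in reversed(route)]
```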
The sender polls for receipt of the ack on a flag in Lanai shared
memory. Based on the Lanai clock it also times out on receipt of the ack
after a hardcoded interval (originally 2 seconds). If the sender receives
an ack it reports the node-id of the replying node.
rroute is invoked as:
The troute utility is a specialized ping tool on which
the switch diagnostic tool crawl (see below) depends. It performs
a self-ping along a specified route:
The following is a set of tools associated with crawling
a myrinet route. To crawl a given route, one performs
a self-ping along every associated subroute and corresponding
reverse path. An example of such a symmetric route
would be:
The troute utility is a specialized ping tool normally used
by the crawl script, although it is frequently used as a
standalone utility to verify the correctness of the myrinet switch layout.
It performs a self-ping along a specified route:
Used as well by the crawl script, getroute is a
tool for extracting a route from the MCP's static route table:
Cplant's route crawler, crawl, can be
used to diagnose problematic paths between pairs of myrinet
nodes. It is basically a Perl wrapper for getroute and
troute that generates symmetric paths along which it
performs self-ping tests:
or
Cplant implements a number of ``global'' connectivity tests
based on the low-level rtscts request ping protocol tool req.
These Perl scripts are mach2.pl, all2.pl, one2all.pl, and su.pl.
Rather than taking time to describe these utilities here, we
refer the reader to the code and comments in the ``source''.
Efforts continue to make these tests as fast as possible
for large Cplant installations by optimizing the underlying
rtscts PING utility (i.e., req).
Cplant has a number of rtscts-specific tests of Lanai card
functionality: mcpmemtst, lmem.pl, and a DMA integrity test.
Run this on a given myrinet node to test memory access throughout the Lanai
card's region of SRAM. A variety of 32-bit patterns are written, read back,
and compared throughout the entire memory region. Usually run as
which lists SU ranges 1-16 and 18-21 along
with the single SU ``ranges'' 23 and 24. lmem.pl
runs mcpmemtst on all nodes in each of the specified SUs.
The optional -mail flag says to e-mail results to a
hardcoded list of addresses (see source code).
A test of the Lanai's DMA engines for transferring data to and
from host memory. A variety of 32-bit patterns are tested
in transferring data between Lanai send and receive buffers
and a kernel buffer.
This test is somewhat exceptional in that it can only be run
when the MCP is loaded onto the Lanai card.
Accordingly, it is invoked using a command-line option to
mcpload, Cplant's utility for starting the MCP. For
example,
Results of the DMA integrity test are written to the system log.
James Otto
Contents
1 Introduction
2 System Architectures
3 Myrinet Ping Tools
3.1 vping
3.2 req
3.3 do-ping and get-ping
3.4 aroute
3.5 rroute
3.6 troute
4 Myrinet Route Crawler
4.1 troute
4.2 getroute
4.3 crawl
5 Global Ping Tests
6 Lanai Card Tests
6.1 mcpmemtst
6.2 lmem.pl
6.3 DMA Integrity Test
1 Introduction
This manual for the Cplant system administrator
describes a set of tools for performing diagnostics
with respect to the myrinet network for a running Cplant.
$CPLANT/top/compute/OS/Myrinet/vroute
$CPLANT_PATH/sbin.
2 System Architectures
3 Myrinet Ping Tools
3.1 vping
% vping node-id [max_retries]
Here node-id is used to indicate which route to send the
packet along. That is, as a result of the call, the rtscts module
sends a PING packet with a route header containing the route
from the node-id-th slot of the MCP's route table. When this
packet is received, the destination node sends a PING_ACK protocol
packet back to the originator along a possibly different route (again
determined by a route-table slot, this time the destination's slot
for the originator).
3.2 req
% req node-id0 node-id1 [max_retries]
where node-id0 is the node that sends a PING
protocol message to node-id1 and reports back to the
proxy node on receipt of an ACK. On the proxy node, the
test times out after a base timeout and retries on successive
doublings of the timeout period. The number of retries can
again be limited or extended by specifying the optional
max_retries ≥ 0.
3.3 do-ping and get-ping
% do-ping target-id
where target-id is the id of the target node.
On the target, one does
% get-ping send-id
where send-id is the id of the sender. This will
cause a look-up into the target's table of PING status entries
to see if a PING came in from the sender node. get-ping
then reports on the result of this look-up.
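Conceptually, get-ping is a look-up in a table keyed by sender node-id. The Python class below is a hypothetical model of that table, not the rtscts data structure. It only records whether a PING has arrived from each sender.

```python
class PingStatusTable:
    """Hypothetical per-node table of PING status entries, indexed by sender node-id."""

    def __init__(self, num_nodes: int) -> None:
        self._seen = [False] * num_nodes

    def record_ping(self, sender_id: int) -> None:
        # Receive side (driven by do-ping on the sender): mark the arrival.
        self._seen[sender_id] = True

    def get_ping(self, sender_id: int) -> bool:
        # The look-up get-ping performs: did a PING arrive from sender_id?
        return self._seen[sender_id]
```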
3.4 aroute
% aroute myrinet-route
where myrinet-route is a space-separated list of
valid myrinet route bytes. For each byte the 2-digit hex format
0xAB is assumed (with A and B hex digits).
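A check of this route format might look like the following Python sketch. parse_route is a hypothetical helper, not part of the Cplant tools. It accepts only tokens of the exact form 0xAB and converts them to integers. The same format is used by rroute, troute, and crawl.

```python
import re

# Exactly "0x" followed by two hex digits, e.g. 0x84 or 0xbf.
ROUTE_BYTE = re.compile(r"0x[0-9A-Fa-f]{2}")


def parse_route(text: str) -> list[int]:
    """Parse a space-separated myrinet route such as '0x84 0x81 0x80'."""
    route = []
    for token in text.split():
        if not ROUTE_BYTE.fullmatch(token):
            raise ValueError(f"expected a 2-digit hex byte like 0xAB, got {token!r}")
        route.append(int(token, 16))  # int() accepts the 0x prefix with base 16
    return route
```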
3.5 rroute
% rroute myrinet-route
where myrinet-route is a space-separated list of
valid myrinet route bytes. For each byte the 2-digit hex format
0xAB is assumed (with A and B hex digits).
3.6 troute
% troute myrinet-route
Here, myrinet-route is a space-separated list of
valid myrinet route bytes. For each byte the 2-digit hex format
0xAB is assumed (with A and B hex digits).
troute polls for receipt of the self-ping using a default
1-second timeout interval, reporting on success or failure of receipt.
4 Myrinet Route Crawler
0x84 0x81 0x81 0x80 0xbf 0xbf 0xbc
The tools associated with the route crawler are troute, getroute, and crawl.
Note that certain utilities do require superuser privileges.
4.1 troute
% troute myrinet-route
where myrinet-route is a space-separated list of myrinet
route bytes in hexadecimal 2-digit format, 0xAB.
troute causes a trap which copies the route into a reserved
slot in the MCP's route table. Normally, slots in this table
correspond to statically designated routes that correspond to
nodes in the existing myrinet mesh. However, a special slot has
been allocated to allow the sending of messages along arbitrary
routes for diagnostic purposes. After loading the route, troute
triggers the send of an rtscts PING protocol message. It then
polls for receipt of the ping using a default timeout value. Note
that this is in effect a self-ping, although troute does not check
that the supplied route actually leads back to the sender. Because its
original intent was to be used within a wrapper
that supplies symmetric routes (see crawl),
troute will accept any route in the specified format.
4.2 getroute
% getroute node-id
getroute traps into the kernel via the rtscts module, extracts the route
in the slot corresponding to node-id, and prints the
associated route bytes.
4.3 crawl
% crawl node-id
% crawl myrinet-route
crawl acts upon a designated route which can be
specified indirectly as a node-id or
directly as a list of route bytes. In the indirect case, the associated
route is obtained by running getroute on the node-id.
In the direct case,
a myrinet-route is supplied as a space-separated list of myrinet
route bytes in hexadecimal 2-digit format, 0xAB. crawl
breaks the indicated route into subroutes, and for each of these it
appends the corresponding return path, forming a symmetric
route along which a self-PING
test can be performed using troute. It reports the results
of each invocation of troute as it progressively crawls
the route. In the case of a bad path between two nodes (initially
indicated by vping, for example), crawl may
be used to identify the link (i.e., switch) in the path at which
the transmission breaks down.
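The subroute construction can be sketched in Python as below. The sketch assumes that a route byte of the form 0b10xxxxxx carries a signed 6-bit relative port offset. Under that encoding, 0x80 (offset 0) turns a packet around at a switch, and a return path is the forward prefix reversed with its offsets negated. It also assumes that each probe turns around at the switch just past its subroute. This reproduces the example symmetric route 0x84 0x81 0x81 0x80 0xbf 0xbf 0xbc given at the start of this section. The actual probe order used by crawl may differ.

```python
TURNAROUND = 0x80  # offset 0: assumed to send the packet back out its arrival port


def negate(route_byte: int) -> int:
    """Negate the signed 6-bit port offset in a route byte (assumed 0b10xxxxxx)."""
    offset = route_byte & 0x3F
    if offset & 0x20:
        offset -= 0x40  # sign-extend the 6-bit field
    return 0x80 | (-offset & 0x3F)


def symmetric_route(prefix: list[int]) -> list[int]:
    """Out along prefix, turn around at the next switch, come back the same way."""
    return prefix + [TURNAROUND] + [negate(b) for b in reversed(prefix)]


def crawl_probes(route: list[int]) -> list[list[int]]:
    """One self-ping route per switch on the path, nearest switch first."""
    return [symmetric_route(route[:k]) for k in range(len(route))]
```

Each probe can be printed in troute's input format with " ".join(f"{b:#04x}" for b in probe). The first failing probe then points at the first suspect link.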
5 Global Ping Tests
6 Lanai Card Tests
6.1 mcpmemtst
% mcpmemtst -loop cnt
which repeats the test cnt times (use 0 for
infinite loop). Other options are available. Use -h on
the command line for a complete list.
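The write/read-back/compare idea behind mcpmemtst is sketched below in Python. A list of 32-bit words stands in for Lanai SRAM. The pattern set (all zeros, all ones, alternating bits, walking ones) is an assumption for illustration. The real pattern list and the infinite-loop mode (-loop 0) are in the mcpmemtst source and are not modeled here.

```python
from typing import MutableSequence

# Illustrative patterns only (an assumption, not mcpmemtst's actual list).
PATTERNS = [0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555] + [1 << i for i in range(32)]


def memtest(mem: MutableSequence[int], loops: int = 1) -> list[tuple[int, int, int]]:
    """Fill the region with each pattern, read it back, and collect mismatches.

    Each mismatch is (word_index, expected, actual).
    """
    errors = []
    for _ in range(loops):
        for pattern in PATTERNS:
            for i in range(len(mem)):  # write pass over the whole region
                mem[i] = pattern
            for i in range(len(mem)):  # read-back and compare pass
                if mem[i] != pattern:
                    errors.append((i, pattern, mem[i]))
    return errors
```

Writing the whole region before reading any of it back helps expose address-decoding faults. With such a fault, one write lands on a different word.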
6.2 lmem.pl
Perl script for automating the run of mcpmemtst over
multiple nodes:
% lmem.pl -su su-list [-mail]
su-list is a comma-separated list of SU ranges. For
example:
% lmem.pl -su 1-16,18-21,23,24
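The su-list syntax can be expanded as in this Python sketch. parse_su_list is a hypothetical helper, not code taken from lmem.pl. For the example above it yields SUs 1 through 16, 18 through 21, 23, and 24.

```python
def parse_su_list(su_list: str) -> list[int]:
    """Expand an SU list such as '1-16,18-21,23,24' into individual SU numbers."""
    sus = []
    for item in su_list.split(","):
        item = item.strip()
        if "-" in item:
            lo, hi = (int(x) for x in item.split("-", 1))  # inclusive range
            if lo > hi:
                raise ValueError(f"bad SU range: {item!r}")
            sus.extend(range(lo, hi + 1))
        else:
            sus.append(int(item))  # a single SU "range"
    return sus
```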
6.3 DMA Integrity Test
% mcpload -m /tmp/rtsmcp.9 -dma 2 -pnid 999 -route rfile
Here, the portion of the command line pertinent to
the DMA test is -dma 2, which says to perform the DMA test
for 2 iterations during the load.