Myrinet Diagnostic Tools for Cplant™


James Otto


Contents

1  Introduction
2  System Architectures
3  Myrinet Ping Tools
    3.1  vping
    3.2  req
    3.3  do-ping and get-ping
    3.4  aroute
    3.5  rroute
    3.6  troute
4  Myrinet Route Crawler
    4.1  troute
    4.2  getroute
    4.3  crawl
5  Global Ping Tests
6  Lanai Card Tests
    6.1  mcpmemtst
    6.2  lmem.pl
    6.3  DMA Integrity Test

1  Introduction

This manual, intended for the Cplant system administrator, describes a set of tools for diagnosing the myrinet network of a running Cplant.

The myrinet diagnostic tools for Cplant comprise a set of "ping" utilities (vroute, vrouted, vping, do-ping, get-ping, req), the route-specification ping tools aroute and rroute, Perl scripts (mach2.pl, all2.pl, one2all.pl, su.pl) that build global connectivity tests on these utilities, and a route crawler (troute, getroute, crawl) that is useful in diagnosing problems with myrinet switches.

The source code for all the utilities and scripts described here can be found in the


  $CPLANT/top/compute/OS/Myrinet/vroute

directory of the Cplant source release. In the compute partition of a running Cplant, the utilities are installed in


  $CPLANT_PATH/sbin


2  System Architectures

Cplant software has been developed at Sandia National Labs in Albuquerque to run on the DEC Alpha architecture under the Linux OS. It has also been ported to the Intel x86 (Pentium and Pentium II) processor architecture under the Linux OS, although Sandia spends little effort maintaining the Intel port.

Currently, Cplant supports the use of myrinet as its main message passing medium. Accordingly, the message passing layer, Portals, has been implemented on top of the Cplant-specific myrinet driver/packetization layer (encapsulated in the rtscts module, rtscts.mod). The Cplant release provides tools (Lanai compilers) and code for building a Cplant version of the myrinet control program (MCP, the program that runs on the myrinet interface card) on a Linux Alpha workstation and on a Linux Intel PC. Currently, the Cplant MCP works with Lanai version 4.x, 7.x, and 9.x myrinet cards.

Because Sandia's Cplant installations rely mainly on myrinet messaging, the developers of Cplant have created and maintained a set of myrinet diagnostic tools that work specifically in the context of the rtscts module. The interaction that these tools have with the Cplant stack is largely limited to traps into the kernel to initiate the sending of rtscts protocol messages. Received messages may also generate host interrupts during which accounting structures may be updated. Tools may trap to examine these structures, or they may map the myrinet card and examine the structures directly from user space. In the latter case superuser privileges are required to run the tools.


3  Myrinet Ping Tools

There are Cplant diagnostic tools for both 1-way and 2-way myrinet pings at the rtscts level. The distinction between 1-way and 2-way tools is that in 2-way tests the receiver generally returns an ack to the sender. Note that since myrinet is source routed and little effort is normally spent maintaining symmetry of routes, the send and ack routes generally differ.

The main ping tools, each described in a subsection below, are:

    vping
    req
    do-ping and get-ping
    aroute
    rroute
    troute

Note that some of the above utilities require superuser privileges to be run, since they map the Lanai board.


3.1  vping

The most used of the myrinet diagnostic tools, vping sends an rtscts PING protocol packet directly from the node where the tool is run to another specified node. The other node is specified by its node ID:


  % vping node-id [max_retries]


Here node-id indicates which route to send the packet along. That is, as a result of the call, the rtscts module sends a PING packet with a route header containing the route from the node-id-th slot of the MCP's route table. When this packet is received, the destination node sends a PING_ACK protocol packet back to the originator along a possibly different route (again determined by a slot in the route table, in this case the destination's slot for the originator's node ID).

The originator times out on the ack after a base time period and attempts the ping again after doubling the timeout. The default number of retries (about 4) can be limited or extended with the optional max_retries ≥ 0. Because this utility maps the Lanai card and times itself with the Lanai clock, which is accurate to 1/2 microsecond, its timing is extremely accurate (as opposed to relying on usleep(), which may be inaccurate by orders of magnitude).
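
The retry behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the vping source: the function names are hypothetical, send_ping and wait_for_ack stand in for the rtscts trap and the Lanai-clock poll, and max_retries is assumed to count retries after the first attempt.

```python
def retry_timeouts(base_timeout, max_retries):
    """Timeout for each attempt: the first try plus max_retries retries,
    with the timeout doubled after every failed attempt."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    return [base_timeout * (2 ** attempt) for attempt in range(max_retries + 1)]


def ping_with_backoff(send_ping, wait_for_ack, base_timeout, max_retries):
    """Send a ping and wait for its ack, doubling the timeout on each failure.

    send_ping() and wait_for_ack(timeout) are hypothetical callables.
    Returns the 0-based index of the attempt that was acked, or None.
    """
    for attempt, timeout in enumerate(retry_timeouts(base_timeout, max_retries)):
        send_ping()
        if wait_for_ack(timeout):
            return attempt
    return None
```

For instance, with a base timeout of 1 time unit and max_retries of 3, the attempts would wait 1, 2, 4, and 8 units in turn.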


3.2  req

Closely related to vping, the 2-way ping utility req uses a proxy node (normally a service node) in the myrinet network to request that one node ping another node and report the result back to the proxy. The intention here has been to design a relatively unobtrusive test: logging on to a compute node in order to run vping potentially robs the running compute node of valuable processor cycles. This expense can be minimized by running the test on a proxy node and accessing the test nodes only at the lowest protocol layers.

The test is run from the proxy node as:


  % req node-id0 node-id1  [max_retries]


where node-id0 is the node that sends a PING protocol message to node-id1 and reports back to the proxy node on receipt of an ACK. On the proxy node, the test times out after a base timeout and retries on successive doublings of the timeout period. The number of retries can again be limited or extended by specifying the optional max_retries ≥ 0.
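
As an illustration of how a script on the proxy node might drive req over many node pairs, the following Python sketch builds (but does not execute) req command lines using the syntax shown above. The helper names are hypothetical, and the sketch makes no claim about how the Perl scripts in the Cplant distribution are written.

```python
def req_command(sender_id, target_id, max_retries=None):
    """Build the argument list for: req node-id0 node-id1 [max_retries]."""
    argv = ["req", str(sender_id), str(target_id)]
    if max_retries is not None:
        if max_retries < 0:
            raise ValueError("max_retries must be >= 0")
        argv.append(str(max_retries))
    return argv


def one_to_all(sender_id, node_ids, max_retries=None):
    """Command lines asking sender_id to ping every other node in node_ids."""
    return [req_command(sender_id, target, max_retries)
            for target in node_ids if target != sender_id]
```

Each returned list could then be handed to a process launcher on the proxy node; how the results are reported is left to the real tool.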


3.3   do-ping and get-ping

This pair of utilities can be used to send and detect a 1-way rtscts PING. They are run directly from the sender and target. On the sender, one does:


  % do-ping target-id


where target-id is the id of the target node. On the target, one does:


  % get-ping send-id


where send-id is the id of the sender. This will cause a look-up into the target's table of PING status entries to see if a PING came in from the sender node. get-ping then reports on the result of this look-up.


3.4   aroute

The aroute utility, like rroute and troute, is a ping tool that works from a route specification. The a in aroute stands for ack: the tool sends an APING protocol message out along a specified path and waits for an ack, reporting on the result. If a destination node receives an APING, it records the sender's node-id and sends an ack to that node using the corresponding route from its route table. The sender polls for receipt of the ack on a flag in Lanai shared memory. Based on the Lanai clock, it also times out on receipt of the ack after a hardcoded interval (originally 2 seconds). If the sender receives an ack, it reports the node-id of the replying node. aroute is invoked as:


  % aroute myrinet-route


where myrinet-route is a space-separated list of valid myrinet route bytes. For each byte the 2-digit hex format 0xAB is assumed (with A and B hex digits).
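
The route format can be checked mechanically. The following Python sketch parses a space-separated list of 0xAB route bytes into integers and formats them back; the function names are hypothetical, and the check covers only the textual format, not whether the bytes form a route that is valid for a particular switch layout.

```python
import re

# Exactly two hex digits after the 0x prefix, as in 0xAB.
_ROUTE_BYTE = re.compile(r"0[xX][0-9a-fA-F]{2}")


def parse_route(args):
    """Convert command-line style route bytes, e.g. ['0x84', '0x81'], to ints."""
    route = []
    for arg in args:
        if not _ROUTE_BYTE.fullmatch(arg):
            raise ValueError(f"bad route byte {arg!r}: expected the form 0xAB")
        route.append(int(arg, 16))
    return route


def format_route(route):
    """Format a list of ints as a space-separated list of 0xAB route bytes."""
    return " ".join(f"0x{byte:02x}" for byte in route)
```

For example, parse_route("0x84 0x81".split()) yields the two integers 0x84 and 0x81, while an argument such as 0x4 is rejected because it lacks the second hex digit.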


3.5   rroute

Like aroute, rroute is a ping tool that works from a route specification. It is very similar to aroute, the difference being that rroute sends the specified route along with the ping protocol message (an RPING message) for use in the ack. If a destination node receives an RPING, it uses the payload to send an ack along the reverse route. The sender polls for receipt of the ack on a flag in Lanai shared memory. Based on the Lanai clock, it also times out on receipt of the ack after a hardcoded interval (originally 2 seconds). If the sender receives an ack, it reports the node-id of the replying node. rroute is invoked as:


  % rroute myrinet-route


where myrinet-route is a space-separated list of valid myrinet route bytes. For each byte the 2-digit hex format 0xAB is assumed (with A and B hex digits).


3.6   troute

The troute utility is a specialized ping tool on which the switch diagnostic tool crawl (see below) depends. It performs a self-ping along a specified route:


  % troute myrinet-route 


Here, myrinet-route is a space-separated list of valid myrinet route bytes. For each byte the 2-digit hex format 0xAB is assumed (with A and B hex digits). troute polls for receipt of the self-ping using a default 1-second timeout interval, reporting on success or failure of receipt.


4  Myrinet Route Crawler

The following is a set of tools associated with crawling a myrinet route. To crawl a given route, one performs a self-ping along each of its subroutes followed by the corresponding reverse path; the two halves together form a symmetric route. An example of such a symmetric route would be:


  0x84 0x81 0x81 0x80 0xbf 0xbf 0xbc


The tools associated with the route crawler, each described in a subsection below, are:

    troute
    getroute
    crawl


Note that certain utilities do require superuser privileges.


4.1  troute

The troute utility is a specialized ping tool normally used by the crawl script, although it is frequently used as a stand-alone utility to verify correctness of myrinet switch layout. It performs a self-ping along a specified route:


  % troute myrinet-route


where myrinet-route is a space-separated list of myrinet route bytes in hexadecimal 2-digit format, 0xAB. troute causes a trap which copies the route into a reserved slot in the MCP's route table. Normally, slots in this table hold statically designated routes that correspond to nodes in the existing myrinet mesh. However, a special slot has been allocated to allow the sending of messages along arbitrary routes for diagnostic purposes. After loading the route, troute triggers the send of an rtscts PING protocol message. It then polls for receipt of the ping using a default timeout value. Note that this is in effect a self-ping, although troute does not check that the supplied route actually returns to the sender. Because it was originally intended for use within a wrapper that supplies symmetric routes (see crawl), troute will accept any route in the specified format.


4.2  getroute

Used as well by the crawl script, getroute is a tool for extracting a route from the MCP's static route table:


  % getroute node-id


It traps via the rtscts module, extracts the route in the slot corresponding to node-id, and prints the associated route bytes.


4.3   crawl

Cplant's route crawler, crawl, can be used to diagnose problematic paths between pairs of myrinet nodes. It is basically a Perl wrapper for getroute and troute that generates symmetric paths along which it performs self-ping tests:


  % crawl node-id

or


  % crawl myrinet-route


crawl acts upon a designated route, which can be specified indirectly as a node-id or directly as a list of route bytes. In the indirect case, the associated route is obtained by running getroute on the node-id. In the direct case, a myrinet-route is supplied as a space-separated list of myrinet route bytes in hexadecimal 2-digit format, 0xAB. crawl breaks the indicated route into subroutes, and to each of these it appends the corresponding return path, forming a symmetric route along which a self-PING test can be performed using troute. It reports the results of each invocation of troute as it progressively crawls the route. In the case of a bad path between two nodes (initially indicated by vping, for example), crawl may be used to identify at what link (i.e., switch) in the path the transmission breaks down.
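
The construction of symmetric routes can be sketched in Python. The sketch rests on assumptions inferred from the example route in Section 4 (0x84 0x81 0x81 0x80 0xbf 0xbf 0xbc), not on the crawl source: each route byte is taken to have its high bit set and to carry a signed 6-bit port offset in its low bits, the return path is taken to be the negated offsets in reverse order, and 0x80 (offset 0) is taken to be the turnaround byte. The choice of subroutes (every leading prefix) and all function names are likewise hypothetical.

```python
TURNAROUND = 0x80  # assumed: route byte with offset 0, sends the packet back


def reverse_byte(byte):
    """Negate the assumed 6-bit signed offset of a route byte, keeping the high bit."""
    delta = byte & 0x3F
    return 0x80 | (-delta & 0x3F)


def symmetric_route(subroute):
    """Outbound subroute, turnaround byte, then the reversed, negated return path."""
    return_path = [reverse_byte(b) for b in reversed(subroute)]
    return list(subroute) + [TURNAROUND] + return_path


def crawl_routes(route):
    """One symmetric self-ping route per leading subroute, shortest first."""
    return [symmetric_route(route[:n]) for n in range(1, len(route) + 1)]
```

Under these assumptions, the subroute 0x84 0x81 0x81 yields exactly the example route from Section 4. Testing the generated routes in order with troute, the first one that fails would point at the hop where transmission breaks down.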


5  Global Ping Tests

Cplant implements a number of "global" connectivity tests based on the low-level rtscts request ping protocol tool req. These Perl scripts are:

    mach2.pl
    all2.pl
    one2all.pl
    su.pl

Rather than describing these utilities here, we refer the reader to the code and comments in the source.

Efforts continue to make these tests as fast as possible for large Cplant installations by optimizing the underlying rtscts PING utility (i.e., req).


6   Lanai Card Tests

Cplant has a number of rtscts-specific tests of Lanai card functionality:

    mcpmemtst
    lmem.pl
    DMA Integrity Test


6.1   mcpmemtst

Run this on a given myrinet node to test memory access throughout the Lanai card's region of SRAM. The test writes a variety of 32-bit patterns and compares the values read back across the entire memory region. It is usually run as


  % mcpmemtst -loop cnt


which repeats the test cnt times (use 0 for infinite loop). Other options are available. Use -h on the command line for a complete list.
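
The write-and-compare idea behind such a test can be sketched in Python. This is a generic pattern test under stated assumptions, not the mcpmemtst implementation: the patterns are illustrative, memory is modelled as an array of 32-bit words, and read_word and write_word are hypothetical accessors standing in for access to the mapped Lanai SRAM.

```python
# Illustrative patterns only; the patterns used by mcpmemtst are not listed here.
PATTERNS = (0x00000000, 0xFFFFFFFF, 0xAAAAAAAA, 0x55555555)


def memory_test(read_word, write_word, num_words, patterns=PATTERNS):
    """Write each pattern to every word, read it back, and collect mismatches.

    Returns a list of (word_index, expected, actual) tuples; empty means pass.
    """
    failures = []
    for pattern in patterns:
        for index in range(num_words):
            write_word(index, pattern)
        for index in range(num_words):
            value = read_word(index)
            if value != pattern:
                failures.append((index, pattern, value))
    return failures


# Usage with a plain list standing in for the memory region.
sram = [0] * 8
good_result = memory_test(sram.__getitem__, sram.__setitem__, len(sram))
```

Writing the whole region before reading it back, rather than checking each word immediately, also exposes faults in which a write to one word disturbs another.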


6.2   lmem.pl

lmem.pl is a Perl script that automates running mcpmemtst over multiple nodes:


  % lmem.pl -su su-list [-mail]


su-list is a comma-separated list of SU ranges. For example:

  % lmem.pl -su 1-16,18-21,23,24

which lists SU ranges 1-16 and 18-21 along with the single-SU "ranges" 23 and 24. lmem.pl runs mcpmemtst on all nodes in each of the specified SUs. The optional -mail flag says to e-mail results to a hardcoded list of addresses (see source code).
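
The su-list syntax can be illustrated with a short Python sketch that expands such a list into individual SU numbers. The function name is hypothetical and the sketch is not taken from lmem.pl, which is written in Perl and may handle malformed input differently.

```python
def parse_su_list(su_list):
    """Expand a list such as '1-16,18-21,23,24' into individual SU numbers."""
    sus = []
    for item in su_list.split(","):
        first, dash, last = item.partition("-")
        low = int(first)
        high = int(last) if dash else low  # a bare number is a one-SU range
        if high < low:
            raise ValueError(f"bad SU range {item!r}")
        sus.extend(range(low, high + 1))
    return sus
```

Applied to the example above, the expansion gives SUs 1 through 16, 18 through 21, 23, and 24, for 22 SUs in all.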


6.3  DMA Integrity Test

This is a test of the Lanai's DMA engines, which transfer data to and from host memory. A variety of 32-bit patterns are transferred between the Lanai send and receive buffers and a kernel buffer.

This test is somewhat exceptional in that it can be run only while the MCP is being loaded onto the Lanai card. Accordingly, it is invoked using a command-line option to mcpload, Cplant's utility for starting the MCP. For example,


  % mcpload -m /tmp/rtsmcp.9 -dma 2 -pnid 999 -route rfile


Here, the portion of the command line pertinent to the DMA test is -dma 2, which says to perform the DMA test for 2 iterations during the load.

Results of the DMA integrity test are written to the system log.




File translated from TEX by TTH, version 2.86.
On 24 Feb 2001, 14:26.