Cplant Myrinet Switch Monitoring
Myrinet switches provide the high-speed cluster network used for message passing between compute nodes and for I/O. The switches maintain a few counters that can be read over an Ethernet connection to the switch. These counters can be used to monitor the health of the connections, cables, etc. The following information applies to the 16-port switches, the Clos-64 switches, and the Myrinet2000 switches.
Myrinet Switch Counters
The Myrinet switches maintain counters for the following types of received packets between reboots or clears:
System Administration
The normal location for the Cplant Myrinet administration scripts and programs is /usr/local/system/myrinet. The following files are found in this directory. Cron is normally set up to call the sw-report script to periodically dump the switch counters for offline analysis. This may be hourly or daily depending on the reliability of the cluster's network.
Naming Conventions
The /etc/dhcpd.conf file is built from the Cluster Tools database. The scripts above rely on some naming conventions. The "mf" devices are m3 fiber switches. The "my" devices are also m3 switches. Other devices are clos-64 or 16-port switches.
System Setup
The Myrinet switches can be queried from their Ethernet connections. The Ethernet connection normally goes to the SSS1 admin node, but on some systems the switches are connected to the mothers, which in turn are connected to the SSS1 node. Connections to an SSS1 admin node are preferable for speed of query and uniformity of configuration.
Clos-64 or 16 port switch requirements
Myrinet2000, or M3, Switches
Myrinet 2000 switches are considerably different from the Clos-64 (m2) and 16-port switches. First, these switches get their IP addresses from the dhcpd daemon, not through RARP. The switches' IP and MAC addresses must therefore be in dhcpd's configuration file, /etc/dhcpd.conf, which is generated from the Cluster Tools on Cplant. Dhcpd must be restarted when the file is modified; it does not respond to a SIGHUP or notice that the file has changed. Once the m3 switch has its address, it is accessed by HTML browsing, SNMP commands, or one of several small programs supplied by Myrinet. Note that not all of these programs run correctly, so use them with caution.
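As an illustration of the kind of entry dhcpd needs for each m3 switch, the following sketch prints a host declaration in ISC dhcpd syntax. The function name, switch name, MAC address, and IP address are all made up for the example. On Cplant the real file is generated by the Cluster Tools, so this only shows the shape of an entry, not how the file is produced.

```shell
# dhcp_host_entry NAME MAC IP
# Print an ISC dhcpd "host" declaration for one switch.
# All three arguments are hypothetical example values.
dhcp_host_entry() {
    printf 'host %s {\n' "$1"
    printf '    hardware ethernet %s;\n' "$2"
    printf '    fixed-address %s;\n' "$3"
    printf '}\n'
}

# Example (invented values):
#   dhcp_host_entry mf-0 00:60:dd:00:00:01 10.0.0.50
```

After an entry changes, dhcpd still has to be restarted as described above; the restart command depends on how the admin node starts its daemons.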
Software
Myrinet 2000 Notes
Myrinet 2000 switches are considerably different from the Clos-64 (m2) and 16-port switches. First, these switches get their IP addresses from the dhcpd daemon, not through RARP. The switches' IP and MAC addresses must therefore be in dhcpd's configuration file. Once the m3 switch has its address, it is accessed by HTML browsing:
> lynx IPADDR
where IPADDR is the switch's name or IP address. Then navigate through the pages to the desired data.
> lynx IPADDR/bad
or
> lynx IPADDR/good
lists the pages without having to navigate.
> lynx -dump IPADDR/bad
dumps the page so it can be redirected to a file.
In the lynx interface each switch's ports are numbered 0..255.
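The lynx -dump form lends itself to saving the pages of several switches in one pass. The sketch below is one way to do that. The function name, the output file naming, and the switch names in the example are assumptions; only the lynx -dump IPADDR/bad and IPADDR/good forms come from the text above.

```shell
# LYNX can be overridden (for example, for a dry run); it defaults to lynx.
LYNX=${LYNX:-lynx}

# dump_switch_pages OUTDIR SWITCH...
# For each switch named, save its "bad" and "good" pages as
# OUTDIR/SWITCH.bad and OUTDIR/SWITCH.good.
dump_switch_pages() {
    outdir=$1
    shift
    mkdir -p "$outdir" || return 1
    for sw in "$@"
    do
        for page in bad good
        do
            $LYNX -dump "$sw/$page" > "$outdir/$sw.$page"
        done
    done
}

# Example (hypothetical switch names):
#   dump_switch_pages /tmp/m3-pages mf-0 mf-1
```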
The M3 switches have settable values, such as the timeout value, that can be changed through the lynx interface. This is rather painful, so the values can instead be set or read using SNMP commands such as the following.
> snmpset IPADDR public .1.3.6.1.4.1.1771.3.1.10.1.4.oid i microseconds
where IPADDR is the IP address, oid is the switch (1..16), and microseconds is the timeout in microseconds.
"public" is the community string and "i" is the integer type; neither is changed.
The numeric OID is used because the supplied SNMP tools do not compile the MIB correctly.
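Typing the numeric OID by hand is error-prone, so a small helper can assemble it. The sketch below builds the OID from the base quoted above and prints the snmpset command in the same form as the text, without running it. The function and variable names are invented for the example. Other SNMP tool sets may expect the version and community string as options, so the printed command may need adapting.

```shell
# Base OID for the per-switch timeout, as quoted in the text.
TIMEOUT_OID_BASE=.1.3.6.1.4.1.1771.3.1.10.1.4

# timeout_oid INDEX
# Print the full OID for switch INDEX, which must be in 1..16.
timeout_oid() {
    case $1 in
        ''|*[!0-9]*) echo "index must be a number" >&2; return 1 ;;
    esac
    if [ "$1" -lt 1 ] || [ "$1" -gt 16 ]; then
        echo "index must be in 1..16" >&2
        return 1
    fi
    echo "$TIMEOUT_OID_BASE.$1"
}

# timeout_set_cmd IPADDR INDEX MICROSECONDS
# Print (do not run) the snmpset command for the given switch.
timeout_set_cmd() {
    oid=$(timeout_oid "$2") || return 1
    echo "snmpset $1 public $oid i $3"
}

# Example (invented address and value):
#   timeout_set_cmd 10.0.0.50 1 100
```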
Building Clos64 bob_test
The 64-port bob_test is located at ~/[release]/config/user_sixteen. In order to build the clos64 version on the Alpha, the following modifications are required.
Building Myrinet2000 - M3 Version
bob_test Use
It is useful to routinely dump the contents of all the switches to a file for later analysis. A README file comes with the Myrinet diagnosis software that vaguely describes the bob_test program. Bob_test communicates with each switch over a socket at a fixed port, 4002. Each switch has sixteen or eight ports, depending on the hardware. Depending on the cluster, these may connect to compute nodes, service nodes, or network interconnects. Also, not all ports may be populated, depending on the cluster architecture.
Sample 16-port Switch sw-dump:
# sw-dump [su] [sw-addr]
# where [sw-addr] may be an IP address not in hosts file, for instance.
bob_test=/usr/local/system/myrinet/firmware/bob_test
host=`hostname | cut -d. -f2,3`
su=$1
echo "Switch m-2.SU-$su.$host -----------------------------------------"
$bob_test m-0 4002 -get-version-string
$bob_test m-0 4002 -get-uptime
$bob_test m-0 4002 -get-timeout
for p in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
do
    echo "Switch m-2.SU-$su.$host Port $p"
    $bob_test $2 4002 -get-port-state $p
done
The 64-port Clos64 switches have ports 0..15 on each of xbars (crossbars) 0..15. Again, these may not all be populated, depending on the cluster.
Sample clos64 Switch sw-dump:
The following is a sample script that dumps the data from Clos64 style switches:
#!/bin/bash
path=/cplant/myrinet
# Nested loops do all switches, all xbars, all ports.
# Currently, the limits are hard-coded until other config files
# set up the topology
m=0
while [ $m -lt 29 ]
do
    xbar=0
    echo "Switch m-$m.SU-0.SM-0.zermatt ------------------------------------"
    $path/bob_test m-$m 4002 -get-version-string
    $path/bob_test m-$m 4002 -get-uptime
    $path/bob_test m-$m 4002 -get-timeout $xbar
    while [ $xbar -lt 16 ]
    do
        for p in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
        do
            echo "Switch m-$m.SU-0.SM-0.zermatt Xbar $xbar Port $p"
            $path/bob_test m-$m 4002 -get-port-state $xbar $p
        done
        xbar=`expr $xbar + 1`
    done
    m=`expr $m + 1`
done
Analysis
The first step is to periodically run a script such as those above to collect a snapshot of the switches' counters in a file. The script may be run as a cron job to save the counters' values at regular intervals.
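For snapshots to be comparable later, each run needs its own file. The sketch below wraps any dump command so that its output lands in a new timestamped file. The function names, the file naming scheme, and the paths in the comments are assumptions for illustration, not part of the Cplant scripts.

```shell
# snapshot_name DIR
# Print a timestamped file name under DIR.
snapshot_name() {
    echo "$1/switches.$(date +%Y%m%d-%H%M%S)"
}

# take_snapshot DIR COMMAND [ARGS...]
# Run COMMAND, save its output (stdout and stderr) in a new
# timestamped file under DIR, and print the name of that file.
take_snapshot() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    file=$(snapshot_name "$dir")
    "$@" > "$file" 2>&1
    echo "$file"
}

# Example (hypothetical paths):
#   take_snapshot /var/tmp/myrinet ./sw-dump 0 m-0
# A crontab entry that runs a wrapper script hourly would look like:
#   0 * * * * /path/to/snapshot-wrapper
```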
The create program can be used to parse the data and save it in a file as a serialized form of the bob_struct.
The report program can then be used to compare snapshots for differences in any of the counters reported by the switch. These differences may then be compared by day, hour, or whatever time period is used.
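The report program does the real comparison on bob_structs. As a quick check that does not depend on it, two raw snapshot files can also be compared line by line. The sketch below assumes only that the dump script emits the same lines in the same order on every run, which is an assumption about the bob_test output rather than something documented here. The function name is invented for the example.

```shell
# changed_lines OLD NEW
# Print each line of NEW that differs from the line at the same
# position in OLD (or has no counterpart), prefixed by its line number.
changed_lines() {
    awk -v old="$1" '{
        if ((getline prev < old) <= 0 || prev != $0)
            print FNR ": " $0
    }' "$2"
}

# Example (hypothetical file names):
#   changed_lines switches.20020101-000000 switches.20020102-000000
```

This is not a substitute for report, which understands the individual counters; it only shows where two dumps differ.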
It has been noted that powering off a compute node attached to a switch sometimes results in thousands of bad packets being counted. This is probably some kind of noise spike. Interestingly, this does not appear during power-on.
Source Files
bob.h - Defines the bob_struct used in the applications.
bob.c - Implements functions that load or dump the bob_struct, parse files, or compare bob_structs.
Makefile - Builds the programs.
create.c - Parses output from the switches, builds bob_structs, and saves the serialized form in a file.
report.c - A report writer that compares the counts of a switch count type for any number of files.