NAME

gm_crc - examine the packet counters on the Myrinet cards and save them to a file.


MODULE

myrinet


SYNOPSIS

gm_crc [--help] [--db datasource] [--debug] [--quiet] [--check <{device|collection}...>] [--mash [<{device|collection}...>]]


DESCRIPTION

Checks the total and bad packet counters on the Myrinet cards and saves the results in one file per node. It then parses the result files and generates four summary files:

  CRC.all - combination of all node files
  CRC.bad - lists all nodes with missing or non-zero bad packet counts
  CRC.d   - like CRC.all, but with the total counts stripped out, for diffing against previous (saved) runs
  CRC.tab - tab-delimited format for importing into a spreadsheet
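
The CRC.bad rule (a node is listed when its bad packet count is missing or non-zero) can be sketched as a small filter. This is only an illustration: the input layout of one "<node> <total> <bad>" line per node is an assumption made for the example, not the documented format of the node files, and list_bad_nodes is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical input: one line per node, "<node> <total> <bad>", with the
# counts absent when a node returned nothing. The real layout may differ.
list_bad_nodes() {
    awk '
        NF == 0 { next }                    # skip blank lines
        NF < 3 || $3 != 0 { print $1 }      # missing or non-zero bad count
    '
}
```

Fed the lines "t-0 100 0", "t-1 250 3" and "t-2", the filter prints t-1 and t-2.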

/cluster/rte/bin/gm_get_crc is called on each node to get the counters.


OPTIONS

--help Print manpage.

--db <datasource> Database type and connection information. For GDBM, use "GDBM:" followed by the filename of the cluster database. For LDAP, the syntax is "LDAP:host:port:dbname".
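
To make the two datasource forms concrete, here is a minimal sketch of how a wrapper script might split such a string before passing it to --db. It is not gm_crc's own parser; parse_datasource and its output format are hypothetical, and only the "GDBM:filename" and "LDAP:host:port:dbname" forms described above are handled.

```shell
#!/bin/sh
# Hypothetical helper: split a --db datasource string into its fields.
parse_datasource() {
    ds=$1
    case $ds in
        GDBM:?*)
            # Everything after the "GDBM:" prefix is the database filename.
            printf 'type=GDBM file=%s\n' "${ds#GDBM:}"
            ;;
        LDAP:*)
            # Expect exactly host:port:dbname after the "LDAP:" prefix.
            rest=${ds#LDAP:}
            old_ifs=$IFS
            IFS=:
            set -f              # no filename expansion while splitting
            set -- $rest
            set +f
            IFS=$old_ifs
            if [ $# -ne 3 ]; then
                echo "bad LDAP datasource: $ds" >&2
                return 1
            fi
            printf 'type=LDAP host=%s port=%s dbname=%s\n' "$1" "$2" "$3"
            ;;
        *)
            echo "unknown datasource type: $ds" >&2
            return 1
            ;;
    esac
}
```

For example, parse_datasource LDAP:ldaphost:389:clusterdb (host, port and database name invented for the example) prints "type=LDAP host=ldaphost port=389 dbname=clusterdb", while a string with too few fields is rejected with a non-zero exit status.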

--quiet Suppress printing of all results from --check and of bad results from --mash.

--check Connect to each node and update the packet counter file for each node given by device or collection. You MUST SPECIFY at least one device.

--mash Merge the individual node files into the four summary files (tab-delimited, diffable, bad only, all). If no device or collection is specified, --mash will parse and combine all results already collected and saved in /cluster/tmp/gm_crc.


NOTES

Suggested use:

  Check all the counters:
    gm_crc --check --mash t-0 t-1 t-2...
  and save the results for later comparison:
    mkdir /cluster/tmp/gm_crc.<date>
    mv /cluster/tmp/gm_crc/CRC* /cluster/tmp/gm_crc.<date>
  Then you can run your favorite Myrinet stress test:
    run.mpptest
    mpirun -np 512 mpi_routecheck
  and gather the counters again:
    gm_crc --check --mash t-0 t-1 t-2...
  and diff the results to find new CRC errors:
    diff /cluster/tmp/gm_crc.<date>/CRC.d /cluster/tmp/gm_crc/CRC.d
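
The save and compare steps of this sequence can be wrapped in two small shell functions. This is a sketch only: the paths are the ones given on this page, but save_baseline, compare_with_baseline, the CRC_DIR variable and the date format are illustrative assumptions, not part of gm_crc.

```shell
#!/bin/sh
# Directory where gm_crc keeps its summary files (default from this page).
CRC_DIR=${CRC_DIR:-/cluster/tmp/gm_crc}

# Move the current CRC* summary files into a dated sibling directory
# and print that directory's name. The date stamp may be passed as $1.
save_baseline() {
    stamp=${1:-$(date +%Y%m%d)}
    dest="$CRC_DIR.$stamp"
    mkdir -p "$dest" || return 1
    mv "$CRC_DIR"/CRC* "$dest"/ || return 1
    printf '%s\n' "$dest"
}

# Diff a saved CRC.d against the current one. The exit status is diff's:
# 0 when nothing changed, 1 when the counters differ.
compare_with_baseline() {
    diff "$1/CRC.d" "$CRC_DIR/CRC.d"
}
```

A run would then call save_baseline after the first gm_crc --check --mash, keep the directory name it prints, and pass that name to compare_with_baseline once the stress test and the second gm_crc run have finished.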

Naming scheme specifics: gm_crc uses the local_sort routine of Local.pm to sort node names so that diffing will work reliably.
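
Why the sort order matters for diffing can be seen with a small example. A plain lexical sort places t-10 before t-2, so lines would shift between runs as nodes are added. The sketch below is not the local_sort routine from Local.pm; it only assumes node names of the form <prefix>-<number>, as in the t-0, t-1, t-2 examples above, and sort_nodes is a hypothetical name.

```shell
#!/bin/sh
# Illustration only: order names of the form <prefix>-<number>.
sort_nodes() {
    # Split on "-", order by prefix, then by the number as a number.
    sort -t- -k1,1 -k2,2n
}
```

Piping t-10, t-2 and t-1 through sort_nodes yields t-1, t-2, t-10.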


FILES

  /cluster/tmp/gm_crc/CRC*
  /cluster/tmp/gm_crc/if-*

Paths and other defaults are recorded in CConf.pm.


SEE ALSO

run.mpptest, mpi_routecheck, gm_lan_check, gm_get_crc