node_hw_analyze - parse the results of node_hw_test and check them against known tolerances
diag
node_hw_analyze [--help] [--debug] [--net] [ [--cpu] | [--longcpu] ] [ [--mem] | [--longmem] ] [--disk] [--summarize] [--quiet] [--report] [--db <datasource>] <device|collection>...
After node hardware testing with node_hw_test, node_hw_analyze searches through the results and checks them against known tolerances.
Currently tests for:
    ping
    netperf
    reported system memory
    memtest
    memory ECC errors (Tsunami chipset errors)
    Linpack (single node)
    STREAM
Results are gathered from /cluster/tmp/node_hw_tests.
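For example, to analyze results for a group of nodes (the collection name "compute" and node names "n001 n002" below are illustrative, not defaults shipped with the tool):

    # Analyze ethernet and memory test results for every node in a collection
    node_hw_analyze --net --mem compute

    # Print only a short summary (average, min, max) of CPU results for two
    # nodes, suppressing per-node error messages
    node_hw_analyze --cpu --summarize --quiet n001 n002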
--summarize Print a short version of the results (average, min, max).
--report Create tab-delimited, spreadsheet-importable data; see the example following this option list. Does not work with a single node. A separate file is created for each node type, named /cluster/tmp/node_hw_tests/node_hw_report.tab.<node type>
--quiet Suppress printing of errors to stdout. Useful with "--summarize" and "--report".
--db <datasource> Database type and connection information. For GDBM, use "GDBM:" followed by the filename of the cluster database; for LDAP, the syntax is "LDAP:host:port:dbname".
--help Print manpage.
--net Analyze results of ethernet tests. (ping, netperf)
--cpu Analyze results of processing performance tests. (linpack)
--longcpu Analyze results of extended CPU tests. Also checks the "--cpu" tests above. (nasker, lloops)
--mem Analyze results of memory tests. (/proc/meminfo, stream, tsunami_machine_check)
--longmem Analyze results of memory stress tests. Also checks "--mem" tests above. (memtest)
--disk (not currently implemented)
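As referenced under "--report", an illustrative example of producing the tab-delimited reports; the collection name "compute" and the GDBM database path are assumptions for the sake of the example, not defaults:

    # Write one spreadsheet-importable file per node type, named
    # /cluster/tmp/node_hw_tests/node_hw_report.tab.<node type>
    node_hw_analyze --net --cpu --mem --report --quiet compute

    # Same analysis, but reading cluster data from an explicit GDBM database
    # file (path shown is hypothetical)
    node_hw_analyze --net --report --db GDBM:/cluster/etc/clusterdb compute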
This script should be run from the admin node.
/cluster/tmp/node_hw_tests/*
/cluster/machine/data/node_hw_values.*
/cluster/tmp/node_hw_tests/node_hw_report.tab*
Paths and other defaults are recorded in CConf.pm.
node_hw_test, show_disks, show_temp