If we hope to automatically detect and diagnose failures in large-scale
computer systems, we must study real deployed systems and the data they
generate. Progress has been hampered by the inaccessibility of
empirical data. This site addresses that dearth by providing system
logs from five supercomputers. By distributing these logs we hope to
facilitate reproducible research in log analysis, specifically toward
increasing supercomputer reliability, availability, and serviceability
(RAS).
A preliminary analysis of these logs can be found in:
Adam Oliner and Jon Stearley. What Supercomputers Say - An Analysis of Five System Logs.
IEEE/IFIP Conference on Dependable Systems and Networks
(DSN), 2007.
| System | Start Date | Days | Size (GB) | Compressed (GB) | Rate (bytes/sec) | Messages | Alerts | Alert Categories |
|---|---|---|---|---|---|---|---|---|
| Blue Gene/L | 2005-06-03 | 215 | 1.207 | 0.118 | 64.976 | 4,747,963 | 348,460 | 41 |
| Thunderbird | 2005-11-09 | 244 | 27.367 | 5.721 | 1298.146 | 211,212,192 | 3,248,239 | 10 |
| Red Storm | 2006-03-19 | 104 | 29.990 | 1.215 | 3337.562 | 219,096,168 | 1,665,744 | 12 |
| Spirit (ICC2) | 2005-01-01 | 558 | 30.289 | 1.678 | 628.257 | 272,298,969 | 172,816,564 | 8 |
| Liberty | 2004-12-12 | 315 | 22.820 | 0.622 | 835.824 | 265,569,231 | 2,452 | 6 |
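The Rate column appears to be the uncompressed size divided by the length of the logging period. The minimal Python sketch below assumes 1 GB = 10^9 bytes and a period of Days × 86,400 seconds, and recomputes the rate along with two derived ratios (compression ratio and the fraction of messages tagged as alerts). The helper names are illustrative only, not part of any distributed tooling. Under this assumption it reproduces the published rates to within rounding for Blue Gene/L, Thunderbird, Red Storm, and Spirit. For Liberty, it gives roughly 838 bytes/sec rather than 835.824, so the exact definition behind the table remains unconfirmed.

```python
# Recompute derived statistics from the table above.
# Assumption: Rate = uncompressed size (1 GB = 10**9 bytes) / logging period.
SECONDS_PER_DAY = 86_400

# (system, days, size_gb, compressed_gb, messages, alerts), copied from the table
SYSTEMS = [
    ("Blue Gene/L", 215, 1.207, 0.118, 4_747_963, 348_460),
    ("Thunderbird", 244, 27.367, 5.721, 211_212_192, 3_248_239),
    ("Red Storm", 104, 29.990, 1.215, 219_096_168, 1_665_744),
    ("Spirit (ICC2)", 558, 30.289, 1.678, 272_298_969, 172_816_564),
    ("Liberty", 315, 22.820, 0.622, 265_569_231, 2_452),
]


def rate_bytes_per_sec(size_gb, days):
    """Average logging rate in bytes/sec over the whole period."""
    return size_gb * 1e9 / (days * SECONDS_PER_DAY)


def compression_ratio(size_gb, compressed_gb):
    """How many times smaller the compressed archive is than the raw log."""
    return size_gb / compressed_gb


def alert_fraction(messages, alerts):
    """Share of all messages that carry an alert tag."""
    return alerts / messages


for name, days, size, comp, msgs, alerts in SYSTEMS:
    print(f"{name:<14} {rate_bytes_per_sec(size, days):10.3f} B/s "
          f"{compression_ratio(size, comp):5.1f}x "
          f"{100 * alert_fraction(msgs, alerts):7.3f}% alerts")
```

The alert fraction varies widely across the five systems: about 63% of Spirit's messages are alerts, compared with under 0.001% of Liberty's.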
See this README for md5sums, format description, and other useful information.
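After downloading a log, you can compare its checksum against the md5sums listed in the README. The minimal sketch below uses only Python's standard `hashlib` and reads the file in chunks, so multi-gigabyte logs need not fit in memory. The function names `md5sum` and `verify` are hypothetical helpers written for illustration.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it 1 MiB at a time."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5sum(path) == expected_hex.strip().lower()
```

Usage: call `verify(path, expected)` with the downloaded file's path and the digest copied from the README. This check applies to whichever file the README's checksum refers to (compressed or uncompressed).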
Further analysis has revealed additional alert types, as well as a small
number of incorrectly tagged lines. The updated tagging can be
reproduced using the tools here.
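The Messages, Alerts, and Alert Categories columns can be tallied directly from a tagged log. The sketch below assumes that each line begins with a tag field, either "-" for a non-alert message or the name of the alert category. Check this assumption against the format description in the README before relying on it. `count_alerts` is a hypothetical helper, not one of the tools distributed here.

```python
from collections import Counter


def count_alerts(lines):
    """Count messages and alerts per category in a tagged log.

    Assumption: the first whitespace-separated field of each line is the
    tag, "-" for a non-alert message, otherwise the alert category name.
    Blank lines are skipped and not counted as messages.
    """
    total = 0
    categories = Counter()
    for line in lines:
        if not line.strip():
            continue
        total += 1
        tag = line.split(maxsplit=1)[0]
        if tag != "-":
            categories[tag] += 1
    return total, categories
```

For example, `with open(path, errors="replace") as f: total, cats = count_alerts(f)` yields the message count. `sum(cats.values())` and `len(cats)` then give figures comparable to the Alerts and Alert Categories columns, provided the tag assumption holds. The `errors="replace"` argument guards against non-UTF-8 bytes that raw logs may contain.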
A detailed description of the revision, and an analysis of the resulting logs,
is given in "Alert Detection in System Logs," IEEE International
Conference on Data Mining (ICDM), 2008.
These and a wide variety of other system logs are available at
http://cfdr.usenix.org/.