SOS7 Machines Session
Revised March 26, 2003

Introduction

There will be two sessions at SOS7 dedicated to presentations about machines. The Tuesday session will discuss machines that are already operational, while the Wednesday session is for planned machines. Instead of presenting technical details of their machines, the panelists were asked to answer three or four specific questions. As a reminder, and for people less familiar with the individual machines, we summarize here, and briefly before each session, the main technical details of each machine presented.

The panelists for the session on already operational machines are:

  • ASCI White: Mark Seager
  • ASCI Q: John Morrison
  • NCSA Cluster: Dan Reed
  • PSC Cluster: Mike Levine
  • NOAA Cluster: Leslie Hart
  • French CEA Machine: Jean Gonnord

Here are the questions these panelists were asked to answer:

  1. Is your machine living up to its performance expectations? If so, how? If not, what is the root cause?
  2. What is the MTBI (mean time between interrupts)? What are the most common causes of interrupts? What is the average utilization rate? (See the sketch after this list.)
  3. What is the primary complaint, if any, from the users?
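For readers unfamiliar with the metrics in question 2, below is a minimal sketch of how MTBI and utilization are typically computed. All numbers are invented for illustration; individual sites use their own accounting rules.

    # Minimal sketch of computing MTBI and utilization from an operations
    # log. All numbers here are hypothetical; sites differ in what they
    # count as an interrupt and as delivered time.

    # Hours of scheduled uptime preceding each recorded interrupt.
    hours_before_interrupt = [6.0, 12.5, 3.0, 9.5]      # hypothetical data

    mtbi = sum(hours_before_interrupt) / len(hours_before_interrupt)
    print(f"MTBI: {mtbi:.2f} h")                        # MTBI: 7.75 h

    # Utilization: delivered node-hours over available node-hours.
    delivered_node_hours = 650_000                      # hypothetical
    available_node_hours = 750_000                      # hypothetical
    print(f"Utilization: {delivered_node_hours / available_node_hours:.0%}")
    # Utilization: 87%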

The panelists for the planned machines session are:

  • LLNL Purple & MCR Cluster: Mark Seager
  • SNL Red Storm: Jim Tomkins
  • LANL Pink Cluster: Ron Minnich
  • PNNL Cluster: Scott Studham
  • ORNL X1/X2: Buddy Bland
  • LLNL/IBM Blue Gene/L: Jose Moreira (IBM Watson Center)
  • NERSC Blue Planet: Brent Gorda

Here are the questions these panelists were asked to answer:

  1. What is unique in the structure and function of your machine?
  2. What characterizes your applications? For example: the intensity of message passing, memory utilization, computation, I/O, and data.
  3. What prior experience guided you to this choice?
  4. Other than your own machine, which machines are best and worst for your needs, and why?

Technical Data

The following tables compare the technical aspects of these machines. Blank cells indicate figures that were not available.

Hardware: General
Machine           | Maker         | Available                              | Power*  | Size                      | Plans
ASCI Q            | HP            | QA Aug. 2002; QB Feb. 2003             | 3.5 MW  | > 12,000 ft2 (2 segments) |
PSC Cluster       | Compaq/PSC    | Apr. 2002                              | 0.46 MW | 2,500 ft2                 | Add small number of EV7
NOAA Cluster      | HPTi          | Oct. 2002                              |         | 1,200 ft2                 |
French CEA        | Compaq        | Feb. 2001                              | 0.6 MW  |                           | Phase 2 in 2003: 50 TF peak
SNL Red Storm     | Cray/Sandia   | Aug. 2004                              | < 2 MW  | < 3,000 ft2               | Expandable to > 100 TF
LANL Pink Cluster | Linux Networx | Jun. 2003                              |         | 1,000 ft2                 | Could expand to 8,192 nodes
PNNL Cluster      | HP            | Phase 1: Dec. 2002; Phase 2: Jul. 2003 | 7.5 MW  | 2,500 ft2                 | Upgrade Sep. 2003
ORNL X1/X2        | Cray          | Sep. 2003                              | 0.4 MW  | 100 ft2                   | 4 cabinets in 2003, 10 in 2004, 50 in 2005
LLNL Blue Gene/L  | IBM           | 2004/2005                              | 1.5 MW  | 1,200 ft2                 |
NERSC Blue Planet | IBM           | Jun. 2005                              | 6 MW    | 12,000 ft2                | Later expansion

* Power consumption for some machines includes cooling.


Hardware: Node Level
Machine           | CPU type                                                 | CPUs/node | Mem/CPU (BW)         | NIC                    | CPUs/NIC
ASCI Q            | Alpha EV-68, 1.25 GHz, 2.5 GF, 16 MB cache               | 4         | 2, 4, & 8 GB; 4 GB/s | Quadrics Elan3         | 4?
PSC Cluster       | Alpha EV-68, 1 GHz, 2 GF, 8 MB cache                     | 4         | 4 GB; 4 GB/s         | Quadrics Elan3         | 1/2
NOAA Cluster      | Intel P4 Xeon, 2.2 GHz                                   | 2         | 0.5 GB; 400 MHz      | Myrinet 2000           | 2?
French CEA        | Alpha EV-68, 1.0 GHz                                     | 4         | 1 GB                 | Quadrics               | 1
SNL Red Storm     | AMD Sledgehammer, 2 GHz, 1 MB cache                      | 1         | 1 GB DDR @ 333 MHz   | Cray                   | 1
LANL Pink Cluster | Intel P4, 2.4 GHz, 2.4 GF?, 512 kB cache                 | 2         | 1 GB                 | Myrinet LANai 9        | 2?
PNNL Cluster      | McKinley, 1 GHz, 4 GF, 3 MB cache; Phase 2 upgrade: Madison, 1.5 GHz, 6 GF, 6 MB cache | 2 | 6 GB; 6.4 GB/s | Phase 2 upgrade: 1 Elan3 (270 MB/s), 1 Elan4 (> 700 MB/s) | 1/3?
ORNL X1/X2        | Cray vector, 12.8 GF                                     | 64        | 4 GB; 51 GB/s        | Cray; 100 GB/s         |
LLNL Blue Gene/L  | PowerPC 440, 700 MHz, 2.8 GF, 32 kB L1 cache, 4 MB cache | 2         | 128 MB; 5.6 GB/s     | IBM, 175 MB/s          | 2
NERSC Blue Planet | Power-5, 8-10 GF                                         | 8         | 16 GB                | IBM                    |
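As a quick consistency check between this table and the machine-level table below, total memory is the CPU count times the per-CPU memory. The short sketch below redoes that arithmetic for two machines; it is our calculation, using only figures quoted in these tables.

    # Total memory = total CPUs x memory per CPU, using figures from the
    # node-level table above and the machine-level table below.

    machines = {
        # name: (total CPUs, GB per CPU)
        "PSC Cluster":      (3_000,   4.0),    # 4 GB/CPU
        "LLNL Blue Gene/L": (131_072, 0.125),  # 128 MB/CPU
    }

    for name, (cpus, gb_per_cpu) in machines.items():
        print(f"{name}: {cpus * gb_per_cpu / 1024:.1f} TB")
    # PSC Cluster: 11.7 TB       (quoted as 12 TB)
    # LLNL Blue Gene/L: 16.0 TB  (quoted as 16 TB)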

Hardware: Machine Level
Machine           | Tot. CPUs                          | Tot. mem            | Tot. mem BW | Peak    | Linpack
ASCI Q            | 8,192                              |                     |             |         | 7.7 TF (10T)
PSC Cluster       | 3,000                              | 12 TB               | 12 TB/s     | 6 TF    | 4.5 TF
NOAA Cluster      | 1,816                              | 0.9 TB              |             |         | 3.337 TF
French CEA        | 2,560                              | 3 TB                | 5 TB/s      | 5 TF    | 3.98 TF
SNL Red Storm     | 10,368                             | 10 TB DDR @ 333 MHz | ~55 TB/s    | ~40 TF  | > 14 TF
LANL Pink Cluster | 2,048                              |                     |             |         |
PNNL Cluster      | Phase 2 upgrade: > 1,900           |                     |             | 11.4 TF | 8 TF
ORNL X1/X2        | 32 (3/03); 128 (6/03); 256 (9/03)  |                     |             | 3.2 TF  |
LLNL Blue Gene/L  | 131,072                            | 16 TB               |             |         | ~200 TF
NERSC Blue Planet | 16,384                             | 256 TB              |             |         | 40-50 TF
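The peak numbers follow from CPU count times per-CPU peak. The sketch below redoes that arithmetic for three machines; the Red Storm per-CPU figure assumes 2 flops/cycle at 2 GHz, and the CEA figure assumes the same 2 GF as the PSC's 1 GHz EV-68, since the node-level table does not state it.

    # Peak = total CPUs x per-CPU peak (GF), from the node-level table.

    peaks = {
        # name: (total CPUs, GF per CPU)
        "PSC Cluster":   (3_000,  2.0),   # 2 GF per EV-68 (from node table)
        "French CEA":    (2_560,  2.0),   # assumed: same 2 GF as PSC's EV-68
        "SNL Red Storm": (10_368, 4.0),   # assumed: 2 flops/cycle at 2 GHz
    }

    for name, (cpus, gf) in peaks.items():
        print(f"{name}: {cpus * gf / 1000:.1f} TF peak")
    # PSC Cluster: 6.0 TF peak     (table: 6 TF)
    # French CEA: 5.1 TF peak      (table: 5 TF)
    # SNL Red Storm: 41.5 TF peak  (table: ~40 TF)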

Hardware: Network and Reliability Features
Machine           | Topology                             | Bisection BW                                        | MTBI                           | Features
ASCI Q            | Fat tree                             | 128 GB/s (20T)                                      | 6.5 h per 10T (2.1 h for 30T?) |
PSC Cluster       | Fat tree                             | 165 GB/s                                            | 11 h                           | Redundant power, NIC, network
NOAA Cluster      | Clos                                 |                                                     |                                |
French CEA        | QSW dual rail                        |                                                     |                                |
SNL Red Storm     | Full 3D mesh                         | > 1.5 TB/s                                          | > 50 h                         | Sophisticated RAS system
LANL Pink Cluster |                                      |                                                     |                                | Redundant BIOS & fans; no mechanical parts
PNNL Cluster      | 3 fat trees (Elan3, Elan4, and GigE) | 164 GB/s (Elan3), 900 GB/s (Elan4), 10 GB/s (GigE)  |                                | Checkpoint/restart; RAID5; multiple SAN paths; failover I/O, login, and management nodes
ORNL X1/X2        |                                      |                                                     |                                |
LLNL Blue Gene/L  | 3D torus, 64x32x32                   | 700 GB/s                                            |                                |
NERSC Blue Planet | 2 separate Federation switches, each with a third stage of 8,192 switch links |            |                                |
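The Blue Gene/L bisection figure can be reproduced from the torus dimensions above and the 175 MB/s link rate in the node-level table, assuming two severed links per wrapped ring and counting both directions of each link.

    # Bisection bandwidth of the 64x32x32 Blue Gene/L torus. Cutting the
    # torus across its longest dimension (x = 64) severs two links per
    # (y, z) position, because each x-ring wraps around.

    x, y, z = 64, 32, 32
    link_mb_per_s = 175                  # MB/s per link, per direction

    links_cut = 2 * y * z                # 2,048 links
    bisection_gb_per_s = links_cut * link_mb_per_s * 2 / 1000  # both directions
    print(f"{bisection_gb_per_s:.0f} GB/s")   # 717 GB/s, ~700 GB/s as quoted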

Hardware: I/O
Machine           | Tot. disk                            | Aggregate BW (local FS) | Aggregate BW (global FS) | Off-machine I/O
ASCI Q            | 442 TB                               | 19.2 GB/s (10T)         | 19.2 GB/s (10T)          |
PSC Cluster       | 30 TB                                |                         | < 32 GB/s                |
NOAA Cluster      | 20 TB                                |                         |                          |
French CEA        | 50 TB                                |                         | 7.5 GB/s                 |
SNL Red Storm     | 240 TB                               | 50 GB/s                 | 50 GB/s                  | 25 GB/s
LANL Pink Cluster |                                      |                         |                          |
PNNL Cluster      | 256 TB (200 TB local, 56 TB global)  | 132 GB/s                | 3.2 GB/s                 |
ORNL X1/X2        |                                      |                         |                          |
LLNL Blue Gene/L  |                                      |                         |                          | 128 GB/s
NERSC Blue Planet | 2,500 TB                             |                         |                          |


Maintained by: Rolf Riesen
Modified on: March 26, 2003