[Sandia National Laboratories]









Sandia/OakRidge/Swiss
Commodity-Based Computing Collaboration

Summary


The Distributed Virtual Supercomputing workshop was held June 25-27, 1997, in Santa Fe, New Mexico. Sandia coordinated the workshop in response to its new ASCI project devoted to distributed computing.

People from Sandia, ORNL, LANL, and a couple of Swiss groups participated. Pretty cool deal: build a big-time computational machine using commodity parts. Plan for expansion. Plan for widely distributed computing platforms. Below is a partial description of what was discussed.

Some of the participants:

    Sandia: Bill Camp, Jim Tomkins, David Greenberg
    ORNL: Al Geist, Ken Kliewer, Tim Sheehan
    Swiss: Anton Gutzinger (ETH) and Ralf Gruber (EPFL)
    LANL: Richard Thomsen, Ian Philp, Mike Warren (Omissions are my oversight. Sorry.)

Sandia and ORNL described the type of applications they are interested in running on such a machine.

Sandia folks described their "Computational Plant" project. The '97 goal is to build a 100 GFlops peak/60 GFlops achieved machine. Here are some details:

Hardware (presented by Robert Clay)
--------
Tentatively:

  • DEC Alpha chip (433 or 500 MHz)
  • Myrinet networking boards. Will explore topologies, but a mesh seems the likely candidate because it allows for easier growth.
  • Flexible partition à la ASCI Red: service, system, compute, I/O.
  • 1 I/O node per scalable unit (16 CPUs?)
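To get a feel for the size of the '97 target, here is a rough sizing sketch. It rests on assumptions that are mine, not the presenters': each Alpha CPU retires at most two floating-point operations per clock (one add plus one multiply), the 100 GFlops peak comes from compute CPUs alone, and a scalable unit holds 16 compute CPUs plus one I/O node. The helper `size_machine` is hypothetical.

```python
FLOPS_PER_CYCLE = 2           # assumption: one FP add + one FP multiply per clock
TARGET_PEAK_MFLOPS = 100_000  # '97 goal: 100 GFlops peak
TARGET_ACHIEVED_MFLOPS = 60_000  # '97 goal: 60 GFlops achieved
CPUS_PER_UNIT = 16            # tentative "(16 CPUs?)" scalable unit


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def size_machine(clock_mhz: int) -> dict:
    """Estimate compute CPUs, scalable units, and I/O nodes for the peak target."""
    peak_per_cpu = FLOPS_PER_CYCLE * clock_mhz      # MFlops per CPU
    cpus = ceil_div(TARGET_PEAK_MFLOPS, peak_per_cpu)
    units = ceil_div(cpus, CPUS_PER_UNIT)           # round up to whole units
    return {
        "peak_mflops_per_cpu": peak_per_cpu,
        "compute_cpus": cpus,
        "scalable_units": units,
        "io_nodes": units,                           # one I/O node per unit
        # fraction of peak the applications must sustain to hit 60 GFlops
        "required_efficiency": TARGET_ACHIEVED_MFLOPS / TARGET_PEAK_MFLOPS,
    }


for mhz in (433, 500):
    print(mhz, size_machine(mhz))
```

Under these assumptions the 500 MHz part needs 100 compute CPUs in 7 units and the 433 MHz part needs 116 CPUs in 8 units; either way, sustaining 60 GFlops means running at 60% of peak.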
System software (presented by Rolf Riesen)
--------
  • Lightweight operating system kernel (initially Linux, then a port of Puma/Cougar to the Alpha)
  • Portals for fast communication
  • Big compiler effort.
  • MPI (details below)

Other stuff:

Heterogeneity (Al Geist)
--------
Al showed that we have most of the pieces needed to allow for heterogeneity. They just need to be adapted/adopted/refined. Many presenters and participants shared this view. However, a homogeneous setup will be pursued in the initial phases.

Long distance message passing (Tim Sheehan)
--------
A collaboration between ORNL, Sandia, and the Pittsburgh Supercomputing Center involves running applications across widely distributed computing resources. With a lot of work, codes (CTH, LSMS) were executed in parallel on machines located at Sandia and ORNL. For details, see

http://www.lanl.gov/organization/cic/cic8/para-dist-team/DVS/paragon_results.html.

Many argue that this type of computing is simply a political exercise. Regardless, the challenges of computing across the country are related to the challenges of computing between two machines located in the same room.

Bill Camp: "The alternative to having two major computing resources located at two sites is to locate them both at one site. Which would you prefer?"
Answer: "If they are located at my site, I'm all for the single site model." (My argument for that site being LANL is that the speed of light is faster at altitude.)
Message passing
--------
Although there were some rumblings regarding other programming models, explicit message passing appears to be the clear front runner, most likely through an MPI implementation. Only a subset of MPI's functionality will get attention with respect to optimization (that is, adapt a public-domain version of MPI, such as MPICH or LAM). Hardware folks are quite interested in providing an excellent network infrastructure. In fact, there is talk of implementing in hardware not only barriers, but also reductions/broadcasts/gathers/scatters. This could be a big win.
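Why hardware collectives could be a big win: in software, a reduction over p nodes is typically built from point-to-point messages, for example along a binomial tree, which costs ceil(log2 p) rounds of message latency. The sketch below simulates that pattern sequentially in Python. `tree_reduce` is a hypothetical helper for illustration only; it is not the algorithm MPICH or LAM actually uses, and a real MPI code would simply call `MPI_Reduce`.

```python
from typing import List, Tuple


def tree_reduce(values: List[int]) -> Tuple[int, int, int]:
    """Sum values[r] (held by rank r) onto rank 0 along a binomial tree.

    Returns (result, rounds, messages). Each round corresponds to one
    latency-bound step of point-to-point sends in a software reduction.
    """
    p = len(values)
    partial = list(values)  # partial[r] = running sum held by rank r
    rounds = 0
    messages = 0
    stride = 1
    while stride < p:
        # Ranks at odd multiples of `stride` send their partial sum
        # to the partner `stride` ranks below them.
        for r in range(stride, p, 2 * stride):
            partial[r - stride] += partial[r]
            messages += 1
        rounds += 1
        stride *= 2
    return partial[0], rounds, messages


result, rounds, messages = tree_reduce(list(range(8)))
print(result, rounds, messages)
```

For 8 ranks this takes 3 rounds and 7 messages; every round adds a full network latency, which is exactly the cost a hardware reduction would hide.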

A variety of subgroups will be formed to attack the multitude of challenging issues that must be addressed/conquered for this project to succeed. It could be argued that these issues do not apply to us rich folk who can afford $150,000,000 machines, but I would counter that it is exactly these types of issues that must be addressed in order to kick us up to 10/30/100 TFLOPS machines, regardless of cost.

Details regarding how to get in on such discussions should appear shortly. Should be fun...

Richard Barrett
rbarrett@lanl.gov
667-6845

What is it?

"It's like a tree. It grows, and is pruned to achieve scalability."
    - Jim Tomkins

"You start with a tree. Then you get a forest. Then you have to deal with a continent."
    - Al Geist

"It's like a cow: we feed it, we milk it, then we slaughter it."
    - I ain't sayin'...



Mail to: Lilia G. Martinez

Last modified: March 6, 1998


