Cplant FAQ

Frequently Asked Questions about the Cplant

  1. What is the Cplant machine?
  2. Is it another Beowulf machine?
  3. What does each node contain?
  4. How are the machines connected together?
  5. Which Myrinet Control Program (MCP) is being used?
  6. What message passing library is being used?
  7. What version of Linux is being used?
  8. How well does the machine perform?
  9. Which compilers are being used?
  10. On what LAN is the machine?
  11. Who may have accounts on the machine?
  12. How do I obtain an account?
  13. Whom do I contact if I have a problem with my account?
  14. Where do I get help?
  15. What mailing lists exist for the machine?
  16. How do I log into the service partition?
  17. How do I transfer files to and from the machine?
  18. What security levels exist on the machine?
  19. What file systems are available?
  20. Where can I find some on-line documentation?
  21. What operating systems run on the machine?
  22. What languages are supported?
  23. Are there any Fortran 90 compilers or translators?
  24. How do I set up my environment for compiling codes?
  25. How do I compile a code?
  26. Where should codes be compiled?
  27. For what operating systems are the compilers available?
  28. How do I run a code in the compute partition?
  29. How do I check whether my parallel job is loaded?
  30. How do I kill my job?
  31. What debugging tools are available?
  32. What programming models are available?
  33. What message passing libraries are available?
  34. What libraries are available?
  35. Can I use sockets from within an application?
  36. What options are available for I/O from parallel applications?
  37. What tools for performance analysis are available?
  38. What tools for resource management are available?
  39. What does a service node look like?
  40. What does a compute node look like?
  41. What does an I/O node look like?
  42. How much disk space is available?
  43. Is HiPPI available for this system?
  44. Is there any way to get information about what is going on in my program without instrumenting my code by hand?
  45. What is the clock rate on the PCI bus?
  46. What is the hardware latency for sending messages? What is the software latency for sending a message?
  47. How large is the message header for interprocessor messages?
  48. What binary format is used for data storage in files?
  49. Can message passing be used to communicate with processors external to the machine?
  50. What can I do to improve message passing performance?
  51. Are there any known problems with the message passing?
  52. Where did Cplant get its name?

  1. What is the Cplant machine?
    The Cplant machine is a parallel computer constructed entirely from commodity personal computer hardware.

  2. Is it another Beowulf machine?
    Not really. The Cplant project has some broader goals than traditional Beowulf systems. We are not trying to build a machine for a small number of users to run a small number of applications on a small number of machines. We are trying to build a production machine for hundreds of users to run all types of parallel applications on potentially thousands of nodes. We are essentially trying to build a commodity-based machine patterned after the design of the Intel TeraFLOPS machine.

  3. What does each node contain?
    Each new phase is composed of different hardware. For a detailed list of the hardware used in each phase, see the pages for the individual phases.

  4. How are the machines connected together?
    The machines are connected with Myricom's Myrinet gigabit networking hardware. Each node contains a SAN PCI card connected to a 16-port SAN/LAN switch. The topology of the network is roughly a cube of hypercubes.

  5. Which Myrinet Control Program (MCP) is being used?
    We are using our own MCP that works with our implementation of Portals in Linux. We initially used Myricom's GM software, but only to test network connectivity and generate routes.

  6. What message passing library is being used?
    We are able to use the MPI on Puma Portals implementation, which was originally designed for the Intel Paragon running Puma and is currently in use on the Intel TeraFLOPS machine running Cougar. It was developed from MPICH version 1.0.12 and has been extensively tested and validated by Intel.

  7. What version of Linux is being used?
    We are using Red Hat 5.1 and Linux-AXP v2.0.34.

  8. How well does the machine perform?
    Various performance numbers can be obtained from the Cplant Performance page.

  9. Which compilers are being used?
    We are using the compilers for Digital UNIX (Digital Fortran, Digital C, and Digital C++) to build statically linked ECOFF executables that run under Linux-AXP.

  10. On what LAN is the machine?
    The production Cplant Alpha/Linux cluster (called Alaska) currently is on Sandia's Internal Restricted Network (IRN).

  11. Who may have accounts on the Cplant?
    An IRN account is needed to access Cplant.

  12. How do I obtain an account on Cplant?
    You can request an account through WebCARS on Sandia's internal web.

  13. Whom do I contact if I have a problem with my account?
    If you do not receive notification that your account has been set up within two weeks of submitting the appropriate form, send e-mail to alaska-help@sandia.gov.

  14. Where do I get help?
    For usage problems (e.g., "Where are the compilers?") or questions related to mailing lists, send e-mail to alaska-help@sandia.gov or check the on-line documentation.

  15. What mailing lists exist for Cplant?
    The following mailing lists exist for Cplant:

  16. How do I log into the Cplant service partition?
    Check this Sandia internal web link for usage info.

  17. How do I transfer files to and from Cplant?
    Check this Sandia internal web link for usage info.

  18. What security levels exist on Cplant?
    The unclassified computers on the IRN have been approved by DOE for the storage and processing of Sensitive Unclassified data. Owners of such data have the responsibility to adequately protect against unauthorized access.

    Standard UNIX file access permissions are sufficient to protect this data, but users are reminded that it is their responsibility to set their file mode bits and umask properly. Note that most people set their umask to permit public read, which is inappropriate for Sensitive Unclassified data.
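
    As a minimal sketch of the advice above, a program can tighten its umask before creating a file so that the file is accessible only to its owner. Python is used purely for illustration; the helper name write_private and the file name are hypothetical, not part of any Cplant tool.

```python
import os
import stat
import tempfile

def write_private(path, data):
    """Create `path` with no group/other access (hypothetical helper)."""
    old_mask = os.umask(0o077)      # mask out every group and other bit
    try:
        with open(path, "w") as f:  # a new file is created as 0666 & ~umask
            f.write(data)
    finally:
        os.umask(old_mask)          # restore the caller's umask
    return stat.S_IMODE(os.stat(path).st_mode)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        mode = write_private(os.path.join(d, "results.dat"), "sensitive\n")
        print(oct(mode))
```

    The umask only affects files created afterwards; the mode bits of existing files have to be changed with chmod.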

  19. What file systems are available?
    Check this Sandia internal web link for usage info.

  20. Where can I find some on-line documentation?
    There are man pages available in /Net/mp/cplant/man and here.

  21. What operating systems run on Cplant?
    Cplant currently runs a variant of Alpha Linux version 2.0.34 in both the service and compute partitions. The stock version of the kernel is augmented with modules that implement Portals.

  22. What languages are supported on Cplant?
    C, C++, Fortran 77, and Fortran 90 are supported for the compute nodes of Cplant.

  23. Are there any Fortran 90 compilers or translators?
    We currently make use of the Digital UNIX Fortran 90 compiler.

  24. How do I set up my environment for compiling codes?
    Check this Sandia internal web link for usage info.

  25. How do I compile a code?
    Check this Sandia internal web link for usage info.

  26. Where should codes be compiled?

    A dedicated Digital UNIX machine named 'juneau' serves as the Cplant application compile server. All users who have accounts on the Cplant machine also have accounts on juneau.

  27. For what operating systems are the compilers available?
    Cplant applications can only be compiled under Digital UNIX.

  28. How do I run a code in the compute partition?
    Jobs are launched in the compute partition by a command named yod. This command is similar to the yod available under SUNMOS on the large Paragon and under Cougar on the TFLOPS machine. See yod.

  29. How do I check whether my parallel job is loaded on Cplant?
    You may check whether your parallel job has loaded on Cplant using the pingd and showmesh commands, which display the jobs that are running on the system.

  30. How do I kill my job?
    The appropriate method of killing a parallel job is
      % kill -2 pid

    where pid is the process number of the yod command for the job, obtained via ps. When yod receives this signal, it terminates the parallel job and shuts itself down in an orderly fashion. Note that from an interactive session, ctrl-C (which sends the same signal) is interpreted properly.
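
    The two steps above (find the yod process with ps, then send it signal 2) can be sketched as follows. This is an illustration only: the function names and the sample ps listing are made up, and the parser assumes the default four-column Linux ps layout (PID, TTY, TIME, CMD).

```python
import os
import signal

def find_yod_pids(ps_output):
    """Return the PIDs of yod processes listed in the text printed by ps."""
    pids = []
    for line in ps_output.splitlines()[1:]:   # skip the header line
        fields = line.split(None, 3)          # PID, TTY, TIME, CMD
        if len(fields) < 4:
            continue
        command = fields[3].split()[0]
        if os.path.basename(command) == "yod":
            pids.append(int(fields[0]))
    return pids

def interrupt(pid):
    """Equivalent of `kill -2 pid`: signal 2 is SIGINT on Linux."""
    os.kill(pid, signal.SIGINT)

# Made-up ps output, for illustration only.
SAMPLE_PS = """\
  PID TTY          TIME CMD
 1201 pts/3    00:00:00 tcsh
 1342 pts/3    00:00:07 yod
 1350 pts/3    00:00:00 ps
"""

if __name__ == "__main__":
    print(find_yod_pids(SAMPLE_PS))   # each PID found would be passed to interrupt()
```

    The example only prints the PID it finds; interrupt() is deliberately not called on the made-up process number.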

  31. What debugging tools are available on Cplant?
    There currently is no source level debugger available for general use. Debugging on the compute nodes via gdb is possible, but requires a special configuration which has to be set up by Cplant support.

  32. What programming models are available on Cplant?
    Cplant is a distributed-memory, MIMD machine which supports explicit parallel programming.

    In explicit parallel programming, the code developer must explicitly decompose data structures into subunits and distribute them among the nodes of the machine. The code written to execute on each node uses standard languages (e.g., Fortran 77, C) for local processing. Messages are passed between nodes using a message-passing protocol (MPI or Portals) to coordinate processing.
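
    To make the idea of explicit decomposition concrete, here is a small sketch of a block decomposition of a one-dimensional array. It runs on a single processor and only simulates the nodes with a loop. The function names are hypothetical, and Python stands in for the Fortran or C that an actual Cplant code would use.

```python
def block_range(n_items, n_nodes, rank):
    """Return the half-open index range [start, stop) owned by `rank`."""
    base, extra = divmod(n_items, n_nodes)
    # The first `extra` nodes each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop

def simulated_global_sum(data, n_nodes):
    """Sum `data` the way an explicitly parallel code would: each node sums
    its own block, then the partial sums are combined. Here the combine step
    is an ordinary loop; on a real machine it would be done with messages
    (for example, an MPI reduction)."""
    partial = []
    for rank in range(n_nodes):
        start, stop = block_range(len(data), n_nodes, rank)
        partial.append(sum(data[start:stop]))   # purely local work
    return sum(partial)

if __name__ == "__main__":
    print([block_range(10, 4, r) for r in range(4)])   # [(0, 3), (3, 6), (6, 8), (8, 10)]
    print(simulated_global_sum(list(range(10)), 4))    # 45
```

    Giving the leftover items to the lowest-numbered nodes keeps the block sizes within one item of each other, which is one common way to balance the load.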

  33. What message passing protocols are available on Cplant?
    The supported protocols for message passing on Cplant are Portals and MPI. Portals provide the low-level communication facility upon which all Cplant applications (user and support) are built. Portals are essentially shared data structures between the application process and the kernel which tell the kernel where to deposit incoming messages.

    Portals are available to the application through a user-level Portal library, a system-level Portal library, or through direct manipulation of Portal data structures. No special flags are required to compile a code which uses Portals.

  34. What libraries are available on Cplant?
    The libraries which will be supported on Cplant are
      o libc.a
      o libm.a
      o libdxml.a (DEC Extended Math Library)
      o libmpi.a

  35. Can I use sockets from within an application?
    Sockets are not currently supported from within a compute node application. Currently the only way to get data from the compute partition to the outside world is through the file system.

    In the near future, compute node applications will have the ability to communicate with the service partition (and other compute node applications) through an MPI-2 interface. Once this happens, it will be possible to have a service node application receive messages from the compute node application and funnel these messages off of the machine through TCP/IP. In the distant future, there will be a network partition which will also provide this capability, although its exact functionality has not been determined yet.

  36. What options are available for I/O from parallel applications on Cplant?
    A UNIX file system is currently the only file system available. All file I/O currently goes through yod. A parallel file system is being developed.

  37. What tools for performance analysis are available on Cplant?
    None currently.

  38. What tools for resource management are available on Cplant?
    None currently.

  39. What does a service node look like?
    The hardware and operating system on a service node are identical to those of the compute nodes. The service nodes have an extra Ethernet interface which allows them to be accessible to the LAN. The service nodes also NFS mount user home directories from the central file server.

  41. What does an I/O node look like?
    The I/O nodes are AlphaServer 1200s running Linux.

  42. How much disk is available on Cplant now?
    Available disk is system dependent.

  43. Is HiPPI available for this system?
    HiPPI is not available for Cplant.

  44. Is there any way to get information about what is going on in my program without instrumenting my code by hand?
    Not currently.

  45. What is the clock rate on the PCI bus?
    The clock rate of the PCI bus is 33 MHz.

  46. What is the hardware latency for sending messages? What is the software latency?
    Check the Cplant performance page.

  47. How large is the message header for interprocessor messages?
    The message header requires 64 bytes for all messages.
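
    A fixed 64-byte header means small messages are mostly header. The sketch below computes the fraction of transmitted bytes that are user data. It counts header bytes only and ignores every other per-message cost. The function name is hypothetical.

```python
HEADER_BYTES = 64   # per-message header size stated above

def payload_fraction(payload_bytes, header_bytes=HEADER_BYTES):
    """Fraction of the bytes sent (header + payload) that are user data."""
    if payload_bytes < 0:
        raise ValueError("payload size cannot be negative")
    return payload_bytes / (payload_bytes + header_bytes)

if __name__ == "__main__":
    for size in (4, 64, 1024, 65536):
        print(f"{size:6d} bytes -> {payload_fraction(size):.3f}")
```

    A 64-byte payload is exactly half header, while a 1024-byte payload is about 94% data.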

  48. What binary format is used for data storage in files?
    The DEC Alpha uses the little endian format.
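
    Byte order matters when a binary file written on Cplant is read on a big-endian machine. The sketch below shows little-endian encoding of integers with the byte order named explicitly, so the result does not depend on the host. The 32-bit word size and the function names are assumptions chosen for illustration.

```python
import sys

def pack_le32(values):
    """Encode integers as consecutive 32-bit little-endian words."""
    return b"".join(v.to_bytes(4, "little", signed=True) for v in values)

def unpack_le32(data):
    """Decode consecutive 32-bit little-endian words back into integers."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    return [int.from_bytes(data[i:i + 4], "little", signed=True)
            for i in range(0, len(data), 4)]

if __name__ == "__main__":
    raw = pack_le32([1, 258, -1])
    print(raw.hex())                  # 0100000002010000ffffffff
    print(unpack_le32(raw))           # [1, 258, -1]
    print("host byte order:", sys.byteorder)
```

    The least significant byte comes first: 258 (0x0102) is stored as the bytes 02 01 00 00.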

  49. Can message passing be used to communicate with processors external to Cplant?
    MPI is supported as a message-passing library on the compute partition of Cplant. Off-machine connections for applications running on the compute partition are not currently supported, but are planned.

  50. What can I do to improve message passing performance?
    Make sure the buffer you are sending from or receiving into is 4- or 8-byte aligned, i.e., at least the two least significant bits (LSBs) of the buffer address are 0.

    Make sure your message length is a multiple of 4.

    Send large messages to amortize message startup overhead.
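
    The first two suggestions above can be checked mechanically. The following sketch tests an address for alignment and rounds a message length up to a multiple of 4. The function names are hypothetical, and a real application would apply the same arithmetic in C or Fortran to its actual buffer addresses.

```python
def is_aligned(address, boundary=4):
    """True if `address` is a multiple of `boundary` (a power of two)."""
    return (address & (boundary - 1)) == 0

def padded_length(n_bytes, multiple=4):
    """Smallest multiple of `multiple` that is >= n_bytes."""
    return (n_bytes + multiple - 1) // multiple * multiple

if __name__ == "__main__":
    print(is_aligned(0x1000), is_aligned(0x1002))   # True False
    print(is_aligned(0x1004, 8))                    # False: only 4-byte aligned
    print(padded_length(5), padded_length(8))       # 8 8
```

    Padding a message this way sends up to three extra bytes, so the receiver needs some way of knowing the true length.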

  51. Are there any known problems with the message passing?
    The low-level Myrinet driver cannot currently send 1 and 2 byte messages that do not start or end on a 4 byte boundary.

    The Myrinet hardware has a built-in 4-second timer that causes a network reset when a message transfer takes longer than that. While 4 seconds is enough time to transfer all the physical memory of one of our nodes, the timer can cause problems under network contention. For example, when all the nodes in an application send large messages to a single node at the same time, the last few messages to arrive at that node may have been in the network longer than the 4-second maximum. We are working on solutions to recover more gracefully from such a reset.
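
    A back-of-the-envelope estimate shows how contention can run into the timer. The sketch below uses a deliberately crude model in which all senders share the receiving node's link and nothing else slows them down. The link rate of 125e6 bytes per second (a nominal 1 Gbit/s) and the message sizes are assumptions for illustration, not measured Cplant figures.

```python
TIMER_SECONDS = 4.0   # built-in Myrinet timer described above

def drain_time(n_senders, message_bytes, link_bytes_per_second):
    """Seconds for one receiver to absorb one message from every sender,
    assuming the messages simply share the receiver's link."""
    return n_senders * message_bytes / link_bytes_per_second

def may_trip_timer(n_senders, message_bytes, link_bytes_per_second):
    """True if the last message could be in flight longer than the timer."""
    return drain_time(n_senders, message_bytes, link_bytes_per_second) > TIMER_SECONDS

if __name__ == "__main__":
    link = 125e6    # ASSUMED: 1 Gbit/s expressed in bytes per second
    print(drain_time(100, 10 * 2**20, link))        # 100 senders, 10 MB each
    print(may_trip_timer(100, 10 * 2**20, link))
    print(may_trip_timer(4, 2**20, link))           # 4 senders, 1 MB each
```

    Under these assumptions, 100 nodes each sending 10 MB to one node need more than 8 seconds to drain, while four 1 MB messages finish well within the limit.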

  52. Where did Cplant get its name?
    The Computational Plant derived its name from two of the main concepts behind its inception. First, Cplant is a plant in the sense of a power plant: it provides compute cycles in much the same way that a power plant provides electricity. Second, Cplant is a living entity that will grow and be pruned on a three-year cycle. Each year new hardware will be added while older hardware will be removed.

(some questions and answers lifted from the TFLOPS FAQ)