PARAGON POLICIES


Introduction

The mission of DOE's Massively Parallel Computing Research Laboratory at Sandia National Laboratories (MPCRL) is to provide leadership in high-performance computing on advanced-architecture, large-scale parallel supercomputers for DOE, DOD, and U.S. industry by focusing on solutions of complex, large-scale, interdisciplinary applications of national importance, by supporting a staff of world-class researchers, and by forming strategic partnerships.

In partial fulfillment of its mission, the MPCRL and Sandia National Laboratories operate an 1840-node Intel Paragon XP/S L-140 parallel supercomputer. The MPCRL and Sandia provide access to this computer for projects commensurate with their missions and for the benefit of their partners in the National Consortium for High-Performance Computing (NCHPC).

This document presents the policies governing access to and usage of Sandia's large Paragon. Sandia's intent is to allow as many users as possible access to the Paragon for problems which are commensurate with Sandia's and NCHPC's missions and strategic interests and which cannot be run elsewhere.

To fulfill this intent, these policies are designed to implement five governing principles:

The remaining sections of this document present policies governing dedicated time allocation and Paragon usage.

Policies governing the usage of the Paragon during non-dedicated time are presented in a separate policy document.


Policies Governing Dedicated Time Allocation

In order to fulfill their missions and their obligations to the NCHPC, the MPCRL and Sandia allocate dedicated time on the 1840-node Paragon to reflect financial contributions to the purchase and maintenance of the computer, and only for projects meeting the criteria for technical and programmatic importance. These criteria are presented in a separate policy document.

These policies govern the allocation of dedicated time. As system administration software and procedures mature, the fraction of compute time which is dedicated time will increase.

Definitions

Available compute time on the Paragon is divided into dedicated time and non-dedicated time. Owing to preventive maintenance and unexpected downtime, the available time in a week will be less than the total compute time. Dedicated time can be reserved by users working on approved projects for their exclusive use. Non-dedicated time, sometimes referred to as "prime time", is available to all who have accounts.
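The distinction between the two kinds of time can be sketched in a few lines of Python. This is an illustration only, not part of the policy: it assumes the prime-time window stated in the policies of this section (8:00 a.m. to 5:00 p.m. Albuquerque time on normal Sandia work days) and, as a simplification, treats Monday through Friday as work days without modeling Sandia holidays. The names `is_work_day` and `classify_time` are hypothetical.

```python
from datetime import datetime

PRIME_START_HOUR = 8    # 8:00 a.m. Albuquerque time
PRIME_END_HOUR = 17     # 5:00 p.m. Albuquerque time


def is_work_day(when):
    """Assumption: Monday-Friday are work days; Sandia holidays are not modeled."""
    return when.weekday() < 5


def classify_time(when):
    """Return 'prime' or 'dedicated' for a local Albuquerque timestamp."""
    in_prime_window = PRIME_START_HOUR <= when.hour < PRIME_END_HOUR
    if is_work_day(when) and in_prime_window:
        return "prime"
    return "dedicated"


if __name__ == "__main__":
    print(classify_time(datetime(2024, 1, 8, 10, 0)))   # a Monday morning
    print(classify_time(datetime(2024, 1, 6, 10, 0)))   # a Saturday morning
```

A real scheduler would also need the Sandia work-day calendar and the two short-notice one-hour slots, which this sketch leaves out.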

Policies

  1. Available Paragon dedicated time is allocated monthly among Sandia, its NCHPC partners, and Intel SSD (as per the contract with Intel).
  2. Sandia dedicated time will be allocated to significant, multi-staff projects based on proposals. New proposals are solicited annually with grants awarded for the following year. Sandia proposals are reviewed by a team of Sandia scientists and engineers for their technical and programmatic impact to Sandia, its partners, and DOE. The review team will strive to balance competing groups' requests and accommodate approximately twelve significant projects each year.
  3. NCHPC dedicated time will be allocated to significant projects based on proposals. New proposals are solicited every three months; grants are awarded for periods of up to one year. NCHPC proposals are reviewed by a team of scientists and engineers from DeSRA institutions (at least Sandia, ARL, PL, WL, and NRAD) for their technical and programmatic impact on NCHPC, the HPCC initiative, and DoD. Successful proposals should include collaboration between a DoD lab and at least one non-DoD NCHPC member institution. The review team will strive to balance competing groups' requests and accommodate approximately six significant projects each year.
  4. Wright Laboratory and Phillips Laboratory will each allocate their 1.5% using the NCHPC guidelines above.
  5. Intel must have specific approval from a designated Sandia representative for allocation of their 10%, which re-initializes weekly (as per the Intel/Sandia purchase contract).
  6. Time on the Paragon is divided into dedicated time and non-dedicated or prime time. Prime time is from 8:00 a.m. to 5:00 p.m. Albuquerque time on normal Sandia work days; all other time is dedicated time. In addition, there are two one-hour time slots available during prime time which can be reserved on short notice for benchmarking, demonstrations, debugging, etc. These slots are nominally 8:00 a.m. to 9:00 a.m., and 12:00 p.m. to 1:00 p.m., both Albuquerque time.

The allocation of dedicated time for Sandia and NCHPC will be:

  50% for work covered by approved research proposals
  25% for proposals received in the middle of the normal review cycle
  25% for a reserve (for exceptional circumstances)

Accounts granted outside the normal proposal process will be given a small allocation of dedicated time for a limited time.

  7. All allocations of dedicated time are for maximum node hours per month. Node hours are not guaranteed and may be unavailable due to scheduled and unscheduled system downtimes (e.g., for maintenance, diagnosis, repair, or upgrade).
  8. Who gets an account on the Paragon will be limited by the following:


Policies Governing Paragon Usage

Sandia's intent is to allow as many users as possible access to the Paragon for problems which are commensurate with Sandia's and NCHPC's missions and strategic interests and which cannot be run elsewhere.

The policies presented in this section govern the usage of the Paragon in terms of the sizes of jobs which are run (measured by the number of nodes used) and their duration. The specific rules governing usage during non-dedicated time are presented in the "Good Citizen Rules". These rules are intended to provide maximum flexibility and ease of use while maintaining accountability.

Policies

  1. The Paragon should be used primarily for large jobs, those which require all or nearly all the machine (1000 nodes or more). Initial development of codes on small numbers of nodes should be done on other machines. However, some software errors can only be discovered on large numbers of nodes, so provision has been made for debugging codes on large numbers of nodes. The running of large jobs on the Paragon is encouraged; the running of small jobs is discouraged.
  2. All users, regardless of affiliation, shall abide by the "Good Citizen" usage rules for the Paragon. These rules specify the method for reserving dedicated time on the Paragon and define appropriate usage during prime time.
  3. Currently, dedicated time will be scheduled in large blocks of wall-clock time to individual users. These blocks are:
    1. from 5 p.m. Albuquerque time on a normal Sandia work day until 8 a.m. the following morning.
    2. from 8 a.m. Albuquerque time on a Sandia non-work day until 8 a.m. the following morning.

In the near future, an automated queuing system is expected to be established to control dedicated time, which would then allow scheduling blocks of differing sizes.

Granting of dedicated time will be based on node-hour entitlement and will give priority to jobs requiring more nodes.

  4. Total node-hour availability per week is the hours in a week, minus the number of hours unavailable due to preventive maintenance or system outage, times the total available nodes. Node hours per week available for allocation (dedicated time) are the node hours which are available that week during the non-prime-time hours. During an ordinary week with 1840 nodes these figures are:

    node hours per week  = ( (7x24) - unavailable ) x 1840
    allocatable per week = ( (5x15 + 2x24) - (non-prime unavailable) ) x 1840
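
As a worked illustration of the two figures above, the following Python sketch evaluates them for an ordinary week. It is not part of the policy. The function and parameter names are hypothetical, and the 15 non-prime hours per work day follow from the 8:00 a.m. to 5:00 p.m. prime-time window (24 - 9 = 15).

```python
NODES = 1840                    # nodes in the machine, from the policy text
PRIME_HOURS_PER_WORK_DAY = 9    # 8:00 a.m. to 5:00 p.m.


def node_hours_per_week(unavailable_hours=0.0, nodes=NODES):
    """Total node hours in a week, less maintenance and outage hours."""
    return (7 * 24 - unavailable_hours) * nodes


def allocatable_node_hours_per_week(non_prime_unavailable_hours=0.0,
                                    nodes=NODES, work_days=5):
    """Node hours available for dedicated-time allocation in a week."""
    non_work_days = 7 - work_days
    # Non-prime hours: 15 per work day plus all 24 hours of each non-work day.
    non_prime_hours = (work_days * (24 - PRIME_HOURS_PER_WORK_DAY)
                       + non_work_days * 24)
    return (non_prime_hours - non_prime_unavailable_hours) * nodes


if __name__ == "__main__":
    print(node_hours_per_week())                   # no downtime: 168 x 1840
    print(allocatable_node_hours_per_week())       # no downtime: 123 x 1840
    print(allocatable_node_hours_per_week(4.0))    # 4 h of non-prime downtime
```

With no downtime, an ordinary week yields 309,120 node hours in total, of which 226,320 fall outside prime time and can be allocated as dedicated time. A week containing a Sandia holiday could be modeled by passing a smaller `work_days` value.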