PARAGON POLICIES
Introduction
The mission of DOE's Massively Parallel Computing Research Laboratory
at Sandia National Laboratories (MPCRL) is to provide leadership in
high-performance computing on advanced-architecture, large-scale
parallel supercomputers for DOE, DOD, and U.S. industry by focusing
on solutions of complex, large-scale, interdisciplinary applications
of national importance, by supporting a staff of world-class
researchers, and by forming strategic partnerships.
In partial fulfillment of its mission, the MPCRL and Sandia National
Laboratories operate an 1840-node Intel Paragon XP/S L-140
parallel supercomputer. The MPCRL and Sandia provide access to this
computer for projects commensurate with their missions and for the
benefit of their partners in the National Consortium for
High-Performance Computing (NCHPC).
This document presents the policies governing access to and usage of
Sandia's large Paragon. Sandia's intent is to allow as many users as
possible access to the Paragon for problems which are commensurate
with Sandia's and NCHPC's missions and strategic interests and which
cannot be run elsewhere.
To fulfill this intent, these policies are designed to implement five
governing principles:
- The Paragon should be used primarily for large, important problems
which have significant technical and programmatic impact.
- Access to the Paragon should be as open as possible, based on technical
merit and programmatic importance consistent with the missions of
Sandia and the NCHPC.
- Allocation of time on the Paragon should reflect the contributions of
those who helped purchase or who help maintain the machine.
- Administration of the Paragon (monitoring and accounting for usage,
etc.) should be as easy and automated as possible.
- Sandia and its NCHPC partners should get an outstanding return
from granting users access to the Paragon.
The remaining sections of this document present policies governing
- Allocation of dedicated time, and
- Usage of the Paragon during non-dedicated ("prime") time.
The terms "dedicated time" and "non-dedicated time" are defined below.
Policies governing the usage of the Paragon during non-dedicated time
are presented in a separate policy document.
Policies Governing Dedicated Time Allocation
In order to fulfill their missions and their obligations to the NCHPC,
the MPCRL and Sandia allocate dedicated time on the 1840-node Paragon
to reflect financial contributions to the purchase and maintenance of
the computer, and only for projects meeting the criteria for technical
and programmatic importance. These criteria are presented in a
separate policy document.
These policies govern the allocation of dedicated time. As system
administration software and procedures mature, the fraction of compute
time which is dedicated time will increase.
Definitions
Available compute time on the Paragon is divided into dedicated time
and non-dedicated time. Owing to preventive maintenance and
unexpected downtime, the available time in a week will be less than
the total compute time. Dedicated time can be reserved by users
working on approved projects for their exclusive use. Non-dedicated
time, sometimes referred to as "prime time", is available to those
who have accounts.
Policies
1. Available Paragon dedicated time is allocated monthly among Sandia, its
NCHPC partners, and Intel SSD (as per the contract with Intel).
2. Sandia dedicated time will be allocated to significant, multi-staff
projects based on proposals. New proposals are solicited annually, with
grants awarded for the following year. Sandia proposals are reviewed by a
team of Sandia scientists and engineers for their technical and programmatic
impact on Sandia, its partners, and DOE. The review team will strive to
balance competing groups' requests and accommodate approximately twelve
significant projects each year.
3. NCHPC dedicated time will be allocated to significant projects
based on proposals. New proposals are solicited every three months; grants
are awarded for periods of up to one year. NCHPC proposals are
reviewed by a team of scientists and engineers from DeSRA institutions
(at least Sandia, ARL, PL, WL, and NRAD) for their technical and
programmatic impact on NCHPC, the HPCC initiative, and DoD. Successful
proposals should include collaboration between a DoD lab and at least
one non-DoD NCHPC member institution. The review team will strive to
balance competing groups' requests and accommodate approximately six
significant projects each year.
4. Wright Laboratory and Phillips Laboratory will each allocate their
1.5% using the NCHPC guidelines above.
5. Intel must have specific approval from a designated Sandia
representative for allocation of its 10%, which re-initializes
weekly (as per the Intel/Sandia purchase contract).
6. Time on the Paragon is divided into dedicated time and
non-dedicated or prime time. Prime time is from 8:00 a.m. to 5:00
p.m. Albuquerque time on normal Sandia work days; all other time is
dedicated time. In addition, there are two one-hour time slots
available during prime time which can be reserved on short notice for
benchmarking, demonstrations, debugging, etc. These slots are
nominally 8:00 a.m. to 9:00 a.m., and 12:00 p.m. to 1:00 p.m., both
Albuquerque time.
The allocation of dedicated time for Sandia and NCHPC will be:
- 50% for work covered by approved research proposals,
- 25% for proposals received in the middle of the normal review cycle, and
- 25% held in reserve (for exceptional circumstances).
Accounts granted outside the normal proposal process will be given a
small allocation of dedicated time for a limited time.
7. All allocations of dedicated time are for maximum node hours per
month. Node hours are not guaranteed and may be unavailable due to
scheduled and unscheduled system downtimes (e.g., for maintenance,
diagnosis, repair, or upgrade).
8. Eligibility for an account on the Paragon is limited as follows:
- Proposals are required for all accounts, and accounts are granted only
after appropriate review by the technical review committee.
- Any Sandian who submits a reasonable research proposal can get an
account on the Paragon and will compete with other users during
prime time.
- Other users can get accounts by submitting a research proposal which
includes a sponsor at Sandia, within NCHPC, or at Phillips or Wright
Laboratories (the latter two because they have an extra allocation
beyond their participation in NCHPC) and will compete with other users
during prime time.
9. It is intended that, as the appropriate software tools become
available, prime time will be steadily decreased and dedicated time
will be increased, to increase the amount of time available for
reservations.
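As an illustration of the prime-time definition and the 50/25/25 split
stated in the policies above, the following Python sketch classifies a
moment as prime or dedicated time and divides a pool of node hours. It is
not part of Sandia's tooling. The function names, the `holidays`
parameter, and the reading of a "normal Sandia work day" as Monday
through Friday excluding listed holidays are assumptions made for this
example, and the two reservable one-hour slots within prime time are
ignored.

```python
from datetime import date, datetime, time

PRIME_START = time(8, 0)   # 8:00 a.m. Albuquerque time (from the policy)
PRIME_END = time(17, 0)    # 5:00 p.m. Albuquerque time (from the policy)


def is_prime_time(when, holidays=frozenset()):
    """Return True if `when` falls in prime (non-dedicated) time.

    `when` is a naive datetime taken to be Albuquerque local time.
    Assumption: a work day is Monday-Friday and not listed in `holidays`.
    """
    is_work_day = when.weekday() < 5 and when.date() not in holidays
    # The end of prime time is treated as exclusive: 5:00 p.m. is dedicated.
    return is_work_day and PRIME_START <= when.time() < PRIME_END


def split_pool(node_hours):
    """Split a Sandia/NCHPC dedicated-time pool into 50% / 25% / 25%."""
    return {
        "approved_proposals": 0.50 * node_hours,
        "mid_cycle_proposals": 0.25 * node_hours,
        "reserve": 0.25 * node_hours,
    }


if __name__ == "__main__":
    # 1 March 1995 was a Wednesday; 4 March 1995 was a Saturday.
    print(is_prime_time(datetime(1995, 3, 1, 10, 0)))   # weekday morning
    print(is_prime_time(datetime(1995, 3, 4, 10, 0)))   # weekend
    print(split_pool(1000))
```

Any time for which `is_prime_time` returns False would, under this
reading, count as dedicated time.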
Policies Governing Paragon Usage
Sandia's intent is to allow as many users as possible access to the
Paragon for problems which are commensurate with Sandia's and NCHPC's
missions and strategic interests and which cannot be run elsewhere.
The policies presented in this section govern the usage of the Paragon
in terms of the sizes of jobs which are run (measured by the number of
nodes used) and their duration. The specific rules governing usage
during non-dedicated time are presented in the "Good Citizen Rules".
These rules are intended to provide maximum flexibility and ease of
use while maintaining accountability.
Policies
1. The Paragon should be used primarily for large jobs, those which
require all or nearly all the machine (1000 nodes or more). Initial
development of codes on small numbers of nodes should be done on other
machines. However, some software errors can only be discovered on
large numbers of nodes, so provision has been made for debugging codes
on large numbers of nodes. The running of large jobs on the Paragon
is encouraged; the running of small jobs is discouraged.
2. All users, regardless of affiliation, shall abide by the "Good
Citizen" usage rules for the Paragon. These rules specify the method
for reserving dedicated time on the Paragon and define appropriate
usage during prime time.
3. Currently, dedicated time is scheduled in large blocks of
wall-clock time to individual users. These blocks are:
- from 5 p.m. Albuquerque time on a normal Sandia work day until
8 a.m. the following morning.
- from 8 a.m. Albuquerque time on a Sandia non-work day until 8 a.m.
the following morning.
In the near future, an automated queuing system is expected to be
established to control dedicated time, which would then allow
scheduling blocks of differing sizes.
Granting of dedicated time will be based on node-hour entitlement and
will give priority to jobs requiring more nodes.
4. Total node-hour availability per week is the hours in a week, minus
the number of hours unavailable due to preventive maintenance or
system outage, times the total available nodes. Node hours per week
available for allocation (dedicated time) are the node hours which are
available that week during the non-prime-time hours. During an
ordinary week with 1840 nodes these figures are:
node hours per week = ( (7x24) - unavailable ) x 1840
allocatable per week = ( (5x15 + 2x24) - (non-prime unavailable) ) x 1840
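The two formulas above can be checked with a short calculation. The
Python sketch below is illustrative only: the function and parameter
names are assumptions, and it takes the "ordinary week" of the formulas
at face value (five work days with 15 non-prime hours each, two non-work
days with 24 hours each, 1840 nodes).

```python
TOTAL_NODES = 1840                   # nodes in the Paragon (from the policy)
HOURS_PER_WEEK = 7 * 24              # 168 wall-clock hours
NON_PRIME_HOURS = 5 * 15 + 2 * 24    # 123 hours outside prime time


def node_hours_per_week(unavailable_hours=0, nodes=TOTAL_NODES):
    """Total node hours in a week after subtracting all downtime."""
    return (HOURS_PER_WEEK - unavailable_hours) * nodes


def allocatable_per_week(non_prime_unavailable_hours=0, nodes=TOTAL_NODES):
    """Node hours available for dedicated-time allocation in a week."""
    return (NON_PRIME_HOURS - non_prime_unavailable_hours) * nodes


if __name__ == "__main__":
    print(node_hours_per_week())     # 309120 with no downtime
    print(allocatable_per_week())    # 226320 with no downtime
    print(allocatable_per_week(4))   # 218960 with 4 non-prime hours lost
```

With no downtime, the upper bounds are 309,120 node hours per week in
total and 226,320 node hours per week available for dedicated-time
allocation.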