2008 Newsnotes
UQ Methods in DAKOTA Employed by NuGET Team to Achieve Critical QMU Milestone
Uncertainty quantification (UQ) algorithms are critical to performing the Quantification of Margins and Uncertainty (QMU). The UQ needs of our users, coupled with the high cost of computational simulations, have led researchers in Department 1411 to develop more robust and efficient UQ algorithms, delivered in the DAKOTA framework.
Second-order probability (nested sampling) is a differentiating capability in DAKOTA. It allows one to propagate both aleatory (inherent variation) and epistemic (lack of knowledge) uncertainty. A common situation is one in which some uncertain inputs can be characterized by probability distributions, but others can only be characterized by intervals (e.g., any value between a lower and an upper bound is possible). In this case, the analysis uses a nested sampling approach, in which the outer loop samples the epistemic variables and the inner loop samples the aleatory variables. The result of a second-order probability analysis is a family, or ensemble, of cumulative distribution functions (CDFs); see Figure 1. Each CDF represents an inner-loop sample conditioned on one possible value of the epistemic variables. The bounds on the entire family at a particular response threshold represent the epistemic uncertainty in where the true CDF value may fall. Second-order probability is being used in many Advanced Simulation and Computing (ASC) milestones, for example, to assess epistemic ranges on margins at particular threshold levels.
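The nested loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DAKOTA's implementation: the `response` function, the interval [0.8, 1.2], the normal distribution, and the sample counts are all hypothetical placeholders. Drawing the epistemic value uniformly is just a convenient way to pick points inside the interval and does not assign it a probability distribution.

```python
import random

def response(x_aleatory, theta_epistemic):
    # Hypothetical response standing in for an expensive simulation.
    return theta_epistemic * x_aleatory

def second_order_probability(threshold, n_outer=20, n_inner=500, seed=0):
    """Return (lower, upper) bounds on P(response <= threshold)."""
    rng = random.Random(seed)
    cdf_values = []
    for _ in range(n_outer):
        # Outer loop: pick one possible value of the epistemic variable,
        # which is only known to lie in an interval (assumed [0.8, 1.2]).
        theta = rng.uniform(0.8, 1.2)
        # Inner loop: sample the aleatory variable from its distribution
        # (assumed normal with mean 10 and standard deviation 1).
        samples = [response(rng.gauss(10.0, 1.0), theta) for _ in range(n_inner)]
        # Empirical CDF of this inner-loop sample, evaluated at the threshold.
        cdf_values.append(sum(s <= threshold for s in samples) / n_inner)
    # The spread over the family of CDFs reflects the epistemic uncertainty.
    return min(cdf_values), max(cdf_values)

lower, upper = second_order_probability(threshold=10.0)
print(f"P(response <= 10.0) lies in [{lower:.3f}, {upper:.3f}]")
```

Each pass through the outer loop yields one member of the CDF family; the returned pair is the interval bound on the CDF value at the chosen threshold.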
The UQ algorithms in DAKOTA have played a crucial role in assessing uncertainties in stockpile materials, components, systems, and environments, and their effect on weapon performance, safety, and reliability. The Neutron Gamma Energy Transport (NuGET) team employed second-order probability methods for the ASC Level II Milestone titled “NuGET QMU Methodology.” The goal of this milestone was to assess the influence of both aleatory and epistemic uncertainties in hostile and fratricide scenario predictions. The second-order probability method played an important role in downselecting experiments and demonstrating compliance with Stockpile-to-Target Sequence (STS) requirements. The ensembles of CDFs enabled the calculation of interval bounds on margins and failure probabilities, and demonstrated how this methodology can identify components that might have problems or that need additional analysis to reduce epistemic uncertainty.
To help educate users about the UQ methods in DAKOTA, classes on DAKOTA 4.1 were held at SNL/NM and SNL/CA in April, training 35 users. Additional classes are planned to meet the demands of a growing user base.
(Contact: James Stewart)
May 2008
Petascale-ready version of CCSM's Atmospheric Model
The multi-lab SciDAC project, “Modeling the Earth System”, sponsored by the DOE's Office of Biological and Environmental Research, is focused on creating a first-generation Earth system model based on the Community Climate System Model (CCSM). The envisioned Earth system model will require petascale computing resources, so the project is also working to ensure the CCSM is ready to fully utilize DOE’s upcoming petascale platforms. The main bottleneck to petascale performance in Earth system models is the scalability of the atmospheric dynamical core. Team members at Sandia, NCAR and ORNL, led by Mark Taylor (1433), have thus been focusing on the integration and evaluation of new, more scalable atmospheric dynamical cores (based on cubed-sphere grids) into the CCSM. They have recently developed a new formulation of the highly scalable spectral element dynamical core that locally conserves both mass and energy and has positivity-preserving advection. They have successfully integrated this dynamical core into the CCSM. This work allows the atmospheric component to use true two-dimensional domain decomposition for the first time, leading to unprecedented scalability demonstrated out to 96,000 processors with an average grid spacing of 25 km. Even better scalability will be possible when computing with a global resolution of 10 km, DOE's long-term goal. The team has completed extensive verification work using standardized atmospheric tests with prescribed surface temperatures and without the CCSM land, ice or ocean models. As part of this work, they have performed detailed mesh convergence studies, including a record-setting simulation using 64,000 processors of BG/L. The team is currently focused on coupling with the other CCSM component models.
Illustration 1: The cubed-sphere grid used on each hybrid-pressure surface by the spectral element atmospheric model component of the CCSM.
(Contact: Mark Taylor)
May 2008
DAKOTA 4.1 Extends Capabilities for Risk-Informed Decision Making
Version 4.1 of the DAKOTA software toolkit was released and deployed this fall. DAKOTA is used broadly within the Tri-Lab Advanced Simulation and Computing (ASC) community for sensitivity, uncertainty, and design studies involving ASC simulations of NW components and systems. Version 4.1 deploys major new capabilities in uncertainty quantification (UQ), optimization, and optimization under uncertainty (OUU), which emphasize emerging needs in verification and validation (V&V) and risk-informed decision making.
In particular,
- New UQ methods include generalized polynomial chaos expansions, efficient global reliability analysis, incremental sampling, and adaptive importance sampling; and extended UQ methods include second-order reliability, variance-based decomposition, and evidence theory. The new methods bridge a critical gap in accuracy and efficiency that has existed with current production methods and emphasize smart, adaptive approaches with verifiable accuracy.
- New optimization methods include multipoint hybrid methods, global surrogate-based optimization, efficient global optimization, and dynamic optimizer plug-ins; and extended optimization capabilities include trust-region surrogate-based optimization, DIRECT, OPT++, and AMPL. These new methods emphasize the efficient identification of globally-optimal designs.
- The optimization and UQ developments enable new OUU methods, including polynomial chaos-based OUU, global reliability-based OUU, epistemic OUU, and model calibration under uncertainty. These new methods enable risk-informed analysis and design.
These new DAKOTA algorithms are being applied to the probabilistic design of microsystems in order to define shapes that are both robust and reliable with respect to manufacturing uncertainties; see Figure 1.
Figure 1. Micrograph of bi-stable MEMS device (left) with optimized force-displacement profile (right). Prescribed reliability level for actuation force was achieved while reducing sensitivity to manufacturing uncertainties by an order of magnitude.
DAKOTA is used with Sandia's high performance simulation codes such as Alegra, Xyce, and SIERRA, and is impacting Sandia mission areas in Defense Programs, Qualification Alternatives to the Sandia Pulsed Reactor (QASPR), Microsystems and Engineering Science Applications (MESA), High Energy Density Physics (HEDP), National Infrastructure Simulation and Analysis Center (NISAC), and others. DAKOTA is open-source and has approximately 4000 registered installations from sites all over the world. DAKOTA is led by Department 1411 and has contributors from across Centers 1400, 1500, 6600, and 8900. See http://www.cs.sandia.gov/DAKOTA/software.html
(Contact: Jim Stewart and Mike Eldred)
April 2008
Bundle-Exchange-Compute (BEC): A New Parallel Programming Environment
BEC (Beta version) was publicly released on April 9, 2008. A BEC tutorial will be taught by Mike Heroux (1414) and Zhaofang Wen (PI, 1423) on April 10th to a Sandia audience. BEC represents a new parallel programming model for high-performance scientific application development. BEC is jointly developed by Sandia and Syracuse University (funded by Sandia/CSRF). Using BEC, Sandia parallel programmers can potentially increase their programming productivity by a factor of 3 or more.
Sandia's large-scale scientific applications need to run on high-end parallel computers, each consisting of thousands of interconnected workstations. A parallel application program needs to make all these workstations work together to solve a single (scientific) problem as fast as possible. To exploit the computing power of a parallel computer, the computational work should be divided as evenly as possible among the workstations. Computation cannot be done without data, however, so it is also important to partition the data among the workstations and to move the data between them ("communication") as needed by the computation at the right moment ("synchronization"). For more than a decade, programmers at Sandia (and everywhere else) have been using a parallel programming environment called MPI. MPI application developers, often domain scientists, are forced to handle the difficult tasks of data partitioning, communication, and synchronization, which are low-level machine details irrelevant to their scientific domain expertise.
BEC frees programmers from the low-level machine details and allows them to focus on their applications and algorithms, which makes writing parallel programs much easier.
Comparisons of the same applications written with BEC and with MPI show that the two versions have similar performance, but the BEC programs are much simpler and easier to write. The table and chart below give an example: a sparse linear solver using the Conjugate Gradient (CG) method with data from a diffusion problem on a 3D chimney domain. Table 1 compares the code size of the BEC and MPI versions, excluding empty lines, comments, debugging lines, and lines beginning with #. Figure 1 compares the parallel execution time of the BEC and MPI programs.
| Task (in CG application) | Lines of code: BEC | Lines of code: MPI |
|---|---|---|
| Computation related | 60 | 87 |
| Communication related | 11 | 277 |
| Whole program | 233 | 733 |

Table 1: Code size comparison: BEC vs. MPI
Figure 1: Comparison of parallel execution time of the BEC and MPI programs
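To make the split between computation and communication in Table 1 more concrete, here is a minimal serial sketch of the CG method in Python. It is not the BEC or MPI program that was measured; the `laplacian_1d` operator is a hypothetical stand-in for the diffusion matrix, and all names are illustrative. The comments mark the two places where a distributed-memory version would typically need to exchange data between processes.

```python
def dot(u, v):
    # In a distributed code this sum needs a global reduction over all processes.
    return sum(ui * vi for ui, vi in zip(u, v))

def laplacian_1d(v):
    # Hypothetical stand-in operator: product with the tridiagonal (-1, 2, -1) matrix.
    # In a distributed code each process would first fetch the boundary
    # entries of v that are owned by neighboring processes.
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite operator `matvec`."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New search direction, conjugate to the previous ones.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

b = [1.0] * 10
x = conjugate_gradient(laplacian_1d, b)
print(x[0])
```

The vector updates are purely local, while the dot products and the matrix-vector product are where a parallel version has to communicate.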
(Contact: Zhaofang Wen)
April 2008
3D Fluids-DFT Calculations of Peptide Assemblies in Bilayers
Calculation of the structure of peptides and their assemblies is challenging when they are found in a small-molecule (water) based electrolyte. The problem becomes much more difficult when the peptides are embedded in lipid bilayer membranes. In this case the dense fluid medium of the bilayer membrane makes statistical sampling of embedded peptide assemblies especially challenging. We have recently succeeded in first-of-a-kind calculations of the 3-dimensional structure of a fluid bilayer in the vicinity of an assembly of anti-microbial peptides (AMPs). In these calculations, we were able to predict the existence of membrane-spanning pores at the intersection of an assembly of 6 AMPs. This is significant because the primary mode of action of many AMPs is to increase the porosity of a cell membrane, resulting in cell death. In order to solve these kinds of problems, we have developed the Tramonto software (see http://software.sandia.gov/tramonto). Tramonto computes density profiles both for fluid systems where surfaces cause the fluid to be inhomogeneous (e.g. fluids in zeolite) and for fluids where intra-molecular interactions result in self-assembled structures (e.g. fluid lipid bilayers found in cell membranes). The underlying theories solved in Tramonto include several types of modern nonlocal density functional theories that retain the length scale of a monomer. Thus a variety of coarse-grained models can be solved with Tramonto, depending on the definition of the monomer. The solution of these complex theories is facilitated by specialized parallel solver methods that allow the code to run very efficiently on massively parallel computers. These calculations demonstrate an unprecedented level of complexity in modeling biological membranes such as cell boundaries, which enables new scientific insight into the details of drug-membrane interactions, with implications for new antibiotics and drug-delivery mechanisms.
While the discussion here focuses on a biological application, the larger story is the successful development of 3D Fluids-Density Functional Theory (DFT) capabilities. While the Quantum-DFT community is quite large and several well-established software packages exist, Tramonto is one of only a few codes capable of 3D Fluids-DFT calculations for predicting fluid structure. It is the only code that combines nonlocal theories that capture the monomer length scale with highly tuned parallel computing algorithms that include engineering analysis algorithms (such as the Trilinos-LOCA stability and bifurcation analysis tools, see http://trilinos.sandia.gov) for materials design. This software opens many opportunities for new kinds of investigations across many disciplines in materials modeling, nanoscience, and biology. The Tramonto team includes Laura Frink (Colder Insights, SNL contractor), Amalie Frischknecht (1814), Mike Heroux (1416), and Andrew Salinger (1414).
Figure 1: 3D density profiles from Tramonto calculation of AMPs in lipid bilayers. In both cases, the red to white contours show the solvent, the yellow to black contours show the lipid tail group, and the blue lines show contours for the head group species. On the left, the lipid bilayer exhibits nanoscale structure due to the presence of the AMPs as indicated by the horizontal yellow to black striping. However, the bilayer remains intact. On the right, a nanoscale pore forms in between the AMPs that form the assembly. Nanoscale solvent structure is observed (see red to white bead pattern that runs through the bilayer). In addition, head group densities are now nonzero at the interface of the lipid tails and the solvent in the nanopore. This toroidal structure arises naturally in the Fluids-DFT calculation.
(Contact: Scott Collis)
April 2008
Material Failure Improvements in ALEGRA
The ALEGRA shock physics code is being used by the Army Research Laboratory (ARL) to simulate important experiments for advanced armor development. These simulations involve impact of a metal rod into targets of ceramic materials and metals, all subject to high strain rate and material failure. In early January, long-standing algorithmic issues with material failure modeling were causing early termination of these simulations at less than 20 microseconds.
To address this situation, Erik Strack, Mike Wong, Ed Love, and Bill Rider (all 1431) collaborated in an intense effort to develop, implement and test improvements to these algorithms. The major improvements were the implementation of a new void insertion model to handle excessive tensile stresses and the extension of the isentropic multimaterial algorithm to accommodate the void inserted during tension relief.
The new void insertion model is intended to replace the existing pressure-dependent fracture model, which used a Newton iteration scheme to converge to a relaxation pressure. Under a number of circumstances observed in these simulations, however, the derivative of the pressure-density function from the equation of state was sufficiently inaccurate that the iteration would fail to converge, resulting in eventual failure of the calculation. The new model provides a less efficient but more robust backup iteration scheme and logic to detect conditions necessary to switch schemes. In addition, error checking, convergence tests and diagnostics were substantially improved.
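The idea of pairing a fast iteration with a slower but more robust backup can be illustrated with a generic root finder. The sketch below is not the ALEGRA algorithm: the text does not name the backup scheme, so bisection is used here as an assumed stand-in, and the function `solve_with_fallback`, its arguments, and the test function are all hypothetical. It takes Newton steps while they stay inside a bracket known to contain the root and switches to bisection whenever the derivative is unusable or the step leaves the bracket.

```python
def solve_with_fallback(f, dfdx, lo, hi, tol=1e-10, max_iter=100):
    """Find x in [lo, hi] with |f(x)| < tol, given that f changes sign on [lo, hi]."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0.0:
        return lo
    if f_hi == 0.0:
        return hi
    if f_lo * f_hi > 0.0:
        raise ValueError("root is not bracketed")
    x = 0.5 * (lo + hi)
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        # Shrink the bracket so that it always contains the root.
        if f_lo * fx < 0.0:
            hi = x
        else:
            lo, f_lo = x, fx
        slope = dfdx(x)
        x_newton = x - fx / slope if slope != 0.0 else None
        # Detect conditions for switching schemes: the Newton step is
        # undefined or would leave the bracket, so fall back to bisection.
        if x_newton is None or not (lo < x_newton < hi):
            x = 0.5 * (lo + hi)
        else:
            x = x_newton
    raise RuntimeError("iteration did not converge")

def f(p):
    return p**3 - 2.0 * p - 5.0

# With an accurate derivative, Newton steps are accepted and converge quickly.
root = solve_with_fallback(f, lambda p: 3.0 * p**2 - 2.0, 2.0, 3.0)
# With a useless derivative (always zero), the backup scheme still converges.
root_backup = solve_with_fallback(f, lambda p: 0.0, 2.0, 3.0)
print(root, root_backup)
```

The backup path costs more iterations, which mirrors the trade-off described above: less efficient, but it cannot be derailed by a bad derivative.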
The void volume evolution is now treated in a manner compatible with both a modern multi-material treatment and the improved fracture algorithm. This resulted in additional enhancement to the robustness of the simulations and provided physically realistic and meaningful results.
These improvements were implemented in ALEGRA and resulted in successful ARL armor simulations running to completion at up to 180 microseconds, representing a major milestone that ARL has been striving to achieve since 2001. The accompanying figure shows ceramic failure patterns in agreement with those observed experimentally.
(Contact: Erik Strack, Mike Wong, Ed Love, and Bill Rider)
April 2008
Red Storm's 284 TeraFlop Upgrade: The Inside Story
On February 5, 2008, a news release was issued by Cray Corporation and publicized on HPCwire about the agreement between Sandia and Cray to upgrade our NNSA/ASC Red Storm system to 284 TeraFlops. This agreement was also described in a February 15, 2008 Sandia Lab News article. Our upgrade is scheduled for the summer of 2008 and will be the second major upgrade to Red Storm. The first upgrade occurred in the fall of 2006 and brought the system from an initial performance level of 41 TeraFlops to the current theoretical peak performance of 124 TeraFlops. A critical concern for all massively parallel supercomputers is scalability. The attention paid to interconnection network performance and scalable system software gives Red Storm very good application scalability; that is, the ability to have application performance scale up to the entire system. The inside story implied in these recent news reports is a different dimension of supercomputer scalability. This second upgrade to Red Storm exploits the ability to grow the system to take advantage of improvements in processor technology. The initial system used 10,368 AMD Opteron processors at 2.0 GHz, and the current system uses 12,960 AMD dual-core Opteron processors at 2.4 GHz. This upgrade will replace about 48% of these dual-core processors with the latest generation of quad-core AMD Opterons at 2.2 GHz, which give four floating point operations per clock versus the two floating point operations per clock of the current dual-core and original single-core Opterons. If the other 52% of the system were also upgraded, the theoretical peak performance of Red Storm would be 456 TeraFlops. Finally, these successive phases of processor upgrades are enabled by Sandia's collaboration with Cray to upgrade the Catamount system software to support dual-core and quad-core Opteron processors.
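The peak performance figures quoted above follow from multiplying processor count, cores per processor, clock rate, and floating point operations per clock. The short check below is a sketch under a few assumptions: "about 48%" is treated as exactly 0.48, the processor count is allowed to be fractional, and the function and variable names are purely illustrative.

```python
def peak_teraflops(processors, cores, ghz, flops_per_clock):
    # processors * cores * GHz * flops/clock is in GigaFlops; convert to TeraFlops.
    return processors * cores * ghz * flops_per_clock / 1000.0

initial = peak_teraflops(10368, 1, 2.0, 2)   # original single-core system
current = peak_teraflops(12960, 2, 2.4, 2)   # after the first (dual-core) upgrade

quad_fraction = 0.48                          # assumed exact value of "about 48%"
upgraded = (peak_teraflops(12960 * quad_fraction, 4, 2.2, 4)
            + peak_teraflops(12960 * (1.0 - quad_fraction), 2, 2.4, 2))

full_quad = peak_teraflops(12960, 4, 2.2, 4)  # if every processor were quad-core

for label, value in [("initial", initial), ("current", current),
                     ("upgraded", upgraded), ("all quad-core", full_quad)]:
    print(f"{label}: {value:.1f} TeraFlops")
```

Rounded to whole TeraFlops, the four results are 41, 124, 284, and 456, matching the figures in the text.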
(Contact: James Ang)
March 2008
Nanoparticle Simulations with LAMMPS
Sandia's molecular dynamics package LAMMPS has been modified to enable more efficient simulations of large-scale nanoparticle models that contain a large variation in particle sizes, e.g. for modeling a nanoparticle suspension with background solvent. This required new algorithms for neighbor finding and inter-processor communication that search only a minimum number of nearby particles, without needless distance computations. For systems with a 20:1 size ratio between nanoparticles and solvent particles, the new version of the code is over 100x faster, in either serial or parallel.
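One way to avoid needless distance computations when particle sizes differ widely is to bin particles on a grid sized for the smallest ones and let each particle search only as far as its own size requires. The Python sketch below illustrates that idea only; it is not the LAMMPS algorithm, and the function `neighbor_pairs`, the `skin` parameter, and the example coordinates are assumptions made for this illustration. Each pair is found from the side of its larger particle, so the many small solvent particles never scan the large region that a nanoparticle needs.

```python
import math
from collections import defaultdict
from itertools import product

def neighbor_pairs(positions, radii, skin=0.1):
    """Return pairs (i, j), i < j, whose surfaces are within `skin` of each other."""
    bin_size = 2.0 * min(radii) + skin      # bins sized for the smallest particles

    def bin_of(p):
        return tuple(math.floor(c / bin_size) for c in p)

    bins = defaultdict(list)
    for i, p in enumerate(positions):
        bins[bin_of(p)].append(i)

    pairs = set()
    for i, p in enumerate(positions):
        # Particle i only looks for partners that are no larger than itself,
        # so its search distance depends on its own radius alone.
        reach = 2.0 * radii[i] + skin
        span = math.ceil(reach / bin_size)
        home = bin_of(p)
        for offset in product(range(-span, span + 1), repeat=len(p)):
            cell = tuple(h + o for h, o in zip(home, offset))
            for j in bins.get(cell, ()):
                # Skip i itself and any larger (or equal, higher-index) partner:
                # that pair is found from the other particle's side.
                if (radii[j], j) >= (radii[i], i):
                    continue
                if math.dist(p, positions[j]) <= radii[i] + radii[j] + skin:
                    pairs.add((min(i, j), max(i, j)))
    return pairs

# One large particle (radius 2.0) and three small ones (radius 0.1), a 20:1 ratio.
positions = [(0.0, 0.0), (2.15, 0.0), (5.0, 5.0), (5.25, 5.0)]
radii = [2.0, 0.1, 0.1, 0.1]
print(sorted(neighbor_pairs(positions, radii)))
```

In this example a small particle checks a 3 x 3 block of bins, while only the single large particle pays for a wide search.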
This is enabling large simulations of suspensions to measure rheological properties such as diffusion coefficients and viscosity. These are of interest in manufacturing processes such as extrusion and coating where suspensions are used to disperse nanoparticles and assemble nanostructured materials. This work has been funded by NINE (National Institute for Nano Engineering) and a funds-in CRADA with 5 companies interested in manufacturing issues for nano and colloidal suspensions.
The LAMMPS parallel molecular dynamics package is an open-source code distributed world-wide. Contributors to LAMMPS within Sandia are in centers 1100, 1400, 1500, and 1800. See lammps.sandia.gov for more details.
A snapshot of nanoparticles in explicit solvent.
(Contacts: Steve Plimpton or Scott Collis)
March 2008
New Solver Research Enables Simulations of Large Electrical Circuits
Recent solver research, focusing on matrix ordering algorithms and block-structured preconditioning, has significantly improved the capability to numerically simulate the response of large-scale electrical circuits for stockpile systems. Xyce, a massively-parallel circuit simulation code, is used to predict the electrical response of large, integrated circuits, particularly for hostile radiation environments. Xyce models individual circuit elements as nonlinear differential equations, which are assembled into a large system of nonlinear differential equations to form the full circuit problem. This set of equations must be implicitly integrated forward in time in order to determine the circuit response. The preconditioner research, led by David Day (1414) and Heidi Thornquist (1437), identified and implemented a permutation operation that re-orders the Jacobian matrix of the governing equations, leading to a block-structured and diagonally-dominant matrix that is amenable to preconditioning and efficient computation of iterative solutions.
Circuit models that include integrated circuit interconnect parasitics can result in poorly conditioned linear systems, which are traditionally challenging for iterative solvers. Typically this issue becomes more significant as integrated circuit feature sizes shrink, because parasitic capacitive and inductive effects are more likely to dominate overall circuit behavior.
For newer integrated circuit technologies, traditional preconditioners often perform poorly or fail, so code developers and analysts have had to rely on direct solvers, which are memory-intensive and do not scale to very large numbers of processors. Reliance on direct solvers has therefore limited the size and complexity of circuit problems that Xyce could solve. The new preconditioner, implemented in the Trilinos framework, has enabled the use of scalable and efficient iterative solvers, thereby allowing for high-fidelity simulations of parasitic effects in integrated circuits.
Figure 1. The reduced block connectivity matrix is shown for a Jacobian matrix from the transient simulation of a circuit with 109,345 unknowns and strong parasitics. Graph algorithms are used to find strongly connected blocks.
Figure 2. The reduced block connectivity graph from Figure 1 is shown here partitioned for four processors. The Isorropia package and Zoltan parallel load balancing utility were used to determine the reordering.
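As a small illustration of how a graph algorithm can expose block structure in a matrix, the sketch below finds the strongly connected blocks of a sparsity pattern. The text does not say which graph algorithm was used, so Tarjan's algorithm is an assumed choice here, and this is not the Xyce or Trilinos implementation; the function name and the 5 x 5 example pattern are hypothetical. Row i is treated as a graph node with an edge to every column j where the matrix has a nonzero.

```python
def strongly_connected_blocks(pattern):
    """pattern[i] is the set of column indices with a nonzero in row i."""
    n = len(pattern)
    index = [None] * n        # discovery order of each node
    low = [0] * n             # smallest discovery index reachable from the node
    on_stack = [False] * n
    stack, blocks = [], []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack[v] = True
        for w in pattern[v]:
            if index[w] is None:
                visit(w)
                low[v] = min(low[v], low[w])
            elif on_stack[w]:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a strongly connected block; pop its members.
            block = []
            while True:
                w = stack.pop()
                on_stack[w] = False
                block.append(w)
                if w == v:
                    break
            blocks.append(sorted(block))

    for v in range(n):
        if index[v] is None:
            visit(v)
    return blocks

# Hypothetical 5 x 5 sparsity pattern: rows 0-2 form a cycle, rows 3-4 form
# another, and row 3 also depends on column 2.
pattern = [{0, 1}, {1, 2}, {0, 2}, {2, 3, 4}, {3, 4}]
blocks = strongly_connected_blocks(pattern)
permutation = [i for block in blocks for i in block]
print(blocks, permutation)
```

Applying `permutation` to both rows and columns places each block on the diagonal, and because Tarjan's algorithm emits blocks in reverse topological order, every remaining nonzero falls in or below its diagonal block. The recursive form is kept for brevity and would need an explicit stack for very large matrices.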
(Contacts: David Day or Heidi Thornquist)
February 2008