2010 Newsnotes
Lowering Energy Use on HPC Systems
Power is now recognized as a critical and potentially limiting factor in architecting current and next-generation High Performance Computing (HPC) systems. Sandia recognized this issue a number of years ago and began efforts to investigate and affect power use on HPC platforms, beginning with Red Storm. Our latest efforts have been directed at the energy efficiency of HPC applications at large scale. Commodity hardware and software employ a number of power-saving features directed toward extending battery life; this approach could, however, be highly disruptive if used at scale on HPC platforms. Using the advanced monitoring techniques we developed, combined with modifications to our Catamount Light Weight Kernel, we have successfully measured the energy used at scale by the AMG application and a number of other important tri-lab applications while varying the clock frequency of the CPU. By deterministically lowering the CPU frequency, we have taken advantage of the secondary effect of lowering the input voltage to the CPU. The dynamic power dissipated by a chip is proportional to the voltage squared, a relationship with a much higher impact than the linear relationship resulting from simply lowering the CPU frequency.

The result is shown in the following graph, where AMG was first executed at the default CPU frequency (red) and again on the same nodes at a lower CPU frequency (blue). AMG, like many other tri-lab applications, is memory bound. Lowering the CPU frequency extends run time by 3% but achieves an energy savings of more than 30%, an order of magnitude difference. Further analysis is being conducted on other important tri-lab applications.
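The arithmetic behind this tradeoff can be sketched in a few lines. The Python example below is only a back-of-the-envelope illustration with assumed (hypothetical) voltage, frequency, and runtime numbers, not the measured Red Storm/Catamount data; it shows how the quadratic voltage dependence of dynamic power turns a small slowdown into a large energy savings.

```python
# Back-of-the-envelope sketch of frequency/voltage scaling.  All numbers are
# illustrative assumptions, not measurements from the Red Storm / Catamount runs.

def dynamic_power(capacitance, voltage, frequency):
    """Dynamic CPU power: P ~ C * V^2 * f."""
    return capacitance * voltage**2 * frequency

C = 1.0                                                      # effective switched capacitance (arbitrary units)
p_default = dynamic_power(C, voltage=1.30, frequency=2.4e9)  # hypothetical default operating point
p_lowered = dynamic_power(C, voltage=1.15, frequency=2.0e9)  # hypothetical lower frequency and voltage

t_default = 1.00   # normalized run time at the default frequency
t_lowered = 1.03   # memory-bound application: roughly a 3% slowdown

e_default = p_default * t_default
e_lowered = p_lowered * t_lowered
print(f"power reduction:  {1 - p_lowered / p_default:.1%}")
print(f"energy reduction: {1 - e_lowered / e_default:.1%}")
```

With these illustrative numbers the power drops by roughly a third while the run time grows by only 3%, so the net energy savings remains above 30%, consistent with the behavior described above.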

(Contacts: James Laros and Kevin Pedretti)
November 2010
2010-8685P
Optimal Cooperative Control Algorithms for Camera Based Observational Systems
Over the last several years, there has been considerable growth in camera-based observation systems for a variety of safety, scientific, and recreational applications. Public safety officials use cameras to monitor traffic flow on the freeway. Conservationists place cameras in the wilderness to track migrations of birds and other animals. Broadcasters use cameras in stadiums to film sports such as football, baseball, and soccer. In each of these situations, the user frequently wants to increase the number of observed objects, but solving this problem is not as simple as adding more cameras: quite often, economic or physical restrictions prevent additional cameras from being added to the system. As a result, these systems require methods that coordinate the tracking of objects between multiple cameras in an optimal way. To solve this problem, researchers at the Computer Science Research Institute (CSRI), in cooperation with Decision Support Systems, have developed a collection of optimal cooperative control algorithms for camera-based observation systems. In these algorithms, the cameras, mounted on pan/tilt units, coordinate their motion and fields of view so that they can observe as many objects in the system as possible. To accomplish this, the researchers formulated the problem as a mixed integer linear program (MILP) in which the number of observable objects is maximized while satisfying the kinematic constraints of the cameras. To increase tracking accuracy, the MILP is integrated with predictive methods such as Kalman filters, so that the coordinated cameras can anticipate future motion. In short, this new formulation provides a unified, integrated approach to solving a difficult coordinated tracking and control problem.
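To make the coverage-maximization idea concrete, here is a minimal sketch of an MILP of this general flavor, written in Python with the open-source PuLP package. The camera positions, discretized pan angles, field-of-view geometry, and object locations are all hypothetical stand-ins, and the model omits the kinematic and predictive (Kalman filter) components of the actual Sandia formulation.

```python
# Minimal coverage-maximization MILP sketch using the open-source PuLP package.
# Camera positions, pan angles, field of view, and object locations are all
# hypothetical; kinematic and predictive (Kalman filter) terms are omitted.
import math
import pulp

cameras = {"cam0": (0.0, 0.0), "cam1": (10.0, 0.0)}                 # camera (x, y) positions
objects = {f"obj{k}": (2.0 * k, 5.0 + 0.3 * k) for k in range(6)}   # object (x, y) positions
pan_angles = [math.radians(a) for a in range(0, 360, 30)]           # discretized pan choices
half_fov = math.radians(25)                                         # half of the field of view

def visible(cam_xy, angle, obj_xy):
    """True if the object lies inside the camera's field of view at this pan angle."""
    bearing = math.atan2(obj_xy[1] - cam_xy[1], obj_xy[0] - cam_xy[0])
    diff = (bearing - angle + math.pi) % (2 * math.pi) - math.pi
    return abs(diff) <= half_fov

prob = pulp.LpProblem("camera_coverage", pulp.LpMaximize)
# z[c, a] = 1 if camera c points at pan angle a; w[o] = 1 if object o is observed
z = pulp.LpVariable.dicts("z", [(c, a) for c in cameras for a in pan_angles], cat="Binary")
w = pulp.LpVariable.dicts("w", list(objects), cat="Binary")

prob += pulp.lpSum(w[o] for o in objects)                           # maximize observed objects
for c in cameras:
    prob += pulp.lpSum(z[(c, a)] for a in pan_angles) == 1          # each camera picks one pan angle
for o, oxy in objects.items():
    covering = [z[(c, a)] for c, cxy in cameras.items()
                for a in pan_angles if visible(cxy, a, oxy)]
    prob += w[o] <= pulp.lpSum(covering)                            # observed only if some camera covers it

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("objects observed:", int(pulp.value(prob.objective)))
```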

Figure 1: An example of two cameras and their view angles, shown in translucent red, observing fifteen objects traveling along ballistic trajectories. This mimics a situation where the cameras would observe multiple balls flying through the air.
(Contact: Joseph Young)
November 2010
2010-8193P
Workshop: Control and optimization of open quantum systems
A Sandia-organized workshop entitled “Control and optimization of open quantum systems for information processing” was held at the American Institute of Mathematics (AIM) in Palo Alto, California from June 21st to June 25th, 2010. Matthew Grace (8961) originated the idea after participating in an earlier AIM workshop. The co-organizers, Constantin Brif (8961), Wayne Witzel (1435), and Andrew Landahl (1412), brought their expertise in optimal control, dynamical decoupling, and quantum error correction, respectively. AIM solicits proposals for workshops and fully funds the selected participants.
The goal of the workshop was to bring together a focused cadre of world experts in the areas represented by the co-organizers to tackle the challenge of quantum control, with a special emphasis on quantum information processing (QIP) applications. The ability to control technology at the quantum-mechanical level opens entirely new areas of science and engineering, with QIP applications ranging from quantum teleportation to quantum cryptography to quantum computation. Because real quantum systems are “open” to their environment, controlling them requires more than just experimental finesse—it requires carefully designed protocols to minimize the impact of system-environment interactions. This is precisely what the approaches of optimal control, dynamical decoupling, and quantum error correction aim to do, but in very different ways.
To appreciate how disparate these approaches to quantum control are, consider how widely their mathematical underpinnings vary. Optimal control is based on variational calculus and uses gradient methods or genetic algorithms to search for time-dependent classical controls that, subject to whatever constraints exist, protect quantum information against environmental decoherence as well as possible. Dynamical decoupling, on the other hand, is based on Lie algebra theory and reduces the effective coupling of a quantum system to its environment to arbitrary orders in perturbation theory. Finally, quantum error correction is based on linear-algebraic coding theory and distributes quantum information across multiple subsystems so that local environmental disturbances can be detected and corrected. While these approaches are quite different, their goal is the same: to control quantum systems, preferably in a way that is fault-tolerant, namely in a way that is robust to imperfect knowledge of the system and imperfect mastery of the controls.
The informal “structure-the-workshop-as-you-go” format allowed for adequate time to educate each community’s experts on the other’s approaches, with ample time for interactive questioning. It also generated lively discussions about what the most important open problems are and where some of the greatest areas for cross-fertilization of ideas may lie. While some progress was made on these questions during the workshop, the most lasting impact of the workshop will likely be the new collaborations that were begun across communities that historically have not interacted much. As quantum technology becomes more and more pervasive, the demand to control it for information-processing tasks grows—hopefully some practical new solutions will be able to trace their origins to this workshop.

(Contact: Wayne Witzel)
September 2010
2010-5928P
Cognitive Foundry Released Open Source
Sandia’s Cognitive System Research and Applications organizations (1432, 1434) are pleased to announce the first open-source release of the Cognitive Foundry. The Cognitive Foundry is a modular software library, written in Java, for the research and development of cognitive systems. It contains many reusable components for machine learning, statistical analysis, and cognitive modeling that are designed to be easy to plug into existing applications or to use in exploring novel research directions. The Foundry has been used in a wide range of research and development projects across Sandia, such as real-time automobile driver modeling, large-scale simulation of social networks, automated student assessment, modeling of human analogy making, trainable automated forces for virtual environments, network defense, and search personalization.
(Contact: Justin Basilico)
August 2010
Doc # 5285561
Tensor Toolbox and Poblano Toolbox Lead To Robust, Scalable Multiway Data Analysis
Version 2.4 of the Tensor Toolbox (http://csmr.ca.sandia.gov/~tgkolda/TensorToolbox) and version 1.0 of the Poblano Toolbox (http://software.sandia.gov/poblano) were released in March 2010. Together, these tools provide new robust, scalable algorithms for analysis, modeling, and prediction of multiway data sets: CP-OPT for CANDECOMP/PARAFAC (CP) tensor factorizations and CP-WOPT for CP factorizations of data sets with missing values. Experiments show that CP-OPT is a fast, accurate, and scalable approach to CP factorization, as compared to state-of-the-art methods for this problem (alternating least squares and nonlinear least squares); see Figure 1. CP-WOPT extends the CP-OPT method to handle data with missing values, and experiments demonstrate improved accuracy and scalability as compared to state-of-the-art algorithms (expectation maximization alternating least squares and a Levenberg-Marquardt method); see Figure 2.
The impact of these new methods includes improved results in the areas of cybersecurity, chemometrics, neuroscience, bibliometrics, and graph analysis. In addition, this work has led to several recent peer-reviewed publications (J. Chemometrics, ACM Transactions on Knowledge Discovery from Data, J. Chemometrics and Intelligent Laboratory Systems, International Conference on Data Mining, and SIAM Data Mining Conference) along with a chapter in an upcoming book from SIAM on graph algorithms.
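The Tensor Toolbox and Poblano are MATLAB packages; the short Python/NumPy sketch below, built on made-up synthetic data, only illustrates the idea behind CP-WOPT: fit a CP model by minimizing a weighted least-squares objective in which a 0/1 weight tensor masks out the missing entries, using a gradient-based optimizer.

```python
# Conceptual sketch of a CP-WOPT-style weighted CP fit (synthetic data, not the
# Tensor Toolbox / Poblano implementation).
import numpy as np
from scipy.optimize import minimize

# Build a random rank-R tensor and hide ~40% of its entries.
I, J, K, R = 20, 18, 15, 3
rng = np.random.default_rng(0)
A0, B0, C0 = (rng.standard_normal((n, R)) for n in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
W = (rng.random(X.shape) > 0.4).astype(float)          # 1 = observed, 0 = missing

def unpack(v):
    A = v[:I * R].reshape(I, R)
    B = v[I * R:I * R + J * R].reshape(J, R)
    C = v[I * R + J * R:].reshape(K, R)
    return A, B, C

def objective(v):
    """Weighted least-squares objective and its gradient w.r.t. the factors."""
    A, B, C = unpack(v)
    M = np.einsum('ir,jr,kr->ijk', A, B, C)
    E = W * (M - X)                                     # residual over observed entries only
    f = 0.5 * np.sum(E**2)
    gA = np.einsum('ijk,jr,kr->ir', E, B, C)
    gB = np.einsum('ijk,ir,kr->jr', E, A, C)
    gC = np.einsum('ijk,ir,jr->kr', E, A, B)
    return f, np.concatenate([gA.ravel(), gB.ravel(), gC.ravel()])

v0 = 0.1 * rng.standard_normal(I * R + J * R + K * R)
res = minimize(objective, v0, jac=True, method='L-BFGS-B')
A, B, C = unpack(res.x)
Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
err = np.linalg.norm((1 - W) * (Xhat - X)) / np.linalg.norm((1 - W) * X)
print("relative error on the held-out (missing) entries:", err)
```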
Figure 1. Factors corresponding to the emission mode of an amino acid fluorescence data set using an alternating least squares method (CP-ALS; left) and CP-OPT (right) for rank R=3 (top) and rank R=4 (bottom). Three amino acids are present in the data; however, only CP-OPT is able to identify them correctly in the case of overfactoring (R=4). This is often the case in practice, where the true number of signatures in data is not known.
Figure 2. Factors of the CP-WOPT rank-3 model using complete EEG data (left) and data with 47% missing values (right). Sensor activation (column 1), time-frequency relationships (column 2), and left/right hand stimuli responses (column 3) are shown. This example indicates how CP-WOPT can recover the underlying brain dynamics even in the presence of missing data (e.g., due to missing EEG channel data, noise, etc.).
(Contacts: Brett Bader, Danny Dunlavy, Tamara Kolda)
July 2010
2010-5103P
ParaSpace: Sensitivity Analysis and Relationship Discovery in Large Ensembles of Simulation Runs
Modeling and simulation are central to the Stockpile Stewardship Program. Sensitivity analysis examines how variation in model outputs can be mapped to, or understood in relation to, differences in model inputs. This research is focused on developing a system combining scalable sensitivity analysis with visual abstractions to enable exploration of the correlations between data features (model parameters, mesh quantities, simulation metadata, etc.) and simulation runs. By integrating statistical analysis, mathematical modeling, and visualization, we are enabling a deeper understanding of latent relationships across simulation runs and providing confidence in ASC models.
We have developed two different prototypes, HelioView and Helio3D, each focusing on a different scale of analysis. HelioView (shown analyzing automobile data) provides a tool for analyzing input parameters against scalar outputs. Conventional tools use linear regression (a one-to-one measure) and response surface analysis (an all-to-one measure) to rank input variables with respect to output metrics. We use Canonical Correlation Analysis (CCA) to simultaneously evaluate all of the inputs with respect to all of the outputs, such that the correlations between the two are maximized (an all-to-all measure). The circular plot in the upper right visualizes which input parameters (left side) have the strongest impact on the outputs (right side), which outputs are most responsive to the inputs, and the directionality of these correlations. The circular bar chart in the center shows the correlation structure between all inputs and outputs. These values are displayed as red (positive) or blue (negative) bars extending inward or outward from a circular axis, where the length of each bar indicates its magnitude/relative importance. Opposite colors between inputs and outputs represent an inverse relationship, whereas same colors show a positive relationship. In the top image, increasing horsepower (HP) and cylinders are correlated with decreasing acceleration time and miles per gallon (MPG).
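As an illustration of the all-to-all flavor of this analysis, the sketch below runs a canonical correlation analysis on synthetic automobile-style data in Python with scikit-learn. The column names and generated values are hypothetical stand-ins for HelioView's inputs and outputs, not the actual data or implementation; the signs and magnitudes of the resulting weight vectors play the role of the red/blue bars in the circular plots.

```python
# CCA sketch on synthetic automobile-style data (hypothetical stand-in for
# HelioView's inputs/outputs); one fit relates all inputs to all outputs.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(1)
n = 200
hp        = rng.normal(150, 40, n)
cylinders = np.round(4 + (hp - 100) / 60 + rng.normal(0, 0.5, n))
weight    = 2000 + 8 * hp + rng.normal(0, 200, n)

accel = 25 - 0.08 * hp + rng.normal(0, 1, n)              # more HP -> faster acceleration
mpg   = 45 - 0.10 * hp - 0.004 * weight + rng.normal(0, 2, n)

X = np.column_stack([hp, cylinders, weight])              # inputs
Y = np.column_stack([accel, mpg])                         # outputs

cca = CCA(n_components=2)
cca.fit(X, Y)

# The signs/magnitudes of these weights are what the circular bar chart encodes.
print("input weights (hp, cylinders, weight):\n", cca.x_weights_)
print("output weights (accel time, mpg):\n", cca.y_weights_)

Xc, Yc = cca.transform(X, Y)
print("canonical correlations:",
      [np.corrcoef(Xc[:, k], Yc[:, k])[0, 1] for k in range(2)])
```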
Helio3D (not shown) evaluates correlations at the scale of the full 3D model outputs from all of the runs. A bar chart displays the correlation results for the input parameter list. Volume rendering displays the output correlations, which are superimposed on the three-dimensional topology of the simulation model. Unlike conventional volume renderings, which look at one simulation output at a time, this display shows correlative information for all simulation runs simultaneously. As with the bar chart, the directionality is shown through the red/blue color-coding. Opacity encodes magnitude, so small-valued correlations become transparent, preventing the most significant correlations from being obscured. The bar chart and the rendered image must be viewed in combination to understand the correlations. We will be expanding Helio3D to provide web-based delivery from HPC platforms such as Cielo.
This project makes a number of important contributions to Sandia’s analysis capabilities. We have been working with Tony Giunta (former V&V manager, 1544) throughout the project. Following his suggestion, we initially concentrated on a scalar analysis tool to ease the learning curve for analysts moving from familiar one-to-one/all-to-one measures to an all-to-all analysis. HelioView provides all-to-all analysis (a unique capability), visualization of how inputs and outputs are correlated as a whole, and an interactive environment for exploring how each element fits (or does not fit) within the trends of the ensemble. The scatterplot view helps identify anomalous elements (the solitary element in green to the far right of the others is a car whose acceleration time is much faster and MPG much lower than other cars of similar weight, as shown in the highlighted row of the table). With Helio3D, we expand our scale to the full simulation output (though in an all-to-one mode, since the multiple output variables are replaced by one variable at multiple locations) and are able to add spatial context to the correlation analysis, which is unlike anything currently available elsewhere.
(Contact: Patricia Crossno)
July 2010
2010-4664P
Optimization-Based Remap: A New Mathematical Paradigm in Mesh-To-Mesh Transfer of Simulation Data
Remap, broadly defined as the transfer of simulation data between different computational meshes subject to physical constraints, is a critical component in a number of numerical algorithms. These include a variety of mesh tying algorithms, data transfer in multiphysics schemes and the so-called “continuous rezone” used in Arbitrary Lagrangian-Eulerian transport schemes. Sandia researchers have recently developed a new mathematical framework for remap, based on ideas from constrained optimization. Compared to the state of the art, Sandia’s optimization-based remap (OBR) can generate significantly more accurate solutions at or below the cost of the best currently used remapping techniques.
Sandia's OBR is formulated as the solution of an inequality-constrained optimization problem, in which accuracy considerations, handled by an objective functional, are separated from the enforcement of local physical bounds (or monotonicity considerations), handled by a carefully defined set of inequality constraints. The strengths of Sandia's OBR formulation are that it finds the most accurate mesh-to-mesh data transfer subject to the given physical bounds and that it naturally applies to unstructured meshes and meshes composed of arbitrary polyhedral cells.
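As a toy illustration of this structure (and emphatically not Sandia's OBR implementation), the Python sketch below remaps a small 1-D field by minimizing a least-squares accuracy functional toward a hypothetical high-order target, subject to local min/max bounds taken from neighboring old-mesh cells and a mass-conservation equality constraint.

```python
# Toy 1-D analogue of a bound-constrained, conservative remap.  All data and
# bounds are made up for illustration.
import numpy as np
from scipy.optimize import minimize

old_vals = np.array([1.0, 2.0, 4.0, 3.0, 1.5])     # old-mesh cell averages
target   = np.array([1.1, 2.3, 4.4, 2.7, 1.4])     # accurate target; may violate local bounds
vol      = np.ones_like(old_vals)                  # cell volumes (uniform here)
mass_old = np.dot(vol, old_vals)

# Local physical bounds: min/max over each cell's old-mesh neighborhood.
lo = np.array([old_vals[max(i - 1, 0):i + 2].min() for i in range(len(old_vals))])
hi = np.array([old_vals[max(i - 1, 0):i + 2].max() for i in range(len(old_vals))])

obj = lambda x: 0.5 * np.sum((x - target)**2)      # accuracy functional
jac = lambda x: x - target
cons = [{"type": "eq", "fun": lambda x: np.dot(vol, x) - mass_old}]  # conservation

res = minimize(obj, old_vals, jac=jac, method="SLSQP",
               bounds=list(zip(lo, hi)), constraints=cons)
print("remapped values:", res.x)
print("mass error:", np.dot(vol, res.x) - mass_old)
```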
Sandia's OBR has been compared to flux-corrected remap (FCR), which is the current state of the art. The FCR algorithm is related to the well-known flux-corrected transport (FCT) scheme. The analysis of the connection between OBR and FCR shows that FCR can be interpreted as a solution procedure for another optimization problem, denoted M-OBR, in which the OBR objective functional is minimized over a subset of the OBR admissible set. This is illustrated in Figure 1. A very important consequence is that OBR can select solutions from a considerably larger admissible set, often resulting in added accuracy. In practice, differences in accuracy of up to three orders of magnitude in favor of OBR have been observed on sequences of mesh-to-mesh data transfers. A preliminary complexity analysis and several computational examples indicate that the computational cost of OBR is a small constant factor times the cost of FCR. This suggests that OBR can be competitive in practical applications where (1) an optimally accurate and (2) bounds-preserving mesh-to-mesh data transfer is desired.
The work on OBR has been presented at the DOE ASCR Applied Mathematics Research PI Meeting (Berkeley, May 2010), the NNSA BER Meeting on Atmospheric Transport (Sandia NM, March 2010), and will be submitted for publication in J. Comp. Physics. See technical report SAND2010-3021.

Figure 1: Structure of the OBR (left panel) and the M-OBR (right panel, a.k.a. FCR) admissible sets for a simple 3-cell remap example. The shadows point toward the interior of the admissible sets. It is evident that (1) OBR admits a significantly larger set of potential solutions and (2) the M-OBR (FCR) set of solutions is fully contained within the OBR admissible set.
(Contact: Denis Ridzal)
May 2010
2010-3397P
Sandians Organize and Present at International Hydrodynamics Conference
Researchers in the Computational Shock and Multiphysics Department (1431) participated in the 2009 Numerical Methods for Multi-material Fluids and Structures Conference held in Pavia, Italy, last September. The conference brings together researchers from around the world to discuss the state of the art in multi-material hydrodynamics simulation methodologies. G. Scovazzi (1431) helped organize the conference and is also serving as an editor for papers to be published in the International Journal for Numerical Methods in Fluids. Other researchers from Department 1431 presented papers related to work on a Riemann-solution-based multi-material closure algorithm, a stability analysis leading to a more robust multi-material sub-cell closure algorithm, an adaptive multi-material remap method that is aware of material interfaces, variational multi-scale Lagrangian shock hydrodynamics, predictor-corrector Lagrangian hydrodynamics algorithms, and approaches for magneto-hydrodynamic remapping in an Arbitrary Lagrangian-Eulerian modeling framework. Papers associated with the conference presentations have been submitted to the conference journal and are in review and revision. Algorithmic research and development provides the basis for continued improvement of the modeling software delivered to customers for important National Security missions.
(Contact: Randall Summers)
May 2010
2010-2912P
Improved Parallel Mesh Partitioning with Hypergraphs and Zoltan
Sandia researchers in the DOE SciDAC Combinatorial Scientific Computing and Petascale Simulations (CSCAPES) Institute recently developed a new hypergraph model for partitioning computational meshes in parallel simulations that accurately represents both interprocessor communication cost and the memory cost of “ghosting” mesh entities. Meshes play an important role in many scientific simulations, especially the numerical solution of partial differential equations by finite element, finite volume and finite difference methods. In parallel computations, high-quality mesh partitioning – dividing a mesh among parallel processors – is crucial to ensure memory use and computational work are balanced and interprocessor communication cost is low. The Zoltan toolkit provides several algorithms suitable for mesh partitioning, including geometric and connectivity-based partitioners.

The most commonly used approach to mesh partitioning is to model the mesh using the dual graph, where graph vertices represent mesh elements, and graph edges connect each element with its adjacent elements (Figure 1, middle). Graph partitioners then partition this graph (and, thus, the mesh) so that each part has an equal number of vertices (elements) while attempting to minimize the number of edges (adjacencies) cut by processor boundaries. However, minimizing the edge cut in the graph does not produce the optimal partition of the mesh.

Sandia’s new model improves upon the traditional model by employing hypergraphs – extensions of graphs that allow edges to contain two or more vertices. As in the graph model, the new model partitions mesh elements, but assumes that the simulation’s degrees of freedom are associated with mesh nodes. Thus, hypergraph vertices represent mesh elements, and hyperedges represent mesh nodes, with each hyperedge (mesh node) connecting the mesh elements incident to that node (Figure 1, right). A hypergraph partitioner then assigns an equal number of vertices (elements) to each part, while attempting to minimize the number of processors over which hyperedges extend (the number of processors sharing nodes). Unlike the graph model, this new model provides an exact measure of the communication volume associated with boundary data exchanges for shared nodes along processor boundaries.
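The model itself is easy to state in code. The pure-Python sketch below (a tiny hand-built quad mesh, not Zoltan) constructs one hyperedge per mesh node containing the elements incident to that node, and evaluates the standard "connectivity minus one" hypergraph metric for a candidate partition, which is exactly the communication volume for node-based data exchange described above.

```python
# Build the node-based hypergraph for a toy 2x2 quad mesh and compute the
# communication volume of a candidate 2-way partition of its elements.
from collections import defaultdict

# element -> list of mesh nodes (elements e0..e3, nodes 0..8 on a 3x3 grid)
elem_nodes = {
    "e0": [0, 1, 4, 3], "e1": [1, 2, 5, 4],
    "e2": [3, 4, 7, 6], "e3": [4, 5, 8, 7],
}

# hyperedges: mesh node -> set of incident elements (hypergraph vertices are elements)
hyperedges = defaultdict(set)
for elem, nodes in elem_nodes.items():
    for n in nodes:
        hyperedges[n].add(elem)

# candidate assignment of elements to 2 processors
part = {"e0": 0, "e1": 0, "e2": 1, "e3": 1}

def comm_volume(hyperedges, part):
    """Sum over hyperedges of (number of parts touched - 1): the node-sharing cost."""
    vol = 0
    for node, elems in hyperedges.items():
        parts_touched = {part[e] for e in elems}
        vol += len(parts_touched) - 1
    return vol

print("communication volume:", comm_volume(hyperedges, part))
```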
In collaboration with members of DOE’s SciDAC Interoperable Technologies for Advanced Petascale Simulations (ITAPS) Center at Rensselaer Polytechnic Institute, we tested this hypergraph model on a billion-element mesh in an implicit finite element fluid-flow simulation of blood flow in abdominal aortic aneurysms. The parallel hypergraph partitioner in Zoltan was used to generate partitions for up to 131K processors. The hypergraph partitions reduced the application’s memory and communication costs, and were also faster to compute than traditional graph partitions. The resulting application run times on the Intrepid Blue Gene/P are shown in Figure 2. The time reductions were achieved by changing only the partitioning model; no changes to the application code were needed. Because the solver phase of the computation is node-based, it benefits most from the new model. These experiments were performed on tetrahedral meshes; we speculate that the improvement may be even greater for mixed-element meshes, where the accuracy of the partitioning model matters more.
Figure 1. (Left) An example mesh. (Middle) The dual-graph model (shown in red) typically used in graph partitioning; graph vertices (circles) represent mesh elements, while graph edges connect adjacent elements. (Right) Sandia’s new hypergraph mesh partitioning model; each hypergraph vertex (red circle) represents an element, while each hyperedge (blue square) represents a mesh node, connecting all elements incident to it.
Figure 2. Execution times for the implicit finite element code PHASTA using a 1B element mesh. The hypergraph partition model decreases application execution time compared to a traditional dual-graph partition. Both partitions were computed using Zoltan.
(Contacts: Erik Boman and Karen Devine)
April 2010
2010-2386P
Radiation Damage Simulations Featured on the Cover of the Journal of Chemical Physics
Molecular dynamics (MD) simulations can be used to study the damage that occurs when materials are exposed to high doses of radiation or other intense energy deposition. Electronic effects are represented as a continuum field for which we solve the heat equation, and the electron field is coupled to the discrete ions using a two-temperature model (TTM). Previous researchers demonstrated TTM coupling by using an inhomogeneous and finite Langevin thermostat.
The 21 August 2009 issue of the Journal of Chemical Physics (JCP) featured recent TTM simulation work by Carolyn Phillips (University of Michigan) and Paul Crozier (1435), with a picture from their article appearing on the cover. During the spring of 2009, Carolyn completed a three-month practicum at Sandia's CSRI as part of her DOE Computational Science Graduate Fellowship (CSGF); Paul was her practicum advisor and mentor. Carolyn demonstrated that the TTM + MD approach reported in the literature leaks energy. She then developed, implemented, and tested an energy-conserving version. Her improved version of TTM + MD was added to Sandia’s massively parallel MD code, LAMMPS, and used to simulate radiation damage in single-component and binary Lennard-Jones (LJ) crystals. (This is a fictitious material described by the well-known Lennard-Jones interatomic potential model.)
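The qualitative behavior of a two-temperature coupling, and the kind of total-energy bookkeeping used to check conservation, can be sketched in a few lines. The 1-D Python example below is a schematic continuum-only illustration with made-up parameters; it is not the LAMMPS implementation and omits the MD/Langevin part, but it shows the electron heat equation, the electron-ion exchange term, and why the exchange terms cancel in the total energy.

```python
# Schematic 1-D two-temperature model: electron heat equation coupled to a
# lattice temperature via a constant g.  All parameters are arbitrary
# illustrative values, not those used in the JCP study or in LAMMPS.
import numpy as np

nx, dx, dt, nsteps = 100, 1.0, 0.01, 2000
Ce, Ci = 1.0, 3.0          # electronic and ionic heat capacities (per cell)
ke, g  = 5.0, 0.5          # electronic conductivity and electron-ion coupling

Te = np.full(nx, 300.0)
Ti = np.full(nx, 300.0)
Te[nx // 2 - 2 : nx // 2 + 2] = 5000.0     # localized energy deposition (radiation spike)

E0 = np.sum(Ce * Te + Ci * Ti)             # total energy before the simulation

for _ in range(nsteps):
    lap = (np.roll(Te, 1) - 2 * Te + np.roll(Te, -1)) / dx**2   # periodic Laplacian
    exch = g * (Te - Ti)                   # energy exchanged from electrons to ions
    Te += dt * (ke * lap - exch) / Ce
    Ti += dt * exch / Ci

E1 = np.sum(Ce * Te + Ci * Ti)
print("relative energy drift:", abs(E1 - E0) / E0)   # exchange terms cancel, so ~0
```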
The single component face centered cubic LJ crystals showed remarkable resilience to radiation damage, whereas the binary glass-forming LJ crystal retained damage much more extensively. For both of these LJ systems, a sweep of TTM parameter space was performed to investigate how the electronic subsystem properties and its coupling with the ionic subsystem influenced material damage. A special shape-matching algorithm was used to identify damaged material.
The figure from the cover of JCP shows two unit cells of binary LJ in (a) and a partially damaged sample in (b) and (c). In (a) and (b), species A atoms are colored red in the CsCl phase and green in the fcc phase; species B atoms are colored blue. In (c), damaged atoms are colored cyan and crystalline atoms are colored red.
(Contact: Paul Crozier)
January 2010
2009-5344P