
Projects

Catalyst

Catalyst is an in situ library, built on ParaView, that links visualization capability directly with a simulation to produce visualization and analysis products as part of the simulation run. Catalyst pairs with ParaView by taking pipelines that are generated interactively within ParaView, including all views and algorithms, and exporting them as a pipeline that runs as part of the simulation to generate those same views and products. Since Catalyst's first release in ParaView version 3, many improvements have been made to both the usability of pipeline creation and the efficiency and scalability of the library itself. While originally released as an optional part of ParaView's build process, Catalyst has since evolved into a separate library optimized for in situ operation, available with ParaView version 4.
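
For a sense of how an exported pipeline plugs into a simulation, the sketch below follows the general shape of a legacy Catalyst Python coprocessing script. It is a hand-written, minimal illustration rather than actual ParaView output; exact class and function details vary between ParaView versions, and the isovalue and output naming are placeholders.

```python
# Minimal sketch of a Catalyst coprocessing script (legacy Python
# interface).  ParaView normally generates such scripts by exporting an
# interactively built pipeline; details differ across versions.
from paraview.simple import *

def RequestDataDescription(datadescription):
    # Tell the simulation adaptor which meshes and fields this
    # pipeline needs at the current time step.
    inputdesc = datadescription.GetInputDescriptionByName("input")
    inputdesc.AllFieldsOn()
    inputdesc.GenerateMeshOn()

def DoCoProcessing(datadescription):
    # Wrap the simulation's current grid as a pipeline source, then
    # apply the same filters that were configured interactively.
    producer = TrivialProducer()
    producer.GetClientSideObject().SetOutput(
        datadescription.GetInputDescriptionByName("input").GetGrid())

    contour = Contour(Input=producer)
    contour.Isosurfaces = [0.5]        # placeholder isovalue

    view = CreateRenderView()
    Show(contour, view)
    Render(view)
    SaveScreenshot("catalyst_%04d.png" % datadescription.GetTimeStep(),
                   view)
```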

DAX

The transition to exascale machines represents a fundamental change in computing architecture. Efficient computation on exascale machines requires massive numbers of concurrent threads, at least 1000× more concurrency than existing systems provide. Current visualization solutions cannot support this extreme level of concurrency. Exascale systems require a new programming model and a fundamental change in how we design core algorithms. To address these issues, our project is building the Data Analysis at Extreme (Dax) Toolkit.

The Dax Toolkit supports the fine-grained concurrency that data analysis and visualization algorithms need in order to drive exascale computing. The basic computational unit of the Dax Toolkit is a worklet, a function that implements an algorithm's behavior on an element of a mesh (that is, a point, edge, face, or cell) or a small local neighborhood. Worklets can be scheduled on an unlimited number of threads yet remain easy to design and debug. The Dax Toolkit provides the basic mesh computations and a set of communicative operations needed to build pervasively parallel visualization algorithms.
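
Dax itself is a C++ toolkit, but the worklet idea can be illustrated in a few lines of Python. This is a conceptual sketch only, not the Dax API; the scheduler is reduced to a serial loop standing in for the thousands of threads a real device scheduler would use.

```python
import numpy as np

def magnitude_worklet(point):
    # The "worklet": pure per-element logic on a single mesh point,
    # with no knowledge of how or where it is scheduled.
    return np.sqrt(point[0]**2 + point[1]**2 + point[2]**2)

def schedule(worklet, elements):
    # Stand-in for the toolkit's scheduler, which could map the
    # worklet onto any number of CPU or GPU threads; serial here.
    return np.array([worklet(e) for e in elements])

points = np.random.rand(10, 3)            # a tiny point cloud
print(schedule(magnitude_worklet, points))
```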

GeoGraphy

[Image: Football fields, optical imagery]

The GeoGraphy project seeks to extract knowledge from geospatial imagery data that has been classified into discrete landcover classes (e.g., Buildings, Roads, Railroads, Grass, Tree Canopy, Pavement, and Water).

[Image: Football fields, landcover classification]

The principal manner of feature detection is to construct a semantic-graph representation of geospatial features and then mine it for matches to template patterns that capture both the features themselves and the geospatial context among them.
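
As a rough illustration of the approach (not GeoGraphy's actual implementation), a classified scene can be encoded as a labeled graph and mined for a template pattern with off-the-shelf subgraph matching, here via the networkx library:

```python
import networkx as nx
from networkx.algorithms import isomorphism

# Scene graph: nodes are extracted features labeled with landcover
# classes; edges record geospatial relations such as adjacency.
scene = nx.Graph()
scene.add_node("f1", landcover="Grass")
scene.add_node("f2", landcover="Building")
scene.add_node("f3", landcover="Pavement")
scene.add_edges_from([("f1", "f2"), ("f1", "f3")])

# Template pattern: a grass field adjacent to both a building and
# pavement (a crude "athletic field" pattern; purely illustrative).
template = nx.Graph()
template.add_node("field", landcover="Grass")
template.add_node("structure", landcover="Building")
template.add_node("lot", landcover="Pavement")
template.add_edges_from([("field", "structure"), ("field", "lot")])

matcher = isomorphism.GraphMatcher(
    scene, template,
    node_match=isomorphism.categorical_node_match("landcover", None))
for match in matcher.subgraph_isomorphisms_iter():
    print(match)   # maps scene features to template roles
```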

NEEEEIT

The Network-and-Ensemble Enabled Entity Extraction from Informal Text (NEEEEIT) project is conducting basic computer science research into applying traditional ensemble techniques to non-traditional machine learning algorithms such as conditional random fields (CRFs). These ensembles are applied to named entity extraction from sources of informal text such as email, blogs, and tweets.
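
A toy sketch of the ensemble half of the idea, reduced to per-token majority voting over the label sequences produced by several taggers (the stand-in taggers below take the place of trained CRFs):

```python
from collections import Counter

def ensemble_tag(taggers, tokens):
    # Run every tagger over the token sequence, then take a per-token
    # majority vote across the resulting label sequences.
    predictions = [tagger(tokens) for tagger in taggers]
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]

# Three stand-in "taggers" that disagree about one token.
t1 = lambda toks: ["O", "B-PER", "O"]
t2 = lambda toks: ["O", "B-PER", "O"]
t3 = lambda toks: ["O", "B-ORG", "O"]

print(ensemble_tag([t1, t2, t3], ["met", "Smith", "today"]))
# -> ['O', 'B-PER', 'O']
```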

Nested Narratives

We have plenty of tools for network monitoring, intrusion detection, data capture, and forensics. We have far fewer for constructing, manipulating, and annotating the stories that we tell from that raw data. We have almost nothing that applies any resulting insight to decision support. This project's intent is to address that gap. We use cybersecurity as our driving domain and envision broader applications in decision support.

Our Approach

We begin with a testbed network instrumented to capture events in a red team/blue team exercise. By using a testbed we can simplify the low-level task of attributing action to actors.

To whatever extent we are able, we want the computer to provide the building blocks. We then assemble these individual actions into short sequences -- beats, in theater terminology -- that are organized around a goal. Beats can be composed into larger beats or even entire scenes that encompass actors, a setting, and an intent. Actors and scenes are in turn grouped into larger organizations that have strategies and engagements.
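
One way to picture the nesting, in a purely illustrative sketch (the names and fields here are hypothetical, not the project's data model):

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Event:
    # A single attributed action captured on the testbed.
    actor: str
    action: str

@dataclass
class Beat:
    # A short sequence organized around a goal; beats may contain
    # events or smaller beats.
    goal: str
    parts: List[Union[Event, "Beat"]] = field(default_factory=list)

@dataclass
class Scene:
    # Beats composed with a setting and an intent.
    setting: str
    intent: str
    beats: List[Beat] = field(default_factory=list)

recon = Beat("map the network",
             [Event("red-1", "port scan"), Event("red-1", "banner grab")])
scene = Scene("testbed exercise", "initial access", [recon])
print(scene)
```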

The innovation in our results will be twofold. First, we will develop prototypes for constructing, storing and presenting these stories at any level of abstraction from high-level strategy to low-level supporting data -- all in one artifact. Second, we will test our approach in collaboration with the Tracer FIRE computer forensics workshop hosted at Sandia.

Contact: Andy Wilson

ParaText

[Image: ParaText pipeline]
The volume of data contained in textual form is enormous. Automated, scalable methods are needed to evaluate the contents of document collections without reading them. The ParaText project at Sandia National Laboratories created a scalable text analysis engine that uses statistical methods, such as Latent Semantic Analysis (LSA), to evaluate the concepts found within a large corpus of documents. Using LSA to extract concepts and relationships between documents, the corpus can be explored interactively through a visual application in which, for example, documents are grouped by concept within a landscape metaphor.

ParaText is open source and includes a distributed-memory software framework for document ingestion, text extraction, modeling, and analysis of large corpora of unstructured text. ParaText components are built within the Titan Toolkit, which provides a pipeline-based execution model that allows data sources, filters, and outputs to be flexibly combined. ParaText pipelines typically include text extraction, term dictionary creation, term-document matrix creation, and term weighting prior to modeling. Parallel implementations of LSA and Latent Dirichlet Allocation (LDA) are the two modeling approaches developed to date. ParaText components can be used as a C++, Python, or Java programming library, and ParaText is available for download through the Titan repository.
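
The mathematical core of such a pipeline can be sketched in a few lines of numpy (an illustration of the method, not ParaText's distributed implementation): build a term-document matrix, apply a term weighting, and take a truncated SVD to obtain the LSA concept space.

```python
import numpy as np

# Toy corpus and term-document matrix: rows are terms, columns docs.
docs = ["cat sat mat", "cat cat hat", "dog sat log"]
terms = sorted({w for d in docs for w in d.split()})
A = np.array([[d.split().count(t) for d in docs] for t in terms],
             dtype=float)

# A simple tf-idf style term weighting prior to modeling.
df = (A > 0).sum(axis=1)                  # document frequency per term
A *= np.log(len(docs) / df)[:, None]

# Truncated SVD: the top-k singular vectors span the "concept" space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_coords = (np.diag(s[:k]) @ Vt[:k]).T  # documents in concept space
print(doc_coords)
```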

ParaView

ParaView is an open-source application for visualizing two- and three-dimensional data sets. The size of the data sets ParaView can handle varies widely depending on the architecture on which the application is run. The platforms supported by ParaView range from single-processor workstations to multiple-processor, distributed-memory supercomputers and workstation clusters. On a parallel machine, ParaView can process very large data sets in parallel and then collect the results.
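
ParaView can also be driven from Python through its paraview.simple module; a minimal batch script looks something like the following (the input file name and coloring array are placeholders):

```python
# Minimal pvpython/pvbatch sketch: load a data set, slice it, and
# save an image.  "data.vtu" and "pressure" are placeholders.
from paraview.simple import *

reader = OpenDataFile("data.vtu")
slc = Slice(Input=reader)
slc.SliceType.Normal = [0, 0, 1]

view = CreateRenderView()
display = Show(slc, view)
ColorBy(display, ("POINTS", "pressure"))
Render(view)
SaveScreenshot("slice.png", view)
```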

To date, Sandia National Laboratories has used ParaView to visualize meshes containing hundreds of millions of unstructured cells and billions of cells in structured AMR grids with hundreds of thousands of blocks. Furthermore, scalability tests have demonstrated ParaView on meshes with trillions of cells, and ParaView batch processing has leveraged hundreds of thousands of processors.

PubMed

You have 17 million articles written by 9.3 million men and women over 129 years. More are arriving every day.

You have until Monday to report on them. Topics, trends, communities, everything.

You're going to need some help.

For information: PubMed Project Page

Themis

[Image: Themis architecture]
Themis is a framework for remote analysis and visualization of ensembles of large, high-dimensional data stored on High Performance Computing (HPC) platforms. The Themis architecture integrates data ingestion, data management, scalable analysis, and visualization using a multi-tiered hierarchy of data and model storage. Large data sets are kept on the HPC platform, where analysis operations are performed in place, minimizing costly data movement and using the HPC to scale the analysis. The resulting smaller-scale models and other analysis artifacts are moved to a NoSQL project database on a separate server. These artifacts are the basis for visualizations delivered to users' desktops through an ordinary web browser, eliminating the need to build and deploy platform-specific client applications.

Within Themis, data ingestion is handled by araXne. Taking inspiration from search engine spiders that must handle similar complexity within the World Wide Web, araXne is a tool for cataloging scientific data by scanning, indexing, querying, and retrieving it in storage- and schema-agnostic ways, thereby adapting to the way scientists store their data instead of attempting to dictate formats or structure.  Just as cards in a library’s card catalog offer multiple representations (title, author, subject) of underlying documents, metadata extracted by araXne present different perspectives on units of scientific data, such as tables, time series, images, animations, and meshes.  Each perspective contains metadata and aggregated or summarized data appropriate to its type.  These can then be queried to specify analysis inputs and to create visualizations.
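
The card-catalog analogy suggests records along the following lines. This is a hypothetical sketch of the idea, not araXne's actual schema; every field name and value below is illustrative.

```python
# Hypothetical catalog record in the spirit of the card-catalog
# analogy; all names and values are illustrative.
record = {
    "source": "/projects/run042/output.h5",
    "perspectives": [
        {"type": "time series",
         "metadata": {"variable": "temperature", "steps": 5000},
         "summary": {"min": 271.3, "max": 305.9}},
        {"type": "mesh",
         "metadata": {"cells": 1200000000, "element": "hex"}},
    ],
}

def find(records, ptype, **meta):
    # Query the catalog for perspectives of a given type whose
    # metadata matches; results select analysis inputs.
    for r in records:
        for p in r["perspectives"]:
            if p["type"] == ptype and all(
                    p["metadata"].get(k) == v for k, v in meta.items()):
                yield r["source"], p

for src, p in find([record], "time series", variable="temperature"):
    print(src, p["summary"])
```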

Titan

A collaborative effort between Sandia National Laboratories and Kitware Inc., Titan is a collection of scalable algorithms for data ingestion and analysis that share a common set of data structures and a flexible, component-based pipeline architecture. The algorithms in Titan span a broad range of structured and unstructured analysis techniques, and are particularly suited to parallel computation on distributed memory supercomputers.

Application developers can use Titan components through their native C++ API on all popular platforms, or through a broad set of language bindings that includes Python, Java, Tcl, and more. Developers combine Titan components with their own application-specific business logic and user interface code to address problems in a specific domain. Titan is used in applications ranging from command-line utilities and straightforward graphical user interface tools to sophisticated client-server applications and web services, on platforms ranging from individual workstations to some of the most powerful supercomputers in the world.
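
For flavor, here is what a small pipeline can look like through the Python bindings, using VTK infovis components that originated in Titan. This is a minimal sketch: the input file and its column names are placeholders, and class availability depends on the build.

```python
# Minimal pipeline sketch: a data source, a filter, and a view,
# using VTK infovis classes that originated in Titan.
import vtk

reader = vtk.vtkDelimitedTextReader()        # data source
reader.SetFileName("emails.csv")             # placeholder input
reader.SetHaveHeaders(True)

table_to_graph = vtk.vtkTableToGraph()       # filter: table -> graph
table_to_graph.SetInputConnection(reader.GetOutputPort())
table_to_graph.AddLinkVertex("sender")
table_to_graph.AddLinkVertex("recipient")
table_to_graph.AddLinkEdge("sender", "recipient")

view = vtk.vtkGraphLayoutView()              # output: interactive view
view.AddRepresentationFromInputConnection(table_to_graph.GetOutputPort())
view.ResetCamera()
view.GetInteractor().Start()
```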


Contact Us: Patricia Crossno | (505) 845-7506