Catalyst is an in situ library built on ParaView that links visualization capability directly with a simulation, producing visualization and analysis products as part of the simulation run. Catalyst pairs with ParaView by taking pipelines generated interactively within ParaView, including all views and algorithms, and exporting them as a pipeline that runs as part of the simulation to produce those same views and analysis results. Since Catalyst's first release in ParaView version 3, many improvements have been made to both the usability of pipeline creation and the efficiency and scalability of the library itself. While originally released as an optional part of ParaView's build process, Catalyst has since evolved into a separate library optimized for in situ operation, available with ParaView version 4.
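The coupling pattern described above can be sketched in a few lines: instead of writing files for post-hoc visualization, the simulation calls an analysis hook on selected timesteps. The sketch below is purely illustrative; the function and field names are hypothetical and this is not the actual Catalyst API.

```python
# Hypothetical sketch of the in situ coupling pattern Catalyst implements.
# The simulation invokes a coprocessing hook each timestep; the hook decides
# whether to produce an analysis product. Names are illustrative only.

def make_pipeline(extract_every):
    """Return a coprocessing hook that runs on every Nth timestep."""
    def coprocess(step, fields):
        if step % extract_every != 0:
            return None
        # A real Catalyst pipeline would run ParaView filters and render
        # views here; we summarize one field as a stand-in product.
        values = fields["pressure"]
        return {"step": step, "max_pressure": max(values)}
    return coprocess

def run_simulation(steps, hook):
    """Stand-in solver loop that hands its state to the in situ hook."""
    products = []
    state = [0.0, 1.0, 2.0]
    for step in range(steps):
        state = [v + 0.5 for v in state]  # stand-in solver update
        result = hook(step, {"pressure": state})
        if result is not None:
            products.append(result)
    return products

products = run_simulation(6, make_pipeline(extract_every=2))
# Products were extracted at steps 0, 2, and 4 only.
```

The point of the pattern is that analysis runs while the data is still in memory, avoiding the cost of writing full-resolution output to disk.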
The Dax Toolkit supports the fine-grained concurrency that data analysis and visualization algorithms need at exascale. Its basic computational unit is a worklet: a function that implements an algorithm's behavior on one element of a mesh (a point, edge, face, or cell) or a small local neighborhood. Worklets can be scheduled across an unlimited number of threads yet remain easy to design and debug. The Dax Toolkit also provides basic mesh computations and a set of communicative operations for building pervasively parallel visualization algorithms.
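The worklet idea can be illustrated without Dax itself: a small function is applied independently to each mesh element, so a scheduler is free to run it on any number of threads. The sketch below shows the concept only; Dax's actual worklets are C++ functors, and all names here are invented.

```python
# Illustrative sketch of the worklet concept (not the Dax C++ API): a
# per-element function scheduled independently over every cell of a mesh.
from concurrent.futures import ThreadPoolExecutor

def cell_average(cell, point_values):
    """Worklet: average the point values at one cell's vertices."""
    return sum(point_values[p] for p in cell) / len(cell)

def schedule(worklet, cells, point_values, max_workers=4):
    """Map the worklet over every cell; result order matches cell order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: worklet(c, point_values), cells))

# Two triangles sharing an edge, with a scalar value at each point.
point_values = [0.0, 3.0, 6.0, 9.0]
triangles = [(0, 1, 2), (1, 2, 3)]
averages = schedule(cell_average, triangles, point_values)
# averages == [3.0, 6.0]
```

Because each worklet invocation touches only its own element and read-only inputs, there is no shared mutable state, which is what makes the pattern safe to debug serially and then scale to many threads.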
The Network-and-Ensemble Enabled Entity Extraction from Informal Text (NEEEEIT) project conducts basic computer science research into applying traditional ensemble techniques to non-traditional machine learning algorithms such as conditional random fields (CRFs). The target application is named entity extraction from sources of informal text such as email, blogs, and tweets.
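One traditional ensemble technique is per-token majority voting over the outputs of several taggers. The sketch below applies it to named-entity labels; the taggers are stand-ins for trained CRF models, and this is an illustration of the generic technique, not the NEEEEIT implementation.

```python
# Minimal sketch of majority-vote ensembling over per-token entity labels.
# The three label sequences stand in for the outputs of trained CRF taggers.
from collections import Counter

def vote(label_sequences):
    """Combine label sequences token by token; ties fall back to 'O'."""
    combined = []
    for labels in zip(*label_sequences):
        counts = Counter(labels)
        label, n = counts.most_common(1)[0]
        combined.append(label if n > len(labels) // 2 else "O")
    return combined

# Three hypothetical taggers labeling the tweet "meet bob at acme tomorrow".
outputs = [
    ["O", "PER", "O", "ORG", "O"],
    ["O", "PER", "O", "O",   "O"],
    ["O", "O",   "O", "ORG", "O"],
]
consensus = vote(outputs)
# consensus == ["O", "PER", "O", "ORG", "O"]
```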
We have plenty of tools for network monitoring, intrusion detection, data capture and forensics. We have far less for constructing, manipulating and annotating the stories that we tell from that raw data. We have almost nothing that applies any resulting insight to decision support. This project's intent is to address that gap. We use cybersecurity as our driving domain and envision broader applications in decision support.
We begin with a testbed network instrumented to capture events in a red team/blue team exercise. By using a testbed we can simplify the low-level task of attributing action to actors.
To whatever extent we are able, we want the computer to provide building blocks. We then assemble these individual actions into short, goal-oriented sequences -- beats, in theater terminology. Beats can be composed into larger beats or even entire scenes that encompass actors, a setting, and an intent. Actors and scenes are in turn grouped into larger organizations that have strategies and engagements.
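The composition described above is naturally recursive: actions group into beats, and beats nest inside larger beats or scenes. A minimal sketch of such a data structure, with entirely hypothetical names and fields, might look like this:

```python
# Hypothetical sketch of the action -> beat -> scene composition: a Beat
# holds a goal and any mix of Actions and sub-Beats, and can be flattened
# back to its low-level actions for drill-down.
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Action:
    actor: str
    description: str

@dataclass
class Beat:
    goal: str
    parts: List[Union[Action, "Beat"]] = field(default_factory=list)

    def actions(self):
        """Flatten this beat (and any sub-beats) into low-level actions."""
        out = []
        for part in self.parts:
            out.extend(part.actions() if isinstance(part, Beat) else [part])
        return out

recon = Beat("map the network", [Action("red1", "port scan"),
                                 Action("red1", "banner grab")])
entry = Beat("gain a foothold", [Action("red2", "exploit web server")])
scene = Beat("compromise the testbed", [recon, entry])
# scene.actions() recovers all three underlying actions.
```

The same structure supports both directions of the abstraction: summarizing upward to strategy and drilling downward to supporting data.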
The innovation in our results will be twofold. First, we will develop prototypes for constructing, storing and presenting these stories at any level of abstraction from high-level strategy to low-level supporting data -- all in one artifact. Second, we will test our approach in collaboration with the Tracer FIRE computer forensics workshop hosted at Sandia.
Contact: Andy Wilson
ParaText is an open-source, distributed-memory software framework for document ingestion, text extraction, and the modeling and analysis of large corpora of unstructured text. ParaText components are built within the Titan Toolkit, whose pipeline-based execution model allows data sources, filters, and outputs to be flexibly combined. ParaText pipelines typically include text extraction, term dictionary creation, term-document matrix creation, and term weighting prior to modeling. Parallel implementations of Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are the two modeling approaches developed so far. ParaText components can be used as a C++, Python, or Java programming library, and ParaText is available for download through the Titan repository.
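The middle stages of such a pipeline (term dictionary, term-document matrix, term weighting) can be illustrated with a toy, serial sketch using a simple tf-idf weighting. This is only an illustration of the stages named above, not ParaText's parallel implementation, and the weighting scheme is an assumption.

```python
# Toy, serial sketch of three text-modeling pipeline stages: term dictionary
# creation, term-document matrix creation, and tf-idf term weighting.
import math

def build_dictionary(docs):
    """Assign each distinct term a column index."""
    terms = sorted({t for doc in docs for t in doc.split()})
    return {t: i for i, t in enumerate(terms)}

def term_document_matrix(docs, dictionary):
    """Rows are documents, columns are terms, entries are raw counts."""
    matrix = [[0] * len(dictionary) for _ in docs]
    for r, doc in enumerate(docs):
        for t in doc.split():
            matrix[r][dictionary[t]] += 1
    return matrix

def tfidf(matrix):
    """Scale each count by the log inverse document frequency of its term."""
    n_docs = len(matrix)
    df = [sum(1 for row in matrix if row[c]) for c in range(len(matrix[0]))]
    return [[row[c] * math.log(n_docs / df[c]) for c in range(len(row))]
            for row in matrix]

docs = ["parallel text analysis", "parallel latent semantic analysis"]
dictionary = build_dictionary(docs)
weighted = tfidf(term_document_matrix(docs, dictionary))
# Terms appearing in every document (e.g. "parallel") weight to zero.
```

The weighted matrix is what a modeling stage such as LSA (via a singular value decomposition) or LDA would then consume.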
To date, Sandia National Laboratories has used ParaView to visualize meshes containing hundreds of millions of unstructured cells, and billions of cells in structured AMR grids with hundreds of thousands of blocks. Furthermore, scalability tests have demonstrated ParaView on meshes with trillions of cells, and ParaView batch processing has leveraged hundreds of thousands of processors.
You have 17 million articles written by 9.3 million men and women over 129 years. More are arriving every day.
You have until Monday to report on them. Topics, trends, communities, everything.
You're going to need some help.
For information: PubMed Project Page
Within Themis, data ingestion is handled by araXne. Taking inspiration from search engine spiders that must handle similar complexity within the World Wide Web, araXne is a tool for cataloging scientific data by scanning, indexing, querying, and retrieving it in storage- and schema-agnostic ways, thereby adapting to the way scientists store their data instead of attempting to dictate formats or structure. Just as cards in a library’s card catalog offer multiple representations (title, author, subject) of underlying documents, metadata extracted by araXne present different perspectives on units of scientific data, such as tables, time series, images, animations, and meshes. Each perspective contains metadata and aggregated or summarized data appropriate to its type. These can then be queried to specify analysis inputs and to create visualizations.
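The card-catalog analogy can be made concrete with a small sketch: each registered extractor contributes a type-specific "perspective" (metadata plus summarized data) for a unit of data, and queries run over those perspectives rather than the raw storage. Everything below is a hypothetical illustration, not the actual araXne API.

```python
# Hypothetical sketch of a perspective-based catalog: type-specific
# extractors summarize each data unit, and queries match on the summaries.

EXTRACTORS = {
    "table":       lambda d: {"kind": "table", "rows": len(d),
                              "columns": len(d[0])},
    "time_series": lambda d: {"kind": "time_series", "length": len(d),
                              "min": min(d), "max": max(d)},
}

def catalog(units):
    """Index each (name, type, data) unit by its extracted perspective."""
    return {name: EXTRACTORS[kind](data) for name, kind, data in units}

def query(index, **criteria):
    """Return names whose perspectives match every criterion."""
    return [name for name, card in index.items()
            if all(card.get(k) == v for k, v in criteria.items())]

index = catalog([
    ("run1/temps", "time_series", [280.0, 285.5, 290.1]),
    ("run1/grid",  "table",       [[1, 2], [3, 4], [5, 6]]),
])
matches = query(index, kind="table", columns=2)
# matches == ["run1/grid"]
```

Because new extractors can be registered without changing the catalog or query code, the scheme stays storage- and schema-agnostic in the sense described above.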
A collaborative effort between Sandia National Laboratories and Kitware Inc., Titan is a collection of scalable algorithms for data ingestion and analysis that share a common set of data structures and a flexible, component-based pipeline architecture. The algorithms in Titan span a broad range of structured and unstructured analysis techniques, and are particularly suited to parallel computation on distributed memory supercomputers.
Titan components may be used by application developers through their native C++ API on all popular platforms, or through a broad set of language bindings that includes Python, Java, Tcl, and more. Developers combine Titan components with their own application-specific business logic and user interface code to address problems in a specific domain. Titan is used in applications ranging from command-line utilities and straightforward graphical user interface tools to sophisticated client-server applications and web services, on platforms ranging from individual workstations to some of the most powerful supercomputers in the world.
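The component-based pipeline architecture can be shown schematically: a source, any number of filters, and an output combined by a generic driver. This mimics the shape of the architecture only; it is not Titan's actual C++ API, and all component names are invented.

```python
# Schematic sketch of a component-based pipeline: data is pulled from a
# source, threaded through each filter in order, and handed to a sink.

def run_pipeline(source, filters, sink):
    """Execute source -> filters -> sink and return the sink's result."""
    data = source()
    for f in filters:
        data = f(data)
    return sink(data)

# Hypothetical components: read rows, drop empties, uppercase, collect.
rows_source = lambda: ["alpha", "", "beta"]
drop_empty = lambda rows: [r for r in rows if r]
upper = lambda rows: [r.upper() for r in rows]
collect = lambda rows: rows

result = run_pipeline(rows_source, [drop_empty, upper], collect)
# result == ["ALPHA", "BETA"]
```

The value of the pattern is that sources, filters, and outputs can be recombined freely, which is what lets the same components serve command-line tools, GUIs, and services alike.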