Categories of Work
Malicious cyber activities, such as malware and phishing attacks, pose an increasing threat as more of the world's important assets become computerized and move online. The ability to detect and neutralize such attacks is of paramount importance to ensuring that we can operate effectively in this new environment.
We address this challenge by developing frameworks for the collection, processing, and analysis of cyber activity in real time. We combine state-of-the-art packet capture utilities and flexible databases with prediction and analysis algorithms from Sandia's Titan Toolkit to deliver accurate and timely solutions to analysts' desktops, integrated into their existing tools. This combination results in a flexible, easy-to-use, and potent edge in the fight against cyber terrorists.
Graphs have been increasingly used to describe complex data sets in recent years because of their ability to capture relationships between numerous features in complex, heterogeneous data sets. Many algorithms and heuristics exist enabling efficient searches of these data sets to extract useful information and structure from the data. We are using heterogeneous graphs to model temporal geospatial data as well as to analyze results from scientific simulation data.
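A heterogeneous graph of this kind can be sketched in a few lines: nodes carry a type label (e.g., event, location, time) and edges carry a relation label, so typed queries can pull structure out of mixed data. This is a minimal illustrative sketch; the node and relation names are invented for the example and do not reflect any particular toolkit's API.

```python
from collections import defaultdict

# A minimal heterogeneous graph: nodes have a type label and
# edges have a relation label. All names below are illustrative.
class HeteroGraph:
    def __init__(self):
        self.node_type = {}
        self.adj = defaultdict(list)  # node -> [(neighbor, relation)]

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v, relation):
        self.adj[u].append((v, relation))
        self.adj[v].append((u, relation))

    def neighbors_of_type(self, node, ntype):
        # Typed query: neighbors of a node restricted to one node type.
        return [n for n, _ in self.adj[node] if self.node_type[n] == ntype]

# Example: tie a simulated event to where and when it occurred.
g = HeteroGraph()
g.add_node("event_42", "event")
g.add_node("site_A", "location")
g.add_node("2010-06-01", "time")
g.add_edge("event_42", "site_A", "occurred_at")
g.add_edge("event_42", "2010-06-01", "occurred_on")

print(g.neighbors_of_type("event_42", "location"))  # ['site_A']
```

Because node and edge types are explicit, the same structure can hold temporal, geospatial, and simulation-derived entities side by side, and queries can filter by type rather than treating all vertices uniformly.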
The Araxne project is conducting research into methods for remote data indexing, retrieval, and inference, inspired by the organization of the World Wide Web. To support exascale analysis and address data-movement challenges, the project creates, stores, and indexes reduced-bandwidth representations of remote data. These capabilities will be used to implement the next generation of online, highly interactive post-processing and visualization tools capable of working with scientific datasets that are massive in both size and complexity.
Titan provides both supervised and unsupervised machine-learning algorithms, flexibly implemented to allow researchers to explore interesting new ideas that advance computer science, while simultaneously enabling developers to create effective analysis pipelines to deliver quick and accurate results. For analysis in which time is of the essence, Titan offers algorithms suitable for real-time and streaming applications, employing parallelized designs, rapid-training mechanisms, and instance-based learning techniques.
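To make the instance-based, streaming style of learning concrete, the sketch below keeps a bounded window of recent labeled examples and classifies each new point by majority vote among its k nearest stored neighbors. This is a generic textbook-style illustration, not Titan's actual implementation; the class name, window size, and labels are all invented for the example.

```python
import math
from collections import deque

# Toy instance-based learner for streaming data: training is just
# storing examples (fast), and old examples age out of the window.
class StreamingKNN:
    def __init__(self, k=3, window=1000):
        self.k = k
        self.window = deque(maxlen=window)  # (features, label) pairs

    def learn(self, x, label):
        # "Training" is O(1): append the labeled instance.
        self.window.append((x, label))

    def predict(self, x):
        # Majority vote among the k nearest stored neighbors.
        nearest = sorted(self.window,
                         key=lambda item: math.dist(item[0], x))[: self.k]
        votes = {}
        for _, label in nearest:
            votes[label] = votes.get(label, 0) + 1
        return max(votes, key=votes.get)

clf = StreamingKNN(k=3)
for x, y in [((0.0, 0.0), "benign"), ((0.1, 0.2), "benign"),
             ((5.0, 5.1), "malicious"), ((4.9, 5.3), "malicious"),
             ((0.2, 0.1), "benign")]:
    clf.learn(x, y)

print(clf.predict((5.0, 5.0)))  # malicious
```

The appeal for real-time settings is that incorporating a new observation costs almost nothing, and the bounded window keeps both memory use and prediction latency predictable as the stream grows.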
Titan's machine-learning algorithms have been applied to many different problem areas, such as text analysis, cyber security, and bioinformatics. Available in both generic header and pipeline formats, these algorithms provide an effective complement to Titan's analysis suite.
Vast quantities of information are contained in documents, articles, webpages, emails, and other types of unstructured text. Given the rate at which new content is generated, even an army of readers could not hope to keep up with reading and categorizing all of it. Text analysis provides an automated means of evaluating the conceptual content of these collections without first having to read them. The ParaText project developed a scalable text analysis engine that uses various statistical and probabilistic methods to build concept and topic models from large document collections.
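As a small illustration of the statistical weighting that underlies such concept models, the sketch below computes TF-IDF vectors and compares documents by cosine similarity, the kind of term-weighting step on which methods like LSA are built. It is a minimal pure-Python sketch; the documents and tokenization are invented for the example and this is not ParaText code.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF weighting: term frequency scaled by inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = lambda vec: math.sqrt(sum(x * x for x in vec.values()))
    denom = norm(u) * norm(v)
    return dot / denom if denom else 0.0

docs = ["network intrusion detection alert",
        "intrusion alert on the network",
        "protein folding simulation results"]
vecs = tfidf_vectors(docs)

# The two cyber-themed documents score as more similar than the
# cyber document and the simulation document.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

Concept-modeling methods such as LSA go a step further, factoring the resulting term-document matrix to group documents that share vocabulary patterns even when they share few exact terms.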
As part of the ParaText project, we also evaluated the impacts of various algorithmic choices on document groupings and how document-concept relationships change with changing parameter values. We created a visual analytics tool, LSAView, for presenting statistical information and correlations. LSAView uses multiple linked views of document-similarity graphs and the difference matrices between them to enable exploration of various configurations. Additionally, we developed a different visual analytics tool, TopicView, for comparing and contrasting two different modeling approaches, specifically Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA).