Sandia National Laboratories
Daniel M. Dunlavy


Contact
Daniel M. Dunlavy
Principal Member of Technical Staff
dmdunla@sandia.gov
(505) 206-9855


Publications


Book Chapters

  • Multilinear Algebra for Analyzing Data with Multiple Linkages, Dunlavy, D.M., Kolda, T.G. & Kegelmeyer, W.P., In Graph Algorithms in the Language of Linear Algebra, Philadelphia, PA, SIAM, 2010 (in press). [BibTeX] [PDF]
BibTeX:
@incollection{DuKeKo10,
  author = {Daniel M. Dunlavy and Tamara G. Kolda and W. Philip Kegelmeyer},
  title = {Multilinear Algebra for Analyzing Data with Multiple Linkages},
  booktitle = {Graph Algorithms in the Language of Linear Algebra},
  publisher = {SIAM},
  year = {2010 (in press)}
}

Refereed Journal Articles

  • TopicView: Visual Analysis of Topic Models and their Impact on Document Clustering, Crossno, P.J., Wilson, A.T., Shead, T.M., Davis IV, W.L. & Dunlavy, D.M.. International Journal on Artificial Intelligence Tools 2013 (accepted). [Abstract] [BibTeX] [PDF]
Abstract: We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual or topical content, document relationships identified by the models, and the impact of the models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full text reader for documents selected in either of the graphs or the table. The impact of LSA and LDA models on document clustering applications is explored through similar means, using proximities between documents and cluster exemplars for graph layout edge weighting and table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
BibTeX:
@article{CrWiShDaDu13,
  author = {Patricia J. Crossno and Andrew T. Wilson and Timothy M. Shead and Warren L. Davis IV and Daniel M. Dunlavy},
  title = {TopicView: Visual Analysis of Topic Models and their Impact on Document Clustering},
  journal = {International Journal on Artificial Intelligence Tools},
  year = {2013 (accepted)}
}
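Illustrative sketch:
The concept-topic matching described in the abstract above pairs LSA concepts with LDA topics by the cosine similarity of their term-space factors. The following is a minimal, hypothetical sketch of that pairing step (random stand-in factors, threshold chosen arbitrarily); it is not the TopicView implementation, which feeds such edges into a linked bipartite-graph view.

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for model factors in term space:
# lsa_concepts: (n_concepts x n_terms) rows of an LSA term-concept matrix
# lda_topics:   (n_topics   x n_terms) LDA topic-term distributions
rng = np.random.default_rng(0)
lsa_concepts = rng.normal(size=(10, 500))
lda_topics = rng.dirichlet(np.ones(500), size=12)

# Cosine similarity between every LSA concept and every LDA topic.
sim = cosine_similarity(lsa_concepts, lda_topics)      # shape (10, 12)

# Keep the strongest pairs as edges of a bipartite concept-topic graph.
threshold = 0.2
edges = [(i, j, sim[i, j])
         for i in range(sim.shape[0])
         for j in range(sim.shape[1])
         if sim[i, j] >= threshold]
print(len(edges), "concept-topic edges above", threshold)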
  • Temporal Link Prediction using Matrix and Tensor Factorizations, Dunlavy, D.M., Kolda, T.G. & Acar, E.. ACM Transactions on Knowledge Discovery from Data Vol. 5(2), February 2011. [Abstract] [BibTeX] [DOI] [PDF]
Abstract: The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for times 1 through T, can we predict the links at time T + 1? If our data has underlying periodic structure, can we predict out even further in time, i.e., links at time T + 2, T + 3, etc.? In this paper, we consider bipartite graphs that evolve over time and consider matrix- and tensor-based methods for predicting future links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem. Additionally, we show that tensor-based techniques are particularly effective for temporal data with varying periodic patterns.
BibTeX:
@article{DuKoAc11,
  author = {Daniel M. Dunlavy and Tamara G. Kolda and Evrim Acar},
  title = {Temporal Link Prediction using Matrix and Tensor Factorizations},
  journal = {ACM Transactions on Knowledge Discovery from Data},
  year = {2011},
  volume = {5},
  number = {2},
  doi = {http://dx.doi.org/10.1145/1921632.1921636}
}
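Illustrative sketch:
For orientation only, the scalable approximation mentioned in the abstract builds on the classical Katz score, S = sum_{k>=1} beta^k A^k = (I - beta A)^{-1} - I, truncated to the dominant spectral components. The sketch below shows that standard square, symmetric case with NumPy; the paper's actual contribution (the bipartite extension, weighted temporal collapsing, and CP-based forecasting) is not reproduced here.

import numpy as np

def truncated_katz(A, beta=0.01, rank=10):
    # Standard Katz scores, S = (I - beta*A)^{-1} - I, for a symmetric
    # adjacency matrix A (converges for beta < 1/lambda_max).  With the
    # eigendecomposition A = Q diag(lam) Q^T this is
    # S = Q diag(1/(1 - beta*lam) - 1) Q^T, and keeping only the `rank`
    # largest-magnitude eigenpairs gives a scalable approximation.
    lam, Q = np.linalg.eigh(A)
    idx = np.argsort(np.abs(lam))[::-1][:rank]
    lam, Q = lam[idx], Q[:, idx]
    return Q @ np.diag(1.0 / (1.0 - beta * lam) - 1.0) @ Q.T

# Toy symmetric adjacency matrix
rng = np.random.default_rng(1)
A = (rng.random((50, 50)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T
S = truncated_katz(A, beta=0.01, rank=10)   # S[i, j] is a link-prediction score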
  • Scalable Tensor Factorizations for Incomplete Data, Acar, E., Dunlavy, D.M., Kolda, T.G. & Morup, M.. Chemometrics and Intelligent Laboratory Systems Vol. 106(1), pp. 41-56., March 2011. [Abstract] [BibTeX] [DOI] [PDF]
Abstract: The problem of incomplete data---i.e., data with missing or unknown values---in multi-way arrays is ubiquitous in biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, communication networks, etc. We consider the problem of how to factorize data sets with missing values with the goal of capturing the underlying latent structure of the data and possibly reconstructing missing values (i.e., tensor completion). We focus on one of the most well-known tensor factorizations that captures multi-linear structure, CANDECOMP/PARAFAC (CP). In the presence of missing data, CP can be formulated as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factorize tensors with noise and up to 99% missing data. A unique aspect of our approach is that it scales to sparse large-scale data, e.g., 1000 X 1000 X 1000 with five million known entries (0.5% dense). We further demonstrate the usefulness of CP-WOPT on two real-world applications: a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes and the problem of modeling computer network traffic where data may be absent due to the expense of the data collection process.
BibTeX:
@article{AcDuKoMo11,
  author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda and Morten Morup},
  title = {Scalable Tensor Factorizations for Incomplete Data},
  journal = {Chemometrics and Intelligent Laboratory Systems},
  year = {2011},
  volume = {106},
  number = {1},
  pages = {41--56},
  doi = {http://dx.doi.org/10.1016/j.chemolab.2010.08.004}
}
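Illustrative sketch:
The weighted least-squares formulation described in the abstract can be written down directly; the sketch below is a schematic, dense NumPy version of that objective with hypothetical names. It is not the CP-WOPT code, which also supplies gradients to a first-order solver and scales to large sparse tensors.

import numpy as np

def weighted_cp_objective(A, B, C, X, W):
    # X: 3-way data tensor; W: binary tensor, 1 where X is known, 0 where
    # missing; A, B, C: CP factor matrices.  The model tensor is
    # M[i,j,k] = sum_r A[i,r]*B[j,r]*C[k,r], and only known entries count.
    M = np.einsum('ir,jr,kr->ijk', A, B, C)
    return 0.5 * np.sum((W * (X - M)) ** 2)

# Toy example: rank-3 model of a 20 x 15 x 10 tensor with 40% missing entries
rng = np.random.default_rng(0)
I, J, K, R = 20, 15, 10, 3
A, B, C = (rng.normal(size=(n, R)) for n in (I, J, K))
X = np.einsum('ir,jr,kr->ijk', A, B, C) + 0.01 * rng.normal(size=(I, J, K))
W = (rng.random((I, J, K)) > 0.4).astype(float)
print(weighted_cp_objective(A, B, C, X, W))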
  • A Scalable Optimization Approach for Fitting Canonical Tensor Decompositions, Acar, E., Dunlavy, D.M. & Kolda, T.G.. Journal of Chemometrics Vol. 25(2), pp. 67-86., February 2011. [Abstract] [BibTeX] [DOI] [PDF]
Abstract: Tensor decompositions are higher-order analogues of matrix decompositions and have proven to be powerful tools for data analysis. In particular, we are interested in the canonical tensor decomposition, otherwise known as CANDECOMP/PARAFAC (CP), which expresses a tensor as the sum of component rank-one tensors and is used in a multitude of applications such as chemometrics, signal processing, neuroscience, and web analysis. The task of computing CP, however, can be difficult. The typical approach is based on alternating least squares (ALS) optimization, but it is not accurate in the case of overfactoring. High accuracy can be obtained by using nonlinear least squares (NLS) methods; the disadvantage is that NLS methods are much slower than ALS. In this paper, we propose the use of gradient-based optimization methods. We discuss the mathematical calculation of the derivatives and show that they can be computed efficiently, at the same cost as one iteration of ALS. Computational experiments demonstrate that the gradient-based optimization methods are more accurate than ALS and faster than NLS.
BibTeX:
@article{AcDuKo11,
  author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda},
  title = {A Scalable Optimization Approach for Fitting Canonical Tensor Decompositions},
  journal = {Journal of Chemometrics},
  year = {2011},
  volume = {25},
  number = {2},
  pages = {67--86},
  doi = {http://dx.doi.org/10.1002/cem.1335}
}
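Illustrative sketch:
The derivative calculation referred to in the abstract is the standard CP gradient. For a third-order tensor and fit function

  f(A, B, C) = (1/2) \| \mathcal{X} - \sum_{r=1}^{R} a_r \circ b_r \circ c_r \|_F^2,

the mode-1 gradient has the well-known closed form

  \partial f / \partial A = - X_{(1)} (C \odot B) + A [ (C^{\mathsf T} C) \ast (B^{\mathsf T} B) ],

where X_{(1)} is the mode-1 unfolding, \odot the Khatri-Rao product, and \ast the Hadamard product; the gradients with respect to B and C follow by symmetry. Evaluating these costs roughly one ALS iteration, which matches the cost statement in the abstract.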
  • QCS: A System for Querying, Clustering and Summarizing Documents, Dunlavy, D.M., O'Leary, D.P., Conroy, J.M. & Schlesinger, J.D.. Information Processing & Management Vol. 43(6), pp. 1588-1605. 2007. [Abstract] [BibTeX] [DOI] [PDF]
Abstract: Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence trimming and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
BibTeX:
@article{DuOlCoSc07,
  author = {Daniel M. Dunlavy and Dianne P. O'Leary and John M. Conroy and Judith D. Schlesinger},
  title = {QCS: A System for Querying, Clustering and Summarizing Documents},
  journal = {Information Processing & Management},
  year = {2007},
  volume = {43},
  number = {6},
  pages = {1588--1605},
  note = {Text Summarization},
  doi = {http://dx.doi.org/10.1016/j.ipm.2007.01.003}
}
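Illustrative sketch:
A rough, hypothetical sketch of the first two QCS stages named in the abstract (Latent Semantic Indexing for retrieval, spherical k-means for clustering), written with scikit-learn rather than the QCS components themselves; the HMM sentence scoring and pivoted QR extract selection are not shown.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

docs = ["query clustering summarization pipeline",
        "latent semantic indexing for document retrieval",
        "spherical k-means groups the retrieved documents",
        "hidden markov models score candidate sentences"]

# Latent Semantic Indexing: TF-IDF weighting followed by a truncated SVD.
X = TfidfVectorizer().fit_transform(docs)
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Spherical k-means approximated by k-means on unit-normalized LSI vectors
# (cosine similarity is then monotone in Euclidean distance).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(normalize(Z))
print(labels)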
  • HOPE: A Homotopy Optimization Method for Protein Structure Prediction, Dunlavy, D.M., O'Leary, D.P., Klimov, D. & Thirumalai, D.. Journal of Computational Biology Vol. 12(10), pp. 1275-1288. 2005. [Abstract] [BibTeX] [DOI] [URL] [PDF]
Abstract: We use a homotopy optimization method, HOPE, to minimize the potential energy associated with a protein model. The method uses the minimum energy conformation of one protein as a template to predict the lowest energy structure of a query sequence. This objective is achieved by following a path of conformations determined by a homotopy between the potential energy functions for the two proteins. Ensembles of solutions are produced by perturbing conformations along the path, increasing the likelihood of predicting correct structures. Successful results are presented for pairs of homologous proteins, where HOPE is compared to a variant of Newton's method and to simulated annealing.
BibTeX:
@article{DuOlKlTh05,
  author = {Dunlavy, Daniel M. and O'Leary, Dianne P. and Klimov, Dmitri and Thirumalai, D.},
  title = {HOPE: A Homotopy Optimization Method for Protein Structure Prediction},
  journal = {Journal of Computational Biology},
  year = {2005},
  volume = {12},
  number = {10},
  pages = {1275--1288},
  note = {PMID: 16379534},
  url = {http://www.liebertonline.com/doi/abs/10.1089/cmb.2005.12.1275},
  doi = {http://dx.doi.org/10.1089/cmb.2005.12.1275}
}
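Illustrative sketch:
A minimal sketch of the homotopy idea described in the abstract, assuming a convex-combination homotopy between the template and target energy functions and using SciPy's BFGS in place of the solvers studied in the paper; HOPE's ensemble of perturbed points along the path is only indicated in the comments.

import numpy as np
from scipy.optimize import minimize

def hom(e_template, e_target, x0, n_steps=10):
    # Minimize H(x, lam) = (1 - lam)*e_template(x) + lam*e_target(x) for
    # lam = 0, 1/n, ..., 1, warm-starting each solve from the previous
    # minimizer.  HOPE additionally perturbs points along the path to
    # produce an ensemble of candidate minimizers (not done here).
    x = np.asarray(x0, dtype=float)
    for lam in np.linspace(0.0, 1.0, n_steps + 1):
        H = lambda x, lam=lam: (1 - lam) * e_template(x) + lam * e_target(x)
        x = minimize(H, x, method="BFGS").x
    return x

# Toy "energies": a smooth template and a wigglier target
e_template = lambda x: np.sum((x - 1.0) ** 2)
e_target = lambda x: np.sum((x + 2.0) ** 2) + np.sin(5 * x).sum()
print(hom(e_template, e_target, x0=np.zeros(3)))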
  • Structure Preserving Algorithms for Perplectic Eigenproblems, Mackey, D.S., Mackey, N. & Dunlavy, D.M.. Electronic Journal of Linear Algebra Vol. 13, pp. 10-39., February 2005. [Abstract] [BibTeX] [URL] [PDF]
Abstract: Structured real canonical forms for matrices in R^{n x n} that are symmetric or skew-symmetric about the anti-diagonal as well as the main diagonal are presented, and Jacobi algorithms for solving the complete eigenproblem for three of these four classes of matrices are developed. Based on the direct solution of 4 x 4 subproblems constructed via quaternions, the algorithms calculate structured orthogonal bases for the invariant subspaces of the associated matrix. In addition to preserving structure, these methods are inherently parallelizable, numerically stable, and show asymptotic quadratic convergence.
BibTeX:
@article{MaMaDu05,
  author = {D. Steven Mackey and Niloufer Mackey and Daniel M. Dunlavy},
  title = {Structure Preserving Algorithms for Perplectic Eigenproblems},
  journal = {Electronic Journal of Linear Algebra},
  year = {2005},
  volume = {13},
  pages = {10-39},
  note = {Supplemental media available at http://www.math.technion.ac.il/iic/ela/ela-articles/articles/media/perplectic.html.},
  url = {http://www.math.technion.ac.il/iic/ela/ela-articles/articles/media/perplectic.html}
}
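Illustrative sketch:
The structure class in the abstract consists of matrices symmetric (or skew-symmetric) about the anti-diagonal as well as the main diagonal. With the exchange matrix J (ones on the anti-diagonal), symmetry about the anti-diagonal is the condition A = J A^T J. A small NumPy check of that structure (not the Jacobi algorithms themselves) is given below.

import numpy as np

n = 5
J = np.fliplr(np.eye(n))                 # exchange (flip) matrix

# Build a matrix symmetric about both the main and the anti-diagonal.
rng = np.random.default_rng(0)
M = rng.normal(size=(n, n))
A = M + M.T                              # symmetric about the main diagonal
A = 0.5 * (A + J @ A.T @ J)              # symmetrize about the anti-diagonal too

assert np.allclose(A, A.T)               # symmetric
assert np.allclose(A, J @ A.T @ J)       # persymmetric (anti-diagonal symmetry)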

Refereed Conference and Workshop Proceedings

  • Using NoSQL Databases for Streaming Network Analysis, Wylie, B., Dunlavy, D., Davis IV, W. & Baumes, J., In Proceedings of the IEEE Symposium on Large Scale Data Analysis and Visualization (LDAV), 2012. [Abstract] [BibTeX] [PDF]
Abstract: The high-volume, low-latency world of network traffic presents significant obstacles for complex analysis techniques. The unique challenge of adapting powerful but high-latency models to realtime network streams is the basis of our cyber security project. In this paper we discuss our use of NoSQL databases in a framework that enables the application of computationally expensive models against a realtime network data stream. We describe how this approach transforms the highly constrained (and sometimes arcane) world of realtime network analysis into a more developer friendly model that relaxes many of the traditional constraints associated with streaming data.
BibTeX:
@conference{WyDuDaBa12,
  author = {Brian Wylie and Daniel Dunlavy and Warren Davis IV and Jeff Baumes},
  title = {Using NoSQL Databases for Streaming Network Analysis},
  booktitle = {Proceedings of the IEEE Symposium on Large Scale Data Analysis and Visualization (LDAV)},
  year = {2012}
}
  • TopicView: Visually Comparing Topic Models of Text Collections, Crossno, P.J., Wilson, A.T., Shead, T.M. & Dunlavy, D.M., In Proceedings of the 2011 IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Special Session on Text and Web Mining (TWM), 2011. [Abstract] [BibTeX] [PDF]
Abstract: We present TopicView, an application for visually comparing and exploring multiple models of text corpora. TopicView uses multiple linked views to visually analyze both the conceptual content and the document relationships in models generated using different algorithms. To illustrate TopicView, we apply it to models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full text reader for documents selected in either of the graphs or the table. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of two example corpora.
BibTeX:
@conference{CrWiShDu11,
  author = {Patricia J. Crossno and Andrew T. Wilson and Timothy M. Shead and Daniel M. Dunlavy},
  title = {TopicView: Visually Comparing Topic Models of Text Collections},
  booktitle = {Proceedings of the 2011 IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Special Session on Text and Web Mining (TWM)},
  year = {2011}
}
  • TopicView: Understanding Document Relationships Using Latent Dirichlet Allocation Models, Crossno, P.J., Wilson, A.T., Dunlavy, D.M. & Shead, T.M., In Proceedings of the IEEE Workshop on Interactive Visual Text Analytics for Decision Making, 2011. [Abstract] [BibTeX] [PDF]
Abstract: Document similarity graphs are a useful visual metaphor for assessing the conceptual content of a corpus. Algorithms such as Latent Dirichlet Allocation (LDA) provide a means for constructing such graphs by extracting topics and their associated term lists, which can be converted into similarity measures. Given that users' understanding of the corpus content (and therefore their decision-making) depends upon the outputs provided by LDA as well as how those outputs are translated into a visual representation, an examination of how the LDA algorithm behaves and an understanding of the impact of this behavior on the final visualization is critical. We examine some puzzling relationships between documents with seemingly disparate topics that are linked in LDA graphs. We use TopicView, a visual analytics tool, to uncover the source of these unexpected connections.
BibTeX:
@conference{CrWiDuSh11,
  author = {Patricia J. Crossno and Andrew T. Wilson and Daniel M. Dunlavy and Timothy M. Shead},
  title = {TopicView: Understanding Document Relationships Using Latent Dirichlet Allocation Models},
  booktitle = {Proceedings of the IEEE Workshop on Interactive Visual Text Analytics for Decision Making},
  year = {2011}
}
  • All-at-once Optimization for Coupled Matrix and Tensor Factorizations, Acar, E., Kolda, T.G. & Dunlavy, D.M., In Proceedings of Mining and Learning with Graphs (MLG), 2011. [Abstract] [BibTeX] [PDF]
Abstract: Joint analysis of data from multiple sources has the potential to improve our understanding of the underlying structures in complex data sets. For instance, in restaurant recommendation systems, recommendations can be based on rating histories of customers. In addition to rating histories, customers' social networks (e.g., Facebook friendships) and restaurant categories information (e.g., Thai or Italian) can also be used to make better recommendations. The task of fusing data, however, is challenging since data sets can be incomplete and heterogeneous, i.e., data consist of both matrices, e.g., the person by person social network matrix or the restaurant by category matrix, and higher-order tensors, e.g., the ratings tensor of the form restaurant by meal by person.

In this paper, we are particularly interested in fusing data sets with the goal of capturing their underlying latent structures. We formulate this problem as a coupled matrix and tensor factorization (CMTF) problem where heterogeneous data sets are modeled by fitting outer-product models to higher-order tensors and matrices in a coupled manner. Unlike traditional approaches solving this problem using alternating algorithms, we propose an all-at-once optimization approach called CMTF-OPT (CMTF-OPTimization), which is a gradient-based optimization approach for joint analysis of matrices and higher-order tensors. We also extend the algorithm to handle coupled incomplete data sets. Using numerical experiments, we demonstrate that the proposed all-at-once approach is more accurate than the alternating least squares approach.

BibTeX:
@conference{AcKoDu11,
  author = {Evrim Acar and Tamara G. Kolda and Daniel M. Dunlavy},
  title = {All-at-once Optimization for Coupled Matrix and Tensor Factorizations},
  booktitle = {Proceedings of Mining and Learning with Graphs (MLG)},
  year = {2011}
}
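Illustrative sketch:
A schematic form of the coupled objective sketched in the abstract: a third-order tensor X and a matrix Y share the factor A in their common mode, and both residuals are minimized at once. Names are hypothetical, and the actual CMTF-OPT algorithm additionally supplies gradients and handles incomplete data.

import numpy as np

def cmtf_objective(A, B, C, V, X, Y):
    # X modeled by the CP tensor with factors (A, B, C); Y modeled by A @ V.T.
    # The shared factor A couples the two data sets.
    Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)
    return 0.5 * np.sum((X - Xhat) ** 2) + 0.5 * np.sum((Y - A @ V.T) ** 2)

# Toy coupled data generated from shared factors (objective is ~0 there)
rng = np.random.default_rng(0)
I, J, K, N, R = 10, 8, 6, 12, 2
A, B, C, V = (rng.normal(size=(n, R)) for n in (I, J, K, N))
X = np.einsum('ir,jr,kr->ijk', A, B, C)
Y = A @ V.T
print(cmtf_objective(A, B, C, V, X, Y))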
  • ParaText: Scalable Text Modeling and Analysis, Dunlavy, D.M., Shead, T.M. & Stanton, E.T., In Proceedings of the 19th International ACM Symposium on High Performance Distributed Computing, Chicago, IL, USA, pp. 344-347, June 23-25, 2010. [Abstract] [BibTeX] [PDF]
Abstract: Automated analysis of unstructured text documents (e.g., web pages, newswire articles, research publications, business reports) is a key capability for solving important problems in areas including decision making, risk assessment, social network analysis, intelligence analysis, scholarly research and others. However, as data sizes continue to grow in these areas, scalable processing, modeling, and semantic analysis of text collections becomes essential. In this paper, we present the ParaText text analysis engine, a distributed memory software framework for processing, modeling, and analyzing collections of unstructured text documents. Results on several document collections using hundreds of processors are presented to illustrate the flexibility, extensibility, and scalability of the entire process of text modeling from raw data ingestion to application analysis.
BibTeX:
@conference{DuShSt10,
  author = {Daniel M. Dunlavy and Timothy M. Shead and Eric T. Stanton},
  title = {ParaText: Scalable Text Modeling and Analysis},
  booktitle = {Proceedings of the 19th International ACM Symposium on High Performance Distributed Computing},
  year = {2010},
  pages = {344--347},
  note = {(34% acceptance rate)}
}
  • Scalable Tensor Factorizations with Missing Data, Acar, E., Dunlavy, D.M., Kolda, T.G. & Morup, M., In Proceedings of the 2010 SIAM International Conference on Data Mining, Columbus, OH, USA, April 2010. [Abstract] [BibTeX] [URL] [PDF]
Abstract: The problem of missing data is ubiquitous in domains such as biomedical signal processing, network trace analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networks---all domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multi-way arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) using a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes.
BibTeX:
@conference{AcDuKoMo10,
  author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda and Morten Morup},
  title = {Scalable Tensor Factorizations with Missing Data},
  booktitle = {Proceedings of the 2010 SIAM International Conference on Data Mining},
  year = {2010},
  note = {(23% acceptance rate)},
  url = {http://www.siam.org/proceedings/datamining/2010/dm10_061_acare.pdf}
}
  • Link Prediction on Evolving Data using Matrix and Tensor Factorizations, Acar, E., Dunlavy, D.M. & Kolda, T.G., In Proceedings of the Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2009), Miami, FL, USA, pp. 262-269, December 2009. [Abstract] [BibTeX] [DOI] [PDF]
Abstract: The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for time periods 1 through T, can we predict the links in time period T+1? Specifically, we look at bipartite graphs changing over time and consider matrix- and tensor-based methods for predicting links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem.
BibTeX:
@conference{AcDuKo09,
  author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda},
  title = {Link Prediction on Evolving Data using Matrix and Tensor Factorizations},
  booktitle = {Proceedings of the Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2009)},
  year = {2009},
  pages = {262--269},
  note = {(26% acceptance rate)},
  doi = {http://dx.doi.org/10.1109/ICDMW.2009.54}
}
  • LSAView: A Tool for Visual Exploration of Latent Semantic Modeling, Crossno, P., Dunlavy, D. & Shead, T., In IEEE Symposium on Visual Analytics Science and Technology, Atlantic City, NJ, October 2009. [Abstract] [BibTeX] [DOI] [PDF]
Abstract: Latent Semantic Analysis (LSA) is a commonly-used method for automated processing, modeling, and analysis of unstructured text data. One of the biggest challenges in using LSA is determining the appropriate model parameters to use for different data domains and types of analyses. Although automated methods have been developed to make rank and scaling parameter choices, these approaches often make choices with respect to noise in the data, without an understanding of how those choices impact analysis and problem solving. Further, no tools currently exist to explore the relationships between an LSA model and analysis methods. Our work focuses on how parameter choices impact analysis and problem solving. In this paper, we present LSAView, a system for interactively exploring parameter choices for LSA models. We illustrate the use of LSAView's small multiple views, linked matrix-graph views, and data views to analyze parameter selection and application in the context of graph layout and clustering.
BibTeX:
@conference{CrDuSh09,
  author = {P.J. Crossno and D.M. Dunlavy and T.M. Shead},
  title = {LSAView: A Tool for Visual Exploration of Latent Semantic Modeling},
  booktitle = {IEEE Symposium on Visual Analytics Science and Technology},
  year = {2009},
  doi = {http://dx.doi.org/10.1109/VAST.2009.5333428}
}
  • Formulations for Surrogate-Based Optimization with Data Fit, Multifidelity, and Reduced-Order Models, Eldred, M.S. & Dunlavy, D.M., In Proceedings of the 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, (AIAA-2006-7117), September 2006. [Abstract] [BibTeX] [PDF]
Abstract: Surrogate-based optimization (SBO) methods have become established as effective techniques for engineering design problems through their ability to tame nonsmoothness and reduce computational expense. Possible surrogate modeling techniques include data fits (local, multipoint, or global), multifidelity model hierarchies, and reduced-order models, and each of these types has unique features when employed within SBO. This paper explores a number of SBO algorithmic variations and their effect for different surrogate modeling cases. First, general facilities for constraint management are explored through approximate subproblem formulations (e.g., direct surrogate), constraint relaxation techniques (e.g., homotopy), merit function selections (e.g., augmented Lagrangian), and iterate acceptance logic selections (e.g., filter methods). Second, techniques specialized to particular surrogate types are described. Computational results are presented for sets of algebraic test problems and an engineering design application solved using the DAKOTA software.
BibTeX:
@conference{ElDu06,
  author = {Michael S. Eldred and Daniel M. Dunlavy},
  title = {Formulations for Surrogate-Based Optimization with Data Fit, Multifidelity, and Reduced-Order Models},
  booktitle = {Proceedings of the 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference},
  year = {2006},
  number = {AIAA-2006-7117},
  note = {Refereed conference paper}
}
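Illustrative sketch:
For orientation, a basic data-fit SBO loop alternates three steps: fit a local surrogate to samples of the true function, minimize the surrogate within a trust region, and accept the step and update the region based on the ratio of actual to predicted improvement. The sketch below is a generic, unconstrained version of that loop with hypothetical details (diagonal quadratic surrogate, box trust region); it is not DAKOTA's implementation, which adds the constraint-management, merit-function, and filter options discussed in the paper.

import numpy as np
from scipy.optimize import minimize

def sbo(f_true, x0, radius=1.0, iters=20):
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(iters):
        # Sample the true function at the center and at coordinate offsets.
        pts = [x] + [x + radius * e for e in np.eye(n)] + [x - radius * e for e in np.eye(n)]
        vals = np.array([f_true(p) for p in pts])
        # Fit a quadratic surrogate with diagonal curvature by least squares.
        S = np.array([p - x for p in pts])
        Phi = np.hstack([np.ones((len(pts), 1)), S, 0.5 * S ** 2])
        coef, *_ = np.linalg.lstsq(Phi, vals, rcond=None)
        c, g, h = coef[0], coef[1:1 + n], coef[1 + n:]
        surr = lambda s, c=c, g=g, h=h: c + g @ s + 0.5 * h @ (s ** 2)
        # Minimize the surrogate inside the (box-shaped) trust region.
        s = minimize(surr, np.zeros(n), bounds=[(-radius, radius)] * n).x
        # Accept/reject and resize the region via actual vs. predicted improvement.
        pred = vals[0] - surr(s)
        actual = vals[0] - f_true(x + s)
        rho = actual / pred if pred > 0 else -1.0
        if rho > 0:
            x = x + s
        radius *= 2.0 if rho > 0.75 else (0.5 if rho < 0.25 else 1.0)
    return x

print(sbo(lambda x: np.sum((x - 1.5) ** 2) + 0.1 * np.sin(10 * x).sum(), x0=np.zeros(2)))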
  • From TREC to DUC to TREC Again, Conroy, J.M., O'Leary, D.P. & Dunlavy, D.M., In Proceedings of the Twelfth Text Retrieval Conference (TREC), November 2003. [Abstract] [BibTeX] [URL] [PDF]
Abstract: The Document Understanding Conference (DUC) uses TREC data as a test bed for algorithms for single and multiple document summarization. For the 2003 DUC task of choosing relevant and novel sentences, we tested a system based on a Hidden Markov Model (HMM). In this work, we use variations of this system on the tasks of the TREC Novelty Track for finding relevant and new sentences.
BibTeX:
@conference{CoOlDu03,
  author = {John M. Conroy and Dianne P. O'Leary and Daniel M. Dunlavy},
  title = {From TREC to DUC to TREC Again},
  booktitle = {Proceedings of the Twelfth Text Retrieval Conference (TREC)},
  year = {2003},
  url = {http://trec.nist.gov/pubs/trec12/papers/ccs.novelty.pdf}
}
  • Performance of a Three-Stage System for Multi-Document Summarization, Dunlavy, D.M., Conroy, J.M., Schlesinger, J.D., Goodman, S.A., Okurowski, M.E., O'Leary, D.P. & van Halteren, H., In Proceedings of the Document Understanding Conference (DUC), June 2003. [Abstract] [BibTeX] [URL] [PDF]
Abstract: Our participation in DUC 2003 was limited to Tasks 2, 3, and 4. Although the tasks differed slightly in their goals, we applied the same approach in each case: preprocess the data for input to our system, apply our single-document and multi-document summarization algorithms, post-process the data for DUC evaluation. We did not use the topic descriptions for Task 2 or the viewpoint descriptions for Task 3, and used only the novel sentences for Task 4. The preprocessing of the data for our needs consisted of term identification, part-of-speech (POS) tagging, sentence boundary detection and SGML DTD processing. With the exception of sentence boundary detection for Task 4 (the test data was sentence-delimited using SGML tags), each of these preprocessing tasks was performed on all of the documents. The summarization algorithms were enhanced versions of those presented by members of our group in the past DUC evaluations (Conroy et al., 2001; Schlesinger et al., 2002). Previous post-processing consisted of removing lead adverbs such as "And" or "But" to make our summaries flow more easily. For DUC 2003, we added more extensive editing, eliminating part or all of selected sentences.
BibTeX:
@conference{DuCoScGo03,
  author = {Daniel M. Dunlavy and John M. Conroy and Judith D. Schlesinger and Sarah A. Goodman and Mary Ellen Okurowski and Dianne P. O'Leary and Han van Halteren},
  title = {Performance of a Three-Stage System for Multi-Document Summarization},
  booktitle = {Proceedings of the Document Understanding Conference (DUC)},
  year = {2003},
  url = {http://duc.nist.gov/pubs/2003final.papers/schlesinger.nsa.ps}
}

Other Conference and Workshop Proceedings

  • CPOPT: Optimization for Fitting CANDECOMP/PARAFAC Models, Acar, E., Kolda, T.G. & Dunlavy, D.M., In CASTA 2008: Workshop on Computational Algebraic Statistics, Theories and Applications, December 2008. [Abstract] [BibTeX] [PDF]
Abstract: Tensor decompositions (e.g., higher-order analogues of matrix decompositions) are powerful tools for data analysis. In particular, the CANDECOMP/PARAFAC (CP) model has proved useful in many applications such as chemometrics, signal processing, and web analysis. The problem of computing the CP decomposition is typically solved using an alternating least squares (ALS) approach. We discuss the use of optimization-based algorithms for CP, including how to efficiently compute the derivatives necessary for the optimization methods. Numerical studies highlight the positive features of our CPOPT algorithms, as compared with ALS and Gauss-Newton approaches.
BibTeX:
@conference{AcKoDu08,
  author = {Evrim Acar and Tamara G. Kolda and Daniel M. Dunlavy},
  title = {CPOPT: Optimization for Fitting CANDECOMP/PARAFAC Models},
  booktitle = {CASTA 2008: Workshop on Computational Algebraic Statistics, Theories and Applications},
  year = {2008}
}
  • QCS: A Tool for Querying, Clustering, and Summarizing Documents, Dunlavy, D.M., Conroy, J.M. & O'Leary, D.P., In Proceedings of the HLT-NAACL Conference, June 2003. [Abstract] [BibTeX]
Abstract: The QCS information retrieval (IR) system is presented as a tool for querying, clustering, and summarizing document sets. QCS has been developed as a modular development framework, and thus facilitates the inclusion of new technologies targeting these three IR tasks. Details of the system architecture, the QCS interface, and preliminary results are presented.
BibTeX:
@conference{DuCoOl03,
  author = {Daniel M. Dunlavy and John M. Conroy and Dianne P. O'Leary},
  title = {QCS: A Tool for Querying, Clustering, and Summarizing Documents},
  booktitle = {Proceedings of the HLT-NAACL Conference},
  year = {2003}
}

Technical Reports

  • ParaText - Scalable Solutions for Processing and Searching Very Large Document Collections: Final LDRD Report, Dunlavy, D.M., Shead, T.M., Crossno, P.J. & Stanton, E.T.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2010-6269, September 2010. [BibTeX] [PDF]

BibTeX:
@techreport{SAND2010-6269,
  author = {Daniel M. Dunlavy and Timothy M. Shead and Patricia J. Crossno and Eric T. Stanton},
  title = {ParaText - Scalable Solutions for Processing and Searching Very Large Document Collections: Final LDRD Report},
  year = {2010},
  number = {SAND2010-6269}
}
  • Poblano v1.0: A Matlab Toolbox for Gradient-Based Optimization, Dunlavy, D.M., Kolda, T.G. & Acar, E.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2010-1422, March 2010. [Abstract] [BibTeX] [PDF]

Abstract: We present Poblano v1.0, a Matlab toolbox for solving gradient-based unconstrained optimization problems. Poblano implements three optimization methods (nonlinear conjugate gradients, limited-memory BFGS, and truncated Newton) that require only first order derivative information. In this paper, we describe the Poblano methods, provide numerous examples on how to use Poblano, and present results of Poblano used in solving problems from a standard test collection of unconstrained optimization problems.
BibTeX:
@techreport{SAND2010-1422,
  author = {Daniel M. Dunlavy and Tamara G. Kolda and Evrim Acar},
  title = {Poblano v1.0: A Matlab Toolbox for Gradient-Based Optimization},
  year = {2010},
  number = {SAND2010-1422}
}
  • Scalable Tensor Factorizations with Missing Data, Acar, E., Dunlavy, D.M., Kolda, T.G. & Morup, M.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-6764, October 2009. [Abstract] [BibTeX] [PDF]

Abstract: The problem of missing data is ubiquitous in domains such as biomedical signal processing, network trace analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networks---all domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multi-way arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) using a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes.
BibTeX:
@techreport{SAND2009-6764,
  author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda and Morten Morup},
  title = {Scalable Tensor Factorizations with Missing Data},
  year = {2009},
  number = {SAND2009-6764}
}
  • Relationships Between Accuracy and Diversity in Heterogeneous Ensemble Classifiers, Gilpin, S.A. & Dunlavy, D.M.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-6940C 2009. [Abstract] [BibTeX] [PDF]

Abstract: The relationship between ensemble classifier performance and the diversity of the predictions made by ensemble base classifiers is explored in the context of heterogeneous ensemble classifiers. Specifically, numerical studies indicate that heterogeneous ensembles can be generated from base classifiers of homogeneous ensemble classifiers that are both significantly more accurate and diverse than the base classifiers. Results for experiments using several standard diversity measures on a variety of binary and multiclass classification problems are presented to illustrate the improved performance.
BibTeX:
@techreport{SAND2009-6940C,
  author = {Sean A. Gilpin and Daniel M. Dunlavy},
  title = {Relationships Between Accuracy and Diversity in Heterogeneous Ensemble Classifiers},
  year = {2009},
  number = {SAND2009-6940C}
}
  • Semisupervised Named Entity Recognition, Turpen, T.P. & Dunlavy, D.M., In The Computer Science Research Institute Summer Proceedings, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, 2009. [BibTeX] [PDF]
BibTeX:
@incollection{SAND2010-3083P,
  author = {Taylor P. Turpen and Daniel M. Dunlavy},
  title = {Semisupervised Named Entity Recognition},
  booktitle = {The Computer Science Research Institute Summer Proceedings},
  publisher = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
  year = {2009}
}
  • An Optimization Approach for Fitting Canonical Tensor Decompositions, Acar, E., Kolda, T.G. & Dunlavy, D.M.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-0857, February 2009. [Abstract] [BibTeX] [PDF]

Abstract: Tensor decompositions are higher-order analogues of matrix decompositions and have proven to be powerful tools for data analysis. In particular, we are interested in the canonical tensor decomposition, otherwise known as the CANDECOMP/PARAFAC decomposition (CPD), which expresses a tensor as the sum of component rank-one tensors and is used in a multitude of applications such as chemometrics, signal processing, neuroscience, and web analysis. The task of computing the CPD, however, can be difficult. The typical approach is based on alternating least squares (ALS) optimization, which can be remarkably fast but is not very accurate. Previously, nonlinear least squares (NLS) methods have also been recommended; existing NLS methods are accurate but slow. In this paper, we propose the use of gradient-based optimization methods. We discuss the mathematical calculation of the derivatives and further show that they can be computed efficiently, at the same cost as one iteration of ALS. Computational experiments demonstrate that the gradient-based optimization methods are much more accurate than ALS and orders of magnitude faster than NLS.
BibTeX:
@techreport{AcKoDu09,
  author = {Evrim Acar and Tamara G. Kolda and Daniel M. Dunlavy},
  title = {An Optimization Approach for Fitting Canonical Tensor Decompositions},
  year = {2009},
  number = {SAND2009-0857}
}
  • Heterogeneous Ensemble Classification, Gilpin, S.A. & Dunlavy, D.M.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-0203P, January 2009. [Abstract] [BibTeX] [URL] [PDF]

Abstract: The problem of multi-class classification is explored using heterogeneous ensemble classifiers. Heterogeneous ensemble classifiers are defined as ensembles, or sets, of classifier models created using more than one type of classification algorithm. For example, the outputs of decision tree classifiers could be combined with the outputs of support vector machines (SVM) to create a heterogeneous ensemble. We explore how, when, and why heterogeneous ensembles should be used over other classification methods. Specifically we look into the use of bagging and different fusion methods for heterogeneous and homogeneous ensembles. We also introduce the Hemlock framework, a software tool for creating and testing heterogeneous ensembles.
BibTeX:
@techreport{SAND2009-0203P,
  author = {Sean A. Gilpin and Daniel M. Dunlavy},
  title = {Heterogeneous Ensemble Classification},
  booktitle = {CSRI Summer Proceedings 2008, Technical Report SAND2007-7977, Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
  year = {2009},
  number = {SAND2009-0203P},
  url = {http://www.cs.sandia.gov/CSRI/Proceedings/}
}
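Illustrative sketch:
A hedged, scikit-learn-based sketch of the idea described in the abstract (not the Hemlock framework itself): a heterogeneous ensemble that fuses bagged decision trees, an SVM, and a naive Bayes model with simple majority-vote fusion.

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Heterogeneous ensemble: different base model types fused by majority vote.
ensemble = VotingClassifier(
    estimators=[
        ("bagged_trees", BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)),
        ("svm", SVC(random_state=0)),
        ("naive_bayes", GaussianNB()),
    ],
    voting="hard",
)
print(cross_val_score(ensemble, X, y, cv=5).mean())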
  • Trilinos CMake Evaluation, Bartlett, R.A., Dunlavy, D.M., Guillen, E.J. & Shead, T.M.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2008-7593, October 2008. [Abstract] [BibTeX] [PDF]

Abstract: Autotools is terrible. We need something better. How about CMake? Here we document our evaluation of CMake as a build system and testing infrastructure to replace the current autotools-based system.
BibTeX:
@techreport{SAND2008-7593,
  author = {Ross A. Bartlett and Daniel M. Dunlavy and Esteban J. Guillen and Timothy M. Shead},
  title = {Trilinos CMake Evaluation},
  year = {2008},
  number = {SAND2008-7593}
}
  • Heterogeneous Ensemble Classification, Dunlavy, D.M. & Gilpin, S.A., In Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis, (SAND2008-6109), pp. 33-35., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, September 2008. [Abstract] [BibTeX] [PDF]
Abstract: Recent results in solving classification problems indicate that the use of ensemble classifier models often leads to improved performance over using single classifier models. In this work, we discuss heterogeneous ensemble classifier models, where the member classifier models are not of the same model type. A discussion of the issues associated with creating such classifiers along with a brief description of the new HEterogeneous Machine Learning Open Classification Kit (HEMLOCK) will be presented. Results for a problem of text classification and several standard multi-class test problems illustrate the performance of heterogeneous ensemble classifiers.
BibTeX:
@incollection{SAND2008-6109,
  author = {Daniel M. Dunlavy and Sean A. Gilpin},
  title = {Heterogeneous Ensemble Classification},
  booktitle = {Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis},
  publisher = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
  year = {2008},
  number = {SAND2008-6109},
  pages = {33--35}
}
  • Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis, Brandt, J.M., Dunlavy, D.M. & Gentile, A.C.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2008-6109 2008. [Abstract] [BibTeX] [PDF]

Abstract: In this document, we report the proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis. This year's workshop focused on the data analysis capabilities and needs of the space systems, satellite, ground-based monitoring, and remote sensing communities. In addition to the extended abstracts of each presentation of the workshop, summaries of the discussion sessions and resultant recommendations of the workshop committee are given.
BibTeX:
@techreport{SAND2008-6109a,
  author = {James M. Brandt and Daniel M. Dunlavy and Ann C. Gentile},
  title = {Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis},
  year = {2008},
  number = {SAND2008-6109}
}
  • Yucca Mountain LSN Archive Assistant, Basilico, J.D., Dunlavy, D.M., Verzi, S.J., Bauer, T.L. & Shaneyfelt, W.. Sandia National Laboratories, Technical Report SAND2008-1622, March 2008. [Abstract] [BibTeX] [PDF]

Abstract: This report describes the Licensing Support Network (LSN) Assistant, a set of tools for categorizing e-mail messages and documents, and for investigating and correcting existing archives of categorized e-mail messages and documents. The two main tools in the LSN Assistant are the LSN Archive Assistant (LSNAA) tool, for re-categorizing manually labeled e-mail messages and documents, and the LSN Real-time Assistant (LSNRA) tool, for categorizing new e-mail messages and documents. This report focuses on the LSNAA tool.

There are two main components of the LSNAA tool. The first is the Sandia Categorizer Framework, which is responsible for providing categorizations for documents in an archive and storing them in an appropriate Categorization Database. The second is the actual user interface, which primarily interacts with the Categorization Database, providing a way for finding and correcting categorizations errors in the database.

A procedure for applying the LSNAA tool and an example use case of the LSNAA tool applied to a set of e-mail messages are provided. Performance results of the categorization model designed for this example use case are presented.

BibTeX:
@techreport{SAND2008-1622,
  author = {Justin D. Basilico and Daniel M. Dunlavy and Stephen J. Verzi and Travis L. Bauer and Wendy Shaneyfelt},
  title = {Yucca Mountain LSN Archive Assistant},
  year = {2008},
  number = {SAND2008-1622}
}
  • QCS: A System for Querying, Clustering and Summarizing Documents, Dunlavy, D.M., O'Leary, D.P., Conroy, J.M. & Schlesinger, J.D.. Sandia National Laboratories, Technical Report SAND2006-5000, October 2006. [Abstract] [BibTeX] [PDF]

Abstract: Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system---the Query, Cluster, Summarize (QCS) system---which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence trimming and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
BibTeX:
@techreport{SAND2006-5000,
  author = {Daniel M. Dunlavy and Dianne P. O'Leary and John M. Conroy and Judith D. Schlesinger},
  title = {QCS: A System for Querying, Clustering and Summarizing Documents},
  year = {2006},
  number = {Technical Report Number SAND2006-5000},
  note = {newer version available}
}
  • DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Users Manual, Eldred, M.S., Brown, S.L., Adams, B.M., Dunlavy, D.M., Gay, D.M., Swiler, L.P., Giunta, A.A., Hart, W.E., Watson, J.-P., Eddy, J.P., Griffin, J.D., Hough, P.D., Kolda, T.G., Martinez-Canales, M.L. & Williams, P.J.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2006-6637, October 2006. [Abstract] [BibTeX] [URL] [PDF]

Abstract: The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers.
BibTeX:
@techreport{SAND2006-6337,
  author = {Michael S. Eldred and Shannon L. Brown and Brian M. Adams and Daniel M. Dunlavy and David M. Gay and Laura P. Swiler and Anthony A. Giunta and William E. Hart and Jean-Paul Watson and John P. Eddy and Josh D. Griffin and Patty D. Hough and Tammy G. Kolda and Monica L. Martinez-Canales and Pamela J. Williams},
  title = {DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Users Manual},
  year = {2006},
  number = {SAND2006-6637},
  url = {http://www.cs.sandia.gov/DAKOTA/licensing/release/Users4.0.pdf}
}
  • DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Developers Manual, Eldred, M.S., Brown, S.L., Adams, B.M., Dunlavy, D.M., Gay, D.M., Swiler, L.P., Giunta, A.A., Hart, W.E., Watson, J.-P., Eddy, J.P., Griffin, J.D., Hough, P.D., Kolda, T.G., Martinez-Canales, M.L. & Williams, P.J.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2006-4056, September 2006. [Abstract] [BibTeX] [URL] [PDF]

Abstract: The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers.
BibTeX:
@techreport{SAND2006-4056,
  author = {Michael S. Eldred and Shannon L. Brown and Brian M. Adams and Daniel M. Dunlavy and David M. Gay and Laura P. Swiler and Anthony A. Giunta and William E. Hart and Jean-Paul Watson and John P. Eddy and Josh D. Griffin and Patty D. Hough and Tammy G. Kolda and Monica L. Martinez-Canales and Pamela J. Williams},
  title = {DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Developers Manual},
  year = {2006},
  number = {SAND2006-4056},
  url = {http://www.cs.sandia.gov/DAKOTA/licensing/release/Developers4.0.pdf}
}
  • DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Reference Manual, Eldred, M.S., Brown, S.L., Adams, B.M., Dunlavy, D.M., Gay, D.M., Swiler, L.P., Giunta, A.A., Hart, W.E., Watson, J.-P., Eddy, J.P., Griffin, J.D., Hough, P.D., Kolda, T.G., Martinez-Canales, M.L. & Williams, P.J.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2006-4055, September 2006. [Abstract] [BibTeX] [URL] [PDF]

Abstract: The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers.
BibTeX:
@techreport{SAND2006-4055,
  author = {Michael S. Eldred and Shannon L. Brown and Brian M. Adams and Daniel M. Dunlavy and David M. Gay and Laura P. Swiler and Anthony A. Giunta and William E. Hart and Jean-Paul Watson and John P. Eddy and Josh D. Griffin and Patty D. Hough and Tammy G. Kolda and Monica L. Martinez-Canales and Pamela J. Williams},
  title = {DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Reference Manual},
  year = {2006},
  number = {SAND2006-4055},
  url = {http://www.cs.sandia.gov/DAKOTA/licensing/release/Reference4.0.pdf}
}
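
Illustrative sketch (Python; not DAKOTA code): another capability class named in the abstract is parameter estimation with nonlinear least squares. The sketch below fits a two-parameter exponential-decay model to synthetic data with scipy.optimize.least_squares; the model, the true parameter values, and the noise level are invented for illustration.

# Minimal sketch (not DAKOTA itself): nonlinear least-squares parameter
# estimation, fitting an exponential-decay model to synthetic data.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 50)
true_params = np.array([2.0, 1.3])   # amplitude, decay rate (illustrative)
data = true_params[0] * np.exp(-true_params[1] * t) + 0.05 * rng.standard_normal(t.size)

def residuals(p):
    """Model-minus-data residual vector for parameters p = (amplitude, rate)."""
    return p[0] * np.exp(-p[1] * t) - data

fit = least_squares(residuals, x0=[1.0, 1.0])
print("estimated parameters:", fit.x)
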
  • Multilinear Algebra for Analyzing Data with Multiple Linkages, Dunlavy, D.M., Kolda, T.G. & Kegelmeyer, W.P.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2006-2079, April 2006. [Abstract] [BibTeX] [PDF]

Abstract: Link analysis typically focuses on a single type of connection, e.g., two journal papers are linked because they are written by the same author. However, often we want to analyze data that has multiple linkages between objects, e.g., two papers may have the same keywords and one may cite the other. The goal of this paper is to show that multilinear algebra provides a tool for multilink analysis. We analyze five years of publication data from journals published by the Society for Industrial and Applied Mathematics. We explore how papers can be grouped in the context of multiple link types using a tensor to represent all the links between them. A PARAFAC decomposition on the resulting tensor yields information similar to the SVD decomposition of a standard adjacency matrix. We show how the PARAFAC decomposition can be used to understand the structure of the document space and define paper-paper similarities based on multiple linkages. Examples are presented where the decomposed tensor data is used to find papers similar to a body of work (e.g., related by topic or similar to a particular author's papers), find related authors using linkages other than explicit co-authorship or citations, distinguish between papers written by different authors with the same name, and predict the journal in which a paper was published.
BibTeX:
@techreport{SAND2006-2079,
  author = {Daniel M. Dunlavy and Tamara G. Kolda and W. Philip Kegelmeyer},
  title = {Multilinear Algebra for Analyzing Data with Multiple Linkages},
  year = {2006},
  number = {SAND2006-2079}
}
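
Illustrative sketch (Python; not the authors' implementation): the report decomposes a three-way link tensor with PARAFAC. The sketch below runs a basic CP alternating-least-squares loop on a small random papers x papers x link-type tensor; the rank, iteration count, and data are arbitrary illustrative choices.

# Minimal sketch of a PARAFAC (CANDECOMP/PARAFAC) decomposition via
# alternating least squares on a small object x object x link-type tensor.
import numpy as np

def cp_als(X, rank, n_iters=100):
    """Return factors (A, B, C) with X[i,j,k] ~ sum_r A[i,r] * B[j,r] * C[k,r]."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    for _ in range(n_iters):
        # Each update solves the least-squares problem for one factor matrix
        # with the other two held fixed (normal equations via pseudoinverse).
        A = np.einsum("ijk,jr,kr->ir", X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum("ijk,ir,kr->jr", X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum("ijk,ir,jr->kr", X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

X = np.random.default_rng(1).random((30, 30, 4))   # e.g., papers x papers x link types
A, B, C = cp_als(X, rank=5)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
print("relative fit error:", np.linalg.norm(X - X_hat) / np.linalg.norm(X))

In the report, paper-paper similarities are then built from the decomposed tensor data; the sketch stops at the decomposition itself.
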
  • Homotopy Optimization Methods for Global Optimization, Dunlavy, D.M. & O'Leary, D.P.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2005-7495, December 2005. [Abstract] [BibTeX] [PDF]

Abstract: We define a new method for global optimization, the Homotopy Optimization Method (HOM). This method differs from previous homotopy and continuation methods in that its aim is to find a minimizer for each of a set of values of the homotopy parameter, rather than to follow a path of minimizers. We define a second method, called HOPE, by allowing HOM to follow an ensemble of points obtained by perturbation of previous ones. We relate this new method to standard methods such as simulated annealing and show under what circumstances it is superior. We present results of extensive numerical experiments demonstrating performance of HOM and HOPE.
BibTeX:
@techreport{SAND2005-7495,
  author = {Daniel M. Dunlavy and Dianne P. O'Leary},
  title = {Homotopy Optimization Methods for Global Optimization},
  year = {2005},
  number = {SAND2005-7495}
}
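
Illustrative sketch (Python; not the report's code): HOM, as described in the abstract, finds a minimizer at each of a set of homotopy-parameter values between an easy template function and the target, warm-starting each local minimization from the previous minimizer. The template, target, starting point, and step count below are illustrative choices only.

# Minimal sketch of the HOM idea: trace minimizers of
#   h(x, lam) = (1 - lam) * g(x) + lam * f(x)
# as lam goes 0 -> 1, warm-starting each local minimization from the
# previous step's minimizer.
import numpy as np
from scipy.optimize import minimize

def f(x):                          # nonconvex target (Rastrigin-type, illustrative)
    return np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)

def g(x):                          # easy convex template with a known minimizer
    return np.sum(x**2)

def hom(x0, steps=20):
    x = np.asarray(x0, dtype=float)
    for lam in np.linspace(0.0, 1.0, steps):
        h = lambda z, lam=lam: (1.0 - lam) * g(z) + lam * f(z)
        x = minimize(h, x, method="BFGS").x   # warm start from previous minimizer
    return x

print("HOM minimizer estimate:", hom(x0=[3.7, -2.2]))
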
  • Structure Preserving Algorithms for Perplectic Eigenproblems, Mackey, D.S., Mackey, N. & Dunlavy, D.M.. Manchester Centre for Computational Mathematics, Technical Report Numerical Analysis Report No. 427, May 2003. [Abstract] [BibTeX] [URL] [PDF]

Abstract: Structured real canonical forms for matrices in R^{n x n} that are symmetric or skew-symmetric about the anti-diagonal as well as the main diagonal are presented, and Jacobi algorithms for solving the complete eigenproblem for three of these four classes of matrices are developed. Based on the direct solution of 4 x 4 subproblems constructed via quaternions, the algorithms calculate structured orthogonal bases for the invariant subspaces of the associated matrix. In addition to preserving structure, these methods are inherently parallelizable, numerically stable, and show asymptotic quadratic convergence.
BibTeX:
@techreport{MaMaDu03,
  author = {D. Steven Mackey and Niloufer Mackey and Daniel M. Dunlavy},
  title = {Structure Preserving Algorithms for Perplectic Eigenproblems},
  year = {2003},
  number = {Numerical Analysis Report No. 427},
  note = {newer version available},
  url = {http://www.maths.man.ac.uk/~nareports/narep427.pdf}
}
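
Illustrative sketch (Python; not the report's algorithm): the code below only constructs a random real matrix that is symmetric about both the main diagonal and the anti-diagonal, verifies that structure, and checks numerically that its eigenvectors (for simple eigenvalues) are symmetric or skew-symmetric under the flip matrix. The quaternion-based Jacobi algorithms of the report are not reproduced here.

# Minimal illustration of the doubly structured matrix class the report
# treats: real matrices symmetric about both diagonals.
import numpy as np

n = 6
J = np.fliplr(np.eye(n))                     # flip (exchange) matrix
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
S = (M + M.T) / 2.0                          # symmetric part
A = (S + J @ S @ J) / 2.0                    # also symmetric about the anti-diagonal

assert np.allclose(A, A.T)                   # symmetric about the main diagonal
assert np.allclose(A, J @ A.T @ J)           # symmetric about the anti-diagonal

# For simple eigenvalues, eigenvectors of such matrices can be chosen so
# that J v = +v or J v = -v; check that numerically.
vals, vecs = np.linalg.eigh(A)
for v in vecs.T:
    s = np.sign(v @ (J @ v))                 # +1 or -1 (generically nonzero)
    print("J v = %+d v holds:" % int(s), np.allclose(J @ v, s * v, atol=1e-8))
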
  • Numerical Steady-State Solutions of Non-Linear DAE's Arising in RF Communication Circuit Design, Dunlavy, D., Joo, S., Lin, R., Marcia, R., Minut, A. & Sun, J.. Institute for Mathematics and its Applications, Technical Report IMA Preprint Series 1752-1, February 2001. [Abstract] [BibTeX] [URL] [PDF]

Abstract: Large systems of coupled non-linear differential algebraic equations (DAEs) arise naturally in application areas such as the design of radio-frequency integrated circuits. The steady-state response of a non-linear system to periodic or quasi-periodic stimulus is of primary interest to a designer because certain aspects of system performance are easier to characterize and verify in steady state; for example, noise, distortion, and blocking are best measured when a circuit is in this state. The system of equations generated in circuit design has the form f(v(t)) + d/dt q(v(t)) - b(t) = 0, where m is the number of circuit nodes excluding the reference, q(v(t)) is the m-vector of sums of capacitor charges at each node, f(v(t)) is the m-vector of sums of resistor currents at each node, b(t) is the m-vector of input currents, and v(t) is the m-vector of node voltages. Closed-form solutions to these DAEs are extremely difficult, if not impossible, to obtain because of the size of the problem and the complexity of the non-linear models, so computing the solutions numerically is a highly effective alternative.
BibTeX:
@techreport{DuJoLiMaMiSu01,
  author = {Danny Dunlavy and Sookhyung Joo and Runchang Lin and Roummel Marcia and Aurelia Minut and Jianzhong Sun},
  title = {Numerical Steady-State Solutions of Non-Linear DAE's Arising in RF Communication Circuit Design},
  year = {2001},
  number = {IMA Preprint Series 1752-1},
  url = {http://www.ima.umn.edu/preprints/feb01/1752-1.pdf}
}
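
Illustrative sketch (Python; not the preprint's method): one standard way to compute a periodic steady state numerically is a shooting method. The sketch below applies it to a simplified scalar circuit equation C dv/dt + g(v) = b(t) with a periodic drive, standing in for the general nonlinear DAE of the abstract; the nonlinearity and parameter values are invented for illustration.

# Minimal sketch of a shooting method for a periodic steady state of
#   C dv/dt + g(v) = b(t),  b periodic with period T.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import fsolve

C = 1e-3                                          # capacitance (illustrative)
T = 1.0                                           # drive period
g = lambda v: 1e-2 * v + 5e-3 * v**3              # nonlinear resistor current
b = lambda t: 1e-2 * np.sin(2 * np.pi * t / T)    # periodic input current

def propagate(v0):
    """Integrate one period starting from v(0) = v0 and return v(T)."""
    sol = solve_ivp(lambda t, v: (b(t) - g(v)) / C, (0.0, T), [float(v0)],
                    rtol=1e-9, atol=1e-12)
    return sol.y[0, -1]

# Periodic steady state: find v(0) such that v(T) = v(0) (the shooting residual).
v0_star = fsolve(lambda v0: propagate(v0[0]) - v0[0], x0=[0.0])[0]
print("steady-state initial voltage v(0):", v0_star)
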

Expository Articles, Etc.

  • Mathematical Challenges in Cybersecurity, Dunlavy, D.M., Hendrickson, B. & Kolda, T.G.. Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-0805, February 2009. [Abstract] [BibTeX] [PDF]

Abstract: This white paper is a response to a recent report on cybersecurity submitted to the U.S. Department of Energy (Catlett, 2008). We discuss what we see as some of the major mathematical challenges in cybersecurity. The document is not intended to be comprehensive, but rather the articulation of a few key themes. We have organized our thoughts into three challenge areas: modeling large-scale networks, threat discovery, and network dynamics.
BibTeX:
@techreport{SAND2009-0805,
  author = {Daniel M. Dunlavy and Bruce Hendrickson and Tamara G. Kolda},
  title = {Mathematical Challenges in Cybersecurity},
  year = {2009},
  number = {SAND2009-0805}
}

Dissertation and Thesis

  • Homotopy Optimization Methods and Protein Structure Prediction, Dunlavy, D.M., School: AMSC Program, University of Maryland, August 2005. [Abstract] [BibTeX] [PDF]
Abstract: The focus of this dissertation is a new method for solving unconstrained minimization problems---homotopy optimization using perturbations and ensembles (HOPE). HOPE is a homotopy optimization method that finds a sequence of minimizers of a homotopy function that maps a template function to the target function, the function from our minimization problem. To increase the likelihood of finding a global minimizer, points in the sequence are perturbed and used as starting points to find other minimizers. Points in the resulting ensemble of minimizers are used as starting points to find minimizers of the homotopy function as it deforms the template function into the target function. We show that certain choices of the parameters used in HOPE lead to instances of existing methods: probability-one homotopy methods, stochastic search methods, and simulated annealing. We use these relations and further analysis to demonstrate the convergence properties of HOPE. The development of HOPE was motivated by the protein folding problem, the problem of predicting the structure of a protein as it exists in nature, given its amino acid sequence. However, we demonstrate that HOPE is also successful as a general purpose minimization method for nonconvex functions. Numerical experiments performed to test HOPE include solving several standard test problems and the protein folding problem using two different protein models. In the first model, proteins are modeled as chains of charged particles in two dimensions. The second is a backbone protein model, where the particles represent amino acids, each corresponding to a hydrophobic, hydrophilic, or neutral residue. In most of these experiments, standard homotopy functions are used in HOPE. Additionally, several new homotopy functions are introduced for solving the protein folding problems to demonstrate how HOPE can be used to exploit the properties or structure of particular problems. Results of experiments demonstrate that HOPE outperforms several methods often used for solving unconstrained minimization problems---a quasi-Newton method with BFGS Hessian update, a globally convergent variant of Newton's method, and ensemble-based simulated annealing.
BibTeX:
@phdthesis{Du05,
  author = {Daniel M. Dunlavy},
  title = {Homotopy Optimization Methods and Protein Structure Prediction},
  school = {AMSC Program, University of Maryland},
  year = {2005}
}
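
Illustrative sketch (Python; not the dissertation's code): HOPE adds perturbations and ensembles on top of the homotopy deformation sketched earlier for HOM. At each homotopy step the current minimizers are perturbed to form extra starting points, the deformed function is minimized from each start, and the best few points are kept. The template, target, ensemble size, and perturbation scale below are illustrative choices only.

# Minimal sketch of the HOPE idea: homotopy optimization with an ensemble
# of perturbed starting points carried from step to step.
import numpy as np
from scipy.optimize import minimize

f = lambda x: np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x) + 10.0)  # nonconvex target
g = lambda x: np.sum(x**2)                                          # convex template

def hope(x0, steps=20, ensemble=5, sigma=0.5, seed=0):
    rng = np.random.default_rng(seed)
    points = [np.asarray(x0, dtype=float)]
    for lam in np.linspace(0.0, 1.0, steps):
        h = lambda z, lam=lam: (1.0 - lam) * g(z) + lam * f(z)
        # Perturb the current points, minimize h from each start, keep the best few.
        starts = [p + sigma * rng.standard_normal(p.shape) for p in points for _ in range(2)]
        minimizers = [minimize(h, s, method="BFGS").x for s in starts + points]
        minimizers.sort(key=h)
        points = minimizers[:ensemble]
    return min(points, key=f)

print("HOPE minimizer estimate:", hope(x0=[3.7, -2.2]))
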
  • QCS: An Information Retrieval System for Improving Efficiency in Scientific Literature Searches, Dunlavy, D.M., School: AMSC Program, University of Maryland, August 2003. [BibTeX] [PDF]
BibTeX:
@mastersthesis{Du03,
  author = {Daniel M. Dunlavy},
  title = {QCS: An Information Retrieval System for Improving Efficiency in Scientific Literature Searches},
  school = {AMSC Program, University of Maryland},
  year = {2003}
}

Created by JabRef on 11/16/2010.