Contact
Daniel M. Dunlavy
Principal Member of Technical Staff
dmdunla@sandia.gov
(505) 206-9855

Publications

Book Chapters 
 Multilinear Algebra for Analyzing Data with Multiple Linkages, Dunlavy, D.M., Kolda, T.G. & Kegelmeyer, W.P., In Graph Algorithms in the Language of Linear Algebra, Philadelphia, PA: SIAM, 2010 (in press).
[BibTeX] [PDF]

BibTeX:
@incollection{DuKeKo10,
author = {Daniel M. Dunlavy and Tamara G. Kolda and W. Philip Kegelmeyer},
title = {Multilinear Algebra for Analyzing Data with Multiple Linkages},
booktitle = {Graph Algorithms in the Language of Linear Algebra},
publisher = {SIAM},
year = {2010 (in press)}
}

Refereed Journal Articles 
 TopicView: Visual Analysis of Topic Models and their Impact on Document Clustering, Crossno, P.J., Wilson, A.T., Shead, T.M., Davis IV, W.L. & Dunlavy, D.M., International Journal on Artificial Intelligence Tools, 2013 (accepted).
[Abstract] [BibTeX] [PDF]

Abstract: We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual or topical content, document relationships identified by the models, and the impact of the models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full text reader for documents selected in either of the graphs or the table. The impact of LSA and LDA models on document clustering applications is explored through similar means, using proximities between documents and cluster exemplars for graph layout edge weighting and table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
BibTeX:
@article{CrWiShDaDu13,
author = {Patricia J. Crossno and Andrew T. Wilson and Timothy M. Shead and Warren L. Davis IV and Daniel M. Dunlavy},
title = {TopicView: Visual Analysis of Topic Models and their Impact on Document Clustering},
journal = {International Journal on Artificial Intelligence Tools},
year = {2013 (accepted)}
}
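
The concept-to-topic matching described in the abstract can be sketched as a cosine-similarity computation between factor columns. This is our illustration of the general idea, not TopicView's code; the variable names and random term-weight matrices are hypothetical.

```python
import numpy as np

def cosine_similarity_matrix(A, B):
    """Cosine similarity between columns of A (terms x concepts)
    and columns of B (terms x topics)."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    B = B / np.linalg.norm(B, axis=0, keepdims=True)
    return A.T @ B  # shape: (concepts, topics)

rng = np.random.default_rng(0)
lsa = rng.random((100, 5))     # term weights for 5 LSA concepts (illustrative)
lda = rng.random((100, 7))     # term weights for 7 LDA topics (illustrative)
S = cosine_similarity_matrix(lsa, lda)
best_match = S.argmax(axis=1)  # LDA topic most similar to each LSA concept
```

The similarity matrix S would then drive the bipartite-graph layout, with edge weights given by the cosine values.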

 Temporal Link Prediction using Matrix and Tensor Factorizations, Dunlavy, D.M., Kolda, T.G. & Acar, E., ACM Transactions on Knowledge Discovery from Data Vol. 5(2), February 2011.
[Abstract] [BibTeX] [DOI] [PDF]

Abstract: The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for times 1 through T, can we predict the links at time T + 1? If our data has underlying periodic structure, can we predict out even further in time, i.e., links at time T + 2, T + 3, etc.? In this paper, we consider bipartite graphs that evolve over time and consider matrix and tensor-based methods for predicting future links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem. Additionally, we show that tensor-based techniques are particularly effective for temporal data with varying periodic patterns.
BibTeX:
@article{DuKoAc11,
author = {Daniel M. Dunlavy and Tamara G. Kolda and Evrim Acar},
title = {Temporal Link Prediction using Matrix and Tensor Factorizations},
journal = {ACM Transactions on Knowledge Discovery from Data},
year = {2011},
volume = {5},
number = {2},
doi = {http://dx.doi.org/10.1145/1921632.1921636}
}
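
As a toy sketch of one idea in this paper, Katz link-prediction scores K = sum_{k>=1} beta^k A^k = (I - beta A)^{-1} - I can be approximated by truncating a spectral decomposition. The sketch below handles only a small symmetric adjacency matrix via an eigendecomposition; the paper's bipartite extension uses a truncated SVD instead, so treat this as an assumption-laden illustration.

```python
import numpy as np

def katz_exact(A, beta):
    """Exact Katz scores, valid when beta < 1/spectral_radius(A)."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

def katz_truncated(A, beta, r):
    """Rank-r Katz approximation for symmetric A = Q diag(lam) Q^T."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:r]   # keep r dominant eigenpairs
    lam, Q = vals[idx], vecs[:, idx]
    g = beta * lam / (1.0 - beta * lam)        # sum_{k>=1} (beta*lam)^k
    return (Q * g) @ Q.T                       # scale columns, recombine
```

With r equal to the full dimension the truncated form recovers the exact scores; smaller r trades accuracy for scalability.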

 Scalable Tensor Factorizations for Incomplete Data, Acar, E., Dunlavy, D.M., Kolda, T.G. & Morup, M., Chemometrics and Intelligent Laboratory Systems Vol. 106(1), pp. 41–56, March 2011.
[Abstract] [BibTeX] [DOI] [PDF]

Abstract: The problem of incomplete data, i.e., data with missing or unknown values, in multiway arrays is ubiquitous in biomedical signal processing, network traffic analysis, bibliometrics, social network analysis, chemometrics, computer vision, communication networks, etc. We consider the problem of how to factorize data sets with missing values with the goal of capturing the underlying latent structure of the data and possibly reconstructing missing values (i.e., tensor completion). We focus on one of the most well-known tensor factorizations that captures multilinear structure, CANDECOMP/PARAFAC (CP). In the presence of missing data, CP can be formulated as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factorize tensors with noise and up to 99% missing data. A unique aspect of our approach is that it scales to sparse large-scale data, e.g., 1000 x 1000 x 1000 with five million known entries (0.5% dense). We further demonstrate the usefulness of CP-WOPT on two real-world applications: a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes and the problem of modeling computer network traffic where data may be absent due to the expense of the data collection process.
BibTeX:
@article{AcDuKoMo11,
author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda and Morten Morup},
title = {Scalable Tensor Factorizations for Incomplete Data},
journal = {Chemometrics and Intelligent Laboratory Systems},
year = {2011},
volume = {106},
number = {1},
pages = {41--56},
doi = {http://dx.doi.org/10.1016/j.chemolab.2010.08.004}
}
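
The weighted least squares formulation in the abstract, f = 0.5*||W .* (X - [[A,B,C]])||^2 with a binary mask W over the known entries, can be sketched as follows. This is a minimal dense illustration of the objective and its gradients (which a first-order method would then use); variable names are ours, not the authors' code.

```python
import numpy as np

def cp_full(A, B, C):
    """Reconstruct a 3-way tensor from its CP factor matrices."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def cpwopt_f_g(X, W, A, B, C):
    """Masked least squares objective and its gradients (binary mask W)."""
    R = W * (X - cp_full(A, B, C))             # residual on known entries only
    f = 0.5 * np.sum(R ** 2)
    gA = -np.einsum('ijk,jr,kr->ir', R, B, C)  # df/dA
    gB = -np.einsum('ijk,ir,kr->jr', R, A, C)  # df/dB
    gC = -np.einsum('ijk,ir,jr->kr', R, A, B)  # df/dC
    return f, (gA, gB, gC)
```

At a set of factors that exactly reproduce the known entries, the objective and all three gradients vanish, which is the sanity check a first-order solver relies on.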

 A Scalable Optimization Approach for Fitting Canonical Tensor Decompositions, Acar, E., Dunlavy, D.M. & Kolda, T.G., Journal of Chemometrics Vol. 25(2), pp. 67–86, February 2011.
[Abstract] [BibTeX] [DOI] [PDF]

Abstract: Tensor decompositions are higher-order analogues of matrix decompositions and have proven to be powerful tools for data analysis. In particular, we are interested in the canonical tensor decomposition, otherwise known as CANDECOMP/PARAFAC (CP), which expresses a tensor as the sum of component rank-one tensors and is used in a multitude of applications such as chemometrics, signal processing, neuroscience, and web analysis. The task of computing CP, however, can be difficult. The typical approach is based on alternating least squares (ALS) optimization, but it is not accurate in the case of overfactoring. High accuracy can be obtained by using nonlinear least squares (NLS) methods; the disadvantage is that NLS methods are much slower than ALS. In this paper, we propose the use of gradient-based optimization methods. We discuss the mathematical calculation of the derivatives and show that they can be computed efficiently, at the same cost as one iteration of ALS. Computational experiments demonstrate that the gradient-based optimization methods are more accurate than ALS and faster than NLS.
BibTeX:
@article{AcDuKo11,
author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda},
title = {A Scalable Optimization Approach for Fitting Canonical Tensor Decompositions},
journal = {Journal of Chemometrics},
year = {2011},
volume = {25},
number = {2},
pages = {67--86},
doi = {http://dx.doi.org/10.1002/cem.1335}
}

 QCS: A System for Querying, Clustering and Summarizing Documents, Dunlavy, D.M., O'Leary, D.P., Conroy, J.M. & Schlesinger, J.D., Information Processing & Management Vol. 43(6), pp. 1588–1605, 2007.
[Abstract] [BibTeX] [DOI] [PDF]

Abstract: Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel integrated information retrieval system, the Query, Cluster, Summarize (QCS) system, which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of methods in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) as measured by the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multi-document summarization, we developed a framework to extend it to the task of evaluation for each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence trimming and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format.
Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules. 
BibTeX:
@article{DuOlCoSc07,
author = {Daniel M. Dunlavy and Dianne P. O'Leary and John M. Conroy and Judith D. Schlesinger},
title = {QCS: A System for Querying, Clustering and Summarizing Documents},
journal = {Information Processing & Management},
year = {2007},
volume = {43},
number = {6},
pages = {1588--1605},
note = {Text Summarization},
doi = {http://dx.doi.org/10.1016/j.ipm.2007.01.003}
}
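
The clustering step named in the abstract, spherical k-means, groups unit-length document vectors by cosine similarity rather than Euclidean distance. The sketch below is our minimal illustration of that one ingredient (with a naive deterministic initialization), not the QCS implementation.

```python
import numpy as np

def spherical_kmeans(X, k, iters=20):
    """Cluster rows of X on the unit sphere by maximum cosine similarity."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    C = X[:k].copy()                                  # naive deterministic init
    for _ in range(iters):
        labels = (X @ C.T).argmax(axis=1)             # assign by max cosine
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                C[j] = m / np.linalg.norm(m)          # centroid = normalized mean
    return labels, C
```

In a QCS-like pipeline the rows of X would be document vectors from the retrieval model, and each resulting cluster would then be summarized separately.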

 HOPE: A Homotopy Optimization Method for Protein Structure Prediction, Dunlavy, D.M., O'Leary, D.P., Klimov, D. & Thirumalai, D., Journal of Computational Biology Vol. 12(10), pp. 1275–1288, 2005.
[Abstract] [BibTeX] [DOI] [URL] [PDF]

Abstract: We use a homotopy optimization method, HOPE, to minimize the potential energy associated with a protein model. The method uses the minimum energy conformation of one protein as a template to predict the lowest energy structure of a query sequence. This objective is achieved by following a path of conformations determined by a homotopy between the potential energy functions for the two proteins. Ensembles of solutions are produced by perturbing conformations along the path, increasing the likelihood of predicting correct structures. Successful results are presented for pairs of homologous proteins, where HOPE is compared to a variant of Newton's method and to simulated annealing. 
BibTeX:
@article{DuOlKlTh05,
author = {Dunlavy, Daniel M. and O'Leary, Dianne P. and Klimov, Dmitri and Thirumalai, D.},
title = {HOPE: A Homotopy Optimization Method for Protein Structure Prediction},
journal = {Journal of Computational Biology},
year = {2005},
volume = {12},
number = {10},
pages = {1275--1288},
note = {PMID: 16379534},
url = {http://www.liebertonline.com/doi/abs/10.1089/cmb.2005.12.1275},
doi = {http://dx.doi.org/10.1089/cmb.2005.12.1275}
}
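
The core homotopy idea, following a path of minimizers as a template objective is blended into a target objective, can be shown with a toy one-dimensional example. The quadratic "energies" and the crude gradient-descent inner solver below are our illustrations, not protein potentials or the HOPE code.

```python
import numpy as np

def grad_descent(g, x, lr=0.1, iters=200):
    """Crude fixed-step gradient descent (stand-in for a real minimizer)."""
    for _ in range(iters):
        x = x - lr * g(x)
    return x

g_template = lambda x: 2.0 * (x - 1.0)   # gradient of (x - 1)^2, minimizer known
g_target   = lambda x: 2.0 * (x - 5.0)   # gradient of (x - 5)^2, minimizer sought

x = 1.0                                  # start at the template's minimizer
for t in np.linspace(0.0, 1.0, 11):
    # homotopy gradient: (1 - t) * template + t * target
    g_t = lambda x, t=t: (1 - t) * g_template(x) + t * g_target(x)
    x = grad_descent(g_t, x)             # warm start from the previous step
# x now approximates the target minimizer at t = 1
```

HOPE additionally perturbs conformations along the path to produce ensembles of candidate solutions, which this toy sketch omits.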

 Structure Preserving Algorithms for Perplectic Eigenproblems, Mackey, D.S., Mackey, N. & Dunlavy, D.M., Electronic Journal of Linear Algebra Vol. 13, pp. 10–39, February 2005.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: Structured real canonical forms for matrices in R^{n x n} that are symmetric or skew-symmetric about the anti-diagonal as well as the main diagonal are presented, and Jacobi algorithms for solving the complete eigenproblem for three of these four classes of matrices are developed. Based on the direct solution of 4 x 4 subproblems constructed via quaternions, the algorithms calculate structured orthogonal bases for the invariant subspaces of the associated matrix. In addition to preserving structure, these methods are inherently parallelizable, numerically stable, and show asymptotic quadratic convergence.
BibTeX:
@article{MaMaDu05,
author = {D. Steven Mackey and Niloufer Mackey and Daniel M. Dunlavy},
title = {Structure Preserving Algorithms for Perplectic Eigenproblems},
journal = {Electronic Journal of Linear Algebra},
year = {2005},
volume = {13},
pages = {10--39},
note = {Supplemental media available at http://www.math.technion.ac.il/iic/ela/elaarticles/articles/media/perplectic.html.},
url = {http://www.math.technion.ac.il/iic/ela/elaarticles/articles/media/perplectic.html}
}

Refereed Conference and Workshop Proceedings 
 Using NoSQL Databases for Streaming Network Analysis, Wylie, B., Dunlavy, D., Davis IV, W. & Baumes, J., In Proceedings of the IEEE Symposium on Large Scale Data Analysis and Visualization (LDAV), 2012.
[Abstract] [BibTeX] [PDF]

Abstract: The high-volume, low-latency world of network traffic presents significant obstacles for complex analysis techniques. The unique challenge of adapting powerful but high-latency models to real-time network streams is the basis of our cyber security project. In this paper we discuss our use of NoSQL databases in a framework that enables the application of computationally expensive models against a real-time network data stream. We describe how this approach transforms the highly constrained (and sometimes arcane) world of real-time network analysis into a more developer friendly model that relaxes many of the traditional constraints associated with streaming data.
BibTeX:
@conference{WyDuDaBa12,
author = {Brian Wylie and Daniel Dunlavy and Warren Davis IV and Jeff Baumes},
title = {Using NoSQL Databases for Streaming Network Analysis},
booktitle = {Proceedings of the IEEE Symposium on Large Scale Data Analysis and Visualization (LDAV)},
year = {2012}
}

 TopicView: Visually Comparing Topic Models of Text Collections, Crossno, P.J., Wilson, A.T., Shead, T.M. & Dunlavy, D.M., In Proceedings of the 2011 IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Special Session on Text and Web Mining (TWM), 2011.
[Abstract] [BibTeX] [PDF]

Abstract: We present TopicView, an application for visually comparing and exploring multiple models of text corpora. TopicView uses multiple linked views to visually analyze both the conceptual content and the document relationships in models generated using different algorithms. To illustrate TopicView, we apply it to models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full text reader for documents selected in either of the graphs or the table. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of two example corpora.
BibTeX:
@conference{CrWiShDu11,
author = {Patricia J. Crossno and Andrew T. Wilson and Timothy M. Shead and Daniel M. Dunlavy},
title = {TopicView: Visually Comparing Topic Models of Text Collections},
booktitle = {Proceedings of the 2011 IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Special Session on Text and Web Mining (TWM)},
year = {2011}
}

 TopicView: Understanding Document Relationships Using Latent Dirichlet Allocation Models, Crossno, P.J., Wilson, A.T., Dunlavy, D.M. & Shead, T.M., In Proceedings of the IEEE Workshop on Interactive Visual Text Analytics for Decision Making, 2011.
[Abstract] [BibTeX] [PDF]

Abstract: Document similarity graphs are a useful visual metaphor for assessing the conceptual content of a corpus. Algorithms such as Latent Dirichlet Allocation (LDA) provide a means for constructing such graphs by extracting topics and their associated term lists, which can be converted into similarity measures. Given that users' understanding of the corpus content (and therefore their decisionmaking) depends upon the outputs provided by LDA as well as how those outputs are translated into a visual representation, an examination of how the LDA algorithm behaves and an understanding of the impact of this behavior on the final visualization is critical. We examine some puzzling relationships between documents with seemingly disparate topics that are linked in LDA graphs. We use TopicView, a visual analytics tool, to uncover the source of these unexpected connections. 
BibTeX:
@conference{CrWiDuSh11,
author = {Patricia J. Crossno and Andrew T. Wilson and Daniel M. Dunlavy and Timothy M. Shead},
title = {TopicView: Understanding Document Relationships Using Latent Dirichlet Allocation Models},
booktitle = {Proceedings of the IEEE Workshop on Interactive Visual Text Analytics for Decision Making},
year = {2011}
}

 All-at-once Optimization for Coupled Matrix and Tensor Factorizations, Acar, E., Kolda, T.G. & Dunlavy, D.M., In Proceedings of Mining and Learning with Graphs (MLG), 2011.
[Abstract] [BibTeX] [PDF]

Abstract: Joint analysis of data from multiple sources has the potential to improve our understanding of the underlying structures in complex data sets. For instance, in restaurant recommendation systems, recommendations can be based on rating histories of customers. In addition to rating histories, customers' social networks (e.g., Facebook friendships) and restaurant categories information (e.g., Thai or Italian) can also be used to make better recommendations. The task of fusing data, however, is challenging since data sets can be incomplete and heterogeneous, i.e., data consist of both matrices, e.g., the person by person social network matrix or the restaurant by category matrix, and higher-order tensors, e.g., the ratings tensor of the form restaurant by meal by person. In this paper, we are particularly interested in fusing data sets with the goal of capturing their underlying latent structures. We formulate this problem as a coupled matrix and tensor factorization (CMTF) problem where heterogeneous data sets are modeled by fitting outer-product models to higher-order tensors and matrices in a coupled manner. Unlike traditional approaches solving this problem using alternating algorithms, we propose an all-at-once optimization approach called CMTF-OPT (CMTF-OPTimization), which is a gradient-based optimization approach for joint analysis of matrices and higher-order tensors. We also extend the algorithm to handle coupled incomplete data sets. Using numerical experiments, we demonstrate that the proposed all-at-once approach is more accurate than the alternating least squares approach.
BibTeX:
@conference{AcKoDu11,
author = {Evrim Acar and Tamara G. Kolda and Daniel M. Dunlavy},
title = {All-at-once Optimization for Coupled Matrix and Tensor Factorizations},
booktitle = {Proceedings of Mining and Learning with Graphs (MLG)},
year = {2011}
}
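
The coupled formulation in the abstract, a third-order tensor X and a matrix Y sharing one factor matrix A, can be sketched as a single joint objective. This is our minimal illustration of the CMTF idea with names of our choosing, not the CMTF-OPT code; for simplicity it shows only the gradient with respect to the shared factor.

```python
import numpy as np

def cp_full(A, B, C):
    """Reconstruct a 3-way tensor from its CP factor matrices."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def cmtf_f_gA(X, Y, A, B, C, V):
    """f = 0.5*||X - [[A,B,C]]||^2 + 0.5*||Y - A V^T||^2 and df/dA."""
    Rx = X - cp_full(A, B, C)                 # tensor residual
    Ry = Y - A @ V.T                          # matrix residual (shares A)
    f = 0.5 * np.sum(Rx ** 2) + 0.5 * np.sum(Ry ** 2)
    gA = -np.einsum('ijk,jr,kr->ir', Rx, B, C) - Ry @ V
    return f, gA
```

An all-at-once solver would stack the gradients for all factors (A, B, C, V) into one vector and hand the pair (f, g) to a gradient-based optimizer, rather than alternating over the factors.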

 ParaText: Scalable Text Modeling and Analysis, Dunlavy, D.M., Shead, T.M. & Stanton, E.T., In Proceedings of the 19th International ACM Symposium on High Performance Distributed Computing, Chicago, IL, USA, pp. 344–347, June 23–25, 2010.
[Abstract] [BibTeX] [PDF]

Abstract: Automated analysis of unstructured text documents (e.g., web pages, newswire articles, research publications, business reports) is a key capability for solving important problems in areas including decision making, risk assessment, social network analysis, intelligence analysis, scholarly research and others. However, as data sizes continue to grow in these areas, scalable processing, modeling, and semantic analysis of text collections becomes essential. In this paper, we present the ParaText text analysis engine, a distributed memory software framework for processing, modeling, and analyzing collections of unstructured text documents. Results on several document collections using hundreds of processors are presented to illustrate the flexibility, extensibility, and scalability of the entire process of text modeling from raw data ingestion to application analysis.
BibTeX:
@conference{DuShSt10,
author = {Daniel M. Dunlavy and Timothy M. Shead and Eric T. Stanton},
title = {ParaText: Scalable Text Modeling and Analysis},
booktitle = {Proceedings of the 19th International ACM Symposium on High Performance Distributed Computing},
year = {2010},
pages = {344--347},
note = {(34% acceptance rate)}
}

 Scalable Tensor Factorizations with Missing Data, Acar, E., Dunlavy, D.M., Kolda, T.G. & Morup, M., In Proceedings of the 2010 SIAM International Conference on Data Mining, Columbus, OH, USA, April 2010.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: The problem of missing data is ubiquitous in domains such as biomedical signal processing, network trace analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networks, all domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multiway arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) using a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes.
BibTeX:
@conference{AcDuKoMo10,
author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda and Morten Morup},
title = {Scalable Tensor Factorizations with Missing Data},
booktitle = {Proceedings of the 2010 SIAM International Conference on Data Mining},
year = {2010},
note = {(23% acceptance rate)},
url = {http://www.siam.org/proceedings/datamining/2010/dm10_061_acare.pdf}
}

 Link Prediction on Evolving Data using Matrix and Tensor Factorizations, Acar, E., Dunlavy, D.M. & Kolda, T.G., In Proceedings of the Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2009), Miami, FL, USA, pp. 262–269, December 2009.
[Abstract] [BibTeX] [DOI] [PDF]

Abstract: The data in many disciplines such as social networks, web analysis, etc. is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: Given link data for time periods 1 through T, can we predict the links in time period T+1? Specifically, we look at bipartite graphs changing over time and consider matrix and tensor-based methods for predicting links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem.
BibTeX:
@conference{AcDuKo09,
author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda},
title = {Link Prediction on Evolving Data using Matrix and Tensor Factorizations},
booktitle = {Proceedings of the Workshop on Large-scale Data Mining: Theory and Applications (LDMTA 2009)},
year = {2009},
pages = {262--269},
note = {(26% acceptance rate)},
doi = {http://dx.doi.org/10.1109/ICDMW.2009.54}
}

 LSAView: A Tool for Visual Exploration of Latent Semantic Modeling, Crossno, P., Dunlavy, D. & Shead, T., In IEEE Symposium on Visual Analytics Science and Technology, Atlantic City, NJ, October 2009.
[Abstract] [BibTeX] [DOI] [PDF]

Abstract: Latent Semantic Analysis (LSA) is a commonly used method for automated processing, modeling, and analysis of unstructured text data. One of the biggest challenges in using LSA is determining the appropriate model parameters to use for different data domains and types of analyses. Although automated methods have been developed to make rank and scaling parameter choices, these approaches often make choices with respect to noise in the data, without an understanding of how those choices impact analysis and problem solving. Further, no tools currently exist to explore the relationships between an LSA model and analysis methods. Our work focuses on how parameter choices impact analysis and problem solving. In this paper, we present LSAView, a system for interactively exploring parameter choices for LSA models. We illustrate the use of LSAView's small multiple views, linked matrix-graph views, and data views to analyze parameter selection and application in the context of graph layout and clustering.
BibTeX:
@conference{CrDuSh09,
author = {P.J. Crossno and D.M. Dunlavy and T.M. Shead},
title = {LSAView: A Tool for Visual Exploration of Latent Semantic Modeling},
booktitle = {IEEE Symposium on Visual Analytics Science and Technology},
year = {2009},
doi = {http://dx.doi.org/10.1109/VAST.2009.5333428}
}

 Formulations for Surrogate-Based Optimization with Data Fit, Multifidelity, and Reduced-Order Models, Eldred, M.S. & Dunlavy, D.M., In Proceedings of the 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, (AIAA-2006-7117), September 2006.
[Abstract] [BibTeX] [PDF]

Abstract: Surrogate-based optimization (SBO) methods have become established as effective techniques for engineering design problems through their ability to tame nonsmoothness and reduce computational expense. Possible surrogate modeling techniques include data fits (local, multipoint, or global), multifidelity model hierarchies, and reduced-order models, and each of these types has unique features when employed within SBO. This paper explores a number of SBO algorithmic variations and their effect for different surrogate modeling cases. First, general facilities for constraint management are explored through approximate subproblem formulations (e.g., direct surrogate), constraint relaxation techniques (e.g., homotopy), merit function selections (e.g., augmented Lagrangian), and iterate acceptance logic selections (e.g., filter methods). Second, techniques specialized to particular surrogate types are described. Computational results are presented for sets of algebraic test problems and an engineering design application solved using the DAKOTA software.
BibTeX:
@conference{ElDu06,
author = {Michael S. Eldred and Daniel M. Dunlavy},
title = {Formulations for Surrogate-Based Optimization with Data Fit, Multifidelity, and Reduced-Order Models},
booktitle = {Proceedings of the 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference},
year = {2006},
number = {AIAA-2006-7117},
note = {Refereed conference paper}
}

 From TREC to DUC to TREC Again, Conroy, J.M., O'Leary, D.P. & Dunlavy, D.M., In Proceedings of the Twelfth Text Retrieval Conference (TREC), November 2003.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: The Document Understanding Conference (DUC) uses TREC data as a test bed for algorithms for single and multiple document summarization. For the 2003 DUC task of choosing relevant and novel sentences, we tested a system based on a Hidden Markov Model (HMM). In this work, we use variations of this system on the tasks of the TREC Novelty Track for finding relevant and new sentences. 
BibTeX:
@conference{CoOlDu03,
author = {John M. Conroy and Dianne P. O'Leary and Daniel M. Dunlavy},
title = {From TREC to DUC to TREC Again},
booktitle = {Proceedings of the Twelfth Text Retrieval Conference (TREC)},
year = {2003},
url = {http://trec.nist.gov/pubs/trec12/papers/ccs.novelty.pdf}
}

 Performance of a Three-Stage System for Multi-Document Summarization, Dunlavy, D.M., Conroy, J.M., Schlesinger, J.D., Goodman, S.A., Okurowski, M.E., O'Leary, D.P. & van Halteren, H., In Proceedings of the Document Understanding Conference (DUC), June 2003.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: Our participation in DUC 2003 was limited to Tasks 2, 3, and 4. Although the tasks differed slightly in their goals, we applied the same approach in each case: preprocess the data for input to our system, apply our single-document and multi-document summarization algorithms, postprocess the data for DUC evaluation. We did not use the topic descriptions for Task 2 or the viewpoint descriptions for Task 3, and used only the novel sentences for Task 4. The preprocessing of the data for our needs consisted of term identification, part-of-speech (POS) tagging, sentence boundary detection and SGML DTD processing. With the exception of sentence boundary detection for Task 4 (the test data was sentence-delimited using SGML tags), each of these preprocessing tasks was performed on all of the documents. The summarization algorithms were enhanced versions of those presented by members of our group in the past DUC evaluations (Conroy et al., 2001; Schlesinger et al., 2002). Previous postprocessing consisted of removing lead adverbs such as "And" or "But" to make our summaries flow more easily. For DUC 2003, we added more extensive editing, eliminating part or all of selected sentences.
BibTeX:
@conference{DuCoScGo03,
author = {Daniel M. Dunlavy and John M. Conroy and Judith D. Schlesinger and Sarah A. Goodman and Mary Ellen Okurowski and Dianne P. O'Leary and Han van Halteren},
title = {Performance of a Three-Stage System for Multi-Document Summarization},
booktitle = {Proceedings of the Document Understanding Conference (DUC)},
year = {2003},
url = {http://duc.nist.gov/pubs/2003final.papers/schlesinger.nsa.ps}
}

Other Conference and Workshop Proceedings 
 CPOPT: Optimization for Fitting CANDECOMP/PARAFAC Models, Acar, E., Kolda, T.G. & Dunlavy, D.M., In CASTA 2008: Workshop on Computational Algebraic Statistics, Theories and Applications, December 2008.
[Abstract] [BibTeX] [PDF]

Abstract: Tensor decompositions (e.g., higher-order analogues of matrix decompositions) are powerful tools for data analysis. In particular, the CANDECOMP/PARAFAC (CP) model has proved useful in many applications such as chemometrics, signal processing, and web analysis. The problem of computing the CP decomposition is typically solved using an alternating least squares (ALS) approach. We discuss the use of optimization-based algorithms for CP, including how to efficiently compute the derivatives necessary for the optimization methods. Numerical studies highlight the positive features of our CPOPT algorithms, as compared with ALS and Gauss-Newton approaches. 
BibTeX:
@conference{AcKoDu08,
author = {Evrim Acar and Tamara G. Kolda and Daniel M. Dunlavy},
title = {CPOPT: Optimization for Fitting CANDECOMP/PARAFAC Models},
booktitle = {CASTA 2008: Workshop on Computational Algebraic Statistics, Theories and Applications},
year = {2008}
}
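
The derivative computations described above can be made concrete for a third-order tensor. The following NumPy sketch (an illustration based on the standard CP least-squares formulation, not the authors' CPOPT code; the function name is invented here) evaluates the objective and the gradient with respect to one factor matrix:

```python
import numpy as np

def cp_gradient(X, A, B, C):
    """Objective f = 0.5 * ||X - [[A, B, C]]||^2 for a third-order CP model
    and its gradient with respect to the factor matrix A.
    Illustrative sketch only, not the authors' CPOPT implementation."""
    I, J, K = X.shape
    R = A.shape[1]
    # Khatri-Rao product of B and C, shape (J*K, R): row (j*K + k) is B[j] * C[k]
    KR = (B[:, None, :] * C[None, :, :]).reshape(J * K, R)
    X1 = X.reshape(I, J * K)      # mode-1 unfolding of X (C ordering)
    E = X1 - A @ KR.T             # residual in unfolded form
    f = 0.5 * np.sum(E * E)
    grad_A = -E @ KR              # equals A @ ((B.T @ B) * (C.T @ C)) - X1 @ KR
    return f, grad_A
```

At the true factors the residual, objective, and gradient all vanish; the gradients with respect to B and C follow by symmetry from the other two mode unfoldings.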

 QCS: A Tool for Querying, Clustering, and Summarizing Documents, Dunlavy, D.M., Conroy, J.M. & O'Leary, D.P., In Proceedings of the HLT-NAACL Conference, June 2003.
[Abstract] [BibTeX]

Abstract: The QCS information retrieval (IR) system is presented as a tool for querying, clustering, and summarizing document sets. QCS has been developed as a modular development framework, and thus facilitates the inclusion of new technologies targeting these three IR tasks. Details of the system architecture, the QCS interface, and preliminary results are presented. 
BibTeX:
@conference{DuCoOl03,
author = {Daniel M. Dunlavy and John M. Conroy and Dianne P. O'Leary},
title = {QCS: A Tool for Querying, Clustering, and Summarizing Documents},
booktitle = {Proceedings of the HLT-NAACL Conference},
year = {2003}
}

Technical Reports 
 ParaText - Scalable Solutions for Processing and Searching Very Large Document Collections: Final LDRD Report, Dunlavy, D.M., Shead, T.M., Crossno, P.J. & Stanton, E.T., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2010-6269, September 2010.
[BibTeX] [PDF]

BibTeX:
@techreport{SAND20106269,
author = {Daniel M. Dunlavy and Timothy M. Shead and Patricia J. Crossno and Eric T. Stanton},
title = {ParaText - Scalable Solutions for Processing and Searching Very Large Document Collections: Final LDRD Report},
institution = {Sandia National Laboratories},
year = {2010},
number = {SAND2010-6269}
}

 Poblano v1.0: A Matlab Toolbox for Gradient-Based Optimization, Dunlavy, D.M., Kolda, T.G. & Acar, E., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2010-1422, March 2010.
[Abstract] [BibTeX] [PDF]

Abstract: We present Poblano v1.0, a Matlab toolbox for solving gradient-based unconstrained optimization problems. Poblano implements three optimization methods (nonlinear conjugate gradients, limited-memory BFGS, and truncated Newton) that require only first-order derivative information. In this paper, we describe the Poblano methods, provide numerous examples on how to use Poblano, and present results of Poblano used in solving problems from a standard test collection of unconstrained optimization problems. 
BibTeX:
@techreport{SAND20101422,
author = {Daniel M. Dunlavy and Tamara G. Kolda and Evrim Acar},
title = {Poblano v1.0: A Matlab Toolbox for Gradient-Based Optimization},
institution = {Sandia National Laboratories},
year = {2010},
number = {SAND2010-1422}
}
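
Poblano itself is a Matlab toolbox; as a language-neutral sketch of the kind of first-order method it provides, here is a minimal Polak-Ribiere nonlinear conjugate gradient loop with Armijo backtracking (an illustration under common textbook defaults, not Poblano's code; the function name and parameters are invented here):

```python
import numpy as np

def ncg(f, grad, x0, tol=1e-8, max_iter=2000):
    """Polak-Ribiere nonlinear conjugate gradients with Armijo backtracking.
    A minimal sketch of a first-order method of the kind Poblano provides;
    not Poblano's implementation."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        gd = g @ d
        if gd >= 0:                      # reset to steepest descent if needed
            d, gd = -g, -(g @ g)
        # Armijo backtracking line search along d
        alpha, fx = 1.0, f(x)
        while f(x + alpha * d) > fx + 1e-4 * alpha * gd:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))   # PR+ update
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x
```

On a small convex quadratic f(x) = 0.5 x'Qx - b'x this converges to the exact minimizer Q^(-1) b.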

 Scalable Tensor Factorizations with Missing Data, Acar, E., Dunlavy, D.M., Kolda, T.G. & Morup, M., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-6764, October 2009.
[Abstract] [BibTeX] [PDF]

Abstract: The problem of missing data is ubiquitous in domains such as biomedical signal processing, network trace analysis, bibliometrics, social network analysis, chemometrics, computer vision, and communication networks, all domains in which data collection is subject to occasional errors. Moreover, these data sets can be quite large and have more than two axes of variation, e.g., sender, receiver, time. Many applications in those domains aim to capture the underlying latent structure of the data; in other words, they need to factorize data sets with missing entries. If we cannot address the problem of missing data, many important data sets will be discarded or improperly analyzed. Therefore, we need a robust and scalable approach for factorizing multiway arrays (i.e., tensors) in the presence of missing data. We focus on one of the most well-known tensor factorizations, CANDECOMP/PARAFAC (CP), and formulate the CP model as a weighted least squares problem that models only the known entries. We develop an algorithm called CP-WOPT (CP Weighted OPTimization) using a first-order optimization approach to solve the weighted least squares problem. Based on extensive numerical experiments, our algorithm is shown to successfully factor tensors with noise and up to 70% missing data. Moreover, our approach is significantly faster than the leading alternative and scales to larger problems. To show the real-world usefulness of CP-WOPT, we illustrate its applicability on a novel EEG (electroencephalogram) application where missing data is frequently encountered due to disconnections of electrodes. 
BibTeX:
@techreport{SAND20096764,
author = {Evrim Acar and Daniel M. Dunlavy and Tamara G. Kolda and Morten Morup},
title = {Scalable Tensor Factorizations with Missing Data},
institution = {Sandia National Laboratories},
year = {2009},
number = {SAND2009-6764}
}

 Relationships Between Accuracy and Diversity in Heterogeneous Ensemble Classifiers, Gilpin, S.A. & Dunlavy, D.M., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-6940C, 2009.
[Abstract] [BibTeX] [PDF]

Abstract: The relationship between ensemble classifier performance and the diversity of the predictions made by ensemble base classifiers is explored in the context of heterogeneous ensemble classifiers. Specifically, numerical studies indicate that heterogeneous ensembles can be generated from base classifiers of homogeneous ensemble classifiers that are both significantly more accurate and diverse than the base classifiers. Results for experiments using several standard diversity measures on a variety of binary and multiclass classification problems are presented to illustrate the improved performance. 
BibTeX:
@techreport{SAND20096940C,
author = {Sean A. Gilpin and Daniel M. Dunlavy},
title = {Relationships Between Accuracy and Diversity in Heterogeneous Ensemble Classifiers},
institution = {Sandia National Laboratories},
year = {2009},
number = {SAND2009-6940C}
}
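
Among the standard pairwise diversity measures referred to above is the disagreement measure: the fraction of examples on which two base classifiers predict different labels. A minimal Python sketch (illustrative only; the function names are invented here, and this is not the report's code):

```python
from itertools import combinations

def disagreement(preds_a, preds_b):
    """Pairwise disagreement diversity measure: the fraction of examples on
    which two base classifiers predict different labels (0 = identical)."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)

def mean_pairwise_disagreement(all_preds):
    """Average disagreement over all pairs of base classifiers in an ensemble."""
    pairs = list(combinations(all_preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)
```

Averaging the measure over all classifier pairs gives a single diversity score for the whole ensemble, which can then be compared against ensemble accuracy.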

 Semi-supervised Named Entity Recognition, Turpen, T.P. & Dunlavy, D.M., In The Computer Science Research Institute Summer Proceedings, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, 2009.
[BibTeX] [PDF]

BibTeX:
@incollection{SAND20103083P,
author = {Taylor P. Turpen and Daniel M. Dunlavy},
title = {Semi-supervised Named Entity Recognition},
booktitle = {The Computer Science Research Institute Summer Proceedings},
publisher = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
year = {2009}
}

 An Optimization Approach for Fitting Canonical Tensor Decompositions, Acar, E., Kolda, T.G. & Dunlavy, D.M., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-0857, February 2009.
[Abstract] [BibTeX] [PDF]

Abstract: Tensor decompositions are higher-order analogues of matrix decompositions and have proven to be powerful tools for data analysis. In particular, we are interested in the canonical tensor decomposition, otherwise known as the CANDECOMP/PARAFAC decomposition (CPD), which expresses a tensor as the sum of component rank-one tensors and is used in a multitude of applications such as chemometrics, signal processing, neuroscience, and web analysis. The task of computing the CPD, however, can be difficult. The typical approach is based on alternating least squares (ALS) optimization, which can be remarkably fast but is not very accurate. Previously, nonlinear least squares (NLS) methods have also been recommended; existing NLS methods are accurate but slow. In this paper, we propose the use of gradient-based optimization methods. We discuss the mathematical calculation of the derivatives and further show that they can be computed efficiently, at the same cost as one iteration of ALS. Computational experiments demonstrate that the gradient-based optimization methods are much more accurate than ALS and orders of magnitude faster than NLS. 
BibTeX:
@techreport{AcKoDu09,
author = {Evrim Acar and Tamara G. Kolda and Daniel M. Dunlavy},
title = {An Optimization Approach for Fitting Canonical Tensor Decompositions},
institution = {Sandia National Laboratories},
year = {2009},
number = {SAND2009-0857}
}

 Heterogeneous Ensemble Classification, Gilpin, S.A. & Dunlavy, D.M., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-0203P, January 2009.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: The problem of multiclass classification is explored using heterogeneous ensemble classifiers. Heterogeneous ensemble classifiers are defined as ensembles, or sets, of classifier models created using more than one type of classification algorithm. For example, the outputs of decision tree classifiers could be combined with the outputs of support vector machines (SVM) to create a heterogeneous ensemble. We explore how, when, and why heterogeneous ensembles should be used over other classification methods. Specifically, we look into the use of bagging and different fusion methods for heterogeneous and homogeneous ensembles. We also introduce the Hemlock framework, a software tool for creating and testing heterogeneous ensembles. 
BibTeX:
@techreport{SAND20090203P,
author = {Sean A. Gilpin and Daniel M. Dunlavy},
title = {Heterogeneous Ensemble Classification},
booktitle = {CSRI Summer Proceedings 2008, Technical Report SAND2007-7977, Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
institution = {Sandia National Laboratories},
year = {2009},
number = {SAND2009-0203P},
url = {http://www.cs.sandia.gov/CSRI/Proceedings/}
}

 Trilinos CMake Evaluation, Bartlett, R.A., Dunlavy, D.M., Guillen, E.J. & Shead, T.M., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2008-7593, October 2008.
[Abstract] [BibTeX] [PDF]

Abstract: The current autotools-based build system has significant limitations, and a better alternative is needed. Here we document our evaluation of CMake as a replacement build system and testing infrastructure. 
BibTeX:
@techreport{SAND20087593,
author = {Ross A. Bartlett and Daniel M. Dunlavy and Esteban J. Guillen and Timothy M. Shead},
title = {Trilinos CMake Evaluation},
institution = {Sandia National Laboratories},
year = {2008},
number = {SAND2008-7593}
}

 Heterogeneous Ensemble Classification, Dunlavy, D.M. & Gilpin, S.A., In Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis, (SAND2008-6109), pp. 33-35, Sandia National Laboratories, Albuquerque, NM and Livermore, CA, September 2008.
[Abstract] [BibTeX] [PDF]

Abstract: Recent results in solving classification problems indicate that the use of ensemble classifier models often leads to improved performance over using single classifier models. In this work, we discuss heterogeneous ensemble classifier models, where the member classifier models are not of the same model type. A discussion of the issues associated with creating such classifiers, along with a brief description of the new HEterogeneous Machine Learning Open Classification Kit (HEMLOCK), is presented. Results for a problem of text classification and several standard multiclass test problems illustrate the performance of heterogeneous ensemble classifiers. 
BibTeX:
@incollection{SAND20086109,
author = {Daniel M. Dunlavy and Sean A. Gilpin},
title = {Heterogeneous Ensemble Classification},
booktitle = {Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis},
publisher = {Sandia National Laboratories, Albuquerque, NM and Livermore, CA},
year = {2008},
number = {SAND2008-6109},
pages = {33--35}
}

 Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis, Brandt, J.M., Dunlavy, D.M. & Gentile, A.C., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2008-6109, 2008.
[Abstract] [BibTeX] [PDF]

Abstract: In this document, we report the proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis. This year's workshop focused on the data analysis capabilities and needs of the space systems, satellite, ground-based monitoring, and remote sensing communities. In addition to the extended abstracts of each presentation of the workshop, summaries of the discussion sessions and resultant recommendations of the workshop committee are given. 
BibTeX:
@techreport{SAND20086109a,
author = {James M. Brandt and Daniel M. Dunlavy and Ann C. Gentile},
title = {Proceedings of the 2008 Sandia Workshop on Data Mining and Data Analysis},
institution = {Sandia National Laboratories},
year = {2008},
number = {SAND2008-6109}
}

 Yucca Mountain LSN Archive Assistant, Basilico, J.D., Dunlavy, D.M., Verzi, S.J., Bauer, T.L. & Shaneyfelt, W., Sandia National Laboratories, Technical Report SAND2008-1622, March 2008.
[Abstract] [BibTeX] [PDF]

Abstract: This report describes the Licensing Support Network (LSN) Assistant, a set of tools for categorizing email messages and documents and for investigating and correcting existing archives of categorized email messages and documents. The two main tools in the LSN Assistant are the LSN Archive Assistant (LSNAA) tool for recategorizing manually labeled email messages and documents and the LSN Realtime Assistant (LSNRA) tool for categorizing new email messages and documents. This report focuses on the LSNAA tool. There are two main components of the LSNAA tool. The first is the Sandia Categorizer Framework, which is responsible for providing categorizations for documents in an archive and storing them in an appropriate Categorization Database. The second is the user interface, which primarily interacts with the Categorization Database, providing a way to find and correct categorization errors in the database. A procedure for applying the LSNAA tool and an example use case of the LSNAA tool applied to a set of email messages are provided. Performance results of the categorization model designed for this example use case are presented. 
BibTeX:
@techreport{SAND20081622,
author = {Justin D. Basilico and Daniel M. Dunlavy and Stephen J. Verzi and Travis L. Bauer and Wendy Shaneyfelt},
title = {Yucca Mountain LSN Archive Assistant},
institution = {Sandia National Laboratories},
year = {2008},
number = {SAND2008-1622}
}

 QCS: A System for Querying, Clustering and Summarizing Documents, Dunlavy, D.M., O'Leary, D.P., Conroy, J.M. & Schlesinger, J.D., Sandia National Laboratories, Technical Report SAND2006-5000, October 2006.
[Abstract] [BibTeX] [PDF]

Abstract: Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty in evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system, the Query, Cluster, Summarize (QCS) system, which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance by a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multi-document summarization, we developed a framework to extend them to the evaluation of each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for the document clustering, and a method coupling sentence trimming and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format. 
Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules. 
BibTeX:
@techreport{SAND20065000,
author = {Daniel M. Dunlavy and Dianne P. O'Leary and John M. Conroy and Judith D. Schlesinger},
title = {QCS: A System for Querying, Clustering and Summarizing Documents},
institution = {Sandia National Laboratories},
year = {2006},
number = {SAND2006-5000},
note = {newer version available}
}
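
The retrieval component above, Latent Semantic Indexing, amounts to a truncated SVD of the term-document matrix followed by cosine ranking in the reduced space. A toy NumPy sketch (the four-term, four-document matrix and the function name are invented for illustration; this is not QCS code):

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# Terms: cat, pet, stock, market; docs 0-1 are about pets, docs 2-3 about finance.
A = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 1.0],
              [0.0, 0.0, 1.0, 2.0]])

k = 2                                    # latent dimensions kept after truncation
U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs = (np.diag(s[:k]) @ Vt[:k]).T       # document coordinates in the LSI space

def lsi_rank(query_vec):
    """Project a term-space query into the LSI space and rank documents
    by cosine similarity (best match first)."""
    q = U[:, :k].T @ query_vec
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)
```

A query on "cat" ranks the two pet documents ahead of the finance documents, even though the query term appears in only one of them.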

 DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Users Manual, Eldred, M.S., Brown, S.L., Adams, B.M., Dunlavy, D.M., Gay, D.M., Swiler, L.P., Giunta, A.A., Hart, W.E., Watson, J.-P., Eddy, J.P., Griffin, J.D., Hough, P.D., Kolda, T.G., Martinez-Canales, M.L. & Williams, P.J., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2006-6637, October 2006.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. 
BibTeX:
@techreport{SAND20066637,
author = {Michael S. Eldred and Shannon L. Brown and Brian M. Adams and Daniel M. Dunlavy and David M. Gay and Laura P. Swiler and Anthony A. Giunta and William E. Hart and Jean-Paul Watson and John P. Eddy and Josh D. Griffin and Patty D. Hough and Tammy G. Kolda and Monica L. Martinez-Canales and Pamela J. Williams},
title = {DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Users Manual},
institution = {Sandia National Laboratories},
year = {2006},
number = {SAND2006-6637},
url = {http://www.cs.sandia.gov/DAKOTA/licensing/release/Users4.0.pdf}
}

 DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Developers Manual, Eldred, M.S., Brown, S.L., Adams, B.M., Dunlavy, D.M., Gay, D.M., Swiler, L.P., Giunta, A.A., Hart, W.E., Watson, J.-P., Eddy, J.P., Griffin, J.D., Hough, P.D., Kolda, T.G., Martinez-Canales, M.L. & Williams, P.J., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2006-4056, September 2006.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. 
BibTeX:
@techreport{SAND20064056,
author = {Michael S. Eldred and Shannon L. Brown and Brian M. Adams and Daniel M. Dunlavy and David M. Gay and Laura P. Swiler and Anthony A. Giunta and William E. Hart and Jean-Paul Watson and John P. Eddy and Josh D. Griffin and Patty D. Hough and Tammy G. Kolda and Monica L. Martinez-Canales and Pamela J. Williams},
title = {DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Developers Manual},
institution = {Sandia National Laboratories},
year = {2006},
number = {SAND2006-4056},
url = {http://www.cs.sandia.gov/DAKOTA/licensing/release/Developers4.0.pdf}
}

 DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Reference Manual, Eldred, M.S., Brown, S.L., Adams, B.M., Dunlavy, D.M., Gay, D.M., Swiler, L.P., Giunta, A.A., Hart, W.E., Watson, J.-P., Eddy, J.P., Griffin, J.D., Hough, P.D., Kolda, T.G., Martinez-Canales, M.L. & Williams, P.J., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2006-4055, September 2006.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: The DAKOTA (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. DAKOTA contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic finite element methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the DAKOTA toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. 
BibTeX:
@techreport{SAND20064055,
author = {Michael S. Eldred and Shannon L. Brown and Brian M. Adams and Daniel M. Dunlavy and David M. Gay and Laura P. Swiler and Anthony A. Giunta and William E. Hart and Jean-Paul Watson and John P. Eddy and Josh D. Griffin and Patty D. Hough and Tammy G. Kolda and Monica L. Martinez-Canales and Pamela J. Williams},
title = {DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0 Reference Manual},
institution = {Sandia National Laboratories},
year = {2006},
number = {SAND2006-4055},
url = {http://www.cs.sandia.gov/DAKOTA/licensing/release/Reference4.0.pdf}
}

 Multilinear Algebra for Analyzing Data with Multiple Linkages, Dunlavy, D.M., Kolda, T.G. & Kegelmeyer, W.P., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2006-2079, April 2006.
[Abstract] [BibTeX] [PDF]

Abstract: Link analysis typically focuses on a single type of connection, e.g., two journal papers are linked because they are written by the same author. However, often we want to analyze data that has multiple linkages between objects, e.g., two papers may have the same keywords and one may cite the other. The goal of this paper is to show that multilinear algebra provides a tool for multi-link analysis. We analyze five years of publication data from journals published by the Society for Industrial and Applied Mathematics. We explore how papers can be grouped in the context of multiple link types using a tensor to represent all the links between them. A PARAFAC decomposition on the resulting tensor yields information similar to the SVD of a standard adjacency matrix. We show how the PARAFAC decomposition can be used to understand the structure of the document space and define paper-paper similarities based on multiple linkages. Examples are presented where the decomposed tensor data is used to find papers similar to a body of work (e.g., related by topic or similar to a particular author's papers), find related authors using linkages other than explicit coauthorship or citations, distinguish between papers written by different authors with the same name, and predict the journal in which a paper was published. 
BibTeX:
@techreport{SAND20062079,
author = {Daniel M. Dunlavy and Tamara G. Kolda and W. Philip Kegelmeyer},
title = {Multilinear Algebra for Analyzing Data with Multiple Linkages},
institution = {Sandia National Laboratories},
year = {2006},
number = {SAND2006-2079}
}

 Homotopy Optimization Methods for Global Optimization, Dunlavy, D.M. & O'Leary, D.P., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2005-7495, December 2005.
[Abstract] [BibTeX] [PDF]

Abstract: We define a new method for global optimization, the Homotopy Optimization Method (HOM). This method differs from previous homotopy and continuation methods in that its aim is to find a minimizer for each of a set of values of the homotopy parameter, rather than to follow a path of minimizers. We define a second method, called HOPE, by allowing HOM to follow an ensemble of points obtained by perturbation of previous ones. We relate this new method to standard methods such as simulated annealing and show under what circumstances it is superior. We present results of extensive numerical experiments demonstrating performance of HOM and HOPE. 
BibTeX:
@techreport{SAND20057495,
author = {Daniel M. Dunlavy and Dianne P. O'Leary},
title = {Homotopy Optimization Methods for Global Optimization},
institution = {Sandia National Laboratories},
year = {2005},
number = {SAND2005-7495}
}
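
The HOM/HOPE idea above can be sketched in a few lines: define the homotopy h(x, lam) = (1 - lam)*g(x) + lam*f(x) from an easy template g to the target f, and at each lam descend from the previous minimizers plus perturbed copies, keeping an ensemble of the best points. A toy one-dimensional Python sketch (the function name, parameters, and test function are invented for illustration, and plain gradient descent stands in for the local minimizer):

```python
import random

def hope_minimize(f, fprime, g, gprime, x0, steps=20, n_perturb=5,
                  lr=0.01, descent_iters=200, seed=0):
    """Toy 1-D sketch of homotopy optimization with perturbed ensembles (HOPE).
    h(x, lam) = (1 - lam)*g(x) + lam*f(x) deforms an easy template g into the
    target f; at each lam we run gradient descent from the current ensemble
    and from random perturbations of it, keeping the best few minimizers."""
    rng = random.Random(seed)

    def h(x, lam):
        return (1.0 - lam) * g(x) + lam * f(x)

    def descend(x, lam):
        for _ in range(descent_iters):
            x -= lr * ((1.0 - lam) * gprime(x) + lam * fprime(x))
        return x

    ensemble = [x0]
    for k in range(1, steps + 1):
        lam = k / steps
        starts = ensemble + [x + rng.gauss(0.0, 0.3)
                             for x in ensemble for _ in range(n_perturb)]
        minimizers = sorted((descend(x, lam) for x in starts),
                            key=lambda x: h(x, lam))
        ensemble = minimizers[:3]       # keep the best few points
    return min(ensemble, key=f)
```

With target f(x) = (x^2 - 1)^2 + 0.3x (global minimizer near x = -1.04) and template g(x) = x^2, the deformation steers the ensemble into the global basin even when started at x0 = 2.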

 Structure Preserving Algorithms for Perplectic Eigenproblems, Mackey, D.S., Mackey, N. & Dunlavy, D.M., Manchester Centre for Computational Mathematics, Technical Report Numerical Analysis Report No. 427, May 2003.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: Structured real canonical forms for matrices in R^(n x n) that are symmetric or skew-symmetric about the antidiagonal as well as the main diagonal are presented, and Jacobi algorithms for solving the complete eigenproblem for three of these four classes of matrices are developed. Based on the direct solution of 4 x 4 subproblems constructed via quaternions, the algorithms calculate structured orthogonal bases for the invariant subspaces of the associated matrix. In addition to preserving structure, these methods are inherently parallelizable, numerically stable, and show asymptotic quadratic convergence. 
BibTeX:
@techreport{MaMaDu03,
author = {D. Steven Mackey and Niloufer Mackey and Daniel M. Dunlavy},
title = {Structure Preserving Algorithms for Perplectic Eigenproblems},
institution = {Manchester Centre for Computational Mathematics},
year = {2003},
number = {Numerical Analysis Report No. 427},
note = {newer version available},
url = {http://www.maths.man.ac.uk/~nareports/narep427.pdf}
}
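
A matrix that is symmetric about the antidiagonal (persymmetric) satisfies J A^T J = A, where J is the exchange (flip) matrix with ones on the antidiagonal. A quick NumPy check of that characterization (an illustration of the structure only, not the paper's Jacobi algorithms; both function names are invented here):

```python
import numpy as np

def is_persymmetric(A, tol=1e-12):
    """True if A is symmetric about its antidiagonal: J @ A.T @ J == A,
    where J is the exchange (flip) matrix with ones on the antidiagonal."""
    n = A.shape[0]
    J = np.fliplr(np.eye(n))
    return np.allclose(J @ A.T @ J, A, atol=tol)

def random_persymmetric(n, seed=0):
    """Build a random persymmetric matrix by symmetrizing about the antidiagonal."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, n))
    J = np.fliplr(np.eye(n))
    return 0.5 * (B + J @ B.T @ J)
```

Entrywise, the condition reads A[i, j] = A[n-1-j, n-1-i], which is exactly the mirror symmetry across the antidiagonal that the canonical forms in the report exploit.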

 Numerical Steady-State Solutions of Non-Linear DAE's Arising in RF Communication Circuit Design, Dunlavy, D., Joo, S., Lin, R., Marcia, R., Minut, A. & Sun, J., Institute for Mathematics and its Applications, Technical Report IMA Preprint Series 17521, February 2001.
[Abstract] [BibTeX] [URL] [PDF]

Abstract: Large systems of coupled nonlinear differential algebraic equations (DAEs) arise naturally in application areas such as the design of radio-frequency integrated circuits. The steady-state response of a nonlinear system to periodic or quasiperiodic stimulus is of primary interest to a designer because certain aspects of system performance are easier to characterize and verify in steady state. For example, noise, distortion, and blocking are best measured when a circuit is in this state. The system of equations generated in circuit design has the form f(v(t)) + d/dt q(v(t)) - b(t) = 0, where m is the number of circuit nodes excluding the reference, q(v(t)) is the m-vector of sums of capacitor charges at each node, f(v(t)) is the m-vector of sums of resistor currents at each node, b(t) is the m-vector of input currents, and v(t) is the m-vector of node voltages. "Closed form" solutions to these DAEs are extremely difficult, if not impossible, to obtain because of the size of the problem and the complexity of the nonlinear models. Computing the solutions numerically is a highly effective alternative to computing them analytically. 
BibTeX:
@techreport{DuJoLiMaMiSu01,
author = {Danny Dunlavy and Sookhyung Joo and Runchang Lin and Roummel Marcia and Aurelia Minut and Jianzhong Sun},
title = {Numerical Steady-State Solutions of Non-Linear DAE's Arising in RF Communication Circuit Design},
institution = {Institute for Mathematics and its Applications},
year = {2001},
number = {IMA Preprint Series 17521},
url = {http://www.ima.umn.edu/preprints/feb01/17521.pdf}
}
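
The abstract's final point, computing the steady state numerically, can be illustrated in the simplest possible setting: a single-node linear RC circuit, where f(v) = v/R, q(v) = Cv, and b(t) is a sinusoidal current source. The brute-force sketch below simply time-steps with backward Euler until the transient decays (an illustration only; the component values are invented, and practical RF codes use methods such as shooting or harmonic balance instead of long transient runs):

```python
import math

# One-node RC circuit driven by a sinusoidal current source:
#   C * dv/dt + v/R = b(t),  with  b(t) = I0 * cos(w*t)
# Backward Euler step:  v_new = (C*v/dt + b(t_new)) / (C/dt + 1/R)
R, C, I0, w = 1.0, 1.0, 1.0, 2.0 * math.pi
dt = 1.0e-3
period = 2.0 * math.pi / w

v, t, history = 0.0, 0.0, []
for _ in range(int(round(10 * period / dt))):   # 10 periods; transient decays
    t += dt
    v = (C * v / dt + I0 * math.cos(w * t)) / (C / dt + 1.0 / R)
    if t > 9 * period:                          # record the final period
        history.append(v)

amplitude = max(abs(x) for x in history)
# For this linear circuit the analytic steady-state amplitude is
# I0 * R / sqrt(1 + (w*R*C)^2), which the computed value matches closely.
```

For the nonlinear multi-node systems in the report, each backward Euler step instead requires a Newton solve of the resulting algebraic system, which is where the problem size and model complexity mentioned above bite.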

Expository Articles, Etc. 
 Mathematical Challenges in Cybersecurity, Dunlavy, D.M., Hendrickson, B. & Kolda, T.G., Sandia National Laboratories, Albuquerque, NM and Livermore, CA, Technical Report SAND2009-0805, February 2009.
[Abstract] [BibTeX] [PDF]

Abstract: This white paper is a response to a recent report on cybersecurity submitted to the U.S. Department of Energy (Catlett, 2008). We discuss what we see as some of the major mathematical challenges in cybersecurity. The document is not intended to be comprehensive, but rather the articulation of a few key themes. We have organized our thoughts into three challenge areas: modeling large-scale networks, threat discovery, and network dynamics. 
BibTeX:
@techreport{SAND20090805,
author = {Daniel M. Dunlavy and Bruce Hendrickson and Tamara G. Kolda},
title = {Mathematical Challenges in Cybersecurity},
institution = {Sandia National Laboratories},
year = {2009},
number = {SAND2009-0805}
}

Dissertation and Thesis 
 Homotopy Optimization Methods and Protein Structure Prediction, Dunlavy, D.M., Ph.D. dissertation, AMSC Program, University of Maryland, August 2005.
[Abstract] [BibTeX] [PDF]

Abstract: The focus of this dissertation is a new method for solving unconstrained minimization problems: homotopy optimization using perturbations and ensembles (HOPE). HOPE is a homotopy optimization method that finds a sequence of minimizers of a homotopy function that maps a template function to the target function, the function from our minimization problem. To increase the likelihood of finding a global minimizer, points in the sequence are perturbed and used as starting points to find other minimizers. Points in the resulting ensemble of minimizers are used as starting points to find minimizers of the homotopy function as it deforms the template function into the target function. We show that certain choices of the parameters used in HOPE lead to instances of existing methods: probability-one homotopy methods, stochastic search methods, and simulated annealing. We use these relations and further analysis to demonstrate the convergence properties of HOPE. The development of HOPE was motivated by the protein folding problem, the problem of predicting the structure of a protein as it exists in nature, given its amino acid sequence. However, we demonstrate that HOPE is also successful as a general-purpose minimization method for nonconvex functions. Numerical experiments performed to test HOPE include solving several standard test problems and the protein folding problem using two different protein models. In the first model, proteins are modeled as chains of charged particles in two dimensions. The second is a backbone protein model, where the particles represent amino acids, each corresponding to a hydrophobic, hydrophilic, or neutral residue. In most of these experiments, standard homotopy functions are used in HOPE. Additionally, several new homotopy functions are introduced for solving the protein folding problems to demonstrate how HOPE can be used to exploit the properties or structure of particular problems. 
Results of experiments demonstrate that HOPE outperforms several methods often used for solving unconstrained minimization problems: a quasi-Newton method with BFGS Hessian update, a globally convergent variant of Newton's method, and ensemble-based simulated annealing. 
BibTeX:
@phdthesis{Du05,
author = {Daniel M. Dunlavy},
title = {Homotopy Optimization Methods and Protein Structure Prediction},
school = {AMSC Program, University of Maryland},
year = {2005}
}

 QCS: An Information Retrieval System for Improving Efficiency in Scientific Literature Searches, Dunlavy, D.M., M.S. thesis, AMSC Program, University of Maryland, August 2003.
[BibTeX] [PDF]

BibTeX:
@mastersthesis{Du03,
author = {Daniel M. Dunlavy},
title = {QCS: An Information Retrieval System for Improving Efficiency in Scientific Literature Searches},
school = {AMSC Program, University of Maryland},
year = {2003}
}

Created by JabRef on 11/16/2010.

