|
Contact
Daniel M. Dunlavy
Senior Member of Technical Staff
dmdunla@sandia.gov
(505) 284-6092
Related Links
Department
Center
CSRI
|
Publications
|
Journal Articles
-
"QCS: A System for Querying, Clustering and Summarizing Documents."
Daniel M. Dunlavy, Dianne P. O'Leary, John M. Conroy, and Judith D. Schlesinger.
Information Processing & Management,
in press (accepted October 2006).
[Preprint PDF]
- Abstract:
Information retrieval systems consist of many complicated components.
Research and development of such systems is often hampered by the
difficulty in evaluating how each particular component would behave
across multiple systems. We present a novel hybrid information
retrieval system---the Query, Cluster, Summarize (QCS) system---which
is portable, modular, and permits experimentation with different
instantiations of each of the constituent text analysis components.
Most importantly, the combination of the three types of components
methods in the QCS design improves retrievals by providing users more
focused information organized by topic.
We demonstrate the improved performance by a series of experiments
using standard test sets from the Document Understanding Conferences
(DUC) along with the best known automatic metric for summarization
system evaluation, ROUGE. Although the DUC data and evaluations
were originally designed to test multidocument summarization, we
developed a framework to extend it to the task of evaluation for each
of the three components: query, clustering, and summarization. Under
this framework, we then demonstrate that the QCS system (end-to-end)
achieves performance as good as or better than the best summarization
engines.
Given a query, QCS retrieves relevant documents, separates the
retrieved documents into topic clusters, and creates a single summary
for each cluster. In the current implementation, Latent Semantic
Indexing is used for retrieval, generalized spherical k-means is used
for the document clustering, and a method coupling sentence
"trimming," and a hidden Markov model, followed by a pivoted QR
decomposition, is used to create a single extract summary for each
cluster. The user interface is designed to provide access to detailed
information in a compact and useful format.
Our system demonstrates the feasibility of assembling an effective IR
system from existing software libraries, the usefulness of the
modularity of the design, and the value of this particular combination
of modules.
- Key words:
information retrieval, latent semantic indexing, clustering,
summarization, text processing, sentence trimming
-
"HOPE: A Homotopy Optimization Method for Protein Structure Prediction."
Daniel M. Dunlavy, Dianne P. O'Leary, Dmitri Klimov and D. Thirumalai.
Journal of Computational Biology,
12(10):1275-1288, December 2005.
[PDF]
- Abstract: We use a homotopy optimization method,
HOPE, to minimize the potential energy associated with a protein
model. The method uses the minimum energy conformation of one protein
as a template to predict the lowest energy structure of a query
sequence. This objective is achieved by following a path of
conformations determined by a homotopy between the potential energy
functions for the two proteins. Ensembles of solutions are produced by
perturbing conformations along the path, increasing the likelihood of
predicting correct structures. Successful results are presented for
pairs of homologous proteins, where HOPE is compared to a variant of
Newton's method and to simulated annealing.
- Key words: protein structure prediction, energy
minimization, global optimization, homotopy method, simulated
annealing.
-
"Structure Preserving Algorithms for Perplectic Eigenproblems."
D. Steven Mackey, Niloufer Mackey, and Daniel M. Dunlavy.
Electronic Journal of Linear Algebra,
13:10-39, February 2005.
[PDF]
[Supplemental Material]
- Abstract: Structured real canonical forms for
matrices in R^{n x n} that are symmetric or skewsymmetric about the
anti-diagonal as well as the main diagonal are presented, and Jacobi
algorithms for solving the complete eigenproblem for three of these
four classes of matrices are developed. Based on the direct solution
of 4 x 4 subproblems constructed via quaternions, the algorithms
calculate structured orthogonal bases for the invariant subspaces of
the associated matrix. In addition to preserving structure, these
methods are inherently parallelizable, numerically stable, and show
asymptotic quadratic convergence.
- Key words: Canonical form, Eigenvalues,
Eigenvectors, Jacobi method, Double structure preserving, Symmetric,
Persymmetric, Skew-symmetric, Perskew-symmetric, Centrosymmetric,
Perplectic, Quaternion, Tensor product, Lie algebra, Jordan algebra,
Bilinear form.
Top of page
Conference Proceedings
-
"Formulations for Surrogate-Based Optimization with Data Fit, Multifidelity, and Reduced-Order Models."
Michael S. Eldred and Daniel M Dunlavy. AIAA-2006-7117,
Proceedings of the 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference,
September 2006.
[PDF]
- Abstract: Surrogate-based optimization (SBO)
methods have become established as effective techniques for
engineering design problems through their ability to tame
nonsmoothness and reduce computational expense. Possible surrogate
modeling techniques include data fits (local, multipoint, or global),
multifidelity model hierarchies, and reduced-order models, and each of
these types has unique features when employed within SBO. This paper
explores a number of SBO algorithmic variations and their effect for
different surrogate modeling cases. First, general facilities for
constraint management are explored through approximate subproblem
formulations (e.g., direct surrogate), constraint relaxation
techniques (e.g., homotopy), merit function selections (e.g.,
augmented Lagrangian), and iterate acceptance logic selections
(e.g., filter methods). Second, techniques specialized to particular
surrogate types are described. Computational results are presented for
sets of algebraic test problems and an engineering design application
solved using the DAKOTA software.
-
"From TREC to DUC to TREC Again."
John M. Conroy, Dianne P. O'Leary, and Daniel M. Dunlavy.
Proceedings of the Twelfth Text Retrieval Conference (TREC),
November 2003.
[PDF]
- Abstract: The Document Understanding Conference
(DUC) uses TREC data as a test bed for algorithms for single and
multiple document summarization. For the 2003 DUC task of choosing
relevant and novel sentences, we tested a system based on a Hidden
Markov Model (HMM). In this work, we use variations of this system on
the tasks of the TREC Novelty Track for finding relevant and new
sentences.
-
"Performance of a Three-Stage System for Multi-Document Summarization."
Daniel M. Dunlavy, John M. Conroy, Judith D. Schlesinger, Sarah A. Goodman, Mary Ellen Okurowski, Dianne P. O'Leary, and Han van Halteren.
Proceedings of the Document Understanding Conference (DUC),
June 2003.
[PS]
- Abstract: Our participation in DUC 2003 was
limited to Tasks 2, 3, and 4. Although the tasks differed slightly in
their goals, we applied the same approach in each case: preprocess the
data for input to our system, apply our single-document and
multi-document summarization algorithms, post-process the data for
DUC evaluation. We did not use the topic descriptions for Task 2 or
the viewpoint descriptions for Task 3, and used only the novel
sentences for Task 4. The preprocessing of the data for our needs
consisted of term identification, part-of-speech (POS) tagging,
sentence boundary detection and SGML DTD processing. With the exception
of sentence boundary detection for Task 4 (the test data was
sentence-delimited using SGML tags), each of these preprocessing tasks
was performed on all of the documents. The summarization algorithms
were enhanced versions of those presented by members of our group in
the past DUC evaluations (Conroy et al., 2001; Schlesinger et al.,
2002). Previous post-processing consisted of removing lead adverbs
such as "And" or "But" to make our summaries flow more easily. For
DUC 2003, we added more extensive editing, eliminating part or all
of selected sentences.
-
"QCS: A Tool for Querying, Clustering, and Summarizing Documents."
Daniel M. Dunlavy, John M. Conroy and Dianne P. O'Leary.
Proceedings of the HLT-NAACL Conference,
June 2003.
[PDF]
- Abstract: The QCS information retrieval (IR)
system is presented as a tool for querying, clustering, and
summarizing document sets. QCS has been developed as a modular
development framework, and thus facilitates the inclusion of new
technologies targeting these three IR tasks. Details of the system
architecture, the QCS interface, and preliminary results are
presented.
Top of page
Technical Reports
-
"QCS: A System for Querying, Clustering and Summarizing Documents."
Daniel M. Dunlavy, Dianne P. O'Leary, John M. Conroy, and Judith D. Schlesinger.
Sandia National Laboratories Technical Report, SAND2006-5000,
October 2006.
[PDF]
- Newer version available
-
"DAKOTA, A Multilevel Parallel Object-Oriented Framework for Design Optimization, Parameter Estimation, Uncertainty Quantification, and Sensitivity Analysis: Version 4.0"
Eldred, M.S., Brown, S.L., Adams, B.M., Dunlavy, D.M., Gay, D.M., Swiler, L.P., Giunta, A.A., Hart, W.E., Watson, J.-P., Eddy, J.P., Griffin, J.D., Hough, P.D., Kolda, T.G., Martinez-Canales, M.L. and Williams, P.J.
Sandia National Laboratories Technical Reports, September-October 2006.
[Users Manual (SAND2006-6637)]
[Developers Manual (SAND2006-4056)]
[Reference Manual (SAND2006-4055)]
- Abstract:
The DAKOTA (Design Analysis Kit for Optimization and Terascale
Applications) toolkit provides a exible and extensible interface
between simulation codes and iterative analysis methods. DAKOTA
contains algorithms for optimization with gradient and
nongradient-based methods; uncertainty quantication with sampling,
reliability, and stochastic nite element methods; parameter estimation
with nonlinear least squares methods; and sensitivity/variance
analysis with design of experiments and parameter study methods. These
capabilities may be used on their own or as components within advanced
strategies such as surrogate-based optimization, mixed integer
nonlinear programming, or optimization under uncertainty. By employing
object-oriented design to implement abstractions of the key components
required for iterative systems analyses, the DAKOTA toolkit provides a
exible and extensible problem-solving environment for design and
performance analysis of computational models on high performance
computers.
-
"Multilinear Algebra for Analyzing Data with Multiple Linkages."
Daniel M. Dunlavy, Tamara G. Kolda, and W. Philip Kegelmeyer.
Sandia National Laboratories Technical Report, SAND2006-2079,
April 2006.
[PDF]
- Abstract: Link analysis typically focuses on a single
type of connection, e.g., two journal papers are linked because they
are written by the same author. However, often we want to analyze data
that has multiple linkages between objects, e.g., two papers may have
the same keywords and one may cite the other. The goal of this paper
is to show that multilinear algebra provides a tool for multi-link
analysis. We analyze five years of publication data from journals
published by the Society for Industrial and Applied Mathematics. We
explore how papers can be grouped in the context of multiple link
types using a tensor to represent all the links between them. A
PARAFAC decomposition on the resulting tensor yields information
similar to the SVD decomposition of a standard adjacency matrix. We
show how the PARAFAC decomposition can be used to understand the
structure of the document space and define paper-paper similarities
based on multiple linkages. Examples are presented where the
decomposed tensor data is used to find papers similar to a body of
work (e.g., related by topic or similar to a particular author's
papers), find related authors using linkages other than explicit
co-authorship or citations, distinguish between papers written by
different authors with the same name, and predict the journal in which
a paper was published.
-
"Homotopy Optimization Methods for Global Optimization."
Daniel M. Dunlavy and Dianne P. O'Leary.
Sandia National Laboratories Technical Report, SAND2005-7495,
December 2005.
[PDF]
- Abstract:
We define a new method for global optimization, the Homotopy
Optimization Method (HOM). This method differs from previous homotopy
and continuation methods in that its aim is to find a minimizer for
each of a set of values of the homotopy parameter, rather than to
follow a path of minimizers. We define a second method, called HOPE,
by allowing HOM to follow an ensemble of points obtained by
perturbation of previous ones. We relate this new method to standard
methods such as simulated annealing and show under what circumstances
it is superior. We present results of extensive numerical experiments
demonstrating performance of HOM and HOPE.
-
"Homotopy Optimization Methods and Protein Structure Prediction."
Daniel M. Dunlavy.
Ph.D. Dissertation, AMSC Program, University of Maryland,
August 2005.
[Request]
- Abstract:
The focus of this dissertation is a new method for solving
unconstrained minimization problems---homotopy optimization using
perturbations and ensembles (HOPE). HOPE is a homotopy optimization
method that finds a sequence of minimizers of a homotopy function that
maps a template function to the target function, the function from our
minimization problem. To increase the likelihood of finding a global
minimizer, points in the sequence are perturbed and used as starting
points to find other minimizers. Points in the resulting ensemble of
minimizers are used as starting points to find minimizers of the
homotopy function as it deforms the template function into the target
function.
We show that certain choices of the parameters used in HOPE lead to
instances of existing methods: probability-one homotopy methods,
stochastic search methods, and simulated annealing. We use these
relations and further analysis to demonstrate the convergence
properties of HOPE.
The development of HOPE was motivated by the protein folding problem,
the problem of predicting the structure of a protein as it exists in
nature, given its amino acid sequence. However, we demonstrate that
HOPE is also successful as a general purpose minimization method for
nonconvex functions.
Numerical experiments performed to test HOPE include solving several
standard test problems and the protein folding problem using two
different protein models. In the first model, proteins are modeled as
chains of charged particles in two dimensions. The second is a
backbone protein model, where the particles represent amino acids,
each corresponding to a hydrophobic, hydrophilic, or neutral
residue. In most of these experiments, standard homotopy functions are
used in HOPE. Additionally, several new homotopy functions are
introduced for solving the protein folding problems to demonstrate how
HOPE can be used to exploit the properties or structure of particular
problems.
Results of experiments demonstrate that HOPE outperforms several
methods often used for solving unconstrained minimization problems---a
quasi-Newton method with BFGS Hessian update, a globally convergent
variant of Newton's method, and ensemble-based simulated annealing.
-
"QCS: An Information Retrieval System for Improving Efficiency in Scientific Literature Searches."
Daniel M. Dunlavy.
M.S. Scholarly Paper,
Applied Mathematics and Scientific Computation Program, University of Maryland,
December 2003.
[PDF]
- Abstract:
Conducting scientific research most often involves a search through
existing literature in order to avoid repeating research efforts,
review methods already developed for solving a problem, gain a better
understanding of a problem, etc. Typically, this search is performed
using the Internet, which is a convenient portal to various databases
of books, journal articles, technical reports, preprints, etc. The
Query, Cluster, Summarize (QCS) information retrieval system is
presented in an attempt to improve efficiency in these literature
searches. Given a query, QCS retrieve documents relevant to the
query, separates the retrieved documents into topic clusters, and
creates a single summary for each of topic clusters. Latent Semantic
Indexing is used retrieval, generalized spherical k-means (gmeans)
is used for the document clustering, and a hidden Markov model coupled
with a pivoted QR decomposition is used to create a single extract
summary for each topic cluster. Algorithm and implementation details
of the current version of the QCS system, QCS v1.0, are presented, and
a description of the user interface to the system is discussed.
Examples of the use of QCS v1.0 are presented using data from the
Document Understanding Conferences, a conference series dedicated to
furthering progress in the area of automatic summarization.
-
"Structure Preserving Algorithms for Perplectic Eigenproblems."
D. Steven Mackey, Niloufer Mackey, and Daniel M. Dunlavy.
Numerical Analysis Report No. 427,
Manchester Centre for Computational Mathematics, Manchester, England,
May 2003.
[PDF]
- Newer version available
-
"Numerical Steady-State Solutions of Non-Linear DAE's Arising in RF Communication Circuit Design."
Danny Dunlavy, Sookhyung Joo, Runchang Lin, Roummel Marcia, Aurelia Minut, and Jianzhong Sun (Robert Melville, mentor).
IMA Preprint Series 1752-1,
February 2001.
[PDF]
- Abstract:
Large systems of coupled non-linear differential algebraic equations (DAE) arise naturally in
applications areas like in the design of radio-frequency integrated circuits. The steady-state
response of a non-linear system to periodic or quasi-periodic stimulus is of primary interest to
a designer because certain aspects of system performance are easier to characterize and verify
in steady state. For example: noise, distortion, blocking are best measured when a circuit is in
this state. The system of equations generated in circuit design has the following form,
"f(v(t)) + d/dt q(v(t)) - b(t) = 0"
where m is the number of circuit nodes excluding the reference, q(v(t)) is the m-vector of
sums of capacitor charges at each node, f(v(t)) is the m-vector of sums of resistor currents
at each node, b(t) is the m-vector of input currents, and v(t) is the m-vector of node
voltages. "Closed form" solutions to these DAE's are extremely difficult, if not impossible, to
obtain because of the size of the problem and the complexity of non-linear models. Computing
the solutions numerically is a highly effective alternative to computing the solutions analytically.
Top of page
|
|