Title: Using Visualization for Relevancy Feedback Tuning of Text Analysis Algorithms
Speaker: Pat Crossno, Sandia National Laboratories
Date/Time: Thursday, April 17, 2008, 2:00-3:00pm
Location: CSRI Building, Room 90 (Sandia NM)

Brief Abstract: The volume of data contained in textual form is enormous. Automated and scalable methods are needed to evaluate the contents of document collections without reading them. The ParaText project at Sandia National Laboratories is creating a scalable text analysis engine that uses statistical methods, such as Latent Semantic Analysis (LSA), to evaluate the concepts found within a large corpus of documents. Once LSA has extracted the concepts and the relationships between documents, the corpus can be explored interactively through a visual application in which documents are grouped by concept within a landscape metaphor.

As we develop ParaText, we are using visualization to assess the impact of various algorithmic choices on the relevancy of documents returned by queries to the engine (i.e., we are assessing how the document-concept relationships change as parameter values change). We have created a visual analytics tool, LSAView, for presenting statistical information and correlations. LSAView uses multiple linked views of document-similarity graphs and the difference matrices between them to enable exploration of various configurations. LSAView is a work in progress, so we are just starting to use it to evaluate questions about how altering the statistical bias of our matrices impacts selection retrieval.

CSRI POC: Daniel Dunlavy, (505) 284-6092
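
Illustrative sketch (not part of the talk): the comparison described in the abstract, document-similarity matrices from two LSA configurations and the difference matrix between them, can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions and is not ParaText or LSAView code. The function name `lsa_doc_similarity`, the toy term-by-document matrix, the use of NumPy, and the choice of the truncation rank as the parameter being varied are all assumptions made for illustration.

```python
import numpy as np


def lsa_doc_similarity(term_doc, rank):
    """Cosine similarity between documents in a rank-`rank` LSA concept space.

    `term_doc` has one row per term and one column per document.
    (Hypothetical helper written for this sketch.)
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep the `rank` largest singular values; each row is one document
    # expressed in concept coordinates.
    docs = (s[:rank, None] * vt[:rank]).T
    # Normalize rows so the dot product below is a cosine similarity.
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
    unit = docs / norms
    return unit @ unit.T


# Toy term-by-document count matrix (made-up data, 5 terms x 4 documents).
counts = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

# Two configurations that differ only in the number of retained concepts.
sim_rank2 = lsa_doc_similarity(counts, rank=2)
sim_rank3 = lsa_doc_similarity(counts, rank=3)

# Difference matrix: entry (i, j) shows how the similarity between
# documents i and j shifts when the configuration changes.
difference = sim_rank3 - sim_rank2
print(np.round(difference, 3))
```

Large entries in `difference` mark document pairs whose relationship is sensitive to the parameter change, which is the kind of pattern a linked-view tool would surface visually. Other parameters, such as term weighting, could be varied in the same way.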