Title: Achieving Human Performance for Multi-Lingual Multi-Document Summarization

Speaker: John Conroy, IDA Center for Computing Sciences

Date/Time: Tuesday, April 10, 2007, 9:00am – 10:00am

Location: CSRI Building/Room 90 (Sandia NM)

Brief Abstract: Given a group of approximately 10 topically related documents in English and Arabic, compose a 100-word summary of that topic that captures the important people, places, and details surrounding the topic event. This was the task of the 2005 and 2006 Multi-Lingual Summarization Evaluation. In this talk, I will describe a computational approach to this problem that achieves human-level performance as measured by both automatic and human evaluation.
The approach consists of three stages: a linguistic step that identifies and shortens the original sentences, a statistical step that identifies the sentences with the largest expected number of terms that would appear in a human abstract, and a linear algebraic step that selects a non-redundant subset of those sentences with good coverage of the important terms.
See http://research.microsoft.com/~lucyv/MSE2006.htm for more information about the Multilingual Summarization Evaluation.
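As a rough illustration of the second and third stages described in the abstract, the sketch below scores sentences by an expected term count and then picks a non-redundant subset. It is not the speaker's implementation. The abstract does not say how term likelihoods are estimated or which linear algebraic method is used, so two assumptions are made here: the hypothetical `term_weights` function stands in for the likelihood estimate by using the fraction of sentences that contain each term, and the selection step is a greedy pivoted Gram-Schmidt pass over a term-by-sentence matrix. The linguistic shortening stage is omitted. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def term_weights(sentences):
    """Hypothetical stand-in for P(term appears in a human abstract):
    the fraction of input sentences that contain the term."""
    counts = Counter(t for s in sentences for t in set(tokenize(s)))
    return {t: c / len(sentences) for t, c in counts.items()}


def select_sentences(sentences, k):
    """Return indices of up to k sentences chosen by greedy pivoted
    Gram-Schmidt on a weighted term-by-sentence matrix (an assumption;
    the abstract does not name the algebraic method)."""
    weights = term_weights(sentences)
    terms = sorted(weights)
    # Column j holds sqrt(weight) for each term present in sentence j, so
    # its squared norm is the sentence's expected number of abstract terms.
    cols = []
    for s in sentences:
        present = set(tokenize(s))
        cols.append([math.sqrt(weights[t]) if t in present else 0.0
                     for t in terms])

    chosen = []
    for _ in range(min(k, len(sentences))):
        sq_norms = [sum(x * x for x in c) for c in cols]
        j = max(range(len(cols)), key=lambda i: sq_norms[i])
        if sq_norms[j] < 1e-12:
            break  # nothing new left to cover
        chosen.append(j)
        norm = math.sqrt(sq_norms[j])
        q = [x / norm for x in cols[j]]
        # Remove the chosen direction from every column, so sentences that
        # repeat already-covered terms lose score.
        for c in cols:
            dot = sum(a * b for a, b in zip(c, q))
            for i in range(len(c)):
                c[i] -= dot * q[i]
    return chosen


if __name__ == "__main__":
    docs = [
        "The earthquake struck the coastal city at dawn",
        "The earthquake struck the coastal city at dawn",
        "Rescue teams arrived from neighboring countries",
        "Officials reported damage",
    ]
    for idx in select_sentences(docs, 2):
        print(docs[idx])
```

Because each chosen sentence is projected out of the remaining columns, an exact duplicate of a selected sentence is left with a zero residual and is never picked a second time, which is the sense of "non-redundant" illustrated here.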

CSRI POC: Danny Dunlavy, (505) 284-6092

