Multilingual Statistical News Summarisation: Preliminary Experiments with English

In this paper we present a generic approach for summarising multilingual news clusters such as the ones produced by the Europe Media Monitor (EMM) system. It is generic because it uses robust statistical techniques to perform the summarisation step, and its multilinguality is inherited from the multilingual entity disambiguation system used to build the source representation. We ran preliminary experiments with the TAC 2008 data, an English corpus for summarisation research, and we obtained promising improvements over a summarisation system ranked in the top 20% at the TAC 2008 competition.

Keywords: Text Summarization; Multilingual Text Mining; Entity Disambiguation; Latent Semantic Analysis

Year: 2009

Authors of this publication:

Josef Steinberger


Josef is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in media monitoring and analysis, mainly automatic text summarisation, sentiment analysis and coreference resolution.

