Text-Mining Research Group

Multilingual Statistical News Summarization

In this chapter we present a generic approach for summarizing clusters of multilingual news articles such as the ones produced by the Europe Media Monitor (EMM) system. Our approach uses robust statistical techniques as well as multilingual tools for named entity recognition and disambiguation to produce entity-centered summaries. We ran experiments on the TAC 2008 and 2009 data sets (English corpora for summarization research) and obtained very promising results: at TAC 2009 our runs ranked first for linguistic quality and second for overall responsiveness. We also ran a small-scale evaluation on languages other than English, which demonstrates the multilinguality of our approach and also provides interesting evidence against the pervasive assumption “if it works for English, it works for any language”. Finally, we present an online system currently under development, which will eventually incorporate all the elements of the summarization approach discussed here, and we show sample output summaries in various languages.

Keywords: News, summarisation, multilingual

Year: 2013

Journal ISSN: 2192-032X

Authors of this publication:

Mijail Alexandrov Kabadjov

Josef Steinberger

E-mail: jstein@kiv.zcu.cz

Josef is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. His research interests are media monitoring and analysis, mainly automatic text summarisation, sentiment analysis and coreference resolution.

Ralf Steinberger

Related Projects:

Automatic Text Summarisation
Authors: Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek
Description: Automatic text summarisation using various text mining methods, mainly Latent Semantic Analysis (LSA).