
Multilingual Statistical News Summarization
In this chapter we present a generic approach for summarizing clusters of multilingual news articles such as the ones produced by the Europe Media Monitor (EMM) system. Our approach uses robust statistical techniques as well as multilingual tools for named entity recognition and disambiguation to produce entity-centered summaries. We ran experiments with the TAC 2008 and 2009 data sets (English corpora for summarization research) and obtained very promising results: at TAC 2009 our runs attained the top rank for linguistic quality and the second-best rank for overall responsiveness. We also ran a small-scale evaluation on languages other than English, thereby demonstrating the multilinguality of our approach and providing interesting evidence that contradicts the pervasive assumption “if it works for English, it works for any language”. Finally, we present an online system, currently under development, which will eventually incorporate all the elements of the summarization approach discussed here, and we show sample output summaries in various languages.
Keywords: News, summarisation, multilingual
Year: 2013

Authors of this publication:

Josef Steinberger
E-mail: jstein@kiv.zcu.cz
Related Projects:

Automatic Text Summarisation
Authors: Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek
Description: Automatic text summarisation using various text mining methods, mainly Latent Semantic Analysis (LSA).