Aspects of Multilingual News Summarisation

In this chapter, the authors discuss several pertinent aspects of an automatic system that generates summaries in multiple languages for sets of topic-related news articles gathered by news aggregation systems (multilingual multi-document summarisation). The discussion follows a framework based on Latent Semantic Analysis (LSA), because LSA has been shown to perform well across many different languages. Starting from a sentence-extractive approach, the authors show how domain-specific aspects can be exploited and how a compression and paraphrasing method can be plugged in. They also discuss the challenging problem of evaluating summaries in different languages. In particular, the authors describe two approaches: the first uses a parallel corpus and the second uses statistical machine translation.
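The abstract names LSA-based sentence extraction as the starting point. As a rough illustration only, the following Python sketch scores sentences with one widely used LSA variant: build a term-by-sentence matrix, take its singular value decomposition, and rank each sentence by the length of its vector in the top `k` latent dimensions, weighted by the singular values. The function names `tokenize` and `lsa_summarise`, the raw term-frequency weighting, the simple tokeniser and the parameters `k` and `n_select` are hypothetical choices made for this example. It is not the chapter's implementation, and it leaves out the aspect, compression and multilingual components.

```python
import re

import numpy as np


def tokenize(text):
    """Hypothetical minimal tokeniser: lower-cased ASCII letter runs, no stop-word removal."""
    return re.findall(r"[a-z]+", text.lower())


def lsa_summarise(sentences, k=2, n_select=2):
    """Select n_select sentences by LSA 'vector length' scoring (illustrative sketch)."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    if not vocab:
        # Nothing to decompose; fall back to the leading sentences.
        return sentences[:n_select]
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence matrix A (terms x sentences) holding raw term frequencies.
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in tokenize(s):
            A[index[w], j] += 1.0

    # Thin SVD: A = U @ diag(sigma) @ Vt; column j of Vt describes sentence j.
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(sigma))

    # Score of sentence j: sqrt(sum over i < k of sigma_i^2 * Vt[i, j]^2).
    scores = np.sqrt(((sigma[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))

    # Take the highest-scoring sentences, then restore their document order.
    top = np.argsort(-scores, kind="stable")[:n_select]
    return [sentences[j] for j in sorted(top)]


if __name__ == "__main__":
    # Hypothetical toy input, not data from the chapter.
    cluster = [
        "Heavy floods hit the city on Monday.",
        "Rescue teams evacuated residents as floods spread through the city.",
        "The mayor thanked volunteers.",
    ]
    for sentence in lsa_summarise(cluster, k=2, n_select=2):
        print(sentence)
```

Raw term frequencies and a fixed `k` keep the sketch short; in practice the weighting scheme, stop-word handling and the number of latent dimensions are design choices that change which sentences get selected.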

Keywords: multilingual summarisation, aspect-driven summarisation, latent semantic analysis, multilingual summarisation evaluation, parallel corpus

Year: 2014

Authors of this publication:


Josef Steinberger


E-mail: jstein@kiv.zcu.cz

Josef is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. His research interests include media monitoring and analysis, in particular automatic text summarisation, sentiment analysis and coreference resolution.

Related Projects:


Automatic Text Summarisation

Authors: Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek
Description: Automatic text summarisation using various text-mining methods, mainly Latent Semantic Analysis (LSA).