Using Parallel Corpora for Multilingual (Multi-Document) Summarisation Evaluation

We present a method for evaluating multilingual multi-document summarisation that saves precious annotation time and makes evaluation results directly comparable across languages. The approach is based on manually selecting the most important sentences in a cluster of documents from a sentence-aligned parallel corpus and projecting that sentence selection to various target languages. We also present two ways of exploiting inter-annotator agreement levels, apply both to a baseline sentence-extraction summariser in seven languages, and discuss the differences in results between the two evaluation versions, as well as a preliminary comparison between languages. The same method can in principle be used to evaluate single-document summarisers or information extraction tools.
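
The projection idea in the abstract can be illustrated with a minimal sketch. It assumes the parallel corpus is stored as one sentence list per language, with the same index referring to the same sentence in every language. The function names, the toy data, and the vote-weighted score are all hypothetical illustrations; the score in particular is not one of the two agreement-based measures from the paper, which the abstract does not specify.

```python
from collections import Counter

def project_selection(selected_ids, corpus, lang):
    """Return the sentences of `lang` at the selected aligned positions."""
    return [corpus[lang][i] for i in sorted(selected_ids)]

def vote_counts(annotations):
    """Count how many annotators selected each sentence index."""
    return Counter(i for ids in annotations for i in set(ids))

def agreement_score(system_ids, annotations):
    """Share of all annotator votes covered by the system extract.

    Hypothetical measure, used here only to show how agreement
    levels could weight the selected sentences.
    """
    votes = vote_counts(annotations)
    total = sum(votes.values())
    if total == 0:
        return 0.0
    return sum(votes[i] for i in set(system_ids)) / total

# Toy sentence-aligned corpus (invented data): index i is the same
# sentence in every language.
corpus = {
    "en": ["Floods hit the region.",
           "Thousands were evacuated.",
           "The weather was mild elsewhere."],
    "de": ["Überschwemmungen trafen die Region.",
           "Tausende wurden evakuiert.",
           "Anderswo war das Wetter mild."],
}

# Two annotators select sentence indices once, in any one language.
annotations = [[0, 1], [0]]

# The same selection yields a gold standard in every other language.
gold_de = project_selection({0, 1}, corpus, "de")

# A system extract is scored against the indices, so the score is
# computed the same way for every language.
score = agreement_score([0, 2], annotations)
```

Because annotators work on sentence indices rather than on language-specific text, one round of annotation serves every language in the corpus, which is where the saving in annotation time comes from.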

Year: 2010

Journal ISSN: 0302-9743

Authors of this publication:


Josef Steinberger


E-mail: jstein@kiv.zcu.cz

Josef is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in media monitoring and analysis, mainly automatic text summarisation, sentiment analysis and coreference resolution.

Related Projects:


Project

Automatic Text Summarisation

Authors:  Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek
Desc.: Automatic text summarisation using various text mining methods, mainly Latent Semantic Analysis (LSA).