
Multi-document multilingual summarization corpus preparation, Part 2: Czech, Hebrew and Spanish
This document overviews the strategy, effort and aftermath of the MultiLing 2013 multilingual summarization data collection. We describe how the Data Contributors of MultiLing collected and generated a multilingual multi-document summarization corpus on 10 different languages: Arabic, Chinese, Czech, English, French, Greek, Hebrew, Hindi, Romanian and Spanish. We discuss the rationale behind the main decisions of the collection, the methodology used to generate the multilingual corpus, as well as challenges and problems faced per language. This paper overviews the work on Czech, Hebrew and Spanish languages.
Keywords: summarization, summarization evaluation, multilinguality
Year: 2013

Authors of this publication:

Josef Steinberger
E-mail: jstein@kiv.zcu.cz
Related Projects:

Automatic Text Summarisation | |
Authors: | Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek |
Desc.: | Automatic text summarisation using various text mining methods, mainly Latent Semantic Analysis (LSA). |