Text Summarization within the LSA Framework

This thesis deals with the development of a new text summarization method that uses latent semantic analysis (LSA). The language-independent analysis is able to capture interrelationships among terms, so that we can obtain a representation of document topics. This feature is exploited by the proposed summarization approach. The method combines lexical and anaphoric information in a novel way. Moreover, anaphora resolution is employed to correct false references in the summary. Then, I describe a new sentence compression algorithm that takes advantage of the LSA properties. Next, I created a method that evaluates the similarity of the main topics of an original text and its summary, motivated by the ability of LSA to extract the topics of a text. Using summaries in the multilingual searching system MUSE led to better user orientation in the retrieved texts and to faster searching when summaries were indexed instead of full texts.
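As a rough illustration of how LSA can expose document topics for sentence selection, the sketch below builds a term-by-sentence count matrix, applies a singular value decomposition, and scores each sentence by the length of its vector in the singular-value-weighted latent space. This is a minimal sketch under stated assumptions, not the method developed in the thesis: the function names (`lsa_sentence_scores`, `summarize`), the plain word-count weighting, the simple regex tokenizer, and the choice of `k` are all hypothetical, and anaphoric information is not modeled at all.

```python
import re
import numpy as np


def lsa_sentence_scores(sentences, k=2):
    """Score sentences by vector length in a k-dimensional latent space.

    Assumption: raw term counts and a lowercase alphabetic tokenizer;
    a real system would likely use better weighting and preprocessing.
    """
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for toks in tokenized for t in toks})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-sentence matrix: rows are terms, columns are sentences.
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for t in toks:
            A[index[t], j] += 1.0

    # A = U * diag(s) * Vt; each column of Vt describes one sentence
    # in terms of the latent topics.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))

    # Weight each topic by its singular value, then take the vector length.
    weighted = s[:k, None] * Vt[:k, :]
    return np.sqrt((weighted ** 2).sum(axis=0))


def summarize(sentences, n=1, k=2):
    """Return the n highest-scoring sentences in their original order."""
    scores = lsa_sentence_scores(sentences, k)
    top = sorted(np.argsort(-scores)[:n])
    return [sentences[i] for i in top]
```

A call such as `summarize(["cats chase mice", "dogs chase cats", "stocks fell sharply today"], n=2)` returns two of the input sentences in document order. Keeping only the `k` largest singular values is what restricts the scoring to the dominant topics; with `k` equal to the full rank, the score reduces to the ordinary length of each sentence's count vector.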

Keywords: Summarization, latent semantic analysis, anaphora resolution, sentence compression, summary evaluation, multilingual searching

Year: 2007

Download: Full text [994 kB]

Authors of this publication:


Josef Steinberger


E-mail: jstein@kiv.zcu.cz

Josef is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in media monitoring and analysis, mainly automatic text summarisation, sentiment analysis and coreference resolution.

Related Projects:


Project

Automatic Text Summarisation

Authors: Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek
Desc.: Automatic text summarisation using various text mining methods, mainly Latent Semantic Analysis (LSA).