Sentence Compression for the LSA-based Summarizer

We present a simple sentence compression approach for our summarizer based on latent semantic analysis (LSA). The summarization method assesses each sentence by an LSA score. The compression algorithm removes unimportant clauses from a full sentence. First, a sentence is divided into clauses by the Charniak parser; then compression candidates are generated; finally, the best candidate is selected to represent the sentence. Each candidate receives an importance score that is directly proportional to its LSA score and inversely proportional to its length. We evaluated the approach in two ways. In an intrinsic evaluation, we found that the compressions produced by our algorithm are better than the baseline ones but still worse than those humans can make. We then compared the resulting summaries with human abstracts using the standard n-gram-based ROUGE measure.
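The candidate selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no exact formula, so the sketch assumes the simplest score consistent with it, the LSA score divided by the number of words. The `Candidate` class, the `importance` and `best_candidate` functions, the example sentences, and the LSA score values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    lsa_score: float  # hypothetical value; a real system would compute it via LSA


def importance(candidate: Candidate) -> float:
    """Assumed score: LSA score divided by length in words.

    This is one simple form that grows with the LSA score and shrinks
    with length; the paper's actual formula may differ.
    """
    n_words = len(candidate.text.split())
    if n_words == 0:
        raise ValueError("candidate text must contain at least one word")
    return candidate.lsa_score / n_words


def best_candidate(candidates: list[Candidate]) -> Candidate:
    """Return the compression candidate with the highest importance score."""
    return max(candidates, key=importance)


# Illustrative candidates for one sentence, with made-up LSA scores.
candidates = [
    Candidate("The committee, which met on Monday, approved the budget.", 0.90),
    Candidate("The committee approved the budget.", 0.80),
    Candidate("The committee approved.", 0.30),
]

best = best_candidate(candidates)
print(best.text)
```

With these made-up values, the middle candidate wins: it drops the relative clause, losing little of the LSA score while being much shorter than the full sentence.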

Keywords: Sentence compression, summarization, latent semantic analysis

Year: 2006

Download: Full text [152 kB]

Authors of this publication:

Josef Steinberger


Josef is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in media monitoring and analysis, mainly automatic text summarisation, sentiment analysis and coreference resolution.

Karel Ježek

Phone:  +420 377632475

Karel is the former group coordinator and a supervisor of PhD students working on the group's research projects.

Related Projects:


Automatic Text Summarisation

Authors: Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek
Desc.: Automatic text summarisation using various text mining methods, mainly Latent Semantic Analysis (LSA).