Web Topic Summarization


In this paper, we present our online system for summarizing web topics. The user defines a topic by a set of keywords, and the system then searches the Web for relevant documents. The top-ranked documents are passed to the summarization component, which produces a summary that is finally shown to the user. The proposed architecture is fully modular, so a new version of any module can be substituted quickly and the quality of the system’s output improves as individual modules improve. The crucial module, which extracts the most important sentences from the documents, is based on latent semantic analysis. Its main property is that it is independent of the language of the source documents. In the system interface, the user can choose to search a news site in English or Czech. The results show very good search quality: most of the retrieved documents are fully relevant, and only a few are marginally relevant. The summarizer is comparable to state-of-the-art systems.
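
The abstract does not spell out how the LSA module ranks sentences, so the following is only a minimal sketch of one common LSA sentence-scoring scheme, not the system's actual implementation. It assumes a term-by-sentence matrix of raw term frequencies, a simple regex tokenizer (which is what keeps the sketch free of language-specific resources, in the spirit of the language independence claimed above), and scores each sentence by the length of its vector in the k-dimensional latent space, with each dimension weighted by its singular value. The function names `tokenize` and `lsa_summary` and the parameters `k` and `n_sentences` are hypothetical.

```python
import re

import numpy as np


def tokenize(sentence):
    # Hypothetical, language-independent tokenizer: lower-cased runs of word
    # characters. Python's \w matches Unicode letters, so Czech diacritics survive.
    return re.findall(r"\w+", sentence.lower())


def lsa_summary(sentences, k=2, n_sentences=2):
    """Return indices of the n_sentences highest-scoring sentences, in document order.

    Assumed scoring: length of each sentence's vector in the k-dimensional
    latent space, with dimension i weighted by singular value sigma_i.
    """
    vocab = sorted({t for s in sentences for t in tokenize(s)})
    index = {t: i for i, t in enumerate(vocab)}

    # Term-by-sentence matrix A (terms x sentences) holding raw term frequencies.
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for t in tokenize(s):
            A[index[t], j] += 1

    # SVD: A = U diag(sigma) Vt; each row of Vt describes one latent topic over sentences.
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(sigma))

    # Score sentence j as sqrt(sum_i (sigma_i * Vt[i, j])^2) over the top k topics.
    scores = np.sqrt(((sigma[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))

    # Keep the best-scoring sentences, then restore their original order.
    top = np.argsort(-scores)[:n_sentences]
    return sorted(top.tolist())
```

For example, `lsa_summary(sentences, k=2, n_sentences=2)` on a few sentences about summarization plus one unrelated sentence such as "It rained yesterday." tends to leave the unrelated sentence out, because it shares no terms with the others and so contributes nothing to the dominant latent topics. Weighting by singular values reflects the assumption that dimensions with larger singular values correspond to more significant topics; the number of dimensions `k` is a tuning choice that the abstract does not specify.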

Keywords: Information retrieval; searching; summarization; latent semantic analysis

Year: 2008


Authors of this publication:


Josef Steinberger


E-mail: jstein@kiv.zcu.cz

Josef is an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in media monitoring and analysis, mainly automatic text summarisation, sentiment analysis and coreference resolution.

Karel Ježek


Phone:  +420 377632475, 377632400
E-mail: jezek_ka@kiv.zcu.cz
WWW: http://www-kiv.zcu.cz/~jezek_ka/

Karel is the group coordinator and supervises the PhD students working on the group's research projects.

Martin Sloup


E-mail: msloup@students.zcu.cz

Martin completed his bachelor's degree at the University of West Bohemia in 2008 and is now studying for his master's degree.

Related Projects:


Project

Automatic Text Summarisation

Authors: Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek
Desc.: Automatic text summarisation using various text mining methods, mainly Latent Semantic Analysis (LSA).