Web Topic Summarization
In this paper, we present our online system for summarizing web topics. The user defines a topic by a set of keywords, and the system searches the Web for relevant documents. The top-ranked documents are passed to the summarization component, which produces a summary that is shown to the user. The proposed architecture is fully modular, so a new version of any module can be substituted quickly and the quality of the system’s output improves as individual modules improve. The crucial module, which extracts the most important sentences from the documents, is based on latent semantic analysis; its main property is independence of the language of the source documents. In the system interface, one can choose to search a news site in English or Czech. The results show very good search quality: most of the retrieved documents are fully relevant, and only a few are marginally relevant. The summarizer is comparable to state-of-the-art systems.
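The sentence-extraction step mentioned in the abstract can be illustrated with a minimal sketch. It is not the system's actual implementation: it assumes a raw term-frequency matrix, a naive regex sentence splitter, and a sentence score equal to the length of each sentence vector in the singular-value-weighted latent space, which is one common LSA scoring formulation. The function names and the parameters n_sentences and n_topics are hypothetical.

```python
import re
import numpy as np


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Unicode-aware word tokens, lowercased; no stemming or stop-word removal.
    return re.findall(r"\w+", sentence.lower())


def lsa_summarize(text, n_sentences=2, n_topics=2):
    """Return the n_sentences highest-scoring sentences in document order."""
    sentences = split_sentences(text)
    if len(sentences) <= n_sentences:
        return sentences

    # Build the term-by-sentence matrix A (rows: terms, columns: sentences).
    vocab = sorted({t for s in sentences for t in tokenize(s)})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for t in tokenize(s):
            A[index[t], j] += 1.0

    # Singular value decomposition: A = U * diag(sigma) * Vt.
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)

    # Score each sentence by the length of its vector in the top-r latent
    # dimensions, with each dimension weighted by its singular value.
    r = min(n_topics, len(sigma))
    scores = np.sqrt(((Vt[:r, :] ** 2) * (sigma[:r, None] ** 2)).sum(axis=0))

    # Pick the best sentences, then restore their original order.
    top = sorted(np.argsort(-scores)[:n_sentences])
    return [sentences[i] for i in top]


if __name__ == "__main__":
    sample = (
        "Cats chase mice. Dogs chase cats. "
        "The weather is nice today. Mice fear cats."
    )
    for sentence in lsa_summarize(sample, n_sentences=2, n_topics=2):
        print(sentence)
```

Nothing in the scoring depends on the vocabulary of a particular language, which fits the language independence claimed above. The splitter and tokenizer are the only parts that would need adapting for languages with different sentence or word boundaries.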
Keywords: Information retrieval; searching; summarization; latent semantic analysis
Year: 2008
Authors of this publication:
Josef Steinberger
E-mail: jstein@kiv.zcu.cz
Karel Ježek
Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)
Martin Sloup
E-mail: msloup@students.zcu.cz
Related Projects:
Automatic Text Summarisation
Authors: Josef Steinberger, Karel Ježek, Michal Campr, Jiří Hynek
Description: Automatic text summarisation using various text mining methods, mainly Latent Semantic Analysis (LSA).