
Exploration of Semantic Spaces Obtained from Czech Corpora
This paper focuses on semantic relations between Czech words. Knowledge of these relations is crucial in many research fields such as information retrieval, machine translation or document clustering. We obtained these relations from newspaper articles. Using the LSA, HAL and COALS algorithms, we generated many semantic spaces. Experiments were conducted with various parameter settings and different approaches to corpus preprocessing. The preprocessing included lemmatization and an attempt to use only "open class" words. The computed relations between words were evaluated using the Czech equivalent of the Rubenstein-Goodenough test. The results of our experiments can serve as an indication of whether the algorithms (LSA, HAL and COALS), originally developed for English, can also be used for Czech texts.
Keywords: Information retrieval, Semantic space, LSA, HAL, COALS, Rubenstein-Goodenough test
Year: 2011

Authors of this publication:
Lubomír Krčmář
E-mail: lkrcmar@kiv.zcu.cz

Karel Ježek
Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)
Related Projects:

Exploration of Semantic Spaces
Authors: Karel Ježek, Lubomír Krčmář, Miloslav Konopík
Desc.: This work is focused on semantic relations between words and the application of these relations in research fields such as information retrieval, machine translation or document clustering.