Documents Categorization in Multilingual Environment

Documents Categorization in Multilingual Environment

This paper deals with various methods for multilingual document categorization and informs about the results of experiments in which EuroWordNet (EWN) plays the central role and serves as a fundamental problem solving tool. We describe both the algorithmic principles and the methodologies used in our classification system and consequently prove their functionality by experimental results. The aim of experiments was to verify the impact of multilingual collection on the quality of categorization and also find how thesaurus can be used to improve the classification and how the use of multilingual thesaurus can generalize monolingual version of categorization.

Keywords: multilingual document categorization, EuroWordNet

Year: 2005

Download: download Full text [210 kB]

Authors of this publication:


Karel Ježek


Phone:  +420 377632475, 377632400
E-mail: jezek_ka@kiv.zcu.cz
WWW: http://www-kiv.zcu.cz/~jezek_ka/

Karel is a group coordinator and a supervisor of PhD students working at research projects of this Group.

Michal Toman


E-mail: mtoman@kiv.zcu.cz

Michal graduated at UWB in 2003, specialized in software engineering. Currently, he is a PhD student interested in information retrieval, multilingual text processing, word sense disambiguation and knowledge discovery.

Related Projects:


Project

Document Classification

Authors:  Jiří Hynek, Karel Ježek, Michal Toman, Roman Tesař, Zdeněk Češka, Petr Grolmus
Desc.:Use of inductive machine learning methods in classification of short text documents.