In Czech: Automatická klasifikace dokumentů do tříd za použití metody Itemsets

Automatic Document Classification into Categories using the Itemsets Method

The aim of this paper is to develop a method for automating time-consuming document classification in a digital library. The original method proposed here is based on itemsets and extends the traditional application of the Apriori algorithm. It is suited to the automatic classification of short documents (abstracts, summaries), in which terms rarely recur, which impedes methods that rely on repeated term occurrences, such as term-frequency-based methods. The paper presents the basic principles of the method as well as the results of its practical use. The high success rate of the classification algorithm allows its use in a real-life environment. The method will become an integral part of the information system of a regional utility company.
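To make the general idea concrete, the following is a minimal sketch of itemset-based classification, not the authors' implementation. It assumes each document is reduced to a set of terms (presence only, no term counts), mines frequent itemsets per category with a plain Apriori pass, and assigns a new document to the category whose itemsets it covers best. The function names, the toy training data, the support threshold and the scoring rule (itemset size times support) are all illustrative assumptions; the paper's actual weighting and similarity measure may differ.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=3):
    """Plain Apriori: map each frequent term set to its support in `docs`.

    `docs` is a list of sets of terms; support is the fraction of documents
    that contain every term of the itemset.
    """
    n = len(docs)
    counts = {}
    for doc in docs:
        for term in doc:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(current)

    k = 2
    while current and k <= max_size:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = {}
        for cand in candidates:
            support = sum(1 for doc in docs if cand <= doc) / n
            if support >= min_support:
                current[cand] = support
        result.update(current)
        k += 1
    return result


def train(labeled_docs, min_support=0.6):
    """Mine frequent itemsets separately for each category (assumed scheme)."""
    by_category = {}
    for terms, category in labeled_docs:
        by_category.setdefault(category, []).append(set(terms))
    return {
        category: frequent_itemsets(docs, min_support)
        for category, docs in by_category.items()
    }


def classify(model, terms):
    """Pick the category whose itemsets the document covers best.

    Score = sum of (itemset size * support) over the category's itemsets
    contained in the document. This weighting is an illustrative assumption.
    """
    terms = set(terms)
    scores = {
        category: sum(len(s) * sup for s, sup in itemsets.items() if s <= terms)
        for category, itemsets in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus: short "abstracts" already reduced to term lists.
TRAINING = [
    (["neural", "network", "training"], "ml"),
    (["neural", "network", "classifier"], "ml"),
    (["network", "training", "classifier"], "ml"),
    (["database", "query", "index"], "db"),
    (["database", "query", "transaction"], "db"),
    (["database", "index", "transaction"], "db"),
]

model = train(TRAINING, min_support=0.6)
print(classify(model, ["neural", "network", "query"]))
```

Because every document is treated as a set, a term counts once no matter how often it appears, which is why an approach of this kind does not depend on repeated term occurrences in short texts.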

Keywords: itemset, classification, Apriori algorithm, similarity, electronic library

Year: 2001

Download: Full text [272 kB]

Authors of this publication:


Jiří Hynek


Phone: +420 603492837
E-mail: jhynek@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/staff/osobni.php?id_osoby=147&lang=EN

Jiri, a co-founder of the Text-Mining Research Group, works as a lecturer at the Dept. of Computer Science and Engineering. His research interests include machine learning and language-related problems. Jiri's teaching focuses on good writing style and technical writing in general.

Karel Ježek


Phone: +420 377632475, 377632400
E-mail: jezek_ka@kiv.zcu.cz
WWW: http://www-kiv.zcu.cz/~jezek_ka/

Karel is the group coordinator and a supervisor of PhD students working on the Group's research projects.