In Czech: Automatická klasifikace dokumentů do tříd za použití metody Itemsets

Automatic Document Classification into Categories using the Itemsets Method

This paper develops a method for automating the time-consuming task of classifying documents in a digital library. The method, original to this paper, is based on itemsets and extends the traditional application of the Apriori algorithm. It is suited to the automatic classification of short documents (abstracts, summaries), whose brevity prevents reliance on repeated term occurrences, as in term-frequency-based methods. The paper presents the basic principles of the method as well as the results of its practical use. The high success rate of the classification algorithm allows it to be used in a real-life environment. The method will become an integral part of the information system of a regional utility company.
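For readers unfamiliar with itemsets, the following is a minimal sketch of the classic level-wise Apriori search, applied to short documents treated as sets of terms. It only illustrates how frequent itemsets are mined. The classification step built on top of them is specific to the paper and is not reproduced here. The function name `frequent_itemsets`, the `min_support` parameter, and the sample documents are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Return {itemset: document count} for every term set that occurs in
    at least min_support documents (hypothetical helper, not from the paper)."""
    transactions = [frozenset(d) for d in docs]

    # Level 1: count how many documents contain each single term.
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count step: a document supports a candidate if it contains all its terms.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1
    return result


# Illustrative data only: each "document" is the set of terms in a short abstract.
docs = [
    {"neural", "network", "training"},
    {"neural", "network", "pruning"},
    {"neural", "network", "training", "data"},
    {"database", "index"},
]
itemsets = frequent_itemsets(docs, min_support=2)
```

Because each document is reduced to a set, a term counts once per document however often it appears, which is why this representation fits short texts where repeated occurrences carry little signal.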

Keywords: itemset, classification, Apriori algorithm, similarity, electronic library

Year: 2001

Download: Full text [272 kB]

Authors of this publication:


Jiří Hynek


Phone: +420 603492837
E-mail: jhynek@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/staff/osobni.php?id_osoby=147&lang=EN

Jiri, a co-founder of the Text-Mining Research Group, works as a lecturer at the Dept. of Computer Science and Engineering. His research interests include machine learning and language-related problems. His teaching focuses on good writing style and technical writing in general.

Karel Ježek


Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)

Karel is the former group coordinator and a supervisor of PhD students working on research projects of this Group.