
In Czech: Automatická klasifikace dokumentů do tříd za použití metody Itemsets
Automatic Document Classification into Categories using the Itemsets Method
The aim of this paper is to develop a method for automating the time-consuming classification of documents in a digital library. The original method proposed here is based on itemsets and extends the traditional application of the Apriori algorithm. It is suited to the automatic classification of short documents (abstracts, summaries), whose brevity prevents reliance on repeated occurrences of terms, as required by term-frequency-based methods. The paper presents the basic principles of the method as well as the results of its practical use. The high success rate of the classification algorithm allows it to be used in a real-life environment. The method will become an integral part of the information system of a regional utility company.
Keywords: itemset, classification, Apriori algorithm, similarity, electronic library
Year: 2001

Authors of this publication:

Jiří Hynek
Phone: +420 603492837
E-mail: jhynek@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/staff/osobni.php?id_osoby=147&lang=EN

Karel Ježek
Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)