
Short Document Categorization - Itemsets Method
This paper proposes a method for automating time-consuming document categorization in a digital library. The method is based on itemsets and extends the traditional application of the Apriori algorithm. It is suitable for automatic categorization of short documents (abstracts, summaries), whose brevity prevents reliance on repeated occurrences of terms, as in term-frequency-based methods. The paper presents the basic principles of the method as well as preliminary results of ongoing research. The method is designed to fit an extensive commercial application.
Keywords: itemset, classification, class generation, cluster, clustering, apriori algorithm, document similarity, document categorization, electronic library, digital library
Year: 2000

Authors of this publication:

Jiří Hynek
Phone: +420 603492837
E-mail: jhynek@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/staff/osobni.php?id_osoby=147&lang=EN

Karel Ježek
Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)