Text-Mining Research Group

Short Document Categorization - Itemsets Method

This paper presents a method for automating the time-consuming task of document categorization in a digital library. The proposed method is based on itemsets and extends the traditional application of the apriori algorithm. It is suited to the automatic categorization of short documents (abstracts, summaries), in which terms rarely occur repeatedly; this limits the usefulness of term-frequency-based methods. The paper presents the basic principles of the method as well as preliminary results of ongoing research. The method is designed to fit an extensive commercial application.
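The abstract does not spell out how the itemsets are built or matched. As a rough illustration only, here is a minimal sketch of the standard apriori frequent-itemset search that the method builds on. Each short document is treated as a set of terms (a "transaction"). It is paired with a hypothetical `categorize` helper that assigns a new document to the category whose frequent itemsets it contains most often. The scoring rule, the `min_support` value and the toy training data are assumptions made for illustration. They are not the authors' extension of apriori.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Standard apriori: return every itemset contained in at least
    min_support transactions, mapped to its support count.

    transactions: list of sets of terms (one set per short document).
    min_support: absolute minimum count (assumed parameter).
    """
    # Level 1: count single terms and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Candidate generation: join pairs of frequent (k-1)-itemsets.
        prev = list(current)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Apriori pruning: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Support counting: keep candidates found in enough documents.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


def categorize(doc_terms, category_itemsets):
    """Hypothetical scoring rule: choose the category with the most frequent
    itemsets (size >= 2) fully contained in the document; None if no match."""
    best, best_score = None, 0
    for category, itemsets in category_itemsets.items():
        score = sum(1 for s in itemsets if len(s) >= 2 and s <= doc_terms)
        if score > best_score:
            best, best_score = category, score
    return best


# Toy training data (invented for illustration): term sets of short abstracts.
training = {
    "databases": [
        {"query", "index", "relational", "database"},
        {"query", "index", "transaction"},
        {"relational", "query", "index", "optimizer"},
    ],
    "networks": [
        {"packet", "routing", "protocol"},
        {"routing", "protocol", "latency"},
        {"packet", "routing", "wireless"},
    ],
}
model = {cat: apriori(docs, min_support=2) for cat, docs in training.items()}
print(categorize({"index", "query", "cache"}, model))  # prints: databases
```

Working on term sets rather than term counts reflects the abstract's point: in short documents a term rarely occurs more than once. Co-occurrence of terms, which itemsets capture, may therefore carry more signal than frequency.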

Keywords: itemset, classification, class generation, cluster, clustering, apriori algorithm, document similarity, document categorization, electronic library, digital library

Year: 2000

Authors of this publication:

Jiří Hynek

Phone: +420 603492837
E-mail: jhynek@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/staff/osobni.php?id_osoby=147&lang=EN

Jiri, a co-founder of the Text-Mining Research Group, works as a lecturer at the Dept. of Computer Science and Engineering. His research interests include machine learning and language-related problems. Jiri’s teaching focuses on good writing style and technical writing in general.

Karel Ježek

Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)

Karel is the former group coordinator and a supervisor of PhD students working on research projects of this group.

Ondřej Rohlík

E-mail: rohlik@kiv.zcu.cz