Short Document Categorization - Itemsets Method

This paper proposes a method for automating the time-consuming task of document categorization in a digital library. The method is based on itemsets and extends the traditional application of the apriori algorithm. It is suited to the automatic categorization of short documents (abstracts, summaries), in which terms rarely occur more than once. This makes term-frequency-based methods less effective. The paper presents the basic principles of the method together with preliminary results of ongoing research. The method is designed for use in an extensive commercial application.
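The abstract does not spell out the algorithm. The sketch below is only a rough illustration of the general idea under stated assumptions. Each short document is treated as a set of terms (a "transaction"), so only term presence matters, not term frequency. Frequent term itemsets are mined per category with the classic apriori algorithm. A new document is then assigned to the category whose itemsets it contains most often. The scoring rule, the `min_support` value, the function names `apriori` and `categorize`, and the toy training data are all hypothetical. They are not the authors' method.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Classic level-wise apriori: return every itemset (frozenset) contained
    in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single terms.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s for s, c in counts.items() if c >= min_support}
    frequent = set(current)
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: all (k-1)-subsets of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Support counting over the transactions.
        current = {c for c in candidates
                   if sum(1 for t in transactions if c <= t) >= min_support}
        frequent |= current
        k += 1
    return frequent

def categorize(doc_terms, category_itemsets):
    """Hypothetical scoring rule: count the category's multi-term itemsets that
    are fully contained in the document; ties and all-zero scores fall back to
    the first category in dict order."""
    terms = frozenset(doc_terms)
    def score(itemsets):
        return sum(1 for s in itemsets if len(s) >= 2 and s <= terms)
    return max(category_itemsets, key=lambda cat: score(category_itemsets[cat]))

# Toy training data (hypothetical): each document is reduced to its term set.
training = {
    "databases": [{"query", "index", "sql"}, {"query", "sql", "join"},
                  {"index", "query", "sql", "table"}],
    "networks": [{"packet", "router", "tcp"}, {"tcp", "packet", "latency"},
                 {"router", "packet", "routing"}],
}
category_itemsets = {cat: apriori(docs, min_support=2) for cat, docs in training.items()}
print(categorize({"sql", "query", "optimizer"}, category_itemsets))  # databases
```

Working with term sets rather than term counts reflects the abstract's point that terms seldom repeat in short documents. Co-occurring term combinations therefore carry the signal instead of term frequencies.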

Keywords: itemset, classification, class generation, cluster, clustering, apriori algorithm, document similarity, document categorization, electronic library, digital library

Year: 2000

Authors of this publication:

Jiří Hynek

Phone: +420 603492837
E-mail: jhynek@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/staff/osobni.php?id_osoby=147&lang=EN

Jiri, a co-founder of the Text-Mining Research Group, works as a lecturer at the Dept. of Computer Science and Engineering. His research interests include machine learning and language-related problems. Jiri’s teaching focuses on good writing style and technical writing in general.

Karel Ježek

Phone: +420 377632475, 377632400
E-mail: jezek_ka@kiv.zcu.cz
WWW: http://www-kiv.zcu.cz/~jezek_ka/

Karel is the group coordinator and supervises PhD students working on the group's research projects.