
Use of Text Mining Methods in a Digital Library
This article deals with the use of the Itemsets classifier, a method based on inductive machine learning, in a digital library environment. We briefly describe a real-world digital library implemented at a power utility. Its implementation and the experience of operating it motivated our research into the inductive machine learning methods for text mining described in the paper. Inspired by association rule mining, we have developed a new categorization method named the “Itemsets classifier”. Our experiments show that it can surpass some well-known categorization methods in both precision/recall and efficiency. Because classification is closely related to clustering, we have also integrated the principles of the Itemsets method into a new document-clustering algorithm. We also present other applications of the Itemsets classifier: unsolicited mail filtering and enhancement of the Naïve Bayes classifier. The paper presents the main ideas and experimental results.
Copyright for the full paper: Verlag für Wissenschaft und Forschung, VWF, Berlin, Germany.
Keywords: classification, clustering, categorization, classifier, spam filter, machine learning
Year: 2002

Authors of this publication:

Jiří Hynek
Phone: +420 603492837
E-mail: jhynek@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/staff/osobni.php?id_osoby=147&lang=EN

Karel Ježek
Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)

Related Projects:

Document Classification
Authors: Jiří Hynek, Karel Ježek, Michal Toman, Roman Tesař, Zdeněk Češka, Petr Grolmus
Desc.: Use of inductive machine learning methods in classification of short text documents.