
Extracting Information from Web Content and Structure | |
Keywords: | Web mining, information retrieval, classification, ranking algorithms |
Description: | The Web is a vast data repository, and mining it efficiently can yield valuable knowledge. Unfortunately, alongside useful content there are also many Web documents considered harmful (e.g. pornography, terrorism, illegal drugs). Web mining, which comprises three main areas (content, structure, and usage mining), may help us detect and eliminate such sites. In this project, we concentrate on applications of Web content mining and Web structure mining. First, we introduce a system for detecting pornographic textual Web pages; we discuss its classification methods and describe its architecture. Second, we present an analysis of the relations among Czech academic computer science Web sites; we give an overview of ranking algorithms and determine the importance of the sites we analyzed. |
Status: | Finished |
People on this project:

Dalibor Fiala
Phone: +420 377 63 2429
E-mail: dalfia@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/~dalfia/
Dalibor is the research group coordinator and an associate professor at the Department of Computer Science and Engineering at the University of West Bohemia in Pilsen, Czech Republic. He is interested in data mining, web mining, information retrieval, informetrics, and information science.

Roman Tesař
Phone: +420 377632479
E-mail: roman.tesar@gmail.com
WWW: http://www.sweb.cz/romant1/CV.pdf
Roman is a PhD student at the Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Pilsen, Czech Republic. His work is focused on the utilization of word n-grams in text classification and document filtering.

Karel Ježek
Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)
Karel is the former group coordinator and a supervisor of PhD students working on the group's research projects.
Publications:

Extended Formal Model for Ranking of Authoritative Resources | |
Authors: | Karel Ježek |
Source: | Proceedings XXII Mezdunarodnoj Naucnoj Konferencii MMTT-22, ISBN 978-5-91116-087-2 (Tom 7), pp. 193-195, Pskov 2009. |

Exploration and Evaluation of Citation Networks | |
Authors: | Karel Ježek, Dalibor Fiala, Josef Steinberger |
Source: | Proceedings of the 12th International Conference on Electronic Publishing, ISBN 978-0-7727-6315-0, pp. 351-362, Toronto, Canada 2008 |

PageRank for bibliographic networks | |
Authors: | Dalibor Fiala, François Rousselot, Karel Ježek |
Source: | Scientometrics, vol. 76, no. 1, pp. 135-158, 2008. |
ISSN: | 0138-9130 |

Extracting Information from Web Content and Structure | |
Authors: | Dalibor Fiala, Roman Tesař, Karel Ježek, François Rousselot |
Source: | Proc. 9th Int. Conf. on Information Systems Implementation and Modelling ISIM’06, Přerov, Czech Republic, pp. 133-140, 2006. (ISBN 80-86840-19-0) |