
Internet Content Filtering | |
Keywords: | topic detection, web content analysis, improper content |
Description: | The Internet has become a popular information medium. Hyperlinks open new ways of presenting information, and this opportunity is widely used for searching for and taking in information. More and more people use the Internet for work and leisure. However, the rapid growth of the Internet and of public access to it causes problems. There are no restrictions: anyone can publish anything on the Internet, and anyone can see it. Many Internet sites contain indecent, violent, or otherwise unseemly content. Parents worry that their children might be exposed to pornography, violence, or extremism, or approached by pedophiles. In cyberspace, people can change their identity very easily and can be whoever they want to be. It is very difficult to trace the owners of unseemly web sites, and there are even servers dedicated to especially unseemly or forbidden content. Our primary task is to detect these servers or individual sites in various languages. Many commercial applications for Internet content filtering rely on a database of pre-classified web sites. Our approach is to let users set their own level of objectionability while browsing. We then analyze the stored objectionable web sites (which usually contain links to further inappropriate sites) to find the servers that host most of these sites. The goal of this system is to facilitate the work of governmental institutions in preventing and combating Internet crime. Further uses lie mainly in public institutions (schools, universities, libraries). |
Status: | Finished |
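The two-stage approach in the description (a user-chosen objectionability level, then analysis of the stored objectionable sites and the links they contain to find the servers hosting most of them) can be illustrated with a minimal Python sketch. It is not the project's implementation: the score dictionary stands in for whatever classifier assigns objectionability, the link dictionary stands in for fetched pages, and the names `collect_flagged` and `rank_servers` are hypothetical.

```python
from collections import Counter, deque
from urllib.parse import urlparse


def collect_flagged(seeds, links, scores, threshold):
    """Breadth-first walk that starts at seed pages and follows links only from flagged pages.

    seeds:     URLs the user has already stored as objectionable (hypothetical input)
    links:     hypothetical dict URL -> list of outgoing URLs, standing in for fetched pages
    scores:    hypothetical dict URL -> objectionability score in [0, 1] from some classifier
    threshold: the user's own level; a page is flagged if its score >= threshold
    Returns the flagged URLs in the order they were visited.
    """
    start = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep their order
    seen = set(start)
    queue = deque(start)
    flagged = []
    while queue:
        url = queue.popleft()
        if scores.get(url, 0.0) < threshold:
            continue  # below the user's level: neither flag it nor follow its links
        flagged.append(url)
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return flagged


def rank_servers(flagged_urls, top_n=3):
    """Count flagged pages per host name and return the top_n (host, count) pairs."""
    counts = Counter(urlparse(url).hostname for url in flagged_urls)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Toy link graph and scores, invented purely for illustration.
    links = {
        "http://a.example/1": ["http://a.example/2", "http://b.example/x"],
        "http://a.example/2": ["http://c.example/ok"],
        "http://b.example/x": [],
    }
    scores = {
        "http://a.example/1": 0.9,
        "http://a.example/2": 0.8,
        "http://b.example/x": 0.7,
        "http://c.example/ok": 0.1,
    }
    flagged = collect_flagged(["http://a.example/1"], links, scores, threshold=0.5)
    print(rank_servers(flagged))  # [('a.example', 2), ('b.example', 1)]
```

Links are followed only from pages that pass the user's threshold, reflecting the observation that objectionable sites usually point to further inappropriate ones. Lowering the threshold makes the filter stricter and flags more pages; with `threshold=0.85` the toy example above flags only the seed page.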
People on this project:

Roman Tesař
Phone: +420 377632479
E-mail: roman.tesar@gmail.com
WWW: http://www.sweb.cz/romant1/CV.pdf
Roman is a PhD student at the Department of Computer Science and Engineering, Faculty of Applied Sciences, University of West Bohemia in Pilsen, Czech Republic. His work focuses on the use of word n-grams in text classification and document filtering.

Karel Ježek
Phone: +420 377632475
E-mail: jezek_ka@kiv.zcu.cz
WWW: https://cs.wikipedia.org/wiki/Karel_Je%C5%BEek_(informatik)
Karel is the former group coordinator and a supervisor of PhD students working on research projects of this group.
Publications:

Extending the Single Words-Based Document Model: A Comparison of Bigrams and 2-Itemsets | |
Authors: | Roman Tesař, Massimo Poesio, Václav Strnad, Karel Ježek |
Source: | The 2006 ACM Symposium on Document Engineering (DocEng'06), Amsterdam, Netherlands, ACM Press (New York, NY, USA), ISBN 1-59593-515-0, pages 138-146. |

A comparison of two algorithms for discovering repeated word sequences | |
Authors: | Roman Tesař, Dalibor Fiala, François Rousselot, Karel Ježek |
Source: | The 6th International Conference on Data Mining, Text Mining and their Business Applications (Data Mining 2005), Skiathos, Greece, ISBN 1-84564-017-9, pages 121-131, WIT Transactions on Information and Communication Technologies, ISSN 1743-3517. |

In Czech: Klasifikace Suffix Tree frázemi - srovnání s metodou Itemsets (Classification Using Suffix Tree Phrases: A Comparison with the Itemsets Method) | |
Authors: | Roman Tesař, Karel Ježek |
Source: | Znalosti 2005 conference, Stará Lesná, Slovakia, ISBN 80-248-0755-6, pages 144-153. |