
Roman Tesař
Phone: +420 377632479
E-mail: roman.tesar@gmail.com
WWW: http://www.sweb.cz/romant1/CV.pdf
Roman graduated from the University of West Bohemia in 2003, specializing in software engineering. He is currently a PhD student focusing on internet document filtering, web mining, text classification, and information retrieval in general. He also examines the possible use and influence of word n-grams in these areas.
Publications:

In Czech: Extrakce N-gramů z rozsáhlých textů | |
Authors: | Zdeněk Češka, Ivo Hanák, Roman Tesař |
Source: | Proceedings of the 7th Annual Conference ZNALOSTI 2008, Bratislava, Slovakia, pp. 54-65, February 2008. ISBN 978-80-227-2827-0. |

In Czech: Rozšíření bag-of-words modelu dokumentu: srovnání bigramů a 2-itemsetů | |
Authors: | Roman Tesař, Massimo Poesio, Václav Strnad, Karel Ježek |
Source: | In Proceedings of Znalosti 2007 Conference, Ostrava, Czech Republic, pp. 131-142, ISBN 978-80-248-1279-3, February 2007. |

In Czech: Vliv normalizace slov na klasifikaci textů | |
Authors: | Michal Toman, Roman Tesař, Karel Ježek |
Source: | Znalosti 2007, Ostrava |

Knowledge-poor Multilingual Sentence Compression | |
Authors: | Josef Steinberger, Roman Tesař |
Source: | In Proceedings of 7th Conference on Language Engineering, Cairo, Egypt, December 2007, pp. 369-379, The Egyptian Society of Language Engineering. |

Teraman: A Tool for N-gram Extraction from Large Datasets | |
Authors: | Zdeněk Češka, Ivo Hanák, Roman Tesař |
Source: | Proceedings of the IEEE 3rd International Conference on Intelligent Computer Communication and Processing (IEEE ICCP 2007), Cluj-Napoca, Romania, pp. 209-216, September 2007. ISBN 978-1-4244-1491-8. |

Extending the Single Words-Based Document Model: A Comparison of Bigrams and 2-Itemsets | |
Authors: | Roman Tesař, Massimo Poesio, Václav Strnad, Karel Ježek |
Source: | The 2006 ACM Symposium on Document Engineering (DocEng’06), Amsterdam, Netherlands, ACM Press (New York, NY, USA), ISBN 1-59593-515-0, pages 138-146. |

Extracting Information from Web Content and Structure | |
Authors: | Dalibor Fiala, Roman Tesař, Karel Ježek, François Rousselot |
Source: | Proc. 9th Int. Conf. on Information Systems Implementation and Modelling ISIM’06, Přerov, Czech Republic, pp. 133-140, 2006. (ISBN 80-86840-19-0) |

Influence of Word Normalization on Text Classification | |
Authors: | Michal Toman, Roman Tesař, Karel Ježek |
Source: | InSciT 2006, Proceedings of Multidisciplinary Approaches to Global Information Systems, vol. II, Merida, Spain |

A comparison of two algorithms for discovering repeated word sequences | |
Authors: | Roman Tesař, Dalibor Fiala, François Rousselot, Karel Ježek |
Source: | The 6th International Conference on Data Mining, Text Mining and their Business Applications (Data Mining 2005), Skiathos, Greece, ISBN 1-84564-017-9, pages 121-131, WIT Transactions on Information and Communication Technologies, ISSN 1743-3517. |

In Czech: Klasifikace Suffix Tree frázemi - srovnání s metodou Itemsets | |
Authors: | Roman Tesař, Karel Ježek |
Source: | Znalosti 2005 conference, Stará Lesná, Slovakia, ISBN 80-248-0755-6, pages 144-153. |
Projects:

Document Classification | |
Authors: | Jiří Hynek, Karel Ježek, Michal Toman, Roman Tesař, Zdeněk Češka, Petr Grolmus |
Desc.: | Use of inductive machine learning methods in classification of short text documents. |

Extracting Information from Web Content and Structure | |
Authors: | Dalibor Fiala, Roman Tesař, Karel Ježek |
Desc.: | This project deals with classification of Web documents and determination of authoritative Web sites. It was supported in part by the Ministry of Education of the Czech Republic under grant FRVS 1347/2005/G1. |

Internet Content Filtering | |
Authors: | Roman Tesař, Karel Ježek |
Desc.: | This project covers processing and analyzing Web sites, classifying them by their content, and searching for other Web sites with similar content. |