Comparison of Text Pre-processing Techniques for Plagiarism Detection (in Czech: Porovnání technik předzpracování textu pro detekci plagiátů)

This paper compares stop-word removal, lemmatization, synonym replacement, and number replacement as pre-processing techniques for plagiarism detection. We also propose an advanced word normalization that uses hyperonyms. We examine how the different pre-processing techniques influence plagiarism detection methods and recommend the best solution.
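The pre-processing steps named in the abstract can be sketched as a single token pipeline. This is a minimal illustration, not the paper's implementation: the tiny `STOP_WORDS`, `LEMMAS`, `SYNONYMS`, and `HYPERNYMS` tables are hypothetical stand-ins for a full stop-word list, a real lemmatizer, and a thesaurus such as WordNet. The step order (numbers, stop words, lemmas, synonyms, then optional hyperonym generalization) is likewise an assumption.

```python
import re

# Hypothetical toy resources; a real system would use a full stop-word list,
# a morphological lemmatizer, and a thesaurus such as WordNet.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "are", "in", "to", "on"}
LEMMAS = {"dogs": "dog", "hounds": "hound", "cats": "cat",
          "ran": "run", "running": "run"}
SYNONYMS = {"hound": "dog", "sprint": "run", "feline": "cat"}
HYPERNYMS = {"dog": "animal", "cat": "animal"}  # hypothetical hyperonym map
NUMBER_TOKEN = "<num>"


def tokenize(text):
    # Lowercase, then extract word tokens and number tokens (e.g. 12 or 3.5).
    return re.findall(r"[a-z]+|\d+(?:[.,]\d+)?", text.lower())


def preprocess(text, remove_stop=True, lemmatize=True,
               replace_synonyms=True, replace_numbers=True,
               generalize=False):
    tokens = []
    for tok in tokenize(text):
        # Number replacement: every number becomes one placeholder token.
        if replace_numbers and tok[0].isdigit():
            tokens.append(NUMBER_TOKEN)
            continue
        # Stop-word removal: drop very frequent function words.
        if remove_stop and tok in STOP_WORDS:
            continue
        # Lemmatization: map inflected forms to a base form.
        if lemmatize:
            tok = LEMMAS.get(tok, tok)
        # Synonym replacement: map each synonym to one representative word.
        if replace_synonyms:
            tok = SYNONYMS.get(tok, tok)
        # Optional hyperonym generalization: replace a word by a broader term.
        if generalize:
            tok = HYPERNYMS.get(tok, tok)
        tokens.append(tok)
    return tokens


print(preprocess("The 3 hounds ran in the park"))
print(preprocess("4 dogs are running in the park"))
```

With all steps enabled, both example sentences reduce to the same token sequence, which is why such normalization can expose paraphrased copies that plain word matching would miss. Enabling `generalize` trades precision for recall, because distinct words such as "dog" and "cat" collapse into one broader term.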

Keywords: Plagiarism, Copy Detection, Natural Language Processing, Stop-word Removal, Lemmatization, Synonymy, WordNet.

Year: 2009

Authors of this publication:

Zdeněk Češka


Zdeněk has worked for various international companies in the field of Software Engineering. He holds a Master's degree and a PhD in Computer Science and Engineering. His research interests include Mathematics & Algorithmization, Plagiarism Detection, Multilingual Processing, Text Classification, and related fields.

Related Projects:


Automatic Plagiarism Detection

Authors: Zdeněk Češka
Description: This project focuses on automatic plagiarism detection in written text. Its main principle is the application of Latent Semantic Analysis in conjunction with word N-grams.
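One way to picture the input to Latent Semantic Analysis in this setting is a term-by-document matrix whose "terms" are word N-grams instead of single words. The sketch below builds only that matrix. The function names are hypothetical, and the choice of raw counts (no weighting) is an assumption. The LSA step itself, a truncated singular value decomposition of this matrix, is not shown.

```python
from collections import Counter


def word_ngrams(tokens, n=2):
    # Contiguous word sequences of length n, joined into one string term.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_term_document_matrix(docs, n=2):
    # Count the word n-grams of each document (simple whitespace tokenization).
    counts = [Counter(word_ngrams(d.lower().split(), n)) for d in docs]
    # Vocabulary: every n-gram seen in any document, in sorted order.
    vocab = sorted(set().union(*counts))
    # Rows are n-gram terms, columns are documents; cells are raw counts.
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


vocab, matrix = ngram_term_document_matrix(["the cat sat", "the cat ran"])
for term, row in zip(vocab, matrix):
    print(term, row)
```

Word N-grams preserve some local word order, so two documents share a column pattern only where they share short phrases, not merely vocabulary. LSA would then project these columns into a lower-dimensional space before comparing documents.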