In Czech: Využití moderních přístupů pro detekci plagiátů (Use of Modern Approaches for Plagiarism Detection)

Plagiarism is a widespread problem that can be encountered almost anywhere, but it is most problematic in education. This paper deals with modern approaches to plagiarism detection. We propose a method that employs text normalization and Latent Semantic Analysis to infer latent semantic associations among documents. Further, we present preliminary experiments on our test corpus, which consists of 950 documents about politics. These preliminary experiments indicate that our approach performs better than the other tested methods. Finally, we discuss the use of WordNet to improve the accuracy of the plagiarism detection method. Another issue is the identification of translated documents.
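The abstract names the pipeline (text normalization followed by Latent Semantic Analysis) without giving details. The following is a minimal Python sketch of that general idea, not the authors' implementation. The normalization (lowercasing, alphabetic tokens, a small hypothetical stopword list), raw term counts without weighting, the latent dimension k = 2, and cosine similarity between rows of V_k Σ_k are all assumptions. To stay dependency-free it obtains the truncated SVD through power iteration on A^T A, which is adequate only for toy inputs.

```python
import math
import random
import re
from collections import Counter

# Assumed normalization: a tiny, hypothetical stopword list. A real system
# would use a fuller list and possibly stemming or lemmatization.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to",
             "is", "are", "was", "were", "by", "for", "with"}


def normalize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_document_matrix(docs):
    """Return A with rows = terms, columns = documents, entries = raw counts."""
    counts = [Counter(normalize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[t] for c in counts] for t in vocab]


def top_eigenpairs(m, k, iters=200, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(m)
    m = [row[:] for row in m]
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient of the (unit) converged vector gives its eigenvalue.
        lam = sum(v[i] * m[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate so the next round converges to the next-largest eigenpair.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_document_vectors(docs, k=2):
    """Map each document into the k-dim latent space (rows of V_k * Sigma_k)."""
    a = term_document_matrix(docs)
    n_docs = len(docs)
    # A^T A is documents x documents: its eigenvectors are the right singular
    # vectors V of A, and its eigenvalues are the squared singular values.
    ata = [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
           for i in range(n_docs)]
    pairs = top_eigenpairs(ata, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs]
            for d in range(n_docs)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    "The parliament approved the new tax law after a long debate.",
    "After a long debate, the new tax law was approved by the parliament.",
    "The football team won the championship final on Sunday.",
]
vecs = lsa_document_vectors(docs, k=2)
print(round(cosine(vecs[0], vecs[1]), 3))  # reworded pair
print(round(cosine(vecs[0], vecs[2]), 3))  # unrelated pair
```

With these inputs the two reworded sentences normalize to the same bag of words, so their latent vectors point in the same direction (similarity close to 1), while the sports sentence shares no terms with them (similarity close to 0). For a real corpus one would use a proper SVD routine such as `numpy.linalg.svd` and, typically, a term-weighting scheme.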

Keywords: Plagiarism, Copy Detection, Phrases, N-grams, WordNet, Thesaurus, Singular Value Decomposition, Latent Semantic Analysis

Year: 2008

Download: Full text [626 kB]

Authors of this publication:

Zdeněk Češka


Zdeněk has worked for various international companies in the field of Software Engineering. He earned a Master's degree and a PhD in Computer Science and Engineering. His research interests include Mathematics & Algorithmization, Plagiarism Detection, Multilingual Processing, Text Classification, and other related fields.

Related Projects:


Automatic Plagiarism Detection

Authors: Zdeněk Češka
Desc.: This project focuses on automatic plagiarism detection in written text. Its main principle is the application of Latent Semantic Analysis in conjunction with word N-grams.
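The description combines Latent Semantic Analysis with word N-grams. One plausible reading, stated here as an assumption rather than the project's documented design, is that word N-grams take the place of single words as the rows of the term-document matrix. The Python sketch below covers only that feature-extraction step. The lowercase alphabetic tokenization, the choice n = 3, and the function names `word_ngrams` and `ngram_document_matrix` are hypothetical.

```python
import re
from collections import Counter


def word_ngrams(text, n=3):
    """Return the word n-grams (space-joined) of a lowercased text, in order."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_document_matrix(docs, n=3):
    """Return (vocab, matrix): rows = word n-grams, columns = documents, entries = counts."""
    counts = [Counter(word_ngrams(d, n)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for c in counts] for g in vocab]


docs = [
    "The government raised taxes on fuel last year.",
    "Last year the government raised taxes on tobacco.",
]
vocab, matrix = ngram_document_matrix(docs, n=3)
# Rows with nonzero counts in both columns are word trigrams the documents share.
shared = [g for g, row in zip(vocab, matrix) if row[0] and row[1]]
print(shared)
```

The resulting matrix can be decomposed with SVD exactly as a single-word term-document matrix would be. Longer n-grams make the matrix sparser but make each match more specific to copied word sequences.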