
In Czech: Využití N-gramů pro odhalování plagiátů (literally: Using N-grams to Detect Plagiarism)
The growing popularity of the Internet has made it possible to download a large number of documents. This paper describes some common methods related to the resulting widespread plagiarism. The use of N-grams to detect overlapping documents and to avoid problems caused by text shifting is discussed in detail. At the end of the paper, we compare various methods for extracting N-grams of larger sizes.
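As a rough illustration of why N-grams tolerate text shifting, the following minimal Python sketch compares two documents by the word N-grams they share. It is not the method from the paper: the function names `word_ngrams` and `overlap`, the choice of Jaccard similarity as the score, and the sample sentences are all assumptions made for this example.

```python
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b, n=3):
    """Jaccard similarity of the two texts' n-gram sets, between 0.0 and 1.0.

    Jaccard is an assumption for this sketch, not a measure taken from the paper.
    """
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


original = "the quick brown fox jumps over the lazy dog"
shifted = "yesterday the quick brown fox jumps over the lazy dog"
unrelated = "latent semantic analysis maps documents to a concept space"

print(overlap(original, shifted))    # 0.875: one inserted word adds one new trigram
print(overlap(original, unrelated))  # 0.0: no trigram in common
```

Because the N-grams are collected into a set rather than compared position by position, inserting a word at the start of the copied text shifts every later word but changes only the N-grams that contain the inserted word.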
Keywords: Plagiarism, Copy Detection, N-grams, Comparison
Year: 2007
Download:
Full text [212 kB]

Authors of this publication:

Zdeněk Češka
E-mail: zceska@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/en/department/members/detail.html?login=zceska
Zdeněk has worked for various international companies in the field of software engineering. He has earned a Master's degree and a PhD in Computer Science and Engineering. His research interests include Mathematics & Algorithmization, Plagiarism Detection, Multilingual Processing, Text Classification, and other related fields.
Related Projects:

Automatic Plagiarism Detection
Authors: Zdeněk Češka
Description: This project focuses on the particular field of automatic plagiarism detection in written text. The main principle of this project is the application of Latent Semantic Analysis in conjunction with word N-grams.