
In Czech: Využití techniky náhodného indexování v oblasti detekce plagiátů (literally: Use of the Random Indexing Technique in the Field of Plagiarism Detection)
Plagiarism is a widespread problem that attracts great interest today because electronic documents can be copied so easily. This paper extends the application of Latent Semantic Analysis (LSA) to plagiarism detection and proposes further improvements. Its main subject is the application of a feature compression technique, random indexing, to overcome the problem of processing large amounts of data. A second topic is the normalization of document similarity. A Czech corpus of 1,500 text documents about politics, including documents that had been manually plagiarized by students, was used for the experiments. The results indicate that the proposed compression technique substantially reduces execution time. Moreover, the experiments show that the newly proposed document similarity normalization formula increases the accuracy of plagiarism detection.
Keywords: Plagiarism, Copy Detection, Comparison, Random Indexing, Feature Compression, Latent Semantic Analysis, Singular Value Decomposition
Year: 2009
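The compression technique named in the title is random indexing. As a rough illustration of the general technique only, the sketch below gives each distinct term a fixed sparse random vector with a few +1 or -1 entries and compresses a document into a short dense vector by summing the index vectors of its terms, weighted by term counts. Because sparse random vectors are nearly orthogonal, dot products between compressed vectors approximately preserve those of the original term-count vectors. The dimension `DIM = 512`, the `NONZEROS = 8` non-zero entries, the per-term seeding, whitespace tokenization and the use of plain cosine similarity are all assumptions; the paper's own parameters and its proposed similarity normalization formula are not reproduced here.

```python
import math
import random
from collections import Counter

# Assumed parameters (not taken from the paper): vector dimension and
# number of non-zero (+1 / -1) entries per random index vector.
DIM = 512
NONZEROS = 8


def index_vector(term, dim=DIM, nonzeros=NONZEROS, seed=0):
    """Sparse ternary random index vector for `term`, as {position: +1 or -1}."""
    # Seeding the generator with the term string makes the mapping deterministic,
    # so a term always receives the same index vector (an illustrative choice).
    rng = random.Random(f"{seed}:{term}")
    positions = rng.sample(range(dim), nonzeros)
    return {pos: rng.choice((1, -1)) for pos in positions}


def document_vector(tokens, dim=DIM, nonzeros=NONZEROS, seed=0):
    """Compress a bag of words into a dense `dim`-dimensional vector."""
    vec = [0.0] * dim
    for term, count in Counter(tokens).items():
        # Add the term's index vector, weighted by how often the term occurs.
        for pos, sign in index_vector(term, dim, nonzeros, seed).items():
            vec[pos] += sign * count
    return vec


def cosine(u, v):
    """Plain cosine similarity (a stand-in, not the paper's normalization)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


original = "the parliament approved the new budget after a long debate".split()
reordered = "after a long debate the parliament approved the new budget".split()
unrelated = "the football season starts next week in the capital".split()

v_orig = document_vector(original)
print(round(cosine(v_orig, document_vector(reordered)), 3))  # 1.0 (same bag of words)
print(round(cosine(v_orig, document_vector(unrelated)), 3))  # noticeably lower
```

The reordered copy contains exactly the same words, so its compressed vector is identical to the original's; the unrelated text shares only the word "the" and scores much lower. Every document is stored as a fixed-length vector regardless of vocabulary size, which is what makes this kind of compression attractive for large corpora.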

Authors of this publication:

Zdeněk Češka
E-mail: zceska@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/en/department/members/detail.html?login=zceska
Related Projects:

Automatic Plagiarism Detection
Authors: Zdeněk Češka
Description: This project focuses on automatic plagiarism detection in written text. Its main principle is the application of Latent Semantic Analysis in conjunction with word N-grams.
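The project pairs Latent Semantic Analysis with word N-grams. A minimal sketch of that combination, under stated assumptions: documents are split into overlapping word N-grams, an N-gram-by-document count matrix is built, and a truncated singular value decomposition keeps only the `k` strongest latent dimensions, in which documents are compared by cosine similarity. The choice of bigrams (`n=2`), raw counts without term weighting, `k=2`, and the helper names `word_ngrams` and `lsa_document_vectors` are hypothetical illustrations, not the project's actual settings or code.

```python
import numpy as np


def word_ngrams(text, n=2):
    """Overlapping word N-grams of a whitespace-tokenized, lower-cased text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def lsa_document_vectors(docs, n=2, k=2):
    """Project documents into a k-dimensional latent space (one row per document)."""
    grams = [word_ngrams(d, n) for d in docs]
    vocab = sorted({g for gs in grams for g in gs})
    row = {g: i for i, g in enumerate(vocab)}
    # N-gram-by-document matrix of raw counts (no term weighting; an assumption).
    A = np.zeros((len(vocab), len(docs)))
    for j, gs in enumerate(grams):
        for g in gs:
            A[row[g], j] += 1
    # Truncated SVD, A ~ U_k S_k V_k^T: document j is represented by row j of V_k S_k.
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    return vt[:k].T * s[:k]


def cosine(u, v):
    """Cosine similarity of two latent document vectors."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0


docs = [
    "the parliament approved the new state budget after a long debate",
    "after a long debate the parliament approved the new state budget",
    "the football season starts next week in the capital",
]
X = lsa_document_vectors(docs)
print(round(cosine(X[0], X[1]), 3))  # close to 1.0: most bigrams shared
print(round(cosine(X[0], X[2]), 3))  # close to 0.0: no bigrams shared
```

Word N-grams keep local word order, so the matrix records shared phrases rather than only shared words. On a realistic corpus, `k` is typically chosen much larger than 2.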