
Free-Text Plagiarism Detection Based on Latent Semantic Analysis
Plagiarism is a widespread problem that has attracted considerable attention in recent years. In this work, I describe the state of the art in free-text plagiarism detection methods. Further, I discuss approaches to text pre-processing and N-gram extraction, which substantially influence the effectiveness of copy detection methods. I propose an advanced plagiarism detection method based on Latent Semantic Analysis (LSA). This method can employ two different document model representations and four variants of model factorization: Singular Value Decomposition (SVD), Higher-Order Singular Value Decomposition (HOSVD), Non-negative Matrix Factorization (NMF), and Non-negative Tensor Factorization (NTF). The final goal is to propose a new method that is more effective than existing ones.
Keywords: Plagiarism, Copy Detection, Natural Language Processing, N-grams, Phrases, WordNet, Synonyms, Lemmatization, Latent Semantic Analysis, SVD, HOSVD, NMF, NTF.
Year: 2008
Authors of this publication:

Zdeněk Češka
E-mail: zceska@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/en/department/members/detail.html?login=zceska
Related Projects:

Automatic Plagiarism Detection
Authors: Zdeněk Češka
Description: This project focuses on the particular field of automatic plagiarism detection in written text. The main principle of this project is the application of Latent Semantic Analysis in conjunction with word N-grams.
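
To make the combination of Latent Semantic Analysis and word N-grams more concrete, the following is a minimal sketch, not the method from the publication. It assumes raw N-gram counts, a plain truncated SVD via NumPy, and cosine similarity between documents in the reduced space. It omits lemmatization, synonym handling, and the HOSVD, NMF, and NTF variants. The function names `word_ngrams` and `lsa_similarity` and the parameters `n` and `rank` are hypothetical, chosen only for illustration.

```python
import re

import numpy as np


def word_ngrams(text, n=2):
    """Return the list of word N-grams in a lower-cased, tokenized text."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lsa_similarity(docs, n=2, rank=2):
    """Pairwise cosine similarity of documents in a rank-reduced LSA space.

    Assumption: raw N-gram counts are used as weights; a real system
    would likely apply a weighting scheme and more careful pre-processing.
    """
    grams = [word_ngrams(d, n) for d in docs]
    vocab = sorted({g for doc in grams for g in doc})
    index = {g: i for i, g in enumerate(vocab)}

    # N-gram-by-document count matrix (rows: N-grams, columns: documents).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(grams):
        for g in doc:
            A[index[g], j] += 1.0

    # Truncated SVD: keep only the `rank` largest singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(s))
    D = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document, k columns

    # Normalize rows so that the dot product equals cosine similarity.
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    D = D / norms
    return D @ D.T


if __name__ == "__main__":
    documents = [
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over the lazy dog",
        "latent semantic analysis factorizes a term document matrix",
    ]
    print(np.round(lsa_similarity(documents), 3))
```

In this sketch, document pairs with a similarity close to 1 would be candidates for closer inspection. Any decision threshold is left open here because the source does not specify one.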