Free-Text Plagiarism Detection Based on Latent Semantic Analysis

Plagiarism is a widespread problem that currently attracts considerable attention. In this work, I describe the state of the art of free-text plagiarism detection methods. Further, I discuss approaches to text pre-processing and N-gram extraction, which significantly influence the effectiveness of copy detection methods. I propose an advanced plagiarism detection method based on Latent Semantic Analysis (LSA). This method can employ two different document model representations and four variants of model factorization, namely Singular Value Decomposition (SVD), Higher-Order Singular Value Decomposition (HOSVD), Non-negative Matrix Factorization (NMF), and Non-negative Tensor Factorization (NTF). The final goal is to propose a new method that is more effective than existing ones.
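To make the pre-processing and N-gram extraction step concrete, here is a minimal sketch. It is not the author's implementation: it assumes a simple regex tokenizer and a tiny, hypothetical stop-word list, and it omits the lemmatization and WordNet synonym handling named in the keywords. The function names `preprocess` and `word_ngrams` are illustrative.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "is", "are", "to"}


def preprocess(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def word_ngrams(tokens: list[str], n: int = 2) -> Counter:
    """Count contiguous word N-grams (stored as tuples) in a token list.

    Returns an empty Counter when the text has fewer than n tokens.
    """
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    doc = "The cat sat on the mat and the cat slept."
    print(word_ngrams(preprocess(doc), n=2))
```

For the sample sentence, stop-word removal leaves `cat sat mat cat slept`, which yields four bigrams, each counted once. Removing stop words before forming N-grams makes the N-grams more robust to small function-word edits, at the cost of joining words that were not adjacent in the original text.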

Keywords: Plagiarism, Copy Detection, Natural Language Processing, N-grams, Phrases, WordNet, Synonyms, Lemmatization, Latent Semantic Analysis, SVD, HOSVD, NMF, NTF.

Year: 2008

Authors of this publication:


Zdeněk Češka


E-mail: zceska@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/en/department/members/detail.html?login=zceska

Zdeněk has been working for various international companies in the field of Software Engineering. He has earned a Master's degree and a PhD in Computer Science and Engineering. His research interests include Mathematics & Algorithmization, Plagiarism Detection, Multilingual Processing, Text Classification, and other related fields.

Related Projects:


Project

Automatic Plagiarism Detection

Authors: Zdeněk Češka
Desc.: This project focuses on automatic plagiarism detection in written text. Its main principle is the application of Latent Semantic Analysis in conjunction with word N-grams.
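The combination of LSA and word N-grams can be illustrated with a short sketch. This is an assumption-laden illustration, not the project's actual method: it uses plain SVD (only one of the four factorizations named in the abstract), raw N-gram counts with no term weighting, and cosine similarity in the reduced space. The names `lsa_similarity` and `bigrams` and the choice of `k` are hypothetical.

```python
import numpy as np


def lsa_similarity(docs_ngrams: list[list[tuple[str, ...]]], k: int = 2) -> np.ndarray:
    """Pairwise cosine similarity of documents in a rank-k LSA space.

    docs_ngrams: one list of word N-grams (tuples of words) per document.
    """
    vocab = sorted({g for doc in docs_ngrams for g in doc})
    index = {g: j for j, g in enumerate(vocab)}
    # Document-by-N-gram count matrix (rows = documents, columns = N-grams).
    X = np.zeros((len(docs_ngrams), len(vocab)))
    for i, doc in enumerate(docs_ngrams):
        for g in doc:
            X[i, index[g]] += 1.0
    # Thin SVD X = U diag(s) Vh; keep only the k largest singular values.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    D = U[:, :k] * s[:k]  # document coordinates in the latent space
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid division by zero for empty documents
    D = D / norms
    return D @ D.T


def bigrams(text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: whitespace tokenization into word bigrams."""
    words = text.lower().split()
    return list(zip(words, words[1:]))


if __name__ == "__main__":
    docs = [
        "latent semantic analysis finds hidden structure",
        "latent semantic analysis finds hidden topics",
        "the weather today is sunny and warm",
    ]
    sim = lsa_similarity([bigrams(d) for d in docs], k=2)
    print(np.round(sim, 2))
```

The first two documents share four of their five bigrams, so their raw cosine similarity is 4/5 = 0.8. After the rank-2 projection, they fall on the same latent direction and their similarity is close to 1, while the unrelated third document stays near 0. A realistic collection would typically need many more documents, a term-weighting scheme, and a tuned `k`.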