Automatic Plagiarism Detection Based on Latent Semantic Analysis

Plagiarism is a widespread problem that currently attracts considerable attention. The main objective of this work is to apply the Latent Semantic Analysis (LSA) framework to plagiarism detection in written text. This field faces various issues, which are discussed thoroughly. To infer the latent semantics of a given text, Singular Value Decomposition (SVD) is employed to carry out the large-scale statistical computation. To cope with the large number of N-grams extracted from the text, feature selection and, subsequently, random indexing are applied. Moreover, this thesis examines the influence of text pre-processing on the accuracy of plagiarism detection. The aspects of a multilingual environment are explored as well. Various approaches in common use are discussed and compared with the newly proposed method.
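As a rough illustration of two of the steps named above (word N-gram extraction and random indexing), the following minimal sketch compares two texts by the cosine similarity of their compressed N-gram vectors. It is not the method of the thesis: the SVD step of LSA is left out, and the function names, the vector dimension `dim`, the number of non-zero entries `nonzero`, and the hash-based seeding are all assumptions made for this example.

```python
import hashlib
import math
import random
import re


def word_ngrams(text, n=3):
    """Return the set of word n-grams (as tuples) of the lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def index_vector(ngram, dim=256, nonzero=8):
    """Sparse ternary index vector for one n-gram, as {position: +1 or -1}.

    The random generator is seeded from a hash of the n-gram, so the same
    n-gram always receives the same index vector (an illustrative choice).
    """
    digest = hashlib.sha256(" ".join(ngram).encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    positions = rng.sample(range(dim), nonzero)
    return {pos: rng.choice((-1, 1)) for pos in positions}


def document_vector(text, n=3, dim=256):
    """Sum the index vectors of all distinct word n-grams in the text."""
    vec = [0.0] * dim
    for ngram in word_ngrams(text, n):
        for pos, sign in index_vector(ngram, dim).items():
            vec[pos] += sign
    return vec


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    source = "Singular Value Decomposition is employed to infer the latent semantics"
    suspect = "Singular Value Decomposition is employed to reveal hidden meaning"
    print(cosine(document_vector(source), document_vector(suspect)))
```

Documents that share many word N-grams accumulate many of the same index vectors and therefore score closer to 1, while the fixed dimension keeps the vector size independent of how many distinct N-grams occur.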

Keywords: Plagiarism, Copy Detection, Comparison, N-grams, Random Indexing, Feature Compression, Singular Value Decomposition, Latent Semantic Analysis, Lemmatization, Thesaurus, WordNet, Multilingual Processing

Year: 2010

Download: Full text

Authors of this publication:


Zdeněk Češka


E-mail: zceska@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/en/department/members/detail.html?login=zceska

Zdeněk has worked for various international companies in the field of Software Engineering. He holds a Master's degree and a PhD in Computer Science and Engineering. His research interests include Mathematics & Algorithmization, Plagiarism Detection, Multilingual Processing, Text Classification, and other related fields.

Related Projects:


Project

Automatic Plagiarism Detection

Authors: Zdeněk Češka
Desc.: This project focuses on automatic plagiarism detection in written text. Its main principle is the application of Latent Semantic Analysis in conjunction with word N-grams.