Text-Mining Research Group

The Future of Copy Detection Techniques

The Internet is one of the richest encyclopaedias in the world. Students can easily download various free documents and then plagiarize their content. This paper describes the current state of copy detection methods and proposes some new trends. New approaches, closer to natural language processing, can substantially improve the identification of hard-to-detect cases of plagiarism, i.e. single-word changes and sentence structure changes. Synonyms and Latent Semantic Analysis are discussed in detail for a better understanding of the semantics within documents.
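
To illustrate why single-word changes are hard to detect with plain N-gram overlap, here is a minimal sketch in Python. It is not the method from the paper: the function names, the trigram size, the Jaccard measure, and the example sentences are all assumptions chosen for illustration.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams (as tuples) in a lowercased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical example: the second sentence replaces one word ("free" -> "open").
original = "students can easily download various free documents"
reworded = "students can easily download various open documents"

sim = jaccard(word_ngrams(original), word_ngrams(reworded))
print(round(sim, 2))  # prints 0.43
```

Changing one word out of seven breaks two of the five trigrams in each sentence, so the overlap falls to 3/7. Mapping synonyms to a common form before building the N-grams is one way such a drop could be reduced.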

Keywords: Plagiarism, Copy Detection, Natural Language Processing, N-grams, Phrases, Synonyms, Singular Value Decomposition, Latent Semantic Analysis

Year: 2007

Download:

Full text [374 kB]

Authors of this publication:

Zdeněk Češka

E-mail: zceska@kiv.zcu.cz
WWW: http://www.kiv.zcu.cz/en/department/members/detail.html?login=zceska

Zdeněk has been working for various international companies in the field of software engineering. He has earned a Master's degree and a PhD in Computer Science and Engineering. His research interests include Mathematics & Algorithmization, Plagiarism Detection, Multilingual Processing, Text Classification, and other related fields.

Related Projects:

Automatic Plagiarism Detection
Authors:	Zdeněk Češka
Desc.:	This project focuses on automatic plagiarism detection in written text. Its main principle is the application of Latent Semantic Analysis in conjunction with word N-grams.

University of West Bohemia