Graph-based Natural Language Processing: Graph edit distance applied to the task of detecting plagiarism
Norwegian University of Science and Technology, Faculty of Information Technology, Mathematics and Electrical Engineering, Department of Computer and Information Science.
2012 (English). Master's thesis (student thesis)
Abstract [en]

The focus of this thesis is the exploration of graph-based similarity in the context of natural language processing. The work is motivated by a need for richer representations of text. A graph edit distance algorithm that calculates the difference between graphs was implemented. Sentences were represented as dependency graphs, which consist of words connected by dependencies and capture the syntactic structure of a sentence. The graph-based similarity approach was applied to the problem of detecting plagiarism and compared against state-of-the-art systems. The key advantages of graph-based textual representations are indifference to word order and the ability to capture similarity between words based on sentence structure. The approach was compared against contributions to the PAN plagiarism detection challenge at the CLEF 2011 conference, where it would have placed 5th out of 10 contestants. The evaluation results suggest that the approach is applicable to the task of detecting plagiarism but requires some fine-tuning of input parameters. The results also showed that dependency graphs are best represented with directed edges, and that the graph edit distance algorithm scored best with a combination of node and edge label matching. Applying different edit weights increased performance.
Keywords: Graph Edit Distance, Natural Language Processing, Dependency Graphs, Plagiarism Detection
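As an illustration of the idea in the abstract, the following is a minimal sketch of graph edit distance over two small directed, labelled graphs, using both node and edge label matching. It is not the thesis implementation: the function name `edit_distance`, the cost weights, the brute-force search over node mappings, and the example relation labels (`nsubj`, `dobj`) are all assumptions chosen for illustration, and the search is only feasible for very small graphs.

```python
from itertools import permutations

# Hypothetical edit weights; the abstract does not state the values used.
NODE_SUB, NODE_INDEL = 1.0, 1.0
EDGE_SUB, EDGE_INDEL = 0.5, 1.0


def edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force edit distance between two small directed, labelled graphs.

    nodes_*: dict mapping node id -> label (for example a word)
    edges_*: dict mapping (head id, dependent id) -> label (a relation name)
    Node ids must not be None, because None marks a padded "empty" slot.
    """
    ids_a, ids_b = list(nodes_a), list(nodes_b)
    size = max(len(ids_a), len(ids_b))
    # Pad the smaller side so a node can be paired with "nothing".
    ids_a += [None] * (size - len(ids_a))
    ids_b += [None] * (size - len(ids_b))

    best = float("inf")
    for perm in permutations(ids_b):
        cost = 0.0
        mapping = {}
        for a, b in zip(ids_a, perm):
            if a is None or b is None:
                cost += NODE_INDEL  # node insertion or deletion
            else:
                mapping[a] = b
                if nodes_a[a] != nodes_b[b]:
                    cost += NODE_SUB  # node label mismatch

        matched = set()
        for (head, dep), label in edges_a.items():
            image = (mapping.get(head), mapping.get(dep))
            if image in edges_b:
                matched.add(image)
                if edges_b[image] != label:
                    cost += EDGE_SUB  # same direction, different relation
            else:
                cost += EDGE_INDEL  # edge deleted
        cost += EDGE_INDEL * (len(edges_b) - len(matched))  # edges inserted
        best = min(best, cost)
    return best


# "John loves Mary" versus "Mary loves John" as toy dependency graphs.
nodes_1 = {1: "John", 2: "loves", 3: "Mary"}
edges_1 = {(2, 1): "nsubj", (2, 3): "dobj"}
nodes_2 = {1: "Mary", 2: "loves", 3: "John"}
edges_2 = {(2, 1): "nsubj", (2, 3): "dobj"}

print(edit_distance(nodes_1, edges_1, nodes_2, edges_2))
```

In this toy example both sentences contain the same words, so matching on node labels alone would find no difference. The edge labels are what separate them, since the subject and object roles are swapped.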

Place, publisher, year, edition, pages
Institutt for datateknikk og informasjonsvitenskap, 2012. 61 p.
URN: urn:nbn:no:ntnu:diva-20778
Local ID: ntnudaim:6700
OAI: diva2:618487
Available from: 2013-04-28. Created: 2013-04-28. Last updated: 2013-06-21. Bibliographically approved.

Open Access in DiVA

fulltext (726 kB), 1479 downloads
File information
File name: FULLTEXT01.pdf; File size: 726 kB; Checksum: SHA-512
Type: fulltext; Mimetype: application/pdf
cover (184 kB), 25 downloads
File information
File name: COVER01.pdf; File size: 184 kB; Checksum: SHA-512
Type: cover; Mimetype: application/pdf
attachment (49747 kB), 1245 downloads
File information
File name: ATTACHMENT01.zip; File size: 49747 kB; Checksum: SHA-512
Type: attachment; Mimetype: application/zip
