Detection of deceptive reviews: using classification and natural language processing features
Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
With the rapid growth of open online forums where anyone can give their opinion on anything, the Internet has become a place where people try to mislead others. Assuming that there is a correlation between a deceptive text's purpose and the way it is written, our goal in this thesis was to develop a model for detecting such fake texts by exploiting this correlation. Our approach was to use classification together with three different feature types: term frequency-inverse document frequency, word2vec and probabilistic context-free grammar. We have developed a model that improves on all results known to us for two different datasets.
Using machine translation, we found that it is possible to hide the stylometric footprints and characteristics of deceptive texts, which slightly decreases the accuracy of a classifier while the text still conveys its message.
Finally, we investigated whether it was possible to train and test our model on data from different sources, and achieved an accuracy hardly better than chance. This indicates that the resulting model is not versatile enough to be used on kinds of deceptive texts other than those it was trained on.
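To make the first of the three feature types concrete, the following is a minimal sketch of term frequency-inverse document frequency computed over a toy corpus. It is not the thesis's implementation: the function names `tokenize` and `tfidf_vectors`, the whitespace tokenizer, the unsmoothed `log(N / df)` weighting and the three example reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    Other variants (smoothing, normalization) are common in practice.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            [(counts[term] / total) * idf[term] if total else 0.0 for term in vocab]
        )
    return vocab, vectors


# Hypothetical toy reviews, not taken from the thesis datasets.
reviews = [
    "the hotel was great great great",
    "the room was small",
    "the staff was friendly",
]
vocab, vectors = tfidf_vectors(reviews)
```

Terms that occur in every review, such as "the", receive weight zero under this weighting, while terms concentrated in a single review receive the highest weights. Vectors of this kind could then be passed to a classifier such as a support vector machine, which appears among the record's keywords.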
Place, publisher, year, edition, pages
2016. 53 p.
UPTEC F, ISSN 1401-5757 ; 16056
Machine learning, deceptive detection, SVM, support vector machines, natural language processing, NLP, classification
Identifiers
URN: urn:nbn:se:uu:diva-306956
OAI: oai:DiVA.org:uu-306956
DiVA: diva2:1044927
Master Programme in Engineering Physics
García Lozano, Marianela
Nyberg, Tomas
Cassel, Sofia