Sentiment analysis of Swedish reviews and transfer learning using Convolutional Neural Networks
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control.
2018 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Sentiment analysis is a field within machine learning that focuses on determining the contextual polarity of subjective information. It is a technique that can be used to analyze the "voice of the customer" and has been applied successfully to English-language opinionated text such as customer reviews, political opinions and social media data. A major problem with machine learning models is that they are domain dependent and therefore tend to perform poorly in other domains. Transfer learning, or domain adaption, is a research field that studies a model's ability to transfer knowledge across domains. In the extreme case, a model is trained on data from one domain, the source domain, and tries to make accurate predictions on data from another domain, the target domain. The Convolutional Neural Network (CNN), a deep learning model, has in recent years gained much attention due to its performance in computer vision, both for in-domain classification and for transfer learning. It has also performed well on natural language processing problems but has not been investigated to the same extent for transfer learning within this area. The purpose of this thesis has been to investigate how well suited the CNN is for cross-domain sentiment analysis of Swedish reviews. The research has been conducted by investigating how the model performs when trained with data from different domains and with varying amounts of source and target data. Additionally, the impact of different text representations on the model’s transferability has been studied.

This study has shown that a CNN without pre-trained word embeddings is not well suited for transfer learning, since it performs worse than a traditional logistic regression model. Substituting 20% of the source training data with target data can, in many of the test cases, boost performance by 7-8% for both the logistic regression and the CNN model. Using pre-trained word embeddings produced by a word2vec model increases the CNN's transferability as well as its in-domain performance, and this model outperforms both the logistic regression model and the CNN without pre-trained word embeddings in the majority of test cases.

Place, publisher, year, edition, pages
2018. p. 56
Series
UPTEC STS, ISSN 1650-8319 ; 18001
Keyword [en]
Sentiment analysis, transfer learning, domain adaption, convolutional neural networks, cnn, machine learning, word2vec
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:uu:diva-339066
OAI: oai:DiVA.org:uu-339066
DiVA, id: diva2:1174477
External cooperation
Findwise AB
Educational program
Systems in Technology and Society Programme
Supervisors
Examiners
Available from: 2018-01-17. Created: 2018-01-15. Last updated: 2018-01-17. Bibliographically approved.

Open Access in DiVA

Examensarbete_Johan_Sundström (3247 kB), 75 downloads
File information
File name: FULLTEXT01.pdf
File size: 3247 kB
Checksum (SHA-512):
54b3ded1b5d83d0fcb9855dd276385d1d833f8bec1b0ced974467d08f74dfc7f176f6bc7b4971f264659b012001f8048857eb5fa4a78e539f2129ee7a2a73426
Type: fulltext
Mimetype: application/pdf

Total: 75 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 221 hits