Cutting the Sentiment
KTH, School of Computer Science and Communication (CSC).
2014 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Att snitta sentimentet (Swedish)
Abstract [en]

In this thesis we have adapted a graph method used by Pang and Lee so that it does not use any machine learning, in order to use it for near real time sentiment analysis. The machine learning step has been replaced with a novel approach that constructs one positive document and one negative document. These are positive and negative, respectively, with respect to the features deemed most important in the corpus, and said features are extracted using an algorithm devised by Cataldi et al. This enables us to replace the machine learning step in Pang and Lee's algorithm with a semantic matching between the documents to classify and the constructed documents.

We find that TF-IDF is not a suitable measurement for discerning sentiment, and we also note some general difficulties in using the minimum cut algorithm on a near complete graph, relating to the instability of said algorithm. The method is found to give better results for the negative reviews than for the positive ones, and some possible reasons for this are also discussed.
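
The minimum cut formulation mentioned in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function name `min_cut_classify`, the review identifiers, and all scores below are hypothetical, and the per-document scores are simply given as inputs here, whereas the thesis derives them from a semantic matching against the constructed positive and negative documents. The sketch assumes the usual setup in which each document is linked to a "positive" source and a "negative" sink by its individual scores, documents are linked to each other by association weights, and a minimum cut splits the graph into two classes.

```python
from collections import deque


def min_cut_classify(pos_score, neg_score, assoc):
    """Return the items that end up on the positive (source) side of a minimum cut.

    pos_score / neg_score: dict item -> non-negative individual score (hypothetical inputs).
    assoc: dict (item_a, item_b) -> non-negative association weight (undirected).
    """
    S, T = "_source", "_sink"
    cap = {S: {}, T: {}}  # residual capacities: cap[u][v]

    def add_edge(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge

    for x in pos_score:
        add_edge(S, x, pos_score[x])  # cut if x is labelled negative
        add_edge(x, T, neg_score[x])  # cut if x is labelled positive
    for (a, b), w in assoc.items():
        add_edge(a, b, w)             # cut if a and b get different labels
        add_edge(b, a, w)

    eps = 1e-12

    def find_path():
        # Breadth-first search for an augmenting path (Edmonds-Karp).
        parent = {S: None}
        queue = deque([S])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > eps and v not in parent:
                    parent[v] = u
                    if v == T:
                        return parent
                    queue.append(v)
        return None

    while True:
        parent = find_path()
        if parent is None:
            break
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), T
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = T
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Items still reachable from the source in the residual graph are "positive".
    seen, queue = {S}, deque([S])
    while queue:
        u = queue.popleft()
        for v, c in cap[u].items():
            if c > eps and v not in seen:
                seen.add(v)
                queue.append(v)
    return {x for x in pos_score if x in seen}


# Hypothetical scores for three reviews.
pos = {"r1": 0.9, "r2": 0.4, "r3": 0.1}
neg = {"r1": 0.1, "r2": 0.6, "r3": 0.9}
alone = min_cut_classify(pos, neg, {})
linked = min_cut_classify(pos, neg, {("r1", "r2"): 1.0})
print(sorted(alone), sorted(linked))
```

In this toy input, `r2` leans negative on its own, but a strong association with `r1` pulls it to the positive side. On a near complete graph many such association edges compete, which is one way to picture the sensitivity the abstract refers to.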

Abstract [sv] (translated to English)

In this degree project we have adapted a graph method used by Pang and Lee so that it does not use machine learning techniques, allowing it to be used for near real time sentiment analysis. The machine learning step has been replaced with a novel method for constructing one positive and one negative document. These are positive and negative, respectively, with regard to the nouns judged to be most important in the corpus, and these nouns are extracted with an algorithm developed by Cataldi et al. This lets us replace the machine learning step in Pang and Lee's algorithm with a semantic matching between the documents to be classified and the constructed documents.

We find that TF-IDF is not a suitable method for discerning sentiment, and we also make some general observations about difficulties in using minimum cut methods on a nearly complete graph. The described method gives better results for negative reviews than for positive ones, and some possible reasons for this are also discussed.

Place, publisher, year, edition, pages
2014.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-153965
OAI: oai:DiVA.org:kth-153965
DiVA: diva2:754558
Available from: 2014-11-21
Created: 2014-10-10
Last updated: 2014-11-21
Bibliographically approved

Open Access in DiVA

fulltext (476 kB), 184 downloads
File information
File name: FULLTEXT01.pdf
File size: 476 kB
Checksum (SHA-512):
3d991e368c04bc1300d9396db9151773609ee8cbe20532c09cd26f6c771f949cea30e8125eddaea8ce40b9b3c208147e8d9cff2386d352077c7db0d8c689e764
Type: fulltext
Mimetype: application/pdf

Total: 184 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 486 hits