Combining Lexicon- and Learning-based Approaches for Improved Performance and Convenience in Sentiment Classification
Sommar, Fredrik (KTH, School of Computer Science and Communication (CSC))
Wielondek, Milosz (KTH, School of Computer Science and Communication (CSC))
2015 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Sentiment classification is the process of sorting data into categories based on its polarity, and it has a wide array of applications across several industries. This report examines a combination of two prominent approaches to sentiment classification: one using a lexicon of weighted words and one using machine learning. Both are compared with the combined hybrid approach in order to give an account of their relative strengths and weaknesses. When run on a set of IMDb movie reviews, the results indicate that the hybrid model performs better than the lexicon-based approach but is in turn outperformed by the learning-based approach. However, the gain in convenience from eliminating the need for training data makes the hybrid model an appealing alternative to the other approaches, at the cost of a slight trade-off in performance.
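
The abstract does not say how the two approaches are combined. One plausible reading, given here only as a minimal sketch and not as the authors' method, is that a weighted lexicon labels unlabeled reviews and a classifier is then trained on those pseudo-labels, so no hand-labeled training data is needed. The lexicon entries, the function names and the choice of a multinomial Naive Bayes learner are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (word -> polarity weight); not taken from the thesis.
LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "awful": -2.0}


def tokenize(text):
    """Lowercase whitespace tokenizer; real systems would do more."""
    return text.lower().split()


def lexicon_label(text):
    """Lexicon-based step: sum word weights, return 'pos', 'neg' or None."""
    score = sum(LEXICON.get(tok, 0.0) for tok in tokenize(text))
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None  # no lexicon evidence either way


def train_naive_bayes(labeled_docs):
    """Learning-based step: collect per-class word and document counts."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Pick the class with the highest Laplace-smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in ("pos", "neg"):
        logp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text):
            if tok in vocab:  # ignore words never seen in training
                logp += math.log((word_counts[label][tok] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def train_hybrid(unlabeled_docs):
    """Assumed hybrid: pseudo-label with the lexicon, then train the learner."""
    pseudo = [(doc, lexicon_label(doc)) for doc in unlabeled_docs]
    return train_naive_bayes([(d, l) for d, l in pseudo if l is not None])
```

In this sketch the trained model can pick up polarity cues that are absent from the lexicon (words that merely co-occur with lexicon words in the pseudo-labeled reviews), which is one way a hybrid could improve on the lexicon alone without any manually labeled data.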

Abstract [sv] (translated to English)

Classifying text into categories based on the sentiment it expresses is an active area today and can be applied in many industries. The report examines a combination of the two prominent approaches to this type of classification, based on a lexicon with defined word weights and on machine learning respectively. This hybrid solution is compared with the two other approaches in order to present their relative strengths and weaknesses. On a dataset of movie reviews from IMDb, the machine learning classifier achieves the best results, followed by the hybrid solution and lastly the lexicon-based solution. Nevertheless, the hybrid solution may be preferable in situations where it is infeasible or unreasonable to prepare training data for the machine learning classifier, albeit with some sacrifice in performance.

Place, publisher, year, edition, pages
2015.
Keyword [en]
sentiment classification, sentiment analysis, opinion mining, lexicon, machine learning, hybrid, movie reviews, social media
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-166430
OAI: oai:DiVA.org:kth-166430
DiVA: diva2:811021
Supervisors
Examiners
Available from: 2015-05-28. Created: 2015-05-09. Last updated: 2015-05-28. Bibliographically approved.

Open Access in DiVA

fulltext (1656 kB), 490 downloads
File information
File name: FULLTEXT01.pdf
File size: 1656 kB
Checksum (SHA-512):
7a3278640117e8bf4912939788d29f4c497e154ed720028c5e5222ac843a453ba714e6f495af64422739bb6c9c4f376fcea054733dc334887ba845c894fc5926
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Sommar, Fredrik
Wielondek, Milosz
By organisation
School of Computer Science and Communication (CSC)
Computer Science

Total: 490 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 890 hits