Sentiment Classification in Social Media: An Analysis of Methods and the Impact of Emoticon Removal
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
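2016 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Alternative title: Attitydanalys i Sociala Medier : En Analys av Metoder och Uttryckssymbolers Inverkan (Swedish; in English: "Sentiment Analysis in Social Media: An Analysis of Methods and the Impact of Emoticons")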
Abstract [en]

Sentiment classification is the process of analyzing data and classifying it according to the sentiment it conveys. It has many applications across different industries, but each application area also introduces its own challenges for implementing the methods successfully. This report examines two of the main approaches to sentiment classification: one based on machine learning, and one based on a lexicon of weighted words. In addition, preprocessing in the form of emoticon removal is explored as an enhancement to both approaches. The approaches are tested on data collected from Twitter to examine their performance on social media. The results indicate that lexicon-based classifiers perform best, and that removing emoticons increases the correctness of classification.
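The lexicon-based approach described above can be illustrated with a minimal sketch. It assumes the simplest scoring rule: sum the weights of the words in a tweet and read the sign of the total as the label. The toy lexicon, its weights, the tokenizer, and the helper names `tokenize`, `lexicon_score` and `classify` are hypothetical and are not taken from the thesis. Real lexicon-based classifiers typically use large curated lexicons and extra rules, for example for negation.

```python
import re

# Hypothetical toy lexicon: word -> sentiment weight (positive > 0, negative < 0).
# These values are invented for illustration; they are not the thesis's lexicon.
LEXICON = {
    "good": 1.0,
    "great": 2.0,
    "love": 2.0,
    "bad": -1.0,
    "awful": -2.0,
    "hate": -2.0,
}

# Tokens are runs of lowercase letters and apostrophes (an assumed, simple tokenizer).
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def lexicon_score(text: str, lexicon: dict[str, float] = LEXICON) -> float:
    """Sum the weights of all tokens found in the lexicon; unknown words count as 0."""
    return sum(lexicon.get(token, 0.0) for token in tokenize(text))


def classify(text: str, lexicon: dict[str, float] = LEXICON) -> str:
    """Map the summed score to a label: > 0 positive, < 0 negative, otherwise neutral."""
    score = lexicon_score(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in [
        "I love this phone, great battery",
        "awful service, really bad",
        "just landed in Stockholm",
    ]:
        print(classify(tweet), "|", tweet)
```

In this sketch, words missing from the lexicon contribute nothing. A tweet with no known words therefore falls back to "neutral", so lexicon coverage largely determines how well the scorer handles short, informal text.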

Abstract [sv] (English translation)

Categorizing text according to the sentiment it expresses has found many uses across many industries. The different application areas introduce different difficulties in meeting their requirements correctly and consistently. This report sets out to explore and evaluate two approaches: one based on machine learning, and one that compares the words in a text with word weights from a predefined lexicon. In addition, emoji removal is analyzed as a possible improvement to both approaches. The methods are tested on data collected from Twitter in order to analyze their performance on social media data. The results indicate that the lexicon-based method performs better, and that removing emojis increases the correctness of the classification.
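The emoticon and emoji removal step described above can be sketched as a pair of regular-expression substitutions applied to a tweet before it reaches either classifier. The patterns below are assumptions for illustration. They cover a small set of Western ASCII emoticons (such as `:)`, `;-(`, `:D` and `<3`) and a few Unicode ranges that hold most common emoji. This record does not give the thesis's actual emoticon list or tokenization, and `remove_emoticons` is a hypothetical helper name.

```python
import re

# Hypothetical emoticon pattern (an assumption, not the thesis's list):
# eyes from ":;=", an optional nose from "-^'", and a mouth from ")(][DPp/\|",
# or the heart "<3". The (?<!\S) and (?!\S) lookarounds require the emoticon
# to be a whitespace-delimited token.
EMOTICON_RE = re.compile(r"(?<!\S)(?:[:;=][\-^']?[\)\(\]\[DPp/\\|]|<3)(?!\S)")

# Approximate emoji coverage: regional-indicator flags, the main pictograph
# blocks, miscellaneous symbols and dingbats, plus the variation selector
# U+FE0F and zero-width joiner U+200D that appear inside composed emoji.
EMOJI_RE = re.compile(
    "[\U0001F1E6-\U0001F1FF\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]"
)

WHITESPACE_RE = re.compile(r"\s+")


def remove_emoticons(text: str) -> str:
    """Strip emoticons and emoji, then collapse the leftover whitespace."""
    text = EMOTICON_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


if __name__ == "__main__":
    print(remove_emoticons("great game :) \U0001F600 so happy :-D"))
    print(remove_emoticons("see http://example.com"))
```

The sketch requires an emoticon to stand alone between whitespace. This stops the pattern from removing the `:/` in a URL such as `http://example.com`, but it also misses emoticons attached to a word, as in `fun:)`.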

Place, publisher, year, edition, pages
2016.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-187481
OAI: oai:DiVA.org:kth-187481
DiVA: diva2:930520
Available from: 2016-05-24. Created: 2016-05-24. Last updated: 2016-05-24. Bibliographically approved.

Open Access in DiVA

Full text: FULLTEXT01.pdf (application/pdf, 860 kB)
SHA-512 checksum: 53d7e90d28cdb44978f5f7b2e77bc646cbf4518a5b27dbc73a1e319b52a656e1a758975e15405f28098ccb2a32c1145a583f8113610257e2db230a68a3c4eb65
