Sentiment Analysis on Stack Overflow with Respect to Document Type and Programming Language
KTH, School of Electrical Engineering and Computer Science (EECS).
KTH, School of Electrical Engineering and Computer Science (EECS).
2018 (English)
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Alternative title
Attitydanalys på Stack Overflow med avseende på dokumenttyp och programmeringsspråk (Swedish)
Abstract [en]

The sentiment expressed in software engineering (SE) texts has been shown to affect both the productivity and the quality of collaborative work. This is one reason why sentiment analysis on SE texts has gained attention in research in recent years. A large and open resource of SE texts is Stack Overflow (SO). SO is the largest question and answer (Q&A) web site in the Stack Exchange network, and has been the subject of several sentiment analysis studies. It has lately been established that sentiment analyzers trained on social media perform poorly on SE texts, which could challenge the credibility of some of these studies. The Senti4SD sentiment polarity classifier was developed and trained on SO documents to address some of these issues. In this study, random samples of SO documents are drawn and then classified with Senti4SD. The classification into positive, negative and neutral sentiment is used to model the sentiment probability distributions of different document types on SO as a whole, as well as for the eight most popular programming languages. The results indicate that the sentiment of a document is correlated with both the document type and the associated programming language. Among the three sentiment classes, neutral sentiment dominates throughout all SO documents. However, the reliability of the results is reduced by concerns regarding the accuracy of Senti4SD, vaguely specified pre-processing steps and possibly varying classifier bias in different subdomains. In conclusion, further research on sentiment classifiers for SE is needed before any detailed comparative studies of this kind can yield reliable results.

Abstract [sv] (translated to English)

The sentiment in technical texts has been shown to affect both the productivity and the quality of the related work. This is one of the reasons why sentiment analysis of such texts has attracted attention in recent years. Stack Overflow (SO) is a question and answer web site for programming, and a large resource of technical texts. SO has been examined in several sentiment analysis studies. Recently, however, it has emerged that sentiment analysis tools trained on social media perform poorly on technical texts, which may challenge the credibility of several of these studies. Senti4SD is a sentiment analysis tool for classifying sentiment polarity that was trained specifically on documents from SO in order to better classify technical texts. In this study, simple random samples of SO documents are drawn and classified with Senti4SD. The classification of documents into "negative", "neutral" and "positive" sentiment is used to model the probability distribution of sentiment polarity for different document types on SO as a whole. Furthermore, the same modelling is carried out for documents related to each of the eight most popular programming languages on SO. The results suggest that the sentiment of a document is correlated with both document type and related programming language, and that neutral sentiment dominates. The reliability of the results is, however, reduced by uncertainties regarding the correctness of Senti4SD, vaguely specified pre-processing steps, and potentially varying systematic classification errors across subdomains. In conclusion, more research should be carried out on sentiment analysis tools for technical texts before this type of detailed comparative study can yield reliable results.

Place, publisher, year, edition, pages
2018.
Series
TRITA-EECS-EX ; 2018:196
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-229785
OAI: oai:DiVA.org:kth-229785
DiVA, id: diva2:1214448
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2018-06-26
Created: 2018-06-06
Last updated: 2018-06-26
Bibliographically approved

Open Access in DiVA

fulltext (896 kB), 24 downloads
File information
File name: FULLTEXT01.pdf
File size: 896 kB
Checksum (SHA-512):
3c0229a42c7c90b50b96e686bd3a11b69b0516e398279c6229103091817fd726663a5ce1d74e3b56ecf05c6057f0ed92fb44ad2fba9c22a1fdd0d78bfcbe973f
Type: fulltext
Mimetype: application/pdf
