Machine Learning Based Sentiment Classification of Text, with Application to Equity Research Reports
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2019 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Maskininlärningsbaserad sentimentklassificering av text, med tillämpning på aktieanalysrapporter (Swedish)
Abstract [en]

In this thesis, we analyse the sentiment in equity research reports written by analysts at Skandinaviska Enskilda Banken (SEB). We provide a description of established statistical and machine learning methods for classifying the sentiment in text documents as positive or negative. Specifically, a form of recurrent neural network known as long short-term memory (LSTM) is of interest. We investigate two different labelling regimes for generating training data from the reports. Benchmark classification accuracies are obtained using logistic regression models. Finally, two different word embedding models and bidirectional LSTMs of varying network size are implemented and compared to the benchmark results. We find that the logistic regression works well for one of the labelling approaches, and that the best LSTM models outperform it slightly.
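
As a rough illustration of the kind of logistic regression benchmark the abstract mentions, the following is a minimal sketch in plain Python: a bag-of-words logistic regression trained by stochastic gradient descent. Everything in it is an assumption made for illustration. The sentences and labels are invented toy data (not SEB reports), and the features, labelling regime, and hyperparameters used in the thesis are not given in this record.

```python
import math
from collections import Counter

# Invented toy sentences standing in for report text (NOT data from the thesis).
# Label 1 = positive sentiment, 0 = negative sentiment.
TRAIN = [
    ("strong growth and higher margins", 1),
    ("we raise our target price", 1),
    ("solid earnings beat expectations", 1),
    ("weak demand and lower margins", 0),
    ("we cut our target price", 0),
    ("earnings missed expectations", 0),
]

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def build_vocab(texts):
    """Map each distinct token to a feature index."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    """Sparse bag-of-words counts: {feature index: count}; unknown words are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in vocab)
    return {vocab[tok]: float(c) for tok, c in counts.items()}

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, w, b):
    """Estimated probability that the document is positive."""
    return sigmoid(b + sum(w[i] * v for i, v in x.items()))

def train(data, vocab, epochs=200, lr=0.5, l2=1e-3):
    """Fit weights by SGD on the log loss with a small L2 penalty on active features."""
    w = [0.0] * len(vocab)
    b = 0.0
    feats = [(featurize(t, vocab), y) for t, y in data]
    for _ in range(epochs):
        for x, y in feats:
            err = predict_proba(x, w, b) - y  # gradient of the log loss w.r.t. the logit
            for i, v in x.items():
                w[i] -= lr * (err * v + l2 * w[i])
            b -= lr * err
    return w, b

vocab = build_vocab(t for t, _ in TRAIN)
w, b = train(TRAIN, vocab)

def classify(text):
    """Return 1 (positive) or 0 (negative) using a 0.5 probability threshold."""
    return 1 if predict_proba(featurize(text, vocab), w, b) >= 0.5 else 0
```

Calling `classify("we raise our target price")` returns the predicted label for a sentence. In this sketch, words that appear in both classes (such as "target" or "margins") end up with weights near zero, while class-specific words (such as "raise" or "cut") carry the decision, which is the usual behaviour of a linear bag-of-words baseline.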

Abstract [sv] (translated from Swedish)

In this report, we analyse the sentiment, or attitude, in equity research reports written by analysts at Skandinaviska Enskilda Banken (SEB). Established statistical and machine learning methods for classifying the sentiment in text documents as either positive or negative are presented. We are particularly interested in a type of recurrent neural network known as long short-term memory (LSTM). We further investigate two different schemes for labelling the training data generated from the reports. Benchmarks for classification accuracy are obtained using logistic regression. Finally, two different word representation models and bidirectional LSTMs of varying network size are implemented and compared with the benchmarks. We find that logistic regression performs well for one of the labelling schemes, and that LSTM performs slightly better.

Place, publisher, year, edition, pages
2019.
Series
TRITA-SCI-GRU ; 2019:318
Keywords [en]
Equity research, NLP, sentiment classification, logistic regression, word2vec, LSTM
Keywords [sv]
Aktieanalys, NLP, sentimentklassificering, logistisk regression, word2vec, LSTM
National Category
Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-257506
OAI: oai:DiVA.org:kth-257506
DiVA, id: diva2:1348302
External cooperation
Skandinaviska Enskilda Banken
Subject / course
Mathematical Statistics
Educational program
Master of Science - Applied and Computational Mathematics
Supervisors
Examiners
Available from: 2019-09-06
Created: 2019-09-04
Last updated: 2019-09-06
Bibliographically approved

Open Access in DiVA

Full text (2442 kB), 23 downloads
File information
File name: FULLTEXT01.pdf
File size: 2442 kB
Checksum (SHA-512): 9bc18d26c6d40065ff337d0be735831278833ab531f8e3a479192eecf66a05a8b428df3d866d322f240d60b61b4befd755ffedf9b42a33fff70c83d5f18a7e77
Type: fulltext
Mimetype: application/pdf
