Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
LSTM vs Random Forest for Binary Classification of Insurance Related Text
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics.
2019 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesisAlternative title
LSTM vs Random Forest för binär klassificering av försäkringsrelaterad text (Swedish)
Abstract [en]

The field of natural language processing has received increased attention lately, but less focus is put on comparing models, which differ in complexity. This thesis compares Random Forest to LSTM, for the task of classifying a message as question or non-question. The comparison was done by training and optimizing the models on historic chat data from the Swedish insurance company Hedvig. Different types of word embedding were also tested, such as Word2vec and Bag of Words. The results demonstrated that LSTM achieved slightly higher scores than Random Forest, in terms of F1 and accuracy. The models’ performance were not significantly improved after optimization and it was also dependent on which corpus the models were trained on.

An investigation of how a chatbot would affect Hedvig’s adoption rate was also conducted, mainly by reviewing previous studies about chatbots’ effects on user experience. The potential effects on the innovation’s five attributes, relative advantage, compatibility, complexity, trialability and observability were analyzed to answer the problem statement. The results showed that the adoption rate of Hedvig could be positively affected, by improving the first two attributes. The effects a chatbot would have on complexity, trialability and observability were however suggested to be negligible, if not negative.

Abstract [sv]

Det vetenskapliga området språkteknologi har fått ökad uppmärksamhet den senaste tiden, men mindre fokus riktas på att jämföra modeller som skiljer sig i komplexitet. Den här kandidatuppsatsen jämför Random Forest med LSTM, genom att undersöka hur väl modellerna kan användas för att klassificera ett meddelande som fråga eller icke-fråga. Jämförelsen gjordes genom att träna och optimera modellerna på historisk chattdata från det svenska försäkringsbolaget Hedvig. Olika typer av word embedding, så som Word2vec och Bag of Words, testades också. Resultaten visade att LSTM uppnådde något högre F1 och accuracy än Random Forest. Modellernas prestanda förbättrades inte signifikant efter optimering och resultatet var också beroende av vilket korpus modellerna tränades på.

En undersökning av hur en chattbot skulle påverka Hedvigs adoption rate genomfördes också, huvudsakligen genom att granska tidigare studier om chattbotars effekt på användarupplevelsen. De potentiella effekterna på en innovations fem attribut, relativ fördel, kompatibilitet, komplexitet, prövbarhet and observerbarhet analyserades för att kunna svara på frågeställningen. Resultaten visade att Hedvigs adoption rate kan påverkas positivt, genom att förbättra de två första attributen. Effekterna en chattbot skulle ha på komplexitet, prövbarhet och observerbarhet ansågs dock vara försumbar, om inte negativ.

Place, publisher, year, edition, pages
2019.
Series
TRITA-SCI-GRU ; 2019:151
Keywords [en]
Random Forest, Classification, Natural Language Processing, Machine Learning, Neural Networks, Bag of Words, Bachelor Thesis, Diffusion of Innovation, Adoption Rate, User Experience
Keywords [sv]
Random Forest, Klassificering, Språkteknologi, Maskininlärning, Neurala nätverk, Bag of Words, Kandidatexamensarbete, Användarupplevelse
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:kth:diva-252748OAI: oai:DiVA.org:kth-252748DiVA, id: diva2:1334069
External cooperation
Hedvig
Subject / course
Applied Mathematics and Industrial Economics
Educational program
Master of Science in Engineering - Industrial Engineering and Management
Supervisors
Examiners
Available from: 2019-07-02 Created: 2019-07-02 Last updated: 2019-07-02Bibliographically approved

Open Access in DiVA

fulltext(1540 kB)44 downloads
File information
File name FULLTEXT01.pdfFile size 1540 kBChecksum SHA-512
d9c27311bc4d5371f13759ed304be9319ade9eac8d54cd46ce4bf715074e22c3540ee03c2069006dba93f355a4bc4ccf09ba9614defd1936353f934c74aed0e6
Type fulltextMimetype application/pdf

By organisation
Mathematical Statistics
Probability Theory and Statistics

Search outside of DiVA

GoogleGoogle Scholar
Total: 44 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 153 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf