Comparing performance of K-Means and DBSCAN on customer support queries
KTH, School of Electrical Engineering and Computer Science (EECS).
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

In customer support, many questions are repeats, and many do not need novel answers. To increase productivity in question answering within a business, there is clear room for automatic answering to take on some of the workload of customer support functions. We examine clustering corpora of older queries and texts as a method for identifying groups of semantically similar questions and texts, so that a system could match a new query to a specific cluster and return the automatic response connected to that cluster. The approach compares the performance of K-means and density-based clustering algorithms on three different corpora, using document embeddings encoded with BERT. We also discuss the digital transformation process, why companies are unsuccessful in implementing it, and the possible room for a new, more iterative model.
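The contrast between the two algorithm families named in the abstract can be sketched in a few lines. The following is a minimal illustration only, not the thesis implementation: the 2-D toy vectors stand in for BERT document embeddings, and the helper names (`kmeans`, `dbscan`) and parameter values (`k=2`, `eps=0.5`, `min_pts=3`) are assumptions chosen for the example.

```python
import math
import random

NOISE = -1  # label used by the DBSCAN sketch for points in no cluster


def kmeans(points, k, iters=50, seed=0):
    """Lloyd's algorithm: returns one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns a cluster index per point, NOISE for outliers."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # core point: keep growing the cluster
    return labels


# Hypothetical stand-ins for document embeddings: two tight groups and one outlier.
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
              (9.0, 1.0)]

km_labels = kmeans(embeddings, k=2)
db_labels = dbscan(embeddings, eps=0.5, min_pts=3)
print("K-means:", km_labels)
print("DBSCAN: ", db_labels)
```

In this sketch K-means assigns every vector, including the outlier, to one of the `k` clusters, while the density-based method can label the outlier as noise. That difference matters for the use case described above, since a query that fits no cluster would presumably not receive an automatic response.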

Abstract [sv]

In customer support, questions are often repeated, and many questions do not require unique answers. To increase the productivity of the customer support function's work in answering these questions, methods for automating part of that work are investigated. We examine different methods for cluster analysis, applied to existing corpora containing texts as well as questions. The cluster analysis is carried out to identify documents that are semantically similar, which an automatic question-answering system could use to answer a new question with an existing answer. A comparison is made of how K-means and a density-based method perform on three different corpora whose document representations were generated with BERT. The digital transformation process is also discussed, along with why companies fail at implementation and the possibilities for a new, more iterative model.

Place, publisher, year, edition, pages
2019, p. 12
Series
TRITA-EECS-EX ; 2019:402
Keywords [en]
Classification, Digital Transformation, Natural language processing, Short text clustering.
Keywords [sv]
Digital transformation, Classification, Cluster analysis, Language technology.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-260252
OAI: oai:DiVA.org:kth-260252
DiVA, id: diva2:1354954
Examiners
Available from: 2019-10-09. Created: 2019-09-26. Last updated: 2019-10-09. Bibliographically approved.

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1250 kB
Checksum (SHA-512): 0d1917ce979a0adcbf9ba5c493bfed568bf03693d0690d3e588693e46651cd59395e651fc335eb20158f11fca8f7015cb4c61e30bf47d239dbf3f6e90411a2d5
Type: fulltext
Mimetype: application/pdf
