Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Finding competitors using Latent Dirichlet Allocation
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2016 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesisAlternative title
Hitta konkurrenter med Latent Dirichlet Allocation (Swedish)
Abstract [en]

Identifying business competitors is of interest to many, but is becoming increasingly hard in an expanding global market. The aim of this report is to investigate whether Latent Dirichlet Allocation (LDA) can be used to identify and rank competitors based on distances between LDA representations of company descriptions. The performance of the LDA model was compared to that of bag-of-words and random ordering by evaluating then comparing them on a handful of common information retrieval metrics. Several different distance metrics were evaluated to determine which metric had best correspondence between representation distance and companies being competitors. Cosine similarity was found to outperform the other distance metrics. While both LDA and bag-of-words representations were found to be significantly better than random ordering, LDA was found to perform worse than bag-of-words. However, computation of distance metrics was considerably faster for LDA representations. The LDA representations capture features that are not helpful for identifying competitors, and it is suggested that LDA representations could be used together with some other data source or heuristic.

Abstract [sv]

Det finns ett intresse av att kunna identifiera affärskonkurrenter, men detta blir allt svårare på en ständigt växande och alltmer global marknad. Syftet med denna rapport är att undersöka om Latent Dirichlet Allocation (LDA) kan användas för att identifiera och rangordna konkurrenter. Detta genom att jämföra avstånden mellan LDA-representationerna av dessas företagsbeskrivningar. Effektiviteten av LDA i detta syfte jämfördes med den för bag-of-words samt slumpmässig ordning, detta med hjälp av några vanliga informationsteoretiska mått. Flera olika avståndsmått utvärderades för att bestämma vilken av dessa som bäst åstadkommer att konkurrerande företag hamnar nära varandra. I detta fall fanns Cosine similarity överträffa andra avståndsmått. Medan både LDA och bag-of-words konstaterades vara signifikant bättre än slumpmässig ordning så fanns att LDA presterar kvalitativt sämre än bag-of-words. Uträkning av avståndsmått var dock betydligt snabbare med LDA-representationer. Att omvandla webbinnehåll till LDA-representationer fångar dock vissa ospecifika likheter som inte nödvändigt beskriver konkurrenter. Det kan möjligen vara fördelaktigt att använda LDA-representationer ihop med någon ytterligare datakälla och/eller heuristik.

Place, publisher, year, edition, pages
2016.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-186386OAI: oai:DiVA.org:kth-186386DiVA: diva2:927147
External cooperation
5 Monkeys Agency AB
Supervisors
Examiners
Available from: 2016-05-12 Created: 2016-05-11 Last updated: 2016-05-12Bibliographically approved

Open Access in DiVA

fulltext(1397 kB)260 downloads
File information
File name FULLTEXT01.pdfFile size 1397 kBChecksum SHA-512
d2083a2aec3f5bfa0329da89523302b636664eef732050864774f7213b292ea83e9260d91e99e6f197a44a7f6014f38a4d023229a7862b04f460ef99b3a71baa
Type fulltextMimetype application/pdf

By organisation
School of Computer Science and Communication (CSC)
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 260 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 25259 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf