Efficient K-means clustering and the importance of seeding
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2013 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Data clustering is the process of grouping data elements based on some aspect of similarity between the elements in the group. Clustering has many applications, such as data compression, data mining, pattern recognition, and machine learning, and there are many different clustering methods. This paper examines the k-means method of clustering and how the choice of initial seeding affects the result. Lloyd’s algorithm is used as a baseline and is compared to an improved algorithm utilizing kd-trees. Two different methods of seeding are compared: random seeding and partial clustering seeding.
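
As an illustration of the baseline named in the abstract, the following is a minimal sketch of Lloyd's algorithm with random seeding. It is not taken from the thesis: the function names (`squared_distance`, `random_seeding`, `cost`, `lloyd`), the `max_iter` parameter, and the choice to keep an empty cluster's old center are all assumptions made for the example. The kd-tree variant and partial clustering seeding are not shown.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def random_seeding(points, k, rng=random):
    """Pick k distinct input points uniformly at random as initial centers."""
    return rng.sample(points, k)


def cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def lloyd(points, centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving.

    max_iter is a hypothetical safety cap, not a value from the thesis.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points.
        # Assumption: an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(coords) / len(members) for coords in zip(*members))
            if members else old
            for old, members in zip(centers, clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    seeds = random_seeding(points, 2, random.Random(0))
    centers = lloyd(points, seeds)
    print(centers, cost(points, centers))
```

Because each step of Lloyd's algorithm can only lower the objective, the final cost depends on which seeds the run starts from, which is why comparing seeding methods is meaningful.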

Abstract [sv]

Clustering of data means grouping data elements based on some type of similarity between the grouped elements. Clustering has many different areas of application, such as data compression, data mining, pattern recognition, and machine learning, and there are many different clustering methods. This thesis examines the k-means clustering method and how the choice of initial values for the method affects the result. Lloyd's algorithm is used as a starting point and is compared with an improved algorithm that uses kd-trees. Two different methods of choosing initial values are compared: random selection of initial values and partial clustering.

Place, publisher, year, edition, pages
2013.
Series
Kandidatexjobb CSC, K13021
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-134910
OAI: oai:DiVA.org:kth-134910
DiVA: diva2:668713
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2013-12-13. Created: 2013-12-02. Last updated: 2013-12-13. Bibliographically approved.

Open Access in DiVA

Efficient K-means clustering and the importance (430 kB), 216 downloads
File information
File name: FULLTEXT01.pdf
File size: 430 kB
Checksum (SHA-512): c8c6f628bdba666a0db3cd9a6d9a1e84eb8eda697535243b33de3aa55bca2dd08242e856f2dec652d393e1e4e30a51d3cd44f1827d7e3f880e8f4015063497b1
Type: fulltext
Mimetype: application/pdf

Other links

http://www.csc.kth.se/utbildning/kth/kurser/DD143X/dkand13/Group4Per/report/40-eliasson-rosen.pdf

Total: 216 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 209 hits