Clusters (k) Identification without Triangle Inequality: A newly modelled theory
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media.
2012 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Clustering(k) without Triangle Inequality : A newly modelled theory (English)
Abstract [en]

Cluster analysis organizes data that are sufficiently similar into meaningful and useful groups (clusters). For example, cluster analysis can be applied to find groups of similar genes and proteins, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become very important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining [1] [2].

Cluster analysis is one of the most widely used techniques in data mining. Depending on the complexity and amount of data in a system, a variety of cluster analysis algorithms can be used. K-means clustering is one of the most popular and widely used of the ten data mining algorithms discussed in [3]. Like other clustering algorithms, it is not a silver bullet. K-means clustering requires prior analysis and knowledge to determine the number of clusters and their centroids. Recent studies present a new approach to K-means clustering that does not require any prior knowledge to determine the number of clusters [4].
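
To illustrate why the number of clusters has to be known in advance, the following is a minimal sketch of standard K-means (Lloyd's algorithm) using only the Python standard library. It is not the procedure proposed in this thesis; the function name `kmeans`, its parameters, and the sample points are hypothetical and chosen only for illustration. Note that `k` is an argument the caller must supply.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's-algorithm K-means on a list of equal-length tuples.

    Illustrative sketch only: k must be chosen by the caller, because the
    algorithm itself has no way of inferring the number of clusters.
    """
    rng = random.Random(seed)
    # Pick k distinct input points as the initial centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for c, members in zip(centroids, clusters):
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(c)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two visibly separated groups of 2-D points.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, clusters = kmeans(sample, k=2)
```

With `k=2` the sketch separates the two groups in the sample data; calling it with a different `k` would still return exactly `k` clusters, whether or not that number matches the structure of the data.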

In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory and by analyzing the efficiency of the parameters and their relationships. The parameters in this theory include the selection of embryo size (m), significance level (α), distributions (d), and training set (n) in the identification of clusters (k).

Place, publisher, year, edition, pages
2012. , 165 p.
Keyword [en]
K-means clustering, modifying K-means clustering, nearest neighbor clustering, general clustering procedure, Kolmogorov-Smirnov test, parameters descriptions
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:uu:diva-183608
OAI: oai:DiVA.org:uu-183608
DiVA: diva2:563483
Subject / course
Information Systems
Educational program
Master programme in Information Systems
Presentation
2012-03-30, 12:40 (English)
Uppsok
Technology
Available from: 2012-11-21 Created: 2012-10-30 Last updated: 2012-11-21 Bibliographically approved

Open Access in DiVA

Registering Master Thesis and Publishing in DIVA (1513 kB), 365 downloads
File information
File name: FULLTEXT01.pdf
File size: 1513 kB
Checksum (SHA-512): 68b1eb2c4d8f82955db633a2295f9759f2f7aee20fe555f715becbf154a0b293a562c7e4c5cfe56f74481105635e2f952e2c17a765f08674f529a69080db737d
Type: fulltext
Mimetype: application/pdf
