Clusters (k) Identification without Triangle Inequality: A newly modelled theory
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Informatics and Media.
2012 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Clustering(k) without Triangle Inequality: A newly modelled theory (English)
Abstract [en]

Cluster analysis organizes data that are sufficiently similar into meaningful and useful groups (clusters). For example, cluster analysis can be applied to find groups of genes and proteins that are similar, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become very important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning and data mining [1] [2].


Cluster analysis is one of the most widely used techniques in data mining. Depending on the complexity and amount of data in a system, a variety of cluster analysis algorithms can be used. K-means clustering is one of the most popular and widely used among the top ten algorithms in data mining [3]. Like other clustering algorithms, it is not a silver bullet. K-means clustering requires prior analysis and knowledge before the number of clusters and their centroids can be determined. Recent studies present a new approach to K-means clustering that does not require any prior knowledge to determine the number of clusters [4].
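
To illustrate why standard K-means depends on prior knowledge, the following is a minimal sketch of the textbook (Lloyd-style) K-means loop, not the procedure proposed in the thesis. The function name `k_means`, its parameters and the sample points are hypothetical and chosen only for illustration. Note that the caller must supply the initial centroids, and with them the number of clusters k, before the algorithm can run.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_means(points, initial_centroids, max_iter=100):
    """Textbook Lloyd-style K-means (illustrative sketch only).

    The number of clusters k is fixed in advance by the caller:
    it equals len(initial_centroids).
    """
    centroids = [tuple(c) for c in initial_centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignment


# Hypothetical sample data: two visually separate groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, labels = k_means(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
```

In this sketch the result depends entirely on the k and the starting centroids that were passed in, which is the limitation that approaches for identifying k from the data aim to remove.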


In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory, and by analyzing the efficiency of the parameters and their relationships. The parameters in this theory that affect the identification of clusters (k) include the selection of embryo size (m), significance level (α), distributions (d), and training set (n).



Place, publisher, year, edition, pages
2012. 165 p.
Keyword [en]
K-means clustering, modifying K-means clustering, nearest neighbor clustering, general clustering procedure, Kolmogorov-Smirnov test, parameters descriptions
National Category
Computer and Information Science
URN: urn:nbn:se:uu:diva-183608
OAI: diva2:563483
Subject / course
Information Systems
Educational program
Master programme in Information Systems
2012-03-30, 12:40 (English)
Available from: 2012-11-21 Created: 2012-10-30 Last updated: 2012-11-21
Bibliographically approved

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1513 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
