Clusters (k) Identification without Triangle Inequality: A newly modelled theory
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Clustering(k) without Triangle Inequality : A newly modelled theory (English)
Cluster analysis organizes data into meaningful and useful groups (clusters) of sufficiently similar objects. For example, it can be used to find groups of similar genes and proteins, to retrieve information from the World Wide Web, and to identify locations that are prone to earthquakes. The study of clustering has therefore become very important in several fields, including psychology and other social sciences, biology, statistics, pattern recognition, information retrieval, machine learning, and data mining.
Cluster analysis is one of the most widely used techniques in data mining. Depending on the complexity and amount of data in a system, a variety of cluster analysis algorithms can be used. K-means clustering is one of the most popular and widely used among the top ten algorithms in data mining. Like other clustering algorithms, it is not a silver bullet: K-means clustering requires prior analysis and knowledge to determine the number of clusters and their centroids. Recent studies present a new approach to K-means clustering that does not require any prior knowledge to determine the number of clusters.
In this thesis, we propose a new clustering procedure to solve the central problem of identifying the number of clusters (k) by imitating the desired number of clusters with proper properties. The proposed algorithm is validated by investigating different characteristics of the analyzed data with the modified theory and by analyzing the efficiency of the parameters and their relationships. The parameters in this theory include the embryo size (m), significance level (α), distribution (d), and training set (n) used in the identification of clusters (k).
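The dependence of standard K-means on a pre-chosen k, noted in the abstract, can be illustrated with a minimal sketch. This is plain Lloyd-style K-means on one-dimensional data, not the procedure proposed in the thesis; the function name `kmeans_1d`, its parameters, and the sample data are hypothetical and chosen only for illustration.

```python
import random


def kmeans_1d(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means on 1-D data; the caller must choose k."""
    rng = random.Random(seed)
    # Initial centroids are drawn from the data (an illustrative choice).
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centroids)


# Made-up sample data with three visually separated groups.
data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7, 9.0, 9.2, 8.8]
centroids_k3 = kmeans_1d(data, k=3)
centroids_k2 = kmeans_1d(data, k=2)
```

The algorithm returns exactly as many centroids as it is asked for, whether or not that number matches the structure of the data, and the result may also depend on the initial centroids. This is the gap that a procedure for identifying k is meant to close.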
Place, publisher, year, edition, pages
2012. 165 p.
K-means clustering, modifying K-means clustering, nearest neighbor clustering, general clustering procedure, Kolmogorov-Smirnov test, parameter descriptions
Computer and Information Science
Identifiers
URN: urn:nbn:se:uu:diva-183608
OAI: oai:DiVA.org:uu-183608
DiVA: diva2:563483
Subject / course
Master programme in Information Systems
2012-03-30, 12:40 (English)