Viral Clustering: A Robust Method to Extract Structures in Heterogeneous Datasets
KTH, School of Electrical Engineering (EES), Automatic Control.
KTH, School of Electrical Engineering (EES), Automatic Control.
2016 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Cluster validation constitutes one of the most challenging problems in unsupervised cluster analysis. For example, identifying the true number of clusters present in a dataset has been investigated for decades, and is still puzzling researchers today. The difficulty stems from the wide variety of dataset characteristics. Some datasets exhibit a strong structure with a few well-separated and normally distributed clusters, but most often real-world datasets contain possibly many overlapping non-Gaussian clusters with heterogeneous variances and shapes. This calls for the design of robust clustering algorithms that can adapt to the structure of the data and, in particular, accurately estimate the true number of clusters. There have recently been interesting attempts to design such algorithms, e.g. based on involved non-parametric statistical inference techniques. In this paper, we develop Viral Clustering (VC), a simple algorithm that jointly estimates the number of clusters and outputs the clusters. The VC algorithm relies on two antagonistic and interacting components. The first component tends to regroup neighbouring samples together, while the second component tends to spread samples across various clusters. This spreading component is implemented using an analogy with the way viruses spread over networks. We present extensive numerical experiments illustrating the robustness of the VC algorithm, and its superiority compared to existing algorithms.
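
The abstract does not describe the VC algorithm in enough detail to reproduce it, so the sketch below is not the authors' method. It only illustrates the cluster-validation problem the abstract refers to, using a conventional baseline: run k-means (one of the record's keywords) for several candidate values of k and keep the value with the highest mean silhouette score. The function names (`make_blobs`, `farthest_first`, `kmeans`, `silhouette`, `estimate_k`), the farthest-first initialisation, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def make_blobs(centers=((0, 0), (10, 0), (5, 9)), n=30, std=0.5, seed=0):
    """Hypothetical test data: n Gaussian points around each centre."""
    rng = random.Random(seed)
    return [(rng.gauss(cx, std), rng.gauss(cy, std))
            for cx, cy in centers for _ in range(n)]


def farthest_first(points, k):
    """Deterministic initialisation: repeatedly pick the point farthest
    from the centres chosen so far (an assumption, not from the paper)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher means tighter, better-separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            continue  # singletons contribute 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_max=6):
    """Return the k in [2, k_max] with the highest mean silhouette score."""
    return max(range(2, k_max + 1),
               key=lambda k: silhouette(points, kmeans(points, k)))


if __name__ == "__main__":
    print(estimate_k(make_blobs()))
```

On well-separated, roughly Gaussian blobs such a baseline tends to recover the number of clusters. The abstract's point is that criteria of this kind become unreliable on overlapping, non-Gaussian clusters with heterogeneous variances and shapes, which is the setting VC targets.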

Place, publisher, year, edition, pages
AAAI Press, 2016. 1986-1992 p.
Keyword [en]
Clustering, K-means, Cluster Validation, Number of Clusters
National Category
Computer Science
Research subject
Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-181109
Scopus ID: 2-s2.0-85007251785
OAI: oai:DiVA.org:kth-181109
DiVA: diva2:898710
Conference
The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), February 12-17, 2016, Phoenix, USA
Note

QC 20160323

Available from: 2016-01-29. Created: 2016-01-29. Last updated: 2017-02-27. Bibliographically approved.

Author/editor
Petrosyan, Vahan; Proutiere, Alexandre