Knowing an Object by the Company It Keeps: A Domain-Agnostic Scheme for Similarity Discovery
RISE, Swedish ICT, SICS. Decisions, Networks and Analytics lab.
Number of Authors: 3
2015 (English). Conference paper (Refereed)
Abstract [en]

Appropriately defining and then efficiently calculating similarities from large data sets are often essential in data mining, both for building tractable representations and for gaining understanding of data and generating processes. Here we rely on the premise that given a set of objects and their correlations, each object is characterized by its context, i.e. its correlations to the other objects, and that the similarity between two objects therefore can be expressed in terms of the similarity between their respective contexts. Resting on this principle, we propose a data-driven and highly scalable approach for discovering similarities from large data sets by representing objects and their relations as a correlation graph that is transformed to a similarity graph. Together these graphs can express rich structural properties among objects. Specifically, we show that concepts - representations of abstract ideas and notions - are constituted by groups of similar objects that can be identified by clustering the objects in the similarity graph. These principles and methods are applicable in a wide range of domains, and will here be demonstrated for three distinct types of objects: codons, artists and words, where the numbers of objects and correlations range from small to very large.
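The pipeline described in the abstract (correlation graph, then similarity graph, then clustering into concepts) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or the clustering method, so the sketch assumes cosine similarity between context vectors, a fixed threshold for keeping similarity edges, and connected components as the clustering step. All function names, the threshold value, and the toy word data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def build_contexts(correlations):
    """Turn undirected weighted correlations {(a, b): w} into per-object contexts."""
    context = defaultdict(dict)
    for (a, b), w in correlations.items():
        context[a][b] = w
        context[b][a] = w
    return dict(context)


def cosine(u, v):
    """Cosine similarity between two sparse context vectors (assumed measure)."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(context, threshold=0.5):
    """Keep an edge between two objects when their contexts are similar enough."""
    objects = sorted(context)
    sim = {}
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            s = cosine(context[a], context[b])
            if s >= threshold:
                sim[(a, b)] = s
    return sim


def cluster(objects, sim):
    """Group objects into connected components of the similarity graph (union-find)."""
    parent = {o: o for o in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in sim:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for o in objects:
        groups[find(o)].append(o)
    return sorted(sorted(g) for g in groups.values())


# Toy correlation graph: nouns linked to the verbs they co-occur with.
correlations = {
    ("cat", "eats"): 1.0, ("cat", "sleeps"): 1.0,
    ("dog", "eats"): 1.0, ("dog", "sleeps"): 1.0,
    ("car", "drives"): 1.0, ("truck", "drives"): 1.0,
}
context = build_contexts(correlations)
sim = similarity_graph(context)
concepts = cluster(context.keys(), sim)
print(concepts)
```

In this toy example, "cat" and "dog" are never directly correlated with each other, yet they end up in the same group because they share the same context. The all-pairs loop is quadratic in the number of objects and is only suitable for small inputs; a scalable version would need to restrict comparisons to objects that share at least one neighbour.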

Place, publisher, year, edition, pages
2015, 18. 121-130 p.
National Category
Computer and Information Science
URN: urn:nbn:se:ri:diva-24463
OAI: diva2:1043547
IEEE International Conference on Data Mining (ICDM)
Available from: 2016-10-31 Created: 2016-10-31