Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Pattern Mining in Uncertain Tensors
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
Mönsterutvinning i obestämda tensorer (Swedish)
Abstract [en]

Data mining is the art of extracting information from data and creating useful knowledge. Itemset mining, or pattern mining, is an important subfield that consists in finding relevant patterns in datasets. We focus on two subproblems: high-utility itemset mining, where a numerical value called utility is associated to every tuple of the dataset, and patterns are extracted whose utilities sum up to a high-enough value; and skypattern mining, which is the extraction of patterns optimizing various measures, using the notion of Pareto domination. To tackle both of these challenges, we follow a generalistic approach based on measures’ piecewise (anti-)monotonicity. This mathematical property is used in multidupehack, an algorithm in which it is proved useful to prune the search space. Our contributions are implemented as extensions of multidupehack in order to benefit from its powerful pruning strategy. It also allows the extraction of patterns in a broad context: many existing algorithms only handle datasets that are 0/1-matrices, while this work deals with uncertain tensors, i.e. n-dimensional datasets in which the values are numerical numbers between 0 and 1. Experiments on real-life datasets show the efficiency of our approach, and its ability to extract semantically highly relevant patterns. Comparative studies on reference datasets prove its competitiveness with state-of-the-art algorithms: despite its greater versatility, it is often shown faster than its competitors.

Abstract [sv]

Datautvinning är konsten att skapa användbar kunskap genom att utvinna information ur data. Itemset-utvinning, eller mönsteranalys, är ett viktigt delområde inom datautvinning som består av att hitta mönster i dataset. Denna studie fokuserar på två sub-problem: high utility itemset-utvinning, mönsteranalys där ett numerisk värde befästs på nyttan (utility) hos varje tupel i datasetet för att sedan utvinna endast den data med hög nytta; och skypattern mining, som är en utvinning av mönster där man optimerar olika värden, med hjälp av Pareto-dominans. För att tackla dessa problem använder vi ett generalistiskt angreppssätt som baseras på mätvärdens delvisa (anti-)monotonicitet. Denna matematiska egenskap används i multidupehack, en algoritm där egenskapen visar sig nyttig för att beskära sökrymder. Vårt bidrag innefattar en fördjupad implementation av multidupehack för att dra nytta av dess kraftfulla beskärningsstrategi. Bidraget tillåter även utvinning av mönster i en större kontext: många samtida algoritmer kan bara hantera dataset bestående av 0/1-matriser, medan detta bidrag tacklar obestämda tensorer, n-dimensionella dataset i vilka värdena är numeriska tal mellan 0 och 1. Experiment på dataset ur vardagen visar effektiviteten av vårt angreppssätt och dess förmåga att utvinna mycket semantiskt relevanta mönster. Liknande studier på referensdataset visar dess prestationsförmåga gentemot state-of-the-art algoritmer: trots algoritmens allsidighet, visar den prov på att vara bättre än sina konkurrenter.

Place, publisher, year, edition, pages
2019. , p. 48
Series
TRITA-EECS-EX ; 2019:694
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-265629OAI: oai:DiVA.org:kth-265629DiVA, id: diva2:1380265
Supervisors
Examiners
Available from: 2020-01-29 Created: 2019-12-18 Last updated: 2020-01-29Bibliographically approved

Open Access in DiVA

fulltext(3818 kB)6 downloads
File information
File name FULLTEXT01.pdfFile size 3818 kBChecksum SHA-512
5d540720e04c1ae606de0d5c265e615b6ecc574657c960c5315adf5c96f5cae172f67ef7e921cf67c658003927517fa353f102d33b2942adf872d37173bbb2b8
Type fulltextMimetype application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer and Information Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 6 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 28 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf