Pre-Processing Structured Data for Standard Machine Learning Algorithms by Supervised Graph Propositionalization - a Case Study with Medicinal Chemistry Datasets
Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
Stockholms universitet, Samhällsvetenskapliga fakulteten, Institutionen för data- och systemvetenskap.
2010 (English). In: Ninth International Conference on Machine Learning and Applications (ICMLA), 2010: Proceedings, IEEE Computer Society, 2010, pp. 828-833. Conference paper, published paper (peer-reviewed)
Abstract [en]

Graph propositionalization methods can be used to transform structured and relational data into fixed-length feature vectors, enabling standard machine learning algorithms to be used for generating predictive models. It is, however, not clear how well different propositionalization methods work in conjunction with different standard machine learning algorithms. Three graph propositionalization methods are investigated in conjunction with three standard learning algorithms: random forests, support vector machines, and nearest neighbor classifiers. An experiment on 21 datasets from the domain of medicinal chemistry shows that the choice of propositionalization method may have a significant impact on the resulting accuracy. The empirical investigation further shows that, for datasets from this domain, the maximal frequent itemset approach to propositionalization results in the most accurate classifiers, significantly outperforming the two other graph propositionalization methods considered in this study, SUBDUE and MOSS, for all three learning methods.
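The idea of propositionalization by maximal frequent itemsets can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each graph has already been encoded as a set of items (the bond-style labels such as "C-C" are hypothetical), uses a brute-force miner that is only practical for tiny inputs, and all function names are invented for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force search for itemsets contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t)
            if support >= min_support:
                frequent.append(cset)
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size either.
            break
    return frequent


def maximal_only(itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]


def propositionalize(transactions, features):
    """Map each transaction to a fixed-length binary vector, one entry per feature itemset."""
    return [[1 if f <= t else 0 for f in features] for t in transactions]


# Hypothetical item encoding of four small graphs (assumption, not from the paper).
graphs = [
    {"C-C", "C-O", "C-N"},
    {"C-C", "C-O"},
    {"C-C", "C-N"},
    {"C-O"},
]

features = maximal_only(frequent_itemsets(graphs, min_support=2))
vectors = propositionalize(graphs, features)
print(vectors)  # [[1, 1], [0, 1], [1, 0], [0, 0]]
```

The resulting rows have the same length for every graph, so they could be passed to any standard learner, such as the random forests, support vector machines, or nearest neighbor classifiers named in the abstract.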

Place, publisher, year, edition, pages
IEEE Computer Society, 2010. pp. 828-833
National subject category
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-51976
DOI: 10.1109/ICMLA.2010.128
ISBN: 978-1-4244-9211-4 (print)
OAI: oai:DiVA.org:su-51976
DiVA: diva2:386457
Conference
Ninth International Conference on Machine Learning and Applications (ICMLA), 12-14 December 2010, Washington D.C., USA
Available from: 2011-01-12 Created: 2011-01-12 Last updated: 2014-02-26 Bibliographically approved
In thesis
1. Learning predictive models from graph data using pattern mining
2014 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

Learning from graphs has become a popular research area due to the ubiquity of graph data representing web pages, molecules, social networks, protein interaction networks, etc. However, standard graph learning approaches are often challenged by the computational cost of the learning process, which stems from the richness of the representation. Attempts to improve their efficiency often risk degrading the performance of the predictive models, creating trade-offs between the efficiency and effectiveness of learning. This situation is analogous to an optimization problem with two objectives, efficiency and effectiveness, in which a solution that improves one objective without worsening the other is better and is called a Pareto improvement. This thesis investigates how to improve the efficiency and effectiveness of learning from graph data using pattern mining methods. Two objectives are set: the first concerns how to improve the efficiency of pattern mining without reducing the predictive performance of the learned models, and the second concerns how to improve predictive performance without increasing the complexity of pattern mining. The research method mainly follows a design science approach, including the development and evaluation of artifacts. The contributions of this thesis include a data representation language that can be characterized as a form in between sequences and itemsets, where the graph information is embedded within items. Several studies, each of which looks for Pareto improvements in efficiency and effectiveness, are conducted using sets of small graphs. In summary, some of the proposed methods, namely maximal frequent itemset mining and constraint-based itemset mining, dramatically increase the efficiency of learning without decreasing the predictive performance of the resulting models. It is also shown that additional background knowledge can be used to enhance the performance of the predictive models without increasing the complexity of the graphs.
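The notion of a Pareto improvement used in the abstract can be made concrete with a small sketch. The function name, the (cost, accuracy) tuple layout, and the numbers below are assumptions chosen for illustration; they are not results from the thesis.

```python
def is_pareto_improvement(baseline, candidate):
    """Return True if candidate is no worse on both objectives and strictly better on one.

    Each argument is a (cost, accuracy) pair: lower cost means higher efficiency,
    higher accuracy means higher effectiveness.
    """
    b_cost, b_acc = baseline
    c_cost, c_acc = candidate
    no_worse = c_cost <= b_cost and c_acc >= b_acc
    strictly_better = c_cost < b_cost or c_acc > b_acc
    return no_worse and strictly_better


# Hypothetical measurements: (mining time in seconds, classification accuracy).
baseline = (10.0, 0.80)
print(is_pareto_improvement(baseline, (4.0, 0.80)))  # True: faster, equally accurate
print(is_pareto_improvement(baseline, (4.0, 0.75)))  # False: faster but less accurate
```

Under this definition, a method that is faster but less accurate is a trade-off rather than a Pareto improvement, which is the distinction the two stated objectives rely on.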

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2014. 118 pp.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 14-003
Keywords
Machine Learning, Graph Data, Pattern Mining, Classification, Regression, Predictive Models
National subject category
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-100713
ISBN: 978-91-7447-837-2
Public defence
2014-03-25, room B, Forum, Isafjordsgatan 39, Kista, 13:00 (English)
Opponent
Supervisor
Available from: 2014-03-03 Created: 2014-02-11 Last updated: 2014-03-04 Bibliographically approved

Open Access in DiVA

fulltext (236 kB), 101 downloads
File information
File: FULLTEXT01.pdf
File size: 236 kB
Checksum (SHA-512): d4827fb808746c469d99035b0fb040eb3a8b5553bd3bf8b0bd33341ed0f9a5f2dfa616684aac3a218f46485ae467f18b998d8e48cb439ac382a98c7b954153d9
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text

By author/editor
Karunaratne, Thashmee; Boström, Henrik
Total: 101 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 71 hits