Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data
KTH, School of Computer Science and Communication (CSC).
2016 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesisAlternative title
Beräkning av Random Forests variable importance measures (VIM) på kategoriska och numeriska prediktorvariabler (Swedish)
Abstract [en]

The Random Forest model is commonly used as a predictor function and the model have been proven useful in a variety of applications. Their popularity stems from the combination of providing high prediction accuracy, their ability to model high dimensional complex data, and their applicability under predictor correlations. This report investigates the random forest variable importance measure (VIM) as a means to find a ranking of important variables. The robustness of the VIM under imputation of categorical noise, and the capability to differentiate informative predictors from non-informative variables is investigated. The selection of variables may improve robustness of the predictor, improve the prediction accuracy, reduce computational time, and may serve as a exploratory data analysis tool. In addition the partial dependency plot obtained from the random forest model is examined as a means to find underlying relations in a non-linear simulation study.

Abstract [sv]

Random Forest (RF) är en populär prediktormodell som visat goda resultat vid en stor uppsättning applikationsstudier. Modellen ger hög prediktionsprecision, har förmåga att modellera komplex högdimensionell data och modellen har vidare visat goda resultat vid interkorrelerade prediktorvariabler. Detta projekt undersöker ett mått, variabel importance measure (VIM) erhållna från RF modellen, för att beräkna graden av association mellan prediktorvariabler och målvariabeln. Projektet undersöker känsligheten hos VIM vid kvalitativt prediktorbrus och undersöker VIMs förmåga att differentiera prediktiva variabler från variabler som endast, med aveende på målvariableln, beskriver brus. Att differentiera prediktiva variabler vid övervakad inlärning kan användas till att öka robustheten hos klassificerare, öka prediktionsprecisionen, reducera data dimensionalitet och VIM kan användas som ett verktyg för att utforska relationer mellan prediktorvariabler och målvariablel.

Place, publisher, year, edition, pages
2016.
Keyword [en]
machine learning, ml, variable importance, vim, random forests, rf, feature selection, variable selection, exploratory data analysis, eda
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-185496OAI: oai:DiVA.org:kth-185496DiVA: diva2:921542
External cooperation
Handelsbanken
Educational program
Master of Science in Engineering -Engineering Physics
Supervisors
Examiners
Available from: 2016-05-11 Created: 2016-04-20 Last updated: 2016-05-11Bibliographically approved

Open Access in DiVA

fulltext(1534 kB)930 downloads
File information
File name FULLTEXT01.pdfFile size 1534 kBChecksum SHA-512
14864a7d7178ebc369f9778a224545784d3b37911b8548c3937d36cadbe1c95dd6008a1ce9c724717122a6549b489126678da2bbd90b22033e9332f9b1c955dd
Type fulltextMimetype application/pdf

By organisation
School of Computer Science and Communication (CSC)
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 930 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 580 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf