Computing Random Forests Variable Importance Measures (VIM) on Mixed Numerical and Categorical Data
KTH, School of Computer Science and Communication (CSC).
2016 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Beräkning av Random Forests variable importance measures (VIM) på kategoriska och numeriska prediktorvariabler (Swedish)
Abstract [en]

The Random Forest model is commonly used as a predictor function and has proven useful in a variety of applications. Its popularity stems from its high prediction accuracy, its ability to model high-dimensional, complex data, and its applicability under predictor correlations. This report investigates the random forest variable importance measure (VIM) as a means of ranking variables by importance. It examines the robustness of the VIM under imputation of categorical noise and its capability to differentiate informative predictors from non-informative variables. Variable selection may improve the robustness of the predictor, improve prediction accuracy, and reduce computational time, and it may serve as an exploratory data analysis tool. In addition, the partial dependency plot obtained from the random forest model is examined as a means of finding underlying relations in a non-linear simulation study.

Abstract [sv, translated to English]

Random Forest (RF) is a popular predictor model that has shown good results in a wide range of application studies. The model gives high prediction accuracy, is able to model complex high-dimensional data, and has also shown good results with intercorrelated predictor variables. This project investigates a measure obtained from the RF model, the variable importance measure (VIM), for computing the degree of association between predictor variables and the target variable. The project examines the sensitivity of the VIM to qualitative predictor noise and its ability to differentiate predictive variables from variables that, with respect to the target variable, describe only noise. Differentiating predictive variables in supervised learning can be used to increase the robustness of classifiers, increase prediction accuracy, and reduce data dimensionality, and the VIM can be used as a tool for exploring relations between predictor variables and the target variable.

Keyword [en]
machine learning, ml, variable importance, vim, random forests, rf, feature selection, variable selection, exploratory data analysis, eda
National Category
Computer Science
URN: urn:nbn:se:kth:diva-185496
OAI: diva2:921542
Educational program
Master of Science in Engineering - Engineering Physics
Available from: 2016-05-11; Created: 2016-04-20; Last updated: 2016-05-11; Bibliographically approved

Open Access in DiVA

fulltext (1534 kB), 82 downloads
File information
File name: FULLTEXT01.pdf
File size: 1534 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
