Active learning via Transduction in Regression Forests
Blekinge Institute of Technology, Faculty of Computing, Department of Creative Technologies.
Blekinge Institute of Technology, Faculty of Computing, Department of Creative Technologies.
2015 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Context. The amount of training data required to build accurate models is a common problem in machine learning. Active learning is a technique that tries to reduce the amount of required training data by making active choices of which training data holds the greatest value.

Objectives. This thesis aims to design, implement and evaluate the Random Forests algorithm combined with active learning that is suitable for predictive tasks with real-value data outcomes where the amount of training data is small. Machine learning algorithms traditionally require large amounts of training data to create a general model, and training data is in many cases sparse and expensive or difficult to create.

Methods. The research methods used for this thesis are implementation and scientific experiment. An approach to active learning was implemented based on previous work for classification-type problems. The approach uses the Mahalanobis distance to perform active learning via transduction. Evaluation was done using several data sets, where the decrease in prediction error was measured over several iterations. The results of the evaluation were then analyzed using nonparametric statistical testing.

Results. The statistical analysis of the evaluation results failed to detect a difference between our approach and a non-active learning approach, even though the proposed algorithm showed irregular performance. The evaluation of our tree-based traversal method and the evaluation of the Mahalanobis distance for transduction both showed that these methods performed better than complete graph traversal and Euclidean distance, respectively.

Conclusions. We conclude that the proposed solution did not decrease the amount of required training data at a significant level. However, the approach has potential, and future work could lead to a working active learning solution. Further work is needed on key areas of the implementation, such as the choice of instances for active learning through transduction uncertainty, as well as the choice of method for going from transduction model to induction model.
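The abstract contrasts the Mahalanobis distance with the Euclidean distance. The following is a minimal sketch of that difference only, not the thesis implementation: the function names (`solve`, `mahalanobis`, `euclidean`) and the toy covariance matrix are assumptions made for illustration. The Mahalanobis distance rescales each direction by the covariance of the data, so features on very different scales contribute comparably.

```python
import math

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unmodified.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z

def mahalanobis(x, y, cov):
    """sqrt((x - y)^T cov^{-1} (x - y)), without forming the inverse."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    z = solve(cov, diff)
    return math.sqrt(sum(d * zi for d, zi in zip(diff, z)))

def euclidean(x, y):
    """Plain Euclidean distance, which ignores feature scale."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Hypothetical toy covariance: the second feature has 100x the variance.
cov = [[1.0, 0.0], [0.0, 100.0]]
origin = [0.0, 0.0]
near = [1.0, 0.0]   # one standard deviation along feature 1
far = [0.0, 10.0]   # one standard deviation along feature 2

d_near = mahalanobis(origin, near, cov)
d_far = mahalanobis(origin, far, cov)
e_near = euclidean(origin, near)
e_far = euclidean(origin, far)
```

In this toy setting both points lie one standard deviation from the origin, so their Mahalanobis distances agree, while their Euclidean distances differ by a factor of ten.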

Place, publisher, year, edition, pages
2015. 36 p.
Keyword [en]
Active learning, Regression, Random Forests, Semi-supervised learning, Transduction
National Category
Other Engineering and Technologies not elsewhere specified
URN: urn:nbn:se:bth-10935
OAI: diva2:867838
Subject / course
DV2524 Degree Project in Computer Science for Engineers
Educational program
PAACI Master of Science in Game and Software Engineering
Available from: 2015-11-13. Created: 2015-11-06. Last updated: 2016-02-22. Bibliographically approved.

Open Access in DiVA

fulltext (559 kB), 82 downloads
File information
File name: FULLTEXT02.pdf
File size: 559 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Hansson, Kim; Hörlin, Erik
By organisation
Department of Creative Technologies
Other Engineering and Technologies not elsewhere specified

Total: 82 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 159 hits