Machine learning algorithms in a distributed context
Linköping University, Department of Computer and Information Science.
2018 (English)
Independent thesis, Basic level (degree of Bachelor), 10,5 credits / 16 HE credits
Student thesis
Alternative title
Maskininlärningalgoritmer i en distribuerad kontext (Swedish)
Abstract [en]

Interest in distributed approaches to machine learning has increased significantly in recent years because the datasets used to train machine learning models keep growing. In this thesis we describe three popular machine learning algorithms, decision trees, Naive Bayes and support vector machines (SVM), and present existing ways of distributing them. We also perform experiments with decision trees distributed with bagging, boosting and hard data partitioning, and evaluate them in terms of accuracy, F1 score and execution time.

Our experiments show that the execution time of bagging and boosting increases linearly with the number of workers, and that boosting performs significantly better than bagging and hard data partitioning in terms of F1 score. The hard data partitioning algorithm works well for large datasets, where the execution time decreases as the number of workers increases without any significant loss in accuracy or F1 score. On small datasets it performs poorly: execution time increases and accuracy and F1 score drop as the number of workers increases.
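
To make the difference between hard data partitioning and bagging concrete, here is a minimal illustrative sketch. It is not the thesis's implementation, which this record does not show. It assumes a one-split decision tree (a stump) as the base learner, majority voting to combine the workers' models, and a synthetic one-feature dataset. All function names (`train_stump`, `hard_partition`, `bootstrap_samples`, `majority_vote`, `accuracy`) are hypothetical, and the "workers" are simulated sequentially in one process.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-split decision tree (stump) on rows of (features, label)."""
    labels = [y for _, y in rows]
    majority = Counter(labels).most_common(1)[0][0]
    best = None  # (errors, feature index, threshold, left label, right label)
    for f in range(len(rows[0][0])):
        for t in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= t]
            right = [y for x, y in rows if x[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        return lambda x: majority
    _, f, t, l_lab, r_lab = best
    return lambda x: l_lab if x[f] <= t else r_lab


def hard_partition(rows, n_workers):
    """Hard data partitioning: disjoint shards, each row used exactly once."""
    return [rows[i::n_workers] for i in range(n_workers)]


def bootstrap_samples(rows, n_workers, rng):
    """Bagging: each worker draws len(rows) rows with replacement."""
    return [[rng.choice(rows) for _ in rows] for _ in range(n_workers)]


def majority_vote(models, x):
    """Combine the workers' predictions by taking the most common label."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


def accuracy(models, rows):
    return sum(majority_vote(models, x) == y for x, y in rows) / len(rows)


# Synthetic data (an assumption for illustration): label is 1 when x > 0.5.
rng = random.Random(0)
data = [((v,), int(v > 0.5)) for v in (rng.random() for _ in range(300))]
train, test = data[:200], data[200:]

partition_models = [train_stump(s) for s in hard_partition(train, 4)]
bagging_models = [train_stump(s) for s in bootstrap_samples(train, 4, rng)]

print("hard partitioning accuracy:", accuracy(partition_models, test))
print("bagging accuracy:", accuracy(bagging_models, test))
```

In this sketch each hard-partitioning worker trains on a quarter of the data, while each bagging worker trains on a full-size bootstrap sample. That is one plausible reading of why partitioning can reduce per-worker training time on large datasets but lose accuracy when the shards become too small.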

Place, publisher, year, edition, pages
2018. p. 35
Keywords [en]
Machine learning, ensemble algorithms, hard data partitioning, decision trees
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:liu:diva-148920
ISRN: LIU-IDA/LITH-EX-G--18/060--SE
OAI: oai:DiVA.org:liu-148920
DiVA, id: diva2:1222641
Subject / course
Information Technology
Supervisors
Examiners
Available from: 2018-06-29 Created: 2018-06-21 Last updated: 2018-06-29
Bibliographically approved

Open Access in DiVA

fulltext (665 kB), 43 downloads
File information
File name: FULLTEXT01.pdf
File size: 665 kB
Checksum (SHA-512): 0e70cf626e19be8e01e0a0c651171a23807a43d5b8cfe206d2a543d317b203f89dc966fad28317ff3aaa7db32b58db43a107422e1dc6369b326d3424c6bf3c15
Type: fulltext
Mimetype: application/pdf

Authors
Johansson, Samuel
Wojtulewicz, Karol
Total: 43 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
