Accelerated Deep Learning using Intel Xeon Phi
Linnaeus University, Faculty of Technology, Department of Computer Science.
2015 (English). Independent thesis, Advanced level (degree of Master (One Year)), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Deep learning, a sub-topic of machine learning inspired by biology, has received wide attention in industry and the research community recently. State-of-the-art applications in computer vision and speech recognition (among other areas) are built using deep learning algorithms. In contrast to traditional algorithms, where the developer fully instructs the application what to do, deep learning algorithms instead learn from experience when performing a task. However, for the algorithm to learn, it requires training, which is computationally expensive. High Performance Computing can help ease this burden through parallelization, thereby reducing the training time; this is essential to fully utilize the algorithms in practice. Numerous works targeting GPUs have investigated ways to speed up the training, but less attention has been paid to the Intel Xeon Phi coprocessor. In this thesis we present a parallelized implementation of a Convolutional Neural Network (CNN), a deep learning architecture, and our proposed parallelization scheme, CHAOS. Additionally, a theoretical analysis and a performance model discuss the algorithm in detail and allow for predictions of performance if more threads become available in the future. The algorithm is evaluated on an Intel Xeon Phi 7120p, a Xeon E5-2695v2 at 2.4 GHz, and a Core i5 661 at 3.33 GHz, using various architectures and thread counts on the MNIST dataset. Findings show a 103.5x, 99.9x, and 100.4x speed-up for the large, medium, and small architecture, respectively, for 244 threads compared to 1 thread on the coprocessor. Moreover, we observe a 10.9x - 14.1x (large to small) speed-up compared to the sequential version running on the Xeon E5. We managed to decrease the training time from 7 days on the Core i5 and 31 hours on the Xeon E5 to 3 hours on the Intel Xeon Phi when training our large network for 15 epochs.
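The abstract does not spell out the CHAOS scheme itself, but it describes many threads jointly training one network on a shared-memory coprocessor. The following is a minimal sketch under that assumption: thread-parallel SGD over a shared weight vector, illustrated with a toy logistic-regression model rather than the thesis's CNN. The function names, the OpenMP scheduling, and the lock-free update style are illustrative assumptions, not taken from the thesis.

```cpp
// Sketch of thread-parallel SGD on shared weights (NOT the thesis's CHAOS code).
// Compile with OpenMP enabled, e.g. -fopenmp.
#include <cmath>
#include <cstddef>
#include <vector>

// One training example; for MNIST, x would hold the pixel values of one image.
struct Sample {
    std::vector<float> x;  // input features
    float y;               // target in {0, 1}
};

// One SGD step on the shared weights, applied without any locking.
static void sgd_step(std::vector<float>& w, const Sample& s, float lr) {
    float z = 0.0f;
    for (std::size_t j = 0; j < w.size(); ++j) z += w[j] * s.x[j];
    const float p = 1.0f / (1.0f + std::exp(-z));  // sigmoid prediction
    const float g = p - s.y;                       // logistic-loss gradient factor
    for (std::size_t j = 0; j < w.size(); ++j) w[j] -= lr * g * s.x[j];
}

// One epoch: samples are divided statically among the available threads,
// and every thread updates the single shared weight vector directly.
void train_epoch(std::vector<float>& w, const std::vector<Sample>& data, float lr) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sgd_step(w, data[i], lr);
    }
}
```

The point of the lock-free style is that the inner loop contains no synchronization, so adding threads (up to the 244 used on the coprocessor) does not serialize on a shared lock; whether the resulting asynchronous updates preserve accuracy is exactly the kind of question the thesis's evaluation and performance model address.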

Place, publisher, year, edition, pages
2015, 84 p.
Keyword [en]
Machine Learning, Deep Learning, Supervised Deep Learning, Intel Xeon Phi, Convolutional Neural Network, CNN, High Performance Computing, CHAOS, parallel computing, coprocessor, MIC, speed up, performance model, evaluation
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:lnu:diva-45491
OAI: oai:DiVA.org:lnu-45491
DiVA: diva2:841963
Subject / course
Computer Science
Educational program
Software Technology Programme, Master Programme, 60 credits
Available from: 2015-08-10. Created: 2015-07-15. Last updated: 2018-01-11. Bibliographically approved.

Open Access in DiVA

fulltext (4366 kB), 1131 downloads
File information
File name: FULLTEXT01.pdf
File size: 4366 kB
Checksum (SHA-512): 077e35d1e9c6eaa4c159a8b2d0b536c91e68f01aab68738207952ed2c90cec2acbd229cacba2352ec6eef873b4dff5d7f5d4ae5d48ace3be9dbcc2988f826fcb
Type: fulltext
Mimetype: application/pdf

Total: 1131 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1646 hits