Real-time hand pose estimation on a smart-phone using Deep Learning
KTH, School of Electrical Engineering and Computer Science (EECS).
2019 (English). Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Hand pose estimation is a computer vision challenge that consists of detecting the coordinates of a hand’s key points in an image. This research investigates several deep learning-based solutions to determine whether it is possible to improve on current state-of-the-art detectors for smartphone applications. Several models are tested and compared based on accuracy, processing speed and memory size. A final network is selected, described in detail and compared with the state of the art. The proposed solution combines the Differentiable Spatial to Numerical Transform (DSNT) layer, which predicts numerical coordinates, with the Fire module introduced in the SqueezeNet architecture. The resulting deep neural network contains around 1 million parameters and outperforms the best currently documented model on all three of these metrics. A qualitative analysis is also performed to examine the predictions of the final solution on test images.
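The DSNT layer named above turns a predicted heatmap into numerical coordinates by normalising it into a probability map and taking the expected x and y position under that map, so the output stays differentiable with respect to the heatmap. Below is a minimal pure-Python sketch of that forward computation, assuming softmax normalisation and pixel-centre coordinates scaled to [-1, 1]. The function names `softmax_2d`, `dsnt` and `to_pixels` are hypothetical, and this is not the thesis implementation, which would run inside a deep learning framework on batched tensors.

```python
import math


def softmax_2d(logits):
    """Normalise a 2-D list of scores into a probability map that sums to 1."""
    m = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - m) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def dsnt(heatmap):
    """Expected (x, y) under a normalised heatmap (sketch of the DSNT forward pass).

    Pixel centres are mapped to [-1, 1]: x_j = (2j - (n + 1)) / n for
    column j = 1..n, and the same rule is used for rows.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    x = y = 0.0
    for i, row in enumerate(heatmap, start=1):
        yi = (2 * i - (rows + 1)) / rows
        for j, p in enumerate(row, start=1):
            xj = (2 * j - (cols + 1)) / cols
            x += p * xj  # probability-weighted column position
            y += p * yi  # probability-weighted row position
    return x, y


def to_pixels(coord, size):
    """Map a normalised coordinate back to a 0-indexed pixel position."""
    return ((coord + 1) * size - 1) / 2


# Usage: a 5 x 5 map of logits with one strong response at row 1, column 3.
logits = [[0.0] * 5 for _ in range(5)]
logits[1][3] = 10.0
x, y = dsnt(softmax_2d(logits))
print(round(to_pixels(x, 5), 2), round(to_pixels(y, 5), 2))  # prints 3.0 1.0
```

Because the result is an expectation rather than an argmax, shifting probability mass moves the predicted coordinate smoothly, which is what allows a coordinate loss to be trained end to end.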

Abstract [sv] (English translation)

Determining the orientation of a hand is an image analysis challenge that consists of detecting the coordinates of a hand's key points in an image. This study investigates a number of methods based on deep learning to determine whether existing detectors for smartphone applications can be improved. Several models are tested and compared based on accuracy, computation speed and memory requirements. A final network is selected, analysed and compared with current state-of-the-art techniques. The proposed solution combines a so-called Differentiable Spatial to Numerical Transform layer, which predicts numerical coordinates, with a so-called Fire module, previously presented as part of the SqueezeNet architecture. This deep neural network contains about one million parameters and can outperform the current best-documented model in all of the respects described above. A qualitative analysis is also carried out to examine the final solution's estimates on test images.
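The Fire module mentioned in the abstract keeps the parameter count low by first "squeezing" the channel dimension with 1x1 convolutions and then "expanding" with a mix of 1x1 and 3x3 filters whose outputs are concatenated. One way to see the saving is to count parameters. The sketch below is an illustrative calculation assuming convolutions with biases and the fire2 configuration of the original SqueezeNet (96 input channels, 16 squeeze filters, 64 + 64 expand filters); the helper names are hypothetical, and the layer sizes of the thesis network itself are not stated in this record.

```python
def conv_params(in_ch, out_ch, k):
    """Weights plus biases of a k x k convolution from in_ch to out_ch channels."""
    return k * k * in_ch * out_ch + out_ch


def fire_params(in_ch, s1x1, e1x1, e3x3):
    """Parameters of a Fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expands."""
    squeeze = conv_params(in_ch, s1x1, 1)
    expand = conv_params(s1x1, e1x1, 1) + conv_params(s1x1, e3x3, 3)
    return squeeze + expand


# fire2-style module: 96 -> 16 (squeeze) -> 64 + 64 = 128 output channels
print(fire_params(96, 16, 64, 64))  # 11920
# A plain 3x3 convolution with the same input and output channel counts
print(conv_params(96, 128, 3))      # 110720
```

Both layers map 96 channels to 128, but the Fire module needs roughly a ninth of the parameters, which illustrates how a network built from such modules can stay around 1 million parameters.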

Place, publisher, year, edition, pages
2019, p. 57
Series
TRITA-EECS-EX ; 2019:518
Keywords [en]
Hand joints, Deep Learning, Convolutional neural networks, Artificial intelligence, Embedded devices.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-262686
OAI: oai:DiVA.org:kth-262686
DiVA, id: diva2:1361974
External cooperation
Manomotion AB
Supervisors
Examiners
Available from: 2019-11-11
Created: 2019-10-17
Last updated: 2019-11-11
Bibliographically approved

Open Access in DiVA

File information
File name: FULLTEXT01.pdf
File size: 1217 kB
Checksum (SHA-512): 8276d1e3359983ebcfe09b200c89e2c8794b6a06332c411675042e0b7a57f0a2b1fdd97065ef7c6bc98f2fd3bfe61979723359fb8391b02528764ad4ba808dc1
Type: fulltext
MIME type: application/pdf
