3D Position Estimation using Deep Learning
KTH, School of Electrical Engineering and Computer Science (EECS).
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Estimating the 3D position of an object is one of the most important topics in computer vision. Because the final aim is to create automated solutions that can detect and localize objects in images, new high-performing models and algorithms are needed. A single 2D image lacks the relevant information, so approximating the 3D position is a complex problem. This thesis describes a method that tackles this task with two deep learning models: the image net and the temporal net. The former is a deep convolutional neural network intended to extract meaningful features from the images, while the latter exploits temporal information to reach a more robust prediction. The solution achieves a lower Mean Absolute Error than existing computer vision methods under different conditions and configurations. A new data-driven pipeline has been created to process 2D videos and extract the 3D information of an object. The same architecture can be generalized to different domains and applications.
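
As a rough illustration of the evaluation metric and of why temporal information can make per-frame predictions more robust, the sketch below computes the Mean Absolute Error over a short track of 3D positions, before and after temporal smoothing. It is not the thesis's implementation: the function names, the toy data, and the moving average (a simple stand-in for the learned temporal net) are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def mean_absolute_error(pred: Sequence[Point3D], target: Sequence[Point3D]) -> float:
    """Average of |pred - target| over all frames and all three axes."""
    if len(pred) != len(target) or not pred:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p - t) for ps, ts in zip(pred, target) for p, t in zip(ps, ts))
    return total / (3 * len(pred))


def moving_average(track: Sequence[Point3D], window: int = 3) -> List[Point3D]:
    """Causal moving average over the current and up to window-1 previous frames.

    Hypothetical stand-in for a temporal model; the thesis uses a learned network.
    """
    smoothed = []
    for i in range(len(track)):
        chunk = track[max(0, i - window + 1): i + 1]
        smoothed.append(tuple(sum(axis) / len(chunk) for axis in zip(*chunk)))
    return smoothed


# Toy data (assumed): a static object at (0, 0, 2) and noisy per-frame estimates.
truth: List[Point3D] = [(0.0, 0.0, 2.0)] * 4
raw: List[Point3D] = [(0.3, 0.0, 2.0), (-0.3, 0.0, 2.0), (0.3, 0.0, 2.0), (-0.3, 0.0, 2.0)]

print("MAE per frame:", mean_absolute_error(raw, truth))
print("MAE smoothed: ", mean_absolute_error(moving_average(raw, window=2), truth))
```

On this toy track the smoothed estimates have a lower error because the alternating per-frame noise cancels across neighbouring frames. A learned temporal model would be expected to exploit motion patterns that a fixed average cannot.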

Abstract [sv, translated to English]

Estimating the 3D position of an object is an important area within computer vision. Since the final goal is to create automated solutions that can localize and detect objects in images, new high-performing models and algorithms are needed. The lack of relevant information in the individual 2D images makes approximating the 3D position a complex problem. This thesis describes a method based on two deep learning models: image net and temporal net. The former is a deep network that can extract meaningful features from the images, while the latter exploits the temporal information in order to make more robust predictions. This solution obtains a lower mean absolute error than existing methods, under different conditions and configurations. A new data-driven architecture has been created to handle 2D video clips and extract the 3D information of an object. The same architecture can be generalized to different domains and applications.

Place, publisher, year, edition, pages
2018. p. 61
Series
TRITA-EECS-EX ; 2018:669
Keywords [en]
Convolutional Neural Network, Deep Learning, Computer Vision, 3D Object Localization.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-254876
OAI: oai:DiVA.org:kth-254876
DiVA, id: diva2:1335815
Subject / course
Computer Science
Educational program
Degree of Master
Examiners
Available from: 2019-07-08. Created: 2019-07-08. Last updated: 2019-07-08. Bibliographically approved.

Open Access in DiVA

fulltext (13271 kB), 9 downloads
File information
File name: FULLTEXT01.pdf
File size: 13271 kB
Checksum (SHA-512): a676d75eb74dd083cd123a2bcd692d6a4d84678773132f8e761beb8a45bb86efa13c8a977f293e56b956c8bdd29df0e05ebe3a7993b4ef06d15bc2c8035a6f68
Type: fulltext
Mimetype: application/pdf