Tracking of Humans in Video Stream Using LSTM Recurrent Neural Network
KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This master's thesis examines the problem of tracking humans in video streams using deep learning. We use spatially supervised recurrent convolutional neural networks for visual human tracking. In this method, the recurrent network uses both the history of locations and the visual features from the deep convolutional network. Tracking is based on detection results: we concatenate the locations of the detected bounding boxes with high-level visual features produced by convolutional networks and then predict the tracking bounding box for the next frames. Because a video consists of consecutive frames, we chose a method that uses information from the history of frames to achieve robust tracking in visually challenging cases such as occlusion, motion blur, and fast movement. Long Short-Term Memory (LSTM) is a kind of recurrent neural network and is well suited to this purpose. Instead of the binary classification commonly used in deep-learning-based tracking methods, we use regression to predict the tracking locations directly. Our aim is to test the method on real videos recorded by a head-mounted camera, so our test videos are very challenging and contain many cases of fast movement, motion blur, and occlusion. Because the training data set is spatially imbalanced, the method has difficulty tracking humans who are in the corners of the image, but in the other challenging cases the proposed tracking method worked well.
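The data flow described in the abstract (concatenate each frame's detected box with its visual features, pass the sequence through an LSTM, and regress a box per frame) can be sketched roughly as follows. This is a minimal illustration with NumPy and untrained random weights, not the thesis's implementation: the class name `TinyLSTMTracker`, the feature and hidden dimensions, and the `(x, y, w, h)` box layout are all assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTMTracker:
    """Hypothetical single-layer LSTM with a linear box-regression head."""

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = feat_dim + 4  # visual features + 4 box coordinates (assumed layout)
        # One stacked weight matrix for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_dim, in_dim + hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Regression head: maps the hidden state to 4 box coordinates.
        self.W_out = rng.normal(0.0, 0.1, (4, hidden_dim))
        self.b_out = np.zeros(4)
        self.hidden_dim = hidden_dim

    def step(self, feat, box, h, c):
        # Concatenate visual features, detected box, and previous hidden state.
        x = np.concatenate([feat, box, h])
        i, f, g, o = np.split(self.W @ x + self.b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update cell memory
        h = sigmoid(o) * np.tanh(c)                   # new hidden state
        return h, c

    def track(self, feats, boxes):
        """feats: (T, feat_dim), boxes: (T, 4). Returns (T, 4) regressed boxes."""
        h = np.zeros(self.hidden_dim)
        c = np.zeros(self.hidden_dim)
        preds = []
        for feat, box in zip(feats, boxes):
            h, c = self.step(feat, box, h, c)
            preds.append(self.W_out @ h + self.b_out)  # direct regression, no classifier
        return np.stack(preds)


# Usage with random stand-ins for CNN features and detector boxes.
rng = np.random.default_rng(1)
feats = rng.normal(size=(5, 16))   # 5 frames, 16-dim features (assumed size)
boxes = rng.uniform(size=(5, 4))   # normalized (x, y, w, h) per frame (assumed)
tracker = TinyLSTMTracker(feat_dim=16, hidden_dim=8)
print(tracker.track(feats, boxes).shape)  # (5, 4)
```

Because the hidden and cell states carry over between frames, each regressed box depends on the whole history of features and locations, which is the property the abstract relies on for occlusion and motion blur. A real system would train the weights against ground-truth boxes with a regression loss.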

Abstract [sv, translated to English]

This thesis examines the problem of tracking humans in video streams using deep learning. Tracking is performed with a recurrent convolutional neural network. The input to the network consists of visual features extracted with a convolutional neural network, together with detection results from previous frames. We choose to use historical detections to create a method that is robust to various challenging situations, such as fast movement, motion blur, and occlusion. Long Short-Term Memory (LSTM) is a recurrent neural network that is useful for this purpose. Instead of using binary classification, which is common in many deep-learning-based tracking methods, we use regression to predict the position of the tracked subjects directly. Our aim is to test our method on videos recorded with a head-mounted camera. Because of limitations in our training data sets, which are spatially imbalanced, we have problems tracking humans who are at the edges of the image area, but in other challenging cases the tracking succeeded well.

Place, publisher, year, edition, pages
2017.
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-217495
OAI: oai:DiVA.org:kth-217495
DiVA, id: diva2:1156631
External cooperation
Tobii
Supervisors
Examiners
Available from: 2017-11-23. Created: 2017-11-13. Last updated: 2017-11-23. Bibliographically approved.

Open Access in DiVA

fulltext (18162 kB), 679 downloads
File information
File name: FULLTEXT01.pdf
File size: 18162 kB
Checksum SHA-512:
f1ac807825532720e4f06867df7da6ba3ee61fe97dacc283b4ff209634396b3f66f4cfa4e77c3941774e64c37a6dedd4a78ded7e77011f7b973eec561660f1f2
Type: fulltext
Mimetype: application/pdf

By organisation
Theoretical Computer Science, TCS
Engineering and Technology

Total: 679 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 188 hits