Incorporating Scene Depth in Discriminative Correlation Filters for Visual Tracking
2018 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Visual tracking is a computer vision problem where the task is to follow a target through a video sequence. Tracking has many important real-world applications in fields such as autonomous vehicles and robot vision. Since visual tracking does not assume any prior knowledge about the target, it faces challenges such as occlusion, appearance change, background clutter and scale change. In this thesis we try to improve the capabilities of tracking frameworks based on discriminative correlation filters by incorporating scene depth information. We utilize scene depth information on three main levels. First, we use raw depth information to segment the target from its surroundings, enabling occlusion detection and scale estimation. Second, we investigate different visual features calculated from depth data to decide which features are good at encoding geometric information available solely in depth data. Third, we investigate handling missing data in the depth maps using a modified version of the normalized convolution framework. Finally, we introduce a novel approach for parameter search using genetic algorithms to find the best hyperparameters for our tracking framework. Experiments show that depth data can be used to estimate scale changes and handle occlusions. In addition, visual features calculated from depth are more representative when they are combined with color features. It is also shown that utilizing normalized convolution improves the overall performance in some cases. Lastly, the use of genetic algorithms for hyperparameter search leads to accuracy gains as well as some insights into the performance of different components within the framework.
Place, publisher, year, edition, pages
2018. p. 132
Keywords [en]
Tracking, Visual, Deep, Learning, Machine, Learning, CNN, Convolutional, Neural, Network, Unsupervised, Learning, Clustering, Genetic Algorithms, Features, Visual features, Channel, Coding, RGBD, Scene, Depth, Map, Kinect, Discriminative, Correlation, Filters, SRDCF, DCF, Spatial, Spatially, Regularized, Hyperparameter, Search, Occlusion, Detection, Handling, Kalman, Filters, Normalized, Convolution, Bayesian, Gaussian, Mixture, Scale, Estimation, Conjugate, Gradient, Linkoping, Sweden
Keywords [sv]
Visuell, Följning, Särdrag, Djupa, Faltningsnätverk, Maskininlärning, Djup, Inlärning, Genetiska, Algoritmer, Klustring, Djup, RGBD, Linköping, Sverige
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-153110
ISRN: LiTH-ISY-EX--18/5178--SE
OAI: oai:DiVA.org:liu-153110
DiVA, id: diva2:1266346
External cooperation
SICK IVP
Subject / course
Computer Vision Laboratory
Presentation
2018-11-14, Systemet, Linköping, 15:00 (English)
Supervisors
Examiners
2019-08-27, 2018-11-27, 2025-02-07
Bibliographically approved