Digitala Vetenskapliga Arkivet

Learning visual perception for autonomous systems
Linköping University, Department of Electrical Engineering, Computer Vision. Linköping University, Faculty of Science & Engineering. ORCID iD: 0000-0001-6199-9362
2021 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In the last decade, developments in hardware, sensors and software have made it possible to create increasingly autonomous systems. These systems range from limited driver-assistance software, such as lane-following in cars, or collision-warning systems for otherwise manually piloted drones, to fully autonomous cars, boats or helicopters at the other end of the spectrum. As the ability to function autonomously increases, so do the demands to operate with minimal human supervision in unstructured environments.

Common to most, if not all, autonomous systems is that they require an accurate model of the surrounding world. While a large number of sensors useful for creating such models is currently available, cameras are among the most versatile. From a sensing perspective, cameras have several advantages over other sensors: they require no external infrastructure, are relatively cheap, and can be used to extract information such as the relative positions of other objects and their movements over time, to create accurate maps, and to locate the autonomous system within those maps.

Using cameras to produce a model of the surroundings requires solving a number of technical problems. Often these problems come down to recognizing that an object or region of interest is the same over time or across novel viewpoints. In visual tracking, this type of recognition is required to follow an object of interest through a sequence of images. In geometric problems, it is often necessary to recognize corresponding image regions in order to perform 3D reconstruction or localization.

The first set of contributions in this thesis relates to the improvement of a class of online-learned visual object trackers based on discriminative correlation filters. In visual tracking, estimating the object's size is important for reliable tracking; the first contribution in this part of the thesis investigates this problem. The performance of discriminative correlation filters depends heavily on the feature representation used by the filter; the second tracking contribution investigates the performance impact of different features derived from a deep neural network.
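
As an illustration of the underlying machinery, the following is a minimal sketch of a single-channel discriminative correlation filter learned in the Fourier domain, in the spirit of MOSSE-style trackers. The label width, regularization weight and function names are illustrative assumptions, not the exact settings used in the thesis.

    import numpy as np

    def gaussian_labels(shape, sigma=2.0):
        # Desired filter response: a Gaussian peak centred on the target.
        h, w = shape
        ys = np.arange(h) - h // 2
        xs = np.arange(w) - w // 2
        g = np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))
        return np.fft.fft2(np.fft.ifftshift(g))

    def train_filter(patch, labels_fft, lam=1e-2):
        # Closed-form ridge regression for one grayscale training patch.
        X = np.fft.fft2(patch)
        A = labels_fft * np.conj(X)   # filter numerator
        B = X * np.conj(X) + lam      # regularised denominator
        return A, B

    def detect(A, B, patch):
        # Correlate the filter with a new patch; the response peak gives
        # the estimated translation of the target.
        X = np.fft.fft2(patch)
        response = np.real(np.fft.ifft2((A / B) * X))
        return np.unravel_index(np.argmax(response), response.shape)

Tracking then alternates detection in each new frame with a running-average update of A and B, which is what makes the appearance model online-learned.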

A second set of contributions relates to the evaluation of visual object trackers. The first of these is the Visual Object Tracking challenge, a yearly comparison of state-of-the-art visual tracking algorithms. A second contribution is an investigation into possible issues that arise when bounding-box representations are used for ground-truth data.

In real-world settings, tracking typically occurs over longer time sequences than is common in benchmark datasets. In such settings, the model updates of many tracking algorithms often cause the tracker to fail silently. For this reason, it is important to have an estimate of the tracker's performance even when no ground-truth annotations exist. The first of the final three contributions investigates this problem in a robotics setting by fusing information from a pre-trained object detector in a state-estimation framework. An additional contribution describes how to dynamically re-weight the data used for the appearance model of a tracker. A final contribution investigates how to estimate the certainty of detections in a setting where geometric limitations can be imposed on the search region; the proposed solution learns to accurately predict stereo disparities along with accurate assessments of each prediction's certainty.
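
For the last of these contributions, one common way to obtain a certainty estimate alongside a regression output is to predict a distribution rather than a point and train with its negative log-likelihood. The Laplace formulation below is a hedged sketch of this general idea, not necessarily the exact loss used in the thesis.

    import numpy as np

    def laplace_nll(d_true, d_pred, log_b):
        # Negative log-likelihood (up to an additive constant) of a
        # Laplace distribution with predicted location d_pred and scale
        # exp(log_b). A small scale expresses high certainty and is only
        # rewarded when the prediction is close to the ground truth.
        b = np.exp(log_b)
        return np.mean(np.abs(d_true - d_pred) / b + log_b)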

Abstract [sv]

In recent years, the ever faster development of computing hardware, sensors and software techniques has made it possible to create increasingly autonomous systems. These can vary in degree of autonomy from an anti-skid system in an otherwise manually controlled car, to collision-avoidance systems in a manually piloted drone, to a fully autonomous car or other vehicle. With an increasing ability to operate independently without human supervision, the range of situations the systems are expected to handle also grows.

Common to many, if not all, autonomous systems is that they need an accurate and up-to-date picture of their surroundings in order to act intelligently. A wide range of sensors that make this possible is available, with cameras being among the most versatile. Compared to other types of sensors, cameras have a number of advantages: they are relatively cheap, passive, and can be used without requiring external infrastructure. The visual data that cameras generate can be used to follow external objects, determine the position of the camera itself, or compute distances.

Successfully exploiting the possibilities in this information, however, requires solving a wide range of technical problems. Many of these problems come down to recognizing that two image regions, taken at different points in time or from different viewing angles, depict the same thing.

A typical example of such a problem is the visual tracking problem, where the goal is to determine an object's position and size in every image of a sequence. In general, the object's appearance is not known to the algorithm in advance; instead, an appearance model must be built up successively with the help of machine learning.

Similar problems occur in many other areas of computer vision, especially in geometry. Many geometric problems require, for example, finding corresponding points across several images.

The first collection of contributions in this thesis addresses the visual tracking problem. The proposed methods are based on an adaptive appearance model known as discriminative correlation filters. The first contribution extends this framework to estimate an object's size as well as its position. A second contribution investigates how correlation-filter-based methods can be extended to also exploit visual features produced by machine learning.

A second collection of contributions concerns the evaluation of visual tracking methods, partly within the yearly Visual Object Tracking challenge. A second contribution to evaluation methodology in visual tracking aims to avoid pitfalls that easily arise when methods are tuned too closely to the measures used to evaluate them.

A third collection of contributions relates to different ways of handling situations in which the learning process of the previously described tracking methods introduces erroneous data into the model. A first contribution does this in a robotic system for following people in an unstructured environment. A second contribution is based on dynamically re-weighting previously collected data in order to down-weight data points that do not represent the tracked object well. A final contribution investigates how a prediction's uncertainty can be estimated at the same time as the prediction itself.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2021, p. 49
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2138
Keywords [en]
computer vision, visual object tracking, tracking, machine learning, deep learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:liu:diva-175177
DOI: 10.3384/diss.diva-175177
ISBN: 9789179296711 (print)
OAI: oai:DiVA.org:liu-175147
DiVA, id: diva2:1545918
Public defence
2021-06-04, Ada Lovelace, B-Building, Campus Valla, Linköping, 09:15 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2021-05-04 Created: 2021-04-20 Last updated: 2021-05-26
Bibliographically approved
List of papers
1. Discriminative Scale Space Tracking
2017 (English) In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 39, no 8, p. 1561-1575. Article in journal (Refereed) Published
Abstract [en]

Accurate scale estimation of a target is a challenging research problem in visual object tracking. Most state-of-the-art methods employ an exhaustive scale search to estimate the target size. The exhaustive search strategy is computationally expensive and struggles when encountered with large scale variations. This paper investigates the problem of accurate and robust scale estimation in a tracking-by-detection framework. We propose a novel scale adaptive tracking approach by learning separate discriminative correlation filters for translation and scale estimation. The explicit scale filter is learned online using the target appearance sampled at a set of different scales. Contrary to standard approaches, our method directly learns the appearance change induced by variations in the target scale. Additionally, we investigate strategies to reduce the computational cost of our approach. Extensive experiments are performed on the OTB and the VOT2014 datasets. Compared to the standard exhaustive scale search, our approach achieves a gain of 2.5 percent in average overlap precision on the OTB dataset. Additionally, our method is computationally efficient, operating at a 50 percent higher frame rate compared to the exhaustive scale search. Our method obtains the top rank in performance by outperforming 19 state-of-the-art trackers on OTB and 37 state-of-the-art trackers on VOT2014.
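
The scale filter is learned over a pyramid of patches extracted at geometrically spaced scale factors; the sketch below shows how such a scale sample set can be built and the best scale selected. The 33 scales with step 1.02 follow the general setup reported for this family of trackers, but treat the exact values as illustrative.

    import numpy as np

    # Geometrically spaced scale factors centred on the current size.
    num_scales, step = 33, 1.02
    scale_factors = step ** (np.arange(num_scales) - num_scales // 2)

    def best_scale(responses):
        # One 1-D correlation-filter response per scale factor; the
        # target size is then multiplied by the winning factor.
        return scale_factors[int(np.argmax(responses))]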

Place, publisher, year, edition, pages
IEEE Computer Society, 2017
Keywords
Visual tracking; scale estimation; correlation filters
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-139382 (URN)
10.1109/TPAMI.2016.2609928 (DOI)
000404606300006 ()
27654137 (PubMedID)
Note

Funding Agencies|Swedish Foundation for Strategic Research; Swedish Research Council; Strategic Vehicle Research and Innovation (FFI); Wallenberg Autonomous Systems Program; National Supercomputer Centre; Nvidia

Available from: 2017-08-07 Created: 2017-08-07 Last updated: 2023-04-03
Bibliographically approved
2. Convolutional Features for Correlation Filter Based Visual Tracking
2015 (English) In: 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), IEEE conference proceedings, 2015, p. 621-629. Conference paper, Published paper (Refereed)
Abstract [en]

Visual object tracking is a challenging computer vision problem with numerous real-world applications. This paper investigates the impact of convolutional features for the visual tracking problem. We propose to use activations from the convolutional layer of a CNN in discriminative correlation filter based tracking frameworks. These activations have several advantages compared to the standard deep features (fully connected layers). Firstly, they mitigate the need for task-specific fine-tuning. Secondly, they contain structural information crucial for the tracking problem. Lastly, these activations have low dimensionality. We perform comprehensive experiments on three benchmark datasets: OTB, ALOV300++ and the recently introduced VOT2015. Surprisingly, in contrast to image classification, our results suggest that activations from the first layer provide superior tracking performance compared to the deeper layers. Our results further show that the convolutional features provide improved results compared to standard handcrafted features. Finally, results comparable to state-of-the-art trackers are obtained on all three benchmark datasets.
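
The sketch below shows how first-layer convolutional activations can be extracted with a modern toolchain. The paper's original experiments predate this setup, so the choice of VGG-16 via torchvision and the preprocessing constants are stand-in assumptions for illustration.

    import torch
    import torchvision.models as models
    import torchvision.transforms as T

    # First convolution plus its ReLU from an ImageNet-pretrained VGG-16.
    backbone = models.vgg16(weights="IMAGENET1K_V1").features.eval()
    first_conv = backbone[:2]

    preprocess = T.Compose([
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),
    ])

    def conv1_features(image):
        # Shallow activations for an RGB uint8 patch of shape (H, W, 3);
        # returns 64 feature maps at the input resolution.
        with torch.no_grad():
            x = preprocess(image).unsqueeze(0)
            return first_conv(x).squeeze(0)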

Place, publisher, year, edition, pages
IEEE conference proceedings, 2015
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-128869 (URN)
10.1109/ICCVW.2015.84 (DOI)
000380434700075 ()
9781467397117 (ISBN)
9781467397100 (ISBN)
Conference
15th IEEE International Conference on Computer Vision Workshops, ICCVW 2015, 7-13 December 2015, Santiago, Chile
Available from: 2016-06-02 Created: 2016-06-02 Last updated: 2023-04-03
Bibliographically approved
3. The Visual Object Tracking VOT2017 challenge results
2017 (English) In: 2017 IEEE International Conference on Computer Vision Workshops (ICCVW 2017), IEEE, 2017, p. 1949-1972. Conference paper, Published paper (Refereed)
Abstract [en]

The Visual Object Tracking challenge VOT2017 is the fifth annual tracker benchmarking activity organized by the VOT initiative. Results of 51 trackers are presented; many are state-of-the-art published at major computer vision conferences or journals in recent years. The evaluation included the standard VOT and other popular methodologies and a new "real-time" experiment simulating a situation where a tracker processes images as if provided by a continuously running sensor. Performance of the tested trackers typically by far exceeds standard baselines. The source code for most of the trackers is publicly available from the VOT page. VOT2017 goes beyond its predecessors by (i) improving the VOT public dataset and introducing a separate VOT2017 sequestered dataset, (ii) introducing a real-time tracking experiment and (iii) releasing a redesigned toolkit that supports complex experiments. The dataset, the evaluation kit and the results are publicly available at the challenge website.
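
The supervised VOT methodology measures per-frame overlap (accuracy) and counts failures, re-initializing the tracker after each one. The sketch below shows the basic bookkeeping; the tracker interface names are hypothetical, and the real protocol additionally skips and burns in a few frames around every reset.

    def vot_style_run(tracker, frames, gt_boxes, iou):
        # Simplified supervised run: accumulate per-frame overlap and
        # count failures, resetting the tracker on ground truth whenever
        # the overlap drops to zero.
        overlaps, failures = [], 0
        tracker.init(frames[0], gt_boxes[0])
        for frame, gt in zip(frames[1:], gt_boxes[1:]):
            pred = tracker.update(frame)
            o = iou(pred, gt)
            if o == 0.0:
                failures += 1
                tracker.init(frame, gt)
            else:
                overlaps.append(o)
        accuracy = sum(overlaps) / max(len(overlaps), 1)
        return accuracy, failures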

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE International Conference on Computer Vision Workshops, ISSN 2473-9936
National Category
Computer Sciences
Identifiers
urn:nbn:se:liu:diva-145822 (URN)
10.1109/ICCVW.2017.230 (DOI)
000425239602001 ()
978-1-5386-1034-3 (ISBN)
Conference
16th IEEE International Conference on Computer Vision (ICCV)
Note

Funding Agencies|Slovenian research agency research programs [P2-0214, P2-0094]; Slovenian research agency project [J2-8175]; Czech Science Foundation Project [GACR P103/12/G084]; WASP; VR (EMC2); SSF (SymbiCloud); SNIC; AIT Strategic Research Programme Visual Surveillance and Insight; Faculty of Computer Science, University of Ljubljana, Slovenia

Available from: 2018-03-21 Created: 2018-03-21 Last updated: 2023-04-03
4. Countering bias in tracking evaluations
2018 (English) In: Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications / [ed] Francisco Imai, Alain Tremeau and Jose Braz, Science and Technology Publications, Lda, 2018, Vol. 5, p. 581-587. Conference paper, Published paper (Refereed)
Abstract [en]

Recent years have witnessed a significant leap in visual object tracking performance, mainly due to powerful features, sophisticated learning methods and the introduction of benchmark datasets. Despite this significant improvement, the evaluation of state-of-the-art object trackers still relies on the classical intersection over union (IoU) score. In this work, we argue that object tracking evaluations based on the classical IoU score are sub-optimal. As our first contribution, we theoretically prove that the IoU score is biased in the case of large target objects and favors over-estimated target prediction sizes. As our second contribution, we propose a new score that is unbiased with respect to target prediction size. We systematically evaluate our proposed approach on benchmark tracking data with variations in relative target size. Our empirical results clearly suggest that the proposed score is unbiased in general.
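
For reference, the classical IoU score for axis-aligned boxes is computed as below. A quick check illustrates the bias the paper analyses: against a 10x10 ground truth, a centred prediction 20% too large in each dimension scores about 0.69, while one 20% too small scores 0.64.

    def iou(box_a, box_b):
        # Intersection over union of two axis-aligned (x, y, w, h) boxes.
        ax, ay, aw, ah = box_a
        bx, by, bw, bh = box_b
        iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
        ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
        inter = iw * ih
        union = aw * ah + bw * bh - inter
        return inter / union if union > 0 else 0.0

    # iou((0, 0, 10, 10), (-1, -1, 12, 12)) -> ~0.694 (over-estimated)
    # iou((0, 0, 10, 10), (1, 1, 8, 8))     ->  0.640 (under-estimated)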

Place, publisher, year, edition, pages
Science and Technology Publications, Lda, 2018
National Category
Signal Processing
Identifiers
urn:nbn:se:liu:diva-151306 (URN)
10.5220/0006714805810587 (DOI)
000576679800066 ()
9789897582905 (ISBN)
Conference
13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, January 27-29, Funchal, Madeira
Available from: 2018-09-17 Created: 2018-09-17 Last updated: 2021-07-15
Bibliographically approved
5. Adaptive Decontamination of the Training Set: A Unified Formulation for Discriminative Visual Tracking
2016 (English) In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 1430-1438. Conference paper, Published paper (Refereed)
Abstract [en]

Tracking-by-detection methods have demonstrated competitive performance in recent years. In these approaches, the tracking model heavily relies on the quality of the training set. Due to the limited amount of labeled training data, additional samples need to be extracted and labeled by the tracker itself. This often leads to the inclusion of corrupted training samples, due to occlusions, misalignments and other perturbations. Existing tracking-by-detection methods either ignore this problem, or employ a separate component for managing the training set. We propose a novel generic approach for alleviating the problem of corrupted training samples in tracking-by-detection frameworks. Our approach dynamically manages the training set by estimating the quality of the samples. Contrary to existing approaches, we propose a unified formulation by minimizing a single loss over both the target appearance model and the sample quality weights. The joint formulation enables corrupted samples to be down-weighted while increasing the impact of correct ones. Experiments are performed on three benchmarks: OTB-2015 with 100 videos, VOT-2015 with 60 videos, and Temple-Color with 128 videos. On the OTB-2015, our unified formulation significantly improves the baseline, with a gain of 3.8% in mean overlap precision. Finally, our method achieves state-of-the-art results on all three datasets.
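
The following toy sketch conveys the alternating structure of such a unified formulation in a simplified linear least-squares setting: solve for the model under the current sample weights, then redistribute weight toward samples the model fits well. The exponential weight update here is a stand-in assumption; the paper derives its own closed-form update from a single joint loss.

    import numpy as np

    def jointly_reweighted_fit(X, y, num_iters=5, mu=1.0):
        # X: (n, d) sample matrix, y: (n,) targets. alpha holds one
        # quality weight per training sample.
        n, d = X.shape
        alpha = np.full(n, 1.0 / n)
        for _ in range(num_iters):
            # Weighted least squares under the current sample weights.
            Xw = X * alpha[:, None]
            w = np.linalg.solve(X.T @ Xw + 1e-6 * np.eye(d), Xw.T @ y)
            # Corrupted samples get large residuals and hence low weight.
            r = (X @ w - y) ** 2
            alpha = np.exp(-r / mu)
            alpha /= alpha.sum()
        return w, alpha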

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2016
Series
IEEE Conference on Computer Vision and Pattern Recognition, E-ISSN 1063-6919 ; 2016
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-137882 (URN)
10.1109/CVPR.2016.159 (DOI)
000400012301051 ()
9781467388511 (ISBN)
9781467388528 (ISBN)
Conference
29th IEEE Conference on Computer Vision and Pattern Recognition, 27-30 June 2016, Las Vegas, NV, USA
Note

Funding Agencies|SSF (CUAS); VR (EMC2); VR (ELLIIT); Wallenberg Autonomous Systems Program; NSC; Nvidia

Available from: 2017-06-01 Created: 2017-06-01 Last updated: 2023-04-03
Bibliographically approved
6. Combining Visual Tracking and Person Detection for Long Term Tracking on a UAV
2016 (English) In: Proceedings of the 12th International Symposium on Advances in Visual Computing, Springer, 2016. Conference paper, Published paper (Refereed)
Abstract [en]

Visual object tracking performance has improved significantly in recent years. Most trackers are based on one of two paradigms: online learning of an appearance model, or the use of a pre-trained object detector. Methods based on online learning provide high accuracy but are prone to model drift, which occurs when the tracker fails to correctly estimate the tracked object's position. Methods based on a detector, on the other hand, typically have good long-term robustness but reduced accuracy compared to online methods.

Despite the complementarity of the aforementioned approaches, the problem of fusing them into a single framework is largely unexplored. In this paper, we propose a novel fusion between an online tracker and a pre-trained detector for tracking humans from a UAV. The system operates in real time on a UAV platform. In addition, we present a novel dataset for long-term tracking in a UAV setting, which includes scenarios that are typically not well represented in standard visual tracking datasets.
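
As a hedged illustration of the fusion idea, the rule below lets a confident, spatially consistent detection correct the online tracker while otherwise trusting the tracker frame-to-frame. The thresholds and the hard gating are illustrative simplifications; the paper's system fuses the two sources within a state-estimation framework rather than with a fixed rule.

    def fuse(track_box, detections, conf_thresh=0.8, gate=50.0):
        # track_box: (x, y, w, h) from the online tracker.
        # detections: list of ((x, y, w, h), confidence) from the detector.
        def center(b):
            x, y, w, h = b
            return x + w / 2.0, y + h / 2.0

        tx, ty = center(track_box)
        best = None
        for box, conf in detections:
            cx, cy = center(box)
            dist = ((cx - tx) ** 2 + (cy - ty) ** 2) ** 0.5
            # Accept only confident detections near the predicted target.
            if conf >= conf_thresh and dist <= gate:
                if best is None or conf > best[1]:
                    best = (box, conf)
        return best[0] if best is not None else track_box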

Place, publisher, year, edition, pages
Springer, 2016
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:liu:diva-137897 (URN)
10.1007/978-3-319-50835-1_50 (DOI)
2-s2.0-85007039301 (Scopus ID)
978-3-319-50834-4 (ISBN)
978-3-319-50835-1 (ISBN)
Conference
International Symposium on Advances in Visual Computing
Available from: 2017-05-31 Created: 2017-05-31 Last updated: 2023-04-03
Bibliographically approved

Open Access in DiVA

fulltext (3806 kB), 546 downloads
File information
File name: FULLTEXT02.pdf
File size: 3806 kB
Checksum: SHA-512
f09ce40b4a8e6d3d6246f9cce0626235b22c18ca926aab9433b3de8899bd2e70cffd7367abb5bd91652243cf90112905f1893e0aba0742491f5cae28277bb8ae
Type: fulltext
Mimetype: application/pdf


Search in DiVA

By author/editor
Häger, Gustav
By organisation
Computer Vision, Faculty of Science & Engineering
Computer Vision and Robotics (Autonomous Systems)

Total: 547 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 2314 hits