Digitala Vetenskapliga Arkivet

Semi-automatic Annotation of Objects in Visual-Thermal Video
Amanda Berg, Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; Termisk Systemteknik AB, Linköping, Sweden. ORCID iD: 0000-0002-6591-9400
Joakim Johnander, Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; Zenuity AB, Göteborg, Sweden. ORCID iD: 0000-0003-2553-3367
Flavie Durand de Gevigney, Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; Grenoble INP, France.
Jörgen Ahlberg, Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering; Termisk Systemteknik AB, Linköping, Sweden. ORCID iD: 0000-0002-6763-5487
Michael Felsberg, Linköping University, Department of Electrical Engineering, Computer Vision; Linköping University, Faculty of Science & Engineering.
2019 (English). In: 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Institute of Electrical and Electronics Engineers (IEEE), 2019. Conference paper, Published paper (Refereed).
Abstract [en]

Deep learning requires large amounts of annotated data. Manual annotation of objects in video is, regardless of annotation type, a tedious and time-consuming process. In particular, for scarcely used image modalities, human annotation is hard to justify. In such cases, semi-automatic annotation provides an acceptable option.

In this work, a recursive, semi-automatic annotation method for video is presented. The proposed method utilizes a state-of-the-art video object segmentation method to propose initial annotations for all frames in a video, based on only a few manual object segmentations. In the case of a multi-modal dataset, the multi-modality is exploited to further refine the proposed annotations. The final tentative annotations are presented to the user for manual correction.
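
As a rough illustration, the recursive loop can be sketched as follows. This is a minimal sketch with assumed placeholder components; the fusion rule, the correction criterion, and the vos_propagate interface below are illustrative stand-ins, not the paper's actual implementation.

    import numpy as np

    def fuse_modalities(mask_a, mask_b):
        # Average the per-pixel foreground scores of the two modalities
        # (an assumed stand-in for the paper's fusion strategy).
        return 0.5 * (mask_a + mask_b)

    def needs_correction(mask, lo=0.25, hi=0.75, max_frac=0.2):
        # Flag a frame in which many pixels have ambiguous scores
        # (an assumed uncertainty criterion, not the paper's).
        return np.mean((mask > lo) & (mask < hi)) > max_frac

    def annotate(frames_rgb, frames_thermal, manual_masks, vos_propagate):
        """Propagate a few manual segmentations to all frames, refine
        using the second modality, and flag frames for manual correction."""
        proposals_rgb = vos_propagate(frames_rgb, manual_masks)
        proposals_t = vos_propagate(frames_thermal, manual_masks)
        refined = [fuse_modalities(a, b)
                   for a, b in zip(proposals_rgb, proposals_t)]
        flagged = [i for i, m in enumerate(refined) if needs_correction(m)]
        return refined, flagged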

The method is evaluated on a subset of the RGBT-234 visual-thermal dataset, reducing the workload for a human annotator by approximately 78% compared to full manual annotation. Utilizing the proposed pipeline, sequences were annotated for the VOT-RGBT 2019 challenge.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019.
Series
IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), ISSN 2473-9936, E-ISSN 2473-9944
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-161076
DOI: 10.1109/ICCVW.2019.00277
ISI: 000554591602039
ISBN: 978-1-7281-5023-9 (electronic)
ISBN: 978-1-7281-5024-6 (print)
OAI: oai:DiVA.org:liu-161076
DiVA, id: diva2:1362582
Conference
IEEE International Conference on Computer Vision Workshop (ICCVW)
Funder
Swedish Research Council, 2013-5703
Swedish Foundation for Strategic Research
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Vinnova, VS1810-Q
Note

Funding agencies: Swedish Research Council [2013-5703]; project ELLIIT (the Strategic Area for ICT research, Swedish Government); Wallenberg AI, Autonomous Systems and Software Program (WASP); Visual Sweden project n-dimensional Modelling [VS1810-Q]

Available from: 2019-10-21 Created: 2019-10-21 Last updated: 2025-02-07
In thesis
1. Learning to Analyze what is Beyond the Visible Spectrum
2019 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Thermal cameras have historically been of interest mainly for military applications. Increasing image quality and resolution combined with decreasing camera price and size have, in recent years, opened up new application areas. Thermal cameras are now widely used in civilian applications, e.g., within industry, in searches for missing persons, in automotive safety, and in medicine. They are useful wherever a measurable temperature difference exists. Compared to cameras operating in the visual spectrum, they are advantageous due to their ability to see in total darkness, their robustness to illumination variations, and their lower intrusion on privacy.

This thesis addresses the problem of automatic image analysis in thermal infrared images, with a focus on machine learning methods. Its main purpose is to study how processing must be adapted to the thermal infrared modality. In particular, three problems are addressed: visual object tracking, anomaly detection, and modality transfer. All three are areas of extensive ongoing research, and all are highly relevant for a number of real-world applications.

The first addressed problem is visual object tracking, in which no prior information other than the initial location of the object is given. The main contribution concerns benchmarking of short-term single-object (STSO) visual object tracking methods in thermal infrared images. The proposed dataset, LTIR (Linköping Thermal Infrared), was integrated in the VOT-TIR2015 challenge, the first organized challenge on STSO tracking in thermal infrared video. Another contribution, also related to benchmarking, is a novel, recursive method for semi-automatic annotation of multi-modal video sequences. Based on only a few initial annotations, a video object segmentation (VOS) method proposes segmentations for all remaining frames, and difficult parts in need of additional manual annotation are automatically detected. The third contribution to the problem of visual object tracking is a template tracking method based on a non-parametric probability density model of the object's thermal radiation using channel representations.
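
For context, a channel representation encodes a scalar, here thermal intensity, as a soft histogram over overlapping basis functions. The following numpy sketch uses triangular basis functions; the number of channels and the value range are illustrative assumptions, not the thesis's settings.

    import numpy as np

    def channel_encode(values, n_channels=8, vmin=0.0, vmax=1.0):
        # Soft histogram: each value activates a few overlapping
        # triangular channels instead of a single hard bin.
        centers = np.linspace(vmin, vmax, n_channels)
        width = centers[1] - centers[0]
        dist = np.abs(np.asarray(values)[..., None] - centers) / width
        return np.clip(1.0 - dist, 0.0, None)

    # A non-parametric density model of the object's thermal radiation
    # can then be obtained as the mean encoding over the object pixels.
    object_pixels = np.random.rand(500)  # stand-in for thermal intensities
    density = channel_encode(object_pixels).mean(axis=0)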

The second addressed problem is anomaly detection, i.e., detection of rare objects or events. The main contribution is a method for truly unsupervised anomaly detection based on Generative Adversarial Networks (GANs). The method employs joint training of the generator and an observation-to-latent-space encoder, enabling stratification of the latent space and, thus, separation of normal and anomalous samples. The second contribution addresses the previously unexplored problem of obstacle detection in front of moving trains using a train-mounted thermal camera: adaptive correlation filters are updated continuously, and missed detections of background are treated as detections of anomalies, i.e., obstacles. The third contribution to the problem of anomaly detection is a method for characterization and classification of automatically detected district heating leakages for the purpose of false alarm reduction.
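
In encoder-equipped GANs of this kind (in the spirit of BiGAN-style models), the anomaly score is typically based on how well a sample is reconstructed through the latent space. The PyTorch sketch below is a generic illustration under that assumption, not the thesis's exact formulation.

    import torch

    def anomaly_score(x, encoder, generator):
        # Normal samples lie on the learned manifold and reconstruct
        # well; anomalies map to latents whose reconstructions differ.
        z = encoder(x)           # observation -> latent space
        x_hat = generator(z)     # latent space -> observation
        err = (x - x_hat) ** 2
        return err.flatten(1).mean(dim=1)  # larger = more anomalous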

Finally, the thesis addresses the problem of modality transfer between thermal infrared and visual spectrum images, a previously unaddressed problem. The contribution is a method based on Convolutional Neural Networks (CNNs) that enables perceptually realistic transformations of thermal infrared images to visual images. By careful design of the loss function, the method is made robust to image pair misalignments. The method exploits the human visual system's lower acuity for color differences than for luminance by separating the loss into a luminance part and a chrominance part.
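
A generic sketch of such a luminance/chrominance split is given below, assuming a BT.601 color conversion; the pooling factor and loss weighting are illustrative, and the thesis's exact loss may differ.

    import torch
    import torch.nn.functional as F

    def rgb_to_ycbcr(img):
        # BT.601 conversion; img is (N, 3, H, W) with values in [0, 1].
        r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
        y = 0.299 * r + 0.587 * g + 0.114 * b
        cb = 0.5 - 0.168736 * r - 0.331264 * g + 0.5 * b
        cr = 0.5 + 0.5 * r - 0.418688 * g - 0.081312 * b
        return y, torch.cat([cb, cr], dim=1)

    def split_loss(pred_rgb, target_rgb, chroma_weight=0.5):
        """Full-resolution loss on luminance, low-resolution loss on
        chrominance, matching the eye's lower chromatic acuity."""
        y_p, c_p = rgb_to_ycbcr(pred_rgb)
        y_t, c_t = rgb_to_ycbcr(target_rgb)
        lum_loss = F.l1_loss(y_p, y_t)
        # Downsampling the chrominance before comparison also adds
        # robustness to small image-pair misalignments.
        chroma_loss = F.l1_loss(F.avg_pool2d(c_p, 4), F.avg_pool2d(c_t, 4))
        return lum_loss + chroma_weight * chroma_loss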

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2019. p. 94
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2024
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-161077
DOI: 10.3384/diss.diva-161077
ISBN: 9789179299811
Public defence
2019-12-18, Ada Lovelace, B Building, Campus Valla, Linköping, 13:15 (English)
Funder
Swedish Research Council, D0570301
Available from: 2019-11-13 Created: 2019-10-23 Last updated: 2025-02-07. Bibliographically approved.
2. Dynamic Visual Learning
2022 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous robots act in a dynamic world where both the robots and other objects may move. The surround sensing systems of these robots therefore work with dynamic input data and need to estimate both the current state of the environment and its dynamics. One of the key elements in obtaining a high-level understanding of the environment is to track dynamic objects. This enables the system to understand what the objects are doing, to predict where they will be in the future, and, in turn, to better estimate where they are. In this thesis, I focus on input from visual cameras: images. Images have, with the advent of neural networks, become a cornerstone of sensing systems. Image-processing neural networks are optimized to perform a specific computer vision task, such as recognizing cats and dogs, on vast datasets of annotated examples. This is usually referred to as offline training; given a well-designed neural network, enough high-quality data, and a suitable offline training formulation, the neural network is expected to become adept at the specific task.

This thesis starts with a study of object tracking. The tracking is based on the visual appearance of the object, achieved via discriminative correlation filters (DCFs). The first contribution of this thesis is to decompose the filter into multiple subfilters. This increases robustness to object deformations and rotations. Moreover, it provides a more fine-grained representation of the object state, as the subfilters are expected to roughly track object parts. In the second contribution, a neural network is trained directly for object tracking. In order to obtain a fine-grained representation of the object state, the state is represented as a segmentation. The main challenge lies in the design of a neural network able to tackle this task. While common neural networks excel at recognizing patterns seen during offline training, they struggle to store novel patterns in order to later recognize them. To overcome this limitation, a novel appearance learning mechanism is proposed. The mechanism extends the state of the art and is shown to generalize remarkably well to novel data. In the third contribution, the method is used together with a novel fusion strategy and failure detection criterion to semi-automatically annotate visual and thermal videos.
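
For context, a single-channel DCF can be learned in closed form in the Fourier domain. The numpy sketch below follows the classic MOSSE-style solution; the thesis's multi-subfilter formulation, which trains one such filter per object part, is more involved.

    import numpy as np

    def train_dcf(patch, desired_response, lam=1e-2):
        # Solve, per frequency, for the filter that maps the patch to
        # a desired (typically Gaussian-shaped) correlation response;
        # lam is a regularization term that avoids division by zero.
        X = np.fft.fft2(patch)
        Y = np.fft.fft2(desired_response)
        return np.conj(X) * Y / (np.conj(X) * X + lam)

    def locate(patch, H):
        # The peak of the correlation response gives the object location.
        response = np.real(np.fft.ifft2(np.fft.fft2(patch) * H))
        return np.unravel_index(np.argmax(response), response.shape)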

Sensing systems must not only track objects but also detect them. The fourth contribution of this thesis tackles joint detection, tracking, and segmentation of all objects from a predefined set of object classes. The challenge here lies not only in the neural network design but also in the design of the offline training formulation. The final approach, a recurrent graph neural network, outperforms prior works with runtimes of the same order of magnitude.

Last, this thesis studies dynamic learning of novel visual concepts. It is observed that the learning mechanisms used for object tracking essentially learn the appearance of the tracked object. It is natural to ask whether this appearance learning could be extended beyond individual objects to entire semantic classes, enabling the system to learn new concepts from just a few training examples. Such an ability is desirable in autonomous systems, as it removes the need to manually annotate thousands of examples of each class to be recognized. Instead, the system is trained to efficiently learn to recognize new classes. In the fifth contribution, we propose a novel learning mechanism based on Gaussian process regression. With this mechanism, our neural network outperforms the state of the art, and the performance gap is especially large when multiple training examples are given.
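
As a generic illustration of the mechanism, Gaussian process regression predicts class scores for query embeddings from a handful of labeled support embeddings. The sketch below uses an RBF kernel with illustrative hyperparameters; the thesis applies the idea to dense deep features rather than the toy vectors assumed here.

    import numpy as np

    def gp_predict(X_support, y_support, X_query, noise=0.1, length=1.0):
        # Posterior mean of GP regression: K_* (K + sigma^2 I)^{-1} y.
        def rbf(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-0.5 * d2 / length ** 2)
        K = rbf(X_support, X_support) + noise * np.eye(len(X_support))
        return rbf(X_query, X_support) @ np.linalg.solve(K, y_support)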

To summarize, this thesis studies and makes several contributions to learning systems that parse dynamic visuals and that dynamically learn visual appearances or concepts.

Place, publisher, year, edition, pages
Linköping: Linköping University Electronic Press, 2022. p. 59
Series
Linköping Studies in Science and Technology. Dissertations, ISSN 0345-7524 ; 2196
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:liu:diva-181604
DOI: 10.3384/9789179291488
ISBN: 9789179291471
ISBN: 9789179291488
Public defence
2022-01-19, Ada Lovelace, B Building, Campus Valla, Linköping, 09:00 (English)
Projects
WASP Industrial PhD student
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2021-12-08 Created: 2021-12-03 Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

fulltext: FULLTEXT02.pdf, 1157 kB, application/pdf


