Mixture component identification and learning for visual recognition
All four authors: KTH, School of Computer Science and Communication (CSC), Computer Vision and Robotics, CVAP (Computer Vision Group). ORCID iD: 0000-0001-5211-6388
2012 (English) In: Computer Vision – ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI, Springer, 2012, pp. 115-128. Conference paper, published paper (Refereed)
Abstract [en]

The non-linear decision boundary between the object and background classes, caused by large intra-class variations, must be modelled by any classifier that aims for good results. While a mixture of linear classifiers can model this non-linearity, learning such a mixture from weakly annotated data is non-trivial and is the focus of this paper. Our approach is to identify the modes in the distribution of positive examples by clustering, and to use this clustering in a latent SVM formulation to learn the mixture model. The clustering relies on a robust measure of visual similarity that suppresses uninformative clutter through a novel representation based on the exemplar SVM. This careful clustering of the data yields better mixture models, as demonstrated by extensive evaluations on Pascal VOC 2007. The final classifier, using a HOG representation of the global image patch, achieves performance comparable to the state of the art while being more efficient at detection time.
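The pipeline the abstract describes, re-describing each positive example by the scores of per-exemplar SVMs, clustering in that score space, and fitting one linear classifier per cluster, can be sketched on synthetic data as follows. The k-means step, the parameter values, and the synthetic "HOG-like" features are illustrative assumptions; the paper's latent SVM refinement of the mixture is omitted here.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic features: two visual "modes" of the positive class plus background.
pos = np.vstack([rng.normal(2.0, 0.5, (20, 16)), rng.normal(-2.0, 0.5, (20, 16))])
neg = rng.normal(0.0, 0.5, (80, 16))

def exemplar_scores(pos, neg):
    # One linear SVM per positive exemplar (that exemplar vs. all negatives);
    # each positive is re-described by the scores of all exemplar SVMs.
    scores = np.zeros((len(pos), len(pos)))
    for i, x in enumerate(pos):
        clf = LinearSVC(C=1.0).fit(np.vstack([x[None], neg]),
                                   np.r_[1, np.zeros(len(neg))])
        scores[:, i] = clf.decision_function(pos)
    return scores

rep = exemplar_scores(pos, neg)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(rep)

# One linear classifier per cluster forms the mixture; at test time, take the max score.
mixture = [LinearSVC(C=1.0).fit(np.vstack([pos[labels == k], neg]),
                                np.r_[np.ones((labels == k).sum()), np.zeros(len(neg))])
           for k in range(2)]

def score(x):
    return max(m.decision_function(x[None])[0] for m in mixture)
```

The exemplar-SVM scores act as the similarity representation: dimensions of the raw feature vector that also fire on the negatives contribute little to any exemplar's decision function, which is one way to realize the clutter suppression mentioned in the abstract.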

Place, publisher, year, edition, pages
Springer, 2012. pp. 115-128
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 7577
Keywords [en]
Decision boundary, Detection time, Image patches, Intra-class variation, Linear classifiers, Mixture components, Mixture model, Non-Linearity, Non-trivial, Positive examples, Visual recognition, Visual similarity, Weakly annotated data
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-106987
DOI: 10.1007/978-3-642-33783-3_9
ISI: 000342828800009
Scopus ID: 2-s2.0-84867892975
ISBN: 978-364233782-6 (print)
OAI: oai:DiVA.org:kth-106987
DiVA id: diva2:574329
Conference
12th European Conference on Computer Vision, ECCV 2012; Florence; 7 October 2012 through 13 October 2012
Research funder
ICT - The Next Generation
Note

QC 20121207

Available from: 2012-12-05 Created: 2012-12-05 Last updated: 2018-01-12 Bibliographically reviewed
Part of thesis
1. Data Driven Visual Recognition
2014 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis is mostly about supervised visual recognition problems. Based on a general definition of categories, the contents are divided into two parts: one that models categories and one that is not category based. We are interested in data-driven solutions to both kinds of problems.

In the category-free part, we study novelty detection in temporal and spatial domains as a category-free recognition problem. Using data-driven models, we demonstrate that, based on a few reference exemplars, our methods are able to detect novelties in the ego-motions of people and changes in the static environments surrounding them.

In the category-level part, we study object recognition. We consider both object category classification and localization, and propose scalable data-driven approaches to both problems. A mixture of parametric classifiers, initialized with a sophisticated clustering of the training data, is shown to adapt to the data better than various baselines, such as the same model initialized with less carefully designed procedures. A nonparametric large-margin classifier is introduced and shown to have several advantages over its competitors: lower training and testing time costs, the ability to use indefinite/invariant and deformable similarity measures, and adaptive complexity.

We also propose a fairly realistic model of recognition problems, which quantifies the interplay between representations, classifiers, and recognition performance. Based on data-describing measures, which are aggregates of pairwise similarities of the training data, our model characterizes the distributions of training exemplars. The measures are shown to capture many aspects of the difficulty of categorization problems and to correlate significantly with the observed recognition performance. Using these measures, the model predicts the performance of particular classifiers on distributions similar to the training data; these predictions are reasonably accurate when compared to the classifiers' performance on the test sets.
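The idea above, aggregating pairwise similarities of the training data into a difficulty measure that tracks classifier performance, can be illustrated as below. The specific measure (mean intra-class minus mean inter-class cosine similarity) and the synthetic problems of increasing separability are assumptions for illustration, not the thesis's exact formulation.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split

def difficulty_measure(X, y):
    # Aggregate pairwise cosine similarities into one data-describing number.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(y), dtype=bool)
    intra = S[same & off_diag].mean()   # similarity within classes
    inter = S[~same].mean()             # similarity across classes
    return intra - inter                # larger value => easier problem

rng = np.random.default_rng(0)

def make_problem(sep):
    # Two Gaussian classes whose means are `sep` apart in every dimension.
    X = np.vstack([rng.normal(0.0, 1.0, (100, 8)), rng.normal(sep, 1.0, (100, 8))])
    y = np.r_[np.zeros(100), np.ones(100)]
    return X, y

results = []
for sep in (0.3, 1.0, 3.0):  # increasingly separable problems
    X, y = make_problem(sep)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
    acc = LinearSVC().fit(Xtr, ytr).score(Xte, yte)
    results.append((difficulty_measure(X, y), acc))
```

On these synthetic problems, the measure and the test accuracy of a fixed linear classifier rise together, which is the kind of correlation between data-describing measures and recognition performance that the paragraph describes.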

We discuss various aspects of visual recognition problems: the interplay between representations and classification tasks, how different models can better adapt to the training data, and so on. We describe and analyze the aforementioned methods, which are designed to tackle different visual recognition problems but share one common characteristic: being data driven.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2014. pp. ix, 36
Keywords
Visual Recognition, Data Driven, Supervised Learning, Mixture Models, Non-Parametric Models, Category Recognition, Novelty Detection
National subject category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-145865
ISBN: 978-91-7595-197-3
Public defence
2014-06-12, F3, Lindstedtsvägen 26, KTH, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20140604

Available from: 2014-06-04 Created: 2014-06-02 Last updated: 2018-01-11 Bibliographically reviewed
2. Visual Representations and Models: From Latent SVM to Deep Learning
2016 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Two important components of a visual recognition system are the representation and the model. Both involve selecting and learning the features that are indicative for recognition and discarding those that are uninformative. This thesis, in its general form, proposes different techniques within the frameworks of two learning systems for representation and modelling: latent support vector machines (latent SVMs) and deep learning.

First, we propose various approaches to group the positive samples into clusters of visually similar instances. Given a fixed representation, the sampled space of the positive distribution is usually structured. The proposed clustering techniques include a novel similarity measure based on exemplar learning, an approach for using additional annotation, and an augmentation of the latent SVM that automatically finds clusters whose members can be reliably distinguished from the background class.

In another effort, a strongly supervised DPM is proposed to study how these models can benefit from privileged information. The extra information comes in the form of semantic part annotations (i.e., their presence and location), which are used to constrain the DPM's latent variables during, or prior to, the optimization of the latent SVM. The approach's effectiveness is demonstrated on the task of animal detection.

Finally, we generalize the formulation of discriminative latent variable models, including DPMs, to incorporate a new set of latent variables representing the structure or properties of negative samples; we therefore term these negative latent variables. We show that this generalization affects state-of-the-art techniques and helps visual recognition by explicitly searching for counter-evidence of an object's presence.
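As an illustration (an assumed form, not necessarily the thesis's exact formulation): a standard discriminative latent variable model such as a DPM scores an image as

    f(x) = max_{z} w · Φ(x, z),

maximizing only over positive evidence z (e.g. part placements). A negative-latent-variable extension in the spirit described above would additionally search for the best-scoring counter-evidence z_n and penalize it:

    f(x) = max_{z} w · Φ(x, z) - max_{z_n} v · Ψ(x, z_n),

where Ψ and v describe hypothesized configurations that argue against the object's presence.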

Following the resurgence of deep networks, the last works of this thesis focus on deep learning as a means of producing a generic representation for visual recognition. A Convolutional Network (ConvNet) is trained on a large annotated image classification dataset, ImageNet, with about 1.3 million images. The activations at each layer of the trained ConvNet can then be treated as the representation of an input image. We show that such a representation is surprisingly effective for various recognition tasks, making it clearly superior to all the handcrafted features previously used in visual recognition (such as HOG in our first works on DPMs). We further investigate how this representation can be improved for a task at hand, and propose various factors, applied before or after the training of the representation, that can improve the efficacy of the ConvNet representation. These factors are analyzed on 16 datasets from various subfields of visual recognition.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2016. p. 172
Series
TRITA-CSC-A, ISSN 1653-5723 ; 21
Keywords
Computer Vision, Machine Learning, Artificial Intelligence, Deep Learning, Learning Representation, Deformable Part Models, Discriminative Latent Variable Models, Convolutional Networks, Object Recognition, Object Detection
National subject category
Electrical Engineering and Electronics; Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-192289
ISBN: 978-91-7729-110-7
Public defence
2016-09-27, Kollegiesalen, Brinellvägen 8, KTH-huset, floor 4, KTH Campus, Stockholm, 15:26 (English)
Opponent
Supervisors
Note

QC 20160908

Available from: 2016-09-08 Created: 2016-09-08 Last updated: 2016-09-09 Bibliographically reviewed

Open Access in DiVA

Full text is not available in DiVA

Other links

Publisher's full text | Scopus

Search further in DiVA

By the author/editor
Aghazadeh, Omid; Azizpour, Hossein; Sullivan, Josephine; Carlsson, Stefan
By the organisation
Computer Vision and Robotics, CVAP
Computer Vision and Robotics (Autonomous Systems)
