Data Driven Visual Recognition
KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. (Computer Vision Group)
2014 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis is mostly about supervised visual recognition problems. Based on a general definition of categories, the contents are divided into two parts: one that models categories and one that is category-free. We are interested in data-driven solutions to both kinds of problems.

In the category-free part, we study novelty detection in the temporal and spatial domains as a category-free recognition problem. Using data-driven models, we demonstrate that, based on a few reference exemplars, our methods are able to detect novelties in the ego-motions of people and changes in the static environments surrounding them.
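The reference-exemplar idea can be sketched as a nearest-neighbour novelty score. This is only an illustrative stand-in on made-up feature vectors, not the models actually used in the thesis:

```python
import numpy as np

def novelty_scores(queries, references):
    """Score each query by its distance to the nearest reference exemplar.

    queries: (n, d) array of query feature vectors.
    references: (m, d) array of reference exemplars.
    Returns an (n,) array; larger scores indicate more novel inputs.
    """
    # Pairwise Euclidean distances between every query and every reference
    dists = np.linalg.norm(queries[:, None, :] - references[None, :, :], axis=2)
    return dists.min(axis=1)

references = np.array([[0.0, 0.0], [1.0, 1.0]])  # a few reference exemplars
queries = np.array([[0.1, 0.0], [5.0, 5.0]])
scores = novelty_scores(queries, references)
is_novel = scores > 1.0  # threshold chosen only for illustration
```

A query is flagged as novel when no reference exemplar lies within the threshold; the thesis operates on richer temporal and spatial representations than these toy vectors.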

In the category-level part, we study object recognition. We consider both object category classification and localization, and propose scalable data-driven approaches to both problems. A mixture of parametric classifiers, initialized with a sophisticated clustering of the training data, is demonstrated to adapt to the data better than various baselines, such as the same model initialized with less carefully designed procedures. A nonparametric large margin classifier is introduced and demonstrated to have a multitude of advantages over its competitors: lower training and testing time costs, the ability to make use of indefinite/invariant and deformable similarity measures, and adaptive complexity are the main features of the proposed model.

We also propose a rather realistic model of recognition problems, which quantifies the interplay between representations, classifiers, and recognition performance. Based on data-describing measures, which are aggregates of pairwise similarities of the training data, our model characterizes and describes the distributions of training exemplars. The measures are shown to capture many aspects of the difficulty of categorization problems and to correlate significantly with the observed recognition performance. Utilizing these measures, the model predicts the performance of particular classifiers on distributions similar to the training data. These predictions, when compared to the performance of the classifiers on the test sets, are reasonably accurate.
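One way to read "aggregates of pairwise similarities" is as a separability statistic over the training set. The following is only an illustrative stand-in (a Gaussian similarity and an intra- minus inter-class mean), not the measures actually used in the thesis:

```python
import numpy as np

def separability(X, y):
    """Mean intra-class similarity minus mean inter-class similarity.

    X: (n, d) training features; y: (n,) class labels.
    Larger values suggest an easier categorization problem.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-sq_dists)                   # Gaussian pairwise similarity
    same = y[:, None] == y[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)  # ignore self-similarity
    return S[same & off_diag].mean() - S[~same].mean()

y = np.array([0, 0, 1, 1])
easy = separability(np.array([[0.0], [0.1], [5.0], [5.1]]), y)  # separated classes
hard = separability(np.array([[0.0], [1.0], [0.5], [1.5]]), y)  # overlapping classes
```

A statistic of this kind can then be regressed against observed classifier accuracy to predict performance on similar distributions, which is the spirit of the thesis's model.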

We discuss various aspects of visual recognition problems, such as the interplay between representations and classification tasks, and how different models can better adapt to the training data. We describe and analyze the aforementioned methods, which are designed to tackle different visual recognition problems but share one common characteristic: being data driven.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2014. ix, 36 p.
Keyword [en]
Visual Recognition, Data Driven, Supervised Learning, Mixture Models, Non-Parametric Models, Category Recognition, Novelty Detection
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-145865
ISBN: 978-91-7595-197-3
OAI: oai:DiVA.org:kth-145865
DiVA: diva2:720768
Public defence
2014-06-12, F3, Lindstedtsvägen 26, KTH, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20140604

Available from: 2014-06-04. Created: 2014-06-02. Last updated: 2017-02-22. Bibliographically approved.
List of papers
1. Novelty Detection from an Ego-Centric perspective
2011 (English). In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2011, pp. 3297-3304. Conference paper, published paper (refereed).
Abstract [en]

This paper demonstrates a system for the automatic extraction of novelty in images captured from a small video camera attached to a subject's chest, replicating the subject's visual perspective while performing activities that are repeated daily. Novelty is detected when a (sub)sequence cannot be registered to previously stored sequences captured while performing the same daily activity. Sequence registration is performed by measuring the appearance and geometric similarity of individual frames and exploiting the invariant temporal order of the activity. Experimental results demonstrate that this is a robust way to detect novelties induced by variations in the wearer's ego-motion, such as stopping and talking to a person. This is an essentially new and generic way of automatically extracting information of interest to the camera wearer, and it can be used as input to a system for life logging or memory support.
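Order-preserving sequence registration of this kind can be sketched with a dynamic-time-warping alignment cost; a query subsequence with high cost against every stored reference would signal novelty. This is a minimal sketch on toy per-frame features, not the paper's actual registration pipeline:

```python
import numpy as np

def align_cost(query, reference):
    """Length-normalized DTW cost of aligning a query frame sequence to a
    reference sequence, preserving the temporal order of frames.

    Each frame is a feature vector; the per-frame cost is Euclidean distance.
    """
    n, m = len(query), len(reference)
    D = np.full((n + 1, m + 1), np.inf)  # cumulative alignment cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = np.linalg.norm(query[i - 1] - reference[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

reference = np.array([[0.0], [1.0], [2.0]])                    # a stored sequence
repeat_cost = align_cost(np.array([[0.0], [1.0], [2.0]]), reference)
novel_cost = align_cost(np.array([[5.0], [6.0], [7.0]]), reference)
```

A repeated activity aligns cheaply to its stored reference, while a novel ego-motion does not, so thresholding the alignment cost yields a novelty detector.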

Series
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-38873
DOI: 10.1109/CVPR.2011.5995731
000295615803073
Scopus ID: 2-s2.0-80052890189
ISBN: 978-145770394-2
Conference
2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011; Colorado Springs, CO; 20 June 2011 through 25 June 2011
Projects
VINST
Funder
ICT - The Next Generation
Note
QC 20111012
Available from: 2011-09-01. Created: 2011-09-01. Last updated: 2014-06-04. Bibliographically approved.
2. Multi view registration for novelty/background separation
2012 (English). In: Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, IEEE Computer Society, 2012, pp. 757-764. Conference paper, published paper (refereed).
Abstract [en]

We propose a system for the automatic segmentation of novelties from the background in scenarios where multiple images of the same environment are available, e.g. obtained by wearable visual cameras. Our method finds the pixels in a query image corresponding to the underlying background environment by comparing it to reference images of the same scene. This is achieved despite the fact that all the images may have different viewpoints, significantly different illumination conditions, and different objects (cars, people, bicycles, etc.) occluding the background. We estimate the probability of each pixel in the query image belonging to the background by computing its appearance inconsistency with the multiple reference images. We then produce multiple segmentations of the query image using an iterated graph cuts algorithm, initialized from these estimated probabilities, and combine these segmentations to come up with a final segmentation of the background. Detection of the background in turn highlights the novel pixels. We demonstrate the effectiveness of our approach on a challenging outdoor data set.
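The per-pixel consistency idea can be sketched as follows, assuming the reference images are already registered to the query; the paper's iterated graph-cut refinement is omitted here:

```python
import numpy as np

def background_probability(query, references, sigma=10.0):
    """Per-pixel probability of belonging to the background.

    query: (H, W) grayscale image; references: list of registered (H, W)
    images of the same scene. A pixel whose appearance is consistent with
    at least one reference image gets a high background probability.
    """
    stack = np.stack(references)                            # (K, H, W)
    # Inconsistency = distance to the best-matching reference, per pixel
    inconsistency = np.abs(stack - query[None, :, :]).min(axis=0)
    return np.exp(-inconsistency / sigma)

references = [np.zeros((2, 2)), np.full((2, 2), 50.0)]      # toy 2x2 scene
query = np.array([[0.0, 50.0], [200.0, 0.0]])               # one novel pixel
p_bg = background_probability(query, references)
novel_mask = p_bg < 0.5                                     # novelty = not background
```

Taking the minimum over references makes the estimate robust to illumination changes captured by any one reference; only pixels inconsistent with all references are flagged as novel.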

Place, publisher, year, edition, pages
IEEE Computer Society, 2012
Series
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919
Keyword
Automatic segmentations, Background environment, Data sets, Graph cut, Illumination conditions, Multi-view registration, Multiple image, Multiple reference images, Multiple segmentation, Query images, Reference image, Computer vision, Pixels, Image segmentation
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-105314
DOI: 10.1109/CVPR.2012.6247746
000309166200095
Scopus ID: 2-s2.0-84866662308
ISBN: 978-146731226-4
Conference
2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, 16 June 2012 through 21 June 2012, Providence, RI
Funder
ICT - The Next Generation
Note

QC 20121121

Available from: 2012-11-21. Created: 2012-11-20. Last updated: 2014-06-04. Bibliographically approved.
3. Mixture component identification and learning for visual recognition
2012 (English). In: Computer Vision – ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI, Springer, 2012, pp. 115-128. Conference paper, published paper (refereed).
Abstract [en]

The non-linear decision boundary between object and background classes - due to large intra-class variations - needs to be modelled by any classifier wishing to achieve good results. While a mixture of linear classifiers is capable of modelling this non-linearity, learning this mixture from weakly annotated data is non-trivial and is the paper's focus. Our approach is to identify the modes in the distribution of our positive examples by clustering, and to utilize this clustering in a latent SVM formulation to learn the mixture model. The clustering relies on a robust measure of visual similarity which suppresses uninformative clutter by using a novel representation based on the exemplar SVM. This subtle clustering of the data leads to learning better mixture models, as is demonstrated via extensive evaluations on Pascal VOC 2007. The final classifier, using a HOG representation of the global image patch, achieves performance comparable to the state-of-the-art while being more efficient at detection time.
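The cluster-then-train recipe can be sketched as follows. This toy version uses plain k-means on the positives and a simple mean-difference linear classifier per component, standing in for the paper's exemplar-SVM-based clustering and latent SVM training:

```python
import numpy as np

def fit_mixture(pos, neg, k, iters=20, seed=0):
    """Mixture of linear classifiers learned by clustering the positives.

    Each cluster of positive examples gets its own linear component
    separating it from the negative mean; the test-time score is the max
    over components, so the overall decision boundary is piecewise linear.
    """
    rng = np.random.default_rng(seed)
    centers = pos[rng.choice(len(pos), size=k, replace=False)]
    for _ in range(iters):  # plain k-means on the positive examples
        labels = np.linalg.norm(pos[:, None] - centers[None], axis=2).argmin(1)
        centers = np.array([pos[labels == j].mean(0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    mu_neg = neg.mean(0)
    components = []
    for j in range(k):
        w = centers[j] - mu_neg                  # mean-difference direction
        b = -w @ (centers[j] + mu_neg) / 2.0     # boundary at the midpoint
        components.append((w, b))
    return components

def mixture_score(x, components):
    return max(w @ x + b for w, b in components)

pos = np.array([[4.0, 0.0], [4.2, 0.0], [-4.0, 0.0], [-4.2, 0.0]])  # two modes
neg = np.array([[0.0, 0.0], [0.2, 0.0], [-0.2, 0.0]])
components = fit_mixture(pos, neg, k=2)
```

No single linear classifier separates these two positive modes from the negatives lying between them, while the two-component mixture does, which is the non-linearity argument made in the abstract.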

Place, publisher, year, edition, pages
Springer, 2012
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 7577
Keyword
Decision boundary, Detection time, Image patches, Intra-class variation, Linear classifiers, Mixture components, Mixture model, Non-Linearity, Non-trivial, Positive examples, Visual recognition, Visual similarity, Weakly annotated data
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-106987
DOI: 10.1007/978-3-642-33783-3_9
000342828800009
Scopus ID: 2-s2.0-84867892975
ISBN: 978-364233782-6
Conference
12th European Conference on Computer Vision, ECCV 2012; Florence; 7 October 2012 through 13 October 2012
Funder
ICT - The Next Generation
Note

QC 20121207

Available from: 2012-12-05. Created: 2012-12-05. Last updated: 2016-09-08. Bibliographically approved.
4. Properties of Datasets Predict the Performance of Classifiers
2013 (English). Manuscript (preprint) (Other academic)
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-145982 (URN)
Note

QS 2014

Available from: 2014-06-04. Created: 2014-06-04. Last updated: 2014-06-04. Bibliographically approved.
5. Large Scale, Large Margin Classification using Indefinite Similarity Measures
(English). Manuscript (preprint) (Other academic)
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-145979 (URN)
Note

QS 2014

Available from: 2014-06-04. Created: 2014-06-04. Last updated: 2014-06-04. Bibliographically approved.

Open Access in DiVA

Thesis (6092 kB), 367 downloads
File information
File name: FULLTEXT02.pdf. File size: 6092 kB. Checksum: SHA-512.
506e40a42331d74209897ac0ee69c0d51a51ed61bf4d272b5535c738294aa62cb062aa885ec109a62a2e5225d738ba8bbebe26d0751f48495a3e202953717b75
Type: fulltext. Mimetype: application/pdf.

Other links

http://www.csc.kth.se/~omida/PhD_thesis.pdf

Search in DiVA

By author/editor
Aghazadeh, Omid
By organisation
Computer Vision and Active Perception, CVAP
Computer Vision and Robotics (Autonomous Systems)

