Action Recognition for Robot Learning
Pieropan, Alessandro. KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. ORCID iD: 0000-0003-2314-2880
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis builds on the observation that robots cannot be programmed to handle every possible situation in the world. Like humans, they need mechanisms to deal with previously unseen situations and unknown objects. One of the skills humans rely on to deal with the unknown is the ability to learn by observing others. This thesis addresses the challenge of enabling a robot to learn from a human instructor. In particular, it focuses on objects. How can a robot find previously unseen objects? How can it track an object with its gaze? How can an object be employed in activities? Throughout this thesis, these questions are addressed with the end goal of allowing a robot to observe a human instructor and learn how to perform an activity. The robot is assumed to know very little about the world and must discover objects autonomously. Given a visual input, object hypotheses are formulated by leveraging common contextual knowledge often used by humans (e.g. gravity, compactness, convexity). Moreover, unknown objects are tracked and their appearance is updated over time, since initially only a small fraction of the object is visible to the robot. Finally, object functionality is inferred by observing how the human instructor manipulates objects and how objects are used in relation to others. All the methods included in this thesis have been evaluated on datasets that are publicly available or that we collected, showing the importance of these learning abilities.
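
As an illustration of the kind of contextual cues mentioned above (compactness, convexity), the following minimal sketch scores a candidate segment with two simple shape cues. It is an assumed example rather than code from the thesis; the cue definitions, weights, and the toy masks are placeholders.

```python
# Illustrative sketch (assumption, not the thesis' method): scoring a segment
# hypothesis with simple shape cues such as convexity and compactness.
import numpy as np
from scipy.spatial import ConvexHull

def convexity(pixels):
    """Solidity cue: segment area (pixel count) over convex hull area;
    close to 1.0 for convex, blob-like segments, lower for concave ones."""
    hull = ConvexHull(pixels)          # in 2D, hull.volume is the hull area
    return len(pixels) / max(hull.volume, 1.0)

def compactness(pixels):
    """Isoperimetric cue 4*pi*area / perimeter^2, using the pixel count as the
    area and the convex hull length as a cheap perimeter proxy; roughly 1.0
    for a disc, much smaller for elongated regions."""
    hull = ConvexHull(pixels)          # in 2D, hull.area is the hull perimeter
    return 4.0 * np.pi * len(pixels) / max(hull.area ** 2, 1e-9)

def hypothesis_score(pixels, w_convex=0.5, w_compact=0.5):
    """Weighted combination of the cues into a single plausibility score."""
    return w_convex * min(convexity(pixels), 1.0) + w_compact * compactness(pixels)

if __name__ == "__main__":
    yy, xx = np.mgrid[0:60, 0:60]
    disc = (yy - 30) ** 2 + (xx - 30) ** 2 < 400      # compact, convex blob
    lshape = (xx < 10) | (yy > 50)                     # concave L-shaped region
    for name, mask in [("disc", disc), ("L-shape", lshape)]:
        pts = np.column_stack(np.nonzero(mask)).astype(float)
        print(name, round(hypothesis_score(pts), 2))
```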

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. v, 38 p.
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2015:09
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-165680
OAI: oai:DiVA.org:kth-165680
DiVA: diva2:808750
Public defence
2015-05-21, F3, Lindstedtsvägen 26, KTH, Stockholm, 10:00 (English)
Note

QC 20150504

Available from: 2015-05-04. Created: 2015-04-29. Last updated: 2015-05-04. Bibliographically approved.
List of papers
1. Unsupervised object exploration using context
2014 (English). In: The 23rd IEEE International Symposium on Robot and Human Interactive Communication, 2014 RO-MAN, IEEE conference proceedings, 2014, -506 p. Conference paper, Published paper (Refereed)
Abstract [en]

In order for robots to function in unstructured environments and in interaction with humans, they must be able to reason about the world in a semantically meaningful way. An essential capability is to segment the world into semantically plausible object hypotheses. In this paper we propose a general framework which can be used for reasoning about objects and their functionality in manipulation activities. Our system employs a hierarchical segmentation framework that extracts object hypotheses from RGB-D video. Motivated by cognitive studies on humans, our work leverages contextual information, e.g., that objects obey the laws of physics, to formulate object hypotheses from regions in a mathematically principled manner.
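
A minimal sketch of how coarse RGB-D patches could be merged bottom-up into object hypotheses, in the spirit of the hierarchical segmentation described above. The patch size, the feature choice (mean colour and depth), and the tolerances are assumptions for illustration, not the paper's actual pipeline.

```python
# Illustrative sketch (assumption): greedy merging of RGB-D patches into regions.
import numpy as np

def patch_features(rgb, depth, size=16):
    """Split an aligned RGB-D frame into size x size patches and return the
    per-patch mean colour and mean depth as a (rows, cols, 4) feature grid."""
    h, w = depth.shape
    rows, cols = h // size, w // size
    feats = np.zeros((rows, cols, 4))
    for r in range(rows):
        for c in range(cols):
            ys, xs = slice(r * size, (r + 1) * size), slice(c * size, (c + 1) * size)
            feats[r, c, :3] = rgb[ys, xs].reshape(-1, 3).mean(axis=0)
            feats[r, c, 3] = depth[ys, xs].mean()
    return feats

def merge_patches(feats, colour_tol=20.0, depth_tol=0.05):
    """Union-find merging of 4-connected patches whose mean colour and depth
    are close; each resulting label is one region-level object hypothesis."""
    rows, cols, _ = feats.shape
    parent = list(range(rows * cols))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for r in range(rows):
        for c in range(cols):
            for r2, c2 in ((r, c + 1), (r + 1, c)):
                if r2 >= rows or c2 >= cols:
                    continue
                d_colour = np.linalg.norm(feats[r, c, :3] - feats[r2, c2, :3])
                d_depth = abs(feats[r, c, 3] - feats[r2, c2, 3])
                if d_colour < colour_tol and d_depth < depth_tol:
                    parent[find(r * cols + c)] = find(r2 * cols + c2)

    return np.array([find(i) for i in range(rows * cols)]).reshape(rows, cols)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.integers(0, 255, size=(64, 64, 3)).astype(float)
    depth = np.ones((64, 64)); depth[:, 32:] = 2.0   # two depth layers
    print(merge_patches(patch_features(rgb, depth)))
```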

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-158006 (URN)
10.1109/ROMAN.2014.6926302 (DOI)
978-1-4799-6763-6 (ISBN)
Conference
International Symposium on Robot and Human Interactive Communication, 25-29 August 2014, Edinburgh, Scotland, UK
Note

QC 20150122

Available from: 2014-12-18. Created: 2014-12-18. Last updated: 2015-05-04. Bibliographically approved.
2. Robust 3D tracking of unknown objects
(English). Manuscript (preprint) (Other academic)
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-165777 (URN)
Note

QS 2015

Available from: 2015-04-29. Created: 2015-04-29. Last updated: 2015-05-04. Bibliographically approved.
3. Functional Object Descriptors for Human Activity Modeling
2013 (English). In: 2013 IEEE International Conference on Robotics and Automation (ICRA), IEEE conference proceedings, 2013, 1282-1289 p. Conference paper, Published paper (Refereed)
Abstract [en]

The ability to learn from human demonstration is essential for robots in human environments. The activity models that the robot builds from observation must take both the human motion and the objects involved into account. Object models designed for this purpose should reflect the role of the object in the activity: its function, or affordances. The main contribution of this paper is to represent objects directly in terms of their interaction with human hands, rather than in terms of appearance. This enables the direct representation of object affordances/function, while being robust to intra-class differences in appearance. Object hypotheses are first extracted from a video sequence as tracks of associated image segments. The object hypotheses are encoded as strings, where the vocabulary corresponds to different types of interaction with human hands. The similarity between two such object descriptors can be measured using a string kernel. Experiments show these functional descriptors to capture differences and similarities in object affordances/function that are not represented by appearance.
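
To make the string-based encoding concrete, here is a small sketch in which each object track is written as a string over a hypothetical interaction alphabet and compared with a p-spectrum string kernel. The alphabet, the example strings, and the choice of spectrum kernel are assumptions for illustration; the paper's actual vocabulary and kernel may differ.

```python
# Illustrative sketch (assumed alphabet and kernel, not the paper's exact setup).
from collections import Counter

# Hypothetical interaction alphabet, one symbol per frame-level hand-object state:
# 'i' = idle, 'a' = hand approaching, 'g' = grasped, 'm' = moved, 'r' = released
cup_track = "iiiaaggmmmmriii"
hammer_track = "iiaagmrmrmriiii"

def spectrum_kernel(s: str, t: str, p: int = 3) -> float:
    """p-spectrum kernel: dot product of substring-of-length-p count vectors."""
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return float(sum(cs[k] * ct[k] for k in cs if k in ct))

def normalised_kernel(s: str, t: str, p: int = 3) -> float:
    """Cosine-normalised kernel value in [0, 1], robust to sequence length."""
    k = spectrum_kernel(s, t, p)
    return k / (spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p)) ** 0.5

print(normalised_kernel(cup_track, hammer_track))
```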

Place, publisher, year, edition, pages
IEEE conference proceedings, 2013
Series
IEEE International Conference on Robotics and Automation, ISSN 1050-4729
Keyword
Activity models, Functional object, Human activities, Human demonstrations, Human environment, Image segments, Object descriptors, Video sequences
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-138526 (URN)
10.1109/ICRA.2013.6630736 (DOI)
000337617301042
2-s2.0-84887281861 (Scopus ID)
978-1-4673-5641-1 (ISBN)
Conference
2013 IEEE International Conference on Robotics and Automation, ICRA 2013; Karlsruhe; Germany; 6 May 2013 through 10 May 2013
Note

QC 20140107

Available from: 2013-12-19. Created: 2013-12-19. Last updated: 2015-05-04. Bibliographically approved.
4. Recognizing Object Affordances in Terms of Spatio-Temporal Object-Object Relationships
2014 (English). In: Humanoid Robots (Humanoids), 2014 14th IEEE-RAS International Conference on, IEEE conference proceedings, 2014, 52-58 p. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we describe a probabilistic framework that models the interaction between multiple objects in a scene. We present a spatio-temporal feature encoding the pairwise interactions between the objects in the scene. Using a kernel representation, we embed object interactions in a vector space, which allows us to define a metric comparing interactions of different temporal extent. Using this metric, we define a probabilistic model which allows us to represent and extract the affordances of individual objects based on the structure of their interactions. In this paper we focus on pairwise relationships, but the model can naturally be extended to incorporate additional cues related to a single object or to multiple objects. We compare our approach with traditional kernel approaches and show a significant improvement.
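
A sketch of one possible way to compare pairwise object interactions of different temporal extent: discretise each frame's spatial relation, summarise the sequence as a normalised transition histogram, and compare histograms with an RBF kernel. The relation vocabulary, thresholds, and kernel choice are illustrative assumptions, not the paper's formulation.

```python
# Illustrative sketch (assumption): length-invariant comparison of object-pair
# interactions via qualitative relations and a transition histogram.
import numpy as np

RELATIONS = ["far", "near", "touching", "above"]

def relation(dist: float, dz: float) -> str:
    """Map a pairwise distance and vertical offset (metres) to a symbol."""
    if dz > 0.05 and dist < 0.15:
        return "above"
    if dist < 0.02:
        return "touching"
    if dist < 0.15:
        return "near"
    return "far"

def transition_histogram(seq):
    """Normalised histogram over relation transitions; independent of length."""
    idx = {r: i for i, r in enumerate(RELATIONS)}
    h = np.zeros((len(RELATIONS), len(RELATIONS)))
    for a, b in zip(seq, seq[1:]):
        h[idx[a], idx[b]] += 1
    return (h / max(h.sum(), 1)).ravel()

def rbf_kernel(x, y, gamma=5.0):
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

# Toy usage: a "pouring-like" and a shorter "passing-by" interaction.
pour_frames = [(0.40, 0.0), (0.12, 0.0), (0.10, 0.08), (0.10, 0.08), (0.12, 0.0), (0.40, 0.0)]
pass_frames = [(0.40, 0.0), (0.12, 0.0), (0.12, 0.0), (0.40, 0.0)]
pour = [relation(d, dz) for d, dz in pour_frames]
pass_by = [relation(d, dz) for d, dz in pass_frames]
print(rbf_kernel(transition_histogram(pour), transition_histogram(pass_by)))
```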

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-158008 (URN)
10.1109/HUMANOIDS.2014.7041337 (DOI)
2-s2.0-84945185392 (Scopus ID)
Conference
International Conference on Humanoid Robots, 18-20 November 2014, Madrid, Spain
Note

QC 20141223

Available from: 2014-12-18. Created: 2014-12-18. Last updated: 2015-05-04. Bibliographically approved.
5. Audio-Visual Classification and Detection of Human Manipulation Actions
2014 (English). In: 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), IEEE conference proceedings, 2014, 3045-3052 p. Conference paper, Published paper (Refereed)
Abstract [en]

Humans are able to merge information from multiple perceptual modalities and formulate a coherent representation of the world. Our thesis is that robots need to do the same in order to operate robustly and autonomously in an unstructured environment. It has also been shown in several fields that multiple sources of information can complement each other, overcoming the limitations of a single perceptual modality. Hence, in this paper we introduce a data set of actions that includes both visual data (RGB-D video and 6DOF object pose estimation) and acoustic data. We also propose a method for recognizing and segmenting actions from continuous audio-visual data. The proposed method is employed for an extensive evaluation of the descriptive power of the two modalities, and we discuss how they can be used jointly to infer a coherent interpretation of the recorded action.
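
As a simple example of joint use of the two modalities, the sketch below fuses per-frame class posteriors from an assumed audio model and an assumed visual model by weighted log-probability combination. This is a generic late-fusion scheme shown for illustration, not the method evaluated in the paper; the action names and weights are placeholders.

```python
# Illustrative sketch (assumption): late fusion of audio and visual class posteriors.
import numpy as np

ACTIONS = ["pour", "stir", "chop"]

def fuse(p_audio: np.ndarray, p_visual: np.ndarray, w_audio: float = 0.4):
    """p_audio, p_visual: (frames, classes) posteriors from each modality.
    Returns the fused posteriors and the per-frame action labels."""
    log_p = w_audio * np.log(p_audio + 1e-9) + (1 - w_audio) * np.log(p_visual + 1e-9)
    fused = np.exp(log_p)
    fused /= fused.sum(axis=1, keepdims=True)
    return fused, [ACTIONS[i] for i in fused.argmax(axis=1)]

# Toy usage: audio is confident about "chop" (impact sounds), vision about "stir".
p_a = np.array([[0.2, 0.1, 0.7], [0.2, 0.2, 0.6]])
p_v = np.array([[0.1, 0.8, 0.1], [0.2, 0.7, 0.1]])
print(fuse(p_a, p_v)[1])
```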

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
Keyword
Acoustic data, Audio-visual, Audio-visual data, Coherent representations, Human manipulation, Multiple source, Unstructured environments, Visual data
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-158004 (URN)
10.1109/IROS.2014.6942983 (DOI)
000349834603023
2-s2.0-84911478073 (Scopus ID)
978-1-4799-6934-0 (ISBN)
Conference
2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2014, Palmer House Hilton Hotel Chicago, United States, 14 September 2014 through 18 September 2014
Note

QC 20150122

Available from: 2014-12-18. Created: 2014-12-18. Last updated: 2015-05-04. Bibliographically approved.

Open Access in DiVA

Thesis (9743 kB), 217 downloads
File information
File name: FULLTEXT02.pdf. File size: 9743 kB. Checksum: SHA-512
924cbd66b0855ae1a88755f2654da083bcd2be5d002102012c519064988845b5219402b596819a74c1e936f29b79da833a49a0250091230cf3432057c7193ccf
Type: fulltext. Mimetype: application/pdf

By author/editor
Pieropan, Alessandro
By organisation
Computer Vision and Active Perception, CVAP
Computer Vision and Robotics (Autonomous Systems)
