Multi-Modal Scene Understanding for Robotic Grasping
KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Autonomous Systems, CAS.
2011 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Current robotics research is largely driven by the vision of creating an intelligent being that can perform dangerous, difficult or unpopular tasks. These can, for example, be exploring the surface of planet Mars or the bottom of the ocean, maintaining a furnace or assembling a car. They can also be more mundane, such as cleaning an apartment or fetching groceries. This vision has been pursued since the 1960s, when the first robots were built. Some of the tasks mentioned above, especially those in industrial manufacturing, are already frequently performed by robots. Others are still completely out of reach. Household robots, in particular, are far from being deployable as general-purpose devices. Although advancements have been made in this research area, robots are not yet able to perform household chores robustly in unstructured and open-ended environments given unexpected events and uncertainty in perception and execution.

In this thesis, we analyze which perceptual and motor capabilities are necessary for the robot to perform common tasks in a household scenario. In that context, an essential capability is to understand the scene that the robot has to interact with. This involves separating objects from the background but also from each other. Once this is achieved, many other tasks become much easier: the configuration of objects can be determined; they can be identified or categorized; their pose can be estimated; free and occupied space in the environment can be outlined. This kind of scene model can then inform grasp planning algorithms to finally pick up objects. However, scene understanding is not a trivial problem and even state-of-the-art methods may fail.

Given an incomplete, noisy and potentially erroneously segmented scene model, the questions remain how suitable grasps can be planned and how they can be executed robustly. In this thesis, we propose to equip the robot with a set of prediction mechanisms that allow it to hypothesize about parts of the scene it has not yet observed. Additionally, the robot can quantify how uncertain it is about this prediction, allowing it to plan actions for exploring the scene at specifically uncertain places. We consider multiple modalities including monocular and stereo vision, haptic sensing and information obtained through a human-robot dialog system. We also study several scene representations of different complexity and their applicability to a grasping scenario. Given an improved scene model from this multi-modal exploration, grasps can be inferred for each object hypothesis. Depending on whether the objects are known, familiar or unknown, different methodologies for grasp inference apply. In this thesis, we propose novel methods for each of these cases. Furthermore, we demonstrate the execution of these grasps both in a closed- and open-loop manner, showing the effectiveness of the proposed methods in real-world scenarios.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2011. vi, 194 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2011:17
Keyword [en]
Robotics, Grasping, Manipulation, Computer Vision, Machine Learning
Identifiers
URN: urn:nbn:se:kth:diva-49062
ISBN: 978-91-7501-184-4
OAI: diva2:459199
Public defence
2011-12-16, D2, Lindstedtsvägen 5, KTH, Stockholm, 10:00 (English)
EU, FP7, Seventh Framework Programme, IST-FP7-IP-215821, ICT - The Next Generation

QC 20111125

Available from: 2011-11-25. Created: 2011-11-25. Last updated: 2013-04-15. Bibliographically approved.

Open Access in DiVA

thesis_bohg.pdf (6718 kB), 982 downloads
File information
File name: FULLTEXT01.pdf
File size: 6718 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Bohg, Jeannette
By organisation
Computer Vision and Active Perception, CVAP; Centre for Autonomous Systems, CAS

