It has been demonstrated in a number of robotic areas how the use of virtual fixtures improves task performance both in terms of execution time and overall precision [1]. However, the fixtures are typically inflexible, resulting in degraded performance in cases of unexpected obstacles or incorrect fixture models. In this paper, we propose the use of adaptive virtual fixtures that enable us to cope with the above problems. A teleoperative or human-machine collaborative setting is assumed, with the core idea of dividing the task that the operator is executing into several subtasks. The operator may remain in each of these subtasks as long as necessary and switch freely between them. Hence, rather than executing a predefined plan, the operator has the ability to avoid unforeseen obstacles and deviate from the model. In our system, the probability that the user is following a certain trajectory (subtask) is estimated and used to automatically adjust the compliance. Thus, an on-line decision of how to fixture the movement is provided.
Acquiring, representing and modeling human skills is one of the key research areas in teleoperation, programming-by-demonstration and human-machine collaborative settings. One of the common approaches is to divide the task that the operator is executing into several subtasks in order to provide manageable modeling. In this paper we consider the use of a Layered Hidden Markov Model (LHMM) to model human skills. We evaluate a gestem classifier that classifies motions into basic action-primitives, or gestems. The gestem classifiers are then used in an LHMM to model a simulated teleoperated task. We investigate the online and offline classification performance with respect to noise, number of gestems, type of HMM and the available number of training sequences. We also apply the LHMM to data recorded during the execution of a trajectory-tracking task in 2D and 3D with a robotic manipulator in order to give qualitative as well as quantitative results for the proposed approach. The results indicate that the LHMM is suitable for modeling teleoperative trajectory-tracking tasks and that the difference in classification performance between one- and multi-dimensional HMMs for gestem classification is small. It can also be seen that the LHMM is robust with respect to misclassifications in the underlying gestem classifiers.
Acquiring, representing and modelling human skills is one of the key research areas in teleoperation, programming-by-demonstration and human-machine collaborative settings. The problems are challenging mainly because of the lack of a general mathematical model to describe human skills. One of the common approaches is to divide the task that the operator is executing into several subtasks or low-level subsystems in order to provide manageable modelling. In this paper we consider the use of a Layered Hidden Markov Model (LHMM) to model human skills. We evaluate a gesteme classifier that classifies motions into basic action-primitives, or gestemes. The gesteme classifiers are then used in an LHMM to model a teleoperated task. The proposed methodology uses three different HMM models at the gesteme level: a one-dimensional HMM, a multi-dimensional HMM and a multi-dimensional HMM with Fourier transform. The online and off-line classification performance of these three models is evaluated with respect to the number of gestemes, the influence of the number of training samples, the effect of noise and the effect of the number of observation symbols. We also apply the LHMM to data recorded during the execution of a trajectory-tracking task in 2D and 3D with a mobile manipulator in order to provide qualitative as well as quantitative results for the proposed approach. The results indicate that the LHMM is suitable for modelling teleoperative trajectory-tracking tasks and that the difference in classification performance between one- and multi-dimensional HMMs for gesteme classification is small. It can also be seen that the LHMM is robust with respect to misclassifications in the underlying gesteme classifiers.
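As an illustration of the gesteme-level layer described above, the following minimal sketch trains one HMM per gesteme and labels new motion segments by maximum likelihood; the upper LHMM layer would then run over the resulting label stream. It assumes the hmmlearn library, and the features and model sizes are illustrative rather than those used in the papers.

```python
# Minimal sketch of the lower (gesteme) layer of an LHMM-style classifier.
# Assumes the hmmlearn library; data, features and model sizes are illustrative.
import numpy as np
from hmmlearn import hmm

def train_gesteme_models(segments_per_class, n_states=3):
    """Fit one Gaussian HMM per gesteme from lists of (T_i, d) motion segments."""
    models = {}
    for label, segments in segments_per_class.items():
        X = np.vstack(segments)                # stack all segments of this class
        lengths = [len(s) for s in segments]   # per-segment lengths for hmmlearn
        m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=50)
        m.fit(X, lengths)
        models[label] = m
    return models

def classify_segment(models, segment):
    """Label a new motion segment by the HMM with the highest log-likelihood."""
    return max(models, key=lambda label: models[label].score(segment))

# The upper LHMM layer would run a second HMM over the stream of gesteme labels
# produced by classify_segment in order to recognize subtasks.
```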
Probabilistic roadmap methods (PRMs) have been successfully used to solve difficult path planning problems, but their efficiency is limited when the free space contains narrow passages through which the robot must pass. This paper presents a new sampling scheme that aims to increase the probability of finding paths through narrow passages. Here, a biased sampling scheme is used to increase the density of nodes in narrow regions of the free space. A partial computation of the artificial potential field is used to bias the distribution of nodes.
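The biased-sampling idea can be illustrated with a short sketch: collision-free samples are accepted with a probability that grows as their clearance shrinks, which concentrates roadmap nodes inside narrow passages. The bounds, clearance and is_free helpers are hypothetical placeholders, and the simple acceptance law below is a stand-in for the paper's partial potential-field computation.

```python
# Illustrative sketch of obstacle-biased node sampling for a PRM.
# bounds, clearance and is_free are assumed helpers, not part of the paper.
import numpy as np

def biased_samples(n, bounds, clearance, is_free, d0=0.5, rng=None):
    """Draw n collision-free nodes, preferring configurations with small clearance.

    bounds    : (low, high) arrays describing the workspace box
    clearance : q -> distance to the nearest obstacle (proxy for the potential)
    is_free   : q -> True if q is collision-free
    """
    rng = rng or np.random.default_rng()
    low, high = bounds
    nodes = []
    while len(nodes) < n:
        q = rng.uniform(low, high)
        if not is_free(q):
            continue
        # Acceptance probability decays with clearance, so accepted nodes
        # concentrate near obstacles and hence inside narrow passages.
        if rng.random() < np.exp(-clearance(q) / d0):
            nodes.append(q)
    return np.array(nodes)
```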
This paper presents our ongoing research in the design of a versatile service robot capable of operating in a home or office environment. Ideas presented here cover architectural issues and possible applications for such a robot system, with focus on tasks requiring constrained end-effector motions. Two key components of such a system are a path planner and a reactive behavior capable of force relaxation and path adaptation. These components are presented in detail along with an overview of the software architecture they fit into.
One of the main challenges in the field of robotics is to make robots ubiquitous. To intelligently interact with the world, such robots need to understand the environment and situations around them and react appropriately; they need context-awareness. But how can robots be equipped with the capability of gathering and interpreting the necessary information for novel tasks through interaction with the environment, given only some minimal knowledge provided in advance? This has been a long-term question and one of the main drives in the field of cognitive system development. The main idea behind the work presented in this paper is that the robot should, like a human infant, learn about objects by interacting with them, forming representations of the objects and their categories that are grounded in its embodiment. For this purpose, we study an early stage of the object grasping process, where the agent acts on the basis of a set of innate reflexes and knowledge about its embodiment. We stress that this is not a work on grasping as such; rather, it is a system that interacts with the environment based on relations of 3D visual features generated through a stereo vision system. We show how geometry, appearance and spatial relations between the features can guide early reactive grasping, which can later on be used in a more purposive manner when interacting with the environment.
In this letter, we investigate learning forward dynamics models and multi-step prediction of state variables (long-term prediction) for contact-rich manipulation. The problems are formulated in the context of model-based reinforcement learning (MBRL). We focus on two aspects, discontinuous dynamics and data-efficiency, both of which are important in the identified scope and pose significant challenges to state-of-the-art methods. We contribute to closing this gap by proposing a method that explicitly adopts a specific hybrid structure for the model while leveraging the uncertainty representation and data-efficiency of Gaussian processes. Our experiments on an illustrative moving-block task and a 7-DOF robot demonstrate a clear advantage when compared to popular baselines in low data regimes.
Deep reinforcement learning (DRL) has been successfully used to solve various robotic manipulation tasks. However, most of the existing works do not address the issue of control stability. This is in sharp contrast to the control theory community where the well-established norm is to prove stability whenever a control law is synthesized. What makes traditional stability analysis difficult for DRL are the uninterpretable nature of the neural network policies and unknown system dynamics. In this work, stability is obtained by deriving an interpretable deep policy structure based on the energy shaping control of Lagrangian systems. Then, stability during physical interaction with an unknown environment is established based on passivity. The result is a stability guaranteeing DRL in a model-free framework that is general enough for contact-rich manipulation tasks. With an experiment on a peg-in-hole task, we demonstrate, to the best of our knowledge, the first DRL with stability guarantee on a real robotic manipulator.
Deep reinforcement learning (DRL) has been successfully used to solve various robotic manipulation tasks. However, most of the existing works do not address the issue of control stability. This is in sharp contrast to the control theory community where the well-established norm is to prove stability whenever a control law is synthesized. What makes traditional stability analysis difficult for DRL are the uninterpretable nature of the neural network policies and unknown system dynamics. In this work, unconditional stability is obtained by deriving an interpretable deep policy structure based on the energy shaping control of Lagrangian systems. Then, stability during physical interaction with an unknown environment is established based on passivity. The result is a stability guaranteeing DRL in a model-free framework that is general enough for contact-rich manipulation tasks. With an experiment on a peg-in-hole task, we demonstrate, to the best of our knowledge, the first DRL with stability guarantee on a real robotic manipulator.
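As an illustration of the interpretable, energy-shaping policy structure referred to above, the following is a minimal PyTorch sketch in which the action is the sum of a learned potential-gradient term and a learned damping term. The network size, the potential parameterization and the diagonal damping model are illustrative assumptions; the sketch does not reproduce the papers' controller or their passivity argument.

```python
# Minimal sketch of an energy-shaping policy structure:
# u = -dV/dq - D * qdot, with a learned potential V >= 0 and positive damping D.
import torch
import torch.nn as nn

class EnergyShapingPolicy(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # Scalar potential V(q); squaring the network output keeps V non-negative.
        self.potential = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, 1))
        # Unconstrained parameters mapped to a positive diagonal damping matrix.
        self.damping_raw = nn.Parameter(torch.zeros(dim))

    def forward(self, q, qdot):
        q = q.requires_grad_(True)
        V = self.potential(q).pow(2).sum()
        gradV, = torch.autograd.grad(V, q, create_graph=True)
        D = torch.diag(nn.functional.softplus(self.damping_raw))  # D > 0
        return -gradV - qdot @ D
```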
Reinforcement Learning (RL) of robotic manipulation skills, despite its impressive successes, stands to benefit from incorporating domain knowledge from control theory. One of the most important properties that is of interest is control stability. Ideally, one would like to achieve stability guarantees while staying within the framework of state-of-the-art deep RL algorithms. Such a solution does not exist in general, especially one that scales to complex manipulation tasks. We contribute towards closing this gap by introducing a normalizing-flow control structure that can be deployed in any state-of-the-art deep RL algorithm. While stable exploration is not guaranteed, our method is designed to ultimately produce deterministic controllers with provable stability. In addition to demonstrating our method on challenging contact-rich manipulation tasks, we also show that it is possible to achieve considerable exploration efficiency (reduced state-space coverage and actuation effort) without losing learning efficiency.
The work presented here is a culmination of developments within the Swedish project COIN: Co-adaptive human-robot interactive systems, funded by the Swedish Foundation for Strategic Research (SSF), which addresses a unified framework for co-adaptive methodologies in human-robot co-existence. We investigate co-adaptation in the context of safe planning/control, trust, and multi-modal human-robot interactions, and present novel methods that allow humans and robots to adapt to one another and discuss directions for future work.
In this work we summarize the solution developed by Team KTH for the Amazon Picking Challenge 2016 in Leipzig, Germany. The competition simulated a warehouse automation scenario and was divided into two tasks: a picking task, where a robot picks items from a shelf and places them in a tote, and a stowing task, the inverse task, where the robot picks items from a tote and places them in a shelf. We describe our approach to the problem, starting from a high-level overview of our system and later delving into details of our perception pipeline and our strategy for manipulation and grasping. The solution was implemented using a Baxter robot equipped with additional sensors.
In this chapter we summarize the solution developed by team KTH for the Amazon Picking Challenge 2016 in Leipzig, Germany. The competition, which simulated a warehouse automation scenario, was divided into two parts: a picking task, where the robot picks items from a shelf and places them into a tote, and a stowing task, where the robot picks items from a tote and places them in a shelf. We describe our approach to the problem starting with a high-level overview of the system, delving later into the details of our perception pipeline and strategy for manipulation and grasping. The hardware platform used in our solution consists of a Baxter robot equipped with multiple vision sensors.
In this work we propose an approach to learn a robust policy for solving the pivoting task. Recently, several model-free continuous control algorithms were shown to learn successful policies without prior knowledge of the dynamics of the task. However, obtaining successful policies required thousands to millions of training episodes, limiting the applicability of these approaches to real hardware. We developed a training procedure that allows us to use a simple custom simulator to learn policies robust to the mismatch between the simulator and the real robot. In our experiments, we demonstrate that the policy learned in the simulator is able to pivot the object to the desired target angle on the real robot. We also show generalization to an object with different inertia, shape, mass and friction properties than those used during training. This result is a step towards making model-free reinforcement learning available for solving robotics tasks via pre-training in simulators that offer only an imprecise match to the real-world dynamics.
We develop an approach that benefits from large simulated datasets and takes full advantage of the limited online data that is most relevant. We propose a variant of Bayesian optimization that alternates between using informed and uninformed kernels. With this Bernoulli Alternation Kernel we ensure that discrepancies between simulation and reality do not hinder adapting robot control policies online. The proposed approach is applied to a challenging real-world problem of task-oriented grasping with novel objects. Our further contribution is a neural network architecture and training pipeline that use experience from grasping objects in simulation to learn grasp stability scores. We learn task scores from a labeled dataset with a convolutional network, which is used to construct an informed kernel for our variant of Bayesian optimization. Experiments on an ABB Yumi robot with real sensor data demonstrate success of our approach, despite the challenge of fulfilling task requirements and high uncertainty over physical properties of objects.
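A minimal sketch of the alternation idea follows: at each trial, a Bernoulli draw decides whether the Gaussian-process surrogate uses an uninformed kernel over raw controller parameters or an informed kernel over simulation-learned features. The feature map phi, the candidate set and the UCB acquisition below are illustrative placeholders, not the paper's implementation.

```python
# Sketch of one Bayesian-optimization step with Bernoulli kernel alternation.
# phi is an assumed simulation-learned feature map; candidates is an assumed
# finite set of controller parameters to score.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def bak_step(X, y, candidates, phi, p_informed=0.5, rng=None):
    rng = rng or np.random.default_rng()
    if rng.random() < p_informed:
        Xf, Cf = phi(X), phi(candidates)   # informed: RBF over learned features
    else:
        Xf, Cf = X, candidates             # uninformed: RBF over raw parameters
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(Xf, y)
    mu, sigma = gp.predict(Cf, return_std=True)
    return candidates[np.argmax(mu + 2.0 * sigma)]  # UCB over the candidate set
```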
We address the problem of learning reusable state representations from streaming high-dimensional observations. This is important for areas like Reinforcement Learning (RL), which yields non-stationary data distributions during training. We make two key contributions. First, we propose an evaluation suite that measures alignment between latent and true low-dimensional states. We benchmark several widely used unsupervised learning approaches. This uncovers the strengths and limitations of existing approaches that impose additional constraints/objectives on the latent space. Our second contribution is a unifying mathematical formulation for learning latent relations. We learn analytic relations on source domains, then use these relations to help structure the latent space when learning on target domains. This formulation enables a more general, flexible and principled way of shaping the latent space. It formalizes the notion of learning independent relations, without imposing restrictive simplifying assumptions or requiring domain-specific information. We present mathematical properties, concrete algorithms for implementation and experimental validation of successful learning and transfer of latent relations.
Gaussian Processes (GPs) have been widely used in robotics as models, and more recently as key structures in active learning algorithms, such as Bayesian optimization. GPs consist of two main components: the mean function and the kernel. Specifying a prior mean function has been a common way to incorporate prior knowledge. When a prior mean function could not be constructed manually, the next default has been to incorporate prior (simulated) observations into a GP as 'fake' data. Then, this GP would be used to further learn from true data on the target (real) domain. We argue that embedding prior knowledge into GP kernels instead provides a more flexible way to capture simulation-based information. We give examples of recent works that demonstrate the wide applicability of such kernel-centric treatment when using GPs as part of Bayesian optimization. We also provide discussion that helps to build intuition for why such 'kernels as priors' view is beneficial.
Data-efficiency is crucial for autonomous robots to adapt to new tasks and environments. In this work, we focus on robotics problems with a budget of only 10-20 trials. This is a very challenging setting even for data-efficient approaches like Bayesian optimization (BO), especially when optimizing higher-dimensional controllers. Previous work extracted expert-designed low-dimensional features from simulation trajectories to construct informed kernels and run ultra sample-efficient BO on hardware. We remove the need for expert-designed features by proposing a model and architecture for a sequential variational autoencoder that embeds the space of simulated trajectories into a lower-dimensional space of latent paths in an unsupervised way. We further compress the search space for BO by reducing exploration in parts of the state space that are undesirable, without requiring explicit constraints on controller parameters. We validate our approach with hardware experiments on a Daisy hexapod robot and an ABB Yumi manipulator. We also present simulation experiments with further comparisons to several baselines on Daisy and two manipulators. Our experiments indicate the proposed trajectory-based kernel with dynamic compression can offer ultra data-efficient optimization.
Deformable objects present a formidable challenge for robotic manipulation due to the lack of canonical low-dimensional representations and the difficulty of capturing, predicting, and controlling such objects. We construct compact topological representations to capture the state of highly deformable objects that are topologically nontrivial. We develop an approach that tracks the evolution of this topological state through time. Under several mild assumptions, we prove that the topology of the scene and its evolution can be recovered from point clouds representing the scene. Our further contribution is a method to learn predictive models that take a sequence of past point cloud observations as input and predict a sequence of topological states, conditioned on target/future control actions. Our experiments with highly deformable objects in simulation show that the proposed multistep predictive models yield more precise results than those obtained from computational topology libraries. These models can leverage patterns inferred across various objects and offer fast multistep predictions suitable for real-time applications.
Reinforcement Learning methods are capable of solving complex problems, but resulting policies might perform poorly in environments that are even slightly different. In robotics especially, training and deployment conditions often vary and data collection is expensive, making retraining undesirable. Simulation training allows for feasible training times, but on the other hand suffers from a reality gap when policies are applied in real-world settings. This raises the need for efficient adaptation of policies acting in new environments. We consider the problem of transferring knowledge within a family of similar Markov decision processes. We assume that Q-functions are generated by some low-dimensional latent variable. Given such a Q-function, we can find a master policy that can adapt given different values of this latent variable. Our method learns both the generative mapping and an approximate posterior of the latent variables, enabling identification of policies for new tasks by searching only in the latent space, rather than the space of all policies. The low-dimensional space and the master policy found by our method enable policies to quickly adapt to new environments. We demonstrate the method both on a pendulum swing-up task in simulation and for simulation-to-real transfer on a pushing task.
Manipulation of deformable objects has given rise to an important set of open problems in the field of robotics. Application areas include robotic surgery, household robotics, manufacturing, logistics, and agriculture, to name a few. Related research problems span modeling and estimation of an object's shape, estimation of an object's material properties, such as elasticity and plasticity, object tracking and state estimation during manipulation, and manipulation planning and control. In this survey article, we start by providing a tutorial on foundational aspects of models of shape and shape dynamics. We then use this as the basis for a review of existing work on learning and estimation of these models and on motion planning and control to achieve desired deformations. We also discuss potential future lines of work.
Kernel methods have been used very successfully to classify data in various application domains. Traditionally, kernels have been constructed mainly for vectorial data defined on a specific vector space. Much less work has addressed the development of kernel functions for non-vectorial data. In this paper, we present a new kernel for encoding sequential data. We present results comparing the proposed kernel to the state of the art, showing a significant improvement in classification performance as well as much improved robustness and interpretability.
We define a novel kernel function for finite sequences of arbitrary length which we call the path kernel. We evaluate this kernel in a classification scenario using synthetic data sequences and show that our kernel can outperform state-of-the-art sequential similarity measures. Furthermore, we find that, in our experiments, a clustering of data based on the path kernel results in much improved interpretability of such clusters compared to alternative approaches such as dynamic time warping or the global alignment kernel.
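For context, the sketch below shows the general shape of an alignment-based sequence kernel computed by dynamic programming over a ground kernel on individual elements. It resembles the global alignment kernel used as a baseline above and is not the exact path-kernel recursion from the papers.

```python
# Sketch of an alignment-style kernel between two sequences, accumulated over
# all monotone alignment paths via dynamic programming. Ground kernel and
# gamma are illustrative choices.
import numpy as np

def seq_kernel(A, B, gamma=1.0):
    """Sum-over-alignments similarity between sequences A (n,d) and B (m,d)."""
    k = lambda a, b: np.exp(-gamma * np.sum((a - b) ** 2))  # ground RBF kernel
    n, m = len(A), len(B)
    M = np.zeros((n + 1, m + 1))
    M[0, 0] = 1.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # accumulate over diagonal, vertical and horizontal alignment moves
            M[i, j] = k(A[i - 1], B[j - 1]) * (M[i - 1, j - 1] + M[i - 1, j] + M[i, j - 1])
    return M[n, m]
```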
We present two approaches to modeling affordance relations between objects, actions and effects. The first is a probabilistic approach which uses a voting function to learn which objects afford which types of grasps. We compare the success rate of this approach to a second approach which uses an ontological reasoning engine for learning affordances. Our second approach employs a rule-based system with axioms to reason about grasp selection for a given object.
Representing 3D geometry for different tasks, e.g. rendering and reconstruction, is an important goal in different fields, such as computer graphics, computer vision and robotics. Robotic applications often require perception of object shape information extracted from sensory data that can be noisy and incomplete. This is a challenging task, and in order to facilitate analysis of new methods and comparison of different approaches to shape modeling (e.g. surface estimation), completion and exploration, we provide real sensory data acquired from exploring various objects of different complexities. The dataset includes visual and tactile readings in the form of 3D point clouds obtained using two different robot setups equipped with visual and tactile sensors. During data collection, the robots touch the experiment objects in a predefined manner at various exploration configurations and gather visual and tactile points in the same coordinate frame, based on the calibration between the robots and the cameras used. The goal of this exhaustive exploration procedure is to sense unseen parts of the objects which are not visible to the cameras but can be sensed via tactile sensors activated at touched areas. The data was used for shape completion and modeling via Implicit Surface representation and Gaussian-Process-based regression in the work “Object shape estimation and modeling, based on sparse Gaussian process implicit surfaces, combining visual data and tactile exploration” [3], and also used partially in “Enhancing visual perception of shape through tactile glances” [4], both studying efficient exploration of objects to reduce the number of touches.
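The following is a minimal sketch of how visual and tactile point clouds expressed in a common frame can be fused into a Gaussian-process implicit surface of the kind mentioned above; the labeling scheme, kernel and hyperparameters are illustrative assumptions, not those used with the released dataset.

```python
# Minimal Gaussian-process implicit-surface sketch: surface points get target 0,
# points outside/inside the object (e.g. carved along sensor rays) get +1/-1.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def fit_implicit_surface(surface_pts, outside_pts, inside_pts):
    X = np.vstack([surface_pts, outside_pts, inside_pts])
    y = np.concatenate([np.zeros(len(surface_pts)),
                        np.ones(len(outside_pts)),
                        -np.ones(len(inside_pts))])
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.05) + WhiteKernel(1e-4))
    gp.fit(X, y)
    return gp   # gp.predict(q) is approximately 0 on the estimated surface

# Visual and tactile points can simply be stacked into surface_pts, since the
# dataset expresses both in the same calibrated coordinate frame.
```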
We present a probabilistic model for joint representation of several sensory modalities and action parameters in a robotic grasping scenario. Our non-linear probabilistic latent variable model encodes relationships between grasp-related parameters, learns the importance of features, and expresses confidence in estimates. The model learns associations between stable and unstable grasps that it experiences during an exploration phase. We demonstrate the applicability of the model for estimating grasp stability, correcting grasps, identifying objects based on tactile imprints and predicting tactile imprints from object-relative gripper poses. We performed experiments on a real platform with both known and novel objects, i.e., objects the robot trained with, and previously unseen objects. Grasp correction had a 75% success rate on known objects, and 73% on new objects. We compared our model to a traditional regression model that succeeded in correcting grasps in only 38% of cases.
This paper studies the viability of concurrent object pose tracking and tactile sensing for assessing grasp stability on a physical robotic platform. We present a kernel logistic-regression model of pose- and touch-conditional grasp success probability. Models are trained on grasp data which consist of (1) the pose of the gripper relative to the object, (2) a tactile description of the contacts between the object and the fully-closed gripper, and (3) a binary description of grasp feasibility, which indicates whether the grasp can be used to rigidly control the object. The data is collected by executing grasps demonstrated by a human on a robotic platform composed of an industrial arm, a three-finger gripper equipped with tactile sensing arrays, and a vision-based object pose tracking system. The robot is able to track the pose of an object while it is grasping it, and it can acquire grasp tactile imprints via pressure sensor arrays mounted on its gripper’s fingers. We consider models defined on several subspaces of our input data, using tactile perceptions or gripper poses only. Models are optimized and evaluated with f-fold cross-validation. Our preliminary results show that stability assessments based on both tactile and pose data can provide better rates than assessments based on tactile data alone.
Our aim is to predict the stability of a grasp from the perceptions available to a robot before attempting to lift up and transport an object. The percepts we consider consist of the tactile imprints and the object-gripper configuration read before and until the robot’s manipulator is fully closed around an object. Our robot is equipped with multiple tactile sensing arrays and it is able to track the pose of an object during the application of a grasp. We present a kernel-logistic-regression model of pose- and touch-conditional grasp success probability which we train on grasp data collected by letting the robot experience the effect on tactile and visual signals of grasps suggested by a teacher, and letting the robot verify which grasps can be used to rigidly control the object. We consider models defined on several subspaces of our input data – e.g., using tactile perceptions or pose information only. Our experiment demonstrates that joint tactile and pose-based perceptions carry valuable grasp-related information, as models trained on both hand poses and tactile parameters perform better than the models trained exclusively on one perceptual input.
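As an illustration of a pose- and touch-conditional success model in this spirit, the sketch below approximates kernel logistic regression with an explicit kernel feature map followed by logistic regression; the features, kernel parameters and cross-validation setup are assumptions and differ from the models evaluated in the papers above.

```python
# Sketch of a grasp-success classifier over joint pose and tactile inputs,
# approximating kernel logistic regression via a Nystroem feature map.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def grasp_success_model(pose_feats, tactile_feats, labels):
    """pose_feats (N,p), tactile_feats (N,t), labels (N,) in {0,1}."""
    X = np.hstack([pose_feats, tactile_feats])   # joint pose + touch input space
    model = make_pipeline(Nystroem(kernel="rbf", gamma=0.5, n_components=100),
                          LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, X, labels, cv=5)   # cross-validated accuracy
    model.fit(X, labels)
    return model, scores.mean()
```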
This paper presents an integration of grasp planning and online grasp stability assessment based on tactile data. We show how the uncertainty in grasp execution posterior to grasp planning can be dealt with using tactile sensing and machine learning techniques. The majority of the state-of-the-art grasp planners demonstrate impressive results in simulation. However, these results are mostly based on perfect scene/object knowledge allowing for analytical measures to be employed. It is questionable how well these measures can be used in realistic scenarios where the information about the object and robot hand may be incomplete and/or uncertain. Thus, tactile and force-torque sensory information is necessary for successful online grasp stability assessment. We show how a grasp planner can be integrated with a probabilistic technique for grasp stability assessment in order to improve the hypotheses about suitable grasps on different types of objects. Experimental evaluation with a three-fingered robot hand equipped with tactile array sensors shows the feasibility and strength of the integrated approach.
In this paper, the problem of learning grasp stability in robotic object grasping based on tactile measurements is studied. Although grasp stability modeling and estimation has been studied for a long time, there are few robots today capable of demonstrating extensive grasping skills. The main contribution of the work presented here is an investigation of probabilistic modeling for inferring grasp stability based on learning from examples. The main objective is classification of a grasp as stable or unstable before applying further actions on it, e.g. lifting. The problem cannot be solved by visual sensing alone, which is typically used to execute an initial robot hand positioning with respect to the object. The output of the classification system can trigger a regrasping step if an unstable grasp is identified. An off-line learning process is implemented and used for reasoning about grasp stability for a three-fingered robotic hand using Hidden Markov models. To evaluate the proposed method, experiments are performed both in simulation and on a real robot system.
An important ability of a robot that interacts with the environment and manipulates objects is to deal with the uncertainty in sensory data. Sensory information is necessary to, for example, perform online assessment of grasp stability. We present methods to assess grasp stability based on haptic data and machine-learning methods, including AdaBoost, support vector machines (SVMs), and hidden Markov models (HMMs). In particular, we study the effect of different sensory streams on grasp stability. This includes object information such as shape; grasp information such as approach vector; tactile measurements from fingertips; and joint configuration of the hand. Sensory knowledge affects the success of the grasping process both in the planning stage (before a grasp is executed) and during the execution of the grasp (closed-loop online control). In this paper, we study both of these aspects. We propose a probabilistic learning framework to assess grasp stability and demonstrate that knowledge about grasp stability can be inferred using information from tactile sensors. Experiments on both simulated and real data are shown. The results indicate that the idea to exploit the learning approach is applicable in realistic scenarios, which opens a number of interesting avenues for future research.
We present a probabilistic framework for grasp modeling and stability assessment. The framework facilitates assessment of grasp success in a goal-oriented way, taking into account both geometric constraints for task affordances and stability requirements specific for a task. We integrate high-level task information introduced by a teacher in a supervised setting with low-level stability requirements acquired through a robot's self-exploration. The conditional relations between tasks and multiple sensory streams (vision, proprioception and tactile) are modeled using Bayesian networks. The generative modeling approach both allows prediction of grasp success, and provides insights into dependencies between variables and features relevant for object grasping.
We propose a method for interactive modeling of objects and object relations based on real-time segmentation of video sequences. In interaction with a human, the robot can perform multi-object segmentation through principled modeling of physical constraints. The key contribution is an efficient multi-labeling framework that allows object modeling and disambiguation in natural scenes. Object modeling and labeling is done in real time, and hypotheses and constraints denoting relations between objects can be added incrementally. Through instructions such as key presses or spoken words, a scene can be segmented into regions corresponding to multiple physical objects. The approach solves some of the difficult problems related to disambiguation of objects merged due to their direct physical contact. Results show that even a limited set of simple interactions with a human operator can substantially improve segmentation results.
In this paper, we propose a method that generates grasping actions for novel objects based on visual input from a stereo camera. We integrate two methods that are advantageous in predicting either how to grasp an object or where to apply a grasp. The first one reconstructs a wire-frame object model through curve matching. Elementary grasping actions can be associated with parts of this model. The second method predicts grasping points in a 2D contour image of an object. By integrating the information from the two approaches, we can generate a sparse set of full grasp configurations that are of good quality. We demonstrate our approach integrated in a vision system for complex shaped objects as well as in cluttered scenes.
We propose a framework for detecting, extracting and modeling objects in natural scenes from multi-modal data. Our framework is iterative, exploiting different hypotheses in a complementary manner. We employ the framework in realistic scenarios, based on visual appearance and depth information. Using a robotic manipulator that interacts with the scene, object hypotheses generated using appearance information are confirmed through pushing. The framework is iterative: each generated hypothesis feeds into the subsequent one, continuously refining the predictions about the scene. We show results that demonstrate the synergistic effect of applying multiple hypotheses for real-world scene understanding. The method is efficient and performs in real-time.
State estimation and control are intimately related processes in robot handling of flexible and articulated objects. While for rigid objects we can generate a CAD model beforehand and state estimation boils down to estimating the pose or velocity of the object, in the case of flexible and articulated objects, such as a cloth, the representation of the object's state is heavily dependent on the task and its execution. For example, when folding a cloth, the representation will mainly depend on the way the folding is executed.
Dexterous manipulation is one of the primary goals in robotics. Robots with this capability could sort and package objects, chop vegetables, and fold clothes. As robots come to work side by side with humans, they must also become human-aware. Over the past decade, research has made strides toward these goals. Progress has come from advances in visual and haptic perception and in mechanics in the form of soft actuators that offer a natural compliance. Most notably, immense progress in machine learning has been leveraged to encapsulate models of uncertainty and to support improvements in adaptive and robust control. Open questions remain in terms of how to enable robots to deal with the most unpredictable agent of all, the human.
Object shape information is an important parameter in robot grasping tasks. However, it may be difficult to obtain accurate models of novel objects due to incomplete and noisy sensory measurements. In addition, object shape may change due to frequent interaction with the object (e.g., cereal boxes). In this paper, we present a probabilistic approach for learning object models based on visual and tactile perception through physical interaction with an object. Our robot explores unknown objects by touching them strategically at parts that are uncertain in terms of shape. The robot starts by using only visual features to form an initial hypothesis about the object shape, then gradually adds tactile measurements to refine the object model. Our experiments involve ten objects of varying shapes and sizes in a real setup. The results show that our method is capable of choosing a small number of touches to construct object models similar to real object shapes and to determine similarities among acquired models.
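A minimal sketch of the uncertainty-driven touch selection described above: given any probabilistic shape model exposing predictive standard deviations (such as the implicit-surface GP sketched earlier), the next touch is directed at the candidate point where the model is most uncertain. Candidate generation and the robot interface are assumed and outside the sketch.

```python
# Sketch of variance-driven selection of the next touch location.
# gp_shape_model is assumed to expose predict(X, return_std=True),
# as a scikit-learn GaussianProcessRegressor does.
import numpy as np

def select_next_touch(gp_shape_model, candidate_points):
    """Return the candidate point with the highest predictive uncertainty."""
    _, std = gp_shape_model.predict(candidate_points, return_std=True)
    return candidate_points[np.argmax(std)]
```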
This article presents a unified framework for detecting, segmenting and tracking unknown objects in everyday scenes, allowing for inspection of object hypotheses during interaction over time. A heterogeneous scene representation is proposed, with background regions modeled as combinations of planar surfaces and uniform clutter, and foreground objects as 3D ellipsoids. Recent energy minimization methods based on loopy belief propagation, tree-reweighted message passing and graph cuts are studied for the purpose of multi-object segmentation and benchmarked in terms of segmentation quality, as well as computational speed and how easily the methods can be adapted for parallel processing. One conclusion is that the choice of energy minimization method is less important than the way scenes are modeled. Proximities are more valuable for segmentation than similarity in colors, while the benefit of 3D information is limited. It is also shown through practical experiments that, with implementations on GPUs, multi-object segmentation and tracking using state-of-the-art MRF inference methods is feasible, despite the computational costs typically associated with such methods.
We present an active vision system for segmentation of visual scenes based on integration of several cues. The system serves as a visual front end for generation of object hypotheses for new, previously unseen objects in natural scenes. The system combines a set of foveal and peripheral cameras where, through a stereo-based fixation process, object hypotheses are generated. In addition to considering the segmentation process in 3D, the main contribution of the paper is integration of different cues in a temporal framework and improvement of initial hypotheses over time.
We present an approach for active segmentation based on integration of several cues. It serves as a framework for generation of object hypotheses of previously unseen objects in natural scenes. Using an approximate Expectation-Maximisation method, the appearance, 3D shape and size of objects are modelled in an iterative manner, with fixation used for unsupervised initialisation. To better cope with situations where an object is hard to segregate from the surface it is placed on, a flat surface model is added to the typical two hypotheses used in classical figure-ground segmentation. The framework is further extended to include modelling over time, in order to exploit temporal consistency for better segmentation and to facilitate tracking.
In this paper, we present a real-time vision system that integrates a number of algorithms using monocular and binocular cues to achieve robustness in realistic settings, for tasks such as object recognition, tracking and pose estimation. The system consists of two sets of binocular cameras: a peripheral set for disparity-based attention and a foveal one for higher-level processes. Thus the conflicting requirements of a wide field of view and high resolution can be overcome. One important property of the system is that the step from task specification through object recognition to pose estimation is completely automatic, combining both appearance and geometric models. Experimental evaluation is performed in a realistic indoor environment with occlusions, clutter, changing lighting and background conditions.
A distinct property of robot vision systems is that they are embodied. Visual information is extracted for the purpose of moving in and interacting with the environment. Thus, different types of perception-action cycles need to be implemented and evaluated. In this paper, we study the problem of designing a vision system for the purpose of object grasping in everyday environments. This vision system is firstly targeted at the interaction with the world through recognition and grasping of objects and secondly at being an interface for the reasoning and planning module to the real world. The latter provides the vision system with a certain task that drives it and defines a specific context, i.e. search for or identify a certain object and analyze it for potential later manipulation. We deal with cases of: (i) known objects, (ii) objects similar to already known objects, and (iii) unknown objects. The perception-action cycle is connected to the reasoning system based on the idea of affordances. All three cases are also related to the state of the art and the terminology in the neuroscientific area.