Abstract
In this paper, we study the problem of adapting manipulation trajectories involving grasped objects (e.g., tools) defined for a single grasp pose to novel grasp poses. A common approach is to define a new trajectory explicitly for each possible grasp, but this is highly inefficient. Instead, we propose a method to adapt such trajectories directly, requiring only a period of self-supervised data collection during which a camera observes the robot's end-effector moving with the object rigidly grasped. Importantly, our method requires no prior knowledge of the grasped object (such as a 3D CAD model), it can work with RGB images, depth images, or both, and it requires no camera calibration. Through a series of real-world experiments involving 1360 evaluations, we find that self-supervised RGB data consistently outperforms alternatives that rely on depth images, including several state-of-the-art pose estimation methods. Compared to the best-performing baseline, our method achieves an average of 28.5% higher success rate when adapting manipulation trajectories to novel grasps across several everyday tasks.
Video Explanation
Key Idea. In a self-supervised manner, the robot emulates novel grasps by moving its end-effector with an object rigidly grasped while an external camera captures images of that object. The captured images are used to train a network that adapts a skill, learned for a single object grasp, to any novel object grasp immediately at deployment. This process allows skills to be adapted to novel grasp poses with very high accuracy using any camera modality, while requiring no prior object knowledge, camera calibration, or human time.
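To make this concrete, below is a minimal sketch of what such a self-supervised data collection stage could look like. The robot and camera interfaces (robot.move_ee_to, camera.capture), the pose-sampling ranges, and the number of samples are illustrative placeholders under our own assumptions, not the actual interfaces or settings used in this work.

```python
# Hypothetical sketch of self-supervised data collection: the robot moves its
# end-effector (with the object rigidly grasped) to random poses while an
# external camera captures images, yielding (image, end-effector pose) pairs.
import numpy as np

def sample_ee_pose(rng):
    """Sample a random end-effector pose (position + Euler orientation)
    within a workspace region visible to the external camera.
    The sampling ranges here are arbitrary placeholders."""
    position = rng.uniform(low=[-0.1, -0.1, 0.2], high=[0.1, 0.1, 0.4])
    orientation = rng.uniform(low=-np.pi / 6, high=np.pi / 6, size=3)
    return np.concatenate([position, orientation])

def collect_self_supervised_data(robot, camera, num_samples=1000, seed=0):
    """Record (image, end-effector pose) pairs. Because the grasp is rigid,
    each end-effector pose emulates how the object would appear to the
    camera under a different grasp."""
    rng = np.random.default_rng(seed)
    dataset = []
    for _ in range(num_samples):
        pose = sample_ee_pose(rng)
        robot.move_ee_to(pose)      # placeholder robot interface
        image = camera.capture()    # placeholder camera interface (RGB and/or depth)
        dataset.append({"image": image, "ee_pose": pose})
    return dataset
```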
Motivation
Manipulating grasped objects, such as tools, is at the core of robot manipulation. Consequently, for manipulation skills to be widely applicable, they need to generalize across the different possible poses of the grasped object in the robot's gripper. In practice, however, skills are either tailored to a specific grasp pose, or significant effort is required to define a skill variation for each possible grasp. This is either limiting, as object grasps often change, or very inefficient, as redefining a skill for every possible grasp can be time-consuming. Despite these important limitations, the problem of adapting trajectories of skills defined for a single grasp to different possible grasps has received little attention. The methods that do exist rely on: (1) prior object knowledge, such as 3D CAD models, or object category-specific training data, both of which may not be readily available in many practical scenarios; (2) depth images, which can be noisy or have partially missing depth; and (3) precise knowledge of a camera's extrinsic parameters, which can negatively affect performance due to the challenges of camera calibration. As a result, existing methods are either not applicable to many practical scenarios, because prior object knowledge is unavailable, or fail to adapt skill trajectories successfully, because their reliance on depth data and camera calibration hinders their performance.
Contributions. Motivated by the above limitations, this work contributes a novel method that adapts skill trajectories defined for a single grasp pose to any different grasp pose with considerably higher success rates than previous methods, while (1) assuming no prior object knowledge, such as a 3D CAD model, (2) operating with only RGB images if desired, (3) remaining robust to noisy or partially missing depth data when depth images are used, and (4) requiring no camera calibration.
Results Overview. In 1360 real-world experimental evaluations, we demonstrate that our method adapts skills with an average of 28.5% higher success rate than the best-performing of 5 strong baselines.
Problem Overview
Method Overview
How does self-supervised data collection help?
How is the self-supervised data used?
How is the method deployed in practice?
Results
Examples of adapting skills taught with the imitation learning method DOME to different novel deployment grasps
Examples of adapting a precise peg-in-hole skill to a novel deployment grasp
Skill grasp.
Demonstration
Deployment grasp.
No Adaptation
Deployment grasp.
Adapted
Adapting different skills that manipulate the same object to different grasps, zero-shot, using the same alignment network
Once our alignment network is trained for a grasped object, it can be used to adapt skills across any task for which that object is used, in a zero-shot manner with no further training. The following video shows how the same alignment network is used across 3 different tasks to adapt skills to different grasps immediately after the skills are taught to the robot. The video is uncut and demonstrates both the skill acquisition and skill adaptation process for all tasks sequentially.
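As a rough illustration of this zero-shot reuse, the sketch below assumes the trained network can produce a single end-effector correction (a 4x4 homogeneous transform) from one image of the newly grasped object, which is then applied to every waypoint of any trajectory recorded with the original grasp. The names alignment_net.predict_correction, camera.capture, and robot.move_ee_to are hypothetical and not the actual interfaces of this work.

```python
# Hypothetical sketch of zero-shot skill adaptation with a trained alignment
# network: one predicted grasp correction is reused across all skill
# trajectories recorded for the same object.

def adapt_trajectory(skill_trajectory, correction):
    """Apply a 4x4 end-effector correction to every waypoint (each a 4x4
    homogeneous transform, e.g. a numpy array) of a trajectory defined
    for the original skill grasp."""
    return [waypoint @ correction for waypoint in skill_trajectory]

def deploy_skill(alignment_net, camera, robot, skill_trajectory):
    """Adapt and execute a skill for the current (novel) grasp."""
    image = camera.capture()                              # placeholder camera interface
    correction = alignment_net.predict_correction(image)  # hypothetical network call, returns 4x4
    for waypoint in adapt_trajectory(skill_trajectory, correction):
        robot.move_ee_to(waypoint)                        # placeholder robot interface
```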