Robot Skill Adaptation
via Soft Actor-Critic Gaussian Mixture Models

scroll down

Robot Skill Adaptation via SAC-GMM


Figure 1. Overview of robot skill adaptation via SAC-GMM.

A core challenge for an autonomous agent acting in the real world is to adapt its repertoire of skills to cope with its noisy perception and dynamics. To scale learning of skills to long-horizon tasks, robots should be able to learn and later refine their skills in a structured manner through trajectories rather than making instantaneous decisions individually at each time step. To this end, we propose the Soft Actor-Critic Gaussian Mixture Model (SAC-GMM), a novel hybrid approach that learns robot skills through a dynamical system and adapts the learned skills in their own trajectory distribution space through interactions with the environment. Our approach combines classical robotics techniques of learning from demonstration with the deep reinforcement learning framework and exploits their complementary nature. We show that our method utilizes sensors solely available during the execution of preliminarily learned skills to extract relevant features that lead to faster skill refinement. Extensive evaluations in both simulation and real-world environments demonstrate the effectiveness of our method in refining robot skills by leveraging physical interactions, high-dimensional sensory data, and sparse task completion rewards.

SAC-GMM Image Overview
Figure 1. SAC-GMM: The Gaussian Mixture Model (GMM) agent learns a skill with few demonstrations and controls the robot with high-frequency F, the Soft Actor-Critic (SAC) agent affords to work with a lower frequency and refines the skill via adapting GMM parameters by leveraging physical interaction, high-dimensional sensory data, and sparse rewards.


Thinking ahead is a hallmark of human intelligence. From early infancy, we form rich primitive object concepts through our physical interactions with the real world and apply this knowledge as an intuitive model of physics for reasoning about physically plausible trajectories and adapting them to suit our purposes.

Figure 2: Trajectory reasoning is a capacity learned from early infancy through observing other people act and direct interaction with the world.

The most current deep imitation and reinforcement learning paradigms for robot sensorimotor control are typically trained to make isolated decisions at each time step of the trajectory. In fact, most of the existing end-to-end high-capacity models map directly from pixels to actions.
However, although these approaches can capture complex relationships and are flexible to adapt in face of noisy perception, they require extensive amounts of data, and the trained agent is typically bound to take a distinct decision at every time step.

The dynamical systems approach for trajectory learning from demonstrations provides a high level of reactivity and robustness against perturbations in the environment. Despite the great success of dynamical systems in affording flexible robotic systems for industry, where a high-precision state of the environment is available, they are still of limited use in more complex real-world robotics scenarios. The main limitations of current dynamical systems in contrast to deep sensorimotor learning methods are their incompetence in handling raw high-dimensional sensory data such as images, and their susceptibility to noise in the perception pipeline.

Figure 3: Dynamical system approach for trajectory learning from demonstrations.

We advocate for hybrid models in learning robot skills: “Soft Actor-Critic Gaussian Mixture Models”. SAC-GMMs learn and refine robot skills in the real-world and present a hybrid model that combines dynamical systems and deep reinforcement learning in order to leverage their complementary nature. More precisely, SAC-GMMs learn a trajectory-based Gaussian mixture policy of skills from demonstrations and refine it by physical interactions of a soft actor-critic agent with the world. Our hybrid formulation allows the dynamical system to utilize high-dimensional observation spaces and cope with noise in demonstrations and sensory observations while maintaining a reactive and robust trajectory-based policy when interacting with dynamic environments. The method is simple, sample efficient and readily applicable in a variety of robotics scenarios.


Our hybrid approach consists of two phases. In the first, we learn a dynamical system with a Gaussian mixture model from few demonstrations. In the second, we refine this dynamical system with the soft actor-critic algorithm through physical interactions with the world. The key challenges are to learn robot skills from few noisy demonstrations in a trajectory space and to later refine the skills in their own trajectory distribution space in a sparse task completion reward setting.

SAC-GMM Structure
Figure 4: Structure of SAC-GMM: We learn and adapt robot skills in trajectory distribution space. Given a set of human demonstrations, the GMM agent learns a dynamical system that provides an analytical description of the robot’s skill over time. After N interactions by the GMM agent with the environment (working with frequency F), the SAC agent receives a high-dimensional observation, robot state, and a sparse success reward from the environment, refines the GMM agent’s trajectory parameters, and optimizes for the robot’s skill success.

Dynamical Systems: Dynamical systems afford an analytical representation of the robot’s progression over time, particularly we use an autonomous dynamical system that does not explicitly depend on time and employs a first-order ordinary differential equation to map the robot pose to its velocity. From a machine learning perspective, we aim to learn the noise-free estimate of the robot skill model data as a regression problem. The problem can be addressed by using a Gaussian Mixture Model (GMM) with K Gaussian components to represent the joint probability of the robot pose and the corresponding velocity. Maximum-likelihood estimation of the mixture parameters can be carried out iteratively with different optimization techniques such as Expectation-Maximization (EM) algorithms. Finally, we use Gaussian Mixture Regression (GMR) to estimate the most likely velocity given a robot pose.

Soft Actor-Critic: Having learned the robot skill model in the trajectory space, we can now leverage the robot’s interactions with the world to explore and refine the global map of the robot skill. We formulate this refinement as a reinforcement learning problem in which the agent has to modify the learned skill in the trajectory space and has only access to sparse rewards. In RL, the goal is to learn a policy in a partially observable Markov decision process, consisting of an observation space O, a state space S and an action space A. In our skill refinement scenario, the observation space consists of the robot’s sensory measurements such as RGB images, tactile measurements, or depth maps. The state space is continuous and consists of the skill parameters, the robot pose, and latent representations of high-dimensional observations. The action space is also continuous and consists of the desired adaptation in the skill trajectory parameters. Moreover, the environment emits a sparse reward only if the robot executes the skill effectively. The robot has to learn this policy from its interactions with the world, such that it maximizes the expected total reward of the refined skill trajectory. Our particular choice for the reinforcement learning framework to learn the skill refinement policy is the soft actor-critic (SAC) algorithm.

Full model: Our hybrid model learns and refines a robot skill. The GMM agent is fitted on the provided demonstrations and represents a dynamical system, controlling the robot in the trajectory space. After each N interaction of the GMM agent with the world, the SAC agent gets an image, robot state, and reward from the world. The SAC agent optimizes for the robot’s skill success, updates the GMM agent parameters, and refines the robot skill in the trajectory space. While updating GMM parameters, we ensure that mixing weights sum to one and covariance matrices stay positive semi-definite. At the inference time, the same procedure repeats. The GMM agent interacts with the world, and with each N step, the SAC agent receives information about the skill progress and adapts the GMM agent.


We evaluate SAC-GMM for learning robot skills in both simulated and real-world environments. The goals of these experiments are to investigate:

We evaluate our approach in both simulated and real-world environments. We investigate two robot skills in simulation: Peg Insertion, and Power Lever Sliding. The environments are simulated with PyBullet. The robot receives a success reward whenever the peg is inserted correctly in the hole. We recognize the importance of the sense of touch in fitting a peg smoothly in a hole. Hence, during the skill adaptation phase, we provide the robot with vision-based tactile sensors on both fingers. For the power lever sliding skill, the robot is rewarded whenever it grasps the lever and slides it to the end, such that the light is turned on. For this skill, we provide the robot with static depth perception during the skill refinement to facilitate accurate lever pose estimation. Moreover, to simulate realistic real-world conditions, we also consider noise regarding the detected position of the hole and lever. Namely, in both environments, we add Gaussian noise with a standard deviation of 1 cm in the target pose’s x, y, and z dimensions. For both skills, we collect 20 demonstrations by teleoperating the robot in the simulation.

Peg insertion demonstrations
Lever sliding demonstrations
Peg insertion executed by GMM
Lever sliding executed by GMM
Figure 5. GMM fits a similar trajectory to the human collected demonstrations obtained by teleoperating the robot, but it is not capable of performing the task consistently.

We compare our skill model against the following models: GMM: This baseline represents the dynamical system that we learn with the provided demonstrations. We use SEDS and LPV-DS frameworks to fit our GMM agents to the demonstrated trajectories. This baseline is not able to explore or leverage high-dimensional observations to refine its performance. SAC: We employ the soft actor-critic agent to explore and learn the skills. We initialize the replay buffer of the SAC agent with the demonstrations of skills. This baseline employs the same SAC structure that we use in SAC-GMM, including the autoencoder for high-dimensional observations and the network architecture. However, this baseline does not have access to a dynamical system, and consequently, does not reason on the trajectory level. Res-GMM: This baseline first learns a GMM agent using the demonstrations and then employs a SAC agent for the exploration. The SAC agent here is very similar to what we use in SAC-GMM, with the difference that it receives information at each time step of the trajectory (instead of each N step), and instead of predicting change in trajectory parameters, it predicts a residual velocity which is summed up with the GMM agent’s predicted velocity. This baseline is inspired by recent approaches in the Residual RL domain. For the quantitative evaluation of skill models, we employ them to perform the skill 10 times at each evaluation step. We report the average accuracy of each skill model over these episodes in our plots.

Model Peg Insertion Lever sliding
GMM 20% 54%
SAC 0% 0%
Res-GMM 30% 62%
SAC-GMM 86% 81%
Table 1. SAC-GMM outperforms baseline skill models significantly in both robot skills by refining the initial dynamical systems (GMM) substantially. All models were evaluated by using a deterministic policy for 100 runs and trained in a sparse reward setting
Peg insertion executed by SAC
Lever sliding executed by SAC
Peg insertion executed by Residual GMM
Lever sliding executed by Residual GMM
Peg insertion executed by SAC-GMMs
Lever sliding executed by SAC-GMMs
Figure 6. SAC-GMM outperforms baseline skill models significantly in both robot skills by refining the initial dynamical systems (GMM) substantially.

For the real-world experiment, we investigate the door opening skill, and place our KUKA iiwa manipulator in front of a miniature door. We put an ArUco marker on the backside of the door. This marker is detected when the robot opens the door and is consequently rewarded. Moreover, to enable our robot to run autonomously without any human intervention, we equip our door with a door closing mechanism. Thus, the door shuts when the robot releases the door handle and starts a new interaction episode. We provide our robot with an Intel SR300 camera mounted on the gripper for an RGB eye-in-hand view during the skill adaptation phase. We use OpenPose to track the human hand and collect 5 human demonstrations of the door opening skill.

Figure 7. SAC-GMM learns to open a real door only after half an hour of physical interactions obtaining better performance than the baseline skill models.

We find that, although the initial dynamical system (the GMM agent fitted on human demonstrations) enables the robot to reach the door handle, the robot misses the proper position to apply its force and can only open the door with a 10% success rate. This failure is due to the robot’s noisy perception and dynamics. Our SACGMM exploits the wrist-mounted camera RGB images and sparse door opening rewards and achieves a 90% success rate after only half an hour of physical interactions (~100 episodes) with the door. The SAC baseline fails to learn the skill and the Res-GMM model performs poorly, as adding residual velocities at each time step results in non-smooth trajectories.

Door opening task executed by Res-GMM
Door opening task executed by SAC-GMM
Figure 7. SAC-GMM was able to open the door consistently after 30 minutes of training.


We present SAC-GMMs as a new framework for learning robot skills. This hybrid model leverages reinforcement learning to refine robot skills represented via dynamical systems in their trajectory distribution space and exploits the natural synergy between data-driven and analytical frameworks. Extensive experiments carried out in both simulation and realworld settings, demonstrate that our proposed skill model:

An interesting extension of our work is to investigate transferring skills learned in one environment to new environments, where the visual appearance of the overall setting will differ considerably while the goals of the skills remain similar (e. g., transfer the skill of door opening in one room to other rooms), and refining skills in new environments.


Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models
Iman Nematollahi*, Erick Rosete-Beas*, Adrian Röfer, Tim Welschehold, Abhinav Valada, Wolfram Burgard
Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models Iman Nematollahi*, Erick Rosete-Beas*, Adrian Röfer, Tim Welschehold, Abhinav Valada, Wolfram Burgard arxiv Pdf

BibTeX citation

title   = {Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models},
author  = {Iman Nematollahi, Erick Rosete-Beas, Adrian Roefer, Tim Welschehold, Abhinav Valada, Wolfram Burgard},
journal = {arXiv preprint arXiv:2111.13129},
year    = {2021}
@article{Nematollahi2022, title = {Robot Skill Adaptation via Soft Actor-Critic Gaussian Mixture Models}, author = {Nematollahi, Iman and Rosete-Beas, Erick and Roefer, Adrian and Welschehold, Tim and Valada, Abhinav and Burgard, Wolfram}, journal = {IEEE International Conference on Robotics and Automation (ICRA)}, year = {2022}, url = {}, pdf = {}, }