Researchers from UC Berkeley and CMU Introduce a Task-Agnostic Reinforcement Learning (RL) Method to Auto-Tune Simulations to the Real World


Applying deep learning techniques to complex control tasks typically depends on training in simulation before transferring models to the real world. However, such transfers face a challenging “reality gap”: it is difficult for simulators to precisely capture the dynamics and visual properties of the real world.

Domain randomization methods are among the most effective approaches to this issue: a model is incentivized to learn features that are invariant to the shift between the simulation and real-world data distributions. Still, this approach requires task-specific expert knowledge for feature engineering, and the process is usually laborious and time-consuming.

Researchers from UC Berkeley and Carnegie Mellon University have proposed a task-agnostic reinforcement learning (RL) method that reduces the task-specific engineering required for domain randomization of both visual and dynamics parameters. The approach uses only raw observations as inputs and can auto-tune the system parameters of a simulation to match reality.

Auto-Tuned Sim-to-Real Transfer

The researchers have proposed an automatic system identification procedure with the critical insight of reformulating the issue of tuning a simulation as a search problem. They aim to design a Search Param Model (SPM) that updates the system parameters using raw pixel observations of the real world. 

The researchers have demonstrated that the proposed method outperforms domain randomization on a range of robotic control tasks in both sim-to-sim and sim-to-real transfer. A significant advantage of training the agent in simulation is that it gives easy access to state-space information and reward functions that are unobservable in the real world.

Simulator system parameters define the dynamics and visuals of the simulator. Domain randomization samples these parameters from a distribution over system parameters, then trains the policy in simulation to maximize expected reward across the sampled parameter settings.
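The domain randomization objective described above can be sketched as a simple training loop. This is a minimal illustration, not the authors' implementation: `env_factory` and `policy` are hypothetical stand-ins for any parametrized simulator and any RL learner (e.g. SAC or PPO).

```python
import numpy as np

def sample_params(xi_mean, xi_std, rng):
    """Sample one simulator parameter vector (dynamics/visuals) around the mean."""
    return rng.normal(xi_mean, xi_std)

def train_domain_randomization(env_factory, policy, xi_mean, xi_std,
                               n_episodes=1000, seed=0):
    """Train a policy across randomized simulator instances so it learns
    features invariant to the parameter shift between sim and reality."""
    rng = np.random.default_rng(seed)
    returns = []
    for _ in range(n_episodes):
        xi = sample_params(xi_mean, xi_std, rng)   # one draw of system parameters
        env = env_factory(xi)                      # build a simulator with those parameters
        returns.append(policy.train_episode(env))  # any RL update on that instance
    return float(np.mean(returns))                 # average return across draws
```

The key design point is that the policy never sees a single fixed simulator; it must perform well across the whole distribution centered at `xi_mean`.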

Standard practice relies on expert knowledge and time-consuming manual engineering of the environment so that the simulator parameters (ξmean) end up justifiably close to the real-world parameters (ξreal). The selection could be made by comparing trajectories from differently parametrized simulations against the real world, but measuring such trajectories requires state-space information in the real world, which is often impractical to obtain. Therefore, the team used only raw pixel observations of the real world to find simulator parameters and auto-tune their simulator.

The approach aims to automatically find ξmean ≈ ξreal using a function that maps a sequence of observations and actions to their corresponding system parameters. Rather than predicting ξreal directly, the team reframed auto-tuning as a search procedure and proposed the Search Param Model (SPM): a binary classifier that iteratively tunes ξmean closer to ξreal.
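The search procedure can be sketched as follows. This is a hedged illustration under an assumed interface, not the paper's exact algorithm: here `spm(trajectory, xi)` is assumed to return, per parameter dimension, the probability that the real value lies above the candidate `xi` (a binary classification, trainable entirely in simulation where ground-truth parameters are known).

```python
import numpy as np

def auto_tune(spm, real_trajectory, xi_mean, lr=0.1, n_iters=100):
    """Iteratively move the simulator parameter mean toward the real
    parameters using a binary-classifier Search Param Model (SPM).

    spm(trajectory, xi) -> array of shape (n_params,): per-dimension
    probability that the real parameter is HIGHER than candidate xi.
    """
    xi = np.asarray(xi_mean, dtype=float)
    for _ in range(n_iters):
        p_higher = spm(real_trajectory, xi)
        # Step up where the classifier says "real is higher", down otherwise;
        # 2p - 1 maps probabilities [0, 1] to signed steps [-1, +1].
        xi = xi + lr * (2.0 * p_higher - 1.0)
    return xi
```

Because the classifier only answers "higher or lower?", it sidesteps the harder problem of regressing ξreal exactly, which is the variant the authors compare against as a baseline.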


The team performed experiments on six sim-to-sim transfer tasks, four from the DeepMind Control Suite and two robotic arm tasks, to show that SPM can effectively update simulators toward the correct system parameters. It also improves real-world return over sim-to-real transfer with naive domain randomization.

They found that, in all DeepMind Control Suite environments, SPM matched or exceeded two baselines: naive domain randomization, and a variant of the SPM method that directly regresses the simulation parameter values.


SPM similarly succeeded in the cabinet slide task and the rope task. The results show SPM’s ability to adjust the randomization mean toward the real system parameters, ultimately leading to improved transfer success in the real world.