DeepMind Introduces ‘RGB-Stacking’: A Reinforcement Learning Based Approach For Tackling Robotic Stacking of Diverse Shapes

For many people, stacking one object on top of another seems like a simple job. Even the most advanced robots, however, struggle with it. Stacking requires a range of motor, perceptual, and analytical abilities, as well as the ability to interact with a variety of objects. Because of this complexity, a task trivial for humans has been elevated to a “grand problem” in robotics, spawning a small industry dedicated to creating new techniques and approaches.

DeepMind researchers believe that advancing the state of the art in robotic stacking requires a new benchmark. As part of DeepMind’s goal, and as a step toward more generalizable and useful robots, the researchers are investigating ways to let robots better understand the interactions of objects with varied geometries. In a paper to be presented at the Conference on Robot Learning (CoRL 2021), the DeepMind research team introduces RGB-Stacking, a new benchmark for vision-based robotic manipulation that challenges a robot to learn how to grasp various objects and balance them on top of one another. While benchmarks for stacking tasks already exist in the literature, the researchers argue that the range of objects used and the evaluations performed to validate their findings make their work distinct. According to the researchers, the results show that a mix of simulation and real-world data can be used to learn “multi-object manipulation,” providing a solid foundation for the open problem of generalizing to novel objects.

The objective of RGB-Stacking is to teach a robotic arm to stack objects of various shapes using reinforcement learning, a machine learning approach that allows a system (in this case, a robot) to learn through trial and error, receiving feedback from its actions and experiences. RGB-Stacking positions a gripper attached to a robot arm above a basket containing three objects, one red, one green, and one blue (hence the name RGB). The robot must stack the red object on top of the blue object within 20 seconds, while the green object acts as an obstacle and a distraction.
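The trial-and-error loop described above can be sketched in a few lines. The environment below is a deliberately toy stand-in: the class name, observation fields, sparse reward, and step limit are all illustrative assumptions, not DeepMind’s released API.

```python
import random

class StackingEnvSketch:
    """Toy stand-in for a vision-based stacking environment.

    All names and the reward scheme are illustrative assumptions, not
    DeepMind's released interface. An episode ends after max_steps
    (standing in for the 20-second time limit) or on a successful stack.
    """

    def __init__(self, max_steps=200, success_prob=0.01):
        self.max_steps = max_steps
        self.success_prob = success_prob  # chance a random action stacks red on blue
        self.steps = 0

    def reset(self):
        self.steps = 0
        return {"camera": None, "proprio": None}  # placeholder observation

    def step(self, action):
        self.steps += 1
        stacked = random.random() < self.success_prob
        reward = 1.0 if stacked else 0.0  # sparse reward: red stacked on blue
        done = stacked or self.steps >= self.max_steps
        return {"camera": None, "proprio": None}, reward, done

# Standard trial-and-error loop: act, observe the reward, repeat.
env = StackingEnvSketch()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = [random.uniform(-1, 1) for _ in range(4)]  # e.g. x, y, z, gripper
    obs, reward, done = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```

In a real setup, the random action would be replaced by a learned policy that maps camera and proprioceptive observations to arm commands.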


According to DeepMind researchers, the learning method ensures that a robot develops general skills by training on numerous object sets. RGB-Stacking deliberately varies the grasp and stack characteristics that determine how a robot can grasp and stack each object, forcing the robot to engage in behaviors more complex than a basic pick-and-place strategy.
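One way to picture “deliberately varying” those characteristics is to sample each training object from a small space of shape parameters. The parameter names below are assumptions for illustration, loosely inspired by the idea of parametrically deformed objects; they are not the actual RGB-object parameterization.

```python
import random

def sample_object_params(rng):
    """Sample illustrative shape parameters that change how an object
    can be grasped and stacked. The parameter names are assumptions,
    not the actual RGB-object parameterization."""
    return {
        "num_sides": rng.choice([3, 4, 5, 6, 8]),  # cross-section polygon
        "taper": rng.uniform(0.0, 0.5),            # narrows the top face
        "aspect_ratio": rng.uniform(0.5, 2.0),     # tall vs. flat
        "size_cm": rng.uniform(4.0, 8.0),
    }

rng = random.Random(42)
training_set = [sample_object_params(rng) for _ in range(5)]
for params in training_set:
    print(params)
```

Training across many such samples is what prevents the robot from overfitting to a single pick-and-place recipe.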

Each triplet presents the agent with its own set of challenges: Triplet 1 requires a precise grasp of the top object; Triplet 2 frequently requires using the top object as a tool to flip the bottom object before stacking; Triplet 3 requires balancing; Triplet 4 requires precision stacking (the object centroids must align); and Triplet 5’s top object can easily roll off if not stacked gently. To gauge the difficulty of the task, the researchers found that a hand-coded scripted baseline achieved a 51 percent stacking success rate.
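A success rate like the 51 percent figure is simply the fraction of trials in which the stack succeeds. A minimal sketch of that measurement, with a dummy episode standing in for actually running the scripted controller on the robot:

```python
import random

def estimate_success_rate(run_episode, n_trials=1000, seed=0):
    """Estimate a policy's stacking success rate by averaging binary
    episode outcomes over many trials."""
    rng = random.Random(seed)
    successes = sum(run_episode(rng) for _ in range(n_trials))
    return successes / n_trials

# Dummy episode: stands in for executing the scripted controller on a
# sampled triplet; returns 1 on a successful stack, 0 otherwise. The
# 0.51 probability mirrors the reported baseline for illustration.
def dummy_scripted_episode(rng, p_success=0.51):
    return 1 if rng.random() < p_success else 0

rate = estimate_success_rate(dummy_scripted_episode)
print(f"estimated success rate: {rate:.2%}")
```

With 1,000 trials, the estimate lands close to the true rate; in practice the trial count trades off evaluation time on the real robot against statistical confidence.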

The researchers said that their RGB-Stacking benchmark comprises two task versions of varying difficulty. In ‘Skill Mastery,’ the aim is to train a single agent that can stack a specified set of five triplets. In ‘Skill Generalization,’ the same triplets are used for evaluation, but the agent is trained on a large collection of training objects with over a million possible triplets; to test for generalization, these training objects exclude the family of objects from which the test triplets were chosen. In both versions, the learning pipeline is decoupled into three steps.
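The holdout idea behind ‘Skill Generalization’, excluding the entire object family used for evaluation from the training pool, can be sketched as follows. The family names and objects here are placeholders, not the actual RGB-object families.

```python
from itertools import product

def make_training_triplets(objects_by_family, test_families):
    """Build ordered training triplets while holding out the object
    families reserved for the test triplets, mirroring a
    train/test split at the family level. Names are illustrative."""
    train_objects = [
        obj
        for family, objs in objects_by_family.items()
        if family not in test_families
        for obj in objs
    ]
    # Every ordered combination of three distinct training objects.
    return [
        (a, b, c)
        for a, b, c in product(train_objects, repeat=3)
        if len({a, b, c}) == 3
    ]

objects = {
    "family_a": ["a1", "a2"],
    "family_b": ["b1", "b2"],
    "family_test": ["t1", "t2"],  # reserved for evaluation triplets
}
triplets = make_training_triplets(objects, test_families={"family_test"})
print(len(triplets))  # 4 training objects -> 4 * 3 * 2 = 24 ordered triplets
```

Because the test family never appears in training, a high evaluation score can only come from skills that transfer to genuinely unseen shapes.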

According to the researchers, their RGB-Stacking techniques produce “surprising” stacking strategies and “mastery” of stacking a subset of objects. Even so, they admit that they have only scratched the surface of what is possible and that the generalization problem remains unsolved.

“As researchers continue to work on solving the open challenge of true generalization in robotics, we hope that this new benchmark, along with the environment, designs, and tools we’ve released, contribute to new ideas and methods that can make manipulation even easier and robots more capable,” the researchers concluded.

To help other researchers, DeepMind is also open-sourcing a version of their simulated environment, the blueprints for building the real-robot RGB-Stacking environment, and the RGB-object models and information needed to 3D-print them. They are also releasing a variety of libraries and tools used in their robotics research.




Prathamesh Ingle is a Mechanical Engineer who works as a Data Analyst. He is also an AI practitioner and certified Data Scientist with an interest in applications of AI. He is enthusiastic about exploring new technologies and advancements and their real-life applications.
