Robots are becoming increasingly prevalent in our daily lives, from automated vacuum cleaners to drones delivering packages. As technology advances, they are handling more and more complex tasks, including ones that only humans could perform until recently.
One such task is grasping objects in dynamic and unpredictable environments, such as picking a cherry from a tree. The branch is not stable, the wind is unpredictable, and the cherry is a tiny target for a robot. This is extremely challenging because robots usually operate in environments with rigid-surface support, such as a factory where known objects arrive on a steady conveyor belt.
Fine manipulation of small objects is a challenging task for robots due to perception errors, sensor noise, and the inherently dynamic nature of the problem. On the other hand, it is a ubiquitous task in many fields, including manufacturing, healthcare, and agriculture, and automating it could have immense practical and economic value.
When a robot is built for a predetermined task, like those used on factory assembly lines, it is often possible to design hardware specifically for that task. By analyzing the assembly process and the necessary tools, engineers can develop a robot design that solves the problem at hand efficiently. This approach works because the robot is not intended to be used in other factories, and the objects it interacts with will not change within the factory environment. However, the story changes when we want a universal solution.
Assume that we need to develop a robot that can grasp objects in many different environments, where both the environment and the objects are dynamic. Is it still possible to build a robot that can grasp objects precisely without stable support? This is the question the authors asked, and their answer is CherryBot.
CherryBot is a dynamic system for fine manipulation that learns its behavior by pre-training in an approximate simulation and then fine-tuning with model-free reinforcement learning (RL) in the real world. It is designed to be precise enough to complete the task while remaining robust to perception errors and sensor noise. It can also handle dynamic scenarios such as changing environments and moving objects, and it generalizes to objects of different sizes, shapes, and textures without requiring task-specific hardware.
CherryBot leverages imperfect information that is available on most robots, such as an inaccurate simulator and a heuristic-based baseline policy, to bootstrap RL training, which makes real-world manipulation learning surprisingly sample-efficient. The training tasks are themselves made appropriately dynamic, which minimizes human effort during training and yields significantly more robust policies. The action space is designed to balance the tractability of learning against reactiveness. Finally, the system accommodates plug-and-play perception modules and adapts to different objects and scenarios.
CherryBot uses generic hardware: an assembled robot arm and a pair of chopsticks. That’s it. The chopsticks are used for fine manipulation, and the robot arm is not perfect either, as it can return inaccurate sensor readings from time to time. Despite these drawbacks, CherryBot demonstrates superhuman reactiveness on dynamic, high-precision tasks – using chopsticks to grasp a slippery ball swinging in the air – after only 30 minutes of interaction in the real world.
Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.