Although they are not as common as 80s sci-fi movies predicted, robots are becoming increasingly integrated into our daily lives. From robotic vacuums that take care of the dust in our houses to humanoids performing challenging parkour, advances in robotics can be seen in many domains. However, despite their increased presence, robots are still expensive to build, often fragile, and, more importantly, difficult to train to adapt to different environments.
It is crucial for a robot to have strong perceptual capabilities, meaning that it can understand its surrounding environment and work around it. Imagine getting a robotic vacuum, and the first instruction you see is to change the layout of your home to resemble the test rooms in the factory so that the robot can operate properly. This would be annoying and would probably cause us to return the product immediately. To overcome such limitations, a robot should be trained to handle environments it has not seen before.
Training a robot in the physical world to expose it to different environments is possible but comes with multiple drawbacks. First, setting up logistics and replacing damaged robots is costly. Second, learning is bound to real time, and parallelizing the training means multiplying both the robots and the logistics costs. Therefore, learning in simulation has grown in popularity and remains an active research topic.
The most important problem to address when training in simulation is generalization from simulation to the real world. In other words, how can we ensure that the physics in the simulation is accurate enough to mimic the real world and that the visuals are close enough to real camera images to pass as photorealistic? To address these questions, Stanford and UC Berkeley researchers have proposed Gibson Environment, a perceptual and physics simulator.
Named after James J. Gibson, the author of The Ecological Approach to Visual Perception, Gibson Environment tries to provide a simulation that mimics both the physical and visual properties of the real world. Gibson can be used to train and test real-world perceptual agents. It is possible to import an arbitrary agent, such as a car or a humanoid, into the simulation, where the agent is embodied in its physical body and placed in a variety of real spaces. Thanks to the novel rendering engine, the agent receives a real-time photorealistic visual stream, as if there were an onboard camera on it. Moreover, the simulation subjects the agent to physical constraints such as collision and gravity, as if it were in the real world.
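To make the idea of an embodied agent concrete, here is a minimal sketch of a simulation loop in which an agent is subject to gravity and floor collision and receives a camera-like observation at every step. This is not Gibson's API: `ToyEnv`, `ToyAgent`, `run_episode`, the one-pixel "camera", and all constants are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2, assumed constant downward acceleration
DT = 0.01        # seconds of simulated time per step (assumption)


@dataclass
class ToyAgent:
    height: float = 1.0    # metres above the floor
    velocity: float = 0.0  # vertical velocity in m/s


class ToyEnv:
    """Hypothetical stand-in for an embodied simulator; not Gibson's API."""

    def __init__(self):
        self.agent = ToyAgent()

    def render(self):
        # Placeholder "onboard camera": a 1x1 frame whose brightness
        # encodes the agent's height, clipped to 1.0.
        return [[min(1.0, self.agent.height)]]

    def step(self, thrust):
        agent = self.agent
        # Physical constraint 1: gravity acts on the agent at every step.
        agent.velocity += (GRAVITY + thrust) * DT
        agent.height += agent.velocity * DT
        # Physical constraint 2: the agent cannot pass through the floor.
        collided = agent.height <= 0.0
        if collided:
            agent.height = 0.0
            agent.velocity = 0.0
        return self.render(), collided


def run_episode(steps=200):
    """Let the agent fall with zero thrust and report where it ends up."""
    env = ToyEnv()
    collided_any = False
    for _ in range(steps):
        _frame, collided = env.step(thrust=0.0)
        collided_any = collided_any or collided
    return env.agent.height, collided_any
```

In a real simulator the observation would be a rendered image of a scanned space and the physics would be handled by a full physics engine. The point of the sketch is only the shape of the loop: act, let physics constrain the body, observe.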
Gibson is designed so that agents trained in simulation keep their performance in the real world. This is achieved by the way the visuals in the simulation environment are constructed. First, the environment is built from scanned real spaces rather than artificial ones. Second, a mechanism is integrated into the simulation that dissolves the differences between Gibson's renderings and real camera captures. Together, these two design choices aim to make images from a real-world camera and from Gibson's renderer statistically indistinguishable for the agent.
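As a rough intuition for what "statistically indistinguishable" can mean, the sketch below aligns the mean and standard deviation of one set of pixel intensities with another. This is a deliberately simple stand-in and not the mechanism used in Gibson, which this summary does not describe in detail. The function name `match_statistics` and the sample values are made up for illustration.

```python
from statistics import mean, pstdev


def match_statistics(source, target):
    """Shift and scale `source` so its mean and std match those of `target`."""
    s_mu, s_sigma = mean(source), pstdev(source)
    t_mu, t_sigma = mean(target), pstdev(target)
    if s_sigma == 0:
        # A constant source carries no contrast to rescale; map it to the target mean.
        return [t_mu for _ in source]
    return [(x - s_mu) / s_sigma * t_sigma + t_mu for x in source]


# Made-up pixel intensities, standing in for a rendered frame and a camera frame.
rendered = [0.10, 0.20, 0.30, 0.40]
captured = [0.30, 0.50, 0.70, 0.90]

aligned = match_statistics(rendered, captured)
```

After alignment, the first two moments of `aligned` agree with those of `captured`. Closing the gap between rendered and real images in practice requires matching far richer statistics than these two numbers, which is why this should be read as an analogy rather than a recipe.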
The researchers showcase a set of active perceptual tasks, such as obstacle avoidance, distant navigation, and stair climbing, that were learned in Gibson and successfully transferred to the real world. Although this is exciting and promising, Gibson still has limitations to address, such as including dynamic content (e.g., moving objects) and allowing manipulation in the simulation, as the authors describe at the end of their research paper.
This article is written as a research summary by Marktechpost Staff based on the research paper 'Gibson Env: Real-World Perception for Embodied Agents'. All credit for this research goes to the researchers on this project.
Ekrem Çetinkaya received his B.Sc. in 2018, and M.Sc. in 2019 from Ozyegin University, Istanbul, Türkiye. He wrote his M.Sc. thesis about image denoising using deep convolutional networks. He received his Ph.D. degree in 2023 from the University of Klagenfurt, Austria, with his dissertation titled "Video Coding Enhancements for HTTP Adaptive Streaming Using Machine Learning." His research interests include deep learning, computer vision, video encoding, and multimedia networking.