Perceiving and manipulating a wide array of objects is part of our daily activities. An alarm clock looks round and glossy, cutlery clinks when struck with a fork, and the fork feels sharp when its edge is touched. Each object has distinct physical characteristics, including its 3D shape, appearance, and material type, which give it a unique sensory signature.
Computer vision has traditionally focused on recognizing and locating objects in static images, so objects are frequently modeled in 2D. Prior work on shape modeling provides either low-quality visual textures or purely geometric 3D CAD models rather than realistic ones. Additionally, most work concentrates on a single modality, usually vision, and does not cover the full range of physical object attributes. As a result, past modeling of real-world objects has been quite limited and unrealistic.
Researchers from Stanford University and Carnegie Mellon University recently introduced OBJECTFOLDER 2.0, a large-scale dataset of implicitly represented multisensory replicas of real objects. It contains 1,000 high-quality 3D objects collected from online repositories. Compared to OBJECTFOLDER 1.0, which renders slowly and has poor multisensory simulation quality, the new version improves the acoustic and tactile simulation pipelines to produce more realistic multisensory data.
The team's aim is to build a large dataset of realistic, multimodal 3D object models so that learning with these virtualized objects generalizes to their physical counterparts. The researchers used recent, high-quality scans of real objects to extract their physical properties, such as visual textures, material compositions, and 3D shapes. They then simulated visual, auditory, and tactile data for each object according to its intrinsic properties and encoded the simulated multisensory data with an implicit neural representation network. If the sensory data is accurate enough, models trained on these virtualized objects can subsequently be applied to real-world tasks involving the same objects.
The researchers also introduced a new implicit neural representation network that renders visual, auditory, and tactile sensory data in real time with state-of-the-art rendering quality. This allowed the team to transfer models learned on the virtualized objects to three challenging real-world tasks: object scale estimation, contact localization, and shape reconstruction. OBJECTFOLDER 2.0 also enables numerous applications, such as multisensory learning with vision, audio, and touch, on-policy reinforcement learning, and robot grasping of diverse real objects on multiple robotic platforms.
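The paper's actual network architecture is not detailed here, but the core idea of an implicit neural representation is a network that maps query coordinates to sensory values, rather than storing those values on an explicit mesh or grid. Below is a minimal NumPy sketch of that idea, using a frequency (sin/cos) positional encoding followed by a small MLP with untrained random weights; all names and dimensions are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def positional_encoding(x, num_freqs=4):
    """Map each coordinate to sin/cos features at multiple frequencies,
    a common ingredient of coordinate-based implicit representations."""
    freqs = 2.0 ** np.arange(num_freqs)          # (F,)
    angles = x[..., None] * freqs                # (..., D, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(*x.shape[:-1], -1)        # (..., D * 2F)

class ImplicitField:
    """Toy coordinate MLP: 3D query point -> one scalar sensory value.
    Weights are random; a real network would be trained on the
    simulated visual/audio/tactile data for each object."""
    def __init__(self, in_dim=3, hidden=32, out_dim=1, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        enc_dim = in_dim * 2 * num_freqs
        self.w1 = rng.normal(0.0, 0.1, (enc_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, out_dim))
        self.b2 = np.zeros(out_dim)
        self.num_freqs = num_freqs

    def __call__(self, pts):
        # Encode coordinates, then apply a two-layer ReLU MLP.
        h = np.maximum(positional_encoding(pts, self.num_freqs) @ self.w1 + self.b1, 0.0)
        return h @ self.w2 + self.b2

field = ImplicitField()
query = np.random.default_rng(1).uniform(-1, 1, (5, 3))  # 5 query points on/near the object
values = field(query)
print(values.shape)  # one predicted value per query point
```

Because the object is represented by network weights rather than dense sensory data, queries can be evaluated at arbitrary resolution, which is part of what makes such representations fast and compact to render from.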
OBJECTFOLDER 2.0 aims to advance multimodal learning in computer vision and robotics by providing a dataset of 1,000 objects in the form of implicit neural representations. The dataset is ten times larger than previous efforts and renders orders of magnitude faster, and the quality and realism of the multisensory data are considerably improved. On three benchmark tasks, the researchers demonstrated that models trained on the virtualized objects successfully transfer to their real-world counterparts. The team believes OBJECTFOLDER 2.0 offers a promising path for multimodal object-centric learning in computer vision and robotics and is excited about the research it will enable.
This article is a summary written by Marktechpost Staff based on the research paper 'OBJECTFOLDER 2.0: A Multisensory Object Dataset for Sim2Real Transfer'. All credit for this research goes to the researchers on this project. Check out the paper, GitHub link, and project page.
Nitish is a computer science undergraduate with a keen interest in deep learning. He has worked on various deep learning projects and closely follows new advancements in the field.