Embodied artificial intelligence (AI) is a branch of AI concerned with systems that can control physical objects in real-world environments. In simple terms, embodied AI enables machines to move through the real world and interact with it physically, much as people do. One example is a robot arm that carries out routine daily tasks. Previous studies, however, have shown that deploying agents trained in simulation to the real world is exceedingly laborious and does not always produce the expected results.
To simplify this process, a team of researchers from the Allen Institute for AI (AI2) introduced a new embodied AI training approach called Phone2Proc. With this lightweight approach, users scan an environment with a phone, and the system procedurally generates targeted training scene variations of that location. Agents trained on these variations are successful and robust in the real environment. The first step is to scan the target area with an iOS app created by the research institute. Using an Apple device such as an iPhone or iPad, users can scan a large apartment in a matter of minutes, and the application produces an environment template as a USDZ file.
The application makes use of Apple’s freely accessible RoomPlan API, which provides a high-level bounding-box template of the environment, including the arrangement of the rooms and the 3D positions of significant objects visible to the camera. While scanning, the software also gives extensive real-time feedback on the scene’s layout to help the user capture a more accurate scan. Once scanning is complete, the generated scene variations are based on the scanned layout and major objects, such as storage, a sofa, a table, a chair, a bed, a refrigerator, a fireplace, a toilet, and stairs, among others. Additional components, such as textures, lighting, and small objects, are added to create greater variance. Notably, the researchers designed the generation process to be extremely fast.
The researchers used five ObjectGoal Navigation (ObjectNav) tasks, in which agents must find an instance of an object in a previously unseen environment. However, their method can be used in a variety of settings and embodied AI applications. The baseline model, ProcTHOR, generates and populates scenes starting from a high-level room specification, such as a 3-bedroom house with a kitchen and living area. Phone2Proc, in contrast, generates scenes from the scan of the real-world environment and then produces variations of that scene. The process consists of six steps. The first four are parsing the environment template, creating the scene layout, selecting items from the asset library that correspond to the scanned semantic categories, and resolving object collisions. The final two are populating the scene with small objects that were not captured by the scan and assigning materials and lighting.
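The six steps described above can be illustrated with a small Python sketch. This is not the researchers' implementation: the template format, the `Box` class, the asset, material, and small-object names, and the `generate_variation` function are all hypothetical, and the collision check is a simple 2D box-overlap test chosen for illustration.

```python
import random
from dataclasses import dataclass

@dataclass
class Box:
    """Hypothetical stand-in for one scanned object: category plus a 2D footprint."""
    category: str
    x: float      # footprint center
    z: float
    width: float  # extent along x
    depth: float  # extent along z

# Hypothetical asset library: several 3D models per scanned semantic category.
ASSET_LIBRARY = {
    "sofa": ["sofa_a", "sofa_b", "sofa_c"],
    "table": ["table_a", "table_b"],
    "bed": ["bed_a", "bed_b"],
}
MATERIALS = ["wood", "tile", "carpet"]   # assumed names
SMALL_OBJECTS = ["mug", "book", "plant"]  # assumed names

def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned footprints intersect."""
    return (abs(a.x - b.x) * 2 < a.width + b.width
            and abs(a.z - b.z) * 2 < a.depth + b.depth)

def generate_variation(template: dict, seed: int) -> dict:
    """Produce one scene variation from a scanned template (illustrative only)."""
    rng = random.Random(seed)

    # Steps 1-2: parse the template and keep the scanned room layout fixed.
    rooms = list(template["rooms"])

    placed, objects = [], []
    for box in template["objects"]:
        # Step 4: skip an object whose footprint collides with one already placed.
        if any(overlaps(box, other) for other in placed):
            continue
        # Step 3: sample an asset matching the scanned semantic category.
        asset = rng.choice(ASSET_LIBRARY[box.category])
        placed.append(box)
        objects.append({"asset": asset, "x": box.x, "z": box.z})

    # Step 5: add small objects that a scan would not have captured.
    clutter = [rng.choice(SMALL_OBJECTS) for _ in range(3)]

    # Step 6: randomize materials and lighting.
    return {
        "rooms": rooms,
        "objects": objects,
        "clutter": clutter,
        "floor_material": rng.choice(MATERIALS),
        "light_intensity": rng.uniform(0.5, 1.5),
    }
```

In this sketch the layout and the positions of the large objects stay tied to the scan, while the seed controls everything that varies (asset choice, clutter, materials, lighting), so calling `generate_variation` with different seeds yields different training scenes for the same location.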
To evaluate their approach, the researchers ran multiple experiments comparing Phone2Proc with the ProcTHOR baseline in a number of real-world settings, including a 6-room apartment, a 3-room apartment, and a conference room. Phone2Proc outperforms the ProcTHOR baseline in every real-world scenario. In terms of numbers, the method created by the AI2 researchers has a success rate of 70.7%, compared to the baseline’s 34.7%. The researchers also conducted several experiments showing that Phone2Proc is resilient to various types of scene disturbance and environmental dynamism. These include cluttered spaces, the movement of people or objects within the room, changes in lighting, and even the movement of the target objects.
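To put the two reported success rates in perspective, the short Python sketch below works out the gap between them. The `success_rate` helper is a hypothetical illustration of how such a percentage is typically computed from per-episode outcomes; only the 70.7% and 34.7% figures come from the reported results.

```python
def success_rate(outcomes):
    """Percentage of episodes in which the agent found the target object."""
    if not outcomes:
        raise ValueError("at least one episode is required")
    return 100.0 * sum(outcomes) / len(outcomes)

# Reported success rates (percent).
phone2proc = 70.7
procthor = 34.7

absolute_gain = phone2proc - procthor   # difference in percentage points
relative_factor = phone2proc / procthor  # how many times higher

print(f"Absolute gain: {absolute_gain:.1f} percentage points")
print(f"Relative factor: {relative_factor:.2f}x")
```

The difference is 36.0 percentage points, which means Phone2Proc succeeds roughly twice as often as the baseline.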
Khushboo Gupta is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Goa. She is passionate about Machine Learning, Natural Language Processing, and Web Development, and enjoys deepening her technical knowledge by taking part in challenges.