Allen Institute for AI Researchers Propose PROCTHOR: A Machine Learning Framework for Procedural Generation of Embodied AI Environments

Computer vision and natural language processing models have grown markedly stronger by training on large-scale data. Recent models such as CLIP, DALL-E, GPT-3, and Flamingo pre-train large neural networks on vast quantities of task-agnostic data and perform remarkably well. In comparison, the Embodied AI (E-AI) research community mainly trains agents in simulators with far fewer scenes. Because the tasks are complex and require long planning horizons, even the highest-performing E-AI models overfit to their limited training scenes and consequently transfer poorly to unseen environments.

Although E-AI simulators have become increasingly powerful in recent years, with support for physics, manipulators, object states, deformable objects, fluids, and real-sim counterparts, scaling them up to tens of thousands of scenes has remained challenging. Existing E-AI environments are either designed by hand or obtained from 3D scans of real-world buildings. The former requires substantial effort from 3D artists, who must build 3D assets, arrange them sensibly within large spaces, and carefully configure the textures and lighting. The latter involves moving specialized cameras through many real-world environments and then stitching the captured images together into 3D reconstructions of the scenes.

Neither technique scales: growing existing scene repositories by orders of magnitude is not feasible. To address this, the researchers present PROCTHOR, a framework built on AI2-THOR for procedurally generating fully interactive, physics-enabled environments for E-AI research. PROCTHOR can generate a broad and diverse range of floorplans that satisfy a given specification. Each floorplan is then populated automatically from a large asset library of 108 object types and 1,633 fully interactable instances, ensuring that object placements are physically feasible, natural, and realistic.
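To make the generate-then-populate idea concrete, here is a minimal Python sketch of one generic way to do it. It is not PROCTHOR's actual algorithm: the recursive-splitting floorplan, the rejection-sampling object placement, the integer-metre grid, and every name in it (Rect, split_floorplan, place_objects) are hypothetical illustrations.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; (x, y) is the lower-left corner, units in metres."""
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other: "Rect") -> bool:
        # Strict inequalities: rectangles that only share an edge do not overlap.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)

    def contains(self, other: "Rect") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


def split_floorplan(width: int, depth: int, n_rooms: int,
                    rng: random.Random) -> list[Rect]:
    """Hypothetical layout step: repeatedly cut the largest room in two.

    Coordinates stay on an integer-metre grid so the rooms tile the outline exactly.
    """
    rooms = [Rect(0, 0, width, depth)]
    while len(rooms) < n_rooms:
        rooms.sort(key=lambda r: r.w * r.h)
        r = rooms.pop()  # largest remaining room
        along_x = r.w >= r.h  # split the longer dimension so rooms stay roughly square
        side = int(r.w if along_x else r.h)
        if side < 2:  # cannot produce two rooms at least 1 m wide
            rooms.append(r)
            break
        lo = max(1, side // 3)
        cut = rng.randint(lo, side - lo)  # each part keeps roughly a third or more
        if along_x:
            rooms += [Rect(r.x, r.y, cut, r.h), Rect(r.x + cut, r.y, r.w - cut, r.h)]
        else:
            rooms += [Rect(r.x, r.y, r.w, cut), Rect(r.x, r.y + cut, r.w, r.h - cut)]
    return rooms


def place_objects(room: Rect, footprints: list[tuple[float, float]],
                  rng: random.Random, max_tries: int = 50) -> list[Rect]:
    """Rejection-sample non-overlapping object footprints inside one room."""
    placed: list[Rect] = []
    for w, h in footprints:
        if w > room.w or h > room.h:
            continue  # this object cannot fit in the room at all; skip it
        for _ in range(max_tries):
            cand = Rect(rng.uniform(room.x, room.x + room.w - w),
                        rng.uniform(room.y, room.y + room.h - h), w, h)
            # Accept only candidates fully inside the room and clear of placed objects.
            if room.contains(cand) and not any(cand.overlaps(p) for p in placed):
                placed.append(cand)
                break
    return placed


# Example: a 10 m x 8 m house split into five rooms, then two objects in one room.
demo_rng = random.Random(0)
demo_rooms = split_floorplan(10, 8, n_rooms=5, rng=demo_rng)
print(demo_rooms)
print(place_objects(demo_rooms[0], [(1.0, 0.6), (0.5, 0.5)], demo_rng))
```

Objects that still collide after `max_tries` attempts are simply dropped, which is the usual trade-off of rejection sampling: it is easy to extend with more constraints (wall clearance, doorway keep-out zones), at the cost of occasionally leaving a room sparser than requested.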

The intensity and hue of the lights in each scene can also be varied to reflect differences in indoor lighting and time of day. Assets and larger structural elements, such as walls and doors, can be assigned different colors and textures drawn from sets of realistic colors and materials for each asset type. Together, the variety of layouts, assets, placements, and lighting yields an arbitrarily large collection of environments, allowing PROCTHOR to scale orders of magnitude beyond the number of scenes supported by current simulators. PROCTHOR also supports dynamic material randomization, in which the colors and materials of specific assets are re-randomized each time an environment is loaded into memory for training.
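Per-scene lighting and material sampling, plus re-randomizing materials on every load, can be sketched as follows. This is an illustrative assumption rather than PROCTHOR's API: the MATERIALS palette, the warm-hue lighting ranges, and the Light and Scene classes are all hypothetical.

```python
import colorsys
import random
from dataclasses import dataclass

# Hypothetical per-asset-type palettes; a real asset library would be far larger.
MATERIALS: dict[str, list[str]] = {
    "Wall": ["PaintedPlaster", "Brick", "Wallpaper"],
    "Door": ["Oak", "Walnut", "WhitePaint"],
    "Sofa": ["GreyFabric", "BlueFabric", "Leather"],
}


@dataclass(frozen=True)
class Light:
    intensity: float                 # relative brightness multiplier
    rgb: tuple[float, float, float]  # light color, each channel in [0, 1]


def sample_light(rng: random.Random) -> Light:
    # Hypothetical ranges: warm hues (red through yellow) at varying brightness,
    # standing in for different bulbs and times of day.
    hue = rng.uniform(0.0, 0.17)
    saturation = rng.uniform(0.0, 0.4)
    return Light(intensity=rng.uniform(0.4, 1.5),
                 rgb=colorsys.hsv_to_rgb(hue, saturation, 1.0))


class Scene:
    """A fixed layout whose materials can optionally be re-sampled on each load."""

    def __init__(self, objects: dict[str, str], seed: int):
        self.objects = objects  # object id -> asset type, e.g. "wall_0" -> "Wall"
        base_rng = random.Random(seed)
        self.light = sample_light(base_rng)            # fixed for this scene
        self.materials = self._sample_materials(base_rng)
        self._dynamic_rng = random.Random(seed + 1)    # only for re-randomizing

    def _sample_materials(self, rng: random.Random) -> dict[str, str]:
        return {oid: rng.choice(MATERIALS[kind]) for oid, kind in self.objects.items()}

    def load(self, randomize_materials: bool = False) -> dict:
        # With dynamic randomization, every load draws fresh materials while the
        # layout and lighting stay the same.
        materials = (self._sample_materials(self._dynamic_rng)
                     if randomize_materials else self.materials)
        return {"objects": self.objects, "light": self.light, "materials": materials}
```

Keeping the dynamic generator separate from the one that built the scene means the scene's baseline appearance stays reproducible from its seed, while training-time loads can still see a different look each episode.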

The researchers also introduce ARCHITECTHOR, a collection of ten high-quality, fully interactive homes designed by 3D artists and intended as a test-only setting for research in domestic environments. ARCHITECTHOR environments are larger, more diverse, and more realistic than the AI2-iTHOR and RoboTHOR scenes. Unlike environments built from 3D scans, PROCTHOR scenes contain fully interactive objects that support multiple object states and can be physically manipulated by agents equipped with robotic arms. To demonstrate PROCTHOR's ease of use and effectiveness, the researchers sample a set of 10,000 houses, PROCTHOR-10K, with layouts ranging from small one-room cottages to larger 10-room houses.
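Building a dataset such as PROCTHOR-10K amounts to sampling many house specifications. One convenient design, assumed here rather than taken from the paper, is to derive each house entirely from an integer seed, so a 10,000-house set can be stored as seeds and regenerated on demand. The uniform 1-to-10 room-count distribution, the room-type rule, and the names HouseSpec, sample_house_spec, and sample_dataset are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class HouseSpec:
    seed: int
    n_rooms: int
    room_types: tuple[str, ...]


def sample_house_spec(seed: int) -> HouseSpec:
    """Hypothetical spec sampler: everything is derived from the seed alone."""
    rng = random.Random(seed)
    n_rooms = rng.randint(1, 10)  # assumption: uniform over 1..10 rooms
    if n_rooms == 1:
        room_types: tuple[str, ...] = ("LivingRoom",)  # a one-room cottage
    else:
        # Hypothetical rule: one kitchen and one living room, the rest bed/bath.
        extra = tuple(rng.choice(("Bedroom", "Bathroom")) for _ in range(n_rooms - 2))
        room_types = ("Kitchen", "LivingRoom") + extra
    return HouseSpec(seed, n_rooms, room_types)


def sample_dataset(n_houses: int, base_seed: int = 0) -> list[HouseSpec]:
    # House i is fully determined by seed base_seed + i, so it can be rebuilt later.
    return [sample_house_spec(base_seed + i) for i in range(n_houses)]
```

For instance, counting `h.n_rooms` over `sample_dataset(10_000)` with `collections.Counter` would show each room count appearing roughly 1,000 times under the uniform assumption; a real generator would likely skew toward the sizes it wants agents to see most.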

Agents trained on PROCTHOR-10K with minimal neural architectures (RGB input only, with no depth sensor, no explicit mapping, and no human task supervision) achieve state-of-the-art results on a range of navigation and interaction benchmarks.

In summary, the contributions are: PROCTHOR, a framework for performant procedural generation of an unbounded number of diverse, fully interactive simulated environments; ARCHITECTHOR, a new set of 3D artist-designed houses for E-AI evaluation; and state-of-the-art results across six E-AI benchmarks covering manipulation and navigation tasks, including strong zero-shot results. An ablation study shows the benefit of scaling the number of training scenes from 10 to 100 to 1K and then to 10K, and suggests that further gains may be possible by using PROCTHOR to generate even larger sets of environments. PROCTHOR will soon be open-sourced, along with the code used in this project. In the meantime, a Google Colab notebook is available for getting started with ProcTHOR-10K.
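A common way to run a scaling ablation like the 10, 100, 1K, and 10K comparison is to draw nested training subsets, so that each larger set contains the smaller ones and differences in results reflect added scenes rather than a different random draw. This summary does not say whether the PROCTHOR ablation used nested subsets; the helper below (nested_subsets) is a hypothetical sketch of that setup.

```python
import random
from collections.abc import Iterable


def nested_subsets(scene_ids: Iterable[int], sizes: Iterable[int],
                   seed: int = 0) -> dict[int, list[int]]:
    """Hypothetical ablation helper: shuffle once, then take prefixes.

    Because every subset is a prefix of the same shuffled order, each smaller
    training set is contained in every larger one.
    """
    order = list(scene_ids)
    random.Random(seed).shuffle(order)
    ordered_sizes = sorted(sizes)
    if ordered_sizes and ordered_sizes[-1] > len(order):
        raise ValueError("requested subset is larger than the scene pool")
    return {n: order[:n] for n in ordered_sizes}


# Example: the four training-set sizes from the ablation, drawn from 10,000 scene ids.
splits = nested_subsets(range(10_000), [10, 100, 1_000, 10_000])
for n, ids in splits.items():
    print(n, ids[:3])  # first few scene ids in each training set
```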

This article is a research summary written by Marktechpost Research Staff based on the research paper 'ProcTHOR: Large-Scale Embodied AI Using Procedural Generation'. All credit for this research goes to the researchers on this project. Check out the paper and project.
