Researchers From Allen Institute for AI Release AI2-THOR 3.0, an Embodied AI Framework for Visual Object Manipulation in Robotics Testing Scenarios


The Allen Institute for AI (AI2) has announced the release of AI2-THOR 3.0, an embodied AI framework. Embodied AI is a subfield of artificial intelligence at the intersection of robotics, computer vision, and natural language processing, and it is an emerging area of interest for researchers.

In recent years, embodied AI has seen impressive progress, especially in agents that navigate their environments. This progress gives researchers a foundation for studying tasks that require agents to actively interact with the objects around them.

Object manipulation, a subdomain of robotics, poses challenges such as manipulator motion, grasping, long-horizon planning, manipulation by mobile agents, and generalization to unseen environments and objects.

AI2-THOR is the first testing framework to address object manipulation in more than a hundred visually rich, physics-enabled rooms. The framework already offered an impressive set of testing environments, and this release adds active object manipulation to it. To mark AI2-THOR's fifth anniversary, the release introduces ManipulaTHOR, an extension to the AI2-THOR framework that adds arms to its agents, allowing them not only to navigate their environment but also to manipulate the objects within it.

Figure 1: Arm design and kinematic constraints.

ManipulaTHOR is a first-of-its-kind virtual agent: a highly articulated robotic arm with three joints connecting limbs of equal length. It is composed entirely of swivel joints, which gives it a more human-like approach to object manipulation. Because it runs in simulation, ManipulaTHOR allows manipulation models to be trained much faster, and in more complex environments, than current real-world training methods. It is also safe to use and more cost-effective.
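To give a feel for what "three joints of equal limb length" means geometrically, here is a minimal sketch of forward kinematics for a planar three-link arm. It is a simplified illustration only, not ManipulaTHOR's actual kinematic model or the AI2-THOR API: the function name `forward_kinematics`, the unit link length, and the restriction to two dimensions are all assumptions made for the example.

```python
import math

def forward_kinematics(joint_angles, link_length=1.0):
    """Return the (x, y) position of the base, each joint, and the arm tip.

    Hypothetical planar model: every link has the same length, and each
    joint angle (in radians) is measured relative to the previous link.
    """
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]  # the arm base sits at the origin
    for angle in joint_angles:
        heading += angle  # rotations accumulate along the chain
        x += link_length * math.cos(heading)
        y += link_length * math.sin(heading)
        points.append((x, y))
    return points

# Fully extended arm: all three links lie along the x-axis.
straight = forward_kinematics([0.0, 0.0, 0.0])

# Each joint bent by 90 degrees relative to the previous link.
bent = forward_kinematics([math.pi / 2, math.pi / 2, math.pi / 2])

print("tip when straight:", straight[-1])
print("tip when bent:    ", bent[-1])
```

With equal link lengths, the reachable workspace of the tip is determined by the joint angles alone, which is one reason such a design is convenient to reason about in simulation.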

“Imagine a robot being able to navigate a kitchen, open a refrigerator and pull out a can of soda. This is one of the greatest and yet often overlooked challenges in robotics, and AI2-THOR is the first to design a benchmark for the task of moving objects to various locations in virtual rooms, enabling reproducibility and measuring progress,” says Dr. Oren Etzioni, CEO at AI2. “After five years of hard work, we can now begin to train robots to perceive and navigate the world more like we do, making real-world usage models more attainable than ever before.” 

The visual reasoning aspect of object manipulation has been one of the biggest challenges that robotics researchers face. It is difficult for robots to perceive, navigate, act correctly, and interact with objects in the outside world. AI2-THOR addresses this problem with complex simulated testing environments that can be used to train robots for eventual activities in the real world.

As a first step towards generalizable object manipulation, the researchers propose the ARMPOINTNAV task, in which the agent moves through the scene to an object, picks it up, and moves it to a desired location. The end-to-end ARMPOINTNAV model gives strong baseline results and demonstrates the ability to navigate and move objects within the environment. These results provide a strong foundation for learning generalizable object manipulation models.
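The three phases of the task (reach the object, pick it up, carry it to the target) can be sketched as a toy episode on a 2D grid. This is a hedged illustration with a hand-scripted policy, not the learned end-to-end ARMPOINTNAV model and not the AI2-THOR API; the `Episode` class, `step_towards`, `run_episode`, and the grid coordinates are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Episode:
    agent: tuple          # (x, y) grid cell of the agent
    obj: tuple            # (x, y) grid cell of the object
    goal: tuple           # (x, y) grid cell where the object should end up
    holding: bool = False

def step_towards(src, dst):
    """Move one grid cell from src towards dst (x first, then y)."""
    x, y = src
    if x != dst[0]:
        x += 1 if dst[0] > x else -1
    elif y != dst[1]:
        y += 1 if dst[1] > y else -1
    return (x, y)

def run_episode(ep, max_steps=100):
    """Scripted policy: reach the object, pick it up, carry it to the goal.

    Returns (success, number_of_actions_taken).
    """
    for steps in range(max_steps):
        if ep.holding and ep.agent == ep.goal:
            ep.holding = False          # release the object at the goal
            ep.obj = ep.goal
            return True, steps
        if not ep.holding and ep.agent == ep.obj:
            ep.holding = True           # picking up costs one action
            continue
        target = ep.goal if ep.holding else ep.obj
        ep.agent = step_towards(ep.agent, target)
        if ep.holding:
            ep.obj = ep.agent           # a held object moves with the agent
    return False, max_steps

episode = Episode(agent=(0, 0), obj=(2, 1), goal=(0, 3))
success, actions = run_episode(episode)
print("success:", success, "actions:", actions)
```

In the real benchmark the agent must infer these actions from visual observations in a physics-enabled room; the scripted version above only shows how an episode is structured and when it counts as a success.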

Figure 2: Qualitative results of ARMPOINTNAV model

AI2-THOR enables researchers to devise solutions that efficiently address object manipulation and other traditional problems associated with robotics testing. It has already supported research in many areas, including navigation, instruction following, multi-agent collaboration, household tasks, and reasoning.



