The Allen Institute for AI (AI2) has recently announced the 2.7.0 release of AI2-THOR, an open-source interactive environment for training and testing embodied AI. Version 2.7.0 contains several performance enhancements that can dramatically reduce training time. It improves the IPC system between Unity and Python and changes the serialization/deserialization format. It also adds new actions that give finer control over the metadata that is returned.
Unity is a leading real-time development platform, and AI2-THOR uses the Unity game engine to simulate the environments created by AI2. Python communicates with Unity through a server: the Unity Player process connects to this server to convey the environment’s state to the Python process.
In the latest version of AI2-THOR, the FIFOServer backend replaces the WSGIServer/JSON backend, which significantly improves performance.
Significance of FIFOServer
Unity communicates camera outputs such as depth, RGB, and segmentation frames through a server that is launched after the AI2-THOR controller is initialized. Once an action is finished, a Unity component collects the RGB frame from the camera along with metadata on each object and agent within the scene. With the earlier WSGIServer backend, the metadata was then serialized to JSON, and the entire payload was sent to Python over HTTP.
However, performance analysis showed that JSON serialization/deserialization and socket I/O were significant bottlenecks. The serialization format was therefore switched from JSON to MessagePack (MsgPack), and the WSGIServer was replaced by a named-pipe server with a purpose-built protocol for handling the payload.
MessagePack is an efficient binary serialization format. Like JSON, it lets data be exchanged between multiple languages, here Python and C# (Unity), but it is faster and more compact, and it has robust libraries. The team states that using MsgPack reduced the size of the serialized metadata by 50%, serialization time by 40%, and deserialization time by 60%. Named pipes were chosen primarily for speed: with small payloads (< 128 bytes), throughput reached 100k messages per second (~10 μs per message). Overall, switching to the FIFOServer produced a 1.5x to 2x increase in FPS (frames per second).
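To make the idea of a named pipe with a purpose-built protocol more concrete, here is a minimal sketch of length-prefixed messages sent over a POSIX FIFO. It is not AI2-THOR's actual FIFOServer protocol: the 4-byte length header, the function names, and the pipe name `thor_fifo` are assumptions made for illustration, and the payload is encoded as JSON only so that the example needs nothing beyond the standard library (AI2-THOR itself uses MessagePack). It relies on `os.mkfifo`, so it runs on Linux and macOS but not on Windows.

```python
import json
import os
import struct
import tempfile
import threading

# Hypothetical framing: a 4-byte little-endian length, then the payload bytes.
HEADER = struct.Struct("<I")


def write_message(pipe, payload: bytes) -> None:
    """Write one length-prefixed message to an open binary pipe."""
    pipe.write(HEADER.pack(len(payload)))
    pipe.write(payload)
    pipe.flush()


def read_exact(pipe, size: int) -> bytes:
    """Read exactly `size` bytes, or raise EOFError if the writer closed early."""
    chunks = []
    while size > 0:
        chunk = pipe.read(size)
        if not chunk:
            raise EOFError("pipe closed before the full message arrived")
        chunks.append(chunk)
        size -= len(chunk)
    return b"".join(chunks)


def read_message(pipe) -> bytes:
    """Read one length-prefixed message from an open binary pipe."""
    (length,) = HEADER.unpack(read_exact(pipe, HEADER.size))
    return read_exact(pipe, length)


def round_trip(metadata: dict) -> dict:
    """Send `metadata` through a temporary FIFO and return what the reader decodes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "thor_fifo")  # hypothetical pipe name
        os.mkfifo(path)

        def producer() -> None:
            # Stands in for the Unity side; opening blocks until a reader connects.
            with open(path, "wb") as pipe:
                write_message(pipe, json.dumps(metadata).encode("utf-8"))

        writer = threading.Thread(target=producer)
        writer.start()
        with open(path, "rb") as pipe:  # stands in for the Python controller
            received = json.loads(read_message(pipe).decode("utf-8"))
        writer.join()
        return received


if __name__ == "__main__":
    print(round_trip({"lastActionSuccess": True, "objects": []}))
```

The length prefix tells the reader exactly how many bytes to expect, so it does not have to parse HTTP headers or scan for delimiters, which is one plausible reason a purpose-built pipe protocol can be cheaper than a general web server for small messages.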
Each action performed in AI2-THOR generates a large amount of metadata about the scene. Information about every object is collected, serialized, and sent over the pipe to the Python controller for AI2-THOR.
A new action called SetObjectFilter has been added for tasks (such as PointNav or ObjectNav) where one cares about only zero or one object in the scene. It restricts the metadata to the explicitly specified objects. The team observed a 50% increase in FPS when using this filter, although the gain will vary with the number of objects in the scene and their actions.
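The effect of an object filter on the metadata payload can be illustrated with a small stand-alone sketch. It does not call AI2-THOR; `filter_object_metadata`, the object IDs, and the metadata fields shown are hypothetical and only mimic the idea of keeping the explicitly requested objects before serialization.

```python
import json
from typing import Dict, Iterable, List


def filter_object_metadata(objects: List[Dict], object_ids: Iterable[str]) -> List[Dict]:
    """Keep only the objects whose ID was explicitly requested."""
    wanted = set(object_ids)
    return [obj for obj in objects if obj["objectId"] in wanted]


# Hypothetical scene metadata with three objects.
scene_objects = [
    {"objectId": "Mug|1", "objectType": "Mug", "visible": True},
    {"objectId": "Chair|2", "objectType": "Chair", "visible": False},
    {"objectId": "Table|3", "objectType": "Table", "visible": True},
]

# An ObjectNav-style task that only cares about the mug.
filtered = filter_object_metadata(scene_objects, ["Mug|1"])

full_size = len(json.dumps(scene_objects))
filtered_size = len(json.dumps(filtered))
print(f"full payload: {full_size} bytes, filtered payload: {filtered_size} bytes")
```

In a real scene with many objects, less data has to be collected, serialized, and sent per step, which is consistent with the article's note that the gain depends on how many objects the scene contains.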
The FastActionEmit feature
A common type of action in AI2-THOR retrieves state from the environment without manipulating it. GetReachablePositions is one such action: it queries the environment for all the locations in a scene that an agent can move to. Since the action does not change anything, it should not require regenerated metadata or a fully rendered frame. Previously, this type of action was much slower than it needed to be. With the FastActionEmit feature, only a small metadata patch is sent to the Python process instead of an RGB frame plus the full metadata payload.
In the case of GetReachablePositions, the researchers observed a 2.5x increase in FPS by using FastActionEmit. The FastActionEmit feature is enabled by default in the latest 2.7.0 version of AI2-THOR.
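A rough size comparison shows why skipping the frame matters for query-only actions. The sketch below assumes an uncompressed 8-bit RGB frame at 400×300 and a hypothetical metadata patch holding a single reachable position; the patch's field names and contents are illustrative, not AI2-THOR's actual wire format.

```python
import json


def rgb_frame_bytes(width: int, height: int, channels: int = 3, bytes_per_channel: int = 1) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_channel


# Assumption: 8-bit RGB at 400x300.
frame_bytes = rgb_frame_bytes(400, 300)

# Hypothetical patch for a query-only action: just the result and a status flag.
patch = {"lastActionSuccess": True, "actionReturn": [{"x": 0.25, "y": 0.9, "z": -1.5}]}
patch_bytes = len(json.dumps(patch).encode("utf-8"))

print(f"frame: {frame_bytes} bytes, patch: {patch_bytes} bytes")
```

Even if the patch listed many positions, it would likely stay far below the 360,000 bytes of a single uncompressed frame under these assumptions.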
AllenAct is a modular learning framework that uses AI2-THOR for many different tasks, including PointNav and ObjectNav. The team used the ObjectNav implementation in AllenAct to benchmark the new release of AI2-THOR. The researchers used a single AWS p2.8xlarge machine with 8 GPUs and 16 CPU cores, running 60 instances of AI2-THOR at a frame resolution of 400×300. With the new FIFOServer, the team saw a dramatic speed increase to 600 fps, a 2.7x speedup, enabling them to reach an SPL of 0.15 in 9 hours of training.
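The benchmark figures can be unpacked with some quick arithmetic. This sketch assumes that the 600 fps figure is the aggregate across all 60 instances and that it was sustained for the whole 9 hours; neither assumption is stated in the announcement, so the derived numbers are only rough estimates.

```python
NEW_FPS = 600        # reported throughput with the FIFOServer
SPEEDUP = 2.7        # reported speedup over the previous backend
INSTANCES = 60       # AI2-THOR instances on the machine
TRAINING_HOURS = 9   # reported time to reach an SPL of 0.15

# Implied throughput before the upgrade.
baseline_fps = NEW_FPS / SPEEDUP

# Average throughput per instance, assuming 600 fps is the aggregate.
fps_per_instance = NEW_FPS / INSTANCES

# Upper bound on frames processed, assuming the rate was sustained throughout.
frames_in_training = NEW_FPS * TRAINING_HOURS * 3600

print(f"baseline: about {baseline_fps:.0f} fps")
print(f"per instance: {fps_per_instance:.0f} fps")
print(f"frames in {TRAINING_HOURS} hours: {frames_in_training:,}")
```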