Apple’s Machine Learning Team Introduces ‘Hypersim’: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding


The computer vision community has long sought ways for machines to understand the complexity of indoor scenes. Because per-pixel ground truth labels are hard to obtain from real images, researchers have created photorealistic synthetic datasets and interactive simulation environments, which have driven rapid progress toward a holistic understanding of these surroundings.

However, existing synthetic datasets and simulators have limitations that make them fall short. These limitations include:

  • First, most synthetic datasets are generated from 3D assets that are not publicly available. They typically ship rendered images but not the underlying 3D assets used during rendering, such as triangle meshes or other geometry types. This limits their use for geometric learning problems that need that information.
  • Second, not all synthetic datasets and simulators provide semantic segmentations. Where segmentation is included, it may lack semantic labels and may group pixels into low-level parts rather than semantically meaningful objects. Such segmentations also give no indication of which groups of pixels matter more than others.
  • Third, most datasets and simulators do not factor images into disentangled lighting and shading components, making them unsuitable for inverse rendering problems. No existing synthetic dataset or simulator addresses all of these limitations, including those developed for outdoor scene understanding.
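To make the third limitation concrete, here is a minimal sketch of what a disentangled factorization can look like. It assumes a common intrinsic-image model in which each pixel is a reflectance term multiplied by an illumination term, plus a residual. The article does not state the exact model used by any dataset, so the function names and the tiny 2x2 values below are hypothetical and for illustration only.

```python
# Assumed model (not taken from the article):
#   image = reflectance * illumination + residual, applied per pixel.

def compose(reflectance, illumination, residual):
    """Recombine the per-pixel components into a final image."""
    return [
        [r * s + e for r, s, e in zip(row_r, row_s, row_e)]
        for row_r, row_s, row_e in zip(reflectance, illumination, residual)
    ]

def residual_from(image, reflectance, illumination):
    """Recover the residual term when the other two components are known."""
    return [
        [i - r * s for i, r, s in zip(row_i, row_r, row_s)]
        for row_i, row_r, row_s in zip(image, reflectance, illumination)
    ]

# Made-up single-channel 2x2 example.
reflectance = [[0.5, 0.25], [1.0, 0.5]]    # surface color term
illumination = [[2.0, 4.0], [0.5, 1.0]]    # lighting term
residual = [[0.0, 0.125], [0.25, 0.0]]     # whatever the product does not explain

image = compose(reflectance, illumination, residual)
print(image)
```

A dataset that stores only the final image forces an inverse rendering method to guess all of the components. A dataset that stores the components separately lets the method be supervised on each one directly.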

Apple researchers have developed ‘Hypersim,’ a photorealistic synthetic dataset for holistic indoor scene understanding that addresses all of the limitations described above. 

To create the ‘Hypersim’ dataset, Apple researchers used a large repository of synthetic scenes created by professional artists. They generated 77,400 images of 461 indoor scenes with detailed per-pixel labels and corresponding ground truth geometry.

Hypersim provides realistic 3D scenes with high-resolution textures and dynamic lighting. The dataset includes complete scene geometry, material information, and lighting information for every scene, in addition to dense per-pixel semantic instance segmentations for every image. These features make Hypersim well suited to geometric learning problems that require direct 3D supervision, as well as multi-task problems that involve reasoning over multiple input and output modalities.

The researchers analyzed the ‘Hypersim’ dataset at several levels. They found that the entire dataset could be generated from scratch for roughly half the cost of training an advanced natural language processing model.



