New AI Research from Stanford, Cornell, and Oxford Introduces a Generative Model that Discovers Object Intrinsics from Just a Few Instances in a Single Image

The essence of a rose lies in its distinctive geometry, texture, and material composition. These intrinsic properties can give rise to roses of varying sizes and shapes, in different poses, and under a wide range of lighting conditions. Even though each rose produces a unique set of pixel values, we still recognize them all as members of the same class.

Researchers from Stanford, Oxford, and Cornell Tech set out to build, from a single photograph, a generative model that can synthesize new shapes and images of the pictured object from novel viewpoints and under novel lighting.

There are three obstacles to solving this problem:

  1. The inference problem is severely under-constrained: the training data consists of a single image containing only a few hundred instances.
  2. These few instances can exhibit a wide range of pixel values, because neither their poses nor the lighting conditions are annotated or known.
  3. No two roses are alike, so the model must capture a distribution over shape, texture, and material in order to exploit the underlying multi-view information. The object intrinsics to be inferred are therefore probabilistic rather than deterministic. This is a significant departure from existing multi-view reconstruction and neural rendering approaches, which target a single static object or scene.

The proposed approach builds inductive biases about object intrinsics into the model design. These biases have two parts:

  1. All of the depicted instances share the same object intrinsics, i.e., the same distribution over geometry, texture, and material.
  2. These intrinsic properties are not independent of one another but are intertwined in a specific way, dictated by a rendering engine and, ultimately, by the physical world.

More specifically, their model takes a single input image, together with a set of instance masks and a pose distribution for the instances, and learns a neural representation of the distribution over the object's 3D shape, surface albedo, and shininess, factoring out variations in pose and illumination. This physically grounded, explicit disentanglement gives a compact explanation of the instances and allows the model to learn object intrinsics without overfitting the sparse observations provided by a single image.
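To make the disentanglement concrete, here is a minimal toy sketch of the idea, not the paper's implementation: an instance latent code is mapped to intrinsic properties (volume density, albedo, shininess), which a Phong-style shading step then combines with external pose and lighting. All names, dimensions, and the linear "networks" (`W_shape`, `W_mat`) are hypothetical stand-ins for the learned neural fields.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy weights standing in for the learned neural fields;
# the real model uses neural networks, not single linear layers.
W_shape = rng.normal(size=(1, 8))   # (latent z, point x, bias) -> density
W_mat = rng.normal(size=(4, 8))     # (latent z, point x, bias) -> albedo (RGB) + shininess

def intrinsics(z, x):
    """Map an instance latent z (4-d) and a 3D query point x to
    (density, albedo, shininess) -- a stand-in for the learned
    distribution over shape, albedo, and shininess."""
    h = np.concatenate([z, x, [1.0]])             # 4 + 3 + 1 = 8 inputs
    density = np.exp(W_shape @ h)[0]              # non-negative volume density
    m = W_mat @ h
    albedo = 1.0 / (1.0 + np.exp(-m[:3]))         # RGB reflectance in (0, 1)
    shininess = np.exp(m[3])                      # positive specular exponent
    return density, albedo, shininess

def shade(albedo, shininess, normal, light_dir, view_dir):
    """Phong-style shading: the rendering step ties the intrinsic
    properties together with the external pose and illumination."""
    n_dot_l = max(float(normal @ light_dir), 0.0)
    diffuse = albedo * n_dot_l
    reflect = 2.0 * n_dot_l * normal - light_dir  # mirror direction of the light
    specular = max(float(reflect @ view_dir), 0.0) ** shininess
    return np.clip(diffuse + specular, 0.0, 1.0)

# Query one surface point of one sampled instance and shade it.
z = rng.normal(size=4)                            # instance identity
density, albedo, shininess = intrinsics(z, np.array([0.1, 0.2, 0.3]))
color = shade(albedo, shininess,
              normal=np.array([0.0, 0.0, 1.0]),
              light_dir=np.array([0.0, 0.0, 1.0]),
              view_dir=np.array([0.0, 0.0, 1.0]))
```

Because pose and lighting enter only through the shading step, they can be varied freely without touching the intrinsic properties, which is the structural point of the disentanglement.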

As the researchers note, the resulting model enables multiple applications. New instances with distinct identities can be generated by sampling from the learned object intrinsics, and these synthetic instances can be re-rendered under new camera angles and lighting setups simply by adjusting those external factors.
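The separation between sampled identity and external rendering controls can be illustrated with a tiny, self-contained sketch. The `render` function and its linear "decoder" `W` are hypothetical stand-ins, not the paper's model; the point is only that identity (a latent `z` sampled from a prior) and lighting are independent inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy stand-in for a learned decoder: maps an instance
# latent plus a lighting direction to a rendered RGB color.
W = rng.normal(size=(3, 7))

def render(z, light_dir):
    """Identity z is sampled once per instance; lighting is an
    external knob that can be changed freely afterwards."""
    h = np.concatenate([z, light_dir])        # 4 latent dims + 3 light dims
    return 1.0 / (1.0 + np.exp(-(W @ h)))     # RGB in (0, 1)

light_a = np.array([0.0, 0.0, 1.0])
light_b = np.array([1.0, 0.0, 0.0])

z1, z2 = rng.normal(size=4), rng.normal(size=4)  # two sampled identities
img_1a = render(z1, light_a)   # instance 1 under light a
img_1b = render(z1, light_b)   # the same instance, relit
img_2a = render(z2, light_a)   # a new identity under light a
```

Sampling a fresh `z` yields a new instance; reusing a `z` with a different `light_dir` relights the same instance, mirroring the generation and relighting applications described above.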

The team conducted thorough experiments demonstrating the model's superior performance in shape reconstruction and generation, novel-view synthesis, and relighting.


Check out the Paper, GitHub, and Project Page.


Dhanshree Shenwai is a computer science engineer with experience in FinTech companies spanning the financial, cards & payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone's life easier.
