In the past few months, Generative AI has become increasingly popular. Organizations and AI researchers alike are discovering the potential of Generative AI to produce unique, original content. Large Language Models (LLMs) have made a wide range of tasks easier to carry out. Other generative models, such as DALL-E, developed by OpenAI, let users create realistic images from a textual prompt and are already used by more than a million users. This text-to-image model generates high-quality images from the entered text description.
For 3D generation, OpenAI has recently released a new project called Shap·E, a conditional generative model designed to generate 3D assets. Unlike earlier models that produce a single output representation, Shap·E generates the parameters of implicit functions. These functions can be rendered both as textured meshes and as neural radiance fields (NeRF), allowing for versatile and realistic 3D asset generation.
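To make "generating the parameters of an implicit function" concrete, here is a toy sketch. It assumes a hypothetical two-layer MLP whose weights are stored in one flat vector `theta`. A generative model would only need to output such a vector, and the MLP then maps any 3D point to a density and an RGB color. The sizes, names (`IN_DIM`, `HIDDEN`, `implicit_fn`), and activations are illustrative choices, not Shap·E's actual architecture.

```python
import math
import random

# Hypothetical sizes for a tiny implicit MLP: a 3D point in, 4 values out
# (density + RGB). Shap·E's real networks are far larger.
IN_DIM, HIDDEN, OUT_DIM = 3, 8, 4

def num_params():
    """Total number of weights and biases in the two-layer MLP."""
    return HIDDEN * IN_DIM + HIDDEN + OUT_DIM * HIDDEN + OUT_DIM

def unflatten(theta):
    """Split a flat parameter vector into weight matrices and bias vectors."""
    assert len(theta) == num_params()
    i = 0

    def take(n):
        nonlocal i
        chunk = theta[i:i + n]
        i += n
        return chunk

    w1 = [take(IN_DIM) for _ in range(HIDDEN)]
    b1 = take(HIDDEN)
    w2 = [take(HIDDEN) for _ in range(OUT_DIM)]
    b2 = take(OUT_DIM)
    return w1, b1, w2, b2

def implicit_fn(theta, point):
    """Evaluate f_theta(x, y, z) -> (density, [r, g, b])."""
    w1, b1, w2, b2 = unflatten(theta)
    # Hidden layer with ReLU activation.
    h = [max(0.0, sum(w * x for w, x in zip(row, point)) + b) for row, b in zip(w1, b1)]
    out = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]
    density = math.log1p(math.exp(out[0]))  # softplus keeps density >= 0
    rgb = [1.0 / (1.0 + math.exp(-o)) for o in out[1:]]  # sigmoid maps colors into (0, 1)
    return density, rgb

# Stand-in for a generated sample: any flat vector of the right length.
rng = random.Random(0)
theta = [rng.gauss(0.0, 0.5) for _ in range(num_params())]
print(num_params())
print(implicit_fn(theta, (0.1, -0.2, 0.3)))
```

The design point is that the "output" of generation is just `theta`. Everything downstream, such as querying points or rendering, reuses the same fixed function with different parameters.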
Shap·E was trained in two stages. First, the researchers trained an encoder that takes a 3D asset as input and maps it to the parameters of an implicit function. This mapping lets the model learn the underlying representation of the 3D assets. Next, they trained a conditional diffusion model on the encoder's outputs. The diffusion model learns the distribution of implicit-function parameters conditioned on the input, and it generates diverse, complex 3D assets by sampling from this learned distribution. It was trained on a large dataset of 3D assets paired with corresponding text descriptions.
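The second training stage can be illustrated with a generic diffusion sketch over latent vectors. It assumes a standard DDPM-style linear noise schedule and, purely for illustration, a simple objective in which the model predicts the clean latent `x0`. The encoder output, the `denoiser`, and the conditioning value `cond` are hypothetical placeholders. This is not Shap·E's training code.

```python
import math
import random

def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear noise schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out

def noise_latent(x0, t, a_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    s, n = math.sqrt(a_bar[t]), math.sqrt(1.0 - a_bar[t])
    return [s * x + n * e for x, e in zip(x0, eps)], eps

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def training_loss(denoiser, x0, cond, a_bar, rng):
    """One Monte Carlo sample of an x0-prediction loss (illustrative choice)."""
    t = rng.randrange(len(a_bar))              # random diffusion timestep
    xt, _ = noise_latent(x0, t, a_bar, rng)    # corrupt the encoder output
    return mse(denoiser(xt, t, cond), x0)      # penalize distance to the clean latent

rng = random.Random(0)
a_bar = alpha_bars(1000)
x0 = [0.3, -1.2, 0.8, 0.0]        # hypothetical stand-in for an encoder output
dummy = lambda xt, t, cond: xt    # placeholder; a real denoiser is a trained network
print(training_loss(dummy, x0, "a penguin", a_bar, rng))
```

At sampling time, the trained denoiser would be applied step by step, starting from pure noise, to produce a new latent. That latent then plays the role of the implicit function's parameters.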
Shap·E uses implicit neural representations (INRs) to represent 3D assets. An INR encodes a 3D asset by mapping 3D coordinates to location-specific information, such as density and color. INRs offer a versatile, flexible framework that can capture the detailed geometric properties of 3D assets. The team discusses two types of INRs:
- Neural Radiance Field (NeRF): NeRF represents a 3D scene by mapping coordinates and viewing directions to densities and RGB colors. A NeRF can be rendered from arbitrary viewpoints, enabling realistic, high-fidelity rendering of the scene, and can be trained to match ground-truth renderings.
- DMTet and its extension GET3D: these INRs represent a textured 3D mesh by mapping coordinates to colors, signed distances, and vertex offsets. Using these functions, 3D triangle meshes can be constructed in a differentiable manner.
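Both kinds of INR can be illustrated on a toy "asset", assumed here to be an analytic red sphere instead of a learned network. `render_ray` applies the standard NeRF volume-rendering quadrature, which accumulates color weighted by opacity and remaining transmittance along a ray. `edge_zero_crossing` shows the linear interpolation that DMTet-style methods use to place a mesh vertex where the signed distance changes sign along an edge. All names and constants are hypothetical. Viewing-direction dependence and vertex offsets are omitted for brevity.

```python
import math

RADIUS = 0.5  # hypothetical analytic "asset": a red sphere at the origin

def sphere_sdf(p):
    """Signed distance to the sphere surface: negative inside, positive outside."""
    return math.sqrt(sum(c * c for c in p)) - RADIUS

def sphere_field(p):
    """NeRF-style query: high density inside the sphere, constant red color."""
    density = 50.0 if sphere_sdf(p) < 0 else 0.0
    return density, (1.0, 0.0, 0.0)

def render_ray(field, origin, direction, near=0.0, far=2.0, samples=128):
    """Volume-render one ray: C = sum_i T_i * (1 - exp(-sigma_i * delta)) * c_i."""
    delta = (far - near) / samples
    transmittance, color = 1.0, [0.0, 0.0, 0.0]
    for i in range(samples):
        t = near + (i + 0.5) * delta                      # midpoint of segment i
        p = [o + t * d for o, d in zip(origin, direction)]
        sigma, rgb = field(p)
        alpha = 1.0 - math.exp(-sigma * delta)            # opacity of this segment
        weight = transmittance * alpha
        color = [c + weight * k for c, k in zip(color, rgb)]
        transmittance *= 1.0 - alpha                      # light surviving past it
    return color

def edge_zero_crossing(va, vb, sa, sb):
    """DMTet-style surface vertex where the SDF changes sign along edge (va, vb)."""
    assert (sa < 0) != (sb < 0), "edge must cross the surface"
    w = sa / (sa - sb)                                    # fraction of the way from va to vb
    return [a + w * (b - a) for a, b in zip(va, vb)]

print(render_ray(sphere_field, (0.0, 0.0, -1.0), (0.0, 0.0, 1.0)))
print(edge_zero_crossing((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                         sphere_sdf((0.0, 0.0, 0.0)), sphere_sdf((1.0, 0.0, 0.0))))
```

A ray through the sphere's center returns a color close to pure red, while a ray that misses it returns black. The edge from the center to (1, 0, 0) crosses the surface at (0.5, 0, 0).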
The team has shared several examples of Shap·E's results, with 3D outputs for text prompts such as a bowl of food, a penguin, a voxelized dog, a campfire, and a chair that looks like an avocado. These samples show that the model produces high-quality outputs, and it can do so in just seconds. For evaluation, Shap·E was compared with Point·E, another generative model, which produces explicit representations in the form of point clouds. Although Shap·E models a higher-dimensional output space that supports multiple representations, it converged faster and achieved comparable or better sample quality.
In conclusion, Shap·E is an effective and efficient generative model for 3D assets. It looks promising and is a significant addition to the field of Generative AI.
Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical-thinking skills and an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.