A Blog by Jonathan Low

 

Mar 5, 2020

Microsoft AI Generates 3D Objects From 2D Images

The ability to convert 2D images into 3D objects could eventually save significant time and cost. JL

Kyle Wiggers reports in VentureBeat:

They train a model for 3D shapes that generates images matching the distribution of a 2D data set. The generator model takes in a random input vector (values representing the data set's features) and generates a voxel representation (values on a grid in 3D space) of the object. Their approach takes advantage of the lighting and shading cues provided by the images, enabling it to extract more meaningful information per training sample. It's able to produce realistic samples when trained on data sets of natural images. "Differences in light exposures between surfaces enable it to capture concavities and hollow spaces."
The AI research labs at Facebook, Nvidia, and startups like Threedy.ai have at various points tried their hand at the challenge of 2D-object-to-3D-shape conversion. But in a new preprint paper, a team from Microsoft Research details a framework that it claims is the first "scalable" training technique for 3D models from 2D data. The team says the framework can consistently learn to generate better shapes than existing models when trained exclusively with 2D images, which could be a boon for video game developers, ecommerce businesses, and animation studios that lack the means or expertise to create 3D shapes from scratch.
In contrast to previous work, the researchers sought to take advantage of fully featured industrial renderers — i.e., software that produces 2D images from 3D scene data. To that end, they train a generative model for 3D shapes such that rendering the shapes generates images matching the distribution of a 2D data set. The generator model takes in a random input vector (values representing the data set's features) and generates a continuous voxel representation (values on a grid in 3D space) of the 3D object. The voxels are then fed to a non-differentiable rendering process, where they're thresholded to discrete values before being rendered with an off-the-shelf renderer (Pyrender, which is built on top of OpenGL).
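The paper itself ships no code, but the non-differentiable step is easy to picture. Below is a minimal sketch — the function name, grid size, and 0.5 threshold are illustrative assumptions, not taken from the paper — that binarizes a continuous occupancy grid, extracts a surface mesh via marching cubes, and renders it off-screen with Pyrender:

```python
# Minimal sketch of the thresholding + off-the-shelf rendering step.
# Function name, grid size, and the 0.5 threshold are illustrative
# assumptions, not from the paper's code.
import numpy as np
import trimesh
import pyrender
from skimage import measure

def render_voxels(voxels, threshold=0.5, size=(256, 256)):
    # Threshold the continuous occupancy grid and extract a surface
    # mesh at that level with marching cubes.
    verts, faces, normals, _ = measure.marching_cubes(voxels, level=threshold)
    mesh = trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
    mesh.apply_translation(-mesh.centroid)  # center the object at the origin

    # Build a simple scene: mesh, perspective camera, directional light.
    scene = pyrender.Scene()
    scene.add(pyrender.Mesh.from_trimesh(mesh))
    cam_pose = np.eye(4)
    cam_pose[2, 3] = voxels.shape[0] * 2.0  # pull the camera back along +z
    scene.add(pyrender.PerspectiveCamera(yfov=np.pi / 3.0), pose=cam_pose)
    scene.add(pyrender.DirectionalLight(intensity=3.0), pose=cam_pose)

    # Off-screen, OpenGL-backed rendering to an RGB image.
    renderer = pyrender.OffscreenRenderer(*size)
    color, _depth = renderer.render(scene)
    renderer.delete()
    return color

# Example: render a random 64^3 occupancy grid (pure noise, so the
# output is a blob, but the pipeline is the same for real shapes).
image = render_voxels(np.random.rand(64, 64, 64))
```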
A novel proxy neural renderer directly renders the continuous voxel grid generated by the 3D generative model. As the researchers explain, it’s trained to match the rendering output of the off-the-shelf renderer given a 3D mesh input.
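To make that concrete, here is a hedged sketch of how such a proxy might be fit; the architecture and loss below are assumptions for illustration, not the paper's design. A small network maps the continuous voxel grid directly to an image and is supervised by the off-the-shelf renderer's output for the same grid after thresholding, so the proxy remains differentiable with respect to the generator's continuous output:

```python
# Hedged sketch of fitting a proxy neural renderer (PyTorch). Layer
# sizes and the MSE loss are illustrative assumptions.
import torch
import torch.nn as nn

class ProxyRenderer(nn.Module):
    """Maps a continuous 64^3 voxel grid to a 64x64 grayscale image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 8, 4, stride=2, padding=1),    # 64^3 -> 32^3
            nn.ReLU(),
            nn.Conv3d(8, 16, 4, stride=2, padding=1),   # 32^3 -> 16^3
            nn.ReLU(),
            nn.Conv3d(16, 32, 4, stride=2, padding=1),  # 16^3 -> 8^3
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 8 ** 3, 64 * 64),
            nn.Sigmoid(),
        )

    def forward(self, voxels):                  # voxels: (B, 64, 64, 64)
        return self.net(voxels.unsqueeze(1)).view(-1, 1, 64, 64)

proxy = ProxyRenderer()
optimizer = torch.optim.Adam(proxy.parameters(), lr=1e-4)

def train_step(voxels, reference_image):
    # `reference_image` is the off-the-shelf (e.g. Pyrender) rendering
    # of the *thresholded* grid; the proxy learns to imitate it while
    # staying differentiable w.r.t. the continuous voxels.
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(proxy(voxels), reference_image)
    loss.backward()
    optimizer.step()
    return loss.item()
```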
Above: Couches, chairs, and bathtubs generated by Microsoft's model. (Image Credit: Microsoft)
In experiments, the team employed a 3D convolutional GAN architecture for the above-mentioned generator. (GANs are two-part AI models comprising a generator that produces synthetic examples from random noise sampled from a distribution, which are fed, along with real examples from a training data set, to a discriminator that attempts to distinguish between the two.) Drawing on a range of synthetic data sets generated from 3D models and a real-life data set, they synthesized shapes from different object categories and rendered them from different viewpoints throughout the training process.
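For readers who want a feel for what a "3D convolutional GAN generator" looks like in code, here is an illustrative sketch; the latent size and layer widths are assumptions, not the paper's configuration. A random latent vector is upsampled through transposed 3D convolutions into a continuous 64^3 occupancy grid — exactly the kind of output the rendering pipeline above consumes:

```python
# Illustrative 3D convolutional GAN generator (PyTorch). Latent size
# and layer widths are assumptions for illustration only.
import torch
import torch.nn as nn

class VoxelGenerator(nn.Module):
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            # latent vector -> 4^3 feature volume
            nn.ConvTranspose3d(latent_dim, 256, kernel_size=4),
            nn.BatchNorm3d(256), nn.ReLU(),
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1),  # 4 -> 8
            nn.BatchNorm3d(128), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),   # 8 -> 16
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1),    # 16 -> 32
            nn.BatchNorm3d(32), nn.ReLU(),
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),     # 32 -> 64
            nn.Sigmoid(),  # continuous occupancy values in [0, 1]
        )

    def forward(self, z):
        # z: (batch, latent_dim) random input vector
        return self.net(z.view(z.size(0), -1, 1, 1, 1)).squeeze(1)

# Example: sample one random vector and generate a 64^3 voxel grid.
g = VoxelGenerator()
voxels = g(torch.randn(1, 128))  # shape (1, 64, 64, 64)
```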
Above: Mushrooms generated by the model. (Image Credit: Microsoft)
The researchers say that their approach takes advantage of the lighting and shading cues provided by the images, enabling it to extract more meaningful information per training sample and produce better results in those settings. Moreover, it's able to produce realistic samples when trained on data sets of natural images. "Our approach … successfully detects the interior structure of concave objects using the differences in light exposures between surfaces," wrote the paper's coauthors, "enabling it to accurately capture concavities and hollow spaces."
They leave incorporating color, material, and lighting prediction into their system to future work, which would extend it to more "general" real-world data sets.
