NVIDIA Research has announced the development of a new AI model, Neuralangelo, that uses neural networks for 3D reconstruction from 2D video clips. This technology can generate highly detailed and lifelike virtual replicas of buildings, sculptures, and countless other real-world objects.
The ability to translate the textures of complex materials from 2D videos into 3D assets significantly surpasses prior methods, making it far easier for developers and creative professionals to create virtual objects for their projects using footage captured on smartphones.
Ming-Yu Liu, senior director of research and co-author on the paper, said:
“The 3D reconstruction capabilities Neuralangelo offers will be a huge benefit to creators, helping them recreate the real world in the digital world. This tool will eventually enable developers to import detailed objects — whether small statues or massive buildings — into virtual environments for video games or industrial digital twins.”
Much as Michelangelo sculpted stunning, lifelike figures from blocks of marble, Neuralangelo generates 3D structures with intricate details and textures, which can then be imported as 3D objects into design applications for further editing. These objects can be used in art, video game development, robotics, and industrial digital twins.
The high fidelity of Neuralangelo’s 3D reconstructions means that it can recreate objects as iconic as Michelangelo’s David and as commonplace as a flatbed truck. It can also reconstruct building interiors and exteriors, demonstrated by a detailed 3D model of the park at NVIDIA’s Bay Area campus.
Prior AI models for reconstructing 3D scenes have struggled to accurately capture repetitive texture patterns, homogeneous colors, and strong color variations. Neuralangelo adopts instant neural graphics primitives, the technology behind NVIDIA Instant NeRF, to help capture these finer details.
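The core idea behind instant neural graphics primitives is a multi-resolution hash encoding: 3D points are snapped to grids at several resolutions, each grid cell is hashed into a small table of learned features, and the per-level features are concatenated. The sketch below illustrates that lookup in plain NumPy; the table sizes, prime constants, and nearest-corner lookup (real implementations trilinearly interpolate) are simplifying assumptions, not NVIDIA's implementation.

```python
import numpy as np

# Large primes commonly used for spatial hashing (illustrative choice).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)

def hash_coords(grid_coords, table_size):
    """Spatial hash of integer 3D grid coordinates into a feature-table index."""
    h = np.zeros(grid_coords.shape[:-1], dtype=np.uint64)
    for d in range(3):
        # Unsigned arithmetic wraps on overflow, which is exactly what we want here.
        h ^= grid_coords[..., d].astype(np.uint64) * PRIMES[d]
    return h % np.uint64(table_size)

def encode(points, tables, base_res=16, growth=1.5):
    """Look up and concatenate per-level features for 3D points in [0, 1)^3."""
    features = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)
        # Nearest grid corner only, for brevity (real encoders interpolate 8 corners).
        grid = np.floor(points * res).astype(np.int64)
        idx = hash_coords(grid, table.shape[0])
        features.append(table[idx])
    return np.concatenate(features, axis=-1)

# Tiny demo: 4 levels, each a learnable table of 2^14 entries with 2 features.
rng = np.random.default_rng(0)
tables = [rng.standard_normal((2**14, 2)).astype(np.float32) for _ in range(4)]
pts = rng.random((5, 3))
feats = encode(pts, tables)
print(feats.shape)  # (5, 8): 4 levels x 2 features per point
```

In a full system these tables are trained jointly with a small neural network, so fine surface detail can be stored compactly at the higher-resolution levels.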
Starting from a 2D video of an object or scene filmed from various angles, the model selects several frames that capture different viewpoints. Once it has determined the camera position of each frame, Neuralangelo creates a rough 3D representation of the scene.
The model then optimizes the render to sharpen the details, just as a sculptor painstakingly hews stone to mimic the texture of fabric or a human figure. The final result is a highly detailed 3D object or large-scale scene that can be used in virtual reality applications, digital twins or robotics development.
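The rough-then-refined workflow described above is a coarse-to-fine optimization. As a hedged, deliberately toy illustration of that idea (not Neuralangelo's actual algorithm), the snippet below fits a 1D "surface profile" by gradient descent, first at low resolution and then at progressively finer resolutions:

```python
import numpy as np

def downsample(x, factor):
    """Average-pool a 1D signal by an integer factor."""
    return x.reshape(-1, factor).mean(axis=1)

def upsample(x, factor):
    """Nearest-neighbor upsample of a 1D signal."""
    return np.repeat(x, factor)

# Target "profile": a smooth shape plus fine high-frequency detail.
t = np.linspace(0, 6 * np.pi, 256)
target = np.sin(t) + 0.1 * np.sin(10 * t)

estimate = np.zeros(1)
for factor in (256, 64, 16, 4, 1):  # coarse -> fine
    coarse_target = downsample(target, factor)
    estimate = upsample(estimate, len(coarse_target) // len(estimate))
    for _ in range(200):  # plain gradient descent on squared error
        grad = 2 * (estimate - coarse_target)
        estimate -= 0.1 * grad
    err = float(np.mean((estimate - coarse_target) ** 2))
    print(f"resolution {len(coarse_target):3d}: mse {err:.6f}")
```

Each stage starts from the upsampled coarse solution, so the optimizer only has to add the detail the previous resolution could not represent, which is the same intuition behind sharpening a rough 3D reconstruction.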
NVIDIA Research will present nearly 30 projects at the Conference on Computer Vision and Pattern Recognition (CVPR) in Vancouver from June 18-22, including Neuralangelo. The papers will span topics including pose estimation, 3D reconstruction, and video generation.
One of these projects, DiffCollage, is a diffusion method that creates large-scale content, such as long landscape-format images, 360-degree panoramas, and looped-motion images. When fed a training dataset of images with a standard aspect ratio, DiffCollage treats these smaller images as sections of a larger visual, like pieces of a collage. This enables diffusion models to generate cohesive-looking large content without being trained on images of the same scale.
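The collage idea rests on covering a large canvas with overlapping standard-size tiles and reconciling the tiles wherever they overlap. The sketch below shows only that stitching arithmetic on 1D "tiles," averaging per-pixel contributions in the overlap regions; it is an assumption-laden simplification and omits the diffusion model that DiffCollage actually runs per tile.

```python
import numpy as np

def stitch(tiles, positions, width, tile_w):
    """Average overlapping 1D 'tiles' into one long strip."""
    acc = np.zeros(width)
    count = np.zeros(width)
    for tile, x in zip(tiles, positions):
        acc[x:x + tile_w] += tile
        count[x:x + tile_w] += 1
    # Avoid division by zero for any uncovered pixels.
    return acc / np.maximum(count, 1)

tile_w, stride, width = 8, 4, 20
positions = list(range(0, width - tile_w + 1, stride))  # [0, 4, 8, 12]
# Constant-valued tiles make the overlap averaging easy to see.
tiles = [np.full(tile_w, float(i)) for i in range(len(positions))]
strip = stitch(tiles, positions, width, tile_w)
print(strip)  # ramps smoothly: overlaps average to 0.5, 1.5, 2.5
```

In the real method this reconciliation happens at every denoising step rather than once at the end, which is what keeps the seams between tiles coherent.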
Neuralangelo has great potential for use in industries such as architecture, engineering, construction, and manufacturing, where detailed and accurate 3D models are essential. It could also revolutionize the entertainment industry, allowing for more realistic and immersive virtual worlds in video games and movies.
NVIDIA’s breakthroughs in AI technology have the potential to transform various industries and change the way we live and work. With Neuralangelo, NVIDIA has once again demonstrated its commitment to innovation and pushing the boundaries of what is possible with artificial intelligence.