Researchers at Carnegie Mellon University have demonstrated that they can combine iPhone videos shot “in the wild” by separate cameras to create 4D visualizations that allow viewers to watch action from various angles, or even erase people or objects that temporarily block sight lines.
Imagine a visualization of a wedding reception, where dancers can be seen from as many angles as there were cameras, and the tipsy guest who walked in front of the bridal party is nowhere to be seen.
The videos can be shot independently from a variety of vantage points, as might occur at a wedding or birthday celebration.
It is also possible to record actors in one setting and then insert them into another.
“Virtualized reality” is nothing new, but in the past it has been restricted to studio setups, such as CMU’s Panoptic Studio, which boasts more than 500 video cameras embedded in its geodesic walls. Fusing visual information of real-world scenes shot from multiple, independent, handheld cameras into a single comprehensive model that can reconstruct a dynamic 3D scene simply hasn’t been possible.
The CMU researchers worked around that limitation by using convolutional neural nets (CNNs), a type of deep learning program that has proven adept at analyzing visual data. They found that scene-specific CNNs could be used to compose different parts of the scene.
The CMU researchers demonstrated their method using up to 15 iPhones to capture a variety of scenes — dances, martial arts demonstrations and even flamingos at the National Aviary in Pittsburgh.
The method also unlocks a host of potential applications in the movie industry and consumer devices, particularly as the popularity of virtual reality headsets continues to grow.
Though the method doesn’t necessarily capture scenes in full 3D detail, the system can limit playback angles so incompletely reconstructed areas are not visible and the illusion of 3D imagery is not shattered.
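The idea of limiting playback angles can be illustrated with a minimal sketch. This is not the CMU researchers' code; it simply assumes that each camera's viewpoint can be summarized as a single horizontal angle in degrees and that any viewer-requested angle is clamped to the range those cameras covered. The function name `clamp_view_angle` and the sample angles are hypothetical.

```python
def clamp_view_angle(requested_deg, min_deg, max_deg):
    """Clamp a requested viewing angle to the range [min_deg, max_deg]."""
    if min_deg > max_deg:
        raise ValueError("min_deg must not exceed max_deg")
    return max(min_deg, min(requested_deg, max_deg))


# Hypothetical capture: horizontal angles (in degrees) of four handheld cameras.
camera_angles_deg = [-40.0, -10.0, 20.0, 65.0]

# Assumption: the scene is only well reconstructed between the outermost cameras.
lo, hi = min(camera_angles_deg), max(camera_angles_deg)

# A viewer asks for 90 degrees, which is outside the covered range.
print(clamp_view_angle(90.0, lo, hi))
```

A request that falls inside the covered range passes through unchanged, so the viewer only notices the limit when trying to look at a region no camera recorded.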
News Source: EurekAlert!