Living in a dynamic physical world, we can easily forget how effortlessly we understand our surroundings. With minimal thought, we can figure out how scenes change and how objects interact.
But this kind of understanding remains a huge problem for machines. Given the limitless number of ways that objects can move, teaching computers to predict future actions is difficult.
Recently, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have taken a step toward solving this problem, developing a deep-learning algorithm that, given a still image of a scene, can create a brief video simulating the future of that scene.
Trained on 2 million unlabeled videos that include a year’s worth of footage, the algorithm generated videos that human subjects deemed to be realistic 20 percent more often than a baseline model.
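To make the idea concrete, the sketch below shows one simple way a frame-conditioned video generator can be structured: a 2D convolutional encoder compresses a single still frame into latent features, and a 3D transposed-convolution decoder expands those features across a time axis into a short clip. This is a minimal illustration assuming a standard PyTorch setup; the class name, layer sizes, and clip length are hypothetical and do not reproduce the CSAIL team's actual model.

```python
import torch
import torch.nn as nn

class FrameToClipGenerator(nn.Module):
    """Encodes one still frame and decodes a short clip of predicted future frames."""

    def __init__(self, clip_len=32):
        super().__init__()
        self.clip_len = clip_len
        # 2D encoder: compress a 64x64 RGB frame into an 8x8 latent feature map.
        self.encode = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(),
        )
        # 3D decoder: upsample the latent features in time and space to
        # produce clip_len future frames in one pass.
        self.decode = nn.Sequential(
            nn.ConvTranspose3d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, frame):
        # frame: (batch, 3, 64, 64) -> latent: (batch, 256, 8, 8)
        z = self.encode(frame)
        # Add a time axis and tile it so the 3D decoder can upsample
        # in time as well as space.
        z = z.unsqueeze(2).repeat(1, 1, self.clip_len // 8, 1, 1)
        # Output: (batch, 3, clip_len, 64, 64), a short predicted clip.
        return self.decode(z)

# Generate a 32-frame clip from a single random "still image."
clip = FrameToClipGenerator()(torch.randn(1, 3, 64, 64))
print(clip.shape)  # torch.Size([1, 3, 32, 64, 64])
```

In practice, a generator like this would be trained adversarially on unlabeled video, with a discriminator judging whether generated clips look realistic, which is also how human raters compared the team's output against a baseline.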
The team says that future versions could be used for everything from improved security tactics to safer self-driving cars.
The algorithm can also help machines recognize people’s activities without expensive human annotations.
These types of models aren’t limited to predicting the future. Generative videos can be used for adding animation to still images, like the animated newspaper from the Harry Potter books. They could also help detect anomalies in security footage and compress data for storing and sending longer videos.