Objects make distinctive sounds when they are hit or scratched. These sounds reveal aspects of an object’s material properties, as well as the actions that produced them.
MIT researchers have demonstrated an algorithm that has effectively learned how to predict sound: When shown a silent video clip of an object being hit, the algorithm can produce a sound for the hit that is realistic enough to fool human viewers.
Researchers envision future versions of similar algorithms being used to automatically produce sound effects for movies and TV shows, as well as to help robots better understand objects’ properties.
The team used techniques from the field of “deep learning,” which involves teaching computers to sift through huge amounts of data to find patterns on their own. Deep learning approaches are especially useful because they free computer scientists from having to hand-design algorithms and supervise their progress.
The first step to training a sound-producing algorithm is to give it sounds to study. Over several months, the researchers recorded roughly 1,000 videos of an estimated 46,000 sounds that represent various objects being hit, scraped, and prodded with a drumstick.
Next, the team fed those videos to a deep-learning algorithm that deconstructed the sounds and analyzed their pitch, loudness, and other features.
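To make "pitch and loudness" concrete, here is a minimal Python sketch that computes two such features for one short frame of audio samples. Loudness is taken as root-mean-square amplitude, and pitch as the lag with the strongest autocorrelation. The function names and the 50 to 2,000 Hz search range are hypothetical choices for illustration. The article does not describe how the researchers' deep-learning model represents sound, so this is a simplified stand-in, not their method.

```python
import math

def frame_loudness(samples):
    """Root-mean-square amplitude of one audio frame (a simple loudness proxy)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def frame_pitch(samples, sample_rate, min_hz=50.0, max_hz=2000.0):
    """Estimate pitch in Hz from the lag with the largest autocorrelation.

    The min_hz/max_hz search range is a hypothetical illustrative choice.
    """
    min_lag = max(1, int(sample_rate / max_hz))
    max_lag = min(len(samples) - 1, int(sample_rate / min_hz))
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation: how well the frame matches a shifted copy.
        corr = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # A period of best_lag samples corresponds to sample_rate / best_lag Hz.
    return sample_rate / best_lag if best_lag else 0.0

def frame_features(samples, sample_rate):
    """Hypothetical feature vector [pitch_hz, loudness] for one frame."""
    return [frame_pitch(samples, sample_rate), frame_loudness(samples)]
```

For a pure 440 Hz tone sampled at 8 kHz, `frame_pitch` lands within about 5 Hz of 440. The small error arises because the detected period is rounded to a whole number of samples.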
To then predict the sound of a new video, the algorithm estimates the sound properties of each frame of that video and matches them to the most similar sounds in the database. Once the system has those bits of audio, it stitches them together to create one coherent sound.
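The retrieve-and-stitch step can be sketched as a nearest-neighbor lookup followed by concatenation. The sketch below assumes each video frame has already been turned into a numeric feature vector. It also assumes the database is a hypothetical list of `(feature_vector, audio_clip)` pairs. The linear crossfade used for stitching is an illustrative choice; the article does not say how the clips are joined.

```python
import math

def nearest_clip(query, database):
    """Return the clip whose stored feature vector is closest (Euclidean) to query.

    `database` is a hypothetical list of (feature_vector, audio_clip) pairs.
    """
    _, clip = min(database, key=lambda entry: math.dist(entry[0], query))
    return clip

def stitch(clips, overlap):
    """Concatenate clips, linearly crossfading `overlap` samples at each seam."""
    if not clips:
        return []
    out = list(clips[0])
    for clip in clips[1:]:
        n = min(overlap, len(out), len(clip))
        start = len(out) - n
        for k in range(n):
            w = (k + 1) / (n + 1)  # weight of the incoming clip rises across the seam
            out[start + k] = (1 - w) * out[start + k] + w * clip[k]
        out.extend(clip[n:])
    return out

def predict_sound(frame_features, database, overlap=2):
    """Retrieve the nearest clip for each frame's features, then stitch them."""
    clips = [nearest_clip(f, database) for f in frame_features]
    return stitch(clips, overlap)
```

Crossfading a few samples between neighboring clips avoids audible clicks at the seams. Setting `overlap=0` gives plain concatenation.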
The result is that the algorithm can accurately simulate the subtleties of different hits.
To test how realistic the fake sounds were, the team conducted an online study in which subjects saw two videos of collisions — one with the actual recorded sound, and one with the algorithm’s — and were asked which one was real.
The result: Subjects picked the algorithm’s fake sound over the real one twice as often as they picked a baseline algorithm’s fake sound. They were particularly fooled by materials like leaves and dirt that tend to have less “clean” sounds than, say, wood or metal.
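The "twice as often" comparison is a ratio of two fooling rates from the same kind of real-versus-fake test. Here is a minimal sketch. It assumes each trial is recorded as a hypothetical boolean that is `True` when the subject picked the synthesized sound as the real one.

```python
def fooled_rate(choices):
    """Fraction of trials in which the subject picked the synthesized sound as real.

    `choices` holds one hypothetical boolean per trial: True if the fake was chosen.
    """
    return sum(choices) / len(choices) if choices else 0.0

def relative_fooling(model_choices, baseline_choices):
    """How many times more often the model's sounds fooled subjects than the baseline's."""
    baseline = fooled_rate(baseline_choices)
    return fooled_rate(model_choices) / baseline if baseline else float("inf")
```

Consider made-up responses in which the model's sound is chosen in 4 of 10 trials and the baseline's in 2 of 10. For these, `relative_fooling` returns 2.0. The numbers are illustrative only; the article does not report per-trial counts.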
On top of that, the team found that the materials’ sounds revealed key aspects of their physical properties: An algorithm they developed could tell the difference between hard and soft materials 67 percent of the time.
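The article does not say what kind of classifier distinguished hard from soft materials. As a hedged illustration, the sketch below swaps in a simple nearest-centroid classifier over hypothetical numeric sound features. It shows how an accuracy figure such as 67 percent is computed: the fraction of held-out hits whose predicted label matches the true one.

```python
import math

def train_centroids(examples):
    """Average the feature vectors for each label (e.g. 'hard' or 'soft').

    `examples` is a hypothetical list of (feature_vector, label) pairs.
    """
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc] for label, acc in sums.items()}

def classify(features, centroids):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

def accuracy(test_set, centroids):
    """Fraction of held-out (features, label) pairs that are classified correctly."""
    correct = sum(classify(f, centroids) == label for f, label in test_set)
    return correct / len(test_set)
```

Nearest-centroid is used here only because it is short and easy to follow. Any classifier that maps sound features to a hard or soft label could be scored the same way.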