MIT researchers have developed a new artificial intelligence (AI) system called “PixelPlayer” that can look at a video of a musical performance, isolate the sounds of specific instruments, and make them louder or softer.
The system, which is “self-supervised,” doesn’t require any human annotations on what the instruments are or what they sound like.
The researchers say that the ability to change the volume of individual instruments means that, in the future, systems like this could help engineers improve the audio quality of old concert footage. One could even imagine producers taking specific instrument parts and previewing what they would sound like with other instruments.
The system first locates the image regions that produce sounds, and then separates the input sounds into a set of components that represent the sound from each pixel.
PixelPlayer uses “deep learning” methods, meaning that it finds patterns in data using so-called “neural networks” that have been trained on existing videos. Specifically, one neural network analyzes the video’s visuals, one analyzes its audio, and a third “synthesizer” network associates specific pixels with specific sound waves to separate the different sounds.
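To make the three-network description above more concrete, here is a minimal, hypothetical sketch in PyTorch of how such a pipeline could be wired together. The layer sizes, the number of audio components K, and the simple weighted-sum “synthesizer” are illustrative assumptions, not details of MIT’s actual implementation.

```python
# Hypothetical sketch of a PixelPlayer-style pipeline: a video network,
# an audio network, and a "synthesizer" that links pixels to sounds.
# All sizes and layers are assumptions for illustration only.
import torch
import torch.nn as nn

K = 16  # assumed number of audio components

class VideoNet(nn.Module):
    """Maps video frames to a K-dimensional feature vector per pixel region."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, K, 1),
        )
    def forward(self, frames):           # frames: (B, 3, H, W)
        return self.features(frames)     # (B, K, H', W')

class AudioNet(nn.Module):
    """Splits the mixture spectrogram into K candidate sound components."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, K, 1),
        )
    def forward(self, spec):             # spec: (B, 1, F, T)
        return self.net(spec)            # (B, K, F, T)

class Synthesizer(nn.Module):
    """Combines one pixel's visual features with the audio components
    to predict a spectrogram mask for the sound coming from that pixel."""
    def forward(self, pixel_feat, components):
        # pixel_feat: (B, K), components: (B, K, F, T)
        weights = pixel_feat[:, :, None, None]                    # (B, K, 1, 1)
        return torch.sigmoid((weights * components).sum(dim=1))   # (B, F, T)

# Toy example: separate the sound associated with one chosen pixel.
video_net, audio_net, synth = VideoNet(), AudioNet(), Synthesizer()
frames = torch.randn(1, 3, 224, 224)        # dummy video frame
mixture_spec = torch.randn(1, 1, 256, 64)   # dummy mixture spectrogram

visual = video_net(frames)                  # per-pixel visual features
components = audio_net(mixture_spec)        # candidate audio components
pixel = visual[:, :, 10, 10]                # features of one pixel of interest
mask = synth(pixel, components)             # mask selecting that pixel's sound
separated_spec = mask * mixture_spec[:, 0]  # estimated spectrogram for that pixel
```

In this sketch, turning an instrument up or down would amount to scaling the separated spectrogram for its pixels before reconstructing the audio.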
Because PixelPlayer uses so-called “self-supervised” deep learning, the MIT team doesn’t explicitly understand every aspect of how it learns which instruments make which sounds.
News Source: http://news.mit.edu/2018/ai-editing-music-videos-pixelplayer-csail-0705
Related Videos:
MIT’s Deep-learning system generates Videos that predict what will happen next in a scene
Artificial intelligence produces Realistic Sounds that fool Humans