Artificial Intelligence Gives Sounds to Images

Many members of Generation Z may remember from their toddler years the barn-shaped toy that teaches children fun facts about 16 different animals: the See ’n Say “The Farmer Says” toy, where you select a page, point the arrow at an animal and pull the lever to hear the farmer speak. Now there is an artificial intelligence version of it: a system designed to recognize images and learn the association between them and the sounds they make.

This feat was achieved by researchers at Disney Research and ETH Zurich in Switzerland. A system that knows the sound of an airplane, a chainsaw, the different sirens of emergency vehicles, or something as simple as a slamming door could be used in the film industry to add sound effects during post-production, or to provide audio feedback for the visually impaired.

The team used a wide array of videos to collect the training data. “Videos provide the best platform to learn the collaboration between images and sounds”, said Jean-Charles Bazin, an associate research scientist at Disney Research. The videos proved both informative and challenging for the researchers, because the sounds in a video can be highly ambiguous, and many have little to do with the visual content on screen. The team therefore developed a way to filter out these unrelated sounds.

The researchers found that the system trained on the filtered data consistently performed better than one trained on the original, unfiltered video collection. The work was presented at a European Conference on Computer Vision (ECCV) workshop in Amsterdam.

The research team included Matthias Soler and Andreas Krause of ETH Zurich’s Computer Science Department, and Oliver Wang and Alexander Sorkine Hornung of Disney Research.

This research continues Disney’s tradition of inventing new ways to tell great stories and developing technologies for the future of the entertainment industry.