Our environment is filled with acoustic information. When we walk into a hall, the combination of acoustic and visual cues helps us map the space and judge how large it is. If we drop a glass, the sound tells us whether it has broken.
Recent advances in implicit neural representations have made it possible to learn continuous, differentiable representations of the visual world directly from raw image observations. Acoustic models of the world, however, have received far less attention, even though we perceive our surroundings through both sight and sound.
Past works have explored capturing the acoustics of a scene, but they require manual design of the acoustic functions. A team of researchers at MIT has developed a machine learning model that captures how sound propagates through a space, enabling it to simulate what a listener would hear at different locations. This modeling also lets the system infer a 3-D model of the room in which the sound propagates, which the researchers can use to build a visual 3-D render of the room.
The team faced several challenges. The first was representing acoustic reverberation (the impulse response). While the visual appearance at a point can be captured with a 3-D (RGB) vector, an impulse response contains more than 20,000 values, making it far harder to represent compactly. The second was developing a neural acoustic representation that generalizes densely to new emitter-listener locations.
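To see why an impulse response is so much bulkier than a color value, it helps to recall what it does: it describes how a short "click" emitted at one point is heard at another, as the direct sound plus thousands of delayed, attenuated echoes. The toy sketch below (a hypothetical illustration, not the paper's code) applies a miniature impulse response to a dry signal by convolution; at real audio sample rates, half a second of reverberation alone spans over 20,000 samples.

```python
def apply_impulse_response(dry, ir):
    """Convolve a dry signal with a room impulse response (direct form).

    Each input sample is replaced by a scaled copy of the whole IR,
    so the output carries the room's echoes.
    """
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out

# Toy IR: the direct sound followed by two decaying echoes.
ir = [1.0, 0.0, 0.5, 0.0, 0.25]
dry = [1.0, 0.0, 0.0]  # a single click
wet = apply_impulse_response(dry, ir)
print(wet)  # -> [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.0]
```

A real impulse response would have tens of thousands of such coefficients, one per audio sample, which is exactly the representational burden the team had to address.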
To overcome these challenges, the team devised Neural Acoustic Fields (NAFs). NAFs encode impulse responses in the Fourier frequency domain to capture their complex signal structure. By modeling the dense acoustic field of an environment, NAFs learn a representation from which structural information about the scene can be extracted.
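The idea of a Fourier-domain encoding can be sketched as follows. This is a simplified, hypothetical illustration using a plain DFT over the whole signal, not the paper's exact time-frequency pipeline: each frequency component of an impulse response is reduced to a log-magnitude and a phase, which is the kind of target a neural field can regress smoothly.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (O(N^2); fine for a sketch)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(signal))
            for k in range(n)]

def to_log_magnitude_and_phase(ir):
    """Encode an impulse response as per-frequency (log-magnitude, phase)."""
    spectrum = dft(ir)
    log_mag = [math.log(abs(c) + 1e-8) for c in spectrum]  # eps avoids log(0)
    phase = [cmath.phase(c) for c in spectrum]
    return log_mag, phase

# Toy 4-sample impulse response.
ir = [1.0, 0.5, 0.25, 0.125]
log_mag, phase = to_log_magnitude_and_phase(ir)
```

The DC bin (k = 0) is simply the sum of the signal, so its log-magnitude is log(1.875) here; the remaining bins capture progressively finer temporal structure of the reverberation.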
The team showed that by locally conditioning NAFs on the underlying scene geometry, their method can predict environmental reverberation even at points never observed during training. They also demonstrated that the auditory representations NAFs acquire are practical and valuable for audio-visual cross-modal learning and for inferring scene structure.
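Conceptually, the trained field is just a continuous function that can be queried at any emitter-listener pair. The sketch below is a hypothetical stand-in (not the paper's architecture): a sinusoidal positional encoding feeds a single random, untrained linear layer in place of the learned MLP, but the query interface mirrors the idea of asking the field for a Fourier-domain value at an arbitrary pair of positions.

```python
import math
import random

N_FREQS = 4  # sin/cos frequencies per input coordinate

def positional_encoding(coords, n_freqs=N_FREQS):
    """Map each scalar coordinate to sin/cos features at doubling frequencies."""
    feats = []
    for x in coords:
        for k in range(n_freqs):
            feats.append(math.sin((2 ** k) * math.pi * x))
            feats.append(math.cos((2 ** k) * math.pi * x))
    return feats

rng = random.Random(0)
# 6 inputs (2-D emitter, 2-D listener, time, frequency), each expanded
# to 2 * N_FREQS features by the positional encoding above.
weights = [rng.uniform(-0.1, 0.1) for _ in range(6 * 2 * N_FREQS)]

def query_field(emitter, listener, t, f):
    """Return one (untrained, illustrative) log-magnitude for this query."""
    feats = positional_encoding([*emitter, *listener, t, f])
    return sum(w * x for w, x in zip(weights, feats))

# Because the field is continuous in its inputs, it can be queried at
# emitter-listener pairs never seen during training.
val = query_field(emitter=(0.2, 0.8), listener=(0.5, 0.5), t=0.01, f=0.3)
```

In the actual system, the network weights are learned and the queries are additionally conditioned on local scene geometry, which is what enables generalization to unobserved locations.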
In conclusion, Neural Acoustic Fields (NAFs) are implicit neural fields that compactly and continuously encode a scene's underlying acoustics. NAFs outperform baselines at modeling scene acoustics, and the authors provide a thorough analysis of their design decisions.
This technology could prove instrumental in fields like augmented reality (AR) and virtual reality (VR). It could also help artificial intelligence better understand its surroundings through richer representations of the world.
This article is a research summary written by Marktechpost staff based on the research paper 'Learning Neural Acoustic Fields'. All credit for this research goes to the researchers on this project. Check out the paper and reference article.
Rishabh Jain is a consulting intern at Marktechpost. He is currently pursuing a B.Tech in Computer Science at IIIT Hyderabad. He is a machine learning enthusiast with a keen interest in statistical methods in artificial intelligence and data analytics, and he is passionate about developing better algorithms for AI.