Spatial audio: The revolution will be amplified

June 22, 2022 | By Keith Jordan

As individuals generate more significant amounts of data, insight-driven organizations are creating digital customer profiles that are rapidly extending outside of straightforward buying preferences and into the realm of understanding human behavior.

While companies make a concerted push to hyper-personalize their products and services, they are increasingly looking to virtual reality experiences for consumers. As this unfolds, sound plays a growing role: it makes the experience more immersive and can drive emotional engagement.

Silicon Valley is currently awash with metaverse fever, and if we are to believe the hype, the holy grail of a virtual world appears to be within our reach. Yet people crave real-world, lifelike experiences in the virtual world too. As this 3D immersive environment comes into view, developers are working around the clock so that technologies can better approximate real-life experiences.  

This trend applies to sound too. The excitement in the metaverse is driving a concerted effort to enable interactive acoustics to create more emotional, atmospheric and dramatic experiences. As a result, the latest developments in acoustic technologies can place interactive dynamic sound all around you — a breakthrough known as “spatial audio.” 

This goes well beyond improving the way you binge-watch your favorite series. Spatial audio gives you a sense of space beyond conventional stereo, allowing you to pinpoint where a sound is coming from, whether above, below, or anywhere in the full 360 degrees around you. While stereo enables you to hear things in front and to the left and right, you don’t get a sense of surround, nor of sounds above or below you. With the introduction of a third acoustic dimension that adds depth, you get a sense of the exact location of sound sources.

The importance of sound 

The pandemic and global lockdowns have utterly changed our sound patterns, especially in larger cities. Sounds of nature replaced the sound of traffic — people heard sounds they had never heard before. A new sonic-scape changed the ambience of cities as people became more aware of the impact of sound on their mood and behavior. Sight and sound are known as the two higher senses and the basis of emotions. Yet it is sound that more explicitly conveys feeling. Using the expressive qualities of sound, we can impact and alter the subtleties of emotion to create a more powerful connection than visual processing alone. 


Turning up the volume

Mastercard’s sonic DNA is the connective tissue of the company’s first-ever music album, featuring 10 songs by 10 up-and-coming artists from around the world. Executive producer Nicolas Molinder helped the artists incorporate the brand’s melody into their songs, demonstrating how audio branding can be used innovatively.

Take, for example, the sounds of a sporting event. As professional sports have slowly returned to empty stadiums, it’s more apparent that the crowd's noise enhances sporting events both for the viewers at home and the players in the stadium. Walking into a stadium without the usual fan cheering makes for a sterile atmosphere, which filters onto the playing field. As a result, recent games lacked the intensity of previous seasons. We could perceive a vital link between athletes’ concentration in subdued sonic situations and their responses within a packed stadium of noisy fans.  

Broadcasters also realized that something was missing from the usual experience. Soon, they began to simulate crowd noise to build a more vibrant atmosphere for viewers. Surprisingly, they turned to the world of gaming for their solution.

A videogame publisher supplied crowd noise from the FIFA 20 console game for the Premier League’s stadium soundtrack. Using a system called “Atmospheric Audio,” they provided 13 hours of game sounds, made from 1,300 individual assets, to the English Premier League and the Spanish La Liga for live broadcasts. They created audio samples for specific moments, such as fouls and goals. The producer watching the match then inserted the matching sample into the audio mix in real time to better convey how people would react if watching live. There are “home” and “away” versions for the different teams in each match to create an even more immersive soundscape. The away team’s crowd is panned off to the side to mimic the typical placement of away fans in one section of the stadium. The sound of home fans is presented more loudly for realism.
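The home-and-away mix described above can be illustrated with a small sketch. This is not the publisher's system: the function names, the 0.6 away level and the 0.7 pan position are all hypothetical values chosen for illustration, and the constant-power pan law is simply one common way to move a mono source across a stereo field.

```python
import math


def constant_power_pan(pan):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is hard left, 0.0 is center, 1.0 is hard right. The gains are
    chosen so that left**2 + right**2 stays equal to 1 at every position.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be between -1.0 and 1.0")
    angle = (pan + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def mix_crowd(home, away, home_level=1.0, away_level=0.6, away_pan=0.7):
    """Mix two mono crowd tracks (sequences of samples) into stereo frames.

    The home crowd stays centered and louder; the away crowd is quieter
    and panned to one side. All default values are illustrative assumptions.
    """
    home_l, home_r = constant_power_pan(0.0)
    away_l, away_r = constant_power_pan(away_pan)
    frames = []
    for h, a in zip(home, away):
        left = home_level * home_l * h + away_level * away_l * a
        right = home_level * home_r * h + away_level * away_r * a
        frames.append((left, right))
    return frames
```

Calling `mix_crowd(home_samples, away_samples)` returns a list of `(left, right)` pairs in which the away crowd sits mostly in the right channel while the home crowd fills both channels equally.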

Interestingly, this feature was also broadcast through the public address systems in the stadiums for the players’ benefit. While this proved effective, it led to an “audio-visual disconnect,” because viewers heard crowd noise while seeing rows of empty seats. The inclusion of seat covers and billboards in the stadiums later addressed this.

Videoconferencing fatigue is an audio problem 

These new advances in acoustics may also transform teleconferencing. As we attend more video conferences every day, we are all experiencing call fatigue. After a call with multiple participants, we feel drained because the mixed audio and overlapping voices can significantly overload the brain, which must constantly assign the sounds to the visuals. In a real-world conversation, the brain can localize sound sources in the environment, match the visual and audio sources, and create a “directional scene.” Using spatial audio in virtual meetings creates a natural acoustic scene, saves our brains unnecessary background cycles, and reduces the cognitive load, making meetings more efficient and less stressful.

Spatial audio-enabled virtual meeting platforms are the future of video conferencing and collaboration. Distributing each person’s voice in a 3D audio space makes conversations and discussions feel more lifelike and, most importantly, creates a different and more engaging atmosphere. It enables the angle of the sound source to shift as your head moves and increases immersion by mimicking real-world audio through level and distance cues. To further mimic a real conversation and improve sound localization, we can also use head tracking so that audio levels stay in sync with head movements.
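As a rough illustration of how a voice's angle and level might follow head movement, the sketch below works in a flat 2D room and outputs plain stereo gains. Everything in it is an assumption made for illustration: the function names, the coordinate convention (yaw 0 faces the +y direction, positive angles are to the listener's right), the inverse-distance rolloff and the constant-power pan. Real spatial audio engines typically rely on head-related transfer functions, which this sketch does not attempt, so it cannot distinguish front from back or above from below.

```python
import math


def relative_azimuth(source_xy, listener_xy, head_yaw_deg):
    """Angle of the source relative to the listener's facing direction.

    Returns degrees in [-180, 180). Assumed convention: yaw 0 faces +y,
    and positive angles are to the listener's right.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    world_deg = math.degrees(math.atan2(dx, dy))  # 0 = +y, 90 = +x
    return (world_deg - head_yaw_deg + 180.0) % 360.0 - 180.0


def voice_gains(source_xy, listener_xy, head_yaw_deg, ref_distance=1.0):
    """Return (left_gain, right_gain) for one participant's voice.

    Level falls off with inverse distance beyond ref_distance, and the
    stereo position follows the source angle relative to the head.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    level = ref_distance / max(distance, ref_distance)

    azimuth = relative_azimuth(source_xy, listener_xy, head_yaw_deg)
    pan = math.sin(math.radians(azimuth))  # -1 = left, +1 = right
    angle = (pan + 1.0) * math.pi / 4.0    # constant-power pan law
    return level * math.cos(angle), level * math.sin(angle)
```

With the listener at the origin facing forward, a colleague seated at `(1.0, 0.0)` is heard almost entirely in the right channel. If the head tracker reports a 90-degree turn toward them, calling `voice_gains((1.0, 0.0), (0.0, 0.0), 90.0)` yields equal left and right gains, so the voice moves to the center as it would in a real room.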

Why does spatial audio matter? 

As our new world embraces remote working, spatial audio will facilitate better and more realistic virtual communication. Spatial audio will become even more important as we start working and collaborating in 3D worlds. Using spatial audio in 3D worlds allows us to use our natural senses to navigate and interact without cognitive reasoning. Instead of using visual communication as the primary sensory mode, we use sonics to reduce visual overload. Natural conversations allow us to work and engage with each other for extended periods and add a deeper emotional connection.

The differences we see across video conferencing platforms are visual and constrained to the front-end experience. We will soon see a revolution in collaborative media driven by sonics. We will move away from a mono-sounding world and into a 3D spatial world. Thanks to significant sonic upgrades to video calls and center-stage cameras, voices will be locked to the direction of the device screen, so walking away from or toward the device will automatically and subtly adjust the sound localization using machine learning.

An audio chat social network has recently announced spatial audio support on certain devices. This change will allow it to build a soundstage akin to an actual room. Instead of entering an audio chat room and getting a mono, flat sound experience, it will use 3D audio to place people around the “room” to allow for a “café” feel to the occasion. 

As the digital worlds bleed into the physical world, we will rely on spatial audio to enable more natural communication, provide a sense of presence and reduce cognitive overload in visual environments. Virtual reality is not just a visual experience; it is multi-sensory, and sound is a critical component in creating realistic simulated environments. Spatial audio will allow programmers to develop immersive content where sounds can come from any direction. From the sea splashing at your feet to the wind blowing around you to hearing voices behind you, audio that responds in real time using head-tracking will become a potent source of engagement in the future.

Into the metaverse 

Augmented technology will give us different ways to perceive the world that will go way beyond sight, sound, touch, taste and smell. As we transition into the metaverse era, the use of spatial audio, in particular, will help us achieve a profound sense of presence. Realistic soundscapes will transport listeners to faraway places, evoking powerful emotions. Spatial sound will bring 3D depth to digital experiences from all around us.  We will feel we are “there” wherever “there” may be. Innovations in spatial sound represent the most significant disruption in acoustics since the shift from the silent movie era to the talkies. With the upcoming acoustic revolution, we may unlock the promise of the metaverse.  

Keith Jordan, Vice President of Innovation, Labs as a Service