Dolby Atmos Music Masters for Real-Time Applications

By Julian Messina and Robert Coomber

June 11th, 2022.

The purpose of this article is to promote the adoption of popular immersive music formats, such as Dolby Atmos, within game engine environments. We created a sample use-case that attempts to reproduce a Dolby Atmos immersive music mix using the tools currently available to us, namely Wwise and Unity. The following describes our thought process, highlights issues, walks through setup and execution, and gives a brief perceptual impression, all to illustrate what is possible right now. It’s important to note that there is no officially supported way to import a single Dolby Atmos masterfile into a game engine or audio middleware solution; this is an attempt to bridge that gap. All assets must be prepared meticulously, and those used within this project serve as research for the broader audio and VR community.

Project Ideation and Content Samples

Apple & Dolby

Apple and Dolby’s recent partnership has accelerated the adoption, consumption, and creation of products built around immersive audio formats. To make matters more interesting, Dolby has also integrated its creation tools into Logic Pro, making it much simpler and more accessible for new and working audio engineers to author their own immersive audio content.

Atmos Has Potential

We feel the Atmos format itself has an interesting potential for real-time applications and is currently being underutilized. Contained within an Atmos file is linear program material which can be read and written with automation metadata, and depending on the hardware this metadata can inform a 3D audio renderer in real-time where to place Audio Objects in 3D audio space.

Wwise Audio Objects

Wwise has recently begun to accommodate the use of Audio Objects for real-time applications, which is an exciting opportunity for both consumers and creators. We see this as a step toward what we’ve personally wanted: real-time rendering of music sources as Audio Objects that were originally authored as linear content, combined with the flexibility of non-linear functionality such as head-tracking/6DOF from devices like VR headsets or mobile hardware, all while retaining the same sonic experience that was monitored during the creation process.

Sample Dolby Atmos Logic Project

After using Logic Pro again recently, we noticed that commercial music artist ‘Lil Nas X’ had a Demo Project featuring his popular single, ‘Montero’, mixed for Dolby Atmos with ‘56’ Audio Objects and one 7.1.4 Bed. After discovering these assets, we saw an opportunity to repurpose them as a use-case piece. Additionally, we streamed the music video as a visual aid and muted its original audio.

Lil Nas X - Montero Logic Sample Project

Project Goals

  1. Create an immersive music experience from a Dolby Atmos masterfile to be run in real-time for VR as a use-case.
  2. Preserve integrity of the original Atmos mix as much as possible. 
  3. Repurpose metadata from the Atmos masterfile to inform the native 3D audio renderer on a Windows Desktop machine. 
  4. Compare the auditory performance of both the “Dolby Atmos for Headphones” renderer and the default “Windows Sonic for Headphones” renderer.
  5. Observe the current state of Atmos masterfile integration methods and update the audio community.

Issues

Atmos Masterfile Import

After reading through Kristoffer Larson’s three-part blog on Dolby Atmos for Game Music, we discovered that there is currently no means of directly importing a Dolby Atmos masterfile. Larson also articulated other valuable considerations when bringing in Atmos assets, some of which pertain to properly exporting Beds using “Wwise’s Multi-channel Creator”. We highly recommend that those interested read through his articles.

Handling Atmos Animations in Wwise

In addition to Larson’s informative posts on workflow and pipeline considerations we also referenced Damian Kastbauer’s Wwise Blog post: “Create with Audio Objects in Wwise”. His video covers Audio Objects in-depth helping users to get more acquainted with the Wwise Audio Object environment. While watching his video we observed a potential opportunity to use the 3D positioning feature with written automation within Wwise’s editor as a means to predefine how Audio Objects could be panned before ever actually connecting to a game engine. A simple program was written to automate the translation of Atmos animation data to Wwise by appending positional data to the Wwise Project File.

The drawback we discovered was the animation keyframe limit. This limit amounts to 4,096 keyframes (64 keyframes * 64 paths) per object. That sounds large enough, but an Atmos masterfile has no such limit, meaning Atmos files with large amounts of animation cannot be re-created properly. Discovering this limitation was an important part of our research: rather than applying animations directly through the middleware, where they would be handled by lower-level operations, we would essentially have to create an entire Audio Object monitoring experience within a game engine. Building a custom Unity application would ultimately suffice for our use-case, but it added work to our research and required careful thought about the actions scripted and applied.
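To make the keyframe budget concrete, here is a minimal Python sketch of the check implied above. The two constants come from the limit described in this section; the function names are hypothetical, and the thinning step is only an illustrative lossy fallback that we did not use in this project.

```python
import math

# Limits as described in the article: 64 paths per object, 64 keyframes per path.
MAX_PATHS = 64
MAX_KEYS_PER_PATH = 64
MAX_KEYFRAMES = MAX_PATHS * MAX_KEYS_PER_PATH  # 4096


def fits_in_wwise(keyframe_count: int) -> bool:
    """Return True if one object's animation fits within the keyframe limit."""
    return keyframe_count <= MAX_KEYFRAMES


def thin_keyframes(keyframes: list, limit: int = MAX_KEYFRAMES) -> list:
    """Uniformly drop keyframes so that at most `limit` remain.

    Hypothetical, lossy fallback: any movement between the kept
    keyframes is discarded, so fast pans will be smoothed out.
    """
    if len(keyframes) <= limit:
        return list(keyframes)
    stride = math.ceil(len(keyframes) / limit)
    return keyframes[::stride]
```

Running `fits_in_wwise` over every Audio Object in a masterfile is a quick way to find out whether the in-editor approach is viable at all before investing in it.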

Windows Object Renderer Limitations

For the purposes of this project it was necessary to create two new Dolby Atmos masterfiles based on the original exports from Logic Pro. The reason is that Microsoft’s Spatial Audio SDK states there is a limit of ‘16’ Audio Objects when rendering via “Dolby Atmos for Headphones”, while the native “Windows Sonic for Headphones” can accommodate ‘111’ Audio Objects simultaneously. Both can accommodate a multi-channel Bed width of 7.1.4. Due to the Audio Object limit, the “Dolby Atmos for Headphones” version required mixing some of the original ‘56’ Audio Objects into the 7.1.4 Bed, forcing a mix decision we were doing our best to avoid. We’re not sure why the limits differ between the two options; more information from Dolby would be appreciated, especially considering that “Windows Sonic for Headphones” offers the higher limit.

💡 Audio assets within Wwise were set for streaming to heavily reduce memory usage.

Execution

Original Project File

Having made decisions on how to prepare for the Dolby Atmos version of the research, we then opened the ‘Lil Nas X’ Logic Pro Spatial Audio session and prepared an “ADM BWF.wav” masterfile export from one of the menu options. “BWF” is an acronym for “Broadcast Wave Format”, while “ADM” stands for “Audio Definition Model”. The BBC does a great job defining the ADM portion as “an ITU (International Telecommunication Union) specification of metadata that can be used to describe object-based audio, scene-based audio and channel-based audio”. Dolby has its own ADM profile, with unique specifications and instructions on how its audio is treated by a renderer and the like. More information about that profile can be found on their website.
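For readers who want to peek inside an export, the following is a minimal Python sketch that lists the top-level chunks of a RIFF-style WAVE file. In an ADM BWF file the ADM XML is normally carried in an `axml` chunk, with a `chna` chunk mapping audio tracks to ADM identifiers. The function name is our own, and the sketch assumes plain 32-bit chunk sizes: very large RF64/BW64 files store the true data size in a `ds64` chunk, which is not handled here.

```python
import io
import struct


def list_chunks(stream):
    """Yield (chunk_id, size) for each top-level chunk in a RIFF-style WAVE stream."""
    header = stream.read(12)
    if len(header) < 12:
        raise ValueError("not a RIFF/WAVE file")
    container, _, form = struct.unpack("<4sI4s", header)
    if container not in (b"RIFF", b"RF64", b"BW64") or form != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    while True:
        head = stream.read(8)
        if len(head) < 8:
            break  # end of file
        chunk_id, size = struct.unpack("<4sI", head)
        yield chunk_id.decode("ascii", "replace"), size
        # Skip the payload; chunks are padded to an even number of bytes.
        stream.seek(size + (size & 1), io.SEEK_CUR)
```

Opening an export with `open(path, "rb")` and passing the file object to `list_chunks` should show whether an `axml` chunk is present before you commit to a parsing strategy.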

Creating new Project Files

Once we had the “ADM BWF.wav”, we prepared a ‘16’ Object version in Pro Tools by importing the masterfile as ‘Session Data’, which conveniently brings in not only the Bed and Audio Objects but also the corresponding metadata for those Objects to be handled by the renderer. To decide which ‘16’ Audio Objects to keep, we sorted the original Audio Objects by their frequency content and general timbre. The more percussive, transient, and higher-frequency content was left to be rendered as Audio Objects, while those weighted more heavily toward bass frequencies were bused to the 7.1.4 Bed. The entire project was created as a new masterfile within the standalone “Dolby Atmos Production Suite” or “Home Theater” renderer. We did a visual/auditory check on both the ‘16’ Object and full ‘56’ Object versions.
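We made the selection above by ear and by inspection, but the idea can be illustrated with a rough Python sketch that ranks objects by spectral centroid, a crude numeric proxy for "brightness". Everything here is an assumption for illustration: the function names, the use of a single short frame per object, and the naive DFT, which is only practical for a few hundred samples.

```python
import cmath
import math


def spectral_centroid(samples, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of a short mono frame, via a naive DFT."""
    n = len(samples)
    weighted = total = 0.0
    for k in range(1, n // 2):  # skip DC, stop below Nyquist
        bin_value = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                        for i, s in enumerate(samples))
        magnitude = abs(bin_value)
        weighted += magnitude * k * sample_rate / n
        total += magnitude
    return weighted / total if total else 0.0


def split_objects(frames, sample_rate, keep=16):
    """Return (kept, folded): the `keep` brightest object names, then the rest.

    `frames` maps a hypothetical object name to a short list of samples.
    """
    ranked = sorted(frames,
                    key=lambda name: spectral_centroid(frames[name], sample_rate),
                    reverse=True)
    return ranked[:keep], ranked[keep:]
```

A ranking like this is only a starting point; transient character and musical role still need a listening pass before anything is folded into the Bed.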

Exporting Audio Files for Wwise

Following the check, we exported assets from the newly created masterfiles to create 7.1.4 multi-mono versions of the Beds. We then performed additional conversions from the masterfiles using Dolby’s “Conversion Tool”, allowing us to create “cinema .rpl” exports, or dub-outs, which gave us all the Audio Objects as individual mono audio files. The last piece we needed before moving on to Wwise and Unity was a “.metadata” file, which is accessible after converting or re-rendering an “ADM BWF.wav” as a “.atmos” file. More information on Atmos files and creation of them can be found here.

Wwise Project Creation

Now that we had audio assets and metadata from our masterfiles we could begin building our Wwise project. Two sets of Atmos assets were used. The first set included: (1) 7.1.4 interleaved Bed of audio and (16) individual Audio Objects. The second set included: (1) 7.1.4 interleaved Bed of audio and (56) individual Objects. It's important to note our Beds started as individual mono source files and were then interleaved using Wwise’s “Multi-channel Creator”. All of the assets were imported directly into Wwise as SFX elements and bused accordingly. Wwise accommodates the necessary classifications for this process and can be observed via Kastbauer’s Wwise Blog post: “Create with Audio Objects in Wwise” and video. The Atmos asset of (16) Objects was given its own Soundbank and likewise for the one with (56) Objects. 

Certain considerations were made on how the Objects and Beds were treated. Since there was a reference video synced to the music for the purposes of this research, we decided to add a subtle amount of spatialization to the Bed to help balance the sonic image depending on where someone looked in the scene. This must be done with care, though, as Wwise warns that height channels will disappear altogether if the Positioning slider is set all the way to 3D Position; for this project we used a value of ‘54’. Likewise, ‘Play Events’ were created for every SFX element and attached to Game Objects within our Unity scene, which would then relay positional information, handled by Wwise as Audio Objects, to be forwarded to the renderer on every frame update. For the purposes of this research, reverb was not included, since we wanted to observe the Dolby and Windows rendering of the audio as-is, without additional coloration or a misleading representation of space.

In order to confirm, visualize, and identify Audio Objects within our Unity scene, RTPC values were created in Wwise, forwarding relevant audio amplitude data to visualize audio levels of those Audio Objects, similar to how Dolby visualizes their Audio Objects. 
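As an illustration of the level visualization described above, here is a small Python sketch that maps a meter level in decibels to a display scale for an object marker. The -48 dB floor and the scale range are assumptions chosen for the example, not values taken from our project, and the function name is hypothetical.

```python
def meter_db_to_scale(level_db, floor_db=-48.0, min_scale=0.1, max_scale=1.0):
    """Map a meter level in dB to a display scale, clamped to [min_scale, max_scale].

    Levels at or below `floor_db` map to `min_scale`; 0 dB and above map
    to `max_scale`; values in between are interpolated linearly in dB.
    """
    clamped = max(floor_db, min(0.0, level_db))
    normalized = (clamped - floor_db) / (0.0 - floor_db)
    return min_scale + normalized * (max_scale - min_scale)
```

Interpolating in decibels rather than in linear amplitude keeps quiet objects visible, which is closer to how level meters are usually read.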

Unity Project Setup

A converted “ADM BWF.wav” will output “.atmos”, “.audio”, and “.metadata” files. The “.metadata” file contains object position metadata in YAML format. We found the “.metadata” file easier to parse and read as plain text than the XML-formatted metadata contained within the “ADM BWF.wav” file. The specific metadata needed is the Object ID, Object Position, Object Size, and Object Sample Position for each keyframe. These Audio Object data points were translated into GameObject positions at specific times, as animations within Unity. A simple ScriptableObject is used to store each Object’s corresponding Wwise events. These include Play events for the 7.1.4 Bed, Play events for each Audio Object, and pause/resume events for Wwise’s master bus.
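The translation step can be sketched in Python as follows. This is not our Unity code; it is a minimal illustration under stated assumptions: the dictionary keys (`id`, `sample_pos`, `pos`, `size`) are hypothetical stand-ins for whatever the parsed “.metadata” entries contain, the source position is assumed to be (x = left/right, y = back/front, z = height) with each axis in [-1, 1], the sample rate is assumed to be 48 kHz, and the room scale is arbitrary.

```python
from dataclasses import dataclass

SAMPLE_RATE = 48000  # assumed; check the sample rate of the actual master


@dataclass
class Keyframe:
    object_id: int
    time_seconds: float
    position: tuple  # (x, y, z) in a Y-up, Z-forward engine space
    size: float


def to_keyframe(event, room_half_width=5.0, sample_rate=SAMPLE_RATE):
    """Convert one parsed metadata event (a dict) into an engine keyframe.

    The field names are hypothetical; match them to the real file.
    Height (source z) becomes engine Y, and front/back (source y)
    becomes engine Z.
    """
    x, y, z = event["pos"]
    return Keyframe(
        object_id=event["id"],
        time_seconds=event["sample_pos"] / sample_rate,
        position=(x * room_half_width, z * room_half_width, y * room_half_width),
        size=event["size"],
    )
```

For example, an event at sample position 96000 lands at 2.0 seconds on the animation timeline when the master runs at 48 kHz.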

Following integration between both the Wwise and Unity projects, we made the proper connections and switched between corresponding Soundbanks as well as 3D audio renderers within Windows sound settings to monitor the effects on all audio passing through an interface to our headphones.

Results and Conclusions

💡 Use headphones and check volume

7.1.4, 16 Objects, Dolby Atmos for Headphones

Audio Device Editor (Dolby Atmos)

7.1.4, 56 Objects, Windows Sonic for Headphones

Audio Device Editor (Windows Sonic)

Logic Renderer Reference

Analysis

Both 3D audio renderers were successful in binauralizing the Beds and Objects, though each left its own auditory thumbprint on the content. The Atmos version with (16) Objects was not quite what we’ve come to expect after monitoring in binaural via Dolby’s “Home Theater Renderer”. It's also worth pointing out that being able to forward some of the metadata from the masterfile did not mean we could use every bit of metadata created within it. For example, we could forward positional information, but we couldn’t forward “binaural rendering modes”. The original masterfile included metadata that assigned Audio Objects different “binaural rendering modes”, which include: Off, Near, Mid, and Far. One compromise in bringing the masterfile over to a real-time environment was that this “binaural rendering mode” attribute did not appear to be adjustable via either the “Dolby Atmos for Headphones” renderer or the “Windows Sonic for Headphones” renderer. This was loosely covered once on a Dolby forum by a Dolby employee, who essentially said the renderer uses the Mid setting at all times. In addition to being unable to apply those settings, you may also notice a lack of the enveloping reflections one would typically get from the standalone “Home Theater” binaural renderer.

Object Monitoring

For demonstration purposes we’ve included a solo’d, or isolated, Audio Object being monitored within Pro Tools as well as that same Audio Object solo’d in Wwise. You can also observe the audible differences in the way “Windows Sonic for Headphones” handles its reflections. We would note, as a personal opinion, that the envelopment of “Windows Sonic for Headphones” monitored from Wwise behaves more similarly to the version coming from Pro Tools into the “Home Theater” renderer, with the difference of course being timbre and frequency content.

Isolated "Clap" - Pro Tools Atmos Renderer

Isolated "Clap" - Dolby Atmos for Headphones With Wwise

Isolated "Clap" - Windows Sonic for Headphones With Wwise

Dolby Atmos for Headphones

Despite the apparent lack of envelopment, we thought the “Dolby Atmos for Headphones” renderer handled the music Bed quite well. Elements of the mix still had the same amount of overall punch and clarity that one may come to expect from the pop/hip-hop genre featured in our use-case. However, the experience suffered from hearing only (16) of the (56) total Audio Objects. Additionally the intended “binaural rendering mode” settings could not be carried over from the metadata, creating noticeable differences in envelopment, as opposed to those produced via the standalone “Home Theater” renderer.

Windows Sonic for Headphones

The “Windows Sonic for Headphones” renderer, on the other hand, didn’t reproduce the Bed quite as well. Based on what we perceived, the added reflections seemed to wash over the mix a bit, bringing additional coloration and a subtle inconsistency in the original dynamics of the mix. We presume this may have been due in part to the summing of audio with differing phase. Because of those added reflections, however, the sense of Object localization and envelopment within the listening environment improved, and of course we had access to all (56) of the original Audio Objects. Though one can’t necessarily be too critical of “Windows Sonic for Headphones” as it was created for music specifically.

Parting Thoughts

Now, at the end of the day, is the disparity terrible? We think it depends on who’s being asked. The original mixing engineer will notice a difference, so to confidently say they’re the same mix would not be accurate at present. However, to the consumer, even where things stand at the moment, it could be a whole hell of a lot of fun to consume an immersive audio experience within this particular context. We also believe there's an acceptable threshold for now. In this case the difference was a tad much, but we felt the result was still pretty good, and certainly worth incorporating early on and improving for the value of getting 6DOF from a linearly produced piece of content. Mobile technically supports Dolby Atmos, but there is no support for rendering Audio Objects in 3D, and the Beds will be downmixed to 5.1.2, as stated on their website. We feel it would be wise to support efforts to incorporate true immersive audio experiences, rendered as advertised, early on, as there is already an effort to render audio more realistically using custom HRTF profiles provided by individual consumers. Technologies like VR and AR could improve significantly faster should that be the case, whether it's for functional sound or for entertainment. But that’s just our opinion.

What Do You Think of All of This?

Speaking directly to our fellow audio nerds: if you or anyone else has questions, information, or considerations, please reach out to us here or on our Twitter! The purpose of this was to help educate and discover workaround solutions for getting more out of these underutilized audio formats in emerging consumer technologies.

Other Spatial Audio Projects by 4th Floor

Previously we’ve brought music into a 3D audio environment using Higher Order Ambisonics with no Dolby Atmos Objects. Try it and read more here

Julian Messina and Robert Coomber run and operate a small company called 4th Floor based in Los Angeles, CA. Our specialty is using real-time solutions to bring 3D audio, particularly music, to immersive consumer technologies including mobile and VR platforms such as the Quest VR headset.