Achieving Synced and Persistent Audio for Social VR Applications

By Robert Coomber

May 29th, 2023

Social VR applications provide users with the opportunity to share a physical space with others, offering a unique and powerful experience. However, ensuring that the visual and audio elements are properly synchronized for all participants poses a significant challenge.

The User Experience Perspective

From a user experience standpoint, achieving synchronization is crucial because it enhances immersion. When audio seamlessly complements the visual elements, it creates a heightened sense of presence and realism. Synchronized audio also facilitates better communication and collaboration among users, enabling natural conversations and coordinated actions. These aspects are fundamental for effective social interaction in virtual spaces. Users typically notice synchronization only when it fails, which underscores the need to implement and thoroughly test these details before delivering the application.


Within a social VR environment, synchronization becomes a complex task involving various audio sources and systems. To create an immersive and cohesive experience, player interactions, visual effects, music, gameplay mechanics, dialogue, and sound effects all need to be synchronized. Failing to achieve synchronization across these elements can result in disjointed conversations, experiences that break immersion, and difficulties in coordinating gameplay.

Networking Layer

Game networking is one of the most important aspects of a quality multiplayer experience, and also one of the most difficult to manage. Designing a system from scratch would be expensive, complex, and time-consuming, which is why gameplay engineers generally look to existing networking solutions for their VR applications. These solutions include Photon Unity Networking (PUN), Mirror, and Normcore (our favorite). Each of these addresses common networking problems such as latency and packet loss, offers VoIP either built in or through add-on packages, and can be used to solve audio networking and sync problems.

Solution Approach

To address the synchronization challenge, a comprehensive audio networking strategy is needed. We will explore the three primary components that need to be implemented for success:

  1. Event Sync and ID System
  2. Custom Float Value Streams
  3. State Machine to control playback

Event Sync

Implement a networked event system that notifies clients of play events for specific audio clips, their 3D positions, and any special parameters they require. Each event should have a unique identifier (event ID), allowing clients to play back specific clips. This event ID system is especially important for sounds with variations, where there may be a dozen or so versions of a sound that are played often. For example, in our recent work with Gym Class VR, a number of events, such as PA announcer dialogue, need to be synchronized when called. When a player performs a dunk, the system selects one of the announcer dunk variation clips, retrieves its ID, and then broadcasts that event ID to all clients participating in the same virtual room, ensuring everyone hears the exact same dialogue line at the same time.
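The flow described above can be sketched in a few lines. This is a minimal illustration in Python, not the Gym Class VR implementation: the clip table, the IDs, and the Room, Client, and trigger names are all hypothetical, and the broadcast is simulated in-process where a real project would hand the event to its networking layer.

```python
import random
from dataclasses import dataclass

# Hypothetical clip table: event ID -> clip name. In a real project this would
# be built from the audio asset list so every client shares the same IDs.
CLIP_TABLE = {
    100: "announcer_dunk_01",
    101: "announcer_dunk_02",
    102: "announcer_dunk_03",
}
# Hypothetical grouping of variation IDs by gameplay event.
VARIATIONS = {"dunk": [100, 101, 102]}


@dataclass(frozen=True)
class PlayEvent:
    event_id: int          # which exact clip to play
    position: tuple        # 3D position of the sound as (x, y, z)
    volume: float = 1.0    # example of a special parameter


class Client:
    """Stand-in for one headset; records what it was told to play."""

    def __init__(self, name):
        self.name = name
        self.played = []

    def on_play_event(self, event):
        clip = CLIP_TABLE[event.event_id]  # resolve the ID locally
        self.played.append((clip, event.position))


class Room:
    """Stand-in for the networking layer's broadcast to one virtual room."""

    def __init__(self, clients):
        self.clients = clients

    def broadcast(self, event):
        for client in self.clients:
            client.on_play_event(event)


def trigger(room, kind, position, rng=random):
    # The variation is picked once, by the sender, so clients never make
    # their own random choice and drift apart.
    event = PlayEvent(rng.choice(VARIATIONS[kind]), position)
    room.broadcast(event)
    return event


room = Room([Client("a"), Client("b"), Client("c")])
sent = trigger(room, "dunk", (0.0, 3.0, 12.0))
```

The key design choice is that the random pick happens exactly once, before the broadcast. Only a small integer ID and a position travel over the network, and each headset looks the clip up in its own copy of the table.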

Value Sync

Introduce streamed values, such as float values, which are broadcast to clients whenever they change. These values can control mix levels, plugin effect parameters (e.g., reverbs, filters), or other custom effects and systems. For instance, in a VR sports game, a streamed float value can dynamically adjust the crowd sentiment, ensuring consistent crowd energy across all participants' headsets. Contextual events such as tied games or specific scoring events can also influence a value like this.
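As a rough sketch of a streamed value, the Python below sends a float only when it has changed by a meaningful amount and lets each client ease toward the newest value. The StreamedFloat and SmoothedReceiver classes, the change threshold, and the smoothing rate are assumptions made for illustration; the article does not specify how changes are detected or smoothed.

```python
class StreamedFloat:
    """Sender side: broadcasts only when the value moves by at least epsilon."""

    def __init__(self, name, send, initial=0.0, epsilon=0.01):
        self.name = name
        self._send = send            # callable(name, value) from the network layer
        self._last_sent = initial
        self._epsilon = epsilon

    def set(self, value):
        if abs(value - self._last_sent) < self._epsilon:
            return False             # change too small to be worth a message
        self._last_sent = value
        self._send(self.name, value)
        return True


class SmoothedReceiver:
    """Client side: eases toward the latest target to avoid audible jumps."""

    def __init__(self, initial=0.0, rate=4.0):
        self.current = initial
        self.target = initial
        self._rate = rate            # maximum change per second

    def on_value(self, name, value):
        self.target = value

    def tick(self, dt):
        step = self._rate * dt
        delta = self.target - self.current
        if abs(delta) <= step:
            self.current = self.target
        else:
            self.current += step if delta > 0 else -step
        return self.current


receivers = [SmoothedReceiver(), SmoothedReceiver()]


def send_to_room(name, value):
    # Simulated broadcast; a real project would use its networking solution.
    for receiver in receivers:
        receiver.on_value(name, value)


crowd_energy = StreamedFloat("crowd_energy", send_to_room)
crowd_energy.set(0.8)      # large change: broadcast
crowd_energy.set(0.805)    # tiny change: suppressed
for _ in range(10):        # simulate one second of client ticks
    for receiver in receivers:
        receiver.tick(0.1)
```

Suppressing tiny changes keeps traffic down, and smoothing on the receiving side means a late or dropped packet shows up as a slightly slower fade instead of a sudden jump in the mix.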

State Machine Sync

Depending on the audio system, one or more state machines may manage specific playback systems. To ensure synchronization, a server-client architecture can be employed. In this architecture, the server receives gameplay events and manages the state, broadcasting audio events to all clients within the virtual room. A particular convenience is that the server persists, allowing players to enter and exit seamlessly without disrupting gameplay or the social experience for anyone else.
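A server-owned state machine of this kind might look like the following Python sketch. The state names, the gameplay events, and the AudioStateServer and Headset classes are hypothetical, and the network is simulated with direct method calls. The point is that the state lives on the server, so it survives players joining and leaving.

```python
# Hypothetical transition table: (current state, gameplay event) -> next state.
TRANSITIONS = {
    ("warmup", "tipoff"): "in_play",
    ("in_play", "score_tied_late"): "clutch",
    ("clutch", "lead_change"): "in_play",
    ("in_play", "final_buzzer"): "post_game",
    ("clutch", "final_buzzer"): "post_game",
}


class AudioStateServer:
    """Owns the playback state and pushes every change to connected clients."""

    def __init__(self, initial="warmup"):
        self.state = initial
        self.clients = []

    def join(self, client):
        self.clients.append(client)
        client.on_state(self.state)    # late joiners receive the current state

    def leave(self, client):
        self.clients.remove(client)    # the server's state is unaffected

    def on_gameplay_event(self, event):
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            return False               # event is not valid in this state
        self.state = next_state
        for client in self.clients:
            client.on_state(next_state)
        return True


class Headset:
    """Stand-in for a client; a real one would start, stop, or crossfade audio."""

    def __init__(self):
        self.state = None

    def on_state(self, state):
        self.state = state


server = AudioStateServer()
first = Headset()
server.join(first)
server.on_gameplay_event("tipoff")
server.leave(first)
server.on_gameplay_event("score_tied_late")
late = Headset()
server.join(late)                      # joins mid-game and is told "clutch"
```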

In cases where a dedicated room server is too expensive for applications with a larger player base, an alternative approach can be adopted. One of the client devices can instead be designated as the Host/Leader, taking on the responsibility of running a state machine, similar to a server.

However, two considerations should be kept in mind…

First, it is important to ensure that running the state machine and handling the additional networking overhead does not negatively impact gameplay on the client device. This can be mitigated by following best practices such as using efficient data models, keeping processing out of the update loop, and decoupling the audio state machine from the rest of the gameplay code.

The second consideration involves handling the scenario where the leader device disconnects from the room. In such cases, the leader role must be transferred to another candidate seamlessly. To achieve this, a JSON-serialized copy of the state is passed to all headsets whenever a state change occurs. This ensures that when the leader leaves, every client in the room possesses the data needed to become the new leader and resume the state machine from where the previous leader left off.
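The handoff can be illustrated with a small Python sketch. The Peer and LeaderRoom classes are hypothetical, and the election rule used here (lowest peer ID becomes the new leader) is an assumption chosen for simplicity. The article does not say how the next leader is picked, and networking solutions often provide their own ownership or host-migration mechanism.

```python
import json


class Peer:
    """Stand-in for one headset in the room."""

    def __init__(self, peer_id):
        self.peer_id = peer_id
        self.is_leader = False
        self.snapshot = None    # last JSON string received from the leader
        self.state = None       # live state; only meaningful on the leader

    def on_snapshot(self, payload):
        self.snapshot = payload

    def promote(self):
        # Rebuild the live state from the most recent snapshot.
        self.state = json.loads(self.snapshot) if self.snapshot else {}
        self.is_leader = True


class LeaderRoom:
    """Simulated room in which one peer acts as the host/leader."""

    def __init__(self, peers):
        self.peers = list(peers)
        self.leader = self.peers[0]
        self.leader.is_leader = True
        self.leader.state = {}

    def leader_set_state(self, **changes):
        self.leader.state.update(changes)
        # Every state change is serialized and sent to all other headsets.
        payload = json.dumps(self.leader.state, sort_keys=True)
        for peer in self.peers:
            if peer is not self.leader:
                peer.on_snapshot(payload)

    def disconnect(self, peer):
        self.peers.remove(peer)
        if peer is self.leader and self.peers:
            # Assumed election rule: lowest peer_id wins, so every client
            # can reach the same answer without extra messages.
            self.leader = min(self.peers, key=lambda p: p.peer_id)
            self.leader.promote()


a, b, c = Peer(1), Peer(2), Peer(3)
room = LeaderRoom([a, b, c])
room.leader_set_state(music="clutch", crowd_energy=0.8)
room.disconnect(a)                         # leader leaves; b takes over
room.leader_set_state(music="post_game")   # b continues from a's last state
```

Because a snapshot goes out on every change, the worst case after a disconnect is that the new leader resumes from the last state that was actually broadcast, which is also the last state every client heard.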

Handling of these systems requires close coordination with a development team overseeing gameplay and networking.

Future Developments and Conclusion

As social VR experiences evolve, audio systems will continue to play a vital role in driving the overall experience. We expect the next round of advancements to come from spatial audio technology, the integration of AI for real-time audio processing and dialogue, and collaborations between VR platform providers and audio technology companies. By leveraging solutions such as event sync, value sync, and state machine sync, while addressing network challenges and prioritizing the user experience, social VR applications can achieve persistent and synchronized audio. Well-synced audio enhances immersion, fosters effective communication, and contributes to a more enjoyable social VR experience overall.

Hear how we implemented these solutions in our latest work → Gym Class VR NBA Bundle.

For any questions regarding your own projects please reach out to us either by email or our socials!