I start by asking the user whether they want to view the experience in “Stereo Mode” or “Normal Mode”. Whichever they click, I use that click event (the user gesture browsers require before any audio can play) to start a 500ms MP3 of silence. When that clip finishes, I use its ‘ended’ event to start it up again. Meanwhile, in the render loop, if the user enters an area with auto-playing narration, I set a global flag and store the track name in a global variable. Those flags get picked up the next time the 500ms silent track finishes, and the named track is substituted in.
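In case that's hard to follow, here's a minimal sketch of the pattern. The file names, button selector, and function names are all illustrative, not taken from my actual code:

```javascript
// A 500ms clip of silence keeps the audio element "alive" between narrations.
const SILENCE = 'silence.mp3';   // hypothetical file name
let pendingTrack = null;         // global flag/variable set by the render loop

// Called from the render loop when the user enters a narration zone.
function queueTrack(name) {
  pendingTrack = name;
}

// Decide what plays next: the queued narration if there is one, else silence.
function nextTrack() {
  const next = pendingTrack || SILENCE;
  pendingTrack = null;
  return next;
}

// Browser wiring: the mode-select click is the user gesture that unlocks
// playback; after that, each 'ended' event chains into the next track.
if (typeof Audio !== 'undefined') {
  const player = new Audio(SILENCE);
  for (const btn of document.querySelectorAll('.mode-button')) { // hypothetical selector
    btn.addEventListener('click', () => player.play(), { once: true });
  }
  player.addEventListener('ended', () => {
    player.src = nextTrack();
    player.play();
  });
}
```

The key point is that playback never actually stops: the ‘ended’ handler always starts *something*, so the browser never gets a chance to demand a fresh user gesture.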
This is ridiculous, and it’s just to play sound files at various points in time. I wouldn’t be surprised to find out that some of Cabbibo’s work uses multiple AudioContexts or other complicating factors that would make it difficult to retrofit.