I got pretty deep into building a modular synthesis environment using it (https://github.com/rsimmons/plinth) before deciding that working within the constraints of the built-in nodes was ultimately futile.
Even building a well-behaved envelope generator (e.g. that handles retriggering correctly) is extremely tricky with what the API provides. How could such a basic use case have been overlooked? I made a library (https://github.com/rsimmons/fastidious-envelope-generator) to solve that problem, but it's silly to have to work around the API for basic use cases.
Ultimately we have to hold out for the AudioWorklet API (which itself seems potentially over-complicated) to finally get the ability to do "raw" output.
I will second this. I wanted to make a live streaming playback feature using the API so I could remotely monitor an audio matrix/routing system that I have in the office.
The API has _zero_ provision for streaming MP3. You either load and play back a complete MP3 file, or you get corrupted playback because the API simply won't maintain state between decoding calls.
What I ended up having to do was write a port of libMAD to JavaScript and then use that to produce a PCM stream, which I _could_ then convert into an AudioBuffer, attach a timer, and then send into the audio API for correct playback.
Which is an insane amount of work to cover a gaping oversight in a common use case of the API; a simple flag on the browser's native decoder would've sufficed.
Did you look into Media Source Extensions[0,1]? Fetching and playing the various audio formats is a bit outside the purview of Web Audio. But you can feed streaming MSE into Web Audio. If I recall, you use Web Audio's `AudioContext.createMediaElementSource()` to use a (potentially chunked) MSE source with web audio, but it's been a while since I did this.
That said, Media Source Extensions (MSE) is only supported on relatively modern browsers (IE11+) but you should be able to use it to stream mp3 to the Web Audio API on supported browsers.
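To make that concrete, here's a minimal sketch of the MSE-to-Web-Audio route, assuming an MP3 served in chunks from a hypothetical /stream.mp3 endpoint (whether 'audio/mpeg' is accepted by addSourceBuffer varies by browser, so treat this as an outline, not a recipe):

// Sketch: feed a chunked MP3 stream into Web Audio via MSE.
// The /stream.mp3 endpoint is illustrative, not a real service.
const audio = new Audio();
const mediaSource = new MediaSource();
audio.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
  const response = await fetch('/stream.mp3');
  const reader = response.body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    sourceBuffer.appendBuffer(value);
    // Wait until the SourceBuffer has consumed this chunk before appending more.
    await new Promise(resolve =>
      sourceBuffer.addEventListener('updateend', resolve, { once: true }));
  }
  mediaSource.endOfStream();
});

// Route the media element through Web Audio for further processing.
const ctx = new AudioContext();
const source = ctx.createMediaElementSource(audio);
source.connect(ctx.destination);
audio.play(); // may require a user gesture depending on autoplay policy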
There's also a way to do this without using MSE for older browsers. See the 72lions repo below for an example[2]. It's a bit convoluted, but not as much work as your workaround. As described in the README of the 72lions proof-of-concept:
"The moment the first part is loaded then the playback starts immediately and it loads the second part. When the second part is loaded then then I create a new AudioBuffer by combining the old and the new, and I change the buffer of the AudioSourceNode with the new one. At that point I start playing again from the new AudioBuffer."
0. https://developer.mozilla.org/en-US/docs/Web/API/Media_Sourc...
The MP3 issues don't end there, which is something the article touches on obliquely: you can't reuse many of the important constructs you might want to.
Here's my use case. I have a couple of games (https://arcade.ly/games/starcastle, https://arcade.ly/games/asteroids), each of which has three pieces of music: title screen, in game, and game over. If you play the game a couple of times you're going to hear the title screen audio probably once, in game twice or more (because it loops from the beginning after every playthrough), and game over twice. To put it simply: I need to play the same MP3s multiple times each.
To play an MP3 you have to decode it, which is an expensive operation. Firstly, it takes time to decode - enough time that the user will notice the lag even on a fast machine. The main problem, however, is memory: decoding takes you from a couple of MB of compressed MP3 to potentially hundreds of MB of uncompressed audio. The problem only worsens with multiple tracks.
I discovered the memory issues via Chrome Task Manager, when I noticed my page using hundreds of MB of native memory, and traced this usage back to the music. You can often get away with this when running on a desktop browser, but not so much on mobile.
You can mitigate the memory issue to some extent by dropping the sample rate of your uncompressed PCM audio to 22.05kHz, which obviously halves its uncompressed size. Quality starts to suffer too much for music if you go much below this though. (Note here that I'm talking about the uncompressed sample rate, and NOT the MP3 bitrate. A 44.1kHz MP3 encoded at 64kbps and one encoded at 128kbps will decompress to the same size, although the 64kbps version will obviously sound worse because more information will have been lost.)
But the inability to reuse a source buffer, which holds compressed audio, is absolutely aggravating, and something I've posted at length about here: https://github.com/WebAudio/web-audio-api/issues/1175. The reason you might want to do this is because it means you're only using as much memory as the compressed audio takes up and (hopefully) the rest will have been freed by the browser's runtime (no guarantees, obviously).
The downside of this approach is that you can't start a piece of music at a defined instant, which is extremely frustrating when you might want to synchronise it with events happening on screen.
Also, because of the re-decoding every time, and its asynchronous nature, I've now introduced a weird bug where it's possible to end up with both title and in game music playing at the same time if the user starts the game before the title music has finished decoding. It's fixable (although I haven't had time yet), but it's just one more irritation with a poorly designed API.
I'm actually thinking of going back to using the good old HTML5 AUDIO element just for playing music, since it seems a bit more reliable, but I need to do some experimentation to see what the memory impact is. I also had issues with AUDIO misbehaving quite badly in Firefox with multiple sounds playing simultaneously.
Sound effects are less of an issue because they're obviously quite short and therefore don't take an excessive amount of memory even when uncompressed, so I can at least keep buffer sources around for them. Nonetheless the API's excessive complexity shows through even here: why is it such a drama just to play a sound? Why do I need to create and connect a bunch of objects together just to play a single sound at a given volume? Ridiculous. Asinine.
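For what it's worth, here's roughly the amount of ceremony I mean, just to play one already-decoded sound at half volume (a sketch; `decodedBuffer` is assumed to be an AudioBuffer from an earlier decodeAudioData call):

// All of this just to play one sound at a given volume.
const ctx = new AudioContext();
const source = ctx.createBufferSource();
source.buffer = decodedBuffer;   // AudioBuffer from decodeAudioData
const gain = ctx.createGain();
gain.gain.value = 0.5;           // the "given volume"
source.connect(gain);
gain.connect(ctx.destination);
source.start();
// ...and the source is one-shot: to play the same sound again you have to
// build a new BufferSourceNode and wire the whole chain up again.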
Within that constraint I don't think it's a terrible API, but it's a big constraint and naturally raw access would be far preferable.
That being said, the AudioParam "automation" methods still make me want to cry.
I just threw up in my mouth a little :/
If you give Web developers access to raw samples, they are going to expect it to work. When it doesn't on Chrome on Android, lots of people are going to start complaining and filing bugs.
So, instead of fixing the audio path, they decided to bury its crappiness under a "higher-level" API which has fuzzier latency and can be built with hacks in the audio driver stacks themselves.
https://issuetracker.google.com/issues/36908622
AAudio is a new C API. It is designed for high-performance audio applications that require low latency. It is currently in the Android O developer preview and not ready for production use. (Jun 2017)
This demo doesn't keep a straight 120 BPM on my machine; it's incapable of holding the rhythm after 10 seconds of playback (I tried the first patch on the left, in the Edge browser).
The MPU-401 also had a "dumb" a.k.a. "UART" mode. You had to do everything yourself... and therefore could do anything. It turned out that early PCs were fast enough -- especially because you could install raw interrupt service routines and DOS didn't get in the way. :)
As a sequencer/DAW creator, you really want the system to give you raw hardware buffers and zero latency -- or as close to that as it can -- and let you build what you need on top.
If a system is far from that, it's understandable and well-meaning to try to compensate with some pre-baked engine/framework. It might even meet some folks' needs. But....
And on that note, what I think Web Audio tried to be was a drop-in kit for game engines. Getting the full functionality of Unreal into the browser motivated the requirement for audio processing. But the actual implementation was muddled from the start: basic audio playback remains challenging (try to stream a BGM loop instead of load+uncompress and discover, to your woe, that it won't loop gaplessly even when the codec is designed to allow that), and my hobby stab at an independent implementation ran out of gas when I tried to get their envelope model working. The spec has a lot of features but not enough detail, and my morale sank further when I looked at how Chrome did it (stateful pasta code). I got something half-working, put it aside and never came back.
OTOH I had also tried Mozilla's system. That was very simple, and I got a synth working in no time at all with decent performance and latency. Optimizing from that point would have been the way to do it, but something in browser vendor politics at that time led to it being dropped.
Very few games used the MPU-401's intelligent mode, actually. Never mind how I know, that was a long time ago...
(If I could return to Cakewalk, I would. Wrote some of my best tracks with that little ISR of yours!)
I've been heavily into procedural audio for a year or two, and have had no big issues with using Web Audio. There are solid libraries that abstract it away (Tone.js and Tuna, e.g.), and since I outgrew them working directly with audio nodes and params has been fine too.
The big caveat is, when I first started I set myself the rule that I would not use script processor nodes. Obviously it would be nice to do everything manually, but for all the reasons in the article they're not good enough, so I set them aside, and everything's been smooth since.
So I feel like the answer to the article's headline is: today, as of this moment, the Web Audio API is made for anyone who doesn't need script nodes. If you can live within that constraint it'll suit you fine; if not, it won't.
(Hopefully audio worklets will change this and it'll be for everyone, but I haven't followed them and don't know how they're shaping up.)
Other proposals for audio APIs solved a wider set of use cases, while also making it possible to do procedural audio without depending on browser vendors to implement key features for you.
The effects are useful in one setting: hobbyist and toy usage, where you really don't have that many constraints and can play with whatever cool things are around. That said, I'm sure you'd actually get a lot more mileage out of a library of user-made script nodes, rather than whatever the browsers have built for you.
If you're trying to build something production-ready, or port an existing system to the web, most of the fun toys seem like just that: toys.
AudioWorklets don't look like they would improve things for me, but that's a topic for another blog post.
And obviously not having raw script access isn't a good thing. Nonetheless, the other nodes mostly work as advertised, in my limited experience so far, so the stuff that you'd expect to be able to do with them (e.g. FM/AM synthesis) seems to work pretty well.
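For instance, a minimal two-operator FM sketch of my own (not from any particular library) does behave as you'd hope:

// Simple FM: modulator -> gain (modulation depth) -> carrier.frequency
const ctx = new AudioContext();
const carrier = ctx.createOscillator();
carrier.frequency.value = 220;

const modulator = ctx.createOscillator();
modulator.frequency.value = 110;

const modDepth = ctx.createGain();
modDepth.gain.value = 100;            // +/- 100 Hz of frequency deviation

modulator.connect(modDepth);
modDepth.connect(carrier.frequency);  // AudioParams accept node connections
carrier.connect(ctx.destination);

modulator.start();
carrier.start();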
> AudioWorklets don't look like they would improve things for me, but that's a topic for another blog post.
AFAIK worklets are supposed to be script processor nodes that work performantly. They wouldn't solve the sample rate problems mentioned in TFA but apart from that I'd think they should be pretty usable if they someday work as advertised.
Then if you look into it:
dictionary DynamicsCompressorOptions : AudioNodeOptions {
  float attack = 0.003;
  float knee = 30;
  float ratio = 12;
  float release = 0.25;
  float threshold = -24;
};
Which are indeed the basics that you need and totally enough for most use cases. Check out a vintage compressor that has a dozen implementations as VST plugins:
http://media.uaudio.com/assetlibrary/t/e/teletronix_la2a_car...
However, I can take your "simple" compressor and swap it out of my audio chain for a more complex one if I need to.
I can't do that for the Web Audio API. That's really what everybody is complaining about.
The problem is that if each piece only covers 95% of my use case and I'm using 10 pieces, I'm practically guaranteed to hit a mismatch somewhere--and I can't escape.
https://developer.mozilla.org/en-US/docs/Web/API/DynamicsCom...
It's a conspiracy theory, I know. Reality is probably far more boring and depressing. :/
Like the blog poster, I cut my teeth on the Mozilla API, and I was able to get passable sound out of an OPL3 emulator in a week's time. Perhaps Mozilla could convince other browser vendors to adopt their API in addition to the Web Audio API?
1. Firefox always clicks when starting and stopping each tone. I think that's due to a longstanding Firefox bug and not the Web Audio API. I could mostly eliminate the clicks by ramping the gain (see the sketch below), but the threshold was different for each computer.
2. This was the deal-breaker. Every mobile device I tested had such terrible timing in JavaScript (off by tens of milliseconds) that it was impossible to produce reasonably correct-sounding Morse code faster than about 5-8 WPM.
I found these implementation problems more frustrating than the API itself. At this point I'm pretty sure the only way to reliably generate Morse code is to record and play audio samples of each character, which wastes bandwidth and can be done more easily without using the Web Audio API at all.
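In case it's useful to anyone, the gain-ramp workaround I mentioned for point 1 looked roughly like this (values are illustrative; the usable ramp time varied per machine):

// Fade each Morse element in and out instead of hard-keying the oscillator,
// to avoid the click caused by a waveform discontinuity.
// gainNode sits between a continuously running OscillatorNode and the destination.
function keyElement(gainNode, startTime, duration) {
  const ramp = 0.005; // 5 ms; the tolerable value differed per computer
  const g = gainNode.gain;
  g.setValueAtTime(0, startTime);
  g.linearRampToValueAtTime(1, startTime + ramp);
  g.setValueAtTime(1, startTime + duration - ramp);
  g.linearRampToValueAtTime(0, startTime + duration);
}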
You sure it is not due to the sound files you are using not having a normalized start?
That said, this current pull request on emscripten is a fantastic step forward and I'm very excited to see its completion: https://github.com/kripken/emscripten/pull/5367
To do it properly would require just giving up on WebAudio's features completely and doing all the mixing in software via WebAssembly. Honestly though, if you're going to do that, you may as well just compile OpenAL-Soft with emscripten and use that, so I opted to just try to get the best out of WebAudio that I could. Hopefully it's good enough.
I put some weekends into trying to build a higher-level abstraction framework of sorts for my own sound art projects on top of Web Audio, and it was full of headaches for similar reasons to those mentioned.
The thing that I put the most work into is mentioned here, the lack of proper native support for tightly (but prospectively dynamically) scripted events, with sample accuracy to prevent glitching.
Through digging and prior work I came to a de facto standard solution using two layers of timers: one in Web Audio (which supports sample accuracy but gives you no hook to e.g. cancel or reschedule events), and one using coarse but flexible JS timers. Fugly, but it worked. But why is this necessary...!?
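For anyone curious, the two-layer pattern looks roughly like this (a bare sketch of the usual lookahead scheduler; `scheduleEvent` is just an illustrative stand-in for whatever you actually want to fire):

// Coarse JS timer wakes up regularly; everything inside the lookahead window
// gets scheduled on the sample-accurate Web Audio clock.
const ctx = new AudioContext();
const LOOKAHEAD = 0.1;      // seconds of audio to schedule ahead
const TICK_MS = 25;         // how often the JS-side timer fires

let nextEventTime = ctx.currentTime;

function scheduleEvent(when) {
  // Illustrative payload: a short blip at an exact AudioContext time.
  const osc = ctx.createOscillator();
  osc.connect(ctx.destination);
  osc.start(when);
  osc.stop(when + 0.05);
}

setInterval(() => {
  while (nextEventTime < ctx.currentTime + LOOKAHEAD) {
    scheduleEvent(nextEventTime);
    nextEventTime += 0.5;   // e.g. one event every 500 ms
  }
}, TICK_MS);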
There's a ton of potential here, and someone like myself looking to implement interactive "art" or play spaces is desperate for a robust cross-platform web solution, it'd truly be a game-changer...
...so far Web Audio isn't there. :/
Other areas I wrestled with:
• buffer management, especially with CORS issues and having to write my own stream support (preloading then freeing buffers in series, to get seamless playback of large resources...)
• lack of direction on memory management, particularly what the application is obligated to do to release resources and prevent memory leaks
• the "disposable buffer" model makes perfect sense from an implementation view but could have easily been made a non-issue for clients. This isn't GL; do us some solids yo.
Will keep watching, and likely, wrestling...
One thing that really irks me at the moment is the huge variation in sound volume of the increasing plethora of videos in my social media feed. If there was some way we could use a real time WebAudio manipulation on the browser to equalise the volume on all these home made videos, so much the better. Not just volume up/down, but things like real time audio compression to make vocals stand out a little.
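Something like this minimal sketch would be the starting point, assuming you can get at the video element at all (createMediaElementSource on a cross-origin video without CORS headers outputs silence, which is the practical catch; compressor settings here are just illustrative):

// Run a page's <video> through a compressor to tame loudness differences.
const ctx = new AudioContext();
const video = document.querySelector('video');
const source = ctx.createMediaElementSource(video);

const comp = ctx.createDynamicsCompressor();
comp.threshold.value = -30;  // start compressing fairly early
comp.ratio.value = 6;
comp.knee.value = 20;

source.connect(comp);
comp.connect(ctx.destination);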
Add delay and reverb to talk tracks etc. for podcasts.
EQ filters to reduce white noise on outdoor videos etc. would also help. People with hearing difficulties in particular ranges, or who suffer from tinnitus etc., would be able to reduce certain frequencies via parametric equalisation.
It would be intriguing to see a podcast service or SoundCloud etc. offer real time audio manipulation, or let you add post processing mastering effects on your audio productions before releasing them in the wild.
For a while there was a huge footgun that made it easy to synchronously decode entire mp3 files on the ui thread by accident. Oops (:
Even better, for a while there was no straightforward way to pause playback of a buffer. It took a while for the spec people to come around on that one, because they insisted it wasn't necessary.
EDIT: I should also add that the teams behind the apis are quite responsive. You can make an impact in the direction of development simply by making your needs/desires known.
As someone mentioned elsewhere on this thread, Android suffered from a crappy Audio/MIDI library. iOS's CoreMIDI was great, but not transportable outside of iOS/OSX. Web MIDI (alongside Web Audio) seemed a great way to go - just build a cross platform interface using an Electron app and use the underlying API to fire off MIDI messages.
Unfortunately, at the time of developing the project, the Web MIDI SYSEX spec was still too fluid or not completely defined, so I had trouble sending/reading SYSEX messages via the API, and thus shelved the project for another day.
Oh, and we needed to use SYSEX a LOT in order to intercept clock timing messages, as well as complex data like preset names and multi parameter effect settings (EQ etc.). None of the messages sent/received affected music notes at all - it was all setting configuration only.
Not really; the full range of human hearing is over 120 dB. Getting to 120 dB within 16 bits requires tricks like noise shaping. Otherwise, simple rounding at 16 bits gives about 80 dB and horrible-sounding artifacts around quiet parts.
It's even more complicated in audio production, where 16 bits just doesn't provide enough room for post-production editing.
This is why the API is floating-point. Things like noise shaping need to be encapsulated within the API, or handled at the DAC if it's a high-quality one. (Edit) There's nothing wrong with consumer-grade DACs that are limited to about 80-90 dB of dynamic range; but the API shouldn't force that limitation on the entire world.
The floats are only required if you have a complex audio graph -- with a sample-based API, you can totally do the production in floats, and then have a final mix pass which does the render to an Int16Array. All in JavaScript.
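e.g., the final render pass is nothing more exotic than this (a sketch; `mixBus` is assumed to be your float mix buffer):

// Final mix pass: clamp the float mix bus and render it to 16-bit PCM.
function renderToInt16(mixBus /* Float32Array in [-1, 1] */) {
  const out = new Int16Array(mixBus.length);
  for (let i = 0; i < mixBus.length; i++) {
    const s = Math.max(-1, Math.min(1, mixBus[i]));
    out[i] = Math.round(s * 32767);
  }
  return out;
}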
Round(Sample * 32767) is really that slow?
If you're doing integer DSP, you still need to deal with 16 -> 24, or 24 -> 16 overhead; and then the DAC still is converting to its own internal resolution. (Granted, 16 <-> 24 can be simple bit shifting if aliasing is acceptable.)
I think the whole point is that JavaScript used to be slow, and using the CPU as a DSP to process samples prevents acceleration. Seems to me what is needed is something like "audio shaders", equivalent to compute/pixel shaders, that you farm off to an OpenAL-like API which can be compiled to run on native HW.
Even if you grant emscripten produces reasonable code, it's still bloated, and less efficient on mobile devices than leveraging OS level DSP capability.
As a side note, for some common audio DSP tasks, you could presumably take better advantage of highly parallel processing by doing a Fourier transform and working in the spectral domain. There has been research on doing this on GPUs and it works. However, if you do this you'll have high latency, and it's not a hardware problem, it's inherent to the FFT algorithm, so it's kind of a dead end for many applications.
I hadn't heard that, but some of the "processor node" stuff does sound familiar.
What OS X also has, though, is proper low-level low-latency sound APIs. And that's why there are so many Mac (and iOS) music apps.
const audioContext = new AudioContext();
const osc = audioContext.createOscillator();
osc.frequency.value = 440;
osc.connect(audioContext.destination);
osc.start();
"BufferSourceNode" is intended to play back samples like a sampler would. The method the author proposes of creating buffers one after the other is a bizarre solution.Please use your imagination and try to imagine one of infinitely many other streams that I could make at runtime that are not easily made with the built-in toy oscillators.
Somebody already did. Check out Fourier Theory. The oscillators (well actually just sin, the rest will give you some help as well) can be used to make any stream, technically.
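To be fair, a handful of sines does get you surprisingly far. A rough additive sketch (harmonic count and levels are just illustrative) that approximates a sawtooth from its partials:

// Additive synthesis: approximate a 110 Hz sawtooth from its first
// 16 harmonics (amplitude 1/n each), using only OscillatorNodes.
const ctx = new AudioContext();
const master = ctx.createGain();
master.gain.value = 0.2;
master.connect(ctx.destination);

for (let n = 1; n <= 16; n++) {
  const osc = ctx.createOscillator();
  osc.frequency.value = 110 * n;
  const g = ctx.createGain();
  g.gain.value = 1 / n;     // sawtooth partial amplitudes fall off as 1/n
  osc.connect(g);
  g.connect(master);
  osc.start();
}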
It also misses the point because, as with Vulkan, you just want a stable, sane, fast low-level API to access the hardware; OpenGL immediate mode doesn't get you beyond kindergarten in today's computer graphics. In audio, that is a sample-level API. Everything else should be handled by the application!
You can still make a source/sink directed graph system with components like "oscillators". In a fricking library!
I used the same solution when I tried to perform realtime audio streaming from a daemon on an embedded device to a browser (which is probably an even more realistic use case for a browser audio API than generating sine waves). I basically stumbled over the same issues as the author: a deprecated ScriptProcessorNode and high-level APIs which don't help me (like the oscillator one).
In the end I opted for a very similar solution as the author: whenever I got enough samples through the websocket (I encoded them simply as raw 16-bit samples there) I created a BufferSource, copied all samples into it (with conversion to floating point), and enqueued the buffer for playback at the position where the last buffer finished.
I really didn't expect that to work well, due to all the overhead of creating and copying buffers and due to the uncertainty of whether the browser would switch between two buffers without missing samples. But surprisingly it worked and did the job. I included 200ms of buffering, which means I only started playback after 200ms, to be able to receive more data in the background and have a little more time to append further buffers. I experimented a little with that number but can't remember how low the limit was before getting dropouts regularly. It definitely wasn't usable for low-latency playback.
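In code, the scheme was roughly this (reconstructed from memory, so treat the URL, buffering amount, and mono/sample-rate assumptions as illustrative):

// Receive raw 16-bit PCM over a websocket, convert to float, and queue each
// chunk to start exactly where the previous one ends.
const ctx = new AudioContext();
const BUFFER_AHEAD = 0.2;            // ~200 ms of pre-buffering
let nextStartTime = 0;

const ws = new WebSocket('wss://example.invalid/pcm'); // illustrative URL
ws.binaryType = 'arraybuffer';

ws.onmessage = (event) => {
  const int16 = new Int16Array(event.data);
  // Assumes mono samples at the context's sample rate.
  const buffer = ctx.createBuffer(1, int16.length, ctx.sampleRate);
  const ch = buffer.getChannelData(0);
  for (let i = 0; i < int16.length; i++) {
    ch[i] = int16[i] / 32768;        // convert to [-1, 1) float
  }

  const src = ctx.createBufferSource();
  src.buffer = buffer;
  src.connect(ctx.destination);

  if (nextStartTime < ctx.currentTime) {
    nextStartTime = ctx.currentTime + BUFFER_AHEAD; // (re)prime the queue
  }
  src.start(nextStartTime);
  nextStartTime += buffer.duration;
};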
So the most important question is: why isn't this interface implemented in any browser yet?
That a BufferSourceNode cannot be abused to generate precision oscillators isn't very enlightening.
Partially because in addition to the interface itself it relies on a bunch of generic worklet machinery which also doesn't exist in any browser and is not trivial to implement in non-sucky ways.
But also partially because the spec has kept mutating, so no one wants to spend time implementing until there's some indication that maybe that will stop.
I think there were some false starts where previous specs were written and then found to have issues.
So basically, Web Audio is unusable in release Chrome on a measurable subset of user machines, for multiple releases (until the fix makes it out), all because of AudioWorklet. Which isn't available yet.
I am being a little unfair here, because this bug isn't really the fault of any of the people doing the AudioWorklet work. But it sucks, and the blame for this horrible situation lies largely with the people who designed WebAudio initially. :(
To be frank, the graphics world has had a quasi-standard (OpenGL) for a long time, alongside DirectX, so WebGL had a good example to follow. In the audio world, however, we haven't seen a cross-platform quasi-standard spec covering Mac, Linux and Windows. So IMHO non-web audio also lacks common standards for mixing, sound engineering, and music-making. That's why web audio appears to lack a use case. IMHO, that smells like opportunity.
I use Web Audio, in canvas-WebGL based games where music making is needed. I understand the issues - we definitely need more than "play" functionality.
If you provide a low-level "play" API, others can build stuff on top because it's just numbers. Sure, sometimes there's "expensive numbers" like MP3 decoders, FFTs, etc., but these can be added as needed.
I think the bigger issue is that non-experts sometimes get tasked with adding support for things.
The "audio device API that leaves the sample rate completely unspecified" example is, believe it or not, one I've seen before elsewhere. And yet, if you know the first thing about PCM samples, you know this is a mind-numbingly stupid mistake to make. Yet it's a mistake that a few people have made into shipping products, because they can't or won't reason about audio, and this did not stop them from being in charge of an audio API.
I'd rather have a comprehensive API that someone can dumb down than one that's so crippled as to be unusable beyond very basic functionality.
OpenAL: https://www.openal.org/
The actual audio equivalent to OpenGL is OpenSL [0], which I don't think picked up any support from anybody.
If we had gone with the Audio Data API, it wouldn't have been satisfying, because the web platform's compute engine simply could not meet the requirement of reliably delivering audio samples on schedule. Fortunately, that is in the process of changing.
Given these constraints, the complexity of building a signal processing graph (with the signal path happening entirely in native code) is justified, if those signal processing units are actually useful. I don't think we've seen the evidence for that.
I'd personally be happy with a much simpler approach based on running wasm in a real-time thread, and removing (or at least deprecating) the in-built behavior. It's very hard to specify the behavior of something like DynamicsCompressorNode precisely enough that people can count on consistent behavior across browsers. To me, that's a sign perhaps it shouldn't be in the spec.
Disclaimer: I've worked on some of this stuff, and have been playing with a port of my DX7 emulator to emscripten. Opinions are my own and not that of my employer.
1. I'm not convinced this is the case. From what I see, GC pauses constitute the big blockers, rather than event processing and repaints. Introducing an API that's friendlier to GC would be a huge win here.
2. We have WebWorkers. What would have prevented a WebWorker from calling new global.Audio() for the Audio Data API?
2. Some form of WebWorker is obviously where we're going. But does postMessage() have the potential to cause delay in the worker that receives it? (There are ways to solve this but it requires some pretty heavy engineering)
You just can't do that with the same tightness of rhythm on low-end hardware with web tech today. Flash was bad, yet Flash also opened up insane possibilities on the web for multimedia applications that just can't be matched with web tech. asm.js might fill the gap, but I haven't seen an equivalent yet.
Without using ScriptProcessorNode, there was no way of tuning the synthesizer, because of the limitation that any loop in the audio graph has a delay of at least 128 samples.
Maybe a more "compilation-oriented" handling of the audio graphs (at the user's choice) could help overcome this?
Video AND audio? They got you all covered with nice APIs!
Just audio? You're screwed!
I cannot think of one.
In Chrome's implementation, none of the mixing, DSP, etc. go through the hardware, and I'm more than certain that's the case for every other browser out there.
But my question was more like: is Web Audio a mess mostly because it's an attempt to expose the features of the twenty-odd different OS audio backends on Windows/Mac/Linux, where the odd inclusions and exclusions map to the things that all the OS audio backends happen to share that Chrome can then expose?
So if you've been wanting to try some intervention to make web standards less poor, or just want to observe how they end up the way they do, here's an opportunity.
This is not a wart, this is a security feature. Of course, it wouldn't be a necessary limitation if the web wasn't so complicated, but the web is complicated.
For a more generic guide I've heard a lot of good things about a free (in electronic form) book called DSPGuide (http://dspguide.com/). Haven't had a chance to dive into this one, though.
Not to be semantic, but that's technically incorrect. Indeed, if WebGL were to be supplanted by a lower-level graphics API, that would make a lot of people happy.[0]
As far as the author's thesis concerning the Web Audio API: I agree that it's a total piece of shit.
I've come to suspect that my phone's autocorrect functionality, HN's two-hour edit window, and my own brain routinely conspire against me to paint a picture of total idiocy.
I've said it before, I'll say it again: it exists in a vacuum, and is run by people who have never done any significant work on the web, with titles like "Senior Specifications Specialist". Huge chunks of their work are purely theoretical (note: not academic, just theoretical) and have no bearing on the real world.
Indeed. Because people who do browser internals and never do any web development are a good fit to create APIs for web developers
Why not? I linked a test app [0] in my post that generates PCM data on demand, and fast. It works deterministically on all the browsers. Mozilla certainly implemented the Audio Data API back in 2011 and it was fast enough for them.
> Hell, I wrote a tron game and did the audio using audio node chains and managed to get something really close to the real tron cycle sounds, without resorting to sample level tweaking
Why couldn't this be a high-level userspace library like three.js? Yes, with a lot of creative energy you can recreate a lot of sounds, I'm willing to believe that. But I think a low-level API would have been more useful from the get-go.
Who at Apple beat him with a stick to get audio right? Can we get that person to design the audio API's for the Web and Android?
(Edit: I realized that this was an unfair comment born of my frustration with Audio APIs from Google.
The real issue driving this is that audio is still a dumpster fire on Android. So, if he gives web developers access to audio samples, everybody is going to expect it to work. And, on Android, it will fail miserably. So, better to isolate audio functions, give them "fuzzy" latency which you can bury in C code drivers, and hide the fact that audio on Android is a flaming pile of poo rather than piss off even more developers and get even more bugs filed against Android's shitty audio.)
That pre-browser era where we would have sounds for everything. Minimize window, user logged in, logged out, all that crap.
Also, the API has good support for visuals, e.g. spectrum analysis. This makes it pretty good for a beginners' course on sound processing.
I wouldn't use it for anything serious like a DAW.