Audio for the masses

The video above is from LXJS – the Lisbon JavaScript conference, which happened more than a month ago. I gave this talk again last week at VanJS, so I decided it was time for the belated write-up.

If you want to follow along, or play with the examples, the slides are online and you can also check out the code for the slides.

As I’ve given this talk several times, I keep changing bits of the content depending on what the audience seems most interested in, and I also sometimes improvise things which I don’t remember when writing the final write-up. So if you were at any of the talks and see that something’s missing or different, now you know why! I’ve also added a section at the end with questions I’m frequently asked, hope that’s useful for you too.

I work at Mozilla

red panda

I work at Mozilla (the above is a red panda, which we love), but that’s not what I want to talk about today. I want to talk about music instead.

I ♥︎ music

ukulele and things

I’ve been interested in music since forever, but I have no formal training at all–it’s all self-taught. For example, last year I was walking in Greenwich Park (note for the Americans: Greenwich is where time begins) during one nice summer afternoon, and I got this idea that I should totally learn to play the ukulele. As soon as I got home I went to buy one online, and the store had an offer where you’d get free shipping if you spent at least £25… so I added more instruments to the order: the tambourine! the shaker! the harmonica! And all that even if I didn’t know how to play any of those, because I thought: I can learn with videos or tutorials!

But it wasn’t always this way…

Learning from old books

old books

At the beginning, my only source of musical information was old books. I would go to my grandma’s and find books from my mum or my father, and I’d try to read them even if I didn’t fully understand it all. Or I would maybe go to the local library and look at books on music, but it was really hard to learn how to play musical instruments this way because those were mostly books on the history of music, and besides that, I didn’t have any musical instrument to play with. So it was all really frustrating.

Learning playground

casio pt-100

Things got a little bit better when I got a CASIO keyboard as a gift. I was finally able to play sounds! I really enjoyed playing with it and getting lost in the melodies and the chords and the different atmospheres I could create. And when I say I enjoyed it, I mean it literally. I don’t think anyone from my family was really enjoying it as I didn’t have any sense of rhythm whatsoever and I would just hang on notes or chords that I particularly liked for as long as I felt like hanging, which was probably driving them all nuts.

At some point I was given a booklet with the scores of popular songs, but even then it was hard to play anything that resembled the original songs, because I didn’t know how to interpret note lengths–again I just stayed on interesting notes for as long as I wanted. If there had been Internet back then, I could have listened to what the song was supposed to sound like, but there wasn’t, so I didn’t have any reference that would let me understand where I was going wrong.

Computers + trackers

impulse tracker

Everything really started to accelerate when I got access to one of those “multimedia” computers and tracker software. For those who don’t know, trackers are a type of software that allows you to sequence music and store it together with the sampled audio data, so later on it is relatively easy to reproduce the song and make it sound the way the author intended, unlike what happens with MIDI files, which mostly just contain the score of the song, not the actual sounds used in it.

Despite there being no Internet (or it being accessible to just a few people in academia and big cities in Spain), there were lots of distribution networks that got these files copied between enthusiasts. There were people who loved to trade these songs by snail mail (sending actual floppy disks in the post), others used BBSs, and finally there was a monthly tracking contest in a computer magazine that I used to buy–they would put all the participating songs into a directory on their CD of goodies, and this is how I got into the whole tracker scene.

A great thing about trackers was that you could see all the notes and effects used and also edit them, so effectively they were open source music, way before that term even existed. We all could learn from the ways of the masters, and that’s how I got a lot better.

The most hilarious part was how people ‘hacked’ the names of the samples in the trackers so that together they could form a bigger message, and that way a strange communication channel with other people in the contest was created, and everyone started “sending messages” to each other using the sample names. Of course, as in any sufficiently popular channel, there were flamewars! People would fiercely fight over issues such as which style of music or which tracker software was better. Looking back, it’s both hilarious and amazing that all this happened inside a directory on a CD.

Music communities

traxinspace

A while later, when I finally got access to proper Internet, I learnt about online music communities like Traxinspace. Suddenly there was this new universe of music and trackers from all over the world, and we could interact with each other–it wasn’t just speaking to people from Spain! Traxinspace had this feature where people could be artist of the day, or of the month, in addition to other rankings. If you got to the top of these or got to be a featured artist, it was nothing short of winning a Grammy or going on Oprah: it was a huge thing in these circles! The competition to be the best was fierce.

Demoscene

js1k minecraft

More or less at the same time I got into the demoscene as well. For those who don’t know, the demoscene is mostly about making computers do impressive stuff. For example, size-limited coding: make a tiny bit of code do a lot of things. A modern example of this is the JS1K competition, where authors create something that runs in the browser using less than 1024 characters and does something impressive. The screenshot above is one of the entries from the latest edition, “here be dragons“, rendering a Minecraft-like world in the browser using WebGL.

Size limited sound generation

supersole @ breakpoint

One of the areas that piqued my coding curiosity in the demoscene was sound synthesis, where you could write software that generated lots of audio using a small amount of code. At some point I started coding “sorollet“, my own synthesiser, in C and C++. One of the first incarnations was 410 KB of C++ code that, when compiled, generated a 13 KB executable file–a good compression ratio already. But running it would generate 2:25 minutes of music, which is the equivalent of 25.5 MB of stereo WAV data at 44.1 kHz, i.e. CD-quality audio. All that with only 13 KB of binaries!
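The arithmetic behind that comparison is simple, assuming 16-bit samples:

```javascript
// 2:25 of CD-quality audio: 44100 samples per second,
// 2 channels, 2 bytes (16 bits) per sample.
var seconds = 2 * 60 + 25;
var wavBytes = seconds * 44100 * 2 * 2;

console.log(wavBytes); // 25578000 bytes, i.e. ~25.5 MB

// Roughly 1900 times bigger than the 13 KB executable.
var ratio = wavBytes / (13 * 1024);
```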

Web Audio

web audio modular routing

Even though I’d had great fun building my synthesiser in C++, that path wasn’t without issues. I was mostly a web developer, so coding in C meant dealing with lots of memory allocation and management instead of having fun with pure audio code. I jumped at Web Audio as soon as I could because I was quite experienced with JavaScript, and it seemed so easy in comparison with C!

Sorollet.js

sorollet

The first thing I did was to port my C++ synthesiser to JavaScript, using one of the provided Web Audio nodes that allows generating audio on the fly with JavaScript (the ScriptProcessorNode, formerly known as JavaScriptNode).
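For the curious, here is a minimal sketch of what ScriptProcessorNode-based generation looks like. The sine generator is my own illustration, not Sorollet’s actual code:

```javascript
// Pure generator: fills a Float32Array with a sine wave.
// Returns the updated phase so successive buffers stay continuous.
function fillSine(output, phase, frequency, sampleRate) {
  var phaseStep = 2 * Math.PI * frequency / sampleRate;
  for (var i = 0; i < output.length; i++) {
    output[i] = Math.sin(phase);
    phase += phaseStep;
  }
  return phase;
}

// Wiring it into a ScriptProcessorNode (browser only).
if (typeof AudioContext !== 'undefined') {
  var audioContext = new AudioContext();
  var node = audioContext.createScriptProcessor(4096, 0, 1);
  var phase = 0;
  node.onaudioprocess = function (event) {
    var output = event.outputBuffer.getChannelData(0);
    phase = fillSine(output, phase, 440, audioContext.sampleRate);
  };
  node.connect(audioContext.destination);
}
```

Keeping the generator a pure function makes it easy to test outside the browser, since everything browser-specific stays in the `onaudioprocess` callback.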

I was really happy to have my code running in the browser, in realtime! But… I quickly realised that was not what the web is about. The web is not about telling someone to visit a page, wait until some code loads, and then spend their next minutes listening to some music Sole composed. The web is about interacting and connecting APIs together, and I was failing quite miserably at that.

Sorollet.js UI

sorollet ui

I started building a UI for my synthesiser, so people could change the parameters and experiment with the different output they would get, and play some notes using their keyboard or clicking on the keys on the screen. They would also get a visual representation of what was being played, thanks to an oscilloscope of sorts that I would draw using the Canvas API.

But the best of all was that each time they changed any parameter for the synthesiser, the URL hash would update automatically, and if they then copied and pasted that URL and sent it to a friend, their friend could get a synthesiser “copy” with those settings applied. Likewise, that person could make more changes in the settings and send the new URL to another friend, which is way more “web-like” than what I had built initially.
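The real Sorollet.js UI encodes its state differently, but the general pattern of putting settings in the URL hash can be sketched like this (the parameter names here are entirely hypothetical):

```javascript
// Serialise a flat settings object into a hash fragment, and back.
function settingsToHash(settings) {
  return Object.keys(settings).map(function (key) {
    return key + '=' + encodeURIComponent(settings[key]);
  }).join('&');
}

function hashToSettings(hash) {
  var settings = {};
  hash.split('&').forEach(function (pair) {
    var parts = pair.split('=');
    settings[parts[0]] = decodeURIComponent(parts[1]);
  });
  return settings;
}

// In the browser: write the hash on every change, read it back on load.
if (typeof window !== 'undefined') {
  window.location.hash = settingsToHash({ wave: 'square', detune: '12' });
  var restored = hashToSettings(window.location.hash.slice(1));
}
```

Anyone who receives the URL gets the full state back just by loading the page, which is what makes the synthesiser shareable.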

Web Audio === easy (?)

modular

Now I was really happy about the output and super excited because Web Audio was so easy to use! After all it’s all about connecting modules together!

OK… easy? I know not everyone finds it easy, or has even a tiny bit of knowledge about it, so I built some examples that progressively introduce its features and explain how to combine them with other Web APIs.

I also built a web component to help me with these demonstrations–so I wouldn’t be purely livecoding, but I would still be able to run things step by step instead of running it all in one go. If I were demonstrating this in front of you at a talk, I would know how to operate it and you wouldn’t need to do anything; but that’s not the case here, so these are the instructions for using the demos:

  • Command + E executes either the selected piece of code, or if nothing is selected, the entire code in the editor
  • You can toggle showing the code or not
  • You can also run the whole thing pressing the run button
  • Some examples have the autostart attribute so you don’t need to press anything in order to get things going

For more details have a look at the source code of the component. I’m working on making it an independent component, but I still haven’t quite figured out how to do it in a way that doesn’t involve using bower, so stay tuned for more news if you’re interested in bower-free web components.

That said, let’s move on to the examples!

Oscillator

oscillator

Oscillators are one of the basic units for generating sound in Web Audio. But before you can have an oscillator instance, you have to create an Audio Context. If you’re familiar with Canvas 2D or WebGL contexts, audio contexts are very similar: once you have one, they give you access to methods and constants for generating stuff within that context. It’s where everything happens, but it is also akin to a painter’s toolbox, since it provides you with the tools you will need to deal with audio.

Here’s how you create the context:

var audioContext = new AudioContext();

and once you have it… well, there’s nothing happening yet! We create an oscillator with this:

var oscillator = audioContext.createOscillator();

nothing’s playing yet… and in fact the oscillator is not even connected anywhere, it’s just floating in the “web audio context nothingness”. Let’s connect it before we start using it:

oscillator.connect(audioContext.destination);

audioContext.destination represents the final output for the audio context, or in other words: your computer’s sound card, and ultimately, the speakers or headphones–whatever you use to listen to audio!

We are now ready to generate some sound using our newly created oscillator:

oscillator.start();

We can also change the frequency the oscillator is playing at. By default it starts at 440.0 Hz, which is the standard A-4 note. Let’s make it play A-3, i.e. 220 Hz:

oscillator.frequency.value = 220;

That change is immediate. But we could also schedule the change to happen in two seconds from now:

oscillator.frequency.setValueAtTime(440, audioContext.currentTime + 2);

or even smoothly ramp to that value for two seconds:

oscillator.frequency.linearRampToValueAtTime(220, audioContext.currentTime + 2);

And that’s how we can create basic sounds and manipulate them with quite accurate timing with relative ease.
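As an aside, the 440/220 relationship generalises: each octave halves or doubles the frequency, and each semitone multiplies it by 2^(1/12). A little helper for converting MIDI note numbers to frequencies (my own, not part of the API):

```javascript
// MIDI note number to frequency: A-4 is note 69 at 440 Hz,
// and each semitone multiplies the frequency by 2^(1/12).
function noteToFrequency(midiNote) {
  return 440 * Math.pow(2, (midiNote - 69) / 12);
}

// oscillator.frequency.value = noteToFrequency(57); // A-3 = 220 Hz
```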

Another great feature of Web Audio is, as mentioned, its modularity. You can connect the output of one oscillator to a parameter of another oscillator, making the value of that parameter oscillate, and so build more complex sounds. But what is a parameter? It is any value you can change in a node. For example, frequency is a parameter of OscillatorNodes (parameters are technically known as AudioParams).

Suppose we create one oscillator which we’ll use to play sounds, as we did before. Now we create another oscillator but give it a very slow frequency value, 10 Hz. That’s why we call it an LFO: Low Frequency Oscillator.

var lfo = audioContext.createOscillator();
lfo.frequency.value = 10;

Now we create a Gain Node, which is another of the nodes that Web Audio provides to us. The purpose of these nodes is basically to multiply their input value by the value of their gain parameter, so you can use it, for example, to reduce loudness (with gain values less than 1.0) or to amplify very quiet sounds (with gain values higher than 1.0):

var lfoGain = audioContext.createGain();
lfoGain.gain.value = 100;

So if we connect the output of the LFO oscillator (which changes from -1 to 1) to the input of the gain node (which is set to multiply everything by 100), we’ll get values from -100 to 100:

lfo.connect(lfoGain);

If we connect this to the frequency parameter of the initial oscillator, the LFO output will be added to the frequency value–if the frequency is 220, it will start oscillating between 120 and 320 (220 – 100, 220 + 100), creating a funny, spooky kind of sound:

lfoGain.connect(oscillator.frequency);
lfo.start();
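If the routing feels abstract, here is the same maths as a plain function (my own sketch, not a Web Audio API): the audible oscillator’s frequency follows base + depth · sin(2π · f_lfo · t).

```javascript
// What the LFO -> gain -> frequency graph computes, as a pure function:
// the base frequency plus the LFO output scaled by the gain value.
function modulatedFrequency(t, base, depth, lfoFrequency) {
  return base + depth * Math.sin(2 * Math.PI * lfoFrequency * t);
}

modulatedFrequency(0, 220, 100, 10);     // 220 (LFO at its zero crossing)
modulatedFrequency(0.025, 220, 100, 10); // ~320 (LFO at its peak)
```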

This is just a small sample of what the Web Audio API can do, but it’s still just Web Audio, and we agreed before that the greatness of the Web relies on connecting multiple APIs together. So let’s look at an example that does more things at the same time:

Drag and Play

drag and play

We want to load a sample to play in our example, and we want to be able to load it in two different ways:

  1. dragging and dropping it from our file explorer to the browser window–we’ll use the Drag And Drop API, or…
  2. selecting a file using a file input (which makes more sense in touch devices where there is generally no way to drag items across currently running apps) –we’ll use the File API to read the contents of the file client side, instead of sending it to a server for further processing

Once we get the sample data as an ArrayBuffer, we’ll decode it into an AudioBuffer using the context’s decodeAudioData method, create a node of type BufferSource, and set its buffer to be the data we just decoded:

bufferSource = audioContext.createBufferSource();
bufferSource.connect(finalGain);
bufferSource.buffer = buffer;

We also want it to loop!

bufferSource.loop = true;

And then starting it is similar to the way we start oscillators:

bufferSource.start(0);

Another thing we want to do is to display a representation of the loaded wave. We have a canvas we’ll use for this, and the drawSample function that takes values from -1 to 1–exactly the same values we have in the buffer! So it’s just a matter of running the following:

drawSample(waveCanvas, buffer.getChannelData(0));

Note: getChannelData(0) returns the first channel’s data. For monophonic sounds, the buffer will only have one channel, but stereo and 3D sounds will have more than one. I’m keeping it simple and using the first one, which for stereo corresponds to the left channel. It’s not totally accurate, as we might be discarding too much data (if the signal is very different between the two channels), but for demonstration purposes it should be more than enough.
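If discarding the right channel bothers you, averaging all the channels is a safer mixdown. A sketch of my own (it assumes an AudioBuffer-like object with `length`, `numberOfChannels` and `getChannelData`):

```javascript
// Average all channels of an AudioBuffer-like object into a single
// Float32Array, instead of just taking channel 0.
function mixToMono(buffer) {
  var mono = new Float32Array(buffer.length);
  for (var channel = 0; channel < buffer.numberOfChannels; channel++) {
    var data = buffer.getChannelData(channel);
    for (var i = 0; i < buffer.length; i++) {
      mono[i] += data[i] / buffer.numberOfChannels;
    }
  }
  return mono;
}
```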

We also want to draw the wave that is being played on a canvas. To “hook” into the BufferSource output and get some already preprocessed data that we can then use on the canvas, we’ll use an instance of AnalyserNode:

var analyser = audioContext.createAnalyser();

This analyser is connected between the output of the bufferSource and the audio context’s destination, so that it can “inspect” what is going through:

bufferSource.connect(finalGain);
finalGain.connect(analyser);
analyser.connect(audioContext.destination);

Note: due to the way Web Audio is architected, bufferSources are meant to be disposed of when you’re done playing them–i.e., once you run their stop method, they’re over and calling start again has no effect; you have to create another BufferSource and assign it the buffer and all parameters and connections. And in this particular example, each time you load a sample you need to create a new BufferSource too.

But we do not want to be reconnecting the buffer source to the analyser every time, so we instead create a “finalGain” node that we permanently connect to the analyser, and we’ll connect the bufferSources to the finalGain node instead, and let Web Audio clean up the disposed nodes when it deems appropriate (via JavaScript’s Garbage Collector mechanism).

Back to the analyser node: we will create an array of unsigned integers to store the analysis data. We will also make sure it is big enough to hold all the values that the analyser will return–frequencyBinCount is always half of fftSize, so 1024 values here:

analyser.fftSize = 2048;
analyserData = new Uint8Array(analyser.frequencyBinCount);

Each time we want to draw the wave, we’ll ask the analyser to have a look and return the results of its analysis into the analyserData array:

analyser.getByteTimeDomainData(analyserData);

These values are bytes–which means they go from 0 to 255. But, as we mentioned, our drawing function drawSample requires values from -1 to 1, so we just convert them and put them into a Float32 array we initialised earlier on:

for(var i = 0; i < analyserData.length; i++) {
        osciData[i] = analyserData[i] / 128 - 1;
}

And we’re finally ready to draw the wave:

drawSample(osciCanvas, osciData);

Just in case you wondered, we’re using requestAnimationFrame to drive the animation.
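Put together, the drawing loop looks roughly like this. This is a sketch: the analyser, the two arrays and the draw callback are the objects set up above, passed in as arguments so the loop itself stays self-contained:

```javascript
// Drive the oscilloscope with requestAnimationFrame: on every frame,
// pull fresh time-domain bytes from the analyser, convert them to the
// -1..1 range and hand them to the drawing function.
function startOscilloscope(analyser, analyserData, osciData, draw) {
  function animate() {
    requestAnimationFrame(animate);
    analyser.getByteTimeDomainData(analyserData);
    for (var i = 0; i < analyserData.length; i++) {
      osciData[i] = analyserData[i] / 128 - 1;
    }
    draw(osciData);
  }
  animate();
}

// In the example this would be kicked off with something like:
// startOscilloscope(analyser, analyserData, osciData, function (data) {
//   drawSample(osciCanvas, data);
// });
```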

So here’s a moderately complex example that does a bunch of things using different Web APIs… and it’s less than two hundred lines of code. This shows that the web platform is really powerful! Building the same thing using native code would involve a lot more code, plus longer development time and lots of debugging. Plus it would work on only one platform, whereas this little but powerful example works on all platforms, no porting required.

Can we go even wilder? Of course we can, let’s involve WebGL so we can smoothly draw lots of elements at the same time, and let’s also use live microphone input instead of a pre-recorded sample, and we’ll display a visualisation of what’s happening.

Realtime visualisations

realtime visualisations

We’ll use Three.js for dealing with the WebGL side of things–i.e. all the rendering with nice antialias, shadow, fog, etc.

We’ll use the getUserMedia part of WebRTC. This allows us to access both the webcam and microphone input, but for the purposes of this demo we just want to “hear” things, so we’ll request only audio.

We will be creating an analyser node again, but instead of connecting a bufferSource to it as we did in the previous example, we’ll connect the MediaStreamSource we just created using the stream we got from getUserMedia. MediaStreamSource nodes allow us to take a MediaStream (the type that getUserMedia returns) and send it to other nodes, so we can integrate external sources of sound into our web audio graph–even sound from another peer if we’re using WebRTC for a call!

navigator.getUserMedia(
        { audio: true },
        function yay(stream) {
                source = audioContext.createMediaStreamSource(stream);
                source.connect(analyser);
        },
        function nope(err) {
        console.error("oh noes", err);
        }
);

Once access to the microphone is granted, we’ll start getting interesting data out of the analyser, and the bars won’t be boringly static but will move in response to changes in the input levels. Try clapping!

So we have a really smooth example that draws a lot of detail on the screen in response to live microphone input, and not only is it multiplatform, it is, again, less than two hundred lines of code. Doing the same for native platforms would be really, really long and tedious to build.

Browser makers have already put in a lot of work to unify these kinds of multimedia interfaces (sample decoding, live input streams, accelerated graphics) so you can take advantage of them and build awesome stuff instead of fighting with compilers and platform-specific frameworks to access these capabilities.

The web platform is really incredibly powerful nowadays, but…

We shouldn’t stop here

There are still over two billion people who don’t have access to the Internet.

That’s right. 2,000,000,000+ people. That’s about two and a half Europes, or, in terms of Canada (where I gave this talk too), over 50 times its population.

At Mozilla we believe the Internet must be open and accessible, so we are working on fixing this too. We partnered with manufacturers to make a phone that would run Firefox OS and also be affordable: “the $25 phone”.

Tarako

This phone is in the same price range as feature phones, but it runs Firefox OS, which can be upgraded periodically and also has a much lower barrier to entry for businesses and creators than iOS or Android, since apps are written using JavaScript. Those apps can also run on other devices and operating systems–not only Firefox OS.

We’re also working on new hardware APIs for accessing all these new sensors and features using pure JavaScript. The work that goes into this benefits all platforms, as these APIs are standardised and more vendors implement them in their browsers, so we get closer and closer to the Write Once, Run Anywhere “dream”.

We have a lot of incredible powers on the web, and as Uncle Ben would say, with great power comes great responsibility. It’s great that we can do so much, but we also should be thinking about doing good things–it’s our responsibility!

So I sat down and tried to think of ways in which I could use my new powers for good. What about…

Simulating instruments

Suppose you’re a kid just like I once was and want to learn music, but have no instrument. But you have a phone.

There is this huge list of instruments on Wikipedia. What if we built some sort of simulation software that could recreate different instruments using just a bunch of parameters and no sample data–just as I did with my sound synthesis experiments? Once you got the application code, getting “new instruments” would just be a matter of downloading parameter data, which would require very little bandwidth. That kid with no instruments but a phone could now have a bunch of different virtual instruments!
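As a rough illustration of why parameters beat samples bandwidth-wise, here is a sketch with a made-up patch format–the field names are entirely hypothetical:

```javascript
// A hypothetical "instrument patch": just parameters, no sample data.
var fluteLikePatch = {
  name: 'flute-ish',
  wave: 'sine',
  attack: 0.1,
  decay: 0.3,
  vibratoFrequency: 5,
  vibratoDepth: 4
};

// The entire "instrument" serialises to well under a kilobyte...
var patchBytes = JSON.stringify(fluteLikePatch).length;

// ...while a single second of CD-quality sampled audio already takes
// 44100 samples x 2 channels x 2 bytes = 176400 bytes (~176 KB).
var oneSecondOfSampleBytes = 44100 * 2 * 2;
```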

Also, since this would be running on phones with lots of sensors, we could make the most out of them and, for example, use touch and pressure where available, so we could build an engaging interactive simulation.

What if, instead of keeping our sets of parameters to ourselves, we shared them by uploading them to a patch database where other people could download patches too? We would be building an amazing resource–especially if we enabled people to remix existing patches. And another great outcome would be that, by exposing your creation to people from a different background than yours, you’ll get unusual contributions, and that’s always great and enriching.

Unconventional composers

Once you’ve built an instrument simulator, what is stopping you from building some sort of composer so that you can write down your own songs? But we should be very careful and avoid building a conventional composer, in the manner of staff or drum machine based composers.

Why?

Because these composers are not suited to non-Western music. For example some music from Eastern Europe has lots of tempo and key changes, and all these things are lost when “translating” to a staff based music transcription.

Instead, I’d propose we start by recording everything that happens while playing a simulated instrument, and make the data available to “data scientists”–preferably local data scientists–so they can experiment with the recordings and devise some sort of visualisation/manipulation interface that works well for that kind of music. Maybe they will come up with local symbols that seem very strange to us, but that work really well in their settings.

And again, since we’re not storing the sample data but only the events, transmitting these songs would take way less bandwidth, time and money than sending an MP3 file with a 2.5G connection.
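To get a feel for the difference, here is some back-of-the-envelope arithmetic. All the concrete figures (events per second, bytes per event, MP3 bitrate) are my own illustrative assumptions, not measurements:

```javascript
// A 3-minute song stored as note events vs. the same song as audio.
var songSeconds = 3 * 60;

// Assume ~8 note events per second, ~8 bytes per event
// (timestamp, note number, instrument, a couple of parameters).
var eventBytes = songSeconds * 8 * 8;

// The same three minutes as a 128 kbps MP3.
var mp3Bytes = songSeconds * 128000 / 8;

console.log(eventBytes); // 11520 bytes, ~11 KB
console.log(mp3Bytes);   // 2880000 bytes, ~2.9 MB
```

Even with generous assumptions, the event representation is a few hundred times smaller, which matters a lot on a 2.5G connection.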

People might start composing their own songs using their own very local traditions and maybe share them afterwards, and what might happen is that we end up with a world wide library of local songs—a true treasure trove for mankind that anyone could access.

Too conventional

But even if they sound fun, these ideas are still quite conventional. I had to think of something that went further and was more original. What if I took the Web FM API (available in Firefox OS) and mixed it with the Web Audio API? What could happen?

Web FM API
+ Web Audio
---------------
  ???

I think we could maybe have “over the FM” data transmission. Granted, the bandwidth wouldn’t be especially amazing: only about 0.65 MB a day, but that is still the equivalent of some 4000 SMS messages. And because it is broadcast, it wouldn’t slow down if many users tried to get the data at the same time.
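A quick sanity check of those figures, taking the 0.65 MB/day estimate as given and counting one character per byte for a 160-character SMS:

```javascript
// 0.65 MB of broadcast data per day.
var dailyBytes = 0.65e6;

// Counted in 160-character SMS messages: ~4000 of them.
var smsEquivalent = dailyBytes / 160;

// Spread over 24 hours, that's a sustained rate of roughly 60 bit/s.
var bitsPerSecond = dailyBytes * 8 / 86400;
```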

There are some existing precedents, mostly focused on providing updates on things such as national news and weather or complementary information to the currently broadcasted programme, but what if communities used this to deliver local updates? For example, the status of water tanks, the area weather forecast—things that are really important to the people close to that FM station.

And although these ideas might sound clever and cool…

…these are just some examples that my ignorant Western mind came up with…

… but I’ve never set foot outside my bubble of privilege, and thus I can’t predict what is really required in a place where an EDGE connection is the best you can get (if you can get anything at all). And while I humbly recognise that I might be severely wrong about this, I also believe that 3D games or the common reasons why “Web Audio is awesome” in the Western world are not what solves problems for people in those places.

However, that doesn’t mean that we should just give up on Web Audio and feel sad and miserable that we can’t help people because we don’t even know about their issues. Remember that we have great powers… and a great responsibility—a responsibility to teach and make this all as accessible and easy to use as we can. And to keep experimenting, devising new ideas, and creating more code over which lots of Web Audio stuff can be built in the future.

Let’s build stuff and let’s share it. Let’s speak about this and make it all better, for everyone, so they can build their own solutions to their problems—which they understand better than we do!

And let’s do it together! :-)

mozfest

Frequently asked questions

Each time I’ve given this talk I’ve got lots of interesting questions, so I figured they should accompany this post too, because some of them are asked really often! Here we go:

Where do I start learning about Web Audio? Which library should I use?

You could start by having a look at the Web Audio API book by Boris Smus–maybe even buy it if you find it useful!

Once you’re done with the book, the Web Audio API specification is also quite understandable, and it’s hosted on GitHub, so if you find that something is not obvious you should file a new issue to get it clarified.

The Web Audio API is simple enough that you don’t need any library to get started. THANKFULLY.

How would you go about writing those instrument simulators?

There are many ways to simulate instruments. You can write new simulators in JavaScript, or we could try to compile the core of existing C/C++ emulation libraries into JavaScript (asm.js) using tools such as Emscripten–we don’t need to spend our time rewriting things that already work well.

Of course you also have to take into account what can actually run on a phone. It’s not a full-blown computer, so you have to be mindful of restrictions and adjust your code so it degrades nicely on less powerful platforms.

Have you written any sort of online composer?

Yes, but not the sort that I am advocating for. I built a drum machine demo that is included in Sorollet.js – online here. It has many issues, especially timing issues! But it was an early attempt, so I don’t go too heavy on the self-torturing department here. Still, it has nice things such as the ability to store the whole song in the URL so you can share it. Sadly the URL is a bit too long for some places, so you can’t actually share it, ha!

I started building something else later but it is not public (mostly because I broke something and it doesn’t work right now, but also because there’s nothing to see yet).

Can I actually connect Web FM with Web Audio today?

Turns out you can’t–so far the Web FM API speaks directly to the hardware and doesn’t go through JS, but there have been discussions hinting at being able to get a data URI for the media stream instead of just connecting to the speakers.

I’ve asked at the Web API list for clarifications. Let’s see what happens :-)

What about MIDI in the browser?

There is a Web MIDI API but it is not implemented in Firefox. I wrote about this a while ago, but in short, if you think you have what it takes, you’re more than welcome to help us implement it!

In the meantime you can “hack” temporary solutions such as running node in your computer to read the MIDI or OSC data and then forward it to the browser using something such as Socket.IO, which is what I did for my JSConf.EU 2013 project.

Can you do speech recognition with Firefox OS and Web Audio?

Not yet… but here’s a page detailing a roadmap of sorts, and here’s a tracking bug with all the pending subtasks.

When are Firefox OS phones going to be sold in (put your country here)?

I can’t answer that with certainty because it depends on the operators in each country (for operator-“branded” phones) and on selling restrictions (for developer phones sold directly–some stores won’t sell to some countries). Your best bet is either to search with your favourite search engine, or to contact your local Mozilla community/reps to see if they are more aware of the current status than I can possibly be.

Otherwise I will refer you to the Flame developer phone page.

Also, no, I don’t have phones to give you away, sorry.

What sort of features will those phones have? Conductive vs capacitive screens?

Again, this depends on which sort of markets the operators are targeting and which sort of phones they’re willing to work with. I’m sorry I can’t answer :-)