Web Speech to Web Audio?

Fri Sep 01 2023 11:30:00 GMT+0100 (British Summer Time)

I was writing some code that uses Web Speech to convert text into audio spoken by your device, when inevitably the question came to me: can I connect the output of this to a Web Audio context?

For that I would need some way to obtain a node that can be plugged into an audio context graph, i.e. I'd need a MediaStreamAudioSourceNode! Then it can be connected to any other node in the graph, like this:

// Wrap the stream in a source node, then connect it to the audio context's destination
const source = ac.createMediaStreamSource(stream);
source.connect(ac.destination);

Sadly, looking at the Speech Synthesis API docs, I cannot see any indication that creating streams is feasible.

The API works by creating a SpeechSynthesisUtterance instance for each piece of text you want the device to speak, and then passing these fragments of speech to the main SpeechSynthesis controller, which lives on the global window object.

let fragment = new SpeechSynthesisUtterance('hello');
window.speechSynthesis.speak(fragment);
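
To make the shape of the API a bit more concrete, here is a minimal sketch that wraps a single utterance in a Promise. `speakText` is a hypothetical helper name of mine, and the `synth` and `Utterance` parameters are also my own addition so the function can be tried with stand-ins outside a browser; in a page they default to the real globals. Notice that all the utterance gives you back is events, not audio data.

```javascript
// Hypothetical helper: speak `text` and resolve when the utterance ends.
// `synth` and `Utterance` default to the browser globals; they are only
// parameters so that the function can be exercised with stand-ins elsewhere.
function speakText(
  text,
  synth = globalThis.speechSynthesis,
  Utterance = globalThis.SpeechSynthesisUtterance
) {
  return new Promise((resolve, reject) => {
    const fragment = new Utterance(text);
    // The utterance only reports events (start, end, error, boundary...),
    // never the audio samples themselves.
    fragment.onend = () => resolve(fragment);
    fragment.onerror = (event) => reject(event);
    synth.speak(fragment);
  });
}
```

In a page, calling `speakText('hello')` should give you a promise that settles once the device has finished (or failed) speaking, which is handy for sequencing, but it still offers no handle on the sound itself.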

If there were a place to get an audio stream from, it would be the speechSynthesis controller. But I see nothing of the sort in its docs.
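
A quick way to double-check this from the console is to list everything the controller exposes. The sketch below uses a hypothetical `listMembers` helper (my own name) that walks an object's prototype chain; in a browser you would pass it `window.speechSynthesis`. Going by the docs, I'd expect to see members such as `speak`, `cancel`, `pause`, `resume`, `getVoices`, `pending`, `speaking` and `paused`, and nothing that hands you a stream or a node.

```javascript
// Hypothetical helper: collect the property names an object exposes,
// including those inherited from its prototypes (stopping before Object.prototype).
function listMembers(obj) {
  const names = new Set();
  for (let o = obj; o && o !== Object.prototype; o = Object.getPrototypeOf(o)) {
    for (const name of Object.getOwnPropertyNames(o)) {
      names.add(name);
    }
  }
  return [...names].sort();
}

// In a browser console:
// console.log(listMembers(window.speechSynthesis));
```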

Without looking at the browser code, I strongly suspect that it works by calling whichever Text To Speech service the host system has, and so, since the audio does not "originate" from within the browser, the browser has no way of obtaining a stream.

I was hoping I could add synthetically generated voices to the arsenal of things you can already do in the browser and then package into video or audio, as I described in my JSConf.AU 2016 talk:

https://www.youtube.com/watch?v=wnDHKgNm4_E

Oh well!