Android's Text To Speech

Thu Aug 11 2011 21:23:31 GMT+0100 (British Summer Time)

After finding that nasty Xperia PLAY bug, I kept experimenting a bit more today. This time it was the turn of Android's TTS (Text To Speech) capabilities, which let you pass in a string and have it read aloud through the speakers.

What I found was that its support is quite varied. Although it has been supported since Android 1.6 (API level 4), which makes it about two years old, my first test on the Xperia PLAY failed silently (yes, again), without errors or anything.

I was just implementing a simple test, using this as a base. It didn't seem to me like I had messed anything up; there were so few lines that it was hard to make a mistake. I tried the same code on my Nexus One and, of course, it worked.

I installed a simple TTS test app from the Market on the Xperia, just to see how it behaved. This time, instead of silently failing, it took me to a page in the Market that allowed/invited me to download TextToSpeech data!

It turns out that some devices don't ship with the voice synthesis data, and you have to install it yourself if you want or need TTS, "because of limited space on the phone". So before attempting to use TTS, this program was launching an intent to check whether the voice data was installed on the phone and, since it wasn't, launching another intent to take me to a page where I could download it.
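
The check-then-install flow described above can be sketched roughly as follows. This is only a plain-Java outline of the decision, not the test app's actual code. On a device, the result code would arrive in `onActivityResult` after starting `TextToSpeech.Engine.ACTION_CHECK_TTS_DATA` with `startActivityForResult`, and the install step would start `TextToSpeech.Engine.ACTION_INSTALL_TTS_DATA`. The `TtsDataCheck` class, the `NextStep` enum and the message text are made up for illustration, and the two constants are local copies of the SDK values so that the snippet compiles without Android.

```java
// Plain-Java sketch of what to do with the result of the TTS data check.
// Nothing here touches Android; the real calls are only named in comments.
final class TtsDataCheck {

    // Local copies of TextToSpeech.Engine.CHECK_VOICE_DATA_PASS / _FAIL.
    // Assumption: in a real app, use the SDK constants instead of these.
    static final int CHECK_VOICE_DATA_PASS = 1;
    static final int CHECK_VOICE_DATA_FAIL = 0;

    // Hypothetical names for the two possible outcomes.
    enum NextStep {
        CREATE_ENGINE,           // voice data is there: create the TextToSpeech object
        ASK_USER_BEFORE_INSTALL  // data missing: explain, then start ACTION_INSTALL_TTS_DATA
    }

    // Anything other than PASS is treated as "voice data not usable".
    static NextStep decide(int resultCode) {
        return resultCode == CHECK_VOICE_DATA_PASS
                ? NextStep.CREATE_ENGINE
                : NextStep.ASK_USER_BEFORE_INSTALL;
    }

    // Text for a dialog shown before sending the user to the Market.
    // Returns an empty string when no explanation is needed.
    static String explain(NextStep step, String appName) {
        if (step == NextStep.CREATE_ENGINE) {
            return "";
        }
        return appName + " needs speech data that is not installed on this phone. "
                + "Open the Market to download it?";
    }
}
```

The only design choice in this sketch is the extra `explain` step: the app says why it is about to open the Market instead of jumping there straight away.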

Of course, all this is very well explained and documented in this reference guide, which I only found after finding the sample code. But I don't think a normal end user will understand why he's instantly transported to the Market and presented with a list of language packs when he thought he was launching a certain application. I'm pretty sure he'll end up quite confused.

As to why the Xperia didn't ship with TTS data: great question, especially considering it comes with an incredible 8 Gigabyte microSD card. The available internal storage is reported as 286 Megabytes. Funny number :-P

Back to TTS, something else that felt a bit limited was that you can only alter the pitch and the rate of the words being read (or "spoken"). You can't change the voice (from male to female, or from younger to frail, trembling and old, like some voices in the Mac OS TTS engine). I don't know whether that's just a matter of swapping the default TTS engine (the open source Pico) for another one with more capabilities. Maybe that's possible, but I haven't investigated further.
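
As a rough illustration of how little there is to tune, the two knobs can be modelled as plain multipliers. In the Android SDK they correspond to `TextToSpeech.setPitch(float)` and `TextToSpeech.setSpeechRate(float)`, where 1.0 is the normal value, and for the rate 0.5 means half speed and 2.0 double speed. The `SpeechSettings` class below is hypothetical, just a small holder that compiles without Android.

```java
// Hypothetical holder for the only two values the TTS API lets you change.
// On a device they would be passed to tts.setPitch(pitch) and
// tts.setSpeechRate(rate) before calling speak().
final class SpeechSettings {

    static final float NORMAL = 1.0f; // 1.0 means "unchanged" for both values

    final float pitch; // below 1.0 lowers the tone, above 1.0 raises it
    final float rate;  // 0.5 is half speed, 2.0 is double speed

    SpeechSettings(float pitch, float rate) {
        this.pitch = pitch;
        this.rate = rate;
    }

    static SpeechSettings normal() {
        return new SpeechSettings(NORMAL, NORMAL);
    }

    SpeechSettings withPitch(float newPitch) {
        return new SpeechSettings(newPitch, rate);
    }

    SpeechSettings withRate(float newRate) {
        return new SpeechSettings(pitch, newRate);
    }

    // Convenience for showing the values in a settings screen.
    int pitchPercent() {
        return Math.round(pitch * 100f);
    }

    int ratePercent() {
        return Math.round(rate * 100f);
    }
}
```

For example, `SpeechSettings.normal().withRate(0.5f)` describes the default voice at half speed; there is no equivalent setting for choosing a different voice.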

After seeing all this, I think I'm going to leave TTS aside for the time being, or at least until it's a more "controllable" scenario. In any case, adding TTS to my app was more like an unnecessary extra than anything else, so it's OK :-)