Fastest way to generate WAV files in Python, using the wave module

Thu Oct 29 2009 13:25:19 GMT+0000 (Greenwich Mean Time)

I was investigating the possibilities of generating WAV files with Python, when I came across the wave module. While very basic --it doesn't support compressed files, for example-- it's more than enough for what I have in mind.

So I managed to generate a simple WAV file and noticed that it was rather slow. I wondered whether the problem was Python accessing the hard disk in a sub-optimal way, but that didn't look like a good explanation. To find out, I decided to try a few different methods and measure how long each one takes to run.
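The timing itself can be done with a small helper along these lines. This is only a sketch: `timed` is a name made up for this example, it assumes Python 3 (for `time.perf_counter`), and the figures quoted in this post were not necessarily measured this way.

```python
import time


def timed(label, func, *args, **kwargs):
    """Run func once, print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print('%s: %.3f s' % (label, elapsed))
    return result, elapsed


# Example with a small stand-in workload instead of a real WAV generator.
total, seconds = timed('sum of 100000 ints', sum, range(100000))
```

Each of the methods below could be wrapped in a function and passed to `timed` in the same way.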

The first method consists of opening a wave file for output and then writing two values (one for the left channel and one for the right) in each iteration. Like this:

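import random
import struct
import wave

SAMPLE_LEN = 13230000  # 300 seconds at 44100 frames per second

noise_output = wave.open('noise.wav', 'w')
# 2 channels, 2 bytes per sample, 44100 Hz, frame count left at 0 for now
noise_output.setparams((2, 2, 44100, 0, 'NONE', 'not compressed'))

for i in range(0, SAMPLE_LEN):
    value = random.randint(-32767, 32767)
    packed_value = struct.pack('h', value)
    # The same value goes to the left and to the right channel
    noise_output.writeframes(packed_value)
    noise_output.writeframes(packed_value)

noise_output.close()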

With SAMPLE_LEN = 13230000 (i.e. 13230000 / 44100 = 300 seconds, or 5 minutes, of noise), this method needs 10 minutes and 24 seconds to complete.

The second method is the "raw approach": just open a file and write two values to it, SAMPLE_LEN times:

# Binary mode, so the packed bytes are written untouched on every platform
noise_file = open('noise.raw', 'wb')

for i in range(0, SAMPLE_LEN):
    value = random.randint(-32767, 32767)
    packed_value = struct.pack('h', value)
    noise_file.write(packed_value)
    noise_file.write(packed_value)

noise_file.close()

This one is decidedly faster! It takes only 51 seconds to generate 300 seconds of audio. There's a minor problem: the generated file is missing the WAVE header. But that is relatively easy to fix by reading some docs on the file format. Still, I didn't want to throw in the towel yet regarding the wave module.
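For reference, the missing header is small. Here's a minimal sketch of how it could be prepended to the raw data, assuming the canonical 44-byte layout for uncompressed PCM and assuming the raw samples are already little-endian (which is what struct.pack('h', ...) gives on x86 machines). The names `wav_header` and `raw_to_wav` are invented for this example, and it is written for Python 3.

```python
import os
import shutil
import struct


def wav_header(data_size, channels=2, sample_width=2, sample_rate=44100):
    """Build the 44-byte header of a canonical uncompressed PCM WAV file."""
    block_align = channels * sample_width      # bytes per frame
    byte_rate = sample_rate * block_align      # bytes per second
    return struct.pack(
        '<4sI4s4sIHHIIHH4sI',
        b'RIFF', 36 + data_size, b'WAVE',
        b'fmt ', 16, 1,                        # 16-byte fmt chunk, format 1 = PCM
        channels, sample_rate, byte_rate, block_align, sample_width * 8,
        b'data', data_size)


def raw_to_wav(raw_path, wav_path):
    """Copy headerless sample data into a new file, preceded by a WAV header."""
    data_size = os.path.getsize(raw_path)
    with open(raw_path, 'rb') as raw_file, open(wav_path, 'wb') as wav_file:
        wav_file.write(wav_header(data_size))
        shutil.copyfileobj(raw_file, wav_file)
```

Something like raw_to_wav('noise.raw', 'noise_from_raw.wav') would then turn the output of the second method into a playable file.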

I went to the source code of the module to see what was inside the writeframes method. It turns out that the module is written in Python (I thought it was written in C), which is better for legibility anyway :P

The writeframes method actually calls the writeframesraw method:

def writeframes(self, data):
    self.writeframesraw(data)
    if self._datalength != self._datawritten:
        self._patchheader()

Consequently, if there was a speed issue, it was probably due to writeframesraw. I looked at its code:

def writeframesraw(self, data):
    self._ensure_header_written(len(data))
    nframes = len(data) // (self._sampwidth * self._nchannels)
    if self._convert:
        data = self._convert(data)
    if self._sampwidth > 1 and big_endian:
        import array
        data = array.array(_array_fmts[self._sampwidth], data)
        data.byteswap()
        data.tofile(self._file)
        self._datawritten = self._datawritten + len(data) * self._sampwidth
    else:
        self._file.write(data)
        self._datawritten = self._datawritten + len(data)
    self._nframeswritten = self._nframeswritten + nframes

... and I realised that it wasn't a good idea to call writeframes so often, if it is doing all those checks and conversions for every single value I send it.
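The per-call cost can be seen in isolation with a small experiment like the following. It's a sketch written for Python 3, not the test script from this post: it writes to an in-memory buffer so the disk is out of the picture, uses far fewer frames, and the helper names `make_frames` and `write_wav` are made up. Besides the checks in writeframesraw, note that writeframes itself compares the lengths after every call and may patch the header.

```python
import io
import random
import struct
import time
import wave


def make_frames(n_frames, seed=0):
    """Return a list of 4-byte stereo frames (same value left and right)."""
    rng = random.Random(seed)
    frames = []
    for _ in range(n_frames):
        sample = struct.pack('h', rng.randint(-32767, 32767))
        frames.append(sample + sample)
    return frames


def write_wav(frames, one_call):
    """Write frames to an in-memory WAV; return (file bytes, seconds taken)."""
    buffer = io.BytesIO()
    out = wave.open(buffer, 'wb')
    out.setparams((2, 2, 44100, 0, 'NONE', 'not compressed'))
    start = time.perf_counter()
    if one_call:
        out.writeframes(b''.join(frames))
    else:
        for frame in frames:
            out.writeframes(frame)
    out.close()
    return buffer.getvalue(), time.perf_counter() - start


frames = make_frames(20000)
per_frame_bytes, per_frame_time = write_wav(frames, one_call=False)
single_bytes, single_time = write_wav(frames, one_call=True)
print('one call per frame: %.4f s' % per_frame_time)
print('a single call:      %.4f s' % single_time)
```

Both variants should produce the same file; only the number of writeframes calls differs, so any gap between the two timings comes from the per-call work.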

So here's the third and (surprisingly) fastest method:

noise_output = wave.open('noise2.wav', 'w')
noise_output.setparams((2, 2, 44100, 0, 'NONE', 'not compressed'))

values = []

for i in range(0, SAMPLE_LEN):
    value = random.randint(-32767, 32767)
    packed_value = struct.pack('h', value)
    values.append(packed_value)
    values.append(packed_value)

# Join all the packed values into one byte string (b'' works in Python 2.6+ and 3)
value_str = b''.join(values)
noise_output.writeframes(value_str)

noise_output.close()

The values are stored in a temporary list and, when the loop is finished, they are joined together into a single string which is then sent to the output with the usual writeframes method. Only this time it takes just 45 seconds! It is quite surprising that this method is faster than the raw one, but maybe that's due to the overhead the second method incurs by making so many small writes.

Although the third method is the most efficient in terms of time, it is not the most efficient in terms of memory consumption, since every value is kept in memory until the very end. That might be acceptable for relatively short audio files, but if I were to generate longer files I would use a modified version of the third method: buffer the values until a certain number of them has been gathered, call writeframes, and repeat until everything has been generated and written.
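A buffered variant could look something like this. It's a sketch of the idea rather than a measured result: the function name `write_noise` and the default chunk size of 65536 frames are arbitrary choices made for this example, and it is written for Python 3.

```python
import random
import struct
import wave


def write_noise(path, total_frames, chunk_frames=65536):
    """Write stereo 16-bit noise, calling writeframes once per chunk."""
    out = wave.open(path, 'wb')
    out.setparams((2, 2, 44100, 0, 'NONE', 'not compressed'))
    try:
        remaining = total_frames
        while remaining > 0:
            count = min(chunk_frames, remaining)
            chunk = []
            for _ in range(count):
                packed = struct.pack('h', random.randint(-32767, 32767))
                # Same value for the left and the right channel
                chunk.append(packed)
                chunk.append(packed)
            out.writeframes(b''.join(chunk))
            remaining -= count
    finally:
        out.close()
```

With this, write_noise('noise3.wav', 13230000) would hold at most one chunk of packed values in memory at a time, while still calling writeframes only a couple of hundred times instead of millions.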

Here's the source code for my test script. Improvements and corrections are welcome :)