Haskell synthesizer: playing with functional sound generation

@ 2012-01-16 by João Paulo Pizani Flor

Some time ago, in a cloudy friday afternoon, I had nothing to do. Me and my friends started talking about the Unix philosophy, and in particular how everything in Unix is a file, even devices are represented as files. So I thought: the sound card is a file! What happens if I write to it? Then we started feeding lots of files (PDFs, /dev/urandom, some source code and some images) to the sound card and were amused :)

But then we decided to have more fun and feed some more “organized” bytes to the sound card - generating our audio samples programmatically. And what better tool to do some data generation than Haskel?! So I started writing my first audio synthesizer, and after some hours it was ready and working awesomely… I was really excited with the results, so I just NEEDED to share this nice piece of code with the world :)

My Haskell tone synthesizer takes a “score file” as input, describing the tune you want the synthesizer to generate. This input file has a syntax very similar to the old Nokia Ringtone Composer syntax… The program reads the score from the file and outputs a lot of bytes to standard output. If you redirect the standard output to your sound card, you will hear your favorite tunes! So let me guide you through a tour of this program’s source code:

This tour will be in top-down style: we will start by looking at the Main module and its major processing steps, and then dwell a bit deeper into some of the most important functions of each module. So, to start, here is the entire Main module:

As you can see in the second line of main, we need 2 command-line parameters: file and bpmStr (which will be read into an Int called bpm). file is the name of the file containing our score, and bpm_ is the tempo in which the song will be played (in beats per minute - BPM). We then proceed to parse the score from the file, using the parseFromFile function (from Parsec). This function takes a parser to apply and a filename from which to read an input stream. Our conveniently-named score parser is defined in a separate file (Melody.hs) and will not be further discussed… You can see it if you download the full source code at the end of the post.

The parseFromFile function has type Parser a → String → Either ParseError a, which means its result is Either our desired outcome ([Note]) or an error (ParseError). We then apply the either function over parsedScore to do case analysis: in case a perfect parse ocurred, the id function is applied and the result is kept as-is; in case a parse error has been found, we apply error ∘ show - which displays the error message on standard output and terminates the program. The next “big step” in the program is to turn the music into sound waves.

The parsed score is a sequence of musical notes with relative durations (1, 1/2, 1/4, etc.) and abstract pitch names (C, D, E, etc.), so the the task of concrete bpm is to take each abstract note and make both pitch and duration concrete. Its type is:

It takes a Tempo and a Note and produces a pair representing a concrete sound, where the first element is the frequency of this sound (in Hz) and the second element is its duration (in seconds).

After we have our list of sound pairs to play (concreteMusic), we reach the last and most important step, which is to actually generate the samples and output them, using concreteMusic to guide the generation process. This is done by produceStream. For each pair of frequency and duration (f,t), the following is done:

  1. Samples for an (infinite) square wave of frequency f are generated by a call to signal f.
  2. This signal is then “sliced” to obtain a piece of length t, by a call to slice t

Finally, this whole sequence of “signal slices” is concatenated together into a ByteString which goes to standard output… Now going deeper, we will dive into file Signal.hs and look at the definitions of signal and slice, along with some helper functions:

Let’s start explaining this module from the ground up: our signal is an infinite sound wave. The most characteristic feature of a wave is that it’s periodic, which means it’s just an infinite repetition of a pattern. Therefore we have the period function here, which, given the number of samples to generate and the desired waveform, generates a list of samples that follows this waveform. So that’s how signal uses period in order to make a wave: it takes the requested frequency and then calculates (according to the sample rate) how many samples are there in one period. It uses the period function to generate a singlePeriod, which it then cycles and packs into an infinite and repetitive ByteString. The silence function is only there to produce a dummy period when we need a pause. It doesn’t need to have more than one sample, however, because it’s going to be transformed into an infinite ByteString of 0’s anyway…

The cycleAndPack function takes one period ([Int]) and transforms it into a (lazy) infinite ByteString. Its type should be very self-evident, after all. The (map fromIntegral) part transforms our [Int] into [Word8], which then serves as input for B.pack, which actually creates the ByteString, then made infinite (repeated indefinitely) by B.cycle.

Now to the slicing! :) The task of slice is to transform an infinite wave (infinite ByteString) of a certain frequency into a finite wave (finite ByteString), with the requested duration in seconds. The type of slice is:

Which makes evident its role as a “ByteString-transforming” function. The definition of slice - in point-free style - also helps explaining what it does: to slice an infinite ByteString is just to take n of its first elements, where n is the product of the desired time in seconds by the sampling rate in Hz.

Before continuing to show you the code, allow me to take a slight detour and say how Haskell’s purely-functional features help us express this program in such an elegant (and still efficient) way:

  1. Lazyness: Due to the possibility of using lazy evaluation, we were able to represent sound waves as infinite streams of samples, separating stream generation from stream usage.

  2. Referential transparency: Haskell’s purity means that every function is transparent. Thus, the value of any expression needs only to be computed once, and can be substituted for all occurrences of that expression. This is important because - in a single song - several notes will have the same frequency (and therefore will result in identical signals). Thanks to referential transparency, it is guaranteed that we only need to compute signal f once for each f, and thus we will only have one stream for each frequency in our song.

Well, having understood how signal and slice work, the puzzle is now almost entirely solved. The only remaining mystery should be how the actual input file (in Nokia-Composer-style) is parsed, but - as I said before - the parser is mostly straightforward and uses just basic Parsec. The input parser is the largest single piece of code in the whole synthesizer (only relatively big, but still small), so I’ll only show you a summarized version here and won’t even explain it. You can grab the full code at the end of the post and take a more careful look… So here it goes, the core of our input parser:

And now, a bit of fun: to finish up the post nicely, I have included some sample song files in the synthesizer’s tarball; then I ran the synthesizer over them and encoded the output as MP3. Now you can listen to two catchy tunes generated by Haskell!

Finally, as always, there comes what REALLY matters: The Source Code. The code of this synthesizer, along with several of my other “toy programs”, lives in the “Katas” repository on GitHub. You can check it out here: https://github.com/joaopizani/katas .

Download the code as a zip file or even clone the repository and then play freely!… I strongly recommend that you use cabal-dev to build all your Haskell projects, so in case you want to follow my recommendation, the steps to build and run (in Unix) the synthesizer are something like this:

cd ToneSynthesizer
cabal-dev install
./cabal-dev/bin/tonesynthesizer songs/<chosen-song>  | aplay -t raw -f U8 -r 16000

I wish you all a whole lot of excitement and fun while coding in Haskell, and would certainly appreciate suggestions and critique regarding the code I just shared with you. That’s all, folks! :)