Sound Experiments

LODRONE - Graphics to Sound


A while back I found a great freeware tool, JPG2WAV*, by DavidAC on the internet. It is a small, command-line driven piece of software that can create sound from an image.


The tool by DavidAC is documented as to how to use it, but not as to how the conversion actually works. I think what DavidAC did was to map a range of frequencies to the vertical position in the image, link amplitude (sound intensity) to the intensity of the image elements (pixels), and work horizontally through the image to get development in time. Two images can also be supplied to act as left and right channels for a stereo sound.
The range of frequencies to use can be specified, as well as the total duration of the music piece, which is limited to a maximum of 60 seconds. The width of the image is mapped onto the specified length in time.
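To make that guess concrete, here is a minimal Python sketch of such a mapping. It is not DavidAC's code, and everything in it is an assumption for illustration: the function name image_to_samples, the logarithmic spacing of frequencies from the bottom row (lowest) to the top row (highest), and the 44100 Hz sample rate. The image is given as a plain grid of brightness values between 0.0 and 1.0.

```python
import math

def image_to_samples(image, duration, f_low=1000.0, f_high=20000.0,
                     sample_rate=44100):
    """Turn a grid of pixel intensities (0.0-1.0) into audio samples.

    image[row][col]: row 0 is the top of the picture (highest pitch),
    columns run left to right and are spread over `duration` seconds.
    """
    rows, cols = len(image), len(image[0])
    # Assumption: frequencies are spaced logarithmically from bottom to top.
    freqs = [
        f_low * (f_high / f_low) ** ((rows - 1 - r) / max(rows - 1, 1))
        for r in range(rows)
    ]
    n_samples = int(duration * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        col = min(int(n * cols / n_samples), cols - 1)  # time -> image column
        # Sum one sine wave per row, weighted by that pixel's brightness.
        value = sum(
            image[r][col] * math.sin(2 * math.pi * freqs[r] * t)
            for r in range(rows)
        )
        samples.append(value / rows)  # keep the sum within -1.0 .. 1.0
    return samples
```

For a stereo result, the same function could be run once per image, giving one list of samples for each channel.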

It's a bit like the book for a barrel organ or the music roll of a pianola.

Now this method of sound synthesis is not new. It is very similar to the workings of a Russian machine called the ANS. This 1930s invention by Soviet Army colonel Evgeny Murzin was realized in the 1950s. Murzin named his invention in honour of his favourite composer, Alexander Nikolayevich Scriabin. The ANS had a row of 800 lights shining through a glass plate with a receptor on the other end. The glass plate was covered in a black coating in which some line art could be carved, thus exposing the clear glass. The glass plate was moved by a lever system, back and forth, by hand, to have the circuitry generate sounds in the range from 15Hz to 20kHz depending on which receptors were actuated. The machine is very difficult to get operational, with its elaborate electronics and mechanics. Only a few sessions were ever performed.
The last one was by the band Coil from the UK, who did a beautiful project on the ANS.

What I tried:

If both ideas are combined, one can actually make an ANS-type composition by creating an image to feed through the JPG2WAV utility.
Here I'd like to present the result of such an attempt.

Left channel

Right channel

Here you can see the two channels for left and right as well as a combined graph with colors representing the placement. White is center, green is left and red is right. The source graphics for the individual channels are rendered in monochrome as the program JPG2WAV just uses the intensity of the pixels.
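One way such a combined picture could be built from the two monochrome sources is sketched below in Python. This is only an assumed recipe (the helper name combine_pixel is made up): the right channel goes to red, the left channel to green, and blue is filled with whatever both channels share, so that equal values come out white or grey.

```python
def combine_pixel(left, right):
    """Merge left/right intensities (0-255) into one (R, G, B) pixel."""
    shared = min(left, right)      # the part present in both channels
    return (right, left, shared)   # red = right, green = left, blue = shared

print(combine_pixel(255, 255))  # (255, 255, 255) white: center
print(combine_pixel(255, 0))    # (0, 255, 0) green: left only
print(combine_pixel(0, 255))    # (255, 0, 0) red: right only
```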

The graphics files are made with Adobe Photoshop 7. Photoshop has a nice feature where a picture can be made up in several layers.

I first drew the general, striped drones in the low register. Some larger circles in the 'high register' form a sort of crescendo. From the tip of the circle the number of frequencies grows and fades again. There are also four single 'spots' just after the middle of the song. These 'spots' are short, single tones, lasting about two seconds in the final result.
All these elements sound as loud in the left as in the right channel, which places them in the center for the listener.

Then I made two separate layers, one for left, one for right. These are superimposed one by one on the base layer, and each gets elements added that are meant for either left or right. Here I just added some more 'spots', but more elaborate structures using left and right could easily be made.

The first graph shows a combination of the two; it is nice to follow this graph while listening to the result, as both left and right are represented there.

Rendering a one minute soundfile

One major drawback of JPG2WAV is the maximum timespan of 60 seconds. For this example I rendered a one-minute soundfile, using the default audio frequency spectrum of 1000 to 20000 Hz, and stretched the one-minute result to eight minutes. By doing this the pitch is effectively lowered by three octaves (three successive halvings of the frequency). The range of the final result is 125Hz to 2500Hz.
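The arithmetic behind that can be checked with a few lines of Python. This assumes the stretch is a plain slow-down, where pitch drops in proportion to the speed; stretch_range is just an illustrative helper name.

```python
import math

def stretch_range(f_low, f_high, original_seconds, stretched_seconds):
    """Frequency range after a plain slow-down (pitch drops with speed)."""
    factor = stretched_seconds / original_seconds
    octaves_down = math.log2(factor)  # each octave is one halving
    return f_low / factor, f_high / factor, octaves_down

low, high, octaves = stretch_range(1000, 20000, 60, 8 * 60)
print(low, high, octaves)  # 125.0 2500.0 3.0
```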

Time stretching performed in Sound Forge

And here is what it sounds like!


If you follow the dots in the image above you can track where you are. A small jittery noise can be heard in the background. I think that's a little phase noise caused by the method of sound generation. This could have been avoided by doing some oversampling, I guess, but it's just a minor glitch and oversampling would add a lot to the generation time.

I am really impressed with the result. It's the atonality and subtle harmonic interferences that make this so lively and interesting! Maybe it would be nice to follow this line and develop a full composition tool that enables longer works.

For further study:


I've also discovered there is a great graphical program, Sound-Hole, for controlling the JPG2WAV program.

Where do I get JPG2WAV?

If you'd like to try JPG2WAV, it seems DavidAC has now donated it to the Sound-Hole distribution, so you'd best get that download. It contains the GUI, JPG2WAV and a utility for rendering MIDI files. I have not tried that, but it should be brilliant!

Just as a reference, here are the JPG2WAV command-line parameters:

Usage: jpg2wav
-i1:<jpeg filename> -o:<output wav> -d:<duration>
 - OR -
-i1:<jpeg filename 1> -i2:<jpeg filename 2> -o:<output wav> -d:<duration>

Additional optional parameters are:
[ -s:<steps per octave> ] [ -f:<low-high> ]

<input jpeg>: JPEG file to read. Gives mono WAV output.
specify -i1 alone for mono output; -i1 and -i2 for stereo output.

<output wav>: WAV file to be created from image(s).

<duration>: duration of output file in seconds. Floating point value.
 Valid: 0.001 to 60.

<low-high>: range of frequencies (integers) to output.
 Valid: 1 to 100000. Default: 1000 to 20000.

<steps per octave>: frequencies used per octave.
 Valid: 1 to 10000. Default: 24.

To see a usage message at any time, type the program name with no parameters.
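As an illustration, the small Python helper below assembles such a command line from the options listed above. The helper name jpg2wav_command and the file names are made up, and writing the frequency range as -f:1000-20000 is an assumed reading of the <low-high> placeholder rather than something confirmed. It only builds the argument list; it does not run the program.

```python
def jpg2wav_command(output, duration, left, right=None,
                    freq_range=None, steps_per_octave=None):
    """Build a jpg2wav argument list from the options in the usage text."""
    if not 0.001 <= duration <= 60:
        raise ValueError("duration must be between 0.001 and 60 seconds")
    args = ["jpg2wav", f"-i1:{left}"]
    if right is not None:
        args.append(f"-i2:{right}")  # a second image gives stereo output
    args += [f"-o:{output}", f"-d:{duration}"]
    if steps_per_octave is not None:
        args.append(f"-s:{steps_per_octave}")
    if freq_range is not None:
        low, high = freq_range
        args.append(f"-f:{low}-{high}")  # assumed spelling of <low-high>
    return args

# Hypothetical file names, with the documented default range and steps.
print(" ".join(jpg2wav_command("lodrone.wav", 60, "left.jpg", "right.jpg",
                               freq_range=(1000, 20000), steps_per_octave=24)))
# jpg2wav -i1:left.jpg -i2:right.jpg -o:lodrone.wav -d:60 -s:24 -f:1000-20000
```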

File formats.

On the file formats used:
JPG (or JPEG) is short for 'Joint Photographic Experts Group', the committee that created the standard. It is a format for storing and exchanging compressed digital images and is by now one of the most widely used image formats. WAV is a digital audio format. It normally stores uncompressed audio, unlike the well-known MP3 format (which comes from MPEG, the Moving Picture Experts Group, not from JPEG), but it is very flexible and widely used.


poesboes 19-01-2007