Image Based Spectrographic Processing

Noah Benjamin Maze

Electrical Engineering, University of North Texas

3940 N. Elm Street

Denton, TX 76207-7102

[email protected]

Abstract— This paper presents a novel way to graphically filter a

sound’s frequency spectrum with an image file. The filtering is

accomplished by creating a series of multiple-passband filters

based on each column of the image, and applying them

sequentially to the original audio file.

I. INTRODUCTION

Historically, embedding an image into the voiceprint of a

sound has only been attempted by a handful of avant-garde

musicians [1]. These attempts have produced noisy results

that are generally void of any appreciable audio content.

This is a byproduct of their production. Before today,

generation of spectral images was done by reversing the

process of generating a spectrogram. The sole input to this

process was the target image, and the resulting sound has

been described as “discordant, metallic scratching” by Wired

magazine [2].

By using abstract, minimalistic images, artists have been

able to coax more interesting sounds from this algorithm, but

this sacrifices the overall effect of the image.

Image Based Spectrographic Processing (IBSP) allows any audio

file to be manipulated so that its spectrogram shows an image. The

resulting sound has a frequency spectrum that is characterized

by ever-changing passbands, but its original content is

otherwise preserved.

This functionality is accomplished by way of a series of

FIR filters working to create a frequency response that

resembles an image supplied by the user. This supplied

image is broken into columns that correspond to the

amplitude response of the normalized frequencies of the

sound file.

The mathematics behind this system are simple enough to

be implemented in many different environments, but this proof of

concept was assembled, tested and demonstrated in

MATLAB.

II. THE FILTERING PROCESS

Each pixel of the original image corresponds to a

momentary bandpass filter. The passband of the filter is

centered at a frequency corresponding to the pixel’s vertical

position in the image, and the filter is applied to a window of

time corresponding to the horizontal position of the pixel.
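The mapping from a pixel to a frequency band and a time window can be sketched in Python as follows. This is only an illustration, not the MATLAB implementation described in this paper: the function name, the equal-width bands, and the choice to place row 0 (the top of the image) at the highest frequency are assumptions. Frequencies are normalized so that 1.0 is the Nyquist frequency.

```python
def pixel_to_band_and_window(row, col, height, width, n_samples):
    """Map one pixel to a normalized frequency band and a sample window.

    Assumptions (not taken from the paper): row 0 is the top of the image
    and therefore the highest band, every row gets an equally wide band,
    and every column gets an equally long frame of audio samples.
    """
    frame_len = n_samples // width           # samples covered by one column
    low = (height - 1 - row) / height        # lower band edge, 0.0 = DC
    high = (height - row) / height           # upper band edge, 1.0 = Nyquist
    start = col * frame_len                  # first sample of the window
    stop = start + frame_len                 # one past the last sample
    return (low, high), (start, stop)


# Hypothetical 4 x 10 pixel image applied to 1000 audio samples.
band, window = pixel_to_band_and_window(row=0, col=0, height=4, width=10,
                                        n_samples=1000)
print(band, window)
```

The top-left pixel therefore controls the highest quarter of the spectrum during the first tenth of the sound.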

A. Input Acquisition

The arguments to the IBSP function specify the location of

an image file and the location of the sound file to be filtered

with that image.

1) Image Input and Preprocessing

Image Based Spectrographic Processing can work with any raster

image. Both indexed and RGB images are acceptable, but

indexed images are converted to RGB before processing can

take place [3].

Spectrograms treat intensity as a 1-dimensional value, so

the RGB color values for each pixel are averaged together to

produce a 1-dimensional (grayscale) intensity map. The

resulting matrix contains one value corresponding to each

pixel of the original image.
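The averaging step can be sketched in Python as follows. The nested-list image representation and the function name are assumptions made for illustration; the paper's implementation operates on MATLAB matrices.

```python
def rgb_to_intensity(image):
    """Average the R, G and B values of every pixel.

    `image` is a list of rows, and each row is a list of (r, g, b) tuples.
    The result has the same shape, with one intensity value per pixel.
    """
    return [[(r + g + b) / 3.0 for (r, g, b) in row] for row in image]


# A hypothetical 2 x 2 RGB image with 8-bit channels.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (30, 60, 90)],
]
intensity = rgb_to_intensity(image)
print(intensity)
```

Note that a plain average weights the three channels equally, so a saturated red pixel and a mid-gray pixel of the same average produce the same filter gain.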

2) Sound Input

Sound files can be of any sample rate. When an audio file

is loaded, its sampling frequency is recorded and reused when

the output file is written, but it is not used for any portion of the

processing. Filtering is done based on the normalized

frequency of the audio data [4].

The spectral density of the output file resembles a masked

version of the original audio spectrum, so the frequency

spectrum of the input file serves as a canvas for the image. A

noisy sound with a broad range of frequencies (such as

Gaussian white noise, or pink noise) will provide the most

uniform frequency spectrum.

Music files usually exhibit a periodic burst of frequencies

on the downbeats followed by quieter and more focused

spectral density. This periodicity results in a spectrogram that

is characterized by bright bars that fade to a sparser spectrum

on the upbeats. Spectrographically filtered music resembles

the original image overlaid with vertical scan lines. This

problem can be avoided by selecting musical passages that are

particularly cacophonous.

B. Audio Processing

The filter behind IBSP is actually a series of filters that are

applied sequentially to frames of the original sound. These

frames are then recombined into the spectrally modified

output sound. Many filtering options were tested, but a finite

impulse response filter ended up being the best way to

minimize execution time while maximizing spectrographic

image quality.

Because it is quick and simple to implement in MATLAB,

FIR filtering was employed during the initial design phase.

Surprisingly, it proved to be the most viable method

considered.

Once the initial design was completed, more complex filter

designs were tested. FIR filters that were designed to provide

higher precision, including least-squares approximations [5]

and the Parks-McClellan algorithm [6], produced slight visual

differences and drastic increases in execution time. A

recursive IIR filter implementation [7] was attempted as well,

but the algorithm did not meet expectations.

The trailing samples of each FIR filter output frame,

resulting from the convolution of the impulse response with

the input frame, create an aurally pleasing transition between

each column of the image. Previously tested lower-order

filters left jarring frequency bursts at column transitions.
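One way to picture how the trailing samples bridge adjacent columns is an overlap-add recombination, sketched below in Python. The paper does not state exactly how the frames are recombined, so summing each frame's convolution tail into the start of the next frame is an assumption, as are the function names.

```python
def convolve(signal, taps):
    """Direct-form convolution; the output has len(signal) + len(taps) - 1 samples."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += x * h
    return out


def filter_frames(samples, filters):
    """Filter each frame with its own FIR taps and overlap-add the results.

    `filters` holds one list of taps per image column. Samples left over
    when the length is not divisible by the column count are dropped.
    """
    frame_len = len(samples) // len(filters)
    order = max(len(taps) for taps in filters) - 1
    out = [0.0] * (frame_len * len(filters) + order)
    for col, taps in enumerate(filters):
        start = col * frame_len
        frame = samples[start:start + frame_len]
        # The last len(taps) - 1 outputs spill past the frame boundary and
        # are summed into the region that belongs to the next column.
        for i, y in enumerate(convolve(frame, taps)):
            out[start + i] += y
    return out


# Three frames of two samples each, all using the same two-tap averager.
samples = [1, 2, 3, 4, 5, 6]
result = filter_frames(samples, [[0.5, 0.5]] * 3)
print(result)
```

When every column uses the same filter, this recombination is identical to filtering the whole sound at once, which is a convenient sanity check.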

The frequency response of the FIR filter is described by a

vector of frequencies, and a corresponding vector of

magnitude. Each column of the input image enumerates these

magnitudes. An image with fewer rows results in a shorter list of

frequency-magnitude pairs. Fewer pairs lead to fewer

compromises in the filter creation, which in turn lead to a

clearer and more accurate representation of the original image.
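As a rough sketch of how one image column could become a filter, the Python code below builds the frequency and magnitude vectors and then designs a linear-phase FIR filter by frequency sampling. The paper does not name the MATLAB design routine it used, so the frequency-sampling method is a stand-in. The function names, the 8-bit intensity scale, and the convention that the bottom row maps to DC are assumptions.

```python
import math


def column_to_response(column, max_value=255.0):
    """Turn one image column (top row first) into frequency/magnitude vectors."""
    gains = [v / max_value for v in reversed(column)]  # bottom row = DC
    n = len(gains)
    if n == 1:
        return [0.0, 1.0], [gains[0], gains[0]]
    freqs = [i / (n - 1) for i in range(n)]            # 1.0 = Nyquist
    return freqs, gains


def interpolate(f, freqs, gains):
    """Linearly interpolate the magnitude at normalized frequency f."""
    for i in range(1, len(freqs)):
        if f <= freqs[i]:
            t = (f - freqs[i - 1]) / (freqs[i] - freqs[i - 1])
            return gains[i - 1] + t * (gains[i] - gains[i - 1])
    return gains[-1]


def design_fir(freqs, gains, order):
    """Frequency-sampling design of a symmetric FIR filter (even order only)."""
    if order % 2:
        raise ValueError("this sketch only handles even filter orders")
    n_taps = order + 1
    mid = order // 2
    # Desired magnitude at the DFT bin frequencies 2k / n_taps.
    target = [interpolate(2.0 * k / n_taps, freqs, gains)
              for k in range(mid + 1)]
    taps = []
    for n in range(n_taps):
        acc = target[0]
        for k in range(1, mid + 1):
            acc += 2.0 * target[k] * math.cos(
                2.0 * math.pi * k * (n - mid) / n_taps)
        taps.append(acc / n_taps)
    return taps


# Hypothetical four-pixel column: dark, bright, dark, bright (top to bottom).
freqs, gains = column_to_response([0, 255, 0, 255])
taps = design_fir(freqs, gains, order=8)
print(taps)
```

The fewer frequency-magnitude pairs there are relative to the filter order, the more closely the sampled response can follow them, which matches the observation above about images with fewer rows.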

The IBSP function automatically selects an appropriate

order for the filter design. The maximum order is

limited by two factors: the width of the image and the

number of samples in the sound. The frame size is equal to

the number of audio samples in the original sound divided by

the width, in pixels, of the image. The order of the filter must

be less than the frame length.

The upper bound of the filter order is limited in principle only by the

frame length, but a hard limit of 1024 is built into the program to avoid wasting

processing power on a needlessly detailed filter. Experimental

results indicate that there is little improvement in a filter of

double this order, while the execution time is greatly

increased.
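The constraints on frame size and filter order can be sketched in Python as follows. Choosing the largest order that satisfies both constraints is an assumption, since the paper only states the limits. The function names and the example image widths are hypothetical.

```python
MAX_ORDER = 1024  # hard limit stated in the text


def frame_length(n_samples, image_width):
    """Number of audio samples filtered by one image column."""
    return n_samples // image_width


def select_order(n_samples, image_width, max_order=MAX_ORDER):
    """Pick the largest order below the frame length, capped at max_order."""
    frame_len = frame_length(n_samples, image_width)
    if frame_len < 2:
        raise ValueError("sound is too short for an image this wide")
    return min(max_order, frame_len - 1)


# 30 seconds of audio at 8000 Hz, as in the demonstration section.
n_samples = 30 * 8000

# Hypothetical image widths; the width of Figure 1 is not given.
print(select_order(n_samples, image_width=600))
print(select_order(n_samples, image_width=100))
```

With the wider image the frame length is the binding constraint, while with the narrower image the 1024 cap takes over.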

C. Reprocessing with Multiple Passes

Further detail can be added to the output sound by

reprocessing: executing the algorithm again with the output

sound as the input, and filtering with the same image. This

functionality is built into the program, because multiple

passes noticeably improve the clarity of the spectrographic image.

This benefit is particularly noticeable with detailed images.

A single pass with the IBSP function produces adequate

results for small images, and images that are composed of

broad areas of uniform color. Two to four passes will result in

a clearer spectrographic representation of these types of

images, but too many passes lead to attenuation of

frequencies that should not be attenuated.

For photographs and other, more detailed images, four

passes will provide an acceptable level of detail. The finer

details of an image will not stand out in the results of the first

few passes.

Each pass of the IBSP reapplies the filter, and increases the

contrast of the image as it exists in the spectral intensity of the

sound file. The increasing contrast occurs because each

frequency component of the file is, at best, left alone. Most

frequency components are attenuated by the filtering

process. The darkest portions of the image result in the most

attenuated signals, and the attenuation decreases as the image

map becomes more intense. Because the rate of attenuation

decreases with brightness, the difference in brightness

between two points of unequal intensity grows.

Unfortunately, this effect has diminishing returns. This

attenuation results in a race to the bottom. The frequencies

representing the darkest pixels attenuate at the fastest rate.

These incongruous fall rates cause the frequency intensities to

move farther away from each other as they fall (increasing

contrast), but the intensities eventually reach zero and the

detail they contained is lost. An over-passed sound is quiet

aside from a few loud bursts, and its spectrogram is very dark

with a few exceptionally bright patches.
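The trade-off between contrast and over-passing can be illustrated numerically with the Python sketch below. It assumes an idealized model in which each pixel intensity maps to a linear gain between 0 and 1 and the filter realizes that gain exactly, so that n passes multiply a frequency component by the gain raised to the n-th power. The two gain values are hypothetical.

```python
import math


def gain_after_passes(gain, passes):
    """Cumulative gain of one frequency component after repeated filtering."""
    return gain ** passes


def contrast_db(bright, dark, passes):
    """Level difference in decibels between a bright and a dark component."""
    ratio = gain_after_passes(bright, passes) / gain_after_passes(dark, passes)
    return 20.0 * math.log10(ratio)


# Hypothetical per-pass gains for a bright pixel and a darker pixel.
bright, dark = 0.9, 0.5

for passes in (1, 2, 4, 8, 16):
    print(passes,
          round(gain_after_passes(bright, passes), 6),
          round(gain_after_passes(dark, passes), 6),
          round(contrast_db(bright, dark, passes), 2))
```

Under this model the contrast in decibels grows in proportion to the number of passes, while both components keep falling toward zero. Only a component with a gain of exactly 1 survives unchanged, which is consistent with the very dark, sparsely bright spectrogram of an over-passed sound.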

III. DEMONSTRATIVE RESULTS

To demonstrate the functionality of the IBSP function, the

picture in Figure 1 was used to filter 30 seconds of Gaussian

white noise with a sample rate of 8000 Hz.

Gaussian white noise is an extremely harsh sound, but it

possesses a uniform frequency spectrum across the entire

frequency band. It was chosen for this demonstration for two

reasons: It provides a very uniform background with which to

demonstrate IBSP functionality, and no one will ever have to

listen to these sounds.

Figure 1: The image used to demonstrate the functionality of the filter.

This image is broken into two regions: a photograph of the

campus at UNT, and the UNT logo. Between these two

pictures, the filter’s ability to represent detail will be

demonstrated.

The spectrogram of the result of the first pass can be seen in

Figure 2. The UNT logo is immediately and clearly visible,

but the photograph portion of the image is unclear and noisy.

An additional pass (see Figure 3) brings out more detail in the

photo while significantly attenuating the fill color of the logo.

Figure 2: Spectrogram of the output sound resulting from processing white

noise with the image in Figure 1 (1st Pass)

Figure 3: Details emerge with a 2nd pass.

Figure 4 continues this trend with an additional boost in the

detail of the UNT campus after the 4th pass. The fill color of

the UNT logo has been completely muted by this pass. This

tradeoff illustrates the need for a customizable number of

passes in the IBSP function.

Figure 4: After a 4th pass, details of the photograph emerge.

By the 8th pass (see Figure 5) the darker portions of the photo

have begun to lose detail. The outlines of the logo are also

thinning out. This loss of detail due to the attenuation of

darker colors marks the beginning stages of over-passing.

Figure 5: The result of 8 passes with the IBSP function.

By the 16th pass (Figure 6), all but the brightest sections of

the picture have been attenuated. The remaining areas of the

photo are extraordinarily detailed, but the UNT logo is totally

unrecognizable. These images have a totally different

aesthetic than the original input, but the sparse frequency

spectrum produces a much more pleasing sound than the

original.

Figure 6: The 16th pass.

As mentioned earlier in the paper, this process can be

performed on any image-sound pair, but sounds with a less-

uniform spectral density do not produce easy-to-see results.

Figure 7 illustrates a new input sound: a piece of music with a

very strong downbeat. This downbeat results in a periodic

wall of frequencies followed by quieter gaps. Visually this

results in a spectrogram that looks like it has been fed through

a paper shredder.

Figure 7: The demonstration image mixed with an actual piece of music.

Sound files featuring lots of sustain and “walls of sound”

typically have a broad and stable frequency distribution. More

nuanced noises such as pink noise and grey noise can be used

in place of white noise to achieve a more pleasing sound.

IV. CONCLUSIONS

The Image Based Spectrographic Processing function

discussed in this paper uses multiple passes of a time-varying

FIR filter to manipulate the spectral density of a sound file

with the grayscale color map of an image file. The results of

this demonstration illustrated that IBSP is appropriate for

detailed spectral reproduction of images with any level of

detail.

ACKNOWLEDGMENTS

Noah Maze wishes to acknowledge Oluwayomi Adamo and

other contributors for developing and maintaining the course

content consulted in this assignment.

REFERENCES

[1] Bastwood (2010, September 13). The Aphex Face | bastwood.

Message posted to http://www.bastwood.com/?page_id=10

[2] Kahney, Leander (2002, May 10). Hey, Who's That Face in My Song?

Retrieved December 17, 2010, from

http://www.wired.com/culture/lifestyle/news/2002/05/52426

[3] MathWorks. (2010). Convert indexed image to RGB image. Retrieved

December 17, 2010, from

http://www.mathworks.com/help/techdoc/ref/ind2rgb.html

[4] Thad B. Welch, Cameron H. G. Wright and Michael G. Morrow, Real-time

digital signal processing from MATLAB to C with the TMS320C6x

DSK, Florida: CRC Press, 2006.

[5] MathWorks. (2010). Least square linear-phase FIR filter design.

http://www.mathworks.com/help/toolbox/signal/firls.html

[6] MathWorks. (2010). Parks-McClellan optimal FIR filter design.

http://www.mathworks.com/help/toolbox/signal/firpm.html

[7] MathWorks. (2010). Recursive digital filter design. Retrieved

December 17, 2010, from

http://www.mathworks.com/help/toolbox/signal/yulewalk.html
