Image Based Spectrographic Processing
Noah Benjamin Maze
Electrical Engineering, University of North Texas
3940 N. Elm Street
Denton, TX 76207-7102
Abstract— This paper presents a novel way to graphically filter a
sound’s frequency spectrum with an image file. The filtering is
accomplished by creating a series of multiple-passband filters
based on each column of the image, and applying them
sequentially to the original audio file.
I. INTRODUCTION
Historically, embedding an image into the voiceprint of a
sound has only been attempted by a handful of avant-garde
musicians [1]. These attempts have produced noisy results
that are generally void of any appreciable audio content.
This is a byproduct of their production. Before today,
generation of spectral images was done by reversing the
process of generating a spectrogram. The sole input to this
process was the target image, and the resulting sound has
been described as “discordant, metallic scratching” by Wired
magazine [2].
By using abstract, minimalistic images, artists have been
able to coax more interesting sounds from this algorithm, but
this sacrifices the overall effect of the image.
Image Based Spectrographic Processing (IBSP) allows any audio
file to be manipulated into a spectrographic image. The
resulting sound has a frequency spectrum that is characterized
by ever-changing passbands, but its original content is
otherwise preserved.
This functionality is accomplished by way of a series of
FIR filters working to create a frequency response that
resembles an image supplied by the user. This supplied
image is broken into columns that correspond to the
amplitude response of the normalized frequencies of the
sound file.
The mathematics behind this system are simple enough to
see use in many different environments, but this proof of
concept was assembled, tested and demonstrated in
MATLAB.
II. THE FILTERING PROCESS
Each pixel of the original image corresponds to a
momentary bandpass filter. The passband of the filter is
centered at a frequency corresponding to the pixel’s vertical
position in the image, and the filter is applied to a window of
time corresponding to the horizontal position of the pixel.
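The pixel-to-filter mapping described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, frequencies are assumed to be normalized so that 1.0 is the Nyquist frequency, and the top row of the image is assumed to map to the highest band, as it would appear in a conventional spectrogram.

```python
def pixel_to_band_and_window(row, col, height, width, num_samples):
    """Map an image pixel to a normalized-frequency band and a sample window.

    Assumptions (not taken from the paper): row 0 is the top of the image and
    corresponds to the highest band; 1.0 is the Nyquist frequency.
    """
    band_width = 1.0 / height
    # Flip vertically so the top row lands on the highest frequencies.
    low = (height - 1 - row) * band_width
    high = low + band_width
    # Each image column owns one frame of consecutive audio samples.
    frame_len = num_samples // width
    start = col * frame_len
    stop = start + frame_len
    return (low, high), (start, stop)
```

For a 4-pixel-tall, 10-pixel-wide image and a 1000-sample sound, the top-left pixel covers the band from 0.75 to 1.0 during samples 0 through 99.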
A. Input Acquisition
The arguments to the IBSP function specify the locations of
an image file and of the sound file to be filtered by that
image.
1) Image Input and Preprocessing
Image Based Spectrographic Processing can work with any raster
image. Both indexed and RGB images are acceptable, but
indexed images are converted to RGB before processing can
take place [3].
Spectrograms treat intensity as a single scalar value, so
the RGB color values for each pixel are averaged together to
produce a single-channel (grayscale) intensity map. The
resulting matrix contains one value for each
pixel of the original image.
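The channel averaging can be illustrated with a minimal Python sketch. The function name and the nested-list image layout are assumptions made for this example; the original program was written in MATLAB.

```python
def rgb_to_intensity(image):
    """Average the R, G and B channels of every pixel into one intensity.

    `image` is assumed to be a list of rows, each row a list of (r, g, b)
    tuples. The result has the same shape, with one float per pixel.
    """
    return [[sum(pixel) / 3.0 for pixel in row] for row in image]
```

A pure red pixel (255, 0, 0) therefore maps to an intensity of 85.0, one third of full white.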
2) Sound Input
Sound files can be of any sample rate. When an audio file
is loaded, its sampling frequency is recorded and reused when
the output file is written, but it is not used for any portion of the
processing. Filtering is done based on the normalized
frequency of the audio data [4].
The spectral density of the output file resembles a masked
version of the original audio spectrum, so the frequency
spectrum of the input file serves as a canvas for the image. A
noisy sound with a broad range of frequencies (such as
Gaussian white noise, or pink noise) will provide the most
uniform frequency spectrum.
Music files usually exhibit a periodic burst of frequencies
on the downbeats followed by quieter and more focused
spectral density. This periodicity results in a spectrogram that
is characterized by bright bars that fade to a sparser spectrum
on the upbeats. Spectrographically filtered music resembles
the original image overlaid with vertical scan lines. This
problem can be avoided by selecting musical passages that are
particularly cacophonous.
B. Audio Processing
The filter behind IBSP is actually a series of filters that are
applied sequentially to frames of the original sound. These
frames are then recombined into the spectrally modified
output sound. Many filtering options were tested, but a finite
impulse response filter ended up being the best way to
minimize execution time while maximizing spectrographic
image quality.
Because it is quick and simple to implement in MATLAB,
FIR filtering was employed during the initial design phase.
Surprisingly, it proved to be the most viable method
considered.
Once the initial design was completed, more complex filter
designs were tested. FIR filters that were designed to provide
higher precision, including least-squares approximations [5]
and the Parks-McClellan algorithm [6], produced slight visual
differences and drastic increases in execution time. A
recursive IIR filter implementation [7] was attempted as well,
but the algorithm did not meet expectations.
The trailing samples of each FIR filter output frame,
resulting from the convolution of the impulse response with
the input frame, create an aurally pleasing transition between
each column of the image. Previously tested lower-order
filters left jarring frequency bursts between column transitions.
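The frame-by-frame filtering and recombination can be sketched as follows. The paper does not spell out how the trailing samples are recombined, so this sketch assumes they are summed into the start of the following frame (overlap-add). The function names are hypothetical.

```python
def convolve(signal, taps):
    """Full linear convolution of a signal with an FIR impulse response."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += x * h
    return out


def filter_frames(samples, filters_per_frame):
    """Filter consecutive frames, one FIR per frame, and overlap-add the tails.

    Assumption: the tail of each filtered frame is added onto the next frame.
    Samples left over after the last whole frame are dropped.
    """
    num_frames = len(filters_per_frame)
    frame_len = len(samples) // num_frames
    longest = max(len(taps) for taps in filters_per_frame)
    output = [0.0] * (frame_len * num_frames + longest - 1)
    for f, taps in enumerate(filters_per_frame):
        start = f * frame_len
        frame = samples[start:start + frame_len]
        # The convolution is longer than the frame; the extra samples spill
        # into the next frame and smooth the transition between columns.
        for i, y in enumerate(convolve(frame, taps)):
            output[start + i] += y
    return output[:frame_len * num_frames]
```

When every frame uses the same filter, this procedure reproduces an ordinary convolution of the whole signal, which is a convenient sanity check.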
The frequency response of the FIR filter is described by a
vector of frequencies and a corresponding vector of
magnitudes. Each column of the input image enumerates these
magnitudes. A shorter image results in a shorter list of
frequency-magnitude pairs. Fewer pairs lead to fewer
compromises in the filter creation, which in turn lead to a
clearer and more accurate representation of the original image.
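One way to turn a column of magnitudes into FIR coefficients is the window method, sketched below in Python. This is an assumption for illustration: the paper does not name the MATLAB design routine it used, and the function names, the equal-width bands and the Hamming window are choices made for this example.

```python
import math


def design_column_fir(magnitudes, num_taps):
    """Window-method FIR whose response approximates one image column.

    magnitudes: gains in [0, 1], ordered from the lowest to the highest band.
    num_taps:   odd filter length, at least 3 (filter order = num_taps - 1).
    Assumption: the bands split 0..pi radians/sample into equal widths.
    """
    if num_taps < 3 or num_taps % 2 == 0:
        raise ValueError("num_taps must be an odd number >= 3")
    bands = len(magnitudes)
    half = (num_taps - 1) // 2
    taps = []
    for i in range(num_taps):
        n = i - half
        # Ideal response: a sum of ideal bandpass filters, one per band.
        total = 0.0
        for k, gain in enumerate(magnitudes):
            w_lo = math.pi * k / bands
            w_hi = math.pi * (k + 1) / bands
            if n == 0:
                total += gain * (w_hi - w_lo) / math.pi
            else:
                total += gain * (math.sin(w_hi * n) - math.sin(w_lo * n)) / (math.pi * n)
        # A Hamming window tapers the truncated response to reduce ripple.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * i / (num_taps - 1))
        taps.append(total * window)
    return taps


def magnitude_response(taps, w):
    """Magnitude of the filter's frequency response at w radians/sample."""
    re = sum(h * math.cos(w * n) for n, h in enumerate(taps))
    im = -sum(h * math.sin(w * n) for n, h in enumerate(taps))
    return math.hypot(re, im)
```

With fewer bands per column, each band is wider relative to the filter's transition width, which is consistent with the observation that shorter images are reproduced more accurately.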
The IBSP function automatically selects an appropriate
order for the filter design. The maximum order is
limited by two factors: the width of the image and the
number of samples in the sound. The frame size is equal to
the number of audio samples in the original sound divided by
the width, in pixels, of the image. The order of the filter must
be less than the frame length.
The upper bound of the filter order is theoretically unlimited, but
a hard limit of 1024 is built in to the program to avoid wasting
processing power on a needlessly detailed filter. Experimental
results indicate that there is little improvement in a filter of
double this order, while the execution time is greatly
increased.
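The order-selection rule can be written out as a short Python sketch. The function name is hypothetical, and choosing the largest permitted order is an assumption; the paper states only the two constraints (order below the frame length, and a hard limit of 1024).

```python
MAX_ORDER = 1024  # hard limit stated in the text


def choose_frame_and_order(num_samples, image_width, max_order=MAX_ORDER):
    """Return (frame length, filter order) for a sound and an image width.

    Assumption: the largest order satisfying both constraints is chosen.
    """
    frame_len = num_samples // image_width
    if frame_len < 2:
        raise ValueError("sound is too short for an image this wide")
    # The order must stay below the frame length and below the hard limit.
    order = min(max_order, frame_len - 1)
    return frame_len, order
```

For 240000 samples (30 seconds at 8000 Hz) and a hypothetical 400-pixel-wide image, the frame length is 600 samples and the order is 599; with a 100-pixel-wide image the frame grows to 2400 samples and the order is capped at 1024.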
C. Reprocessing with Multiple Passes
Further detail can be applied to the output sound by re-
processing: Executing the algorithm again with the output
sound as the input, and filtering with the same image. This
functionality is built in to the program, because multiple
passes have an extremely beneficial effect on the process.
This benefit is particularly noticeable with detailed images.
A single pass with the IBSP function gives adequate
results for small images and for images that are composed of
broad areas of uniform color. Two to four passes will result in
a clearer spectrographic representation of these types of
images, but too many passes lead to attenuation of
frequencies that should not be attenuated.
For photographs and other, more detailed images, four
passes will provide an acceptable level of detail. The finer
details of an image will not stand out in the results of the first
few passes.
Each pass of the IBSP function reapplies the filter, and increases the
contrast of the image as it exists in the spectral intensity of the
sound file. The increasing contrast occurs because each
frequency component of the file is, at best, left alone. Most
frequencies are attenuated by the filtering
process. The darkest portions of the image result in the most
attenuated signals, and the attenuation decreases as the image
map becomes more intense. Because the rate of attenuation
decreases with brightness, the difference in brightness
between two points of unequal intensity grows.
Unfortunately, this effect has diminishing returns. This
attenuation results in a race to the bottom. The frequencies
representing the darkest pixels attenuate at the fastest rate.
These incongruous fall rates cause the frequency intensities to
move farther away from each other as they fall (increasing
contrast), but the intensities eventually reach zero and the
detail they contained is lost. An over-passed sound is quiet
aside from a few loud bursts, and its spectrogram is very dark
with a few exceptionally bright patches.
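The contrast growth and eventual loss of detail can be illustrated with an idealized model, sketched in Python below. The model assumes that each pass multiplies a frequency's magnitude by a fixed gain between 0 and 1 taken from the pixel intensity; a real FIR filter only approximates this, and the function names are hypothetical.

```python
import math


def gain_after_passes(gain, passes):
    """Cumulative gain after repeated passes (idealized: one multiply per pass)."""
    return gain ** passes


def contrast_db(bright_gain, dark_gain, passes):
    """Ratio between a bright and a dark pixel's magnitude, in decibels."""
    bright = gain_after_passes(bright_gain, passes)
    dark = gain_after_passes(dark_gain, passes)
    return 20.0 * math.log10(bright / dark)
```

Under this model, two pixels with gains of 0.8 and 0.4 are separated by about 6 dB after one pass and about 24 dB after four, but after sixteen passes even the brighter pixel has fallen below 3 percent of its original magnitude.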
III. DEMONSTRATIVE RESULTS
To demonstrate the functionality of the IBSP function, the
picture in Figure 1 was used to filter 30 seconds of Gaussian
white noise with a sample rate of 8000 Hz.
Gaussian white noise is an extremely harsh sound, but it
possesses a uniform frequency spectrum across the entire
frequency band. It was chosen for this demonstration for two
reasons: It provides a very uniform background with which to
demonstrate IBSP functionality, and no one will ever have to
listen to these sounds.
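A comparable test signal can be generated with the Python standard library, as in the sketch below. The function name, the unit variance and the fixed seed are assumptions for this example; the paper does not describe how its noise was produced.

```python
import random


def gaussian_white_noise(duration_s, sample_rate, seed=0):
    """Return duration_s seconds of zero-mean, unit-variance Gaussian noise."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    count = int(duration_s * sample_rate)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]
```

Thirty seconds at 8000 Hz yields 240000 samples.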
Figure 1: The image used to demonstrate the functionality of the filter.
This image is broken into two regions: a photograph of the
campus at UNT, and the UNT logo. Between these two
pictures, the filter’s ability to represent detail will be
demonstrated.
The spectrogram of the result of the first pass can be seen in
Figure 2. The UNT logo is immediately and clearly visible,
but the photograph portion of the image is unclear and noisy.
An additional pass (see Figure 3) brings out more detail in the
photo while significantly attenuating the fill color of the logo.
Figure 2: Spectrogram of the output sound resulting from processing Gaussian white
noise with the image in Figure 1 (1st pass).
Figure 3: Details emerge with a 2nd pass.
Figure 4 continues this trend with an additional boost in the
detail of the UNT campus after the 4th pass. The fill color of
the UNT logo has been completely muted by this pass. This
tradeoff illustrates the need for a customizable number of
passes in the IBSP function.
Figure 4: After a 4th pass, details of the photograph emerge.
By the 8th pass (see Figure 5) the darker portions of the photo
have begun to lose detail. The outlines of the logo are also
thinning out. This loss of detail due to the attenuation of
darker colors marks the beginning stages of over-passing.
Figure 5: The result of 8 passes with the IBSP function.
By the 16th pass (Figure 6), all but the brightest sections of
the picture have been attenuated. The remaining areas of the
photo are extraordinarily detailed, but the UNT logo is totally
unrecognizable. These images have a totally different
aesthetic than the original input, but the sparse frequency
spectrum produces a much more pleasing sound than the
original.
Figure 6: The 16th pass.
As mentioned earlier in the paper, this process can be
performed on any image-sound pair, but sounds with a less-
uniform spectral density do not produce easy-to-see results.
Figure 7 illustrates a new input sound: a piece of music with a
very strong downbeat. This downbeat results in a periodic
wall of frequencies followed by quieter gaps. Visually this
results in a spectrogram that looks like it has been fed through
a paper shredder.
Figure 7: The demonstration image mixed with an actual piece of music.
Sound files featuring lots of sustain and “walls of sound”
typically have a broad and stable frequency distribution. More
nuanced noises such as pink noise and grey noise can be used
in place of white to achieve a more pleasing sound.
IV. CONCLUSIONS
The Image Based Spectrographic Processing function
discussed in this paper uses multiple passes of a time-varying
FIR filter to manipulate the spectral density of a sound file
with the grayscale color map of an image file. The results of
this demonstration illustrated that IBSP is appropriate for
detailed spectral reproduction of images with any level of
detail.
ACKNOWLEDGMENTS
Noah Maze wishes to acknowledge Oluwayomi Adamo and
other contributors for developing and maintaining the course
content consulted in this assignment.
REFERENCES
[1] Bastwood (2010, September 13). The Aphex Face | bastwood.
Message posted to http://www.bastwood.com/?page_id=10
[2] Kahney, Leander (2002, May 10). Hey, Who's That Face in My Song?
Retrieved December 17, 2010, from
http://www.wired.com/culture/lifestyle/news/2002/05/52426
[3] MathWorks. (2010). Convert indexed image to RGB image. Retrieved
December 17, 2010, from
http://www.mathworks.com/help/techdoc/ref/ind2rgb.html
[4] Thad B. Welch, Cameron H. G. Wright and Michael G. Morrow, Real-time
digital signal processing from MATLAB to C with the TMS320C6x
DSK, Florida: CRC Press, 2006.
[5] MathWorks. (2010). Least square linear-phase FIR filter design.
Retrieved December 17, 2010, from
http://www.mathworks.com/help/toolbox/signal/firls.html
[6] MathWorks. (2010). Parks-McClellan optimal FIR filter design.
Retrieved December 17, 2010, from
http://www.mathworks.com/help/toolbox/signal/firpm.html
[7] MathWorks. (2010). Recursive digital filter design. Retrieved
December 17, 2010, from
http://www.mathworks.com/help/toolbox/signal/yulewalk.html