Primer on Analyzing Animal Sounds:
Figures and Sample Sounds
Jack Bradbury & Sandra Vehrencamp
Cornell University
Recording Sounds
• A sound is a propagated disturbance in the ambient pressure of a medium (air, water, etc.)
• Each region of higher-than-ambient pressure is matched by a following region of lower-than-ambient pressure
• A microphone converts the variations in pressure created by a passing sound wave into electrical signals that mimic the rise and fall of sound pressure at the microphone
Describing and Comparing Sounds
• A plot of pressure versus time is called the waveform of a sound. It is a description of a sound in the time domain. Examples:
[Figure: waveforms (pressure vs. time) of a bellbird call and an oropendola call]
• How can we describe and compare these signals?
Simple Waveforms
• The simplest type of signal one could ever record is a single sine wave that does not change in either amplitude or frequency:
Time Domain Measurements
• There are two measures that could easily be made on this waveform:
Amplitude: What are the maximum or average deviations in pressure from ambient levels?
[Figure: sine waveform (time domain) with peak-to-peak amplitude (A_peak-peak) and RMS average amplitude (A_RMS) marked]
Amplitude: Rather than absolute values, one usually compares amplitude to some soft reference sound, as dB = 20 log10(A_obs / A_ref)
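The amplitude measures and the decibel comparison above can be sketched in a few lines. This is a minimal illustration, not part of the original slides: the function names, the 1 kHz test tone, and the 20 µPa reference (a common convention for sound in air) are all assumptions chosen for the example.

```python
import numpy as np

def amplitude_measures(pressure):
    """Return (peak-to-peak, RMS) amplitude of a waveform.

    `pressure` holds deviations from ambient pressure, so ambient = 0.
    """
    pressure = np.asarray(pressure, dtype=float)
    peak_to_peak = pressure.max() - pressure.min()
    rms = np.sqrt(np.mean(pressure ** 2))
    return peak_to_peak, rms

def decibels(a_obs, a_ref):
    """Relative amplitude in dB: 20 * log10(A_obs / A_ref)."""
    return 20.0 * np.log10(a_obs / a_ref)

# Hypothetical example: one second of a 1 kHz sine with 0.2 Pa amplitude
sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
wave = 0.2 * np.sin(2 * np.pi * 1000 * t)

p2p, rms = amplitude_measures(wave)
level_db = decibels(rms, 20e-6)   # assumed reference: 20 micropascals
```

For a sine wave the RMS amplitude is the peak amplitude divided by the square root of 2, so `rms` should come out near 0.141 Pa here.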
Frequency: How many cycles/sec (= Hz) are present? It is easiest to measure the time between successive cycles (the period, T) and take the reciprocal: f = 1/T
[Figure: sine waveform (time domain) with period T marked]
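As a rough sketch of the f = 1/T measurement, the period can be estimated from the times at which the waveform crosses zero going upward. The function name, the 440 Hz test tone, and the use of linear interpolation between samples are assumptions made for this example.

```python
import numpy as np

def frequency_from_period(wave, sample_rate):
    """Estimate frequency (Hz) as 1/T, with T the mean time between
    successive upward zero crossings of the waveform."""
    wave = np.asarray(wave, dtype=float)
    # Samples i where the wave is <= 0 at i and > 0 at i + 1
    idx = np.where((wave[:-1] <= 0) & (wave[1:] > 0))[0]
    # Linear interpolation gives a sub-sample estimate of each crossing
    frac = -wave[idx] / (wave[idx + 1] - wave[idx])
    crossing_times = (idx + frac) / sample_rate
    period = np.mean(np.diff(crossing_times))   # T, in seconds
    return 1.0 / period

# Hypothetical example: 0.1 s of a 440 Hz sine
sample_rate = 44100
t = np.arange(4410) / sample_rate
wave = np.sin(2 * np.pi * 440.0 * t + 0.3)
estimate = frequency_from_period(wave, sample_rate)
```

This only works for a clean, single-frequency waveform; the slides that follow explain why other waveforms need a different approach.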
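Frequency Domain Measures
• It is convenient to plot these two measures on their own graph, known as a frequency-domain description of the sound:
[Figure: sine waveform (time domain, with period T and peak-to-peak amplitude marked) beside its frequency-domain plot (amplitude vs. frequency), a single line at f = 1/T with height A_peak-peak]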
Waves That Are Not Sine Waves
• But how can we describe these waves?
[Figure: Waveform 1 and Waveform 2]
• In the first example, the frequency is not constant. What should we put in the frequency-domain plot?
• In the second example, both the shape and amplitude of the successive “waves” change. What can we do with this one?
Fourier Analysis
• There is hope!
Any continuous waveform can be broken down into a set of pure sine waves with frequency and amplitude values that can be computed or measured (Fourier analysis).
Frequency-domain plots provide us with a very powerful way to describe and compare any set of sounds.
• Applying the Fourier solution, we get:
[Figure: frequency spectra (amplitude vs. frequency) of Waveform 1 and Waveform 2]
A plot of amplitude versus frequency components is called the frequency spectrum (or power spectrum) of a sound.
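A frequency spectrum of this kind can be computed with a discrete Fourier transform. The sketch below uses NumPy's `np.fft.rfft`; the function name `amplitude_spectrum`, the scaling convention, and the two-tone test signal are assumptions for illustration only.

```python
import numpy as np

def amplitude_spectrum(wave, sample_rate):
    """Return (freqs, amps) for a real-valued waveform.

    Amplitudes are scaled so that a sine of amplitude A that fits a whole
    number of cycles into the sample gives a line of height A. (The zero
    frequency and Nyquist bins are over-counted by this simple scaling.)
    """
    wave = np.asarray(wave, dtype=float)
    n = len(wave)
    amps = 2.0 * np.abs(np.fft.rfft(wave)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    return freqs, amps

# Hypothetical example: 300 Hz (amplitude 1.0) plus 700 Hz (amplitude 0.5)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
wave = 1.0 * np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 700 * t)
freqs, amps = amplitude_spectrum(wave, sample_rate)
strongest = freqs[np.argmax(amps)]
```

With one second of signal the bins are 1 Hz apart, so the two components should appear as isolated lines at 300 Hz and 700 Hz.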
• But what do we do if the waveform keeps changing during the signal, as in this lark sparrow song?
[Figure: waveform (pressure vs. time) of a lark sparrow song]
• The solution is to break the song into homogeneous segments and create a frequency spectrum for each segment.
• These are then strung together along the timeline so we can see how the frequency spectra change as the song progresses.
• Such a plot is called a spectrogram, and we shall come back to how these are generated.
Predicting Power Spectra from Waveforms
• There are three types of deviations from a single sine wave. Most animal signals are some combination of these:
[Figure: example waveforms of a single sine wave, amplitude modulation (AM), frequency modulation (FM), and a periodic nonsinusoidal signal]
• If we can predict the frequency spectrum for each type of deviation, we can predict the spectrum for nearly any signal.
Analysis of Typical Waveforms
• Sinusoidal amplitude modulation (AM)
Two time-domain measures are possible:
(1) Carrier frequency (f): f = 1/T, where T is the period of the carrier, and
(2) Modulation rate (w), the number of complete modulation cycles per second: w = 1/t, where t is the period of the modulating frequency.
The frequency spectrum is 3 lines: the carrier at f and two sidebands at f – w and f + w.
The greater the amplitude of w, the higher the sidebands, but these never exceed the amplitude of f.
[Figure: sinusoidal AM waveform (time domain, with T and t marked) and its frequency spectrum (frequency domain, amplitude vs. frequency, with lines at f – w, f, and f + w)]
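The three-line AM spectrum can be checked numerically. In this sketch the carrier (1000 Hz), the modulation rate (100 Hz), the modulation depth (0.5), and the variable names are all illustrative assumptions.

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate      # one second of samples

f, w, depth = 1000.0, 100.0, 0.5              # carrier, modulation rate, depth
am = (1.0 + depth * np.sin(2 * np.pi * w * t)) * np.sin(2 * np.pi * f * t)

# Amplitude spectrum, scaled so a unit sine gives a line of height 1
amps = 2.0 * np.abs(np.fft.rfft(am)) / len(am)
freqs = np.fft.rfftfreq(len(am), d=1.0 / sample_rate)

# Keep only the components that stand clearly above numerical noise
lines = {int(round(fr)): float(a) for fr, a in zip(freqs, amps) if a > 0.01}
```

`lines` should contain exactly three entries, at f – w, f, and f + w, with each sideband at half the modulation depth times the carrier amplitude, so the sidebands stay below the carrier.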
Analysis of Typical Waveforms
• Sinusoidal frequency modulation (FM)
Suppose we keep amplitude fixed, but modulate the frequency of a sine wave sinusoidally.
What can we measure in the time domain?
(1) The maximum frequency, fmax = 1/T1, where T1 is the shortest period in the waveform.
(2) The minimum frequency, fmin = 1/T2, where T2 is the longest period in the waveform.
(3) The carrier frequency, f = (fmax + fmin) / 2.
(4) The modulating frequency, w = 1/t, where t is the duration of one complete modulation cycle.
(5) The modulation index = (fmax – fmin) / w.
[Figure: sinusoidal FM waveform (time domain, amplitude vs. time, with T1, T2, and t marked) and a plot of its frequency vs. time]
The frequency spectrum for a sinusoidally FM waveform has a line at the carrier f and sidebands at each f ± nw around the carrier, where n is a positive integer (1, 2, 3, etc.) with no upper limit (nmax = ∞).
If the modulation index < 10, then the carrier has the highest amplitude and sideband amplitudes decrease with n.
If the modulation index > 20, then the sidebands and the carrier have the same frequency values as before, but the carrier can have a lower amplitude than the sidebands.
[Figure: sinusoidal FM waveform (time domain, with T1, T2, and t marked) and its frequency spectrum (frequency domain, amplitude vs. frequency, with lines at f, f ± w, f ± 2w, and f ± 3w)]
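The positions of the FM sidebands can be checked with a short simulation. This is a sketch under stated assumptions: the values of fmax, fmin, and w are invented, the modulation index follows the definition used in these slides, (fmax – fmin) / w, and only a small index is tried. The index values at which the carrier stops being the tallest line are not tested here.

```python
import numpy as np

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate      # one second of samples

f_max, f_min, w = 1025.0, 975.0, 50.0         # illustrative values (Hz)
carrier = (f_max + f_min) / 2.0               # f = 1000 Hz
mod_index = (f_max - f_min) / w               # slide definition; equals 1 here
peak_deviation = (f_max - f_min) / 2.0        # largest excursion from f

# The instantaneous frequency is carrier + peak_deviation * cos(2*pi*w*t),
# so the phase is the time integral of that expression.
phase = 2 * np.pi * carrier * t + (peak_deviation / w) * np.sin(2 * np.pi * w * t)
fm = np.sin(phase)

amps = 2.0 * np.abs(np.fft.rfft(fm)) / len(fm)
freqs = np.fft.rfftfreq(len(fm), d=1.0 / sample_rate)
lines = {int(round(fr)): float(a) for fr, a in zip(freqs, amps) if a > 0.01}
```

Every entry in `lines` should sit at the carrier or at a whole multiple of w away from it, and for this small index the carrier is the tallest line with the sidebands shrinking as n grows.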
Analysis of Typical Waveforms
• Periodic nonsinusoidal waveforms
Any shape of waveform is allowed under this category as long as there is a clearly repeating unit.
The major measurement we can make on this waveform in the time domain is the period of the repeats (t), and thus the repeat rate, w = 1/t.
The frequency spectrum of a periodic waveform contains components at w, 2w, 3w, etc., to infinity. When spectrum components are integer multiples of some frequency w, we call the set a harmonic series. The fundamental is w, 2w is the second harmonic, etc.
[Figure: periodic nonsinusoidal waveform (time domain, with t marked) and its frequency spectrum (frequency domain, amplitude vs. frequency, with harmonics w through 7w)]
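The harmonic series of a periodic nonsinusoidal waveform can be illustrated with a sawtooth wave. The choice of a sawtooth, the 200 Hz repeat rate, and the variable names are assumptions made for this sketch.

```python
import numpy as np

sample_rate = 8000                        # samples per second (illustrative)
samples_per_cycle = 40                    # repeat period t = 40 / 8000 s
w = sample_rate / samples_per_cycle       # repeat rate w = 1/t = 200 Hz

n = np.arange(sample_rate)                # one second of sample indices
saw = 2.0 * (n % samples_per_cycle) / samples_per_cycle - 1.0
saw -= saw.mean()                         # remove the constant offset

amps = 2.0 * np.abs(np.fft.rfft(saw)) / len(saw)
freqs = np.fft.rfftfreq(len(saw), d=1.0 / sample_rate)

# (frequency, amplitude) pairs for every component above numerical noise
harmonics = [(int(round(fr)), float(a)) for fr, a in zip(freqs, amps) if a > 0.01]
```

Each frequency in `harmonics` should be a whole multiple of w, and the amplitudes of the first few harmonics should fall as the harmonic number rises.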
The amplitude of successively higher harmonics tends to decrease in an exponential manner (Dirichlet’s Rule), unless the wave is half-wave symmetric.
To determine if the wave is half-wave symmetric, divide a complete cycle of a periodic waveform in half on the time axis. Then reflect the right half upside down. If the two halves are different, the waveform is half-wave asymmetric and the spectrum shows all harmonics.
[Figure: half-wave asymmetric waveform (time domain) and its frequency spectrum (frequency domain, with harmonics w through 7w)]
Now try this on a different periodic waveform. Measure t and compute w = 1/t. Again, isolate one complete cycle, divide it in half on the time axis, and flip the right half upside down. If the two halves are the same, the waveform is half-wave symmetric and the amplitudes of all even harmonics are zero. Only odd harmonics are present, and these follow Dirichlet’s Rule.
[Figure: half-wave symmetric waveform (time domain, with t marked) and its frequency spectrum (frequency domain, with odd harmonics w, 3w, 5w, and 7w only)]
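The half-wave symmetry test described above translates directly into code: split one cycle in half, invert the right half, and compare. The function names and the square and sawtooth test cycles are assumptions chosen for this sketch.

```python
import numpy as np

def is_half_wave_symmetric(cycle, tol=1e-9):
    """True if the second half of one cycle, flipped upside down,
    matches the first half."""
    cycle = np.asarray(cycle, dtype=float)
    if len(cycle) % 2 != 0:
        return False
    half = len(cycle) // 2
    return bool(np.allclose(cycle[half:], -cycle[:half], atol=tol))

def harmonic_amplitudes(cycle, n_harmonics):
    """Amplitudes of harmonics 1..n_harmonics of the periodic waveform
    obtained by repeating `cycle` end to end."""
    cycle = np.asarray(cycle, dtype=float)
    spectrum = 2.0 * np.abs(np.fft.rfft(cycle)) / len(cycle)
    return spectrum[1:n_harmonics + 1]

# Hypothetical test cycles, 40 samples each
square = np.concatenate([np.ones(20), -np.ones(20)])   # half-wave symmetric
saw = np.linspace(-1.0, 1.0, 40, endpoint=False)       # half-wave asymmetric

square_harmonics = harmonic_amplitudes(square, 6)
saw_harmonics = harmonic_amplitudes(saw, 6)
```

For the square wave the even-numbered entries of `square_harmonics` (harmonics 2, 4, and 6) should be essentially zero, while the sawtooth keeps all of its harmonics.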
Another deviation from Dirichlet’s Rule occurs if there are “multiple maxima” within each repeat of the waveform.
We can measure the usual period between repeats of the periodic waveform, t, and use it to predict the fundamental of the harmonic series that will occur in the frequency spectrum: w = 1/t.
But we can also measure the interval between multiple maxima, τ, and use it to compute a frequency z = 1/τ. Because τ < t, z > w.
This leads to a harmonic series based on the fundamental w. Whenever a harmonic of w is close to an integer multiple of z, it has lower amplitude than intermediate harmonics.
The result is bands of harmonics that have higher amplitudes (lobes) and intervening harmonics with low amplitudes (nodes). Note that harmonics still gradually decrease following Dirichlet’s Rule.
[Figure: periodic waveform with multiple maxima (time domain, with t and τ marked) and its frequency spectrum (frequency domain, harmonics w through 15w, with nodes near z, 2z, and 3z and lobes between them)]
Analysis of Typical Waveforms
• Compound waveforms
While a few birds can emit pure sine waves, most animal sounds are some combination of AM, FM, and/or nonsinusoidal periodic signals. We call these compound waveforms.
Compound waveforms can always be decomposed into a set of carrier sine waves and a set of modulating sine waves. We can then use the simple rules of AM and FM to add the appropriate sidebands for each modulating sine wave around each carrier sine wave. This is the spectrum of the compound wave.
Consider the following example of a frog call:
[Figure: frog call waveform (time domain), with carrier period T and modulation period t marked]
This appears to be a pure sine wave that has been amplitude-modulated with a repeating waveform that is not sinusoidal.
The first thing to do is characterize the carrier. We see that it is a pure sine wave. So, we measure the period of the sine waves inside the pulses and use the resulting T to compute the sine wave carrier frequency f = 1/T.
The next step is to characterize the modulating waveform. This repeats every t seconds, giving a repetition rate of w = 1/t. The frequency spectrum of the modulating waveform will be a harmonic series with a fundamental of w.
We draw the envelope of the waveform to see the shape of the modulating waveform. This is not half-wave symmetric. The modulating spectrum will show all harmonics.
Also, there is only one maximum in each repeat of the modulating waveform. So, there will be no lobes or nodes in its spectrum.
Putting these results together, we have two frequency spectra to deal with: the carrier (a single line at f) and the modulating waveform (a harmonic series at w, 2w, 3w…).
Since the modulating waveform now consists of a series of pure sine waves, we can use the AM rules to modulate the carrier f with each of them in turn, first using the fundamental of the modulating waveform, w. This adds sidebands at f – w and f + w.
Next, sinusoidally amplitude-modulate the carrier with the second harmonic, 2w. This adds sidebands at f – 2w and f + 2w. Note that because this component has less amplitude than w in the modulating waveform spectrum, its sidebands also have lower amplitude.
We continue until we have sinusoidally amplitude-modulated f with every harmonic in the series constituting the modulating waveform spectrum.
[Figure: spectrum of the carrier (a line at f), spectrum of the modulating waveform (w, 2w, 3w…), and the resulting spectrum (frequency domain, amplitude vs. frequency) with lines at f, f ± w, f ± 2w, and so on]
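The frog-call reasoning can be checked with a synthetic pulsed tone. Everything specific here is an assumption for illustration: the 1000 Hz carrier, the 50 Hz pulse rate, and the exponentially decaying envelope (chosen because it has one maximum per repeat and is not half-wave symmetric).

```python
import numpy as np

sample_rate = 8000
f = 1000                                   # carrier frequency (Hz)
w = 50                                     # repetition rate of the pulses (Hz)
samples_per_pulse = sample_rate // w       # 160 samples per repeat

n = np.arange(sample_rate)                 # one second of sample indices
t = n / sample_rate
pulse_phase = (n % samples_per_pulse) / samples_per_pulse   # 0..1 in each pulse
envelope = np.exp(-5.0 * pulse_phase)      # sharp onset, gradual decay
call = envelope * np.sin(2 * np.pi * f * t)

amps = 2.0 * np.abs(np.fft.rfft(call)) / len(call)
freqs = np.fft.rfftfreq(len(call), d=1.0 / sample_rate)
lines = {int(round(fr)): float(a) for fr, a in zip(freqs, amps) if a > 0.02}
```

The entries of `lines` should fall at f and at f ± w, f ± 2w, and so on, with the carrier tallest and each further sideband pair lower than the one before, mirroring the falling harmonics of the modulating waveform.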
Nearly all animal sounds are compound waveforms. Any combination is possible:
[Figure: table of example compound waveforms with columns Carrier, Modulating waveform, Modulation (FM, AM, FM), and Result]
The Uncertainty Principle
• Any Fourier analyzer needs several cycles of a signal to compute component frequencies.
• The more cycles of a stable frequency component that an analyzer can measure, the more accurate the measurement of that frequency.
• If the analyzer has only a short time to estimate frequencies, each component will appear as a wide band in the frequency spectrum; if a longer time is available, frequency components will be narrow bands. Example: a sinusoidal AM signal:
[Figure: frequency spectra (amplitude vs. frequency) of a sinusoidal AM signal for short-, medium-, and long-duration samples]
• The bandwidth, Δf, of an analyzer is the minimum difference between two adjacent frequencies that can be distinguished.
• Clearly, short-duration samples result in large Δf values, and long-duration samples result in small Δf values.
• If we let Δt be the duration of the shortest sampling time available to a Fourier analyzer, the Uncertainty Principle for sound analysis states that:
Δf · Δt ≈ 1
[Figure: the same three spectra, labeled small Δt, large Δf; medium Δt, medium Δf; and long Δt, small Δf]
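The trade-off between sample duration and bandwidth can be seen by analyzing two tones 10 Hz apart with a long and a short sample. This is a rough sketch: the tone frequencies, the durations, and the "count the bins that reach half the maximum" criterion are assumptions chosen to keep the example small, not a standard resolution measure.

```python
import numpy as np

def count_strong_bins(duration, f1=1000.0, f2=1010.0, sample_rate=8000):
    """Analyze `duration` seconds of two equal-amplitude tones and count
    the spectrum bins that reach at least half the maximum amplitude."""
    n = int(round(duration * sample_rate))
    t = np.arange(n) / sample_rate
    wave = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)
    amps = np.abs(np.fft.rfft(wave))
    return int(np.sum(amps >= 0.5 * amps.max()))

long_sample = count_strong_bins(1.0)     # bandwidth of about 1 Hz
short_sample = count_strong_bins(0.02)   # bandwidth of about 50 Hz
```

With the 1 s sample the two tones show up as two separate lines. With the 0.02 s sample the bandwidth (about 1 / 0.02 s = 50 Hz) is wider than the 10 Hz spacing, so they merge into a single peak.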
Making Spectrograms
• We noted earlier that a spectrogram is created by dividing a sound into segments, computing the frequency spectrum for each segment, and then stringing these spectra together along the time axis.
• Thus, we might take the lark sparrow song that we saw earlier.
[Figure: waveform (pressure vs. time) of the lark sparrow song]
• Compute the frequency spectrum for each segment of duration Δt. Align these along the time axis (imagine the peaks sticking out of the plane of the graphs).
• Then, use black to mark those portions of the overall graph that have higher peaks, use white to mark the lower amplitude components, and use grey for intermediate portions.
• The result is a spectrogram with frequency on the vertical axis, time on the horizontal axis, and amplitude of a frequency component at a given time indicated by darkness on the plot.
[Figure: spectrogram (frequency vs. time) of the lark sparrow song, with segment duration Δt marked]
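The steps above can be sketched as a small function that cuts the waveform into consecutive segments and computes one spectrum per segment. The function name, the non-overlapping rectangular segments, and the two-tone test signal are assumptions for illustration; mapping amplitude to darkness is left to whatever plotting tool is used.

```python
import numpy as np

def spectrogram(wave, sample_rate, frame_size):
    """Return (times, freqs, amps) for consecutive non-overlapping segments.

    amps[i, j] is the amplitude of frequency freqs[i] in segment j, so
    frequency runs along the rows and time along the columns.
    """
    wave = np.asarray(wave, dtype=float)
    n_frames = len(wave) // frame_size
    frames = wave[:n_frames * frame_size].reshape(n_frames, frame_size)
    amps = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(frame_size, d=1.0 / sample_rate)
    times = (np.arange(n_frames) + 0.5) * frame_size / sample_rate
    return times, freqs, amps

# Hypothetical signal: 500 Hz for the first half second, then 1500 Hz
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
wave = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

times, freqs, amps = spectrogram(wave, sample_rate, frame_size=400)
peak_freqs = freqs[np.argmax(amps, axis=0)]   # strongest frequency per segment
```

Here each segment lasts 400 / 8000 = 0.05 s, so `peak_freqs` should read 500 Hz for the first ten segments and 1500 Hz for the last ten.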
Spectrograms and Bandwidth
• The spectrogram we just made uses a pretty large Δt. This gives us very fine frequency resolution (Δf = 5 Hz), but much of the temporal resolution has been lost. Can we get by with a smaller Δt?
• Let’s decrease Δt by 4×. This will give us Δf = 20 Hz. This starts to restore some of the temporal pattern, and the frequency bands are still pretty thin.
• Let’s decrease Δt by 4× again. This will give us Δf = 80 Hz. We get a much better temporal pattern and even a somewhat better frequency pattern, because FM signals show as FM, not as their components!
• Let’s decrease Δt by 4× once more. This will give us Δf = 320 Hz. This is similar to the prior bandwidth, but we can see the temporal pattern in the last notes better.
• Let’s decrease Δt by 4× again. This will give us Δf = 1280 Hz. Now, large bands start to appear instead of fine lines, although the temporal pattern is retained.
• Let’s decrease Δt by 4× yet again. This will give us Δf = 5120 Hz. We have now lost any decent frequency resolution, but the temporal pattern is retained.
• Clearly, an intermediate bandwidth, Δf, provides the optimal balance of frequency resolution and temporal resolution.
• In general, you want a bandwidth:
small enough to separate harmonics clearly;
big enough to show FM undecomposed; and
big enough to show AM undecomposed.
[Figure: spectrograms (frequency vs. time) of the lark sparrow song at each of these bandwidths]
Digital Sound Analysis
• Computers and DAT recorders sample (digitize) the continuous rise and fall of sound amplitudes at some fixed rate and store a long column (vector) of amplitude values. Music CDs sample at 44.1 kHz.
• At each sample point, the computer also digitizes the amplitude value into one of N equidistant categories. The number of categories depends on how many “bits” are used to store each value: N = 2^(number of bits)
• Music CDs store 16 bits/sample and thus divide the full amplitude range into 2^16 = 65,536 possible values.
• The higher the sampling rate and the higher the bit depth, the more accurately the digital recording captures the original sound.
• However, increasing sampling rate or bit depth or both increases the size of the digital file that must be stored.
• In stereo recording, two columns of numbers must be stored, taking up even more memory.
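The arithmetic behind bit depth and file size is simple enough to write down. The function names are made up for this sketch, and the size estimate assumes uncompressed samples with no file header.

```python
def quantization_levels(bits):
    """Number of amplitude categories N available with `bits` bits/sample."""
    return 2 ** bits

def uncompressed_size_bytes(duration_s, sample_rate, bits, channels=1):
    """Bytes needed to store raw samples (no header, no compression)."""
    return int(duration_s * sample_rate * channels * bits / 8)

cd_levels = quantization_levels(16)
mono_minute = uncompressed_size_bytes(60, 44100, 16, channels=1)
stereo_minute = uncompressed_size_bytes(60, 44100, 16, channels=2)
```

One minute at 44.1 kHz and 16 bits takes 60 × 44,100 × 2 bytes, about 5.3 MB, in mono, and exactly twice that in stereo.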
• Nyquist frequency: A digital recorder or computer must be able to take at least 2 samples/cycle to be able to identify each frequency.
• Thus, if you digitize your sounds at R samples/sec, you will be unable to properly capture any component with frequency > R/2. This latter value is called the Nyquist frequency.
• Aliasing: If you do not sample your sounds at a high enough rate, any frequency in the sounds that is higher than half the sampling rate is aliased. This means you will see an artifact in your spectrograms consisting of an inverted version of what the sounds should have looked like if you had sampled at a sufficiently high rate. Not nice!
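Aliasing can be demonstrated by sampling a tone that lies above half the sampling rate. The helper name `apparent_frequency`, the 4000 Hz sampling rate, and the 3000 Hz tone are assumptions for this sketch.

```python
import numpy as np

def apparent_frequency(f, sample_rate):
    """Frequency at which a component of frequency f appears after being
    sampled at sample_rate (it folds back into the range 0..sample_rate/2)."""
    return abs(f - sample_rate * round(f / sample_rate))

sample_rate = 4000
nyquist = sample_rate / 2                     # 2000 Hz

# Sample a 3000 Hz tone, which lies above the Nyquist frequency
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 3000 * t)
amps = np.abs(np.fft.rfft(tone))
freqs = np.fft.rfftfreq(len(tone), d=1.0 / sample_rate)
observed = freqs[np.argmax(amps)]             # where the tone shows up
```

The 3000 Hz tone shows up at 1000 Hz. A sweep rising from 2500 Hz to 3000 Hz would appear to fall from 1500 Hz to 1000 Hz, which is the inverted pattern described above.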
• Digital Bandwidths: In most computer sound analysis programs, you do not set the bandwidth Δf directly, but instead set the segment duration, Δt.
• Instead of setting a time, you indicate Δt by specifying the number of consecutive sample points to be used for each frequency spectrum in the spectrogram. This is often called “frame size.”
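The link between frame size, segment duration, and bandwidth follows from the uncertainty relation (bandwidth × segment duration ≈ 1). The function names below are hypothetical, and the bandwidth is only approximate: the exact value in any given analysis program also depends on the window function it applies.

```python
def frame_parameters(frame_size, sample_rate):
    """Return (segment duration in s, approximate bandwidth in Hz)
    for a frame of `frame_size` consecutive sample points."""
    duration = frame_size / sample_rate
    bandwidth = 1.0 / duration
    return duration, bandwidth

def frame_size_for_bandwidth(bandwidth, sample_rate):
    """Number of sample points giving approximately the requested bandwidth."""
    return int(round(sample_rate / bandwidth))

duration, bandwidth = frame_parameters(512, 44100)
fine_frame = frame_size_for_bandwidth(5.0, 44100)
```

At a 44.1 kHz sampling rate, a 512-point frame lasts about 11.6 ms and gives a bandwidth of roughly 86 Hz, while a 5 Hz bandwidth needs a frame of 8,820 points (0.2 s).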
• Windowing: If you cut a sound directly into segments (a rectangular window) to make a spectrogram, you introduce artifacts at the beginning and end of each segment.
• This occurs because, with rectangular windows, each segment begins with no sound and is suddenly switched “on” and suddenly “off.” The frequency spectrum of sudden onsets and offsets must contain a wide smear of frequencies.
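The smear produced by a rectangular window can be measured and compared with a segment whose edges are faded in and out. This is a sketch under stated assumptions: the Hann taper (`np.hanning`) is used as one common tapered window, which the text above does not name, and the tone, probe frequency, and function name are invented for the example.

```python
import numpy as np

def leakage(window, tone_hz=1010.0, probe_hz=2000.0, sample_rate=8000):
    """Amplitude found at probe_hz, far from the tone, relative to the
    spectrum's peak, when one segment of a pure tone is shaped by `window`."""
    n = len(window)
    t = np.arange(n) / sample_rate
    segment = np.sin(2 * np.pi * tone_hz * t) * window
    amps = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    probe_bin = np.argmin(np.abs(freqs - probe_hz))
    return float(amps[probe_bin] / amps.max())

frame_size = 400
rect_leak = leakage(np.ones(frame_size))      # abrupt on and off
hann_leak = leakage(np.hanning(frame_size))   # gradual fade in and out
```

The tone is chosen so that it does not fit a whole number of cycles into the segment, which is the usual situation with real recordings. With the rectangular window a noticeable fraction of the peak amplitude appears about 1 kHz away from the tone, and the tapered window reduces this smear by a large factor.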