Project 2 (Time Frequency Analysis Using Windowed Fourier Transform)
Time Frequency Analysis of sounds using
Windowed Fourier Transform in MATLAB
Author
Jimin Kim
Abstract
The Fourier transform is one of the most effective and powerful methods for analyzing
signals. However, it has a severe drawback: it cannot capture the moments in time at which
the various frequencies are present in the signal. The Windowed Fourier transform offers a
solution to this problem. By adding a translational kernel to the original Fourier transform,
the Windowed Fourier transform can localize a signal in both time and frequency, with limited
accuracy in each. Using this method, signals with time-dependent frequency content can be
analyzed through spectrograms, and different types of translational kernel functions can be
used to improve the result. This paper discusses the implementation of the Windowed Fourier
transform in MATLAB and its application to realistic sound signals.
Introduction/Overview
The Windowed Fourier transform technique will be applied to two realistic sound
samples: a 9-second portion of Handel’s ‘Messiah’, and ‘Mary had a little lamb’ recorded
on both piano and recorder. The ‘Messiah’ sample will be used to investigate the effect of
applying different types of translational kernel functions to the signal. The ideas of
oversampling, undersampling, and kernel width will also be explored. The ‘Mary had a little
lamb’ sample will be used to analyze the differences between the piano and the recorder from
the spectrograms of the piece, and its music score will be reconstructed through spectrogram
analysis. By carrying out these applications in MATLAB, the goal is not only to learn the
usefulness of the Windowed Fourier transform for sound analysis, but also to understand the
limitations of the technique in attaining accuracy in both the time and frequency domains.
Theoretical Background
Mathematically, the Windowed Fourier transform is a slight modification of the Fourier
transform. Recall that the Fourier transform is defined as

$$\hat{f}(k) = \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx \qquad (1)$$

where k is the frequency variable and x is the position (or time) variable. The Windowed
Fourier transform, also known as the Gabor transform, introduces a time-translation kernel

$$g_{t,\omega}(\tau) = e^{i\omega\tau}\, g(\tau - t) \qquad (2)$$

into the Fourier transform, which then becomes

$$G[f](t,\omega) = \tilde{f}_g(t,\omega) = \int_{-\infty}^{\infty} f(\tau)\, \bar{g}(\tau - t)\, e^{-i\omega\tau}\, d\tau \qquad (3)$$

where ω plays the role of the frequency variable k, τ is the time variable of integration, and
the bar denotes complex conjugation. Here, the term $\bar{g}(\tau - t)$ localizes the Fourier
integral in time around τ = t. As t varies over the given time interval, g sweeps through the
signal and picks up the frequency information around each point in time, as shown in the top
picture of Figure 1. It is therefore possible to investigate both time and frequency
information in the signal. However, this simultaneous analysis of the time and frequency
domains comes at a price in accuracy.
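To make the localization concrete, the sketch below evaluates the Gabor transform integral ∫ f(τ) ḡ(τ − t) e^{−iωτ} dτ directly as a Riemann sum at a few (t, ω) points. It is a minimal illustration written in Python/NumPy rather than MATLAB; the Gaussian window g(s) = exp(−a s²), the value a = 20, and the two-tone test signal are assumptions chosen for the example, not data from this paper.

```python
import numpy as np

def gabor_coefficient(f, tau, t, omega, a):
    """Approximate the Gabor transform at one point (t, omega) by a Riemann sum,
    using a real Gaussian window g(s) = exp(-a s^2), so conj(g) = g."""
    dtau = tau[1] - tau[0]
    window = np.exp(-a * (tau - t) ** 2)          # g(tau - t), centered at t
    return np.sum(f * window * np.exp(-1j * omega * tau)) * dtau

# Hypothetical test signal: 50 Hz during [0, 1) s, then 120 Hz during [1, 2) s
tau = np.linspace(0.0, 2.0, 4000, endpoint=False)
f = np.where(tau < 1.0, np.sin(2 * np.pi * 50 * tau), np.sin(2 * np.pi * 120 * tau))

for t in (0.5, 1.5):
    at_50 = abs(gabor_coefficient(f, tau, t, 2 * np.pi * 50, a=20.0))
    at_120 = abs(gabor_coefficient(f, tau, t, 2 * np.pi * 120, a=20.0))
    print(f"window at t = {t} s: |G| at 50 Hz = {at_50:.4f}, at 120 Hz = {at_120:.4f}")
```

With the window centered at 0.5 s the 50 Hz coefficient dominates, and at 1.5 s the 120 Hz coefficient dominates, which is exactly the time information an ordinary Fourier transform of the whole signal would discard.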
Figure 2 illustrates the principle behind the Windowed Fourier transform technique. In the
time series representation, excellent resolution is obtained in time, but there is no
resolution in frequency. In the frequency series representation, excellent resolution is
achieved in frequency, but in return there is no resolution in time. By introducing the
time-translational kernel, the Windowed Fourier transform achieves moderate resolution in
both time and frequency by trading some resolution in one domain for resolution in the
other. That is, if one attempts to improve the time resolution by decreasing the window size
of the kernel, the frequency resolution becomes poorer. In contrast, if one attempts to
improve the frequency resolution by increasing the window size, the time resolution becomes
poorer. Understanding this principle and selecting a reasonable window size is therefore
crucial in time-frequency analysis. A map created by the Windowed Fourier transform that
holds both time and frequency information is called a ‘spectrogram’.
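The trade-off can be checked numerically for the Gaussian window exp(−a t²) used later in this paper. The following NumPy sketch (an illustration, not part of the original MATLAB code) measures the time spread σ_t and angular-frequency spread σ_ω as standard deviations of |g|² and |FFT(g)|². The grid length L = 20 and n = 2^16 points are assumptions chosen so the window fits comfortably in the domain.

```python
import numpy as np

def gaussian_spreads(a, L=20.0, n=2**16):
    """Return (time spread, angular-frequency spread) of g(t) = exp(-a t^2),
    measured as standard deviations of |g|^2 and |FFT(g)|^2."""
    t = np.linspace(-L / 2, L / 2, n, endpoint=False)
    g = np.exp(-a * t**2)
    p_t = np.abs(g) ** 2
    p_t /= p_t.sum()
    sigma_t = np.sqrt(np.sum(p_t * t**2))            # mean is 0 by symmetry
    # Angular frequencies in the same (unshifted) order as the FFT output
    omega = 2 * np.pi * np.fft.fftfreq(n, d=L / n)
    p_w = np.abs(np.fft.fft(g)) ** 2
    p_w /= p_w.sum()
    sigma_w = np.sqrt(np.sum(p_w * omega**2))
    return sigma_t, sigma_w

for a in (5.0, 15.0, 25.0):                           # the widths used in this paper
    st, sw = gaussian_spreads(a)
    print(f"a = {a:4.0f}: sigma_t = {st:.4f} s, sigma_omega = {sw:.3f} rad/s, "
          f"product = {st * sw:.4f}")
```

As a grows, σ_t shrinks and σ_ω grows, while their product stays at 1/2 (σ_t = 1/(2√a) and σ_ω = √a for this window), so better time resolution is always paid for with worse frequency resolution.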
Figure 1. How the Windowed Fourier transform is performed. The top picture shows the overlap
between the translational kernel function (red) and the signal. The middle picture shows the
filtered signal at the given time stamp. The bottom picture shows the FFT of the filtered
signal.
Many different functions can be used as the translational kernel in the Windowed Fourier
transform. In this paper, three of them will be discussed: the Gaussian, Mexican Hat, and
Shannon functions.
Gaussian wavelet
The Gaussian function is the most commonly used window in time-frequency analysis. It is
defined as

$$g(t) = e^{-a(t-b)^2} \qquad (4)$$

Here, the constant a determines the width of the window (a larger a gives a narrower window)
and b determines the center of the window, i.e. it is the translation parameter. Hence, this
function produces a bell-shaped curve, symmetric about t = b, whose width decreases as a
increases.
Mexican Hat wavelet
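The Mexican Hat function is another type of wavelet. It is similar to the Gaussian but has a
trough on each side of the central peak, resembling a sombrero. It is defined as

$$\psi(t) = \frac{2}{\sqrt{3\sigma}\,\pi^{1/4}}\left(1 - \frac{(t-b)^2}{\sigma^2}\right) e^{-\frac{(t-b)^2}{2\sigma^2}} \qquad (5)$$

where σ is the window width parameter (a = 0.3 in the Appendix B code) and b is the time
translation parameter.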
Shannon wavelet
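The Shannon function is a step (box) function that takes only two values: 1 inside the window
and 0 outside it. It can be written as

$$g(t) = \begin{cases} 1, & |t - b| \le \sigma/2 \\ 0, & \text{otherwise} \end{cases} \qquad (6)$$

where σ is the window width and b is the center of the window.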
Figure 2. The left picture shows how a signal is sampled in the time series (time) domain. The
center picture shows the sampling in the frequency domain. The right picture shows the
sampling in both the time and frequency domains in the Windowed Fourier transform technique.
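For reference, the three kernels described above can be written as short NumPy functions. This is a sketch translated from the definitions and from the Appendix B code, not the author's implementation; in particular the centered box of width `width` for the Shannon window is an assumed form, and the function names are hypothetical.

```python
import numpy as np

def gaussian(t, b, a):
    """Gaussian window exp(-a (t - b)^2): centered at b, narrower for larger a."""
    return np.exp(-a * (t - b) ** 2)

def mexican_hat(t, b, sigma):
    """Mexican Hat with width sigma centered at b, divided by its largest sampled
    value (mirroring the max-normalization in Appendix B)."""
    s = (t - b) / sigma
    psi = 2 / (np.sqrt(3 * sigma) * np.pi ** 0.25) * (1 - s**2) * np.exp(-s**2 / 2)
    return psi / psi.max()

def shannon(t, b, width):
    """Assumed box form of the Shannon window: 1 within width/2 of b, else 0."""
    return (np.abs(t - b) <= width / 2).astype(float)

t = np.linspace(0, 9, 9001)                 # 1 ms grid over the 9 s clip
windows = {
    "Gaussian (a = 15)": gaussian(t, 4.5, 15.0),
    "Mexican Hat (sigma = 0.3)": mexican_hat(t, 4.5, 0.3),
    "Shannon (width = 1 s)": shannon(t, 4.5, 1.0),
}
for name, w in windows.items():
    print(f"{name:26s} max = {w.max():.3f}, min = {w.min():.3f}")
```

The printout makes the qualitative differences visible: the Gaussian and Shannon windows are non-negative, while the Mexican Hat dips to about −0.45 of its peak in its side troughs.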
Algorithm implementation/development
The MATLAB implementation follows the sequence of steps below. By following this
general procedure, one can produce spectrograms for both Handel’s ‘Messiah’ and ‘Mary had a
little lamb’ played on piano and recorder.
1. Construct the time domain and frequency domain for the sound sample.
First, one should construct the framework on which all of the time-frequency analysis
will be based. Both a time domain and a frequency domain are needed to create a
spectrogram. For example, the portion of ‘Messiah’ analyzed here is about 9 seconds
long and sampled at 8192 samples per second (73113 samples in total), so one can
create a linspace with L=9 and use the first n=73112 samples. This n is even, as the
construction of k below requires, since it splits the wavenumbers into two halves of
n/2. Notice that n is not a power of 2 in this case. It is generally a good idea to
discretize the domain with a power-of-2 number of points, but the FFT still works when
n is not a power of 2, at the price of reduced efficiency. After creating the time
domain, define the wavenumbers k, rescaled by 2π/L because the FFT assumes a 2π-periodic
domain while the signal lives on [0, L]. Don’t forget to fftshift the wavenumbers k so
that the plots come out correctly.
2. Load the sound file.
Once both the time and frequency domains have been defined, load the music sample
that will be analyzed (in this case, ‘Handel’). The loaded sample y is a column vector
while the time vector t is a row vector, so transpose y so that the dimensions match
when the sample is multiplied element-wise by the translational kernel function. The
sample has also been divided by 2 to scale its amplitude before filtering.
Figure 3. The translational kernel functions used in this paper. From top to bottom: the
Gaussian wavelet, the Mexican Hat wavelet, and the Shannon wavelet.
3. Filter the sound signal (optional).
If the original sound sample is too noisy (for example, the signal has a series of
overtones and noise around the signature frequencies), filtering the signal prior to
sampling can help produce a cleaner spectrogram. Different types of filters can be
applied depending on the ultimate goal of the time-frequency analysis. In this paper, a
low-pass filter has been applied to suppress the overtones and obtain a cleaner music
score. Procedures for designing a filter will not be discussed in this paper, but one
can easily filter a signal using MATLAB’s built-in filter design functions as well.
4. Define the sampling rate.
Before the signal can be analyzed, one should define how often the signal will be
sampled, i.e. where the window will be placed. First, create an empty matrix in which
all the time-frequency information will be stored during the loop. Next, define the
sampling times by creating a row vector with the desired increment; here the starting
point is 0 and the end point is 9. An increment of 0.1 seconds is a sensible starting
choice, since it samples the signal a reasonable 91 times (at 0, 0.1, ..., 9 seconds).
This value will be changed later when oversampling and undersampling are explored.
5. Define the time translational kernel function.
Now that the signal is ready to be analyzed, create a ‘for’ loop that implements the
short-time Fourier transform. The loop uses the row vector defined in step 4 as the
collection of time stamps at which the kernel will be centered. Once the loop
parameter is set, define a translational kernel function. As mentioned earlier, this
function can be arbitrary; in this paper, the Gaussian, Mexican Hat, and Shannon
functions were used (see the ‘Theoretical Background’ section for their mathematical
definitions). Make sure to include both the translational parameter (the window center,
tslide(j) in the code) and the window width parameter.
6. Implement the Windowed Fourier transform.
Once the function is defined, multiply it element-wise by the signal at each sampling
point: define a vector holding the product of the signal and the kernel, then take the
Fourier transform of the result. Recall that in step 4 we defined an empty matrix to
store all the time-frequency information. At each iteration, append the absolute value
of the fftshift-ed transform to this matrix as a new row. By the end of the loop, the
matrix has dimension 91 × 73112, i.e. (number of sampling points) × (number of samples
in the signal). The loop can end here, since this matrix holds all the information
needed to create a spectrogram.
7. Create a spectrogram.
Once the time-frequency matrix has been created, use it (transposed, so that time runs
along the horizontal axis) to create the spectrogram. Make sure to rescale the frequency
axis by dividing it by 2π: the wavenumber k is originally an angular frequency (rad/s),
whereas sound frequency is described in Hz. Set an appropriate frequency range to examine
different portions of the sound spectrum.
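The steps above can be sketched end to end in Python/NumPy. This is a translation for illustration under stated assumptions, not the author's MATLAB code: the function name `gabor_spectrogram` is hypothetical, and because the Handel clip is not bundled with NumPy, a synthetic signal (440 Hz for one second, then 880 Hz) stands in for the recording. The frequency grid, Gaussian kernel, one-row-per-window storage, and division by 2π follow steps 1 and 4 to 7.

```python
import numpy as np

def gabor_spectrogram(v, L, tslide, a):
    """Gaussian-windowed FFTs of the real signal v (even length n) on [0, L).
    Returns (spectrogram with one row per window position, frequency axis in Hz)."""
    n = v.size
    t = np.linspace(0, L, n + 1)[:n]                           # step 1: time grid
    k = (2 * np.pi / L) * np.concatenate((np.arange(n // 2), np.arange(-(n // 2), 0)))
    ks = np.fft.fftshift(k)                                    # step 1: shifted wavenumbers
    spec = np.empty((len(tslide), n))                          # step 4: storage
    for j, center in enumerate(tslide):
        g = np.exp(-a * (t - center) ** 2)                     # step 5: Gaussian kernel
        spec[j] = np.abs(np.fft.fftshift(np.fft.fft(g * v)))   # step 6: windowed FFT
    return spec, ks / (2 * np.pi)                              # step 7: rad/s -> Hz

# Synthetic stand-in for a recording: 440 Hz for 1 s, then 880 Hz for 1 s
Fs, L = 8192, 2.0
n = int(Fs * L)
t = np.arange(n) / Fs
v = np.where(t < 1.0, np.sin(2 * np.pi * 440 * t), np.sin(2 * np.pi * 880 * t))
tslide = np.linspace(0, L, 21)             # 0.1 s steps, like tslide=0:0.1:L
spec, freqs = gabor_spectrogram(v, L, tslide, a=15.0)

for j in (5, 15):                          # windows centered at 0.5 s and 1.5 s
    peak = abs(freqs[np.argmax(spec[j])])  # abs(): real signals give +/- pairs
    print(f"window at {tslide[j]:.1f} s: strongest frequency {peak:.1f} Hz")
```

A spectrogram plot would then correspond to `pcolor(tslide, ks/(2*pi), Sgt_spec.')` in MATLAB, for example `matplotlib.pyplot.pcolormesh(tslide, freqs, spec.T)` if matplotlib is available.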
Computation results/Analysis
This section will be divided into two parts: analysis of Handel’s ‘Messiah’ and
analysis of ‘Mary had a little lamb’ piece played by piano and recorder.
Handel’s Messiah
Spectrogram analysis
After following the procedure in the previous section, one obtains the spectrogram of
the piece shown in Figure 4. It used the Gaussian wavelet with width parameter a = 15 (the
coefficient written as -15 in the exponent) and a sampling interval of 0.1 seconds. Notice
that the frequencies range from about 250 Hz to 4000 Hz, and that overtones are also present
within the piece. Overtones are related to the ‘timbre’ of an instrument: when a note is
played at frequency x, the instrument also generates overtones at 2x, 3x, 4x, and so forth.
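The overtone pattern can be illustrated with a synthetic note. In this NumPy sketch the fundamental of 262 Hz and the overtone amplitudes are made-up values, not measurements from the recordings; the point is only that a single note with harmonics shows spectral peaks at integer multiples of its fundamental.

```python
import numpy as np

Fs = 8192
t = np.arange(Fs) / Fs                      # 1 s of samples, so FFT bins are 1 Hz apart
f0 = 262.0                                  # hypothetical fundamental, close to C4
amps = [1.0, 0.5, 0.25, 0.125]              # made-up strengths of x, 2x, 3x, 4x
note = sum(A * np.sin(2 * np.pi * (m + 1) * f0 * t) for m, A in enumerate(amps))

spectrum = np.abs(np.fft.rfft(note))
freqs = np.fft.rfftfreq(note.size, d=1 / Fs)
peaks = np.sort(freqs[np.argsort(spectrum)[-len(amps):]])
print(peaks)                                # the four strongest components: 262, 524, 786, 1048 Hz
```

In a spectrogram these harmonics appear as parallel bands above each note, which is what the low-pass filtering step later tries to suppress.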
Figure 4. The spectrogram of Handel’s Messiah using the Gaussian kernel. The overtones can be
seen by closely inspecting the spectrogram.
Window size investigation
One can also investigate how the window size of the kernel affects the spectrogram.
Figure 5 demonstrates the ‘uncertainty principle’ of the Windowed Fourier transform:
resolution cannot be made arbitrarily good in both time and frequency at once. The left
figure was obtained by setting the width parameter of the Gaussian wavelet to a = 5 (written
as -5 in the exponent), which gives a wide window. Notice that it has good frequency
resolution but poor time resolution. The right figure was obtained with a = 25 (written as
-25), which gives a narrow window. Here excellent time resolution is achieved, but the
frequency resolution is relatively poor. By experimenting with different window sizes, one
should aim to pick a window size that gives reasonable resolution in both time and frequency.
Over sampling and under sampling
While the window size can be modified by varying the window size parameter ‘a’, the
sampling rate can be modified by varying the step between successive values of the
translational parameter (the spacing of tslide in the code). Figure 6 shows the effects of
oversampling and undersampling on the spectrogram. The left figure was produced with a
sampling interval of 0.01 seconds, which corresponds to 901 window positions within the
signal. The right figure was produced with a sampling interval of 1 second, which corresponds
to only 10 window positions. Notice from the left picture that when the window size is kept
constant and the signal is oversampled, the spectrogram appears well resolved in both time
and frequency. When the signal is undersampled, it appears poorly resolved in both domains;
if the window is narrow compared with the 1-second step (as a Gaussian with a = 15 is),
adjacent windows barely overlap, so much of the signal between window centers contributes
little to the spectrogram. However, one should be aware that the sampling rate directly
affects the efficiency of the code, since one FFT of the full signal is computed per window
position. Therefore, even if oversampling produces a high-resolution spectrogram, the code
should be expected to run much more slowly than an undersampled version. The key idea is to
find a sampling rate that balances the efficiency of the code and the quality of the
spectrogram.
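The cost and coverage side of this trade-off is simple arithmetic, sketched below in plain Python. The helper name `window_stats` is hypothetical; it assumes the 9-second clip and a Gaussian window with a = 15, and reports how many FFTs a given step requires and how much weight the window still has halfway between two adjacent centers.

```python
import math

def window_stats(step, L=9.0, a=15.0):
    """Number of window positions in 0:step:L and the Gaussian weight
    exp(-a*(step/2)^2) at the midpoint between two adjacent window centers."""
    count = int(round(L / step)) + 1
    midpoint_weight = math.exp(-a * (step / 2) ** 2)
    return count, midpoint_weight

for step in (0.01, 0.1, 1.0):
    count, w = window_stats(step)
    print(f"step {step:5.2f} s: {count:4d} FFTs, midpoint weight {w:.3f}")
```

The run time grows linearly with the number of FFTs (901 versus 10), while a midpoint weight of about 0.02 for the 1-second step shows how little of the signal between window centers reaches the undersampled spectrogram.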
Other types of translational kernel: Mexican Hat and Shannon wavelets
By using different functions as the translational kernel, one can compare the
spectrograms produced by different types of wavelets.
Figure 5. The spectrograms of Handel’s Messiah using a large window size (left) and a small
window size (right). The large window gives good frequency resolution but poor time
resolution. In contrast, the small window gives excellent time resolution but poor frequency
resolution.
In this paper, the Mexican Hat and Shannon wavelets have also been applied to the signal.
Figures 7 and 8 show the spectrograms produced by the Mexican Hat wavelet and by the Shannon
wavelet, respectively. Both wavelets were scaled to have a window length of about 1 second,
and the sampling interval was kept at 0.1 seconds. Both produce spectrograms similar to the
one generated with the Gaussian, but they differ in resolution. The Shannon window appears to
pick up more information in the frequency domain than the Gaussian does, because the Gaussian
weights the signal most heavily at the center of the window, whereas the Shannon window
weights all points within the window equally. A similar principle seems to apply to the
Mexican Hat wavelet: the two troughs on either side of the central peak let it pick up more
frequency information at each sampling point than the Gaussian does.
Figure 6. The spectrograms of Handel’s Messiah obtained by oversampling (left) and
undersampling (right). When the window size is kept constant, oversampling gives good
resolution in both domains, while undersampling gives poor resolution in both time and
frequency.
Figure 7. The spectrogram of Handel’s Messiah piece using the Mexican Hat wavelet
Mary Had a Little Lamb
Filtering the signal
To obtain a clean music score from the spectrograms, it is important to remove the
overtones beforehand. This can be done with a low-pass Butterworth filter designed with
MATLAB’s built-in butter function and applied with filter. Figure 9 compares the unfiltered
and filtered signals.
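Two details of this step can be sketched in NumPy. First, MATLAB’s `butter(N, Wn)` takes the cutoff `Wn` as a fraction of the Nyquist frequency, so `butter(2,0.1,'low')` in Appendix B corresponds to a cutoff of 0.1 × Fs/2 Hz, which depends on the recording’s sampling rate. Second, the effect of a low-pass filter is illustrated below with an ideal FFT-based (brick-wall) filter; this is a deliberately simple stand-in and NOT the Butterworth filter used in the paper, and the test tones are hypothetical.

```python
import numpy as np

def cutoff_hz(wn, fs):
    """Cutoff in Hz for a normalized cutoff wn, which MATLAB's butter expresses
    as a fraction of the Nyquist frequency fs/2."""
    return wn * fs / 2

def fft_lowpass(x, fs, fc):
    """Ideal (brick-wall) low-pass filter: zero every FFT bin above fc Hz.
    Illustrative stand-in only; this is NOT a Butterworth filter."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    X[freqs > fc] = 0
    return np.fft.irfft(X, n=x.size)

fs = 8192
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 300 * t)                      # hypothetical note
x = tone + 0.5 * np.sin(2 * np.pi * 1500 * t)           # plus a higher component to remove
y = fft_lowpass(x, fs, fc=cutoff_hz(0.25, fs))          # cutoff 1024 Hz
print(np.max(np.abs(y - tone)))                         # tiny: only the 300 Hz tone remains
```

A real Butterworth filter rolls off gradually instead of cutting sharply, so components just above the cutoff are attenuated rather than removed, which is consistent with the reduced (not eliminated) amplitudes seen after filtering.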
Reproduction of the music scores
After filtering the signals to remove the overtones, one can produce spectrograms for both
the piano and recorder samples by following the same procedure used for Handel’s Messiah.
Figure 10 shows the spectrogram of the piece played by piano (left) and by recorder (right).
Using these spectrograms, one can reproduce the music score for both instruments by
converting the center frequency of each note into the corresponding musical note.
Figure 8. The spectrogram of Handel’s Messiah using the Shannon wavelet.
Figure 9. Comparison of the unfiltered and filtered signals of the piano (left) and recorder
(right). The overall amplitude of the frequencies is reduced after applying the low-pass
filter.
The music scores of the two instruments, reconstructed from the spectrograms, are as
follows.
Piano
320Hz, 285Hz, 255Hz, 285Hz, 320Hz, 320Hz, 320Hz, 285Hz, 285Hz, 285Hz, 320Hz, 320Hz,
320Hz, 320Hz, 285Hz, 255Hz, 285Hz, 320Hz, 320Hz, 320Hz, 320Hz, 285Hz, 285Hz, 320Hz,
285Hz, 255Hz
These correspond approximately to the notes
E4, C#4, B3, C#4, E4, E4, E4, C#4, C#4, C#4, E4, E4, E4, E4, C#4, B3, C#4, E4, E4, E4,
E4, C#4, C#4, E4, C#4, B3
This shows that the piano in the sound sample is slightly out of tune.
Recorder
1030Hz, 920Hz, 820Hz, 925Hz, 1040Hz, 1045Hz, 1030Hz, 910Hz, 910Hz, 910Hz, 1030Hz,
1040Hz, 1040Hz, 1040Hz, 910Hz, 810Hz, 910Hz, 1045Hz, 1040Hz, 1030Hz, 1030Hz, 910Hz,
910Hz, 1025Hz, 910Hz, 815Hz
These correspond approximately to the notes
C6, A#5, G#5, A#5, C6, C6, C6, A#5, A#5, A#5, C6, C6, C6, C6, A#5, G#5, A#5, C6, C6, C6,
C6, A#5, A#5, C6, A#5, G#5
Hence, one can also notice that the recorder is slightly out of tune as well.
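The frequency-to-note conversion used for these scores can be sketched with the 12-tone equal-temperament formula n = 12·log2(f/440). The Python function below is a hypothetical helper, not code from the paper, and it assumes the A4 = 440 Hz reference; it returns the nearest note and the deviation in cents (100 cents = 1 semitone).

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz, a4=440.0):
    """Nearest 12-tone equal-temperament note (A4 = 440 Hz assumed) and the
    deviation from it in cents (100 cents = 1 semitone)."""
    semitones = 12 * math.log2(freq_hz / a4)
    n = round(semitones)
    midi = 69 + n                                   # MIDI number of the nearest note (A4 = 69)
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 1)
    cents = 100 * (semitones - n)
    return name, cents

for f in (1030, 920, 820):                          # recorder frequencies listed above
    name, cents = nearest_note(f)
    print(f"{f} Hz -> {name} ({cents:+.0f} cents)")
```

A non-zero cents deviation is a direct measure of how far an instrument is out of tune. Where a measured frequency lies close to midway between two notes (for example, 320 Hz sits roughly midway between D#4 at about 311 Hz and E4 at about 330 Hz), the nearest-note rule becomes sensitive to small measurement errors, so such assignments are best read together with the intervals between neighboring notes.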
Figure 10. The spectrograms of ‘Mary had a little lamb’ played by piano (left) and recorder
(right). Notice that the two instruments occupy different frequency ranges.
Overtones of each instrument
By comparing the spectrograms of the unfiltered signals from the piano and the recorder,
one can spot differences between the two instruments in their time-frequency content.
Figures 11 and 12 show the spectrograms for the piano and the recorder, respectively. For a
given note, the piano holds the corresponding frequency steady, so there is almost no
variation in the frequency of the same note.
Figure 11. The spectrogram of the unfiltered ‘Mary had a little lamb’ played by piano. Notice
the overtones at integer multiples of the note frequencies. Also, the variation of frequency
within a single note is very small compared with that of the recorder.
Figure 12. The spectrogram of the unfiltered ‘Mary had a little lamb’ played by recorder.
Overtones are also present here, but they are less prominent than for the piano. Instead, the
variation of frequency within a single note is larger than for the piano.
In contrast, for the recorder one can easily notice the frequency varying while the same
note is played. As for the overtones, closely inspecting the spectrograms shows that the
piano produces more overtones than the recorder.
Summary/Conclusion
By applying the Windowed Fourier transform technique, one can analyze signals with
time-dependent frequencies. In this paper, a brief theoretical background of this technique
has been introduced, and its implementation in MATLAB has been discussed step by step. The
Windowed Fourier transform has been applied to sound signals by analyzing Handel’s ‘Messiah’
and ‘Mary had a little lamb’ played on two different instruments. By modifying the window
size and the sampling rate, it was possible to understand the ‘uncertainty principle’ of this
technique with respect to obtaining both time and frequency resolution. Also, by using
different types of functions as the translational kernel, a qualitative understanding of
their effects on the spectrogram has been achieved. Finally, the music scores of the sound
signals have been reconstructed from the spectrograms, and the differences between the
sounds produced by the two instruments have been analyzed.
Appendix A
In this section, the MATLAB functions used for the analysis are introduced, with a brief
explanation of how each was used.
linspace: This function is used to define the time-domain discretization.
fftshift: This function is used to shift the zero-frequency component to the center of the
frequency data (and of the wavenumber vector) so that the plots are correct.
fft: This function is used to perform the Fourier transform of the windowed signal at each
time stamp during the Windowed Fourier transform process.
abs: This function is used to take the absolute value of the complex frequency data produced
by fft.
subplot: This function is used to show multiple plots in one figure window to keep track of
the Windowed Fourier transform process.
pcolor: This function is used to plot the spectrogram.
colormap: This function is used to define the color spectrum of the spectrogram.
xlabel, ylabel: These functions are used to label the x and y axes of the plot.
max: This function is used to find the peak value of the Mexican Hat wavelet in order to
normalize it, and to normalize the FFT magnitudes in the monitoring plots.
heaviside: This function is used to create the Shannon function.
wavread: This function is used to read a WAV file into a MATLAB vector. (It was removed in
MATLAB R2015b; audioread is its replacement.)
butter: This function is used to design a Butterworth filter and return its transfer-function
coefficients. It was used to create the low-pass filter in this paper.
filter: This function is used to filter the original sound with the Butterworth filter
coefficients.
Appendix B
In this section, the coding for algorithms that are mentioned in this paper is presented.
Handel’s ‘Messiah’ with Gaussian wavelet
clear all; close all; clc
% Time grid and shifted wavenumbers
L=9; n=73112;
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
% y is a column vector: transpose to a row vector to match t, then scale
load handel
v = y'/2; vv = v(1:n);
Sgt_spec=[]; tslide=0:0.1:9;
for j=1:length(tslide)
    g=exp(-15*(t-tslide(j)).^2); % Gaussian window centered at tslide(j)
    Sg=g.*vv; Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))]; % one row per window position
    subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end
close all;
% Spectrogram: divide by 2*pi to convert angular frequency to Hz
pcolor(tslide,ks/(2*pi),Sgt_spec.'), shading interp
set(gca,'Ylim',[0 4000],'Fontsize',10)
colormap(hot)
xlabel('time (sec)'); ylabel('frequency (Hz)');
Handel’s ‘Messiah’ with Mexican Hat wavelet
clear all; close all; clc
L=9; n=73112;
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
load handel
v = y'/2; vv = v(1:n);
a=0.3; % Mexican Hat width parameter
Sgt_spec=[]; tslide=0:0.1:9;
for j=1:length(tslide)
    % Mexican Hat wavelet centered at tslide(j); note sqrt(3*a) and pi^(1/4)
    g=2/(sqrt(3*a)*pi^(1/4))*(1-(t-tslide(j)).^2/a^2).*exp(-(t-tslide(j)).^2/(2*a^2));
    normm=g/max(g); % normalize so the peak value is 1
    Sg=normm.*vv; Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,vv,'k',t,normm,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end
close all;
pcolor(tslide,ks/(2*pi),Sgt_spec.'), shading interp
set(gca,'Ylim',[0 4000],'Fontsize',10)
colormap(hot)
xlabel('time (sec)'); ylabel('frequency (Hz)');
Handel’s ‘Messiah’ with Shannon wavelet
clear all; close all; clc
L=9; n=73112;
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
load handel
v = y'/2; vv = v(1:n);
Sgt_spec=[]; tslide=0:0.1:9;
for j=1:length(tslide)
    % Shannon (box) window: 1 within 0.5 s of tslide(j), 0 elsewhere
    sh=heaviside(t-(tslide(j)-0.5))-heaviside(t-(tslide(j)+0.5));
    Sg=sh.*vv; Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,vv,'k',t,sh,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end
close all;
pcolor(tslide,ks/(2*pi),Sgt_spec.'), shading interp
set(gca,'Ylim',[0 4000],'Fontsize',10)
colormap(hot)
xlabel('time (sec)'); ylabel('frequency (Hz)');
‘Mary had a little lamb’ piano version analysis
clear all; close all; clc;
L=16; n=701440; % n must equal the number of samples in the recording
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
tr_piano=16; % record time in seconds
y=wavread('music1'); % removed in R2015b; use audioread('music1.wav') in newer releases
Fs=length(y)/tr_piano; Mary=y';
% Second-order low-pass Butterworth filter, cutoff at 0.1 of the Nyquist frequency
[B,A]=butter(2,0.1,'low'); CMary=filter(B,A,Mary);
Sgt_spec=[]; tslide=0:0.5:16;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian
    Sg=g.*CMary; Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,CMary,'k',t,g,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end
close all;
pcolor(tslide,ks/(2*pi),Sgt_spec.'), shading interp
set(gca,'Ylim',[200 400],'Fontsize',10)
colormap(hot)
xlabel('time (sec)'); ylabel('frequency (Hz)');
‘Mary had a little lamb’ recorder version analysis
clear all; close all; clc;
L=14; n=627712; % n must equal the number of samples in the recording
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);
tr_rec=14; % record time in seconds
y=wavread('music2'); % removed in R2015b; use audioread('music2.wav') in newer releases
Fs=length(y)/tr_rec; Mary=y';
% Second-order low-pass Butterworth filter, cutoff at 0.1 of the Nyquist frequency
[B,A]=butter(2,0.1,'low'); CMary=filter(B,A,Mary);
Sgt_spec=[]; tslide=0:0.5:14;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian
    Sg=g.*CMary; Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,CMary,'k',t,g,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end
close all;
pcolor(tslide,ks/(2*pi),Sgt_spec.'), shading interp
set(gca,'Ylim',[700 1100],'Fontsize',10)
colormap(hot)
xlabel('time (sec)'); ylabel('frequency (Hz)');