Project 2 (Time Frequency Analysis Using Windowed Fourier Transform)

Time Frequency Analysis of sounds using

Windowed Fourier Transform in MATLAB

Author

Jimin Kim

Abstract

The Fourier transform is one of the most effective and powerful methods of analyzing signals. However, it has a severe drawback: it cannot capture the moment in time at which the various frequencies are present in the signal. The Windowed Fourier transform offers a solution to this problem. By adding a translational kernel to the original Fourier transform equation, the Windowed Fourier transform can localize a signal in both time and frequency with limited accuracy. With this method, signals whose frequency content depends on time can be analyzed through spectrograms, and different types of translational kernel function can be used to improve the result. In this paper, the implementation of the Windowed Fourier transform in MATLAB and its application to realistic sound signals are discussed.

Introduction/Overview

The Windowed Fourier transform technique will be applied to two realistic sound samples: a 9-second portion of Handel’s ‘Messiah’, and ‘Mary had a little lamb’ recorded on both piano and recorder. The ‘Messiah’ sample will be used to investigate the effect of different types of translational kernel function on the spectrogram. The ideas of over sampling, under sampling and kernel width will also be explored. The ‘Mary had a little lamb’ sample will be used to analyze the difference between the piano and the recorder from the spectrograms of the piece, and its music score will be reconstructed through spectrogram analysis. By carrying out these applications in MATLAB, the goal is not only to learn the usefulness of the Windowed Fourier transform in sound analysis, but also to understand the limitation of the technique in attaining accuracy in both the time and frequency domains.

Theoretical Background

Mathematically, the Windowed Fourier transform is the Fourier transform with a slight modification. Recall that the Fourier transform is defined as

$$F(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx} f(x)\, dx \tag{1}$$

where k is the frequency variable and x is the position (or time) variable. The Windowed Fourier transform, also known as the Gabor transform, introduces a time translation kernel

$$g_{\tau,\omega}(t) = e^{i\omega t}\, g(t-\tau) \tag{2}$$

into the Fourier transform, which then becomes

$$G[f](\tau,\omega) = \int_{-\infty}^{\infty} f(t)\, \bar{g}(t-\tau)\, e^{-i\omega t}\, dt \tag{3}$$

Here, the term $\bar{g}(t-\tau)$ (the bar denotes the complex conjugate) localizes the Fourier integral in time around $t=\tau$. Therefore, as $\tau$ varies over the given time interval, g sweeps through the signal and picks up the frequency information near each point in time, as shown in the top picture of Figure 1. This makes it possible to investigate both the time and the frequency information in the signal. However, analyzing the time and frequency domains simultaneously comes at a price in accuracy.

Figure 2 illustrates the principle behind the Windowed Fourier transform technique. A time series gives excellent resolution in time but no resolution in frequency. A Fourier (frequency) representation gives excellent resolution in frequency but, in return, no resolution in time. By introducing the time translational kernel, the Windowed Fourier transform achieves moderate resolution in both domains by trading resolution in one for resolution in the other. That is, if one attempts to improve the time resolution by decreasing the window size of the kernel, the frequency resolution becomes poorer. In contrast, if one attempts to improve the frequency resolution by increasing the window size, the time resolution becomes poorer. Therefore, understanding this principle and selecting a reasonable window size is crucial in time frequency analysis. A map created by Windowed Fourier transforms that holds both time and frequency information is called a ‘spectrogram’.
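For reference, this trade-off is commonly written as the Gabor (Heisenberg) limit. The symbols $\sigma_t$ and $\sigma_\omega$ are introduced here for illustration only: they denote the standard deviations of $|g(t)|^2$ in time and of $|\hat{g}(\omega)|^2$ in angular frequency, for a window $g$ with Fourier transform $\hat{g}$.

```latex
\sigma_t \, \sigma_\omega \;\ge\; \frac{1}{2}
```

Equality holds for a Gaussian window, which is one reason the Gaussian is a natural first choice of kernel. Narrowing the window (smaller $\sigma_t$) necessarily widens its spectrum (larger $\sigma_\omega$), and vice versa.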

Figure 1. This figure describes how the Windowed Fourier transform is performed. The top picture shows the overlap between the translational kernel function (red) and the signal. The middle picture shows the filtered signal at the given timestamp. The bottom picture shows the FFT of the filtered signal.

Many different types of functions can be used as the translational kernel in

Windowed Fourier transform. In this paper, three special functions will be discussed:

Gaussian, Mexican Hat and Shannon function.

Gaussian wavelet

The Gaussian function is the most commonly used wavelet in time frequency analysis. Its equation is

$$g(t) = e^{-a(t-\tau)^2} \tag{4}$$

Here, the constant ‘a’ determines the width of the window and ‘$\tau$’ determines the center location of the function. Hence, this function produces a normal curve that is symmetric about $t=\tau$, with a width set by ‘a’: the larger ‘a’ is, the narrower the window.

Mexican Hat wavelet

The Mexican Hat function is another type of wavelet that is similar to the Gaussian but with a trough on each side of the central peak, resembling a sombrero. It is defined as

$$g(t) = \frac{2}{\sqrt{3a}\,\pi^{1/4}} \left(1 - \frac{(t-\tau)^2}{a^2}\right) e^{-\frac{(t-\tau)^2}{2a^2}} \tag{5}$$

where $a$ is the window size parameter and $\tau$ is the time translational parameter.

Shannon wavelet

The Shannon function is essentially a step function that only has two discrete values

throughout the domain. The function is defined as

Figure 2. The left picture shows how signal is sampled in time series domain. The center picture shows

the sampling in Frequency domain. The right picture shows the sampling in both time and frequency

domains in Windowed Fourier transform technique.

Algorithm implementation/development

The algorithm is implemented in MATLAB through the following sequence of steps. By following this general procedure, one can produce spectrograms for both Handel’s ‘Messiah’ and ‘Mary had a little lamb’ played by piano and recorder.

1. Construct the linear space and frequency space that incorporate the sound sample.

First, one should construct the framework on which all of the time frequency analysis will be based. Both a time domain and a frequency domain are needed to create a spectrogram. Since the portion of ‘Messiah’ to be analyzed is about 9 seconds long at 8192 samples per second, one should create a linspace with L=9 and n=73112. Notice that n is not a power of 2 in this case. It is generally a good idea to discretize the domain with a number of modes that is a power of 2, but the FFT still works when n is not a power of 2, at the price of decreased efficiency. After creating the time domain, define the frequency domain k by rescaling the wavenumbers by 2pi/L, since the FFT algorithm assumes 2pi periodic signals. Don’t forget to fftshift the wavenumber k so that the plot comes out correctly.

2. Load the sound file.

Once both the time and frequency domains have been defined, load the music sample (in this case, ‘Handel’) that will be analyzed. Since the original sample is a column vector, one should transpose it so that the dimensions match when the sample is multiplied by the translational kernel function. The sample is also divided by 2 to scale it to the right size for filtering.

Figure 3. The different types of translational kernel function used in this paper. From the top: the Gaussian wavelet, the Mexican Hat wavelet and the Shannon wavelet.

3. Filter the sound signal (Optional)

If the original sound sample is too noisy (for example, the signal has a series of overtones and noise around the signature frequencies), then filtering the signal prior to sampling can help produce a cleaner spectrogram. Depending on the ultimate goal of the time frequency analysis, different types of filter can be applied. In this paper, a low pass filter has been applied to clean up the overtones and obtain a better music score. Procedures for designing a filter will not be discussed in this paper, but one can easily filter a signal by using MATLAB’s built in filters.

4. Define the sampling rate.

Before the signal can be analyzed, one should define how often the signal will be sampled, that is, the time step between successive window positions. First, create an empty matrix where all the time-frequency information will be stored during the loop. Next, define the window positions by creating a row vector with the desired increment. In this case, the starting point is 0 and the end point is 9. To begin with, a 0.1 second increment is a reasonable choice since it samples the signal 91 times. However, this value will be changed when we explore the ideas of over sampling and under sampling.

5. Define the time translational kernel function.

Now that the signal is ready to be analyzed, one should create a ‘for’ loop that carries out the short time Fourier transform. The loop uses the row vector defined in step 4 as the collection of timestamps at which the kernel will be centered. Once the loop parameter is set, define a translational kernel function. As mentioned earlier, this function can be chosen freely, but in this paper the Gaussian, Mexican Hat and Shannon functions were used. Check the ‘Theoretical Background’ section for the mathematical descriptions of these functions. Make sure to include both the translational parameter and the window width parameter a.

6. Implement Windowed Fourier transform

Once the function is defined, one should multiply the signal by the function at each window position. Simply define another vector that holds the product of the signal and the kernel. Then create a vector that takes the Fourier transform of the result. Recall that an empty matrix was defined in step 4, where all the time frequency information will be stored. Append to this matrix the absolute value of the transformed data with fftshift applied. The loop thus stores the time frequency information from each iteration in a new row of this matrix. By the end of the loop, this matrix should have a dimension of 91*73112, which is (number of window positions)*(number of samples in the signal). The loop can end at this point since this matrix holds all the information needed for creating a spectrogram.

7. Create a spectrogram.

Once the time frequency matrix has been created, one can use it to create a spectrogram. Make sure to rescale the frequency domain by dividing it by 2pi. This is because, for sound analysis, the wavenumber that is originally defined in terms of angular frequency must be converted into Hz, which describes the sound frequency. Set an appropriate frequency range to analyze different portions of the sound range.
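The steps above can be sketched end to end on a small synthetic signal. This is only an illustration under stated assumptions, not the script used for the results in this paper: the test signal (100 Hz followed by 300 Hz), the sampling rate Fs, the width parameter a = 50 and the variable names spec, f_early and f_late are all chosen here for the example. The storage matrix is preallocated rather than grown inside the loop.

```matlab
% Synthetic signal (assumed): 100 Hz during the first second, 300 Hz during the second
L = 2; Fs = 1024; n = L*Fs;
t = (0:n-1)/Fs;                          % n time points on [0, L)
s = sin(2*pi*100*t).*(t < 1) + sin(2*pi*300*t).*(t >= 1);

% Step 1: angular wavenumbers in FFT order, then shifted for plotting
k  = (2*pi/L)*[0:n/2-1 -n/2:-1];
ks = fftshift(k);

% Step 4: window centers and storage, one row per window position
tslide = 0:0.1:L;
a = 50;                                  % Gaussian width parameter (assumed value)
spec = zeros(numel(tslide), n);

% Steps 5 and 6: slide the window, transform, store the shifted magnitude
for j = 1:numel(tslide)
    g = exp(-a*(t - tslide(j)).^2);
    spec(j,:) = abs(fftshift(fft(g.*s)));
end

% Step 7: convert to Hz and read off the dominant frequency at two times
fHz  = ks/(2*pi);
pos  = fHz >= 0;                         % keep non-negative frequencies only
fpos = fHz(pos);
[~, i1] = max(spec(6, pos));             % tslide(6)  = 0.5 s
[~, i2] = max(spec(16, pos));            % tslide(16) = 1.5 s
f_early = fpos(i1);
f_late  = fpos(i2);
% pcolor(tslide, fHz, spec.'), shading interp   % uncomment to draw the spectrogram
```

The dominant frequency read from the row at 0.5 s should sit at the first tone and the one at 1.5 s at the second tone, which is exactly the time localization that a plain FFT of the whole signal cannot provide.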

Computation results/Analysis

This section will be divided into two parts: analysis of Handel’s ‘Messiah’ and

analysis of ‘Mary had a little lamb’ piece played by piano and recorder.

Handel’s Messiah

Spectrogram analysis

After following the procedures in the previous section, one can obtain the following spectrogram of the piece. The spectrogram uses the Gaussian wavelet with a = 15 (an exponent coefficient of -15) and a window time step of 0.1 seconds. Notice that the frequency content ranges from about 250 Hz to 4000 Hz, and that overtones are visible within the piece. Overtones are related to the ‘timbre’ of the instrument: when one plays a note at frequency x, the instrument also generates overtones at 2x, 3x, 4x and so forth.

Figure 4. The spectrogram of Handel’s Messiah piece using the Gaussian kernel. One can see the existence of overtones by closely inspecting the spectrogram.

Window size investigation

One can also investigate the effect of modifying the window size of the kernel on the spectrogram. Figure 5 demonstrates the ‘uncertainty principle’ of the Windowed Fourier transform technique when it comes to attaining resolution in both the time and frequency domains. The left figure was obtained by setting the exponent coefficient of the Gaussian wavelet to -5 (a = 5, a wide window). Notice that it has good frequency resolution but poor time resolution. The right figure was obtained by setting the exponent coefficient to -25 (a = 25, a narrow window). In this figure, excellent resolution is achieved in time, but the resolution in frequency is relatively poor. By experimenting with different window sizes, one should aim to pick the window size that gives reasonable resolution in both time and frequency.
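The trade-off can also be checked numerically on the Gaussian window itself. The sketch below is an illustration under assumptions: the grid (20 s sampled at 200 Hz) and the names dt_fwhm, df_fwhm and product are chosen here, and width is measured as the full width at half maximum (FWHM). For exp(-a*t^2) the time FWHM is 2*sqrt(log(2)/a), and the spectral FWHM is estimated by counting FFT bins above half of the peak.

```matlab
Fs = 200; n = 4000;                  % assumed grid: 20 s sampled at 200 Hz
t  = (-n/2:n/2-1)/Fs;                % time axis centered on zero
df = Fs/n;                           % FFT bin spacing in Hz
a  = [5 15 25];                      % width parameters used in the text

dt_fwhm = 2*sqrt(log(2)./a);         % time width in seconds (analytic)
df_fwhm = zeros(size(a));
for i = 1:numel(a)
    G = abs(fft(exp(-a(i)*t.^2)));   % magnitude spectrum of the window
    df_fwhm(i) = sum(G >= max(G)/2)*df;   % spectral width in Hz (bin count)
end

product = dt_fwhm.*df_fwhm;          % time width times frequency width
fprintf('a = %2d: %.3f s, %.2f Hz, product %.3f\n', [a; dt_fwhm; df_fwhm; product]);
```

Going from a = 5 to a = 25 shrinks the time width and widens the spectral width by about the same factor, so the product stays close to the constant 4*log(2)/pi for a Gaussian. No choice of a improves both widths at once.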

Over sampling and under sampling

While the window size can be modified by varying the window size parameter ‘a’, the rate of sampling can be modified by varying the step of the translational parameter ‘τ’. Figure 6 shows the effects of over sampling and under sampling on the spectrogram. The left figure was produced by setting the time step to 0.01 seconds, which corresponds to a total of 901 window positions within the signal. The right figure was produced by setting the time step to 1 second, which corresponds to only 10 window positions within the signal. Notice from the left picture that when the window size is kept constant and the signal is over sampled, the spectrogram shows great resolution in both the time and frequency domains. When the signal is under sampled, the result is poor resolution in both domains. However, one should be aware that the rate of sampling is directly related to the efficiency of the code. Even if over sampling produces a high resolution spectrogram, one should expect the code to run much slower than code that under samples. The key idea is to find the sampling rate that gives both reasonable efficiency of the code and reasonable quality of the spectrogram.
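The cost side of this choice is easy to estimate before running the loop. The following sketch counts the window positions for the three time steps discussed above and the memory needed to hold the magnitude matrix in double precision. The variable names counts and megabytes are chosen here for illustration.

```matlab
n  = 73112;                                   % samples in the Handel excerpt
dt = [0.01 0.1 1];                            % window time steps in seconds
counts = arrayfun(@(d) numel(0:d:9), dt);     % window positions = FFTs to compute
megabytes = counts*n*8/2^20;                  % 8 bytes per double, one row per position
for i = 1:numel(dt)
    fprintf('step %.2f s: %4d FFTs, about %.0f MB\n', dt(i), counts(i), megabytes(i));
end
```

Each window position costs one FFT of length n and one row of storage, so both run time and memory grow in proportion to the number of positions. With the 0.01 second step the matrix alone takes roughly half a gigabyte.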

Other types of translational kernel: Mexican Hat and Shannon wavelets

By defining different types of function as translational kernel, one can explore the

spectrograms produced by different types of wavelet.

Figure 5. The spectrograms of Handel’s Messiah piece using the large window size (left) and using the

small window size (right). Notice that using the large window size has great frequency resolution but

misses out on the time resolution. In the contrast, using the small window size has excellent time

resolution but poor frequency resolution.

In this paper, the Mexican Hat and Shannon wavelets have been applied to the signal. Figures 7 and 8 show the spectrograms produced by the Mexican Hat wavelet and by the Shannon wavelet respectively. Both wavelets were scaled so that they have a window size of about 1 second. The time step was kept at 0.1 second. One can notice that both wavelets produce spectrograms similar to the one generated with the Gaussian, but they differ in resolution. The Shannon window picks up more information in the frequency domain than the Gaussian does because, unlike the Gaussian, which weights the center of the window most heavily, the Shannon window weights the signal equally throughout the window. A similar principle seems to apply to the Mexican Hat wavelet: with the two troughs added on both sides of the central peak, it picks up more frequency information at each window position than the Gaussian does.

Figure 6. The spectrograms of Handel’s Messiah piece by over sampling the piece (left) and under

sampling the piece (right). Notice that when window size is kept constant, over sampling results in

great resolutions in both domains while under sampling produces poor resolutions in both time and

frequency.

Figure 7. The spectrogram of Handel’s Messiah piece using the Mexican Hat wavelet

Mary Had a Little Lamb

Filtering the signal

To obtain a clean music score from the spectrograms, it is important to get rid of the overtones beforehand. This can be done by using the built in MATLAB low pass filter. Figure 9 shows the comparison of the unfiltered and filtered signals.

Reproduction of the music scores

Figure 8. The spectrogram of Handel’s Messiah piece using the Shannon wavelet

Figure 9. The comparison of unfiltered and filtered signals of piano (left) and recorder (right). One can notice that the overall amplitude of the frequencies is reduced after applying the low pass filter.

After filtering the initial signals to remove the overtones, one can produce spectrograms for both the piano and recorder samples by following procedures similar to those used for Handel’s Messiah. The left picture of Figure 10 shows the spectrogram of the piece played by piano and the right picture shows the spectrogram of the piece played by recorder. Using these spectrograms, one can reproduce the music score for both instruments by converting the center frequency of each note into the corresponding musical note.

The music score of each instrument, reconstructed from the information in the spectrogram, follows.

Piano

320Hz, 285Hz, 255Hz, 285Hz, 320Hz, 320Hz, 320Hz, 285Hz, 285Hz, 285Hz, 320Hz, 320Hz,

320Hz, 320Hz, 285Hz, 255Hz, 285Hz, 320Hz, 320Hz, 320Hz, 320Hz, 285Hz, 285Hz, 320Hz,

285Hz, 255Hz

which approximately corresponds to the notes

E4, C#4, B3, C#4, E4, E4, E4, C#4, C#4, C#4, E4, E4, E4, E4, C#4, B3, C#4, E4, E4, E4,

E4, C#4, C#4, E4, C#4, B3

This shows that the piano in the sound sample is slightly out of tune.

Recorder

1030Hz, 920Hz, 820Hz, 925Hz, 1040Hz, 1045Hz, 1030Hz, 910Hz, 910Hz, 910Hz, 1030Hz,

1040Hz, 1040Hz, 1040Hz, 910Hz, 810Hz, 910Hz, 1045Hz, 1040Hz, 1030Hz, 1030Hz, 910Hz,

910Hz, 1025Hz, 910Hz, 815Hz

which approximately corresponds to the notes

C6, A#5, G#5, A#5, C6, C6, C6, A#5, A#5, A#5, C6, C6, C6, C6, A#5, G#5, A#5, C6, C6, C6,

C6, A#5, A#5, C6, A#5, G#5

Hence, one can also notice that the recorder is slightly out of tune as well.
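The conversion from a measured center frequency to a note name can be sketched as follows. This is a minimal illustration, assuming twelve-tone equal temperament with A4 = 440 Hz. The names table and the variables semis, midi, cents and labels are introduced here and are not part of the scripts in Appendix B. The three recorder frequencies are taken from the start of the list above.

```matlab
names = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};
f = [1030 920 820];                       % first three recorder frequencies (Hz)

semis = 69 + 12*log2(f/440);              % fractional MIDI note number, A4 = 69
midi  = round(semis);                     % nearest equal-tempered note
cents = 100*(semis - midi);               % deviation from that note, in cents

labels = cell(size(f));
for i = 1:numel(f)
    % MIDI note 60 is C4, so the octave number is floor(midi/12) - 1
    labels{i} = sprintf('%s%d', names{mod(midi(i),12)+1}, floor(midi(i)/12)-1);
    fprintf('%5.0f Hz -> %s (%+.0f cents)\n', f(i), labels{i}, cents(i));
end
```

The sign of cents shows whether a note is flat (negative) or sharp (positive) relative to the nearest equal-tempered pitch, which quantifies the tuning remarks above. A frequency that lies close to halfway between two semitones, such as 320 Hz (roughly midway between D#4 and E4), is sensitive to rounding, so its label should be read with care.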

Figure 10. The spectrograms of ‘Mary had a little lamb’ played by piano (left) and recorder (right).

Notice that the frequency range is different for two instruments.

Overtones of each instrument

By comparing the spectrograms of the unfiltered signals from the piano and the recorder, one can spot the difference between the two instruments in terms of time frequency information. Figures 11 and 12 show the spectrograms for piano and recorder respectively. One can notice that for a given note, the piano keeps the corresponding frequency uniform, with almost zero variation in frequency within the same note.

Figure 11. The spectrogram of unfiltered ‘Mary had a little lamb’ played by piano. Notice the

overtones that are multiples of certain frequencies. Also, the variation of a frequency in a single note is

very small compared to that of the recorder.

Figure 12. The spectrogram of unfiltered ‘Mary had a little lamb’ played by recorder. The overtones

are also present here but not as much as the piano. Instead the variation of the frequency of a single

note is larger than that of piano.

In contrast, for the recorder one can easily notice a variation in frequency while the same note is played. When it comes to overtones, however, close inspection of the spectrograms shows that the piano has more overtones than the recorder.

Summary/Conclusion

By applying the Windowed Fourier transform technique, one can analyze signals that have time dependent frequencies. In this paper, a brief theoretical background of this technique has been introduced, and its implementation in MATLAB has been discussed step by step. The application of the Windowed Fourier transform to sound signals has been explored by analyzing Handel’s ‘Messiah’ and ‘Mary had a little lamb’ played on two different instruments. By modifying the window size and the sampling rate, it was possible to understand the ‘uncertainty principle’ of this technique when it comes to obtaining both time and frequency resolution. Also, by using different types of function as the translational kernel, a qualitative understanding of their effects on the spectrogram has been achieved. Finally, by using the spectrogram, the music scores of the sound signals have been reconstructed, and the difference between the sounds produced by the two instruments has been analyzed.

Appendix A

In this section, the MATLAB functions that have been used for analysis are

introduced with brief implementation explanation.

linspace: This function is used to define time and frequency domain/discretization.

fftshift: This function is used to shift the frequency domain data so that plotting is correct.

fft: This function is used to perform Fourier transform on the signal at given timestamp

during the Windowed Fourier transform process.

abs: This function is used to take the absolute value of frequency data produced by fft.

subplot: This function is used to produce multiple plots in a box to keep track of the process

of Windowed Fourier transform.

pcolor: This function is used to create a spectrogram.

colormap: This function is used to define the color spectrum of the spectrogram

xlabel,ylabel: These functions are used to label the x and y axis in the plot.

max: This function is used to locate the maximum value of the frequency data to normalize

the Mexican Hat wavelet.

heaviside: This function is used to create the Shannon function.

wavread: This function is used to convert the wave file into a MATLAB compatible vector (newer MATLAB releases provide audioread for this purpose).

butter: This function is used to define Butterworth filter parameters. The Butterworth filter is a built in MATLAB filter that was used to create a low pass filter in this paper.

filter: This function is used to filter the original sound file with the Butterworth filter parameters.

Appendix B

In this section, the coding for algorithms that are mentioned in this paper is presented.

Handel’s ‘Messiah’ with Gaussian wavelet

clear all; close all; clc

% Time and frequency grids for the 9-second excerpt
L=9; n=73112;
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);

% Load the Handel sample (column vector y) and turn it into a row vector
load handel
v = y'/2;
vv = v(1:73112);

% Slide a Gaussian window across the signal; each spectrum becomes one row
Sgt_spec=[];
tslide=0:0.1:9;
for j=1:length(tslide)
    g=exp(-15*(t-tslide(j)).^2); % Gaussian
    Sg=g.*vv;
    Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,vv,'k',t,g,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end

close all;

% Spectrogram: time on the x axis, frequency in Hz on the y axis
pcolor(tslide,ks/(2*pi),Sgt_spec.'), shading interp
set(gca,'Ylim',[0 4000],'Fontsize',[10])
colormap(hot)
xlabel('time (sec)'); ylabel('frequency (Hz)');

Handel’s ‘Messiah’ with Mexican Hat wavelet

clear all; close all; clc

L=9; n=73112;
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);

load handel
v = y'/2;
vv = v(1:73112);
a=0.3; % window size parameter

Sgt_spec=[];
tslide=0:0.1:9;
for j=1:length(tslide)
    % Mexican Hat; the constant prefactor cancels in the normalization below
    g=2/(sqrt(3*a)*pi^(1/4))*(1-((t-tslide(j)).^2/a^2)).*exp((-(t-tslide(j)).^2)/(2*a^2));
    mm=max(g);
    normm=g/mm; % scale the peak of the window to 1
    Sg=normm.*vv;
    Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,vv,'k',t,normm,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end

close all;
pcolor(tslide,ks/(2*pi),Sgt_spec.'), shading interp
set(gca,'Ylim',[0 4000],'Fontsize',[10])
colormap(hot)
xlabel('time (sec)'); ylabel('frequency (Hz)');

Handel’s ‘Messiah’ with Shannon wavelet

clear all; close all; clc

L=9; n=73112;
t2=linspace(0,L,n+1); t=t2(1:n);
k=(2*pi/L)*[0:n/2-1 -n/2:-1]; ks=fftshift(k);

load handel
v = y'/2;
vv = v(1:73112);

Sgt_spec=[];
tslide=0:0.1:9;
for j=tslide
    % Step window built from Heaviside functions. As written it equals
    % +1 on (j, j+0.5), -1 on (j+0.5, j+1) and 0 elsewhere.
    h=2*heaviside(t-j)-1;
    hh=4*heaviside(t-(j+0.5))-1;
    hhh=heaviside(t-(j+1));
    sh=((h-hh)/2+hhh); % Shannon
    Sg=sh.*vv;
    Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,vv,'k',t,sh,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end


‘Mary had a little lamb’ piano version analysis

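clear all; close all; clc;

tr_piano=16; % record time in seconds
y=wavread('music1'); Fs=length(y)/tr_piano;
Mary=y';

% Time and frequency grids. These definitions are missing from the listing
% and are assumed here to follow the Handel scripts above.
L=tr_piano; n=length(Mary);
t=(0:n-1)/Fs;
k=(2*pi/L)*[0:ceil(n/2)-1 -floor(n/2):-1]; ks=fftshift(k);

% Second order Butterworth low pass filter, cutoff at 0.1 of the Nyquist frequency
[B,A]=butter(2,0.1,'low');
CMary=filter(B,A,Mary);

Sgt_spec=[];
tslide=0:0.5:16;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian
    Sg=g.*CMary;
    Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,CMary,'k',t,g,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end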


‘Mary had a little lamb’ recorder version analysis

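clear all; close all; clc;

tr_rec=14; % record time in seconds
y=wavread('music2'); Fs=length(y)/tr_rec;
Mary=y';

% Time and frequency grids. These definitions are missing from the listing
% and are assumed here to follow the Handel scripts above.
L=tr_rec; n=length(Mary);
t=(0:n-1)/Fs;
k=(2*pi/L)*[0:ceil(n/2)-1 -floor(n/2):-1]; ks=fftshift(k);

% Second order Butterworth low pass filter, cutoff at 0.1 of the Nyquist frequency
[B,A]=butter(2,0.1,'low');
CMary=filter(B,A,Mary);

Sgt_spec=[];
tslide=0:0.5:14;
for j=1:length(tslide)
    g=exp(-20*(t-tslide(j)).^2); % Gaussian
    Sg=g.*CMary;
    Sgt=fft(Sg);
    Sgt_spec=[Sgt_spec; abs(fftshift(Sgt))];
    subplot(3,1,1), plot(t,CMary,'k',t,g,'r')
    subplot(3,1,2), plot(t,Sg,'k')
    subplot(3,1,3), plot(ks,abs(fftshift(Sgt))/max(abs(Sgt)))
    drawnow
end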

