l29: fourier analysis - texas a&m university | college station, tx
TRANSCRIPT
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 1
L29: Fourier analysis
β’ Introduction
β’ The discrete Fourier Transform (DFT)
β’ The DFT matrix
β’ The Fast Fourier Transform (FFT)
β’ The Short-time Fourier Transform (STFT)
β’ Fourier Descriptors
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 2
Introduction
β’ Similarity between time series β Suppose that you are to determine whether two time series π₯(π) and π¦(π) are similar
β One measure of alignment is the inner product of the two signals
π₯, π¦ = π₯ π π¦ π
π
β’ If the inner product is large, then the two signals are very much in in alignment
β’ If the inner product is zero, the two signals are orthogonal
x(k)
y(k)
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 3
β The Euclidean distance is another measure of (dis)similarity π₯ β π¦ 2 = π₯ 2 β 2 π₯, π¦ + π¦ 2
β’ Note that, if we assume that the two signals have unit norm
π₯ 2 = π¦ 2 = 1
β’ then the Euclidean distance and the inner product are equivalent
β Small distance β large inner product
β Large distance β small inner product
β’ For this reason, we will use the inner product for the rest of this lecture
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 4
β’ Example β Assume the following time series
π₯ = β¦1,1, β1,β1,1,1 β 1,β1,1,1 β 1,β1,β¦ π¦ = β¦1,β1,1, β1,1, β1,1, β1,1, β1,1, β1,β¦
β Compute their inner product
β’ What can you say about their degree of similarity?
β How about the degree of similarity with the signal π§ below? π§ = β¦ , 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, β¦
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 5
β Since all inner products are zero, the three signals (x,y,z) are orthogonal, and therefore independent
β Thus, linear combinations of these signals defines a subspace with three dimensions
π’ = π1π₯ + π2π¦ + π3π§
x
y
z
u
v
a1
a2
a3
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 6
β Likewise, two sine waves (shown below) are orthogonal whenever their frequencies are different (π1 β π2)
π₯ π‘ = sin 2ππ1π‘ π¦ π‘ = sin 2ππ2π‘
β As we will see, a family of sine functions (for all possible frequencies ππ) is at the core of Fourier analysis
β Since sine waves are orthogonal, the analysis is dramatically simplified (e.g., a unique representation exists for every conceivable signal)
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 7
Cross-correlation and autocorrelation
β’ Definition β The inner-product operator allows us to define the cross-correlation
between two continuous signals π₯(π‘) and π¦(π‘) as:
π π₯π¦ π = π₯ π π¦ π + π
β
π=ββ
β’ where π is a shift applied to π¦(π‘)
β Or, for continuous-time signals
π π₯π¦ π = π₯ π‘ π¦ π‘ + πβ
ββ
ππ‘ = π₯ π‘ , π¦(π‘ + π)
β When the cross-correlation is applied to a signal and a copy of itself, it is called the autocorrelation
π π₯π₯ π = π₯ π π₯ π + π
β
π=ββ
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 8
β’ Example β Recall the two signals on slide 2
β The cross correlation function reveals that one signal is very close (in our case identical ) to a delayed version of the other
x(k)
y(k-100)
-200 -150 -100 -50 0 50 100 150 200 -0.4
-0.2
0
0.2
0.4
0.6
0.8
1
Rxy()
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 9
β’ Example II
100 200 300 400 500 600 700 800 900 1000
-3
-2
-1
0
1
2
-200 -150 -100 -50 0 50 100 150 200
0
0.2
0.4
0.6
0.8
1
Rxx()
k
x(k)
x(k) = randn(1000,1) x(k) = 100sin(20.01k)
100 200 300 400 500 600 700 800 900 1000 -100
-50
0
50
100
-200 -150 -100 -50 0 50 100 150 200
-0.5
0
0.5
1
Rxx()
k
x(k)
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 10
The Fourier Transform
β’ In Fourier analysis, one represents a signal with a family of sinusoidal functions β Recall from a few slides back that sine waves of different frequencies
are orthogonal, so this representation is unique to each signal
β Fourier analysis transforms the signal from a βtime-domainβ representation π₯(π‘) into a βfrequency-domainβ representation π(π)
β The collection of values of π(π) at each and every frequency π is called the spectrum of π₯(π‘)
β’ Mathematically, the Fourier Transform is defined as
π π = π₯ π‘ πβπ2πftππ‘ = π₯ π‘ , ππ2πftβ
ββ
β which you can recognize as the inner product between our signal π₯(π‘) and the complex sine wave ππ2πft
β’ Recall Eulerβs formula πΒ±ππ = cos π Β± π sin π
β’ And the inner product of functions π and π being defined as
π, π = ππβππ‘β
ββ
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 11
β’ Interpretation of the Fourier Transform β The Fourier Transform π(π) is defined for each and every frequency π
β’ Each term in π(π) represents the inner product of our signal π₯(π‘) with a sine wave of frequency π
β’ π(π) is a complex number with magnitude π and phase π , which represent the sine wave that is βclosestβ to π₯(π‘)
β’ Because the sine waves are orthogonal, their magnitudes m represent the amount of frequency π that is present in π₯(π‘)
β The collection of values of π(π) for every frequency (each defined by a magnitude π and phase π) is called the spectrum of π₯(π‘)
β The Fourier Transform is lossless and invertible, which means that the original signal π₯(π‘) can be perfectly reconstructed from π(π)
β’ This reconstruction is achieved by means of the INVERSE Fourier transform
π₯ π‘ = π π ππ2πftππ = π π , πβπ2πftβ
ββ
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 12
β’ The Fourier Transform as a sound βprismβ
[Sethares (2007). Rhythms and transforms]
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 13
The Discrete Fourier Transform
β’ The DFT differs from the Fourier Transform in three respects β It applies to discrete-time sequences π₯[π] = π₯(ππ), where π is the
sampling period of a continuous-time signal π₯(π‘)
β Because we operate in discrete time, the frequency representation is also discrete, and the transform is a summation rather than an integral
β Finally, we work with a finite data record (i.e., we do not have access to the value of the signal for π β β)
β’ Mathematically, the DFT is defined as
π π = π₯ π πβπ2ππ ππ
πβ1
π=0
= π₯ π , πβπ2ππ ππ , π = 0,1,2β¦π β 1
β So the DFT is (again) the inner product of our signal π₯[π] with a sine wave
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 14
Frequency vs. time resolution β’ The DFT is only defined at frequency multiples of 2π /π, which
can be thought of as a βfundamental frequencyβ β NOTE: 2π radians correspond to the sampling frequency in Hz
β Therefore, for a given window size, the frequency resolution of the DFT is
Ξπ = ππ β ππβ1 = π2π
πβ π β 1
2π
π=
2π
π=
π πππππππ πππ‘π (π»π§)
π€πππππ€ π ππ§π #π π.
β So, the longer the recording, the better the frequency resolution
β’ Why not then use long analysis windows? β Because longer windows reduce the
temporal resolution of frequency events
β Therefore, there is a trade-off between spectral resolution (long windows) and temporal resolution (shorter windows)
β NOTE: Zero-padding can be used to increase the smoothness (or apparent resolution) of the DFT spectrum, but not its true resolution, which remains limited by the length of the original (unpadded) signal
[Sethares, 2007]
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 15
β’ The DFT matrix β Let us denote the βfundamental frequencyβ signal as
ππ = πβπ2ππ = cos 2π π β π sin 2π π
β Then, the DFT can be expressed as
π π = π₯ π ππππ
πβ1
π=0
β Or, using matrix notation, as
π 0π 1π 2
π π β 1
=
1 1 1 1 11 ππ ππ
2 πππβ1
1 ππ2 ππ
4 ππ2(πβ1)
1 πππβ1 ππ
2(πβ1)ππ
(πβ1)(πβ1)
π₯ 0π₯ 1π₯ 2
π₯ π β 1
β So the DFT can also be thought of as a projection of the time series data by means of a complex-valued matrix
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 16
β’ Symmetry of the DFT matrix β Note that the k-th row of the DFT matrix consist of a unitary vector
rotating clockwise with a constant increment of 2ππ/π
β The second and last row are complex conjugates
β The third and second-to-last are complex conjugatesβ¦
X[0]
X[1]
X[2]
X[3]
X[4]
X[5]
X[6]
X[7]
X[8]
x[0]
x[1]
x[2]
x[3]
x[4]
x[5]
x[6]
x[7]
x[8]
=
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 17
β’ Interpretation of the DFT β So, expressing these rotating unitary vectors in terms of the underlying
sine waves, we obtain
β where the solid line represents the real part and the dashed line represent the imaginary part of the corresponding sine wave
β Note how this illustration brings us back to the definition of the DFT as an inner product between our signal π₯[π] and a complex sine wave
Illustration borrowed from http://en.wikipedia.org/wiki/DFT_matrix
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 18
β’ Example I β Sampling rate πΉπ = 2ππ»π§
β Signal π₯(π‘) = sin (2π10π‘)
β Recording length 1 sec
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -2
-1
0
1
2
t (sec) x(
t)
0 200 400 600 800 1000 1200 1400 1600 1800 2000 -10
-5
0
5
10
f (Hz)
|X(f
)|
2 πΉπ/2 (Nyquist rate)
0 20 40 60 80 100 -10
-5
0
5
Symmetry around πΉπ/2
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 19
β’ Example II β Sampling rate πΉπ = 2ππ»π§
β Signal π₯(π‘) = 10sin (2π10π‘) + 3sin (2π100π‘)
β Recording length 1 sec
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 -15
-10
-5
0
5
10
15
t (sec)
π₯(π‘)
0 20 40 60 80 100 120 140 160 180 200 -10
-5
0
5
10
f (Hz)
|π(π)|
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 20
The Fast Fourier Transform
β’ Definition β The FFT refers to any fast algorithm for computing the DFT
β The DFT runs in π(π2), whereas FFT algorithms run in π(ππππ2π)
β Several FFT algorithms exists, but the most widely used are radix-2 algorithms, which require π = 2π
β’ When the number of data points is not a power of 2, it is then just a matter of padding the sequence π₯[π] with zeros
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 21
β’ What happens when the signal is not stationary? β As we saw a few slides back, if the DFT/FFT is applied to the entire
signal, we will be unable to resolve the spectral changes over time
β Instead, we can divide the signal into βchunksβ, and apply the DFT/FFT to each one of them
β This strategy is known as the Short-Time Fourier Transform (STFT), and the resulting time-frequency representation is known as a spectrogram
β’ The SFTF preserves both temporal and spectral information β By adjusting the size of the βchunksβ, the STFT provides a tradeoff
between
β Perfect temporal resolution, as given by the original signal π₯(π‘)
β Perfect spectral resolution, as obtained by the Fourier Transform π(π)
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 22
β’ The SFTF is performed as follows β Define an analysis window size
(e.g., 30 ms for narrowband, 5 ms for wideband)
β Define the amount of overlap between windows (e.g., 30%)
β Define a windowing function (e.g., Hann, Gaussian)
β Generate windowed segments (by multiplying signal with the windowing function)
β Apply the FFT to each windowed segment
[Sethares, 2007]
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 23
β’ Windowing β The window function serves several purposes
β’ It localizes the Fourier Transform in time, by considering only a short time interval in the signal
β’ By having a smooth shape, it minimizes the effects (e.g., high side lobes) of chopping the signal into pieces
β’ By overlapping windows, it provides spectral continuity across time
β The windowing functions π€[π β ππ] must be such that, when overlapped, their sum is unity (or constant)
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 24
β The STFT is then computed as
π ππ, π‘π = π₯ π π€ π β π πβπ2ππ ππ
πβ1
π=0
= π₯ π ,π€ π β π ππ2ππ ππ
β where ππ is the n-th discrete frequency, and π‘π is the starting time of the i-th analysis window
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 25
β’ Example I
50 100 150 200
-1
-0.8
-0.6
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
time (sa.)
x(t)
0 100 200 300 400 500 -5
-4
-3
-2
-1
0
1
2
3
4
frequency (Hz)
X(f
)
time (frames)
SFTF
(H
z)
2 4 6 8
490
390
290
200
100
0
FFT 1024 points
Window length = 30ms Window shift = 1ms
FS = 2kHz Two concatenated sine waves
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 26
β’ Example II
1000 2000
-0.5
0
0.5
1
time (sa.)
x(t)
time (frames)
SFTF
(H
z)
50 100 150 200
5000
4000
3000
2000
1000
0
time (frames)
50 100 150 200 250
Window length = 40ms Window shift = 1ms
Window length = 5ms Window shift = 1ms
FS = 10kHz Voiced speech
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 27
Fourier descriptors
β’ Problem definition β Consider the object below, with contour defined in terms of the
coordinates of N points along its periphery
β’ We assume that these points are ordered (e.g., CW or CCW)
β which can be represented by a complex vector u as
π’ =
π₯0 + ππ¦0π₯1 + ππ¦1
π₯π + ππ¦π
(x0,y0)
(x1,y1)
(x2,y2)
(x3,y3)
(x4,y4)
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 28
β Taking the (one-dimensional) DFT of the complex vector π’, we obtain
π π = πΉπΉπ π’ = π’ π πβπ2ππ ππ
πβ1
π=1
β Properties of U(n)
β’ Translation (π’ β π’ + π) only affects the first FD (π(0) β π(0) + ππ)
β’ Scaling by a factor πΌ (π’ β πΌπ’) scales all FDs accordingly (π β πΌπ)
β’ Rotation by an angle π, results in a phase shift (π β ππππ)
β’ Changing the starting point by π positions (π’[π] β π’[π + π]), also results in a phase shift π(π) β ππ2πππ/ππ(π)
CSCE 666 Pattern Analysis | Ricardo Gutierrez-Osuna | CSE@TAMU 29
β Hence, by ignoring π(0) and π(1), taking norms, and dividing by π(1)
π π =π π
π 1 π = 2,3β¦ ,π β 1
β The coefficients become translation-, scale-, rotation-, and start-point-invariant
β These are known as the Fourier Descriptors of the shape defined by u
β’ However, by ignoring the phase of π(π), an essential part of the contour is lost (e.g., two different shapes may have the same FDs)
β Additionally, smooth versions of the original contour can be obtained by performing the IDFT on a subset of the coefficients π(π)
Original shape
n=1..5 n=1..13 n=1..25 n=1..65
[Krzyzak et al. 1988]