Time-Scale Modification of Speech Signals
Bill Floyd
ECE 5525 – Digital Speech Processing
December 14, 2004
Objectives
  Introduction
  Background
  Theory
  Methods
  Examples
  Matlab Code
  Short Time Fourier Transform
  Short Time Fourier Transform Magnitude
  Speech Samples
  Conclusion
  Questions
  References
Introduction
  Goal
    To either speed up or slow down a speech signal while maintaining the approximate pitch
  Applications
    Change voice mail playback speed
    Court stenographers: play proceedings back more quickly
    Sound effects
    Etc.
Introduction
  Option 1 – Change the sample rate
    If you modify the playback sample rate, you change the speed, but the pitch changes as well
    Increase sample rate = higher pitch (chipmunk sound)
    Decrease sample rate = lower pitch (drawn-out echo sound)
  Option 2 – Decimate or interpolate the signal
    If you change the number of samples, the result is the same as modifying the sample rate
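To illustrate Options 1 and 2, here is a minimal Matlab sketch on a synthetic tone rather than speech. The 10,000 samples/sec rate and the 200 Hz tone are assumptions chosen for illustration. The point is only that both options halve the duration and double the pitch together.

```matlab
fs = 10000;                 % assumed sample rate (samples/sec)
f0 = 200;                   % assumed test-tone frequency (Hz)
t  = (0:fs-1)' / fs;        % one second of time stamps
x  = sin(2*pi*f0*t);

% Option 1: play the same samples back at twice the sample rate.
fs_fast       = 2 * fs;
duration_fast = length(x) / fs_fast;   % seconds of playback
pitch_fast    = f0 * fs_fast / fs;     % perceived tone frequency (Hz)

% Option 2: keep every other sample and play at the original rate.
y            = x(1:2:end);
duration_dec = length(y) / fs;

% Locate the strongest FFT bin of the decimated tone.
Y = abs(fft(y));
[~, k]    = max(Y(1:floor(length(y)/2)));
pitch_dec = (k - 1) * fs / length(y);

fprintf('Option 1: %.2f s at %g Hz\n', duration_fast, pitch_fast);
fprintf('Option 2: %.2f s at %g Hz\n', duration_dec, pitch_dec);
```

Both options end up with half the duration and twice the tone frequency, which is the behavior Option 3 is meant to avoid.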
Introduction
  Option 3 – Use more complex methods
    These change the speed of the signal while preserving the pitch data
    Short Time Fourier Transform
    Short Time Fourier Transform Magnitude
    Sinusoidal Synthesis
    Linear Prediction Synthesis
Terminology
  [Figure: "Window Representation" plot with the Window Size and Frame Rate marked]
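As a rough illustration of the two terms, the sketch below assumes the values used in the deck's Matlab code (window size of 200 samples, frame rate of 100 samples) and a file sampled at 10,000 samples/sec. "Frame rate" is taken here to mean the step, in samples, between the starts of successive windows. The 1000-sample signal length is hypothetical.

```matlab
fs          = 10000;   % samples/sec (assumed)
window_size = 200;     % samples per window
frame_rate  = 100;     % samples between successive window starts

window_ms = 1000 * window_size / fs;       % window length in ms
step_ms   = 1000 * frame_rate / fs;        % window spacing in ms
overlap   = 1 - frame_rate / window_size;  % fraction shared by neighbors

% Start index of every full window in a hypothetical 1000-sample signal.
num_samples = 1000;
starts      = 1:frame_rate:(num_samples - window_size + 1);

fprintf('%g ms windows every %g ms (%g%% overlap), %d windows\n', ...
        window_ms, step_ms, 100*overlap, numel(starts));
```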
Theory
  Short Time Fourier Transform Methods
    Chapter 7 in our text (Discrete-Time Speech Signal Processing)
    Refer to the in-class notes for the mathematical theory of operation
    I will pick up from where Dr. Kepuska stopped in his notes
Short Time Fourier Transform
  Short Time Fourier Transform
    Also called the Fairbanks method
    Extract successive short-time segments and then discard the following ones
  [Block diagram: Signal -> STFT -> Decimate Samples -> IFFT -> OLA -> Output]
Short Time Fourier Transform
  Frame rate factor L
    In the frequency domain, after taking the STFT, you get X(nL, ω)
  Form a new signal by
    Y(nL, ω) = X(snL, ω), where s = compression factor
  Take the inverse Fourier transform
  Use the Overlap and Add method to form the new signal
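The steps above can be sketched in a few lines of Matlab. This is a minimal illustration under stated assumptions, not the code from this project: the input is a synthetic 200 Hz tone, the triangular window is built by hand so no toolbox is needed, and the names `keep`, `X`, and `Y` are hypothetical.

```matlab
fs = 10000; N = 200; L = 100; s = 2; nfft = 1024;  % assumed settings
t  = (0:fs-1)' / fs;
x  = sin(2*pi*200*t);                  % synthetic stand-in for speech

% Hand-built triangular window of even length N.
n    = (1:N/2)';
half = (2*n - 1) / N;
w    = [half; flipud(half)];

% STFT: window every L samples and take the FFT, giving X(nL, w).
num_frames = floor((length(x) - N) / L) + 1;
X = zeros(nfft, num_frames);
for k = 1:num_frames
    seg    = x((k-1)*L + (1:N));
    X(:,k) = fft(seg .* w, nfft);
end

% Keep every s-th frame: Y(nL, w) = X(snL, w).
keep = 1:s:num_frames;
Y    = X(:, keep);

% Inverse FFT and overlap-add, still spaced L samples apart.
y = zeros((numel(keep) - 1)*L + N, 1);
for k = 1:numel(keep)
    frame  = real(ifft(Y(:,k)));
    idx    = (k-1)*L + (1:N);
    y(idx) = y(idx) + frame(1:N) * (L / sum(w));
end

fprintf('input %d samples, output %d samples\n', length(x), length(y));
```

With s = 2 the output is roughly half as long as the input, because half of the frames are discarded while the frame spacing stays at L.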
Short Time Fourier Transform
  [Figure: two window-sequence plots, labeled X(nL, ω) and Y(nL, ω) = X(2nL, ω)]
Short Time Fourier Transform
  [Figure: "Window Representation" plots of the Original Windowed Sequence and the New Sequence]
Short Time Fourier Transform
  Problems
    Pitch synchronization
      It is highly likely that the pitch periods will not line up properly
Short Time Fourier Transform Magnitude
  Short Time Fourier Transform Magnitude
    Problems with the STFT method relate directly to the linear phase component of the STFT
      Time shift = phase change
    An alternate approach is to use only the magnitude portion of the STFT: the Short Time Fourier Transform Magnitude (STFTM)
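The "time shift = phase change" point can be checked numerically. The sketch below assumes an arbitrary bump-shaped test sequence and a circular shift of 5 samples (both hypothetical). It shows that the shift leaves the FFT magnitude untouched and only adds a linear phase term.

```matlab
N  = 64;
n  = (0:N-1)';
x  = exp(-((n - 20).^2) / 18);     % arbitrary test sequence (assumed)
n0 = 5;                            % delay in samples (assumed)

x_shift = circshift(x, n0);        % circular delay by n0 samples
X  = fft(x);
Xs = fft(x_shift);

% The magnitude is unaffected by the shift.
mag_diff = max(abs(abs(X) - abs(Xs)));

% The shift shows up as the linear phase factor exp(-j*2*pi*k*n0/N).
k         = (0:N-1)';
expected  = X .* exp(-1j * 2*pi * k * n0 / N);
phase_err = max(abs(Xs - expected));

fprintf('magnitude difference %.2e, phase model error %.2e\n', ...
        mag_diff, phase_err);
```

Both printed numbers should be at round-off level, which is why working with the magnitude alone sidesteps the alignment problem.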
Short Time Fourier Transform Magnitude
  Compression
    With the Fairbanks method, time slices were discarded
    Now we can just compress the time slices
    Form a new signal by
      |Y(nM, ω)| = |X(nL, ω)|, where M = compression factor = L / speed
      i.e., for speeding up by two, M = L/2
Short Time Fourier Transform Magnitude
  Compression
    Take the inverse Fourier transform
    Use the Overlap and Add method to form the new signal
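A minimal sketch of the compression idea follows, under several assumptions: a synthetic tone stands in for speech, the window is a hand-built triangle, and each frame is rebuilt directly from its FFT magnitude (zero phase) before overlap-adding at the smaller spacing M. That direct inversion is a simplification chosen for brevity. The `M / sum(w)` scaling is likewise an assumption, made to keep the summed windows near unity at the new spacing.

```matlab
fs = 10000; N = 200; L = 100; speed = 2; nfft = 1024;  % assumed settings
M  = L / speed;                        % synthesis spacing, M = L/speed
t  = (0:fs-1)' / fs;
x  = sin(2*pi*200*t);                  % synthetic stand-in for speech

% Hand-built triangular window of even length N.
n    = (1:N/2)';
half = (2*n - 1) / N;
w    = [half; flipud(half)];

num_frames = floor((length(x) - N) / L) + 1;
y = zeros((num_frames - 1)*M + N, 1);
for k = 1:num_frames
    seg   = x((k-1)*L + (1:N)) .* w;   % analysis frame at spacing L
    mag   = abs(fft(seg, nfft));       % |X(nL, w)|, phase discarded
    frame = real(ifft(mag));           % zero-phase frame (simplification)
    idx   = (k-1)*M + (1:N);           % placed at spacing M instead of L
    y(idx) = y(idx) + frame(1:N) * (M / sum(w));
end

fprintf('M = %d, input %d samples, output %d samples\n', ...
        M, length(x), length(y));
```

Every frame is kept, unlike the Fairbanks approach. The time scale changes only because the frames are laid down closer together.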
Short Time Fourier Transform Magnitude
  [Figure: two window-sequence plots, labeled X(nL, ω) and Y(nM, ω) = X(nL, ω), with M = L/2]
Short Time Fourier Transform Magnitude
  [Figure: "Window Representation" plots of the Original Windowed Sequence and the New Sequence]
Other Methods
  Sinusoidal Synthesis (Chapter 9)
    Time-warp the sinewave frequency track and the amplitude function
    This technique has been successful with not only speech but also music, biological, and mechanical signals
  Problems
    Does not maintain the original phase relations
    Suffers from reverberance
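To give a feel for time-warping the tracks, the sketch below stretches a single synthetic sinewave. All names and values are hypothetical: one frequency track gliding from 150 to 250 Hz, a smooth amplitude envelope, and a stretch factor `rho` of 2. A real sinusoidal model would carry many such tracks estimated from speech.

```matlab
fs  = 10000;     % samples/sec (assumed)
dur = 0.5;       % original duration in seconds (assumed)
rho = 2;         % stretch factor: output lasts rho times longer (assumed)

% Original frequency and amplitude tracks for one sinewave.
t          = (0:round(dur*fs) - 1)' / fs;
freq_track = 150 + 100 * t / dur;              % Hz
amp_track  = 0.5 * (1 - cos(2*pi*t/dur));      % smooth envelope
x          = amp_track .* sin(2*pi*cumsum(freq_track)/fs);

% Time-warp: read both tracks at t/rho, then rebuild the phase by
% accumulating the warped frequency so the pitch range is unchanged.
t_new    = (0:round(rho*dur*fs) - 1)' / fs;
freq_new = interp1(t, freq_track, t_new/rho, 'linear', 'extrap');
amp_new  = interp1(t, amp_track,  t_new/rho, 'linear', 'extrap');
y        = amp_new .* sin(2*pi*cumsum(freq_new)/fs);

fprintf('%d -> %d samples, frequency range %.0f-%.0f Hz\n', ...
        length(x), length(y), min(freq_new), max(freq_new));
```

Rebuilding the phase from the warped frequency is also why the original phase relations are not preserved.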
Other Methods
  Linear Prediction Synthesis
    Use homomorphic and linear prediction results to modify the time base
    The book briefly mentions that this is possible, but I ran out of time before I could investigate this process further
Other Methods
  New techniques
    An Internet search showed several methods trying to improve on what is out there now
  Software
    Different software programs will change the speed for you
    Adobe Audition is one of the most all-encompassing right now
Matlab Code-Prepare the Workspace
%%%%%%%%%%%%%%%%
% Prepare Workspace
%%%%%%%%%%%%%%%%
close all;
clear all;

window_size_1 = 200;   % window length in samples
frame_rate_1 = 100;    % samples between successive windows

% Speed factor (2 = twice as fast, 0.5 = half speed)
speed = 2;
Matlab Code-Load the Speech Signal
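%%%%%%%%%%%%%%%%
% Load Data File
%%%%%%%%%%%%%%%%
% The 's' option returns the typed text as a string without evaluating it
filename = input('Please enter the file name to be used. ', 's');
% Note: wavread has been removed from newer MATLAB releases; audioread is the replacement there
[sample_data,sample_rate,nbits] = wavread(filename);
loop_time = floor(max(size(sample_data))/frame_rate_1);
% Zero-pad past the last sample so the final window is complete
sample_data((max(size(sample_data))+1):(loop_time+1)*frame_rate_1)=0;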
Matlab Code-Develop the Window
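%%%%%%%%%%%%%%%%
% Create Windows
%%%%%%%%%%%%%%%%
% Want windows of 25ms
% File sampled at 10,000 samples/sec
% Want a window of size 10000 * 25ms(10ms)
% Note: window_size_1 = 200 samples at 10,000 samples/sec is 20 ms
triangle_30ms = triang(window_size_1);
%triangle_30ms = hamming(window_size_1);
% W0 is the sum of the window samples, used later to scale the overlap-add
W0 = sum(triangle_30ms);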
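The role of W0 can be checked with a small sketch. It assumes the same sizes as the code (200-sample window, 100-sample frame rate) and builds the triangular window by hand, which should match `triang` for an even length. The ten-frame count is hypothetical.

```matlab
N = 200; L = 100;                  % window size and frame rate in samples

% Hand-built triangular window of even length N.
n    = (1:N/2)';
half = (2*n - 1) / N;
w    = [half; flipud(half)];

W0    = sum(w);                    % sum of the window samples
scale = L / W0;                    % factor applied before overlap-add

% Overlap-add ten copies of the window spaced L samples apart.
num_frames = 10;
ola = zeros((num_frames - 1)*L + N, 1);
for k = 1:num_frames
    idx      = (k-1)*L + (1:N);
    ola(idx) = ola(idx) + w;
end

% Away from the two ends, the summed windows should be flat.
interior = ola(L+1 : end-L);
ripple   = max(interior) - min(interior);

fprintf('W0 = %g, scale = %g, interior sum = %g, ripple = %.1e\n', ...
        W0, scale, interior(1), ripple);
```

For this window the overlapping copies sum to a constant and the scale factor comes out as one, so the scaling has no visible effect until the window type changes.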
Matlab Code-Window the Entire Speech Signal
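%%%%%%%%%%%%%%%%
% Window the speech
%%%%%%%%%%%%%%%%
% Each column holds one windowed frame. The index range below spans
% 2*frame_rate_1 samples, so it assumes window_size_1 = 2*frame_rate_1.
for i =0:loop_time-1
    window_data(:,i+1)=sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)).*triangle_30ms;
end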
Matlab Code-Perform the Fast Fourier Transform
%%%%%%%%%%%%%%%%
% Create FFT
%%%%%%%%%%%%%%%%
for i = 1:loop_time
    window_data_fft(:,i) = fft(window_data(:,i),1024);
end
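As a side note, `fft` applied to a matrix works column by column, so a per-frame loop like the one above can also be written as a single call. The sketch below checks this on three made-up 200-sample frames. The frame contents are arbitrary assumptions.

```matlab
nfft = 1024;
m    = (0:199)';

% Three arbitrary 200-sample frames, one per column (made-up data).
frames = [sin(2*pi*m/40), cos(2*pi*m/25), m/199];

% Column-by-column loop, mirroring the loop style used above.
looped = zeros(nfft, size(frames, 2));
for i = 1:size(frames, 2)
    looped(:,i) = fft(frames(:,i), nfft);
end

% Single call: fft zero-pads and transforms every column.
vectorized = fft(frames, nfft);

max_diff = max(abs(looped(:) - vectorized(:)));
fprintf('largest difference between the two forms: %.1e\n', max_diff);
```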
Matlab Code-Recreate the Modified Signal
%%%%%%%%%%%%%%%%
% Recreate Original Signal
%%%%%%%%%%%%%%%%
%Initialize the recreated signals
reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
real_reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed))=0;
modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed))=0;
Matlab Code-Recreate the Modified Signal
% Perform the ifft
% The real_* variables are built from the FFT magnitude only (abs), so
% the phase is discarded; the others keep the full complex FFT.
for i = 1:loop_time
    recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
    real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);
    truncated_recreated_data_ifft(:,i) = recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
    real_truncated_recreated_data_ifft(:,i) = real_recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
end
Matlab Code-Recreate the Modified Signal
% Get back to the original signal
for i=0:loop_time-1
    reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + truncated_recreated_data_ifft(:,i+1)';
    real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i+1)';
end
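As a sanity check on the overlap-add step, the sketch below runs the window, FFT, inverse FFT, and overlap-add chain with no time-scale change and compares the result with the input. The test signal and variable names are hypothetical, and the triangular window is built by hand. Only the interior is compared, since the first and last 100 samples are covered by a single window.

```matlab
N = 200; L = 100; nfft = 1024;     % assumed settings

% Hand-built triangular window of even length N.
n    = (1:N/2)';
half = (2*n - 1) / N;
w    = [half; flipud(half)];
W0   = sum(w);

m = (0:999)';
x = sin(2*pi*m/37) + 0.3*cos(2*pi*m/11);   % hypothetical test signal

num_frames = floor((length(x) - N) / L) + 1;
y = zeros(size(x));
for k = 1:num_frames
    idx    = (k-1)*L + (1:N);
    X      = fft(x(idx) .* w, nfft);        % windowed frame to frequency
    frame  = real(ifft(X));                 % and straight back to time
    y(idx) = y(idx) + frame(1:N) * (L / W0);
end

% Compare only where two windows overlap.
interior = (L+1):(num_frames*L);
max_err  = max(abs(y(interior) - x(interior)));
fprintf('largest interior reconstruction error: %.1e\n', max_err);
```

An error at round-off level confirms that any audible change in the modified signals comes from the frame selection or spacing, not from the analysis and synthesis chain itself.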
Matlab Code-Recreate the Modified Signal
% Get a modified signal by deleting certain parts (STFT)
% Note: i*speed+1 is used as a column index, so this loop as written
% needs an integer value of speed.
for i=0:(loop_time-1)/speed
    modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i*speed+1)';
end
Matlab Code-Recreate the Modified Signal
% Initialize the compressed sequence (STFTM)
modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1)=truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1)';
% Get a modified signal by compressing
for i=0:(loop_time-2)
    modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) = modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) + real_truncated_recreated_data_ifft(:,i+2)';
end
Matlab Code-Plot Results
%%%%%%%%%%%%%%%%
% Plot Results
%%%%%%%%%%%%%%%%
figure;
subplot(211)
plot(sample_data)
title('Original Speech');
v1=axis;
hold on;
subplot(212)
plot(real(modified_reconstructed_signal))
title(['STFT Synthesis w/ Speed = ',num2str(speed),'X']);
v2=axis;
% Put both plots on the axes of whichever signal is longer
if speed > 1
    subplot(211); axis(v1)
    subplot(212); axis(v1)
else
    subplot(211); axis(v2)
    subplot(212); axis(v2)
end
Matlab Code-Write Sound Files
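%%%%%%%%%%%%%%%%
% Write sound files
%%%%%%%%%%%%%%%%
% Note: wavwrite has been removed from newer MATLAB releases; audiowrite is the replacement there
wavwrite(modified_reconstructed_signal,sample_rate,nbits,'C:\Classes\ECE_5525\tea party fairbanks 2x.wav')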
Examples
  Baseline Samples
    STFT Sound file
    STFTM Sound file
    Original File
    Sample Rate 2X
    Sample Rate .5X
Examples
  STFT: Speed 0.5X
  [Figure: waveforms of "Original Speech" and "STFT Synthesis w/ Speed = 0.5X", sample index (x 10^4) on the x-axis]
  Sound file
Examples
  STFT: Speed 2X
  [Figure: waveforms of "Original Speech" and "STFT Synthesis w/ Speed = 2X", sample index (x 10^4) on the x-axis]
  Sound file
Examples
  STFT: Speed 4X
  [Figure: waveforms of "Original Speech" and "STFT Synthesis w/ Speed = 4X", sample index (x 10^4) on the x-axis]
  Sound file
Examples
  STFTM: Speed 0.5X
  [Figure: waveforms of "Original Speech" and "STFTM Synthesis w/ Speed = 0.5X", sample index (x 10^4) on the x-axis]
  Sound file
Examples
  STFTM: Speed 2X
  [Figure: waveforms of "Original Speech" and "STFTM Synthesis w/ Speed = 2X", sample index (x 10^4) on the x-axis]
  Sound file
Examples
  STFTM: Speed 4X
  [Figure: waveforms of "Original Speech" and "STFTM Synthesis w/ Speed = 4X", sample index (x 10^4) on the x-axis]
  Sound file
More Results
  Change in window size
    If the window size becomes too small, then a change in pitch will occur
    Need window to be 2 to 3 pitch periods long
    I generally used 20 – 30 ms windows
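The window-length rule of thumb is easy to turn into numbers. The sketch below assumes a 100 Hz pitch, which is a hypothetical value picked because it gives round figures, and the 10,000 samples/sec rate mentioned in the code comments.

```matlab
fs = 10000;     % samples/sec, as in the code comments
f0 = 100;       % pitch in Hz (hypothetical example value)

pitch_period_ms = 1000 / f0;                    % one pitch period in ms
window_ms       = [2 3] * pitch_period_ms;      % 2 to 3 pitch periods
window_samples  = round(window_ms * fs / 1000); % same range in samples

fprintf('pitch period %g ms, window %g to %g ms (%d to %d samples)\n', ...
        pitch_period_ms, window_ms(1), window_ms(2), ...
        window_samples(1), window_samples(2));
```

For this assumed pitch the rule gives 20 to 30 ms, in line with the window lengths used here. A higher-pitched voice would allow shorter windows.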
More Results
  Change in frame rate
    If the frame rate decreases too much, then there will be too many samples overlapping to get an intelligible signal
More Results
  Change filter type
    Tried Hamming: not much perceptual difference
    Using the window energy becomes important here
      Frame Rate/W0 is not equal to one
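To see why the scaling matters for Hamming, the sketch below builds a 200-sample Hamming window from the standard symmetric formula (written out by hand so no toolbox is needed) and computes Frame Rate/W0 for a frame rate of 100 samples. The sizes are taken from the code. The variable names are hypothetical.

```matlab
N = 200; L = 100;                  % window size and frame rate in samples

% Symmetric Hamming window written out explicitly.
n     = (0:N-1)';
w_ham = 0.54 - 0.46 * cos(2*pi*n / (N - 1));

W0    = sum(w_ham);                % sum of the window samples
scale = L / W0;                    % Frame Rate / W0

% Sum of two overlapping windows across one frame step.
ola_interior = w_ham(1:L) + w_ham(L+1:N);

fprintf('W0 = %.2f, Frame Rate/W0 = %.4f\n', W0, scale);
fprintf('overlap sum %.3f to %.3f before scaling\n', ...
        min(ola_interior), max(ola_interior));
fprintf('overlap sum %.3f to %.3f after scaling\n', ...
        min(ola_interior)*scale, max(ola_interior)*scale);
```

Without the scale factor, the overlapped Hamming windows sum to about 1.08 rather than 1, so the output would be slightly louder than the input.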
Conclusion
  Optimum area
    Frame rate is one half of the window size
    Window size needs to be 2 to 3 pitch periods long
  It is possible to easily change the time scale and still maintain the original pitch, although the result is not always natural sounding
Conclusion
  Further investigation
    What to do when you want to slow down to less than half speed
      Using the STFTM means there will be gaps between the sequences
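The gap follows directly from M = L / speed. The sketch below assumes the sizes used in the code (200-sample window, 100-sample frame rate) and a few example speeds chosen for illustration. Once M exceeds the window size, consecutive windows no longer touch.

```matlab
window_size = 200;                 % samples
frame_rate  = 100;                 % samples (L)

speeds = [2 1 0.5 0.25];           % example speed factors (assumed)
M      = frame_rate ./ speeds;     % synthesis spacing for each speed

% Samples left uncovered between consecutive windows, if any.
gap = max(M - window_size, 0);

for k = 1:numel(speeds)
    fprintf('speed %.2fX: M = %3d samples, gap = %3d samples\n', ...
            speeds(k), M(k), gap(k));
end
```

At 0.5X the windows just meet with no overlap, and at 0.25X there are 200 uncovered samples between each pair of windows.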
Conclusion
  Further investigation
    What to do when you want to slow down to less than half speed
      Could replicate windowed segments
Conclusion
  Further investigation
    Use the other methods to determine quality
      Implement Sinusoidal Synthesis
      Implement Linear Predictive Synthesis using linear prediction and homomorphic methods
    Work on synchronizing pitch periods
      Shift samples so that the peaks line up
        Scott and Gerber, Synchronized Overlap and Add (SOLA)
        Cross-correlation of two samples to find peak
        Use the peaks to line up samples
      Align the window at same relative location within a pitch period
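The cross-correlation idea can be sketched as follows. This is an illustration under stated assumptions, not an implementation of any published method. Two synthetic periodic segments with a known 23-sample offset stand in for neighboring speech frames, and a normalized cross-correlation is searched over plus or minus half a pitch period. All names and values are hypothetical.

```matlab
fs = 10000; f0 = 125;              % assumed: pitch period of 80 samples
n  = (0:399)';
true_shift = 23;                   % known offset to recover (assumed)

% b is the same periodic waveform as a, 23 samples ahead of it.
a = sin(2*pi*f0*n/fs) + 0.5*sin(2*pi*2*f0*n/fs);
b = sin(2*pi*f0*(n + true_shift)/fs) + 0.5*sin(2*pi*2*f0*(n + true_shift)/fs);

max_lag = 40;                      % search half a pitch period each way
len     = 200;                     % samples compared at each lag
lags    = -max_lag:max_lag;
score   = zeros(size(lags));
ref     = a(max_lag + (1:len));

for k = 1:numel(lags)
    seg = b(max_lag + lags(k) + (1:len));
    % Normalized cross-correlation between the two segments.
    score(k) = sum(ref .* seg) / sqrt(sum(ref.^2) * sum(seg.^2));
end

% The lag with the highest score lines the pitch peaks up.
[~, best] = max(score);
best_lag  = lags(best);
fprintf('best lag = %d samples\n', best_lag);
```

Because b runs 23 samples ahead of a, the search settles on a lag of -23. Shifting a frame by the lag found this way before the overlap-add is what lets the pitch periods line up.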
Questions
Are there any questions?
References
Quatieri, Thomas E. Discrete-Time Speech Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2002.
Rabiner, L.R. and Schafer, R.W. Digital Processing of Speech Signals. Prentice Hall, Upper Saddle River, NJ, 1978.
Oppenheim, A.V. and Schafer, R.W. Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1975.
Scott, R. and Gerber, S. “Pitch Synchronous Time-Compression of Speech,” Proc. Conf. Speech Communications Processing, pp. 63-85, April 1972.
Fairbanks, G., Everitt, W.L., and Jaeger, R.P. “Method for Time or Frequency Compression-Expansion of Speech,” IEEE Transactions Audio and Electroacoustics, vol. AU-2, pp. 7-12, Jan. 1954.