
Time-Scale Modification of Speech Signals

Bill Floyd

ECE 5525 – Digital Speech Processing

December 14, 2004

Slide 2 of 49

Objectives

- Introduction
- Background
- Theory
  - Methods
- Examples
  - Matlab Code
  - Short Time Fourier Transform
  - Short Time Fourier Transform Magnitude
  - Speech Samples
- Conclusion
- Questions
- References

Slide 3 of 49

Introduction

- Goal
  - To either speed up or slow down a speech signal while maintaining the approximate pitch
- Applications
  - Change voice mail playback speed
  - Court stenographers: play proceedings quicker
  - Sound effects
  - Etc.

Slide 4 of 49

Introduction

- Option 1 – Change sample rate
  - If you modify the sample rate, you can change the speed, but the pitch is also changed
  - Increase sample rate = higher pitch (chipmunk sound)
  - Decrease sample rate = lower pitch (drawn-out echo sound)
- Option 2 – Decimate or interpolate the signal
  - If you change the number of samples, the result is the same as modifying the sample rate
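As a rough illustration of Option 2 (a sketch under assumed values, not taken from the original slides): the 10 kHz rate and the 100 Hz test tone below are assumptions, and the decimation is done naively, with no anti-aliasing filter. Keeping every other sample halves the duration, but when the result is played back at the same rate the tone moves up an octave.

```matlab
% Sketch: decimating by 2 halves the duration but doubles the pitch.
fs = 10000;                      % assumed sample rate (samples/sec)
f0 = 100;                        % assumed test-tone frequency (Hz)
n  = 0:fs-1;                     % one second of sample indices
x  = sin(2*pi*f0*n/fs);          % original tone
y  = x(1:2:end);                 % keep every other sample (no anti-alias filter)

% Locate the spectral peak of each signal, assuming playback at fs
X = abs(fft(x));
Y = abs(fft(y));
[~, kx] = max(X(1:floor(numel(x)/2)));
[~, ky] = max(Y(1:floor(numel(y)/2)));
peak_x = (kx-1)*fs/numel(x);     % peak frequency of the original (Hz)
peak_y = (ky-1)*fs/numel(y);     % peak frequency after decimation (Hz)

fprintf('length %d -> %d samples, peak %g Hz -> %g Hz\n', ...
        numel(x), numel(y), peak_x, peak_y);
```

This is why the methods on the following slides work on short windowed frames instead of on the sample count directly.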

Slide 5 of 49

Introduction

- Option 3 – Use more complex methods
  - These change the speed of the signal while preserving the pitch
  - Short Time Fourier Transform
  - Short Time Fourier Transform Magnitude
  - Sinusoidal Synthesis
  - Linear Prediction Synthesis

Slide 6 of 49

Terminology

[Figure: Window Representation. Overlapping windows with the Window Size and the Frame Rate labeled.]

Slide 7 of 49

Theory

- Short Time Fourier Transform methods
  - Chapter 7 in our text (Discrete-Time Speech Signal Processing)
  - Refer to the in-class notes for the mathematical theory of operation
  - I will pick up from where Dr. Kepuska stopped in his notes

Slide 8 of 49

Short Time Fourier Transform

- Short Time Fourier Transform
  - Also called the Fairbanks method
  - Extract successive short-time segments and then discard the following ones

[Block diagram: Signal -> STFT -> Decimate Samples -> IFFT -> OLA -> Output]

Slide 9 of 49

Short Time Fourier Transform

- Frame rate factor L
- In the frequency domain, after taking the STFT, you get X(nL, ω)
- Form a new signal by Y(nL, ω) = X(snL, ω), where s = compression factor
- Take the inverse Fourier transform
- Use the overlap-and-add method to form the new signal
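The steps on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the test tone, the 10 kHz rate, and all variable names are assumptions, and the triangular window is written out by hand so that no toolbox function is needed. Because the inverse FFT of an unmodified frame returns the frame itself, the sketch skips the transform pair and overlap-adds the windowed frames directly.

```matlab
% Sketch: Y(nL, w) = X(snL, w), i.e. keep every s-th frame and
% overlap-add the kept frames at the original step L.
fs = 10000;                                  % assumed sample rate
L  = 100;                                    % frame step (samples)
N  = 2*L;                                    % window size (samples)
s  = 2;                                      % compression factor
x  = sin(2*pi*100*(0:4999).'/fs);            % assumed test signal (column)
w  = 1 - abs(((0:N-1).' - (N-1)/2)/(N/2));   % triangular window

num_frames = floor((numel(x) - N)/L) + 1;    % frames available at step L
kept = 1:s:num_frames;                       % frames that survive

y = zeros((numel(kept)-1)*L + N, 1);
for m = 1:numel(kept)
    src = (kept(m)-1)*L + (1:N);             % read position in the input
    dst = (m-1)*L + (1:N);                   % write position in the output
    y(dst) = y(dst) + x(src).*w;             % overlap and add
end

fprintf('%d of %d frames kept, %d -> %d samples\n', ...
        numel(kept), num_frames, numel(x), numel(y));
```

With this window and step the overlapped windows sum to one, so no extra gain correction is applied here.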

Slide 10 of 49

Short Time Fourier Transform

[Figure: two panels of overlapping windows. Top: X(nL, ω). Bottom: Y(nL, ω) = X(2nL, ω).]

Slide 11 of 49

Short Time Fourier Transform

[Figure: Window Representation. Top panel: Original Windowed Sequence. Bottom panel: New Sequence.]

Slide 12 of 49

Short Time Fourier Transform

- Problems
  - Pitch synchronization: it is highly likely that the pitch periods will not line up properly

Slide 13 of 49

Short Time Fourier Transform Magnitude

- Short Time Fourier Transform Magnitude (STFTM)
  - Problems with the STFT method relate directly to the linear phase component of the STFT
  - Time shift = phase change
  - An alternate approach is to use only the magnitude portion of the STFT: the Short Time Fourier Transform Magnitude
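The "time shift = phase change" point can be checked numerically. The short sequence and the shift of 5 samples below are arbitrary assumptions chosen for illustration; the sketch uses a circular shift so that the DFT shift property holds exactly.

```matlab
% Sketch: a (circular) time shift changes only the phase of the DFT.
N = 64;
x = [1; 2; 3; 4; zeros(N-4, 1)];       % arbitrary short sequence
d = 5;                                 % shift in samples
x_shift = circshift(x, d);             % x_shift(n) = x(n - d), circularly

X  = fft(x);
Xs = fft(x_shift);
k  = (0:N-1).';

% The magnitudes are identical ...
mag_err = max(abs(abs(Xs) - abs(X)));
% ... and the shift shows up as a linear phase term exp(-j*2*pi*k*d/N)
phase_err = max(abs(Xs - X.*exp(-1j*2*pi*k*d/N)));

fprintf('magnitude difference %.2e, linear-phase model error %.2e\n', ...
        mag_err, phase_err);
```

Both printed values should be at floating-point rounding level, which is why discarding the phase removes the sensitivity to where each frame is placed in time.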

Slide 14 of 49

Short Time Fourier Transform Magnitude

- Compression
  - With the Fairbanks method, time slices were discarded
  - Now we can just compress the time slices
  - Form a new signal by |Y(nM, ω)| = |X(nL, ω)|, where M = compression factor = L / speed
  - i.e., for speeding up by two, M = L/2

Slide 15 of 49

Short Time Fourier Transform Magnitude

- Compression
  - Take the inverse Fourier transform
  - Use the overlap-and-add method to form the new signal
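A minimal sketch of the re-spacing step, under stated assumptions: frames are read at step L and overlap-added at step M = L/speed. The test tone, the 10 kHz rate, the hand-written triangular window, and the variable names are illustrative only. For simplicity the sketch places the windowed time-domain frames directly, so each frame keeps its own phase; it shows only the change of frame spacing and the gain correction, not a reconstruction from the magnitude alone.

```matlab
% Sketch: overlap-add with analysis step L and synthesis step M = L/speed.
fs = 10000;                                  % assumed sample rate
L  = 100;  N = 2*L;                          % frame step and window size
speed = 2; M = L/speed;                      % synthesis step (50 samples)
x = sin(2*pi*100*(0:4999).'/fs);             % assumed test signal (column)
w = 1 - abs(((0:N-1).' - (N-1)/2)/(N/2));    % triangular window
num_frames = floor((numel(x) - N)/L) + 1;

y = zeros((num_frames-1)*M + N, 1);          % compressed output
g = zeros(size(y));                          % overlap gain of the windows alone
for m = 1:num_frames
    src = (m-1)*L + (1:N);                   % where the frame is read from
    dst = (m-1)*M + (1:N);                   % where it is written, M apart
    y(dst) = y(dst) + x(src).*w;
    g(dst) = g(dst) + w;
end
scale = M/sum(w);                            % compensate for the denser overlap
y = y*scale;
g = g*scale;

fprintf('%d -> %d samples, interior gain %.3f\n', ...
        numel(x), numel(y), g(round(numel(g)/2)));
```

Tracking `g` alongside `y` is a design choice made here to show that, after scaling by `M/sum(w)`, the overlapped windows add up to a flat gain away from the edges.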

Slide 16 of 49

Short Time Fourier Transform Magnitude

[Figure: two panels of overlapping windows. Top: X(nL, ω). Bottom: Y(nM, ω) = X(nL, ω) with M = L/2.]

Slide 17 of 49

Short Time Fourier Transform Magnitude

[Figure: Window Representation. Top panel: Original Windowed Sequence. Bottom panel: New Sequence, with the windows spaced more closely.]

Slide 18 of 49

Other Methods

- Sinusoidal Synthesis (Chapter 9)
  - Time-warp the sinewave frequency track and the amplitude function
  - This technique has been successful not only with speech but also with music, biological, and mechanical signals
- Problems
  - Does not maintain the original phase relations
  - Suffers from reverberance

Slide 19 of 49

Other Methods

- Linear Prediction Synthesis
  - Use homomorphic and linear prediction results to modify the time base
  - The book briefly mentions that this is possible, but I ran out of time before I could investigate the process further

Slide 20 of 49

Other Methods

- New techniques
  - An Internet search showed several methods that try to improve on what is available now
- Software
  - Several software programs will change the speed for you
  - Adobe Audition is one of the most all-encompassing right now

Slide 21 of 49

Matlab Code-Prepare the Workspace

%%%%%%%%%%%%%%%%
% Prepare Workspace
%%%%%%%%%%%%%%%%

close all;
clear all;

window_size_1 = 200;
frame_rate_1 = 100;

% Speed factor (greater than 1 speeds up, less than 1 slows down)
speed = 2;

Slide 22 of 49

Matlab Code-Load the Speech Signal

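%%%%%%%%%%%%%%%%
% Load Data File
%%%%%%%%%%%%%%%%

% The 's' flag returns the typed text as a string instead of evaluating it
filename = input('Please enter the file name to be used. ', 's');

% wavread was removed from later MATLAB releases; audioread is its replacement
[sample_data,sample_rate,nbits] = wavread(filename);

loop_time = floor(max(size(sample_data))/frame_rate_1);

% Zero-pad out to (loop_time+1) frames, starting after the last real sample
sample_data((max(size(sample_data))+1):(loop_time+1)*frame_rate_1)=0;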

Slide 23 of 49

Matlab Code-Develop the Window

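%%%%%%%%%%%%%%%%
% Create Windows
%%%%%%%%%%%%%%%%

% Want windows of 25ms
% File sampled at 10,000 samples/sec
% Want a window of size 10000 * 25ms(10ms)
% Note: with window_size_1 = 200 and frame_rate_1 = 100 the window is
% actually 20 ms long and the frame step is 10 ms at this sample rate

triangle_30ms = triang(window_size_1);
%triangle_30ms = hamming(window_size_1);

W0 = sum(triangle_30ms);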

Slide 24 of 49

Matlab Code-Window the Entire Speech Signal

%%%%%%%%%%%%%%%%
% Window the speech
%%%%%%%%%%%%%%%%

for i = 0:loop_time-1
    window_data(:,i+1) = sample_data((frame_rate_1*i)+1:((i+2)*frame_rate_1)).*triangle_30ms;
end

Slide 25 of 49

Matlab Code-Perform the Fast Fourier Transform

%%%%%%%%%%%%%%%%
% Create FFT
%%%%%%%%%%%%%%%%

for i = 1:loop_time
    window_data_fft(:,i) = fft(window_data(:,i),1024);
end

Slide 26 of 49

Matlab Code-Recreate the Modified Signal

%%%%%%%%%%%%%%%%
% Recreate Original Signal
%%%%%%%%%%%%%%%%

%Initialize the recreated signals

reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;
real_reconstructed_signal(1:(loop_time+1)*frame_rate_1)=0;

modified_reconstructed_signal(1:(loop_time+3)*(frame_rate_1/speed))=0;

modified_reconstructed_signal_compressed(1:(loop_time+3)*(frame_rate_1/speed))=0;

Slide 27 of 49

Matlab Code-Recreate the Modified Signal

% Perform the ifft

for i = 1:loop_time
    % Full STFT frame, and the frame rebuilt from the magnitude only
    recreated_data_ifft(:,i) = ifft(window_data_fft(:,i),1024);
    real_recreated_data_ifft(:,i) = ifft(abs(window_data_fft(:,i)),1024);

    % Drop the zero padding and normalize by the window sum
    truncated_recreated_data_ifft(:,i) = recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
    real_truncated_recreated_data_ifft(:,i) = real_recreated_data_ifft(1:window_size_1,i).*(frame_rate_1/W0);
end

Slide 28 of 49

Matlab Code-Recreate the Modified Signal

% Get back to the original signal

for i=0:loop_time-1

reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + truncated_recreated_data_ifft(:,i+1)';

real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) = real_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i+1)';

end

Slide 29 of 49

Matlab Code-Recreate the Modified Signal

% Get a modified signal by deleting certain parts (STFT)
% (i*speed+1 is used as a column index, so this loop assumes an integer speed)

for i=0:(loop_time-1)/speed

modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)* frame_rate_1)) = modified_reconstructed_signal((frame_rate_1*i)+1:((i+2)*frame_rate_1)) + real_truncated_recreated_data_ifft(:,i*speed+1)';

end

Slide 30 of 49

Matlab Code-Recreate the Modified Signal

% Initialize the compressed sequence (STFTM)

modified_reconstructed_signal_compressed(1:frame_rate_1+frame_rate_1/speed+1)=truncated_recreated_data_ifft(frame_rate_1-frame_rate_1/speed:window_size_1,1)';

% Get a modified signal by compressing

for i=0:(loop_time-2)
    modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) = modified_reconstructed_signal_compressed((frame_rate_1/speed*i)+1:(frame_rate_1/speed*i)+window_size_1) + real_truncated_recreated_data_ifft(:,i+2)';
end

Slide 31 of 49

Matlab Code-Plot Results

%%%%%%%%%%%%%%%%
% Plot Results
%%%%%%%%%%%%%%%%

figure;
subplot(211)
plot(sample_data)
title('Original Speech');
v1=axis;
hold on;
subplot(212)
plot(real(modified_reconstructed_signal))
title(['STFT Synthesis w/ Speed = ',num2str(speed),'X']);
v2=axis;
% Put both plots on the axis limits of the longer signal
if speed > 1
    subplot(211); axis(v1)
    subplot(212); axis(v1)
else
    subplot(211); axis(v2)
    subplot(212); axis(v2)
end

Slide 32 of 49

Matlab Code-Write Sound Files

%%%%%%%%%%%%%%%%
% Write sound files
%%%%%%%%%%%%%%%%

% real() matches the plot above and discards any residual imaginary part
wavwrite(real(modified_reconstructed_signal),sample_rate,nbits,'C:\Classes\ECE_5525\tea party fairbanks 2x.wav')

Slide 33 of 49

Examples Baseline Samples

STFT Sound file

STFTM Sound file

Original File

Sample Rate 2X

Sample Rate .5X

Slide 34 of 49

Examples STFT—Speed 0.5X

[Figure: waveform plots against sample index (x 10^4). Top panel: Original Speech. Bottom panel: STFT Synthesis w/ Speed = 0.5X. Linked sound file.]

Slide 35 of 49

Examples STFT—Speed 2X

[Figure: waveform plots against sample index (x 10^4). Top panel: Original Speech. Bottom panel: STFT Synthesis w/ Speed = 2X. Linked sound file.]

Slide 36 of 49

Examples STFT—Speed 4X

[Figure: waveform plots against sample index (x 10^4). Top panel: Original Speech. Bottom panel: STFT Synthesis w/ Speed = 4X. Linked sound file.]

Slide 37 of 49

Examples STFTM—Speed 0.5X

[Figure: waveform plots against sample index (x 10^4). Top panel: Original Speech. Bottom panel: STFTM Synthesis w/ Speed = 0.5X. Linked sound file.]

Slide 38 of 49

Examples STFTM—Speed 2X

[Figure: waveform plots against sample index (x 10^4). Top panel: Original Speech. Bottom panel: STFTM Synthesis w/ Speed = 2X. Linked sound file.]

Slide 39 of 49

Examples STFTM—Speed 4X

[Figure: waveform plots against sample index (x 10^4). Top panel: Original Speech. Bottom panel: STFTM Synthesis w/ Speed = 4X. Linked sound file.]

Slide 40 of 49

More Results

- Change in window size
  - If the window size becomes too small, then a change in pitch will occur
  - Need window to be 2 to 3 pitch periods long
  - I generally used 20 – 30 ms windows
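As a quick worked example of the rule of thumb above: the 10,000 samples/sec rate comes from the code comments earlier in the deck, while the 100 Hz pitch is an assumed, illustrative value for the speaker.

```matlab
% Sketch: window length implied by "2 to 3 pitch periods".
fs = 10000;                            % samples/sec (from the code comments)
f0 = 100;                              % assumed pitch of the speaker (Hz)

pitch_period   = fs/f0;                % samples in one pitch period
window_samples = [2 3]*pitch_period;   % 2 and 3 pitch periods, in samples
window_ms      = 1000*window_samples/fs;

fprintf('pitch period %d samples, window %d to %d samples (%d to %d ms)\n', ...
        pitch_period, window_samples(1), window_samples(2), ...
        window_ms(1), window_ms(2));
```

For this assumed pitch the result lands on the 20 to 30 ms range quoted on the slide; a higher-pitched voice would allow shorter windows.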

Slide 41 of 49

More Results

- Change in frame rate
  - If the frame rate (the step between successive windows) decreases too much, there will be too many samples overlapping to get an intelligible signal

[Figure: overlapping windows.]

Slide 42 of 49

More Results

- Change filter type
  - Tried Hamming: not much perceptual difference
  - Using the window energy becomes important here
  - Frame Rate/W0 is not equal to one
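The last point can be checked with the window size and frame rate used in the code (200 and 100 samples). In this sketch the two windows are written out by hand so that no toolbox is required; the formulas are intended to mirror what `triang` and `hamming` return for an even length, which is an assumption of the example.

```matlab
% Sketch: frame_rate/W0 for a triangular and a Hamming window.
N = 200;                                     % window size (samples)
L = 100;                                     % frame rate (samples)
n = (0:N-1).';

w_tri = 1 - abs((n - (N-1)/2)/(N/2));        % triangular window
w_ham = 0.54 - 0.46*cos(2*pi*n/(N-1));       % symmetric Hamming window

W0_tri = sum(w_tri);                         % window sum, as W0 in the code
W0_ham = sum(w_ham);
scale_tri = L/W0_tri;                        % normalization used after the ifft
scale_ham = L/W0_ham;

fprintf('triangular: W0 = %.2f, L/W0 = %.4f\n', W0_tri, scale_tri);
fprintf('Hamming:    W0 = %.2f, L/W0 = %.4f\n', W0_ham, scale_ham);
```

The triangular window gives a scale of one, so the factor is easy to overlook; the Hamming window does not, and leaving the factor out would change the output level.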

Slide 43 of 49

Conclusion

- Optimum area
  - Frame rate is one half of the window size
  - Window size needs to be 2 to 3 pitch periods long
- It is possible to easily change the time scale and still maintain the original pitch, although the result is not always natural sounding

Slide 44 of 49

Conclusion

- Further investigation
  - What to do when you want to slow down to less than half speed
  - Using the STFTM means there will be gaps between the sequences, because the spacing M = L / speed becomes larger than the window

[Figure: windows along a 0 to 1000 sample axis.]
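The gap can be seen from the spacing rule M = L / speed given earlier. The sketch below uses the window size and frame rate from the code (200 and 100 samples); the two speed values are chosen for illustration.

```matlab
% Sketch: when does the synthesis spacing M exceed the window size N?
N = 200;                      % window size (samples)
L = 100;                      % frame rate (samples)
speed = [0.5 0.25];           % half speed and quarter speed (illustrative)

M   = L./speed;               % spacing between synthesized frames
gap = M - N;                  % samples left uncovered between windows

for k = 1:numel(speed)
    fprintf('speed %.2f: M = %d samples, gap = %d samples\n', ...
            speed(k), M(k), gap(k));
end
```

At half speed the windows just touch; at any slower speed no window covers part of the output.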

Slide 45 of 49

Conclusion

- Further investigation
  - What to do when you want to slow down to less than half speed
    - Could replicate windowed segments

[Figure: windows along a 0 to 1000 sample axis.]

Slide 46 of 49

Conclusion

- Further investigation
  - Use the other methods to determine quality
    - Implement Sinusoidal Synthesis
    - Implement Linear Predictive Synthesis using linear prediction and homomorphic methods
  - Work on synchronizing pitch periods
    - Shift samples so that the peaks line up
      - Scott and Gerber: Synchronized Overlap and Add (SOLA)
      - Cross-correlation of two samples to find peak
      - Use the peaks to line up samples
    - Align the window at same relative location within a pitch period
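The cross-correlation idea can be sketched as follows. This is only an illustration of searching for the shift that best lines up two segments, not an implementation of any published method: the synthetic 100 Hz tone, the 30-sample misalignment, the search range, and all variable names are assumptions.

```matlab
% Sketch: find the shift that best aligns the next segment with the
% previous one by maximizing the normalized cross-correlation.
fs = 10000;  f0 = 100;                 % assumed rate and pitch (100-sample period)
n  = (0:399).';
a  = sin(2*pi*f0*n/fs);                % segment already placed in the output
b  = sin(2*pi*f0*(n + 30)/fs);         % next segment, 30 samples out of step

ov      = 200;                         % samples compared at each trial shift
max_lag = 99;                          % search just under one pitch period

best_lag = 0;  best_score = -Inf;
seg_a = a(1:ov);
for lag = 0:max_lag
    seg_b = b(lag + (1:ov));           % candidate alignment of the next segment
    score = sum(seg_a.*seg_b)/sqrt(sum(seg_a.^2)*sum(seg_b.^2));
    if score > best_score
        best_score = score;
        best_lag   = lag;
    end
end

fprintf('best shift = %d samples (score %.4f)\n', best_lag, best_score);
```

In this example the best shift and the original 30-sample offset add up to one full pitch period, which is what lining up the peaks amounts to for a periodic signal.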

Slide 47 of 49

Questions

Are there any questions?

Slide 48 of 49

References

Quatieri, Thomas E. Discrete-Time Speech Signal Processing. Prentice Hall, Upper Saddle River, NJ, 2002.

Rabiner, L.R. and Schafer, R.W. Digital Processing of Speech Signals. Prentice Hall, Upper Saddle River, NJ, 1978.

Oppenheim, A.V. and Schafer, R.W. Digital Signal Processing. Prentice Hall, Englewood Cliffs, NJ, 1975.

Scott, R. and Gerber, S. “Pitch Synchronous Time-Compression of Speech,” Proc. Conf. Speech Communications Processing, pp. 63-85, April 1972.

Slide 49 of 49

References

Fairbanks, G., Everitt, W.L., and Jaeger, R.P. “Method for Time or Frequency Compression-Expansion of Speech,” IEEE Transaction Audio and Electroacoustics, vol. AU-2, pp. 7-12, Jan. 1954.
