Pitch and time scale modifications
![Page 1: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/1.jpg)
Prepared by: Doaa Gamal
Lecturer Assistant, Faculty of Engineering – Suez Canal University
![Page 2: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/2.jpg)
Outline
Introduction
Applications
History of time and pitch modification
Time-domain techniques
Frequency-domain techniques
Parametric techniques
Conclusion
![Page 3: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/3.jpg)
Introduction
Time-scale modification: the aim is to slow down or speed up a given signal, possibly in a time-varying manner, without altering the signal’s spectral content (and in particular its pitch when the signal is periodic).
Pitch-scale modification: the aim is to modify the pitch of the signal, possibly in a time-varying manner, without altering the signal’s time evolution (and in particular its duration).
![Page 4: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/4.jpg)
Introduction
Time-scaling or pitch-scaling is not easy because the time and frequency characteristics of a signal, being related by the Fourier transform, are not independent.
The simplest method of time-scaling a sound is to replay it at a different rate. With magnetic tape, for example, the tape speed may be varied, but this incurs a simultaneous change in the pitch of the signal.
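The coupling between duration and pitch can be checked numerically. The sketch below is only an illustration under assumed values (an 8 kHz sample rate, a 220 Hz test tone, and the hypothetical helper names `change_playback_rate` and `dominant_frequency`): it replays a tone twice as fast by resampling and measures the dominant frequency before and after.

```python
import numpy as np

def dominant_frequency(x, sample_rate):
    """Return the frequency (Hz) of the largest FFT magnitude peak."""
    spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]

def change_playback_rate(x, rate):
    """Resample x by linear interpolation so it plays `rate` times faster."""
    n_out = int(round(len(x) / rate))
    read_positions = np.arange(n_out) * rate      # where to read in the input
    return np.interp(read_positions, np.arange(len(x)), x)

sample_rate = 8000                                # assumed sample rate (Hz)
t = np.arange(sample_rate) / sample_rate          # one second of time stamps
tone = np.sin(2 * np.pi * 220.0 * t)              # assumed 220 Hz test tone
fast = change_playback_rate(tone, 2.0)            # half the duration

f_in = dominant_frequency(tone, sample_rate)
f_out = dominant_frequency(fast, sample_rate)
print(len(tone), len(fast), f_in, f_out)
```

Halving the duration this way moves the spectral peak from about 220 Hz to about 440 Hz, which is the tape-speed effect described above.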
![Page 5: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/5.jpg)
Applications
Speech Synthesizers
Post-synchronization
Data compression
Reading for the blind
Foreign language learning
Voice transformation
![Page 6: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/6.jpg)
History of time and pitch modification
| Signal type | Method | Technique |
| --- | --- | --- |
| Analog | Tape recorder machine | Time-domain |
| Digital | Digital tape recorder | Time-domain |
| Digital | Periodicity-driven methods | Time-domain |
| Digital | STFT | Frequency-domain |
| Digital | Linear prediction models & sinusoidal models | Parametric models |
![Page 7: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/7.jpg)
Time and pitch modification techniques
Non-parametric: time-domain techniques and frequency-domain techniques
Parametric
![Page 8: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/8.jpg)
Time-domain techniques
Pitch-independent methods
Require very few calculations.
Lend themselves very well to real-time implementation.
Prone to artifacts, because no precaution is taken at the splicing points other than to guarantee continuity.
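A minimal sketch of a pitch-independent splicing method is plain overlap-add: frames are read at one hop size and written at another, with a tapered window ensuring continuity at the joins. The function name `ola_time_scale`, the frame length, the Hann window, and the 50% overlap are assumptions made for illustration, not details taken from the slides.

```python
import numpy as np

def ola_time_scale(x, alpha, frame_len=512):
    """Stretch x by `alpha` (>1 = longer) with plain overlap-add.

    No pitch information is used: the splices are continuous thanks to
    the window, but overlapping frames may be out of phase, which is
    the source of the artifacts.
    """
    synthesis_hop = frame_len // 2                # write hop (50% overlap)
    analysis_hop = synthesis_hop / alpha          # read hop
    window = np.hanning(frame_len)
    n_frames = int((len(x) - frame_len) / analysis_hop) + 1
    out = np.zeros((n_frames - 1) * synthesis_hop + frame_len)
    norm = np.zeros_like(out)
    for k in range(n_frames):
        a = int(round(k * analysis_hop))          # read position in the input
        s = k * synthesis_hop                     # write position in the output
        out[s:s + frame_len] += window * x[a:a + frame_len]
        norm[s:s + frame_len] += window
    norm[norm < 1e-8] = 1.0                       # avoid dividing by zero at the edges
    return out / norm

rng = np.random.default_rng(0)
signal = rng.standard_normal(8000)
stretched = ola_time_scale(signal, 2.0)
unchanged = ola_time_scale(signal, 1.0)
```

The cost per output sample is a couple of multiply-adds, which is why such methods suit real-time use; with `alpha = 1.0` the input is returned essentially unchanged, while other factors expose the phase mismatch at the splices.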
![Page 9: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/9.jpg)
Time-domain techniques
Periodicity-driven methods
The most popular method using pitch information is TD-PSOLA (Time-Domain Pitch-Synchronous Overlap-Add).
It is suited to moderate modification factors (between 0.5 and 2).
![Page 10: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/10.jpg)
TD-PSOLA analysis-synthesis process without modification
![Page 11: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/11.jpg)
TD-PSOLA analysis-synthesis process without modification
The output speech waveform of PSOLA analysis-synthesis is perceptually indistinguishable from the original waveform.
![Page 12: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/12.jpg)
Pitch-scaling (lowering) using TD-PSOLA
![Page 13: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/13.jpg)
Time-scaling (lengthening) using TD-PSOLA
![Page 14: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/14.jpg)
Computation of synthesis pitch-marks for pitch modification
![Page 15: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/15.jpg)
Computation of synthesis pitch-marks for pitch modification (raising)
![Page 16: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/16.jpg)
Computation of synthesis pitch-marks for duration modification
![Page 17: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/17.jpg)
Computation of synthesis pitch-marks for time-scale modification (lengthening)
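The figures for these slides are not reproduced in the transcript, so the following is only a sketch of one common way to place synthesis pitch-marks, not necessarily the exact scheme shown there. It assumes a time-scale factor `alpha` (greater than 1 lengthens) and a pitch-scale factor `beta` (greater than 1 raises the pitch); both names, and the function `synthesis_pitch_marks`, are hypothetical. Each new mark is placed one local analysis period divided by `beta` after the previous one, and the walk continues until the scaled duration is covered.

```python
import numpy as np

def synthesis_pitch_marks(analysis_marks, alpha=1.0, beta=1.0):
    """Return (synthesis marks, index of the nearest analysis mark for each).

    Assumes strictly increasing analysis marks (in samples) and positive
    alpha (time-scale factor) and beta (pitch-scale factor).
    """
    marks = np.asarray(analysis_marks, dtype=float)
    periods = np.diff(marks)                      # local analysis pitch periods
    synthesis, mapping = [], []
    t_s = alpha * marks[0]                        # first mark on the output axis
    while t_s / alpha <= marks[-1]:
        virtual = t_s / alpha                     # same instant on the analysis axis
        nearest = int(np.argmin(np.abs(marks - virtual)))
        synthesis.append(t_s)
        mapping.append(nearest)
        period = periods[min(nearest, len(periods) - 1)]
        t_s += period / beta                      # scaled period sets the new pitch
    return np.array(synthesis), np.array(mapping)

marks = np.arange(0, 1001, 100)                   # toy signal: period of 100 samples
same, same_map = synthesis_pitch_marks(marks)
raised, _ = synthesis_pitch_marks(marks, beta=2.0)
longer, longer_map = synthesis_pitch_marks(marks, alpha=2.0)
```

Raising the pitch packs the marks closer together over the same duration, while lengthening keeps the spacing and reuses each analysis mark about `alpha` times.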
![Page 18: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/18.jpg)
From the synthesis pitch-marks to the modified waveform
The simple way is:
Find the analysis pitch-mark nearest to each virtual pitch-mark.
Center the frames that correspond to these nearest analysis pitch-marks on the synthesis pitch-marks.
Add the overlapping regions together.
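The three steps above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the presenter's implementation: frames span two local pitch periods, a Hann window is used, the virtual pitch-mark is taken to be the synthesis mark divided by a time-scale factor `alpha`, and the names `nearest_analysis_marks` and `psola_overlap_add` are hypothetical.

```python
import numpy as np

def nearest_analysis_marks(analysis_marks, synthesis_marks, alpha=1.0):
    """Step 1: index of the analysis mark nearest to each virtual mark."""
    marks = np.asarray(analysis_marks, dtype=float)
    virtual = np.asarray(synthesis_marks, dtype=float) / alpha
    return np.array([int(np.argmin(np.abs(marks - v))) for v in virtual])

def psola_overlap_add(x, analysis_marks, synthesis_marks, mapping):
    """Steps 2 and 3: centre the chosen frames on the synthesis marks and add."""
    analysis_marks = np.asarray(analysis_marks, dtype=int)
    synthesis_marks = np.rint(synthesis_marks).astype(int)
    periods = np.diff(analysis_marks)
    out = np.zeros(int(synthesis_marks[-1]) + int(periods.max()) + 1)
    for s, i in zip(synthesis_marks, mapping):
        p = int(periods[min(i, len(periods) - 1)])    # local pitch period
        offsets = np.arange(-p, p + 1)                # two-period frame
        window = np.hanning(2 * p + 1)
        src = analysis_marks[i] + offsets             # samples read from the input
        dst = s + offsets                             # samples written to the output
        ok = (src >= 0) & (src < len(x)) & (dst >= 0) & (dst < len(out))
        out[dst[ok]] += window[ok] * x[src[ok]]
    return out

x = np.sin(2 * np.pi * np.arange(1200) / 100)         # toy signal, period 100 samples
a_marks = np.arange(100, 1001, 100)

# No modification: synthesis marks equal the analysis marks.
copy = psola_overlap_add(x, a_marks, a_marks, np.arange(len(a_marks)))

# Lengthening by 2: same mark spacing over twice the duration.
s_marks = np.arange(200, 2001, 100)
mapping = nearest_analysis_marks(a_marks, s_marks, alpha=2.0)
longer = psola_overlap_add(x, a_marks, s_marks, mapping)
```

On this toy periodic signal the unmodified case reproduces the input between the first and last marks, and the lengthened output keeps the 100-sample period while covering twice the span.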
![Page 19: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/19.jpg)
From the synthesis pitch-marks to the modified waveform
• In more sophisticated systems, the mapping involves linear interpolation between the two successive short-time analysis signals lying closest to the virtual pitch-mark.
The perceptual quality of prosody-modified speech using PSOLA methods depends on the accuracy of the pitch-marker estimation. Estimating epochs from the speech signal provides more accurate pitch-marker locations.
![Page 20: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/20.jpg)
LP-PSOLA & FD-PSOLA
The Frequency-Domain PSOLA (FD-PSOLA) and the Linear-Predictive PSOLA (LP-PSOLA) approaches are theoretically more appropriate than the time-domain PSOLA method for pitch-scale modifications because they provide independent control over the spectral envelope of the synthesis signal.
![Page 21: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/21.jpg)
Frequency-domain techniques
Frequency-domain algorithms operate on a short-time spectrum of the signal (phase vocoder).
1. Calculate the short-time Fourier transform (STFT) of the signal.
2. Modify the phases of each frequency channel.
3. Synthesize a signal using the inverse STFT with a different time stride.
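The three steps can be sketched as a basic phase vocoder for time-stretching. This is a minimal illustration, not a production algorithm: the FFT size, hop size, Hann window, and the function name `phase_vocoder_stretch` are assumptions, and the stretch factor `alpha` is assumed small enough that the analysis hop stays at least one sample.

```python
import numpy as np

def phase_vocoder_stretch(x, alpha, n_fft=1024, synthesis_hop=256):
    """Time-stretch x by `alpha` (>1 = longer) with a basic phase vocoder."""
    analysis_hop = synthesis_hop / alpha
    window = np.hanning(n_fft)
    # Centre frequency of each channel in radians per sample.
    bin_freqs = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft
    n_frames = int((len(x) - n_fft) / analysis_hop) + 1
    out = np.zeros((n_frames - 1) * synthesis_hop + n_fft)
    norm = np.zeros_like(out)
    prev_phase, prev_a, synth_phase = None, 0, None
    for k in range(n_frames):
        a = int(round(k * analysis_hop))
        # Step 1: short-time Fourier transform of the current frame.
        spectrum = np.fft.rfft(window * x[a:a + n_fft])
        mag, phase = np.abs(spectrum), np.angle(spectrum)
        if k == 0:
            synth_phase = phase.copy()
        else:
            # Step 2: estimate each channel's instantaneous frequency from
            # the phase increment, then advance the synthesis phase by the
            # synthesis hop instead of the analysis hop.
            hop = a - prev_a
            deviation = phase - prev_phase - bin_freqs * hop
            deviation = (deviation + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi)
            inst_freq = bin_freqs + deviation / hop
            synth_phase = synth_phase + inst_freq * synthesis_hop
        prev_phase, prev_a = phase, a
        # Step 3: inverse transform and overlap-add with the synthesis stride.
        frame = np.fft.irfft(mag * np.exp(1j * synth_phase), n=n_fft)
        s = k * synthesis_hop
        out[s:s + n_fft] += window * frame
        norm[s:s + n_fft] += window ** 2
    norm[norm < 1e-8] = 1.0                       # avoid dividing by zero at the edges
    return out / norm

sample_rate = 8000                                # assumed sample rate (Hz)
tone = np.sin(2 * np.pi * 440.0 * np.arange(sample_rate) / sample_rate)
stretched = phase_vocoder_stretch(tone, 2.0)
```

Unlike replaying at a different rate, the stretched tone is roughly twice as long while its spectral peak stays near 440 Hz, because the phase update keeps each channel's frequency intact across the longer stride.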
![Page 22: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/22.jpg)
Parametric techniques
Linear prediction models
Sinusoidal models
The Harmonic plus Noise Model (HNM)
Wideband models
STRAIGHT
![Page 23: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/23.jpg)
Conclusion
Time-domain approaches are computationally cheap and perform well for small modification factors.
They are good for real-time implementations.
It is possible to incorporate such systems in consumer products such as telephone answering systems.
They can suffer from echoes.
In particular, time-scale or pitch-scale modifications by large factors cannot be carried out by time-domain methods and usually require the more elaborate frequency-domain techniques.
![Page 24: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/24.jpg)
Conclusion
Frequency-domain techniques are capable of providing very high-quality output. However, they still suffer from some distortion, mainly due to the effects of “phase dispersion.”
They are computationally intensive.
![Page 25: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/25.jpg)
Conclusion
Parametric techniques tend to outperform non-parametric methods when the match between the signal to be modified and the underlying model is good. When this is not the case, however, the methods break down and the results are unreliable.
Parametric techniques are usually more costly in terms of computation, because they require an explicit preliminary analysis stage to estimate the model parameters.
![Page 26: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/26.jpg)
![Page 27: Pitch and time scale modifications](https://reader034.vdocuments.mx/reader034/viewer/2022052304/55c3dac6bb61eb2f348b467a/html5/thumbnails/27.jpg)