Pitch Tracking (音高追蹤)
Jyh-Shing Roger Jang (張智星)
MIR Lab (多媒體資訊檢索實驗室)
CS, NTHU (清華大學 資訊工程系)
jang@mirlab.org, http://mirlab.org/jang
Pitch (音高)
Definition of pitch:
- Fundamental frequency (FF, in Hz): the reciprocal of the fundamental period of a quasi-periodic waveform
- Pitch (in semitones): obtained from the fundamental frequency through a logarithmic transformation (detailed later)
Characteristics of pitch:
- Noise and unvoiced sounds do not have pitch.
Pitch Tracking (音高追蹤)
Pitch tracking: computing the pitch vector of a given waveform (對整段音訊求取音高)
Applications:
- Query by singing/humming (哼唱選歌)
- Tone recognition for Mandarin (華語的音調辨識)
- Intonation scoring for English (英語的音調評分)
- Prosody analysis for speech synthesis (語音合成中的韻律分析)
- Pitch scaling and duration modification (音高調節與長度改變)
Pitch Tracking Algorithms
Two categories of pitch tracking algorithms:
- Time domain (時域)
  - ACF (autocorrelation function)
  - AMDF (average magnitude difference function)
  - SIFT (simple inverse filtering tracking)
- Frequency domain (頻域)
  - Harmonic product spectrum method
  - Cepstrum method
Typical Steps for Pitch Tracking
1. Chop the signal into frames (aka frame blocking)
2. Compute a pitch function (ACF, AMDF, etc.) for each frame
3. Determine the pitch of each frame via max/min picking of the pitch function
4. Remove unreliable pitch via volume/clarity thresholding
5. Smooth the whole pitch vector via a median filter, etc.
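Step 5 can be illustrated with a plain median filter. This is a minimal sketch assuming the pitch vector is a list of semitone values; the function name `median_smooth`, the window length `order`, and the edge handling (truncating the window at both ends) are assumptions for illustration, not the settings of the course toolbox.

```python
from statistics import median


def median_smooth(pitch, order=5):
    """Median-filter a pitch vector with an odd window length `order`.

    Near the ends the window is simply truncated (an assumption; padding
    schemes are equally common)."""
    if order < 1 or order % 2 == 0:
        raise ValueError("order must be a positive odd integer")
    half = order // 2
    n = len(pitch)
    smoothed = []
    for i in range(n):
        # Window centered on frame i, clipped to the valid frame range.
        window = pitch[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(median(window))
    return smoothed


# A one-frame octave jump (73 instead of about 61) is removed by a 3-point median.
pv = [60, 60, 61, 73, 61, 62, 62]
print(median_smooth(pv, order=3))
```

A median is used rather than a moving average because an isolated octave error is discarded outright instead of being smeared into its neighbors.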
Frame Blocking
- Frame size = 256 points
- Overlap = 84 points
- Frame rate = fs/(frameSize − overlap) = 11025/(256 − 84) ≈ 64 frames/sec, i.e., about 64 pitch values per second
[Figure: a waveform divided into overlapping frames, with a zoomed-in view marking one frame and the overlap between adjacent frames]
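Frame blocking and the frame-rate formula above can be sketched as follows. The function names are hypothetical, and dropping a trailing partial frame is an assumption (zero-padding it is another common choice).

```python
def frame_blocking(signal, frame_size=256, overlap=84):
    """Split `signal` into frames of `frame_size` samples, adjacent frames
    sharing `overlap` samples. A trailing partial frame is dropped (an
    assumption; zero-padding it is another option)."""
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    # Each frame starts `hop` samples after the previous one.
    return [signal[start:start + frame_size]
            for start in range(0, len(signal) - frame_size + 1, hop)]


def frame_rate(fs, frame_size=256, overlap=84):
    """Frames (hence pitch values) per second: fs / (frame_size - overlap)."""
    return fs / (frame_size - overlap)


print(round(frame_rate(11025), 2))     # 64.1
frames = frame_blocking(list(range(2500)))
print(len(frames))                     # 14
```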
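ACF: Auto-correlation Function
[Figure: frame s(i), shifted frame s(i+τ) with τ = 30, and the pitch period marked]
acf(30) = inner product of the overlapping part
$$\mathrm{acf}(\tau) = \sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)$$
where n is the frame size.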
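A direct transcription of the ACF formula, using 0-based lags. This is a minimal O(n²) sketch for illustration; the function name and the test signal (a sinusoid with a period of exactly 20 samples) are assumptions, and the lower lag bound of 5 is an arbitrary choice to skip the trivially large values near τ = 0.

```python
import math


def acf(frame):
    """acf(tau) = sum_{i=0}^{n-1-tau} s(i) * s(i + tau) for tau = 0 .. n-1,
    where n = len(frame) and indices are 0-based."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(n)]


# A sinusoid whose period is exactly 20 samples.
frame = [math.sin(2 * math.pi * i / 20) for i in range(200)]
r = acf(frame)
# Skip small lags (near tau = 0 the ACF is trivially large) and pick the maximum.
best_lag = max(range(5, len(r)), key=lambda tau: r[tau])
print(best_lag)  # 20, the pitch period in samples
```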
ACF Example 1
sunday.wav, sample rate = 16 kHz, frame size = 512 (starting from sample 9000)
Fundamental frequency: the max of the ACF occurs at index 132 (1-based, i.e., lag 131), so FF = 16000/(132 − 1) ≈ 122.14 Hz
ACF Example 2
If the range of human FF is [40, 1000] Hz, the search for the pitch point is restricted as follows (MATLAB-style 1-based indexing):
- Min FF = 40 Hz: acf(fs/40:end) is not considered.
- Max FF = 1000 Hz: acf(1:fs/1000) is not considered.
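The range restriction and the index-to-frequency conversion can be sketched together. This is a minimal sketch, not the course toolbox: the names `lag_range` and `pitch_by_acf` are hypothetical, lags are 0-based (so FF = fs/lag, and the slides' 1-based index k corresponds to lag k − 1), and the 200 Hz test tone is an assumption.

```python
import math


def acf(frame):
    """Plain ACF with 0-based lags: acf(tau) = sum_{i=0}^{n-1-tau} s(i) s(i+tau)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau))
            for tau in range(n)]


def lag_range(fs, n, min_ff=40.0, max_ff=1000.0):
    """Lags (in samples) whose frequency fs/lag lies within [min_ff, max_ff],
    clipped to what a frame of n samples can provide."""
    lo = max(1, math.ceil(fs / max_ff))        # shortest allowed period
    hi = min(n - 1, math.floor(fs / min_ff))   # longest allowed period
    return lo, hi


def pitch_by_acf(frame, fs, min_ff=40.0, max_ff=1000.0):
    """Fundamental frequency (Hz) from the ACF maximum within the allowed lags."""
    r = acf(frame)
    lo, hi = lag_range(fs, len(frame), min_ff, max_ff)
    best_lag = max(range(lo, hi + 1), key=lambda tau: r[tau])
    return fs / best_lag


fs = 16000
frame = [math.sin(2 * math.pi * 200 * i / fs) for i in range(512)]  # 200 Hz test tone
print(lag_range(fs, len(frame)))   # (16, 400)
print(pitch_by_acf(frame, fs))     # about 200 Hz (period of 80 samples)
# The slides' 1-based index 132 corresponds to lag 131:
print(round(fs / (132 - 1), 2))    # 122.14
```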
Pitch Tracking via ACF
Specs:
- Sample rate = 11025 Hz
- Frame size = 353 points ≈ 32 ms
- Overlap = 0
- Frame rate = 11025/353 ≈ 31.2 frames/s
Playback: soo.wav, sooPitch.wav
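Variations of ACF to Avoid Tapering
The plain ACF sums only n − τ products, so it tends to decrease (taper) as τ grows. Two remedies:
- Normalized version:
$$\mathrm{acf}(\tau) = \frac{1}{n-\tau}\sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)$$
- Half-frame shifting (a fixed half-frame window is correlated with the shifted signal, so every lag uses the same number of products):
$$\mathrm{acf}(\tau) = \sum_{i=0}^{n/2} s(i)\, s(i+\tau), \quad 0 \le \tau \le n/2 - 1$$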
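Both variants are easy to compare against the plain ACF on a constant frame, where tapering is most visible. A minimal sketch; the function names are hypothetical, and letting the half-frame lags run up to n − 1 − n//2 (so that i + τ never leaves the frame) is an assumption that matches n/2 − 1 for even n.

```python
def acf_plain(frame):
    """Plain ACF: sum of the n - tau overlapping products."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau)) for tau in range(n)]


def acf_normalized(frame):
    """Divide each lag by its number of products, n - tau."""
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau)) / (n - tau)
            for tau in range(n)]


def acf_half_frame(frame):
    """Correlate the fixed window s(0..n/2) with shifted copies, so every lag
    uses the same n//2 + 1 products; lags stop at n - 1 - n//2 so that
    i + tau stays inside the frame."""
    n = len(frame)
    half = n // 2
    return [sum(frame[i] * frame[i + tau] for i in range(half + 1))
            for tau in range(n - half)]


# On a constant frame the plain ACF tapers linearly, while the variants stay flat.
ones = [1.0] * 10
print(acf_plain(ones))       # [10.0, 9.0, ..., 1.0]
print(acf_normalized(ones))  # ten values of 1.0
print(acf_half_frame(ones))  # [6.0, 6.0, 6.0, 6.0, 6.0]
```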
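Variations of ACF to Normalize Range
To normalize the ACF to the range [−1, 1], use the normalized squared difference function (NSDF):
$$\mathrm{nsdf}(\tau) = \frac{2\sum_{i=0}^{n-1-\tau} s(i)\, s(i+\tau)}{\sum_{i=0}^{n-1-\tau}\left[s^2(i) + s^2(i+\tau)\right]}$$
This is based on the inequality
$$-(x^2 + y^2) \le 2xy \le x^2 + y^2,$$
which follows from (x + y)² ≥ 0 and (x − y)² ≥ 0, applied term by term with x = s(i) and y = s(i+τ).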
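The NSDF formula translates directly into code. A minimal sketch; the function name is hypothetical, and returning 0 for a lag whose denominator is zero (an all-silent overlap) is an assumed convention.

```python
import math


def nsdf(frame):
    """nsdf(tau) = 2 * sum s(i)s(i+tau) / sum [s(i)^2 + s(i+tau)^2],
    with i = 0 .. n-1-tau. Bounded to [-1, 1] because |2xy| <= x^2 + y^2.
    A lag with an all-zero denominator (silence) is given 0 (an assumption)."""
    n = len(frame)
    out = []
    for tau in range(n):
        num = 2 * sum(frame[i] * frame[i + tau] for i in range(n - tau))
        den = sum(frame[i] ** 2 + frame[i + tau] ** 2 for i in range(n - tau))
        out.append(num / den if den > 0 else 0.0)
    return out


frame = [math.sin(2 * math.pi * i / 20) for i in range(200)]  # period of 20 samples
r = nsdf(frame)
# In phase at one full period, exactly out of phase at half a period.
print(round(r[20], 6), round(r[10], 6))  # 1.0 -1.0
```

Because the range is fixed, a single threshold on the NSDF peak can be applied to every frame regardless of its loudness.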
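AMDF: Average Magnitude Difference Function
[Figure: frame s(i), shifted frame s(i+τ) with τ = 30, and the pitch period marked]
amdf(30) = sum of absolute differences over the overlapping part
$$\mathrm{amdf}(\tau) = \sum_{i=0}^{n-1-\tau} \left|s(i) - s(i+\tau)\right|$$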
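The AMDF is computed the same way as the ACF, but the pitch period now shows up as a minimum. A minimal sketch with a hypothetical function name; the search range of lags 5 to 30 is an assumed pitch range for this test signal, needed because multiples of the period also give values near zero.

```python
import math


def amdf(frame):
    """amdf(tau) = sum_{i=0}^{n-1-tau} |s(i) - s(i + tau)|, tau = 0 .. n-1."""
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + tau]) for i in range(n - tau))
            for tau in range(n)]


frame = [math.sin(2 * math.pi * i / 20) for i in range(200)]  # period of 20 samples
d = amdf(frame)
# Unlike the ACF, the AMDF dips at the pitch period, so we pick a minimum.
# The search is limited to lags 5..30 (an assumed pitch range) because
# multiples of the period (40, 60, ...) also give values near zero.
best_lag = min(range(5, 31), key=lambda tau: d[tau])
print(best_lag)  # 20
```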
AMDF Example
sunday.wav, sample rate = 16 kHz, frame size = 512 (starting from sample 9000)
Fundamental frequency: the min of the AMDF occurs at index 132 (1-based, i.e., lag 131), so FF = 16000/(132 − 1) ≈ 122.14 Hz
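Variations of AMDF to Avoid Tapering
- Normalized version:
$$\mathrm{amdf}(\tau) = \frac{1}{n-\tau}\sum_{i=0}^{n-1-\tau} \left|s(i) - s(i+\tau)\right|$$
- Half-frame shifting:
$$\mathrm{amdf}(\tau) = \sum_{i=0}^{n/2} \left|s(i) - s(i+\tau)\right|, \quad 0 \le \tau \le n/2 - 1$$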
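Combining ACF and AMDF
[Figure: a frame with its ACF, its AMDF, and the ratio ACF/AMDF]
Since the ACF has a maximum and the AMDF a minimum at the pitch period, the ratio ACF/AMDF makes the peak at the pitch period more pronounced.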
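One way to form the ACF/AMDF combination is a per-lag ratio. This is a minimal sketch, not necessarily how the course code combines them: the small constant `eps` that keeps the division finite where the AMDF vanishes, the function names, and the lag range 5 to 30 are all assumptions.

```python
import math


def acf(frame):
    n = len(frame)
    return [sum(frame[i] * frame[i + tau] for i in range(n - tau)) for tau in range(n)]


def amdf(frame):
    n = len(frame)
    return [sum(abs(frame[i] - frame[i + tau]) for i in range(n - tau)) for tau in range(n)]


def acf_over_amdf(frame, eps=1e-6):
    """Ratio ACF/AMDF per lag; eps (an assumed constant) avoids dividing by zero
    where the AMDF vanishes."""
    return [a / (d + eps) for a, d in zip(acf(frame), amdf(frame))]


frame = [math.sin(2 * math.pi * i / 20) for i in range(200)]  # period of 20 samples
ratio = acf_over_amdf(frame)
best_lag = max(range(5, 31), key=lambda tau: ratio[tau])   # assumed lag range
print(best_lag)  # 20
```

The peak at the true period dominates by several orders of magnitude here, because the AMDF in the denominator is close to zero exactly where the ACF is largest.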
Example of Pitch Tracking
[Figure: amplitude of soo.wav (top) and the computed pitch in semitones (bottom)]
Pitch tracking (PT) using ptByDpOverPfMex, with pfWeight=1 and indexDiffWeight=22; pitch1 is the computed pitch.
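UPDUDP (1/4)
UPDUDP: Unbroken Pitch Determination Using DP
Goal: to take pitch smoothness into consideration
$$\mathrm{cost}(\mathbf{p}, \theta, m) = \sum_{i=1}^{n} \mathrm{amdf}_i(p_i) + \theta \sum_{i=1}^{n-1} \left|p_i - p_{i+1}\right|^m$$
- p = [p_1, …, p_n]: a given path in the AMDF matrix (p_i is the AMDF index chosen for frame i)
- n: number of frames
- θ: transition penalty
- m: exponent of the transition difference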
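UPDUDP (2/4)
Optimum-value function D(i, j): the minimum cost of a path starting from frame 1 and ending at position (i, j)
Recurrence:
$$D(i, j) = \mathrm{amdf}_i(j) + \min_{k \in [8, 160]} \left\{ D(i-1, k) + \theta\,|k - j|^m \right\}, \quad i = 2, \ldots, n,\ j \in [8, 160]$$
Initial conditions:
$$D(1, j) = \mathrm{amdf}_1(j), \quad j \in [8, 160]$$
Optimum cost:
$$\min_{j \in [8, 160]} D(n, j)$$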
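The recurrence can be sketched as a straightforward O(n·W²) dynamic program with back-pointers, where W is the number of AMDF indices. This is a sketch under assumptions, not ptByDpOverPfMex: the function name `updudp`, the list-of-rows matrix layout with column c standing for index lo + c, the default m = 2, and breaking ties by the first minimum are all choices made for illustration.

```python
def updudp(amdf_matrix, theta, m=2, lo=8):
    """Minimum-cost pitch path through an AMDF matrix by dynamic programming.

    amdf_matrix[i][c] holds amdf_i(lo + c) for frame i (0-based) and column c;
    with lo = 8 and 153 columns this covers the slides' index range [8, 160].
    Returns (path of AMDF indices, optimal cost).
    """
    n = len(amdf_matrix)
    width = len(amdf_matrix[0])
    D = [list(amdf_matrix[0])]          # initial condition: D(1, j) = amdf_1(j)
    back = [[0] * width]                # back-pointers for path recovery
    for i in range(1, n):
        row, ptr = [], []
        for j in range(width):
            # Best predecessor k, penalizing the jump by theta * |k - j|^m.
            k = min(range(width),
                    key=lambda kk: D[i - 1][kk] + theta * abs(kk - j) ** m)
            row.append(amdf_matrix[i][j] + D[i - 1][k] + theta * abs(k - j) ** m)
            ptr.append(k)
        D.append(row)
        back.append(ptr)
    # Optimum cost: minimum over the last frame, then backtrack.
    j = min(range(width), key=lambda jj: D[-1][jj])
    cost = D[-1][j]
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return [lo + c for c in path], cost


# Toy 3-frame matrix over indices 8, 9, 10; frame 2 has a spurious dip at index 10.
amdf_matrix = [[1, 5, 5],
               [5, 5, 0],
               [1, 5, 5]]
print(updudp(amdf_matrix, theta=0))    # ([8, 10, 8], 2): frame-wise minima
print(updudp(amdf_matrix, theta=10))   # ([8, 8, 8], 7): the smooth path wins
```

With θ = 0 the path simply follows each frame's AMDF minimum; a larger θ trades a slightly higher AMDF sum for an unbroken, smooth pitch path.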
UPDUDP (3/4)
A typical example
UPDUDP (4/4)
Insensitivity to θ
[Figure: waveform (top) and pitch in semitones vs. time in seconds (bottom) for an utterance with the syllables xi, lu, chan, sheng, chang; pitch curves are shown for θ = 0, 2000, 4000, …, 20000]
Harmonic Product Spectrum (hps.m)
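The harmonic product spectrum multiplies downsampled copies of the magnitude spectrum, |X(k)|·|X(2k)|·…·|X(Rk)|, so that the harmonics of the fundamental reinforce one another at the fundamental's bin. The sketch below is an illustration, not the contents of hps.m: the function names are hypothetical, R = 3 harmonics is an assumed default, and the DFT is computed directly for self-containment (an FFT would be used in practice).

```python
import cmath
import math


def magnitude_spectrum(frame):
    """|X(k)| for k = 0 .. n//2 via a direct DFT (O(n^2); an FFT is used in practice)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def hps_pitch(frame, fs, num_harmonics=3):
    """Pitch (Hz) as the bin k maximizing prod_{r=1}^{R} |X(r k)|, R = num_harmonics."""
    mag = magnitude_spectrum(frame)
    max_k = (len(mag) - 1) // num_harmonics   # keep r * k inside the spectrum

    def product(k):
        p = 1.0
        for r in range(1, num_harmonics + 1):
            p *= mag[r * k]
        return p

    best_k = max(range(1, max_k + 1), key=product)  # skip the DC bin k = 0
    return best_k * fs / len(frame)


fs, n = 8000, 256
# Three harmonics of a fundamental that falls exactly on bin 10 (10 * 8000 / 256 = 312.5 Hz).
frame = [sum(math.cos(2 * math.pi * 10 * h * t / n) for h in (1, 2, 3)) for t in range(n)]
print(hps_pitch(frame, fs))  # 312.5
```

The frequency resolution is fs/n per bin (31.25 Hz here), which is why HPS is usually paired with long frames or interpolation around the peak.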
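Frequency to Semitone Conversion
Semitone: a music scale based on A440 (A4 = 440 Hz corresponds to semitone 69)
$$\mathrm{semitone} = 69 + 12\log_2\!\left(\frac{\mathrm{freq}}{440}\right)$$
Reasonable pitch range: E2 to C6, i.e., 82 Hz to 1047 Hz (semitones 40 to 84)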
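The conversion and its inverse, as a minimal sketch. The function names are hypothetical, and mapping a non-positive frequency (no pitch) to 0 is an assumed convention for rests and unvoiced frames.

```python
import math


def freq_to_semitone(freq):
    """semitone = 69 + 12 * log2(freq / 440); A4 = 440 Hz maps to 69.
    Non-positive frequencies (no pitch) are mapped to 0 here, an assumed
    convention for rests/unvoiced frames."""
    if freq <= 0:
        return 0.0
    return 69 + 12 * math.log2(freq / 440)


def semitone_to_freq(semitone):
    """Inverse mapping: freq = 440 * 2 ** ((semitone - 69) / 12)."""
    return 440 * 2 ** ((semitone - 69) / 12)


print(freq_to_semitone(440))              # 69.0
print(freq_to_semitone(880))              # 81.0, one octave = 12 semitones
print(round(freq_to_semitone(82)), round(freq_to_semitone(1047)))  # 40 84
```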
Unreliable Pitch Removal
Pitch removal via volume thresholding
[Figure: waveform of 小毛驢.wav ("Little Donkey"), its volume, and its pitch, plotted against time (sec)]
Unreliable Pitch Removal
Pitch removal via volume/clarity thresholding
[Figure: waveform of 小毛驢.wav ("Little Donkey"), its volume, its clarity (0 to 1), and its pitch, plotted against time (sec)]
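Volume/clarity thresholding amounts to zeroing out pitch values in frames that are too quiet or too unclear. A minimal sketch under several assumptions: the function names are hypothetical, volume is taken as the sum of absolute sample values, clarity values are supplied per frame in [0, 1] (e.g., an NSDF peak value), a pitch of 0 marks a removed frame, and the threshold ratio 0.1 is arbitrary.

```python
def frame_volume(frame):
    """Volume of one frame as the sum of absolute sample values
    (one common definition; assumed here)."""
    return sum(abs(x) for x in frame)


def remove_unreliable_pitch(pitch, volume, clarity, volume_threshold, clarity_threshold):
    """Zero out (mark as rest) every frame whose volume or clarity is below
    its threshold; all three input lists are per-frame and equally long."""
    return [p if v >= volume_threshold and c >= clarity_threshold else 0
            for p, v, c in zip(pitch, volume, clarity)]


frames = [[0.5, -0.4, 0.6], [0.01, -0.02, 0.01], [0.3, -0.5, 0.4], [0.6, -0.6, 0.5]]
volume = [frame_volume(f) for f in frames]
pitch = [60, 61, 62, 63]           # semitones from any pitch tracker
clarity = [0.9, 0.9, 0.2, 0.8]     # e.g. an NSDF-style peak value in [0, 1] (assumed)
# The threshold is set relative to the loudest frame (the 0.1 ratio is an assumption).
print(remove_unreliable_pitch(pitch, volume, clarity, 0.1 * max(volume), 0.5))  # [60, 0, 0, 63]
```

Setting the volume threshold relative to the loudest frame keeps it meaningful across recordings made at different gain levels.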
Rest Handling
[Figure: pitch vectors with rests and without rests]
Rest Handling
[Figure: pitch vectors (semitones, about 55 to 70) vs. frame index: the original pitch vector (about 250 frames), the result with useRest=1 (about 180 frames), and the result with useRest=0 (about 250 frames)]
- Original pitch vector: contains rests.
- Rests removed: good for DTW (dynamic time warping).
- Rests replaced by the previous nonzero pitch: good for LS (linear scaling).
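Both rest-handling strategies are one-liners over a pitch vector in which, as the slide's "previous nonzero pitch" suggests, a rest is encoded as 0. The function names are hypothetical, and filling leading rests with the first nonzero pitch is an assumption, since they have no previous pitch.

```python
def remove_rests(pv):
    """Drop rest frames (pitch 0); the vector gets shorter."""
    return [p for p in pv if p != 0]


def fill_rests(pv):
    """Replace each rest by the previous nonzero pitch; the length is kept.
    Leading rests have no previous pitch and take the first nonzero pitch
    (an assumption); an all-rest vector is returned unchanged."""
    prev = next((p for p in pv if p != 0), 0)
    filled = []
    for p in pv:
        if p != 0:
            prev = p
        filled.append(prev)
    return filled


pv = [0, 60, 0, 0, 62, 0]
print(remove_rests(pv))  # [60, 62]
print(fill_rests(pv))    # [60, 60, 60, 60, 62, 62]
```

Filling keeps the original length and timing, which linear scaling relies on; removal discards timing that DTW re-aligns anyway.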
Typical Result of Pitch Tracking
Pitch tracking via autocorrelation for a recording of 茉莉花 (Jasmine)
Comparison of Pitch Vectors
Yellow line: target pitch vector
Demo of Pitch Tracking
- Real-time display of ACF for pitch tracking: toolbox/sap/goPtByAcf.mdl
- Real-time pitch tracking for real-time mic input: toolbox/sap/goPtByAcf2.mdl
- Pitch scaling: pitchShiftDemo/project1.exe, pitchShift-multirate/multirate.m
- Intonation assessment: ap170/matlab/goDemo.m