Advisor: 陳福坤  Student: 葛書銓


Specifications for the Analog to Digital Conversion of Voice by 2,400 Bit/Second Mixed Excitation Linear Prediction


Outline: Introduction, Encoder, Decoder


Introduction

MELP (Mixed Excitation Linear Prediction)

1. Replaces LPC-10 (FS-1015).
2. Uses LPC as the underlying model and adds five new features:
   a. mixed excitation
   b. aperiodic pulses
   c. adaptive spectral enhancement
   d. pulse dispersion
   e. Fourier magnitude modeling
3. Each MELP frame is 22.5 ms long and contains 180 samples (8000 samples/s).
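The frame figures above can be checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for illustration, and the bits-per-frame value is simply derived from the 2,400 bit/s rate in the title.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 8000                     # sampling rate stated on the slide
FRAME_DURATION_S = Fraction(225, 10000)   # 22.5 ms, kept exact to avoid float rounding
BIT_RATE_BPS = 2400                       # bit rate from the title

# Samples and bits carried by one frame.
samples_per_frame = FRAME_DURATION_S * SAMPLE_RATE_HZ
bits_per_frame = FRAME_DURATION_S * BIT_RATE_BPS

print(samples_per_frame, bits_per_frame)  # prints "180 54"
```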


Encoder

Low frequency removal

1. The first step of encoding applies a Chebychev high-pass filter with a cutoff frequency of 60 Hz and a stopband rejection of 30 dB.


[Figure: Integer Pitch Calculation block diagram. Current frame -> 1 kHz LPF -> Normalized Autocorrelation Function $r(\tau)$ -> Point of Maximum -> Integer Pitch $P_1$]


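Integer Pitch Calculation

1. For $\tau$ = 40 to 160, compute the normalized autocorrelation.
2. The normalized autocorrelation function is defined as

$$r(\tau) = \frac{c_\tau(0,\tau)}{\sqrt{c_\tau(0,0)\,c_\tau(\tau,\tau)}}$$

where

$$c_\tau(m,n) = \sum_{k=-\lfloor \tau/2 \rfloor - 80}^{-\lfloor \tau/2 \rfloor + 79} s_{k+m}\,s_{k+n}$$

3. The value of $\tau$ that maximizes the normalized autocorrelation function is taken as the first pitch estimate, $P_1$.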
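The search described above can be sketched in plain Python. The function names, the `center` argument (the index treated as $s_0$), and the synthetic test signal are assumptions made for illustration; the 1 kHz low-pass filter that precedes this step is not included.

```python
import math

def corr(s, center, tau, m, n, half=80):
    """c_tau(m, n): 160-term sum of s[k+m]*s[k+n], with the window offset by -floor(tau/2)."""
    start = center - tau // 2 - half
    return sum(s[k + m] * s[k + n] for k in range(start, start + 2 * half))

def norm_autocorr(s, center, tau):
    """r(tau) = c(0,tau) / sqrt(c(0,0) * c(tau,tau)); returns 0 for an all-zero window."""
    den = math.sqrt(corr(s, center, tau, 0, 0) * corr(s, center, tau, tau, tau))
    return corr(s, center, tau, 0, tau) / den if den > 0 else 0.0

def integer_pitch(s, center, lo=40, hi=160):
    """Return the lag in [lo, hi] with the largest normalized autocorrelation."""
    return max(range(lo, hi + 1), key=lambda tau: norm_autocorr(s, center, tau))

# Hypothetical input: a sine with a 90-sample period, 160+ samples on each side of the center.
demo_signal = [math.sin(2 * math.pi * n / 90) for n in range(400)]
p1 = integer_pitch(demo_signal, center=200)
print(p1)
```

A period of 90 is used in the demo because its multiples fall outside the 40 to 160 search range, so the maximum is unambiguous.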


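Bandpass Voicing Analysis

[Figure: Bandpass Voicing Analysis block diagram. The current and last frame pass through five band-pass filters: 0-500 Hz, 500-1000 Hz, 1000-2000 Hz, 2000-3000 Hz, and 3000-4000 Hz. The 0-500 Hz band, together with $P_1$, feeds Pitch Refinement & Normalized Autocorrelation Function, whose point of maximum gives the fractional pitch $P_2$ and $Vbp_1 = r(P_2)$. The blocks Full-Wave Rectifier + Smoothing Filter appear on the path to $Vbp_2$, $Vbp_3$, $Vbp_4$, and $Vbp_5$.]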


Fractional Pitch Calculation

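1. For the 0-500 Hz filter output signal:
   a. The integer pitch values ($P_1$) of the current and previous frames are used as the two candidates.
   b. $\Delta$ is the difference between the real pitch and the integer pitch.
   c. For each candidate, the integer pitch search is repeated with the normalized autocorrelation over lags from 5 samples below to 5 samples above the candidate. Once the optimum integer pitch lag is found, Fractional Pitch Refinement is applied to it.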

[Figure: Fractional Pitch Calculation block diagram. Current frame & last frame -> 0-500 Hz BPF -> (with $P_1$) Fractional Pitch Refinement -> Normalized Autocorrelation Function $r(\tau)$ -> Point of Maximum -> fractional pitch $P_2$; $P_2$ also feeds Voicing Analysis]


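Fractional Pitch Refinement

1. Let the integer pitch be $T$ samples, that is, the maximum was found at $\tau = T$. The true maximum may lie between $T$ and $T+1$ or between $T-1$ and $T$.

The fractional offset:

$$\Delta = \frac{c_T(0,T+1)\,c_T(T,T) - c_T(0,T)\,c_T(T,T+1)}{c_T(0,T+1)\,[c_T(T,T) - c_T(T,T+1)] + c_T(0,T)\,[c_T(T+1,T+1) - c_T(T,T+1)]}$$

The fractional pitch value is $T + \Delta$, and its normalized autocorrelation is:

$$r(T+\Delta) = \frac{(1-\Delta)\,c_T(0,T) + \Delta\,c_T(0,T+1)}{\sqrt{c_T(0,0)\,[(1-\Delta)^2\,c_T(T,T) + 2\Delta(1-\Delta)\,c_T(T,T+1) + \Delta^2\,c_T(T+1,T+1)]}}$$

The fractional pitch and normalized autocorrelation value are computed separately for the two candidates. The candidate with the larger autocorrelation value becomes the fractional pitch of the current frame, $P_2$, and $Vbp_1 = r(P_2)$.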
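The two refinement formulas can be sketched as a single function. This is an illustrative sketch under stated assumptions: the function names and the `center` argument are made up, the correlation window follows the $c_\tau(m,n)$ definition used for the integer pitch search, and the caller is assumed to have already chosen $T$ so that the maximum lies between $T$ and $T+1$.

```python
import math

def corr(s, center, T, m, n, half=80):
    """c_T(m, n): 160-term sum of s[k+m]*s[k+n], with the window offset by -floor(T/2)."""
    start = center - T // 2 - half
    return sum(s[k + m] * s[k + n] for k in range(start, start + 2 * half))

def fractional_refine(s, center, T):
    """Return (T + delta, r(T + delta)) for the interval between T and T + 1."""
    c00 = corr(s, center, T, 0, 0)
    c0T = corr(s, center, T, 0, T)
    c0T1 = corr(s, center, T, 0, T + 1)
    cTT = corr(s, center, T, T, T)
    cTT1 = corr(s, center, T, T, T + 1)
    cT1T1 = corr(s, center, T, T + 1, T + 1)

    # Fractional offset: numerator and denominator as in the Delta formula.
    den = c0T1 * (cTT - cTT1) + c0T * (cT1T1 - cTT1)
    delta = (c0T1 * cTT - c0T * cTT1) / den if den != 0 else 0.0

    # Normalized autocorrelation at the interpolated lag T + delta.
    energy = (1 - delta) ** 2 * cTT + 2 * delta * (1 - delta) * cTT1 + delta ** 2 * cT1T1
    scale = c00 * energy
    r = ((1 - delta) * c0T + delta * c0T1) / math.sqrt(scale) if scale > 0 else 0.0
    return T + delta, r

# Hypothetical input: a sine whose true period (90.5 samples) lies between two integer lags.
demo_signal = [math.sin(2 * math.pi * n / 90.5) for n in range(400)]
p2, r_p2 = fractional_refine(demo_signal, center=200, T=90)
print(p2, r_p2)
```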


Aperiodic Flag

1. Determined by the Bandpass Voicing Analysis: if $Vbp_1 < 0.5$, the aperiodic flag is set to 1; otherwise it is set to 0.
2. The flag determines whether the decoder uses the aperiodic pulse excitation.


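Linear Prediction Analysis

1. A 200-sample (25 ms) Hamming window is applied to the input speech, followed by a 10th-order linear prediction analysis.
2. The Levinson-Durbin recursion solves for the linear prediction coefficients $a_i$ (i = 1, 2, ..., 10). A bandwidth expansion of 0.994 (15 Hz) is then applied to $a_i$, that is:

$$a_i \leftarrow 0.994^i\,a_i \qquad (i = 1, 2, \ldots, 10)$$

Linear Prediction Residual Calculation

1. The input speech signal is filtered by the prediction filter formed from the 10 LPC coefficients obtained in the linear prediction analysis; the filter output is the linear prediction residual signal.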
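The analysis steps (Hamming window, autocorrelation, Levinson-Durbin, bandwidth expansion) can be sketched as follows. The function names are illustrative, and the sign convention is an assumption: the coefficients here predict a sample as the sum of $a_i s_{n-i}$, which may differ from the convention in the standard.

```python
import math
import random

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorr(x, order):
    """Autocorrelation values for lags 0..order."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag)) for lag in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the normal equations; returns (a_1..a_order, final prediction error)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:  # degenerate input: stop and leave remaining coefficients at zero
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k
    return a[1:], err

def bandwidth_expand(a, gamma=0.994):
    """Scale a_i by gamma**i (i starts at 1)."""
    return [coef * gamma ** (i + 1) for i, coef in enumerate(a)]

def lpc_analysis(frame, order=10):
    """Window a 200-sample frame and return the bandwidth-expanded coefficients."""
    windowed = [w * x for w, x in zip(hamming(len(frame)), frame)]
    coeffs, _ = levinson_durbin(autocorr(windowed, order), order)
    return bandwidth_expand(coeffs)

# Hypothetical frame: a noisy sine, only to exercise the functions.
rng = random.Random(0)
demo_frame = [math.sin(0.3 * n) + 0.1 * rng.uniform(-1, 1) for n in range(200)]
demo_coeffs = lpc_analysis(demo_frame)
```

A quick sanity check of the recursion: the autocorrelation sequence 1, 0.5, 0.25 corresponds to a first-order process, so an order-2 solution should give $a_1 = 0.5$ and $a_2 = 0$.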

[Figure: block diagram with the blocks LPC Analysis, LPC Residual Signal, Final Pitch, Peakiness & Adjust Vbp, and Voice Strength]


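Peakiness Calculation

1. The peakiness of the linear prediction residual signal $r_n$ is defined as:

$$\text{peakiness} = \frac{\sqrt{\dfrac{1}{160}\displaystyle\sum_{n=1}^{160} r_n^2}}{\dfrac{1}{160}\displaystyle\sum_{n=1}^{160} |r_n|}$$

2. If the peakiness exceeds 1.34, $Vbp_1$ is set to 1. If the peakiness exceeds 1.6, $Vbp_i$ (i = 1, 2, 3) are all set to 1.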
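A minimal sketch of the peakiness measure and the voicing adjustment it triggers. The function names are illustrative, and the voicing strengths are assumed to be stored as a five-element list ordered from the lowest band to the highest.

```python
import math

def peakiness(residual):
    """Ratio of RMS value to mean absolute value over the residual window."""
    n = len(residual)
    mean_abs = sum(abs(x) for x in residual) / n
    if mean_abs == 0:
        return 0.0  # all-zero window: no peaks to measure
    rms = math.sqrt(sum(x * x for x in residual) / n)
    return rms / mean_abs

def adjust_voicing(vbp, peak):
    """Force low-band voicing strengths to 1 when the residual is peaky."""
    out = list(vbp)
    if peak > 1.34:
        out[0] = 1.0
    if peak > 1.6:
        out[:3] = [1.0, 1.0, 1.0]
    return out

flat = [1.0] * 160               # constant magnitude: peakiness is 1
impulse = [0.0] * 159 + [5.0]    # one pulse: peakiness is sqrt(160)
print(peakiness(flat), peakiness(impulse))
```

A constant-magnitude window gives the minimum value of 1, while a single pulse in 160 samples gives about 12.65, which is why a high peakiness points to voiced excitation.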


Final Pitch Calculation

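1. The residual signal is passed through a low-pass filter with a cutoff frequency of 1 kHz.
2. With $P_2$ as the reference, an integer pitch search is performed with the normalized autocorrelation over lags from 5 samples below to 5 samples above it.
3. The optimum integer pitch lag is found and passed to Fractional Pitch Refinement. The resulting values are the tentative final pitch $P_3$ and $r(P_3)$.
4. Only after the Pitch Doubling Check is the value accepted as the accurate final pitch.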


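[Figure: final pitch decision flowchart. Starting from $P_3 = \text{Fractional}(P_2)$, the flow compares $r(P_3)$ against 0.6 and 0.55 and $P_3$ against 100 (yes/no branches), selects a doubling threshold $D_{th}$ of 0.5, 0.75, 0.7, or 0.9 for the Doubling Check, or sets $P_3 = P_{avg}$, and then ends.]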


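Gain Calculation

1. Each frame of the input signal is divided into two subframes, for which $G_1$ and $G_2$ are computed.
2. The window length used for the gain calculation varies with $P_2$.
3. The gain of each subframe is the RMS value expressed in dB, computed as:

$$G_i = 10\log_{10}\left[0.01 + \frac{1}{L}\sum_{n=1}^{L} s_n^2\right]$$

4. The 0.01 term keeps the argument of the logarithm away from zero when the RMS value is very small. If the result is negative, it is set to 0.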
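The gain formula and its clamp can be sketched in a few lines. The function name is illustrative, and the window is assumed to be non-empty; how the window length $L$ is chosen is not shown.

```python
import math

def subframe_gain_db(samples):
    """10*log10(0.01 + mean square) over the window, clamped at 0 dB."""
    mean_square = sum(x * x for x in samples) / len(samples)
    gain = 10.0 * math.log10(0.01 + mean_square)
    return max(gain, 0.0)

silent = subframe_gain_db([0.0] * 180)    # 10*log10(0.01) = -20 dB, clamped to 0
loud = subframe_gain_db([100.0] * 180)    # mean square 10000, about 40 dB
print(silent, loud)
```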


QUANTIZATION

1. Quantization of Prediction Coefficients
2. Pitch Quantization
3. Gain Quantization
4. Bandpass Voicing Quantization
5. Fourier Magnitude Calculation and Quantization


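Quantization of Prediction Coefficients

1. The 10 linear prediction coefficients are converted to line spectrum frequencies (LSFs).
2. The 10 LSFs are arranged in ascending order with a separation of at least 50 Hz.
3. The LSF vector is quantized with a multi-stage vector quantizer (MSVQ).

[Figure: LSF vector $f$ -> 1st stage (128 levels) -> 2nd stage (64 levels) -> 3rd stage (64 levels) -> 4th stage (64 levels) -> quantized LSF vector $\hat{f}$]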


The algorithm finds the quantized vector $\hat{f}$, which, as seen in the figure above, is the sum of the vectors selected in each stage. The purpose of the MSVQ is to find the quantized vector that best represents the original LSF vector. To do so, the MSVQ finds the codebook vectors that minimize the square of the weighted Euclidean distance, $d^2$, between the original LSF vector $f$ and the quantized LSF vector $\hat{f}$:

$$d^2(f,\hat{f}) = \sum_{i=1}^{10} w_i\,(f_i - \hat{f}_i)^2$$

where $w_i$ are the weights.
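The idea of a multi-stage search can be sketched with toy codebooks. This is a simplified greedy search that keeps only the best vector at each stage; it is not claimed to be the search procedure of the standard, and the two tiny codebooks, the weights, and the function names are all made up for illustration.

```python
def weighted_sq_dist(f, f_hat, w):
    """Squared weighted Euclidean distance between two vectors."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, f, f_hat))

def msvq_greedy(f, codebooks, w):
    """Pick, stage by stage, the code vector that brings the running sum closest to f."""
    f_hat = [0.0] * len(f)
    indices = []
    for codebook in codebooks:
        def dist_with(i):
            candidate = [q + c for q, c in zip(f_hat, codebook[i])]
            return weighted_sq_dist(f, candidate, w)
        best = min(range(len(codebook)), key=dist_with)
        indices.append(best)
        f_hat = [q + c for q, c in zip(f_hat, codebook[best])]
    return indices, f_hat

# Toy 2-dimensional example with two stages of two vectors each (hypothetical values).
toy_codebooks = [
    [[0.0, 0.0], [10.0, 20.0]],   # coarse first stage
    [[0.0, 0.0], [1.0, -1.0]],    # second stage refines the first-stage error
]
toy_indices, toy_f_hat = msvq_greedy([11.0, 19.0], toy_codebooks, w=[1.0, 1.0])
print(toy_indices, toy_f_hat)
```

The transmitted information is the list of stage indices; the decoder rebuilds $\hat{f}$ by summing the selected vectors.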


Pitch Quantization

1. The final pitch value is uniformly quantized to 99 levels.
2. The quantized value maps to a 7-bit codeword in the quantization table.


Gain Quantization

1. Each frame has two gains, $G_1$ and $G_2$, which are uniformly quantized with 3 bits and 5 bits respectively.
2. $G_2$ ranges from 10 dB to 77 dB.
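A uniform quantizer for $G_2$ can be sketched as below. The 5-bit width and the 10 dB to 77 dB range come from the slide; placing the first and last levels exactly on the range limits is an assumption, as are the function names.

```python
def quantize_uniform(value, lo, hi, levels):
    """Return the index of the nearest of `levels` evenly spaced points in [lo, hi]."""
    step = (hi - lo) / (levels - 1)
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) / step + 0.5)

def dequantize_uniform(index, lo, hi, levels):
    """Map an index back to its reconstruction level."""
    step = (hi - lo) / (levels - 1)
    return lo + index * step

G2_LO_DB, G2_HI_DB, G2_LEVELS = 10.0, 77.0, 32   # 5 bits -> 32 levels

g2_index = quantize_uniform(40.0, G2_LO_DB, G2_HI_DB, G2_LEVELS)
g2_decoded = dequantize_uniform(g2_index, G2_LO_DB, G2_HI_DB, G2_LEVELS)
print(g2_index, g2_decoded)
```

With 32 levels over a 67 dB range the step is about 2.16 dB, so the reconstruction error of an in-range gain stays within about 1.08 dB.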


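Bandpass Voicing Quantization

1. If $Vbp_1 \le 0.6$, the frame is unvoiced and $Vbp_i$ (i = 2, 3, 4, 5) are quantized to 0.
2. If $Vbp_1 > 0.6$, each $Vbp_i$ (i = 2, 3, 4, 5) is quantized to 1 when $Vbp_i > 0.6$ and to 0 otherwise.
3. There is one exception: if the quantized $Vbp_i$ (i = 2, 3, 4, 5) are 0001, $Vbp_5$ is quantized to 0.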
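The three rules can be sketched as a small function. The function name and the five-element list layout (lowest band first) are assumptions; returning a bit for the first band is only for readability of the example.

```python
def quantize_bandpass_voicing(vbp, threshold=0.6):
    """Turn five voicing strengths into five 0/1 decisions."""
    if vbp[0] <= threshold:
        # Unvoiced frame: every band is marked unvoiced.
        return [0, 0, 0, 0, 0]
    bits = [1] + [1 if v > threshold else 0 for v in vbp[1:5]]
    if bits[1:] == [0, 0, 0, 1]:
        # Exception: an isolated voiced top band is cleared.
        bits[4] = 0
    return bits

print(quantize_bandpass_voicing([0.8, 0.7, 0.2, 0.7, 0.1]))  # prints [1, 1, 0, 1, 0]
```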


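Fourier Magnitude Calculation and Quantization

1. The quantized linear prediction parameters are first computed from the quantized LSF vector.
2. The residual signal is computed with the quantized linear prediction parameters.
3. A 200-sample Hamming window is applied, and the result is zero-padded and transformed with a 512-point Fast Fourier Transform (FFT).
4. The complex FFT output is converted to magnitudes.
5. A spectral peak-picking algorithm searches for the first 10 pitch harmonics.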


Decoder

Pitch Decoding

1. The 7-bit pitch code is decoded to determine whether a frame is voiced or unvoiced, or whether a frame erasure is indicated.
2. If the pitch code is all-zero or has only one bit set, the unvoiced mode is used. If two bits are set, a frame erasure is indicated. Otherwise, the pitch value is decoded and the voiced mode is used.
3. If an erasure is detected in the current frame, all of the parameters for the current frame are replaced with the parameters from the previous frame. In addition, the first gain term is set equal to the second gain term.
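The mode decision in step 2 depends only on how many bits of the 7-bit code are set, which can be sketched as follows. The function name and the returned strings are illustrative; decoding the actual pitch value from a voiced code requires the quantization table, which is not shown.

```python
def classify_pitch_code(code):
    """Classify a 7-bit pitch code by the number of bits that are set."""
    if not 0 <= code < 128:
        raise ValueError("pitch code must fit in 7 bits")
    ones = bin(code).count("1")
    if ones <= 1:
        return "unvoiced"   # all-zero or a single bit set
    if ones == 2:
        return "erasure"    # exactly two bits set
    return "voiced"         # three or more bits: decode the pitch value

for code in (0b0000000, 0b0000100, 0b0000101, 0b0000111):
    print(format(code, "07b"), classify_pitch_code(code))
```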


Parameter Interpolation

1. Only one set of parameters is transmitted per frame, but a frame may contain more than one pitch period. The MELP parameters are therefore interpolated pitch-synchronously during synthesis.


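Aperiodic Pulses

1. The MELP speech standard classifies speech into three states: voiced, unvoiced, and jittery voiced.
2. The aperiodic pulse excitation is used mainly at the transitions between voiced and unvoiced speech to synthesize jittery voiced speech. It lets the decoder produce erratic glottal pulses.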


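Mixed Pulse and Noise Excitation

1. A multi-band mixing model is used, built on an FIR band-pass filter bank divided into five frequency bands.
2. The filters that process the voiced component are collectively called the Pulse Shaping Filter Bank. The filters that process the unvoiced component are collectively called the Noise Shaping Filter Bank.
3. The two filter banks change according to how voiced or unvoiced each frequency band of the excitation is. The band-pass filters apply the pulse excitation to the voiced bands and the noise excitation to the unvoiced bands.
4. The sum of the five band signals is the mixed excitation. This approach greatly reduces the severe buzz of the traditional LPC parametric model.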


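Adaptive Spectral Enhancement

1. Synthesized speech decays faster than natural human speech, which causes distortion. The distortion stems from the LPC pole bandwidth.
2. To address the distortion, the mixed excitation signal is passed through an adaptive spectral enhancement filter after it is generated.
3. This filter is a 10th-order pole/zero enhancement filter, followed by a first-order FIR filter for compensation.
4. Its purpose is to reduce the mismatch between the formant bands and real speech, so that the formant resonances decay more slowly.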
