

Cepstral analysis synthesis on the mel frequency scale, and an adaptive algorithm for it.

Cecilia Caruncho Llaguno


Cepstral analysis synthesis on the mel frequency scale
– Satoshi Imai - Tokyo Institute of Technology, 1983

An adaptive algorithm for mel-cepstral analysis of speech
– Toshiaki Fukada - Canon Inc., Kawasaki, 1992
– Keiichi Tokuda, Takao Kobayashi, and Satoshi Imai - Tokyo Institute of Technology, 1992

Basic Concepts

Cepstral analysis
– Definition
– Features

Mel frequency scale

Cepstral analysis

Main features
– Good representation characteristics
– Accurate and efficient representation of the log spectral envelope
– Low coefficient sensitivity and small quantization noise
– Small spectral distortion
– LMA (log magnitude approximation) filter → high-quality speech synthesis

Cepstral analysis

Complex logarithm

Inverse Z transform

In unit circle |z|<1
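The chain on this slide (spectrum, complex logarithm, inverse Z transform) can be sketched numerically with the FFT. The example below is a minimal sketch that computes the real cepstrum (logarithm of the magnitude only, so no phase unwrapping is needed) instead of the complex cepstrum; the function name `real_cepstrum`, the FFT length and the test signal are illustrative assumptions.

```python
import numpy as np

def real_cepstrum(frame, n_fft=1024):
    """Hypothetical helper: inverse FFT of log|FFT(frame)|.

    Only the log magnitude is used, so this is the even part of the
    complex cepstrum; a small floor keeps log() finite at spectral nulls.
    """
    spectrum = np.fft.fft(frame, n_fft)
    log_magnitude = np.log(np.maximum(np.abs(spectrum), 1e-12))
    return np.fft.ifft(log_magnitude).real

# Minimum-phase test frame x = [1, 0.5]: log(1 + 0.5 z^-1) = 0.5 z^-1 - 0.125 z^-2 + ...
# The real cepstrum halves these (even part), i.e. about 0.25 and -0.0625.
c = real_cepstrum(np.array([1.0, 0.5]))
print(round(c[1], 4), round(c[2], 4))
```

Keeping only the first few (low-quefrency) coefficients and transforming back gives a smoothed log spectral envelope, which is the basic idea behind cepstral envelope extraction.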

Mel frequency scale

Human hearing perceives frequency on a non-linear scale

Linear up to 1000 Hz, logarithmic above.
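The slide describes the mel scale only qualitatively. One widely used analytic approximation, which is not taken from the papers cited here, is mel(f) = 2595 log10(1 + f/700): roughly linear below about 1 kHz and logarithmic above. The sketch below assumes that formula; the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Hypothetical helper: common mel approximation 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f, round(hz_to_mel(f)))
```

Going from 4 kHz to 8 kHz adds fewer mels than going from 0 to 1 kHz, which is the compressive behaviour described on the slide.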

Mel cepstral analysis system

Spectral envelope extraction by the improved cepstral method

Approximation of the mel scale

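$\tilde{\omega} = \tan^{-1}\dfrac{(1-\alpha^2)\sin\omega}{(1+\alpha^2)\cos\omega - 2\alpha}$

$G(\tilde{\omega}) = \sum_{m=0}^{\infty} c(m)\cos m\tilde{\omega}$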
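The warped frequency and the cosine-series envelope can be evaluated directly. This is an illustrative sketch under assumptions: `alpha` is the all-pass constant of $\tilde{z}^{-1} = (z^{-1}-\alpha)/(1-\alpha z^{-1})$ (0.42 below is an arbitrary illustrative value), `c` holds real mel-cepstral coefficients, and `np.arctan2` is used instead of a plain arctangent so the result stays on the branch [0, π]. The function names are hypothetical.

```python
import numpy as np

def warped_frequency(omega, alpha):
    """Hypothetical helper: phase response of z~^-1 = (z^-1 - alpha) / (1 - alpha z^-1).

    arctan2 picks the branch in [0, pi] for omega in [0, pi].
    """
    num = (1.0 - alpha**2) * np.sin(omega)
    den = (1.0 + alpha**2) * np.cos(omega) - 2.0 * alpha
    return np.arctan2(num, den)

def mel_log_envelope(c, omega, alpha):
    """Hypothetical helper: sum_m c(m) cos(m * warped omega) at linear frequencies omega."""
    w_tilde = warped_frequency(np.asarray(omega, dtype=float), alpha)
    m = np.arange(len(c))
    return np.cos(np.multiply.outer(w_tilde, m)) @ np.asarray(c, dtype=float)

alpha = 0.42
omega = np.linspace(0.0, np.pi, 5)
print(warped_frequency(omega, alpha))   # for alpha > 0 the low band is stretched
print(mel_log_envelope([0.1, 0.5, -0.2], omega, alpha))
```

With α = 0 the warping is the identity and the envelope reduces to an ordinary cepstral cosine series; a positive α stretches low frequencies, which is how the all-pass approximates the mel scale.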

Spectral envelope extraction by the improved cepstral method

Former method:
– Fine structure: the spectral envelope is not sufficiently separated from the pitch (fine) structure

Present method:
– Extracts the envelope without being affected by the fine structure

Mel Log Spectrum Approximation (MLSA) filter

Why do we use it?
– High quality
– Simple
– Low coefficient sensitivity
– Good quantization characteristics

– Transfer function
– Quantization of the filter parameters

MLSA transfer function

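$H(z) = \exp F(z)$

$\tilde{z}^{-1} = \dfrac{z^{-1} - \alpha}{1 - \alpha z^{-1}}, \quad |\alpha| < 1$

Basic filter: $F(z) = \sum_{m=0}^{M} c(m)\,\tilde{z}^{-m}$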

MLSA transfer function

$F(z) = b(0) + z^{-1} \sum_{m=1}^{M+1} b(m)\,\tilde{z}^{-(m-1)}$

The ideal MLSA filter $\exp F(z)$ is not realizable → Padé approximation of the exponential

b(m): recursive filter parameters
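A minimal sketch of the Padé idea: the [L/L] Padé approximant of exp(w) is R_L(w) = Σ A_{L,l} w^l / Σ A_{L,l} (−w)^l with A_{L,l} = (2L−l)! L! / ((2L)! l! (L−l)!). This is the textbook Padé form; the order L = 4 is an assumption (the slides do not state it), and the function names are hypothetical. In the MLSA filter the argument w is the filter F(z) itself, and the filter structure that realizes R_L(F(z)) is not shown here.

```python
from math import factorial
import cmath

def pade_exp_coefficients(L):
    """Hypothetical helper: A_{L,l} = (2L-l)! L! / ((2L)! l! (L-l)!), l = 0..L."""
    return [
        factorial(2 * L - l) * factorial(L)
        / (factorial(2 * L) * factorial(l) * factorial(L - l))
        for l in range(L + 1)
    ]

def pade_exp(w, L=4):
    """[L/L] Pade approximant of exp(w); also valid for complex w."""
    A = pade_exp_coefficients(L)
    num = sum(a * w**l for l, a in enumerate(A))
    den = sum(a * (-w) ** l for l, a in enumerate(A))
    return num / den

print(pade_exp_coefficients(4))
for w in (0.5, -1.0, 0.8j):
    print(w, abs(pade_exp(w) - cmath.exp(w)))   # approximation error
```

The error grows roughly like |w|^(2L+1), so the approximation is accurate only while |F(e^{jω})| stays moderate.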

Filter parameters

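$b(M+1) = \alpha\,c(M)$

$b(m) = c(m) + \alpha\,[\,c(m-1) - b(m+1)\,], \quad m = M, M-1, \ldots, 3, 2$

$b(1) = \dfrac{c(1) - \alpha\,b(2)}{1 - \alpha^2}$

$b(0) = c(0) - \alpha\,b(1)$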
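The recursion can be checked numerically. This sketch assumes the basic filter can be written both as F(z) = Σ_{m=0}^{M} c(m) z̃^{-m} and as F(z) = b(0) + z^{-1} Σ_{m=1}^{M+1} b(m) z̃^{-(m−1)}. It computes b(m) with b(M+1) = αc(M), b(m) = c(m) + α[c(m−1) − b(m+1)] for m = M, …, 2, b(1) = [c(1) − αb(2)]/(1 − α²), b(0) = c(0) − αb(1), and then compares the two forms at arbitrary complex points. The coefficients are random test data and the helper names are hypothetical; this is a consistency check, not the paper's implementation.

```python
import numpy as np

# Hypothetical helper names; a numerical check of the recursion only.
def mlsa_coefficients(c, alpha):
    """b(0..M+1) from c(0..M), M >= 1, via the backward recursion."""
    c = np.asarray(c, dtype=float)
    M = len(c) - 1
    if M < 1:
        raise ValueError("need at least c(0) and c(1)")
    b = np.zeros(M + 2)
    b[M + 1] = alpha * c[M]
    for m in range(M, 1, -1):                        # m = M, M-1, ..., 2
        b[m] = c[m] + alpha * (c[m - 1] - b[m + 1])
    b[1] = (c[1] - alpha * b[2]) / (1.0 - alpha**2)
    b[0] = c[0] - alpha * b[1]
    return b

def allpass_inv(z, alpha):
    """z~^{-1} = (z^{-1} - alpha) / (1 - alpha z^{-1})."""
    return (1.0 / z - alpha) / (1.0 - alpha / z)

def F_from_c(c, z, alpha):
    """F(z) = sum_{m=0}^{M} c(m) z~^{-m}."""
    w = allpass_inv(z, alpha)
    return sum(cm * w**m for m, cm in enumerate(c))

def F_from_b(b, z, alpha):
    """F(z) = b(0) + z^{-1} sum_{m=1}^{M+1} b(m) z~^{-(m-1)}."""
    w = allpass_inv(z, alpha)
    return b[0] + sum(b[m] * w ** (m - 1) for m in range(1, len(b))) / z

rng = np.random.default_rng(0)
c = rng.normal(scale=0.3, size=12)    # arbitrary test coefficients, M = 11
alpha = 0.35
b = mlsa_coefficients(c, alpha)
for z in (np.exp(0.7j), 1.3 * np.exp(2.1j)):
    print(abs(F_from_c(c, z, alpha) - F_from_b(b, z, alpha)))  # rounding-error level
```

In the second form every term after b(0) carries a pure delay z^{-1}, which is what allows the filter to be built recursively without delay-free loops.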

Data rate

Filter coefficients are bounded

Digitization with quantization step q → data amount b_s (bits/frame):

$b_s = \begin{cases} (M+2)(2 - \log_2 q) + 3, & M \le 9 \\ (M+2)(2 - \log_2 q) + 14, & M > 9 \end{cases}$

Data rate

Spectral envelope: $b_s$ bits/frame

Pitch parameter: $b_p$ bits/frame

Transmission period: T seconds

Overall bit rate of the system: B (bits/second)

$B = \dfrac{b_s + b_p}{T}$


Data rate

Speech quality

T (ms)   M    q      b_p (bits)   B (kbit/s)   Speech quality
15       11   0.25   7            4            Very high
20       8    0.5    7            2            Fairly good
25       5    0.5    6            1.2          Still good
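A quick arithmetic check of the data-rate formulas against the table, assuming q is the quantization step and using b_s = (M + 2)(2 − log2 q) + 3 for M ≤ 9: M = 8, q = 0.5 give b_s = 10 × 3 + 3 = 33 bits, so B = (33 + 7)/0.020 s = 2000 bit/s; M = 5, q = 0.5 give b_s = 24 bits and B = (24 + 6)/0.025 s = 1200 bit/s, matching the 2 and 1.2 kbit/s rows. The sketch implements only that branch; the function names are hypothetical.

```python
import math

def spectral_bits(M, q):
    """Hypothetical helper: b_s (bits/frame) = (M + 2)(2 - log2 q) + 3, for M <= 9 only."""
    if M > 9:
        raise ValueError("only the M <= 9 branch is implemented in this sketch")
    return (M + 2) * (2 - math.log2(q)) + 3

def bit_rate(b_s, b_p, T):
    """Overall bit rate B = (b_s + b_p) / T in bits/second (T in seconds)."""
    return (b_s + b_p) / T

for T_ms, M, q, b_p in ((20, 8, 0.5, 7), (25, 5, 0.5, 6)):
    b_s = spectral_bits(M, q)
    print(T_ms, M, q, b_s, bit_rate(b_s, b_p, T_ms / 1000) / 1000, "kbit/s")
```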

Spectral distortion

Distortion caused by the interpolation: $D_T$, a function of the transmission period T

Distortion caused by the quantization: $D_Q$, a function of the quantization step q and of M + 1


Spectral estimation based on mel-cepstral representation

Model spectrum

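$H(z) = \exp\sum_{m=0}^{M} c(m)\,\tilde{z}^{-m} = \exp\sum_{m=0}^{M} b(m)\,\Phi_m(z) = K \cdot D(z)$

$D(z) = \exp\sum_{m=1}^{M} b(m)\,\Phi_m(z)$

$K = \exp b(0)$

$c(m) = \begin{cases} b(m), & m = M \\ b(m) + \alpha\,b(m+1), & 0 \le m < M \end{cases}$

$\Phi_m(z) = \begin{cases} 1, & m = 0 \\ \dfrac{(1-\alpha^2)\,z^{-1}}{1 - \alpha z^{-1}}\,\tilde{z}^{-(m-1)}, & m \ge 1 \end{cases}$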



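$\varepsilon = \dfrac{1}{2\pi}\int_{-\pi}^{\pi} \dfrac{I_N(\omega)}{|D(e^{j\omega})|^2}\,d\omega$, where $I_N(\omega)$ is the periodogram of the windowed input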
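The relation between c(m) and b(m) on this slide can be inverted with a backward recursion, b(M) = c(M), b(m) = c(m) − αb(m+1). The sketch below assumes the basis Φ_0(z) = 1, Φ_m(z) = (1 − α²)z^{-1}/(1 − αz^{-1}) · z̃^{-(m−1)} for m ≥ 1 (the basis for which c(m) = b(m) + αb(m+1) holds), converts in both directions, and checks that Σ b(m)Φ_m(z) equals Σ c(m)z̃^{-m}; the gain is K = exp b(0). Helper names and test values are hypothetical.

```python
import numpy as np

# Hypothetical helper names; a sketch of the c(m) <-> b(m) relation.
def b_from_c(c, alpha):
    """b(M) = c(M), b(m) = c(m) - alpha * b(m+1) for m = M-1, ..., 0."""
    c = np.asarray(c, dtype=float)
    b = np.empty_like(c)
    b[-1] = c[-1]
    for m in range(len(c) - 2, -1, -1):
        b[m] = c[m] - alpha * b[m + 1]
    return b

def c_from_b(b, alpha):
    """c(M) = b(M), c(m) = b(m) + alpha * b(m+1) for 0 <= m < M."""
    b = np.asarray(b, dtype=float)
    c = b.copy()
    c[:-1] += alpha * b[1:]
    return c

def allpass_inv(z, alpha):
    """z~^{-1} = (z^{-1} - alpha) / (1 - alpha z^{-1})."""
    return (1.0 / z - alpha) / (1.0 - alpha / z)

def phi(m, z, alpha):
    """Phi_0 = 1; Phi_m = (1 - alpha^2) z^-1 / (1 - alpha z^-1) * z~^{-(m-1)}."""
    if m == 0:
        return 1.0 + 0.0j
    return (1.0 - alpha**2) / (z - alpha) * allpass_inv(z, alpha) ** (m - 1)

rng = np.random.default_rng(0)
c = rng.normal(scale=0.3, size=9)      # arbitrary test mel-cepstrum, M = 8
alpha, z = 0.42, np.exp(0.9j)
b = b_from_c(c, alpha)
lhs = sum(cm * allpass_inv(z, alpha) ** m for m, cm in enumerate(c))
rhs = sum(bm * phi(m, z, alpha) for m, bm in enumerate(b))
K = np.exp(b[0])                        # gain term of H(z) = K * D(z)
print(abs(lhs - rhs), K)
```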

Spectral estimation based on mel-cepstral representation

Unbiased estimator of log spectrum (UELS) by S. Imai and C. Furuichi → the coefficients are obtained by minimizing ε

Spectral estimation based on mel-cepstral representation

Newton-Raphson method:

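$\mathbf{H}\,\Delta\mathbf{b}^{(i)} = -\nabla_{\mathbf{b}}\,\varepsilon\big|_{\mathbf{b}=\mathbf{b}^{(i)}}$, where H is the Hessian matrix of ε

$\mathbf{b}^{(i)} = [\,b^{(i)}(1), b^{(i)}(2), \ldots, b^{(i)}(M)\,]^T$

$\mathbf{b}^{(i+1)} = \mathbf{b}^{(i)} + \Delta\mathbf{b}^{(i)}$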

Adaptive mel-cepstral analysis algorithm

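$\mathbf{b}^{(i+1)} = \mathbf{b}^{(i)} - \mu\,\nabla_{\mathbf{b}}\,\varepsilon\big|_{\mathbf{b}=\mathbf{b}^{(i)}}$

$\varepsilon = E[e^2(n)]$

$\mu(n) = \dfrac{a}{M\,\varepsilon(n)}, \quad 0 < a < 1$

H → unit matrix: the Newton-Raphson step becomes a steepest-descent step

μ: adaptation step size

ε(n): estimate of ε at time n

e(n): output of the inverse filter 1/D(z) at time n, used to update the estimate:

$\varepsilon(n) = \lambda\,\varepsilon(n-1) + (1-\lambda)\,e^2(n), \quad 0 \le \lambda < 1$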

Adaptive mel-cepstral analysis algorithm

$\mathbf{b}(n+1) = \mathbf{b}(n) - \mu(n)\,\nabla\varepsilon(n)$
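One sample of the adaptive update can be sketched as follows. Assumptions not spelled out on the slides: the gradient of ε(n) is tracked with the same exponential weighting as ε(n), giving ∇ε(n) = λ∇ε(n−1) − 2(1−λ)e(n)e_Φ(n), where e_m(n) is e(n) filtered by Φ_m(z) = (1 − α²)z^{-1}/(1 − αz^{-1}) · z̃^{-(m−1)}; and μ(n) = a/(Mε(n)). The class name `AdaptiveMcepState`, the defaults a = 0.1, λ = 0.9 and the floor `eps_floor` are hypothetical. Producing e(n) requires the MLSA inverse filter 1/D(z), which is not shown, so the sketch takes e(n) as input.

```python
import numpy as np

class AdaptiveMcepState:
    """Hypothetical helper: state for a sample-by-sample mel-cepstral update.

    Assumes e(n), the output of the inverse filter 1/D(z), is computed elsewhere.
    """

    def __init__(self, M, alpha, a=0.1, lam=0.9, eps_floor=1e-8):
        self.M, self.alpha, self.a, self.lam = M, alpha, a, lam
        self.eps_floor = eps_floor      # hypothetical guard against dividing by ~0
        self.b = np.zeros(M + 1)        # b(0..M); b(0) (gain) is not adapted here
        self.eps = 0.0                  # running estimate eps(n)
        self.grad = np.zeros(M)         # running gradient estimate w.r.t. b(1..M)
        self.e_prev = 0.0               # e(n-1)
        self.e_phi = np.zeros(M)        # e_m(n-1), m = 1..M

    def filter_phi(self, e):
        """Return e_m(n) = Phi_m(z) applied to e(n), m = 1..M, one sample at a time."""
        al, old = self.alpha, self.e_phi
        new = np.empty(self.M)
        # Phi_1(z) = (1 - al^2) z^-1 / (1 - al z^-1)
        new[0] = al * old[0] + (1.0 - al**2) * self.e_prev
        # Phi_m(z) = Phi_{m-1}(z) * (z^-1 - al) / (1 - al z^-1)
        for k in range(1, self.M):
            new[k] = old[k - 1] - al * new[k - 1] + al * old[k]
        self.e_phi, self.e_prev = new, e
        return new

    def step(self, e):
        """One update b(n+1) = b(n) - mu(n) * grad eps(n) from the sample e(n)."""
        e_phi = self.filter_phi(e)
        lam = self.lam
        self.eps = lam * self.eps + (1.0 - lam) * e * e
        # d e(n) / d b(m) = -e_m(n), hence d e^2 / d b(m) = -2 e(n) e_m(n)
        self.grad = lam * self.grad - 2.0 * (1.0 - lam) * e * e_phi
        mu = self.a / (self.M * max(self.eps, self.eps_floor))
        self.b[1:] -= mu * self.grad
        return self.b

state = AdaptiveMcepState(M=2, alpha=0.3)
for e in (1.0, 0.5, -0.2):
    print(state.step(e))
```

With α = 0 the basis reduces to plain delays (Φ_m(z) = z^{-m}), which is a convenient sanity check for the filter bank.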


MLSA
– Simple
– Good statistical features
– Small spectral distortion

Adaptive algorithm
– Computationally efficient
– Fast convergence


