LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH Speech Lab., CM, NCTU Chen Yu Chiang 2007/2/8


LATENT PROSODY MODELS OF CONTINUOUS MANDARIN SPEECH

Speech Lab., CM, NCTU

Chen Yu Chiang

2007/2/8

Outline

• Introduction
• Base Latent Prosody Models (LPM)
  • A Statistical Syllable Duration Model
  • A Statistical Syllable Pitch Contour Model
• Automatic Prosody Labeling based on LPM
• Summary

Introduction (1/11)

What is prosody?
• Prosody is an inherent suprasegmental feature of human speech.
• It carries the stress, intonation patterns and timing structures of continuous speech, which determine the naturalness and understandability of an utterance.

Introduction (2/11)

From the listener's point of view, prosody consists of the systematic perception and recovery of a speaker's intentions based on:
• Pause: indicates phrases and avoids running out of air.
• Pitch: rate of vocal-fold cycling (fundamental frequency, or F0) as a function of time.
• Rate/relative duration: phoneme durations, timing, and rhythm.
• Loudness (energy): relative amplitude/volume.

For simplicity, we may say “抑, 揚, 頓, 挫, 輕, 重, 緩, 急” (falling, rising, pausing, breaking, light, heavy, slow, fast).

Introduction (3/11)

Factors affecting prosody
• Linguistic: lexical, syntactic, semantic, pragmatic
• Para-linguistic: intentional, attitudinal, stylistic
• Non-linguistic: physical, emotional

Introduction (4/11)

Issues in prosody modeling
• Labeling of important prosodic cues
• Construction of a prosody hierarchy
• Modeling of the syntax-prosody relationship
• Prediction of prosodic phrase boundaries (breaks) from text, etc.

Applications
• Automatic Speech Recognition (ASR): important prosodic cues can be extracted from the input utterance to assist in both acoustic and linguistic decoding.
• Text-to-Speech (TTS): a good prosody model can be used to generate appropriate prosodic features from the input text.

Introduction (5/11)

Important characteristics of Mandarin Chinese
• A tonal language (four lexical tones and one neutral tone)
  • The tonality of a monosyllable is mainly characterized by the shape of its fundamental frequency (F0) contour.
• A syllable-based language (411 base-syllables)

Introduction (6/11)

• Syllable duration is also strongly affected by the phonetic structure of base-syllables.
• Generally speaking, syllable duration increases as the number of constituent phonemes increases.
• For example:
  • Syllables with single vowels are shortest.
  • Syllables with stop initials or no initials, and without nasal endings, are shorter.
  • Syllables with fricative initials and with nasal endings are longer.

Introduction (7/11)

• Standard tone patterns
• Effect of context and intonation

Introduction (8/11)

Because Mandarin is a tonal language, Mandarin speech shows a tight interaction among the four lexical tones, the neutral tone, the base-syllable types and the underlying speech prosody/intonation.

Introduction (9/11)

• To find the underlying prosody/intonation structure, we propose the Latent Prosody Models (LPM).
• LPM considers several companding factors (CFs), or affecting factors, on syllable pitch contour and syllable duration, including tone, initial-final type, base-syllable type and prosodic state.
• The prosodic state (treated as a latent variable) is conceptually defined as the state of a syllable in a prosodic phrase and is used as a substitute for high-level linguistic information, such as a word, phrase or syntactic boundary.
• Use of an unlabeled database

Introduction (10/11)

LPMs are formulated on the assumption that all affecting factors are combined additively or multiplicatively:

$$ Z_n = X_n + \gamma_{t_n} + \gamma_{y_n} + \gamma_{j_n} + \gamma_{l_n} + \gamma_{s_n} \quad \text{(additive)} $$

$$ Z_n = X_n \, \gamma_{t_n} \gamma_{y_n} \gamma_{j_n} \gamma_{l_n} \gamma_{s_n} \quad \text{(multiplicative)} $$

• $Z_n$: observed prosodic feature vector
• $X_n$: normalized feature vector
• $\gamma_{t_n}, \gamma_{y_n}, \gamma_{j_n}, \gamma_{l_n}, \gamma_{s_n}$: affecting factors

Introduction (11/11)

• The main purpose of using the prosodic state in place of conventional high-level linguistic information is to separate the effects of low-level and high-level linguistic features on speech.
• This modeling approach avoids some unsolved problems, such as the inconsistency between prosodic and syntactic structures and the ambiguity of word segmentation and word chunking in Mandarin Chinese.
• Hence, based on the LPM, the proposed prosody labeling model can focus on modeling the global effect of mapping high-level linguistic features to prosodic states and break indices, since the interference caused by low-level linguistic features has been removed by the LPM.

References

1. Sin-Horng Chen, Wen-Hsing Lai and Yih-Ru Wang, “A new duration modeling approach for Mandarin speech”, IEEE Transactions on Speech and Audio Processing, vol. 11, no. 4, Jul. 2003, pp. 308-320.

2. Sin-Horng Chen, Wen-Hsing Lai and Yih-Ru Wang, “A statistics-based pitch contour model for Mandarin speech”, J. Acoust. Soc. Am., 117(2), Feb. 2005, pp. 908-925.

3. Chen-Yu Chiang, Yih-Ru Wang and Sin-Horng Chen, “On the inter-syllable coarticulation effect of pitch modeling for Mandarin speech”, INTERSPEECH-2005, pp. 3269-3272.

4. Chen-Yu Chiang, Xiao-Dong Wang, Yuan-Fu Liao, Yih-Ru Wang, Sin-Horng Chen and Keikichi Hirose, “Latent prosody model of continuous Mandarin speech”, ICASSP 2007.

Base Latent Prosody Models (LPM)

• A Statistical Syllable Duration Model
• A Statistical Syllable Pitch Contour Model

A Statistical Syllable Duration Model

• In ASR, state duration models are constructed to assist recognition.
• In TTS, synthesis of proper duration information is essential for natural speech.
• An extension includes the modeling of initial and final durations.
• Multiplicative and additive models are compared.

The Multiplicative Duration Model

$$ Z_n = X_n \, \gamma_{t_n} \gamma_{y_n} \gamma_{j_n} \gamma_{l_n} \gamma_{s_n} $$

• $Z_n$: observed duration of the nth syllable
• $X_n$: normalized duration of the nth syllable
• $\gamma_r$: companding factor (CF) of affecting factor $r$
• $t_n$: lexical tone of the nth syllable
• $y_n$: prosodic state of the nth syllable
• $j_n$: base-syllable of the nth syllable
• $l_n$: utterance of the nth syllable
• $s_n$: speaker of the nth syllable
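The multiplicative model can be illustrated with a few lines of Python. This is only a sketch: the tone CFs and the prosodic-state CF (1.69 for state 15) are the values reported in the CF analysis slides, while the base-syllable, utterance and speaker CFs are set to 1.0 purely as placeholders, and the function names are hypothetical.

```python
import numpy as np

# Tone CFs of the multiplicative syllable-duration model (tones 1-5), from the slides.
TONE_CF = {1: 1.00, 2: 1.02, 3: 0.99, 4: 1.03, 5: 0.84}

def observed_duration(x_norm, factors):
    """Multiplicative model: Z_n = X_n times the product of all companding factors."""
    return x_norm * float(np.prod(factors))

def normalized_duration(z_obs, factors):
    """Inverse mapping: X_n = Z_n divided by the product of all companding factors."""
    return z_obs / float(np.prod(factors))

# Hypothetical syllable: neutral tone (tone 5) in prosodic state 15.
# Base-syllable, utterance and speaker CFs are assumed to be 1.0 here.
factors = [TONE_CF[5], 1.69, 1.0, 1.0, 1.0]
z = observed_duration(42.34, factors)   # 42.34 frames: normalized training mean
x = normalized_duration(z, factors)     # recovers the normalized duration
```

Dividing out the CFs is what turns a widely spread observed duration into a normalized duration with a much smaller variance.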

Training of the Model (1/2)

Expectation-Maximization (EM) algorithm

• $N$: the total number of training samples
• $Y$: the total number of prosodic states
• $\lambda = \{\mu, \nu, \gamma_t, \gamma_y, \gamma_j, \gamma_l, \gamma_s\}$: the set of parameters to be estimated
• Auxiliary function in the E-step ($\lambda$: new set, $\bar{\lambda}$: old set):

$$ Q(\lambda, \bar{\lambda}) = \sum_{n=1}^{N} \sum_{y_n=1}^{Y} p(y_n \mid Z_n, \bar{\lambda}) \, \log p(Z_n, y_n \mid \lambda) $$

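Training of the Model (2/2)

Assumption: $X_n$ follows a normal distribution with mean $\mu$ and variance $\nu$.

$$ p(y_n \mid Z_n, \bar{\lambda}) = \frac{p(Z_n, y_n \mid \bar{\lambda})}{\sum_{y_n'=1}^{Y} p(Z_n, y_n' \mid \bar{\lambda})} $$

$$ p(Z_n, y_n \mid \lambda) = N\!\left(Z_n;\ \mu \, \gamma_{t_n} \gamma_{y_n} \gamma_{j_n} \gamma_{l_n} \gamma_{s_n},\ \nu \, \gamma_{t_n}^2 \gamma_{y_n}^2 \gamma_{j_n}^2 \gamma_{l_n}^2 \gamma_{s_n}^2\right) $$

Sequential optimizations are used in the M-step.

Assign prosodic state:

$$ y_n^* = \arg\max_{y_n} \, p(y_n \mid Z_n, \lambda) $$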
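The E-step posterior and the final hard assignment of a prosodic state can be sketched as follows. This is an illustrative sketch rather than the authors' implementation: it evaluates the Gaussian of the multiplicative model for each of the 16 states and normalizes over states, as in the posterior formula. The state CFs are the multiplicative syllable values from the slides; `gamma_other` (the product of the tone, base-syllable, utterance and speaker CFs) and the example inputs are assumptions.

```python
import numpy as np

# Multiplicative prosodic-state CFs for syllable duration (states 0-15), from the slides.
GAMMA_STATE = np.array([0.56, 0.72, 0.79, 0.84, 0.89, 0.91, 0.95, 0.98,
                        1.00, 1.02, 1.05, 1.09, 1.14, 1.22, 1.33, 1.69])

def state_posteriors(z, mu, var, gamma_other=1.0, gamma_state=GAMMA_STATE):
    """Posterior p(y | Z_n) over prosodic states for one observed duration z.

    mu, var: mean and variance of the normalized duration X_n.
    gamma_other: product of all CFs other than the prosodic-state CF (assumed known).
    """
    scale = gamma_state * gamma_other
    mean = mu * scale                 # mean of Z_n under each candidate state
    variance = var * scale ** 2       # variance of Z_n under each candidate state
    log_joint = -0.5 * (np.log(2.0 * np.pi * variance) + (z - mean) ** 2 / variance)
    log_joint -= log_joint.max()      # shift for numerical stability
    post = np.exp(log_joint)
    return post / post.sum()

# Hypothetical example: a 60-frame neutral-tone syllable (tone CF 0.84).
posterior = state_posteriors(z=60.0, mu=42.34, var=2.52, gamma_other=0.84)
best_state = int(np.argmax(posterior))   # hard assignment y_n*
```

Working in the log domain avoids underflow, since the normalized variance is small and most states have negligible likelihood for a given syllable.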

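The Additive Duration Model

Model:

$$ Z_n = X_n + \gamma_{t_n} + \gamma_{y_n} + \gamma_{j_n} + \gamma_{l_n} + \gamma_{s_n} $$

Auxiliary function:

$$ Q(\lambda, \bar{\lambda}) = \sum_{n=1}^{N} \sum_{y_n=1}^{Y} p(y_n \mid Z_n, \bar{\lambda}) \, \log p(Z_n, y_n \mid \lambda) + (\text{constraint term}) $$

[The constraint term is a sum over $n = 1, \dots, N$ involving the CFs $\gamma_{t_n}, \gamma_{y_n}, \gamma_{j_n}, \gamma_{l_n}, \gamma_{s_n}$; its exact form is garbled in the source.]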

Experimental Database (1/2)

MIC: a high-quality, reading-style microphone-speech database
• MIC-sent: 455 phonetically balanced sentential utterances
• MIC-para: 300 paragraphic utterances
• Training: 102,529 syllables
• Testing: 22,109 syllables
• 20 kHz sampling rate, downsampled to 8 kHz
• 1 frame = 5 ms

Experimental Database (2/2)

Data Set   Speaker    Sentence   Paragraph   Syllables
Training   Male A     1-455      1-200       34670
Training   Female B   1-455      1-50        12945
Training   Male C     1-455      1-100       20748
Training   Female D   1-455      1-200       34166
Testing    Female E   None       201-300     22109

Experimental Results (1/7)

            Training set                                   Testing set
            Mean                    Variance               Mean                    Variance
Syllable    44.31 (42.34) “43.89”   180.17 (2.52) “2.53”   41.08 (44.77) “43.77”   136.26 (4.44) “3.97”
Initial     17.21 (16.63) “17.20”   62.28 (0.74) “0.78”    13.83 (18.36) “17.05”   40.02 (5.92) “1.73”
Final       31.75 (31.50) “31.44”   117.06 (2.12) “1.84”   30.94 (33.90) “31.38”   104.15 (3.40) “2.85”

(units: mean in frames and variance in frames²; 1 frame = 5 ms)

Plain values: observed durations
( ): normalized durations in the multiplicative model with 16 prosodic states
“ ”: normalized durations in the additive model with 16 prosodic states

Experimental Results (2/7)

[Figure: Histograms of observed (left) and normalized (right) syllable duration in the multiplicative model for the training set. x-axis: duration (frames); y-axis: number.]

Experimental Results (3/7)

Analyses of CFs

CFs for tones

tone   1      2      3      4      5
CF     1.00   1.02   0.99   1.03   0.84

CFs for prosodic states (first value: multiplicative model; second value: additive model)

state   syllable         initial          final
0       0.56 / -16.07    0.30 / -11.20    0.50 / -14.28
1       0.72 / -12.36    0.49 / -6.82     0.68 / -10.24
2       0.79 / -9.69     0.63 / -6.22     0.75 / -7.94
3       0.84 / -7.71     0.71 / -4.98     0.80 / -6.45
4       0.89 / -5.79     0.80 / -3.82     0.84 / -5.15
5       0.91 / -4.70     0.85 / -3.60     0.87 / -4.24
6       0.95 / -3.14     0.86 / -2.92     0.91 / -2.99
7       0.98 / -1.94     0.89 / -2.49     0.95 / -1.73
8       1.00 / 0.00      0.96 / -1.40     0.98 / -0.86
9       1.02 / 0.12      1.00 / -0.41     1.00 / 0.00
10      1.05 / 1.69      1.04 / 0.00      1.02 / 0.73
11      1.09 / 4.10      1.09 / 0.89      1.08 / 3.12
12      1.14 / 5.87      1.12 / 1.39      1.14 / 5.10
13      1.22 / 9.65      1.19 / 3.56      1.24 / 8.50
14      1.33 / 15.08     1.30 / 6.03      1.40 / 13.42
15      1.69 / 28.74     1.61 / 12.69     1.86 / 25.49

Experimental Results (4/7)

用14* 百14 子9 蓮15*、蕾11 絲4 花15*、姬7 百11 合15*、龍13 膽15*、土5 耳9 其11 桔10 梗13* 和14* 蒜4 香1 藤12* 為4 材15*,以14* 維4 納6 斯13* 執8 壺2 的14* 石10 膏13* 花3 器14* 烘10 托15*,好4 一2 趟11* 春4 雨14* 濛3 濛10 的15* 郊9 外14* 田4 野9 風13 光15*。

(Approximate gloss: "Using agapanthus, lace flower, star lily, gentian, lisianthus and garlic vine as materials, set off by a plaster vase of Venus holding a jug: what a scene of suburban fields in misty spring rain.")

Examples of prosodic state labeling. The number after each syllable is its prosodic state; * denotes a word boundary.

Experimental Results (5/7)

[Figure: Decision tree of base-syllable CFs for the syllable duration model. The number associated with a node is the mean of the CFs belonging to the cluster. Solid lines indicate a positive answer and dashed lines a negative answer. Node questions include: {b, d, g}?, single vowel, compound vowel, open vowel, {f, s, sh, shi, h}, {ts, ch, chi}.]

Experimental Results (6/7)

[Figure: Decision tree of base-syllable CFs for the initial duration model. Node questions include: null initial, {b, d, g}, {p, t, k}, {ts, ch, chi}, {f, s, sh, shi, h}, single vowel, with medial, vowel begins with {i}, vowel begins with {u}.]

Experimental Results (7/7)

[Figure: Decision tree of base-syllable CFs for the final duration model. Node questions include: null initial, single vowel, compound vowel, with medial, vowel begins with {i}, {m, n, l, r}, {b, d, g}, {ts, ch, chi}.]

A Statistical Syllable Pitch Contour Model (1/7)

• Mandarin is a tonal language. Tonal information is carried by the pitch contour.
• Pitch contour patterns in continuous speech vary widely and can deviate dramatically from their canonical forms.
• An utterance's pitch contour is separated into a global-trend pitch mean model and a locally varying shape model.
• A quantitative description of the coarticulation effect is given.

A Statistical Syllable Pitch Contour Model (2/7)

Gaussian normalization

$$ \tilde{f}(t) = \frac{f(t) - \mu_k}{\sigma_k} \, \sigma_{all} + \mu_{all} $$

• $f(t)$: original pitch period of frame $t$
• $\mu_k$: mean of speaker $k$
• $\sigma_k$: standard deviation of speaker $k$
• $\tilde{f}(t)$: normalized pitch period of frame $t$
• $\mu_{all}$: averaged mean of all speakers
• $\sigma_{all}$: averaged standard deviation of all speakers
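A minimal sketch of the Gaussian normalization step is given below. It assumes that each speaker's pitch periods are available as a 1-D array, and that the averaged mean and averaged standard deviation are simple averages of the per-speaker statistics; the function name and the toy data are hypothetical.

```python
import numpy as np

def gaussian_normalize(pitch_by_speaker):
    """Map every speaker's pitch periods onto a common mean and standard deviation.

    pitch_by_speaker: dict mapping a speaker id to a 1-D sequence of pitch periods (ms).
    """
    stats = {k: (np.mean(f), np.std(f)) for k, f in pitch_by_speaker.items()}
    mu_all = np.mean([m for m, _ in stats.values()])      # averaged mean of all speakers
    sigma_all = np.mean([s for _, s in stats.values()])   # averaged std of all speakers
    return {
        k: (np.asarray(f, dtype=float) - stats[k][0]) / stats[k][1] * sigma_all + mu_all
        for k, f in pitch_by_speaker.items()
    }

# Toy data: one speaker with long pitch periods, one with short pitch periods.
raw = {"A": [8.0, 9.0, 10.0, 11.0], "B": [4.0, 5.0, 6.0, 5.0]}
normalized = gaussian_normalize(raw)
```

After normalization every speaker has the same mean and standard deviation, so the remaining variation reflects tone and prosody rather than the speaker's pitch range.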

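A Statistical Syllable Pitch Contour Model (3/7)

Discrete orthogonal polynomial basis functions (discrete Legendre polynomials), for $0 \le i \le M$ and $M \ge 3$:

$$ \phi_0\!\left(\tfrac{i}{M}\right) = 1 $$

$$ \phi_1\!\left(\tfrac{i}{M}\right) = \left[\frac{12M}{M+2}\right]^{1/2} \left[\frac{i}{M} - \frac{1}{2}\right] $$

$$ \phi_2\!\left(\tfrac{i}{M}\right) = \left[\frac{180M^3}{(M-1)(M+2)(M+3)}\right]^{1/2} \left[\left(\frac{i}{M}\right)^2 - \frac{i}{M} + \frac{M-1}{6M}\right] $$

$$ \phi_3\!\left(\tfrac{i}{M}\right) = \left[\frac{2800M^5}{(M-1)(M-2)(M+2)(M+3)(M+4)}\right]^{1/2} \left[\left(\frac{i}{M}\right)^3 - \frac{3}{2}\left(\frac{i}{M}\right)^2 + \frac{6M^2-3M+2}{10M^2}\,\frac{i}{M} - \frac{(M-1)(M-2)}{20M^2}\right] $$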

A Statistical Syllable Pitch Contour Model (4/7)

Parameterized pitch contour

$$ \hat{f}\!\left(\tfrac{i}{M}\right) = \sum_{j=0}^{3} a_j \, \phi_j\!\left(\tfrac{i}{M}\right), \quad 0 \le i \le M $$

$$ a_j = \frac{1}{M+1} \sum_{i=0}^{M} f\!\left(\tfrac{i}{M}\right) \phi_j\!\left(\tfrac{i}{M}\right) $$
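The orthogonal expansion can be checked numerically. The sketch below builds the four discrete Legendre basis functions for a contour of M+1 frames, projects a contour onto them, and reconstructs it. The closed-form normalization constants are the standard discrete Legendre ones used for this kind of pitch parameterization and are assumed here to match the slide; the function names and the test contour are hypothetical.

```python
import numpy as np

def legendre_basis(M):
    """Return a (4, M+1) array with phi_0..phi_3 sampled at i/M, i = 0..M (requires M >= 3)."""
    x = np.arange(M + 1) / M
    phi0 = np.ones(M + 1)
    phi1 = np.sqrt(12 * M / (M + 2)) * (x - 0.5)
    phi2 = np.sqrt(180 * M**3 / ((M - 1) * (M + 2) * (M + 3))) * (
        x**2 - x + (M - 1) / (6 * M))
    phi3 = np.sqrt(2800 * M**5 / ((M - 1) * (M - 2) * (M + 2) * (M + 3) * (M + 4))) * (
        x**3 - 1.5 * x**2 + (6 * M**2 - 3 * M + 2) / (10 * M**2) * x
        - (M - 1) * (M - 2) / (20 * M**2))
    return np.vstack([phi0, phi1, phi2, phi3])

def encode(f):
    """Coefficients a_0..a_3 of a pitch contour f sampled at M+1 frames."""
    M = len(f) - 1
    return legendre_basis(M) @ np.asarray(f, dtype=float) / (M + 1)

def decode(a, M):
    """Reconstruct the smoothed contour from its four coefficients."""
    return np.asarray(a, dtype=float) @ legendre_basis(M)

# Hypothetical contour of 21 frames (pitch period in ms) that is exactly cubic.
M = 20
x = np.arange(M + 1) / M
contour = 7.0 + 0.8 * x - 1.5 * x**2 + 0.6 * x**3
coeffs = encode(contour)          # a_0 is the mean, a_1..a_3 describe the shape
reconstructed = decode(coeffs, M)
```

Because the basis is orthonormal under the 1/(M+1) inner product, a_0 equals the contour mean, which is why the model can treat the pitch mean and the pitch shape (a_1, a_2, a_3) separately.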

A Statistical Syllable Pitch Contour Model (5/7)

Pitch mean modeling

$$ Z_n = (Y_n + \beta_{s_n}) \, \gamma_{s_n} $$

• $Z_n$: observed log-pitch mean
• $\beta_{s_n}$: speaker's level shift CF
• $\gamma_{s_n}$: speaker's dynamic range change CF
• $Y_n$: speaker-compensated log-pitch mean

A Statistical Syllable Pitch Contour Model (6/7)

$$ Y_n = X_n + \beta_{t_n} + \beta_{pt_n} + \beta_{ft_n} + \beta_{i_n} + \beta_{f_n} + \beta_{p_n} $$

• $X_n$: normalized log-pitch mean of the nth syllable
• $\beta_r$: CF of affecting factor $r$
• $t_n$: current lexical tone of the nth syllable
• $p_n$: prosodic state of the nth syllable
• $pt_n$: previous lexical tone of the nth syllable
• $ft_n$: following lexical tone of the nth syllable
• $i_n$: initial class of the nth syllable
• $f_n$: final class of the nth syllable

A Statistical Syllable Pitch Contour Model (7/7)

Pitch shape modeling

$$ \mathbf{Z}_n = \mathbf{X}_n + \mathbf{b}_{tc_n} + \mathbf{b}_{q_n} + \mathbf{b}_{s_n} + \mathbf{b}_{i_n} + \mathbf{b}_{f_n} $$

• $\mathbf{Z}_n = [a_1 \ a_2 \ a_3]^T$: observed pitch shape vector of the nth syllable
• $\mathbf{X}_n$: normalized pitch shape vector of the nth syllable
• $\mathbf{b}_r$: CF vector for affecting factor $r$
• $tc_n$: lexical tone combination of the nth syllable (pause < 13 frames: tight coupling effect; pause ≥ 13 frames: loose coupling)
• $q_n$: prosodic state of pitch shape

Experimental Results (1/6)

Observed log-pitch (unit of pitch period: ms)

                   Training set   Test set
Mean: mean         1.949          1.948
Mean: variance     0.0372         0.0345

Shape (× 0.01), mean vector and covariance matrix (magnitudes as recovered from the slide; any minus signs were lost in extraction):

Training set
  mean:        [3.545  0.289  0.650]^T
  covariance:  85.055   3.922   5.041
                3.922   9.176   0.601
                5.041   0.601   2.009

Test set
  mean:        [4.210  0.947  0.241]^T
  covariance:  94.984   3.356   4.700
                3.356  21.064   0.672
                4.700   0.672   4.653

Experimental Results (2/6)

Normalized log-pitch with 16 prosodic states (unit of pitch period: ms)

                   Training set   Test set
Mean: mean         1.948          1.948
Mean: variance     0.000402       0.000344
Mean: RMSE         0.0203         0.0183

Shape (× 0.01), mean vector, covariance matrix and RMSE (magnitudes as recovered from the slide; any minus signs were lost in extraction):

Training set
  mean:        [3.066  0.699  0.401]^T
  covariance:   9.568   0.453   0.670
                0.453   1.709   0.232
                0.670   0.232   1.152
  RMSE:        [3.341  1.183  1.021]^T

Test set
  mean:        [3.168  0.609  0.580]^T
  covariance:  21.588   0.559   1.370
                0.559   3.101   0.808
                1.370   0.808   2.362
  RMSE:        [3.306  1.267  1.505]^T

Experimental Results (3/6)

[Figure: Histograms of observed (left) and normalized (right) log-pitch mean for the training set. x-axis: pitch mean; y-axis: number.]

Experimental Results (4/6)

Examples of the reconstructed pitch contours

Inside test: “在國人消費習慣改變,國民所得提高,信用貸款市場,成為潛力市場。” ("With people's consumption habits changing and national income rising, the credit loan market has become a market with potential.")

[Figure: original and predicted pitch contours. x-axis: frame; y-axis: pitch period (ms).]

Experimental Results (5/6)

Examples of the reconstructed pitch contours

Outside test: “在意國政經混亂中臨危受命的齊安培,未來在政經兩方面都有不少艱困任務待完成。” ("Ciampi, who took office at a time of crisis amid Italy's political and economic turmoil, still has many difficult political and economic tasks to accomplish.")

[Figure: original and predicted pitch contours. x-axis: frame; y-axis: pitch period (ms).]

Experimental Results (6/6)

[Figure: Influences of the 16 unified prosodic states. x-axis: prosodic state; y-axis: pitch period (ms).]

Analyses of the Inferred Model (1/13)

CFs of current, previous and following tones in the pitch mean model

tone                          1        2        3        4        5
$\beta_t$ (current tone)      -0.154   0.054    0.160    -0.035   0.128
$\beta_{pt}$ (previous tone)  -0.022   -0.034   0.018    0.024    0.029
$\beta_{ft}$ (following tone) 0.022    -0.003   -0.047   0.011    0.013

Analyses of the Inferred Model (2/13)

[Figure: Comparison of a Tone 3 preceding another Tone 3 with canonical Tone 2 and Tone 3. x-axis: frame; y-axis: pitch period (ms). Legend (tone combinations): 033, 133, 233, 333, 433, 533, 020, 030.]

Analyses of the Inferred Model (3/13)

[Figure: Comparison of a Tone 4 preceding another Tone 4 with canonical Tone 4. x-axis: frame; y-axis: pitch period (ms). Legend (tone combinations): 044, 144, 244, 344, 444, 544, 040.]

Analyses of the Inferred Model (4/13)

CFs of initial/final classes in the pitch mean model (unit of pitch period: ms)

class                      0        1        2        3        4        5        6
$\beta_i$ (initial class)  -0.008   0.004    0.011    -0.013   0.003    -0.014   0.003
$\beta_f$ (final class)    0.011    -0.001   -0.004   0.008    -0.005   -0.019   0.004

Initial classes: null initial, {b, d, g}, {f, s, sh, shi, h}, {m, n, l, r}, {ts, ch, chi}, {p, t, k}, {tz, j, ji}

Final classes: low vowels, middle vowels, high vowels, compound vowels, vowels with nasal ending, retroflexion, null vowels

Analyses of the Inferred Model (5/13)

[Table: CFs of initial/final classes in the pitch shape model (unit of pitch period: ms). CF vectors $\mathbf{b}_i$ and $\mathbf{b}_f$ (× 0.01) for classes 0 to 6; the vector entries could not be recovered from the extracted text.]

Analyses of the Inferred Model (6/13)

CFs of speakers in the pitch mean model (unit of pitch period: ms)

speakers                   1(M)     2(F)     3(M)     4(F)
dynamic range change CF    1.014    0.971    1.026    0.981
level shift CF             -0.030   0.049    -0.044   0.041

Analyses of the Inferred Model (7/13)

[Table: CFs of speakers in the pitch shape model (unit of pitch period: ms). CF vectors $\mathbf{b}_s$ (× 0.01) for speakers 1(M), 2(F), 3(M), 4(F); the vector entries could not be recovered from the extracted text.]

Analyses of the Inferred Model (8/13)

CFs of prosodic states in the pitch mean model (unit of pitch period: ms)

state       0        1        2        3        4        5        6        7
$\beta_p$   -0.400   -0.225   -0.159   -0.113   -0.081   -0.047   -0.016   0.014

state       8        9        10       11       12       13       14       15
$\beta_p$   0.039    0.073    0.102    0.130    0.161    0.196    0.265    0.348
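To illustrate how the additive pitch mean model removes the tone and prosodic-state effects, the sketch below subtracts the reported current-tone and prosodic-state CFs from a speaker-compensated log-pitch mean. It is a simplification under stated assumptions: the previous-tone, following-tone, initial-class and final-class CFs are omitted (treated as zero), and the function name and the example syllable are hypothetical.

```python
# CFs in the log pitch period domain, copied from the tables in the slides.
BETA_TONE = {1: -0.154, 2: 0.054, 3: 0.160, 4: -0.035, 5: 0.128}
BETA_STATE = [-0.400, -0.225, -0.159, -0.113, -0.081, -0.047, -0.016, 0.014,
              0.039, 0.073, 0.102, 0.130, 0.161, 0.196, 0.265, 0.348]

def normalized_log_pitch_mean(y, tone, state):
    """Subtract the current-tone and prosodic-state CFs from Y_n.

    Simplification: the remaining CFs of the model (previous/following tone,
    initial/final class) are assumed to be zero in this sketch.
    """
    return y - BETA_TONE[tone] - BETA_STATE[state]

# Hypothetical Tone 3 syllable in prosodic state 15 (long pitch period, i.e. low pitch).
y_n = 1.948 + 0.160 + 0.348
x_n = normalized_log_pitch_mean(y_n, tone=3, state=15)
```

A positive CF lengthens the pitch period, so Tone 3 and the high-numbered prosodic states both correspond to lower pitch in this parameterization.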

Analyses of the Inferred Model (9/13)

[Table: CFs of prosodic states in the pitch shape model (unit of pitch period: ms). CF vectors $\mathbf{b}_q$ (× 0.01) for states 0 to 15; the vector entries could not be recovered from the extracted text.]


Analyses of the Inferred Model (10/13)


Analyses of the Inferred Model (11/13)

Analyses of the Inferred Model (12/13)

Statistics of the prosodic labeling (rows: punctuation mark (PM) type; columns: labeled break)

PM \ Break           Non-boundary   Minor boundary   Major boundary
Non-PM               89.18%         9.80%            1.02%
Minor PM             57.73%         33.48%           8.80%
Secondary Major PM   30.52%         44.65%           24.83%
Major PM             19.31%         31.66%           49.02%

Major PM = {",", "。", "!", ";", "?"}, Secondary Major PM = {"、", ":"}, Minor PM = {brace, bracket, dot}

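The location after syllable $n$ is labeled as

$$ \begin{cases} \text{major boundary} & \text{if } 10 \le p_n - p_{n+1} \le 15 \\ \text{minor boundary} & \text{if } 4 \le p_n - p_{n+1} \le 9 \\ \text{non-boundary} & \text{otherwise} \end{cases} $$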
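The boundary rule maps the jump in prosodic state between two adjacent syllables to a boundary type. The sketch below is an illustration only: the thresholds (10 to 15 for a major boundary, 4 to 9 for a minor boundary) come from the slide, but the direction of the difference (state of syllable n minus state of syllable n+1, i.e. a drop in state across the boundary) is an assumption, since the extracted formula does not preserve it. Function names and the example sequence are hypothetical.

```python
def boundary_type(p_n, p_next):
    """Classify the location after syllable n from two adjacent prosodic states (0-15).

    Assumption: the rule uses p_n - p_{n+1}, a drop in state across the boundary.
    """
    diff = p_n - p_next
    if 10 <= diff <= 15:
        return "major"
    if 4 <= diff <= 9:
        return "minor"
    return "non-boundary"

def label_boundaries(states):
    """Label every inter-syllable location in a sequence of prosodic states."""
    return [boundary_type(a, b) for a, b in zip(states, states[1:])]

# Hypothetical prosodic-state sequence for six syllables.
labels = label_boundaries([3, 15, 2, 9, 4, 4])
```

Under this assumption a high state followed by a low state signals a pitch reset, which is consistent with high-numbered states corresponding to long pitch periods at the end of a prosodic phrase.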


Analyses of the Inferred Model (13/13)

這位約翰霍普金斯大學名譽教授 *在第一屆國際 &性高潮會議中說 *,他對這一始於 &一九八O年代的性趨勢 &感到 ...這場比賽 *將於今日下午2時 &在 &台北 &市立棒球場舉行 *,黑鷹組織 &所屬 &三級棒球隊 *,包括台南六信 *、台東農工 &、屏東鶴聲國中 *、台東鹿野國中 &及台南善化國小等隊 *,將各著球隊服裝&到場加油 *,預計人數有近千人以上 *。黑鷹兩位教練 *黃永裕及&江泰權 *,對於此場比賽 *不敢掉以輕心 *,除了排出鑽石陣容外,也要親自上場 *。黑鷹所 ...商人非法囤積 &大量爆竹 *,萬一發生爆炸事件 *,不但會造成死傷慘劇 *,自己也可能成為 &受害最大 ...世界性的環保潮流 &,使人們日益重視環境汙染的問題 *;而觀光旅遊 & 這個﹁無煙囪工業 *﹂正好吻合此一 *健康訴求 * ,因此可預期& 今年將是遊樂區 ...

Examples of Possible Minor (&) and Major (*) Prosodic Phrase Boundaries

Conclusions

• The models are effective at isolating several main factors.
• The variance of the modeled duration/pitch is greatly reduced.
• The estimated companding factors (CFs) conform well to prior linguistic knowledge.
• The prosodic-state labels produced are linguistically meaningful.


Automatic Prosody Labeling based on LPM

Break types

In this study, we define five levels of break types, B0 to B4:
• B0: a tightly coupled syllabic boundary, where the pitch contour at the syllable juncture may be connected and is severely affected by contextual syllables.
• B1: a normal syllabic boundary, which loosely couples two consecutive syllables and has no pitch reset.
• B2: a prosodic word boundary, which has a short pause or an irregular pitch reset.
• B3/B4: minor/major breaks with medium and long pauses, respectively. They are usually accompanied by medium or large pitch resets.

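Break Labeling Algorithm

$$ \mathbf{B}^*, \mathbf{p}^* = \arg\max_{\mathbf{B}, \mathbf{p}} P(\mathbf{B}, \mathbf{p} \mid \mathbf{x}, \mathbf{Pau}, \mathbf{L}, \mathbf{t}) $$

$$ = \arg\max_{\mathbf{B}, \mathbf{p}} P(\mathbf{p}, \mathbf{B}, \mathbf{x}, \mathbf{Pau} \mid \mathbf{L}, \mathbf{t}) $$

$$ = \arg\max_{\mathbf{B}, \mathbf{p}} P(\mathbf{x}, \mathbf{Pau} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t}) \, P(\mathbf{p}, \mathbf{B} \mid \mathbf{L}, \mathbf{t}) $$

• $\mathbf{B}$: break type
• $\mathbf{p}$: prosodic state
• $\mathbf{x}$: pitch contour
• $\mathbf{Pau}$: pause duration
• $\mathbf{L}$: high-level linguistic feature
• $\mathbf{t}$: low-level linguistic feature (tone)
• $P(\mathbf{x}, \mathbf{Pau} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t})$: acoustic-prosodic model
• $P(\mathbf{p}, \mathbf{B} \mid \mathbf{L}, \mathbf{t})$: linguistic-prosodic model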

Acoustic-prosodic model (1/3)

$$ P(\mathbf{x}, \mathbf{Pau} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t}) = P(\mathbf{x} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t}) \, P(\mathbf{Pau} \mid \mathbf{p}, \mathbf{B}, \mathbf{L}, \mathbf{t}) $$

$$ \approx P(\mathbf{x} \mid \mathbf{p}, \mathbf{B}, \mathbf{t}) \, P(\mathbf{Pau} \mid \mathbf{B}, \mathbf{L}) $$

$$ = \prod_{k=1}^{K} \prod_{n=1}^{N_k} P(\mathbf{x}_{k,n} \mid p_{k,n}, B_{k,n-1}, B_{k,n}, t_{k,n-1}, t_{k,n}, t_{k,n+1}) \, P(Pau_{k,n} \mid B_{k,n}, L_{k,n}) $$

• First factor: the syllable pitch contour model (Base LPM)
• Second factor: the pause-break model

Acoustic-prosodic model (2/3)

The syllable pitch contour model

$$ \mathbf{x}_{k,n} = \mathbf{y}_{k,n} + \mathbf{PT}_{t_{k,n}} + \mathbf{PP}_{p_{k,n}} + \mathbf{PC}^{f}_{B_{k,n-1},\, tp_{k,n-1}} + \mathbf{PC}^{b}_{B_{k,n},\, tp_{k,n}} + \boldsymbol{\mu} $$

Acoustic-prosodic model (3/3)

The syllable pitch contour likelihood

$$ P(\mathbf{x}_{k,n} \mid p_{k,n}, B_{k,n-1}, B_{k,n}, t_{k,n-1}, t_{k,n}, t_{k,n+1}) = N\!\left(\mathbf{x}_{k,n};\ \mathbf{PT}_{t_{k,n}} + \mathbf{PP}_{p_{k,n}} + \mathbf{PC}^{f}_{B_{k,n-1},\, tp_{k,n-1}} + \mathbf{PC}^{b}_{B_{k,n},\, tp_{k,n}} + \boldsymbol{\mu},\ \mathbf{R}\right) $$

The pause-break model

$$ P(Pau_{k,n} \mid B_{k,n}, L_{k,n}) = g\!\left(Pau_{k,n};\ \alpha_{B_{k,n}, L_{k,n}},\ \beta_{B_{k,n}, L_{k,n}}\right) $$

Linguistic-prosodic model

$$ P(\mathbf{p}, \mathbf{B} \mid \mathbf{L}, \mathbf{t}) \approx P(\mathbf{p}, \mathbf{B} \mid \mathbf{L}) = P(\mathbf{p} \mid \mathbf{B}, \mathbf{L}) \, P(\mathbf{B} \mid \mathbf{L}) \approx P(\mathbf{p} \mid \mathbf{B}) \, P(\mathbf{B} \mid \mathbf{L}) $$

$$ = \prod_{k=1}^{K} \left[ P(p_{k,1}) \prod_{n=2}^{N_k} P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1}) \right] \left[ \prod_{n=1}^{N_k - 1} P(B_{k,n} \mid L_{k,n}) \right] $$

• First bracket: prosodic state transition model
• Second bracket: linguistic-break model

Training of the Model

To estimate the parameters of the break labeling model, a sequential optimization procedure based on the ML criterion is adopted. It first defines a likelihood function expressed by

$$ Q = \log \left\{ \prod_{k=1}^{K} \prod_{n=1}^{N_k} P(\mathbf{x}_{k,n} \mid p_{k,n}, B_{k,n-1}, B_{k,n}, t_{k,n-1}, t_{k,n}, t_{k,n+1}) \, P(Pau_{k,n} \mid B_{k,n}, L_{k,n}) \right. $$

$$ \left. \times \prod_{k=1}^{K} \left[ P(p_{k,1}) \prod_{n=2}^{N_k} P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1}) \right] \left[ \prod_{n=1}^{N_k - 1} P(B_{k,n} \mid L_{k,n}) \right] \right\} $$

Initialization of Break Labeling

[Figure: Decision tree (Y/N branches) for the initial break labeling. Decision nodes: pause ≥ 300 ms; pause ≥ 125 ms; pause ≥ 75 ms; punctuation mark (PM); normalized pitch reset ≥ threshold; pitch pause ≥ 30 ms; interword. Leaves: B0, B1, B2, B3, B4.]

Experimental Database

• The performance of the proposed pitch modeling method was evaluated using a Mandarin speech database.
• The database contains the read speech of a single female professional announcer.
• Its texts are all short paragraphs composed of several sentences selected from the Sinica Tree-Bank Corpus.
• The database consists of 380 utterances with 52,192 syllables.
• Sampling rate: 16 kHz
• All segmentations and F0 values were manually corrected.


Experimental Results

The learning curve

Experimental Results

Covariance matrices of observed and normalized feature vectors

$$ \mathbf{R}_x = \begin{bmatrix} 932.3 & 0 & 0 & 0 \\ 0 & 89.9 & 0 & 0 \\ 0 & 0 & 17.8 & 0 \\ 0 & 0 & 0 & 5.0 \end{bmatrix} \times 10^{-4} $$

$$ \mathbf{R}_y = \begin{bmatrix} 9.0 & 0 & 0 & 0 \\ 0 & 31.9 & 0 & 0 \\ 0 & 0 & 11.1 & 0 \\ 0 & 0 & 0 & 3.8 \end{bmatrix} \times 10^{-4} $$

Experimental Results-syllable pitch contour model (1/12)

[Figure: The learned pitch contour of 5 tones]

Experimental Results-syllable pitch contour model (2/12)

[Figure: Prosodic state patterns]

Experimental Results-syllable pitch contour model (3/12)

[Figure: Coarticulation patterns]

Experimental Results-syllable pitch contour model (4/12)

Experimental Results-syllable pitch contour model (5/12)

Experimental Results-syllable pitch contour model (6/12)

Experimental Results-syllable pitch contour model (7/12)

Experimental Results-syllable pitch contour model (8/12)

Experimental Results-syllable pitch contour model (9/12)

Experimental Results-syllable pitch contour model (10/12)

Experimental Results-syllable pitch contour model (11/12)

Experimental Results-syllable pitch contour model (12/12)

Experimental Results-Pause-break model (1/2)

Pause-break model

Break type                      B0      B1      B2      B3      B4
Pause duration mean (seconds)   0.002   0.009   0.035   0.206   0.479

Experimental Results-Pause-break model (2/2)

[Figure: $P(Pau_{k,n} \mid B_{k,n} = 4, L_{k,n})$]

Experimental Results-length of prosodic units (1/3)

Histogram of length of prosodic group

Experimental Results-length of prosodic units (2/3)

Histogram of length of prosodic phrase

Experimental Results-length of prosodic units (3/3)

Histogram of length of word

Experimental Results

Count of break indices

Experimental Results

Count of prosodic state

Experimental Results

Prob. of prosodic state after B3

Experimental Results

Prob. of prosodic state before B3

Experimental Results

Prob. of prosodic state after B4

Experimental Results

Prob. of prosodic state before B4

Experimental Results-prosodic state transition model(1/5)

$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 4)$

Pn-1\Pn 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 0.01 0.01 0.01 0.02 0.01 0.08 0.08 0.02 0.04 0.13 0.01 0.12 0.13 0.12 0.12 0.09
2 0.00 0.01 0.01 0.01 0.04 0.00 0.10 0.01 0.08 0.16 0.07 0.00 0.18 0.11 0.13 0.08
3 0.00 0.00 0.00 0.03 0.00 0.04 0.00 0.10 0.03 0.07 0.00 0.23 0.10 0.19 0.12 0.08
4 0.00 0.00 0.00 0.02 0.00 0.04 0.00 0.13 0.00 0.14 0.00 0.04 0.13 0.20 0.16 0.13
5 0.00 0.00 0.00 0.01 0.00 0.06 0.00 0.00 0.13 0.00 0.17 0.00 0.33 0.10 0.17 0.00
6 0.00 0.00 0.01 0.00 0.06 0.01 0.00 0.20 0.00 0.03 0.14 0.00 0.07 0.08 0.28 0.10
7 0.00 0.00 0.00 0.00 0.02 0.01 0.00 0.00 0.08 0.01 0.00 0.26 0.00 0.43 0.00 0.17
8 0.00 0.00 0.00 0.03 0.00 0.00 0.00 0.09 0.01 0.00 0.11 0.00 0.38 0.00 0.35 0.00
9 0.01 0.01 0.01 0.01 0.01 0.01 0.07 0.01 0.01 0.01 0.01 0.24 0.01 0.35 0.01 0.25
10 0.01 0.01 0.01 0.01 0.12 0.01 0.01 0.01 0.22 0.03 0.01 0.01 0.31 0.07 0.16 0.01
11 0.02 0.02 0.02 0.02 0.04 0.02 0.02 0.04 0.02 0.06 0.02 0.25 0.04 0.19 0.04 0.15
12 0.01 0.01 0.01 0.02 0.01 0.01 0.01 0.08 0.02 0.10 0.07 0.01 0.08 0.08 0.28 0.17
13 0.02 0.02 0.02 0.02 0.15 0.02 0.02 0.12 0.04 0.10 0.06 0.15 0.12 0.02 0.08 0.04
14 0.03 0.03 0.03 0.05 0.03 0.03 0.19 0.03 0.03 0.11 0.05 0.08 0.16 0.05 0.05 0.03
15 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.10 0.05 0.05 0.05 0.05 0.05 0.10 0.10 0.05
16 0.03 0.03 0.03 0.03 0.03 0.07 0.03 0.03 0.03 0.17 0.10 0.07 0.07 0.03 0.10 0.10
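
Each row of the table is a distribution over the next prosodic state (columns, Pn) given the previous state (rows, Pn-1) across a B4 break. A minimal sketch of how one row can be read is given below, assuming the table is stored as a dictionary from previous state to a list of 16 probabilities. Only rows 7 and 8 are copied from the slide; the names `TRANSITION_AFTER_B4`, `most_likely_next_state` and `row_mass` are hypothetical helpers for this illustration.

```python
# Rows 7 and 8 of the B = 4 transition table (previous state -> probabilities
# of next states 1..16). The remaining rows are omitted for brevity.
TRANSITION_AFTER_B4 = {
    7: [0.00, 0.00, 0.00, 0.00, 0.02, 0.01, 0.00, 0.00,
        0.08, 0.01, 0.00, 0.26, 0.00, 0.43, 0.00, 0.17],
    8: [0.00, 0.00, 0.00, 0.03, 0.00, 0.00, 0.00, 0.09,
        0.01, 0.00, 0.11, 0.00, 0.38, 0.00, 0.35, 0.00],
}


def most_likely_next_state(prev_state, table=TRANSITION_AFTER_B4):
    """Return (next_state, probability) with the highest probability in the row."""
    row = table[prev_state]
    best_index = max(range(len(row)), key=row.__getitem__)
    return best_index + 1, row[best_index]  # states are numbered from 1


def row_mass(prev_state, table=TRANSITION_AFTER_B4):
    """Sum of a row; close to 1 but not exact because entries are rounded."""
    return sum(table[prev_state])


if __name__ == "__main__":
    for state in sorted(TRANSITION_AFTER_B4):
        print(state, most_likely_next_state(state), round(row_mass(state), 2))
```

Because the published entries are rounded to two decimals, a row need not sum to exactly 1, so any consistency check should allow a small tolerance.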

Experimental Results-prosodic state transition model(2/5)

$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 3)$

Pn-1\Pn 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 0.03 0.03 0.03 0.15 0.05 0.03 0.08 0.03 0.08 0.03 0.20 0.05 0.10 0.05 0.05 0.03
2 0.01 0.03 0.08 0.11 0.01 0.15 0.09 0.03 0.01 0.16 0.15 0.05 0.05 0.05 0.01 0.01
3 0.00 0.01 0.06 0.05 0.10 0.07 0.00 0.18 0.01 0.26 0.00 0.07 0.09 0.03 0.04 0.00
4 0.00 0.01 0.04 0.00 0.11 0.00 0.22 0.00 0.12 0.00 0.20 0.11 0.00 0.10 0.04 0.02
5 0.00 0.00 0.00 0.09 0.13 0.00 0.00 0.10 0.18 0.00 0.00 0.32 0.10 0.00 0.05 0.01
6 0.00 0.00 0.08 0.02 0.00 0.00 0.28 0.00 0.01 0.33 0.01 0.03 0.00 0.19 0.00 0.04
7 0.00 0.00 0.00 0.05 0.00 0.17 0.00 0.28 0.00 0.00 0.25 0.00 0.13 0.01 0.08 0.00
8 0.00 0.00 0.03 0.00 0.15 0.00 0.13 0.00 0.25 0.00 0.00 0.00 0.26 0.12 0.00 0.05
9 0.00 0.00 0.03 0.16 0.00 0.00 0.00 0.11 0.00 0.00 0.46 0.04 0.00 0.00 0.16 0.00
10 0.00 0.01 0.02 0.00 0.10 0.18 0.00 0.00 0.15 0.23 0.00 0.00 0.24 0.01 0.00 0.04
11 0.01 0.05 0.03 0.01 0.10 0.01 0.19 0.06 0.01 0.08 0.01 0.17 0.03 0.06 0.14 0.01
12 0.00 0.00 0.00 0.10 0.00 0.13 0.01 0.07 0.17 0.00 0.00 0.25 0.00 0.19 0.00 0.04
13 0.01 0.01 0.04 0.01 0.14 0.01 0.01 0.37 0.02 0.01 0.14 0.01 0.15 0.01 0.08 0.01
14 0.01 0.01 0.03 0.04 0.05 0.01 0.08 0.10 0.09 0.02 0.22 0.06 0.07 0.14 0.03 0.05
15 0.02 0.02 0.02 0.04 0.02 0.02 0.02 0.17 0.08 0.17 0.13 0.06 0.11 0.02 0.02 0.08
16 0.05 0.05 0.10 0.05 0.05 0.05 0.05 0.10 0.05 0.05 0.10 0.05 0.05 0.05 0.05 0.05

Experimental Results-prosodic state transition model(3/5)

$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 2)$

Pn-1\Pn 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 0.06 0.13 0.15 0.09 0.04 0.02 0.11 0.02 0.06 0.09 0.02 0.02 0.09 0.02 0.02 0.02
2 0.05 0.05 0.09 0.22 0.01 0.05 0.19 0.12 0.03 0.01 0.12 0.01 0.03 0.01 0.01 0.01
3 0.02 0.01 0.05 0.11 0.17 0.03 0.00 0.22 0.04 0.06 0.00 0.16 0.00 0.09 0.02 0.01
4 0.01 0.00 0.04 0.00 0.19 0.00 0.06 0.00 0.35 0.00 0.00 0.15 0.14 0.01 0.04 0.00
5 0.02 0.00 0.03 0.03 0.00 0.00 0.18 0.00 0.00 0.39 0.00 0.13 0.04 0.16 0.00 0.02
6 0.00 0.00 0.00 0.15 0.00 0.00 0.00 0.38 0.00 0.00 0.29 0.00 0.00 0.05 0.11 0.00
7 0.00 0.01 0.04 0.00 0.00 0.00 0.15 0.00 0.13 0.22 0.00 0.00 0.31 0.06 0.06 0.01
8 0.00 0.00 0.00 0.00 0.06 0.00 0.00 0.00 0.17 0.00 0.09 0.37 0.00 0.28 0.00 0.01
9 0.00 0.01 0.03 0.03 0.00 0.01 0.02 0.00 0.00 0.23 0.00 0.00 0.37 0.00 0.24 0.06
10 0.00 0.01 0.00 0.00 0.04 0.00 0.00 0.10 0.00 0.00 0.00 0.36 0.00 0.47 0.00 0.00
11 0.01 0.00 0.00 0.04 0.00 0.04 0.02 0.01 0.11 0.01 0.01 0.00 0.43 0.00 0.23 0.09
12 0.00 0.00 0.01 0.00 0.00 0.04 0.00 0.05 0.00 0.00 0.18 0.00 0.20 0.17 0.26 0.06
13 0.01 0.01 0.01 0.01 0.01 0.00 0.00 0.02 0.03 0.01 0.00 0.08 0.00 0.33 0.37 0.10
14 0.01 0.01 0.01 0.08 0.01 0.02 0.00 0.00 0.12 0.02 0.03 0.00 0.13 0.05 0.29 0.22
15 0.01 0.01 0.02 0.03 0.00 0.10 0.01 0.04 0.01 0.07 0.03 0.13 0.05 0.03 0.21 0.23
16 0.01 0.03 0.04 0.04 0.03 0.01 0.09 0.01 0.01 0.11 0.01 0.12 0.08 0.06 0.14 0.22

Experimental Results-prosodic state transition model(4/5)

$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 1)$

Pn-1\Pn 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 0.09 0.29 0.18 0.07 0.04 0.02 0.02 0.02 0.04 0.04 0.02 0.04 0.02 0.02 0.02 0.02
2 0.09 0.28 0.30 0.10 0.01 0.07 0.00 0.05 0.02 0.02 0.00 0.00 0.01 0.02 0.01 0.00
3 0.04 0.24 0.21 0.30 0.00 0.12 0.03 0.00 0.00 0.01 0.00 0.00 0.03 0.01 0.00 0.00
4 0.02 0.13 0.26 0.17 0.30 0.00 0.00 0.08 0.00 0.03 0.00 0.00 0.00 0.00 0.00 0.00
5 0.00 0.05 0.22 0.35 0.00 0.18 0.00 0.14 0.00 0.00 0.00 0.04 0.00 0.00 0.00 0.00
6 0.02 0.10 0.00 0.34 0.07 0.00 0.21 0.00 0.07 0.12 0.00 0.00 0.03 0.03 0.00 0.00
7 0.00 0.03 0.11 0.18 0.22 0.00 0.33 0.00 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00
8 0.00 0.02 0.15 0.00 0.45 0.00 0.00 0.24 0.00 0.00 0.11 0.00 0.02 0.00 0.00 0.00
9 0.01 0.00 0.00 0.35 0.00 0.20 0.00 0.24 0.00 0.10 0.00 0.00 0.08 0.00 0.00 0.01
10 0.00 0.02 0.06 0.00 0.00 0.00 0.43 0.00 0.34 0.00 0.00 0.15 0.00 0.00 0.00 0.00
11 0.00 0.01 0.05 0.00 0.36 0.00 0.00 0.00 0.00 0.33 0.00 0.09 0.00 0.10 0.05 0.01
12 0.00 0.01 0.00 0.14 0.00 0.16 0.00 0.34 0.00 0.17 0.05 0.00 0.11 0.00 0.00 0.00
13 0.00 0.01 0.04 0.00 0.09 0.00 0.13 0.00 0.24 0.08 0.00 0.29 0.00 0.10 0.02 0.01
14 0.00 0.00 0.01 0.06 0.00 0.07 0.00 0.18 0.02 0.19 0.00 0.17 0.12 0.11 0.04 0.02
15 0.00 0.01 0.00 0.02 0.00 0.00 0.08 0.00 0.12 0.08 0.00 0.19 0.19 0.19 0.09 0.04
16 0.00 0.01 0.01 0.03 0.00 0.00 0.02 0.03 0.03 0.08 0.00 0.12 0.10 0.24 0.23 0.07

Experimental Results-prosodic state transition model(5/5)

$P(p_{k,n} \mid p_{k,n-1}, B_{k,n-1} = 0)$

Pn-1\Pn 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1 0.19 0.10 0.03 0.13 0.13 0.03 0.03 0.03 0.06 0.03 0.03 0.03 0.03 0.03 0.03 0.03
2 0.11 0.31 0.16 0.13 0.01 0.06 0.02 0.06 0.02 0.01 0.02 0.02 0.01 0.02 0.01 0.01
3 0.03 0.14 0.33 0.24 0.00 0.00 0.12 0.00 0.03 0.04 0.01 0.02 0.01 0.00 0.01 0.00
4 0.02 0.06 0.21 0.10 0.31 0.00 0.00 0.21 0.00 0.00 0.02 0.03 0.00 0.02 0.00 0.00
5 0.00 0.01 0.02 0.38 0.00 0.40 0.00 0.00 0.15 0.00 0.01 0.00 0.02 0.00 0.01 0.00
6 0.02 0.00 0.21 0.00 0.46 0.00 0.00 0.15 0.09 0.01 0.00 0.02 0.00 0.02 0.00 0.01
7 0.01 0.02 0.04 0.00 0.18 0.00 0.46 0.00 0.00 0.17 0.00 0.08 0.00 0.03 0.00 0.00
8 0.00 0.02 0.00 0.22 0.24 0.00 0.00 0.00 0.35 0.00 0.07 0.00 0.06 0.01 0.02 0.00
9 0.00 0.01 0.01 0.00 0.00 0.23 0.00 0.47 0.00 0.00 0.00 0.20 0.06 0.00 0.00 0.00
10 0.00 0.00 0.03 0.00 0.15 0.00 0.34 0.00 0.00 0.36 0.00 0.00 0.00 0.09 0.01 0.01
11 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.54 0.00 0.16 0.00 0.26 0.00 0.00 0.00
12 0.00 0.01 0.00 0.05 0.00 0.11 0.00 0.20 0.00 0.21 0.00 0.30 0.00 0.08 0.02 0.00
13 0.00 0.00 0.01 0.00 0.03 0.00 0.12 0.03 0.19 0.00 0.16 0.00 0.31 0.06 0.07 0.02
14 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.10 0.00 0.20 0.00 0.25 0.08 0.17 0.11 0.04
15 0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.00 0.05 0.00 0.08 0.16 0.23 0.20 0.17 0.08
16 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.01 0.03 0.07 0.00 0.02 0.13 0.26 0.28 0.16

Experimental Results-The decision tree of linguistic-break model

Experimental Results-break labeling example

Summary

In base LPM

The prosodic state was introduced to replace conventional high-level linguistic information, so as to separate the effects of low-level and high-level linguistic features on speech

Effectiveness in isolating several main factors

Greatly reducing the variance of the modeled duration/pitch

The estimated companding factors conformed well to prior linguistic knowledge

The prosodic-state labels produced are linguistically meaningful

Summary

In Automatic Prosody Labeling

We propose a new automatic prosody labeling algorithm based on the base LPM

We treat both break type and prosodic state as latent variables

The preliminary experimental results are both linguistically and acoustically meaningful

Further discussion of each model is needed