
Using Paragraph- and Discourse-based Prosodic Cues to Improve Speech Synthesis Expressiveness

Mireia Farrús

AI With the Best, 25/09/2016

Outline

Over the last decade, automatically generated speech has significantly improved in terms of voice quality and expressiveness. However, multi-sentential synthesized speech still suffers from a high degree of unnaturalness.

Outline

To overcome this, an approach that is more aware of paragraph and communicative structure is needed to make real improvements in speech synthesis.

Text-to-Speech (TTS) Systems

TTS systems

Context: current TTS systems

• Preceding and following phonemes
• Position of segment in syllable
• Position of syllable in word & phrase
• Position of word in phrase
• Stress/accent/length features of current/preceding/following syllables
• Distance from stressed/accented syllables

• POS of current/preceding/following word
• Length of current/preceding/following phrase
• End tone of phrase
• Length of utterance measured in syllables/words/phrases

(King, 2010)
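The positional features listed above are derived from a hierarchical view of the utterance. The sketch below computes a handful of them, assuming a hypothetical toy layout (utterance → phrases → words → syllables) and a hypothetical `syllable_context` function; it covers only positions, distance to stress and utterance length, while real systems encode many more features (phoneme identity, POS, end tone, and so on) into full-context labels.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    text: str
    stressed: bool


# Hypothetical toy layout: utterance -> phrases -> words -> syllables
Utterance = list[list[list[Syllable]]]


def syllable_context(utt: Utterance) -> list[dict]:
    """Compute a few of the positional context features for every syllable."""
    flat = [syl for phrase in utt for word in phrase for syl in word]
    stressed_idx = [i for i, syl in enumerate(flat) if syl.stressed]
    n_words = sum(len(phrase) for phrase in utt)
    features = []
    global_i = 0  # index of the current syllable in the whole utterance
    for p_idx, phrase in enumerate(utt):
        syl_in_phrase = 0
        for w_idx, word in enumerate(phrase):
            for s_idx, syl in enumerate(word):
                # Distance in syllables to the nearest stressed syllable (None if none)
                dist = min((abs(global_i - j) for j in stressed_idx), default=None)
                features.append({
                    "syl": syl.text,
                    "pos_syl_in_word": s_idx + 1,
                    "pos_syl_in_phrase": syl_in_phrase + 1,
                    "pos_word_in_phrase": w_idx + 1,
                    "phrase_index": p_idx + 1,
                    "dist_to_stress": dist,
                    # Utterance length in syllables / words / phrases
                    "utt_len_syllables": len(flat),
                    "utt_len_words": n_words,
                    "utt_len_phrases": len(utt),
                })
                syl_in_phrase += 1
                global_i += 1
    return features
```

For example, `[[[Syllable("hel", False), Syllable("lo", True)]], [[Syllable("world", True)]]]` yields positions in phrase 1, 2, 1 and distances to stress 1, 0, 0.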

BUT human speech also relies on…

• Paragraph structure
• Communicative structure
• Discourse structure

Paragraph structure

• “Paragraph-based Prosodic Cues for Speech Synthesis Applications”. Mireia Farrús, Catherine Lai, Johanna D. Moore

Prosody & Paragraph Structure

• ~ 1400 TED talks

Conclusions

Paragraph structure

• There is clear evidence of prosodic resets across paragraph breaks.
• We can also observe a steady declination in prosodic level over the paragraph.
• Difference features are more discriminative of boundaries than sentence-based features.
• Paragraphs have an identifiable suprasentential prosodic structure that can be described in terms of relative changes in F0, intensity and timing.
• The classification experiments support the idea that there are utterance-intrinsic features related to paragraph position.
• Pause duration is the most robust predictor of paragraph breaks.

We should be able to employ paragraph declination, pause and prosodic reset features to improve the naturalness of longer synthesized speech.
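One way to act on these findings at synthesis time is sketched below, assuming a linear F0 declination across the sentences of a paragraph, a full reset at each new paragraph, and a longer pause at paragraph breaks than at sentence breaks. The function `paragraph_targets` and all numeric defaults (`decline_per_sentence`, `min_scale`, pause lengths) are hypothetical placeholders, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class SentenceTarget:
    text: str
    f0_scale: float      # multiplier on the voice's baseline F0 level
    pause_after_ms: int  # silence inserted after the sentence


def paragraph_targets(paragraphs: list[list[str]],
                      decline_per_sentence: float = 0.05,
                      min_scale: float = 0.8,
                      sentence_pause_ms: int = 300,
                      paragraph_pause_ms: int = 900) -> list[SentenceTarget]:
    """Assign an F0 scale and a following pause to every sentence.

    F0 declines linearly across a paragraph and is reset to 1.0 at each new
    paragraph; the last sentence of a paragraph gets the longer pause.
    """
    targets = []
    for p_idx, sentences in enumerate(paragraphs):
        for s_idx, text in enumerate(sentences):
            # Declination: later sentences get a lower scale, floored at min_scale
            scale = max(min_scale, 1.0 - decline_per_sentence * s_idx)
            last_in_par = s_idx == len(sentences) - 1
            last_overall = last_in_par and p_idx == len(paragraphs) - 1
            if last_overall:
                pause = 0
            elif last_in_par:
                pause = paragraph_pause_ms  # paragraph break: longest pause
            else:
                pause = sentence_pause_ms
            targets.append(SentenceTarget(text, round(scale, 3), pause))
    return targets
```

With two paragraphs of three and two sentences, the scales come out as 1.0, 0.95, 0.9 | 1.0, 0.95 (the reset) and the pauses as 300, 300, 900, 300, 0. How these targets reach the synthesizer depends on the prosody controls it exposes.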

Information/Communicative structure

• “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner

Theoretical background - Motivation

• Influence of information structure on intonation

• Steedman’s theory relating
  – Theme/rheme
  – Intonation patterns

Theoretical background - Limitations

• Based on short sentences with a simple structure and a default word order (SVO for English)

• What if we have…

ToBI labels

Tones and Break Indices:
• high (H) and low (L) tones
• pitch accents (starred tones, e.g. H*, L*)
• bitonal pitch accents (L+H*, etc.)
• phrase accents (H- and L- tones)
• boundary tones (H% and L%)
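To illustrate how the tone categories above are distinguished in the notation, here is a small sketch that sorts a ToBI tone label by its diacritics: `*` marks the starred (accented) tone, `+` joins the two tones of a bitonal accent, `-` marks a phrase accent and `%` a boundary tone, with `!` (downstep) ignored. The function name `tobi_category` is hypothetical, and the sketch handles only these basic shapes rather than validating against the full ToBI inventory.

```python
import re


def tobi_category(label: str) -> str:
    """Classify a basic ToBI tone label by its diacritics."""
    label = label.strip()
    core = label.replace("!", "")  # ignore downstep markers such as "!H*"
    if re.fullmatch(r"[HL]%", core):
        return "boundary tone"
    if re.fullmatch(r"[HL]-", core):
        return "phrase accent"
    if re.fullmatch(r"[HL]-[HL]%", core):
        # Phrase accent and boundary tone written together, e.g. "L-H%"
        return "phrase accent + boundary tone"
    if re.fullmatch(r"[HL]\*", core):
        return "pitch accent"
    if re.fullmatch(r"[HL]\*?\+[HL]\*?", core) and core.count("*") == 1:
        # Two tones joined by "+", exactly one of them starred
        return "bitonal pitch accent"
    raise ValueError(f"unrecognised ToBI tone label: {label!r}")
```

For instance, `L+H*` and `H+!H*` are bitonal pitch accents, while `L+H` (no starred tone) is rejected.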

Theoretical background – Our work

• “The Information Structure-Prosody Language Interface Revisited”. Mónica Domínguez, Mireia Farrús, Alicia Burga, Leo Wanner

• Objectives
  – Validate Steedman’s theory
  – Propose an account of more complex syntactic structures

Theoretical background - Mel’čuk

Steedman
• Linearity
• Intonation ~ theme/rheme

Mel’čuk
• Hierarchy
• Intonation ~ thematicity
  – theme/rheme
  – specifiers
  – embeddedness
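The contrast between a linear and a hierarchical view can be made concrete with a tree of thematic spans. The sketch below assumes a hypothetical `Span` node whose role is "theme", "rheme" or "specifier" and which may contain embedded spans. `flatten_steedman` collapses the tree into the flat view (only a top-level theme stays theme; everything else falls into the rheme), while `leaves_with_depth` keeps each leaf's own role and embedding depth. The example sentence and its annotation are illustrative, not taken from the corpus.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    role: str                  # "theme", "rheme", "specifier" (or "root")
    text: str = ""             # only leaves carry text in this toy model
    children: list["Span"] = field(default_factory=list)


def leaves_with_depth(span: Span, depth: int = 0) -> list[tuple[str, str, int]]:
    """Hierarchical view: (text, own role, embedding depth) for every leaf."""
    if not span.children:
        return [(span.text, span.role, depth)]
    out = []
    for child in span.children:
        out.extend(leaves_with_depth(child, depth + 1))
    return out


def flatten_steedman(root: Span) -> list[tuple[str, str]]:
    """Flat view: a top-level theme stays theme, everything else is rheme."""
    out = []
    for child in root.children:
        role = "theme" if child.role == "theme" else "rheme"
        for text, _, _ in leaves_with_depth(child):
            out.append((text, role))
    return out


# "In 2015, the company said that its profits fell."
example = Span("root", children=[
    Span("specifier", "In 2015,"),
    Span("theme", "the company"),
    Span("rheme", children=[
        Span("rheme", "said that"),
        Span("theme", "its profits"),   # embedded theme
        Span("rheme", "fell."),
    ]),
])
```

Here the hierarchical view reports "its profits" as a theme at depth 2, whereas the flat view absorbs it into the rheme, which is the kind of information lost when embedded themes are not modelled.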

Preliminary experiments

• Wall Street Journal corpus (Penn Treebank)
• American English recordings
• Native speakers
• 109 sentences
• AuToBI labelling + reduction model
• Manual annotation of thematicity

Validating the classic interface

• To what extent can the classic approaches be applied to general discourse with more complex sentences?

• Examples matching the expected THEME patterns…

… but not the expected RHEMES.

Validating the classic interface

• We have found that…
  – Themes usually match, although ~40% do not.
  – Steedman’s approach of including everything apart from the theme in a flat rheme span lacks accuracy.

• We need a more accurate IS-prosody interface.
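A match rate such as "~40% of themes do not match" comes from comparing each annotated span's tune with the tune the theory predicts. The sketch below assumes each span record holds its thematicity type plus the ToBI pitch accent and phrase-accent/boundary-tone combination found on it, and uses a commonly cited simplification of Steedman's tunes (L+H* with L-H% for themes, H* with L-L% for rhemes) as the expected patterns. The `EXPECTED` table, `match_rates` function and the span records are hypothetical toy data, not the study's annotation or results.

```python
from collections import defaultdict

# Simplified Steedman-style tunes: (pitch accent, phrase accent + boundary tone)
EXPECTED = {
    "theme": ("L+H*", "L-H%"),
    "rheme": ("H*", "L-L%"),
}


def match_rates(spans: list[dict]) -> dict[str, float]:
    """Fraction of spans of each type whose tune matches the expected one."""
    total = defaultdict(int)
    matched = defaultdict(int)
    for span in spans:
        kind = span["type"]
        total[kind] += 1
        if (span["accent"], span["boundary"]) == EXPECTED[kind]:
            matched[kind] += 1
    return {kind: matched[kind] / total[kind] for kind in total}


# Invented toy annotations, for illustration only
toy_spans = [
    {"type": "theme", "accent": "L+H*", "boundary": "L-H%"},
    {"type": "theme", "accent": "L+H*", "boundary": "L-H%"},
    {"type": "theme", "accent": "H*", "boundary": "L-L%"},
    {"type": "rheme", "accent": "H*", "boundary": "L-L%"},
    {"type": "rheme", "accent": "L*", "boundary": "H-H%"},
]
```

On the toy spans, two of three themes and one of two rhemes match. A real evaluation would also need a policy for spans carrying several accents.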

Towards a more accurate IS-Prosody interface

• Our hypothesis:
  – Applying Mel’čuk’s hierarchical tripartite thematicity structure, we will be able to:
    • propose a more accurate model of the intonation-thematicity correlation for the ~40% non-coincident patterns in theme spans;
    • find a justification for the discrepancies observed in the rheme patterns.

Towards a more accurate IS-Prosody interface

• Specifier: example with the annotation suggested by Mel’čuk (1)

Towards a more accurate IS-Prosody interface

• Specifier: example with the annotation suggested by Mel’čuk (2)

Towards a more accurate IS-Prosody interface

• Hierarchy: example with the annotation suggested by Mel’čuk

rising pattern ↔ theme

Embedded themes behave as main themes in terms of intonation.

Classification experiments

• Combining Acoustic and Linguistic Levels in Phrase-Oriented Prosody Modelling

Classification experiments

• Testing acoustic parameters

Classification experiments

• Testing linguistic features
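As a rough illustration of combining the two levels, the sketch below concatenates per-phrase acoustic measurements with a one-hot encoding of a linguistic label into one feature vector, z-scores every dimension, and classifies with a nearest-centroid rule. The feature names, the choice of classifier and the data are assumptions made for illustration; the experiments in the talk may use different features and learners.

```python
import math
from statistics import mean, pstdev

# Hypothetical feature inventories (not the study's actual feature set)
ACOUSTIC = ["f0_mean", "f0_range", "intensity_mean", "duration"]
POS_TAGS = ["NOUN", "VERB", "ADJ", "OTHER"]


def feature_vector(phrase: dict) -> list[float]:
    """Acoustic values followed by a one-hot encoding of the phrase's head POS."""
    acoustic = [float(phrase[name]) for name in ACOUSTIC]
    pos = phrase.get("head_pos", "OTHER")
    one_hot = [1.0 if pos == tag else 0.0 for tag in POS_TAGS]
    return acoustic + one_hot


def train(phrases: list[dict], labels: list[str]):
    """Z-score every dimension, then store one centroid per label."""
    vecs = [feature_vector(p) for p in phrases]
    dims = list(zip(*vecs))
    mu = [mean(d) for d in dims]
    sd = [pstdev(d) or 1.0 for d in dims]  # constant dimensions: avoid /0
    z = [[(v - m) / s for v, m, s in zip(vec, mu, sd)] for vec in vecs]
    centroids = {}
    for label in set(labels):
        members = [zv for zv, lab in zip(z, labels) if lab == label]
        centroids[label] = [mean(d) for d in zip(*members)]
    return mu, sd, centroids


def predict(model, phrase: dict) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    mu, sd, centroids = model
    vec = [(v - m) / s for v, m, s in zip(feature_vector(phrase), mu, sd)]
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))
```

Standardizing first keeps F0 values in hertz from swamping the 0/1 linguistic features; testing "acoustic parameters" or "linguistic features" alone amounts to dropping one part of the vector.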

Conclusions

• Information Structure determines the “communicative” segmentation of the meaning of an utterance.

• It is central to the semantics-syntax-intonation interface and to NLP.

Conclusions

• Descriptive study attempting to determine which intonation patterns better characterize thematicity in real utterances.

• The flat theme/rheme interpretation prevailing in classical approaches fails to explain complex linguistic structures.

• Hierarchical structures and specifiers yield positive results.

Prosody & discourse structure

• Rhetorical Structure Theory (RST) (Mann & Thompson, 1988)

Describes the organizational structure of texts through relations between two text spans, a nucleus (N) and a satellite (S).
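To show what "discourse structure" could mean as input to prosody prediction, here is a small RST-style tree. It assumes a hypothetical `RSTNode` type covering only mononuclear relations (one nucleus, one satellite; multinuclear relations are left out), with elementary discourse units (EDUs) as leaves and a flag for whether the satellite comes first in the text. `edu_features` lists each EDU in text order with its nuclearity path from the root and its innermost relation; how such features map onto prosodic targets is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RSTNode:
    text: Optional[str] = None          # set only on leaves (EDUs)
    relation: Optional[str] = None      # e.g. "Elaboration", set on inner nodes
    nucleus: Optional["RSTNode"] = None
    satellite: Optional["RSTNode"] = None
    satellite_first: bool = False       # True if the satellite precedes the nucleus


def edu_features(node: RSTNode, path: str = "",
                 relation: str = "ROOT") -> list[tuple[str, str, str]]:
    """(EDU text, N/S path from the root, innermost relation) in text order."""
    if node.text is not None:
        return [(node.text, path, relation)]
    nuc = edu_features(node.nucleus, path + "N", node.relation)
    sat = edu_features(node.satellite, path + "S", node.relation)
    return sat + nuc if node.satellite_first else nuc + sat


tree = RSTNode(
    relation="Concession", satellite_first=True,
    satellite=RSTNode(text="Although the talk was long,"),
    nucleus=RSTNode(
        relation="Elaboration",
        nucleus=RSTNode(text="the audience stayed"),
        satellite=RSTNode(text="until the very end."),
    ),
)
```

Here the concessive clause comes out as a satellite ("S") first, followed by the nucleus pair with paths "NN" and "NS", so a predictor could, for example, treat deeper satellites differently from the main nucleus.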

Conclusions

• Prosody prediction from:
  – Type of sentence
  – Discourse structure
  – Discourse markers
  – Information structure

… to improve expressiveness and naturalness of automatically generated speech

Thank you for your attention!
