From Sounds to Language: Lecture 2, Spoken Language Processing, Prof. Andrew Rosenberg

Upload: anissa-harmon

Post on 17-Dec-2015

TRANSCRIPT

  • Slide 1
  • Slide 2
  • From Sounds to Language Lecture 2 Spoken Language Processing Prof. Andrew Rosenberg
  • Slide 3
  • Linguistic sounds: How does a sound wave become language?
      • Sounds are continuous waveforms.
      • Linguistic units are categorical.
      • How is the human perceptual system able to categorize and combine linguistic sounds into language?
  • Slide 4
  • Studying Speech: Who studies speech?
      • Linguists (phoneticians, phonologists, forensic linguists)
      • Speech engineers (speech recognition, speech synthesis, etc.)
      • Speech pathologists
      • Language instructors
      • Singers
      • Marketing experts
  • Slide 5
  • Marketing experts?
  • Slide 6
  • Studying speech: Major questions in studying speech
      • What is the sound inventory of a language?
      • Which variations are linguistically relevant? (R/L in Asian languages; P/Pʰ in English)
      • How are speech sounds produced?
      • Which sounds are shared by two languages, and which are not?
      • How do sounds vary in context? ("green banana" vs. "greem banana")
  • Slide 7
  • Representing speech sounds
      • Why are representations important?
          • Translation between sounds and words (ASR and TTS)
          • Learning pronunciation
          • Having a shared vocabulary to discuss language
      • How should we represent speech sounds?
          • Orthography?
          • Special symbols?
          • Abstract classes based on sound and/or articulatory similarities
  • Slide 8
  • Using orthography to represent sounds: A single orthographic letter is realized in many different ways (in English)
      • b: comb, tomb, bomb
      • c: court, center, chess
      • o: food, good, blood
      • s: reason, sunrise, shy, collision
  • Slide 9
  • Using orthography to represent sounds: A single sound can be written in many different ways (in English)
      • [i]: sea, see, scene, receive, thief
      • [s]: cereal, same, miss
      • [u]: true, few, choose, lieu, do
      • [ay]: lie, prime, pry, buy
      • How is orthography looking as a choice in English?
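The many-to-many relationship between spelling and sound on these two slides can be made concrete with a small lookup. The sketch below uses a subset of the slide's word/sound pairs; the third column (which letters spell the sound in each word) is an illustrative annotation added here, not part of the lecture.

```python
from collections import defaultdict

# (word, sound, letters that spell that sound in the word).
# Words and sounds follow the slide; the spelling column is an
# assumption added for illustration ("e_e" means "e ... silent e").
EXAMPLES = [
    ("sea", "i", "ea"), ("see", "i", "ee"), ("scene", "i", "e_e"),
    ("receive", "i", "ei"), ("thief", "i", "ie"),
    ("cereal", "s", "c"), ("same", "s", "s"), ("miss", "s", "ss"),
    ("true", "u", "ue"), ("few", "u", "ew"), ("choose", "u", "oo"),
    ("lieu", "u", "ieu"), ("do", "u", "o"),
    ("lie", "ay", "ie"), ("prime", "ay", "i_e"), ("pry", "ay", "y"),
    ("buy", "ay", "uy"),
]

# Index the examples in both directions.
spellings_by_sound = defaultdict(set)
sounds_by_spelling = defaultdict(set)
for word, sound, spelling in EXAMPLES:
    spellings_by_sound[sound].add(spelling)
    sounds_by_spelling[spelling].add(sound)

# One sound, many spellings.
for sound, spellings in spellings_by_sound.items():
    print(f"[{sound}] has {len(spellings)} spellings: {sorted(spellings)}")

# One spelling, more than one sound.
ambiguous = {sp: s for sp, s in sounds_by_spelling.items() if len(s) > 1}
print("Spellings with more than one sound:", ambiguous)
```

Even in this tiny sample, "ie" spells two different sounds (thief vs. lie), which is the kind of ambiguity that motivates a dedicated phonetic symbol set.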
  • Slide 10
  • Phonetic Symbol Sets
      • International Phonetic Alphabet (IPA)
          • Single (unique) character for each sound
          • Represents all sounds of the world's languages, but is large and requires a special (non-ASCII) font
      • ARPAbet, TIMIT, etc.
          • Multiple characters for each sound
          • Language specific: a new symbol set is required for each language
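To illustrate the trade-off between the two kinds of symbol sets, here is a minimal sketch that converts a space-separated ARPAbet transcription into IPA. The table covers only a common subset of American English symbols, and the function name `arpabet_to_ipa` is hypothetical; note that ARPAbet dictionaries often write the glottal fricative as `hh`.

```python
# Partial ARPAbet -> IPA table for American English (illustrative subset).
ARPABET_TO_IPA = {
    # vowels
    "iy": "i", "ih": "ɪ", "eh": "ɛ", "ae": "æ", "aa": "ɑ",
    "ao": "ɔ", "uw": "u", "uh": "ʊ", "ah": "ʌ", "ax": "ə",
    "ey": "eɪ", "ay": "aɪ", "ow": "oʊ", "aw": "aʊ", "oy": "ɔɪ",
    # consonants
    "p": "p", "b": "b", "t": "t", "d": "d", "k": "k", "g": "g",
    "f": "f", "v": "v", "th": "θ", "dh": "ð", "s": "s", "z": "z",
    "sh": "ʃ", "zh": "ʒ", "hh": "h", "ch": "tʃ", "jh": "dʒ",
    "m": "m", "n": "n", "ng": "ŋ", "l": "l", "r": "ɹ",
    "w": "w", "y": "j", "dx": "ɾ", "q": "ʔ",
}


def arpabet_to_ipa(phones: str) -> str:
    """Convert a space-separated ARPAbet string to an IPA string.

    Raises KeyError for symbols outside the partial table above.
    """
    return "".join(ARPABET_TO_IPA[p] for p in phones.lower().split())


print(arpabet_to_ipa("s eh n s"))            # sɛns
print(arpabet_to_ipa("p r aa b ax b l iy"))  # pɹɑbəbli
```

The ARPAbet side is plain ASCII and easy to type or store, while the IPA side needs Unicode but uses roughly one character per sound.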
  • Slide 11
  • Exercise: Write your full name in English orthography and in ARPAbet.
  • Slide 12
  • Sound categories
      • Phone: Basic speech sound of a language
          • A minimal sound difference between two words: too vs. zoo
          • Not every sound made by a human speaker is phonetic: sniffs, laughs, coughs, breaths
      • Phoneme: Class of speech sounds
          • A phoneme may include several phones: /t/ in top, stop, little, butter, winter
      • Allophones: the set of phonetic variants that comprise a phoneme. {[t], [ ], }
  • Slide 13
  • Speech Production: The articulatory organs
      • General process: Air is expelled from the lungs through the windpipe (trachea), leaving via the mouth (and nose).
      • Air passes through the trachea and then the larynx, which contains the vocal folds; the space between them is the glottis.
      • When the vocal folds vibrate, voiced sounds are produced; otherwise the sound is voiceless (e.g. voiced [v] vs. voiceless [f]).
  • Slide 14
  • Vocal Fold Vibration: Slow-motion video of normal vocal folds
  • Slide 15
  • Articulators: "Why did Ken set the net on the soggy deck?" (Queen's University / ATR Labs X-ray Film Database, http://psyc.queensu.ca/~munhallk/05_database.htm)
  • Slide 16
  • Vocal Organs
  • Slide 17
  • Recording Articulatory Data
      • X-Ray Microbeam Database: Track motion of small gold pellets on the tongue, jaw, lips and soft palate
      • Electroglottography: Run a high-frequency current through the glottal area of a speaker. There is lower resistance when the vocal folds are closed.
      • Electromagnetic articulography (EMMA): 3 transmitters on a helmet allow for triangulation of 5-15 sensor positions
  • Slide 18
  • Classes of Sounds: Consonants and Vowels
      • Consonants: Restricted or blocked airflow (e.g. [s]); voiced or unvoiced
      • Vowels: Unrestricted airflow; voiced
      • Semivowels (approximants): [w], [y]
  • Slide 19
  • Consonants: Place of Articulation. What is the point of maximum air restriction?
      • Labial: bilabial [b], [p]; labiodental [v], [f]
      • Dental: [θ], [ð] (thief vs. them)
      • Alveolar: [t], [d], [s], [z]
      • Palatal: [ʃ], [tʃ] (shrimp vs. chimp)
      • Velar: [k], [g]
      • Glottal: [?] (glottal stop)
  • Slide 20
  • Consonants: Place of Articulation. What is the point of maximum air restriction?
      • Approximant: [w], [y]. Two articulators come close but don't restrict much; somewhere between vowels and consonants
      • Lateral: [l]
      • Tap or flap: [ɾ], e.g. butter
  • Slide 21
  • Places of Articulation (figure, http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html): labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal
  • Slide 22
  • Consonants: Manner of articulation. How is the airflow restricted?
      • Stop (or plosive): [p], [t], [g]. Airflow is completely blocked (closure) and then released (release)
          • Glottal stop, e.g. before word-initial vowels in English after a pause: three even
      • Nasal: [m], [ng]. Air is released through the nose
      • Fricative: [s], [z], [f]. Air is forced through a narrow channel, leading to turbulent airflow
      • Affricate: [tʃ]. Begins as a stop, but the release is fricative
  • Slide 23
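  • Articulation map (rows: manner of articulation; columns: place of articulation; in paired cells the voiceless sound is listed first)

      Manner   | bilabial | labiodental | interdental | alveolar | palatal | velar | glottal
      stop     | p b      |             |             | t d      |         | k g   | q
      fric.    |          | f v         | th dh       | s z      | sh zh   |       | h
      affric.  |          |             |             |          | ch jh   |       |
      nasal    | m        |             |             | n        |         | ng    |
      approx.  | w        |             |             | l/r      | y       |       |
      flap     |          |             |             | dx       |         |       |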
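The articulation map is effectively a lookup table keyed by phone, so it can be sketched as a small data structure. The cell assignments below follow the standard layout of this chart for the ARPAbet symbols it lists; the function names `describe` and `voicing_partner` are hypothetical helpers introduced for illustration.

```python
# phone -> (place, manner, voiced?) for the ARPAbet consonants in the chart.
CONSONANTS = {
    "p": ("bilabial", "stop", False), "b": ("bilabial", "stop", True),
    "t": ("alveolar", "stop", False), "d": ("alveolar", "stop", True),
    "k": ("velar", "stop", False), "g": ("velar", "stop", True),
    "q": ("glottal", "stop", False),
    "f": ("labiodental", "fricative", False),
    "v": ("labiodental", "fricative", True),
    "th": ("interdental", "fricative", False),
    "dh": ("interdental", "fricative", True),
    "s": ("alveolar", "fricative", False), "z": ("alveolar", "fricative", True),
    "sh": ("palatal", "fricative", False), "zh": ("palatal", "fricative", True),
    "h": ("glottal", "fricative", False),
    "ch": ("palatal", "affricate", False), "jh": ("palatal", "affricate", True),
    "m": ("bilabial", "nasal", True), "n": ("alveolar", "nasal", True),
    "ng": ("velar", "nasal", True),
    "w": ("bilabial", "approximant", True), "l": ("alveolar", "approximant", True),
    "r": ("alveolar", "approximant", True), "y": ("palatal", "approximant", True),
    "dx": ("alveolar", "flap", True),
}


def describe(phone):
    """Return a description such as 'voiceless bilabial stop'."""
    place, manner, voiced = CONSONANTS[phone]
    voicing = "voiced" if voiced else "voiceless"
    return f"{voicing} {place} {manner}"


def voicing_partner(phone):
    """Return the phone with the same place and manner but opposite voicing."""
    place, manner, voiced = CONSONANTS[phone]
    for other, features in CONSONANTS.items():
        if features == (place, manner, not voiced):
            return other
    return None  # e.g. nasals have no voiceless counterpart in this chart


print(describe("p"))         # voiceless bilabial stop
print(voicing_partner("s"))  # z
```

Describing each consonant by three features makes pairs such as too/zoo easy to characterize: [t]/[z] aside, [s] and [z] differ only in voicing.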
  • Slide 24
  • Vowels
      • All voiced
      • Vowel height: How high is the tongue? High or low?
      • Where is its highest point? Front or back?
      • How rounded are the lips?
      • Monophthong [eh] vs. diphthong [ey]: one vowel sound vs. two
  • Slide 25
  • American English Vowel Space (figure): axes run front to back and high to low; vowels plotted: iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux, ey, ow, aw, oy, ay
  • Slide 26
  • Compare to vowel spaces in other languages: British English, Indian English, Swedish, Spanish, Mandarin Chinese, Japanese
  • Slide 27
  • [iy] vs. [uw]: key vs. coo (from a lecture given by Rochelle Newman)
  • Slide 28
  • [ae] vs. [aa]: cat vs. cot (from a lecture given by Rochelle Newman)
  • Slide 29
  • Acoustic Landmarks (figure): phone labels [p], [t], [sh], [s], [l], [ix], [ih], [ax], [ae], [iy] marked on the utterance "Patricia and Patsy and Sally"
  • Slide 30
  • Coarticulation
      • The same phone can be produced differently depending on phonetic context.
      • Articulations overlap as articulators move in different timing patterns to produce consecutive sounds.
          • Eight vs. eighth: articulation moves forward
          • Met vs. men: vowel becomes nasalized
          • Green banana or "greem banana"?
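The "green banana" example can be approximated with a one-line symbolic rule: an alveolar nasal takes on the place of a following bilabial. The sketch below is a deliberate simplification (real coarticulation is gradual and variable, not a categorical rewrite); the ARPAbet transcription of the phrase and the function name `assimilate_nasal_place` are assumptions made for illustration.

```python
# Bilabial consonants that can trigger assimilation of a preceding [n].
BILABIALS = {"p", "b", "m"}


def assimilate_nasal_place(phones):
    """Rewrite [n] as [m] when the next phone is bilabial.

    A toy, categorical version of nasal place assimilation; it only
    looks one phone ahead and ignores word boundaries and speech rate.
    """
    out = list(phones)
    for i in range(len(out) - 1):
        if out[i] == "n" and out[i + 1] in BILABIALS:
            out[i] = "m"
    return out


# Assumed ARPAbet transcription of "green banana".
green_banana = "g r iy n b ax n ae n ax".split()
print(" ".join(assimilate_nasal_place(green_banana)))
# g r iy m b ax n ae n ax
```

Only the [n] at the end of "green" changes, because it is the only one followed by a bilabial; the two [n] phones inside "banana" precede vowels and are left alone.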
  • Slide 31
  • Articulator mistiming
      • "Probably" is canonically [p r aa b ax b l iy]; variants: [p r aa b iy], [p r aw l uh], [p r ah b iy], [p r aa l iy]
      • "Sense" is canonically [s eh n s]; variants: [s eh n t s], [s ih t s]
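One way to quantify how far each variant drifts from the canonical pronunciation is an edit distance computed over phones rather than characters. This is a minimal sketch using standard Levenshtein distance with unit costs; it is not a method proposed in the lecture, and the function name `phone_edit_distance` is hypothetical.

```python
def phone_edit_distance(a, b):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, pa in enumerate(a, start=1):
        curr = [i]  # deleting the first i phones of a
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # delete pa
                            curr[j - 1] + 1,      # insert pb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


canonical = "p r aa b ax b l iy".split()
variants = ["p r aa b iy", "p r aw l uh", "p r ah b iy", "p r aa l iy"]
for v in variants:
    print(f"[{v}]: {phone_edit_distance(canonical, v.split())} edits")

sense = "s eh n s".split()
for v in ["s eh n t s", "s ih t s"]:
    print(f"[{v}]: {phone_edit_distance(sense, v.split())} edits")
```

Treating each ARPAbet symbol as one token matters here: a character-level distance would count a change from `aa` to `aw` differently from a change from `b` to `l`, even though both are single phone substitutions.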
  • Slide 32
  • IPA Consonants
  • Slide 33
  • IPA Vowels
  • Slide 34
  • Representations for Sounds
      • With ways to represent sounds (IPA, ARPAbet, etc.), we can classify and manipulate these units for:
          • Automatic speech recognition, speech synthesis, speech pathology, language ID, speaker ID
      • But how do we recognize these different sounds automatically from sound data? Acoustic analysis (digital signal processing)
  • Slide 35
  • Next Class: Overview of Spoken Dialog Systems. Readings: J&M 24.1, 24.2