74.419 artificial intelligence
![Page 1: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/1.jpg)
74.419 Artificial Intelligence
Speech and Natural Language Processing
![Page 2: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/2.jpg)
Speech and Natural Language Processing
• Communication
• Natural Language
  • Syntax
  • Semantics
  • Pragmatics
• Speech
![Page 3: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/3.jpg)
Evolution of Human Language
• communication for "work"
• social interaction
• basis of cognition and thinking (Sapir & Whorf)
![Page 4: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/4.jpg)
Communication
"Communication is the intentional exchange of information brought about by the production and perception of signs drawn from a shared system of conventional signs."
[Russell & Norvig, p.651]
![Page 5: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/5.jpg)
Natural Language - General
Natural Language is characterized by
• a common or shared set of signs → alphabet; lexicon
• a systematic procedure to produce combinations of signs → syntax
• a shared meaning of signs and combinations of signs → (constructive) semantics
![Page 6: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/6.jpg)
Speech and Natural Language
Speech Recognition
• acoustic signal as input
• conversion into phonemes and written words
Natural Language Processing
• written text as input; sentences (or 'utterances')
• syntactic analysis: parsing; grammar
• semantic analysis: "meaning", semantic representation
• pragmatics; dialogue; discourse
Spoken Language Processing
• transcribed utterances
• phenomena of spontaneous speech
![Page 7: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/7.jpg)
Speech Recognition
[Diagram: processing pipeline]
• Signal Processing / Analysis: Acoustic / sound wave → Filtering, FFT; Spectral Analysis → Frequency Spectrum → Features (Phonemes; Context)
• Phoneme Recognition (HMM, Neural Networks) → Phonemes
• Grammar or Statistics → Phoneme Sequences / Words
• Grammar or Statistics for likely word sequences → Word Sequence / Sentence
![Page 8: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/8.jpg)
Areas in Natural Language Processing
• Morphology (word stem + ending)
• Syntax, Grammar & Parsing (syntactic description & analysis)
• Semantics & Pragmatics (meaning; constructive; context-dependent; references; ambiguity)
• Pragmatic Theory of Language; Intentions; Metaphor (Communication as Action)
• Discourse / Dialogue / Text
• Spoken Language Understanding
• Language Learning
![Page 9: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/9.jpg)
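NLP Syntax Analysis - Processes
[Diagram: Morphological Analyzer → Part-of-Speech (POS) Tagging → Parser → parse tree; Lexicon and Grammar Rules form the Linguistic Background Knowledge]
Example for "the":
• Morphological Analyzer: the
• Lexicon: the – determiner
• POS Tagging: Det
• Grammar Rules: NP → Det Noun
• Parser: NP recognized
• parse tree: NP over Det Noun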
![Page 10: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/10.jpg)
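NLP - Syntactic Analysis
[Diagram: Morphological Analyzer → Part-of-Speech (POS) Tagging → Parser → parse tree; supported by Lexicon and Grammar Rules]
Example for "eats":
• Morphological Analyzer: eat + s (3rd sing)
• Lexicon: eat – verb
• POS Tagging: Verb
• Grammar Rules: VP → Verb Noun
• Parser: VP recognized
• parse tree: VP over Verb Noun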
![Page 11: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/11.jpg)
Morphology
A morphological analyzer determines (at least) the stem + ending of a word,
and usually delivers related information, like the word class, the number, the person and the case of the word.
The morphology can be part of the lexicon or implemented as a single component, for example as a rule-based system.
eats → eat + s: verb, singular, 3rd person
dog → dog: noun, singular
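The rule-based variant mentioned above can be sketched as follows. This is a minimal illustration, not the course's implementation: the stem list `STEMS`, the function name `analyze`, and the single "-s" rule are assumptions made for this example.

```python
# Hypothetical toy stem lexicon; word classes follow the slide's examples.
STEMS = {"eat": "verb", "dog": "noun", "bone": "noun"}


def analyze(word):
    """Return (stem, ending, features) for a word, or None if it is unknown."""
    word = word.lower()
    if word in STEMS:
        pos = STEMS[word]
        # Bare stem: simplified to 'singular' for nouns, 'base form' for verbs.
        features = [pos, "singular"] if pos == "noun" else [pos, "base form"]
        return word, "", features
    # Single suffix rule: strip a final "-s" and look up the remaining stem.
    if word.endswith("s") and word[:-1] in STEMS:
        stem = word[:-1]
        if STEMS[stem] == "verb":
            return stem, "s", ["verb", "singular", "3rd person"]
        return stem, "s", ["noun", "plural"]
    return None


print(analyze("eats"))
print(analyze("dog"))
```

A realistic analyzer would need many more rules (irregular forms, spelling changes such as "flies" → "fly"), which is why morphology is often folded into the lexicon instead.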
![Page 12: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/12.jpg)
Lexicon
The Lexicon contains information on words, as inflected forms (e.g. goes, eats) or word-stems (e.g. go, eat).
The Lexicon usually assigns a syntactic category, the word class or Part-of-Speech category.
Sometimes it also holds further syntactic information (see Morphology), semantic information (e.g. agent), or syntactic-semantic information (e.g. verb complements: 'give' requires a direct object).
![Page 13: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/13.jpg)
Lexicon
Example contents:
eats: verb; singular, 3rd person (-s); can have direct object (verb subcategorization)
dog: dog, noun, singular; animal (semantic annotation)
![Page 14: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/14.jpg)
POS (Part-of-Speech) Tagging
POS Tagging determines the word class or ‘part-of-speech’ category (basic syntactic categories) of single words or word-stems.
The det (determiner)
dog noun
eats verb (3rd person; singular)
the det
bone noun
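The simplest form of POS tagging is a lexicon lookup per word, sketched below. The table `LEXICON` and the function `pos_tag` are hypothetical names for this illustration; the sketch assigns exactly one tag per word and therefore cannot handle words that belong to several classes.

```python
# Hypothetical full-form lexicon mapping inflected words to one word class.
LEXICON = {"the": "det", "dog": "noun", "eats": "verb", "bone": "noun"}


def pos_tag(sentence):
    """Tag each whitespace-separated token by lexicon lookup."""
    tags = []
    for token in sentence.lower().split():
        token = token.strip(".,!?")  # drop sentence punctuation
        tags.append((token, LEXICON.get(token, "unknown")))
    return tags


for word, tag in pos_tag("The dog eats the bone."):
    print(word, tag)
```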
![Page 15: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/15.jpg)
Open Word Class: Nouns
Nouns denote objects, concepts, …
Proper Nouns: names for specific individual objects, entities, e.g. the Eiffel Tower, Dr. Kemke
Common Nouns: names for categories, classes, or abstracts, e.g. fruit, banana, table, freedom, sleep, ...
Count Nouns: enumerable entities, e.g. two bananas
Mass Nouns: not countable items, e.g. water, salt, freedom
![Page 16: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/16.jpg)
Open Word Class: Verbs
Verbs denote actions, processes, states
e.g. smoke, dream, rest, run
Several morphological forms e.g.
non-3rd person - eat
3rd person - eats
progressive / present participle / gerundive - eating
past participle - eaten
Auxiliaries, e.g. be, as sub-class of verbs
![Page 17: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/17.jpg)
Open Word Class: Adjectives
Adjectives denote qualities or properties of objects, e.g. heavy, blue, content
most languages have concepts for
colour - white, green, ...
age - young, old, ...
value - good, bad, ...
not all languages have adjectives as separate class
![Page 18: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/18.jpg)
Open Word Class: Adverbs
Adverbs denote modifications of actions (verbs) and qualities (adjectives), e.g. walk slowly, heavily drunk
Directional or Locational Adverbs: specify direction or location, e.g. go home, stay here
Degree Adverbs: specify extent of process, action, property, e.g. extremely slow, very modest
![Page 19: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/19.jpg)
Open Word Class: Adverbs 2
Manner Adverbs: specify manner of action or process, e.g. walk slowly, run fast
Temporal Adverbs: specify time of event or action, e.g. yesterday, Monday
![Page 20: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/20.jpg)
Closed Word Classes
prepositions: on, under, over, at, from, to, with, ...
determiners: a, an, the, ...
pronouns: he, she, it, his, her, who, I, ...
conjunctions: and, or, as, if, when, ...
auxiliary verbs: can, may, should, are, ...
particles: up, down, on, off, in, out, ...
numerals: one, two, three, ..., first, second, ...
![Page 21: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/21.jpg)
Language and Grammar
Natural Language described as Formal Language L using a Formal Grammar G:
• start-symbol S ≡ sentence
• non-terminals NT ≡ syntactic constituents
• terminals T ≡ lexical entries / words
• production rules P ≡ grammar rules
Generate sentences or recognize sentences (Parsing) of the language L through the application of grammar rules.
![Page 22: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/22.jpg)
Grammar
Here, POS Tags are included in the grammar rules.
det → the
noun → dog | bone
verb → eat
NP → det noun (NP: noun phrase)
VP → verb (VP: verb phrase)
VP → verb NP
S → NP VP (S: sentence)
Most often we deal with Context-free Grammars, with a distinguished Start-symbol S (sentence).
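To illustrate the "generate sentences" direction, the small grammar above can be written as a table of productions and expanded exhaustively. This is a sketch under the assumption that the grammar is non-recursive (so the language is finite); `GRAMMAR` and `expand` are names introduced for this example.

```python
from itertools import product

# The slide's grammar: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["det", "noun"]],
    "VP": [["verb"], ["verb", "NP"]],
    "det": [["the"]],
    "noun": [["dog"], ["bone"]],
    "verb": [["eat"]],
}


def expand(symbol):
    """Return every word sequence derivable from symbol (grammar is non-recursive)."""
    if symbol not in GRAMMAR:  # terminal word
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        parts = [expand(s) for s in rhs]
        # Combine one derivation per right-hand-side symbol, in order.
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
for sentence in sentences:
    print(sentence)
```

Because the grammar lists only the stem "eat", the generated sentences lack subject-verb agreement ("the dog eat the bone"); agreement would have to come from morphology or from additional grammar features.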
![Page 23: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/23.jpg)
Parsing
• derive the syntactic structure of a sentence based on a language model (grammar)
• construct a parse tree, i.e. the derivation of the sentence based on the grammar (rewrite system)
![Page 24: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/24.jpg)
Parsing (here: bottom-up)
determine the syntactic structure of the sentence
the → det
dog → noun
det noun → NP
eats → verb
the → det
bone → noun
det noun → NP
verb NP → VP
NP VP → S
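The reduction sequence above can be reproduced with a small bottom-up recognizer that replaces a matching right-hand side by its left-hand side until only S remains. This is a naive backtracking sketch, not an efficient parser; `RULES`, `LEXICON`, `reduce_to_s`, and `parse` are names assumed for this example, and unknown words raise a `KeyError`.

```python
RULES = [
    ("NP", ("det", "noun")),
    ("VP", ("verb", "NP")),
    ("VP", ("verb",)),
    ("S", ("NP", "VP")),
]
LEXICON = {"the": "det", "dog": "noun", "bone": "noun", "eats": "verb"}


def reduce_to_s(symbols, trace):
    """Depth-first search over reductions; return the list of steps or None."""
    if symbols == ["S"]:
        return trace
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(symbols) - n + 1):
            if tuple(symbols[i:i + n]) == rhs:
                reduced = symbols[:i] + [lhs] + symbols[i + n:]
                step = f"{' '.join(rhs)} -> {lhs}"
                result = reduce_to_s(reduced, trace + [step])
                if result is not None:
                    return result
    return None  # dead end: no reduction leads to S


def parse(sentence):
    """Map words to word classes, then reduce bottom-up to the sentence symbol."""
    words = sentence.lower().rstrip(".").split()
    return reduce_to_s([LEXICON[w] for w in words], [])


for step in parse("The dog eats the bone."):
    print(step)
```

The search has to backtrack because a reduction such as verb → VP can be applied too early; this is one source of the inefficiency of pure bottom-up parsing.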
![Page 25: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/25.jpg)
Sample Grammar
Grammar (S, NT, T, P) - NT Non-Terminals; T Terminals; P Productions
Sentence Symbol S ∈ NT
Word-Classes / Part-of-Speech ∈ NT
syntactic Constituents ∈ NT
terminal words ∈ T
Grammar Rules P ⊆ NT × (NT ∪ T)*
S → NP VP | Aux NP VP
NP → Det Nominal | Proper-Noun
Nominal → Noun | Nominal PP
VP → Verb | Verb NP | Verb PP | Verb NP PP
PP → Prep NP
Det → that | this | a
Noun → book | flight | meal | money
Proper-Noun → Houston | American Airlines | TWA
Verb → book | include | prefer
Prep → from | to | on
Aux → do | does
![Page 26: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/26.jpg)
Sample Parse Tree
Parse "Does this flight include a meal?"
[S [Aux does] [NP [Det this] [Nominal [Noun flight]]] [VP [Verb include] [NP [Det a] [Nominal [Noun meal]]]]]
![Page 27: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/27.jpg)
Bottom-up vs. Top-Down Parsing
• Bottom-up Parsing – from word-nodes to sentence-symbol
• Top-down Parsing – from sentence-symbol to words
[Figure: parse tree for "does this flight include a meal" (S → Aux NP VP; NP → Det Nominal; Nominal → Noun; VP → Verb NP)]
![Page 28: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/28.jpg)
Ambiguity
“One morning, I shot an elephant in my pajamas. How he got into my pajamas, I don’t know.”
Groucho Marx
• syntactical or structural ambiguity – several parse trees
  example: above sentence
• semantic or lexical ambiguity – several word meanings
  bank (where you get money) and (river) bank
• even different word categories possible (interim)
  He books the flight. vs. The books are here.
  Fruit flies from the balcony. vs. Fruit flies are on the balcony.
![Page 29: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/29.jpg)
Lexical Ambiguity
Several word senses or word categories
e.g. chase – noun or verb
e.g. plant - ????
![Page 30: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/30.jpg)
Syntactic Ambiguity
Several parse trees
e.g. “The dog eats the bone in the park.”
e.g. “The dog eats the bone in the package.”
Who/what is in the park and who/what is in the package?
Syntactically speaking: How do I bind the Prepositional Phrase "in the ... " ?
![Page 31: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/31.jpg)
Problems in Parsing
Problems with left-recursive rules like NP → NP PP: don’t know how many times recursion is needed.
Pure Bottom-up or Top-down Parsing is inefficient because it generates and explores too many structures which in the end turn out to be invalid.
Combine top-down and bottom-up approach: Start with sentence; use rules top-down (look-ahead); read input; try to find shortest path from input to highest unparsed constituent (from left to right).
→ Chart-Parsing / Earley-Parser
![Page 32: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/32.jpg)
Chart-Parsing / Earley Algorithm
Essence: Integrate top-down and bottom-up parsing. Keep recognized sub-structures (sub-trees) for shared use during parsing.
Top-down Prediction: Start with S-symbol. Generate all applicable rules for S. Go further down with left-most constituent in rules and add rules for these constituents until you encounter a left-most node on the RHS which is a word category (POS).
Bottom-up Completion: Read input word and compare. If word matches, mark as recognized and continue the recognition bottom-up, trying to complete active rules.
![Page 33: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/33.jpg)
Earley Algorithm - Functions
predictor: generates new rules for partly recognized RHS with constituent right of • (top-down generation); the • indicates how far a rule has been recognized
scanner: if a word category (POS) is found right of the •, the scanner reads the next input word and adds a rule for it to the chart (bottom-up mode)
completer: if a rule is completely recognized (the • is far right), the recognition state of earlier rules in the chart advances: the • is moved over the recognized constituent (bottom-up recognition)
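The three functions can be put together into a compact Earley recognizer, sketched here for the sample grammar. Several points are assumptions of this sketch rather than part of the slides: the state layout `(lhs, rhs, dot, origin)`, the dummy start rule `GAMMA → S`, the added rule S → VP (so that imperatives are accepted), and the restriction of the lexicon to single-word entries. The recognizer only answers yes/no and does not build parse trees.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],  # S -> VP is an added assumption
    "NP": [("Det", "Nominal"), ("Proper-Noun",)],
    "Nominal": [("Noun",), ("Nominal", "PP")],
    "VP": [("Verb",), ("Verb", "NP"), ("Verb", "PP"), ("Verb", "NP", "PP")],
    "PP": [("Prep", "NP")],
}
# Word -> possible word classes (single-word entries only).
LEXICON = {
    "that": {"Det"}, "this": {"Det"}, "a": {"Det"},
    "book": {"Noun", "Verb"}, "flight": {"Noun"}, "meal": {"Noun"}, "money": {"Noun"},
    "houston": {"Proper-Noun"}, "twa": {"Proper-Noun"},
    "include": {"Verb"}, "prefer": {"Verb"},
    "from": {"Prep"}, "to": {"Prep"}, "on": {"Prep"},
    "do": {"Aux"}, "does": {"Aux"},
}


def earley_recognize(words):
    """Return True if the word list is a sentence of the grammar."""
    words = [w.lower() for w in words]
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # chart[i]: states ending at position i

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    add(0, ("GAMMA", ("S",), 0, 0))
    for i in range(n + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] grows while it is processed
            lhs, rhs, dot, origin = chart[i][j]
            j += 1
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:
                    # Predictor: expand the constituent right of the dot.
                    for expansion in GRAMMAR[nxt]:
                        add(i, (nxt, expansion, 0, i))
                elif i < n and nxt in LEXICON.get(words[i], ()):
                    # Scanner: the next word has the expected word class.
                    add(i + 1, (nxt, (words[i],), 1, i))
            else:
                # Completer: advance the dot in states waiting for this constituent.
                for plhs, prhs, pdot, porigin in list(chart[origin]):
                    if pdot < len(prhs) and prhs[pdot] == lhs:
                        add(i, (plhs, prhs, pdot + 1, porigin))
    return ("GAMMA", ("S",), 1, 0) in chart[n]


print(earley_recognize("does this flight include a meal".split()))
print(earley_recognize("flight this does".split()))
```

Note that the left-recursive rule Nominal → Nominal PP causes no problem here: a predicted state is added to the chart only once, so prediction terminates.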
![Page 34: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/34.jpg)
Chart
[Chart for "Book this flight": completed rules, the • at the far right]
S → VP •
VP → V NP •
NP → Det Nom •
Nom → Noun •
V: Book; Det: this; Noun: flight
![Page 35: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/35.jpg)
![Page 36: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/36.jpg)
Semantics
![Page 37: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/37.jpg)
Semantic Representation
Representation of the meaning of a sentence.
Generate a logic-based representation or a frame-based representation based on the syntactic structure, lexical entries, and particularly the head-verb (determines how to arrange parts of the sentence in the semantic representation).
![Page 38: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/38.jpg)
Semantic Representation
Verb-centered Representation: The verb (action, head) is regarded as the center of the verbal expression and determines the case frame with possible case roles; other parts of the sentence are described in relation to the action as fillers of case slots. (cf. also Schank's CD Theory)
Typing of case roles possible (e.g. 'agent' refers to a specific sort or concept)
![Page 39: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/39.jpg)
General Frame for "eat"
Agent: animate
Action: eat
Patiens: food
Manner: {e.g. fast}
Location: {e.g. in the yard}
Time: {e.g. at noon}
![Page 40: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/40.jpg)
Example-Frame with Fillers
Agent: the dog
Action: eat
Patiens: the bone / the bone in the package
Location: in the park
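Filling a typed case frame can be sketched as a dictionary with a sort check on the required roles. The sort table `SORTS`, the role names as dictionary keys, and the function `fill_eat_frame` are assumptions made for this illustration; a real system would take sorts from the lexicon or an ontology.

```python
# Required roles of the "eat" frame with the sort their fillers must have.
EAT_FRAME = {"agent": "animate", "patiens": "food"}
OPTIONAL_ROLES = ("manner", "location", "time")

# Hypothetical sort information, as it might come from semantic lexicon annotations.
SORTS = {"the dog": "animate", "the bone": "food", "the stone": "mineral"}


def fill_eat_frame(fillers):
    """Build an 'eat' frame, rejecting fillers whose sort does not fit the role."""
    frame = {"action": "eat"}
    for role, required_sort in EAT_FRAME.items():
        filler = fillers.get(role)
        if SORTS.get(filler) != required_sort:
            raise ValueError(f"{role} must be of sort {required_sort!r}, got {filler!r}")
        frame[role] = filler
    for role in OPTIONAL_ROLES:
        if role in fillers:
            frame[role] = fillers[role]
    return frame


print(fill_eat_frame({"agent": "the dog", "patiens": "the bone", "location": "in the park"}))
```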
![Page 41: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/41.jpg)
| Role | General Frame for drive | Frame with fillers |
| --- | --- | --- |
| Agent | animate | she |
| Action | drive | drives |
| Patiens | vehicle | the convertible |
| Manner | {the way it is done} | fast |
| Location | Location-spec | [in the] Rocky Mountains |
| Source | Location-spec | [from] home |
| Destination | Location-spec | [to the] ASIC conference |
| Time | Time-spec | [in the] summer holidays |
![Page 42: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/42.jpg)
Representation in Logic
Action: eat
Agent: the dog
Patiens: the bone / the bone in the package
Location: in the park
eat (dog-1, bone-1, park-1)
predicate: eat; constants: dog-1, bone-1, park-1
![Page 43: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/43.jpg)
Representation in Logic
lexical: eat (dog-1, bone-1, park-1)
general: eat (x, y, z) with variables x, y, z
semantic frame: animate-being (x), food (y), location (z)
syntactic frame: NP-1 (x), NP-2 (y), PP (z)
syntactic: eat (NP-1, NP-2, PP)
![Page 44: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/44.jpg)
Pragmatics
![Page 45: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/45.jpg)
Pragmatics
Pragmatics includes context-related aspects of NL expressions (utterances).
These are in particular anaphoric references, elliptic expressions, deictic expressions, …
anaphoric references – refer to items mentioned before
deictic expressions – simulate pointing gestures
elliptic expressions – incomplete expression; relate to item mentioned before
![Page 46: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/46.jpg)
Pragmatics
“I put the box on the top shelf.”
“I know that. But I can’t find it there.”
“The candy-box?”
anaphoric reference: "it" (the box)
deictic expression: "there"
elliptic expression: “The candy-box?”
![Page 47: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/47.jpg)
Intentions
One philosophical assumption is that natural language is used to achieve things or situations: “Do things with words.”
The meaning of an utterance is essentially determined by the intention of the speaker.
![Page 48: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/48.jpg)
Intentionality - Examples
| What was said: | What was meant: |
| --- | --- |
| “There is a terrible draft here.” | "Can you please close the window." |
| “How does it look here?” | "I am really mad; clean up your room." |
| "Will this ever end?" | "I would prefer to be with my friends than to sit in class now." |
![Page 49: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/49.jpg)
Metaphors
The meaning of a sentence or expression is not directly inferable from the sentence structure and the word meanings. Metaphors transfer concepts and relations from one area of discourse into another area, for example, seeing time as a line (in space) or seeing friendship or life as a journey.
![Page 50: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/50.jpg)
Metaphors - Examples
“This car eats a lot of gas.”
“She devoured the book.”
“He was tied up with his clients.”
“Marriage is like a journey.”
“Their marriage was a one-way road into hell.”
(see George Lakoff, Women, Fire and Dangerous Things)
![Page 51: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/51.jpg)
Dialogue and Discourse
![Page 52: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/52.jpg)
Discourse / Dialogue Structure
Grammar for various sentence types (speech acts): dialogue, discourse, story grammar
Distinguish questions, commands, and statements:
• Where is the remote-control?
• Bring the remote-control!
• The remote-control is on the brown table.
Dialogue Grammars describe possible sequences of Speech Acts in communication, e.g. that a question is followed by an answer/statement.
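A dialogue grammar can be sketched as a table of allowed speech-act successors. Everything beyond "a question is followed by a statement" is an assumption of this sketch: the table `FOLLOWS`, the function names, and especially the punctuation-based classifier, which only works on written text and stands in for a real speech-act recognizer.

```python
def speech_act(utterance):
    """Crude surface classification by final punctuation (assumption of this sketch)."""
    text = utterance.strip()
    if text.endswith("?"):
        return "question"
    if text.endswith("!"):
        return "command"
    return "statement"


# Hypothetical dialogue grammar: which speech act may follow which.
FOLLOWS = {
    "question": {"statement"},
    "command": {"statement", "question"},
    "statement": {"statement", "question", "command"},
}


def is_coherent(dialogue):
    """Check every adjacent pair of utterances against the dialogue grammar."""
    acts = [speech_act(u) for u in dialogue]
    return all(b in FOLLOWS[a] for a, b in zip(acts, acts[1:]))


print(is_coherent(["Where is the remote-control?",
                   "The remote-control is on the brown table."]))
```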
![Page 53: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/53.jpg)
Speech
![Page 54: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/54.jpg)
![Page 55: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/55.jpg)
Speech Production & Reception
Sound and Hearing
• change in air pressure → sound wave
• reception through inner ear membrane / microphone
• break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast Fourier Transform, FFT) → Frequency Spectrum
• perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden Markov Models)
![Page 56: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/56.jpg)
![Page 57: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/57.jpg)
Speech Recognition Phases
Speech Recognition
• acoustic signal as input
• signal analysis - spectrogram
• feature extraction
• phoneme recognition
• word recognition
• conversion into written words
![Page 58: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/58.jpg)
Speech Signal
Speech Signal is composed of
• harmonic signal (sine waves) with different frequencies and amplitudes
  • frequency - waves/second (like pitch)
  • amplitude - height of wave (like loudness)
• non-harmonic signal (not a sine wave): noise
![Page 59: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/59.jpg)
![Page 60: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/60.jpg)
glottis and speech signal in lingWAVES (from http://www.lingcom.de)
![Page 61: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/61.jpg)
Speech Signal Analysis
Analog-Digital Conversion of Acoustic Signal
• Sampling in Time Frames (“windows”)
  • frequency = 0-crossings per time frame, e.g. 2 crossings/second is 1 Hz (1 wave)
  • e.g. 10 kHz needs sampling rate 20 kHz
• measure amplitudes of signal in time frame → digitized wave form
• separate different frequency components → FFT (Fast Fourier Transform) → spectrogram
• other frequency based representations → LPC (linear predictive coding), Cepstrum
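The relations above (zero crossings, sampling rate, frequency components) can be checked on a synthetic signal. This is a sketch using only the standard library: the function names are introduced for this example, the test tone and its phase offset are arbitrary choices, and a plain discrete Fourier transform stands in for the FFT, which computes the same result faster.

```python
import cmath
import math


def sample_sine(freq_hz, rate_hz, seconds, phase=0.5):
    """Digitize a sine wave; the phase offset keeps samples away from exact zeros."""
    n = int(rate_hz * seconds)
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz + phase) for k in range(n)]


def zero_crossing_frequency(samples, rate_hz):
    """Two zero crossings correspond to one full wave."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    duration = len(samples) / rate_hz
    return crossings / 2 / duration


def dft_peak_frequency(samples, rate_hz):
    """Return the frequency of the strongest component (naive DFT, not an FFT)."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * rate_hz / n


def min_sampling_rate(max_freq_hz):
    """Sampling must be at least twice the highest frequency to be captured."""
    return 2 * max_freq_hz


signal = sample_sine(freq_hz=5, rate_hz=100, seconds=1.0)
print(zero_crossing_frequency(signal, 100))
print(dft_peak_frequency(signal, 100))
print(min_sampling_rate(10_000))
```

Zero-crossing counting only works for a single clean tone; real speech mixes many components, which is why the spectral decomposition is needed.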
![Page 62: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/62.jpg)
Waveform
[Figure: waveform of "She just had a baby."; axes: Time, Amplitude/Pressure]
![Page 63: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/63.jpg)
Waveform for Vowel ae
[Figure: waveform for vowel ae; axes: Time, Amplitude/Pressure]
![Page 64: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/64.jpg)
Waveform and Spectrogram
![Page 65: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/65.jpg)
Waveform and LPC Spectrum for Vowel ae
[Figure: waveform (Time vs. Amplitude/Pressure) and LPC spectrum (Frequency vs. Energy) with Formants marked]
![Page 66: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/66.jpg)
Phoneme Recognition
Recognition Process based on
• features extracted from spectral analysis
• phonological rules
• statistical properties of language / pronunciation
Recognition Methods
• Hidden Markov Models
• Neural Networks
• Pattern Classification in general
![Page 67: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/67.jpg)
Speech Signal Characteristics
Derive from signal representation:
• formants - dark stripes in spectrum: strong frequency components; characterize particular vowels; gender of speaker
• pitch – fundamental frequency: baseline for higher frequency harmonics like formants; gender characteristic
• change in frequency distribution: characteristic for e.g. plosives (form of articulation)
![Page 68: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/68.jpg)
Features for Vowels & Consonants
![Page 69: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/69.jpg)
Probabilistic FAs as Word Models
![Page 70: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/70.jpg)
Word Recognition with Hidden Markov Model
![Page 71: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/71.jpg)
Viterbi-Algorithm
The Viterbi Algorithm finds an optimal sequence of states in continuous Speech Recognition, given an observation sequence of phones and a probabilistic (weighted) FA (state graph). The algorithm returns the path through the automaton which has maximum probability and accepts the observation sequence.
a[s,s'] is the transition probability (in the phonetic word model) from current state s to next state s', and b[s',o_t] is the observation likelihood of the observation o_t in state s'. In this simplified setting, b[s',o_t] is 1 if the observation symbol matches the state, and 0 otherwise.
(cf. Jurafsky Ch.5)
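A minimal sketch of the Viterbi search over a weighted automaton follows, using the 0/1 observation likelihood described above. The two word models ("need" with an optional final d, and "knee") and all probabilities are invented for illustration; the state naming `word:phone` and the function `viterbi` are assumptions of this example.

```python
# Hypothetical weighted automaton: a[s][s2] is the transition probability.
TRANSITIONS = {
    "start": {"need:n": 0.6, "knee:n": 0.4},
    "need:n": {"need:iy": 1.0},
    "need:iy": {"need:d": 0.9, "end": 0.1},  # final d may be dropped
    "need:d": {"end": 1.0},
    "knee:n": {"knee:iy": 1.0},
    "knee:iy": {"end": 1.0},
}


def viterbi(observations):
    """Return (best state path, its probability) for a sequence of phones."""
    # Each column maps state -> (best probability so far, predecessor state).
    columns = [{"start": (1.0, None)}]
    for obs in observations:
        prev, col = columns[-1], {}
        for s, (p, _) in prev.items():
            for s2, a in TRANSITIONS.get(s, {}).items():
                if s2 == "end":
                    continue
                b = 1.0 if s2.split(":")[1] == obs else 0.0  # 0/1 observation likelihood
                score = p * a * b
                if score > col.get(s2, (0.0, None))[0]:
                    col[s2] = (score, s)
        columns.append(col)
    # The path must be accepted: finish with a transition into "end".
    best_state, best_prob = None, 0.0
    for s, (p, _) in columns[-1].items():
        score = p * TRANSITIONS.get(s, {}).get("end", 0.0)
        if score > best_prob:
            best_state, best_prob = s, score
    if best_state is None:
        return [], 0.0
    # Follow the backpointers from the last column to the first.
    path, state = [], best_state
    for col in reversed(columns[1:]):
        path.append(state)
        state = col[state][1]
    return path[::-1], best_prob


print(viterbi(["n", "iy"]))
print(viterbi(["n", "iy", "d"]))
```

With real acoustic input, b would be a graded likelihood rather than 0 or 1, and probabilities are usually multiplied in log space to avoid underflow.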
![Page 72: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/72.jpg)
Speech Recognizer Architecture
![Page 73: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/73.jpg)
Speech Processing - Characteristics
Speech Recognition vs. Speaker Identification (Voice Recognition)
• speaker-dependent vs. speaker-independent training
• unlimited vs. large vs. small vocabulary
• single word vs. continuous speech
![Page 74: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/74.jpg)
Spoken Language
![Page 75: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/75.jpg)
Spoken Language
Output of Speech Recognition System as input "text".
Can be associated with probabilities for different word sequences.
Contains ungrammatical structures, so-called "disfluencies", e.g. repetitions and corrections.
![Page 76: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/76.jpg)
![Page 77: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/77.jpg)
Spoken Language - Examples
1. no [s-] straight southwest
2. right to [my] my left
3. [that is] that is correct
From: Robin J. Lickley. HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html
![Page 78: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/78.jpg)
Spoken Language - Examples
1. we're going to [g-- ]... turn straight back around for testing.
2. [come to] ... walk right to the ... right-hand side of the page.
3. right [up ... past] ... up on the left of the ... white mountain walk ... right up past.
4. [i'm still] ... i've still gone halfway back round the lake again.
![Page 79: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/79.jpg)
Spoken Language - Examples
1. [I’d] [d if] I need to go
2. [it’s basi--] see if you go over the old mill
3. [you are going] make a gradual slope … to your right
4. [I’ve got one] I don’t realize why it is there
![Page 80: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/80.jpg)
Spoken Language - Disfluency
Reparandum and Repair
Reparandum Repair
[come to] ... walk right to [the] ... the right-hand side of the page
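Once the reparandum is marked, the intended utterance is obtained by deleting it and keeping the repair. The sketch below assumes the bracket annotation used in the examples above; the function name `remove_reparanda` is introduced for this illustration. Detecting the reparandum in unannotated recognizer output is the hard part and is not addressed here.

```python
import re


def remove_reparanda(annotated):
    """Drop bracketed reparanda and pause marks, keeping the repaired utterance."""
    text = re.sub(r"\[[^\]]*\]", " ", annotated)  # remove [reparandum]
    text = text.replace("...", " ").replace("…", " ")  # remove pause marks
    return " ".join(text.split())  # normalize whitespace


print(remove_reparanda("[come to] ... walk right to [the] ... the right-hand side of the page"))
```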
![Page 81: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/81.jpg)
Additional References
Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000
Huang, X. & A. Acero & H. Hon: Spoken Language Processing. A Guide to Theory, Algorithms, and System Development. Prentice-Hall, NJ, 2001
Kemke, C., 74.793 Natural Language and Speech Processing - Course Notes, 2nd Term 2004, Dept. of Computer Science, U. of Manitoba
Robin J. Lickley. HCRC Disfluency Coding Manual. http://www.ling.ed.ac.uk/~robin/maptask/HCRCdsm-01.html
![Page 82: 74.419 Artificial Intelligence](https://reader035.vdocuments.mx/reader035/viewer/2022062410/5681556b550346895dc3366f/html5/thumbnails/82.jpg)
Figures
Figures taken from:
Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000, Chapters 5 and 7.
lingWAVES (from http://www.lingcom.de)