LING 438/538 Computational Linguistics, Sandiway Fong, Lecture 8: 9/29
Post on 18-Dec-2015
Last Time
• regular grammars – aka Chomsky hierarchy type-3 grammars
– are formal grammars with severe restrictions on what can appear on the RHS
– are limited in generative capacity or power
– in Prolog DCG notation:
• x --> y, [t]. x --> [t]. (left recursive variant) or
• x --> [t],y. x --> [t]. (right recursive variant)
– can’t have both left and right recursive rules in the same grammar
Last Time
• regular grammars
• examples
– “one or more a’s followed by one or more b’s”
– sheeptalk {ba!, baa!, baaa!, ...}
• regular languages
– i.e. can be encoded by a regular grammar
• beyond regular grammars
• examples
– a^n b^n = {ab, aabb, aaabbb, ... }
– ww^R, where w ∈ {a,b}+, i.e. w is any non-empty sequence of a’s and b’s
• informal idea about the crucial difference: “needing to keep track of history”
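The "keeping track of history" idea can be illustrated informally with a short Python sketch. It is not a proof of anything: it only shows that a natural recognizer for a^n b^n has to remember an unbounded count, and one for ww^R has to remember the whole first half. The function names `is_anbn` and `is_wwr` are made up for this illustration.

```python
def is_anbn(string):
    """Recognize {a^n b^n | n > 0} by counting."""
    count = 0  # the "history": number of a's seen so far (can grow without bound)
    i = 0
    while i < len(string) and string[i] == "a":
        count += 1
        i += 1
    if count == 0:
        return False  # need at least one a
    while i < len(string) and string[i] == "b":
        count -= 1
        i += 1
    # accept only if the whole string was consumed and the b's cancel the a's
    return i == len(string) and count == 0


def is_wwr(string):
    """Recognize {w w^R | w in {a,b}+}, i.e. non-empty even-length palindromes over a and b."""
    if len(string) == 0 or len(string) % 2 != 0:
        return False
    if not set(string) <= {"a", "b"}:
        return False
    half = len(string) // 2
    # the first half must be remembered in full to compare it with the reversed second half
    return string[:half] == string[half:][::-1]


if __name__ == "__main__":
    for s in ["ab", "aabb", "aab", "abab", "abba"]:
        print(s, is_anbn(s), is_wwr(s))
```

By contrast, a recognizer for "one or more a's followed by one or more b's" only needs to remember which of a fixed number of stages it is in.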
Today’s Topic
• Finite State Automata
– plus more on what it means to be a regular language
Merge Point
– Textbook, Chapter 2: Regular Expressions and Automata
+ left & right recursive rules
Today’s Topic
• Finite State Automata
– plus more on what it means to be a regular language
• formally equivalent in terms of generative capacity or power
[Diagram: Regular Grammars, FSA, and Regular Expressions, all linked to Regular Languages]
Some Regular Expression Notation
• ... some notation first (more on regexps next time)
• Regular Expressions (regexp): shorthand for describing sets of strings
• Operators:
– string+
• set of one or more occurrences of string
• a+ = {a, aa, aaa, aaaa, aaaaa, …}
• (abc)+ = {abc, abcabc, abcabcabc, …}
– Note: parentheses are used to delimit the scope of the operator
– string*
• set of zero or more occurrences of string
• a* = {ε, a, aa, aaa, aaaa, …}
• (abc)* = {ε, abc, abcabc, …}
– Note: ε is the zero-length string
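The + and * operators can be tried out with Python's standard `re` module, whose syntax for these two operators agrees with the notation above. This is a minimal sketch; the helper name `matches` is made up here, and `re.fullmatch` is used so that the whole string must belong to the set.

```python
import re


def matches(pattern, string):
    """True if the entire string is in the set described by the pattern."""
    return re.fullmatch(pattern, string) is not None


if __name__ == "__main__":
    print(matches("a+", "aaa"))         # one or more a's
    print(matches("a+", ""))            # the zero-length string is not in a+
    print(matches("a*", ""))            # but it is in a*
    print(matches("(abc)+", "abcabc"))  # parentheses delimit the scope of +
    print(matches("abc+", "abcabc"))    # without them, + applies to c only
```

The last two calls show why the parentheses matter: `abc+` describes ab followed by one or more c's, not repetitions of abc.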
Some Regular Expression Notation
• ... some notation first
• Relation between * and +
– a a* = a+
– “a concatenated with a*”
– a {ε, a, aa, aaa, aaaa, …} = {a, aa, aaa, aaaa, aaaaa, …}
• Operators:
– string^n
• exactly n occurrences of string
• a^4 b^3 = { aaaabbb }
• Language = a set of strings
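Since a language is just a set of strings, the relation a a* = a+ can be checked on finite samples with ordinary Python sets. The sketch below cuts the infinite sets off at a maximum number of repetitions, which is an assumption made only so the sets can be built; the helper names `power`, `star`, `plus` and `concat` are hypothetical.

```python
def power(string, n):
    """string^n: exactly n occurrences of string."""
    return string * n


def star(string, max_n):
    """Finite sample of string*: zero up to max_n occurrences."""
    return {power(string, n) for n in range(0, max_n + 1)}


def plus(string, max_n):
    """Finite sample of string+: one up to max_n occurrences."""
    return {power(string, n) for n in range(1, max_n + 1)}


def concat(prefix, language):
    """Concatenate a single string with every string in a language."""
    return {prefix + w for w in language}


if __name__ == "__main__":
    # a concatenated with {ε, a, ..., aaaa} gives {a, aa, ..., aaaaa}
    print(concat("a", star("a", 4)) == plus("a", 5))
    print(power("a", 4) + power("b", 3))
```

The empty Python string `""` plays the role of the zero-length string.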
Regular Expressions
• regular expressions
– formally equivalent to regular grammars and finite state automata
• How to show this?
• Proof by construction…
• beyond regular expressions
– examples
• {a^n b^n | n > 0} is not regular
• {ww^R | w ∈ {a,b}+} is not regular, e.g. (abc)^R = cba
– How to show this?
– Proof by Pumping Lemma
[Diagram: Regular Grammars, FSA, Regular Expressions]
Regular Expressions
• Example:
– Language: L = {a+b+}
“one or more a’s followed by one or more b’s”
• regular language
– described by a regular expression
• Note:
– infinite set of strings belonging to language L
» e.g. abbb, aaaab, aabb, *abab, *ε
• Notation:
– ε is the empty string (or string with zero length)
– * means string is not in the language
• regular grammar
s --> [a],b.
b --> [a],b.
b --> [b],c.
b --> [b].
c --> [b],c.
c --> [b].
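To see what the right recursive grammar for a+b+ generates, the rules can be transcribed into Python and expanded up to a length bound. This is only a sketch: the triple encoding `(nonterminal, terminal, next nonterminal or None)` and the function `generate` are assumptions made for this illustration, not part of the Prolog DCG notation.

```python
# Each rule "x --> [t], y." becomes (x, t, y); each rule "x --> [t]." becomes (x, t, None).
# Note that b and c are used as nonterminal names, while "a" and "b" in the
# middle position are terminal symbols.
RULES = [
    ("s", "a", "b"),
    ("b", "a", "b"),
    ("b", "b", "c"),
    ("b", "b", None),
    ("c", "b", "c"),
    ("c", "b", None),
]


def generate(max_len, start="s"):
    """Return every string of length <= max_len that the grammar generates."""
    results = set()
    agenda = [("", start)]  # (terminals produced so far, nonterminal still to expand)
    while agenda:
        prefix, nonterminal = agenda.pop()
        for lhs, terminal, rest in RULES:
            if lhs != nonterminal:
                continue
            string = prefix + terminal
            if len(string) > max_len:
                continue  # every rule adds one terminal, so this bound guarantees termination
            if rest is None:
                results.add(string)
            else:
                agenda.append((string, rest))
    return results


if __name__ == "__main__":
    print(sorted(generate(4), key=lambda w: (len(w), w)))
```

Every generated string is some a's followed by some b's, and strings such as abab never appear.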
Finite State Automata (FSA)
[Diagram: FSA with states s, x and y; arcs s -a-> x, x -a-> x, x -b-> y, y -b-> y]
L = {a+b+}
L = {aa*bb*}
• deterministic FSA (DFSA): no ambiguity about where to go at any given state
• non-deterministic FSA (NDFSA): no restriction on ambiguity (surprisingly, no increase in power)
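One way to get a feel for why non-determinism adds no power is to run an NDFSA by tracking the set of states it could be in, which is always a subset of a finite set. The machine below is a hypothetical NDFSA for a+b+ invented for this sketch (it is not the machine on the slide): on a, state s may either stay in s or move to x, and on b, state x may either stay in x or move to y.

```python
# Hypothetical NDFSA for a+b+: (character, state) -> set of possible next states.
NFA = {
    ("a", "s"): {"s", "x"},
    ("b", "x"): {"x", "y"},
}
START = "s"
FINAL = {"y"}


def nfa_accepts(string):
    """Simulate the NDFSA by following all possible choices at once."""
    current = {START}  # every state the machine could be in right now
    for ch in string:
        current = {
            nxt
            for state in current
            for nxt in NFA.get((ch, state), set())
        }
        if not current:
            return False  # no choice of transitions survives
    return bool(current & FINAL)


if __name__ == "__main__":
    for s in ["ab", "aabbb", "a", "abab", ""]:
        print(repr(s), nfa_accepts(s))
```

Each distinct value of `current` behaves like a single state of a deterministic machine, which is the intuition behind converting an NDFSA into a DFSA.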
Finite State Automata (FSA)
• more formally
– (Q, s, f, Σ, δ)
1. set of states (Q): {s,x,y} (must be a finite set)
2. start state (s): s
3. end state(s) (f): y
4. alphabet (Σ): {a, b}
5. transition function δ:
signature: character × state → state
1. δ(a,s)=x
2. δ(a,x)=x
3. δ(b,x)=y
4. δ(b,y)=y
[Diagram: FSA with arcs s -a-> x, x -a-> x, x -b-> y, y -b-> y; y is the end state]
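The five components of the formal definition map directly onto a few Python values. This is a minimal sketch under one assumption that the slide does not state: a missing entry in the transition table (for example b in state s) is treated as immediate rejection.

```python
STATES = {"s", "x", "y"}   # Q, a finite set
START = "s"                # start state
FINAL = {"y"}              # end state(s)
ALPHABET = {"a", "b"}      # Σ
# transition function, keyed as (character, state) to follow the signature above
DELTA = {
    ("a", "s"): "x",
    ("a", "x"): "x",
    ("b", "x"): "y",
    ("b", "y"): "y",
}


def accepts(string):
    """Run the DFSA over the string and report whether it ends in an end state."""
    state = START
    for ch in string:
        if (ch, state) not in DELTA:
            return False  # assumption: no transition defined means reject
        state = DELTA[(ch, state)]
    return state in FINAL


if __name__ == "__main__":
    for s in ["abbb", "aaaab", "aabb", "abab", ""]:
        print(repr(s), accepts(s))
```

The only memory the machine uses is the current value of `state`, one of three possibilities, no matter how long the input is.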
Finite State Automata (FSA)
• practical applications
• can be encoded and run efficiently on a computer
• widely used
– encode regular expressions
– compress large dictionaries
– morphological analyzers
• different word forms, e.g. want, wanted, unwanted (suffixation/prefixation)
• see chapter 3 of textbook
– speech recognizers
• Markov models = FSA + probabilities
– and many more …
Finite State Automata (FSA)
– T9 text entry (tegic.com)
• built in to your cellphone
• predictive text entry for mobile messaging/data entry
• reduces the number of keystrokes for inputting words on a telephone keypad (8 keys)
• how: 3 vs. 6 keystrokes
• michael: 7 vs. 15 keystrokes
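The keystroke figures for how and michael can be reproduced with a short Python sketch. It assumes the larger number in each pair is the multi-tap count (pressing a key repeatedly to cycle through its letters) and the smaller is one key press per letter, with the intended word being the first dictionary candidate so that no extra selection key is needed. The function names are made up for this illustration; the letter layout is the standard one on keys 2 to 9.

```python
# Standard letter assignment on the 8 letter-bearing keys of a telephone keypad.
KEYPAD = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}


def key_for(letter):
    """Return (digit, position of the letter on that key, counting from 1)."""
    for digit, letters in KEYPAD.items():
        if letter in letters:
            return digit, letters.index(letter) + 1
    raise ValueError("not a keypad letter: " + repr(letter))


def multitap_keystrokes(word):
    """Key presses needed when each letter is selected by repeated presses."""
    return sum(key_for(ch)[1] for ch in word.lower())


def t9_digits(word):
    """Digit sequence for the word: one key press per letter."""
    return "".join(key_for(ch)[0] for ch in word.lower())


if __name__ == "__main__":
    for word in ["how", "michael"]:
        digits = t9_digits(word)
        print(word, digits, len(digits), "vs.", multitap_keystrokes(word))
```

A predictive system then has to map a digit sequence such as 469 back to dictionary words, which is a lookup task a finite state machine over the digits can perform.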
RegExp FSA
• From Regular Expression to FSA
• Operators
– a: a single symbol a
– a^n: n occurrences of a
[Diagram: FSA for a (one arc labeled a) and FSA for a^3 (three arcs labeled a in sequence)]
RegExp FSA
• Operators
– a*: zero or more occurrences of a
– a+: one or more occurrences of a (a+ = aa*)
[Diagram: FSA for a* (one arc labeled a) and FSA for a+ (two arcs labeled a)]
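The constructions for a^n, a* and a+ can be sketched as small Python functions that each return a machine as a (start state, end states, transition table) triple. The state numbering and function names are assumptions of this sketch, and the machines are one natural choice consistent with a+ = aa*; the slide's own diagrams may be drawn differently.

```python
def fsa_power(symbol, n):
    """a^n: a chain of n arcs through states 0..n; only state n is an end state."""
    delta = {(symbol, i): i + 1 for i in range(n)}
    return 0, {n}, delta


def fsa_star(symbol):
    """a*: a single state that is both start and end state, with a loop arc."""
    return 0, {0}, {(symbol, 0): 0}


def fsa_plus(symbol):
    """a+ = aa*: one arc into state 1, followed by a loop arc on state 1."""
    return 0, {1}, {(symbol, 0): 1, (symbol, 1): 1}


def accepts(fsa, string):
    """Run a machine built by the functions above."""
    start, finals, delta = fsa
    state = start
    for ch in string:
        if (ch, state) not in delta:
            return False  # no arc for this character: reject
        state = delta[(ch, state)]
    return state in finals


if __name__ == "__main__":
    print(accepts(fsa_power("a", 3), "aaa"), accepts(fsa_power("a", 3), "aa"))
    print(accepts(fsa_star("a"), ""), accepts(fsa_plus("a"), ""))
```

The a* machine accepts the zero-length string because its start state is also an end state; the a+ machine does not.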
Regular Grammar FSA
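• examples
– s --> [a], t.
– x --> [a], x.
– x --> [a].
[Diagram: s --> [a], t. drawn as an arc labeled a from state s to state t; x --> [a], x. as an arc labeled a from state x back to x; x --> [a]. as an arc labeled a from state x to a final state y]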
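The translation from right recursive rules to arcs can be sketched in Python. The encoding is an assumption of this illustration: a rule x --> [t], y. is written as the triple (x, t, y), a rule x --> [t]. as (x, t, None), and every rule of the second kind leads to one shared final state named y. Because two rules can share the same left-hand side and terminal, the result is in general non-deterministic, so the table maps to sets of states.

```python
# The three example rules, as (nonterminal, terminal, next nonterminal or None).
RULES = [
    ("s", "a", "t"),   # s --> [a], t.
    ("x", "a", "x"),   # x --> [a], x.
    ("x", "a", None),  # x --> [a].
]


def grammar_to_fsa(rules, final="y"):
    """Build a transition table (character, state) -> set of next states."""
    delta = {}
    for lhs, terminal, rest in rules:
        target = final if rest is None else rest
        delta.setdefault((terminal, lhs), set()).add(target)
    return delta


def accepts(delta, start, finals, string):
    """Run the (possibly non-deterministic) machine by tracking sets of states."""
    current = {start}
    for ch in string:
        current = {n for state in current for n in delta.get((ch, state), set())}
    return bool(current & finals)


if __name__ == "__main__":
    delta = grammar_to_fsa(RULES)
    print(delta)
    print(accepts(delta, "x", {"y"}, "aaa"))
```

Starting from x, the two x rules together give a machine for one or more a's.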