LING 438/538 Computational Linguistics Sandiway Fong Lecture 8: 9/29


Post on 18-Dec-2015


LING 438/538 Computational Linguistics

Sandiway Fong

Lecture 8: 9/29

Administrivia

• reminder
– homework 2 due tonight

Last Time

• regular grammars – aka Chomsky hierarchy type-3 grammars

– are formal grammars with severe restrictions on what can appear on the RHS

– are limited in generative capacity or power

– in Prolog DCG notation:
  • x --> y, [t]. x --> [t]. (left recursive variant) or
  • x --> [t], y. x --> [t]. (right recursive variant)

– can’t have both left and right recursive rules in the same grammar
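As a concrete instance of the right recursive variant, here is a minimal sketch of a regular grammar for sheeptalk {ba!, baa!, baaa!, ...}. The nonterminal names (sheep, a1, more) are made up for illustration, and the input is assumed to be a list of one-character atoms.

```prolog
% Right-recursive regular grammar for sheeptalk {ba!, baa!, baaa!, ...}.
% Every rule has the shape x --> [t], y.  or  x --> [t].
% Nonterminal names are hypothetical, chosen for this sketch only.
sheep --> [b], a1.      % a 'b' first
a1    --> [a], more.    % then at least one 'a'
more  --> [a], more.    % then any number of further 'a's
more  --> ['!'].        % and a final '!'

% ?- phrase(sheep, [b,a,a,'!']).   % succeeds
% ?- phrase(sheep, [b,'!']).       % fails: no 'a'
```

phrase/2 is the standard way to run a DCG against a token list.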

Last Time

• regular grammars
– examples: regular languages
  • “one or more a’s followed by one or more b’s”
  • sheeptalk {ba!, baa!, baaa!, ...}
– i.e. can be encoded by a regular grammar

• beyond regular grammars
– examples
  • a^n b^n = {ab, aabb, aaabbb, ... }
  • ww^R: where w ∈ {a,b}+, i.e. any non-empty sequence of a’s and b’s

• informal idea about the crucial difference: “needing to keep track of history”
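To make the contrast concrete, here is a sketch of a DCG for a^n b^n. It is deliberately not a regular grammar: the second rule puts a nonterminal between two terminals, which the regular grammar restrictions forbid. The nonterminal name anbn is hypothetical.

```prolog
% a^n b^n (n > 0): NOT in regular-grammar form.
% The recursive rule wraps a terminal on BOTH sides of the nonterminal,
% which is how the grammar "keeps track of history" (one b per a).
anbn --> [a], [b].
anbn --> [a], anbn, [b].

% ?- phrase(anbn, [a,a,b,b]).   % succeeds
% ?- phrase(anbn, [a,a,b]).     % fails: unmatched a
```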

Today’s Topic

• Finite State Automata
– plus more on what it means to be a regular language


Merge Point – Textbook

– Chapter 2: Regular Expressions and Automata

+ left & right recursive rules

• Regular Grammars, FSA and Regular Expressions are formally equivalent
– in terms of generative capacity or power
– all three describe the Regular Languages

Some Regular Expression Notation

• ... some notation first (more on regexps next time)
• Regular Expressions (regexp): shorthand for describing sets of strings
• Operators:
– string+
  • set of one or more occurrences of string
  • a+ = {a, aa, aaa, aaaa, aaaaa, …}
  • (abc)+ = {abc, abcabc, abcabcabc, …}
– Note: parentheses used to delimit the scope of the operator
– string*
  • set of zero or more occurrences of string
  • a* = {ε, a, aa, aaa, aaaa, …}
  • (abc)* = {ε, abc, abcabc, …}
– Note: ε = zero-length string

Some Regular Expression Notation

• ... some notation first

• Relation between * and +
– a a* = a+
– “a concatenated with a*”
– a {ε, a, aa, aaa, aaaa, …} = {a, aa, aaa, aaaa, aaaaa, …}

• Operators:
– string^n
  • exactly n occurrences of string
  • a^4 b^3 = { aaaabbb }

• Language = a set of strings
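The * and + operators, and the relation a a* = a+, can be sketched as DCG rules. This is an illustration only: the nonterminal names (abc_star, abc_plus) are hypothetical, and the rules use an empty right-hand side and several terminals at once, so they are not in the strict regular grammar form from last time.

```prolog
% (abc)* : zero or more occurrences of abc
abc_star --> [].                      % zero occurrences: the empty string
abc_star --> [a,b,c], abc_star.       % one more occurrence

% (abc)+ : one or more occurrences, written as abc followed by (abc)*
abc_plus --> [a,b,c], abc_star.

% ?- phrase(abc_star, []).              % succeeds
% ?- phrase(abc_plus, []).              % fails
% ?- phrase(abc_plus, [a,b,c,a,b,c]).   % succeeds
```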

Regular Expressions

• regular expressions
– formally equivalent to regular grammars and finite state automata
  • How to show this?
  • Proof by construction…

• beyond regular expressions
– examples
  • {a^n b^n | n>0} is not regular
  • {ww^R | w ∈ {a,b}+ } is not regular, e.g. (abc)^R = cba
– How to show this?
– Proof by Pumping Lemma


Regular Expressions

• Example:
– Language: L = {a+b+}
  “one or more a’s followed by one or more b’s”

• regular language
– described by a regular expression

• Note:
– infinite set of strings belonging to language L
  » e.g. abbb, aaaab, aabb, *abab, *ε

• Notation:
– ε is the empty string (or string with zero length)
– * means string is not in the language

• regular grammar
  s --> [a],b.
  b --> [a],b.
  b --> [b],c.
  b --> [b].
  c --> [b],c.
  c --> [b].
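The regular grammar for L = {a+b+} can be loaded and queried directly. The sketch below repeats the six rules from the slide and adds a small wrapper, in_language/1, which is a hypothetical helper name rather than part of the lecture. Strings are assumed to be given as lists of atoms.

```prolog
% Regular grammar for a+b+ (rules as on the slide).
s --> [a], b.
b --> [a], b.
b --> [b], c.
b --> [b].
c --> [b], c.
c --> [b].

% in_language(+Tokens): hypothetical wrapper around phrase/2.
in_language(Tokens) :- phrase(s, Tokens).

% ?- in_language([a,b,b,b]).   % succeeds  (abbb)
% ?- in_language([a,a,b,b]).   % succeeds  (aabb)
% ?- in_language([a,b,a,b]).   % fails     (*abab)
% ?- in_language([]).          % fails     (*ε)
```

Note that the nonterminal b and the terminal [b] are distinct: square brackets mark terminals in DCG notation.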

Finite State Automata (FSA)

[diagram: states s, x, y; transitions s –a→ x, x –a→ x, x –b→ y, y –b→ y]

L = {a+b+}

L = {aa*bb*}

• deterministic FSA (DFSA): no ambiguity about where to go at any given state

• non-deterministic FSA (NDFSA): no restriction on ambiguity (surprisingly, no increase in power)

Finite State Automata (FSA)

• more formally
– (Q, s, f, Σ, δ)
  1. set of states (Q): {s,x,y}, must be a finite set
  2. start state (s): s
  3. end state(s) (f): y
  4. alphabet (Σ): {a, b}
  5. transition function δ:
     signature: character × state → state
     1. δ(a,s)=x
     2. δ(a,x)=x
     3. δ(b,x)=y
     4. δ(b,y)=y
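One possible way to write this five-part definition down as Prolog facts is sketched below. The predicate names (fsa_start, fsa_final, delta, accepts, run) are assumptions made for this sketch, not the encoding used in the course. The argument order of delta/3 follows the signature above: character, state, next state.

```prolog
% The FSA for a+b+ as facts (predicate names are hypothetical).
fsa_start(s).
fsa_final(y).

% delta(Char, State, NextState): the transition function δ.
delta(a, s, x).
delta(a, x, x).
delta(b, x, y).
delta(b, y, y).

% accepts(+Input): Input is a list of characters taken from the alphabet {a,b}.
accepts(Input) :- fsa_start(S), run(Input, S).

% run(+Input, +State): consume the input one character at a time.
run([], State) :- fsa_final(State).          % input used up: must be in an end state
run([C|Cs], State) :-
    delta(C, State, Next),                   % follow one transition
    run(Cs, Next).

% ?- accepts([a,a,b]).   % succeeds
% ?- accepts([a]).       % fails: x is not an end state
% ?- accepts([b,a]).     % fails: no transition δ(b,s)
```

Q and Σ are left implicit here: they are just the states and characters that appear in the delta/3 facts. Because Prolog backtracks over alternative delta/3 facts, the same run/2 would also work if the facts described a non-deterministic machine.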

Finite State Automata (FSA)

• practical applications
– can be encoded and run efficiently on a computer
– widely used
  • encode regular expressions
  • compress large dictionaries
  • morphological analyzers
    – different word forms, e.g. want, wanted, unwanted (suffixation/prefixation)
    – see chapter 3 of textbook
  • speech recognizers
    – Markov models = FSA + probabilities
  • and many more …

Finite State Automata (FSA)

– T9 text entry (tegic.com)
  • built in to your cellphone
  • predictive text entry for mobile messaging/data entry
  • reduces the number of keystrokes for inputting words on a telephone keypad (8 keys)
  • how: 3 vs. 6 keystrokes
  • michael: 7 vs. 15 keystrokes
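The keystroke counts can be checked with a short sketch. It assumes the comparison is between T9 (one key press per letter) and multi-tap entry (press a key repeatedly to reach a letter), and it assumes the standard letter layout on keys 2 to 9. The predicate names are hypothetical, and the code relies on SWI-Prolog's library(lists) and library(apply).

```prolog
:- use_module(library(lists)).   % nth1/3, sum_list/2
:- use_module(library(apply)).   % maplist/3

% key(Digit, Letters): assumed standard layout of the 8 letter keys.
key(2, [a,b,c]).   key(3, [d,e,f]).   key(4, [g,h,i]).   key(5, [j,k,l]).
key(6, [m,n,o]).   key(7, [p,q,r,s]). key(8, [t,u,v]).   key(9, [w,x,y,z]).

% taps(+Letter, -N): multi-tap needs N presses, N = position of Letter on its key.
taps(Letter, N) :- key(_, Letters), nth1(N, Letters, Letter).

% multitap(+Word, -Total): total presses when each letter is tapped out.
multitap(Word, Total) :-
    atom_chars(Word, Chars),
    maplist(taps, Chars, Ns),
    sum_list(Ns, Total).

% t9(+Word, -Total): one press per letter (ignoring any disambiguation key).
t9(Word, Total) :- atom_length(Word, Total).

% ?- t9(how, T), multitap(how, M).           % T = 3, M = 6
% ?- t9(michael, T), multitap(michael, M).   % T = 7, M = 15
```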

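RegExp → FSA

• From Regular Expression to FSA
• Operators
– a: a single symbol a
– a^n: n occurrences of a

[diagrams: FSA for a, with one transition labeled a; FSA for a^3, with three consecutive transitions labeled a]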

RegExp → FSA

• Operators
– a*: zero or more occurrences of a
– a+: one or more occurrences of a (a+ = aa*)

[diagrams: FSA for a*, with one transition labeled a; FSA for a+, with two transitions labeled a]
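The machines for a^3, a* and a+ can be sketched as Prolog facts. The state names q0 to q3 and the machine labels (a3, star, plus) are assumptions, since the original diagrams are not recoverable; the constructions themselves are the usual ones (a chain of transitions for a^n, a loop for a*, one transition followed by a loop for a+).

```prolog
% start(Machine, State), final(Machine, State): hypothetical names.
start(a3, q0).
start(star, q0).
start(plus, q0).

final(a3, q3).
final(star, q0).     % a*: the start state is also an end state
final(plus, q1).

% arc(Machine, Char, From, To)
arc(a3, a, q0, q1).
arc(a3, a, q1, q2).
arc(a3, a, q2, q3).
arc(star, a, q0, q0).    % a*: a single looping transition
arc(plus, a, q0, q1).    % a+ = a a*: one obligatory a ...
arc(plus, a, q1, q1).    % ... then the a* loop

% accepts(+Machine, +Input)
accepts(M, Input) :- start(M, S), run(M, Input, S).

run(M, [], S) :- final(M, S).
run(M, [C|Cs], S) :- arc(M, C, S, Next), run(M, Cs, Next).

% ?- accepts(a3, [a,a,a]).   % succeeds
% ?- accepts(a3, [a,a]).     % fails
% ?- accepts(star, []).      % succeeds
% ?- accepts(plus, []).      % fails
% ?- accepts(plus, [a,a]).   % succeeds
```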

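Regular Grammar → FSA

• examples
– s --> [a], t.
  [diagram: state s, transition a, state t]
– x --> [a], x.
  [diagram: state x with a transition a back to x]
– x --> [a].
  [diagram: state x, transition a, final state y]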
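The rule-to-transition correspondence can be sketched for right recursive rules as follows. Grammar rules are stored as facts, and each one is read as a transition: x --> [t], y. becomes a transition on t from x to y, and x --> [t]. becomes a transition on t from x to a final state (called y here, as on the slide). The predicate names (rule, arc, final, accepts, run) are hypothetical.

```prolog
% rule(X, T, Y) stands for  X --> [T], Y.
% rule(X, T)    stands for  X --> [T].
rule(x, a, x).        % x --> [a], x.
rule(x, a).           % x --> [a].

final(y).             % the single final state, named y as on the slide

% arc(From, Char, To): transitions read off the grammar rules.
arc(From, T, To) :- rule(From, T, To).
arc(From, T, y)  :- rule(From, T).

% accepts(+StartSymbol, +Input): run the resulting (non-deterministic) FSA.
accepts(Start, Input) :- run(Input, Start).

run([], S) :- final(S).
run([T|Ts], S) :- arc(S, T, Next), run(Ts, Next).

% ?- accepts(x, [a,a,a]).   % succeeds
% ?- accepts(x, []).        % fails
```

The resulting machine is non-deterministic: on a in state x it may either stay in x or move to y, and Prolog's backtracking tries both.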

Next Time

• Prolog and FSA