
Page 1: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

LING 438/538 Computational Linguistics

Sandiway Fong

Lecture 25: 11/21

Page 2: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

2

Administrivia

• Lecture schedule (from last time)
  – Tuesday 21st November
    • Homework #6: Context-free Grammars and Parsing
    • due Tuesday 28th
  – Thursday 23rd November
    • Turkey Day
  – Tuesday 28th November
  – Thursday 30th November
    • Homework #7: Machine Translation
    • due December 7th
    • 538 Presentations
  – Tuesday 5th December
    • Homework #7: Machine Translation
    • 538 Presentations

Page 3: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

3

Administrivia

• 538 Presentations: assignments

First name   Last name    CHAPTER
Ahmed        Abbasi       18/19/17
Anastasia    Gorbunova    20
Andrew       Lebovitz     ?
Andrew       Glines       17
Bojan        Durickovic   14
Emad         Nawfal       15
Guoqiang     Shan         12
Jin          Wang         ?
Jiyoung      Kim          11
Jon          Peoble       13
Kara         Johnson      4
Lindsay      Butler       16
Mans         Hulden       res
Nadia        Hamrouni     19
Shannon      Bischoff     res
Tianjun      Fu           18

Page 4: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

4

Last Time

• Chapter 10: Parsing with Context-Free Grammars
• Top-down Parsing
  – Prolog's DCG rule system
  – Left recursion
  – Left-corner idea
• Bottom-up Parsing
  – Dotted rules
  – LR parsing: shift and reduce operations

Page 5: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

5

Bottom-Up Parsing

• LR(0) parsing
  – an example of bottom-up tabular parsing
  – similar to the top-down Earley algorithm described in the textbook, in that both methods use the idea of dotted rules
  – LR is more efficient
    • it computes the dotted rules offline (during parser/grammar construction)
    • Earley computes the dotted rules at parse time

• LR actions
  – Shift: read an input word
    • i.e. advance the current input word pointer to the next word
  – Reduce: complete a nonterminal
    • i.e. complete parsing a grammar rule
  – Accept: complete the parse
    • i.e. the start symbol (e.g. S) derives the terminal string
  (a worked shift-reduce trace follows below)
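For concreteness, a shift-reduce trace of "the man runs" under the toy grammar used later in these slides (np --> d n, vp --> v, s --> np vp, ss --> s $) looks like this:

  Stack (items)                      Input            Action
  []                                 the man runs $   shift d
  [d(the)]                           man runs $       shift n
  [n(man), d(the)]                   runs $           reduce np --> d n
  [np(d(the),n(man))]                runs $           shift v
  [v(runs), np(...)]                 $                reduce vp --> v
  [vp(v(runs)), np(...)]             $                reduce s --> np vp
  [s(np(...), vp(...))]              $                accept (shift $)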

Page 6: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

6

Tabular Parsing

• Dotted Rule Notation
  – the "dot" is used to indicate the progress of a parse through a phrase structure rule
  – examples
    • vp --> v . np   means we've seen v and predict np
    • np --> . d np   means we're predicting a d (followed by np)
    • vp --> vp pp .  means we've completed a vp

• state
  – a set of dotted rules encodes the state of the parse
  – kernel
    • vp --> v . np
    • vp --> v .
  – completion (of predicted np)
    • np --> . d n
    • np --> . n
    • np --> . np cp

Page 7: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

7

Tabular Parsing

• compute possible states by advancing the dot
  – example (assume d is next in the input)
    • vp --> v . np
    • vp --> v .        (eliminated)
    • np --> d . n
    • np --> . n        (eliminated)
    • np --> . np cp

Page 8: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

8

Tabular Parsing

• Dotted rules
  – example: State 0
    • s -> . np vp
    • np -> . d np
    • np -> . n
    • np -> . np pp
  – possible actions
    • shift d and go to a new state
    • shift n and go to a new state

• Creating new states

  State 0:  S -> . NP VP
            NP -> . D N
            NP -> . N
            NP -> . NP PP

  State 0 -- shift d --> State 1:  NP -> D . N
  State 0 -- shift n --> State 2:  NP -> N .

Page 9: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

9

Tabular Parsing

• State 1: Shift N, goto State 3

  State 0:  S -> . NP VP
            NP -> . D N
            NP -> . N
            NP -> . NP PP

  State 1:  NP -> D . N
  State 2:  NP -> N .
  State 3:  NP -> D N .    (new state, reached from State 1 by shift n)

Page 10: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

10

Tabular Parsing

• Shift
  – take the input word, and
  – place it on the stack

  Stack (state 3): [N man] [D a]
  Input: [V hit] …

  (machine so far: State 0 -- shift d --> State 1 -- shift n --> State 3;
   State 0 -- shift n --> State 2)

Page 11: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

11

Tabular Parsing

• State 2: Reduce action NP -> N .

  (machine so far: State 0 = {S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP};
   State 1 = {NP -> D . N};  State 2 = {NP -> N .};  State 3 = {NP -> D N .})

Page 12: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

12

Tabular Parsing

• Reduce NP -> N .
  – pop [N milk] off the stack, and
  – replace it with [NP [N milk]] on the stack

  Stack (state 2): [N milk]  becomes  [NP [N milk]]
  Input (unchanged): [V is] …

Page 13: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

13

Tabular Parsing

• State 3: Reduce NP -> D N .

  (machine so far: State 0 = {S -> . NP VP, NP -> . D N, NP -> . N, NP -> . NP PP};
   State 1 = {NP -> D . N};  State 2 = {NP -> N .};  State 3 = {NP -> D N .})

Page 14: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

14

Tabular Parsing

• Reduce NP -> D N .
  – pop [N man] and [D a] off the stack
  – replace them with [NP [D a] [N man]]

  Stack (state 3): [N man] [D a]  becomes  [NP [D a] [N man]]
  Input (unchanged): [V hit] …

Page 15: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

15

Tabular Parsing

• State 0: Transition NP (advance the dot over NP)

  State 0:  S -> . NP VP
            NP -> . D N
            NP -> . N
            NP -> . NP PP

  State 0 -- NP --> State 4:
            S -> NP . VP
            NP -> NP . PP
            VP -> . V NP
            VP -> . V
            VP -> . VP PP
            PP -> . P NP

Page 16: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

16

Tabular Parsing

• for both states 2 and 3
  – NP -> N .    (reduce NP -> N)
  – NP -> D N .  (reduce NP -> D N)

• after the Reduce NP operation
  – goto state 4

• notes:
  – states are unique
  – the grammar is finite
  – the procedure generating states must terminate, since the number of possible dotted rules is finite

Page 17: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

17

Tabular Parsing

  State   Action                Goto
  0       Shift D               1
          Shift N               2
  1       Shift N               3
  2       Reduce NP -> N        4
  3       Reduce NP -> D N      4
  4       …                     …

Page 18: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

18

Tabular Parsing

• Observations
  – the table is sparse
    • example: State 0, Input: [V …]
    • the parse fails immediately
  – in a given state, the input may be irrelevant
    • example: State 2 (there is no shift operation)
  – there may be action conflicts
    • example: State 0: shift D, shift N
  – more interesting cases
    • shift-reduce and reduce-reduce conflicts

Page 19: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

19

Tabular Parsing

• finishing up
  – an extra initial rule is usually added to the grammar:
    • SS --> S $
    • SS = start symbol
    • $ = end of sentence marker
  – input:
    • milk is good for you $
  – accept action
    • discard $ from the input
    • return the element at the top of the stack as the parse tree

Page 20: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

20

LR Parsing in Prolog

• Recap
  – finite state machine
    • each state represents a set of dotted rules, for example
      » S --> . NP VP
      » NP --> . D N
      » NP --> . N
      » NP --> . NP PP
    • we transition, i.e. move, from state to state by advancing the "dot" over terminal and nonterminal symbols

Page 21: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

21

LR Parsing in Prolog

• Plan:
  – formally describe an LR finite state machine construction process
  – define the parse procedure
    • parse(Sentence,Tree)
    in terms of the LR finite state machine
  – run
    • John saw the man with a telescope
    • ?- parse([john,saw,the,man,with,a,telescope],T).
    which produces two parses (PP-attachment ambiguity)

Page 22: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

22

Grammar

• assume grammar rules and lexicon (a convenient format for the LR(0) generator):
  – rule(s,[np,vp]).
  – rule(np,[d,n]).
  – rule(np,[n]).
  – rule(np,[np,pp]).
  – rule(vp,[v,np]).
  – rule(vp,[v]).
  – rule(vp,[vp,pp]).
  – rule(pp,[p,np]).

  – lexicon(the,d).   lexicon(a,d).
  – lexicon(man,n).   lexicon(john,n).   lexicon(telescope,n).
  – lexicon(saw,v).   lexicon(runs,v).
  – lexicon(with,p).

Page 23: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

23

Grammar

• extra definitions
  • :- dynamic rule/2.
  • start(ss).
  • rule(ss,[s,$]).
  • nonT(ss). nonT(s). nonT(np). nonT(vp). nonT(pp).
  • term(n). term(v). term(p). term(d). term($).

• notes:
  – $ = end of sentence marker
  – Prolog programming trick
    • declaring rule/2 as dynamic allows us to use the builtin
      clause(rule(LHS,RHS),true,Ref)
      to keep a pointer (Ref) to a particular rule
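For illustration, clause/3 can be queried directly at the top level (assuming SWI-Prolog and that the grammar has been consulted); the third argument comes back as an opaque clause reference that can later be handed to ruleElements/3:

  ?- clause(rule(np,RHS), true, Ref).
  RHS = [d,n],
  Ref = <clause>(0x...)      % opaque database reference; exact printing varies by system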

Page 24: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

24

Grammar Rule Predicates

• define
  – %%% Assume grammar rules are stored as database facts
  – %%% rule(LHS,RHS)

  – ruleLHS(NonT,Ref) :- clause(rule(NonT,_),true,Ref).
  – ruleRHS(RHS,Ref) :- clause(rule(_,RHS),true,Ref).

  – ruleElements(LHS,RHS,Ref) :-   % assume Ref instantiated
  –     clause(rule(LHS,RHS),true,Ref).

• note
  – Ref (when instantiated) is a pointer to an instance of rule(LHS,RHS).

Page 25: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

25

A Counter in Prolog

• define
  – stateCounter(N)
  to hold the current state number (N = 0,1,2,3,…)

• define predicates
  – resetStateCounter :-
  –     retractall(stateCounter(_)),
  –     assert(stateCounter(0)).

  – incStateCounter :-
  –     retract(stateCounter(X)),
  –     Y is X + 1,
  –     assert(stateCounter(Y)).

Prolog builtins used:
  retract/1 - removes a matching item from the database
  retractall/1 - removes all matching items from the database
  assert/1 - adds an item to the database
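A quick sanity check of the counter (a sketch; the dynamic declaration is an assumption, since the slide does not show one):

  :- dynamic stateCounter/1.

  ?- resetStateCounter, incStateCounter, incStateCounter, stateCounter(N).
  N = 2.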

Page 26: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

26

Data Structures

• define cfsm/3
  – cfsm(L,CFSet,N)   "state configuration"
    • CFSet = list of dotted rules for state N
    • L = |CFSet| (used for quicker lookup)

• define cf/2
  – cf(Ref,I)   "dotted rule configuration"
    • Ref points to a rule(LHS,RHS)
    • I (= 0,1,2,…) is the index of the "dot" in RHS
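For example (the reference name RefNP is hypothetical, used only for illustration):

  % if RefNP is the database reference for rule(np,[d,n]), then
  %   cf(RefNP,0)  stands for  np --> . d n
  %   cf(RefNP,1)  stands for  np --> d . n
  %   cf(RefNP,2)  stands for  np --> d n .
  % and a one-rule state could be stored as cfsm(1,[cf(RefNP,2)],State).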

Page 27: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

27

Build FSA

• initially
  • R1 = rule(ss,[s,$]).
  • ss --> . s $   is the configuration cf(R1,0)
  • do a closure on the dotted rule, adding
    • s --> . np vp
    • np --> . d n
    • …

  State 0:  SS --> . S $
            S --> . NP VP
            NP --> . D N
            NP --> . N
            NP --> . NP PP

Page 28: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

28

Build FSM: Closure Operation

• define
  – mkStartCF(cf(Ref,0)) :- start(Start), ruleLHS(Start,Ref).

• call
  – mkStartCF(StartCF),
  – closure([StartCF],S0),

• define closure/2 recursively
  – closure(CFSet,CFSet1) :-
  –     dotNonT(CFSet,NonT),
  –     predict(NonT,CFSet,CFSet2),
  –     closure(CFSet2,CFSet1).
  – closure(CFSet,CFSet).

Page 29: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

29

Build FSM: Closure Operation

• define dotNonT/2 to pick out possible instances of Y in X --> … . Y …
  – dotNonT([cf(Ref,Pos)|_],NonT) :-
  –     dotNonT1(Ref,Pos,NonT).
  – dotNonT([_|L],NonT) :- dotNonT(L,NonT).

  – dotNonT1(Ref,Pos,NonT) :-
  –     ruleRHS(RHS,Ref), nth(Pos,RHS,NonT), nonT(NonT).

• notes
  – dotNonT/2 works just like list member/2
  – nth(N,L,X) picks out the (N+1)th element (X) in list L, i.e. it indexes from 0
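For example (assuming nth/3 is defined with 0-based indexing; in SWI-Prolog the corresponding builtin is nth0/3):

  ?- nth0(1, [d,n], X).
  X = n.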

Page 30: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

30

Build FSM: Closure Operation

• define predict/3 to add new dotted rules for NonT
  – predict(NonT,CFSet,NewCFSet) :-
  –     findall(cf(Ref,0),ruleLHS(NonT,Ref),NewCFs),
  –     merge(NewCFs,CFSet,NewCFSet,[],new).

• define merge/5 to add new dotted rules only if they are not already present in CFSet
  • merge([],L,L,Flag,Flag).
  • merge([cf(Ref,Pos)|L],CFSet,CFSet1,Flag,Flag1) :-   % already present
  •     member(cf(Ref,Pos),CFSet),
  •     merge(L,CFSet,CFSet1,Flag,Flag1).
  • merge([CF|L],CFSet,CFSet1,_,Flag) :-   % CF is new
  •     merge(L,[CF|CFSet],CFSet1,new,Flag).

• note
  – the variable Flag ([]/new) is used to make sure something has actually been added to CFSet

Page 31: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

31

Build FSM: Closure Operation

• call
  – mkStartCF(StartCF),
  – closure1([StartCF],S0),
  – resetStateCounter,
  – length(S0,L),
  – cfsmEntry(S0,L),

• define storage predicate cfsmEntry/2
  – cfsmEntry(CFSet,L) :-
  –     stateCounter(State),
  –     incStateCounter,
  –     asserta(cfsm(L,CFSet,State)).

Recall: cfsm(L,CFSet,N) is a "state configuration"; CFSet = list of dotted rules for state N; L = |CFSet| (used for quicker lookup)

Page 32: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

32

Build FSM: Build new state

• define buildState/2
  – buildState(CFSet,S1) :-
  –     transition(CFSet,Symbol,CFSet1),
  –     length(CFSet1,L),
  –     addCFSet(CFSet1,L,S2),
  –     assert(goto(S1,Symbol,S2)),
  –     fail.
  – buildState(_,_).

• notes
  – transition/3 produces a new CFSet by advancing the dot over Symbol
  – addCFSet/3 will add a new state represented by CFSet1 (if it doesn't already exist)
  – state transitions are represented by goto(S1,Symbol,S2)

Page 33: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

33

Build FSM: Build new state

• define transition/3
  – transition(CFSet,Symbol,CFSet1) :-
  –     pickSymbol(CFSet,Symbol),
  –     advanceDot(CFSet,Symbol,CFSet2),
  –     closure(CFSet2,CFSet1).

• note: pickSymbol/2 picks a symbol next to a dot in a dotted rule in CFSet

• define advanceDot/3
  – advanceDot([cf(Ref,Pos)|L],Symbol,[cf(Ref,Pos1)|CFSet]) :-
  –     ruleRHS(RHS,Ref), nth(Pos,RHS,Symbol),
  –     !,
  –     Pos1 is Pos+1,
  –     advanceDot(L,Symbol,CFSet).
  – advanceDot([_|L],Symbol,CFSet) :- !, advanceDot(L,Symbol,CFSet).
  – advanceDot([],_,[]).

  Example: advancing the dot over NP (and taking the closure) maps
  State 0 = {S --> . NP VP, NP --> . D N, NP --> . N, NP --> . NP PP}
  to
  State 4 = {S --> NP . VP, NP --> NP . PP, VP --> . V NP, VP --> . V, VP --> . VP PP, PP --> . P NP}

Page 34: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

34

Build FSM: Build new state

• define addCFSet/3
  – addCFSet(CFSet,L,S) :-   % CFSet already established
  –     findCFSet(CFSet,S,L),
  –     !.
  – addCFSet(CFSet,L,S) :-   % CFSet is a new state
  –     cfsmEntry(CFSet,L,S).   % add it

• note:
  – findCFSet/3 will succeed only if CFSet exists in the current cfsm/3 database
  – cfsmEntry/3 defined earlier will increment the state number (S) and perform:
    • ?- asserta(cfsm(L,CFSet,S)).

Page 35: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

35

Build Actions

• two main actions
  – Shift
    • move a word from the input onto the stack
    • example: NP --> . D N
  – Reduce
    • build a new constituent
    • example: NP --> D N .

Page 36: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

36

Build Actions

• Machine components

  Input: [V hit] …
  Structure Stack (items): [N man] [D a]
  Control Stack (states): 3 2 0

• A machine operation step (action) has the signature:
  CS x Input x SS  =>  CS' x Input' x SS'
  where
  • CS = control stack
  • SS = (constituent) structure stack

Page 37: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

37

Build Actions: shift action

• example
  – shift(n)

• code
  – action(S, CS, Input, SS, CS2, Input2, SS2) :-
  –     Input = [Item|Input2],
  –     category(Item,n),
  –     goto(S,n,S2),
  –     CS2 = [S2|CS],
  –     SS2 = [Item|SS].

• notes (changes):
  – Input2 is Input minus Item
  – SS2 is SS plus Item
  – CS2 is CS plus S2, from goto(S,n,S2)

Page 38: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

38

Build Actions: shift action

• calling pattern for action/7
  – given values for:
    • the current state (S)
    • the control and structure stacks (CS, SS)
  – compute new values of:
    • the state (S2)
    • the control and structure stacks (CS2, SS2)

  action( S, CS, Input, SS,   CS2, Input2, SS2 )
          |----- given ----|  |---- computed ---|

Page 39: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

39

Build Actions: reduce action

• example
  – reduce NP --> D N .

• code
  – action(S, CS, Input, SS, CS2, Input2, SS2) :-
  –     Input = Input2,
  –     SS = [N,D|SS3],
  –     SS2 = [np(D,N)|SS3],
  –     CS = [_,_,S1|CS3],
  –     CS2 = [S2,S1|CS3],
  –     goto(S1,np,S2).

• notes
  – the input is unchanged
  – pop 2 items off both stacks
  – the goto is not based on the current state, but on the state S1 exposed after popping

Page 40: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

40

Build Actions

• define the shift/reduce action generation procedure
  – buildActions :-
  –     cfsm(_,CFSet,State),
  –     actions(CFSet,Instructions),
  –     genActions(State,Instructions),
  –     fail.
  – buildActions.

• define actions/2
  – actions([],[]).
  – actions([CF|CFs],L) :-
  –     reduceAction(CF,L1),
  –     shiftAction(CF,L2),
  –     append(L1,L2,L3),
  –     actions(CFs,L4),
  –     union(L3,L4,L).   % should be no duplicate actions

Page 41: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

41

Build Actions

• define shift and reduce actions
  – reduceAction(cf(Ref,Pos),[reduce(Ref)]) :-
  –     ruleRHS(RHS,Ref),
  –     length(RHS,Pos),   % finds configurations A --> … .  (dot at the end)
  –     !.
  – reduceAction(_,[]).

  – % assume that Symbol is in Vt
  – shiftAction(cf(Ref,Pos),[shift(Symbol)]) :-
  –     ruleRHS(RHS,Ref),   % finds configurations A --> … . a …  (dot before a terminal)
  –     nth(Pos,RHS,Symbol),
  –     term(Symbol),
  –     !.
  – shiftAction(_,[]).

• builds sequences of instructions of the form
  – [shift(n), reduce(R3)] etc.

Page 42: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

42

Build Actions

• define procedure genActions/2
  – which turns instructions such as:
    • shift(n)
  – into code like:
    • action(S, CS, Input, SS, CS2, Input2, SS2) :-
      – Input = [Item|Input2],
      – category(Item,n),
      – goto(S,n,S2),
      – CS2 = [S2|CS],
      – SS2 = [Item|SS].

Page 43: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

43

Build Actions

• genActions/2 processes a list of actions for a given state S
  – genActions(_,[]).
  – genActions(S,[Action|As]) :-
  –     nl,
  –     actionClause(S,Action,Clause),
  –     write(Clause), write('.'),
  –     genActions(S,As).

Prolog builtins
  nl - writes a newline to standard output
  write/1 - writes the supplied argument to standard output

Page 44: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

44

Build Actions: shift

• generate action/7 for shift
  – % shifting a $
  – actionClause(State,shift($),action(State,_,[$],SS,accept,[],SS)) :- !.

  – % shifting anything other than a $
  – actionClause(State,shift(Symbol),
  –     (action(State,CS,[I|Is],SS,[S|CS],Is,[I|SS]) :-
  –         functor(I,Symbol,_),
  –         goto(State,Symbol,S))).

• note:
  • see words/2 later
  • assume each input item is of the form c(word), e.g. n(john)
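For the toy machine, the $ case above produces a unit clause for the state containing ss --> s . $ (state 1 in machine0.pl, judging from goto(1,$,13) on a later slide); roughly:

  % accept: the $ is discarded and the structure stack is returned unchanged
  action(1, _, [$], SS, accept, [], SS).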

Page 45: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

45

Build Actions: reduce

• generate action/7 for reduce
  – actionClause(State,reduce(Ref),
  –     (action(State,CS,I,SS,[S2,Last|CS1],I,[Item|SS1]) :-
  –         goto(Last,NT,S2))) :-
  –     ruleElements(NT,RHS,Ref),
  –     popStk(RHS,CS,Last,CS1),
  –     popAndLink(RHS,SS,SS1,L),
  –     Item =.. [NT|L].

• note
  – popStk/4 and popAndLink/4 both generate code to pop the control and structure stacks
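The =.. ("univ") builtin is what assembles the new constituent from the nonterminal name and the popped daughters; for example:

  ?- Item =.. [np, d(the), n(man)].
  Item = np(d(the), n(man)).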

Page 46: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

46

LR Machine: goto table

• example of the LR machine constructed

  – % State 8: pp->.p np   vp->vp .pp   s->np vp.
  – goto(4,vp,8).

  – % State 9: vp->vp pp.
  – goto(8,pp,9).
  – goto(8,p,6).
  – goto(7,d,2).
  – goto(7,n,3).

  – % State 10: pp->.p np   np->np .pp   vp->v np.
  – goto(7,np,10).
  – goto(10,pp,5).
  – goto(10,p,6).
  – goto(6,d,2).
  – goto(6,n,3).

  – % State 11: pp->.p np   np->np .pp   pp->p np.
  – goto(6,np,11).
  – goto(11,pp,5).
  – goto(11,p,6).

  – % State 12: np->d n.
  – goto(2,n,12).

  – % State 13: ss->s $.
  – goto(1,$,13).

Page 47: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

47

LR Machine: action table

• example of the action table constructed
  – action(State,CS,Input,SS,CS',Input',SS')

  – % 7
  – action(7,_14,[_20|_18],_16,[_22|_14],_18,[_20|_16]) :-
        functor(_20,n,_32), goto(7,n,_22).
  – action(7,_58,[_64|_62],_60,[_66|_58],_62,[_64|_60]) :-
        functor(_64,d,_76), goto(7,d,_66).
  – action(7,[_38,_10|_11],_03,[_44|_13],[_08,_10|_11],_03,[vp(_44)|_13]) :-
        goto(_10,vp,_08).
  – % 6
  – action(6,_78,[_84|_82],_80,[_86|_78],_82,[_84|_80]) :-
        functor(_84,n,_96), goto(6,n,_86).
  – action(6,_22,[_28|_26],_24,[_30|_22],_26,[_28|_24]) :-
        functor(_28,d,_40), goto(6,d,_30).
  – % 5
  – action(5,[_68,_70,_38|_39],_31,[_78,_82|_41],[_36,_38|_39],_31,[np(_82,_78)|_41]) :-
        goto(_38,np,_36).

Page 48: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

48

Parser

• define parse/2 as follows
  – parse(Words,Parse) :-
  –     words(Words,L),
  –     machine([0],L,[],Parse).

  – machine(CS,Input,SS,Parse) :-
  –     (  CS = accept
  –     -> SS = [Parse]
  –     ;  CS = [State|_],
  –        action(State,CS,Input,SS,CS2,Input2,SS2),
  –        machine(CS2,Input2,SS2,Parse)
  –     ).

  – words([],[$]).
  – words([W|Ws],[I|Is]) :- lexicon(W,C), I =.. [C,W], words(Ws,Is).
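For illustration, words/2 converts the input sentence into category-tagged items terminated by $ (using the lexicon from grammar0.pl):

  ?- words([john,saw,the,man],L).
  L = [n(john), v(saw), d(the), n(man), $].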

Page 49: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

49

Administrivia

• Prolog code available on the course webpage

• files
  – grammar0.pl - example grammar
  – lr0.pl - LR(0) parser/generator
  – machine0.pl - generated tables

Page 50: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

50

LR Parsing in Prolog

How to use
• steps
  1. ?- [grammar0].        (consult toy grammar)
  2. ?- [lr0].             (consult LR code)
  3. ?- build.             (constructs goto table)
  4. ?- buildActions.      (constructs shift/reduce actions)

How to use (saving output to a file)
• steps
  1. ?- [grammar0].            (consult toy grammar)
  2. ?- [lr0].                 (consult LR code)
  3. ?- tell('filename.pl').   (redirect screen output to filename.pl)
  4. ?- build.                 (constructs goto table)
  5. ?- buildActions.          (constructs shift/reduce actions)
  6. ?- told.                  (close filename.pl)

Page 51: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

51

Parser

• Example:
  – ?- parse([john,saw,the,man,with,a,telescope],X).
  – X = s(np(n(john)),vp(v(saw),np(np(d(the),n(man)),pp(p(with),np(d(a),n(telescope)))))) ;
  – X = s(np(n(john)),vp(vp(v(saw),np(d(the),n(man))),pp(p(with),np(d(a),n(telescope))))) ;
  – no

Page 52: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

52

LR(0) Goto Table

(rows = grammar symbols; entries = from-state → to-state, i.e. goto(From,Symbol,To); states 0–13)

  D    0→2    6→2    7→2
  N    0→3    2→12   6→3    7→3
  NP   0→4    6→11   7→10
  V    4→7
  VP   4→8
  P    4→6    8→6    10→6   11→6
  PP   4→5    8→9    10→5   11→5
  S    0→1
  $    1→13

Page 53: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

53

LR(0) Action Table

  State   Actions
  0       SD, SN
  1       A
  2       SN
  3       RNP
  4       SV, SP
  5       RNP
  6       SD, SN
  7       SD, SN, RVP
  8       SP, RS
  9       RVP
  10      SP, RVP
  11      SP, RPP
  12      RNP
  13

S = shift, R = reduce, A = accept

Empty cells = error states

Multiple actions = machine conflict

Prolog’s computation rule: backtrack

Page 54: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

54

LR(0) Conflict Statistics

• Toy grammar
  – 14 states
  – 6 states with 2 competing actions
    • states 11, 10, 8: shift-reduce conflict
  – 1 state with 3 competing actions
    • state 7: shift(d), shift(n), reduce(vp->v)

  [bar chart: No. of states (0–7) with no conflicts, 1 conflict, and 2 conflicts]

Page 55: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

55

Lookahead

• LR(1)
  – a shift/reduce tabular parser using one (terminal) lookahead symbol
  – decide on the action (shift, reduce) to take based on state x input symbol
  – example
    • select a reduce operation by consulting the current input symbol
  – cf. LR(0)
    • select an action based on just the current state

Page 56: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

56

Lookahead

• potential advantage
  – the input symbol may partition the action space, resulting in fewer conflicts
  – provided the current input symbol can help to choose between possible actions

• potential disadvantages
  – larger finite state machine
    • more possible dotted rule/lookahead combinations than just dotted rule combinations
  – might not help much
    • depends on the grammar
  – more complex (off-line) computation
    • building the LR machine gets more complicated

Page 57: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

57

Lookahead

• formally
  – X --> … . Y … , L
    • L = lookahead set
    • L = set of possible terminals that can follow X

• example
  – State 0
    • ss --> . s $      [[]]
    • s --> . np vp     [$]
    • np --> . d n      [p,v]
    • np --> . n        [p,v]
    • np --> . np pp    [p,v]
  – e.g. the lookahead for the np rules is [p,v] because these are the terminals that can immediately follow an np here: v from s --> np vp (vp can begin with v), and p from np --> np pp (pp begins with p)

Page 58: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

58

Lookahead

• central idea for propagating lookahead in the state machine
  – if a dotted rule is complete, the lookahead tells the parser what the next terminal symbol should be
  – example
    • NP --> D N . , L
    • reduce by the NP rule provided the current input symbol is in set L (see the sketch below)
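A minimal sketch of such a lookahead-guarded reduce clause (this is not the code generated by lr1.pl; it is the LR(0) reduce clause for NP --> D N . from the earlier slide, extended with a lookahead test, and the lookahead set [p,v,$] is illustrative only):

  action(S, CS, Input, SS, CS2, Input, SS2) :-
      Input = [Next|_],
      functor(Next, Cat, _),      % Cat = category of the next input item (or $)
      member(Cat, [p, v, $]),     % reduce only if the lookahead allows it
      SS = [N, D|SS3],
      SS2 = [np(D,N)|SS3],
      CS = [_, _, S1|CS3],
      CS2 = [S2, S1|CS3],
      goto(S1, np, S2).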

Page 59: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

59

Lookahead

• Prolog code to compute the LR(1) machine is given on the course homepage
  – see file
    • lr1.pl
  – procedure
    • after loading the grammar and lr1.pl
    • ?- computeF.        (extra step: compute first set)
    • ?- build.           (build goto table)
    • ?- buildActions.    (build shift/reduce actions)

Page 60: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

60

LR(1) Machine

• toy grammar revisited
  – number of states: 20 (cf. 14 for LR(0))
  – almost deterministic
    • almost LR(1)
    • only 3 State/Lookahead combinations have a conflict

  [bar chart: number of State/Lookahead combinations (axis 0–35) with no conflicts vs. 1 conflict]

Page 61: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

61

LR(0) vs. LR(1)

• the LR(1) parser-generator disambiguates using one symbol of lookahead
  – this allows it to improve the determinacy of the parser
  – theoretical result:
    • LR(1) is optimal (for one symbol of lookahead)

  [bar charts repeated from the previous two slides: LR(1) conflicts per State/Lookahead combination vs. LR(0) conflicts per state]

Page 62: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

62

LR(1) Action Table

  [action table for states 0–19; rows = lookahead terminals n, d, v, p, $; see machine1.pl
   for the full table. The three shift-reduce conflict cells all occur under lookahead p.]

S = shift, R = reduce, A = accept

Empty cells = error states

Page 63: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

63

LR Parsing

• in fact
  – LR parsers are generally acknowledged to be the fastest parsers
    • especially when combined with the chart technique
    • (to be described today)
  – reference
    • (Tomita, 1985)
  – textbook
    • Earley's algorithm
    • uses a chart
    • but builds dotted-rule configurations dynamically at parse time, instead of ahead of time (so it is slower than LR)

Page 64: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

64

Homework 6

• don't panic
  – all Prolog code is supplied
  – you just have to run it

• goal
  – test understanding of the ideas behind the algorithms discussed in class

Page 65: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

65

LR Code

• grammar:
  – grammar0.pl
• machine generators:
  – lr0.pl
  – lr1.pl
• generated machines:
  – LR(0): machine0.pl
  – LR(1): machine1.pl

  :- dynamic rule/2.
  nonT(ss). nonT(s). nonT(np). nonT(vp). nonT(pp).
  term(n). term(v). term(p). term(d). term($).

  start(ss).
  rule(ss,[s,$]).
  rule(s,[np,vp]).
  rule(np,[d,n]).
  rule(np,[n]).
  rule(np,[np,pp]).
  rule(vp,[v,np]).
  rule(vp,[v]).
  rule(vp,[vp,pp]).
  rule(pp,[p,np]).

  lexicon(the,d). lexicon(a,d).
  lexicon(man,n). lexicon(john,n). lexicon(telescope,n).
  lexicon(saw,v). lexicon(runs,v).
  lexicon(with,p).

Page 66: LING 438/538 Computational Linguistics Sandiway Fong Lecture 25: 11/21

66

Homework 6

• Homework 6
  – Question 1: (5pts) compare the LR(0) and LR(1) algorithms
    • run both the LR(0) and LR(1) machines on the sentence John saw the man with a telescope, looking for all answers
    • compare the number of calls to the predicate machine
    • which one makes the fewer calls?
    • by how many?

  – Question 2: (5pts) LR(1) Action Table
    • states 14 and 15 in machine1.pl are very similar
      – 14: np-->np pp./[p,$]
      – 15: np-->d n./[p,$]
    • can they be merged?
    • explain your answer
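One simple way to count the calls for Question 1 (a sketch, not part of the supplied code; the counter name machineCalls/1 is made up here) is to keep a dynamic counter and bump it as the first goal inside machine/4, or just to run the query under SWI-Prolog's profiler:

  :- dynamic machineCalls/1.
  machineCalls(0).

  % bump the counter; add a call to this at the start of the machine/4 clause body
  countMachineCall :-
      retract(machineCalls(N)),
      N1 is N + 1,
      assert(machineCalls(N1)).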