Log-Linear Models for History-Based Parsing Michael Collins, Columbia University


Page 1: Log-Linear Models for History-Based Parsing - cs.columbia.edu/~mcollins/cs4705-spring2019/slides/loglinear...

Log-Linear Models for History-Based Parsing

Michael Collins, Columbia University

Page 2:

Log-Linear Taggers: Summary

- The input sentence is w[1:n] = w1 . . . wn

- Each tag sequence t[1:n] has a conditional probability

  p(t[1:n] | w[1:n]) = ∏_{j=1}^n p(tj | w1 . . . wn, t1 . . . tj−1)   (chain rule)
                     = ∏_{j=1}^n p(tj | w1 . . . wn, tj−2, tj−1)     (independence assumptions)

- Estimate p(tj | w1 . . . wn, tj−2, tj−1) using log-linear models

- Use the Viterbi algorithm to compute

  argmax_{t[1:n]} log p(t[1:n] | w[1:n])
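The Viterbi step can be sketched in code. The function below is an illustrative trigram Viterbi decoder, not Collins's implementation; the local model `log_p` is assumed to be supplied by the caller (in practice, the trained log-linear tagger).

```python
import math

def viterbi_trigram(words, tags, log_p):
    """argmax over tag sequences of sum_j log p(t_j | words, t_{j-2}, t_{j-1}).

    log_p(words, j, u, v, t) is the (caller-supplied) local model: the log
    probability of tag t at position j given previous tags u, v, with "*"
    standing in for tags before the start of the sentence.
    """
    n = len(words)
    pi = {("*", "*"): 0.0}           # best score for each (t_{j-1}, t_j) state
    bp = []                          # back-pointers, one dict per position
    for j in range(n):
        new_pi, back = {}, {}
        for (u, v), score in pi.items():
            for t in tags:
                s = score + log_p(words, j, u, v, t)
                if s > new_pi.get((v, t), -math.inf):
                    new_pi[(v, t)] = s
                    back[(v, t)] = u
        pi = new_pi
        bp.append(back)
    (u, v), _ = max(pi.items(), key=lambda kv: kv[1])
    out = [v] if n == 1 else [u, v]
    for j in range(n - 1, 1, -1):    # follow back-pointers to recover tags
        out.insert(0, bp[j][(out[0], out[1])])
    return out
```

Because the independence assumption limits each decision to the previous two tags, the state space per position is just (t_{j-1}, t_j), which is what makes the dynamic program tractable.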

Page 3:

A General Approach:

(Conditional) History-Based Models

- We’ve shown how to define p(t[1:n] | w[1:n]) where t[1:n] is a tag sequence

- How do we define p(T | S) if T is a parse tree (or another structure)? (We use the notation S = w[1:n])

Page 4:

A General Approach:

(Conditional) History-Based Models

- Step 1: represent a tree as a sequence of decisions d1 . . . dm

  T = 〈d1, d2, . . . dm〉

  m is not necessarily the length of the sentence

- Step 2: the probability of a tree is

  p(T | S) = ∏_{i=1}^m p(di | d1 . . . di−1, S)

- Step 3: use a log-linear model to estimate p(di | d1 . . . di−1, S)

- Step 4: Search?? (answer we’ll get to later: beam or heuristic search)
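Step 2's product is easiest to compute as a sum of log probabilities over the decision sequence. A minimal sketch, where `log_p` is a stand-in for whatever conditional model is plugged in at Step 3:

```python
def sequence_log_prob(decisions, sentence, log_p):
    """log p(T | S) = sum_i log p(d_i | d_1 ... d_{i-1}, S).

    log_p(history, sentence, d) is a caller-supplied stand-in for the
    log-linear model; it scores one decision given the history so far.
    """
    total = 0.0
    for i, d in enumerate(decisions):
        total += log_p(decisions[:i], sentence, d)
    return total
```

Note that the history passed to the model at step i is the full prefix d1 . . . di−1, with no independence assumption; this generality is exactly what makes exact search hard later (Step 4).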

Page 5:

An Example Tree

(S(questioned)
  (NP(lawyer) (DT the) (NN lawyer))
  (VP(questioned)
    (Vt questioned)
    (NP(witness) (DT the) (NN witness))
    (PP(about)
      (IN about)
      (NP(revolver) (DT the) (NN revolver)))))

Page 6:

Ratnaparkhi’s Parser: Three Layers of Structure

1. Part-of-speech tags

2. Chunks

3. Remaining structure

Page 7:

Layer 1: Part-of-Speech Tags

DT/the  NN/lawyer  Vt/questioned  DT/the  NN/witness  IN/about  DT/the  NN/revolver

- Step 1: represent a tree as a sequence of decisions d1 . . . dm

  T = 〈d1, d2, . . . dm〉

- First n decisions are tagging decisions:

  〈d1 . . . dn〉 = 〈 DT, NN, Vt, DT, NN, IN, DT, NN 〉

Page 8:

Layer 2: Chunks

(NP (DT the) (NN lawyer))  Vt/questioned  (NP (DT the) (NN witness))  IN/about  (NP (DT the) (NN revolver))

Chunks are defined as any phrase where all children are part-of-speech tags.

(Other common chunks are ADJP, QP)

Page 9:

Layer 2: Chunks

Start(NP)/DT/the  Join(NP)/NN/lawyer  Other/Vt/questioned  Start(NP)/DT/the  Join(NP)/NN/witness  Other/IN/about  Start(NP)/DT/the  Join(NP)/NN/revolver

- Step 1: represent a tree as a sequence of decisions d1 . . . dm

  T = 〈d1, d2, . . . dm〉

- First n decisions are tagging decisions; the next n decisions are chunk tagging decisions:

  〈d1 . . . d2n〉 = 〈 DT, NN, Vt, DT, NN, IN, DT, NN,
                     Start(NP), Join(NP), Other, Start(NP), Join(NP),
                     Other, Start(NP), Join(NP) 〉
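The chunk-tagging decisions can be decoded back into chunks mechanically. A sketch, assuming the Start(X)/Join(X)/Other label set above; the function name and output representation are illustrative:

```python
def chunks_from_decisions(chunk_tags, pos_tags, words):
    """Group words into chunks given Start(X)/Join(X)/Other decisions.

    Returns a list of (label, [(tag, word), ...]) items; an "Other"
    decision leaves the word as a one-element item labelled by its POS tag.
    """
    out = []
    for decision, tag, word in zip(chunk_tags, pos_tags, words):
        if decision.startswith("Start("):
            out.append((decision[6:-1], [(tag, word)]))   # open a new chunk
        elif decision.startswith("Join("):
            out[-1][1].append((tag, word))                # continue the open chunk
        else:                                             # "Other": not in any chunk
            out.append((tag, [(tag, word)]))
    return out
```

Applied to the example sentence, this yields the five items of Layer 2: three NP chunks plus the bare Vt and IN tokens.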

Page 10:

Layer 3: Remaining Structure

Alternate Between Two Classes of Actions:

- Join(X) or Start(X), where X is a label (NP, S, VP etc.)
- Check=YES or Check=NO

Meaning of these actions:

- Start(X) starts a new constituent with label X (always acts on the leftmost constituent with no start or join label above it)

- Join(X) continues a constituent with label X (always acts on the leftmost constituent with no start or join label above it)

- Check=NO does nothing

- Check=YES takes the previous Join or Start action, and converts it into a completed constituent

Page 11:

(NP (DT the) (NN lawyer))   Vt/questioned   (NP (DT the) (NN witness))   IN/about   (NP (DT the) (NN revolver))

Page 12:

Start(S)   (NP (DT the) (NN lawyer))
           Vt/questioned
           (NP (DT the) (NN witness))
           IN/about
           (NP (DT the) (NN revolver))

Page 13:

Start(S)   (NP (DT the) (NN lawyer))
           Vt/questioned
           (NP (DT the) (NN witness))
           IN/about
           (NP (DT the) (NN revolver))

Check=NO

Page 14:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
           (NP (DT the) (NN witness))
           IN/about
           (NP (DT the) (NN revolver))

Page 15:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
           (NP (DT the) (NN witness))
           IN/about
           (NP (DT the) (NN revolver))

Check=NO

Page 16:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
Join(VP)   (NP (DT the) (NN witness))
           IN/about
           (NP (DT the) (NN revolver))

Page 17:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
Join(VP)   (NP (DT the) (NN witness))
           IN/about
           (NP (DT the) (NN revolver))

Check=NO

Page 18:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
Join(VP)   (NP (DT the) (NN witness))
Start(PP)  IN/about
           (NP (DT the) (NN revolver))

Page 19:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
Join(VP)   (NP (DT the) (NN witness))
Start(PP)  IN/about
           (NP (DT the) (NN revolver))

Check=NO

Page 20:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
Join(VP)   (NP (DT the) (NN witness))
Start(PP)  IN/about
Join(PP)   (NP (DT the) (NN revolver))

Page 21:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
Join(VP)   (NP (DT the) (NN witness))
           (PP (IN about) (NP (DT the) (NN revolver)))

Check=YES

Page 22:

Start(S)   (NP (DT the) (NN lawyer))
Start(VP)  Vt/questioned
Join(VP)   (NP (DT the) (NN witness))
Join(VP)   (PP (IN about) (NP (DT the) (NN revolver)))

Page 23:

Start(S)   (NP (DT the) (NN lawyer))
           (VP (Vt questioned)
               (NP (DT the) (NN witness))
               (PP (IN about) (NP (DT the) (NN revolver))))

Check=YES

Page 24:

Start(S)   (NP (DT the) (NN lawyer))
Join(S)    (VP (Vt questioned)
               (NP (DT the) (NN witness))
               (PP (IN about) (NP (DT the) (NN revolver))))

Page 25:

(S (NP (DT the) (NN lawyer))
   (VP (Vt questioned)
       (NP (DT the) (NN witness))
       (PP (IN about) (NP (DT the) (NN revolver)))))

Check=YES

Page 26:

The Final Sequence of Decisions

〈d1 . . . dm〉 = 〈 DT, NN, Vt, DT, NN, IN, DT, NN,
                  Start(NP), Join(NP), Other, Start(NP), Join(NP),
                  Other, Start(NP), Join(NP),
                  Start(S), Check=NO, Start(VP), Check=NO,
                  Join(VP), Check=NO, Start(PP), Check=NO,
                  Join(PP), Check=YES, Join(VP), Check=YES,
                  Join(S), Check=YES 〉
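The layer-3 decisions can be replayed over the chunk sequence to rebuild the tree. The sketch below is one way to implement the "leftmost constituent with no start or join label" convention; it is illustrative, not Ratnaparkhi's code, and the tree representation is a simple (label, children) tuple:

```python
def build_tree(chunks, actions):
    """Replay layer-3 Start/Join/Check decisions over a list of chunk subtrees."""
    items = [[None, c] for c in chunks]   # [pending action, subtree]
    for a in actions:
        if a == "Check=NO":
            continue                      # keep extending the current run
        if a == "Check=YES":
            # collapse the most recent Start(X) ... Join(X) run into an X node
            j = max(i for i, it in enumerate(items)
                    if it[0] is not None and it[0].startswith("Start("))
            k = max(i for i, it in enumerate(items) if it[0] is not None)
            label = items[j][0][6:-1]
            node = (label, [it[1] for it in items[j:k + 1]])
            items[j:k + 1] = [[None, node]]
        else:                             # Start(X) or Join(X): mark the
            i = min(i for i, it in enumerate(items) if it[0] is None)
            items[i][0] = a               # leftmost unmarked item
    return items[0][1]
```

Replaying the layer-3 suffix of the sequence above over the five chunks reproduces the parse on Page 25, with the PP finished first, then the VP, then the S.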

Page 27:

A General Approach:

(Conditional) History-Based Models

- Step 1: represent a tree as a sequence of decisions d1 . . . dm

  T = 〈d1, d2, . . . dm〉

  m is not necessarily the length of the sentence

- Step 2: the probability of a tree is

  p(T | S) = ∏_{i=1}^m p(di | d1 . . . di−1, S)

- Step 3: use a log-linear model to estimate p(di | d1 . . . di−1, S)

- Step 4: Search?? (answer we’ll get to later: beam or heuristic search)

Page 28:

Applying a Log-Linear Model

- Step 3: use a log-linear model to estimate p(di | d1 . . . di−1, S)

- A reminder:

  p(di | d1 . . . di−1, S) = exp( f(〈d1 . . . di−1, S〉, di) · v ) / Σ_{d∈A} exp( f(〈d1 . . . di−1, S〉, d) · v )

where:

〈d1 . . . di−1, S〉 is the history

di is the outcome

f maps a history/outcome pair to a feature vector

v is a parameter vector

A is set of possible actions
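Putting these pieces together, the conditional distribution over actions is a softmax of inner products f(〈history, S〉, d) · v. A sketch using sparse, string-keyed features; the feature representation is an assumption for illustration, not the slide's exact encoding:

```python
import math

def action_distribution(features, v, actions):
    """p(d | history, S) for each candidate action d under a log-linear model.

    `features(d)` returns the sparse feature set f(<history, S>, d) as a
    list of feature names; `v` maps feature names to weights; `actions`
    is the set A of possible actions.
    """
    scores = {d: sum(v.get(f, 0.0) for f in features(d)) for d in actions}
    z = sum(math.exp(s) for s in scores.values())   # the normalizer
    return {d: math.exp(s) / z for d, s in scores.items()}
```

Only the features of the competing actions matter: a feature that fires for every d ∈ A cancels in the normalization.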

Page 29:

Applying a Log-Linear Model

- Step 3: use a log-linear model to estimate

  p(di | d1 . . . di−1, S) = exp( f(〈d1 . . . di−1, S〉, di) · v ) / Σ_{d∈A} exp( f(〈d1 . . . di−1, S〉, d) · v )

- The big question: how do we define f?

- Ratnaparkhi’s method defines f differently depending on whether the next decision is:

  - A tagging decision (same features as before for POS tagging!)
  - A chunking decision
  - A start/join decision after chunking
  - A check=no/check=yes decision

Page 30:

Layer 3: Join or Start

- Looks at the head word, constituent (or POS) label, and start/join annotation of the n’th tree relative to the decision, where n = −2, −1

- Looks at the head word, constituent (or POS) label of the n’th tree relative to the decision, where n = 0, 1, 2

- Looks at bigram features of the above for (−1, 0) and (0, 1)

- Looks at trigram features of the above for (−2, −1, 0), (−1, 0, 1) and (0, 1, 2)

- The above features with all combinations of head words excluded

- Various punctuation features
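A sketch of what such feature templates might look like in code. The item representation (head word, label, annotation triples) and the feature-name format are illustrative assumptions, not Ratnaparkhi's exact templates:

```python
def join_start_features(items, i):
    """Feature names for a Join/Start decision at item i (illustrative).

    `items` is a list of (head_word, label, annotation) triples; items at
    or to the right of position i have annotation None, since they are
    not yet marked by a start/join decision.
    """
    def desc(n):  # description of the n-th item relative to the decision
        j = i + n
        if not 0 <= j < len(items):
            return f"{n}:NONE"
        head, label, ann = items[j]
        return f"{n}:{label}/{ann or '-'}/{head}"
    unigrams = [desc(n) for n in (-2, -1, 0, 1, 2)]
    bigrams = [desc(-1) + "+" + desc(0), desc(0) + "+" + desc(1)]
    trigrams = [desc(-2) + "+" + desc(-1) + "+" + desc(0),
                desc(-1) + "+" + desc(0) + "+" + desc(1),
                desc(0) + "+" + desc(1) + "+" + desc(2)]
    return unigrams + bigrams + trigrams
```

The head-word-excluded variants and punctuation features from the slide would be additional templates of the same shape, omitted here for brevity.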

Page 31:

Layer 3: Check=NO or Check=YES

- A variety of questions concerning the proposed constituent

Page 32:

The Search Problem

- In POS tagging, we could use the Viterbi algorithm because

  p(tj | w1 . . . wn, j, t1 . . . tj−1) = p(tj | w1 . . . wn, j, tj−2, tj−1)

- Now: decision di could depend on arbitrary decisions in the “past” ⇒ no chance for dynamic programming

- Instead, Ratnaparkhi uses a beam search method
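A generic beam search over decision histories can be sketched as follows. `candidates`, `log_p`, and `is_complete` are assumed callables supplied by the model, and the function is an illustrative sketch rather than Ratnaparkhi's exact search:

```python
def beam_search(sentence, candidates, log_p, is_complete, beam_size=10):
    """Keep the `beam_size` highest-scoring decision histories at each step.

    candidates(history, sentence) lists the allowed next decisions,
    log_p(history, sentence, d) scores one decision, and
    is_complete(history) says when a history is a full parse.
    """
    beam = [((), 0.0)]
    while not all(is_complete(h) for h, _ in beam):
        expanded = []
        for history, score in beam:
            if is_complete(history):
                expanded.append((history, score))   # finished parses survive as-is
                continue
            for d in candidates(history, sentence):
                expanded.append((history + (d,),
                                 score + log_p(history, sentence, d)))
        beam = sorted(expanded, key=lambda x: -x[1])[:beam_size]
    return max(beam, key=lambda x: x[1])
```

Unlike Viterbi, this gives no optimality guarantee: a history pruned early can never be recovered, which is the price paid for conditioning decisions on the full past.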