Post on 15-Jan-2016

September 2003 1

PROBABILISTIC CFGs & PROBABILISTIC PARSING

Università di Venezia

3 October 2003

Probabilistic CFGs

Context-free grammar rules are of the form: S → NP VP

In a probabilistic CFG, we assign a probability to each rule: S → NP VP, with probability P(S → NP VP | S)
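
As a minimal sketch of what assigning probabilities to rules involves, the Python fragment below stores a toy PCFG as a dictionary and checks that the probabilities of all expansions of each nonterminal sum to 1. The rules, the numbers, and the names `TOY_PCFG` and `is_properly_normalised` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict
import math

# Hypothetical toy grammar: (lhs, rhs) -> P(lhs -> rhs | lhs).
# Rules and probabilities are made up for illustration only.
TOY_PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.4,
    ("NP", ("stars",)): 0.6,
}


def is_properly_normalised(pcfg):
    """Return True if, for every left-hand side, the rule probabilities sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in pcfg.items():
        totals[lhs] += prob
    return all(math.isclose(total, 1.0) for total in totals.values())
```

Calling `is_properly_normalised(TOY_PCFG)` returns `True`; removing one of the two VP rules would make it return `False`, because the remaining VP probability mass would no longer add up to 1.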

Why PCFGs?

DISAMBIGUATION: with a PCFG, probabilities can be used to choose the most likely parse

ROBUSTNESS: rather than rejecting unexpected or ungrammatical input outright, a PCFG may assign it a very low probability

LEARNING: CFGs cannot be learned from positive data only, whereas PCFGs can

An example of a PCFG

PCFGs in Prolog (courtesy Doug Arnold)

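% S -> NP VP with rule probability 1.0.
% P0 is the probability of the subtree; the second argument is the parse tree.
s(P0, [s,NP,VP]) -->
    np(P1, NP),
    vp(P2, VP),
    { P0 is 1.0*P1*P2 }.

% …

% VP -> V NP with rule probability 0.7.
vp(P0, [vp,V,NP]) -->
    v(P1, V),
    np(P2, NP),
    { P0 is 0.7*P1*P2 }.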

Notation and assumptions

Independence assumptions

PCFGs specify a language model, just like n-grams

As with n-grams, however, we need to make some independence assumptions: the probability of a subtree is independent of:

The language model defined by PCFGs

Using PCFGs to disambiguate: “Astronomers saw stars with ears”

A second parse

Choosing among the parses, and the sentence’s probability
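
The comparison of the two parses can be sketched in Python as follows. The grammar is an assumption: it is the textbook grammar for this sentence from Manning and Schütze, chapter 11, and may differ from the figures on the original slides. The tree encoding (nested tuples) and the names `RULES`, `tree_prob`, `T1` and `T2` are likewise illustrative.

```python
# Assumed grammar (Manning and Schütze, ch. 11 example), not read off the slides:
# (lhs, rhs) -> rule probability.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("PP", ("P", "NP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.4,
    ("P", ("with",)): 1.0,
    ("V", ("saw",)): 1.0,
    ("NP", ("astronomers",)): 0.1,
    ("NP", ("ears",)): 0.18,
    ("NP", ("saw",)): 0.04,
    ("NP", ("stars",)): 0.18,
    ("NP", ("telescopes",)): 0.1,
}


def tree_prob(tree):
    """Probability of a tree: the product of the probabilities of its rules."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return RULES[(label, (children[0],))]  # preterminal -> word
    rhs = tuple(child[0] for child in children)
    prob = RULES[(label, rhs)]
    for child in children:
        prob *= tree_prob(child)
    return prob


# Parse 1: the PP "with ears" attaches to the noun phrase "stars".
T1 = ("S", ("NP", "astronomers"),
      ("VP", ("V", "saw"),
       ("NP", ("NP", "stars"), ("PP", ("P", "with"), ("NP", "ears")))))

# Parse 2: the PP "with ears" attaches to the verb phrase "saw stars".
T2 = ("S", ("NP", "astronomers"),
      ("VP", ("VP", ("V", "saw"), ("NP", "stars")),
       ("PP", ("P", "with"), ("NP", "ears"))))

best_parse = max([T1, T2], key=tree_prob)          # disambiguation
sentence_prob = tree_prob(T1) + tree_prob(T2)      # sum over the two parses
```

Under this assumed grammar, `tree_prob(T1)` is about 0.0009072 and `tree_prob(T2)` about 0.0006804, so the noun-attachment parse is chosen, and the sentence probability, taken as the sum over these two parses, is about 0.0015876.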

Parsing with PCFGs: A comparison with HMMs

An HMM defines a REGULAR GRAMMAR:

Parsing with CFGs: A comparison with HMMs

Inside and outside probabilities (cf. forward and backward probabilities for HMMs)

Parsing with probabilistic CFGs

The algorithm
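
One standard way to find the most probable parse under a PCFG in Chomsky normal form is probabilistic CKY, a Viterbi-style dynamic programme over spans. The sketch below assumes that this is the kind of algorithm meant here; it is not a reconstruction of the slide. The grammar is again the assumed Manning and Schütze example, and `LEXICAL`, `BINARY`, `viterbi_cky` and `build_tree` are hypothetical names.

```python
from collections import defaultdict

# Assumed CNF grammar (Manning and Schütze, ch. 11 example).
LEXICAL = {  # (A, word) -> P(A -> word)
    ("P", "with"): 1.0, ("V", "saw"): 1.0,
    ("NP", "astronomers"): 0.1, ("NP", "ears"): 0.18, ("NP", "saw"): 0.04,
    ("NP", "stars"): 0.18, ("NP", "telescopes"): 0.1,
}
BINARY = {  # (A, B, C) -> P(A -> B C)
    ("S", "NP", "VP"): 1.0, ("PP", "P", "NP"): 1.0,
    ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
    ("NP", "NP", "PP"): 0.4,
}


def build_tree(back, i, j, label):
    """Follow back-pointers to rebuild the best tree for label over words[i:j]."""
    entry = back[(i, j, label)]
    if isinstance(entry, str):          # lexical entry: the word itself
        return (label, entry)
    k, left, right = entry
    return (label, build_tree(back, i, k, left), build_tree(back, k, j, right))


def viterbi_cky(words, lexical, binary, start="S"):
    """Return (probability, tree) of the most probable parse, or (0.0, None)."""
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = max probability of A over words[i:j]
    back = {}                 # back-pointers for tree reconstruction
    # Initialisation: spans of length 1 are filled from the lexical rules.
    for i, word in enumerate(words):
        for (lhs, w), prob in lexical.items():
            if w == word:
                best[(i, i + 1)][lhs] = prob
                back[(i, i + 1, lhs)] = word
    # Induction: combine two adjacent smaller spans with a binary rule.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (lhs, left, right), prob in binary.items():
                    if left in best[(i, k)] and right in best[(k, j)]:
                        cand = prob * best[(i, k)][left] * best[(k, j)][right]
                        if cand > best[(i, j)].get(lhs, 0.0):
                            best[(i, j)][lhs] = cand
                            back[(i, j, lhs)] = (k, left, right)
    if start not in best[(0, n)]:
        return 0.0, None
    return best[(0, n)][start], build_tree(back, 0, n, start)
```

For example, `viterbi_cky("astronomers saw stars with ears".split(), LEXICAL, BINARY)` returns the noun-attachment tree with a probability of about 0.0009072 under the assumed grammar. Replacing the maximisation with a sum over split points and rules would give inside probabilities instead of the best parse.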

Example

Initialization

Example

Example

Learning the probabilities: the Treebank

Learning probabilities

Reconstruct the rules used in the analysis of the Treebank

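Estimate probabilities by relative frequency:

P(A → B) = C(A → B) / C(A)

where C(A → B) is the number of times the rule A → B is used in the Treebank and C(A) is the number of times A is expanded.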
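
The relative-frequency estimate can be sketched in Python as below. The two-tree "treebank" is a made-up toy, trees are encoded as nested tuples by assumption, and `rules_in`, `estimate_pcfg` and `TOY_TREEBANK` are hypothetical names.

```python
from collections import Counter


def rules_in(tree):
    """Yield every (lhs, rhs) rule used in a tree given as nested tuples."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, (children[0],))            # preterminal -> word
        return
    yield (label, tuple(child[0] for child in children))
    for child in children:
        yield from rules_in(child)


def estimate_pcfg(treebank):
    """Maximum-likelihood estimate: P(A -> B) = C(A -> B) / C(A)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in rules_in(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: count / lhs_counts[rule[0]]
            for rule, count in rule_counts.items()}


# Toy treebank, invented for illustration only.
TOY_TREEBANK = [
    ("S", ("NP", "astronomers"),
     ("VP", ("V", "saw"), ("NP", "stars"))),
    ("S", ("NP", "stars"),
     ("VP", ("V", "saw"),
      ("NP", ("NP", "stars"), ("PP", ("P", "with"), ("NP", "ears"))))),
]

probs = estimate_pcfg(TOY_TREEBANK)
```

In this toy treebank NP is expanded six times, three of them as NP → stars, so `probs[("NP", ("stars",))]` is 0.5, while VP → V NP is the only VP expansion and receives probability 1.0.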

Probabilistic lexicalised PCFGs (Collins, 1997; Charniak, 2000)

Parsing evaluation

Performance of current parsers

Readings

Manning and Schütze, chapters 11 and 12

Acknowledgments

Some slides and the Prolog code are borrowed from Doug Arnold

Thanks also to Chris Manning & Diego Molla