computational linguistics introduction

November 2011 CLINT-LN CFG 1 Computational Linguistics Introduction Context Free Grammars

Upload: lenora

Post on 12-Jan-2016

TRANSCRIPT

Page 1: Computational Linguistics Introduction

November 2011 CLINT-LN CFG 1

Computational Linguistics Introduction

Context Free Grammars

Page 2: Computational Linguistics Introduction

Chomsky Hierarchy

Page 3: Computational Linguistics Introduction

Context Free Grammars (CFGs)

Sets of rules expressing how symbols of the language fit together, e.g.

S -> NP VP
NP -> Det N
Det -> the
N -> dog

Page 5: Computational Linguistics Introduction

What Does Context Free Mean?

• LHS of rule is just one symbol.

• Can have
  NP -> Det N

• Cannot have
  X NP Y -> X Det N Y

• Everyday examples of context sensitivity
  y e -> i e
  [b]# -> [p]#
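The restriction on the left-hand side can be checked mechanically. The following is a minimal sketch, assuming rules are written as text with `->` and space-separated symbols; the function name `is_context_free` is illustrative only. It tests the number of symbols on the left-hand side and does not verify that the symbol is a non-terminal.

```python
def is_context_free(rule: str) -> bool:
    """Return True if the rule has exactly one symbol on its left-hand side."""
    lhs, arrow, _rhs = rule.partition("->")
    if not arrow:
        raise ValueError(f"not a rule: {rule!r}")
    return len(lhs.split()) == 1


print(is_context_free("NP -> Det N"))          # True
print(is_context_free("X NP Y -> X Det N Y"))  # False
```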

Page 6: Computational Linguistics Introduction

Grammar Symbols

• Non Terminal Symbols

• Terminal Symbols
  – Words
  – Preterminals

Page 7: Computational Linguistics Introduction

Non Terminal Symbols

• Symbols which have definitions

• Symbols which appear on the LHS of rules

S -> NP VP
NP -> Det N
Det -> the
N -> dog

Page 8: Computational Linguistics Introduction

Non Terminal Symbols

• The same non-terminal can have several definitions

S -> NP VP
NP -> Det N
NP -> N
Det -> the
N -> dog

Page 9: Computational Linguistics Introduction

Terminal Symbols

• Symbols which appear in final string
• Correspond to words
• Are not defined by the grammar

S -> NP VP
NP -> Det N
Det -> the
N -> dog

Page 10: Computational Linguistics Introduction

Parts of Speech (POS)

• NT Symbols which produce terminal symbols are sometimes called pre-terminals

S -> NP VP
NP -> Det N
Det -> the
N -> dog

• Sometimes we are interested in the shape of sentences formed from pre-terminals

Det N V
Aux N V D N
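The "shape" of a sentence can be obtained by replacing each word with its pre-terminal. The following is a small sketch, assuming a lookup table built from the pre-terminal rules above; the entry for "barks" and the function name `shape` are assumptions added for illustration and do not come from the slides.

```python
# Pre-terminal rules from the slide, plus one assumed entry.
LEXICON = {
    "the": "Det",
    "dog": "N",
    "barks": "V",  # assumption: added only so the example contains a verb
}


def shape(sentence: str) -> str:
    """Replace each word by its pre-terminal (part of speech)."""
    return " ".join(LEXICON[word] for word in sentence.lower().split())


print(shape("the dog barks"))  # Det N V
```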

Page 11: Computational Linguistics Introduction

CFG - formal definition

A CFG is a tuple (N, Σ, R, S)

• N is a set of non-terminal symbols

• Σ is a set of terminal symbols disjoint from N

• R is a set of rules, each of the form A -> β, where A is a non-terminal and β is a string of symbols from N and Σ

• S is a designated start symbol

Page 12: Computational Linguistics Introduction

CFG - Example

grammar:

S -> NP VP
NP -> N
VP -> V NP

lexicon:

V -> kicks
N -> John
N -> Bill

N = {S, NP, VP, N, V}

Σ = {kicks, John, Bill}

R = the grammar and lexicon rules above

S = “S”
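The sets N and Σ in this example can be read off the rule list: non-terminals are the symbols that appear on a left-hand side, and terminals are the remaining right-hand-side symbols. The following is a minimal sketch, assuming rules are stored as `(lhs, rhs)` pairs; the variable names are illustrative.

```python
# The example grammar and lexicon as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["N"]),
    ("VP", ["V", "NP"]),
    ("V", ["kicks"]),
    ("N", ["John"]),
    ("N", ["Bill"]),
]

# Non-terminals are the symbols that appear on a left-hand side.
non_terminals = {lhs for lhs, _ in RULES}
# Terminals are the right-hand-side symbols that are never defined.
terminals = {sym for _, rhs in RULES for sym in rhs} - non_terminals
# Pre-terminals are non-terminals with a rule that rewrites to one terminal.
pre_terminals = {lhs for lhs, rhs in RULES if len(rhs) == 1 and rhs[0] in terminals}

print(sorted(non_terminals))  # ['N', 'NP', 'S', 'V', 'VP']
print(sorted(terminals))      # ['Bill', 'John', 'kicks']
print(sorted(pre_terminals))  # ['N', 'V']
```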

Page 13: Computational Linguistics Introduction

Class Exercise

• Write grammars that generate the following languages, for m, n > 0
  (ab)^m
  a^n b^m
  a^n b^n

• Which of these are Regular?

• Which of these are Context Free?

Page 15: Computational Linguistics Introduction

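(ab)^m for m > 0

S -> a b
S -> a b S

S -> a X
X -> b Y
Y -> a b
Y -> S

Note that the shortest string the second grammar derives is a b a b (S => a X => a b Y => a b a b), so as written it covers only m > 1.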
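Small grammars like these can be checked by enumerating the strings they derive. The following is a sketch of a breadth-first enumerator over leftmost derivations, assuming no rule has an empty right-hand side (so sentential forms never shrink and over-long forms can be discarded). The names `generate`, `G1` and `G2` are illustrative, and the rules are copied from the slide as transcribed.

```python
from collections import deque


def generate(rules, start, max_len):
    """Return all terminal strings of at most max_len symbols derivable from start.

    Assumes no rule has an empty right-hand side, so any sentential form
    longer than max_len can never lead to a short enough string.
    """
    non_terminals = {lhs for lhs, _ in rules}
    seen = {(start,)}
    queue = deque([(start,)])
    results = set()
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal, if any.
        pos = next((i for i, s in enumerate(form) if s in non_terminals), None)
        if pos is None:
            results.add("".join(form))
            continue
        for lhs, rhs in rules:
            if lhs == form[pos]:
                new = form[:pos] + tuple(rhs) + form[pos + 1:]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
    return sorted(results, key=lambda s: (len(s), s))


G1 = [("S", ["a", "b"]), ("S", ["a", "b", "S"])]
G2 = [("S", ["a", "X"]), ("X", ["b", "Y"]), ("Y", ["a", "b"]), ("Y", ["S"])]

print(generate(G1, "S", 6))  # ['ab', 'abab', 'ababab']
print(generate(G2, "S", 6))  # ['abab', 'ababab']
```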

Page 17: Computational Linguistics Introduction

a^n b^m

S -> A B
A -> a
A -> a A
B -> b
B -> b B

S -> a AB
AB -> a AB
AB -> B
B -> b
B -> b B
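The three languages of the class exercise differ in what is needed to recognise them. (ab)^m and a^n b^m are regular, so a regular expression is enough; a^n b^n is context free but not regular, because the number of a's must match the number of b's. The sketch below assumes the grammar S -> a b | a S b for a^n b^n, which is a standard choice and is not given on the slides; all function names are illustrative.

```python
import re


def in_ab_m(s: str) -> bool:
    """(ab)^m, m > 0: regular, so a regular expression suffices."""
    return re.fullmatch(r"(ab)+", s) is not None


def in_an_bm(s: str) -> bool:
    """a^n b^m, n, m > 0: also regular."""
    return re.fullmatch(r"a+b+", s) is not None


def in_an_bn(s: str) -> bool:
    """a^n b^n, n > 0, following the assumed grammar S -> a b | a S b."""
    if s == "ab":
        return True
    return len(s) > 2 and s[0] == "a" and s[-1] == "b" and in_an_bn(s[1:-1])


print(in_ab_m("abab"), in_an_bm("aaab"), in_an_bn("aabb"))  # True True True
print(in_an_bn("aab"))                                      # False
```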

Page 18: Computational Linguistics Introduction

Grammar Defines a Structure

grammar:

S -> NP VP
NP -> N
VP -> V NP

lexicon:

V -> kicks
N -> John
N -> Bill

structure for "John kicks Bill":

S
  NP
    N
      John
  VP
    V
      kicks
    NP
      N
        Bill

Page 19: Computational Linguistics Introduction

Different Grammar, Different Structure

grammar:

S -> NP NP
NP -> N V
NP -> N

lexicon:

V -> kicks
N -> John
N -> Bill

structure for "John kicks Bill":

S
  NP
    N
      John
    V
      kicks
  NP
    N
      Bill
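The two analyses of "John kicks Bill" can be compared directly once they are written down as data. The following is a sketch, assuming a tree is a nested tuple `(label, child, ...)` with plain strings as leaves; `TREE_1` is the structure from the S -> NP VP grammar and `TREE_2` the structure from the S -> NP NP grammar. The helper names are illustrative.

```python
# A tree is (label, child, child, ...); a leaf is a plain string.
TREE_1 = ("S", ("NP", ("N", "John")), ("VP", ("V", "kicks"), ("NP", ("N", "Bill"))))
TREE_2 = ("S", ("NP", ("N", "John"), ("V", "kicks")), ("NP", ("N", "Bill")))


def leaves(tree):
    """Return the words of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def brackets(tree):
    """Render the tree in labelled-bracket notation."""
    if isinstance(tree, str):
        return tree
    return "[" + tree[0] + " " + " ".join(brackets(c) for c in tree[1:]) + "]"


print(leaves(TREE_1) == leaves(TREE_2))  # True
print(brackets(TREE_1))  # [S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]
print(brackets(TREE_2))  # [S [NP [N John] [V kicks]] [NP [N Bill]]]
```

Both trees have the same words in the same order; only the grouping differs, which is what the choice of grammar determines.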

Page 20: Computational Linguistics Introduction

Which Grammar is Best?

• The structure assigned by the grammar should be appropriate.

• The structure should
  – Be understandable
  – Allow us to make generalisations.
  – Reflect the underlying meaning of the sentence.

Page 21: Computational Linguistics Introduction

Ambiguity

• A grammar is ambiguous if it assigns two or more structures to the same sentence.

NP -> NP CONJ NP
NP -> N

lexicon:

CONJ -> and
N -> John
N -> Bill

• The grammar should not generate too many possible structures for the same sentence.
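How quickly the number of structures grows can be seen by counting them. The sketch below is written for this one grammar only (NP -> NP CONJ NP, NP -> N, with the lexicon above) and counts, with memoisation, the ways a span of words can be analysed as an NP. The function name `count_np_parses` is illustrative; this is not a general parser.

```python
from functools import lru_cache


def count_np_parses(sentence: str) -> int:
    """Count the structures the grammar NP -> NP CONJ NP | N assigns to a phrase."""
    words = tuple(sentence.split())
    nouns = {"John", "Bill"}  # N -> John, N -> Bill

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of ways words[i:j] can be analysed as an NP.
        total = 1 if j - i == 1 and words[i] in nouns else 0
        for k in range(i + 1, j - 1):
            if words[k] == "and":  # CONJ -> and
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(words))


print(count_np_parses("John and Bill"))                    # 1
print(count_np_parses("John and Bill and John"))           # 2
print(count_np_parses("John and Bill and John and Bill"))  # 5
```

With two conjuncts there is a single structure, so the ambiguity only shows up from three conjuncts onward, after which the count rises rapidly.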

Page 22: Computational Linguistics Introduction

Criteria for Evaluating Grammars

• Does it undergenerate?
• Does it overgenerate?
• Does it assign appropriate structures to sentences it generates?
• Is it simple to understand? How many rules are there?
• Does it contain just a few generalisations or is it full of special cases?
• How ambiguous is it? How many structures does it assign for a given sentence?

Page 23: Computational Linguistics Introduction

PATR-II

• The PATR-II formalism can be viewed as a computer language for encoding linguistic information.

• A PATR-II grammar consists of a set of rules and a lexicon.

• The rules are CFG rules augmented with constraints.

• The lexicon provides information about terminal symbols.

Page 24: Computational Linguistics Introduction

Example PATR-II Grammar and Lexicon

Grammar (grammar.grm)

Rule

s -> np vp

Rule

np -> n

Rule

vp -> v

Lexicon (lexicon.lex)

\w uther

\c n

\w sleeps

\c v
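The lexicon format shown here pairs a `\w` line (the word) with a `\c` line (its category). The following is a rough sketch of how such a file could be read into a dictionary, assuming only those two field markers; it is not the loader used by any PATR-II implementation, and the names `LEXICON_TEXT` and `parse_lexicon` are hypothetical.

```python
# The example lexicon, as a raw string so the backslashes are kept literally.
LEXICON_TEXT = r"""
\w uther
\c n

\w sleeps
\c v
"""


def parse_lexicon(text: str) -> dict:
    """Map each word field to its category field; other lines are ignored."""
    entries = {}
    word = None
    for line in text.splitlines():
        marker, _, value = line.strip().partition(" ")
        if marker == r"\w":
            word = value.strip()
        elif marker == r"\c" and word is not None:
            entries[word] = value.strip()
    return entries


print(parse_lexicon(LEXICON_TEXT))  # {'uther': 'n', 'sleeps': 'v'}
```

A word listed with more than one category would need a list of categories per word; this sketch keeps only the last one seen.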