computational linguistics introduction
DESCRIPTION
Computational Linguistics Introduction. Context Free Grammars. Chomsky Hierarchy. Context Free Grammars (CFGs). Sets of rules expressing how symbols of the language fit together, e.g. S -> NP VP NP -> Det N Det -> the N -> dog. What Does Context Free Mean?. - PowerPoint PPT PresentationTRANSCRIPT
November 2011 CLINT-LN CFG 1
Computational Linguistics Introduction
Context Free Grammars
November 2011 CLINT-LN CFG 2
Chomsky Hierarchy
November 2011 CLINT-LN CFG 3
Context Free Grammars (CFGs)
Sets of rules expressing how symbols of the language fit together, e.g.
S -> NP VPNP -> Det NDet -> theN -> dog
November 2011 CLINT-LN CFG 4
What Does Context Free Mean?
• LHS of rule is just one symbol.
• Can haveNP -> Det N
• Cannot haveX NP Y -> X Det N Y
• Everyday examples of context sensitivity
November 2011 CLINT-LN CFG 5
What Does Context Free Mean?
• LHS of rule is just one symbol.• Can haveNP -> Det N
• Cannot haveX NP Y -> X Det N Y
• Everyday examples of context sensitivityy e -> i e[b]# -> [p]#
November 2011 CLINT-LN CFG 6
Grammar Symbols
• Non Terminal Symbols
• Terminal Symbols– Words– Preterminals
November 2011 CLINT-LN CFG 7
Non Terminal Symbols
• Symbols which have definitions
• Symbols which appear on the LHS of rules
S -> NP VPNP -> Det NDet -> theN -> dog
November 2011 CLINT-LN CFG 8
Non Terminal Symbols
• Same Non Terminals can have several definitions
S -> NP VPNP -> Det NNP -> N Det -> theN -> dog
November 2011 CLINT-LN CFG 9
Terminal Symbols
• Symbols which appear in final string• Correspond to words• Are not defined by the grammar
S -> NP VPNP -> Det NDet -> theN -> dog
November 2011 CLINT-LN CFG 10
Parts of Speech (POS)
• NT Symbols which produce terminal symbols are sometimes called pre-terminalsS -> NP VPNP -> Det NDet -> theN -> dog
• Sometimes we are interested in the shape of sentences formed from pre-terminalsDet N VAux N V D N
November 2011 CLINT-LN CFG 11
CFG - formal definition
A CFG is a tuple (N,,R,S)
• N is a set of non-terminal symbols is a set of terminal symbols disjoint from
N
• R is a set of rules each of the form A where A is non-terminal
• S is a designated start-symbol
November 2011 CLINT-LN CFG 12
CFG - Example
grammar:
S NP VPNP NVP V NPlexicon:
V kicksN JohnN Bill
N = {S, NP, VP, N, V}
= {kicks, John, Bill}
R = (see opposite)
S = “S”
November 2011 CLINT-LN CFG 13
Class Exercise
• Write grammars that generate the following languages, for m > 0(ab)m anbm
anbn
• Which of these are Regular?• Which of these are Context Free?
November 2011 CLINT-LN CFG 14
(ab)m for m > 0
S -> a b
S -> a b S
November 2011 CLINT-LN CFG 15
(ab)m for m > 0
S -> a bS -> a b S
S -> a XX -> b YY -> a bY -> S
November 2011 CLINT-LN CFG 16
anbm
S -> A B
A -> a
A -> a A
B -> b
B -> b B
November 2011 CLINT-LN CFG 17
anbm
S -> A B
A -> a
A -> a A
B -> b
B -> b B
S -> a AB
AB -> a AB
AB -> B
B -> b
B -> b B
November 2011 CLINT-LN CFG 18
Grammar Defines a Structure
grammar:
S NP VPNP NVP V NPlexicon:
V kicksN JohnN Bill
S
NP
N
John kicks
NPV
VP
N
Bill
November 2011 CLINT-LN CFG 19
Different Grammar Different Stucture
grammar:
S NP NPNP N VNP Nlexicon:
V kicksN JohnN Bill
S
NP
N
BillJohn
VN
NP
kicks
November 2011 CLINT-LN CFG 20
Which Grammar is Best?
• The structure assigned by the grammar should be appropriate.
• The structure should– Be understandable– Allow us to make generalisations.– Reflect the underlying meaning of the
sentence.
November 2011 CLINT-LN CFG 21
Ambiguity
• A grammar is ambiguous if it assigns two or more structures to the same sentence.NP NP CONJ NPNP Nlexicon:CONJ andN JohnN Bill
• The grammar should not generate too many possible structures for the same sentence.
November 2011 CLINT-LN CFG 22
Criteria for Evaluating Grammars
• Does it undergenerate?• Does it overgenerate?• Does it assign appropriate structures to
sentences it generates?• Is it simple to understand? How many rules are
there?• Does it contain just a few generalisations or is it
full of special cases?• How ambiguous is it? How many structures does
it assign for a given sentence?
November 2011 CLINT-LN CFG 23
PATR-II
• The PATR-II formalism can be viewed as a computer language for encoding linguistic information.
• A PATR-II grammar consists of a set of rules and a lexicon.
• The rules are CFG rules augmented with constaints.
• The lexicon provides information about terminal symbols.
November 2011 CLINT-LN CFG 24
Example PATR-IIGrammar and Lexicon
Grammar (grammar.grm)
Rule
s -> np vp
Rule
np -> n
Rule
vp -> v
Lexicon (lexicon.lex)
\w uther
\c n
\w sleeps
\c v