Natural Language Processing
DCG and Syntax
• NLP
• DCG
• A “translation” example: special case
• A DCG recogniser
Natural Language Processing
NLP is the art and science of getting computers to understand natural language.
• NLP draws on materials from other disciplines: computer science, formal philosophy and formal linguistics
• NLP is an “AI complete” task: all the activities which turn up elsewhere in AI, such as knowledge representation, planning, inference and so on turn up in one form or another in NLP.
DCG (Definite Clause Grammar)
• An example
• $120pw $200perweek $150pweek
• [$, 1, 2, 0, p, w]
price --> dollar, number, unit.
dollar --> [$].
number --> digit, number.
number --> digit.
unit --> [p, w].
unit --> [p, e, r, w, e, e, k].
unit --> [p, w, e, e, k].
digit --> [1].
digit --> [2].
digit --> [3].
....
• ?- price([$, 2, 3, 4, p, w], []).
• Yes
• ?- price([8, 0, 0, p, w, e, e, k], []).
• No.
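The price grammar above can be loaded and queried as it stands once the digit rules are written out. The following is a minimal sketch under two assumptions that are not in the slides: the digit rules cover 0 to 9, and the dollar sign is written as the quoted atom '$', because some Prolog systems may read a bare $ as an operator.

```prolog
% Recogniser for prices such as $120pw, given as a list of tokens.
price --> dollar, number, unit.

dollar --> ['$'].            % quoted atom (assumption, see above)

number --> digit, number.    % two or more digits
number --> digit.            % exactly one digit

unit --> [p, w].
unit --> [p, e, r, w, e, e, k].
unit --> [p, w, e, e, k].

% Assumption: the slides elide most digit rules; 0..9 are spelled out here.
digit --> [0].
digit --> [1].
digit --> [2].
digit --> [3].
digit --> [4].
digit --> [5].
digit --> [6].
digit --> [7].
digit --> [8].
digit --> [9].
```

With this loaded, ?- price(['$', 2, 3, 4, p, w], []). succeeds, and ?- price([8, 0, 0, p, w, e, e, k], []). fails because the list does not start with a dollar sign. In systems that provide phrase/2, ?- phrase(price, ['$', 2, 3, 4, p, w]). is the more conventional way to ask the same question.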
Expand DCG to standard predicates
price --> dollar, number, unit.
price(List1, List2):-
dollar(List1, List11),
number(List11, List12),
unit(List12, List2).
dollar-->[$].
dollar([$|List], List).
digit-->[1].
digit([1|List], List).
number-->digit, number.
number-->digit.
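The two number rules expand in the same way as price and digit. The sketch below writes the expansion by hand, following the variable naming used above; the translation an actual Prolog system produces may differ in detail. Only the digits 1 and 2 are defined, to keep the example short.

```prolog
% number --> digit, number.
number(List1, List2) :-
    digit(List1, List11),      % take one digit off the front
    number(List11, List2).     % then a further number

% number --> digit.
number(List1, List2) :-
    digit(List1, List2).       % a single digit is also a number

% digit --> [1].   digit --> [2].
digit([1|List], List).
digit([2|List], List).
```

For example, ?- number([1, 2, 1], []). succeeds. The query ?- number([1, 2, p, w], Rest). first answers Rest = [p, w] and, on backtracking, Rest = [2, p, w], which shows how the second argument holds whatever is left after a number has been taken off the front of the list.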
Extending DCG
1. Add variables
2. Add normal predicates in { }
• ?- price(X, [$, 1, 2, 3, p, w], []).
• X = [1,2,3]
price(X)-->dollar, number(X), unit.
number([D|T])-->digit(D), number(T).
number([D])-->digit(D).
digit(1) --> [1].
digit(2) --> [2].
price(X)-->dollar, number(X), unit, {length(X,N), N<3}.
Expand extended DCG to standard predicates
price(X)-->dollar, number(X), unit, {length(X, N), N<3}.
price(X, List1, List2):-
dollar(List1, List11),
number(X, List11, List12),
unit(List12, List2),
length(X, N),
N<3.
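The extended grammar can be tried out as a whole. This is a minimal sketch with two assumptions that are not in the slides: the dollar sign is quoted as '$', and a single digit rule with a { } test stands in for the ten separate digit(0) to digit(9) rules.

```prolog
% price(X): X is the list of digits in the price; at most two digits allowed.
price(X) --> dollar, number(X), unit, { length(X, N), N < 3 }.

dollar --> ['$'].

number([D|T]) --> digit(D), number(T).
number([D])   --> digit(D).

unit --> [p, w].
unit --> [p, e, r, w, e, e, k].
unit --> [p, w, e, e, k].

% Assumption: one rule covering 0..9 instead of ten separate digit rules.
digit(D) --> [D], { integer(D), D >= 0, D =< 9 }.
```

Here ?- price(X, ['$', 4, 5, p, w], []). gives X = [4, 5]. With the extra test in { }, the three-digit query ?- price(X, ['$', 1, 2, 3, p, w], []). now fails, because length(X, N) yields N = 3 and N < 3 does not hold.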
A “machine translation” example
• three hundred and thirty four: 334
• twenty one: 21
• fourteen: 14
• five: 5
• ?-to_number(N, [three, hundred, and, thirty, four],[]).
• N=334.
A “translation” example
• Vocabulary, lexicon
digit(1) --> [one].
digit(2) --> [two].
…..
digit(9) --> [nine].
teen(10) --> [ten].
teen(11) --> [eleven].
…..
teen(19) --> [nineteen].
tens(20) --> [twenty].
tens(30) --> [thirty].
…..
tens(90) --> [ninety].
A “translation” example
• Numbers with one or two digits.
xx(N) --> digit(N).
xx(N) --> teen(N).
xx(N) --> tens(T), rest_xx(N1), {N is T+N1}.
rest_xx(N) --> digit(N).
rest_xx(0) --> [].
A “translation” example
% numbers with 3 or fewer digits
xxx(N) --> digit(D), [hundred], rest_xxx(N1), {N is D*100+N1}.
xxx(N) --> xx(N).
rest_xxx(N) --> [and], xx(N).
rest_xxx(0) --> [].
% top level
to_number(0) --> [zero].
to_number(N) --> xxx(N).
Query
?- to_number(N, [two, hundred, and, twenty, one], []).
N = 221
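Putting the pieces of the “translation” example together gives a program that can be loaded and queried directly. The sketch below copies the rules from the slides; the lexicon entries that the slides elide are filled in here as an assumption, using the usual English number words.

```prolog
% Lexicon (entries elided on the slides are an assumption).
digit(1) --> [one].    digit(2) --> [two].    digit(3) --> [three].
digit(4) --> [four].   digit(5) --> [five].   digit(6) --> [six].
digit(7) --> [seven].  digit(8) --> [eight].  digit(9) --> [nine].

teen(10) --> [ten].       teen(11) --> [eleven].    teen(12) --> [twelve].
teen(13) --> [thirteen].  teen(14) --> [fourteen].  teen(15) --> [fifteen].
teen(16) --> [sixteen].   teen(17) --> [seventeen]. teen(18) --> [eighteen].
teen(19) --> [nineteen].

tens(20) --> [twenty].  tens(30) --> [thirty].  tens(40) --> [forty].
tens(50) --> [fifty].   tens(60) --> [sixty].   tens(70) --> [seventy].
tens(80) --> [eighty].  tens(90) --> [ninety].

% Numbers with one or two digits.
xx(N) --> digit(N).
xx(N) --> teen(N).
xx(N) --> tens(T), rest_xx(N1), { N is T + N1 }.

rest_xx(N) --> digit(N).
rest_xx(0) --> [].

% Numbers with three or fewer digits.
xxx(N) --> digit(D), [hundred], rest_xxx(N1), { N is D*100 + N1 }.
xxx(N) --> xx(N).

rest_xxx(N) --> [and], xx(N).
rest_xxx(0) --> [].

% Top level.
to_number(0) --> [zero].
to_number(N) --> xxx(N).
```

The four examples from the earlier slide then behave as listed: ?- to_number(N, [three, hundred, and, thirty, four], []). gives N = 334, and the lists [twenty, one], [fourteen] and [five] give 21, 14 and 5.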
Representing Syntactic Knowledge
Syntactic knowledge:
– Syntactic categories: e.g. Noun, Sentence.
– Grammatical features: e.g. Singular, Plural.
– Grammar rules.
• Why bother?
Parts of language
• Regard sentences as being built out of constituents
• Two types of constituents:
– words (simple constituents), which have lexical categories like noun, verb, etc.
– phrases (compound constituents), like noun phrases, verb phrases, etc.
• How to store syntactic knowledge?
– lexicon
– grammar rules
Words: Lexical Categories (Parts of Speech)
• Noun (N): Jack, tree, house, cannon
• Verb (V): build, walk, kill
• Adjective (Adj): big, red, unpleasant
• Determiner (Det): the, a, which, that
– Jack built {the, a, that} big, red house;
– Which house did Jack build?
• Preposition (Prep): with, for, in, from, to, through, via, under
Words: Lexical Categories (ctd)
• Pronoun (Pro): her, him, she, itself, that, it
– I saw the man in the park with the telescope
– Don't do that to him
• Conjunction (Conj): and, or, but.
Two kinds of lexical categories:
1. Open categories (“content words”): N, V, Adj
2. Closed categories (“function words”): Det, Prep, Pro, Conj
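One way the lexicon might be stored in Prolog is as a table of facts pairing each word with its lexical category. This is only an illustrative sketch: the predicate names word/2, open_category/1, closed_category/1, content_word/1 and function_word/1 are hypothetical, and only a few of the example words listed above are included.

```prolog
% word(Word, Category): a small sample of the lexicon.
word(jack,  n).
word(tree,  n).
word(house, n).
word(build, v).
word(walk,  v).
word(big,   adj).
word(red,   adj).
word(the,   det).
word(a,     det).
word(that,  det).     % "that" is listed both as a determiner ...
word(that,  pro).     % ... and as a pronoun
word(him,   pro).
word(with,  prep).
word(in,    prep).
word(and,   conj).

% The two kinds of lexical categories.
open_category(n).
open_category(v).
open_category(adj).

closed_category(det).
closed_category(prep).
closed_category(pro).
closed_category(conj).

% A content word belongs to an open category, a function word to a closed one.
content_word(W)  :- word(W, C), open_category(C).
function_word(W) :- word(W, C), closed_category(C).
```

For instance, ?- content_word(tree). and ?- function_word(with). both succeed, and ?- word(that, C). answers C = det and then C = pro, reflecting that one word can belong to more than one category.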
Compound Constituents
Some compound constituents:
Sentence (S): Jack built the house.
Noun Phrase (NP):
– John
– the big, red house
– the house that Jack built
– the destruction of the city
Verb Phrase (VP):
– built the house quickly
– saw the man in the park
Prepositional Phrase (PP):
– with the telescope
– on the table
A Simple Grammar
S → NP VP
VP → V NP
NP → Proper_N
NP → Det N
Proper_N → John
Proper_N → Mary
N → cake
V → loves
V → ate
Det → the
Sentences in this language:
“John loves Mary”
“John ate the cake”
“John loves the cake”
Definite Clause Grammars (DCGs)
The above grammar can be simply implemented in DCG notation as follows:
s --> np, vp.
vp --> v, np.
np --> proper_n.
np --> det, n.
proper_n --> [john].
proper_n --> [mary].
n --> [cake].
v --> [loves].
v --> [ate].
det --> [the].
Translating DCG
Consider the rule
s --> np, vp.
Prolog translates this as:
s(Ws1,Ws2) :- np(Ws1,Ws), vp(Ws,Ws2).
This says that after taking an s off the start of Ws1, Ws2 remains.
The rule
proper_n --> [john].
is translated as
proper_n([john|Ws],Ws).
Query
• ?- s([john, ate, the, cake], []).
• Yes
• ?- s([ate, john, cake, the], []).
• No
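Applying the same translation to every rule of the simple grammar gives an ordinary Prolog program with no DCG notation at all. The sketch below is written by hand in the style of the two translations above; the clauses a particular Prolog system generates may look slightly different.

```prolog
% Each predicate takes a word list and returns what is left
% after one constituent has been removed from its front.
s(Ws1, Ws2)  :- np(Ws1, Ws), vp(Ws, Ws2).
vp(Ws1, Ws2) :- v(Ws1, Ws), np(Ws, Ws2).

np(Ws1, Ws2) :- proper_n(Ws1, Ws2).
np(Ws1, Ws2) :- det(Ws1, Ws), n(Ws, Ws2).

proper_n([john|Ws], Ws).
proper_n([mary|Ws], Ws).
n([cake|Ws], Ws).
v([loves|Ws], Ws).
v([ate|Ws], Ws).
det([the|Ws], Ws).
```

Calling a single category shows the remainder being passed along: ?- np([john, ate, the, cake], Rest). gives Rest = [ate, the, cake]. A sentence is recognised when the remainder is empty, so ?- s([john, ate, the, cake], []). succeeds and ?- s([ate, john, cake, the], []). fails.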