PATR II Compiler




Page 1: PATR II Compiler

PATR II Compiler

Advanced Prolog Course (Prolog Aufbaukurs), Summer Semester 2000

Heinrich-Heine-Universität Düsseldorf

Christof Rumpf

Page 2: PATR II Compiler

22.05.2000 PATR II Compiler 2

Notational Conventions

• Instantiation mode of arguments
  – Blue: input arguments
  – Red: output arguments

• Cut
  – red cut !
  – green cut !

• Predicate definitions
  – complete
  – to be continued
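As a reminder of what the two cut colours distinguish, here is a minimal sketch. The predicates max_green/3 and max_red/3 are illustrative names and not part of the course code. A green cut only prunes alternatives that could not succeed anyway, while removing a red cut changes the answers the predicate gives.

```prolog
% Green cut: the two clauses are already mutually exclusive,
% so the cut only saves a redundant test on backtracking.
max_green(X, Y, X) :- X >= Y, !.
max_green(X, Y, Y) :- X < Y.

% Red cut: the second clause relies on the cut in the first.
% Without the cut, max_red(3, 2, Z) would also yield Z = 2.
max_red(X, Y, Z) :- X >= Y, !, Z = X.
max_red(_, Y, Y).
```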

Page 3: PATR II Compiler

Directives

% external resources

:- [tokenize].              % load tokenizer

% operators

:- op(510, xfy, : ).        % attr:val
:- op(600, xfx, ===).       % path equation
:- op(1100,xfx,'--->').     % syntax rule, lexical entry
:- op(1200,xfx,'::').       % description annotation
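To see how these operator priorities nest, the following sketch declares the same four operators and takes a compiled clause apart. The helper clause_parts/4 is a hypothetical name introduced only for this illustration, and the sketch assumes a system such as SWI-Prolog that permits changing the priority of `:`.

```prolog
:- op(510,  xfy, (:)).      % attr:val
:- op(600,  xfx, (===)).    % path equation
:- op(1100, xfx, '--->').   % syntax rule, lexical entry
:- op(1200, xfx, '::').     % description annotation

% clause_parts(+Clause, -LHS, -RHS, -Descr)   (hypothetical helper)
% '::' binds weakest, so the expansion ends up on its left and the
% whole comma-separated description on its right.
clause_parts((LHS ---> RHS :: Descr), LHS, RHS, Descr).

demo_operators(LHS, RHS, Descr) :-
    Clause = (A ---> B, C :: A:head === C:head, C:head:subject === B:head),
    clause_parts(Clause, LHS, RHS, Descr).
```

Because `:` is declared xfy, a path such as C:head:subject is read as C:(head:subject), which is the shape the path parser later in the slides builds.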

Page 4: PATR II Compiler

3 Compiler Components

• Tokenizer
  – Input: PATR II grammar
  – Output: token lines

• Preprocessor
  – Input: token lines
  – Output: token sentences

• Syntax compiler
  – Input: token sentences
  – Output: Prolog clauses

compile_grammar(File):-
    clear_grammar,
    tokenize_file(File),
    read_sentences,
    compile_sentences.

Page 5: PATR II Compiler

Tokenizer Input

; Shieb1.ptr
; Sample grammar one from Shieber 1986


; Grammar Rules
; ------------------------------------------------------------

Rule {sentence formation}
  S --> NP VP:
 <S head> = <VP head>
 <VP head subject> = <NP head>.
 
Rule {trivial verb phrase}
  VP --> V:
 <VP head> = <V head>.

; Lexicon
; ----------------------------------------------------------------

Word uther:
 <cat> = NP
 <head agreement gender> = masculine
 <head agreement person> = third
 <head agreement number> = singular.

Page 6: PATR II Compiler

Tokenizer Output = Preprocessor Input

line(1,[o($;$),b(1),u($Shieb1$),o($.$),l($ptr$)]).
line(2,[o($;$),b(1),u($Sample$),b(1),l($grammar$),b(1),l($one$),b(1),l($from$),b(1), ...
line(3,[ ]).
line(4,[ ]).
line(5,[o($;$),b(1),u($Grammar$),b(1),u($Rules$)]).
line(6,[o($;$),b(1),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$), ...
line(7,[ ]).
line(8,[u($Rule$),b(1),o(${$),l($sentence$),b(1),l($formation$),o($}$)]).
line(9,[b(2),u($S$),b(1),o($-$),o($-$),o($>$),b(1),u($NP$),b(1),u($VP$),o($:$)]).
line(10,[b(1),o($<$),u($S$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($VP$),b(1), ...
line(11,[b(1),o($<$),u($VP$),b(1),l($head$),b(1),l($subject$),o($>$),b(1),o($=$),b(1), ...
line(12,[b(1)]).
line(13,[u($Rule$),b(1),o(${$),l($trivial$),b(1),l($verb$),b(1),l($phrase$),o($}$)]).
line(14,[b(2),u($VP$),b(1),o($-$),o($-$),o($>$),b(1),u($V$),o($:$)]).
line(15,[b(1),o($<$),u($VP$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($V$),b(1),...
...
line(41,[b(1),o($<$),l($head$),b(1),l($subject$),b(1),l($agreement$),b(1),l($number$),...
line(42,[eof]).

Page 7: PATR II Compiler

Preprocessor Output = Compiler Input

sentence( 1,11,[u($Rule$),o(${$),l($sentence$),l($formation$),o($}$),...

sentence(12,15,[u($Rule$),o(${$),l($trivial$),l($verb$),l($phrase$),o($}$),...

sentence(16,24,[u($Word$),l($uther$),o($:$),o($<$),l($cat$),o($>$),o($=$),...

sentence(25,30,[u($Word$),l($knights$),o($:$),o($<$),l($cat$),o($>$),o($=$),...

sentence(31,36,[u($Word$),l($sleeps$),o($:$),o($<$),l($cat$),o($>$),o($=$),...

sentence(37,41,[u($Word$),l($sleep$),o($:$),o($<$),l($cat$),o($>$),o($=$),...

sentence(42,42,[eof]).

The preprocessor removes comments and blanks and merges each sentence, which is terminated by a period and may span several lines, into a single token list. The compiler proper can then concentrate on the essentials.

Page 8: PATR II Compiler

Preprocessor: Main Loop

read_sentences:-
    abolish(cnt/1),
    write('preprocessing...'), nl,
    repeat,
    count(I),
    read_sentence(N,M,S),
    assert(sentence(N,M,S)),
    put(13), tab(3), write(I), write(' sentences preprocessed'),
    S = [eof], !, nl.

read_sentence(N,M,S):-
    retract(line(N,L)),
    read_sentence(L,N,M,S), !.

Backtracking: as long as S = [eof] fails, execution backtracks to repeat and the next sentence is read.
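The slides call count/1 but do not show its definition. A minimal sketch of what it might look like, assuming it keeps a running counter in the dynamic fact cnt/1 (the fact the main loop abolishes at start-up), together with a small failure-driven loop of the same repeat ... test, ! shape. The predicate demo_loop/1 is a hypothetical name used only here.

```prolog
:- dynamic cnt/1.

% count(-I): I is the number of calls since cnt/1 was last cleared.
% Assumed definition; the original course code is not shown in the slides.
count(I) :-
    (   retract(cnt(I0)) -> true ; I0 = 0 ),
    I is I0 + 1,
    assert(cnt(I)).

% demo_loop(+Limit): repeat-driven loop that stops once the counter
% reaches Limit, mirroring the S = [eof], ! exit test of read_sentences/0.
demo_loop(Limit) :-
    retractall(cnt(_)),
    repeat,
    count(I),
    I >= Limit, !.
```

Since count/1 is deterministic, a failing exit test falls straight back to repeat, which is what makes the loop advance.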

Page 9: PATR II Compiler

Preprocessor: Reading a Sentence

read_sentence([eof],N,N,[eof]):- !.          % end of file
read_sentence([o($.$)|_],N,N,[]):- !.        % end of sentence
read_sentence([o($;$)|_],N,M,S):- !,         % skip comment
    N1 is N+1,
    retract(line(N1,L)),                     % next line
    read_sentence(L,N1,M,S).
read_sentence([],N,M,S):- !,                 % end of line
    N1 is N+1,
    retract(line(N1,L)),                     % next line
    read_sentence(L,N1,M,S).
read_sentence([b(_)|T1],N,M,T2):- !,         % skip blanks
    read_sentence(T1,N,M,T2).
read_sentence([H|T1],N,M,[H|T2]):-           % collect tokens
    read_sentence(T1,N,M,T2).
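The same case analysis can be tried out without the line/2 database. The sketch below is a variant written for illustration only: sentence_tokens/3 is a hypothetical name, the token lines are passed as a list of lists, line numbers are dropped, and tokens carry plain atoms instead of the `$...$` strings of the slides so that it loads in a current system such as SWI-Prolog.

```prolog
% sentence_tokens(+Lines, -Sentence, -RestLines)
% Lines is a list of token lines; Sentence collects the tokens up to the
% next period, with comments and blanks removed.
sentence_tokens([[eof]|Ls], [eof], Ls) :- !.           % end of file
sentence_tokens([[o('.')|_]|Ls], [], Ls) :- !.         % end of sentence
sentence_tokens([[o(';')|_]|Ls], S, Rest) :- !,        % skip comment
    sentence_tokens(Ls, S, Rest).
sentence_tokens([[]|Ls], S, Rest) :- !,                % end of line
    sentence_tokens(Ls, S, Rest).
sentence_tokens([[b(_)|Ts]|Ls], S, Rest) :- !,         % skip blanks
    sentence_tokens([Ts|Ls], S, Rest).
sentence_tokens([[T|Ts]|Ls], [T|S], Rest) :-           % collect tokens
    sentence_tokens([Ts|Ls], S, Rest).
```

Calling it with a comment line, an empty line and a two-line sentence returns just the sentence tokens, and the remaining lines can be fed back in for the next sentence.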

Page 10: PATR II Compiler

Compiler: Main Loop

compile_sentences:-
    abolish(cnt/1),
    write('compiling...'), nl,
    retract(sentence(N,M,S)),
    compile_sentence((N,M),C,S,[]),
    assert(C),
    count(I),
    put(13), tab(3), write(I), write(' sentences compiled'),
    S = [eof], !,
    nl.

Backtracking: as long as S = [eof] fails, execution backtracks to retract(sentence(N,M,S)) and the next sentence is compiled.

Page 11: PATR II Compiler

Compiler: Sentence Types

% compile_sentence(Position,Clause,Sentence,Rest)

compile_sentence(_,C) --> [eof], !, {C = finished}.
compile_sentence(_,C) --> syntax_rule(C), !.
compile_sentence(_,C) --> lex_entry(C), !.
compile_sentence(_,C) --> template(C), !.
compile_sentence(P,_,_,_):-
    P = (N,M), nl,
    write(' error in sentence between lines '),
    write(N), write(' and '), write(M), nl,
    fail.

Page 12: PATR II Compiler

Syntax Rules

syntax_rule(C) --> rs('Rule'), !, syntax_rule_cont(C).

syntax_rule_cont((Expansion :: Descr)) -->
    rule_name,
    sr_expansion(Expansion,Sugar),
    rs(:), !,
    sr_path_equations(Equations,Sugar),
    {sr_sugar_cats(Sugar,Equations,Descr)}.

Page 13: PATR II Compiler

Reserved Symbols

rs(=) --> [o($=$)], !.
rs(:) --> [o($:$)], !.
rs(<) --> [o($<$)], !.
rs(>) --> [o($>$)], !.
rs('{') --> [o(${$)], !.
rs('}') --> [o($}$)], !.
rs('Rule') --> [u($Rule$)], !.
rs('Word') --> [u($Word$)], !.
rs('Let') --> [u($Let$)], !.
rs('be') --> [l($be$)], !.
rs('-->') --> [o($-$),o($-$),o($>$)], !.

Alternative: define a separate predicate for each reserved symbol, e.g. colon instead of rs(:).
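The alternative mentioned above could look roughly like this. Apart from colon, which the slide names, the nonterminal names are made up for illustration, and the tokens carry plain atoms instead of `$...$` strings so that the sketch loads in a current system such as SWI-Prolog.

```prolog
% One nonterminal per reserved symbol instead of rs//1.
colon   --> [o(':')].
equals  --> [o('=')].
langle  --> [o('<')].
rangle  --> [o('>')].
arrow   --> [o('-'), o('-'), o('>')].
kw_rule --> [u('Rule')].
kw_word --> [u('Word')].
```

One possible argument for this style: a misspelled nonterminal shows up as an undefined-procedure error, whereas rs//1 called with an unknown symbol simply fails.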

Page 14: PATR II Compiler

Further Terminal Symbols

uatom(A) --> [u(S)], {atom_string(A,S)}.
latom(A) --> [l(S)], {atom_string(A,S)}.
satom(A) --> [s(S)], {atom_string(A,S)}.

int(I) --> [i(I)].

atom(A) --> uatom(A), !.
atom(A) --> latom(A), !.
atom(A) --> satom(A), !.

atomic(A) --> atom(A), !.
atomic(A) --> int(A), !.

Page 15: PATR II Compiler

Rule Names

rule_name --> rs('{'), !,                        % start of rule name
    curley_braces_terminated_string.
rule_name --> [].                                % rule names are optional

curley_braces_terminated_string --> rs('}'), !.  % end of rule name
curley_braces_terminated_string --> [_],         % read any symbol
    curley_braces_terminated_string.

Rule names are skipped and are not carried over into the Prolog representation of the rules.
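A self-contained sketch of this skipping behaviour, for experimenting at the top level. It inlines the brace tokens rather than going through rs//1, shortens the helper name to skip_to_close//0 (a made-up name), and uses plain atoms in the tokens; all three are simplifications assumed here.

```prolog
% rule_name//0 consumes an optional {...} group and keeps nothing of it.
rule_name --> [o('{')], !, skip_to_close.
rule_name --> [].                      % rule names are optional

skip_to_close --> [o('}')], !.         % end of rule name
skip_to_close --> [_], skip_to_close.  % read any symbol
```

With phrase/3 the remainder of the token list shows what is left for the rule expansion, for example phrase(rule_name, [o('{'),l(trivial),o('}'),u('VP')], Rest) leaves Rest = [u('VP')].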

Page 16: PATR II Compiler

Rule Expansion

sr_expansion((LHS ---> RHS),[LSugar|RSugar]) -->
    sr_lhs(LHS,LSugar),
    rs('-->'),
    sr_rhs(RHS,RSugar).

sr_lhs(LHS,Sugar) --> fsd(LHS,Sugar).
sr_rhs(RHS,Sugar) --> ne_fsd_seq(RHS,Sugar).

ne_fsd_seq((FSD,FSDs),[Sugar|Sugars]) -->
    fsd(FSD,Sugar),
    ne_fsd_seq(FSDs,Sugars).
ne_fsd_seq(FSD,[Sugar]) --> fsd(FSD,Sugar).

fsd(Var,(FSD,Var)) --> uatom(FSD).

Page 17: PATR II Compiler

Syntax Rules: Path Equations

sr_path_equations((E,Es),Sugar) -->
    sr_path_equation(E,Sugar),
    sr_path_equations(Es,Sugar).
sr_path_equations(E,Sugar) --> sr_path_equation(E,Sugar).

sr_path_equation((LHS === RHS),Sugar) -->
    sr_path(LHS,Sugar),
    rs(=),
    sr_val(RHS,Sugar).

sr_val(V,Sugar) --> sr_path(V,Sugar).
sr_val(V,_) --> atomic(V).

Page 18: PATR II Compiler

Syntax Rules: Paths

sr_path(Var,Sugar) -->
    rs(<), fsd(FSD), rs(>),
    {member((FSD,Var),Sugar)}, !.
sr_path(Var:P,Sugar) -->
    rs(<), fsd(FSD), ne_feature_seq(P), rs(>),
    {member((FSD,Var),Sugar)}, !.

ne_feature_seq(F) --> feature(F).
ne_feature_seq(F:P) -->
    feature(F),
    ne_feature_seq(P).

fsd(FSD) --> uatom(FSD).
feature(F) --> atomic(F).
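To illustrate how ne_feature_seq//1 turns a feature sequence into a right-nested attr:val path, here is a reduced sketch. It leaves out the feature-structure designator and the Sugar lookup, accepts only lower-case tokens as features, and wraps the sequence in a hypothetical path//1 nonterminal; tokens carry plain atoms instead of `$...$` strings.

```prolog
:- op(510, xfy, (:)).                  % attr:val, as in the directives

feature(F) --> [l(F)].                 % simplified: lower-case tokens only

ne_feature_seq(F)   --> feature(F).
ne_feature_seq(F:P) --> feature(F), ne_feature_seq(P).

% path(-P): parses < f1 f2 ... fn > into f1:f2:...:fn
path(P) --> [o('<')], ne_feature_seq(P), [o('>')].
```

For the tokens of <head subject> the first ne_feature_seq clause consumes head, fails on the closing bracket, and the second clause then builds head:subject.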

Page 19: PATR II Compiler

Syntactic Sugar

sr_sugar_cats([(Cat,Var)|Sugar],Equations,((Var:cat === Cat),Descr)):-
    sr_sugar_cats(Sugar,Equations,Descr).
sr_sugar_cats([],Descr,Descr).

Rule {sentence formation}
  S --> NP VP:
 <S head> = <VP head>
 <VP head subject> = <NP head>.

is shorthand for

Rule {sentence formation}
  X0 --> X1 X2:
 <X0 cat> = S
 <X1 cat> = NP
 <X2 cat> = VP
 <X0 head> = <X2 head>
 <X2 head subject> = <X1 head>.
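The desugaring step can be run in isolation. The sketch below repeats sr_sugar_cats/3 from the slide and feeds it the Sugar list and the equations that the sentence-formation rule would produce; demo_sugar/1 is a hypothetical driver, and the Sugar list is written out by hand here instead of being produced by sr_expansion.

```prolog
:- op(510, xfy, (:)).       % attr:val
:- op(600, xfx, (===)).     % path equation

% Prefix one Var:cat === Cat equation per category to the rule's equations.
sr_sugar_cats([(Cat,Var)|Sugar], Equations, ((Var:cat === Cat), Descr)) :-
    sr_sugar_cats(Sugar, Equations, Descr).
sr_sugar_cats([], Descr, Descr).

% demo_sugar(-Descr): description for  S --> NP VP  (hand-built input)
demo_sugar(Descr) :-
    Sugar = [('S',X0), ('NP',X1), ('VP',X2)],
    Equations = (X0:head === X2:head, X2:head:subject === X1:head),
    sr_sugar_cats(Sugar, Equations, Descr).
```

The resulting Descr has the three cat equations in front of the two head equations, which is the order shown on the compiler-output slide.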

Page 20: PATR II Compiler

Lexical Entries

lex_entry(C) --> rs('Word'), !, lex_entry_cont(C).

lex_entry_cont((FS ---> L :: Descr)) -->
    lexeme(L),
    rs(:), !,
    lex_definition(FS, Descr).

lexeme(L) --> atom(L).

Page 21: PATR II Compiler

Lexicon: Feature Structures

lex_definition(FS,(LDef,LDefs)) -->
    lexdef(FS,LDef),
    lex_definition(FS,LDefs).
lex_definition(FS,LDef) --> lexdef(FS,LDef).

lexdef(FS,LDef) --> template_name(FS,LDef), !.
lexdef(FS,LDef) --> lex_path_equation(FS,LDef), !.

Page 22: PATR II Compiler

Lexicon: Path Equations

lex_path_equation(FS, (LHS === RHS)) -->
    lex_path(FS, LHS),
    rs(=), !,
    lex_val(FS, RHS).

lex_path(FS,FS:P) --> rs(<), ne_feature_seq(P), rs(>), !.

lex_val(FS,V) --> lex_path(FS,V).
lex_val(_,V) --> atomic(V).

Page 23: PATR II Compiler

Templates

template(C) --> rs('Let'), !, template_cont(C).

template_cont((N :- TDef)) -->
    template_name(FS,N),
    rs('be'),
    template_definition(FS,TDef),
    {assert(template(N))}.

Page 24: PATR II Compiler

Templates: Head & Body

template_name(FS,N) -->
    atom(A),
    {N =.. [A,FS]}.

template_definition(FS,TDef) -->
    lex_definition(FS,TDef).
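The univ step in template_name can be tried on its own. In this sketch tname//1 is a made-up stand-in for atom//1 from the terminal-symbols slide, the tokens carry plain atoms instead of `$...$` strings, and the template name 'ThirdSing' is only an example, not taken from the course grammar.

```prolog
% tname(-A): a template name token, upper or lower case (simplified).
tname(A) --> [u(A)].
tname(A) --> [l(A)].

% template_name(?FS, -N): N is the unary term Name(FS), later used as the
% head of the template clause and as a call inside lexical entries.
template_name(FS, N) --> tname(A), { N =.. [A, FS] }.
```

For the token u('ThirdSing') this yields N = 'ThirdSing'(FS), so the feature-structure variable of the surrounding entry is threaded into the template call.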

Page 25: PATR II Compiler

Clearing a Grammar

clear_templates:-
    template(T),
    T =.. [F,_],
    abolish(F/1),
    fail.
clear_templates:- abolish(template/1).

clear_grammar:-
    abolish('::'/2),
    abolish(line/2),
    abolish(sentence/3),
    clear_templates.

Page 26: PATR II Compiler

Compiler Output

A ---> B , C ::
    A : cat === 'S',
    B : cat === 'NP',
    C : cat === 'VP',
    A : head === C : head,
    C : head : subject === B : head.

A ---> uther ::
    A : cat === 'NP',
    A : head : agreement : gender === masculine,
    A : head : agreement : person === third,
    A : head : agreement : number === singular.

Page 28: PATR II Compiler

Open Problems and Extensions

• Syntactic sugar of the form VP_1 VP_2 X

• Lexical rules

• Templates in syntax rules

• Negation and disjunction

• Default inheritance (priority union)

• ...

Page 29: PATR II Compiler

References

• Shieber, Stuart (1986): An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes.

• Gazdar, Gerald & Chris Mellish (1989): Natural Language Processing in Prolog. Addison Wesley.

• Covington, Michael A. (1994): Natural Language Processing for Prolog Programmers. Chap. 6: Parsing Algorithms. Prentice-Hall.