PATR II Compiler

Prolog Advanced Course, Summer Semester 2000

Heinrich-Heine-Universität Düsseldorf

Christof Rumpf

22.05.2000 PATR II Compiler 2

Notation Conventions

• Instantiation mode of arguments
  – Blue: input arguments
  – Red: output arguments

• Cut
  – red cut !
  – green cut !

• Predicate definitions
  – complete
  – to be continued

Directives

% external resources

:- [tokenize].              % load tokenizer

% operators

:- op(510, xfy, : ).        % attr:val
:- op(600, xfx, ===).       % path equation
:- op(1100,xfx,'--->').     % syntax rule, lexical entry
:- op(1200,xfx,'::').       % description annotation
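To see what these priorities achieve, the following sketch declares the same operators and inspects how a rule term is grouped. It assumes an ISO-style system such as SWI-Prolog, which the slides do not name. The helper ops_demo/2 is made up for illustration.

```prolog
% Same operator declarations as on the slide.
:- op(510,  xfy, :).
:- op(600,  xfx, ===).
:- op(1100, xfx, '--->').
:- op(1200, xfx, '::').

% ops_demo(-PathTail, -TopFunctor): hypothetical helper, not from the slides.
ops_demo(PathTail, TopFunctor) :-
    Path = a:b:c,
    Path = (_ : PathTail),                 % xfy: a:b:c is read as a:(b:c)
    Rule = (x ---> y, z :: x:cat === s),
    functor(Rule, TopFunctor, 2).          % '::' binds weakest, so it is on top
```

Because '::' has the highest priority (1200), the whole rule ends up as a single '::'/2 term whose left argument is the '--->' expansion and whose right argument is the description.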

Three Compiler Components

• Tokenizer
  – Input: PATR II grammar
  – Output: token lines

• Preprocessor
  – Input: token lines
  – Output: token sentences

• Syntax compiler
  – Input: token sentences
  – Output: Prolog clauses

compile_grammar(File):-
    clear_grammar,
    tokenize_file(File),
    read_sentences,
    compile_sentences.

Tokenizer Input

; Shieb1.ptr
; Sample grammar one from Shieber 1986


; Grammar Rules
; ------------------------------------------------------------

Rule {sentence formation}
  S --> NP VP:
 <S head> = <VP head>
 <VP head subject> = <NP head>.

Rule {trivial verb phrase}
  VP --> V:
 <VP head> = <V head>.

; Lexicon
; ----------------------------------------------------------------

Word uther:
 <cat> = NP
 <head agreement gender> = masculine
 <head agreement person> = third
 <head agreement number> = singular.

Tokenizer Output = Preprocessor Input

line(1,[o($;$),b(1),u($Shieb1$),o($.$),l($ptr$)]).
line(2,[o($;$),b(1),u($Sample$),b(1),l($grammar$),b(1),l($one$),b(1),l($from$),b(1), ...
line(3,[ ]).
line(4,[ ]).
line(5,[o($;$),b(1),u($Grammar$),b(1),u($Rules$)]).
line(6,[o($;$),b(1),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$),o($-$), ...
line(7,[ ]).
line(8,[u($Rule$),b(1),o(${$),l($sentence$),b(1),l($formation$),o($}$)]).
line(9,[b(2),u($S$),b(1),o($-$),o($-$),o($>$),b(1),u($NP$),b(1),u($VP$),o($:$)]).
line(10,[b(1),o($<$),u($S$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($VP$),b(1), ...
line(11,[b(1),o($<$),u($VP$),b(1),l($head$),b(1),l($subject$),o($>$),b(1),o($=$),b(1), ...
line(12,[b(1)]).
line(13,[u($Rule$),b(1),o(${$),l($trivial$),b(1),l($verb$),b(1),l($phrase$),o($}$)]).
line(14,[b(2),u($VP$),b(1),o($-$),o($-$),o($>$),b(1),u($V$),o($:$)]).
line(15,[b(1),o($<$),u($VP$),b(1),l($head$),o($>$),b(1),o($=$),b(1),o($<$),u($V$),b(1),...
...
line(41,[b(1),o($<$),l($head$),b(1),l($subject$),b(1),l($agreement$),b(1),l($number$),...
line(42,[eof]).
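The tokenizer itself is loaded from a separate file and is not shown on the slides. As a rough sketch of how token lists of this shape could be produced, the following DCG classifies the characters of one line into u/1 (word starting with an upper-case letter), l/1 (word starting with a lower-case letter), b/1 (run of blanks) and o/1 (any other single character). It assumes SWI-Prolog, uses atoms instead of the $...$ strings, and ignores integer tokens. All predicate names are invented for this example.

```prolog
% tokenize_line(+Atom, -Tokens): hypothetical stand-in for the real tokenizer.
tokenize_line(Line, Tokens) :-
    atom_codes(Line, Codes),
    phrase(tokens(Tokens), Codes).

tokens([T|Ts]) --> token(T), !, tokens(Ts).
tokens([])     --> [].

token(b(N)) --> blanks1(N).
token(u(A)) --> [C], { code_type(C, upper) }, word_rest(Cs),
                { atom_codes(A, [C|Cs]) }.
token(l(A)) --> [C], { code_type(C, lower) }, word_rest(Cs),
                { atom_codes(A, [C|Cs]) }.
token(o(A)) --> [C], { atom_codes(A, [C]) }.   % any other single character

% One or more blanks (character code 32), counted.
blanks1(N) --> [32], blanks0(N0), { N is N0 + 1 }.
blanks0(N) --> [32], !, blanks0(N0), { N is N0 + 1 }.
blanks0(0) --> [].

% Remaining letters and digits of a word.
word_rest([C|Cs]) --> [C], { code_type(C, alnum) }, !, word_rest(Cs).
word_rest([])     --> [].
```

Called on the atom 'VP --> V:', this yields a list with the same token pattern as line(14) above, apart from the leading blanks.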

Preprocessor Output = Compiler Input

sentence( 1,11,[u($Rule$),o(${$),l($sentence$),l($formation$),o($}$),...

sentence(12,15,[u($Rule$),o(${$),l($trivial$),l($verb$),l($phrase$),o($}$),...

sentence(16,24,[u($Word$),l($uther$),o($:$),o($<$),l($cat$),o($>$),o($=$),...

sentence(25,30,[u($Word$),l($knights$),o($:$),o($<$),l($cat$),o($>$),o($=$),...

sentence(31,36,[u($Word$),l($sleeps$),o($:$),o($<$),l($cat$),o($>$),o($=$),...

sentence(37,41,[u($Word$),l($sleep$),o($:$),o($<$),l($cat$),o($>$),o($=$),...

sentence(42,42,[eof]).

The preprocessor removes comments and blanks and joins sentences that are terminated by a period and spread over several lines. The compiler proper can then concentrate on the essentials.
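The preprocessing step can be sketched without the database, as a pure relation over a list of token lines. This is only an illustration under simplifying assumptions: tokens are written with atoms, such as o('.'), instead of $...$ strings, line numbers and the eof token are left out, and the names sentences/2 and sentence/3 are invented here.

```prolog
% sentences(+Lines, -Sentences): Lines is a list of token lists, one per line.
sentences([], []).
sentences(Lines, Ss) :-
    Lines = [_|_],
    sentence(Lines, S, Rest),
    (   S == [], Rest == []
    ->  Ss = []                      % only comments or blank lines were left
    ;   Ss = [S|Ss1],
        sentences(Rest, Ss1)
    ).

% sentence(+Lines, -Tokens, -RemainingLines)
sentence([], [], []).                                        % input exhausted
sentence([[o('.')|_]|Ls], [], Ls) :- !.                      % period ends the sentence
sentence([[o(';')|_]|Ls], S, R) :- !, sentence(Ls, S, R).    % comment: skip rest of line
sentence([[]|Ls], S, R) :- !, sentence(Ls, S, R).            % end of line: continue
sentence([[b(_)|T]|Ls], S, R) :- !, sentence([T|Ls], S, R).  % drop blanks
sentence([[H|T]|Ls], [H|S], R) :- sentence([T|Ls], S, R).    % keep any other token
```

The clause order mirrors the order of the cases in the slides' read_sentence/4: the period and comment tests come before the general "collect token" case.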

Preprocessor: Main Loop

read_sentences:-
    abolish(cnt/1),
    write('preprocessing...'), nl,
    repeat,
    count(I),
    read_sentence(N,M,S),
    assert(sentence(N,M,S)),
    put(13), tab(3), write(I), write(' sentences preprocessed'),
    S = [eof], !, nl.

read_sentence(N,M,S):-
    retract(line(N,L)),
    read_sentence(L,N,M,S), !.

Backtracking: as long as S = [eof] fails, execution backtracks to repeat and the next sentence is read.
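The loop relies on count/1, which the slides use but do not define. A minimal sketch of such a counter, together with a repeat loop of the same shape, might look as follows. The definition of count/1 is an assumption, count_to/2 is invented for the demonstration, and retractall/1 is used for the reset instead of abolish/1.

```prolog
:- dynamic cnt/1.

% count(-I): assumed definition; returns 1, 2, 3, ... on successive calls.
count(I) :-
    (   retract(cnt(I0)) -> true ; I0 = 0 ),
    I is I0 + 1,
    assertz(cnt(I)).

% count_to(+Limit, -Last): repeat loop in the style of read_sentences/0.
count_to(Limit, Last) :-
    retractall(cnt(_)),      % reset the counter
    repeat,
    count(I),
    I >= Limit, !,           % failure here backtracks to repeat
    Last = I.
```

As in read_sentences/0, the side effects (here the counter update) survive backtracking, and the cut after the exit test removes the choice point left by repeat.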

Preprocessor: Reading a Sentence

read_sentence([eof],N,N,[eof]):- !.          % end of file
read_sentence([o($.$)|_],N,N,[]):- !.        % end of sentence
read_sentence([o($;$)|_],N,M,S):- !,         % skip comment
    N1 is N+1,
    retract(line(N1,L)),                     % next line
    read_sentence(L,N1,M,S).
read_sentence([],N,M,S):- !,                 % end of line
    N1 is N+1,
    retract(line(N1,L)),                     % next line
    read_sentence(L,N1,M,S).
read_sentence([b(_)|T1],N,M,T2):- !,         % skip blanks
    read_sentence(T1,N,M,T2).
read_sentence([H|T1],N,M,[H|T2]):-           % collect tokens
    read_sentence(T1,N,M,T2).

Compiler: Main Loop

compile_sentences:-
    abolish(cnt/1),
    write('compiling...'), nl,
    retract(sentence(N,M,S)),
    compile_sentence((N,M),C,S,[]),
    assert(C),
    count(I),
    put(13), tab(3), write(I), write(' sentences compiled'),
    S = [eof], !,
    nl.

Backtracking: as long as S = [eof] fails, execution backtracks into retract/1, which fetches the next sentence.
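Unlike the preprocessor, this loop needs no repeat: retract/1 itself is the choice point. The following sketch isolates that pattern with invented predicates item/1, done/1, process_items/0 and demo_items/1. It is not the compiler, only the control structure.

```prolog
:- dynamic item/1, done/1.

% process_items/0: hypothetical stand-in for compile_sentences/0.
process_items :-
    retract(item(X)),     % on backtracking, retract/1 yields the next item/1 fact
    assertz(done(X)),     % stands in for compiling and asserting a clause
    X == stop,            % fails for ordinary items, forcing backtracking
    !.

% demo_items(-Done): load three items, run the loop, collect what was processed.
demo_items(Done) :-
    retractall(item(_)),
    retractall(done(_)),
    assertz(item(a)), assertz(item(b)), assertz(item(stop)),
    process_items,
    findall(X, done(X), Done).
```

One consequence of this design is that the loop fails if the terminating fact is missing: once every item/1 fact has been retracted, retract/1 has no more solutions. The same holds for compile_sentences/0 when no [eof] sentence is present.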

Compiler: Sentence Types

% compile_sentence(Position,Clause,Sentence,Rest)

compile_sentence(_,C) --> [eof], !, {C = finished}.
compile_sentence(_,C) --> syntax_rule(C), !.
compile_sentence(_,C) --> lex_entry(C), !.
compile_sentence(_,C) --> template(C), !.
compile_sentence(P,_,_,_):-
    P = (N,M), nl,
    write(' error in sentence between lines '),
    write(N), write(' and '), write(M), nl,
    fail.

Syntax Rules

syntax_rule(C) --> rs('Rule'), !, syntax_rule_cont(C).

syntax_rule_cont((Expansion :: Descr)) -->
    rule_name,
    sr_expansion(Expansion,Sugar),
    rs(:), !,
    sr_path_equations(Equations,Sugar),
    {sr_sugar_cats(Sugar,Equations,Descr)}.

Reserved Symbols

rs(=)      --> [o($=$)], !.
rs(:)      --> [o($:$)], !.
rs(<)      --> [o($<$)], !.
rs(>)      --> [o($>$)], !.
rs('{')    --> [o(${$)], !.
rs('}')    --> [o($}$)], !.
rs('Rule') --> [u($Rule$)], !.
rs('Word') --> [u($Word$)], !.
rs('Let')  --> [u($Let$)], !.
rs('be')   --> [l($be$)], !.
rs('-->')  --> [o($-$),o($-$),o($>$)], !.

Alternative: define a separate predicate for each reserved symbol, e.g. colon instead of rs(:).
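The alternative mentioned above could look like the following sketch. Only the name colon comes from the slide; arrow and kw_rule are invented here, and tokens are written with quoted atoms rather than $...$ strings.

```prolog
% One nonterminal per reserved symbol instead of a single rs//1.
colon   --> [o(':')], !.
arrow   --> [o('-'), o('-'), o('>')], !.
kw_rule --> [u('Rule')], !.
```

A query such as phrase(colon, Tokens, Rest) then consumes a leading colon token. The trade-off is more predicate names in exchange for grammar rules that do not need quoted symbol arguments.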

Further Terminal Symbols

uatom(A) --> [u(S)], {atom_string(A,S)}.
latom(A) --> [l(S)], {atom_string(A,S)}.
satom(A) --> [s(S)], {atom_string(A,S)}.

int(I) --> [i(I)].

atom(A) --> uatom(A), !.
atom(A) --> latom(A), !.
atom(A) --> satom(A), !.

atomic(A) --> atom(A), !.
atomic(A) --> int(A), !.

Rule Names

rule_name --> rs('{'), !,                          % start of rule name
    curley_braces_terminated_string.
rule_name --> [].                                  % rule names are optional

curley_braces_terminated_string --> rs('}'), !.    % end of rule name
curley_braces_terminated_string --> [_],           % read any symbol
    curley_braces_terminated_string.

Rule names are skipped and are not carried over into the Prolog representation of the rules.

Rule Expansion

sr_expansion((LHS ---> RHS),[LSugar|RSugar]) -->
    sr_lhs(LHS,LSugar),
    rs('-->'),
    sr_rhs(RHS,RSugar).

sr_lhs(LHS,Sugar) --> fsd(LHS,Sugar).
sr_rhs(RHS,Sugar) --> ne_fsd_seq(RHS,Sugar).

ne_fsd_seq((FSD,FSDs),[Sugar|Sugars]) -->
    fsd(FSD,Sugar),
    ne_fsd_seq(FSDs,Sugars).
ne_fsd_seq(FSD,[Sugar]) --> fsd(FSD,Sugar).

fsd(Var,(FSD,Var)) --> uatom(FSD).
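The effect of these rules can be tried out with a simplified, self-contained variant. It is a sketch under assumptions: tokens carry atoms directly (u('S') instead of u($S$)), the arrow is matched inline instead of through rs//1, and sr_lhs//2 and sr_rhs//2 are folded into sr_expansion//2.

```prolog
:- op(1100, xfx, '--->').

% Each category name is replaced by a fresh variable. The pair (Cat,Var)
% is remembered in the sugar list so the category can be restored later.
fsd(Var, (Cat,Var)) --> [u(Cat)].

ne_fsd_seq((FSD,FSDs), [S|Ss]) --> fsd(FSD, S), ne_fsd_seq(FSDs, Ss).
ne_fsd_seq(FSD, [S])           --> fsd(FSD, S).

sr_expansion((LHS ---> RHS), [LS|RS]) -->
    fsd(LHS, LS),
    [o('-'), o('-'), o('>')],
    ne_fsd_seq(RHS, RS).
```

For the tokens of S --> NP VP, phrase/2 returns an expansion of the form A ---> B, C together with the sugar list [('S',A),('NP',B),('VP',C)].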

Syntax Rules: Path Equations

sr_path_equations((E,Es),Sugar) -->
    sr_path_equation(E,Sugar),
    sr_path_equations(Es,Sugar).
sr_path_equations(E,Sugar) --> sr_path_equation(E,Sugar).

sr_path_equation((LHS === RHS),Sugar) -->
    sr_path(LHS,Sugar),
    rs(=),
    sr_val(RHS,Sugar).

sr_val(V,Sugar) --> sr_path(V,Sugar).
sr_val(V,_) --> atomic(V).

Syntax Rules: Paths

sr_path(Var,Sugar) -->
    rs(<), fsd(FSD), rs(>),
    {member((FSD,Var),Sugar)}, !.
sr_path(Var:P,Sugar) -->
    rs(<), fsd(FSD), ne_feature_seq(P), rs(>),
    {member((FSD,Var),Sugar)}, !.

ne_feature_seq(F) --> feature(F).
ne_feature_seq(F:P) -->
    feature(F),
    ne_feature_seq(P).

fsd(FSD) --> uatom(FSD).
feature(F) --> atomic(F).

Syntactic Sugar

sr_sugar_cats([(Cat,Var)|Sugar],Equations,((Var:cat === Cat),Descr)):-
    sr_sugar_cats(Sugar,Equations,Descr).
sr_sugar_cats([],Descr,Descr).

Rule {sentence formation}
  S --> NP VP:
 <S head> = <VP head>
 <VP head subject> = <NP head>.

is syntactic sugar for:

Rule {sentence formation}
  X0 --> X1 X2:
 <X0 cat> = S
 <X1 cat> = NP
 <X2 cat> = VP
 <X0 head> = <X2 head>
 <X2 head subject> = <X1 head>.
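The desugaring step can be run in isolation. The sketch below copies sr_sugar_cats/3 from the slide and feeds it the sugar list and equations that the sentence formation rule would produce. The wrapper sugar_demo/1 is invented for the example, and the operator declarations are repeated so that the block is self-contained.

```prolog
:- op(510, xfy, :).
:- op(600, xfx, ===).

% Copied from the slide: prepend one cat equation per sugared category.
sr_sugar_cats([(Cat,Var)|Sugar], Equations, ((Var:cat === Cat), Descr)) :-
    sr_sugar_cats(Sugar, Equations, Descr).
sr_sugar_cats([], Descr, Descr).

% sugar_demo(-Descr): hypothetical driver for the rule S --> NP VP.
sugar_demo(Descr) :-
    Sugar = [('S',X0), ('NP',X1), ('VP',X2)],
    Equations = ((X0:head === X2:head), (X2:head:subject === X1:head)),
    sr_sugar_cats(Sugar, Equations, Descr).
```

The resulting description starts with the three cat equations for X0, X1 and X2, followed by the two original path equations. This is the second form of the rule shown above.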

Lexical Entries

lex_entry(C) --> rs('Word'), !, lex_entry_cont(C).

lex_entry_cont((FS ---> L :: Descr)) -->
    lexeme(L),
    rs(:), !,
    lex_definition(FS, Descr).

lexeme(L) --> atom(L).

Lexicon: Feature Structures

lex_definition(FS,(LDef,LDefs)) -->
    lexdef(FS,LDef),
    lex_definition(FS,LDefs).
lex_definition(FS,LDef) --> lexdef(FS,LDef).

lexdef(FS,LDef) --> template_name(FS,LDef), !.
lexdef(FS,LDef) --> lex_path_equation(FS,LDef), !.

Lexicon: Path Equations

lex_path_equation(FS, (LHS === RHS)) -->
    lex_path(FS, LHS),
    rs(=), !,
    lex_val(FS, RHS).

lex_path(FS,FS:P) --> rs(<), ne_feature_seq(P), rs(>), !.

lex_val(FS,V) --> lex_path(FS,V).
lex_val(_,V) --> atomic(V).
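As a hedged illustration of how a lexical path equation becomes a term, here is a reduced version of these rules that runs on its own. It assumes tokens that carry atoms directly (l(head) instead of l($head$)), matches the reserved symbols inline, accepts only lower-case atoms as features and values, and omits the cuts.

```prolog
:- op(510, xfy, :).
:- op(600, xfx, ===).

feature(F) --> [l(F)].

% Non-empty feature sequence, built as a right-nested F1:F2:...:Fn term.
ne_feature_seq(F:P) --> feature(F), ne_feature_seq(P).
ne_feature_seq(F)   --> feature(F).

% In the lexicon every path is rooted in the entry's feature structure FS.
lex_path(FS, FS:P) --> [o('<')], ne_feature_seq(P), [o('>')].

lex_path_equation(FS, (LHS === RHS)) -->
    lex_path(FS, LHS), [o('=')], lex_val(FS, RHS).

lex_val(FS, V) --> lex_path(FS, V).
lex_val(_, V)  --> [l(V)].
```

Parsing the tokens of <head agreement gender> = masculine with lex_path_equation(FS, E) binds E to FS:head:agreement:gender === masculine, which matches the corresponding line of the compiled entry for uther.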

Templates

template(C) --> rs('Let'), !, template_cont(C).

template_cont((N :- TDef)) -->
    template_name(FS,N),
    rs('be'),
    template_definition(FS,TDef),
    {assert(template(N))}.

Templates: Head & Body

template_name(FS,N) -->
    atom(A),
    {N =.. [A,FS]}.

template_definition(FS,TDef) -->
    lex_definition(FS,TDef).
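The head of a compiled template is built with =.. (univ) from the template's name and the feature structure variable, and the body shares that variable. The following sketch shows the shape of such a clause for a made-up template. The helper template_clause/2, the name third_sing and the single equation in the body are assumptions for illustration only.

```prolog
:- op(510, xfy, :).
:- op(600, xfx, ===).

% template_clause(+Name, -Clause): hypothetical helper, not from the slides.
template_clause(Name, (Head :- Body)) :-
    Head =.. [Name, FS],                           % e.g. third_sing(FS)
    Body = (FS:head:agreement:person === third).   % body refers to the same FS
```

Because head and body share FS, calling the template goal with a feature structure would apply the body's equations to that structure.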

Clearing a Grammar

clear_templates:-
    template(T),
    T =.. [F,_],
    abolish(F/1),
    fail.
clear_templates:- abolish(template/1).

clear_grammar:-
    abolish('::'/2),
    abolish(line/2),
    abolish(sentence/3),
    clear_templates.

Compiler Output

A ---> B , C ::
    A : cat === 'S',
    B : cat === 'NP',
    C : cat === 'VP',
    A : head === C : head,
    C : head : subject === B : head.

A ---> uther ::
    A : cat === 'NP',
    A : head : agreement : gender === masculine,
    A : head : agreement : person === third,
    A : head : agreement : number === singular.

Open Problems and Extensions

• Syntactic sugar of the form VP_1 VP_2 X

• Lexical rules

• Templates in syntax rules

• Negation and disjunction

• Default inheritance (priority union)

• ...

References

• Shieber, Stuart (1986): An Introduction to Unification-based Approaches to Grammar. CSLI Lecture Notes.

• Gazdar, Gerald & Chris Mellish (1989): Natural Language Processing in Prolog. Addison Wesley.

• Covington, Michael A. (1994): Natural Language Processing for Prolog Programmers. Chap. 6: Parsing Algorithms. Prentice-Hall.
