Student: Alexandru Iliescu
A unification-based syntactic parser
PATR
What is “parsing”?
Parsing, or syntactic analysis, is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from the Latin pars, meaning part (of speech).
The term has slightly different meanings in different branches of linguistics and computer science. Traditional sentence parsing is often performed as a pedagogical exercise, especially in inflected languages such as the Romance languages or Latin, sometimes with the aid of devices such as sentence diagrams. It usually emphasizes the importance of grammatical divisions such as subject and predicate.
Parsing a computer language involves two levels of grammar: lexical and syntactic.
The first stage is token generation, or lexical analysis, by which the input character stream is split into meaningful symbols defined by a grammar of regular expressions.
For example, a calculator program would look at an input such as "12*(3+4)^2" and split it into the tokens 12, *, (, 3, +, 4, ), ^, 2, each of which is a meaningful symbol in the context of an arithmetic expression.
The next stage is parsing, or syntactic analysis, which checks that the tokens form an allowable expression.
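The two stages can be sketched in Python as follows. This is only an illustration: the token pattern, the function names tokenize and is_expression, and the small expression grammar (with ^ binding tighter than * and /, which bind tighter than + and -) are assumptions, not something the calculator example prescribes.

```python
import re

# Assumed token grammar: unsigned integers, operators, and parentheses.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/^()])")


def tokenize(text):
    """Lexical analysis: split the character stream into tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_expression(tokens):
    """Syntactic analysis: check that the tokens form an allowable expression."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def expr():  # expr -> term (('+' | '-') term)*
        term()
        while peek() in ("+", "-"):
            advance()
            term()

    def term():  # term -> factor (('*' | '/') factor)*
        factor()
        while peek() in ("*", "/"):
            advance()
            factor()

    def factor():  # factor -> base ('^' factor)?
        base()
        if peek() == "^":
            advance()
            factor()

    def base():  # base -> NUMBER | '(' expr ')'
        tok = peek()
        if tok is not None and tok.isdigit():
            advance()
        elif tok == "(":
            advance()
            expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    try:
        expr()
    except SyntaxError:
        return False
    return pos == len(tokens)  # every token must have been consumed


tokens = tokenize("12*(3+4)^2")
print(tokens)  # ['12', '*', '(', '3', '+', '4', ')', '^', '2']
print(is_expression(tokens))
```

The checker only accepts or rejects the token sequence; it does not build a tree or compute a value.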
D-PATR PC-PATR
D-PATR
D-PATR is a development environment for unification-based grammars on Xerox 1100 series workstations.
The first version of D-PATR was written at the Scandinavian Summer Workshop for Computational Linguistics in Helsinki, Finland, in 1985.
The D-PATR formalism is suitable for encoding a wide variety of grammars.
D-PATR consists of four basic parts:
A unification package;
An interpreter for rules and lexical items;
Input/output routines for directed graphs;
An Earley-style chart parser.
D-PATR
Parsing and Unification
[Diagram: x and y are unified into z, z is copied to z’, and then x and y are restored.]
The method entails making only one copy, not two, when the operation succeeds. In the event of failure, D-PATR simply restores the original structures without copying anything.
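The copy-on-success strategy described above can be sketched as follows. This is a minimal Python illustration, not D-PATR's actual code: the names Node, build, unify, and to_dict are hypothetical, and the undo trail is just one assumed way of restoring the original structures.

```python
class Node:
    """A feature-structure node: atomic if `value` is set, otherwise a set of arcs."""

    def __init__(self, value=None, arcs=None):
        self.value = value
        self.arcs = dict(arcs or {})
        self.forward = None  # set while this node is merged into another


def build(data):
    """Build a node graph from nested dicts (atoms are any non-dict values)."""
    if isinstance(data, dict):
        return Node(arcs={k: build(v) for k, v in data.items()})
    return Node(value=data)


def deref(node):
    while node.forward is not None:
        node = node.forward
    return node


def _unify(a, b, trail):
    """Destructively merge a into b, logging every change on the trail."""
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if a.value is not None and b.value is not None:
        if a.value != b.value:
            return False              # conflicting atoms
    elif a.value is not None:
        if b.arcs:
            return False              # atom against complex structure
        trail.append((b, "value", None))
        b.value = a.value
    elif b.value is not None and a.arcs:
        return False                  # complex structure against atom
    trail.append((a, "forward", None))
    a.forward = b
    for label, child in a.arcs.items():
        if label in b.arcs:
            if not _unify(child, b.arcs[label], trail):
                return False
        else:
            trail.append((b, "arcs", label))
            b.arcs[label] = child
    return True


def _undo(trail):
    """Revert logged changes in reverse order, restoring the originals."""
    while trail:
        node, field, label = trail.pop()
        if field == "arcs":
            del node.arcs[label]
        else:
            setattr(node, field, None)


def _copy(node):
    node = deref(node)
    return Node(node.value, {k: _copy(v) for k, v in node.arcs.items()})


def unify(x, y):
    """Return a fresh copy of the unified structure, or None on failure."""
    trail = []
    ok = _unify(x, y, trail)
    result = _copy(y) if ok else None  # the single copy, made only on success
    _undo(trail)                       # x and y are restored in both cases
    return result


def to_dict(node):
    node = deref(node)
    if node.value is not None:
        return node.value
    return {k: to_dict(v) for k, v in node.arcs.items()}


x = build({"cat": "NP", "agr": {"num": "sg"}})
y = build({"agr": {"per": "3"}})
z = unify(x, y)
print(to_dict(z))
print(to_dict(x), to_dict(y))  # both inputs are unchanged afterwards
```

For brevity, the copy step in this sketch does not preserve structure sharing between arcs or handle cyclic graphs.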
D-PATR
Rules
A rule in D-PATR is a list of atomic constituent labels that may be followed by specifications.
Example of a rule:
S -> NP VP
In D-PATR notation, this is written as
(S NP VP)
Before a rule is used by the parser, D-PATR compiles it to a feature set. A feature set can be displayed in different ways, for example as a matrix or as a directed graph.
D-PATR
Lexical Rules
A lexical rule is a special kind of template with two attributes: in and out.
In applying a lexical rule to a graph, the latter is first unified with the value of in. If the operation succeeds, the value of out is passed on as the result.
D-PATR
D-PATR is not a commercial product. It is made available to users outside SRI who might wish to develop unification-based grammars.
PC-PATR
PC-PATR is an implementation of the PATR-II computational linguistic formalism for personal computers, available for MS-DOS, Microsoft Windows, Macintosh, and Unix, and is still under development.
PC-PATR has the following parts:
A chart parser;
A unification package;
An interpreter for grammar and lexical rules.
PC-PATR uses a left corner chart parser with these characteristics:
bottom-up parse with top-down filtering based on the categories;
left-to-right order, after each word is added to the chart.
PC-PATR
Unification
Unification is the basic operation applied to feature structures in PC-PATR. It consists of merging the information from two feature structures. Two feature structures can unify if their common features have the same values, but do not unify if any feature values conflict.
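As a rough illustration of this definition, the sketch below models feature structures as nested Python dictionaries. The function name unify_fs and the dictionary representation are assumptions made for the example; PC-PATR's own data structures are not shown in the source, and this simplified version has no notion of values shared between features.

```python
def unify_fs(fs1, fs2):
    """Merge two feature structures; return None if any feature values conflict."""
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)  # work on a shallow copy so the inputs stay untouched
        for feature, value in fs2.items():
            if feature in result:
                merged = unify_fs(result[feature], value)
                if merged is None:
                    return None  # a common feature has conflicting values
                result[feature] = merged
            else:
                result[feature] = value  # a feature found only in fs2 is added
        return result
    # Atomic values unify only when they are identical.
    return fs1 if fs1 == fs2 else None


noun = {"cat": "N", "agr": {"num": "sg"}}
third_person = {"agr": {"per": "3"}}
plural = {"agr": {"num": "pl"}}

print(unify_fs(noun, third_person))  # the information from both is merged
print(unify_fs(noun, plural))        # None: the values of num conflict
```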
PC-PATR
Grammar rules
A PC-PATR grammar rule has these parts, in the order listed:
1. the keyword Rule;
2. an optional rule identifier enclosed in braces ({});
3. the nonterminal symbol to be expanded;
4. an arrow (->) or equal sign (=);
5. zero or more terminal or nonterminal symbols;
6. an optional colon (:);
7. zero or more feature constraints;
8. an optional period (.).
The optional rule identifier consists of one or more words enclosed in braces.
For example, this rule says that any category in the grammar rules can be replaced by two copies of the same category separated by a CJ.
Rule X -> X_1 CJ X_2
    <X cat> = <X_1 cat>
    <X cat> = <X_2 cat>
    <X arg1> = <X_1 arg1>
    <X arg1> = <X_2 arg1>
PC-PATR
Lexical rules
A PC-PATR lexical rule has these parts, in the order listed:
1. the keyword Define;
2. the name of the lexical rule;
3. the keyword as;
4. the rule definition;
5. an optional period (.).
PC-PATR
Several people have contributed to the development of PC-PATR over the past few years. Alan Buseman, Jim Skon, Bob Kasper, and Nathan Miles all contributed to an earlier program named SILPATR that contained the same basic parsing and unification functions.
Bibliography:
D-PATR: A Development Environment for Unification-Based Grammars, Lauri Karttunen;
PC-PATR Reference Manual, Stephen McConnel;
Internet.