

Lecture 7 -- page 1 of 7

Augmented grammars

In this lecture I will give a brief introduction to a variety of recent grammar formalisms which extend the basic parsing ideas, and hand out some of the original descriptions.

In the next lecture we will look at some specific examples of phenomena which need the augmentations and see how they are handled by the various systems.

Loose grammars

In parsing natural language, as with all recognition systems, there is a tradeoff between specificity (assigning structures only when they are appropriate) and exhaustivity (assigning structures to as many inputs as possible). Loose grammars are designed to maximize exhaustivity, and are useful when the benefits of assigning some sort of structure to every input outweigh the costs of giving up much of the specific information it conveys. For some purposes it is desirable to get some sort of parsing even if the analysis is not complete. Information retrieval and interview simulation (see Colby et al.) are areas in which this has typically been true. A number of strategies have been attempted in extending the power of a simple context-free parser. These include:

Specially selected word classes oriented towards expected inputs

Eliminating "insignificant" words from the string before beginning the parse

Special scans to pick up idiom patterns and replace them with word classes

"Fuzzy" matching of rules (allowing elements from a rule to be missing)

The result of such strategies is to allow the parser to assign structures to a large number of sentences, based on a comparatively small set of rules. Of course, it will often assign structures which are not appropriate to the input. In some cases the structure assigned will be one which should not be selected at all, because one of the missing or ignored words prevents its applicability. In other cases, the one assigned will be plausible, but the information which was thrown away would have clearly pointed to a different structure as being the one intended by the speaker.

It seems likely that a person in understanding language is able to choose from a range of strategies, involving loose as well as more complete understanding. In later lectures we will discuss ways of describing a grammar in terms of a set of knowledge structures which can be used for this sort of partial understanding as well as more complete parsing. In the rest of this lecture we discuss strategies which are designed to keep the specificity of context-free grammars while extending the range of phenomena they can cover.

Features and Conditions

Most natural languages (perhaps all) exhibit a kind of interconnection between constituents which can be called agreement. Some examples of this were given in lecture 6, involving the agreement in number between the subject of a sentence and its verb (in English) and the agreement in number and gender between a noun and its adjectives (in Spanish). Other examples in English are the agreement in number between determiner and noun (these lectures, this lecture) and the matching of the form of a pronoun to its function as subject or object (She kissed him, He kissed her).

The traditional descriptions of languages provide paradigms for structures, which can be thought of as sets of features. We can say that a particular noun has the features MASCULINE SINGULAR, or that a particular pronoun has the features MASCULINE SINGULAR ACCUSATIVE. We can

CS/Linguistics 265-66 Fall 1974-5    T. Winograd


Lecture 7 -- page 2 of 7

then augment our grammar rules with conditions on the features the constituents have. For example we might have:

S   -> NP VP             NUMBER_NP = NUMBER_VP
NP  -> DETERMINER NP2    NUMBER <- NUMBER_DETERMINER = NUMBER_NP2
NP2 -> ADJECTIVE NP2     NUMBER <- NUMBER_NP2
NP2 -> NOUN              NUMBER <- NUMBER_NOUN
VP  -> VERB              NUMBER <- NUMBER_VERB
VP  -> VERB NP           NUMBER <- NUMBER_VERB

These conditions can be included as part of the grammar, and applied in conjunction with any parsing scheme. With each constituent is associated a list of features, and whenever the parser takes a step which implies accepting a pattern match for a constituent, it must check the conditions, and set the features for the higher structure. If the check fails, it is as if the rule didn't match. For a bottom-up parser, this is a simple addition to the step of testing rules. For a top-down parser, it must be done at the point where the rule is "completed". If a chart is being maintained, this is the point at which a new edge is added. If not, it may be necessary to put some additional information into the stacks being saved, in order to trigger the test at the appropriate time.
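As a small illustration of the checking step just described, here is a sketch (not taken from any of the systems under discussion; all names are invented) of accepting a rule match only when its feature condition holds:

```python
# Hypothetical sketch: a constituent is a (category, features) pair.
# When a rule match is accepted, check its condition on the children's
# features; on success, build the higher constituent with its own features.

def apply_rule(lhs, children, condition, propagate):
    feats = [f for (_, f) in children]
    if not condition(feats):
        return None  # the check failed: it is as if the rule didn't match
    return (lhs, propagate(feats))

# NP -> DETERMINER NP2, with the condition that the two NUMBERs agree
# and the action NUMBER <- NUMBER_DETERMINER.
det = ("DETERMINER", {"NUMBER": "SINGULAR"})   # e.g. "this"
np2 = ("NP2", {"NUMBER": "SINGULAR"})          # e.g. "lecture"

np = apply_rule(
    "NP", [det, np2],
    condition=lambda fs: fs[0]["NUMBER"] == fs[1]["NUMBER"],
    propagate=lambda fs: {"NUMBER": fs[0]["NUMBER"]},
)
# np == ("NP", {"NUMBER": "SINGULAR"}); a mismatch such as
# "these lecture" would return None instead
```

The same check works with either parsing direction: a bottom-up parser runs it when it tests the rule, a top-down parser when the rule is completed.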

In order to simplify some of the rules, it is useful to allow constituents to have a multiple set of features. For example, the English "the" can be used with both singular and plural nouns, while "this" and "these" are more selective. The basic feature idea can be extended to allow a single constituent to be marked as both singular and plural (or in general to specify a set of features rather than a single one). We could then have a rule like:

NP -> DETERMINER NP2    NUMBER <- NUMBER_DETERMINER ∩ NUMBER_NP2

where equality is replaced by set intersection, and the condition on a successful match of the rule is that the resulting NUMBER be non-empty.
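A sketch of this set-valued scheme (an invented illustration, not any system's actual code):

```python
# Hypothetical sketch: NUMBER is a set of features; the rule's condition is
# that the intersection of the constituents' NUMBER sets be non-empty.

def combine_number(det_feats, np2_feats):
    number = det_feats["NUMBER"] & np2_feats["NUMBER"]   # set intersection
    return {"NUMBER": number} if number else None        # None = no match

the      = {"NUMBER": {"SINGULAR", "PLURAL"}}   # "the" fits either number
these    = {"NUMBER": {"PLURAL"}}
lecture  = {"NUMBER": {"SINGULAR"}}
lectures = {"NUMBER": {"PLURAL"}}

combine_number(the, lecture)     # {"NUMBER": {"SINGULAR"}} -- "the lecture"
combine_number(these, lectures)  # {"NUMBER": {"PLURAL"}}   -- "these lectures"
combine_number(these, lecture)   # None -- "these lecture" is rejected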

The features propagate their way up from the bottom of the tree. Words in the dictionary are given feature sets, and the rules define the features of higher-level constituents in terms of the features of their parts. Some versions of transformational grammar (see Chomsky, Aspects..., 1965) assign features to individual words of the lexicon, using this as the basic way of assigning word classes (e.g. the feature NOUN on a word is equivalent to saying it is in the class of nouns). There are no features associated with any constituents above the word level. What is being described here is a more extended use of features throughout the grammar.

In addition to simple features like PLURAL and FEMININE it is often useful to create features at a higher level which have no simple equivalent at the word level. For example we might have a set of VERB features called TENSE, including INFINITIVE, PAST, PAST-PARTICIPLE, and PRESENT-PARTICIPLE. These are applied to distinguish verb forms like "ring, rang, rung, ringing; bring, brought, brought, bringing", etc. Note that this is an example where a single word form "brought" has both the features PAST and PAST-PARTICIPLE. We can then define a grammar which includes new productions like:

NP2 -> VERB NP2      TENSE_VERB = PAST-PARTICIPLE

S   -> NP VP by NP   VOICE_VP = PASSIVE

VP  -> AUX VERB      VOICE <- PASSIVE if AUX = be
                                and TENSE_VERB = PAST-PARTICIPLE
                     VOICE <- ACTIVE otherwise

S   -> NP VP         NUMBER_NP = NUMBER_VP

"

"

"

Lecture 7 -- page 3 of 7

The first production allows noun phrases like "the broken record", while the second allows "That record was broken by Steve Spitz". The simple expansion of S into NP VP does not check for the VOICE feature, so it allows both "That record was broken" and "That record is playing". The notation for the rules is not fully specified here, and in fact there have been a number of different simple notations developed for expressing conditions as a simple computation involving the constituents and their features as inputs.
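The VOICE condition on VP -> AUX VERB can be sketched as a small computation over the constituents' features (a hypothetical rendering, since the rule notation is not fully specified; the function name is invented):

```python
# Hypothetical sketch of the condition: VOICE <- PASSIVE if AUX = be and
# PAST-PARTICIPLE is among the verb's TENSE features, ACTIVE otherwise.

def vp_voice(aux, verb_tenses):
    if aux == "be" and "PAST-PARTICIPLE" in verb_tenses:
        return "PASSIVE"
    return "ACTIVE"

vp_voice("be", {"PAST-PARTICIPLE"})           # "was broken"  -> PASSIVE
vp_voice("be", {"PRESENT-PARTICIPLE"})        # "is playing"  -> ACTIVE
vp_voice("be", {"PAST", "PAST-PARTICIPLE"})   # "was brought" -> PASSIVE
```

Note how the ambiguous form "brought", carrying both PAST and PAST-PARTICIPLE, still satisfies the passive condition.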

The idea of feature grammars or attribute grammars has been applied both in natural language and in computer language parsing. Although the idea has often been applied directly to context-free rules it is most easily expressed as an augmentation to a transition-network formalism. By having a transition net combine a number of what would be separate context-free rules, we can avoid passing features up through many levels of constituents. Woods' paper explores this idea.

Structure building

The other basic augmentation to context-free grammars is in producing assigned structures which are more useful than the simple constituent analysis. Since Chomsky was the first to emphasize the importance of assigned structures, his idea of a "deep structure" was the original motivation. Woods' paper talks about using a set of registers to accumulate the surface constituents, then using a BUILD action to produce a deep structure tree directly as part of the parsing action. This decouples the assigned structure from the form of the transition net or the order of processing, and allows the system to make use of the sorts of "generalizations" which arise in transformational systems.

Kay used the chart parser to do this in a slightly different way. In our discussion of the consumer/producer system, there was a step which was executed whenever a producer finished its rule, which resulted in adding a new edge to the chart. Kay extended this by allowing it to add an arbitrary set of new edges, perhaps representing quite a different structure than the sequence of constituents which the producer found. The advantage of this approach is that the resulting structure is put into the chart uniformly with the simpler structures, and therefore can serve as "producers" for yet other grammar rules. Each rule, as well as specifying the set of constituents to be found, specified the equivalent of a BUILD action, which could specify any set of edges to be added, based on the constituents which were found. Through a careful marshalling of the order of rule application, he could again express many of the effects of transformations.

Winograd, in the PROGRAMMAR system, did not explicitly build alternative structures, but instead used a feature marking system to achieve the same effect. For example, a CLAUSE in the parsing process could be marked PASSIVE, indicating to the semantic routines that the logical subject would be found as the object of the "by" phrase. In the next lecture we will discuss some of the issues involved in explicit marking of structures.


PROGRAMMAR - Winograd

Basic Characteristics

The process is structured around the units of the grammar, which correspond to large-scale chunks of surface structure.

Use of explicit program formalisms allows the process to have a structure different from the structure of the sentence being analyzed or the structures being assigned.

Specific processes can be triggered top-down or bottom-up and associated with particular words.


"

"

"

Lecture 7 -- page 4 of 7

Feature lists are associated with each node, and used both for condition checking and for marking structures in a way which enables semantic routines to deal with "deep" functional relations. To do this it has a mechanism for searching for structures in trees.

Because the program is explicit, so is any backup or parallelism.

Major problems

Feature marking is a clumsy mechanism for structure assignment.

Subroutine calling is linked to structure building (as with ATN), so the organization of the process is still tightly linked to structures. So the actual grammars written were basically top-down.

Features are not explicitly organized into systems in the program -- only implicitly in the writeup.


Augmented transition networks - Woods

Basic Characteristics

The process is structured around the networks, which correspond loosely to surface structure.

Computation is seen as an adjunct to the basic procedure of stepping through the elements one by one, in the form of augmentations on the arc transitions, stated in the form of conditions.

Structure building operations are used to explicitly build the "deep structure", using registers (like variables).

Oriented towards depth-first sequential search with complete backup capabilities

Major Problems

Hard to put in specialized processes like conjunction.

Basically top-down; process structure is completely tied to surface structure by the nets. Hard to find partial sentences or skip over parts.

The chart parser - Kay

Basic Characteristics

Based on general idea of rewrite rules.

System sequences through set of rules which keep rewriting the chart.

Structure of the process comes from the form and ordering of the rewrite rules.

Basically oriented towards bottom-up parallel processing of alternatives



Lecture 7 -- page 5 of 7

Major Problems

Not much explicit structuring of grammar and process

Uses structuring of dummy elements to represent features

Inefficiencies of bottom-up orientation


The general syntactic processor - Kaplan

Basic Characteristics

An attempt to factor out the procedural strategies from the underlying structural observations of the grammar, giving the user a chance to combine independently and explicitly a number of the strategies and features described for various parsers above.

Uses the consumers/producers idea, with ATN-like rules and chart-like bookkeeping structure, using registers to associate features with nodes.

Major problems

Since this is an attempt to build a basic "machine" it tries to make all of the ideas we have described possible. Therefore it does not specifically decide many of the issues of parsing, leaving these to the particular user. It is not directly comparable to any more specific formalism or strategy.

The ATN-like orientation associates processes in a more-or-less one-to-one way with surface structures.


Parsing Strategies

For grammars which are not explicitly program oriented, different strategies can be used to explore the set of possible parsings: top-down, bottom-up, serial, parallel, etc. These strategies are equivalent in the set of possible parsings they will find, but quite different in behavior such as the amount of time until a parsing is found, the order in which they will be found, etc. If we were to treat them only as formal systems, this might not be of much interest, but there are two reasons why this kind of flexibility in "scheduling" the networks is interesting for natural language.

The first is that for many applications it might seem feasible to allow the parser to proceed until it finds a single parsing, and then to go ahead and use that parsing without checking to see if there are other possibilities. Transition nets give us ways to control what that initial parsing will be. For example, one possibility is to follow the depth-first strategy, but to order the arcs from any node in some a-priori way, so that the arc taken is the one most likely to get the correct parsing. Another possibility is to combine breadth-first and depth-first strategies (perhaps with arc ordering as well) to tailor much more directly the order in which possibilities are explored.
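As a toy illustration of arc ordering (entirely invented: a bare finite transition net over word categories, with none of an ATN's registers or conditions), a depth-first search that tries each node's arcs in a fixed order returns as its first parse the path the ordering favors:

```python
# Hypothetical transition net: arcs from each state are tried in order,
# so the object-NP arc is preferred over the empty arc out of S2.
NET = {
    "S0": [("NP", "S1")],                       # subject
    "S1": [("VERB", "S2")],                     # verb
    "S2": [("NP", "FINAL"), (None, "FINAL")],   # object first, then empty arc
}

def parse(words, state="S0", path=()):
    """Depth-first: return the first complete path found, in arc order."""
    if state == "FINAL":
        return path if not words else None
    for label, target in NET[state]:
        if label is None:                     # empty (lambda) arc
            result = parse(words, target, path)
        elif words and words[0][1] == label:  # consume a matching word
            result = parse(words[1:], target, path + (words[0],))
        else:
            result = None
        if result is not None:
            return result
    return None

parse([("she", "NP"), ("kissed", "VERB"), ("him", "NP")])
```

Reordering the arcs out of S2, or interleaving several partial searches breadth-first, changes which parsing is found first without changing the set of parsings the net can find.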

A second reason for having an interest in this is the potential of using a transition net



Lecture 7 -- page 6 of 7

formalism to explain the actual empirical properties of human parsing. This goes beyond the formal determination of which sentences are accepted, considering in addition data like the length of time a human takes to parse a particular sentence, or the likelihood of particular mistakes. It may be possible to build an appropriate strategy model for a transition net which explains in detail the behavior of human parsing systems. (See Kaplan (1972) for a discussion of many of the issues involved.)


Issues

Perspicuousness

How easy is it for people to understand and modify the knowledge structures?

Choice of assigned structures

What sort of structures are built? How are they related to the surface structures and to semantics?

Control of the process

How much is the process oriented towards the particular surface structures? How explicitly should the grammar specify which things are to be handled top-down and which bottom-up? How well can they be mixed?

Formal Properties

This isn't an issue for most of these grammars, since they are formally equivalent to Turing machines. The issue is whether or not this has any relevance to their status as linguistic theories.

Extension to realistic versions of language

Many of the forms used in actual communication do not correspond to a simple notion of grammar.

"

Examples of such phenomena can be found in conjunction and ellipsis. If our grammar is to account for the way in which conjunctions like "and" can appear, it must accept the fact that at almost any point in the structure of a sentence, we can insert an "and" and a constituent which combines in some way with the previous one.

Thinking in transition net terms, this means that there is an arc from almost every node, labelled "and", and going back to some previous point in the same network. Even worse, there are often several possibilities, as in the conjoined sentences:

She gave them a bowl and I gave them a spoon.
She gave them a bowl and took nothing in return.
She gave them a bowl and me a spoon.
She gave them a bowl and a spoon.
She gave them a bowl and spoon.

It seems that the behavior of such conjunctions can be characterized in a rule stating more directly "Whenever you run across and, try to parse a constituent which matches some part of one you are in the process of building." Even with precise definitions for what is meant by "a part" and "in



Lecture 7 -- page 7 of 7

the process", it is not clear how io integrate such an algorithm into the transition-net formalismwithout simply grafting It on.

Ellipsis is similar -- asked "Do you want a piece of apple pie?", I can reply "Bring me a piece", or "Bring me a piece of pecan", or "Bring me pecan", or "Bring me two". All of these are natural forms in English, and the grammar must worry in detail about just when things can be left out. Again, in the straightforward transition net formalism, this would involve having a multitude of lambda-arcs, bypassing each element which might be deleted. It seems much more satisfying to express the deletion as a separate algorithm which works in conjunction with the more usual transition network mechanisms.

One of the interesting future possibilities for work with transition nets lies in specifying more clearly just how such additional processes can be added on without losing the clarity of the nets. One of the prime advantages of a network formalism is that it gives a clear and explicit statement of the alternatives which are possible at any point in the parsing of an input. This is not true for arbitrary programs, and we want to be particularly careful about putting in layers of "hidden" processing.

Finally, it is not easy to decide how semantic considerations should be intermixed with the strategy component of a transition net parser. We discussed earlier the possibility of modelling human parsing performance through specific search strategies for following the alternatives in the grammar. If human strategies make significant use of meaning, then there must be some way to add these considerations to the more straightforward considerations of search. As is the case with most formalisms for natural language, there is little understanding of how to intermix the syntactic and semantic considerations in a satisfying way. One of the main justifications for transition nets is that by making explicit the structure of choices available in the syntactic form of language, they provide a good tie-in point from which to extend the analysis.
