
Parsing with Transformers

Olivier Gournet, Alexandre Borghi and Nicolas Pierron

Technical Report no. 0512, March 2006, revision 920

To disambiguate C++, which is an essential step toward the Transformers project, a new approach using attribute grammars has been adopted. An attribute grammar system, based on Stratego/XT, has been developed from scratch to meet our needs. However, it can still be greatly improved, particularly our attribute syntax.

Attribute grammars and their Transformers counterpart implementation, sdf-attribute, will be depicted and (hopefully) demystified. Although this paper is mainly focused on attribute grammars, preprocessing is not forgotten.

Keywords

Transformers, C++, program transformation, parsing, disambiguation, attribute grammar

Laboratoire de Recherche et Développement de l’Epita
14-16, rue Voltaire – F-94276 Le Kremlin-Bicêtre cedex – France

Tél. +33 1 53 14 59 47 – Fax. +33 1 53 14 59 22

[email protected] – http://transformers.lrde.epita.fr/


Copying this document

Copyright © 2006 LRDE. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with the Invariant Sections being just “Copying this document”, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is provided in the file COPYING.DOC.


Contents

Introduction

1 Preprocessor headache
  1.1 How things work today
  1.2 Directions

2 Transformers’ AG implementation
  2.1 Evaluator generation
    2.1.1 Overview
    2.1.2 attrsdf2table
    2.1.3 str-lazy
    2.1.4 str-ref
    2.1.5 attr-defs
    2.1.6 Attribute propagation
    2.1.7 Checker
    2.1.8 Compiler

3 Transformers’ AG Usage
  3.1 Syntax
    3.1.1 Basic Syntax
    3.1.2 Inherited Patterns
    3.1.3 Attribute Declaration
  3.2 Automatic rules propagation
    3.2.1 Tables
    3.2.2 Lists

4 Transformers’ AG syntax proposal
  4.1 Default value/code for horizontal propagation
  4.2 Vertical propagation
  4.3 Tables
  4.4 Ambiguous synthesizing
  4.5 The woes of this AG

5 Others AG systems
  5.1 UUAG
    5.1.1 Presentation
    5.1.2 Writing AG in Utrecht University Attribute Grammar System (UU-AG)
    5.1.3 Incorporate it in Transformers
    5.1.4 A comparative example
  5.2 FNC-2
    5.2.1 Evaluators
    5.2.2 Tools
    5.2.3 OLGA
    5.2.4 Incorporate it in Transformers

Conclusion

6 Bibliography


List of Figures

2 Namespace definition ambiguity
1.1 Preprocessing phase
2.1 Parse table generation, for c-grammar
2.2 Parse table generation, attrsdf2table details
3.8 Left hand attribute traversal
3.9 A list, seen from AsFix viewpoint


Introduction

In almost every paper introducing Transformers, people can learn that this project attempts to be a program transformation framework (Anisko et al., 2003), from source code to source code. It is composed of a parser for several languages (with ISO/C++ as our main target) and of tools to work on grammars to ease transformations. It heavily uses the generic rewriting framework Stratego/XT, which offers a solid foundation to the project.

However, in spite of our earlier optimism, Transformers is not able to parse C++ yet. We are still working on the first stage of the parsing process, namely the disambiguation process1.

Attribute grammars were described a long time ago by Knuth (Knuth, 1968). They represent a way to attach semantics to an AST. Given a context-free grammar, two sets of symbols, called attributes, are attached to each of its non-terminals. Attributes can be “synthesized”, i.e. defined solely in terms of attributes of the descendants of the corresponding non-terminal symbol, while other attributes are “inherited”, i.e. defined in terms of attributes of the ancestors of the non-terminal symbol.
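As a classic illustration (Knuth’s binary numeral example, not one of the Transformers grammars), consider binary numbers with a synthesized attribute val and an inherited attribute scale giving the weight of each digit (L1 denotes the L child on the right-hand side):

B -> "0"     B.val := 0
B -> "1"     B.val := 2^B.scale
L -> B       L.val := B.val             B.scale := L.scale
L -> L1 B    L.val := L1.val + B.val    B.scale := L.scale    L1.scale := L.scale + 1
N -> L       N.val := L.val             L.scale := 0

Here val flows upwards (it is synthesized from the children) while scale flows downwards (it is inherited from the parent); parsing "110" as an N yields N.val = 6.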

This paper is mainly about attribute grammars. Firstly, we will describe how to use the dedicated Attribute Grammar (AG) system2 of Transformers, then its internal workings.

Ambiguity as leitmotif

Our real goal is the capability to use C/C++ as a source and a target language. We already have C and C++ context-free grammars, inspired by the ISO standards (respectively ISO/IEC (1999) and ISO/IEC (2003)), but unfortunately their syntaxes are highly ambiguous. For instance, for the namespace definition as stated in the standard, the grammar in listing 1 yields two derivation trees. This comes from the fact that the standard makes a distinction between a namespace seen for the first time and an extension to an already defined namespace. However, nothing distinguishes them syntactically. The resulting trees can be seen in figure 2.

NamedNSDefinition -> NSDefinition

OriginalNS -> NamedNSDefinition
ExtensionNS -> NamedNSDefinition

"namespace" Identifier "{" NSBody "}" -> OriginalNS
"namespace" OriginalNS "{" NSBody "}" -> ExtensionNS

Listing 1: C++ namespace ambiguity
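For instance, in a source fragment such as the following (a hypothetical example, not taken from the report), the first block is an original namespace definition and the second reopens the same namespace, so it should derive through ExtensionNS; yet both blocks have exactly the same syntactic shape, and nothing tells the parser whether the name denotes a new Identifier or an already defined OriginalNS:

// first occurrence: an original namespace definition
namespace ns { int x; }

// reopens ns: an extension, syntactically identical to a definition
namespace ns { int y; }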

1 Some suggested temporarily renaming Transformers to Parsers.
2 If somebody finds a good name for it, mail us!


Figure 2: Namespace definition ambiguity. (The two derivation trees for the same text: one reaches NamedNSDefinition through OriginalNS, the other through ExtensionNS.)

Several methods to disambiguate these languages have been tried (Durlin, 2007, 2008). Firstly, there was ASF+SDF, which gave bad results: it was too declarative and did not have the expressive power we wanted. Then came Stratego/XT, which had a different approach to the problem. Here, transformations are done using the paradigm of traversals and rewriting rules. We started using it to disambiguate C++, but it eventually turned out to be too slow, not extensible enough, and not maintainable enough. However, that was a long time ago and the tools we were using have greatly evolved since then. Finally, a compromise between the two has been found, raising attribute grammars to the top of our priorities.


Chapter 1

Preprocessor headache

A quote from the Boost Preprocessor Library (?) website:

Problem #3

"I’d like to see Cpp abolished." - Bjarne Stroustrup

Solution

The C/C++ preprocessor will be here for a long time.

In practice, preprocessor metaprogramming is far simpler and more portable than template metaprogramming [Czarnecki].

The use of a preprocessor is a tricky legacy of the C language, allowing programmers to use macros, sometimes in weird ways. It makes parsing and unparsing more difficult for Transformers by introducing holes in the grammar, as generic tools are used. But as it is still in use today, it must be handled properly, otherwise Transformers would be unusable.
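For instance (a contrived example, not taken from the report), a macro can hide part of the syntactic structure, so the unpreprocessed text cannot be parsed with the plain C grammar, and once expanded the link with the original macros is lost:

#define BEGIN {
#define END   }

void f(void)
BEGIN
    int x = 0;   /* the braces of the function body only appear after expansion */
END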

The following requirements should be seen as a basis for Transformers’ ability to handle preprocessor directives. However, some of them should be seen as optional, depending on Transformers’ fate.

• Handle control lines (PP directives). Of course, this is the main objective.

• Pretty-printing. Some effort must be made to rewrite text as close to the original as possible.

• Keep file structure. With the #include directive, a huge number of files can be parsed for a single compilation unit. It would be nice to respect the file structure when pretty-printing.

• Do substitutions for all possible branches.

• Optimization. A little experimentation with Transformers shows that it is already slow enough; there is no need to add another launch-and-take-coffee process to the pipeline.

• Error handling. As another front-end process is added to the pipeline, errors arising further down must be correctly translated and reported to the user (for example: locations).

Things have not greatly evolved since the last attempt to manage it. Anyway, a quick summary of the whole picture will be given, so one can easily pick the subject back up and do things better. Firstly, the current implementation will be looked over (more details can be found in Gournet (2004)), then ideas collected over time will be presented.

1.1 How things work today

Figure 1.1: Preprocessing phase (the source file goes through cpp, a Perl split script, sglr on each preprocessed source chunk, and asfix-concat, turning the AsFix parse tree chunks into a single AsFix parse forest).

Currently, the system is able to parse and expand any PP token, thanks to SGLR. However, the current system is not able to unparse these expanded macros, nor can it recreate the file hierarchy. The system pipeline is presented in figure 1.1. First, a source file is given to cpp, with the necessary flags. These flags do the following: set the input source language for cpp, keep comments, and remove tokens specific to the GNU extensions (tokens that are not in the standard). This produces source code representing the entire compilation unit.

Then, a tiny hand-written Perl script is applied to split this monolithic output into small chunks. After that, SGLR processes these chunks one after another. The reason it cannot parse the cpp output at once is that SGLR stops if it encounters too many ambiguities, as discussed below. At this stage, we have a bunch of small parse forests that we have to concatenate together. parse-c takes care of this, with a little complication: we must care about rule dependencies, because as soon as two parse forests are concatenated, some evaluation rules must be added to link the two chunks. This is actually the subject of a deeper study, namely partial evaluation, but right now it is solved in a simple way. “Cuts” (between two preprocessed source chunks) are always done on the same grammar symbol, so the glue is extracted from a simple snippet of code (see listing 1.2), then pasted between our two parse forests to keep the whole thing consistent.

int i;
int i;

Listing 1.2: Sample code used to retrieve the evaluation rules needed to glue AsFix chunks together. These two lines are specifically provided to retrieve the evaluation rules that lie between them. Subsequently, these rules are pasted in to join the cut parse forests again.

Splitting the output and then reassembling it was the solution originally retained, but there is another one: patching SGLR to let it accept more than 10,000 ambiguities (500,000 should be a good choice). That was the solution chosen for the biggest benchmarks; one of them has about 60,000 ambiguities in a single chunk. Actually, neither of these two methods is perfect, so they should be used jointly.

At this stage, we have one big AsFix parse forest, ready for disambiguation and transformation. The stage after transformation, the refactoring of preprocessing symbols, is scrupulously skipped, for the simple reason that nothing is implemented.


1.2 Directions

This section is a placeholder for directions that have been thought of over time. They may or may not be used to tackle this problem.

• Improve our current technique by patching cpp to be able to store more information. Then, write a program (any language should do) to do the pretty-printing.

• Build from scratch yet another C preprocessor, and handle macros ourselves. This is a fallback from the first technique, in case cpp turns out to be too difficult to patch and maintain.

• Use SGLR with a so-called island grammar (Largillier, 2005), to fetch only the tokens we are interested in. The standard defines a grammar for preprocessor directives; the island grammar could be derived from it. Then, use a Stratego filter to do the same job as cpp and pretty-print the result into one source file. Eventually, it could be reparsed with our regular C/C++ grammar and managed as if there were no PP directives. Thomas Largillier is currently working in this direction to preprocess and unpreprocess files using a grammar based on the one used by ASF+SDF. More details about his work can be found in ?.


Chapter 2

Transformers’ AG implementation

The evaluator is under development, and the design, like the implementation, is not yet frozen. This means that parts of this chapter may or may not reflect reality, and might become obsolete soon.

Two parts are clearly separated: the first concerns the compilation of our AG system (the compilation), and the second the actual attribute computation (the execution).

2.1 Evaluator generation

2.1.1 Overview

The Syntax Definition Formalism (SDF) modules are packed according to AttrSdf, which extends SDF with our attribute syntax. This is performed by esdf, which is best described in the paper Demaille et al. (2005). Then, a parse table and an evaluator are generated by the whole attrsdf2table chain, composed of several steps. Besides the parse table and the evaluator, a pretty-print table is generated, as well as a signature file that will be used in transformation filters. Figure 2.1 gives an overview of all these steps. Most of the tools used in this pipeline are standard Stratego/XT tools, so they will not be explained in detail.

One tool that needs more consideration is attrsdf2table; it will be thoroughly explained below.


Figure 2.1: Parse table generation, for c-grammar. (The SDF modules packed by esdf form an extended SDF definition; from it, attrsdf2table produces the parse table and the evaluator Stratego source, sdf2rtg and rtg2sig produce the SDF Stratego signature, boxed2pp-table, pp-pp-table and parse-pp-table produce the pretty-print table, and strc and gcc compile the evaluator.)

2.1.2 attrsdf2table

The helper program, attrsdf2table, is composed of several filters (see figure 2.2):

• parse-attrsdf-definition: Produces an AsFix term corresponding to our grammar. It takes the .edef file produced by esdf as input.

• attrs-desugar-ns: For attributes that are not explicitly placed in a namespace, set them in the current namespace (as explained in section 3.1.1).

• embed-attributes: There are two problems with our attribute section when translating to a parse table:


Figure 2.2: Parse table generation, attrsdf2table details. (The extended SDF definition goes through parse-attrsdf-definition, attrs-desugar-ns, embed-attributes, sdf-strip, pp-attrsdf and sdf2table; the resulting table then goes through deembed-attributes, sdf-labelize, attr-defs and clean-tbl-attributes, and attrc followed by pp-stratego produces the evaluator Stratego source.)

– The original SDF syntax accepts ATerms in annotation lists, but not capitalized symbols, as they are seen as SDF non-terminals. To avoid this problem, a filter prepends a lower-case letter to all constructors for the parse table generation.

– Non-terminal aliases (labels) in production rules are not kept, but they are referenced as attribute names and used during the evaluation. Thus, to ensure the translation between sorts and label names, an alias table is added to the annotations.

• sdf-strip: This is a tool coming from the Stratego/XT distribution, doing some preprocessing needed by sdf2table.

• pp-attrsdf: Transforms the input AsFix into text. This is unfortunately needed as sdf2table cannot take an ATerm as input.

• sdf2table: Translates our grammar into a parse table. From this point on, work is done directly on the parse table.

• deembed-attributes: Cleans up the gory hacks introduced by embed-attributes.


• sdf-labelize: During the parse table creation, some rules may be added (refer to listings 3.10 and 3.11), with a high probability that the same non-terminal will appear twice in a production. Moreover, as said previously, sdf2table does not care about labels. To be able to distinguish these symbols, labels are created. More generally, it generates a label for every non-terminal in each production, to ease further filters and the evaluation.

• attr-defs: With a single traversal, this filter provides the checker (as explained in section 2.1.7) and attribute propagation.

• clean-tbl-attributes: At this stage, the parse table is made heavy by annotation rules. But these annotations are not required anymore, so they can safely be removed. If they were kept in the parse table, sources processed by SGLR with this parse table would also contain these annotation rules, which would enlarge the output AsFix1. Labels are also removed, as they will not be useful anymore.

• attrc: The attribute grammar evaluator is generated by this filter. The previous version of the evaluator (used in Transformers 0.3) was based on the data-flow paradigm and was not generated. Everything was dynamic and used only local information. It was very slow, and thus an evaluator generator has been created from scratch by Valentin David in mid-2005. The current version (used in Transformers 0.4) is generated from information contained in the grammar, so it can be much more efficient. The design of the generated evaluator relies on lazy evaluation to resolve attributes easily in the right order. Thus, attributes are computed on demand and a dependency graph is maintained while evaluating.

2.1.3 str-lazy

As said previously, attribute evaluation relies on laziness to ensure the correctness of the evaluation flow. But Stratego has no lazy evaluation, so a special tool has been designed to suit our needs.

str-lazy simulates laziness by maintaining a dependency graph. When an attribute must be computed, its direct dependencies are added to the graph. These added attributes add their own dependencies, and so on. When all attribute dependencies are in the graph, the resolution is performed and, as one goes along, the dependency graph is updated. After the initial attribute is computed, the evaluation continues with the next attribute which needs resolution. The implementation of the graph is a matter of 180 lines of C (for performance considerations) and the interface with Stratego is about 20 lines.

The main difference between true laziness and what is implemented in str-lazy is that it is possible to define multiple ways to compute a value. This difference is the main point that makes the disambiguation with AG transparent to the user. To set the value of an attribute, all valid values computed during the evaluation must be identical. Otherwise the attribute cannot be computed and the value is invalid.
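As a rough analogue of this behaviour (a sketch in C++, not the actual str-lazy implementation, which is 180 lines of C plus a Stratego interface), a value can carry several formulae; forcing it evaluates all of them and accepts the result only if they agree:

#include <functional>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Sketch of a lazy value that may be defined by several formulae.
struct LazyValue {
  std::vector<std::function<int()>> formulas;  // registered ways to compute the value
  std::optional<int> cached;                   // memoized result

  void add_formula(std::function<int()> f) { formulas.push_back(std::move(f)); }

  int get() {                                  // counterpart of forcing the value
    if (cached)
      return *cached;
    std::optional<int> result;
    for (auto& f : formulas) {
      int v = f();                             // forces this formula's dependencies
      if (result && *result != v)
        throw std::runtime_error("conflicting values: attribute is invalid");
      result = v;
    }
    if (!result)
      throw std::runtime_error("no formula registered");
    cached = result;
    return *cached;
  }
};

int main() {
  // Mirrors the lazy sum of listing 2.3 below: result depends on arg1 and arg2.
  LazyValue arg1, arg2, result;
  arg1.add_formula([] { return 1; });
  arg2.add_formula([] { return 2; });
  result.add_formula([&] { return arg1.get() + arg2.get(); });
  return result.get() == 3 ? 0 : 1;
}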

The interface of str-lazy provides three strategies: new-value, getv and formula.

• new-value creates a pointer to a structure that contains an empty list of strategies and an ATerm which references nothing. Values are stored in an ATerm blob to ensure that they can be garbage collected.

• formula registers a formula in the list of methods used to compute a value. The formula is registered together with its input term, which is used to pass attribute dependencies. In the C code, the strategy is transferred as a static closure from which the strategy pointer and the static link are extracted.

1 Which is already too fat.


• getv evaluates all previously registered formulae and folds all values with the equality operation to check that all valid values are identical.

1  lazy-add = (getv, getv); add
2
3  main =
4    new-value => arg1
5    ; new-value => arg2
6    ; new-value => result
7    ; formula( lazy-add; debug(!"result = ") | result, (arg1, arg2) )
8    ; formula( debug(!"arg2 = ") | arg2, 2 )
9    ; formula( debug(!"arg1 = ") | arg1, 1 )
10   ; <debug(!"begin = ")> 0
11   ; <getv> result => 3

Listing 2.3: str-lazy: lazy sum

Listing 2.3 is an example of laziness. In this example, three values are created with the new-value strategy. Formulae are declared by specifying the strategy that will be used to compute the value, followed by the term that contains the value to define and the input term of the strategy to evaluate. The formulae at lines 8 and 9 just print their input term and set it as their value. The strategy used to compute the result at line 7 uses another strategy named lazy-add, which takes the input term (arg1, arg2) and evaluates each member of the pair with getv (line 1) before summing them. The result starts to be evaluated at line 11. This program outputs the sequence from 0 to 3.

Due to the delayed evaluation, strategies registered with a formula cannot use symbols from the strategy that declared them, except if they are evaluated in the same scope. The reason is that the stack has been removed or replaced since the strategy was registered. Therefore, all references to a symbol not declared in the scope of the formula are invalid.

2.1.4 str-ref

This library2 provides references to manipulate graphs. It is inspired by the implementation of Kalleberg and Visser (2006), which used the meta-Stratego compiler to add some sugar over the notation. This library provides references without backtracking, which is not safe but can be more efficient than the implementation in Stratego.

str-ref provides four C functions to create, destroy, bind and dereference a pointer. These functions use an integer stored in a Ref constructor in order not to access invalid memory.

The Stratego library provides strategies based on these C functions that manipulate the content of a reference, like apply-on-ref and where-on-ref, binding the result of the rewriting back to the reference in the case of apply-on-ref. Other common strategies are provided, like create-bind-ref, which creates a new reference and binds the current term to it, and deref-destroy-ref, which moves the content of the reference to the current term and deletes the reference.

attr-defs uses this library for two purposes: first, to manipulate a queue with constant-time operations; second, to manipulate the AG graph.

2 In French, “library” is translated to “bibliothèque”.


2.1.5 attr-defs

This program is responsible for checking the correctness of the grammar. Said differently, it checks whether all the attributes used are correctly defined everywhere. The checker could be disabled, but then inconsistent evaluation rules would be given to attrc, which would in turn produce a flawed evaluator. Needless to say, this is not advised. This section talks about its internal mechanisms.

attr-defs was originally written in Stratego3 and then written a second time in C++. The reasons why it was in C++ are that, according to its author Olivier Gournet, he is not fluent enough in Stratego, and speed. This new filter is about 200 times faster than the original one (although Olivier Gournet never considered this a good reason, because the old checker was badly written and relied solely on linear searches in big lists). On the other hand, some parts get very, very ugly in C++.

The latest version of this tool is again written in Stratego, with the help of str-ref to manipulate graphs quickly. The concept is to represent each production, symbol, attribute rule and attribute as a node of a graph, with relations between the nodes (Pierron, 2007). This version uses the parse table as a table of pointers that can be referenced by other nodes, which improves the speed of traversing the grammar. There is not much difference in speed between the C++ version and the Stratego version, but the two systems cannot really be compared since we do not know precisely what algorithm was implemented in C++.

Attribute fetching makes heavy use of the ATerm library, and especially of the ATmatch function. One thing to note is flexibility: using the ATerm library as it was done in C++ is less powerful than using Stratego pattern matching and all the other goodies that this language offers. But in this case we only have to handle a parse table. The parse table AsFix is very simple and has a very particular form, as shown here:

parse-table( 4
           , 0
           , [ label(prod(...))
             , label(prod(...))
             , label(prod(...))
             ]
           , states([...])
           , priorities([...])
           )

Here, [] denotes a list, and ... something that was clipped. As a reminder, run pp-aterm < C.tbl | less to see the content of the parse table, and run attrsdf2table with -k 2 to keep intermediate tables during the build. Remember that the final parse table doesn’t contain evaluation rules, so don’t search for them. We only care about the list of labels, which contains all the production rules. The other parts are not modified.

In the parse table, each prod constructor corresponds to an SDF production rule. Here is an example of an SDF production, followed by its corresponding prod constructor in the parse table, where some attributes have been propagated.

"sizeof" UnaryExpression -> UnaryExpression

prod([lit("sizeof"), cf(opt(layout)), cf(sort("UnaryExpression"))]

, cf(sort("UnaryExpression"))

3You can find it in sdf-attribute/src/desugar/attr-desugar-magic.str.

8/8/2019 Parsing With Transformers

http://slidepdf.com/reader/full/parsing-with-transformers 17/41

17 Transformers’ AG implementation

, attrs(

[ term(cons("sizeof"))

, term(labels([label(3, "rule_3")]))

, term(

attribute-rule(

None, attribute(

Sym("rule_3"),

Some(namespace-head(namespace-name("decl"))),

"lr_table_in"

)

, Build(

attribute(

root,

Some(namespace-head(namespace-name("decl"))),

"lr_table_in"

)

)

)

)

, term(

attribute-rule(

None

, attribute(

root,

Some(namespace-head(namespace-name("decl"))),

"lr_table_syn"

)

, Build(

attribute(

Sym("rule_3"),

Some(namespace-head(namespace-name("decl"))),

"lr_table_syn")

)

)

)

]

)

)

attr-defs makes heavy use of labels. Note that the labels constructor contains a list of pairs: the first element is the index of the element, and the second a string to which we can later refer, to make it independent of the order of the production symbols. Only context-free sections are listed. The work of adding labels is carried out by sdf-labelize, because sdf2table doesn’t do it.

To solve the problem of big list manipulation, str-ref is used to replace all productions in the parse table by references. A traversal registers all symbols, whether they are used or produced, and by which productions. This conversion step is reversible and can be used on any parse table, with or without attributes in it. A strategy named apply-on-graph4 is used to manipulate the parse table as a graph in Stratego and to restore the parse table in its original state. The same idea is applied to manipulate attributes and attribute rules: all attribute rules are replaced by references and all attributes are linked to their attribute rules. This second graph is linked to the grammar so that the entire attribute grammar can be traversed. A strategy named apply-on-attr-graph is used to manipulate an attribute grammar as a graph instead of lists.

attr-defs contains the code of the attribute propagation and of the cycle detection, both of which use the graph to manipulate the attribute grammar. The attribute propagation and the cycle detection are respectively described in subsections 2.1.6 and 2.1.7. It also provides the --statistics option to check the number of generated attribute rules in the AG.

4 The Stratego module to manipulate the parse table as a graph can be fetched at sdf-attribute/src/check/table-graph.str.

2.1.6 Attribute propagation

This process is executed before the checker, so that errors in the propagation can be detected. The propagation occurs in two fix-point phases (Demaille et al., 2008; Pierron, 2007):

• The first step propagates flags. These flags are used to make a choice which is local to a grammar production: they show whether an attribute can be synthesized through a symbol. Therefore these flags are only propagated for Bottom-Up and Left-to-Right attributes.

• The second step propagates attributes where they are needed in the grammar, to obtain the shortest path of attributes. All copy rules are added for Top-Down and Left-to-Right propagations. For Bottom-Up propagation, concatenation rules that concatenate the results of the children are added, but only for list manipulation for the moment.

The fix-point is implemented with a queue realised with str-ref. It is used to propagate flags and attributes and to check for circularity. The fix-point allows us to revisit all productions related to the previous modifications.

Each propagation has to satisfy a set of equations that ensure that an attribute is declared before it is used, and that describe the path followed by the attribute in a production depending on the presence of the corresponding flags. The equations ensure that the AG is well-defined. The current attribute propagator can be extended to add specific propagations.

The propagation, which adds a lot of attributes to the grammar, is not a run-time problem since copy rules are aliased in the generated evaluator.

The attribute propagation can be disabled if you run attrsdf2table with the option --no-proba.

2.1.7 Checker

The checker does some verification of the grammar. It ensures that there is no circularity in the grammar by performing a transitive closure of the dependencies between attributes. This transitive closure is realized on all productions and uses a fix-point to propagate modifications between productions. This algorithm is conservative, as the tests in the sdf-attribute test suite show. It is inspired by the article of Rodeh and Sagiv (1999, section 5).
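As a rough sketch of this kind of check (a generic illustration in C++, not the actual attr-defs code), a transitive closure of a boolean dependency matrix reveals a cycle as soon as some attribute ends up depending on itself:

#include <cstddef>
#include <vector>

// dep[i][j] is true when attribute i directly depends on attribute j.
// Returns true if the transitive closure contains a self-dependency, i.e. a cycle.
bool has_cycle(std::vector<std::vector<bool>> dep) {
  const std::size_t n = dep.size();
  for (std::size_t k = 0; k < n; ++k)          // Warshall-style transitive closure
    for (std::size_t i = 0; i < n; ++i)
      if (dep[i][k])
        for (std::size_t j = 0; j < n; ++j)
          if (dep[k][j])
            dep[i][j] = true;
  for (std::size_t i = 0; i < n; ++i)
    if (dep[i][i])
      return true;                             // attribute i depends on itself
  return false;
}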

Other kinds of checks are performed, like producing warnings when the user writes something that could be generated by the attribute propagator. To do this, when the propagator is about to add a rule, it looks whether it already exists and reports it to the user if the code is similar.

The checker part can be disabled by running attrsdf2table with the option --no-cycle.

Compared to the version implemented in C++, it no longer checks whether the attribute grammar is well-defined, because the attribute propagation already does that for each attribute in the grammar.

The transitive closure done on each production by the checker could probably be reused to automatically add where clauses that add dependencies between attributes. This point is totally useless in other AG systems because they are not bothered by ambiguities. It could also become useless in the case of Transformers if we enable the possibility of synthesizing more than one value through an ambiguity.

2.1.8 Compiler

The compiler, named attrc, compiles a parse table with attribute rules into Stratego code (a Stratego AST). The evaluator can be generated for either parse trees or Abstract Syntax Trees (ASTs) by specifying the option --ast. At the moment the code is a bit messy and it should be organized into modules to separate the goal of each step. The best thing would be to reuse the code written to manipulate the parse table as a graph.

1  E -> S {cons("StartSymbol"),
2    attributes (process:
3      E.lr_value_inh := !0
4      root.trash := !E.lr_value_syn
5  )}
6
7  Star -> E {cons("Expr"),
8    attributes (process:
9      local.tmp := !root.lr_value_inh
10     root.lr_value_syn := <add> (local.tmp, 42)
11 )}
12
13 "*" -> Star {cons("Star"),
14   attributes (process:
15     local.input := !root.lr_value_inh
16 )}

Listing 2.4: Small compilable AG

Listing 2.4 is an example which will support the explanation and help to understand the relation between the generated evaluator and the AG. The syntax is described in section 3.1.

The generated file is separated into four parts:

• The first part adds the imports and constructors required by the evaluator. The list of imports can be extended with the option --imp of attrsdf2table. This option enables the user to add extra imports containing the strategies used in his AG.

• The second part contains the list of grammar productions that create the binding traversal. The binding traversal is a top-down traversal which annotates each production of the grammar with its attributes.

1  compute-E5253sort45cf(|
2    p_attrib-root-process-lr_value_syn
3  , p_attrib-root-process-lr_value_inh) =
4    ( (?underamb(<id>) < ?is-amb + id)
5    ; ( appl(
6          prod(?[cf(sort("Star"))], ?cf(sort("E")), id)
7        , id
8        )
9      ; where(!"" => amb-name)
10     )
11   ; where(
12       ![ syn(p_attrib-root-process-lr_value_syn)
13        , inh(p_attrib-root-process-lr_value_inh)]
14     ; if !is-amb then duplicate-attributes else copy-attributes end
15     ; ?[attrib-root-process-lr_value_syn
16        , attrib-root-process-lr_value_inh]
17     )
18   )
19   ; where(
20       !attrib-root-process-lr_value_inh
21     ; ?attrib-rule_1-process-lr_value_inh@attrib-local-process-tmp@_)
22   ; add-attributes(
23       [ amb-compute(
24           compute-Star5253sort45cf(|attrib-rule_1-process-lr_value_inh)
25         | amb-name
26         )
27       ]
28     | [ (("process", "lr_value_inh"), attrib-root-process-lr_value_inh)
29       , (("process", "lr_value_syn"), attrib-root-process-lr_value_syn)
30       , (("process*", "tmp"), attrib-local-process-tmp)
31       ]
32     ) => tree
33   ; where(
34       formula(
35         b_0
36       | attrib-root-process-lr_value_syn
37       , (amb-name, tree, [attrib-local-process-tmp])
38       )
39     )

Listing 2.5: Small compiled AG

Listing 2.5 shows Stratego code extracted from the generated evaluator, corresponding to the production Star -> E of listing 2.4.

Strategies are named after the symbol produced in the grammar (here, compute-E5253sort45cf at line 1). This implies that two productions that produce the same symbol have the same strategy name. Generating many strategies with the same name uses the extending-definition property of Stratego. These strategies are generated to add the attributes and the attribute rules.

Attribute rules are added to the tree in each strategy corresponding to a production of the grammar. The correspondence is made by matching the production in the AsFix format (here, at line 6). Attributes are declared with the new-value strategy of the str-lazy library, and attributes that are only copied are aliased (here, at line 21). The attribute root.lr_value_syn, located at line 10 of listing 2.4, cannot be aliased. The value of this attribute is computed with a formula that is registered at line 34.

Aliases are determined with a topological sort on the attribute dependencies in the tree, where only copy rules are kept. All attributes on the produced symbol (here, at line 2) are considered as already bound to a lazy value and cannot be aliased together. Aliases remove approximately 70% of the rules for the C and C++ grammars.

The top-down traversal is executed by the strategy add-attributes (here, at line 22), which puts the attributes on the tree (here, at line 28) and continues adding attributes on the children of the production (here, represented by a list of strategy calls at line 24).

• The third part contains all attribute rules converted to strategies. Listing 2.6 contains the only formula used in listing 2.5, at line 35. This code is exactly the same as the code from line 10 of listing 2.4, except that a header has been added. The header part of the strategy is used to make the debug information accessible inside the amb-name variable and to retrieve the value of the given attribute with the getv strategy. The current term of the user code is the attributed tree created at line 32 of listing 2.5.

1 b_0 =
2   (id, id, [getv])
3   ; ?(amb-name, <id>, [attr_rule_var_0])
4   ; <add> (attr_rule_var_0, 42)

Listing 2.6: Small compiled AG


In our system, attribute rules are not lazy at all, because all lazy values are extracted from the input term before the evaluation of the rule written by the user (listing 2.6, line 2).

• The last part contains all the hard-coded strategies that are used to evaluate the whole tree and to handle ambiguities, like the strategies add-attributes and amb-compute that appear at line 22 of listing 2.5. This part exists for both the parse tree version of the evaluator and the AST version. It should be moved into an import to remove useless pretty-printing.


Chapter 3

Transformers’ AG Usage

Like everything new in software engineering, documentation is sparse and interfaces are changing rapidly. In this chapter, the usage of our system in its current version will be explained.

3.1 Syntax

3.1.1 Basic Syntax

1 l:Line+ -> Document
2   {cons(Document),
3    attribute (process:
4      l.use_dt := !True()
5      root.tags := <filter(?Tag(_))> l.doc:content
6   )}

Listing 3.1: Snippet of attribute rules

In listing 3.1, line 1 is a classical production rule, written in SDF. Lines 2 to 6 are SDF annotations. Line 2 is a well-known SDF annotation, used to specify a constructor. Lines 3 to 6 consist of a custom annotation which specifies the attribute evaluation rules.

Each attribute annotation is put in a namespace (here, the namespace is process, at line 3). Attributes are specified using a customized notation, symbol-name.[namespace:]name. The symbol-name represents a non-terminal name in the production rule. The namespace refers to the place where an attribute lives and can be accessed, as attributes can be put in different namespaces. If it is not specified, the attribute is fetched from the current namespace (here, process). The name is the attribute name and can be whatever the user wants.

Using an SDF alias to specify the symbol name is useful when the non-terminal is not a valid Stratego identifier. For instance, here we used l as an alias for Line+, because "+" is not a valid character for an identifier. The symbol root (line 5) is special, and refers to the non-terminal that the production rule produces. Here, root must be written instead of Document.

On the left-hand side of ":=", there must be an attribute that will contain the result of the rule evaluation. On the right-hand side, Stratego code, optionally with references to attribute names, is allowed. Almost all Stratego constructs can be used to specify an attribute rule. Because of the desugaring from this syntax to real Stratego, some limitations exist. For example, Stratego keywords cannot be used as attribute names.


A node can set or retrieve its own attributes (using root) or its children’s attributes (using a child non-terminal or alias, like l in the example). It cannot access its parent’s attributes; that is, it is the parent’s responsibility to give attributes to, or retrieve attributes from, its children.

To factor some code, a local attribute can be used. Local attributes have a scope restricted to the current production and cannot be retrieved from another node. They are used by replacing root with local. The local symbol name exists to ensure that the attribute cannot be used by another production. The root symbol name can be used to create local attributes as well, but then the system cannot ensure that they are not used by another node.

3.1.2 Inherited Patterns

1  context-free meta-syntax
2    scope <diff, merge> =
3      Open A Close -> B
4      { attribute (process:
5          root.lr_table_syn :=
6            <diff> (A.lr_table_syn, root.lr_table_inh) => diff_table
7            ; <merge> (root.lr_table_inh, diff_table)
8      )}
9
10 context-free syntax
11   "{" Stm* "}" -> Stm {
12     inherit ( scope<id, Fst> )
13   }

Listing 3.2: Declaration of production pattern

In listing 3.2, there are two sections. The first part, from lines 1 to 9, is used to declare patterns. The second part, from lines 10 to 13, is a usual SDF section used to declare the grammar.

Each pattern has a name (here, the name is scope, at line 2) and a list of arguments enclosed between the symbols < and >, followed by an equals sign. Arguments are separated by commas. The enclosure can be omitted if there are no arguments. Arguments are strategies written in Stratego code.

Production patterns (like the one at line 3) look like SDF productions. When a production pattern is used with an inherit annotation (like at line 12), the annotated production and the pattern’s production are unified, and symbol names are replaced in the production annotations of the pattern. After this renaming step, all annotations are included as annotations of the derived production.

Therefore, in listing 3.2, the production at line 11 defines a scope for the attribute table.


3.1.3 Attribute Declaration

Attribute declaration is an optional section; it is not necessary to declare attributes. Usually it is helpful to ensure properties, and to avoid having to encode the attribute propagation in the name of the attribute.

1  attributes
2    table {
3      propagation-type LR
4      assertion check-content (
5        map( (is-string, id) )
6      )
7    }
8
9  context-free syntax
10   "{" Stm* "}" -> Stm {attribute (process:
11     root.table_syn := root.table_inh
12 }

Listing 3.3: Overview of attribute declaration

Listing 3.3 shows the declaration of the attribute table, which is later used to create a scope.

Attributes are declared in an attributes section which contains the list of declared attributes. An attribute declaration starts with the name of the attribute (here, the name is table, at line 2) and is followed by the list of properties enclosed in brackets.

At the moment there are only two kinds of properties. The propagation-type property (line 3) can be followed by one of TD, BU or LR, to define respectively that the attribute is Top-Down, Bottom-Up or Left-to-Right. In the LR case, the suffixes inh and syn have to be specified (like at line 11). The assertion property (line 4) is followed by the name of the property and then by a strategy contained in a pair of parentheses. Assertions check the value of the produced attribute and produce an error message if the value does not satisfy the assertion.

3.2 Automatic rules propagation

3.2.1 Tables

Some things are recurrent. A repetitive task1 is to pass attributes up and down, without modifying them. This is for instance heavily used for context tables, because the same context table must be accessible from a lot of nodes in the AST. In example 3.4, an attribute, Foo.ids, is added to an existing table, initialized at A -> Start, then returned to it. The attribute rules attached to C -> B and B -> A are redundant, because they are exactly what a programmer would expect if they were not present.

Like all cut-and-paste jobs, it is better to find a way to avoid it, either by factoring or, in this case, by allowing attribute writers to use some kind of “magic”. Example 3.5 shows the cleaned-up way. Here, an attribute with a special prefix (lr_) is used. In each production rule, it exists in two flavors: lr_<attr-name>_in, which represents the inherited attribute, and lr_<attr-name>_syn for the synthesized one.

1 In the C grammar, about 80% of the rules fall into this category.

Foo -> C
  { attribute (ns:
      root.up := <conc> (root.down, Foo.ids)
  )}
C -> B
  { attribute (ns:
      C.down := root.down
      root.up := C.up
  )}
B -> A
  { attribute (ns:
      B.down := root.down
      root.up := B.up
  )}
A -> Start
  { attribute (ns:
      A.down := ![]
      root.res := A.up
  )}

Listing 3.4: Redundant attributes

Foo -> C
  { attribute (ns:
      root.lr_table_syn := <conc> (root.lr_table_in, Foo.ids)
  )}
C -> B
B -> A
A -> Start
  { attribute (ns:
      A.lr_table_in := ![]
      root.res := A.lr_table_syn
  )}

Listing 3.5: Less attributes, more fun

Despite the name “magic”, there is nothing weird about this propagation. When an attribute is inherited to a non-terminal, or synthesized from another non-terminal, evaluation rules are added to all the production rules below. This process stops whenever the user explicitly uses this attribute in evaluation rules. It is interesting to see how listing 3.6 is expanded into listing 3.7.

A B C -> D

D -> Start
  { attribute (ns:
      D.lr_table_in := ![]
  )}

Listing 3.6: Before expansion

The lr_ prefix was not chosen randomly: it means that attributes do a left-to-right traversal of the tree, as shown in figure 3.8.


A B C -> D
  { attribute (ns:
      A.lr_table_in := root.lr_table_in
      B.lr_table_in := A.lr_table_syn
      C.lr_table_in := B.lr_table_syn
      root.lr_table_syn := C.lr_table_syn
  )}

D -> Start
  { attribute (ns:
      D.lr_table_in := ![]
  )}

Listing 3.7: After expansion

Figure 3.8: Left hand attribute traversal. (Inherited and synthesized attributes thread left to right through the children A, B and C of D.)

3.2.2 Lists

Another interesting and useful2 way to extend the automatic propagation rules would be to extend them to lists. Lists are only SDF sugar; they are actually implemented as a binary tree (see figure 3.9).

Let us take an example, as shown in listing 3.10. After sdf2table, it is desugared and additional rules are introduced, as in listing 3.11.

Id -> L
L+ -> Start

Listing 3.10: A tiny SDF grammar using a list.

The problem is that when attributes are passed from the rule L+ -> Start to Id -> L (and vice versa), we have to explicitly desugar this list and add rules for all eight productions listed in 3.11.

Take a breath, and let us state exactly what the desired behavior would be:

• Bottom-up traversal: All attributes are fetched from the leaves and concatenated together as the lists are joined. Only propagation of synthesized attributes would be required. This is the most useful feature to be developed, as it is often needed. For instance, in the C grammar it would be used to fetch a list of declaration specifiers, as they are totally unrelated and do not need the context.

2 From our experience writing C-grammar attributes.


Figure 3.9: A list, seen from the AsFix viewpoint. (The L+ list is a binary tree of L+ nodes whose leaves are L nodes, each deriving an Id.)

Id -> L
   -> L*
L+ -> L*
L* L+ -> L+
L+ L* -> L+
L+ L+ -> L+ {left}
L -> L+
L+ -> Start

Listing 3.11: Example of listing 3.10 desugared.

• Top-down traversal: Used when some information must be sent to all the leaves, with the requirement that this information is independent from the other leaves. Only propagation of inherited attributes would be required. Albeit easy to implement, its usefulness remains to be proved.


Chapter 4

Transformers’ AG syntax proposal

The attribute syntax used in Transformers is not perfect for what we must do, in spite of everything already done to improve it. Some improvements will be proposed to make the developers’ work easier. Considerations about improvements of the current attribute syntax are split in two parts: those which stay entirely within the way of thinking of the current syntax, and those which consist of deeper syntax modifications.

Some problems and hints on how to develop these extensions will be discussed in the next chapter; the author leaves it as an open case for his successors.

4.1 Default value/code for horizontal propagation

The first improvement comes from a simple observation. We can see in listing 4.1 that all production rules have TypeSpecifier on their right-hand sides. To manage the particular case of the class typedef, the key attribute must be propagated upwards, which is done by the first attribute rule. The rules which have TypeSpecifier on their left-hand sides propagate the key attribute to detect when a class typedef is used, and some computations are done on upper attributes in the grammar. So the key attribute must be set for every possible TypeSpecifier, even if there is no class typedef. Consequently, all production rules with TypeSpecifier on their right-hand sides must assign their key attribute. Only three of those rules are written here, but there are three more in the C++ grammar. This pattern can be found at several locations in the C++ grammar and requires adding a lot of attribute rules which we do not want to write. We only want to write the interesting attribute rule (the first one) and know what the default value is if no value is explicitly set.

A possible answer to this problem is a new section named attributes, used to specify the default attribute value or code. Listing 4.2 shows the template for a grammar file including the new section. All default values are brought together in a place that is easily reachable from the production rules using them, because they are in the same file. The interesting attributes are in the grammar, directly at the location where they are important and really used. In the future, if we find that some other information about attributes should be specified for some reason, this section could play that role too. For instance, attributes could be explicitly specified as synthesized or inherited, à la UU-AG.

The TypeSpecifier part should be optional. If it is not specified, all attributes named key would have a default value of None().


%% 7.1.5 [dcl.type]
ClassSpecifier -> TypeSpecifier
  { attributes (disamb:
      root.key := ClassSpecifier.key
  )}

SimpleTypeSpecifier -> TypeSpecifier
  { attributes (disamb:
      root.key := None()
  )}

EnumSpecifier -> TypeSpecifier
  { attributes (disamb:
      root.key := None()
  )}

[...]

Listing 4.1: Default value/code for horizontal propagation

module TypeSpecifiers

imports
  [...]

exports
  sorts
    TypeSpecifier
  [...]

  attributes
    default [TypeSpecifier.]key = None()

  [...]

  context-free syntax
    ClassSpecifier -> TypeSpecifier
    [...]

Listing 4.2: Default value - proposed solution

4.2 Vertical propagation

We very often have to propagate attribute values from node to node. This represents vertical propagation in the sense of the grammar, i.e. the AST. Listing 4.3 is an example of such a phenomenon. This is another polluting thing, because the interesting attribute rules are not those which propagate but those which give information. So the proposal here is to write interesting attributes only, and to automatically add propagation rules in the same way as for the table ones previously seen in subsection 3.2.1.

Note that research has been done to avoid this kind of heavy process. For instance, FNC-2 automatically detects when rules are missing and adds them implicitly. Some people extend canonical attribute grammars for several purposes, and Reference Attribute Grammars (RAGs) (Hedin, 1999) offer a way to avoid our explicit and inelegant attribute propagations (even if they were not created for this). With those attribute grammars, attributes may be references to other attributes (or non-terminals), so propagations become useless, and this is probably a better solution. Nevertheless, using a RAG in Transformers is not so easy because of some techniques used, such as branch pruning, implying that at some point some references refer either to nothing or to several things at a time. So handling reference attributes implies some extra run-time computations to suit our system.

ClassSpecifier -> TypeSpecifier
  { attributes (disamb:
      root.key := ClassSpecifier.key
  )}

TypeSpecifier -> DeclSpecifier
  { attributes (disamb:
      root.key := TypeSpecifier.key
  )}

list: DeclSpecifier+ -> DeclSpecifierSeq
  { attributes (disamb:
      root.key := list.key
  )}

ds:DeclSpecifierSeq? il:InitDeclList? ";" -> SimpleDecl
  { attributes (disamb:
      [...]
  )}

Listing 4.3: Vertical propagation
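As a rough model of the proposed automatic insertion of copy rules, the following Haskell sketch (illustrative only; it is not the actual Stratego implementation, and the names Production, propagated and complete are invented) adds a root.a := child.a rule for every propagated attribute that a production does not define explicitly:

-- A production with its left-hand-side child (SDF style: child -> parent)
-- and the attribute rules explicitly written for it.
data Production = Production
  { child    :: String              -- e.g. "TypeSpecifier"
  , parent   :: String              -- e.g. "DeclSpecifier"
  , explicit :: [(String, String)]  -- attribute name, right-hand side
  }

-- Attributes that must be available on every node of the chain.
propagated :: [String]
propagated = ["key"]

-- Complete a production: keep the rules written by hand and add a copy
-- rule "root.a := <child>.a" for every missing propagated attribute.
complete :: Production -> [(String, String)]
complete p =
  explicit p
    ++ [ (a, child p ++ "." ++ a)
       | a <- propagated
       , a `notElem` map fst (explicit p) ]

main :: IO ()
main = do
  -- ClassSpecifier -> TypeSpecifier defines key explicitly...
  print (complete (Production "ClassSpecifier" "TypeSpecifier"
                              [("key", "ClassSpecifier.key")]))
  -- ...while TypeSpecifier -> DeclSpecifier gets the copy rule for free.
  print (complete (Production "TypeSpecifier" "DeclSpecifier" []))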

4.3 Tables

Tables are a very debatable subject. They are used everywhere and have an arcane syntax. A short example of what must be written to use tables is shown in listing 4.4.

The OLGA language, which is designed for attribute handling, does not support tables natively, but it lets us write the types and functions needed to manipulate them as we want. Vivien (1997) wrote an E-LOTOS compiler with FNC-2 and implemented symbol tables in OLGA; this is a huge amount of work and representative of the potential of the system. His table implementation is very intuitive and consists of a set of functions to manipulate data easily. So there is no magic or universal solution: we must use the constructs proposed by the language and add sugar on top if necessary. Several styles can be chosen: the first proposal for a table syntax (listing 4.5) uses a style based on operators, whereas the second one (listing 4.6) is more functional.

A.lr_table_in := ![ B.str | C.lr_ns_in ]
A.lr_table_in := ![ (B.name, B.str, "a") | C.lr_table_syn ]

B.string := !A.lr_table_in; filter (("a", "b", ?str)); str

Listing 4.4: Tables

A.lr_table_in := B.str + C.lr_ns_in
A.lr_table_in := (B.name, B.str, "a") + C.lr_table_syn

A.lr_table_in[ ("a", "b", ?str) ]
B.str := str

Listing 4.5: First proposal for table syntax


A.lr_table_in := insert C.lr_ns_in B.str
A.lr_table_in := insert C.lr_table_syn (B.name, B.str, "a")

lookup A.lr_table_in ("a", "b", ?str)
B.str := str

Listing 4.6: Second proposal for table syntax
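Whatever surface syntax is finally chosen, the underlying semantics stays simple. The Haskell sketch below (an informal model tied to neither proposal; the entry layout and the function names are assumptions) shows what insertion and wildcard lookup amount to on a list of tuples:

-- A context table is modelled as a list of (type name, variable name, tag) entries.
type Entry = (String, String, String)
type Table = [Entry]

-- Insertion: the new entry is simply consed onto the inherited table.
insertEntry :: Table -> Entry -> Table
insertEntry table entry = entry : table

-- Lookup with a wildcard on the third field, as in ("a", "b", ?str):
-- the two fixed fields must match, and the wildcard field is returned.
lookupWild :: String -> String -> Table -> Maybe String
lookupWild a b table =
  case [ s | (x, y, s) <- table, x == a, y == b ] of
    (s:_) -> Just s
    []    -> Nothing

main :: IO ()
main = do
  let t = insertEntry [] ("a", "b", "value")
  print (lookupWild "a" "b" t)  -- Just "value"
  print (lookupWild "a" "c" t)  -- Nothing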

4.4 Ambiguous synthesizing

When synthesizing attributes over an ambiguity node, the branches may carry different values, and there is no way to determine which one is the right one.

For a long time this was a problem, forcing the user to set a special value, invalid(), in the branch whose value he did not want returned. This was cumbersome and added a lot of uninteresting code to the attribute rules.

However, on an ambiguity node only one branch is valid, so only the attributes of that branch should be candidates for passing upward. The AG system now works this way, and the user no longer has to care about it.
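As an informal illustration of this behavior (a toy Haskell model, not the actual Stratego code; Branch, valid and synthesize are invented names), synthesizing over an ambiguity node amounts to keeping only the value of the branch that was not pruned as invalid:

-- Each branch of an ambiguity node carries a validity flag (the result of
-- disambiguation) and the attribute value it would synthesize.
data Branch a = Branch { valid :: Bool, attr :: a }

-- Synthesize an attribute over an ambiguity node: only valid branches are
-- candidates; exactly one is expected once disambiguation has been done.
synthesize :: [Branch a] -> Maybe a
synthesize branches =
  case [ attr b | b <- branches, valid b ] of
    [v] -> Just v   -- the single surviving branch provides the value
    _   -> Nothing  -- no branch left, or the ambiguity is not yet resolved

main :: IO ()
main = print (synthesize [Branch False "enum-name", Branch True "type-name"])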

4.5 The woes of this AG

Quoting Valentin David, the main author of the Transformers AG system: “If you don’t know how it works internally, you can’t use it.”

That is, some parts are very tricky, and the overall desired behavior can easily be altered by some undocumented “feature”. To write a good specification, it is better to know how the system works internally.

Moreover, debugging tools are almost non-existent.¹ The compilation process is better than it was in the beginning: missing attributes are checked and reported, a stack trace is printed to locate errors, and typos are reported, to cite a few improvements.

¹ Hey, who said that God needed debugging tools to create Earth? :)


Reaching the efficiency of Transformers would require a lot of modifications. Moreover, if we consider using a dedicated tool set, we do not want to add an extra layer on top of it just to express our subtleties, particularly when this layer involves heavy work.

5.1.4 A comparative example

In this section, a comparative study of two AG systems used for disambiguation, the UUAG one and ours, is carried out. A small ambiguous grammar (listing 5.4) serves as a running example. It follows the Core language proposed by Vasseur (2004).

EnumName <- Identifier Identifier
TypeName <- Identifier Identifier
Use      <- TypeName | EnumName
Root     <- Use

Listing 5.4: Simple grammar presenting an ambiguity.

The ambiguity is easy to see: when a tree is derived from grammar 5.4, nothing tells us whether Use should have TypeName or EnumName as its child. Thus, it will have both.

Assume that the lines shown in listing 5.5 (not covered by our grammar) were parsed earlier. Now suppose that the first identifier of EnumName or TypeName denotes a type and the second a variable. Then, when the first Identifier (the type) is T, the node should be a TypeName; if it is E, it should be an EnumName; otherwise this is an error.

type T
enum E

Listing 5.5: Contextual information.

An example input is shown in listing 5.6. Here, the declaration of g will be an EnumName and that of r a TypeName.

E g
T r

Listing 5.6: Example input.

With Transformers

The grammar is pretty simple and does not need thorough explanation. It is presented in listing 5.7.

With UUAG

A special node that does not exist in the former grammar, AmbTE, has been added in listing 5.8.

Here, three attributes are declared (listing 5.9), each for a given subset of nodes: val and ok are synthesized, while table is both inherited and synthesized. The semantic rules start as follows (the remaining SEM rules are given after listing 5.7):

SEM Root
  | Root child.table = { [("E", "enum-name"), ("T", "type-name")] }


type:Identifier var:Identifier -> TypeName
  { attributes (disamb:
      root.ok :=
        <lookup> ((type.string, TypeName()), root.lr_table_in)
      root.lr_table_syn :=
        !root.ok; ![ (type.string, var.string) | root.lr_table_in ]
  )}

type:Identifier var:Identifier -> EnumName
  { attributes (disamb:
      root.ok :=
        <lookup> ((type.string, EnumName()), root.lr_table_in)
      root.lr_table_syn :=
        !root.ok; ![ (type.string, var.string) | root.lr_table_in ]
  )}

TypeName -> Use
EnumName -> Use

dus:Use+ -> Root
  { attributes (disamb:
      dus.lr_table_in := ![ ("T", TypeName()), ("E", EnumName()) ]
  )}

Listing 5.7: SDF grammar using attributes.

This example is not very clean: two different kinds of information get mixed in lr_table, type identification and variable identification.

SEM AmbTE
  | AmbTE    lhs.val = { if @first.ok then @first.val
                         else if @second.ok then @second.val
                         else "error" }

SEM TypeName
  | TypeName lhs.ok  = { filter (\(t, v) -> v == "type-name" && t == @type) @lhs.table /= [] }
             lhs.val = { @val ++ ":typedef" }

SEM EnumName
  | EnumName lhs.ok  = { filter (\(t, v) -> v == "enum-name" && t == @type) @lhs.table /= [] }
             lhs.val = { @val ++ ":enum" }

In these rules, semantic rules are attached to nodes. The context table was initialized above as if the lines listed in listing 5.5 had been parsed beforehand.

Listing 5.11 shows the main program. Haskell code between braces is passed verbatim to Haskell and contains the entry point. Two partial parse forests, example1 and example2, are created, containing the interesting ambiguities, and then processed by the AG system with the function sem_Root.

5.2 FNC-2

FNC-2 (Parigot, 1988) is an AG-processing system developed at INRIA. It accepts all strongly non-circular attribute grammars, but other AG types too, such as Dynamic Attribute Grammars (DAGs), a class of AGs subsuming circular ones (Parigot et al., 1996a,b).


DATA Root
  | Root     child:Use

DATA Use
  | Use      child:AmbTE

DATA AmbTE
  | AmbTE    first:TypeName second:EnumName

DATA TypeName
  | TypeName type:String val:String

DATA EnumName
  | EnumName type:String val:String

Listing 5.8: Node declaration

ATTR Root Use AmbTE TypeName EnumName [ | | val:String ]
ATTR Use AmbTE TypeName EnumName [ | table:{[(String, String)]} | ]
ATTR TypeName EnumName [ | | ok:Bool ]

Listing 5.9: Attributes declaration

5.2.1 Evaluators

FNC-2 uses generated evaluators based on the “visit-sequence” paradigm. This paradigm uses as much information as possible from the grammar to reach a high level of performance; furthermore, it allows very efficient space optimization techniques to be applied. Some results show that the generated evaluators are as efficient in time and space as hand-written programs using a tree as their internal data structure.
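To give an intuition of the visit-sequence idea, here is a hand-written Haskell toy (not code generated by FNC-2): each node is visited a fixed number of times, and each visit computes a predetermined set of attributes. The first visit computes a synthesized leaf count, the second uses an inherited position to number the leaves; a real evaluator would store the visit-1 results at the nodes instead of recomputing them.

-- A toy binary tree.
data Tree = Leaf | Node Tree Tree

-- Visit 1: purely synthesized, compute the number of leaves below a node.
visit1 :: Tree -> Int
visit1 Leaf       = 1
visit1 (Node l r) = visit1 l + visit1 r

-- Visit 2: uses the inherited position of the subtree to number its leaves
-- from left to right (recomputing visit1 for brevity).
visit2 :: Int -> Tree -> [Int]
visit2 pos Leaf       = [pos]
visit2 pos (Node l r) = visit2 pos l ++ visit2 (pos + visit1 l) r

main :: IO ()
main = print (visit2 0 (Node (Node Leaf Leaf) Leaf))  -- [0,1,2]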

FNC-2 is highly versatile and suits many situations.

Actually, the generated evaluators offer several evaluation modes:

• exhaustive (evaluating all attributes),

• incremental (evaluating only a set of attributes and saving the current state of the evaluation to resume later),

• parallel (using several computers with a shared memory to achieve a common evaluation).

Furthermore, these evaluators can be generated in several languages:

• C,

• Le_Lisp (on Centaur system),

• fSDL (on The CoSy Compilation System),

• Caml.


{
-- The string "T g" was parsed
example1 :: Use
example1 = Use (AmbTE (TypeName ("T") ("g")) (EnumName ("T") ("g")))

-- The string "E g" was parsed
example2 :: Use
example2 = Use (AmbTE (TypeName ("E") ("g")) (EnumName ("E") ("g")))

main :: IO ()
main = do putStr "result1: "
          print (sem_Root (Root example1))
          putStr "result2: "
          print (sem_Root (Root example2))
}

Listing 5.11: Main program

5.2.2 Tools

FNC-2 eases the use of attribute grammars through its flexibility, but it also provides many tools that make developers’ lives easier:

• a generator of abstract tree constructors driven by parsers (ATC), implemented in two ways: one on top of SYNTAX, and one on top of Lex and Yacc,

• a generator of unparsers of attributed abstract trees (PPAT), based on the TeX-like notion of nested boxes of text,

• an interactive circularity trace system (XVISU),

• source-level optimizations of AGs resulting from descriptional composition,

• . . .

5.2.3 OLGA

FNC-2 uses OLGA, a language designed to handle attribute grammars. It is a strongly typed functional language, similar to Caml, but it is also designed to support writing semantic rules (also known as attribute rules) attached to their nodes in the abstract tree. Consequently, attributes are typed, like all OLGA identifiers. As we have seen before, many evaluation schemes can be used, but this is independent of how the attributes are written: performance is achieved without loss of expressiveness.

Moreover, many helpful features ease the work of developers, such as default copy rules for attributes, an abbreviated syntax for the root node of the local tree, and predefined attributes (for instance, to record the location of the symbols in the input source).

We can easily set apart the attribute declarations, which are in the header, from the grammar rules and their associated attribute rules, which are in the body. The focus here is on the grammar aspect of the language, so the examples below use neither control flow statements, type declarations, functions, nor any other advanced construct.

An example of an attribute grammar header in OLGA is shown in listing 5.12. Attribute declarations follow the keyword attribute. Each attribute is typed and qualified as synthesized or inherited. Moreover, the nonterminals that own an attribute are specified at its declaration.


attribute
  synthesized $s (A, B) : int;
  [...]

Listing 5.12: OLGA header

The body of the OLGA file holds the production rules. Listing 5.13 shows the declaration of a production rule and its associated attribute rules.

where A -> B1 B2 B3 use
  $s := $s(B1) + $s(B2) + $s(B3);
  [...]
end where

Listing 5.13: OLGA rules

5.2.4 Incorporate it in Transformers

The attribute rules would have to be entirely rewritten in OLGA if we considered using FNC-2. Even though OLGA is yet another domain-specific language to learn, it is designed to handle attributes nicely.

Nevertheless, FNC-2 shares some problems with UU-AG when it comes to transposing the system into the Transformers world.

FNC-2 and Transformers share some syntactic abbreviations, such as automatic propagation and a special syntax for the root node, but in FNC-2 attributes and attribute rules are not embedded in the grammar, whereas this embedding is a strong advantage of Transformers.

A very important point in favor of FNC-2 is its flexibility in evaluations and evaluators. The multi-visit paradigm uses information from the grammar to evaluate attributes efficiently, both in speed and in memory usage. In fact, the type of a dependency node can be deduced from the type of the node being processed, so the order of evaluation can be computed at compile time.

The incremental mode could be very useful if associated with a cache system for the include mechanism of C/C++. The result of evaluating standard headers, and more generally headers used more than once, could be saved in the cache and reused later when the same file is included. However, this scheme cannot express its real power because of the high context sensitivity of the C/C++ grammar: the disambiguation of the code held by two includes can depend on the order of the includes in question.
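As a rough sketch of such a cache (a hypothetical design, not something implemented in Transformers; all names are invented), the key would have to combine the header with a fingerprint of the disambiguation context in force at the point of inclusion, which is precisely what limits its effectiveness:

import qualified Data.Map as Map

-- Placeholders: the disambiguation context reduced to a comparable
-- fingerprint, and the result of evaluating a header in that context.
type ContextFingerprint = String
type EvaluationResult   = String

type Cache = Map.Map (FilePath, ContextFingerprint) EvaluationResult

-- Reuse a previous evaluation only if the same header was already seen
-- under the same context; otherwise evaluate it and remember the result.
evaluateHeader :: (FilePath -> ContextFingerprint -> EvaluationResult)
               -> Cache -> FilePath -> ContextFingerprint
               -> (EvaluationResult, Cache)
evaluateHeader evaluate cache header ctx =
  case Map.lookup (header, ctx) cache of
    Just r  -> (r, cache)
    Nothing -> let r = evaluate header ctx
               in (r, Map.insert (header, ctx) r cache)

main :: IO ()
main = do
  let eval h c     = "evaluated " ++ h ++ " under " ++ c
      (r1, cache1) = evaluateHeader eval Map.empty "vector" "ctx-A"
      (r2, _)      = evaluateHeader eval cache1    "vector" "ctx-A"  -- cache hit
  mapM_ putStrLn [r1, r2]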

Sadly, the FNC-2 project died in 1998, when its main author left the team. There is no more support, so if we find bugs in the system we will be on our own to fix them. Even worse, we cannot find the sources of the project anywhere. However, there are many very interesting and highly detailed technical papers describing new approaches based on special kinds of attribute grammars and specific evaluation strategies (mixing data-flow and multi-visit evaluations by determining the maximal sub-multi-visit parts of the grammar, . . . ). Moreover, many projects used FNC-2, and OLGA seemed to be a very good way to write attribute grammars.


Conclusion

In this paper, we only talked about the attribute grammar system: its usage, its implementation, and possible extensions. We believe that our approach is sound, but some work is still pending for the Transformers project.

On the one hand, in our experience, attribute writing should be improved; several possibilities were explored, for lists and for context tables. On the other hand, the implementation itself is quite good today, but it needs some lifting to deal with problems like the explosion in time and memory of the compilation process. Some aspects and solutions were developed in this paper, although there are many others.

Anyway, we should not forget that disambiguation is not the main objective of Transformers; it is just a means. More references to other works on Transformers can be found in the (somewhat outdated) Transformers technical report of Anisko et al. (2003), or in David (2004). Moreover, real transformations¹ should not be forgotten, like Despres (2004) and ?.

¹ Cf. the introduction footnote.


Chapter 6

Bibliography

Anisko, R., David, V., and Vasseur, C. (2003). Transformers: a C++ program transformation framework. Technical report, LRDE.

Baars, A., Swierstra, D., and Löh, A. (1999). UU-AG System. http://www.cs.uu.nl/wiki/Center/AttributeGrammarSystem.

Borghi, A. (2005). Attribute grammar: A comparison of techniques. Technical report, LRDE.

David, V. (2004). Attribute grammars for C++ disambiguation. Technical report, LRDE.

Demaille, A., Durlin, R., Pierron, N., and Sigoure, B. (2008). Automatic attribute propagation for modular attribute grammars. Submitted.

Demaille, A., Largillier, T., and Pouillard, N. (2005). ESDF: A proposal for a more flexible SDF handling. Communication to Stratego Users Day 2005.

Despres, N. (2004). C++ transformations panorama. Technical report, LRDE.

Durlin, R. (2007). Semantics driven disambiguation. Technical report, EPITA Research and Development Laboratory (LRDE).

Durlin, R. (2008). Semantics driven disambiguation: A comparison of different approaches. Technical report, EPITA Research and Development Laboratory (LRDE).

Gournet, O. (2004). Progress in C++ source preprocessing. Technical report, LRDE.

Hedin, G. (1999). Reference attributed grammars.

ISO/IEC (1999). ISO/IEC 9899:1999 (E). Programming languages - C.

ISO/IEC (2003). ISO/IEC 14882:2003 (E). Programming languages - C++.

Löh, A. (joint work with S. D. Swierstra and A. Baars) (2005). Attribute Grammars in Haskell with UUAG. www.cs.ioc.ee/~tarmo/tsem04/loeh-slides.pdf.

Kalleberg, K. T. and Visser, E. (2006). Strategic graph rewriting: Transforming and traversing terms with references. In Proceedings of the 6th International Workshop on Reduction Strategies in Rewriting and Programming, Seattle, Washington. Accepted for publication.

Knuth, D. E. (1968). Semantics of context-free languages. Mathematical Systems Theory, pages 127–145.

Largillier, T. (2005). Proposition d’un support pour les notions de programmation avancées en C++. Technical report, LRDE.

Parigot, D. (1988). FNC-2 Attribute Grammar System. http://www-rocq.inria.fr/oscar/www/fnc2/.


Parigot, D., Roussel, G., Jourdan, M., and Duris, E. (1996a). Dynamic attribute grammars. Technical Report 2881, INRIA.

Parigot, D., Roussel, G., Jourdan, M., and Duris, E. (1996b). Dynamic attribute grammars.

Pierron, N. (2007). Formal Definition of the Disambiguation with Attribute Grammars. Technical report, EPITA Research and Development Laboratory (LRDE).

Rodeh, M. and Sagiv, M. (1999). Finding circular attributes in attribute grammars. J. ACM, 46(4):556–ff.

Vasseur, C. (2004). Semantics driven disambiguation: a comparison of different approaches. Technical report, LRDE.

Vivien, B. (1997). Étude et réalisation d’un compilateur E-LOTOS à l’aide du générateur de compilateurs SYNTAX/FNC-2.