The very least about Xtext
Juri Luca De Coi
Saint-Etienne, France, 16-05-2011
Outline
• Introduction
• How to specify the target language
– Terminal rules
– Parser rules
• How to specify the target AST
– The working example
– Return types
• Further features
• The header
• Getting Xtext up & running
Xtext (I)
Xtext is a language development framework
– i.e., a technology supporting the activity of developing languages
Given the Xtext grammar of a language, it provides you with the following for that language:
• a parser
• an AST builder
• a validator
• a scoping and linking framework
• a serializer and code formatter
• a code generator
• an Eclipse editor with
– content assistance
– quick fixes
– template proposals
– outline
– hyperlinking
– syntax coloring
– project wizard
Xtext (II)
The more you want, the more you have to pay, BUT
• if you are fine with the (reasonable) defaults, your amount of work will be pretty low
• otherwise, Xtext is highly configurable
– Each automatically generated class can be replaced in a non-invasive way
What do we want?
• A parser
• An AST builder
We will (almost) only focus on Xtext’s grammar language
Technical remark
Xtext is based on Ecore
Knowledge of Ecore is required to exploit Xtext’s full potential
Ecore is the core of the Eclipse Modeling Framework Project (EMF)
– EMF is “a modeling framework and code generation facility for building tools and other applications based on a structured data model”
I will try to leave Ecore out as much as possible
I will skip some (most) parts of Xtext
Xtext’s grammar language (XGL)
• A language to describe (textual) languages
• An Xtext grammar describes
– the syntax of the target language
– the structure of the target AST
The syntax of the target language
Not surprisingly, XGL distinguishes between
• the lexical level, which specifies the language’s tokens by means of keywords and terminal rules, and is exploited by the lexer (a.k.a. scanner or tokenizer)
• the syntactic level, which specifies the language’s grammar by means of parser rules, and is exploited by the parser
Terminal rules (I)
terminal NAME : expression ;
expression can contain
1. keywords
– single- or double-quoted
– can have any length
– can contain arbitrary characters
• including the escape sequences \b, \t, \n, \f, \r, \", \' and \\
• Unicode escape sequences (e.g., \u123) are not supported
EX: 'foo', "\foo", '"', "'"
Terminal rules (II)
2. wildcard (.)
– An arbitrary character
– EX: .
3. rule calls
– Terminal rules can only point to other terminal rules
– EX: ID (assuming that ID is the name of a terminal rule)
4. character ranges (..)
– Extremes are included
– EX: 'a'..'z', 'A'..'Z', '0'..'9'
Terminal rules (III)
5. until token (->)
– All input between the preceding and the following token (extremes are included)
– EX: '/*' -> '*/'
6. negated token (!)
– Any input different from the following token
– EX: !'\n'
7. cardinality operators (?, *, + or nothing)
– EX: '^'?, '\r'*, '9'+
Terminal rules (IV)
8. groups (token sequences)
– EX: 'a' . ID (assuming that ID is the name of a terminal rule)
9. alternatives (|)
– EX: ' ' | '\t' | '\r' | '\n'
Operator priority
Ordered by decreasing priority
• Character ranges: ..
• Until token, negated token: ->, !
• Cardinality operators: ?, *, + or nothing
• Groups: token sequences
• Alternatives: |
Parentheses (()) can override the default priorities
EX:
terminal ID:
  '^'?
  ('a'..'z'|'A'..'Z'|'_')
  ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
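To make the operators above concrete, here is the ID rule read as a small hand-written scanner. This is only a sketch: the class IdTerminal and its match method are hypothetical, and the lexer Xtext actually uses is generated (by ANTLR), not written like this. It shows how the optional caret, the character-range alternatives and the * cardinality combine.

```java
// Hypothetical hand-written matcher for the rule
// terminal ID: '^'? ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
final class IdTerminal {
    private IdTerminal() {
    }

    // ('a'..'z'|'A'..'Z'|'_'): alternatives of two character ranges and a keyword
    private static boolean isLetterOrUnderscore(char c) {
        return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || c == '_';
    }

    // '0'..'9': a character range, extremes included
    private static boolean isDigit(char c) {
        return '0' <= c && c <= '9';
    }

    /** Length of the ID token starting at pos, or 0 if ID does not match there. */
    static int match(String input, int pos) {
        int i = pos;
        if (i < input.length() && input.charAt(i) == '^') {
            i++; // '^'? : at most one caret
        }
        if (i >= input.length() || !isLetterOrUnderscore(input.charAt(i))) {
            return 0; // the mandatory first character is missing
        }
        i++;
        // (...)* : zero or more letters, underscores or digits
        while (i < input.length()
                && (isLetterOrUnderscore(input.charAt(i)) || isDigit(input.charAt(i)))) {
            i++;
        }
        return i - pos;
    }
}

class IdTerminalDemo {
    public static void main(String[] args) {
        System.out.println(IdTerminal.match("^grammar x", 0)); // 8
        System.out.println(IdTerminal.match("foo_1+bar", 0));  // 5
        System.out.println(IdTerminal.match("9lives", 0));     // 0
    }
}
```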
Technical remark
NOTE: Terminal rules can hide each other
The order of terminal rules is crucial
This is especially important when mixing
• newly introduced rules and
• rules from imported grammars (cf. below)
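A simplified Java model of how one terminal rule can hide another. It assumes an ANTLR-style lexer that prefers the longest match and, on equal length, the rule declared first; the OrderedLexer class and the LOWER rule are hypothetical. With ID declared first, LOWER can never produce a token.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified model of terminal-rule selection, ASSUMING an ANTLR-style lexer:
// the longest match wins, and ties go to the rule declared first.
final class OrderedLexer {
    // LinkedHashMap preserves the declaration order of the rules
    private final Map<String, Pattern> rules = new LinkedHashMap<>();

    OrderedLexer rule(String name, String regex) {
        rules.put(name, Pattern.compile(regex));
        return this;
    }

    /** Name of the rule that produces the token at the start of input, or null if none matches. */
    String winner(String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // strictly longer only: on equal length the earlier rule keeps the token
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }
}

class OrderedLexerDemo {
    public static void main(String[] args) {
        OrderedLexer idFirst = new OrderedLexer()
                .rule("ID", "[a-zA-Z_][a-zA-Z_0-9]*")
                .rule("LOWER", "[a-z]+");
        System.out.println(idFirst.winner("abc"));    // ID: LOWER is hidden and can never win
        OrderedLexer lowerFirst = new OrderedLexer()
                .rule("LOWER", "[a-z]+")
                .rule("ID", "[a-zA-Z_][a-zA-Z_0-9]*");
        System.out.println(lowerFirst.winner("abc")); // LOWER
        System.out.println(lowerFirst.winner("abC")); // ID (longer match)
    }
}
```

Swapping the declaration order lets LOWER win on all-lowercase input, while ID still wins whenever it matches more characters.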
Parser rules (I)
name : expression ;
expression can contain
1. keywords
– single- or double-quoted
– can have any length
– can contain arbitrary characters
• including the escape sequences \b, \t, \n, \f, \r, \", \' and \\
• Unicode escape sequences (e.g., \u123) are not supported
EX: 'foo', "\foo", '"', "'"
Parser rules (II)
2. rule calls
– EX: ID (assuming that ID is the name of a rule)
3. cardinality operators (?, *, + or nothing)
– EX: '^'?, '\r'*, '9'+
4. groups (token sequences)
– EX: 'a' ID
5. unordered groups (&)
– Elements can appear in any order but only once
– Elements with cardinality * or + must appear continuously without interruption
– EX: 'a' & ID*
Parser rules (III)
6. alternatives (|)
– EX: ' ' | '\t' | '\r' | '\n'
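The unordered group semantics can be sketched as a small Java recognizer. It is restricted to groups whose elements are distinct single keywords (e.g. the hypothetical group 'a' & 'b'*), and the names UnorderedGroup, Element and accepts are illustrative; Xtext compiles unordered groups into its generated parser instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified recognizer for an unordered group whose elements are distinct single keywords.
final class UnorderedGroup {
    static final class Element {
        final String keyword;
        final boolean repeatable; // cardinality * or +
        final boolean required;   // cardinality nothing or +

        Element(String keyword, boolean repeatable, boolean required) {
            this.keyword = keyword;
            this.repeatable = repeatable;
            this.required = required;
        }
    }

    static boolean accepts(List<Element> group, List<String> tokens) {
        Set<Element> used = new HashSet<>();
        int i = 0;
        while (i < tokens.size()) {
            Element next = null;
            for (Element e : group) {
                if (!used.contains(e) && e.keyword.equals(tokens.get(i))) {
                    next = e;
                    break;
                }
            }
            if (next == null) {
                return false; // unknown token, or an element that was already consumed
            }
            used.add(next);
            i++;
            if (next.repeatable) {
                // repetitions must follow each other without interruption
                while (i < tokens.size() && tokens.get(i).equals(next.keyword)) {
                    i++;
                }
            }
        }
        for (Element e : group) {
            if (e.required && !used.contains(e)) {
                return false; // a mandatory element never appeared
            }
        }
        return true;
    }
}

class UnorderedGroupDemo {
    public static void main(String[] args) {
        List<UnorderedGroup.Element> group = Arrays.asList(
                new UnorderedGroup.Element("a", false, true),  // 'a'
                new UnorderedGroup.Element("b", true, false)); // 'b'*
        System.out.println(UnorderedGroup.accepts(group, Arrays.asList("b", "b", "a"))); // true
        System.out.println(UnorderedGroup.accepts(group, Arrays.asList("b", "a", "b"))); // false
        System.out.println(UnorderedGroup.accepts(group, Arrays.asList("a", "a")));      // false
    }
}
```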
Operator priority
Ordered by decreasing priority
• Cardinality operators: ?, *, + or nothing
• Groups: token sequences
• Unordered groups: &
• Alternatives: |
Parentheses (()) can override the default priorities
EX:
Action:
  '{' TypeRef (
    '.' ID ('='|'+=') 'current'
  )? '}' ;
The structure of the resulting AST
• You typically (should) know the AST you want before defining a textual representation for it
• You will now learn how to instruct Xtext to build the ASTs you want
• Let us start with the classical example
Arithmetical expressions (I)
Arithmetical expressions (II)
• We have to define a corresponding textual representation
– i.e., we have to define a corresponding grammar
• To keep things easy, let us define a grammar which
– does not consider operator priorities
– does not consider operator associativity
– requires parentheses to be written explicitly
EX:
• not 1 + 2 * (3 - 4 / 5)
• but 1 + (2 * (3 - (4 / 5)))
Arithmetical expressions (III)
Expression ::= IntOrPar ( FactorSign IntOrPar | TermSign IntOrPar )?
IntOrPar ::= INT | '(' Expression ')'
INT ::= '0' | '1'..'9' '0'..'9'*
FactorSign ::= '*' | 'multiply' | '/' | 'divide'
TermSign ::= '+' | 'plus' | '-' | 'minus'
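Before looking at the AST, a minimal recursive-descent recognizer for the grammar above can help check which inputs it accepts. It is a sketch: the class ExprRecognizer and its tokenizer are assumptions, whitespace is simply dropped, and the two alternatives of the optional part are merged because both have the shape sign IntOrPar.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written recognizer for the grammar above (yes/no answer only, no AST yet).
final class ExprRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    private ExprRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Splits input into digit runs, letter runs and single-character symbols; whitespace is dropped. */
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isDigit(c)) {
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
            } else if (Character.isLetter(c)) {
                while (i < s.length() && Character.isLetter(s.charAt(i))) i++;
            } else {
                i++; // ( ) * / + - or any other single symbol
            }
            out.add(s.substring(start, i));
        }
        return out;
    }

    static boolean accepts(String input) {
        ExprRecognizer r = new ExprRecognizer(tokenize(input));
        return r.expression() && r.pos == r.tokens.size(); // all input must be consumed
    }

    private boolean accept(String... alternatives) {
        if (pos < tokens.size()) {
            for (String a : alternatives) {
                if (a.equals(tokens.get(pos))) {
                    pos++;
                    return true;
                }
            }
        }
        return false;
    }

    // Expression ::= IntOrPar ( FactorSign IntOrPar | TermSign IntOrPar )?
    private boolean expression() {
        if (!intOrPar()) return false;
        if (accept("*", "multiply", "/", "divide", "+", "plus", "-", "minus")) {
            return intOrPar();
        }
        return true; // the optional part is absent
    }

    // IntOrPar ::= INT | '(' Expression ')'   with   INT ::= '0' | '1'..'9' '0'..'9'*
    private boolean intOrPar() {
        if (pos < tokens.size() && tokens.get(pos).matches("0|[1-9][0-9]*")) {
            pos++;
            return true;
        }
        return accept("(") && expression() && accept(")");
    }
}

class ExprRecognizerDemo {
    public static void main(String[] args) {
        System.out.println(ExprRecognizer.accepts("1 + (2 * (3 - (4 / 5)))")); // true
        System.out.println(ExprRecognizer.accepts("1 + 2 * (3 - 4 / 5)"));     // false
        System.out.println(ExprRecognizer.accepts("7 multiply 6"));            // true
    }
}
```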
Arithmetical expressions (IV)
How to instruct Xtext to build this AST out of
1 + (2 * (3 - (4 / 5)))?
Return types
• Each rule should specify a return type
• The return type defaults to
– ecore::EString (for terminal and data type rules, cf. below)
– the rule’s name (otherwise)
IntOrPar returns Expression: … ;
terminal INT returns ecore::EInt: … ;
The Xtext framework will create
• a Java class for each return type that does not exist yet
The parser generated by the Xtext framework will create
• an instance of such a class whenever it applies a rule with that return type
Enumeration rules
TermSign and FactorSign are enumerations
• enumerations can be specified by (enumeration) rules
enum TermSign: PLUS='+' | PLUS='plus' |
MINUS='-' | MINUS='minus';
• If the literal (e.g., '+') is omitted, the value’s name is used as its literal
• The first enumeration value is the default one
The Xtext framework will create
• an enumeration for each enumeration rule
The parser generated by the Xtext framework will create
• an enumeration value whenever it applies the corresponding enumeration rule
It is (theoretically) possible
• to use alternative literals
• to reference a value twice
In practice, Xtext complains
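The intended mapping of the TermSign rule can be sketched as a plain Java enum. This only illustrates the semantics described above (two literals per value, first value as the default); Xtext itself generates an EMF enumeration, and the fromKeyword method is a hypothetical name.

```java
import java.util.Map;

// Sketch of the value mapping described by
//   enum TermSign: PLUS='+' | PLUS='plus' | MINUS='-' | MINUS='minus';
enum TermSign {
    PLUS, MINUS; // PLUS comes first, mirroring the rule in which the first value is the default

    // Each literal maps to one enumeration value; PLUS and MINUS are each referenced twice
    private static final Map<String, TermSign> KEYWORDS =
            Map.of("+", PLUS, "plus", PLUS, "-", MINUS, "minus", MINUS);

    /** Value produced when the enumeration rule matches the given literal. */
    static TermSign fromKeyword(String keyword) {
        TermSign value = KEYWORDS.get(keyword);
        if (value == null) {
            throw new IllegalArgumentException("not a TermSign literal: " + keyword);
        }
        return value;
    }
}

class TermSignDemo {
    public static void main(String[] args) {
        System.out.println(TermSign.fromKeyword("plus")); // PLUS
        System.out.println(TermSign.fromKeyword("-"));    // MINUS
        System.out.println(TermSign.values()[0]);         // PLUS
    }
}
```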
Terminal rules
• Terminal and data type rules (cf. below) return ecore::EString by default
• You probably want the following rule to return an integer
terminal INT: '0' | '1'..'9' '0'..'9'*;
To this end, you have to
1. declare the return type in the rule
terminal INT returns ecore::EInt: … ;
2. create a value converter (VC)
3. create a value converter service (VCS)
4. register the VC at the VCS
Creating a value converter
Create a class implementing IValueConverter<X>
  /* Responsible for the string-to-value conversion */
  X toValue(String, AbstractNode)
  /* Responsible for the value-to-string conversion */
  String toString(X)
• X is the return type of the grammar rule
• ValueConverterExceptions signal conversion errors
IValueConverter and ValueConverterException belong to package org.eclipse.xtext.conversion
AbstractNode belongs to package org.eclipse.xtext.parsetree
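A sketch of an INT converter. Because the real IValueConverter needs the Xtext runtime, it uses a hypothetical local interface, SimpleValueConverter, with the same two responsibilities, and a hypothetical ConversionException in place of ValueConverterException; the real toValue also receives a parse-tree node, which is omitted here.

```java
// Hypothetical stand-in for IValueConverter: same two responsibilities, no parse-tree node.
interface SimpleValueConverter<T> {
    T toValue(String text);   // string-to-value conversion
    String toString(T value); // value-to-string conversion
}

// Hypothetical stand-in for ValueConverterException.
class ConversionException extends RuntimeException {
    ConversionException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Converter for: terminal INT returns ecore::EInt: '0' | '1'..'9' '0'..'9'*;
class IntConverter implements SimpleValueConverter<Integer> {
    @Override
    public Integer toValue(String text) {
        if (text == null || text.isEmpty()) {
            throw new ConversionException("INT must not be empty", null);
        }
        try {
            return Integer.valueOf(text);
        } catch (NumberFormatException e) { // e.g. too large for an int
            throw new ConversionException("not a valid INT: " + text, e);
        }
    }

    @Override
    public String toString(Integer value) {
        // the INT rule has no minus sign, so negative values cannot be written back
        if (value == null || value < 0) {
            throw new ConversionException("cannot serialize as INT: " + value, null);
        }
        return Integer.toString(value);
    }
}

class IntConverterDemo {
    public static void main(String[] args) {
        IntConverter converter = new IntConverter();
        System.out.println(converter.toValue("42") + 1); // 43
        System.out.println(converter.toString(7));       // 7
    }
}
```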
Creating a value converter service
Create a class implementing IValueConverterService
• The easiest way is by extending AbstractDeclarativeValueConverterService
• Extend DefaultTerminalConverters if you imported grammar Terminals (cf. below)
IValueConverterService belongs to package org.eclipse.xtext.conversion
AbstractDeclarativeValueConverterService belongs to package org.eclipse.xtext.conversion.impl
DefaultTerminalConverters belongs to package org.eclipse.xtext.common.services
Terminals belongs to package org.eclipse.xtext.common
Registering VCs at VCSs
Declare as many VCS fields as IValueConverters you need
  @Inject
  private type name;
• type implements IValueConverter
• name is an arbitrary name
Declare as many VCS methods as grammar rules you handle
  @ValueConverter(rule = "rule")
  public IValueConverter<returnType> rule() {
      return converter;
  }
• rule is the name of the grammar rule
• returnType is the type returned by converter
• converter is the (injected) IValueConverter field responsible for rule
Inject belongs to package com.google.inject
ValueConverter belongs to package org.eclipse.xtext.conversion
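The declarative registration can be mimicked with plain Java reflection. In this sketch, the annotation ConverterFor, the class DeclarativeConverterService and the toValue(text, rule) method are hypothetical stand-ins for @ValueConverter and AbstractDeclarativeValueConverterService, and a java.util.function.Function replaces IValueConverter. The point is only that annotated methods are discovered by rule name, so a subclass just declares them.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for @ValueConverter: binds a method to a grammar rule name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ConverterFor {
    String rule();
}

// Minimal "declarative" service: annotated methods are found by reflection.
abstract class DeclarativeConverterService {
    private Map<String, Function<String, ?>> byRule; // built lazily, after subclass fields exist

    @SuppressWarnings("unchecked")
    private Map<String, Function<String, ?>> scan() {
        Map<String, Function<String, ?>> result = new HashMap<>();
        for (Method m : getClass().getMethods()) {
            ConverterFor annotation = m.getAnnotation(ConverterFor.class);
            if (annotation == null) {
                continue;
            }
            try {
                result.put(annotation.rule(), (Function<String, ?>) m.invoke(this));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot read converter for " + annotation.rule(), e);
            }
        }
        return result;
    }

    /** Converts text matched by the given grammar rule. */
    public Object toValue(String text, String rule) {
        if (byRule == null) {
            byRule = scan();
        }
        Function<String, ?> converter = byRule.get(rule);
        if (converter == null) {
            throw new IllegalArgumentException("no converter registered for rule " + rule);
        }
        return converter.apply(text);
    }
}

class MyConverterService extends DeclarativeConverterService {
    // plays the role of an @Inject-ed IValueConverter field
    private final Function<String, Integer> intConverter = Integer::valueOf;

    @ConverterFor(rule = "INT")
    public Function<String, Integer> INT() {
        return intConverter;
    }
}

class ConverterServiceDemo {
    public static void main(String[] args) {
        System.out.println(new MyConverterService().toValue("42", "INT")); // 42
    }
}
```

The lookup table is built lazily because a superclass constructor would run before the subclass fields (the stand-ins for injected converters) are initialized.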
Simple actions
IntOrPar returns Expression:
'(' Expression ')' |
{Integer} value=INT;
The Xtext framework will
• create a class Expression
• create a class Integer (extending Expression)
• add a field value of type ecore::EInt to class Integer
The parser generated by the Xtext framework will
• in the first case, return the Expression created by the called Expression rule
• in the second case
– create an Integer
– assign the parsed INT to its field value
– return the created Integer
The right-hand side of an assignment can be any of
• a rule call
• a keyword
• a cross-reference (cf. below)
• an alternative of the above
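The behaviour of the two alternatives can be sketched with a hand-written parser. To stay self-contained it assumes that Expression is just IntOrPar (so no operators) and renames {Integer} to IntegerLiteral to avoid a clash with java.lang.Integer; IntOrParParser is a hypothetical name. Parsing ((7)) yields a single IntegerLiteral, because the unassigned rule call in '(' Expression ')' creates no object of its own.

```java
// Sketch under simplifying assumptions: Expression is reduced to IntOrPar,
// and {Integer} is called IntegerLiteral to avoid clashing with java.lang.Integer.
abstract class Expression {
}

final class IntegerLiteral extends Expression {
    int value; // value=INT
}

final class IntOrParParser {
    private final String input;
    private int pos = 0;

    private IntOrParParser(String input) {
        this.input = input.replace(" ", ""); // crude stand-in for hidden whitespace
    }

    static Expression parse(String input) {
        IntOrParParser p = new IntOrParParser(input);
        Expression result = p.intOrPar();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return result;
    }

    // IntOrPar returns Expression: '(' Expression ')' | {Integer} value=INT;
    private Expression intOrPar() {
        if (pos < input.length() && input.charAt(pos) == '(') {
            pos++;
            Expression inner = intOrPar(); // "Expression" reduced to IntOrPar in this sketch
            expect(')');
            return inner; // unassigned rule call: the called rule's object is returned as is
        }
        int start = pos;
        while (pos < input.length() && input.charAt(pos) >= '0' && input.charAt(pos) <= '9') {
            pos++; // digits only; leading zeros are not rejected in this sketch
        }
        if (start == pos) {
            throw new IllegalArgumentException("INT or '(' expected at " + start);
        }
        IntegerLiteral literal = new IntegerLiteral();                 // {Integer}: create the object
        literal.value = Integer.parseInt(input.substring(start, pos)); // value=INT: fill its field
        return literal;
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalArgumentException("'" + c + "' expected at " + pos);
        }
        pos++;
    }
}

class SimpleActionDemo {
    public static void main(String[] args) {
        Expression e = IntOrParParser.parse("((7))");
        System.out.println(e instanceof IntegerLiteral); // true: the parentheses left no node behind
        System.out.println(((IntegerLiteral) e).value);  // 7
    }
}
```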
Field assignment
• The operator = assigns a single value to a field
• The operator += adds values to a list field
Pair:
values+=Element ',' values+=Element;
• The operator ?= assigns boolean values to fields
Wrapper: isNull?='null' | inner=Wrapped;
The Xtext framework will add
• a list field (with values of the proper type) for each assignment with the += operator
• a boolean field for each assignment with the ?= operator
The parser generated by the Xtext framework will
• add elements to such a list whenever creating the corresponding object
• initialize such a boolean field to true if it scans the assignment’s right-hand side, and to false otherwise
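A sketch of the classes and parsing logic the Pair and Wrapper rules describe. Element and Wrapped are assumed to be plain identifiers (strings), and the names AssignmentParsers, parsePair and parseWrapper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes for
//   Pair: values+=Element ',' values+=Element;
//   Wrapper: isNull?='null' | inner=Wrapped;
final class Pair {
    final List<String> values = new ArrayList<>(); // += : list field
}

final class Wrapper {
    boolean isNull; // ?= : false unless 'null' was scanned
    String inner;   // =  : single-valued field
}

final class AssignmentParsers {
    static Pair parsePair(String text) {
        String[] parts = text.split(",", -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected Element ',' Element");
        }
        Pair pair = new Pair();
        for (String part : parts) {
            String element = part.trim();
            if (element.isEmpty()) {
                throw new IllegalArgumentException("missing Element");
            }
            pair.values.add(element); // each values+=Element appends to the list
        }
        return pair;
    }

    static Wrapper parseWrapper(String text) {
        Wrapper wrapper = new Wrapper();
        String t = text.trim();
        if (t.equals("null")) {
            wrapper.isNull = true; // isNull?='null' matched
        } else {
            wrapper.inner = t;     // inner=Wrapped; isNull keeps its initial false
        }
        return wrapper;
    }
}

class AssignmentDemo {
    public static void main(String[] args) {
        System.out.println(AssignmentParsers.parsePair("a, b").values);    // [a, b]
        System.out.println(AssignmentParsers.parseWrapper("null").isNull); // true
        System.out.println(AssignmentParsers.parseWrapper("x").isNull);    // false
    }
}
```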
Assigned actions
Expression: IntOrPar (
    {Factor.left=current} sign=FactorSign right=IntOrPar |
    {Term.left=current} sign=TermSign right=IntOrPar
)? ;
The Xtext framework will
• create classes Factor and Term (extending Expression)
• add to them fields left, sign and right (of the proper type)
The parser generated by the Xtext framework will
• if there is no optional part, return the Expression created by IntOrPar
• otherwise
– create a Factor or Term
– assign the Expression parsed by IntOrPar (the current object) to its field left
– assign the parsed sign and the second IntOrPar to its fields sign and right
– return the created Factor or Term
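The complete working example can be sketched as a hand-written Java parser that builds the AST the assigned actions describe. Class names follow the slides except IntegerLiteral (for {Integer}); ExpressionParser and the toString format are assumptions made for the illustration, and Xtext would generate EMF classes and an ANTLR parser instead.

```java
import java.util.ArrayList;
import java.util.List;

// AST classes: names follow the slides; {Integer} becomes IntegerLiteral.
abstract class Expression {}

final class IntegerLiteral extends Expression {
    final int value;
    IntegerLiteral(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

enum FactorSign { MULTIPLY, DIVIDE }
enum TermSign { PLUS, MINUS }

final class Factor extends Expression {
    Expression left; FactorSign sign; Expression right;
    @Override public String toString() { return "Factor(" + left + " " + sign + " " + right + ")"; }
}

final class Term extends Expression {
    Expression left; TermSign sign; Expression right;
    @Override public String toString() { return "Term(" + left + " " + sign + " " + right + ")"; }
}

final class ExpressionParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;

    private ExpressionParser(String input) {
        // tokens: digit runs, letter runs or single symbols; whitespace is skipped
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i++;
            if (Character.isWhitespace(c)) continue;
            if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
            } else if (Character.isLetter(c)) {
                while (i < input.length() && Character.isLetter(input.charAt(i))) i++;
            }
            tokens.add(input.substring(start, i));
        }
    }

    static Expression parse(String input) {
        ExpressionParser p = new ExpressionParser(input);
        Expression result = p.expression();
        if (p.pos != p.tokens.size()) {
            throw new IllegalArgumentException("unexpected token " + p.tokens.get(p.pos));
        }
        return result;
    }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    private String next() { String t = peek(); pos++; return t; }

    // Expression: IntOrPar ( {Factor.left=current} sign=FactorSign right=IntOrPar
    //                      | {Term.left=current} sign=TermSign right=IntOrPar )? ;
    private Expression expression() {
        Expression current = intOrPar();
        String t = peek();
        if (List.of("*", "multiply", "/", "divide").contains(t)) {
            pos++;
            Factor factor = new Factor();
            factor.left = current; // {Factor.left=current}
            factor.sign = t.equals("*") || t.equals("multiply") ? FactorSign.MULTIPLY : FactorSign.DIVIDE;
            factor.right = intOrPar();
            return factor;
        }
        if (List.of("+", "plus", "-", "minus").contains(t)) {
            pos++;
            Term term = new Term();
            term.left = current; // {Term.left=current}
            term.sign = t.equals("+") || t.equals("plus") ? TermSign.PLUS : TermSign.MINUS;
            term.right = intOrPar();
            return term;
        }
        return current; // optional part absent: the IntOrPar result is returned as is
    }

    // IntOrPar returns Expression: '(' Expression ')' | {Integer} value=INT;
    private Expression intOrPar() {
        String t = next();
        if (t.matches("0|[1-9][0-9]*")) return new IntegerLiteral(Integer.parseInt(t));
        if (!t.equals("(")) throw new IllegalArgumentException("INT or '(' expected, found '" + t + "'");
        Expression inner = expression();
        if (!next().equals(")")) throw new IllegalArgumentException("')' expected");
        return inner;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + (2 * 3)")); // Term(1 PLUS Factor(2 MULTIPLY 3))
    }
}
```

Note how {Factor.left=current} turns the already parsed IntOrPar into the left child of a new node, instead of creating the node first.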
Hidden tokens
• Can be defined at (parser) rule- or grammar-level
– Rule-level hidden tokens override grammar-level ones
• Are automatically skipped when processing the rule/grammar
EX: Expression returns Expression hidden(WS): … ;
Grammar-level
• When importing one single grammar, its hidden tokens are reused
Rule-level
• Hidden tokens defined for a calling rule are reused for called rules (unless they define their own hidden tokens)
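A sketch of how hidden tokens behave, assuming a hypothetical rule Sum hidden(WS): INT '+' INT;. The lexer still produces WS tokens; the parser skips those whose type is in the hidden set of the rule being processed. HiddenTokensDemo and parseSum are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch: the lexer keeps WS tokens, the parser skips the token types hidden for the current rule.
final class HiddenTokensDemo {
    enum Type { WS, INT, PLUS }

    static List<Type> lex(String s) {
        List<Type> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (" \t\r\n".indexOf(c) >= 0) {
                while (i < s.length() && " \t\r\n".indexOf(s.charAt(i)) >= 0) i++;
                out.add(Type.WS);
            } else if (c >= '0' && c <= '9') {
                while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
                out.add(Type.INT);
            } else if (c == '+') {
                i++;
                out.add(Type.PLUS);
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "'");
            }
        }
        return out;
    }

    /** Parses INT '+' INT, skipping tokens whose type is hidden for this rule. */
    static boolean parseSum(String input, Set<Type> hidden) {
        List<Type> visible = new ArrayList<>();
        for (Type t : lex(input)) {
            if (!hidden.contains(t)) visible.add(t);
        }
        return visible.equals(List.of(Type.INT, Type.PLUS, Type.INT));
    }

    public static void main(String[] args) {
        System.out.println(parseSum("1 + 2", Set.of(Type.WS))); // true: hidden(WS)
        System.out.println(parseSum("1 + 2", Set.of()));        // false: no hidden tokens
        System.out.println(parseSum("1+2", Set.of()));          // true
    }
}
```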
Data type rules
They are parser rules which
• contain neither assignments nor actions
• only call terminal or data type rules
The AST builder simply concatenates the parsed text
Why should we use data type rules instead of terminal rules?
• They allow hidden tokens
• They allow backtracking
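A sketch of a data type rule, using the hypothetical rule QualifiedName: ID ('.' ID)*; as example. Since the rule has neither assignments nor actions, its result is just the concatenated text of the matched tokens. The sketch assumes a token list from which hidden tokens have already been removed, and uses a simplified ID without the optional caret.

```java
import java.util.List;

// Sketch of the hypothetical data type rule   QualifiedName: ID ('.' ID)*;
// No object is created: the result is the concatenation of the matched token texts.
final class QualifiedNameRule {
    // simplified ID (no leading '^')
    private static boolean isId(String token) {
        return token.matches("[a-zA-Z_][a-zA-Z_0-9]*");
    }

    static String parse(List<String> tokens) {
        StringBuilder value = new StringBuilder();
        int i = 0;
        if (tokens.isEmpty() || !isId(tokens.get(0))) {
            throw new IllegalArgumentException("ID expected");
        }
        value.append(tokens.get(i++)); // ID
        while (i < tokens.size() && tokens.get(i).equals(".")) {
            value.append(tokens.get(i++)); // '.'
            if (i >= tokens.size() || !isId(tokens.get(i))) {
                throw new IllegalArgumentException("ID expected after '.'");
            }
            value.append(tokens.get(i++)); // ID
        }
        if (i != tokens.size()) {
            throw new IllegalArgumentException("unexpected token " + tokens.get(i));
        }
        return value.toString();
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("org", ".", "eclipse", ".", "xtext"))); // org.eclipse.xtext
    }
}
```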
References: Motivation
• In a language, it is often the case that the same entity is referred to over and over
EX: Variables and methods in Java programs
• You do not want the AST builder to create new instances of the entity whenever a reference is found
• You rather want the AST builder to point to the entity created at definition time
References: Syntax
field=[type|rule]
where
• field is the field of the object created by the AST builder which is supposed to refer to an entity
• type is the class of the referred entity
• rule is a grammar rule specifying the string representation of the reference
– If omitted (together with the preceding |), org.eclipse.xtext.common.Terminals.ID is assumed
Notice that
• references can only be used within assignments
• entities of different classes can have the same string representation
• cross-references across file boundaries are supported
– as long as the referenced entities are on the classpath
References: (Default) Semantics
• In order to be referenceable, entities must have a field name
• Reference resolution is based on qualified names
• An entity’s qualified name is computed by concatenating
– the qualified name of the entity’s container
– a dot (.)
– the entity’s name
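The default semantics can be sketched as follows: qualified names are computed recursively from the container, and a reference is resolved by looking its string up in an index, so every reference points to the instance created at definition time. Entity, index and resolve are hypothetical names; scoping, imports and unnamed containers are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of name-based reference resolution (illustrative names only).
final class Entity {
    final String name;
    final Entity container; // null for top-level entities

    Entity(String name, Entity container) {
        this.name = name;
        this.container = container;
    }

    /** Container's qualified name, a dot, then the entity's own name. */
    String qualifiedName() {
        return container == null ? name : container.qualifiedName() + "." + name;
    }
}

final class ReferenceDemo {
    static Map<String, Entity> index(Entity... entities) {
        Map<String, Entity> byName = new HashMap<>();
        for (Entity e : entities) {
            byName.put(e.qualifiedName(), e);
        }
        return byName;
    }

    /** Resolves a reference to the entity created at definition time; no new instance is made. */
    static Entity resolve(Map<String, Entity> index, String reference) {
        Entity target = index.get(reference);
        if (target == null) {
            throw new IllegalArgumentException("unresolved reference: " + reference);
        }
        return target;
    }

    public static void main(String[] args) {
        Entity shapes = new Entity("shapes", null);
        Entity circle = new Entity("Circle", shapes);
        Map<String, Entity> idx = index(shapes, circle);
        System.out.println(circle.qualifiedName());                  // shapes.Circle
        System.out.println(resolve(idx, "shapes.Circle") == circle); // true
    }
}
```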
The header of Xtext grammars
Consists of declarations of
• the grammar’s name and possibly
• imported grammars
• grammar-scoped hidden tokens
• imported Ecore packages
• the Ecore package to generate
The first rule in a grammar (entry rule) is assumed to be its entry point
• i.e., it is the first rule the parser generated by Xtext will try to apply
Name and imported grammars
The grammar’s name
• Xtext grammar names follow Java’s naming conventions
The grammar file must have the same name as the grammar it contains (and extension .xtext)
EX: grammar org.eclipse.xtext.Xtext
Imported grammars
• The current grammar can reuse (or override) rules defined in other grammars
EX: with org.eclipse.xtext.common.Terminals
Hidden tokens and imported packages
Grammar-scoped hidden tokens
• Are declared just like hidden tokens for rules
EX: hidden(WS)
Imported Ecore packages
• You do not really need to care about them
• Just do not be scared if you see something like
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
The package to generate
• Among other things, Xtext creates an Ecore package (whatever it is)
• Just keep in mind that
– A name and a namespace URI are required to create an Ecore package
– You must provide Xtext with such data
EX: generate myDsl "http://www.univStEtienne.fr/mydsl/MyDsl"
Exercise
To test your understanding of XGL, have a look at XGL’s Xtext grammar
http://dev.eclipse.org/viewcvs/viewvc.cgi/org.eclipse.tmf/org.eclipse.xtext/plugins/org.eclipse.xtext/src/org/eclipse/xtext/Xtext.xtext?root=Modeling_Project&view=markup
Getting Xtext up & running (I)
Install Eclipse
1. Download Eclipse Modeling Tools – http://www.eclipse.org/downloads/
2. Start Eclipse Modeling Tools
3. Click on Install Modeling Components (the fifth icon from the left on the icon bar right below the menu bar)
4. Select Xtext
Getting Xtext up & running (II)
Create an Xtext project
1. File > New > Project… > Xtext > Xtext project
2. Choose a meaningful project name, language name and file extension
3. Uncheck the Create generator project box
4. Click on Finish
5. Add http://download.itemis.com/antlr-generator-3.0.1.jar to the project’s classpath
Getting Xtext up & running (III)
Generate the language artifacts
1. Replace the content of the automatically opened grammar file with your grammar
2. Locate the file Generate<grammarName>.mwe2 next to the grammar file in the package explorer view
3. Choose Run As > MWE2 Workflow from its context menu
4. Possibly add your converters and converter service to the non-UI project
– Add the following method to the class <grammarName>RuntimeModule, where converterService is your value converter service class
  @Override
  public Class<? extends IValueConverterService> bindIValueConverterService() {
      return converterService.class;
  }
Getting Xtext up & running (IV)
Run the generated IDE plug-in
1. Right-click on the Xtext project and choose Run As > Eclipse Application
– This will spawn a new Eclipse workbench
2. Create a new project
3. Create a new file with the file extension you chose in the beginning
– This will open the generated entity editor
4. Enjoy the editor