xtext the very least

The very least about Xtext

Juri Luca De Coi

Saint-Etienne, France, 16-05-2011

Outline

• Introduction

• How to specify the target language

– Terminal rules

– Parser rules

• How to specify the target AST

– The working example

– Return types

• Further features

• The header

• Getting Xtext up & running

Xtext (I)

Xtext is a language development framework

– i.e., a technology supporting the activity of developing languages

Given the Xtext grammar of a language, it provides you with the following for that language

• an Eclipse editor with

– content assistance

– quick fixes

– template proposals

– outline

– hyperlinking

– syntax coloring

– project wizard

• a code generator

• a serializer and code formatter

• a scoping and linking framework

• a validator

• an AST builder

• a parser

Xtext (II)

The more you want, the more you have to pay, BUT

• if you are fine with the (reasonable) defaults, your amount of work will be pretty low

• otherwise, Xtext is highly configurable

– Each automatically generated class can be replaced in a non-invasive way

What do we want?

• A parser

• An AST builder

We will (almost) only focus on Xtext’s grammar language

Technical remark

Xtext is based on Ecore

Knowledge of Ecore is required to exploit Xtext’s full potential

Ecore is the core of the Eclipse Modeling Framework Project (EMF)

– EMF is “a modeling framework and code generation facility for building tools and other applications based on a structured data model”

I will try to leave Ecore out as much as possible

I will skip some (most) parts of Xtext

Xtext’s grammar language (XGL)

• A language to describe (textual) languages

• An Xtext grammar describes

– the syntax of the target language

– the structure of the target AST

The syntax of the target language

Not surprisingly, XGL distinguishes between

• lexical level

• syntactic level

Lexical level

– specifies the language’s tokens

– by means of keywords and terminal rules

– exploited by the lexer (a.k.a. scanner or tokenizer)

Syntactic level

– specifies the language’s grammar

– by means of parser rules

– exploited by the parser



Terminal rules (I)

terminal NAME : expression ;

expression can contain

1. keywords

– single- or double-quoted

– can have any length

– can contain arbitrary characters

• including the escape sequences \b, \t, \n, \f, \r, \", \' and \\

• Unicode escape sequences (e.g., \u123) are not supported

– EX: 'foo', "\foo", '"', "'"

Terminal rules (II)

2. wildcard (.)

– An arbitrary character

– EX: .

3. rule calls

– Terminal rules can only point to other terminal rules

– EX: ID (assuming that ID is the name of a terminal rule)

4. character ranges (..)

– Extremes are included

– EX: 'a'..'z', 'A'..'Z', '0'..'9'

Terminal rules (III)

5. until token (->)

– All input between the preceding and the following token (extremes are included)

– EX: '/*' -> '*/'

6. negated token (!)

– Any input other than the following token

– EX: !'\n'

7. cardinality operators (?, *, + or nothing)

– EX: '^'?, '\r'*, '9'+

Terminal rules (IV)

8. groups (token sequences)

– EX: 'a' . ID (assuming that ID is the name of a terminal rule)

9. alternatives (|)

– EX: ' ' | '\t' | '\r' | '\n'

Operator priority

Ordered by decreasing priority

• Character ranges: ..

• Until token, negated token: ->, !

• Cardinality operators: ?, *, + or nothing

• Groups: token sequences

• Alternatives: |

Parentheses (()) can override default priorities

EX:

terminal ID:
    '^'?
    ('a'..'z'|'A'..'Z'|'_')
    ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
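The operators listed so far can be seen working together in a few typical terminal rules. The following is only a sketch, modeled on the kind of rules a common terminals grammar usually declares; the rule names and bodies are illustrative assumptions, not a copy of any particular Xtext release.

```xtext
// Until token: everything from '/*' up to and including the next '*/'
terminal ML_COMMENT: '/*' -> '*/';

// Negated token plus cardinality: '//' followed by any characters
// except line breaks, optionally closed by a line break
terminal SL_COMMENT: '//' !('\n'|'\r')* ('\r'? '\n')?;

// Alternatives inside parentheses, repeated one or more times
terminal WS: (' '|'\t'|'\r'|'\n')+;

// Wildcard: a single arbitrary character
terminal ANY_OTHER: .;
```

Note how the priorities apply: in SL_COMMENT the negation binds more tightly than the trailing *, so the rule reads as "zero or more characters that are not line breaks".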

Technical remark

NOTE: Terminal rules can hide each other

The order of terminal rules is crucial

This is especially important when mixing

• newly introduced rules and

• rules from imported grammars (cf. below)
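To make the hiding problem concrete, here is a small sketch with two hypothetical rules, LOWER and ID, whose languages overlap. It assumes that, when both rules match the same input, the rule declared first wins.

```xtext
// Both rules match the input "abc".
// With LOWER declared first, a lower-case word is tokenized as LOWER,
// so a parser rule calling ID would not see it (assumption: earlier rule wins).
terminal LOWER: ('a'..'z')+;

terminal ID:
    ('a'..'z'|'A'..'Z'|'_')
    ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
```

Swapping the two declarations would presumably hide LOWER instead, which is why the position of a new rule relative to existing or imported ones deserves attention.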



Parser rules (I)

name : expression ;

expression can contain

1. keywords

– single- or double-quoted

– can have any length

– can contain arbitrary characters

• including the escape sequences \b, \t, \n, \f, \r, \", \' and \\

• Unicode escape sequences (e.g., \u123) are not supported

– EX: 'foo', "\foo", '"', "'"

Parser rules (II)

2. rule calls

– EX: ID (assuming that ID is the name of a rule)

3. cardinality operators (?, *, + or nothing)

– EX: '^'?, '\r'*, '9'+

4. groups (token sequences)

– EX: 'a' ID

5. unordered groups (&)

– Elements can appear in any order but only once

– Elements with cardinality * or + must appear continuously without interruption

– EX: 'a' & ID*

Parser rules (III)

6. alternatives (|)

– EX: ' ' | '\t' | '\r' | '\n'

Operator priority

Ordered by decreasing priority

• Cardinality operators: ?, *, + or nothing

• Groups: token sequences

• Unordered groups: &

• Alternatives: |

Parentheses (()) can override default priorities

EX:

Action:
    '{' TypeRef (
        '.' ID ('='|'+=') 'current'
    )? '}' ;
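A few small parser rules may help to see these constructs side by side. This is a syntax sketch only: the rule names Greeting, Flags and Value are made up, and ID, INT and STRING are assumed to be terminal rules defined or imported elsewhere.

```xtext
// Group made of keywords, rule calls and cardinality operators
Greeting: 'Hello' ID ('and' ID)* '!'?;

// Unordered group: the three elements may appear in any order, each once.
// The parentheses are needed because & binds more tightly than |
Flags: 'static' & 'final' & ('public' | 'private');

// Alternatives between rule calls
Value: ID | INT | STRING;
```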



The structure of the resulting AST

• You typically (should) know the AST you want before defining a textual representation for it

• You will now learn how to instruct Xtext to build the ASTs you want

• Let us start with the classical example

Arithmetical expressions (I)

Arithmetical expressions (II)

• We have to define a corresponding textual representation

– i.e., we have to define a corresponding grammar

• To keep things easy, let us define a grammar which

– does not consider operator priorities

– does not consider operator associativity

– requires parentheses to be specified explicitly

EX:

• not 1 + 2 * (3 - 4 / 5)

• but 1 + (2 * (3 - (4 / 5)))

Arithmetical expressions (IV)

How to instruct Xtext to build this AST out of 1 + (2 * (3 - (4 / 5)))?



Return types

• Each rule should specify a return type

• The return type defaults to

– ecore::EString (for terminal and data type rules, cf. below)

– the rule’s name (otherwise)

IntOrPar returns Expression: … ;

terminal INT returns ecore::EInt: … ;

The Xtext framework will create

– a Java class for each (non-existing) return type

The parser generated by the Xtext framework will create

– an instance of such a class whenever applying a rule with such a return type

Enumeration rules

TermSign and FactorSign are enumerations

• enumerations can be specified by (enumeration) rules

enum TermSign: PLUS='+' | PLUS='plus' |

MINUS='-' | MINUS='minus';

• If the value is omitted, you will get equal name and value

• The first enumeration value is the default one

The Xtext framework will create

– an enumeration for each enumeration rule

The parser generated by the Xtext framework will create

– an enumeration value whenever applying the corresponding enumeration rule

It is (theoretically) possible to

• use alternative literals

• reference a value twice

In practice, Xtext complains
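As a small illustration of the omitted-value case, the sketch below declares one enumeration with explicit literals and one without. The rule FactorSign is mentioned in the slides, but its values TIMES and DIVIDED are assumptions made for this example.

```xtext
// Explicit literals: the text '+' yields PLUS, the text '-' yields MINUS.
// PLUS is the first value and therefore the default one
enum TermSign: PLUS='+' | MINUS='-';

// Omitted literals: name and value coincide,
// i.e., the input text has to read TIMES or DIVIDED
enum FactorSign: TIMES | DIVIDED;
```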

Terminal rules

• Terminal and data type rules (cf. below) return ecore::EString by default

• You probably want the following rule to return an integer

terminal INT: '0' | '1'..'9' '0'..'9'*;

To this end, you have to

1. declare the return type in the rule

terminal INT returns ecore::EInt: … ;

2. create a value converter (VC)

3. create a value converter service (VCS)

4. register the VC at the VCS

Creating a value converter

Create a class implementing IValueConverter

    /* Responsible for the string-to-value conversion */
    X toValue(String, AbstractNode)

    /* Responsible for the value-to-string conversion */
    String toString(X)

• X is the return type of the grammar rule

• ValueConverterExceptions signal conversion errors

IValueConverter and ValueConverterException belong to package org.eclipse.xtext.conversion

AbstractNode belongs to package org.eclipse.xtext.parsetree

Creating a value converter service

Create a class implementing IValueConverterService

• The easiest way is by extending AbstractDeclarativeValueConverterService

• Extend DefaultTerminalConverters if you imported grammar Terminals (cf. below)

IValueConverterService belongs to package org.eclipse.xtext.conversion

AbstractDeclarativeValueConverterService belongs to package org.eclipse.xtext.conversion.impl

DefaultTerminalConverters belongs to package org.eclipse.xtext.common.services

Terminals belongs to package org.eclipse.xtext.common

Registering VCs at VCSs

Declare as many VCS fields as IValueConverters you need

    @Inject
    private type name;

• type implements IValueConverter

• name is an arbitrary name

Declare as many VCS methods as grammar rules you handle

    @ValueConverter(rule = "rule")
    public IValueConverter<returnType> rule() {
        return converter;
    }

• rule is the name of the grammar rule

• returnType is the type returned by converter

• converter is the IValueConverter responsible for rule

Inject belongs to package com.google.inject

ValueConverter belongs to package org.eclipse.xtext.conversion

Simple actions

IntOrPar returns Expression:

'(' Expression ')' |

{Integer} value=INT;

The Xtext framework will

• create a class Expression

• create a class Integer (extending Expression)

• add a field value of type ecore::EInt to class Integer

The parser generated by the Xtext framework will

• in the first case

– return the created Expression

• in the second case

– create an Integer

– assign the parsed INT to its field value

– return the created Integer

The right-hand side of an assignment can be any of

• a rule call

• a keyword

• a cross-reference (cf. below)

• an alternative of the former

Field assignment

• The operator = assigns atomic values to fields

• The operator += assigns multiple values to fields

Pair:

values+=Element ',' values+=Element;

• The operator ?= assigns boolean values to fields

Wrapper: isNull?='null' | inner=Wrapped;

The Xtext framework will add

• a list field (with values of the proper type) for each assignment with the += operator

• a boolean field for each assignment with the ?= operator

The parser generated by the Xtext framework will

• add elements to such a list whenever creating the corresponding object

• initialize such a boolean field to false (resp. true) if the parser does not scan (resp. scans) the assignment’s right-hand side
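The three assignment operators can be combined in a single rule. The sketch below uses the hypothetical rules Entity and Feature and assumes a terminal rule ID is available.

```xtext
// ?= : abstract is true if the keyword 'abstract' was consumed, false otherwise
// =  : name holds a single value
// += : features collects every parsed Feature in a list
Entity:
    abstract?='abstract'? 'entity' name=ID '{'
        (features+=Feature)*
    '}';

Feature:
    name=ID ':' type=ID ';';
```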

Assigned actions

Expression: IntOrPar (
    {Factor.left=current} sign=FactorSign right=IntOrPar |
    {Term.left=current} sign=TermSign right=IntOrPar
)? ;

The Xtext framework will

• create classes Factor and Term (extending Expression)

• add fields left, sign and right (of the proper type) to them

The parser generated by the Xtext framework will

• if there is no optional part

– return the created Expression

• otherwise

– create a Factor or Term

– assign the parsed IntOrPar to its field left

– go on as expected
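Putting the rules from the preceding slides together, a complete grammar for fully parenthesized arithmetical expressions might look like the sketch below. The grammar name, the namespace URI, the package name arith and the FactorSign literals are assumptions made for the example; the Expression, IntOrPar and INT rules are the ones shown in the slides.

```xtext
grammar org.example.arith.Arith hidden(WS)

import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate arith "http://www.example.org/arith"

// Entry rule: an operand optionally followed by one sign and one operand
Expression:
    IntOrPar (
        {Factor.left=current} sign=FactorSign right=IntOrPar |
        {Term.left=current} sign=TermSign right=IntOrPar
    )?;

// Either a parenthesized expression or an integer literal
IntOrPar returns Expression:
    '(' Expression ')' |
    {Integer} value=INT;

enum TermSign: PLUS='+' | MINUS='-';

// Literal names are assumed
enum FactorSign: TIMES='*' | DIVIDED='/';

// Returning ecore::EInt requires a value converter, as described in the slides
terminal INT returns ecore::EInt: '0' | '1'..'9' '0'..'9'*;

terminal WS: (' '|'\t'|'\r'|'\n')+;
```

With such a grammar, the input 1 + (2 * (3 - (4 / 5))) would be expected to produce a Term whose left field is an Integer and whose right field is a Factor, and so on down the nesting.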



Hidden tokens

• Can be defined at (parser) rule- or grammar-level

– Rule-level hidden tokens override grammar-level ones

• Are automatically skipped when processing the rule/grammar

EX: Expression returns Expression hidden(WS): … ;

Grammar-level

– When importing one single grammar, its hidden tokens are reused

Rule-level

– Hidden tokens defined for a calling rule are reused for called rules (unless they define their own hidden tokens)
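The interplay between the two levels can be sketched as follows. The rules Import and QualifiedName are hypothetical, and the terminal rules ID and WS are assumed to exist; the grammar-level declaration is shown as a comment because it belongs in the grammar header.

```xtext
// In the header: grammar org.example.Demo hidden(WS)
// => whitespace is skipped in every rule that does not say otherwise

// Inherits the hidden tokens in effect, so 'import   foo' is fine
Import: 'import' importedName=QualifiedName ';';

// Rule-level override with an empty list: nothing is hidden here,
// so 'a.b' is accepted while 'a . b' is not
QualifiedName hidden(): ID ('.' ID)*;
```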

Data type rules

They are parser rules which

• contain neither assignments nor actions

• only call terminal or data type rules

The AST builder simply concatenates the parsed text

Why should we use data type rules instead of terminal rules?

• They allow hidden tokens

• They allow backtracking
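Two typical candidates for data type rules are sketched below, assuming that ID and INT are terminal rules. Both rule names are illustrative.

```xtext
// No assignments, no actions, only terminal rule calls:
// a data type rule, returning ecore::EString by default
QualifiedName: ID ('.' ID)*;

// Written as a terminal rule, this would compete with INT in the lexer;
// as a data type rule it is only recognized where the parser expects it
Decimal: INT '.' INT;
```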

References: Motivation

• In a language, it is often the case that the same entity is referred to over and over

EX: Variables and methods in Java programs

• You do not want the AST builder to create new instances of the entity whenever a reference is found

• You rather want the AST builder to point to the entity created at definition-time

References: Syntax

field=[type|rule]

where

• field is the field of the object created by the AST builder which is supposed to refer to an entity

• type is the class of the referred entity

• rule is a grammar rule specifying the string representation of the reference

– If omitted (with the preceding |), org.eclipse.xtext.common.Terminals.ID is assumed

Notice that

• references can only be used within assignments

• entities of different classes can have the same string representation

• cross-references across file boundaries are supported

– as long as the referenced entities are on the classpath
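A minimal sketch of the cross-reference syntax follows, with hypothetical rules Model, Entity and Attribute and the terminal rule ID assumed to be available.

```xtext
Model: (entities+=Entity)*;

// superType does not contain a new Entity: it points to an Entity
// defined elsewhere, identified by an ID (the rule is omitted here)
Entity:
    'entity' name=ID ('extends' superType=[Entity])? '{'
        (attributes+=Attribute)*
    '}';

// Same kind of reference, with the rule spelled out explicitly
Attribute: name=ID ':' type=[Entity|ID];
```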

References: (Default) Semantics

• In order to be referenceable, entities must have a field name

• Reference resolution is based on qualified names

• An entity’s qualified name is computed by concatenating

– the qualified name of the entity’s container

– a dot (.)

– the entity’s name
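The effect of qualified names can be illustrated with nested containers. In this sketch the rules Package, Entity and QualifiedName are assumptions, as is the sample input given in the comment.

```xtext
// Sample input (assumed):
//   package shapes {
//       entity Shape;
//       entity Circle extends shapes.Shape;
//   }
// Under the default semantics, the qualified name of Shape is shapes.Shape

Package:
    'package' name=ID '{' (entities+=Entity)* '}';

Entity:
    'entity' name=ID ('extends' superType=[Entity|QualifiedName])? ';';

// Data type rule giving the string representation of the reference
QualifiedName: ID ('.' ID)*;
```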



The header of Xtext grammars

Consists of declarations of

• the grammar’s name

and possibly

• imported grammars

• grammar-scoped hidden tokens

• imported Ecore packages

• the Ecore package to generate

The first rule in a grammar (entry rule) is assumed to be its entry point

• i.e., it is the first rule the parser generated by Xtext will try to apply

Name and imported grammars

The grammar’s name

• Xtext grammar names follow Java’s naming conventions

The grammar file must have the same name as the grammar it contains (and extension .xtext)

EX: grammar org.eclipse.xtext.Xtext

Imported grammars

• The current grammar can reuse (or override) rules defined in other grammars

EX: with org.eclipse.xtext.common.Terminals

Hidden tokens and imported packages

Grammar-scoped hidden tokens

• Are declared just like hidden tokens for rules

EX: hidden(WS)

Imported Ecore packages

• You do not really need to care about them

• Just do not be scared if you see something like

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

The package to generate

• Among other things, Xtext creates an Ecore package (whatever it is)

• Just keep in mind that

– A name and a namespace URI are required to create an Ecore package

– You must provide Xtext with such data

EX: generate myDsl "http://www.univStEtienne.fr/mydsl/MyDsl"
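Taken together, the header declarations might be arranged as in the following sketch. The grammar name, the namespace URI and the two rules are placeholders chosen for the example; ML_COMMENT and SL_COMMENT are assumed to be terminal rules provided by the imported grammar.

```xtext
// Name, imported grammar and grammar-scoped hidden tokens
grammar org.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
    hidden(WS, ML_COMMENT, SL_COMMENT)

// Imported Ecore package and the Ecore package to generate
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate myDsl "http://www.example.org/mydsl/MyDsl"

// The first rule is the entry rule
Model: (greetings+=Greeting)*;

Greeting: 'Hello' name=ID '!';
```

Following the naming rule above, this grammar would live in a file named MyDsl.xtext.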

Exercise

To test your understanding of XGL, have a look at XGL’s Xtext grammar

http://dev.eclipse.org/viewcvs/viewvc.cgi/org.eclipse.tmf/org.eclipse.xtext/plugins/org.eclipse.xtext/src/org/eclipse/xtext/Xtext.xtext?root=Modeling_Project&view=markup



Getting Xtext up & running (I)

Install Eclipse

1. Download Eclipse Modeling Tools – http://www.eclipse.org/downloads/

2. Start Eclipse Modeling Tools

3. Click on Install Modeling Components (the fifth icon from the left on the icon bar right below the menu bar)

4. Select Xtext

Getting Xtext up & running (II)

Create an Xtext project

1. File → New → Project… → Xtext → Xtext project

2. Choose a meaningful project name, language name and file extension

3. Uncheck the Create generator project box

4. Click on Finish

5. Add http://download.itemis.com/antlr-generator-3.0.1.jar to the project’s classpath

Getting Xtext up & running (III)

Generate the language artifacts

1. Replace the content of the automatically opened grammar file with your grammar

2. Locate the file GenerategrammarName.mwe2 next to the grammar file in the package explorer view

3. Choose Run As → MWE2 Workflow from its context menu

4. Possibly add your converters and converter service to the non-ui project

– Add the following method to the class grammarNameRuntimeModule

    @Override
    public Class<? extends IValueConverterService> bindIValueConverterService() {
        return converterService.class;
    }

Getting Xtext up & running (IV)

Run the generated IDE plug-in

1. Right-click on the Xtext project and choose Run As → Eclipse Application

– This will spawn a new Eclipse workbench

2. Create a new project

3. Create a new file with the file extension you chose in the beginning

– This will open the generated entity editor

4. Enjoy the editor
