Binary Studio Academy PRO: ANTLR course by Alexander Vasiltsov (lesson 1)

Post on 12-Jul-2015


ANTLR 4

ANother Tool for Language Recognition

by Alexander Vasiltsov

What ANTLR can do

● Generates a parser from a formal language description called a grammar

● A grammar describes the language in an EBNF-like notation

● Automatically generates classes for walking the syntax tree

● Includes a powerful error recovery mechanism

● Handles directly left-recursive rules

Terence Parr

http://www.antlr.org/

Where to read

Successful usages

● Twitter search engine

● Hadoop (Hive & Pig)

● Oracle (SQL Developer IDE, Migration Tools)

● NetBeans IDE

How it works

Target Languages

ANTLR 4 Target languages:

● Java

● C#

● Python

ANTLR 3 also supports the following languages: C, C#, Java, JavaScript, ActionScript, Objective-C, Perl, Python, Ruby and others.

Setup for Java

Java 1.6 or newer is required

1) Download the latest ANTLR 4 package (antlr-4.4-complete.jar) from http://www.antlr.org/download.html

It’s done!

Setup for C#

Java 1.6 or newer is required!

1) Add the ANTLR reference to the project:

PM> Install-Package Antlr4

2) Install the ANTLR Language Support extension

ANTLRWorks

http://tunnelvisionlabs.com/products/demo/antlrworks

Lexing

Lexing (tokenizing) is the process of grouping the input character stream into words (tokens).

A token contains at least two pieces of data: its type and the text it matched.
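
As a rough illustration of what a lexer does, here is a minimal hand-written sketch in Java. It is not the code ANTLR generates: the class name LexerSketch, the token types and the array-initializer input are assumptions chosen only to show how characters are grouped into tokens that carry a type and the matched text.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of lexing; ANTLR generates an equivalent lexer from a grammar.
class LexerSketch {
    // Hypothetical token types for inputs such as {1, 23}.
    enum TokenType { LBRACE, RBRACE, COMMA, INT }

    // A token carries at least its type and the text it matched.
    static final class Token {
        final TokenType type;
        final String text;

        Token(TokenType type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "('" + text + "')";
        }
    }

    // Groups the input characters into tokens; whitespace is skipped.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '{') {
                tokens.add(new Token(TokenType.LBRACE, "{"));
                i++;
            } else if (c == '}') {
                tokens.add(new Token(TokenType.RBRACE, "}"));
                i++;
            } else if (c == ',') {
                tokens.add(new Token(TokenType.COMMA, ","));
                i++;
            } else if (Character.isDigit(c)) {
                // Consume the longest run of digits as a single INT token.
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(TokenType.INT, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException(
                        "Unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("{1, 23}"));
    }
}
```

For the input {1, 23} this sketch prints [LBRACE('{'), INT('1'), COMMA(','), INT('23'), RBRACE('}')]: seven characters and a space become five tokens.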

Parsing

Parsing is the process of matching a linear sequence of tokens against the formal grammar of the language.

A parse tree (syntax tree) is the result of parsing.

Syntax tree

A syntax tree represents the structure of the recognized sentence: each node gives an abstract name to its child nodes.

Inner nodes represent grammar rules

Leaves represent tokens
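
To make the relation between grammar rules and tree nodes concrete, here is a small hand-written recursive-descent sketch in Java. It is only an approximation of what a generated parser does. The two-rule grammar in the comments, the class name ParseTreeSketch and the Node type are assumptions made for illustration; for brevity the sketch reads characters directly instead of using a separate lexer.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written sketch; ANTLR would generate the parser from the grammar instead.
// Assumed grammar (hypothetical, for illustration only):
//   init  : '{' value (',' value)* '}' ;
//   value : init | INT ;
class ParseTreeSketch {
    // Inner nodes are named after grammar rules; leaves hold the token text.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }

        // LISP-style rendering: (rule child child ...)
        String toStringTree() {
            if (children.isEmpty()) {
                return name;
            }
            StringBuilder sb = new StringBuilder("(").append(name);
            for (Node child : children) {
                sb.append(' ').append(child.toStringTree());
            }
            return sb.append(')').toString();
        }
    }

    private final String input;
    private int pos = 0;

    private ParseTreeSketch(String input) {
        this.input = input;
    }

    static Node parse(String text) {
        ParseTreeSketch parser = new ParseTreeSketch(text);
        Node tree = parser.init();
        if (parser.peek() != '\0') {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return tree;
    }

    // init : '{' value (',' value)* '}' ;
    private Node init() {
        Node node = new Node("init");
        node.children.add(match('{'));
        node.children.add(value());
        while (peek() == ',') {
            node.children.add(match(','));
            node.children.add(value());
        }
        node.children.add(match('}'));
        return node;
    }

    // value : init | INT ;
    private Node value() {
        Node node = new Node("value");
        node.children.add(peek() == '{' ? init() : matchInt());
        return node;
    }

    // Skips whitespace and returns the next character, or '\0' at end of input.
    private char peek() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private Node match(char expected) {
        if (peek() != expected) {
            throw new IllegalArgumentException("Expected '" + expected + "' at position " + pos);
        }
        pos++;
        return new Node(String.valueOf(expected));
    }

    private Node matchInt() {
        peek(); // skip leading whitespace
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected INT at position " + pos);
        }
        return new Node(input.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(parse("{1,{2,3}}").toStringTree());
    }
}
```

For the input {1,{2,3}} the sketch prints (init { (value 1) , (value (init { (value 2) , (value 3) })) }). Every parenthesized group is a rule node, and the bare symbols and numbers are the leaf tokens.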

Parsing process

Parser generation by ANTLR4

ArrayInitParser.java (.cs): parser class definition generated from the grammar named ArrayInit

ArrayInitLexer.java (.cs): lexer class definition for the same grammar

ArrayInit.tokens, ArrayInitLexer.tokens: internal ANTLR files that contain the token dictionary with the corresponding identifiers

ArrayInitListener.java (.cs): listener interface for walking through the syntax tree and processing it

ArrayInitBaseListener.java (.cs): base listener class with empty methods

ArrayInitVisitor.java (.cs): visitor interface, also for walking through the syntax tree, using the Visitor design pattern

ArrayInitBaseVisitor.java (.cs): base visitor class with empty methods

Syntax tree structure

Walker

Listener
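
The listener mechanism can be sketched in plain Java as follows. This is a hand-written approximation, not the generated ArrayInitListener interface or the ANTLR runtime: the names ListenerSketch, Node, Listener and walk are hypothetical. The point it tries to show is that the walker owns the depth-first traversal, while the listener only reacts to enter and exit events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written illustration of the walker/listener idea; all names are hypothetical.
class ListenerSketch {
    // A node without children stands for a token; any other node stands for a rule.
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Callbacks fired by the walker; they return nothing.
    interface Listener {
        void enterRule(Node node);
        void exitRule(Node node);
        void visitToken(Node leaf);
    }

    // The walker decides the traversal order (depth-first, left to right).
    static void walk(Listener listener, Node node) {
        if (node.children.isEmpty()) {
            listener.visitToken(node);
            return;
        }
        listener.enterRule(node);
        for (Node child : node.children) {
            walk(listener, child);
        }
        listener.exitRule(node);
    }

    // Example listener: records every event in the order it is fired.
    static List<String> traceEvents(Node root) {
        final List<String> events = new ArrayList<>();
        walk(new Listener() {
            @Override
            public void enterRule(Node node) {
                events.add("enter " + node.name);
            }

            @Override
            public void exitRule(Node node) {
                events.add("exit " + node.name);
            }

            @Override
            public void visitToken(Node leaf) {
                events.add("token " + leaf.name);
            }
        }, root);
        return events;
    }

    public static void main(String[] args) {
        // Tree for the input {1}, built by hand for the example.
        Node tree = new Node("init",
                new Node("{"),
                new Node("value", new Node("1")),
                new Node("}"));
        System.out.println(traceEvents(tree));
    }
}
```

Run on the hand-built tree for {1}, the sketch prints [enter init, token {, enter value, token 1, exit value, token }, exit init]. In the ANTLR runtime the traversal role is played by the ParseTreeWalker class.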

Visitor

“Visitor” design pattern
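
A minimal sketch of the Visitor design pattern in Java, independent of ANTLR. The node classes, the Visitor interface and SumVisitor are hypothetical names used only for illustration. Unlike a listener, a visitor decides for itself when to descend into child nodes, and each visit method can return a value.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written illustration of the Visitor pattern; all names are hypothetical.
class VisitorSketch {
    // One visit method per node type; T is the result type of the traversal.
    interface Visitor<T> {
        T visitArray(ArrayNode node);
        T visitInt(IntNode node);
    }

    // Every node knows how to dispatch to the matching visit method.
    interface Node {
        <T> T accept(Visitor<T> visitor);
    }

    static final class IntNode implements Node {
        final int value;

        IntNode(int value) {
            this.value = value;
        }

        @Override
        public <T> T accept(Visitor<T> visitor) {
            return visitor.visitInt(this);
        }
    }

    static final class ArrayNode implements Node {
        final List<Node> elements;

        ArrayNode(Node... elements) {
            this.elements = Arrays.asList(elements);
        }

        @Override
        public <T> T accept(Visitor<T> visitor) {
            return visitor.visitArray(this);
        }
    }

    // Example visitor: sums all integers, descending into nested arrays explicitly.
    static final class SumVisitor implements Visitor<Integer> {
        @Override
        public Integer visitArray(ArrayNode node) {
            int sum = 0;
            for (Node element : node.elements) {
                sum += element.accept(this);
            }
            return sum;
        }

        @Override
        public Integer visitInt(IntNode node) {
            return node.value;
        }
    }

    public static void main(String[] args) {
        // Hand-built tree corresponding to {1, {2, 3}}.
        Node tree = new ArrayNode(
                new IntNode(1),
                new ArrayNode(new IntNode(2), new IntNode(3)));
        System.out.println(tree.accept(new SumVisitor()));
    }
}
```

For the tree corresponding to {1, {2, 3}} the sketch prints 6. New operations over the tree can be added as further Visitor implementations without touching the node classes.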

Parser generation step by step

● Java target language:

> java -jar antlr-4.4-complete.jar <grammar-file-name>

● C# target language: add the grammar file to the project and compile it. The generated classes will be added to the obj\Debug directory

Common grammar structure

Typical Grammar
