Binary Studio Academy PRO: ANTLR Course by Alexander Vasiltsov (Lesson 2)
ANTLR 4
Grammars
by Alexander Vasiltsov
EBNF
● name ::= description - defines the element on the left (some notations use = instead of ::=)
● '...' - text element: a single character or a group of characters
● A B - element A followed by element B (concatenation)
● A | B - element A or element B (choice)
● [A] - element A may be present or absent (optional)
● {A} - zero or more A elements (repetition)
● (A B) - grouping of elements
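For rules that describe plain character patterns, each EBNF operator above has a direct regular-expression counterpart. A minimal Python sketch, assuming a hypothetical rule `number ::= ["-"] digit {digit} ["." digit {digit}]` (the rule and the `is_number` helper are illustrative, not taken from the lesson):

```python
import re

# Hypothetical EBNF rule, translated operator by operator:
#   number ::= ["-"] digit {digit} ["." digit {digit}]
#   digit  ::= "0" | "1" | ... | "9"
DIGIT = r"[0-9]"                          # choice: "0" | "1" | ... | "9"
NUMBER = (
    r"-?"                                 # [A]     -> A?   (optional)
    + DIGIT + DIGIT + r"*"                # A {A}   -> A A* (concatenation + repeat)
    + r"(?:\." + DIGIT + DIGIT + r"*)?"   # [A B {B}] -> optional group
)

def is_number(text: str) -> bool:
    """Return True if the whole text matches the `number` rule."""
    return re.fullmatch(NUMBER, text) is not None

for sample in ["42", "-3.14", "3.", "--1"]:
    print(sample, is_number(sample))
```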
ANTLR Notation
Grammar patterns
● Sequence of elements
● Choice between multiple alternatives
● Token dependence - the presence of one token requires the presence of its counterpart somewhere in the phrase
● Nested phrase - a self-similar language construct
Sequence
This is a finite or arbitrarily long sequence of tokens or subphrases
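In a recursive-descent parser, a sequence rule becomes code that matches each element in order. A hand-written Python sketch (not ANTLR output), using an illustrative FTP-style rule `retr : 'RETR' INT '\n' ;` for a finite sequence and `numbers : INT+ ;` for an arbitrarily long one; tokens are assumed to be plain strings, and `match_retr` / `match_numbers` are hypothetical names:

```python
# Hand-written sketch (not ANTLR-generated) of two sequence rules:
#   retr    : 'RETR' INT '\n' ;   // finite sequence
#   numbers : INT+ ;              // arbitrarily long sequence (one or more)

def match_retr(tokens: list[str]) -> bool:
    """Finite sequence: exactly 'RETR', an integer, then a newline."""
    return (
        len(tokens) == 3
        and tokens[0] == "RETR"
        and tokens[1].isdigit()
        and tokens[2] == "\n"
    )

def match_numbers(tokens: list[str]) -> bool:
    """Arbitrarily long sequence: one or more integers (the + operator)."""
    return len(tokens) >= 1 and all(t.isdigit() for t in tokens)

print(match_retr(["RETR", "42", "\n"]))
print(match_numbers(["1", "2", "3"]))
print(match_numbers([]))  # + requires at least one element
```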
Sequence with terminator
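With a terminator, every element, including the last one, is followed by the same token, as in the ANTLR rule `stats : (stat ';')* ;`. A Python sketch, assuming each `stat` is simply a non-empty run of tokens other than `;` (the helper `parse_terminated` is hypothetical):

```python
# Sketch of the terminator pattern:
#   stats : (stat ';')* ;
# Every statement, including the last one, must be followed by ';'.

def parse_terminated(tokens: list[str], terminator: str = ";") -> list[list[str]]:
    """Group tokens into statements; raise if a statement is empty or unterminated."""
    statements: list[list[str]] = []
    current: list[str] = []
    for tok in tokens:
        if tok == terminator:
            if not current:
                raise SyntaxError("empty statement before terminator")
            statements.append(current)
            current = []
        else:
            current.append(tok)
    if current:  # leftover tokens mean the last terminator is missing
        raise SyntaxError(f"missing {terminator!r} after {current}")
    return statements

print(parse_terminated(["x", "=", "1", ";", "y", "=", "2", ";"]))
```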
Sequence with separator
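With a separator, the token appears only between elements, which ANTLR grammars usually write as `exprList : expr (',' expr)* ;`. A Python sketch, assuming each `expr` is a single token (`parse_separated` is a hypothetical helper):

```python
# Sketch of the separator pattern:
#   exprList : expr (',' expr)* ;
# Separators appear only between elements, never after the last one.

def parse_separated(tokens: list[str], sep: str = ",") -> list[str]:
    """Return the elements of `expr (',' expr)*`, where each expr is one token."""
    if not tokens or tokens[0] == sep:
        raise SyntaxError("expected an element first")
    items = [tokens[0]]
    i = 1
    while i < len(tokens):
        if tokens[i] != sep:
            raise SyntaxError(f"expected {sep!r}, got {tokens[i]!r}")
        if i + 1 >= len(tokens) or tokens[i + 1] == sep:
            raise SyntaxError("dangling separator")
        items.append(tokens[i + 1])  # the (',' expr) part of the loop
        i += 2
    return items

print(parse_separated(["a", ",", "b", ",", "c"]))
```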
Choice (Alternatives)
This is a set of alternative phrases
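A parser chooses among alternatives by inspecting upcoming tokens. A minimal Python sketch for an illustrative rule `type : 'float' | 'int' | 'void' ;`, where one token of lookahead is enough to decide (`ALTERNATIVES` and `parse_type` are hypothetical names):

```python
# Sketch of the choice pattern, with an illustrative rule:
#   type : 'float' | 'int' | 'void' ;
# A recursive-descent parser picks an alternative by looking at the next token.

ALTERNATIVES = ("float", "int", "void")

def parse_type(tokens: list[str], pos: int = 0) -> tuple[str, int]:
    """Match one alternative at `pos`; return (matched text, next position)."""
    if pos < len(tokens) and tokens[pos] in ALTERNATIVES:
        return tokens[pos], pos + 1
    found = tokens[pos] if pos < len(tokens) else "<EOF>"
    raise SyntaxError(f"expected one of {ALTERNATIVES}, got {found!r}")

print(parse_type(["int", "x"]))
```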
Token Dependency
The presence of one token requires the presence of one or more subsequent tokens
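Typical examples are brackets and parentheses, as in illustrative rules such as `vector : '[' INT+ ']' ;` or `call : ID '(' args? ')' ;`. A Python sketch, assuming tokens are plain strings, that checks each opening token is later matched by its own counterpart (`PAIRS` and `dependencies_satisfied` are hypothetical names):

```python
# Sketch of token dependency: an opening token requires its matching
# closing token later in the phrase, e.g.
#   vector : '[' INT+ ']' ;
#   call   : ID '(' args? ')' ;
# A stack checks that every opener is closed by its own counterpart.

PAIRS = {")": "(", "]": "[", "}": "{"}

def dependencies_satisfied(tokens: list[str]) -> bool:
    stack: list[str] = []
    for tok in tokens:
        if tok in PAIRS.values():      # opener: remember it
            stack.append(tok)
        elif tok in PAIRS:             # closer: must match the latest opener
            if not stack or stack.pop() != PAIRS[tok]:
                return False
    return not stack                   # nothing may remain unclosed

print(dependencies_satisfied(["f", "(", "[", "1", "2", "]", ")"]))
print(dependencies_satisfied(["[", "1", ")"]))
```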
Nested Phrase
This is a self-similar language structure
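A nested phrase is expressed with a rule that refers to itself, directly or indirectly, and a recursive-descent parser mirrors that with a recursive function. A Python sketch for the illustrative rule `expr : '(' expr ')' | ID ;` that returns the nesting depth (`parse_expr` is a hypothetical helper):

```python
# Sketch of a nested (self-similar) phrase:
#   expr : '(' expr ')' | ID ;
# The rule refers to itself, so the parser function calls itself.

def parse_expr(tokens: list[str], pos: int = 0) -> tuple[int, int]:
    """Parse one expr at `pos`; return (nesting depth, next position)."""
    if pos < len(tokens) and tokens[pos] == "(":
        depth, pos = parse_expr(tokens, pos + 1)  # recursion: inner expr
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return depth + 1, pos + 1
    if pos < len(tokens) and tokens[pos].isidentifier():
        return 0, pos + 1                         # base case: ID
    raise SyntaxError(f"unexpected token at position {pos}")

print(parse_expr(["(", "(", "x", ")", ")"]))
```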
Common lexical structures
Lexical Starter Kit (1)
Lexical Starter Kit (2)
Lexical Starter Kit (3)
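A lexical starter kit is a set of lexer rules that recur across many languages. Assuming it covers the usual suspects (identifiers, integers, floats, strings, comments, whitespace), here is a Python sketch that approximates such rules with regular expressions; the patterns are illustrative, and the lesson's exact rules may differ:

```python
import re

# Typical starter-kit lexer rules, approximated as Python regexes.
# The ANTLR-style rule each one imitates is shown in its comment.
# Python's alternation takes the FIRST alternative that matches, not the
# longest one as ANTLR does, so the order below is chosen with care
# (e.g. FLOAT is tried before INT).
TOKEN_SPEC = [
    ("COMMENT",      r"/\*.*?\*/"),              # '/*' .*? '*/' -> skip
    ("LINE_COMMENT", r"//[^\n]*"),               # roughly '//' .*? '\n' -> skip
    ("WS",           r"[ \t\r\n]+"),             # [ \t\r\n]+ -> skip
    ("FLOAT",        r"\d+\.\d*|\.\d+"),         # DIGIT+ '.' DIGIT* | '.' DIGIT+
    ("INT",          r"\d+"),                    # DIGIT+
    ("ID",           r"[a-zA-Z_][a-zA-Z_0-9]*"), # ID_LETTER (ID_LETTER | DIGIT)*
    ("STRING",       r'"(?:\\.|[^"\\])*"'),      # '"' (ESC | .)*? '"'
]
SKIPPED = {"COMMENT", "LINE_COMMENT", "WS"}
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # let block comments span lines
)

def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (type, text) pairs, dropping comments and whitespace."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup not in SKIPPED:
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize('count 42 3.14 "hi" // note\n/* block */ x1'))
```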
Line between lexer and parser
● Match and discard anything in the lexer that the parser does not need to see at all
● Match common tokens such as identifiers, keywords, strings, and numbers in the lexer
● Lump together into a single token type those lexical structures that the parser does not need to distinguish
● Lump together anything that the parser can treat as a single entity
● On the other hand, if the parser needs to pull apart a lump of text to process it, the lexer should pass the individual components as tokens to the parser
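The guidelines above can be illustrated with a hypothetical calculator language: whitespace is discarded in the lexer, integers and decimals are lumped into one `NUMBER` type (assuming the parser treats them alike), and each operator keeps its own token type because the parser must tell them apart. A Python sketch (all names are illustrative):

```python
import re

# Hypothetical calculator lexer showing where the lexer/parser line falls.
RULES = [
    (None,     re.compile(r"\s+")),            # whitespace: match and discard
    ("NUMBER", re.compile(r"\d+(?:\.\d+)?")),  # 3 and 3.5 lumped into NUMBER
    ("PLUS",   re.compile(r"\+")),             # operators stay distinct because
    ("STAR",   re.compile(r"\*")),             # the parser must tell them apart
    ("LPAREN", re.compile(r"\(")),
    ("RPAREN", re.compile(r"\)")),
]

def lex(text: str) -> list[tuple[str, str]]:
    """Return (type, text) pairs; discarded matches never reach the parser."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                if kind is not None:
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:  # no rule matched at this position
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens

print(lex("2 * (3.5 + 4)"))
```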
JSON grammar (1)
grammar JSON;
json: object
    | array
    ;
object
    : '{' pair (',' pair)* '}'
    | '{' '}' // empty object
    ;
pair: STRING ':' value ;
array
    : '[' value (',' value)* ']'
    | '[' ']' // empty array
    ;
value
    : STRING
    | NUMBER
    | object  // recursion
    | array   // recursion
    | 'true'  // keywords
    | 'false'
    | 'null'
    ;
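The parser rules map one-to-one onto functions in a hand-written recursive-descent parser. The Python sketch below is not what ANTLR generates; it assumes tokens arrive as `(type, text)` pairs from a lexer, with literal tokens such as `{` or `true` using their own text as the type, and it converts the input into Python values (`JsonParser` is a hypothetical name):

```python
# Hand-written recursive-descent parser: one method per parser rule.
# Tokens are (type, text) pairs; literal tokens use their own text as type.
Token = tuple[str, str]

class JsonParser:
    def __init__(self, tokens: list[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def _type(self) -> str:
        """Type of the current token, or 'EOF' past the end."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def _match(self, expected: str) -> str:
        """Consume a token of the expected type and return its text."""
        if self._type() != expected:
            raise SyntaxError(f"expected {expected}, got {self._type()}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def json(self):
        # json : object | array ;
        result = self.object() if self._type() == "{" else self.array()
        # Without EOF in the rule, an ANTLR parser may stop early and leave
        # trailing input unread; this sketch rejects trailing tokens.
        if self._type() != "EOF":
            raise SyntaxError("unexpected tokens after the top-level value")
        return result

    def object(self) -> dict:
        # object : '{' pair (',' pair)* '}' | '{' '}' ;
        self._match("{")
        result = {}
        if self._type() != "}":
            key, val = self.pair()
            result[key] = val
            while self._type() == ",":
                self._match(",")
                key, val = self.pair()
                result[key] = val
        self._match("}")
        return result

    def pair(self):
        # pair : STRING ':' value ;
        key = self._match("STRING")[1:-1]  # strip quotes; escapes are not decoded
        self._match(":")
        return key, self.value()

    def array(self) -> list:
        # array : '[' value (',' value)* ']' | '[' ']' ;
        self._match("[")
        result = []
        if self._type() != "]":
            result.append(self.value())
            while self._type() == ",":
                self._match(",")
                result.append(self.value())
        self._match("]")
        return result

    def value(self):
        # value : STRING | NUMBER | object | array | 'true' | 'false' | 'null' ;
        kind = self._type()
        if kind == "STRING":
            return self._match("STRING")[1:-1]
        if kind == "NUMBER":
            return float(self._match("NUMBER"))
        if kind == "{":
            return self.object()  # recursion
        if kind == "[":
            return self.array()   # recursion
        if kind in ("true", "false", "null"):
            self._match(kind)
            return {"true": True, "false": False, "null": None}[kind]
        raise SyntaxError(f"unexpected {kind}")

tokens = [("{", "{"), ("STRING", '"a"'), (":", ":"), ("[", "["),
          ("NUMBER", "1"), (",", ","), ("true", "true"), ("]", "]"), ("}", "}")]
print(JsonParser(tokens).json())  # {'a': [1.0, True]}
```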
JSON grammar (2)
STRING : '"' (ESC | ~["\\])* '"' ;
fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ;
fragment UNICODE : 'u' HEX HEX HEX HEX ;
fragment HEX : [0-9a-fA-F] ;
NUMBER
    : '-'? INT '.' [0-9]+ EXP? // 1.35, 1.35E-9, 0.3, -4.5
    | '-'? INT EXP             // 1e10 -3e4
    | '-'? INT                 // -3, 45
    ;
fragment INT : '0' | [1-9] [0-9]* ; // no leading zeros
fragment EXP : [Ee] [+\-]? [0-9]+ ; // JSON allows leading zeros in the exponent; \- since - means "range" inside [...]
WS : [ \t\n\r]+ -> skip ;
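The lexer rules translate into regular expressions, with fragments inlined into the rules that use them, much as ANTLR treats fragment rules (which never become tokens on their own). A Python sketch producing `(type, text)` pairs and skipping whitespace like `-> skip`; the exponent accepts any digits, following the JSON specification (RFC 8259), which allows leading zeros there (`tokenize` is a hypothetical helper):

```python
import re

# Fragments: reusable pieces that are inlined, never emitted as tokens.
HEX = r"[0-9a-fA-F]"                     # fragment HEX
UNICODE = rf"u{HEX}{{4}}"                # fragment UNICODE : 'u' HEX HEX HEX HEX
ESC = rf'\\(?:["\\/bfnrt]|{UNICODE})'    # fragment ESC
INT = r"(?:0|[1-9][0-9]*)"               # fragment INT: no leading zeros
EXP = r"[Ee][+\-]?[0-9]+"                # exponent digits as in RFC 8259

# Token rules built from the fragments.
STRING = rf'"(?:{ESC}|[^"\\])*"'         # '"' (ESC | ~["\\])* '"'
# The three NUMBER alternatives, factored: -? INT ('.' [0-9]+ EXP? | EXP)?
NUMBER = rf"-?{INT}(?:\.[0-9]+(?:{EXP})?|{EXP})?"

TOKEN_RE = re.compile(
    r"(?P<WS>[ \t\n\r]+)"
    rf"|(?P<STRING>{STRING})"
    rf"|(?P<NUMBER>{NUMBER})"
    r"|(?P<LITERAL>[{}\[\],:]|true|false|null)"
)

def tokenize(text: str) -> list[tuple[str, str]]:
    """Return (type, text) pairs; literal tokens use their text as type."""
    tokens: list[tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character {text[pos]!r} at {pos}")
        kind = m.lastgroup
        if kind == "LITERAL":
            tokens.append((m.group(), m.group()))
        elif kind != "WS":                   # WS -> skip
            tokens.append((kind, m.group()))
        pos = m.end()
    return tokens

print(tokenize('{"a": [-1.5e-3, true]}'))
```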
Typical JSON
Parse tree
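In a parse tree, each rule invocation becomes an interior node and each token a leaf. A Python sketch that prints the tree the JSON grammar assigns to a given Python value, in a parenthesized format modeled on ANTLR's `toStringTree` output (`json_tree` and the other helpers are hypothetical, and the exact spacing of ANTLR's own output may differ):

```python
import json

# Each function emits one rule node: "(rule child child ...)".
# Leaves are token texts; json.dumps gives STRING/NUMBER/true/false/null text.

def value_tree(v) -> str:
    # value : STRING | NUMBER | object | array | 'true' | 'false' | 'null' ;
    if isinstance(v, dict):
        return f"(value {object_tree(v)})"
    if isinstance(v, list):
        return f"(value {array_tree(v)})"
    return f"(value {json.dumps(v)})"

def object_tree(d: dict) -> str:
    # object : '{' pair (',' pair)* '}' | '{' '}' ;
    pairs = " , ".join(f"(pair {json.dumps(k)} : {value_tree(v)})" for k, v in d.items())
    return f"(object {{ {pairs} }})" if pairs else "(object { })"

def array_tree(items: list) -> str:
    # array : '[' value (',' value)* ']' | '[' ']' ;
    values = " , ".join(value_tree(v) for v in items)
    return f"(array [ {values} ])" if values else "(array [ ])"

def json_tree(doc) -> str:
    # json : object | array ;
    if isinstance(doc, dict):
        return f"(json {object_tree(doc)})"
    if isinstance(doc, list):
        return f"(json {array_tree(doc)})"
    raise ValueError("top level must be an object or array")

print(json_tree({"a": [1, True]}))
```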