parboiled explained
TRANSCRIPT
![Page 1: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/1.jpg)
Parboiled2 explained
![Page 2: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/2.jpg)
Covered
● Why Parboiled2
● Library basics
● Performance optimizations
● Best Practices
● Migration
![Page 3: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/3.jpg)
Features
● PEG
● No lexer required
● Flexible typesafe EDSL
● Compile-time optimizations
● Decent error reporting
● scala.js support
![Page 4: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/4.jpg)
When regexes fail
Parsing arbitrary HTML with regexes is like asking Paris Hilton to write an operating system (c)
![Page 5: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/5.jpg)
When regexes fail
![Page 6: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/6.jpg)
Performance (regex)
[Bar chart: Parboiled2 vs. Regex, series "Parsing" and "Warmup"; values shown: 620.38 and 621.95. Lower is better.]
Data is taken from here: http://bit.ly/1XHAJaA
![Page 7: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/7.jpg)
Performance (json)
● Parboiled1: 85.64
● Parboiled2: 13.17
● Argonaut: 7.01
● Json4SNative: 8.06
● Json4SJackson: 4.09
Data is taken from here: http://myltsev.name/ScalaDays2014/#/
Lower is better
![Page 8: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/8.jpg)
Performance (json)
● Parser combinators: 2385.78
● Parboiled1: 85.64
● Parboiled2: 13.17
● Argonaut: 7.01
● Json4SNative: 8.06
● Json4SJackson: 4.09
Data is taken from here: https://groups.google.com/forum/#!topic/parboiled-user/bGtdGvllGgU
Lower is better
![Page 9: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/9.jpg)
Alternatives
● Grappa [java]
● ANTLR
● Regexps
● Parser-combinators
● Language Workbenches (xtext, MPS)
![Page 10: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/10.jpg)
<dependency>
<groupId>org.parboiled</groupId>
<artifactId>parboiled_2.11</artifactId>
<version>2.1.0</version>
</dependency>
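For sbt builds, the same dependency can be declared roughly as follows. This is a sketch that assumes the same coordinates and version as the Maven snippet; `%%` appends the Scala binary version (for example `_2.11`) to the artifact name.

```scala
// build.sbt
// Same artifact as the Maven snippet; %% picks the suffix for the project's Scala version.
libraryDependencies += "org.parboiled" %% "parboiled" % "2.1.0"
```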
![Page 11: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/11.jpg)
import org.parboiled2._
class MyParser(val input: ParserInput) extends Parser {
  // Your grammar
}
![Page 12: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/12.jpg)
Rule DSL
![Page 13: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/13.jpg)
Basic match
def CaseDoesntMatter = rule { ignoreCase("string") }
// Implicit form
def MyCharRule = rule { 'a' }
def MyStringRule = rule { "string" }
// Explicit form
def MyCharRule = rule { ch('a') }
def MyStringRule = rule { str("string") }
![Page 14: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/14.jpg)
Basic match
def CaseDoesntMatter: Rule0 = rule { ignoreCase("string") }
def MyCharRule: Rule0 = rule {'a'}
def MyStringRule: Rule0 = rule { "string" }
![Page 15: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/15.jpg)
Syntactic predicates
● ANY – matches any character except EOI
● EOI – virtual character that represents the end of input
val EOI = '\uFFFF'
You must match EOI at the end of the main/root rule, otherwise the parser accepts any input that merely starts with a valid prefix
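A minimal sketch of why the root rule should end with EOI. The parser class, rule names and inputs are hypothetical and exist only for this illustration; `run()` is used with the default delivery scheme, which returns a `Try`.

```scala
import org.parboiled2._

// Hypothetical parser, used only for this illustration.
class GreetingParser(val input: ParserInput) extends Parser {
  // Matches the prefix "hello" and ignores whatever follows.
  def Prefix: Rule0 = rule { "hello" }
  // Matches only if "hello" is the entire input.
  def Whole: Rule0 = rule { "hello" ~ EOI }
}

object EoiDemo {
  private val p1 = new GreetingParser("hello world")
  private val p2 = new GreetingParser("hello world")
  private val p3 = new GreetingParser("hello")

  val prefixOk: Boolean = p1.Prefix.run().isSuccess // trailing " world" is silently ignored
  val wholeOk: Boolean  = p2.Whole.run().isSuccess  // EOI does not match at " world"
  val exactOk: Boolean  = p3.Whole.run().isSuccess  // the whole input is consumed
}
```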
![Page 16: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/16.jpg)
Syntactic predicates
● anyOf – matches one character from the given set
● noneOf – matches one character that is not in the given set (and is not EOI)
def Digit = rule { anyOf("1234567890")}
def Visible = rule { noneOf(" \n\t")}
![Page 17: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/17.jpg)
Character ranges
// Ranges are written with strings: '0' - '9' on Chars would be plain integer subtraction
def Digit = rule { "0" - "9" }
def AlphaLower = rule { "a" - "z" }
Good, but not flexible (the main issue of parboiled1)
● Sometimes you don't need ANY character
● You have a range of characters
![Page 18: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/18.jpg)
Character predicates
There is a set of predefined char predicates:
● CharPredicate.All
● CharPredicate.Digit
● CharPredicate.Digit19
● CharPredicate.HexDigit
Of course you can define your own
![Page 19: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/19.jpg)
def AllButQuotes = rule {
  CharPredicate.Visible -- "\"" -- "'"
}
def ValidIdentifier = rule {
  CharPredicate.AlphaNum ++ "_"
}
CharPredicate from (_.isSpaceChar)
Character predicates
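A small sketch of combining predefined predicates into custom ones and using them in a rule. The parser name, the predicate names and the identifier grammar are assumptions made for this example; the predicates live in the companion object so they are built only once.

```scala
import org.parboiled2._

// Hypothetical example: predicates are created once and shared by all parser instances.
object IdentParser {
  val IdentStart: CharPredicate = CharPredicate.Alpha ++ "_"
  val IdentChar: CharPredicate  = CharPredicate.AlphaNum ++ "_"

  // Returns the identifier if the whole input is a valid one.
  def parse(text: String): Option[String] = {
    val parser = new IdentParser(text)
    parser.Identifier.run().toOption
  }
}

class IdentParser(val input: ParserInput) extends Parser {
  import IdentParser._

  // A letter or underscore, followed by letters, digits or underscores.
  def Identifier: Rule1[String] = rule {
    capture(IdentStart ~ zeroOrMore(IdentChar)) ~ EOI
  }
}
```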
![Page 20: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/20.jpg)
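def ArithmeticOperation = rule {
  anyOf("+-*/^")
}
// noneOf matches everything EXCEPT the listed chars, so this rule matches a non-whitespace char
def NonWhiteSpaceChar = rule { noneOf(" \t\n") }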
anyOf/noneOf
![Page 21: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/21.jpg)
def cows = rule { 1000 times "cow" }
def PRI = rule { 1 to 3 times Digit }
N times
![Page 22: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/22.jpg)
def OptWs = rule {
  zeroOrMore(Whitespace) // Whitespace.*
}
def UInt = rule {
  oneOrMore(Digit) // Digit.+
}
def CommaSeparatedNumbers = rule {
  oneOrMore(UInt).separatedBy(",")
}
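The repetition rules above only recognise input. As a sketch of how they can also produce values, the hypothetical parser below captures each number and collects them into a sequence; the class and rule names are assumptions for illustration.

```scala
import org.parboiled2._

// Hypothetical parser that turns "1,22,333" into a sequence of Ints.
class NumbersParser(val input: ParserInput) extends Parser {
  // Captures the matched digits and converts them to an Int.
  def UInt: Rule1[Int] = rule {
    capture(oneOrMore(CharPredicate.Digit)) ~> ((s: String) => s.toInt)
  }

  // oneOrMore over a Rule1[Int] collects the values into a Seq[Int].
  def Numbers: Rule1[Seq[Int]] = rule {
    oneOrMore(UInt).separatedBy(",") ~ EOI
  }
}

object NumbersDemo {
  def parse(text: String): Option[Seq[Int]] = {
    val parser = new NumbersParser(text)
    parser.Numbers.run().toOption
  }
}
```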
0+/1+
![Page 23: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/23.jpg)
import CharPredicate.Digit
// "yyyy-mm-dd"
def SimplifiedRuleForDate = rule {
  Year ~ "-" ~ Month ~ "-" ~ Day
}
def Year = rule { Digit ~ Digit ~ Digit ~ Digit }
def Month = rule { Digit ~ Digit }
def Day = rule { Digit ~ Digit }
Sequence
![Page 24: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/24.jpg)
// zeroOrOne
def Newline = rule { optional('\r') ~ '\n' }
// Same rule, postfix form
def Newline = rule { '\r'.? ~ '\n' }
Optional
![Page 25: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/25.jpg)
// ch(...) is needed on the left side: '+' | '-' on two plain Chars is a bitwise OR
def Signum = rule { ch('+') | '-' }
def bcd = rule { 'b' ~ 'c' | 'b' ~ 'd'}
Ordered choice
![Page 26: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/26.jpg)
// why order matters
def Operator = rule {
  "+=" | "-=" | "*=" | "++" | "--" |
  "+" | "-" | "*" | "/" ...
}
def Operators = rule {
  ("+" ~ ("=" | "+").?) |
  ("-" ~ ("=" | "-").?) |
  ...
}
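A sketch of what goes wrong when a shorter alternative comes first. The first alternative that matches wins and the choice is not revisited, so `"+"` listed before `"+="` prevents `"+="` from ever matching completely. The parser and rule names are hypothetical.

```scala
import org.parboiled2._

// Hypothetical parser, used only to contrast the two orderings.
class OperatorParser(val input: ParserInput) extends Parser {
  // Shorter alternative first: on "+=" the "+" wins, then EOI fails at "=".
  def WrongOrder: Rule0 = rule { ("+" | "+=") ~ EOI }
  // Longer alternative first: "+=" is tried before "+".
  def RightOrder: Rule0 = rule { ("+=" | "+") ~ EOI }
}

object OrderDemo {
  private val p1 = new OperatorParser("+=")
  private val p2 = new OperatorParser("+=")

  val wrongOrderMatches: Boolean = p1.WrongOrder.run().isSuccess
  val rightOrderMatches: Boolean = p2.RightOrder.run().isSuccess
}
```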
Order matters
![Page 27: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/27.jpg)
Running the parser
class MyParser(val input: ParserInput) extends Parser {
  def MyStringRule: Rule0 = rule {
    ignoreCase("match") ~ EOI
  }
}
![Page 28: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/28.jpg)
Running the parser
val p1 = new MyParser("match")
val p2 = new MyParser("much")
p1.MyStringRule.run() // Success
p2.MyStringRule.run() // Failure
Different delivery schemes are also available
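With the default delivery scheme, `run()` returns a `scala.util.Try`. The sketch below shows one way to handle both outcomes and to render a parse error with `formatError`; the `check` helper and the parser name are assumptions for this example.

```scala
import org.parboiled2._
import scala.util.{Failure, Success}

class MatchParser(val input: ParserInput) extends Parser {
  def MyStringRule: Rule0 = rule { ignoreCase("match") ~ EOI }
}

object RunDemo {
  // Hypothetical helper: returns "ok" or a human-readable error description.
  def check(text: String): String = {
    val parser = new MatchParser(text)
    parser.MyStringRule.run() match {
      case Success(_)             => "ok"
      case Failure(e: ParseError) => parser.formatError(e) // position and expected input
      case Failure(e)             => "unexpected: " + e
    }
  }
}
```

`RunDemo.check("MATCH")` yields `"ok"` because the rule ignores case, while `RunDemo.check("much")` yields a formatted error message.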
![Page 29: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/29.jpg)
![Page 30: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/30.jpg)
BKV
server.name = "webserver"
server {
  port = "8080"
  address = "192.168.88.88"
  settings {
    greeting_message = "Hello!\n It's me!"
  }
}
![Page 31: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/31.jpg)
Performance
![Page 32: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/32.jpg)
Unroll n.times for n <= 4
// Slower
rule { 4 times Digit }
// Faster
rule { Digit ~ Digit ~ Digit ~ Digit }
![Page 33: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/33.jpg)
Faster stack operations
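// Much faster
// #(c) is assumed to be a helper, not shown on the slide, that converts a digit char to its Int value
def Digit4 = rule {
  Digit ~ Digit ~ Digit ~ Digit ~ push(
    #(charAt(-4))*1000 + #(charAt(-3))*100 + #(charAt(-2))*10 + #(lastChar)
  )
}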
![Page 34: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/34.jpg)
Do not recreate CharPredicate
class MyParser(val input: ParserInput) extends Parser {
  val Uppercase = CharPredicate.from(_.isUpper)
  …
}
![Page 35: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/35.jpg)
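Use predicates
// Preferred: one character-set check per character
def foo = rule { capture(zeroOrMore(noneOf("\n"))) }
// Broken: !'\n' consumes no input, so the loop never advances
def foo = rule { capture(zeroOrMore(!'\n')) }
// Works, but needs a negative lookahead plus ANY for every character
def foo = rule { capture(zeroOrMore(!'\n' ~ ANY)) }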
![Page 36: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/36.jpg)
Best Practices
![Page 37: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/37.jpg)
Best Practices
● Unit tests
● Small rules
● Decomposition
● Case objects instead of strings
![Page 38: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/38.jpg)
Push case objects
// Instead of capturing raw strings
def LogLevel = rule {
  capture("info" | "warning" | "error")
}
// push a case object for each alternative
def LogLevel = rule {
  "info" ~ push(LogLevel.Info) |
  "warning" ~ push(LogLevel.Warning) |
  "error" ~ push(LogLevel.Error)
}
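A self-contained sketch of the same idea. The `Level` hierarchy and the parser name are assumptions, since the slide does not show how `LogLevel.Info` and friends are defined; the type argument on `push` is spelled out here only to keep the result type explicit.

```scala
import org.parboiled2._

// Hypothetical value types; the slide does not show their definition.
sealed trait Level
object Level {
  case object Info extends Level
  case object Warning extends Level
  case object Error extends Level
}

class LogLevelParser(val input: ParserInput) extends Parser {
  // Each alternative matches a keyword and pushes the corresponding case object.
  def LogLevel: Rule1[Level] = rule {
    ( "info"    ~ push[Level](Level.Info)
    | "warning" ~ push[Level](Level.Warning)
    | "error"   ~ push[Level](Level.Error) ) ~ EOI
  }
}

object LogLevelDemo {
  def parse(text: String): Option[Level] = {
    val parser = new LogLevelParser(text)
    parser.LogLevel.run().toOption
  }
}
```

Callers then pattern match on `Level` values instead of comparing strings, so a misspelled level name becomes a compile error rather than a silent mismatch.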
![Page 39: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/39.jpg)
Simple syntax for object capture
case class Text(s: String)
def charsAST: Rule1[AST] = rule {
capture(Chars) ~> ((s: String) => Text(s))
}
def charsAST = rule {
capture(Chars) ~> Text
}
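The two variants above can be put into a runnable context as follows. This is a sketch: the `AST` trait, the `Chars` rule and the parser name are assumptions filled in for illustration, since the slide shows only the rule bodies.

```scala
import org.parboiled2._

// Hypothetical AST; the slide only shows the Text case class.
sealed trait AST
case class Text(s: String) extends AST

class TextParser(val input: ParserInput) extends Parser {
  // Assumed definition of Chars: one or more letters.
  def Chars: Rule0 = rule { oneOrMore(CharPredicate.Alpha) }

  // The case class companion is used directly instead of a lambda.
  def CharsAST: Rule1[AST] = rule { (capture(Chars) ~> Text) ~ EOI }
}

object TextDemo {
  def parse(text: String): Option[AST] = {
    val parser = new TextParser(text)
    parser.CharsAST.run().toOption
  }
}
```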
![Page 40: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/40.jpg)
Named rules
def Header: Rule1[Header] =
rule("I am header") { ... }
def Header: Rule1[Header] = namedRule("header") {...}
def UserName = rule {
Prefix ~ oneOrMore(NameChar).named("username")
}
![Page 41: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/41.jpg)
Migration
![Page 42: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/42.jpg)
Migration
● Separate classpath: org.parboiled vs org.parboiled2
● Grammar is hard to break
● Composition: trait → abstract class
● Removing primitives library
![Page 43: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/43.jpg)
Drawbacks
![Page 44: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/44.jpg)
Drawbacks
● PEG (absence of lexer)
● No support for left-recursive grammars
● No error recovery mechanism
● No IDE support
● No support for indentation-based grammars
● Awful, non-informative error messages
![Page 45: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/45.jpg)
![Page 46: Parboiled explained](https://reader031.vdocuments.mx/reader031/viewer/2022021506/589c99821a28abf4148b4965/html5/thumbnails/46.jpg)
Q/A