TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LL
Extended Context-Free Grammars Parsing with Generalized LL
Author: Artem Gorokhov
Saint Petersburg University
Programming Languages and Tools Lab, JetBrains
March 4, 2017
Artem Gorokhov (SPbU) March 4,2017 1 / 15
Extended Context-Free Grammar
S = a M*
M = a? (B K)+
| u B
B = c | 𝜀
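Parsers that accept only plain CFG rules need the regular operators (*, +, ?) of an ECFG rewritten first. The following minimal Python sketch shows that transformation on the example grammar above. The tuple-based AST, the function name desugar and generated names such as S_star0 are hypothetical, and K, which the slide leaves undefined, is treated as an opaque symbol.

```python
from itertools import count
from typing import Dict, List, Tuple, Union

# Hypothetical ECFG AST: a symbol name, "eps", or (op, argument) where
# op is "seq"/"alt" (argument: list of Expr) or "star"/"plus"/"opt" (argument: Expr).
Expr = Union[str, Tuple[str, object]]
Rules = Dict[str, List[List[str]]]  # nonterminal -> alternatives -> symbols


def desugar(ecfg: Dict[str, Expr]) -> Rules:
    """Rewrite *, + and ? into plain CFG rules with fresh nonterminals."""
    cfg: Rules = {}
    fresh = count()

    def alts(e: Expr, owner: str) -> List[List[str]]:
        if isinstance(e, str):
            return [[]] if e == "eps" else [[e]]
        op, arg = e
        if op == "alt":
            return [a for sub in arg for a in alts(sub, owner)]
        if op == "seq":
            result: List[List[str]] = [[]]
            for sub in arg:
                options = alts(sub, owner)  # compute once: may create fresh nonterminals
                result = [r + a for r in result for a in options]
            return result
        if op == "opt":  # X?  ->  eps | X
            return [[]] + alts(arg, owner)
        if op in ("star", "plus"):
            n = f"{owner}_{op}{next(fresh)}"
            body = alts(arg, owner)
            loop = [a + [n] for a in body]
            # X*  ->  N = eps | X N        X+  ->  N = X | X N
            cfg[n] = ([[]] if op == "star" else body) + loop
            return [[n]]
        raise ValueError(f"unknown operator {op!r}")

    for nt, rhs in ecfg.items():
        cfg[nt] = alts(rhs, nt)
    return cfg


EXAMPLE = {
    "S": ("seq", ["a", ("star", "M")]),
    "M": ("alt", [("seq", [("opt", "a"), ("plus", ("seq", ["B", "K"]))]),
                  ("seq", ["u", "B"])]),
    "B": ("alt", ["c", "eps"]),
}

for nt, rhs in desugar(EXAMPLE).items():
    print(nt, "=", " | ".join(" ".join(a) or "eps" for a in rhs))
```

Every * and + introduces an extra nonterminal and extra rules; the recursive-automaton approach in this talk works on the ECFG directly instead.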
Existing solutions
ANTLR, Yacc, Bison
- Can’t use ECFG without transformation
- Admit only a subclass of context-free languages (LL(k), LR(k))
Some research on ECFG parsing
- No tools
- LL(k), LR(k)
Generalized LL
- Admits arbitrary CFGs (including ambiguous ones)
- Can’t use ECFG without transformation
Automata and ECFGs
Grammar G0
S = a* S b? | c   ⇒
RA for grammar G0
[Figure: recursive automaton for G0, with transitions labeled a, S, b, c and ε]
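To make the idea of a recursive automaton (RA) concrete, here is a minimal Python sketch under stated assumptions: each nonterminal gets its own automaton, and an edge labeled with a nonterminal means "recognize that nonterminal here". The encoding and the deterministic RA chosen for G0 are hypothetical (the slide's automaton uses ε-transitions). The recognizer is a naive least-fixpoint computation, not the GLL algorithm from the talk; it only shows what an RA accepts, including the left-recursive call to S.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical RA encoding: nonterminal -> (start state, final states,
# state -> [(label, target)]). Labels that are keys of the RA are nonterminals.
RA = Dict[str, Tuple[int, Set[int], Dict[int, List[Tuple[str, int]]]]]

# Assumed RA for G0: S = a* S b? | c
G0: RA = {
    "S": (0, {1, 2}, {0: [("a", 0), ("S", 1), ("c", 2)], 1: [("b", 2)]}),
}


def recognize(ra: RA, start: str, text: str) -> bool:
    n = len(text)
    # ends[(N, i)]: positions j such that N derives text[i:j]; grown to a fixpoint.
    ends: Dict[Tuple[str, int], Set[int]] = {(N, i): set() for N in ra for i in range(n + 1)}
    changed = True
    while changed:
        changed = False
        for (N, i), known in ends.items():
            q0, finals, delta = ra[N]
            seen = {(q0, i)}
            stack = [(q0, i)]
            while stack:  # explore (state, position) pairs reachable from (q0, i)
                q, j = stack.pop()
                if q in finals and j not in known:
                    known.add(j)
                    changed = True
                for label, target in delta.get(q, []):
                    if label in ra:  # nonterminal edge: use the current approximation
                        nexts = set(ends[(label, j)])
                    elif j < n and text[j] == label:  # terminal edge
                        nexts = {j + 1}
                    else:
                        nexts = set()
                    for k in nexts:
                        if (target, k) not in seen:
                            seen.add((target, k))
                            stack.append((target, k))
    return n in ends[(start, 0)]


print(recognize(G0, "S", "aacb"), recognize(G0, "S", "ab"))
```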
Recursive Automata Minimization
Grammar G1
S = K K K K K K | K a K K K K
K = S K | a K | a
Automaton for G1
[Figure: recursive automaton for G1]
Minimized automaton for G1
[Figure: minimized recursive automaton for G1]
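The slide does not say which minimization algorithm is used. As an illustration only, the Python sketch below applies Moore's partition refinement to the automaton of a single nonterminal, treating nonterminal labels such as K exactly like terminal labels. It assumes the automaton is deterministic and trimmed (every state reachable and able to reach a final state), and the state numbering for S = K K K K K K | K a K K K K is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Delta = Dict[int, Dict[str, int]]  # state -> {label: target}


def minimize(states: List[int], finals: Set[int], delta: Delta) -> List[FrozenSet[int]]:
    """Moore's partition refinement; returns the blocks of equivalent states."""
    block = {q: int(q in finals) for q in states}  # start: final vs. non-final
    while True:
        # Signature: the state's current block plus the block reached on each label.
        sig = {
            q: (block[q], tuple(sorted((lbl, block[t]) for lbl, t in delta.get(q, {}).items())))
            for q in states
        }
        ids: Dict[tuple, int] = {}
        refined = {q: ids.setdefault(sig[q], len(ids)) for q in states}
        if len(ids) == len(set(block.values())):
            break  # no block was split, so the partition is stable
        block = refined
    groups: Dict[int, Set[int]] = {}
    for q in states:
        groups.setdefault(block[q], set()).add(q)
    return sorted((frozenset(g) for g in groups.values()), key=min)


# Hypothetical trie-shaped automaton for S = K K K K K K | K a K K K K
S_DELTA: Delta = {
    0: {"K": 1}, 1: {"K": 2, "a": 7},
    2: {"K": 3}, 3: {"K": 4}, 4: {"K": 5}, 5: {"K": 6},
    7: {"K": 8}, 8: {"K": 9}, 9: {"K": 10}, 10: {"K": 11},
}
blocks = minimize(list(range(12)), {6, 11}, S_DELTA)
print(len(blocks), blocks)
```

The 12 states collapse into 7 blocks ({0}, {1}, {2, 7}, {3, 8}, {4, 9}, {5, 10}, {6, 11}), which corresponds to S = K (K | a) K K K K: the two alternatives now share every state after the first K.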
Derivation Trees for Recursive Automata
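Input: aacb
Automaton:
[Figure: recursive automaton for S with transitions labeled a, b, c and S]
Derivation trees (each node is labeled with its symbol and the input span i,j it covers):
Tree 1: S,0,4 → a,0,1  a,1,2  S,2,3  b,3,4;  S,2,3 → c,2,3
Tree 2: S,0,4 → a,0,1  S,1,3  b,3,4;  S,1,3 → a,1,2  S,2,3;  S,2,3 → c,2,3
Tree 3: S,0,4 → a,0,1  S,1,4;  S,1,4 → a,1,2  S,2,3  b,3,4;  S,2,3 → c,2,3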
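The span labels on the tree nodes carry a simple invariant: a leaf covers exactly its own input symbol, and the children of a node cover the node's span left to right without gaps. The Python sketch below checks that invariant for the three trees above; the Node class and helper names are hypothetical, and ε-leaves are not handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    symbol: str
    start: int  # index of the first input symbol covered
    end: int    # index just past the last input symbol covered
    children: List["Node"] = field(default_factory=list)


def leaf(symbol: str, i: int) -> Node:
    return Node(symbol, i, i + 1)


def is_consistent(node: Node, text: str) -> bool:
    """Leaves must match the input; children must tile the parent's span in order."""
    if not node.children:
        return node.end == node.start + 1 and text[node.start:node.end] == node.symbol
    pos = node.start
    for child in node.children:
        if child.start != pos or not is_consistent(child, text):
            return False
        pos = child.end
    return pos == node.end


TREES = [
    Node("S", 0, 4, [leaf("a", 0), leaf("a", 1), Node("S", 2, 3, [leaf("c", 2)]), leaf("b", 3)]),
    Node("S", 0, 4, [leaf("a", 0),
                     Node("S", 1, 3, [leaf("a", 1), Node("S", 2, 3, [leaf("c", 2)])]),
                     leaf("b", 3)]),
    Node("S", 0, 4, [leaf("a", 0),
                     Node("S", 1, 4, [leaf("a", 1), Node("S", 2, 3, [leaf("c", 2)]), leaf("b", 3)])]),
]
print([is_consistent(t, "aacb") for t in TREES])
```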
SPPF for Recursive Automata
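Input: aacb
Automaton:
[Figure: recursive automaton for S with transitions labeled a, b, c and S]
Shared Packed Parse Forest:
[Figure: SPPF for aacb with nodes S,0,4, S,1,4, S,1,3 and S,2,3, the numbered nodes 3,0,3, 3,1,3 and 2,0,2, and the terminal nodes a,0,1, a,1,2, c,2,3 and b,3,4]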
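The transcript keeps the SPPF node labels but loses its edges. The Python sketch below uses a reconstruction that is consistent with the three derivation trees: numbered nodes such as 3,0,3 are assumed to be intermediate nodes labeled by an RA state and an input span, and each non-leaf node lists its packed nodes. Counting trees with a memoized recursion shows why the forest is compact: shared nodes are evaluated once, yet every combination of packed nodes is still represented.

```python
from functools import lru_cache
from itertools import product
from typing import Dict, List, Tuple

Node = Tuple[str, int, int]  # (label, left extent, right extent)

# Assumed SPPF for "aacb": non-leaf node -> packed nodes (each a list of children).
SPPF: Dict[Node, List[List[Node]]] = {
    ("S", 0, 4): [[("3", 0, 3), ("b", 3, 4)], [("a", 0, 1), ("S", 1, 4)]],
    ("3", 0, 3): [[("2", 0, 2), ("S", 2, 3)], [("a", 0, 1), ("S", 1, 3)]],
    ("2", 0, 2): [[("a", 0, 1), ("a", 1, 2)]],
    ("S", 1, 4): [[("3", 1, 3), ("b", 3, 4)]],
    ("3", 1, 3): [[("a", 1, 2), ("S", 2, 3)]],
    ("S", 1, 3): [[("a", 1, 2), ("S", 2, 3)]],
    ("S", 2, 3): [[("c", 2, 3)]],
}


@lru_cache(maxsize=None)
def count_trees(node: Node) -> int:
    """Number of derivation trees below node; each node is evaluated only once."""
    if node not in SPPF:  # terminal leaf
        return 1
    total = 0
    for packed in SPPF[node]:  # alternatives add up
        ways = 1
        for child in packed:   # children of one packed node multiply
            ways *= count_trees(child)
        total += ways
    return total


def yields(node: Node) -> List[str]:
    """The input string spelled by each derivation tree below node."""
    if node not in SPPF:
        return [node[0]]
    return ["".join(parts)
            for packed in SPPF[node]
            for parts in product(*(yields(c) for c in packed))]


print(count_trees(("S", 0, 4)), yields(("S", 0, 4)))
```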
Input processing
Descriptors queue
A descriptor (G, i, U, T) uniquely defines the state of the parsing process:
- G: position in the grammar (a state of the RA when parsing with a recursive automaton)
- i: position in the input
- U: stack node
- T: current parse forest root
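Because a descriptor fully determines the parser state, GLL never needs to process the same descriptor twice: it remembers every descriptor it has created and silently drops repeats. A minimal Python sketch of such a queue follows; the class names, the field types and the LIFO order are assumptions made for illustration.

```python
from collections.abc import Hashable
from typing import List, NamedTuple, Set


class Descriptor(NamedTuple):
    G: Hashable  # position in the grammar, or a state of the RA
    i: int       # position in the input
    U: Hashable  # stack (GSS) node
    T: Hashable  # current parse forest (SPPF) root


class DescriptorQueue:
    """Worklist that hands out each descriptor at most once."""

    def __init__(self) -> None:
        self._pending: List[Descriptor] = []
        self._created: Set[Descriptor] = set()

    def add(self, d: Descriptor) -> bool:
        if d in self._created:
            return False  # created earlier: processing it again would be redundant
        self._created.add(d)
        self._pending.append(d)
        return True

    def pop(self) -> Descriptor:
        return self._pending.pop()

    def __bool__(self) -> bool:
        return bool(self._pending)


q = DescriptorQueue()
d = Descriptor("S", 0, "u0", None)
print(q.add(d), q.add(d))
```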
Input processing
Input: bc
Grammar:
S = a C_opt | b C_opt | S C_opt
C_opt = 𝜀 | c
Processing steps (∙ marks the current position in the grammar and in the input; descriptor fields not shown on the slides are written as . . .):
1. Input ∙bc: one descriptor is queued for each alternative of S: (S = ∙a C_opt, 0, . . . , . . .), (S = ∙b C_opt, 0, . . . , . . .), (S = ∙S C_opt, 0, . . . , . . .).
2. Descriptor S = ∙a C_opt at position 0: the next input symbol is b, so this alternative does not match.
3. Descriptor S = ∙b C_opt at position 0: b matches, the terminal node b,0,1 is created, and parsing continues at S = b ∙C_opt with input b∙c.
4. C_opt is called at position 1, and one descriptor is queued for each of its alternatives: (C_opt = ∙𝜀, 1, . . . , . . .) and (C_opt = ∙c, 1, . . . , . . .).
5. Descriptor C_opt = ∙c at position 1: c matches, the nodes c,1,2 and C_opt,1,2 are created, and the whole input has been read (bc∙).
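Input processing
Input: ∙bc
Automaton:
[Figure: recursive automaton for S with transitions labeled a, b, c and S]
Descriptors queue:
S, 0, . . . , . . .
(a single descriptor for the start state of S, instead of one per alternative as in the grammar-based trace)
After b is read, the slide shows the nodes b,0,1 and S,0,1.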
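To connect the descriptor trace with the automaton, here is a small GLL-style recognizer over a recursive automaton, written in Python as a sketch rather than the implementation from the talk. Descriptors are (RA state, input position, GSS node), the SPPF component T is left out, and GSS nodes are labeled with the return state and the input position of the call, following the usual GLL create/pop scheme. The RA below (S = (a | b | S) C_opt with C_opt inlined) is an assumption, not the automaton drawn on the slide. The left-recursive call to S terminates because it creates a GSS self-loop instead of recursing.

```python
from typing import Dict, List, Set, Tuple

State = Tuple[str, int]          # (nonterminal, local RA state)
GssNode = Tuple[State, int]      # (return state, input position of the call)
Descriptor = Tuple[State, int, GssNode]

ROOT: GssNode = (("$", 0), 0)    # hypothetical bottom-of-stack node

# Assumed RA: 0 --a,b,S--> 1 --c--> 2, states 1 and 2 final.
RA = {
    "S": {"start": 0, "finals": {1, 2},
          "edges": {0: [("a", 1), ("b", 1), ("S", 1)], 1: [("c", 2)]}},
}


def gll_recognize(ra: dict, start: str, text: str) -> bool:
    n = len(text)
    todo: List[Descriptor] = []                        # pending descriptors
    created: Set[Descriptor] = set()                   # every descriptor ever added
    gss: Dict[GssNode, Set[GssNode]] = {ROOT: set()}   # node -> caller nodes
    popped: Dict[GssNode, Set[int]] = {}               # node -> positions its callee ended at
    accepted: Set[int] = set()                         # positions where start symbol ended

    def add(d: Descriptor) -> None:
        if d not in created:
            created.add(d)
            todo.append(d)

    def create(ret: State, u: GssNode, i: int) -> GssNode:
        v = (ret, i)
        callers = gss.setdefault(v, set())
        if u not in callers:
            callers.add(u)
            for j in popped.get(v, ()):  # replay returns that happened earlier
                add((ret, j, u))
        return v

    def pop(u: GssNode, j: int) -> None:
        if u == ROOT:
            accepted.add(j)
            return
        popped.setdefault(u, set()).add(j)
        for w in gss[u]:
            add((u[0], j, w))  # resume each caller at the return state

    add(((start, ra[start]["start"]), 0, ROOT))
    while todo:
        (nt, q), i, u = todo.pop()
        rules = ra[nt]
        if q in rules["finals"]:
            pop(u, i)  # the current nonterminal may end here
        for label, target in rules["edges"].get(q, []):
            if label in ra:  # nonterminal edge: push a GSS node and call it
                v = create((nt, target), u, i)
                add(((label, ra[label]["start"]), i, v))
            elif i < n and text[i] == label:  # terminal edge
                add(((nt, target), i + 1, u))
    return n in accepted


print(gll_recognize(RA, "S", "bc"), gll_recognize(RA, "S", "c"))
```

The LIFO order of todo is an arbitrary choice; the recognizer's answer does not depend on the order in which descriptors are processed.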
Evaluation
Grammar G1
S = K K K K K K | K a K K K K
K = S K | a K | a
RA for grammar G1
[Figure: minimized recursive automaton for G1]
Experiment results for input a^40 (a string of 40 a's):
            Descriptors   Stack edges   SPPF nodes    Time, sec
Grammar     7,940         6,974         111,127,244   81
RA          5,830         4,234         74,292,078    54
Reduction   27%           39%           33%           35%
Descriptors, stack edges and SPPF nodes measure memory usage.
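The percentage row of the table is the relative saving of the RA version, (Grammar - RA) / Grammar. A short Python check of the memory-usage columns follows; the dictionary only copies the figures from the table, and the helper name is hypothetical.

```python
# Grammar-based vs. RA-based figures for input a^40, copied from the table.
MEMORY = {
    "Descriptors": (7_940, 5_830),
    "Stack edges": (6_974, 4_234),
    "SPPF nodes": (111_127_244, 74_292_078),
}


def reduction_percent(grammar: int, ra: int) -> int:
    """Relative saving of the RA version, rounded to a whole percent."""
    return round(100 * (grammar - ra) / grammar)


for name, (g, r) in MEMORY.items():
    print(f"{name}: {reduction_percent(g, r)}%")
```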
Applicability
Graph parsing: all input strings in one graph
abcd, abfd   ⇒   [Figure: a single graph with edges labeled a, b, c, d and f that contains both strings as paths]
Graph parsing results:
            Descriptors   Stack edges   Stack nodes   Time, min
Grammar     21,134,080    7,482,789     2,731,529     02.26
RA          9,153,352     2,792,330     839,148       01.25
Reduction   57%           63%           69%           45%
Descriptors, stack edges and stack nodes measure memory usage.
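The slides do not explain how the input strings are combined into one graph. One plausible construction, sketched below in Python as an assumption, inserts the strings into a trie (sharing prefixes) and then merges vertices with identical continuations (sharing suffixes). For abcd and abfd this yields 5 vertices and 5 edges labeled a, b, c, f and d. A graph parser would then follow the outgoing edges of a vertex wherever a string parser compares text[i] with a terminal.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, str, int]


def build_graph(words: List[str]) -> Tuple[int, Set[Edge], Set[int]]:
    # 1. Insert every word into a trie, so common prefixes are shared.
    trans: List[Dict[str, int]] = [{}]
    final: List[bool] = [False]
    for w in words:
        q = 0
        for ch in w:
            if ch not in trans[q]:
                trans.append({})
                final.append(False)
                trans[q][ch] = len(trans) - 1
            q = trans[q][ch]
        final[q] = True

    # 2. Bottom-up, give trie nodes with identical futures the same id,
    #    so common suffixes are shared as well.
    register: Dict[tuple, int] = {}

    def canon(q: int) -> int:
        out = tuple(sorted((ch, canon(t)) for ch, t in trans[q].items()))
        return register.setdefault((final[q], out), len(register))

    root = canon(0)
    edges = {(v, ch, t) for (fin, out), v in register.items() for ch, t in out}
    finals = {v for (fin, _), v in register.items() if fin}
    return root, edges, finals


def paths(root: int, edges: Set[Edge], finals: Set[int]) -> Set[str]:
    """All strings spelled by paths from root to a final vertex."""
    adj: Dict[int, List[Tuple[str, int]]] = {}
    for v, ch, t in edges:
        adj.setdefault(v, []).append((ch, t))
    found: Set[str] = set()

    def walk(v: int, prefix: str) -> None:
        if v in finals:
            found.add(prefix)
        for ch, t in adj.get(v, []):
            walk(t, prefix + ch)

    walk(root, "")
    return found


root, edges, finals = build_graph(["abcd", "abfd"])
print(sorted(edges), paths(root, edges, finals))
```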