CYK Algorithm & CFL reachability
By Lohit Krishnan and Chetas Mahajan
Outline
• CYK Algorithm
– Background
– Problem statement
– Intuition
– Terminology
– Formal description and example
Background
• Named after Cocke, Younger, and Kasami.
• Some fascinating qualities:
– It shows that deciding whether s ϵ L(G) is in P for any CFG G in Chomsky normal form (CNF).
– It uses dynamic programming (a “table-filling algorithm”) to solve this decision problem.
Problem Statement
• Given the CFG G:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
• Let L be the language generated by G.
• Is the string “baaba” a valid member of L?
• How many substrings of “baaba” are valid members of L?
• How many distinct substrings of the given string are valid members of L?
• How many non-empty substrings of the given string are not valid members of L?
• How many substrings of the given string are generated only by B?
Problem Statement
• Given a context-free grammar G and a string w:
– G = (V, Σ, P, S), where
  • V is a finite set of variables
  • Σ (the alphabet) is a finite set of terminal symbols
  • P is a finite set of rules
  • S is the start symbol (a distinguished element of V)
  • V and Σ are assumed to be disjoint
– G is used to generate the strings of the language L.
• Is w ϵ L(G)? (Membership problem)
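The 4-tuple above can be written down directly as data. The following is a minimal sketch, not part of the original slides: the `Grammar` class, its method names, and the `parse_rules` helper are hypothetical, and the parser assumes every symbol is a single character, as in the example grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset  # V
    terminals: frozenset  # Sigma
    rules: tuple          # P: (head, body) pairs, body is a tuple of symbols
    start: str            # S

    def is_well_formed(self) -> bool:
        # V and Sigma disjoint, S in V, every rule uses only known symbols.
        symbols = self.variables | self.terminals
        return (
            self.variables.isdisjoint(self.terminals)
            and self.start in self.variables
            and all(head in self.variables for head, _ in self.rules)
            and all(sym in symbols for _, body in self.rules for sym in body)
        )

    def is_cnf(self) -> bool:
        # CNF body: exactly one terminal, or exactly two variables.
        def ok(body):
            if len(body) == 1:
                return body[0] in self.terminals
            return len(body) == 2 and all(s in self.variables for s in body)
        return all(ok(body) for _, body in self.rules)


def parse_rules(text):
    """Parse lines like 'S -> AB | BC' (single-character symbols assumed)."""
    rules = []
    for line in text.strip().splitlines():
        head, alternatives = line.split("->")
        for alt in alternatives.split("|"):
            rules.append((head.strip(), tuple(alt.strip())))
    return tuple(rules)


G = Grammar(
    variables=frozenset("SABC"),
    terminals=frozenset("ab"),
    rules=parse_rules("""
        S -> AB | BC
        A -> BA | a
        B -> CC | b
        C -> AB | a
    """),
    start="S",
)
print(G.is_well_formed(), G.is_cnf())  # True True
```

The `is_cnf` check ignores the optional rule S -> ε that some CNF definitions allow; the example grammar does not use it.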
Terminology
• Let n be the length of the string w.
• Partition the given string using n+1 lines.
• Number those lines from 0 to n.
• Now, we define
– xij as the substring of w that lies between lines i and j (here i < j).
– Tij as the set of non-terminals that generate the string xij.
Terminology
• Grammar:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
• String to be checked is “baaba”.
|b|a|a|b|a|
0 1 2 3 4 5
• x13 = aa
• x35 = ba
• x05 = baaba
• T23 = non-terminals generating x23 (i.e. “a”).
• T23 = { A, C }
• Build a table T of Tij, 0 ≤ i ≤ n-1; 1 ≤ j ≤ n; i < j
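The line-numbering convention maps directly onto half-open slicing. The sketch below is an illustration only; the helper names `x` and `T_unit` and the `terminal_rules` list are assumptions introduced here, not names from the slides.

```python
w = "baaba"
n = len(w)


def x(i, j):
    """Substring of w between cut lines i and j (requires 0 <= i < j <= n)."""
    assert 0 <= i < j <= n
    return w[i:j]


# Terminal rules of the example grammar: A -> a, B -> b, C -> a
terminal_rules = [("A", "a"), ("B", "b"), ("C", "a")]


def T_unit(i):
    """T(i, i+1): non-terminals that derive the single character x(i, i+1)."""
    return {head for head, ch in terminal_rules if ch == x(i, i + 1)}


print(x(1, 3), x(3, 5), x(0, 5))  # aa ba baaba
print(sorted(T_unit(2)))          # ['A', 'C']
print(n * (n + 1) // 2)           # 15 cells (i, j) with 0 <= i < j <= n
```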
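Intuition of the algorithm
• The Tij are the subproblems of the dynamic program.
• In this problem, we need to decide whether the start symbol belongs to T0n.
• Formation of the DP:
– Combining two sets: $T(T_1 T_2) = \{ X \mid X \to t_1 t_2,\ t_1 \in T_1,\ t_2 \in T_2 \}$
– Base case ($j = i+1$): $T_{i,i+1} = \{ X \mid X \to x_{i,i+1} \}$
– Recurrence ($j > i+1$): $T_{ij} = \bigcup_{k=i+1}^{j-1} T(T_{ik} T_{kj})$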
T(0,1) T(1,2) T(2,3) T(3,4) T(4,5)
T(0,2) T(1,3) T(2,4) T(3,5)
T(0,3) T(1,4) T(2,5)
T(0,4) T(1,5)
T(0,5)
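The table-filling order shown in the triangle above (shortest substrings first) can be sketched as follows. This is a minimal illustration under stated assumptions: the example grammar is hard-coded in the hypothetical dictionaries `BINARY` and `TERMINAL`, and the function names are chosen here rather than taken from the slides.

```python
from itertools import product

# Example grammar in CNF, split into binary rules X -> YZ and terminal rules X -> a.
BINARY = {
    ("A", "B"): {"S", "C"},  # S -> AB, C -> AB
    ("B", "C"): {"S"},       # S -> BC
    ("B", "A"): {"A"},       # A -> BA
    ("C", "C"): {"B"},       # B -> CC
}
TERMINAL = {"a": {"A", "C"}, "b": {"B"}}


def combine(left, right):
    """T(T1 T2): heads X of rules X -> t1 t2 with t1 in left, t2 in right."""
    heads = set()
    for pair in product(left, right):
        heads |= BINARY.get(pair, set())
    return heads


def cyk_table(w):
    """Return {(i, j): T_ij} for all 0 <= i < j <= len(w)."""
    n = len(w)
    T = {(i, i + 1): set(TERMINAL.get(w[i], set())) for i in range(n)}
    for length in range(2, n + 1):      # fill by substring length
        for i in range(n - length + 1):
            j = i + length
            T[i, j] = set()
            for k in range(i + 1, j):   # split point, i < k < j
                T[i, j] |= combine(T[i, k], T[k, j])
    return T


def accepts(w, start="S"):
    """Membership test: is the start symbol in T(0, n)? Empty input is rejected."""
    return len(w) > 0 and start in cyk_table(w)[0, len(w)]


T = cyk_table("baaba")
print(sorted(T[0, 2]), sorted(T[1, 5]), accepts("baaba"))
# ['A', 'S'] ['A', 'C', 'S'] True
```

The three nested loops over length, start position, and split point give the usual cubic dependence on the length of w.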
• Filling the table for “baaba”, using the grammar:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
|b|a|a|b|a|
0 1 2 3 4 5
• Substrings xij (row r lists the substrings of length r, left to right):
b     a     a     b     a
ba    aa    ab    ba
baa   aab   aba
baab  aaba
baaba
• Table T, filled one row at a time from the top, each row left to right (“-” marks an empty set):
B      A,C    A,C    B      A,C
A,S    B      S,C    A,S
-      B      B
-      S,C,A
S,C,A
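Answers
• Is the string “baaba” a valid member of L?
Yes (S ϵ T05).
• How many substrings of “baaba” are valid members of L?
5 (the cells containing S: T02, T24, T35, T15, T05)
• How many distinct substrings of the given string are valid members of L?
4 (ba, ab, aaba, baaba; “ba” occurs twice, as x02 and x35)
• How many non-empty substrings of the given string are not valid members of L?
15 - 5 = 10 (there are n(n+1)/2 = 15 non-empty substrings, counted by position)
• How many substrings of the given string are generated only by B?
5 (the cells equal to { B }: T01, T34, T13, T14, T25)
• Table T (“-” marks an empty set):
B      A,C    A,C    B      A,C
A,S    B      S,C    A,S
-      B      B
-      S,C,A
S,C,A
• Substrings xij:
|b|a|a|b|a|
0 1 2 3 4 5
b     a     a     b     a
ba    aa    ab    ba
baa   aab   aba
baab  aaba
baaba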
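The five answers can be read off the completed table mechanically. The sketch below copies the table from the slides into a hypothetical dictionary `T` keyed by (i, j); the variable names are illustrative only.

```python
w = "baaba"

# Completed CYK table from the slides, keyed by (i, j); "-" cells are empty sets.
T = {
    (0, 1): {"B"}, (1, 2): {"A", "C"}, (2, 3): {"A", "C"}, (3, 4): {"B"}, (4, 5): {"A", "C"},
    (0, 2): {"A", "S"}, (1, 3): {"B"}, (2, 4): {"S", "C"}, (3, 5): {"A", "S"},
    (0, 3): set(), (1, 4): {"B"}, (2, 5): {"B"},
    (0, 4): set(), (1, 5): {"S", "C", "A"},
    (0, 5): {"S", "C", "A"},
}

in_L = [(i, j) for (i, j), cell in T.items() if "S" in cell]

is_member = "S" in T[0, len(w)]                          # whole string in L?
valid = len(in_L)                                        # substrings (by position) in L
distinct_valid = len({w[i:j] for i, j in in_L})          # distinct strings in L
not_valid = len(T) - valid                               # non-empty substrings not in L
only_B = sum(1 for cell in T.values() if cell == {"B"})  # generated by B and nothing else

print(is_member, valid, distinct_valid, not_valid, only_B)  # True 5 4 10 5
```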
CFL Reachability
Outline
• CFL reachability
– Motivation
– Problem definition
– Variants of the CFL reachability problem
– Relation with other problems
– Algorithm
– Example
Motivation
“Program Analysis via Graph Reachability” by Thomas Reps
• Program analysis requires extracting information from a program without actually running it.
• Classical data-flow analysis maintains a set of “dataflow facts” at each program point.
• Program analysis → Graph Reachability Problem (GRP)
• GRP is a special case of the CFL reachability problem.
Problem Definition
• Let L be a context-free language over alphabet Σ, and let G be a graph whose edges are labeled with members of Σ.
• Each path in G defines a word over Σ, namely, the word obtained by concatenating, in order, the labels of the edges on the path.
• A path in G is an L-path if its word is a member of L.
Variants of CFL Reachability Problem
1. The all-pairs L-path problem.
2. The single-source L-path problem.
3. The single-target L-path problem.
4. The single-source/single-target L-path problem.
• Other variants: the multi-source L-path problem, the multi-target L-path problem, and the multi-source/multi-target L-path problem.
Example
• Let L be the language of strings of matched parentheses and square brackets, with zero or more e’s inside them.
• Only one L-path: [(e[])eee[e]]
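Checking whether a path is an L-path for this language amounts to concatenating its edge labels and testing the resulting word. The sketch below is an illustration under assumptions: it reads L as "balanced ( ) and [ ] with e allowed anywhere, including the empty word", the path is a hypothetical straight chain of nodes (the graph from the slide is not reproduced), and the names `path_word` and `in_L` are introduced here.

```python
PAIRS = {")": "(", "]": "["}


def path_word(path):
    """Concatenate, in order, the labels of the edges (u, label, v) on a path."""
    return "".join(label for _, label, _ in path)


def in_L(word):
    """True if word is balanced over ()/[] once every 'e' is ignored."""
    stack = []
    for ch in word:
        if ch == "e":
            continue
        if ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            return False  # symbol outside the alphabet
    return not stack


# Hypothetical path: a chain of nodes 0, 1, 2, ... spelling the word from the slide.
path = [(i, ch, i + 1) for i, ch in enumerate("[(e[])eee[e]]")]
print(in_L(path_word(path)))  # True
print(in_L("[(e])"))          # False
```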
Relation with other problems
• Ordinary graph reachability problem
– Label every edge with e and take L = e*.
• CFL recognition problem
– “Given a string w and a context-free language L, is w ϵ L?”
– Create a linear graph s → ... → t that has |w| edges, and label the ith edge with the ith letter of w.
– There is an L-path from s to t iff w ϵ L.
Algorithm
• Normalize the grammar so that the right-hand side of each production has at most two symbols (either terminals or nonterminals).
• Add additional edges according to the productions of the normalized grammar.
[Figure: edge-addition rules, with A ϵ N and B, C ϵ (N U T)]
• The solution is read off the edges labeled with the start symbol of the grammar.
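The edge-addition step can be sketched as a naive fixpoint computation. The rules used here are an assumption, since the figure is not reproduced: for A -> ε add a self-loop labeled A at every node, for A -> B copy each B-edge as an A-edge, and for A -> B C add an A-edge from u to v whenever a B-edge from u to k is followed by a C-edge from k to v. The function name `cfl_reach` is hypothetical, and this simple loop makes no attempt at the efficiency of a worklist algorithm.

```python
def cfl_reach(edges, productions):
    """Close a labeled edge set under the productions (naive fixpoint).

    edges: iterable of (u, label, v)
    productions: iterable of (head, body), body a tuple of 0, 1, or 2 symbols
    Returns the set of all original and derived (u, label, v) edges.
    """
    edges = list(edges)
    productions = list(productions)
    nodes = {u for u, _, _ in edges} | {v for _, _, v in edges}
    closed = set(edges)
    for head, body in productions:
        if len(body) == 0:                       # A -> epsilon
            closed |= {(u, head, u) for u in nodes}
    changed = True
    while changed:
        new = set()
        for head, body in productions:
            if len(body) == 1:                   # A -> B
                new |= {(u, head, v) for (u, l, v) in closed if l == body[0]}
            elif len(body) == 2:                 # A -> B C
                b, c = body
                for (u, l1, k) in closed:
                    if l1 != b:
                        continue
                    for (k2, l2, v) in closed:
                        if k2 == k and l2 == c:
                            new.add((u, head, v))
        changed = not new <= closed
        closed |= new
    return closed


# CFL recognition as reachability: a linear graph 0 -> 1 -> ... -> 5 spelling "baaba".
w = "baaba"
graph = [(i, ch, i + 1) for i, ch in enumerate(w)]
grammar = [
    ("S", ("A", "B")), ("S", ("B", "C")),
    ("A", ("B", "A")), ("A", ("a",)),
    ("B", ("C", "C")), ("B", ("b",)),
    ("C", ("A", "B")), ("C", ("a",)),
]
closed = cfl_reach(graph, grammar)
s_pairs = sorted((u, v) for (u, label, v) in closed if label == "S")
print((0, "S", len(w)) in closed)  # True
print(s_pairs)                     # [(0, 2), (0, 5), (1, 5), (2, 4), (3, 5)]
```

On a linear graph the S-labeled edges coincide with the CYK cells that contain S, which is the reduction from CFL recognition to CFL reachability.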
Example
• Grammar:
S -> AB | BC
A -> BA | a
B -> CC | b
C -> AB | a
• Graph G:
[Figure: graph G with edges labeled b, a, a, b]
• All-pairs L-path problem.
Questions?