Introduction to Language Theory
Prepared by
Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida
Programming Language Translators
Introduction to Language Theory
Definition: An alphabet (or vocabulary) Σ is a finite set of symbols.
Example: Alphabet of Pascal:
+ - * / < … (operators)
begin end if var (keywords)
<identifier> (identifiers)
<string> (strings)
<integer> (integers)
; : , ( ) [ ] (punctuators)
Note: All identifiers are represented by one symbol, because Σ must be finite.
Introduction to Language Theory
Definition: A sequence t = t1t2…tn of symbols from an alphabet Σ is a string.
Definition: The length of a string t = t1t2…tn (denoted |t|) is n. If n = 0, the string is ε, the empty string.
Definition: Given strings s = s1s2…sn and t = t1t2…tm, the concatenation of s and t, denoted st, is the string s1s2…snt1t2…tm.
Introduction to Language Theory
Note: εu = u = uε, uεv = uv, for any strings u,v (including ε)
Definition: Σ* is the set of all strings of symbols from Σ.
Note: Σ* is called the reflexive, transitive closure of Σ.
Σ* is described by the graph (Σ*, ·), where “·” denotes concatenation, and there is a designated “start” node, ε.
Introduction to Language Theory
Example: Σ = {a, b}.
[Figure: the graph (Σ*, ·), starting at node ε, with nodes a, b, aa, ab, ba, bb, aba, abba and edges labeled a or b.]
Σ* is countably infinite, so we cannot compute all of Σ*; we can compute only finite subsets of it. We can, however, compute whether a given string is in Σ*.
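The point about computing only finite subsets of Σ* can be illustrated with a minimal Python sketch. The function names `sigma_star` and `in_sigma_star` are hypothetical, and symbols are assumed to be single characters; the generator never finishes on its own, so the caller takes a finite prefix.

```python
from itertools import count, islice, product

def sigma_star(alphabet):
    """Yield every string over `alphabet`, shortest first ('' stands for ε)."""
    symbols = sorted(alphabet)
    for length in count(0):                      # 0, 1, 2, ... without end
        for combo in product(symbols, repeat=length):
            yield "".join(combo)

def in_sigma_star(string, alphabet):
    """Membership test: every symbol of `string` must belong to `alphabet`."""
    return all(symbol in alphabet for symbol in string)

# Only a finite prefix of the infinite enumeration can ever be materialized.
first_seven = list(islice(sigma_star({"a", "b"}), 7))
```

Enumerating by increasing length is one way to see that Σ* is countable: every string appears at some finite position in the sequence.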
Introduction to Language Theory
Example: Σ = Pascal vocabulary. Σ* = all possible alleged Pascal programs, i.e. all possible inputs to a Pascal compiler.
We need to specify L ⊆ Σ*, the correct Pascal programs.
Definition: A language L over an alphabet Σ is a subset of Σ*.
Introduction to Language Theory
Example: Σ = {a, b}.
L1 = ø is a language
L2 = {ε} is a language
L3 = {a} is a language
L4 = {a, ba, bbab} is a language
L5 = {a^n b^n | n >= 0} is a language, where a^n = aa…a, n times
L6 = {a, aa, aaa, …} is a language
Note: L5 is an infinite language, but described finitely.
Introduction to Language Theory
THIS IS THE MAIN GOAL OF LANGUAGE SPECIFICATION:
To describe (infinite) programming languages finitely, and to provide corresponding finite inclusion-test algorithms.
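As an illustration of a finite inclusion test for an infinite language, here is a small Python sketch for the language of strings consisting of n a's followed by n b's, n >= 0. The function name `in_anbn` is hypothetical, and '' stands for ε.

```python
def in_anbn(string):
    """Return True iff `string` is n a's followed by n b's for some n >= 0."""
    n, remainder = divmod(len(string), 2)
    # The length must be even, and the string must equal the only candidate of that length.
    return remainder == 0 and string == "a" * n + "b" * n
```

The test always terminates, even though the language it decides has infinitely many members.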
Language Constructors
Definition: The catenation (or product) of two languages L1 and L2, denoted L1L2, is the set {uv | u ∈ L1, v ∈ L2}.
Example: L1 = {ε, a, bb}, L2 = {ac, c}
L1L2 = {ac, c, aac, ac, bbac, bbc}
= {ac, c, aac, bbac, bbc}
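For finite languages, the catenation can be computed directly. The sketch below models a language as a Python set of strings (with '' for ε); the function name `catenate` is an illustrative choice.

```python
def catenate(L1, L2):
    """Return the product language {uv | u in L1, v in L2}."""
    return {u + v for u in L1 for v in L2}

L1 = {"", "a", "bb"}
L2 = {"ac", "c"}
product_language = catenate(L1, L2)
```

Because the result is a set, the duplicate "ac" (produced both as ε·ac and as a·c) is kept only once, matching the simplification in the example.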
Language Constructors
Definition: L^n = LL…L (n times), and L^0 = {ε}.
Example: L = {a, bb}
L^3 = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
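The n-fold product can be sketched as repeated catenation starting from {ε}. The function name `power` is hypothetical, and languages are again finite Python sets of strings.

```python
def power(L, n):
    """Return L catenated with itself n times; the 0th power is {ε}, written {''}."""
    result = {""}
    for _ in range(n):
        result = {u + v for u in result for v in L}
    return result

cube = power({"a", "bb"}, 3)
```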
Language Constructors
Definition: The union of two languages L1 and L2 is the set L1 ∪ L2 = {u | u ∈ L1} ∪ {v | v ∈ L2}.
Definition: The Kleene star (L*) of a language L is the set L* = ∪ L^n, n ≥ 0.
Example: L = {a, bb}
L* = {any string composed of a’s and bb’s}
Definition: The Transitive Closure (L+) of a language L is the set L+ = ∪ L^n, n ≥ 1.
Language Constructors
Note: In general, L* = L+ ∪ {ε}, but L+ ≠ L* – {ε}.
For example, consider L = {ε}. Then {ε} = L+ ≠ L* – {ε} = {ε} – {ε} = ø.
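Since L* and L+ are usually infinite, a program can only compute them up to a length bound. The following sketch does that and reproduces the counterexample with L = {ε}; the names `star_up_to` and `plus_up_to` and the bound `max_len` are assumptions made for illustration.

```python
def star_up_to(L, max_len):
    """Strings of L* whose length is at most max_len."""
    result = {""}                 # the 0th power contributes ε
    frontier = {""}
    while frontier:
        new = {u + v for u in frontier for v in L if len(u + v) <= max_len}
        frontier = new - result   # only strings not seen before are extended further
        result |= frontier
    return result

def plus_up_to(L, max_len):
    """Strings of L+ = L L* whose length is at most max_len."""
    return {u + v for u in L for v in star_up_to(L, max_len) if len(u + v) <= max_len}

L = {""}                                   # L = {ε}
plus = plus_up_to(L, 3)                    # {ε}
star_minus_eps = star_up_to(L, 3) - {""}   # empty set
```

Here `plus` still contains ε while `star_minus_eps` is empty, so the two sets differ exactly as the note states.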
Grammars
Goal: Providing a means for describing languages finitely.
Method: Provide a subgraph (Σ*, →*) of (Σ*, ·), and a start node S, such that the nodes reachable from S are exactly the strings in the language.
Grammars
Example: Σ = {a, b}
L = {a^n b^n | n ≥ 0}
[Figure: part of the graph (Σ*, ·) for Σ = {a, b}, with nodes ε, a, b, aa, ab, ba, bb, aab, aaa, bbb, bba, aaba, bbaa, bbab, aabb and edges labeled a or b.]
Grammars
“=>” (derives) is a relation defined by a finite set of rewrite rules known as productions.
Definition: Given a vocabulary V, a production is a pair (u, v) ∈ V* × V*, denoted u → v. u is called the left-part; v is called the right-part.
Grammars
Example: Pseudo-English.
V = {Sentence, NP, VP, Adj, N, V, boy, girl, the, tall, jealous, hit, bit}
Sentence → NP VP (one production)
NP → N
NP → Adj NP
N → boy
N → girl
Adj → the
Adj → tall
Adj → jealous
VP → V NP
V → hit
V → bit
Note: English is much too complicated to be described this way.
Grammars
Definition: Given a finite set of productions P ⊆ V* × V*, the relation => is defined such that
for all α, β, u, v ∈ V*, αuβ => αvβ iff u → v ∈ P is a production.
Example:
Sentence → NP VP
NP → N
NP → Adj NP
N → boy
N → girl
Adj → the
Adj → tall
Adj → jealous
VP → V NP
V → hit
V → bit
Grammars
Sentence => NP VP
=> Adj NP VP
=> the NP VP
=> the Adj NP VP
=> the jealous NP VP
=> the jealous N VP
=> the jealous girl VP
=> the jealous girl V NP
=> the jealous girl hit NP
=> the jealous girl hit Adj NP
=> the jealous girl hit the NP
=> the jealous girl hit the N
=> the jealous girl hit the boy
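The derivation above can be checked mechanically. The sketch below encodes the Pseudo-English productions as pairs of symbol tuples and tests whether each line follows from the previous one by a single rewrite; the names `PRODUCTIONS`, `derives` and `DERIVATION` are hypothetical, and the tuple encoding is just one convenient representation.

```python
PRODUCTIONS = [
    (("Sentence",), ("NP", "VP")),
    (("NP",), ("N",)),
    (("NP",), ("Adj", "NP")),
    (("N",), ("boy",)),
    (("N",), ("girl",)),
    (("Adj",), ("the",)),
    (("Adj",), ("tall",)),
    (("Adj",), ("jealous",)),
    (("VP",), ("V", "NP")),
    (("V",), ("hit",)),
    (("V",), ("bit",)),
]

def derives(before, after, productions=PRODUCTIONS):
    """True iff before => after, i.e. one production u -> v rewrites one occurrence of u."""
    for u, v in productions:
        for i in range(len(before) - len(u) + 1):
            if before[i:i + len(u)] == u and before[:i] + v + before[i + len(u):] == after:
                return True
    return False

# Each line is one sentential form of the derivation, split into symbols.
DERIVATION = [tuple(line.split()) for line in """
Sentence
NP VP
Adj NP VP
the NP VP
the Adj NP VP
the jealous NP VP
the jealous N VP
the jealous girl VP
the jealous girl V NP
the jealous girl hit NP
the jealous girl hit Adj NP
the jealous girl hit the NP
the jealous girl hit the N
the jealous girl hit the boy
""".strip().splitlines()]

all_steps_valid = all(derives(a, b) for a, b in zip(DERIVATION, DERIVATION[1:]))
```

The surrounding context (the symbols before and after the rewritten occurrence) is copied unchanged, which is what the α and β in the definition of => express.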
Grammars
Definition: A grammar is a 4-tuple G = (Φ, Σ, P, S) where
Φ is a finite set of nonterminals,
Σ is a finite set of terminals,
V = Φ ∪ Σ is the grammar’s vocabulary,
S ∈ Φ is called the start or goal symbol, and
P ⊆ V* × V* is a finite set of productions.
Example: Grammar for {a^n b^n | n ≥ 0}.
G = (Φ, Σ, P, S), where Φ = {S}, Σ = {a, b}, and P = {S → aSb, S → ε}
Grammars
Derivations: S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => …
Applying S → ε to each of these sentential forms yields: S => ε, aSb => ab, aaSbb => aabb, aaaSbbb => aaabbb, aaaaSbbbb => aaaabbbb.
Note: Normally, grammars are given by simply listing the productions.
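The derivations for S → aSb, S → ε can also be explored by a breadth-first search over sentential forms. This is a sketch under stated assumptions: symbols are single characters, the names `one_step` and `sentences` are hypothetical, and forms longer than `max_len + 1` are pruned, which is safe for this grammar but is not a general-purpose bound.

```python
from collections import deque

PRODUCTIONS = [("S", "aSb"), ("S", "")]   # S -> aSb, S -> ε ('' stands for ε)
NONTERMINALS = {"S"}

def one_step(form, productions):
    """Yield every string obtained from `form` by applying one production once."""
    for u, v in productions:
        start = form.find(u)
        while start != -1:
            yield form[:start] + v + form[start + len(u):]
            start = form.find(u, start + 1)

def sentences(start, productions, nonterminals, max_len):
    """Collect the sentences of length <= max_len reachable from `start`."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(symbol in nonterminals for symbol in form):
            if len(form) <= max_len:
                found.add(form)           # no nonterminals left: a sentence
            continue
        for nxt in one_step(form, productions):
            # Pruning assumption: longer forms cannot shrink back below the bound here.
            if len(nxt) <= max_len + 1 and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found

short_sentences = sentences("S", PRODUCTIONS, NONTERMINALS, 6)
```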
Grammar Conventions
TWS convention
1. Upper case letter (identifier) – nonterminal
2. Lower case letter (string) – terminal
3. Lower case Greek letter – strings in V*
4. Left part of the first production is assumed to be the start symbol, e.g.
S → aSb
S → ε
5. Left part omitted if same as for preceding production, e.g.
S → aSb
  → ε
Grammars
Example: Grammar for identifiers.
Identifier → Letter
           → Identifier Letter
           → Identifier Digit
Letter → ‘a’ → ‘A’ → ‘b’ → ‘B’
       .
       .
       → ‘z’ → ‘Z’
Digit → ‘0’
      → ‘1’
      .
      .
      → ‘9’
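The identifier grammar describes a letter followed by any number of letters or digits, which can be tested directly. In this sketch the function name `is_identifier` is hypothetical, and the elided Letter alternatives are assumed to cover exactly the ASCII letters.

```python
import string

LETTERS = set(string.ascii_letters)   # assumption: 'a'..'z' and 'A'..'Z'
DIGITS = set(string.digits)           # '0'..'9'

def is_identifier(text):
    """True iff `text` is derivable from Identifier in the grammar above."""
    if not text or text[0] not in LETTERS:
        return False                  # every derivation starts with Identifier → Letter
    return all(ch in LETTERS or ch in DIGITS for ch in text[1:])
```

The left-recursive productions extend an identifier on the right one symbol at a time, which is why only the first character is constrained to be a letter.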
Grammars
Definition: The language generated by a grammar G is the set L(G) = {ω ∈ Σ* | S =>* ω}.
Definition: A sentential form generated by a grammar G is any string α such that S =>* α.
Definition: A sentence generated by a grammar G is any sentential form α such that α ∈ Σ*.
Grammars
Example:
Sentential forms: S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => …
Sentences (each derived from the sentential form above it by S → ε): ε, ab, aabb, aaabbb, aaaabbbb
Lemma: L(G) = {ω | ω is a sentence}
Proof: Trivial.
Grammars
Derivations: A => aABC => aaABCBC => …
A => aBC => abC => abc
aABC => aaBCBC => aabCBC => aabBCC => aabbCC => aabbcC => aabbcc
aaABCBC => aaaBCBCBC => aaaBBCBCC => aaaBBBCCC => aaabBBCCC (2) => aaabbbCCC => aaabbbcCC (2) => aaabbbccc
L(G) = {a^n b^n c^n | n ≥ 1}
The Chomsky Hierarchy
A hierarchy of grammars, the languages they generate, and the machines that accept those languages.
The Chomsky Hierarchy
Type  Language Name               Grammar Name                    Restrictions on Grammar                   Accepting Machine
0     Recursively Enumerable      Unrestricted re-writing system  None                                      Turing Machine
1     Context-Sensitive Language  Context-Sensitive Grammar       For all α → β, |α| ≤ |β|                  Linear Bounded Automaton
2     Context-Free Language       Context-Free Grammar            For all α → β, α ∈ Φ                      Push-Down Automaton (parser)
3     Regular Language            Regular Grammar                 For all α → β, α ∈ Φ, β ∈ Σ ∪ ΣΦ ∪ {ε}    Finite-State Automaton
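The restrictions in the table can be turned into a rough classifier for a grammar given as a list of productions. This is only a sketch: the function name `grammar_type` is hypothetical, symbols are assumed to be single characters, and the Type 3 restriction is read as right-linear (right parts of the form a, aB, or ε), which is an assumption about the intended definition.

```python
def grammar_type(productions, nonterminals, terminals):
    """Return the highest type (3, 2, 1 or 0) whose restriction every production meets."""
    def is_type3(left, right):
        if left not in nonterminals:
            return False
        if right == "":
            return True                                   # A -> ε
        if len(right) == 1:
            return right in terminals                     # A -> a
        return (len(right) == 2 and right[0] in terminals
                and right[1] in nonterminals)             # A -> aB

    def is_type2(left, right):
        return left in nonterminals                       # a single nonterminal on the left

    def is_type1(left, right):
        return len(left) <= len(right)                    # right part is never shorter

    for number, allowed in ((3, is_type3), (2, is_type2), (1, is_type1)):
        if all(allowed(left, right) for left, right in productions):
            return number
    return 0

anbn_type = grammar_type([("S", "aSb"), ("S", "")], {"S"}, {"a", "b"})
```

Note that this classifies the grammar as written, not the language: a language may have a more restricted grammar than the one supplied.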