
Introduction to Language Theory

Prepared by

Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida

Programming Language Translators

Definition: An alphabet (or vocabulary) Σ is a finite set of symbols.

Example: Alphabet of Pascal:
+ - * / < … (operators)
begin end if var (keywords)
<identifier> (identifiers)
<string> (strings)
<integer> (integers)
; : , ( ) [ ] (punctuators)

Note: All identifiers are represented by one symbol, because Σ must be finite.


Definition: A sequence t = t₁t₂…tₙ of symbols from an alphabet Σ is a string.

Definition: The length of a string t = t₁t₂…tₙ (denoted |t|) is n. If n = 0, the string is ε, the empty string.

Definition: Given strings s = s₁s₂…sₙ and t = t₁t₂…tₘ, the concatenation of s and t, denoted st, is the string s₁s₂…sₙt₁t₂…tₘ.


Note: εu = u = uε, uεv = uv, for any strings u,v (including ε)
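A minimal Python sketch of these definitions, assuming a string over Σ is modeled as a tuple of symbols (so a keyword such as begin counts as one symbol, as in the Pascal alphabet) and the empty tuple stands for ε. The names `EPSILON`, `length` and `concat` are illustrative, not from the original.

```python
# Strings over Σ are modeled as tuples of symbols, so a multi-character
# token such as "begin" still counts as a single symbol; () plays the role of ε.
EPSILON = ()


def length(t):
    """|t|: the number of symbols in t."""
    return len(t)


def concat(s, t):
    """st: the symbols of s followed by the symbols of t."""
    return s + t


s = ("begin", "<identifier>")
t = (";", "end")
print(length(s), length(t), length(concat(s, t)))  # 2 2 4
print(concat(EPSILON, s) == s == concat(s, EPSILON))  # True: εu = u = uε
```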

Definition: Σ* is the set of all strings of symbols from Σ.

Note: Σ* is called the reflexive, transitive closure of Σ.

Σ* is described by the graph (Σ*, ·), where “·” denotes concatenation, and there is a designated “start” node, ε.

Example: Σ = {a, b}.

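Σ* is countably infinite, so we cannot compute all of Σ*, only finite subsets of it; we can, however, compute whether a given string is in Σ*.

[Figure: the graph (Σ*, ·) for Σ = {a, b}, with start node ε and edges labeled a and b leading to a, b, aa, ab, ba, bb, and on to longer strings such as aba and abba.]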
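The claim above can be illustrated with a small Python sketch, assuming each symbol is a single character so that Python `str` values can represent strings and `""` represents ε. The hypothetical generator `strings_of` walks the graph (Σ*, ·) breadth-first from ε, so any finite prefix of Σ* can be listed, and `in_sigma_star` decides membership.

```python
from itertools import islice

SIGMA = {"a", "b"}


def strings_of(sigma):
    """Yield every string in Σ* exactly once, shortest first."""
    layer = [""]  # the start node ε
    while True:
        yield from layer
        # Follow one edge of (Σ*, ·): append each symbol to each node.
        layer = [u + x for u in layer for x in sorted(sigma)]


def in_sigma_star(w, sigma):
    """w ∈ Σ* iff every symbol of w belongs to Σ."""
    return all(x in sigma for x in w)


print(list(islice(strings_of(SIGMA), 7)))  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(in_sigma_star("abba", SIGMA), in_sigma_star("abc", SIGMA))  # True False
```

Enumeration never finishes, which is why only a finite slice is taken; the membership test always finishes.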


Example: Σ = Pascal vocabulary. Σ* = all possible alleged Pascal programs, i.e., all possible inputs to a Pascal compiler.

We need to specify L ⊆ Σ*, the set of correct Pascal programs.

Definition: A language L over an alphabet Σ is a subset of Σ*.


Example: Σ = {a, b}.
L1 = ø is a language.
L2 = {ε} is a language.
L3 = {a} is a language.
L4 = {a, ba, bbab} is a language.
L5 = {aⁿbⁿ | n ≥ 0} is a language, where aⁿ = aa…a (n times).
L6 = {a, aa, aaa, …} is a language.

Note: L5 is an infinite language, but described finitely.


THIS IS THE MAIN GOAL OF LANGUAGE SPECIFICATION:

To describe (infinite) programming languages finitely, and to provide corresponding finite inclusion-test algorithms.
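As a sketch of what a finite inclusion test looks like, here are hypothetical Python deciders for L5 and L6 from the example above (strings as Python `str`, `""` for ε). Each runs in time proportional to the length of its input, even though the languages themselves are infinite.

```python
def in_L5(w):
    """Decide w ∈ L5 = {aⁿbⁿ | n ≥ 0}."""
    n = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * n + "b" * n


def in_L6(w):
    """Decide w ∈ L6 = {a, aa, aaa, …}, i.e. one or more a's."""
    return len(w) > 0 and set(w) == {"a"}


tests = ["", "ab", "aabb", "aab", "ba"]
print([w for w in tests if in_L5(w)])  # ['', 'ab', 'aabb']
print(in_L6("aaa"), in_L6(""))         # True False
```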

Language Constructors

Definition: The catenation (or product) of two languages L1 and L2, denoted L1L2, is the set

{uv | u ∈ L1, v ∈ L2}.

Example: L1 = {ε, a, bb}, L2 = {ac, c}

L1L2 = {ac, c, aac, ac, bbac, bbc}

= {ac, c, aac, bbac, bbc}
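The same computation in Python, as a sketch that assumes finite languages are represented as sets of `str` with `""` for ε (the function name `catenate` is illustrative). Because the result is a set, the duplicate ac is collapsed automatically, matching the second line of the example.

```python
def catenate(L1, L2):
    """L1L2 = {uv | u ∈ L1, v ∈ L2}."""
    return {u + v for u in L1 for v in L2}


L1 = {"", "a", "bb"}  # "" stands for ε
L2 = {"ac", "c"}
print(sorted(catenate(L1, L2)))  # ['aac', 'ac', 'bbac', 'bbc', 'c']
```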


Definition: Lⁿ = LL…L (n times), and L⁰ = {ε}.

Example: L = {a, bb}, L³ = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
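Lⁿ can be computed for a finite L by repeated catenation starting from L⁰ = {ε}. A minimal sketch, assuming sets of `str` with `""` for ε (`catenate` and `power` are hypothetical names):

```python
def catenate(L1, L2):
    """L1L2 = {uv | u ∈ L1, v ∈ L2}."""
    return {u + v for u in L1 for v in L2}


def power(L, n):
    """Lⁿ = LL…L (n times), with L⁰ = {ε}."""
    result = {""}  # L⁰ = {ε}
    for _ in range(n):
        result = catenate(result, L)
    return result


print(sorted(power({"a", "bb"}, 3), key=lambda w: (len(w), w)))
# ['aaa', 'aabb', 'abba', 'bbaa', 'abbbb', 'bbabb', 'bbbba', 'bbbbbb']
```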

Definition: The union of two languages L1 and L2 is the set L1 ∪ L2 = {u | u ∈ L1} ∪ {v | v ∈ L2}.

Definition: The Kleene star (L*) of a language L is the set L* = ⋃ Lⁿ (n ≥ 0).

Example: L = {a, bb}, L* = {any string composed of a’s and bb’s}

Definition: The transitive closure (L⁺) of a language L is the set L⁺ = ⋃ Lⁿ (n ≥ 1).



Note: L* = L⁺ ∪ {ε} always holds, but in general L⁺ ≠ L* − {ε}; the two differ exactly when ε ∈ L.

For example, consider L = {ε}. Then L⁺ = {ε}, but L* − {ε} = {ε} − {ε} = ø.
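Since L* and L⁺ are infinite whenever L contains a nonempty string, a program can only build finite parts of them. The Python sketch below, with illustrative names `star` and `plus`, assumes sets of `str` with `""` for ε and keeps only strings up to a chosen length `max_len`; it computes L⁺ as LL* and reproduces the L = {ε} counterexample.

```python
def star(L, max_len):
    """Strings of length <= max_len in L* = ⋃ Lⁿ (n ≥ 0)."""
    result = {""}  # L⁰ = {ε}
    frontier = {""}
    while frontier:
        # Append one more factor from L to each newly found string.
        new = {u + v for u in frontier for v in L if len(u + v) <= max_len}
        frontier = new - result
        result |= frontier
    return result


def plus(L, max_len):
    """Strings of length <= max_len in L⁺ = ⋃ Lⁿ (n ≥ 1), computed as LL*."""
    return {u + v for u in L for v in star(L, max_len) if len(u + v) <= max_len}


L = {"a", "bb"}
print(sorted(star(L, 3), key=lambda w: (len(w), w)))  # ['', 'a', 'aa', 'bb', 'aaa', 'abb', 'bba']
print(plus(L, 3) == star(L, 3) - {""})                # True, since ε ∉ L
E = {""}                                              # L = {ε}
print(plus(E, 3), star(E, 3) - {""})                  # {''} set()
```

The loop terminates because only finitely many strings have length at most `max_len`, even when ε ∈ L.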

Grammars

Goal: Providing a means for describing languages finitely.

Method: Provide a subgraph (Σ*, →*) of (Σ*, ·) and a start node S, such that the nodes reachable from S are exactly the strings in the language.


Example: Σ = {a, b}

L = {aⁿbⁿ | n ≥ 0}

[Figure: the graph (Σ*, ·) for Σ = {a, b}, with start node ε and symbol-labeled edges, showing nodes a, b, aa, ab, ba, bb, aab, aaa, bbb, bba, aaba, bbaa, bbab, aabb, and so on.]


“=>” (derives) is a relation defined by a finite set of rewrite rules known as productions.

Definition: Given a vocabulary V, a production is a pair (u, v) ∈ V* × V*, denoted u → v. u is called the left-part; v is called the right-part.


Example: Pseudo-English.
V = {Sentence, NP, VP, Adj, N, V, boy, girl, the, tall, jealous, hit, bit}

Sentence → NP VP (one production)
NP → N
NP → Adj NP
N → boy
N → girl
Adj → the
Adj → tall
Adj → jealous
VP → V NP
V → hit
V → bit

Note: English is much too complicated to be described this way.


Definition: Given a finite set of productions P ⊆ V* × V*, the relation => is defined such that

for all α, β, u, v ∈ V*, αuβ => αvβ iff u → v ∈ P.

Example: Sentence → NP VP; NP → N; NP → Adj NP; N → boy; N → girl; VP → V NP; V → hit; V → bit; Adj → the; Adj → tall; Adj → jealous


Sentence => NP VP
=> Adj NP VP
=> the NP VP
=> the Adj NP VP
=> the jealous NP VP
=> the jealous N VP
=> the jealous girl VP
=> the jealous girl V NP
=> the jealous girl hit NP
=> the jealous girl hit Adj NP
=> the jealous girl hit the NP
=> the jealous girl hit the N
=> the jealous girl hit the boy
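The relation => can be checked mechanically. The sketch below assumes sentential forms are tuples of symbols and productions are (left-part, right-part) pairs of tuples; `one_step` is a hypothetical helper that tests x => y by trying every production at every position (the α and β of the definition), and it confirms each step of the derivation above.

```python
# Pseudo-English productions as (left-part, right-part) pairs of symbol tuples.
P = [
    (("Sentence",), ("NP", "VP")),
    (("NP",), ("N",)), (("NP",), ("Adj", "NP")),
    (("N",), ("boy",)), (("N",), ("girl",)),
    (("Adj",), ("the",)), (("Adj",), ("tall",)), (("Adj",), ("jealous",)),
    (("VP",), ("V", "NP")),
    (("V",), ("hit",)), (("V",), ("bit",)),
]


def one_step(x, y, productions):
    """True iff x => y, i.e. x = αuβ and y = αvβ for some u → v in productions."""
    for u, v in productions:
        for i in range(len(x) - len(u) + 1):
            # α = x[:i], β = x[i + len(u):]
            if x[i:i + len(u)] == u and x[:i] + v + x[i + len(u):] == y:
                return True
    return False


steps = [
    "Sentence", "NP VP", "Adj NP VP", "the NP VP", "the Adj NP VP",
    "the jealous NP VP", "the jealous N VP", "the jealous girl VP",
    "the jealous girl V NP", "the jealous girl hit NP",
    "the jealous girl hit Adj NP", "the jealous girl hit the NP",
    "the jealous girl hit the N", "the jealous girl hit the boy",
]
forms = [tuple(s.split()) for s in steps]
print(all(one_step(a, b, P) for a, b in zip(forms, forms[1:])))  # True
```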

Definition: A grammar is a 4-tuple G = (Φ, Σ, P, S), where

Φ is a finite set of nonterminals, Σ is a finite set of terminals, V = Φ ∪ Σ is the grammar’s vocabulary, S ∈ Φ is called the start or goal symbol, and P ⊆ V* × V* is a finite set of productions.

Example: Grammar for {aⁿbⁿ | n ≥ 0}.

G = (Φ, Σ, P, S), where Φ = {S}, Σ = {a, b}, and P = {S → aSb, S → ε}


Derivations: S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => …

Applying S → ε to each of these yields: S => ε, aSb => ab, aaSbb => aabb, aaaSbbb => aaabbb, aaaaSbbbb => aaaabbbb.

Note: Normally, grammars are given by simply listing the productions.
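A minimal Python rendering of this grammar and its derivations, assuming each symbol is one character and `""` stands for ε; the names `PHI`, `SIGMA`, `START`, `P` and `derivations` are illustrative. It applies S → aSb repeatedly and, at each stage, also applies S → ε to obtain a sentence.

```python
# G = (Φ, Σ, P, S) for {aⁿbⁿ | n ≥ 0}; each symbol is one character, "" is ε.
PHI, SIGMA, START = {"S"}, {"a", "b"}, "S"
P = [("S", "aSb"), ("S", "")]  # S → aSb, S → ε


def derivations(steps):
    """Pairs (sentential form, sentence it yields by S → ε)
    along S => aSb => aaSbb => ..."""
    (lhs, recursive), (_, empty) = P
    pairs, form = [], START
    for _ in range(steps):
        pairs.append((form, form.replace(lhs, empty, 1)))
        form = form.replace(lhs, recursive, 1)  # one more S → aSb step
    return pairs


for form, sentence in derivations(5):
    print(form, "=>", sentence or "ε")
```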

Grammar Conventions

TWS convention

1. Upper case letter (identifier) – nonterminal
2. Lower case letter (string) – terminal
3. Lower case Greek letter – strings in V*
4. The left part of the first production is assumed to be the start symbol, e.g.
   S → aSb
   S → ε
5. The left part is omitted if it is the same as for the preceding production, e.g.
   S → aSb
     → ε

Example: Grammar for identifiers.

Identifier → Letter
           → Identifier Letter
           → Identifier Digit

Letter → ‘a’ → ‘A’ → ‘b’ → ‘B’ → … → ‘z’ → ‘Z’

Digit → ‘0’ → ‘1’ → … → ‘9’
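Because every identifier starts with the Letter introduced by Identifier → Letter and then grows one Letter or Digit at a time to the right, a recognizer only needs to check the first character and then scan the rest. A sketch in Python, assuming the letters and digits are exactly the ASCII ones listed in the grammar (`is_identifier` is a hypothetical name):

```python
import string

LETTERS = set(string.ascii_letters)  # 'a' … 'z', 'A' … 'Z'
DIGITS = set(string.digits)          # '0' … '9'


def is_identifier(w):
    """Identifier → Letter | Identifier Letter | Identifier Digit:
    the first symbol comes from Identifier → Letter; every later symbol
    is appended by one of the two left-recursive productions."""
    return (len(w) > 0 and w[0] in LETTERS
            and all(c in LETTERS or c in DIGITS for c in w[1:]))


for w in ["x", "count2", "A1b2", "2abc", "", "my_var"]:
    print(repr(w), is_identifier(w))
```

The underscore in my_var is rejected because the grammar above has no production for it.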


Definition: The language generated by a grammar G is the set L(G) = {α ∈ Σ* | S =>* α}.

Definition: A sentential form generated by a grammar G is any string α such that S =>* α.

Definition: A sentence generated by a grammar G is any sentential form α such that α ∈ Σ*.

Example: S, aSb, aaSbb, aaaSbbb, aaaaSbbbb, … are sentential forms; the strings ε, ab, aabb, aaabbb, aaaabbbb, … derived from them by S → ε are sentences.

Lemma: L(G) = {α | α is a sentence}.

Proof: Trivial.
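The definitions can be explored by generating sentential forms breadth-first and keeping those in Σ*. The Python sketch below assumes one-character symbols, `""` for ε, and productions as (left-part, right-part) string pairs; `sentential_forms` and `is_sentence` are hypothetical names. It bounds the number of derivation steps rather than the string length, since S → ε can shorten a form.

```python
def sentential_forms(productions, start, max_steps):
    """All α with S =>* α in at most max_steps derivation steps."""
    seen, frontier = {start}, {start}
    for _ in range(max_steps):
        nxt = set()
        for x in frontier:
            for u, v in productions:
                i = x.find(u)
                while i != -1:  # rewrite every occurrence of u, one at a time
                    nxt.add(x[:i] + v + x[i + len(u):])
                    i = x.find(u, i + 1)
        frontier = nxt - seen
        seen |= frontier
    return seen


def is_sentence(alpha, terminals):
    """A sentential form is a sentence iff it lies in Σ*."""
    return all(c in terminals for c in alpha)


P = [("S", "aSb"), ("S", "")]
forms = sentential_forms(P, "S", 4)
sentences = sorted((f for f in forms if is_sentence(f, {"a", "b"})), key=len)
print(sentences)  # ['', 'ab', 'aabb', 'aaabbb']
```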


Example:
A → aABC
  → aBC
aB → ab
bB → bb
bC → bc
CB → BC
cC → cc

Derivations:

A => aBC => abC => abc

A => aABC => aaBCBC => aabCBC => aabBCC => aabbCC => aabbcC => aabbcc

A => aABC => aaABCBC => aaaBCBCBC =>* aaaBBCBCC => aaaBBBCCC => aaabBBCCC =>* aaabbbCCC => aaabbbcCC =>* aaabbbccc

(Here =>* abbreviates two derivation steps.)

L(G) = {aⁿbⁿcⁿ | n ≥ 1}
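For this grammar every production α → β satisfies |α| ≤ |β|, so sentential forms never get shorter and any form longer than the target length can be discarded. That makes the exhaustive search below terminate. It is a sketch assuming one-character symbols and productions as (left-part, right-part) string pairs, with `bounded_language` a hypothetical name.

```python
def bounded_language(productions, start, terminals, max_len):
    """Sentences of length <= max_len. Pruning long forms is safe only
    because every production u → v has |u| <= |v| (checked below)."""
    assert all(len(u) <= len(v) for u, v in productions)
    seen, frontier = {start}, [start]
    while frontier:
        x = frontier.pop()
        for u, v in productions:
            i = x.find(u)
            while i != -1:  # rewrite each occurrence of u
                y = x[:i] + v + x[i + len(u):]
                if len(y) <= max_len and y not in seen:
                    seen.add(y)
                    frontier.append(y)
                i = x.find(u, i + 1)
    return sorted((w for w in seen if set(w) <= terminals), key=len)


P = [("A", "aABC"), ("A", "aBC"), ("aB", "ab"), ("bB", "bb"),
     ("bC", "bc"), ("CB", "BC"), ("cC", "cc")]
print(bounded_language(P, "A", {"a", "b", "c"}, 9))
# ['abc', 'aabbcc', 'aaabbbccc']
```

Forms such as aabcBC are dead ends (no production rewrites cB), which is why only strings of the form aⁿbⁿcⁿ survive as sentences.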

The Chomsky Hierarchy

A hierarchy of grammars, the languages they generate, and the machines that accept those languages.

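Type 0. Language: recursively enumerable. Grammar: unrestricted rewriting system. Restrictions on grammar: none. Accepting machine: Turing machine.

Type 1. Language: context-sensitive language. Grammar: context-sensitive grammar. Restrictions on grammar: for all α → β, |α| ≤ |β|. Accepting machine: linear bounded automaton.

Type 2. Language: context-free language. Grammar: context-free grammar. Restrictions on grammar: for all α → β, α ∈ Φ. Accepting machine: push-down automaton (parser).

Type 3. Language: regular language. Grammar: regular grammar. Restrictions on grammar: for all α → β, α ∈ Φ and β ∈ Σ ∪ ΣΦ ∪ {ε}. Accepting machine: finite-state automaton.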
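The restrictions above can be tested production by production. The Python sketch below, with the hypothetical name `chomsky_type`, assumes one-character symbols, an explicit set of nonterminals (every other symbol is treated as a terminal), and `""` for ε; it returns the most restrictive type whose restriction every production meets.

```python
def chomsky_type(productions, nonterminals):
    """Return 3, 2, 1 or 0 for a list of (α, β) string pairs."""
    def regular(a, b):
        # α ∈ Φ and β ∈ Σ ∪ ΣΦ ∪ {ε}
        return a in nonterminals and (
            b == ""
            or (len(b) == 1 and b not in nonterminals)
            or (len(b) == 2 and b[0] not in nonterminals and b[1] in nonterminals))

    if all(regular(a, b) for a, b in productions):
        return 3
    if all(a in nonterminals for a, _ in productions):  # α ∈ Φ
        return 2
    if all(len(a) <= len(b) for a, b in productions):   # |α| ≤ |β|
        return 1
    return 0


print(chomsky_type([("S", "aS"), ("S", "")], {"S"}))   # 3
print(chomsky_type([("S", "aSb"), ("S", "")], {"S"}))  # 2
print(chomsky_type([("A", "aABC"), ("A", "aBC"), ("aB", "ab"), ("bB", "bb"),
                    ("bC", "bc"), ("CB", "BC"), ("cC", "cc")], {"A", "B", "C"}))  # 1
```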

Language Hierarchy

[Figure: nested language classes, each contained in the next:
3: Regular languages, e.g. {aⁿ | n > 0}
2: Context-free languages, e.g. {aⁿbⁿ | n > 0}
1: Context-sensitive languages, e.g. {aⁿbⁿcⁿ | n > 0}
0: Recursively enumerable languages
Also marked: English?]

We will deal with type 2 (syntax) and type 3 (lexicon) languages.

