CS453 Lecture Regular Expressions and Transition Diagrams 2
Should be done
Lab hours and Office hours
Sign up for the mailing list at , starting to send important info to list – http://groups.google.com/group/cs453-spring-2011
Read Ch 1 and skim Ch 2 through 2.6, read 3.3 and 3.4
Start working with the MiniSVG start up code – You should at least step through the existing code with a debugger before Thursday’s class
CS453 Lecture Regular Expressions and Transition Diagrams 3
Do Soon
Quiz 1 is due tonight, was posted Tuesday night
Continue working with the MiniSVG start up code – You should at least step through the existing code with a debugger before Thursday’s class
Send email to [email protected] if you would like to help put together Meggy Jr devices this Sunday
– 2-3pm in rm 425 – Will only be able to accommodate the first 8 students, still 3 slots left – Will open it up for grad students and other undergrads on Thursday – We will have one other session sometime next week Tuesday or Thursday night
HW1 due Wednesday
CS453 Lecture Regular Expressions and Transition Diagrams 4
Plan for Today
Interpreter and Compiler Structure, or Software Architecture
Overview of Programming Assignments – The MiniSVG interpreter and MeggyJava compiler we will be building.
CS453 Lecture Regular Expressions and Transition Diagrams 5
Structure of a Typical Compiler
“sentences”
Synthesis
optimization
code generation
target language
IR
IR code generation
IR
Analysis
character stream
lexical analysis
“words” tokens
semantic analysis
syntactic analysis
AST
annotated AST
interpreter
CS453 Lecture Regular Expressions and Transition Diagrams 6
Example MeggyJava program
import meggy.Meggy;
class PA3Flower {public static void main(String[] whatever){{
// Upper left petal, clockwise Meggy.setPixel( (byte)2, (byte)4, Meggy.Color.VIOLET ); Meggy.setPixel( (byte)2, (byte)1, Meggy.Color.VIOLET); … }}
CS453 Lecture Regular Expressions and Transition Diagrams 7
Atmel assembly for Meggy.setPixel() call …main:… call _Z18MeggyJrSimpleSetupv # Push constant 2 onto stack ldi r24, 2 push r24 # Push constant 4 onto stack ldi r24, 4 push r24 # Push Meggy.Color.VIOLET onto the stack. ldi r22, 6 push r22 # Pop the arguments into registers in reverse order. pop r20 pop r22 pop r24 call _Z6DrawPxhhh call _Z12DisplaySlatev
CS453 Lecture Regular Expressions and Transition Diagrams 8
Structure of the MeggyJava Compiler
“sentences”
Synthesis Analysis
character stream
lexical analysis
“words” tokens
semantic analysis
syntactic analysis
AST
AST and symbol table
code gen
Atmel assembly code
PA2: MeggyJava and Atmel warmup PA3: setPixel compiler PA4: add control flow PA5: add functions PA6: add variables and objects PA7: add arrays
CS453 Lecture Regular Expressions and Transition Diagrams 9
Structure of the MiniSVG Interpreter (PA1)
calls to report and draw routines
Analysis
character stream
lexical analysis
tokens
syntactic analysis
Rectangle: (0,0) 500x500 color: GREEN Circle: (120,150) radius:60 color: WHITE Circle: (350,150) radius:60 color: WHITE Circle: (120,150) radius:30 color: BLUE Circle: (350,150) radius:30 color: BLUE Circle: (120,150) radius:10 color: BLACK Circle: (350,150) radius:10 color: BLACK Circle: (250,300) radius:100 color: RED Line: (250,350) (350,350) color: BLACK Line: (0,100) (100,100) color: BLACK Line: (100,200) (200,200) color: BLACK Line: (300,400) (400,400) color: BLACK Line: (500,500) (500,500) color: BLACK Line: (50,100) (100,100) color: BLACK Line: (150,200) (200,200) color: BLACK Line: (200,400) (400,400) color: BLACK Line: (400,200) (400,200) color: BLACK Line: (450,200) (400,200) color: BLACK
CS453 Lecture Regular Expressions and Transition Diagrams 10
face.svg
<svg xmlns="http://www.w3.org/2000/svg"> <!-- THIS MAKES A FACE!!! --> <!-- From Stephanie and Kiley --> <!-- background --> <rect x = "0" y = "0" width = "500" height = "500" fill="green" />
<!-- eyes --> <circle cx="120" cy="150" r="60" fill="white" /> ...
<!-- mouth --> <circle cx="250" cy="300" r="100" fill="red" /> <line x1="250" y1="350" x2="350" y2="350" stroke="black" />
<!-- hair --> ... </svg>
CS453 Lecture Regular Expressions and Transition Diagrams 11
MiniSVG Renderer
<svg xmlns="http://www.w3.org/2000/svg">
<!-- rectangles --> <rect x="20" y=" 20" width="300" height="250" fill="red" /> <rect x="30" y="20" width="300" height="250" fill="blue" /> <rect x = "40" y = "20" width = "300" height = "250" fill="green" />
<!-- white circle on top of rectangles --> <circle cx="120" cy="150" r="60" fill="white" />
<!-- black diagonal line --> <line x1="0" y1="0" x2="300" y2="300" stroke="black" />
</svg>
About The Slides on Languages and Finite Automata
Slides Originally Developed by Prof. Costas Busch (2004) – Many thanks to Prof. Busch for developing the original slide set.
Adapted with permission by Prof. Dan Massey (Spring 2007) – Subsequent modifications, many thanks to Prof. Massey for CS 301 slides
Adapted with permission by Prof. Michelle Strout (Spring 2011) – Adapted for use in CS 453
A language is a set of strings
String: A finite sequence of letters
Examples: “cat”, “dog”, “house”, …
Defined over a fixed alphabet:
{ }zcba ,,,, …=!
Languages
Empty String
A string with no letters:
Observations:
!
"
!
" = 0
"w = w" = w
"abba = abba" = abba
Regular Expressions
Regular expressions describe regular languages
Example:
describes the language
!
(a | (b)(c)) *
!
L((a | (b)(c))*) = ",a,bc,aa,abc,bca,...{ }
Recursive Definition for Specifying Regular Expressions
!
", #, $
!
r1 | r2(r1)(r2)r1 *
r1( )
Are regular expressions
Primitive regular expressions:
2r1rGiven regular expressions and
Example Regular Expressions and Regular Definitions
Keywords
Operations
Identifiers
Numbers
CS453 Lecture Regular Expressions and Transition Diagrams 17
Finite Automaton
Input
String Output
String
Finite Automaton
Finite Accepter
Input
“Accept” or “Reject”
String
Finite Automaton
Output
Transition Graph
initial state
final state “accept” state
transition
Abba -Finite Accepter
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
ba,
Initial Configuration
1q 2q 3q 4qa b b a
5q
a a bb
ba,
Input String a b b a
ba,
0q
Reading the Input
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
a b b a
ba,
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
a b b a
ba,
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
a b b a
ba,
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
a b b a
ba,
0q 1q 2q 3q 4qa b b a
Output: “accept”
5q
a a bb
ba,
a b b a
ba,
Input finished String Rejection
1q 2q 3q 4qa b b a
5q
a a bb
ba,
a b a
ba,
0q
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
a b a
ba,
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
a b a
ba,
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
a b a
ba,
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,Output: “reject”
a b a
ba,
Input finished
The Empty String
1q 2q 3q 4qa b b a
5q
a a bb
ba,
ba,
0q
!
"
1q 2q 3q 4qa b b a
5q
a a bb
ba,
ba,
0q
Output: “reject”
Would it be possible to accept the empty string?
!
"
Another Example
a
b ba,
ba,
0q 1q 2q
a ba
a
b ba,
ba,
0q 1q 2q
a ba
a
b ba,
ba,
0q 1q 2q
a ba
a
b ba,
ba,
0q 1q 2q
a ba
a
b ba,
ba,
0q 1q 2q
a ba
Output: “accept”
Input finished Rejection
a
b ba,
ba,
0q 1q 2q
ab b
a
b ba,
ba,
0q 1q 2q
ab b
a
b ba,
ba,
0q 1q 2q
ab b
a
b ba,
ba,
0q 1q 2q
ab b
a
b ba,
ba,
0q 1q 2q
ab b
Output: “reject”
Input finished
Which strings are accepted?
Formalities
Deterministic Finite Accepter (DFA)
( )FqQM ,,,, 0!"=
Q!
!
0q
F
: set of states
: input alphabet
: transition function
: initial state
: set of final states
Input Alphabet !
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
{ }ba,=!
ba,
Set of States Q
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
{ }543210 ,,,,, qqqqqqQ =
ba,
Initial State 0q
1q 2q 3q 4qa b b a
5q
a a bb
ba,
ba,
0q
Set of Final States F
0q 1q 2q 3qa b b a
5q
a a bb
ba,
{ }4qF =
ba,
4q
Transition Function !
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
QQ !"#:$
ba,
( ) 10, qaq =!
2q 3q 4qa b b a
5q
a a bb
ba,
ba,
0q 1q
( ) 50, qbq =!
1q 2q 3q 4qa b b a
5q
a a bb
ba,
ba,
0q
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
ba,
( ) 32, qbq =!
Transition Function !
0q 1q 2q 3q 4qa b b a
5q
a a bb
ba,
! a b0q
1q
2q
3q
4q
5q
1q 5q
5q 2q5q 3q
4q 5q
ba,5q5q5q5q
CS453 Lecture Regular Expressions and Transition Diagrams 54
Transition Diagrams
Definition - A finite automata specialized for lexical analysis
Differences from finite state automata - Finding more than one string in a single input stream
- Do not accept until hit a character with no out transition - Asterisk notation indicates the need to put last character back in input
- Do not show sink states
0q 1q 2q 3q 4qa b b a
Initial Configuration
Input String a b b a
0q
a a a a
1q 2q 3q 4qa b b a
b b $ b b
0q 1q 2q 3q 4qa b b a
a b b a a a a ab b $ b b
Recognize token abba when attempt to process 3rd a
token
a b b a a a a ab b $ b b
0q 1q 2q 3q 4qa b b a
a b b a a a a ab b $ b b
1q 2q 3q 4qa b b a0q
token token token
a b b a a a a ab b $ b b
0q 1q 2q 3q 4qa b b a
!
q5 *ws
a b b a a a a ab b $ b b
0q 1q 2q 3q 4qa b b a
!
q5 *ws
token
a b b a a a a ab b $ b b
1q 2q 3q 4qa b b a
!
q5 *ws
0q
a b b a a a a ab b $ b b
1q 2q 3q 4qa b b a0q
token
!
q5 *ws
CS453 Lecture Lexical Analysis with JLex 63
Demonstration of Longest Match and Priority