
MASTER PROJECT

ABDULLAH SHENEAMER

MSCS GRADUATE CANDIDATE

FALL 2012

DCSPM: Develop and Compile Subset of PASCAL Language to MSIL

Abdullah Sheneamer Master project presentation

11/xx/2012


Outline

- Introduction to MSIL
- Related Works
- Why PASCAL to MSIL
- PASCAL Compiler
- Lexical Analyzer Design
- Symbol Table Design
- Parser and MSIL Design
- Improvements
- Evaluations
- Lessons Learned
- Future Work
- Conclusion


Introduction to MSIL

Microsoft Intermediate Language (MSIL) is the lowest-level human-readable programming language defined by the Common Language Infrastructure (CLI) specification and used by the .NET Framework.

MSIL includes instructions for loading, storing, initializing, and calling methods on objects, as well as instructions for arithmetic and logical operations.

Related Works

“The Design and Implementation of C-like Language Interpreter” [XX11]

The authors design and implement a C-like language interpreter in C++, based on the idea of modularity. The lexical analyzer reads character strings from the source program, splits them into separate words, and constructs the internal representation of each word, that is, a TOKEN. The basic idea of the lexical analyzer design is, first, to determine the start and end positions of a word and, second, to determine the attribute of the word once it has been separated.

“Simple Calculator Compiler Using Lex and YACC” [Upad11]

The author details how to develop a simple compiler for a procedural language using Lex (Lexical Analyzer Generator) and YACC (Yet Another Compiler-Compiler). Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream.


Why PASCAL to MSIL

- Allow PASCAL to run on the .NET platform
- Study how compilers work in the .NET environment
- PASCAL can now be run on modern machines
- MSIL is platform independent
- JIT compilers can be optimized for specific machines and architectures


PASCAL Compiler

Compilation process: takes PASCAL source code and produces Microsoft Intermediate Language (MSIL).

Execution process: MSIL must be converted to CPU-specific code, usually by a just-in-time (JIT) compiler. Native code is code that is compiled to run on a particular processor (such as an Intel x86-class processor) and its instruction set.


Compilation Process

[Figure: compilation process diagram. Boxes: PASCAL Source Code, Lexical Analysis, Parser & MSIL, Symbol Table, Error Handler, MSIL Code Output]

Lexical Analyzer Design

After reading the next character from the input stream:

State 0: Identify the current token and decide the next state.

State 1: Handle identifiers and keywords.

State 2: Handle numbers.

State 3: Handle one-character or two-character tokens.

States 4, 5: Handle comments “//” and “/*”: skip a line that starts with “//”, or skip the text between “/*” and “*/”.

Lexical Analyzer Design (cont.)

[Figure: lexical analyzer state diagram, states INITIAL (0), ID (1), and NUM (2)]

Begin: 1- lexbuf=“”. 2- state=0.

INITIAL (0):
- WhiteSpace / No action.
- Letter or @ or _ / Place it in lexbuf.

ID (1):
- Letter or Digit / Place it in lexbuf.
- Anything else / 1- Return the last char to the input stream. 2- Search for lexbuf in the symbol table. 3- Insert it as ID if not found; otherwise get its row number p. 4- Build the token as [code=symbol[p, token], attr=p]. 5- Enqueue the token and set lexbuf=“”.

NUM (2):
- Digit / Place it in lexbuf.
- Anything else / 1- Return the last char to the input stream. 2- Build the token as [code: NUM, attr: value]. 3- Enqueue the token and set lexbuf=“”.

Lexical Analyzer Design (cont.)

[Figure: lexical analyzer state diagram, states INITIAL (0), One or Two Char (3), Single line comment (4), and Multiple line comment (5)]

One or Two Char (3):
- Sequence is “//” / state=4.
- Sequence is “/*” / lexbuf=“”; state=5.
- Unrelated character / 1- Return the last char to the input stream. 2- Build the token: [code=ASCII(first char in lexbuf); attr=-1]. 3- lexbuf=“”; state=0. 4- Return the token to the parser.
- Other character / 1- Place it in lexbuf. 2- Get the code for the two-character token in lexbuf. 3- Build the token: [code=obtained code; attr=-1]. 4- lexbuf=“”; state=0. 5- Return the token to the parser.

Single line comment (4):
- New line / lexbuf=“”; state=0.
- Anything else / Place it in lexbuf.

Multiple line comment (5):
- Sequence is “*/” / lexbuf=“”; state=0.
- Anything else / Place it in lexbuf.
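The six states described on these slides can be sketched as a small explicit state machine. This is an illustrative Python sketch, not the project's own implementation (the project's snippets elsewhere in the deck are C#). The names `lex`, `TWO_CHAR`, and `DIGITS` are hypothetical, the `:=` entry is an assumption taken from the deck's Pascal sample, and tokens are returned as (kind, lexeme) pairs rather than the [code, attr] pairs of the diagram.

```python
DIGITS = "0123456789"
# Two-character tokens; ":=" is assumed from the deck's Pascal sample.
TWO_CHAR = {"!=", "==", "<=", ">=", ":="}

def lex(source):
    """Split source into (kind, lexeme) pairs; kind is 'ID', 'NUM' or 'OP'."""
    tokens, lexbuf, state, i = [], "", 0, 0
    text = source + "\n"              # sentinel so the last token is flushed
    while i < len(text):
        ch = text[i]
        if state == 0:                # INITIAL: decide the next state
            if ch.isspace():
                pass                  # whitespace: no action
            elif ch.isalpha() or ch in "@_":
                lexbuf, state = ch, 1
            elif ch in DIGITS:
                lexbuf, state = ch, 2
            else:
                lexbuf, state = ch, 3
            i += 1
        elif state == 1:              # identifiers and keywords
            if ch.isalpha() or ch in DIGITS:
                lexbuf += ch
                i += 1
            else:                     # leave ch in the input stream
                tokens.append(("ID", lexbuf))
                lexbuf, state = "", 0
        elif state == 2:              # numbers
            if ch in DIGITS:
                lexbuf += ch
                i += 1
            else:
                tokens.append(("NUM", lexbuf))
                lexbuf, state = "", 0
        elif state == 3:              # one- or two-character tokens
            pair = lexbuf + ch
            if pair == "//":
                lexbuf, state = "", 4
                i += 1
            elif pair == "/*":
                lexbuf, state = "", 5
                i += 1
            elif pair in TWO_CHAR:
                tokens.append(("OP", pair))
                lexbuf, state = "", 0
                i += 1
            else:                     # unrelated char: leave it in the input
                tokens.append(("OP", lexbuf))
                lexbuf, state = "", 0
        elif state == 4:              # single-line comment: skip to newline
            if ch == "\n":
                state = 0
            i += 1
        else:                         # state 5: multi-line comment
            if text.startswith("*/", i):
                state = 0
                i += 2
            else:
                i += 1
    return tokens

tokens = lex("if a1 >= 10 // note\n b := a1 /* skip */ ;")
```

Keywords such as `if` come out as 'ID' here; in the design above, distinguishing them is the job of the symbol table lookup.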

Symbol Table Design

- Every keyword is a token and has a unique integer code.
- The identifier token has code 256.
- The number token has code 257.
- Every special character is a token whose integer code equals its ASCII number.
- Two-character tokens have unique token codes.

Token Code | Keyword
300 | Begin
323 | If
302 | For
305 | Switch
376 | While

Token Code | Two-Character Token
406 | !=
407 | ==
408 | <=
409 | >=
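The token-code scheme above can be expressed as a single lookup function. The following Python sketch is only an illustration under stated assumptions: the function name `token_code` and the table names are hypothetical, keyword matching is assumed to be case-insensitive, and only the keyword and two-character codes listed on the slide are included.

```python
# Codes copied from the slide; any other keyword codes are not known here.
KEYWORD_CODES = {"begin": 300, "if": 323, "for": 302, "switch": 305, "while": 376}
TWO_CHAR_CODES = {"!=": 406, "==": 407, "<=": 408, ">=": 409}
ID_CODE, NUM_CODE = 256, 257

def token_code(lexeme):
    """Return the integer token code for one lexeme."""
    key = lexeme.lower()              # assumption: keywords are case-insensitive
    if key in KEYWORD_CODES:
        return KEYWORD_CODES[key]
    if lexeme in TWO_CHAR_CODES:
        return TWO_CHAR_CODES[lexeme]
    if lexeme.isdigit():
        return NUM_CODE
    if len(lexeme) == 1 and not (lexeme.isalnum() or lexeme in "@_"):
        return ord(lexeme)            # special character: its ASCII number
    return ID_CODE

codes = [token_code(lexeme) for lexeme in ["If", "count", "<=", "42", ";"]]
```

Because the identifier and number codes start at 256, they cannot collide with the ASCII codes used for single special characters.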

Parser and MSIL Design

The parser implements most of the PASCAL grammar, given in BNF [22], including nested if/else statements and if statements with logical expressions.
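To make the nested if/else case concrete, here is a minimal recursive-descent sketch in Python. It is not the DCSPM parser: the class name `IfElseCompiler`, the two-rule grammar (an if statement with an `==` condition, or an assignment), and the label names are all assumptions chosen for illustration, and the emitted lines are only MSIL-like text.

```python
class IfElseCompiler:
    """Parse 'if a == b then stmt [else stmt]' or 'x := v' and emit MSIL-like text."""

    def __init__(self, tokens):
        self.tokens, self.pos, self.labels, self.code = tokens, 0, 0, []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def new_label(self):
        self.labels += 1
        return f"L{self.labels}"

    def operand(self):
        tok = self.take()
        # Integer literals are loaded as constants, names as locals.
        self.code.append(f"ldc.i4 {tok}" if tok.isdigit() else f"ldloc {tok}")

    def statement(self):
        if self.peek() == "if":
            self.take("if")
            self.operand()
            self.take("==")
            self.operand()
            self.code.append("ceq")
            else_label, end_label = self.new_label(), self.new_label()
            self.code.append(f"brfalse {else_label}")   # skip the then-branch
            self.take("then")
            self.statement()                            # may be a nested if
            self.code.append(f"br {end_label}")
            self.code.append(f"{else_label}:")
            if self.peek() == "else":
                self.take("else")
                self.statement()
            self.code.append(f"{end_label}:")
        else:
            target = self.take()
            self.take(":=")
            self.operand()
            self.code.append(f"stloc {target}")

compiler = IfElseCompiler(
    "if a == 0 then if b == 1 then x := 1 else x := 2 else x := 3".split())
compiler.statement()
code = compiler.code
```

Each recursive call to `statement` allocates its own pair of labels, which is what keeps the branch targets of an inner if/else from clashing with those of the enclosing one.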

Improvements

Two improvements in the DCSPM compiler:

1- Lexical analysis improvement: replace the Array List with a Dictionary.

Improvements (Cont.)

2- MSIL Code Output Improvement

Simple Pascal Code:

begin
  a := 0; b := 1; c := 2;
  if (a == 0) then
  begin
    a := b + c;
  end;
end;
end.

MSIL output (first listing):

IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.1
IL_0003: stloc.1
IL_0004: ldc.i4.2
IL_0005: stloc.2
IL_0006: ldloc.0
IL_0007: ldc.i4.1
IL_0008: ceq
IL_000a: stloc.3
IL_000b: ldloc.3
IL_000c: brfalse.s IL_0012
IL_000e: ldloc.1
IL_000f: ldloc.2
IL_0010: add
IL_0011: stloc.0
IL_0012: ret

MSIL output (second listing):

IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.1
IL_0003: stloc.1
IL_0004: ldc.i4.2
IL_0005: stloc.2
IL_0006: ldloc.0
IL_0007: ldc.i4.1
IL_0008: ceq
IL_000a: ldc.i4.0
IL_000b: ceq
IL_000d: stloc.3
IL_000e: ldloc.3
IL_000f: brtrue.s IL_0015
IL_0011: ldloc.1
IL_0012: ldloc.2
IL_0013: add
IL_0014: stloc.0
IL_0015: ret
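One way to check that the two listings behave the same is to run them on a toy stack machine. The Python sketch below is an assumption-laden illustration, not the CLR: the function `run` is hypothetical and understands only the handful of opcodes that appear in the listings, which are copied verbatim from the slide (including the comparison against `ldc.i4.1`).

```python
def run(listing, num_locals=4):
    """Interpret a tiny subset of MSIL opcodes and return the local variables."""
    program = []
    for line in listing.strip().splitlines():
        label, rest = line.split(":", 1)
        parts = rest.split()
        program.append((label.strip(), parts[0], parts[1] if len(parts) > 1 else None))
    index_of = {label: i for i, (label, _, _) in enumerate(program)}

    stack, local_vars, pc = [], [0] * num_locals, 0
    while True:
        _, op, arg = program[pc]
        pc += 1
        if op.startswith("ldc.i4."):          # push a small integer constant
            stack.append(int(op.rsplit(".", 1)[1]))
        elif op.startswith("stloc."):         # pop into a local variable
            local_vars[int(op.rsplit(".", 1)[1])] = stack.pop()
        elif op.startswith("ldloc."):         # push a local variable
            stack.append(local_vars[int(op.rsplit(".", 1)[1])])
        elif op == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "ceq":                     # push 1 if equal, else 0
            right, left = stack.pop(), stack.pop()
            stack.append(1 if left == right else 0)
        elif op == "brfalse.s":
            if stack.pop() == 0:
                pc = index_of[arg]
        elif op == "brtrue.s":
            if stack.pop() != 0:
                pc = index_of[arg]
        elif op == "ret":
            return local_vars
        else:
            raise ValueError(f"unsupported opcode: {op}")

FIRST = """
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.1
IL_0003: stloc.1
IL_0004: ldc.i4.2
IL_0005: stloc.2
IL_0006: ldloc.0
IL_0007: ldc.i4.1
IL_0008: ceq
IL_000a: stloc.3
IL_000b: ldloc.3
IL_000c: brfalse.s IL_0012
IL_000e: ldloc.1
IL_000f: ldloc.2
IL_0010: add
IL_0011: stloc.0
IL_0012: ret
"""

SECOND = """
IL_0000: ldc.i4.0
IL_0001: stloc.0
IL_0002: ldc.i4.1
IL_0003: stloc.1
IL_0004: ldc.i4.2
IL_0005: stloc.2
IL_0006: ldloc.0
IL_0007: ldc.i4.1
IL_0008: ceq
IL_000a: ldc.i4.0
IL_000b: ceq
IL_000d: stloc.3
IL_000e: ldloc.3
IL_000f: brtrue.s IL_0015
IL_0011: ldloc.1
IL_0012: ldloc.2
IL_0013: add
IL_0014: stloc.0
IL_0015: ret
"""

first_locals = run(FIRST)
second_locals = run(SECOND)
```

In this sketch the locals 0 to 2 (a, b, c) end up identical for both listings; only the temporary in local 3 differs, because the second listing stores the negated comparison and branches with `brtrue.s` instead of `brfalse.s`. The second listing needs two extra instructions to do so.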

Evaluations

1- Array list data structure vs. Dictionary data structure

[Figure: chart comparing Array List and Dictionary]

Evaluations (cont.)

Collection | Ordering | Contiguous Storage? | Direct Access? | Lookup Efficiency | Manipulate Efficiency | Notes
Dictionary | Unordered | Yes | Via Key | Key: O(1) | O(1) | Best for high performance lookups.
ArrayList | User has precise control over element ordering | Yes | Via Index | O(n) | O(n) | Best for smaller lists

Complexity of Array list vs. Dictionary
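The O(n) versus O(1) lookup difference in the table can be illustrated by counting comparisons instead of measuring time. This Python sketch uses a plain list and a dict as stand-ins for the C# ArrayList and Dictionary; the function `list_lookup` and the generated identifier names are hypothetical.

```python
def list_lookup(rows, name):
    """Linear scan over (lexeme, code) rows; returns (row index, comparisons made)."""
    for index, (lexeme, _code) in enumerate(rows):
        if lexeme == name:
            return index, index + 1
    return -1, len(rows)

# Hypothetical symbol table with 1000 identifiers, all carrying the ID code 256.
rows = [(f"id{i}", 256) for i in range(1000)]

# Dictionary-style table: one hashed probe per lookup on average.
by_name = {lexeme: index for index, (lexeme, _code) in enumerate(rows)}

last_index, comparisons = list_lookup(rows, "id999")
dict_index = by_name["id999"]
```

Finding the last identifier costs 1000 comparisons in the list but a single hashed lookup in the dictionary, and a lexer performs such a lookup for every identifier it reads.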

Evaluations (cont.)

2- Parser phase test

[Figure: Parser Phase. x-axis: # lines of Pascal code; y-axis: Time (ms)]

Evaluations (cont.)

3- Initial and Improved nested If/else MSIL Code

[Figure: if/else MSIL results. Series: unimproved MSIL code, improved MSIL code; x-axis: # lines of Pascal code; y-axis: Time (ms)]

Evaluations (cont.)

Size of Initial and Improved nested if/else MSIL Code

[Figure: Size of initial and improved if/else MSIL. Series: unimproved size, improved size; x-axis: # lines of Pascal code; y-axis: Size (kb)]

Lessons Learned

ildasm.exe: disassembles a compiled assembly into human-readable IL text. Tool location: C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin

ilasm.exe: assembles human-readable IL text into an executable assembly. Tool location: C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322 or C:\Windows\Microsoft.NET\Framework\v2.0.50727

DateTime and TimeSpan:

DateTime start = DateTime.Now;
lex();
TimeSpan elapsed = DateTime.Now - start;
speed = "Time Elapsed of Lexical Analysis: " + elapsed.TotalMilliseconds + "ms";

Stopwatch class:

// Requires: using System.Diagnostics;
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
lex();
stopwatch.Stop();
// Read the elapsed time from the stopwatch instance itself.
speed = "Time Elapsed of Lexical Analysis: " + stopwatch.Elapsed.TotalMilliseconds + "ms";

Lessons Learned (cont.)

Nested if/else logic statement

Future Works

Many statements and data structures of the Pascal language are not yet supported, and the related MSIL is not yet generated:

1- complicated case statement
2- if logic with a complex, multi-level condition
3- assert statement
4- exit statement
5- goto statement
6- repeat statement
7- next statement
8- complicated one-dimensional array
9- two-dimensional array data structure
10- queue data structure
11- stack data structure

Conclusion

The DCSPM compiler lets legacy Pascal run on modern machines, and the MSIL it produces is platform independent. MSIL code is verified for safety at run time and can be executed in any environment that supports the CLI (Common Language Infrastructure).

A one-dimensional array has two cases when compiling to MSIL. First, when the array has one or two elements, its MSIL looks like the MSIL of other statements (if/else/while, etc.).

The initial lexical analysis uses an array list data structure for the symbol table, and the improved lexical analysis uses a dictionary data structure. Both versions were tested with the Stopwatch class.

Conclusion (cont.)

A batch file, timer.cmd, is used to measure the run time of the MSIL results.

The improved nested if/else statement runs faster than the initial nested if/else statement, although both produce the same results.

The experience gained in this project can serve as a foundation for developing a new programming language.

Demo & Questions

http://cs.uccs.edu/~gsc/pub/master/asheneam/src/COMPILER/bin/Debug/


Bibliography

[MC5tk]: http://msdn.microsoft.com/en-us/library/c5tkafs1(v=vs.71).aspx

[XX11]: Xiaohong Xiao and You Xu, “The Design and Implementation of C-like Language Interpreter,” Proceedings of the 2nd International Symposium on Intelligence Information Processing and Trusted Computing (IPTC), pp. 104-107, 2011.

[Upad11]: Mohit Upadhyaya, “Simple Calculator Compiler Using Lex and YACC,” Proceedings of the 3rd IEEE International Conference on Electronics Computer Technology (ICECT), Vol. 6, pp. 182-187, 8-10 April 2011.

[DLNYM]: H. M. Deitel, P. J. Deitel, J. Listfield, T. R. Nieto, C. Yaeger, and M. Zlatkina, C# How to Program.

[L97]: Kenneth C. Louden, Compiler Construction: Principles and Practice.

[MN11]: D. S. Malik and P. S. Nair, Data Structures Using Java.

[L06]: Peter Linz, An Introduction to Formal Languages and Automata, Fourth Edition.

[ASU11]: Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools (2nd Edition).

[AL09]: Abdul Sattar and Torben Lorenzen, “Develop a Compiler in Java for a Compiler Design Course.”

[Assembly11]: James T. Streib, Guide to Assembly Language: A Concise Introduction [electronic resource], London; New York: Springer, c2011.

[WFRBE89-90]: Dr. Gerald Wildenberg, “Using a Stack Assembler Language in a Compiler Course,” St. John Fisher College, Rochester, NY, and Bristol Polytechnic, England (1989-1990).

Bibliography (cont.)

[LS56]: Serge Lidin, Expert .NET 2.0 IL Assembler, Berkeley, CA.

[CodeProject]: http://www.codeproject.com/Articles/3778/Introduction-to-IL-Assembly-Language

[MHt8e]: http://msdn.microsoft.com/en-us/library/ht8ecch6(v=vs.71)

[K08]: Pro C# 2008 and the .NET 3.5 Platform, Fourth Edition.

[CodeMSIL]: http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/MSIL-Tutorial.htm

[WikiPascal]: http://en.wikipedia.org/wiki/Pascal_(programming_language)

[PagesCs]: http://pages.cs.wisc.edu/~fischer/cs536.s08/lectures/Lecture02.4up.pdf

[MArraylist]: http://msdn.microsoft.com/en-us/library/system.collections.arraylist.aspx

[MKx37]: http://msdn.microsoft.com/en-us/library/kx37x362.aspx

[WikiExpr]: http://en.wikipedia.org/wiki/Microsoft_Visual_Studio_Express#Visual_C.23_Express

[DllAssem]: http://dll-repair-tools.com/dll-files/fusiondll-the-assembly-manager

[learnExp]: http://www.learnvisualstudio.net/start-here/lesson-1-1-installing-visual-c-2010-express-edition/

[SeasPascal]: http://www.seas.gwu.edu/~hchoi/teaching/cs160d/pascal.pdf

[GeekClass]: http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx

[DotArray]: http://www.dotnetperls.com/arraylist

[Ecma]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf