Upload: prabirshrestha

Post on 12-Nov-2014

DESCRIPTION

LLVM IR generator for subset of C# compiler

ASSUMPTION UNIVERSITY Faculty of Science and Technology

Low Level Virtual Machine C# Compiler

Senior Project Proposal

In partial fulfillment of the course SC4299 Senior Project

Semester 1 / Year 2009

GROUP MEMBERS

Prabir Shrestha (4915302)
Myo Min Zin (4845411)
Napaporn Wuthongcharernkun (4846824)

COMMITTEE MEMBERS

Dr. Songsak Channarukul
A. Se Won Kim

ADVISOR

Dr. Kwankamol Nongpong

Table of Contents

1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Objectives
2 Literature Review
2.1 Source Language Background
2.2 LLVM Description
2.3 Contributions to C#
3 Scope
3.1 Keywords
3.2 Operators and Special Characters
4 The Framework
4.1 Scanner
4.2 Parser
4.3 Semantic Analyzer
4.4 Code Generator
4.5 Assembling and Linking
5 Gantt Chart
6 References
7 Appendix
7.1 LLVM C# Compiler EBNF

List of Figures

Figure 1-A: Compilation Phases
Figure 4-A: Overall Process of LLVM C# Compiler
Figure 4-B: Custom Coco/R function
Figure 4-C: Sample AST Nodes
Figure 4-D: Sample AST Binary Nodes
Figure 4-E: Sample AST Loop Nodes
Figure 4-F: Semantic Error Code Fragment
Figure 4-G: Sample C# Code Fragment
Figure 4-H: LLVM IR Equivalent of the C# Code Fragment

1 Introduction

Modern programming languages give developers many different ways to express an application.

A developer's choice of language for an application can, by itself, tell us a good deal about the purpose of the system being designed.

Programming languages fall into many styles, from logical, imperative and functional to object-oriented. We believe the mainstream popularity of object-oriented languages is due to their ability to model real-world objects and their behavior easily and effectively, in a form that machines can understand.

Modern high-level languages such as C#, the source language of this project, usually combine several of the paradigms listed above. Newer versions of C# have added features that make the language easier to use, such as generics, Language Integrated Query (LINQ) and anonymous functions, to name a few.

The focus of our project, however, is on the basic object-oriented elements of the language, which capture the core constructs of its syntax and semantics.

Offering alternative ways to use a language also matters when that language has a large community of users. Our primary objective in this project is therefore to provide an alternative method of compiling and deploying a C# application. Large compiler frameworks such as Microsoft's .NET and Mono are already in wide use for the C# language. These systems can be bulky, however, because they ship a large set of features whether or not a given developer uses them. We therefore see the practical value of our project as a small, portable tool for developers of C# applications.

The core objective of this project is to create a compiler for the C# language that generates a portable, low-level intermediate representation, which can then be used across a wide variety of architectures and operating systems with little or no modification to the original source. To accomplish this, Low Level Virtual Machine Intermediate Representation (LLVM IR) has been chosen as the target code generated by the compiler, because it is independent of any particular machine.

Figure 1-A: Compilation Phases

1.1 Motivation

Several developments in compiler technology and programming paradigms motivated us to create a new C# compiler.

Distributing binaries created by existing C# compilers requires the bulky .NET Framework to be installed on the target machine. Even a traditional “hello world” program requires the whole .NET Framework. To solve this problem we have taken the approach of C and C++, which link a program only with the libraries it requires.

The D language has also been a major inspiration. It provides programmers with modern language features such as interfaces and automatic memory management through garbage collection, yet produces code fast enough for systems programming [1], such as drivers and even operating systems.

Operating system development has evolved over the past decades from assembly code to high-level languages such as C and C++. Projects such as SharpOS [2], Cosmos and Microsoft's research operating system Singularity [3] have taken a different approach by writing the kernel, device drivers and applications in managed code. The compilers used by these operating systems motivated us to create a C# compiler that produces native code.

The “Write Once, Run Anywhere” (WORA) slogan from Sun Microsystems prompted us to generate portable code that can be used on a wide variety of operating systems and computer architectures.

1.2 Problem Statement

The way we write programs has been evolving ever since the introduction of the stored-program concept, and it continues to evolve with advances in hardware and software. Since the introduction of Java and, later, the .NET Framework, the concepts of the virtual stack machine and Just-in-Time (JIT) compilation have grown in popularity. C# is one of the notable languages built on these concepts. It allows programmers to write compiled, machine-independent code that can be executed on virtually any architecture.

Even though Java bytecode and Common Language Infrastructure (CLI) code are highly machine independent, they have not been candidates for systems programming because of performance issues: they are slower than languages such as C and C++, and they depend on a JIT compiler. LLVM follows a similar concept: code is compiled to LLVM bitcode, which can then be executed on other architectures and operating systems. To gain better performance on a particular architecture or operating system, the bitcode can be compiled further to native code. At the time of writing, LLVM's retargetable code generator supports most of the popular architectures, including x86, x86-64, PowerPC, PowerPC-64, ARM, Thumb, SPARC, Alpha, CellSPU, PIC16, MIPS, MSP430, SystemZ and XCore [4].

While languages such as C and C++ offer better execution speed than C# and Java, programmers have to deal with unsafe code such as manual memory management, which can lead to memory leaks or dangling pointers. Languages such as C# and Java usually solve this memory problem with garbage collection. C# also introduces delegates, which avoid the use of unsafe function pointers.

As developers have written code over the years, a set of common principles has evolved. In the object-oriented world, for example, accessors and mutators have become the common way of accessing a variable, in place of public variables. The C# language addresses many of these practices directly. Because of features such as memory management and its adherence to these principles, we have chosen C# as the input language for our compiler.

Migrating to a different platform forces programmers to write architecture-specific code for each platform. In languages such as C and C++, the width of the int type (for example 32 or 64 bits) depends on the compiler and the target platform. C# makes this easier by providing built-in fixed-width types such as Int32.

1.3 Objectives

The objective of our project is to create a compiler for the C# language whose target language is Low Level Virtual Machine Intermediate Representation (LLVM IR), a low-level, machine-independent language similar to assembly code.

The project covers each phase of the compilation process, from scanning the source language to generating the target code. These phases are lexical analysis, syntax analysis, semantic analysis and intermediate code generation. The remaining phases, assembling and linking, will be handled by LLVM tools.

The expected outcome of the project is a working compiler for the subset of the C# language specification defined in our scope.

The basic requirements for the compiler include the following:

The compiler will correctly recognize the lexical structures of the C# language.

The compiler will check the syntax against the grammar of the language specification, as well as the semantics of the program, and will report errors accordingly.

2 Literature Review

2.1 Source Language Background

C# is a high-level, object-oriented programming language in the .NET language family developed by Microsoft. Although the language is considered primarily object-oriented, a closer look reveals that it is in fact a multi-paradigm language that also includes aspects of functional and imperative programming.

It is designed to run within the Common Language Infrastructure (CLI), which provides the Common Type System (CTS) and the Common Language Specification (CLS). When a C# program is compiled, the compiler generates Common Intermediate Language (CIL).

2.2 LLVM Description

Low Level Virtual Machine (LLVM) is a compiler infrastructure that consists of two primary components, an optimizer and a code generator. It is designed so that a program can be optimized at different phases of its lifetime, such as compile time, link time and run time [5].

LLVM IR (Intermediate Representation) is a low-level language similar to assembly language. Its RISC-like instruction set captures the operations of the processor while avoiding machine-specific constraints such as pipelines, physical registers and low-level calling conventions. By abstracting away these hardware specifics, LLVM IR is in a sense platform independent and can be used on a variety of machines with different hardware specifications.

The common code representation used throughout all phases of the LLVM compilation strategy is based on Static Single Assignment (SSA) form. It provides type safety and low-level operations, and it is flexible enough to represent high-level languages clearly and efficiently.

A key factor in the productivity of the LLVM system is its virtual instruction set. LLVM code is a low-level representation, yet its structure allows it to carry high-level information.

2.3 Contributions to C#

This section briefly discusses other C# compiler projects besides Microsoft's .NET Framework, to give an overview of relevant developments in this field. These include Mono, Cosmos (IL2CPU) [6], Bartok and Ensemble.

Mono is an open source implementation of the .NET Framework. It includes a C# compiler that is itself written in C# and runs on several operating systems, such as Linux, UNIX, Mac OS X and Solaris. The C# code is first compiled to MSIL, and the Mono JIT then translates the MSIL into native code at run time, much as Microsoft's original implementation of the .NET Framework does.

Cosmos (C# Open Source Managed Operating System) is an operating system written entirely in C#. It uses IL2CPU, an ahead-of-time (AOT) compiler that translates CIL into machine code by emitting raw assembly files, which are then processed by NASM (Netwide Assembler).

Bartok was originally built for Singularity, the operating system developed by Microsoft Research. It translates CIL into native code through three intermediate representations: HIR (High-level IR), MIR (Medium-level IR) and LIR (Low-level IR). Starting from HIR, it gradually lowers the code representation at each phase until it reaches the lowest level, which is essentially assembly. A standard linker then combines the resulting objects into native x86 executables.

3 Scope

The scope of our project is defined by the keywords and operators listed below, which form a subset of C# version 1.0. We have chosen version 1.0 over newer versions of C# because we will not support most of the features they add, such as generics and Language Integrated Query (LINQ).

3.1 Keywords

base bool break char class const
continue do else enum explicit extern
false float for get if implicit
in int namespace new null operator
object public override private protected return
sealed set sizeof static string struct
this true typeof using virtual void
while value is

3.2 Operators and Special Characters

Primary: x.y f(x) a[x] x++ x-- new
Unary: + - ! ++x --x (T)x
Relational and type testing: < > <= >= == !=
Assignment: = += -= *= /=
Multiplicative and conditional: * / && ||

According to the ECMA-334 C# Language Specification [7], a char value in C# is a Unicode character. Microsoft's implementation of the .NET Framework uses 16-bit characters, which can represent most of the world's written languages. Our C# compiler will not follow this implementation; instead, char will be 8 bits wide, the same as in standard C and C++. The same holds for the string type.

The ECMA-334 C# Language Specification allows an enumeration to declare an underlying integral type such as byte, sbyte, short, ushort or int. Our compiler will support only the 32-bit integer (int) as the underlying type of an enumeration.

Microsoft .NET provides base class libraries, consisting of classes, structures, enumerations and delegates, that C# programmers use for tasks such as I/O and database access. Our compiler will provide a subset of these libraries:

System.Array System.Char System.Random
System.Console System.Byte System.Boolean
System.Enum System.String

We will provide our own libraries for the end user to assemble and link with the generated LLVM IR. These libraries will perform most of the functions of the .NET libraries listed above. Any exceptional cases will be documented in a user manual, which will also be provided.

In our implementation of the compiler, using directives may appear only at the top of the file and cannot be placed inside a namespace block.

Only single-dimensional arrays will be supported.

Optimization will not be taken into consideration when generating LLVM IR.

4 The Framework

The compiler will be written in C# using Microsoft Visual Studio and the Microsoft .NET Framework.

The scanner and parser are produced by Coco/R, an automatic scanner and parser generator that is also written in C#. To make generating the scanner and parser easier, we have also created a Coco/R plugin that can be used directly from Visual Studio.

Figure 4-A: Overall Process of LLVM C# Compiler

4.1 Scanner

Coco/R takes an attributed grammar of the source language and generates a scanner and a recursive descent parser for that language. The generated scanner reads the input stream and returns a stream of tokens to the parser.

In the traditional view of compilation, scanning and parsing are two distinct processes that occur one after the other. With Coco/R, however, the scanner and the parser are generated at the same time, and the specifications for both are written in the same attributed grammar file, which usually has the .atg extension.

The purpose of the scanner is to perform lexical analysis on the source program. It takes the text of the program, splits it into tokens and checks for lexical errors. Tokenization is the process of dividing the program text into its fundamental building blocks, the tokens, which usually include identifiers, keywords, numbers and symbols.
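The tokenization step can be illustrated with a minimal hand-written sketch. It is written in Python purely for brevity and is not the scanner that Coco/R generates; the token categories, the keyword set and the function name tokenize are assumptions made for this illustration.

```python
import re

# Hypothetical token categories; the real scanner is generated by Coco/R.
KEYWORDS = {"class", "int", "void", "return", "if", "else", "while"}
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("ident", r"[A-Za-z_]\w*"),
    ("symbol", r"==|!=|<=|>=|&&|\|\||\+\+|--|[-+*/=<>!(){}\[\];,.]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (kind, text) pairs, raising on a lexical error."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No token category matches here: this is a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind == "skip":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens
```

Calling tokenize("int x = 10;") yields one keyword, one identifier, a symbol, a number and a final symbol, while an input containing a character outside the assumed alphabet, such as "#", raises an error.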

4.2 Parser

The parser handles syntax analysis for the source language. In this phase the concern is whether the input program adheres to the grammatical rules of the source language. There are two major parsing techniques, table-driven and recursive descent. Coco/R uses recursive descent.

Recursive descent is a well-known top-down parsing technique that is simple and convenient, and that efficiently prepares the input for the next phase, semantic analysis. As the name suggests, top-down parsing constructs the parse tree starting from the root and works its way downwards, predicting from each incoming token which production rule to apply and adding the result to the parse tree. A predictive recursive descent parser reads the input strictly from left to right and never backtracks. Each nonterminal of the grammar is implemented as a subroutine, and these subroutines call one another recursively, which is the primary characteristic of the technique.
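As an illustration of the technique, the following sketch parses a toy expression grammar with one subroutine per nonterminal and a single look-ahead token. It is written in Python for brevity and is not the parser that Coco/R generates for our grammar; the grammar, the class name Parser and the tuple-based tree are assumptions made for this example.

```python
class Parser:
    """Predictive recursive descent parser for a toy LL(1) expression grammar.

    Expr   = Term { ("+" | "-") Term } .
    Term   = Factor { ("*" | "/") Factor } .
    Factor = number | "(" Expr ")" .
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<eof>"]
        self.pos = 0

    @property
    def la(self):
        # The single look-ahead token that drives every parsing decision.
        return self.tokens[self.pos]

    def expect(self, text):
        if self.la != text:
            raise SyntaxError(f"expected {text!r} but found {self.la!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        self.expect("<eof>")
        return tree

    def expr(self):
        node = self.term()
        while self.la in ("+", "-"):
            op = self.la
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.la in ("*", "/"):
            op = self.la
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):
        if self.la == "(":
            self.pos += 1
            node = self.expr()  # recursion back into the start symbol
            self.expect(")")
            return node
        if self.la.isdigit():
            value = int(self.la)
            self.pos += 1
            return value
        raise SyntaxError(f"unexpected token {self.la!r}")
```

Every decision in the sketch is made by inspecting only the current look-ahead token, which is possible because the toy grammar is LL(1).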

In general, this parsing technique requires the grammar to be in LL(1) form.

LL(1) means that the input is read from Left to right, that the parser constructs a Leftmost derivation, and that only 1 look-ahead symbol is used. The grammar we have written for our source language, however, is not LL(1). Coco/R offers a number of solutions for such grammars. They are typically termed 'conflict resolvers' and include the following.

1. Multi-symbol Look ahead

2. Resolver Symbols

Multi-symbol Look ahead

In this technique the parser generated by Coco/R uses two global variables that store the last recognized terminal and the current look-ahead symbol. When the parser needs to look ahead more than one symbol, it uses the ResetPeek() and Peek() methods of the generated scanner. ResetPeek() makes peeking begin from the symbol after the current look-ahead symbol. Peek() returns the next symbol as a Token without removing it from the input stream, so the scanner will deliver these symbols again when parsing resumes.

To make it easier to look more than one token ahead, we have created a custom function called Peek, built on Coco/R's ResetPeek() and Peek() functions, which returns the n-th token after the current look-ahead token.
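The idea behind such a helper can be sketched as follows. This is not the code of Figure 4-B: it is a Python illustration in which MockScanner is a hypothetical stand-in that models only the peeking behavior described above, and peek_nth is an assumed name for the helper.

```python
class MockScanner:
    """Stand-in for a Coco/R-style scanner; only the peeking interface is modelled."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._la_index = 0      # index of the current look-ahead token
        self._peek_index = 0    # index of the token most recently peeked

    def reset_peek(self):
        # Peeking restarts from the current look-ahead token.
        self._peek_index = self._la_index

    def peek(self):
        # Return the next token without consuming it; repeated calls move
        # further ahead and stop at the last token (assumed to mark end of file).
        if self._peek_index + 1 < len(self._tokens):
            self._peek_index += 1
        return self._tokens[self._peek_index]


def peek_nth(scanner, n):
    """Return the n-th token after the current look-ahead token (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    scanner.reset_peek()
    token = None
    for _ in range(n):
        token = scanner.peek()
    scanner.reset_peek()  # leave the scanner ready for the next look-ahead
    return token
```

With the token sequence int, x, =, 10, ; and int as the current look-ahead token, peek_nth(scanner, 1) returns x and peek_nth(scanner, 3) returns 10, and neither call consumes any input.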

Figure 4-B: Custom Coco/R function

Resolver Symbols

Resolver symbols are artificial tokens, declared in a separate section of the grammar, that help steer the parser in the correct direction. They are inserted on the fly at parse time whenever the resolution routine used by Coco/R finds it necessary. Coco/R places these resolution routines into the generated parser automatically.

During the parsing phase an Abstract Syntax Tree (AST) is generated. All AST nodes inherit from a common class called AstNode. Some nodes implement IAstExpression, indicating that the node is an expression, while others implement IAstHasExpression, which allows the expressions contained in that node to be retrieved.

Figure 4-C: Sample AST Nodes

To simplify matters, we created AstBinaryExpression, which has LeftOperand and RightOperand properties that each return an object implementing IAstExpression. Other binary expressions, such as binary arithmetic and logical expressions, derive from AstBinaryExpression.
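The shape of this hierarchy can be sketched as below. The sketch is in Python rather than C# and is not the code shown in the figures: AstNode, IAstExpression and AstBinaryExpression follow the names used in the text, while AstIntegerConstant, AstMulExpression, AstAndExpression and the evaluate method are hypothetical additions that exist only to exercise the example.

```python
from abc import ABC, abstractmethod


class AstNode:
    """Common base class of every node in the tree."""


class IAstExpression(ABC):
    """Interface marking a node as an expression."""

    @abstractmethod
    def evaluate(self):
        """Hypothetical helper used only to exercise the sketch."""


class AstIntegerConstant(AstNode, IAstExpression):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class AstBinaryExpression(AstNode, IAstExpression):
    """Holds the two operands shared by all binary expressions."""

    def __init__(self, left_operand, right_operand):
        self.left_operand = left_operand
        self.right_operand = right_operand


class AstMulExpression(AstBinaryExpression):
    """A binary arithmetic expression."""

    def evaluate(self):
        return self.left_operand.evaluate() * self.right_operand.evaluate()


class AstAndExpression(AstBinaryExpression):
    """A binary logical expression."""

    def evaluate(self):
        return bool(self.left_operand.evaluate()) and bool(self.right_operand.evaluate())
```

Keeping the operands in the shared base class means that later phases can walk any binary expression in the same way, whatever its operator.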

Figure 4-D: Sample AST Binary Nodes

Figure 4-E: Sample AST Loop Nodes

4.3 Semantic Analyzer

Semantic analysis is the phase of compilation that follows parsing.

Once scanning and parsing are complete, the source code has been checked for lexical and syntax errors. The next step is to check that the program is also semantically correct, because not all properties of a program can be expressed by a context-free grammar (CFG).

This task is aided by semantic actions that are attached to the grammar in a format that Coco/R supports.

The errors checked for during this phase include type errors, incorrect scoping of variables, modification of constant values, redefinition of classes and methods within their respective scopes, and uninitialized variables and fields.

Moreover, C# does not allow an identifier to be used before it is declared. Since C# is a strongly typed language, in which type errors are detected at compile time, the compiler must know the type of an identifier before it is used.

When the compiler encounters the declaration of an identifier, it stores the type information assigned to that identifier. Later in the program, when the compiler examines an expression containing the identifier, it checks the expression against this type information. For example, the following fragment of C# is syntactically correct but semantically wrong, and it will produce a compiler error.

Figure 4-F: Semantic Error Code Fragment

In this example the identifier x is used without being declared. When the compiler encounters the expression x = 10, it compares the types of the operands and checks whether x is assignable. Since x is not declared before this expression, the compiler has no type information for x and cannot perform either check, so it reports a compile-time error to the programmer.
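A check of this kind is commonly built on a symbol table. The following Python sketch shows one possible arrangement; the names SymbolTable, SemanticError and check_assignment are assumptions made for this illustration and do not describe the data structures of our compiler.

```python
class SemanticError(Exception):
    """Raised when a program is syntactically valid but semantically wrong."""


class SymbolTable:
    """Maps identifiers to their declared type, with nested scopes."""

    def __init__(self):
        self._scopes = [{}]

    def open_scope(self):
        self._scopes.append({})

    def close_scope(self):
        self._scopes.pop()

    def declare(self, name, type_name):
        scope = self._scopes[-1]
        if name in scope:
            raise SemanticError(f"'{name}' is already defined in this scope")
        scope[name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        raise SemanticError(f"'{name}' is used before it is declared")


def check_assignment(table, target, value_type):
    """Verify that assigning a value of value_type to target is well typed."""
    target_type = table.lookup(target)
    if target_type != value_type:
        raise SemanticError(
            f"cannot assign {value_type} to '{target}' of type {target_type}"
        )
    return target_type
```

With an empty table, check_assignment(table, "x", "int") raises an error because x has no recorded type. After table.declare("x", "int") the same call succeeds.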

Once the semantic analysis process has been completed the source program is ready

to move on to the code generation phase.

4.4 Code Generator

After the AST has been created and has passed semantic analysis without errors, the corresponding LLVM IR is generated.

Figure 4-G: Sample C# Code Fragment

The compiler would translate the above C# code fragment into LLVM IR similar to the following.

Figure 4-H: LLVM IR Equivalent of the C# Code Fragment

; Declares a global string constant

Comments in LLVM IR begin with a semicolon and extend to the end of the line.

declare i32 @printf(i8*, ...) nounwind

This line, which appears at the end of the sample LLVM IR, declares a function called printf whose first parameter is a pointer to an 8-bit integer, followed by a variable number of arguments.

Because the generated code must make system calls to print text for the C# Console.WriteLine function, the LLVM IR needs some mechanism for asking the operating system to write the text. Other features, such as returning an exit code to the operating system, also require system calls. This could be achieved by hardcoding architecture-specific and operating-system-specific assembly code in the LLVM IR. To remain portable across systems, however, the code generator will instead use the Standard C Library, which can be linked with the generated LLVM code during the link phase.

Because printf already exists in the Standard C Library, its body is not defined in the LLVM IR; it is only declared. Just as a method in C# can be called before the point in the source where it is declared, LLVM IR allows a function to be called before its declaration appears. The generated LLVM IR relies on this: the declaration of printf is appended to the end of the code, after the call that uses it.

@.str = internal constant [4 x i8] c"%d\0A\00"

This code creates a global variable called .str, an array of four 8-bit integers. The @ sign denotes a global in LLVM. Since LLVM supports integers of arbitrary bit width, ranging from 1 bit to 2^23-1 bits (8,388,607, approximately 8 million), the size must be stated explicitly in every integer type. (LLVM code generation does not support large integer types as function return types. The specific limit on how large a return type the code generator can currently handle is target dependent; currently it is often 64 bits for 32-bit targets and 128 bits for 64-bit targets. [8]) The string is stored as 8-bit integers because 'char' in standard C is 8 bits wide.

“%d\0A\00” is the LLVM spelling of the C string “%d\n” together with its terminating null character. Special characters in LLVM string constants are escaped as “\xx”, where xx is the ASCII code of the character in hexadecimal.
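The escaping rule and the array size can be checked with a small sketch. It is written in Python for brevity; the function name llvm_string_constant is an assumption, and the sketch handles only ASCII text, in line with the 8-bit char described in the scope.

```python
def llvm_string_constant(name, text):
    """Build a global constant definition for a NUL-terminated 8-bit string."""
    data = text.encode("ascii") + b"\x00"
    escaped = []
    for byte in data:
        # Printable ASCII is emitted as-is, except the double quote and the
        # backslash, which would otherwise end the literal or start an escape.
        if 0x20 <= byte <= 0x7E and byte not in (0x22, 0x5C):
            escaped.append(chr(byte))
        else:
            escaped.append(f"\\{byte:02X}")
    body = "".join(escaped)
    # The array length counts every byte, including the terminating NUL.
    return f'@{name} = internal constant [{len(data)} x i8] c"{body}"'
```

For the name .str and the text "%d" followed by a newline, the function produces the same definition as the line discussed above: four bytes in total, with the newline written as \0A and the terminator as \00.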

define void @PrintSquare(i32 %n) nounwind

{

; code omitted for brevity

}

The above block of code is the definition of the PrintSquare function, which accepts a
32-bit integer as its parameter and has the return type void. The 'nounwind' attribute
states that the function never returns with an unwind or exceptional control flow. If
the function does unwind, its runtime behavior is undefined.

%n_addr = alloca i32

The above statement allocates memory for a 32-bit integer in the current stack frame.
This memory is released automatically when the function returns to its caller. The
alloca instruction returns a pointer to the allocated memory, which is stored in the
local variable n_addr. The '%' sign indicates that the variable is local.

store i32 %n, i32* %n_addr

This statement copies the integer value of the local value n to the memory location
pointed to by n_addr.

%0 = load i32* %n_addr

The above code fragment reads the integer value from the memory location that n_addr
points to and places it in a local variable named 0 (zero). Local variables with
numeric names are referred to as unnamed temporaries in LLVM.

%2 = mul i32 %0, %1

The mul i32 instruction multiplies the 32-bit integers held in unnamed temporaries 0
and 1 and stores the result in unnamed temporary 2.
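The way unnamed temporaries are numbered can be sketched with a small Python emitter. This is only an illustration under stated assumptions: the class FunctionEmitter and its methods are hypothetical, and the second load (producing %1) is assumed to read n_addr as well, since that instruction is not shown in the excerpts above. The sketch keeps the typed-pointer syntax (i32*) used in this document.

```python
class FunctionEmitter:
    """Collects instructions and hands out %0, %1, ... in order of creation."""

    def __init__(self):
        self.lines = []
        self.next_temp = 0

    def _new_temp(self):
        # Unnamed temporaries are numbered sequentially within a function.
        name = "%%%d" % self.next_temp
        self.next_temp += 1
        return name

    def load(self, address):
        result = self._new_temp()
        self.lines.append("%s = load i32* %s" % (result, address))
        return result

    def mul(self, lhs, rhs):
        result = self._new_temp()
        self.lines.append("%s = mul i32 %s, %s" % (result, lhs, rhs))
        return result


emitter = FunctionEmitter()
a = emitter.load("%n_addr")  # assumed: first operand read from n_addr
b = emitter.load("%n_addr")  # assumed: second operand read from n_addr
emitter.mul(a, b)
print("\n".join(emitter.lines))
```

Each call returns the name of the temporary it defined, so later instructions can use it as an operand without the caller tracking the numbers by hand.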

%4 = call i32 (i8*, ...)* @printf(i8*
        getelementptr ([4 x i8]* @.str, i32 0, i32 0),
        i32 %3) nounwind

The "getelementptr" instruction only computes an address, here that of the first
element of the global variable .str; it does not access memory. The "call" instruction
calls the function named printf and passes it the computed address of .str together
with the integer value held in unnamed temporary 3.
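The address arithmetic behind getelementptr on a pointer to an array can be sketched in Python as follows. The function gep_offset is a hypothetical helper for illustration only: the first index steps over whole arrays of the pointed-to type, and the second index selects an element inside the array. No memory is read; only an offset in bytes is computed.

```python
def gep_offset(array_len, elem_size, indices):
    """Byte offset for getelementptr on a pointer to [array_len x elem]."""
    first, second = indices
    array_size = array_len * elem_size  # size of one whole array in bytes
    return first * array_size + second * elem_size


# [4 x i8]* @.str with indices (i32 0, i32 0): the address of the first character.
print(gep_offset(4, 1, (0, 0)))  # prints: 0
# Index (0, 2) would address the third byte of the string.
print(gep_offset(4, 1, (0, 2)))  # prints: 2
```

With indices 0 and 0 the offset is zero, so the result is simply the start of the string, converted from a pointer to the array into a pointer to its first i8 element.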

4.5 Assembling and Linking

After the LLVM IR has been generated, it is the user's responsibility to assemble and
link it into the appropriate binary executable. The LLVM IR generated by our compiler
can be assembled into LLVM bitcode. With the help of GNU binutils, it can then be
compiled further into native code or into architecture-specific assembly code. These
tools are open source and run on a wide variety of architectures and operating
systems. On Windows, we will use the official LLVM tools, and for GNU binutils we will
use the version distributed with MinGW, as it provides direct compatibility with
Windows.
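One possible sequence of tool invocations is sketched below in Python. The proposal does not name the exact commands, so the choice of llvm-as (IR to bitcode), llc (bitcode to native assembly) and the gcc driver from MinGW (which runs the binutils assembler and linker and links the Standard C Library) is an assumption, as are the helper name build_commands and the file names. The sketch only builds the command lines; it does not run them.

```python
import os


def build_commands(ll_path, exe_path):
    """Return the assumed assemble/compile/link commands for one .ll file."""
    base, _ = os.path.splitext(ll_path)
    bitcode = base + ".bc"
    assembly = base + ".s"
    return [
        ["llvm-as", ll_path, "-o", bitcode],  # LLVM IR text -> LLVM bitcode
        ["llc", bitcode, "-o", assembly],     # bitcode -> target assembly
        ["gcc", assembly, "-o", exe_path],    # assemble and link with libc
    ]


for command in build_commands("square.ll", "square.exe"):
    print(" ".join(command))
```

Each list could be passed to subprocess.run once the tools are installed and on the PATH.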

5 Gantt Chart

6 References

[1]. Intro - D Programming Language - Digital Mars. D Programming Language - Digital Mars. [Online] [Cited: August 5, 2009.] http://www.digitalmars.com/d/.

[2]. SharpOS Wiki. [Online] [Cited: August 14, 2009.] http://www.sharpos.org.

[3]. Singularity - Microsoft Research. [Online] [Cited: August 14, 2009.] http://research.microsoft.com/en-us/projects/singularity/.

[4]. LLVM Compiler Infrastructure Project. LLVM Compiler Infrastructure Project. [Online] [Cited: August 5, 2009.] http://llvm.org/Features.html.

[5]. Lattner, Chris and Adve, Vikram. The LLVM Compiler Infrastructure Project. The LLVM Compiler Infrastructure Project. [Online] March 2004. [Cited: August 8, 2009.] http://llvm.org/pubs/2004-01-30-CGO-LLVM.pdf.

[6]. Cosmos. [Online] [Cited: August 14, 2009.] http://www.gocosmos.org.

[7]. [Online] [Cited: August 14, 2009.] http://www.ecma-international.org/publications/files/ECMA-ST-WITHDRAWN/ECMA-334,%201st%20edition,%20December%202001.pdf.

[8]. LLVM Assembly Language Reference Manual. The LLVM Compiler Infrastructure Project. [Online] [Cited: August 10, 2009.] http://llvm.org/docs/LangRef.html.

[9]. Standard ECMA-334 - C# Language Specification. [Online] [Cited: August 12, 2009.] http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-334.pdf.

7 Appendix

7.1 LLVM C# Compiler EBNF

LLVMCSharp = {UsingDeclaritive} {NamespaceMember}.
UsingDeclaritive = "using" Qualident ";".
NamespaceMember = ( "namespace" Qualident "{" {NamespaceMember} "}"
                  | {TypeModifiers} TypeDecl ).
Qualident = ident {"." ident}.
TypeModifiers = "public" | "protected" | "private" | "sealed".
TypeDecl = ( "class" ident [ClassBase] ClassBody [";"]
           | "struct" ident [Base] StructBody [";"]
           | "enum" ident [":" IntType] EnumBody [";"] ).
ClassBase = ":" ClassType.
ClassType = Qualident | "object" | "string".
ClassBody = "{" { {MemberModifier} ClassMember } "}".
MemberModifier = "override" | "private" | "sealed" | "static" | "extern" | "virtual".
ClassMember = StructMember.
StructBody = "{" { {MemberModifier} StructMember } "}".
StructMember = "const" Type ident "=" Expr {"," ident "=" Expr} ";"
             | ident "(" [FormalParams] ")" [ConstructorCall] (Block | ";")
             | ("implicit" | "explicit") "operator" Type "(" Type ident ")" (Block | ";")
             | TypeDecl
             | Type "operator" OverloadableOp "(" Type ident ("," Type ident | ) ")" (Block | ";")
             | Field {"," Field} ";"
             | Qualident "(" [FormalParams] ")" (Block | ";")
             | "{" Accessors "}".
Base = ":" Qualident.
IntType = "int" | "char".
EnumBody = "{" EnumMember {"," EnumMember} "}".
EnumMember = ident ["=" Expr].
Type = SimpleType | ClassType.
SimpleType = IntType | "bool" | "float".
ConstructorCall = ":" ("base" | "this") "(" [Argument {"," Argument}] ")".
Block = "{" {Statement} "}".
Statement = ( "const" Type ident "=" Expr {"," ident "=" Expr}
            | LocalVarDecl ";"
            | EmbeddedStatement ).
LocalVarDecl = Type LocalVar {"," LocalVar}.
LocalVar = ident ["=" Init].
EmbeddedStatement = Block
                  | ";"
                  | StatementExpr ";"
                  | "if" "(" Expr ")" EmbeddedStatement ["else" EmbeddedStatement]
                  | "while" "(" Expr ")" EmbeddedStatement
                  | "do" EmbeddedStatement "while" "(" Expr ")" ";"
                  | "for" "(" [ForInit] ";" [Expr] ";" [ForInc] ")" EmbeddedStatement
                  | "break" ";"
                  | "continue" ";"
                  | "return" [Expr] ";".
ForInit = LocalVarDecl | StatementExpr {"," StatementExpr}.
ForInc = StatementExpr {"," StatementExpr}.
StatementExpr = Unary AssignOp Expr.
AssignOp = "=" | "+=" | "-=" | "*=" | "/=".
Expr = Unary (OrExpr | AssignOp Expr).
OrExpr = AndExpr {"||" Unary AndExpr}.
AndExpr = EqlExpr {"&&" Unary EqlExpr}.
EqlExpr = RelExpr {("!=" | "==") Unary RelExpr}.
RelExpr = AddExpr {("<" | ">" | "<=" | ">=") Unary AddExpr | "is" Type}.
AddExpr = MulExpr {("+" | "-") MulExpr}.
MulExpr = {("*" | "/") Unary}.
Unary = {("+" | "-" | "!" | "++" | "--" | "(" Type ")")} Primary.
Primary = ( ident
          | Literal
          | "(" Expr ")"
          | ("bool" | "char" | "float" | "int" | "object" | "string") "." ident
          | "this"
          | "base" ("." ident | "[" Expr "]")
          | "new" Type ( "(" [Argument {"," Argument}] ")" | ArrayInit )
          | "typeof" "(" Type ")"
          | "sizeof" "(" Type ")" )
          { "++" | "--" | "." ident | "(" [Argument {"," Argument}] ")" }.
Literal = integerConstant | realConstant | characterConstant | stringConstant
        | "true" | "false" | "null".
OverloadableOp = "+" | "-" | "!" | "++" | "--" | "true" | "false" | "*" | "/"
               | "==" | "!=" | ">" | "<" | ">=" | "<=".
Field = ident ["=" Init].
FormalParams = Par ["," FormalParams].
Par = Type ident. // ref and out not supported
Accessors = GetAccessor | SetAccessor.
GetAccessor = ident (Block | ";").
SetAccessor = ident (Block | ";").
Argument = Expr.
Init = Expr | ArrayInit.
ArrayInit = "{" [Expr {"," Expr}] "}".
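
To show how productions of this kind map onto code, here is a minimal recursive-descent sketch in Python for two of the rules, UsingDeclaritive and Qualident. It is an illustration only, not the parser of the proposed compiler: the tokenizer, the Parser class and its method names are hypothetical, and Qualident is assumed to mean an identifier followed by zero or more "." ident pairs. Keywords are not distinguished from identifiers in this sketch.

```python
import re

# An identifier, or one of the two punctuation marks these rules need.
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[.;])")


def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError("unexpected character at %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, literal):
        if self.peek() != literal:
            raise SyntaxError("expected %r, found %r" % (literal, self.peek()))
        self.pos += 1

    def ident(self):
        token = self.peek()
        if token is None or not (token[0].isalpha() or token[0] == "_"):
            raise SyntaxError("expected identifier, found %r" % token)
        self.pos += 1
        return token

    def qualident(self):
        # Qualident = ident {"." ident}: the braces become a while loop.
        parts = [self.ident()]
        while self.peek() == ".":
            self.expect(".")
            parts.append(self.ident())
        return parts

    def using_declaritive(self):
        # UsingDeclaritive = "using" Qualident ";".
        self.expect("using")
        name = self.qualident()
        self.expect(";")
        return name


print(Parser(tokenize("using System.Collections.Generic;")).using_declaritive())
# prints: ['System', 'Collections', 'Generic']
```

Each nonterminal becomes one method, a repetition in braces becomes a loop, and a quoted terminal becomes a call to expect.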