modern programming languages

(Cs-432) Lecture # 04

Upload: templeton-luka

Post on 30-Dec-2015


Page 1: Modern Programming Languages

(Cs-432) Lecture # 04

Page 2: Modern Programming Languages

Programming languages can be implemented by any of three general methods
◦ Compilation
◦ Pure interpretation
◦ Hybrid implementation systems

Page 3: Modern Programming Languages

At one extreme, programs can be translated into machine language, which can be executed directly on the computer. This method is called a “compiler implementation” and has the advantage of very fast program execution once the translation process is complete

Most production implementations of languages such as C, COBOL, C++ and Ada are compilers

Page 4: Modern Programming Languages

The language that a compiler translates is called the “source language”

The compilation process and program execution take place in several phases

Page 5: Modern Programming Languages
Page 6: Modern Programming Languages

The lexical analyzer gathers the characters of the source program into lexical units

The lexical units of a program are identifiers, special words, operators and punctuation symbols

The lexical analyzer ignores the comments in the source program because the compiler has no use for them

Page 7: Modern Programming Languages

The syntax analyzer takes the lexical units from the lexical analyzer and uses them to construct hierarchical structures called “parse trees”

These parse trees represent the syntactic structure of the program

Page 8: Modern Programming Languages

The intermediate code generator produces a program in a different language, at an intermediate level between the source program and the final output of the compiler

Intermediate code sometimes looks very much like assembly language, and in fact sometimes is actual assembly code

Page 9: Modern Programming Languages

The semantic analyzer is an integral part of the intermediate code generator

The semantic analyzer checks for errors, such as type errors, that are difficult to detect during syntax analysis

Page 10: Modern Programming Languages

Optimization improves programs by making them smaller, faster, or both

Because many kinds of optimization are difficult to do on machine language, most optimization is done on intermediate code

Page 11: Modern Programming Languages

The code generator translates the optimized intermediate code version of the program into an equivalent machine language program

Page 12: Modern Programming Languages

The symbol table serves as a database for the compilation process

The primary contents of the symbol table are the type and attribute information of each user-defined name in the program

This information is placed in the symbol table by the lexical and syntax analyzers and is used by the semantic analyzer and the code generator

Page 13: Modern Programming Languages

Most user programs also require programs from the operating system, such as those for input and output

The compiler builds calls to required system programs when they are needed by the user program

Before the machine language programs produced by a compiler can be executed, the required programs from the operating system must be found and linked to the user program

Page 14: Modern Programming Languages

The process of collecting system programs and linking them to user programs is called “linking and loading”, or sometimes just “linking”

Linking is performed by a system program called a “linker”

The linker does not link only system programs; it can also link user programs that reside in libraries

The user and system code together are sometimes called a “load module” or “executable image”

Page 15: Modern Programming Languages

The speed of the connection between a computer’s memory and its processor usually determines the speed of the computer, because instructions can often be executed faster than they can be moved to the processor for execution

This connection is called the von Neumann bottleneck

It is the primary limiting factor in the speed of von Neumann architecture computers

Page 16: Modern Programming Languages

Pure interpretation lies at the opposite end (from compilation) of implementation methods

With this approach, programs are interpreted by another program, called an interpreter, with no translation whatever

The interpreter program acts as a software simulation of a machine whose fetch-execute cycle deals with high-level language program statements rather than machine instructions

This software simulation obviously provides a virtual machine for the language

Page 17: Modern Programming Languages
Page 18: Modern Programming Languages

Pure interpretation has the advantage of allowing easy implementation of many source-level debugging operations, because all runtime error messages can refer to source level units

This system has the serious disadvantage that execution is 10 to 100 times slower than in compiled systems
◦ Decoding is the main cause: a statement must be decoded every time it is executed, no matter how many times it has been executed before

Pure interpretation often requires more space: in addition to the source program, the symbol table must be present during interpretation, which is repeated every time the source is executed

PHP is commonly cited as an example of a purely interpreted language

Page 19: Modern Programming Languages

Some language implementation systems are a compromise between compilers and pure interpreters: they translate high-level language programs to an intermediate language designed to allow easy interpretation

This method is faster than pure interpretation because the source language statements are decoded only once
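The difference between decoding on every execution and decoding only once can be sketched in a few lines. This is a toy illustration only: the statement form `target = left op right`, the tuple used as "intermediate code", and the names `decode`, `execute`, `pure_interpret`, `hybrid_run` and `stats` are all assumptions made for the example, not part of any real implementation system.

```python
import operator

# Operators supported by this toy language (an assumption for the sketch).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

# Counts how many times a source statement has been decoded.
stats = {"decodes": 0}

def decode(statement):
    """Decode 'target = left op right' into an intermediate-code tuple."""
    stats["decodes"] += 1
    target, expr = [part.strip() for part in statement.split("=")]
    left, op, right = expr.split()
    return (target, OPS[op], left, right)

def value(token, env):
    """An operand is either an integer literal or a variable name."""
    return int(token) if token.isdigit() else env[token]

def execute(instr, env):
    """Run one already-decoded instruction against the variable store."""
    target, fn, left, right = instr
    env[target] = fn(value(left, env), value(right, env))

def pure_interpret(statement, repetitions, env):
    for _ in range(repetitions):
        execute(decode(statement), env)  # decoded on every execution

def hybrid_run(statement, repetitions, env):
    instr = decode(statement)            # decoded only once
    for _ in range(repetitions):
        execute(instr, env)
```

Running `total = total + 2` five times through `pure_interpret` decodes the statement five times, while `hybrid_run` decodes it once and produces the same result.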

Page 20: Modern Programming Languages
Page 21: Modern Programming Languages

Java's implementation is hybrid. Its intermediate form, called “byte code”, provides portability to any machine that has a byte code interpreter and an associated runtime system
◦ Together these are called a “Java Virtual Machine”

Just-in-Time (JIT) compilation, as used in .NET, also translates programs to an intermediate language
◦ A Just-in-Time (JIT) implementation system initially translates programs to an intermediate language. Then, during execution, it compiles intermediate language methods into machine code when they are called

Page 22: Modern Programming Languages

A preprocessor is a program that processes a program immediately before the program is compiled

Preprocessor instructions are embedded in programs

Preprocessor instructions are commonly used to specify that the code from another file is to be included

For example #include "iostream.h"

Other preprocessor instructions are used to define symbols that represent expressions

Page 23: Modern Programming Languages

For example #define max(A, B) ((A)>(B)?(A):(B))

This macro determines the larger of two given expressions

For example x = max(2*y , z/1.73)

Page 24: Modern Programming Languages

A programming environment is the collection of tools used in the development of software

At a minimum, it consists of a file system, a text editor, a linker and a compiler

Page 25: Modern Programming Languages

JBuilder is a programming environment that provides an integrated compiler, editor, debugger and file system in one GUI for Java development

Microsoft Visual Studio .NET is another programming environment. It consists of a large collection of software development tools

This system can be used to develop software in any one of five languages: C#, Visual Basic, JScript, F# (a functional language) and C++

NetBeans is a development environment that is primarily used for Java application development but also supports JavaScript, Ruby and PHP

Page 26: Modern Programming Languages

◦ Introduction to MPL
◦ Programming domains
◦ Language evaluation criteria
◦ Language trade-offs
◦ Influences on language design
◦ Programming design methodologies
◦ Language categories
◦ Implementation methods
◦ Preprocessors
◦ Programming environments

Page 27: Modern Programming Languages

Please find the implementation details of any language available these days (Java, C#, Visual Basic, PHP etc.)
◦ Submit a hard copy of not more than 02 pages
◦ Please avoid copy/paste and submit whatever you understand

Plagiarism will be treated strictly

Page 28: Modern Programming Languages

The study of programming languages can be divided into examination of Syntax and Semantics

The Syntax of a programming language is the form of its expressions, statements and program units

Semantics is the meaning of those expressions, statements and program units

Although they are often separated for discussion purposes, syntax and semantics are closely related

Page 29: Modern Programming Languages

A language, whether natural (English) or artificial (Java), is a set of strings of characters from some alphabet

The strings of a language are called sentences or statements

The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language

In comparison to natural languages, programming languages are syntactically very simple and concrete

Page 30: Modern Programming Languages

The lowest-level syntactic units are called lexemes

The description of lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language

The lexemes of a programming language include its numeric literals, operators and special words

A program can be viewed as a string of lexemes rather than of characters

Page 31: Modern Programming Languages

Lexemes are partitioned into groups. For example, the names of variables, methods, classes and so on in a programming language form a group called identifiers

Each group of lexemes is represented by a name, called a token

A token of a language is a category of its lexemes. For example, an identifier is a token that can have lexemes, or instances, such as sum and total

In some cases a token has only a single possible lexeme, for example the arithmetic operator “+”

Page 32: Modern Programming Languages

For example, consider the following statement:

index = 2 * count + 17;

◦ The lexemes and tokens of this statement are:

Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
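The grouping of characters into lexemes and their classification into tokens can be sketched with regular expressions. This is a minimal illustration, not how any particular compiler does it: the token names follow the table, while the patterns, the `skip` group for whitespace and the function name `tokenize` are assumptions made for the example.

```python
import re

# Token names follow the lecture's table; the patterns are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier",  r"[A-Za-z_]\w*"),
    ("equal_sign",  r"="),
    ("mult_op",     r"\*"),
    ("plus_op",     r"\+"),
    ("semicolon",   r";"),
    ("skip",        r"\s+"),   # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs for the given source text."""
    pos = 0
    pairs = []
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2*count+17;"):
    print(f"{lexeme:<8}{token}")
```

Each regular expression plays the role of a small regular grammar describing the form of one token.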

Page 33: Modern Programming Languages

Languages can be formally defined in two distinct ways
◦ By recognition
◦ By generation

Page 34: Modern Programming Languages

Suppose we have a language L that uses an alphabet ∑ of characters. To define L formally using the recognition method, we would need to construct a mechanism R, called a recognition device, capable of reading strings of characters from ∑ and indicating whether a given input string is in L or not

If R, when fed any string of characters over ∑, accepts it only if it is in L, then R is a description of L

This might seem like a lengthy and ineffective process
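A recognition device can be sketched as a function that answers yes or no for any input string. The example language here is an assumption chosen for illustration: L = { aⁿbⁿ : n ≥ 1 } over the alphabet ∑ = {a, b}. The function name `recognize` is likewise hypothetical.

```python
ALPHABET = {"a", "b"}   # the alphabet, assumed for this sketch

def recognize(string):
    """Return True only if the string is in L = { a^n b^n : n >= 1 }."""
    # Reject anything containing characters outside the alphabet.
    if any(ch not in ALPHABET for ch in string):
        return False
    i = 0
    while i < len(string) and string[i] == "a":   # read the leading a's
        i += 1
    count_a = i
    while i < len(string) and string[i] == "b":   # read the b's that follow
        i += 1
    count_b = i - count_a
    # Accept only if the whole input was consumed and the counts match.
    return i == len(string) and count_a == count_b and count_a >= 1
```

The device never lists the sentences of L; it only decides membership for the strings it is given, which is exactly the role a syntax analyzer plays for programs.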

Page 35: Modern Programming Languages

In practice, the syntax analysis part of a compiler is a recognizer for the language the compiler translates

In this role, the recognizer need not test all possible strings of characters from some set to determine whether each is in the language; rather, it need only determine whether given programs are in the language

In effect, then, the syntax analyzer determines whether the given programs are syntactically correct

The syntax analyzer is also known as a parser

Page 36: Modern Programming Languages

A language generator is a device that can be used to generate the sentences of a language

There is a close connection between formal generation and recognition devices for the same language. We will discuss it later

Page 37: Modern Programming Languages

The formal language-generation mechanisms commonly used to describe the syntax of programming languages are called grammars

Page 38: Modern Programming Languages

In the middle to late 1950s, Noam Chomsky and John Backus, in unrelated research efforts, developed the same syntax description formalism, which subsequently became the most widely used method for describing programming language syntax

Page 39: Modern Programming Languages

Chomsky described four classes of generative devices or grammars that define four classes of languages

Two of these grammar classes, named context-free and regular, turned out to be useful for describing the syntax of programming languages

The forms of the tokens of programming languages can be described by regular grammars

The syntax of whole programming languages, with minor exceptions, can be described by context-free grammars

Chomsky's work was later applied to programming languages

Page 40: Modern Programming Languages

John Backus introduced a new formal notation for specifying programming language syntax

A metalanguage is a language that is used to describe another language. BNF (Backus-Naur Form) is a metalanguage for programming languages

Page 41: Modern Programming Languages

BNF uses abstractions for syntactic structures. For example, a simple assignment statement might be represented by the abstraction <assign>, defined as

<assign> -> <var> = <expression>

The text on the left side of the arrow is the abstraction being defined

The text on the right side of the arrow is the definition of the abstraction

The right side consists of a mixture of tokens, lexemes and references to other abstractions

Altogether, the definition is called a rule or production

An example sentence whose syntactic structure is described by this rule is total = subtotal1 + subtotal2

Page 42: Modern Programming Languages

The abstractions in a BNF description, or grammar, are often called non-terminal symbols, and the lexemes and tokens of the rules are called terminal symbols

A BNF description, or grammar, is a collection of rules

1. S -> AB
2. S -> ASB
3. A -> a
4. B -> b

Page 43: Modern Programming Languages

The sentences of the language are generated through a sequence of applications of the rules, beginning with a special non-terminal of the grammar called the start symbol

This sequence of rule applications is called a derivation

1. S -> AB
2. S -> ASB
3. A -> a
4. B -> b

The same rules, with alternatives combined:

S -> AB | ASB
A -> a
B -> b
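A derivation with these four rules can be traced mechanically. The sketch below records every sentential form produced when rules are applied, in order, to the leftmost non-terminal. The rule numbering follows the list above; the function name `leftmost_derivation` and the convention that upper-case letters are non-terminals are assumptions for the example.

```python
# The four numbered rules: number -> (left side, right side).
RULES = {
    1: ("S", "AB"),
    2: ("S", "ASB"),
    3: ("A", "a"),
    4: ("B", "b"),
}

def leftmost_derivation(rule_numbers, start="S"):
    """Apply the rules in order to the leftmost non-terminal each time.

    Returns the list of sentential forms, beginning with the start symbol.
    Upper-case letters are treated as non-terminals.
    """
    form = start
    forms = [form]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        index = next((i for i, sym in enumerate(form) if sym.isupper()), None)
        if index is None:
            raise ValueError(f"no non-terminal left in {form!r}")
        if form[index] != lhs:
            raise ValueError(f"rule {number} does not apply to {form!r}")
        form = form[:index] + rhs + form[index + 1:]
        forms.append(form)
    return forms

print(" => ".join(leftmost_derivation([2, 3, 1, 3, 4, 4])))
```

Applying rules 2, 3, 1, 3, 4, 4 gives the derivation S => ASB => aSB => aABB => aaBB => aabB => aabb.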

Page 44: Modern Programming Languages

Solve: S -> 01S | 0S1 | S01 | 10S | 1S0 | S10 |

Page 45: Modern Programming Languages

The derivation of a program in this language is as follows:

Page 46: Modern Programming Languages

The derivation begins with the start symbol <program>

Each successive string in the sequence is derived from the previous string by replacing one of the non-terminals with its definitions

Each of the strings in the derivation, including <program>, is called a sentential form

A sentential form consisting of only terminals, or lexemes, is the generated sentence

Page 47: Modern Programming Languages

By choosing alternative RHSs of rules in the derivation, different sentences in the language can be generated

By exhaustively choosing all combinations of choices, the entire language can be generated

This language, like most others, is infinite, so one cannot generate all the sentences in the language in finite time
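Exhaustive generation can still be carried out up to a chosen length bound. The sketch below does this for the small grammar S -> AB | ASB, A -> a, B -> b used earlier in the lecture, as a stand-in for a full programming-language grammar. The names `GRAMMAR` and `sentences`, and the length bound itself, are assumptions for the example.

```python
from collections import deque

# The small example grammar: non-terminal -> list of alternative right sides.
GRAMMAR = {"S": ["AB", "ASB"], "A": ["a"], "B": ["b"]}

def sentences(max_length, start="S"):
    """Generate every sentence of the grammar with at most max_length symbols."""
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal, if there is one.
        index = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if index is None:
            found.add(form)          # only terminals remain: a sentence
            continue
        for rhs in GRAMMAR[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            # No rule of this grammar shortens a form, so forms that are
            # already too long can never yield a short enough sentence.
            if len(new_form) <= max_length and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=len)

print(sentences(6))
```

Raising the bound always yields more sentences, which is why the full language cannot be listed in finite time.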

Page 48: Modern Programming Languages

Let's have another example

Page 49: Modern Programming Languages

The grammar describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses, for example A = B * (A + C)

Page 50: Modern Programming Languages

One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define

These hierarchical structures are called parse trees

Page 51: Modern Programming Languages

Parse tree for the previous example

Page 52: Modern Programming Languages

Every internal node of a parse tree is labeled with a non-terminal symbol

Every leaf is labeled with a terminal symbol

Every sub-tree of a parse tree describes one instance of an abstraction in the sentence

Page 53: Modern Programming Languages

A grammar that generates a sentential form for which there are two or more distinct parse trees is said to be ambiguous

The grammar

S → A00
A → ε | AA | 0 | 1

is ambiguous because the sentence 00 has more than one parse tree, for example these two:

      S                S
    / | \            / | \
   A  0  0          A  0  0
   |               / \
   ε              A   A
                  |   |
                  ε   ε

Page 54: Modern Programming Languages
Page 55: Modern Programming Languages

The grammar of the given example is ambiguous because the sentence A=B+C*A has two distinct parse trees

Page 56: Modern Programming Languages

Syntactic ambiguity of language structures is a problem because compilers often base the semantics of those structures on their syntactic form

Specifically, the compiler chooses the code to be generated for a statement by examining its parse tree. If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely
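The effect of ambiguity on meaning can be made concrete by enumerating the trees for the expression B + C * A. The grammar assumed here is <expr> -> <expr> + <expr> | <expr> * <expr> | <id>, a common textbook form of an ambiguous expression grammar; it is an assumption, as are the simplified tree representation (operator, left, right) and the names `parse_trees` and `evaluate`.

```python
def parse_trees(tokens):
    """Return every tree for the token list under the assumed grammar
    <expr> -> <expr> + <expr> | <expr> * <expr> | <id>.

    A tree is either an identifier or a tuple (operator, left, right).
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    for i, token in enumerate(tokens):
        if token in ("+", "*"):
            # Treat this operator as the root and combine every tree for
            # the tokens on its left with every tree for those on its right.
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((token, left, right))
    return trees

def evaluate(tree, env):
    """Compute the value that a given tree assigns to the expression."""
    if isinstance(tree, str):
        return env[tree]
    op, left, right = tree
    lhs, rhs = evaluate(left, env), evaluate(right, env)
    return lhs + rhs if op == "+" else lhs * rhs

values = {"A": 2, "B": 3, "C": 4}   # sample values, chosen for illustration
for tree in parse_trees(["B", "+", "C", "*", "A"]):
    print(tree, "=", evaluate(tree, values))
```

With these sample values one tree evaluates to 11 (multiplication applied first) and the other to 14 (addition applied first), so the same sentence has two different meanings.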

Page 57: Modern Programming Languages

There are several other characteristics of a grammar that are sometimes useful in determining whether a grammar is ambiguous. They include the following:
◦ If the grammar generates a sentence with more than one leftmost derivation
◦ If the grammar generates a sentence with more than one rightmost derivation

When such an ambiguous situation arises, the parser uses non-grammatical information provided by the designer to construct the correct parse tree

For further explanation, see http://www.cs.utsa.edu/~wagner/CS3723/grammar/grammars2.html