Modern Programming Languages

(CS-432) Lecture # 04

Programming languages can be implemented by any of three general methods:

◦ Compilation

◦ Pure interpretation

◦ Hybrid implementation systems

At one extreme, programs can be translated into machine language, which can be executed directly on the computer. This method is called a “compiler implementation” and has the advantage of very fast program execution once the translation process is complete

Most production implementations of languages, such as C, COBOL, C++ and Ada, are by compilers

The language that a compiler translates is called the “source language”

The compilation process and program execution take place in several phases

The lexical analyzer gathers the characters of the source program into lexical units

The lexical units of a program are identifiers, special words, operators and punctuation symbols

The lexical analyzer ignores the comments in the source program because the compiler has no use for them

The syntax analyzer takes the lexical units from the lexical analyzer and uses them to construct hierarchical structures called “parse trees”

These parse trees represent the syntactic structure of the program

The intermediate code generator produces a program in a different language, at an intermediate level between the source program and the final output of the compiler

Intermediate languages sometimes look very much like assembly languages, and in fact sometimes are actual assembly languages

The semantic analyzer is an integral part of the intermediate code generator

The semantic analyzer checks for errors, such as type errors, that are difficult to detect during syntax analysis

Optimization improves programs by making them smaller, faster, or both

Because many kinds of optimization are difficult to do on machine language, most optimization is done on intermediate code

The code generator translates the optimized intermediate code version of the program into an equivalent machine language program

The symbol table serves as a database for the compilation process

The primary contents of the symbol table are the type and attribute information of each user-defined name in the program

This information is placed in the symbol table by the lexical and syntax analyzers and is used by the semantic analyzer and the code generator

Most user programs also require programs from the operating system, such as those for input and output

The compiler builds calls to required system programs when they are needed by the user program

Before the machine language programs produced by a compiler can be executed, the required programs from the operating system must be found and linked to the user program

The process of collecting system programs and linking them to user programs is called “linking and loading”

The process of linking is performed by a system program called a “linker”

The linker does not link only system programs; it can also link user programs that reside in libraries

The user and system code together are sometimes called a “load module” or “executable image”

The speed of the connection between a computer’s memory and its processor usually determines the speed of the computer, because instructions can often be executed faster than they can be moved to the processor for execution

This connection is called the von Neumann bottleneck

It is the primary limiting factor in the speed of von Neumann architecture computers

Pure interpretation lies at the opposite end (from compilation) of implementation methods

With this approach, programs are interpreted by another program called an interpreter, with no translation whatever

The interpreter program acts as a software simulation of a machine whose fetch-execute cycle deals with high-level language program statements rather than machine instructions

This software simulation obviously provides a virtual machine for the language

Pure interpretation has the advantage of allowing easy implementation of many source-level debugging operations, because all runtime error messages can refer to source level units
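The fetch-execute cycle over source statements can be sketched in Python. This is only an illustration, not a real interpreter: the three-statement toy language (set, add, print) and the function name interpret are invented for this example. Note that the error message can name the offending source line directly, because the interpreter works on the source text itself.

```python
def interpret(source):
    """Run a toy program directly from its source text, one statement at a time."""
    variables = {}          # plays the role of the symbol table at run time
    output = []
    statements = source.strip().splitlines()
    pc = 0                  # "program counter" that indexes source statements
    while pc < len(statements):
        parts = statements[pc].split()   # fetch and decode the statement
        op = parts[0]
        if op == "set":                  # set x 5
            variables[parts[1]] = int(parts[2])
        elif op == "add":                # add x y   (x = x + y)
            variables[parts[1]] += variables[parts[2]]
        elif op == "print":              # print x
            output.append(variables[parts[1]])
        else:
            raise SyntaxError(f"line {pc + 1}: unknown statement {op!r}")
        pc += 1
    return output

program = """
set x 2
set y 40
add x y
print x
"""
print(interpret(program))
```

Every statement is split and decoded again each time it is reached, which is the source of the slowness discussed next.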

This system has the serious disadvantage that execution is 10 to 100 times slower than in compiled systems

◦ Decoding: a statement is decoded every time it is executed, however many times it appears in the source code

Pure interpretation often requires more space: in addition to the source program, the symbol table must be present during interpretation, which is performed every time the source is executed

PHP is commonly cited as an example of pure interpretation

Some language implementation systems are a compromise between compilers and pure interpreters: they translate high-level language programs to an intermediate language designed to allow easy interpretation

This method is faster than pure interpretation because the source language statements are decoded only once
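CPython, the reference implementation of Python, is itself a hybrid system, so it can be used to illustrate the point. The sketch below uses the built-in compile function to translate an expression into a code object once, and then evaluates that intermediate form several times without decoding the source text again. The expression and variable names are chosen only for illustration.

```python
source = "2 * count + 17"

# Translate the source text once into an intermediate form (a code object).
code = compile(source, "<example>", "eval")

# Evaluate the already-translated form several times; the source string is
# not parsed again inside the loop.
results = [eval(code, {"count": n}) for n in range(3)]
print(results)
```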

The Java implementation is hybrid; its intermediate form, called “byte code”, provides portability to any machine that has a byte code interpreter and an associated runtime system

◦ Together these are called a “Java Virtual Machine”

Just-in-Time (JIT) compilation in .NET also translates programs to an intermediate language

◦ A Just-in-Time (JIT) implementation system initially translates programs to an intermediate language. Then, during execution, it compiles intermediate language methods into machine code when they are called

A preprocessor is a program that processes a program immediately before the program is compiled

Preprocessor instructions are embedded in programs

Preprocessor instructions are commonly used to specify that the code from another file is to be included

For example: #include "iostream.h"

Other preprocessor instructions are used to define symbols to represent expressions

For example: #define max(A, B) ((A) > (B) ? (A) : (B))

This macro determines the larger of two given expressions

For example: x = max(2 * y, z / 1.73);
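Because a macro is expanded by textual substitution before compilation, the parentheses in its definition matter. The following Python sketch only imitates that substitution for the max macro above. Both helper functions are hypothetical and are not part of any real preprocessor.

```python
def expand_max(a, b):
    """Return the text that max(A, B) expands to for argument texts a and b."""
    return f"(({a})>({b})?({a}):({b}))"

def expand_max_unparenthesized(a, b):
    """The same macro written without parentheses around A and B, for comparison."""
    return f"{a}>{b}?{a}:{b}"

print(expand_max("2*y", "z/1.73"))
print(expand_max_unparenthesized("x&1", "y"))
```

The second expansion produces the text x&1>y?x&1:y, which C groups as x & (1 > y) because > binds more tightly than &. The fully parenthesized definition avoids this. In both versions each argument text appears twice, so an argument with side effects would be evaluated twice.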

A programming environment is the collection of tools used in the development of software

It consists of at least a file system, a text editor, a linker and a compiler

JBuilder is a programming environment that provides an integrated compiler, editor, debugger and file system in one GUI for Java development

Microsoft Visual Studio .NET is another programming environment; it consists of a large collection of software development tools

This system can be used to develop software in any of five languages: C#, Visual Basic, JScript, F# (a functional language) and C++

NetBeans is a development environment that is primarily used for Java application development but also supports JavaScript, Ruby and PHP

Introduction to MPL
Programming domains
Language evaluation criteria
Language trade-offs
Influences on language design
Programming design methodologies
Language categories
Implementation methods
Preprocessors
Programming environments

Please find the implementation details of any language available these days (Java, C#, Visual Basic, PHP etc.)

[Submit a hard copy of not more than 02 pages]

[Please avoid copy/paste and submit whatever you understand]

Plagiarism will be treated strictly

The study of programming languages can be divided into examination of Syntax and Semantics

The Syntax of a programming language is the form of its expressions, statements and program units

Semantics is the meaning of those expressions, statements and program units

Although they are often separated for discussion purposes, syntax and semantics are closely related

A language, whether natural (such as English) or artificial (such as Java), is a set of strings of characters from some alphabet

The strings of a language are called sentences or statements

The syntax rules of a language specify which strings of characters from the language’s alphabet are in the language

In comparison to natural languages, programming languages are syntactically very simple and concrete

The lowest-level syntactic units are called lexemes

The description of lexemes can be given by a lexical specification, which is usually separate from the syntactic description of the language

The lexemes of a programming language include its numeric literals, operators and special words

A program can be viewed as a string of lexemes rather than of characters

Lexemes are partitioned into groups; for example, the names of variables, methods, classes and so on in a programming language form a group called identifiers

Each lexeme group is represented by a name, called a token

A token of a language is a category of its lexemes; for example, an identifier is a token that can have lexemes, or instances, such as sum and total

In some cases a token has only a single possible lexeme, for example the arithmetic operator “+”

For example consider the following statement:

Index = 2*count+17;

◦ The lexemes and tokens of this statement are:

Lexemes   Tokens
Index     Identifier
=         Equal_sign
2         Int_literal
*         Mult_op
count     Identifier
+         Plus_op
17        Int_literal
;         Semicolon
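The table above can be reproduced with a small lexical analyzer. The following is a minimal sketch in Python: the token names follow the table, while the regular expressions and the function name tokenize are assumptions made for this example rather than part of any particular language definition.

```python
import re

# Token names follow the table above; the patterns are assumptions for this sketch.
TOKEN_SPEC = [
    ("Identifier",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Int_literal", r"[0-9]+"),
    ("Equal_sign",  r"="),
    ("Mult_op",     r"\*"),
    ("Plus_op",     r"\+"),
    ("Semicolon",   r";"),
    ("Skip",        r"\s+"),   # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs for the input text."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "Skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("Index = 2*count+17;"):
    print(f"{lexeme:<6} {token}")
```

Each pattern describes the form of one token, and the analyzer groups characters into lexemes by trying the patterns at the current position.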

Languages can be formally defined in two distinct ways:

◦ By recognition

◦ By generation

Suppose we have a language L that uses an alphabet ∑ of characters. To define L formally using the recognition method, we would need to construct a mechanism R, called a recognition device, capable of reading strings of characters from ∑ and indicating whether a given input string is in L or not

If R, when fed any string of characters over ∑, accepts it only if it is in L, then R is a description of L
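As a concrete illustration, here is a small recognition device written in Python. The language is a hypothetical one chosen for this sketch: L is the set of strings made of n letters a followed by n letters b (n at least 1), over the alphabet {a, b}. The function name recognizes is also invented.

```python
def recognizes(s):
    """Recognition device R for the assumed language L = { a^n b^n : n >= 1 }."""
    n = len(s) // 2
    return len(s) >= 2 and len(s) % 2 == 0 and s == "a" * n + "b" * n

for candidate in ["ab", "aabb", "aab", "ba", ""]:
    print(repr(candidate), recognizes(candidate))
```

R answers only yes or no for each input string; it does not enumerate the strings of L.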

This might seem like a lengthy and ineffective process

In practice, the syntax analysis part of a compiler is a recognizer for the language the compiler translates

In this role, the recognizer need not test all possible strings of characters from some set to determine whether each is in the language; rather, it need only determine whether given programs are in the language

In effect, then, the syntax analyzer determines whether the given programs are syntactically correct

The syntax analyzer is also known as a parser

A language generator is a device that can be used to generate the sentences of a language

There is a close connection between formal generation and recognition devices for the same language; we will discuss it later

The formal language generation mechanisms are called grammars, and they are commonly used to describe the syntax of programming languages

In the middle to late 1950s, Noam Chomsky and John Backus, in unrelated research efforts, developed the same syntax description formalism, which became the most widely used method for describing programming language syntax

Chomsky described four classes of generative devices or grammars that define four classes of languages

Two of these grammar classes, named context-free and regular, turned out to be useful for describing the syntax of programming languages

The forms of the tokens of programming languages can be described by regular grammars

The syntax of whole programming languages, with minor exceptions, can be described by context-free grammars

Chomsky's work was later applied to programming languages

John Backus introduced a new formal notation for specifying programming language syntax

A metalanguage is a language that is used to describe another language. BNF (Backus-Naur Form) is a metalanguage for programming languages

BNF uses abstractions for syntactic structures; for example, a simple assignment statement might be represented by the abstraction <assign>:

<assign> -> <var> = <expression>

The text on the left side of the arrow is the abstraction being defined

The text on the right side of the arrow is the definition of the abstraction

The right side consists of a mixture of tokens, lexemes and references to other abstractions

Altogether, the definition is called a rule or production

An example sentence described by this rule is: total = subtotal1 + subtotal2

The abstractions in a BNF description, or grammar, are often called non-terminal symbols, and the lexemes and tokens of the rules are called terminal symbols

A BNF description or grammar is a collection of rules

1. S -> AB
2. S -> ASB
3. A -> a
4. B -> b

The sentences of the language are generated through a sequence of application of the rules, beginning with a special non-terminal of the grammar called start symbol

This sequence of rule applications is called a derivation

1. S -> AB
2. S -> ASB
3. A -> a
4. B -> b

Using alternation, the same rules can be written as:

S -> AB | ASB
A -> a
B -> b
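A derivation with these four rules can be traced mechanically. The sketch below, in Python, applies a chosen sequence of rule numbers, each to the leftmost non-terminal, and records every sentential form. The function name leftmost_derivation and the convention that upper-case letters are non-terminals are assumptions of this example.

```python
RULES = {
    1: ("S", "AB"),
    2: ("S", "ASB"),
    3: ("A", "a"),
    4: ("B", "b"),
}

def leftmost_derivation(rule_numbers, start="S"):
    """Apply the numbered rules in order, each to the leftmost non-terminal.

    Returns every sentential form produced, beginning with the start symbol.
    """
    form = start
    forms = [form]
    for number in rule_numbers:
        lhs, rhs = RULES[number]
        # Position of the leftmost non-terminal (upper-case letter) in the form.
        index = next(i for i, symbol in enumerate(form) if symbol.isupper())
        if form[index] != lhs:
            raise ValueError(f"rule {number} does not apply to {form!r}")
        form = form[:index] + rhs + form[index + 1:]
        forms.append(form)
    return forms

print(" => ".join(leftmost_derivation([2, 3, 1, 3, 4, 4])))
```

The rule sequence 2, 3, 1, 3, 4, 4 derives the sentence aabb; the last form contains only terminals.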

Solve: S -> 01S | 0S1 | S01 | 10S | 1S0 | S10 |

The derivation of a program in this language is as follows:

The derivation begins with the start symbol <program>

Each successive string in the sequence is derived from the previous string by replacing one of the non-terminals with one of that non-terminal's definitions

Each of the strings in the derivation, including <program>, is called a sentential form

A sentential form consisting of only terminals, or lexemes, is a generated sentence

By choosing alternative RHSs of rules in the derivation, different sentences in the language can be generated

By exhaustively choosing all combinations of choices, the entire language can be generated

This language, like most others, is infinite, so one cannot generate all the sentences in the language in finite time
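Exhaustive generation can still be carried out up to a length bound. The following Python sketch uses the small grammar S -> AB | ASB, A -> a, B -> b from earlier in the lecture as a stand-in and explores all rule choices breadth first. The function name sentences_up_to and the bound are assumptions of this example.

```python
from collections import deque

GRAMMAR = {"S": ["AB", "ASB"], "A": ["a"], "B": ["b"]}

def sentences_up_to(max_length, start="S"):
    """Generate every sentence of at most max_length symbols, breadth first."""
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal; a form without one is a sentence.
        index = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if index is None:
            found.add(form)
            continue
        for rhs in GRAMMAR[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            # No rule in this grammar shrinks a form, so longer forms can be pruned.
            if len(new_form) <= max_length and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(found, key=len)

print(sentences_up_to(6))
```

Raising the bound always yields further sentences, which is another way of seeing that the language is infinite.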

Let's look at another example

The grammar describes assignment statements whose right sides are arithmetic expressions with multiplication and addition operators and parentheses, for example A = B * (A + C)
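The grammar itself is not reproduced in these notes, so the following Python sketch assumes a typical textbook grammar for such statements; the rules are listed in the docstring and should be read as an assumption, not as the grammar from the lecture slide. The function is_assignment is a hypothetical recursive-descent recognizer with one parsing function per abstraction.

```python
def is_assignment(text):
    """Recognize sentences of the assumed grammar:
        <assign> -> <id> = <expr>
        <id>     -> A | B | C
        <expr>   -> <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
    """
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at token {pos}")
        pos += 1

    def parse_id():
        nonlocal pos
        if peek() not in ("A", "B", "C"):
            raise SyntaxError(f"expected an identifier at token {pos}")
        pos += 1

    def parse_expr():
        if peek() == "(":
            expect("(")
            parse_expr()
            expect(")")
        else:
            parse_id()
            if peek() in ("+", "*"):
                expect(peek())   # consume the operator
                parse_expr()

    try:
        parse_id()
        expect("=")
        parse_expr()
        return pos == len(tokens)   # all input must be consumed
    except SyntaxError:
        return False

print(is_assignment("A = B * (A + C)"))
print(is_assignment("A = B + * C"))
```

Under the assumed rules a parenthesized expression cannot be followed by an operator, so a string such as A = (A + C) * B is rejected; a different grammar for the same statements could accept it.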

One of the most attractive features of grammars is that they naturally describe the hierarchical syntactic structure of the sentences of the languages they define

These hierarchical structures are called parse trees

Parse tree for the previous example

Every internal node of a parse tree is labeled with a non-terminal symbol

Every leaf is labeled with a terminal symbol

Every sub-tree of a parse tree describes one instance of an abstraction in the sentence

A grammar that generates a sentential form for which there are two or more distinct parse trees is said to be ambiguous

The grammar

S → A00
A → ε | AA | 0 | 1

is ambiguous because the sentence 00 has two distinct parse trees:

    S
  / | \
 A  0  0
 |
 ε

      S
    / | \
   A  0  0
  / \
 A   A
 |   |
 ε   ε

The grammar of the given example is ambiguous because the sentence A=B+C*A has two distinct parse trees

Syntactic ambiguity of language structures is a problem because compilers often base the semantics of those structures on their syntactic form

Specifically, the compiler chooses the code to be generated for a statement by examining its parse tree. If a language structure has more than one parse tree, then the meaning of the structure cannot be determined uniquely
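The effect of ambiguity on meaning can be made concrete. The Python sketch below assumes a typical ambiguous expression grammar (shown in the docstring; it is an assumption, since the grammar from the slide is not reproduced here), enumerates every parse tree for the right side B + C * A, and evaluates each tree. The function names and the sample values are invented for this example.

```python
def parse_trees(tokens):
    """Return every parse tree of tokens under the assumed ambiguous grammar
        <expr> -> <expr> + <expr> | <expr> * <expr> | <id>
    Trees are nested tuples of the form (left, operator, right).
    """
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Try each operator position as the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees

def evaluate(tree, values):
    """Compute the value implied by one parse tree."""
    if isinstance(tree, str):
        return values[tree]
    left, op, right = tree
    a, b = evaluate(left, values), evaluate(right, values)
    return a + b if op == "+" else a * b

values = {"A": 4, "B": 2, "C": 3}
for tree in parse_trees(["B", "+", "C", "*", "A"]):
    print(tree, "=>", evaluate(tree, values))
```

The two trees correspond to B + (C * A) and (B + C) * A, which give different results for the same sentence.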

There are several other characteristics of a grammar that are sometimes useful in determining whether a grammar is ambiguous. They include the following:

◦ If the grammar generates a sentence with more than one left-most derivation

◦ If the grammar generates a sentence with more than one right-most derivation

When such an ambiguous situation arises, the parser uses non-grammatical information provided by the designer to construct the correct parse tree

For further reading, see http://www.cs.utsa.edu/~wagner/CS3723/grammar/grammars2.html
