    CSCI 5535 Course Project -- A Report On Interpreted Programming Languages

    By Xiaoli Zhang & Helen Wong Dec. 11, 1996 1

    CSCI 5535 Project

    A Report on

    Interpreted Programming languages

    by

    Xiaoli Zhang

    Helen Wong

    December 11, 1996

    Content

    1. Introduction
    2. Two important Languages in the Evolution of Interpreted Languages
       Pascal
       Smalltalk
    3. Interpreter and Virtual Machine
       Traditional Compilation Process
       Self Compilation
       Compiler and Interpreter
       Intermediate Language
       Just-in-time and On-the-fly
       Virtual Machine
       Examples of intermediate languages and related abstract machines
       Abstract Machine to Actual Machine
       Portability of Interpreters
    4. Scripting Languages and Interpreted Languages
    5. Case Study
       a) Java
          Overview
          Java Virtual Machine
          Java Language Construct and Java's Interpreter
       b) Tcl/Tk
          Overview
          On-the-fly Bytecode compiler for Tcl
       c) Both are Web Programming Languages
       d) Another Mobile Language: Omniware
    6. Why Interpreted Languages?
       Portability
       Security
       Reusability
       Rapid Development
       Performance
       Other Advantages of Interpreted Languages
    7. Summary

    1. Introduction

    Interpreted languages have become more and more popular. In recent years, interpreted languages
    such as Java, Tcl/Tk and Perl have become hot topics and are in widespread use. Why? Generally, it
    is because they are portable, easy to use, fast to develop in, and safe. In addition, most
    interpreted languages are closely related to Web programming. In this paper, we study the nature
    of interpreted programming languages and how these features are achieved.

    2. Two important Languages in the Evolution of Interpreted Languages

    Pascal

    Pascal, developed by Niklaus Wirth, is one of the early interpreted languages. The original,
    non-interpreted Pascal was designed and implemented in 1967. The first Pascal compiler was
    implemented for the CDC 6000 computer family and was written in Pascal itself. In implementing
    the Pascal compiler, Wirth found that the effort needed to generate good code is proportional to
    the mismatch between language and machine, and the CDC 6000 had certainly not been designed with
    high-level languages in mind. [1]

    What's more, once Pascal became well known, many people asked Wirth for assistance in
    implementing it on various other machines. Most of them wanted to use Pascal for teaching. They
    liked Pascal for its simplicity and the elegance of its implementation, and did not care much
    about performance.

    Thereupon, Wirth decided to provide a version of the compiler that would generate code for
    machines of different designs. This code later became known as P-code. P-code is an abstract
    machine code whose target is a virtual machine called the P-machine. As an intermediate language,
    P-code is interpreted by a program that emulates the virtual machine on a real machine. The
    P-code version of Pascal was easy to construct because the new compiler was developed as a
    substantial exercise in structured programming by stepwise refinement, and therefore the first
    few refinement steps could be adopted unchanged. It also proved very successful in spreading the
    language among many users on different machines. Wirth later regretted that he had not possessed
    the wisdom to foresee the dimensions of this movement; otherwise, he would have put more effort
    into designing and documenting P-code. [1]

    Pascal's P-code and its virtual machine elaborated the existing concepts of intermediate language
    and virtual machine, and are therefore very important in the evolution of interpreted languages.
    P-code has by now almost become a household word in the area of programming languages. Built on
    the virtual machine, the Pascal-P system grew into an environment with an integrated compiler,
    filter, editor, and debugger, which helped Pascal spread even further.

    As mentioned above, Pascal-P is both compiled and interpreted. It has both a compiler, such as
    pcom, and an interpreter, such as pint. [4] Execution takes place in two phases: first the
    compiler translates the source code into P-code, and then the interpreter interprets the P-code.
    This implementation uses self-compilation: the compiler is written in its own source language and
    can compile itself. This approach is a common combination of elementary methods and is called
    bootstrapping, which is also very helpful in software migration. In Pascal-P, the resulting
    compiler is written in the virtual machine language, P-code, and generates code for this same
    machine. Hence the compiler itself must be interpreted. [2] A similar story occurs in Java's
    implementation.

    Smalltalk

    Another significant interpreted language is Smalltalk, which was developed during the 1970s at
    Xerox PARC (Palo Alto Research Center). It was the first language to really exploit a graphical
    user interface, and many of the ideas for the Macintosh came from Smalltalk. Smalltalk is more of
    an environment than a language, because the entire operation of the Smalltalk environment and
    language is built on the Smalltalk virtual machine. [6]

    We call Smalltalk an interpreted language here solely because Smalltalk, like Pascal-P, runs on a
    virtual machine. What actually happens as a result of a message send in Smalltalk is:

    - first, the system checks whether the method has already been translated to machine code that
      has been cached in memory
    - if the native machine code form is in the cache, the system executes that machine code
    - if the cache does not contain a translated form of the method, the system dynamically compiles
      the method's bytecode [5]
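
    The three steps above can be sketched as a small lookup-then-translate routine. This is only an
    illustration in Python, not Smalltalk's actual implementation: the class name DynamicTranslator,
    the two-instruction bytecode format, and the use of a Python closure as a stand-in for native
    machine code are all assumptions made for the example.

```python
class DynamicTranslator:
    """Toy model of dynamic translation with an in-memory code cache."""

    def __init__(self):
        self.cache = {}         # method name -> translated ("native") form
        self.translations = 0   # how many times a method had to be translated

    def translate(self, bytecode):
        """Stand-in for a native code generator (hypothetical):
        turns a list of (op, arg) pairs into a single callable."""
        self.translations += 1

        def native(x):
            for op, arg in bytecode:
                if op == "add":
                    x += arg
                elif op == "mul":
                    x *= arg
                else:
                    raise ValueError(f"unknown op {op!r}")
            return x

        return native

    def send(self, name, bytecode, x):
        native = self.cache.get(name)            # step 1: look in the cache
        if native is None:                       # step 3: miss, so translate
            native = self.translate(bytecode)
            self.cache[name] = native
        return native(x)                         # step 2: run the cached form


translator = DynamicTranslator()
method = [("add", 1), ("mul", 2)]
first = translator.send("f", method, 3)    # translated on first use
second = translator.send("f", method, 4)   # served from the cache
```

    The second send reuses the cached form, so the translation cost is paid once per method rather
    than once per call.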

    Dynamic translation yields the benefits of the execution speed of compiled code and the space

    compactness of bytecode. If all the code in a running Smalltalk image were kept purely in the

    form of compiled machine code, the image would consume 5-10 times as much memory, and

    therefore could in fact degrade performance on a virtual memory system by causing increased

    paging. [7]

    Many features of Smalltalk are worth borrowing by newer interpreted languages such as Java. One
    of these features is the just-in-time compilation mentioned above. Currently, Java
    implementations are adopting just-in-time compilation of bytecode into native code to improve
    performance.

    In the next section, on interpreters and virtual machines, we touch on some less well-known
    interpreted languages. We then cover the popular interpreted languages Java and Tcl in detail as
    case studies, and discuss Perl as a scripting language.

    3. Interpreter and Virtual Machine

    The last section mentioned quite a few terms related to interpreted languages. In this section,
    we describe in detail the concepts behind these terms. Let's first step back a little.

    Traditional Compilation Process

    The process of translating a high-level language into machine code that the hardware can
    understand is carried out by the compiler. The task of the compiler has two subtasks: analysis of
    the source program and synthesis of the object program. Typically, as in Figure 1, the analysis
    task consists of three subphases: lexical analysis, syntax analysis and semantic analysis. The
    synthesis task is usually a single phase: code generation.

    Figure 1. The traditional compilation process: source program, lexical analyzer, syntax analyzer
    and semantic analyzer (front end), code generator (back end), object program.

    The lexical analyzer is responsible for reading the characters of the source program, recognizing
    the basic syntactic components, or tokens, that they represent, and returning the tokens to the
    syntax analyzer, or parser. The parser then determines how to group and structure the tokens
    according to the syntax rules of the language. The output of the parser is a representation of
    the syntactic structure of the source program, often expressed in the form of a parse tree. The
    parse tree is then passed to the semantic analyzer, which determines the meaning of the source
    program, including the meaning of declarations and scopes of identifiers, storage allocation,
    type checking, selection of appropriate polymorphic operators, insertion of automatic type
    conversions, etc.
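
    As a rough illustration of the first two analysis phases, the sketch below tokenizes and parses
    integer arithmetic. The grammar, the function names tokenize and parse, and the representation
    of the parse tree as nested (operator, left, right) tuples are assumptions chosen for the
    example; a real front end handles a full language and adds semantic analysis.

```python
import re

# One token per match: an integer, or any single non-space character.
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")


def tokenize(text):
    """Lexical analysis: characters -> list of (kind, value) tokens."""
    tokens = []
    for number, char in TOKEN.findall(text):
        if number:
            tokens.append(("num", int(number)))
        elif char.strip():                  # ignore stray whitespace
            tokens.append(("op", char))
    return tokens


def parse(tokens):
    """Syntax analysis: tokens -> parse tree of nested (op, left, right) tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("end", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():                           # factor := number | "(" expr ")"
        kind, value = advance()
        if kind == "num":
            return value
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():                             # term := factor (("*" | "/") factor)*
        node = factor()
        while peek() in (("op", "*"), ("op", "/")):
            node = (advance()[1], node, factor())
        return node

    def expr():                             # expr := term (("+" | "-") term)*
        node = term()
        while peek() in (("op", "+"), ("op", "-")):
            node = (advance()[1], node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


tokens = tokenize("1 + 2 * 3")
tree = parse(tokens)
```

    The grouping of 2 * 3 beneath the + node is where the syntax rules (here, operator precedence)
    show up in the structure of the tree.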

    The code generator, in the last phase of the compilation process, takes the output of the
    semantic analyzer as input and generates machine code or assembly language for the target
    hardware. To generate object code for a machine, it has to know the machine architecture,
    including the machine instructions, the allocation of machine registers, addressing, interfacing
    with the operating system, and so on.

    The analysis phase, or front end, is language-dependent, since the analyzers have to know the
    syntactic and semantic rules of the language. The synthesis phase, or back end, is
    machine-dependent.

    The code generator usually includes some form of code optimizer to produce faster or more
    compact code. Code generation may include both machine-dependent and machine-independent
    techniques. [27]

    Self Compilation

    While a compiler translates a high-level language into machine code or object code, most
    compilers are themselves software written in high-level languages, and some are written in the
    very source language they are supposed to compile. How can this self-compilation be achieved? It
    is done by a process called bootstrapping. We take Pascal as an example to illustrate the
    bootstrapping process. Refer to Table 1, and suppose there are machines X, Y and Z. Any two of
    them could be the same or different.

    Step   Source code of Pascal compiler    Compiler used                              Object code
           (written in, target machine)      (written in, running on)                   produced
    (1)    { Modula-2, for Y }               [ X's assembly language, running on X ]    X's object code
    (2)    { Pascal, for Z }                 [ Modula-2, running on X ]                 Y's object code
    (3)    { Pascal, for any machine }       [ Pascal, running on Y ]                   Z's object code

    Table 1. Bootstrapping

    In compilation process (1), the source code of a Pascal compiler for Y was written in Modula-2.
    It was compiled by a Modula-2 compiler written in X's assembly language and was translated into
    X's object code. Once this new compiler existed, the source code of a Pascal compiler for Z,
    written in Pascal, could be passed to it and translated into Y's object code, as in (2). Further,
    the source code of another Pascal compiler, written in Pascal for an arbitrary machine, could be
    passed to the newest compiler and translated into Z's object code, as in (3). Note that a
    compiler compiles its input source code into object code for its target machine, while the
    compiler itself is object code for the machine on which it runs. That machine does not have to
    be its target machine. [27]
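
    The chain of three compilations can be traced with a small model. The sketch below is only a
    bookkeeping aid under stated assumptions: the Compiler record, its field names, and the build
    function are hypothetical, and the Modula-2 compiler written in X's assembly language is
    modelled as already assembled into X's object code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    accepts: str   # source language this compiler translates
    form: str      # what the compiler itself is currently expressed in
    target: str    # machine it generates object code for


def build(source, using):
    """Compile the compiler `source` with the running compiler `using`.

    The result accepts the same language and has the same target as `source`,
    but now exists as object code of the machine that `using` targets.
    """
    if source.form != using.accepts:
        raise ValueError(f"{using.accepts} compiler cannot read {source.form}")
    return Compiler(source.accepts, f"{using.target} object code", source.target)


# Starting point: a Modula-2 compiler that runs on X and targets X.
modula2_on_x = Compiler("Modula-2", "X object code", "X")

step1 = build(Compiler("Pascal", "Modula-2", "Y"), modula2_on_x)   # row (1)
step2 = build(Compiler("Pascal", "Pascal", "Z"), step1)            # row (2)
step3 = build(Compiler("Pascal", "Pascal", "any"), step2)          # row (3)
```

    Each step's form field reproduces the last column of Table 1, and it differs from the step's
    target field, which is the point of the closing note above: the machine a compiler runs on need
    not be the machine it generates code for.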

    Compiler and Interpreter

    We use the term interpreted languages for those languages whose compilation process involves an
    interpreter. So, what is an interpreter by definition?

    A translator takes a program written in a source language as input and translates it into a
    program that has the same meaning but is written in an object language. If the source language is
    a higher-level one, the translator is a compiler. Generally, a compiler generates machine code or
    abstract machine code from source code.

    An interpreter directly executes its source language, without first translating it into an
    object language. Some Lisp or APL implementations could be considered pure interpreters. But many
    language implementations consist of both a compiler and an interpreter. The compiler translates
    the source language into an interpretable intermediate language, which in this case is the source
    language for the interpreter. [2]

    With an intermediate language and an interpreter, the compilation process becomes more
    sophisticated, typically as in Figure 2. The semantic analysis phase is often followed by another
    process that takes the parse tree from the syntax analyzer and produces a linear sequence of
    instructions equivalent to the original source program. [27] This sequence of instructions can be
    considered abstract machine code, since it targets not an actual machine but an abstraction of
    real machines. This abstraction is often called an abstract machine or virtual machine.
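
    A minimal sketch of this two-phase arrangement follows. It assumes a parse tree given as nested
    (operator, left, right) tuples with integer leaves, and a made-up stack instruction set of
    ("push", n) plus one instruction per operator; the names generate and interpret are
    illustrative and are not taken from any real system.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def generate(tree):
    """Compiler phase: flatten a parse tree into a linear instruction sequence."""
    if isinstance(tree, int):
        return [("push", tree)]
    op, left, right = tree
    # Post-order walk: operands first, then the operator that consumes them.
    return generate(left) + generate(right) + [(op, None)]


def interpret(code):
    """Interpreter phase: execute the abstract machine code on an operand stack."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[op](left, right))
    return stack.pop()


tree = ("+", 1, ("*", 2, 3))      # parse tree for 1 + 2 * 3
code = generate(tree)
value = interpret(code)
```

    The list held in code plays the role of the intermediate language: it is the output of the first
    phase and the input of the second, and nothing in it refers to a real machine.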

    Intermediate Language

    The intermediate language, which occurs between two phases of a language translation process, is
    the object code of the first phase and the source language of the second phase. It is very
    important for modern interpreted languages such as Java and plays a great role in a language's
    portability and security.

    Figure 2. Compilation with an intermediate language. The front end (lexical analyzer, syntax
    analyzer, semantic analyzer, abstract machine code generator) translates the source program into
    abstract machine code, the intermediate language. In the back end, the abstract machine code is
    either executed by an interpreter or translated on the fly by a code generator into native
    machine code held in memory.

    With an intermediate language, the problems arising from the characteristics of the target
    hardware can be confined to the code generator, so the same compiler front end can be combined
    with different code generators for different machines. The compiler can also be ported easily to
    a different machine, for example by bootstrapping, since now only the code generator needs to be
    ported. If we replace the code generator in the back end with an interpreter, implementing the
    language on new hardware becomes easier still, since an interpreter is much easier to implement
    than a code generator. We will see a typical case in Java's implementation (section 5. a)).

    A minor disadvantage of an intermediate language is that it is sometimes somewhat harder to
    generate optimized machine code from the intermediate language than directly from the parse
    tree. [27]

    An intermediate language is usually designed for a particular source language. It reflects the
    constructs, data types and operators of the source language in its basic operations. For example,
    the P-machine is a hypothetical stack-based virtual machine with a very simple structure, and
    P-code contains many instructions closely related to Pascal language constructs. Designing an
    intermediate code for several different source languages, by contrast, is hard. Such a generic
    intermediate language, called UNCOL (UNiversal Computer-Oriented Language), was proposed as early
    as the 1950s but was never fully developed because of practical difficulties. Recently, people
    have again been thinking about generic intermediate languages, and some efforts are under way. We
    will address this in the next section, on the Virtual Machine.

    Intermediate languages represent internal interfaces in the compilation process, and
    consequently they can take any suitable form: trees, triples, quadruples, assembly languages,
    bytecode, etc. Pascal's P-code is a famous intermediate language and takes the form of an
    assembly language. [2]

    Just-in-time and On-the-fly

    As in Figure 2, an intermediate language can be interpreted directly by an interpreter, or it can
    be compiled again by a native compiler or code generator into native machine code. This native
    compilation is often just-in-time compilation, meaning that the native compiler rewrites
    compute-intensive sections into native machine code at run time as necessary, and the native
    machine code is not stored in the disk file system but is held directly in memory. So the
    compilation of an intermediate language into native machine code is often an on-the-fly
    compilation, which by definition means that the output of the compiler does not exist in the disk
    file system but is loaded into memory portion by portion. The input of an on-the-fly compilation
    can be either high-level source code or intermediate code, and the output can be either
    intermediate code or native machine code. When the output is an intermediate code such as
    bytecode, an interpreter is needed to interpret the bytecode, which is usually cached in memory.
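
    Python's own implementation offers a convenient way to observe the source-to-bytecode variant of
    on-the-fly compilation. The sketch below uses the built-in compile and exec functions; the
    helper name run_on_the_fly and the sample program are assumptions made for the example, and it
    illustrates only the case where the output is bytecode, not native machine code.

```python
def run_on_the_fly(source, variables):
    """Compile source text to an in-memory bytecode object, then interpret it."""
    # compile() returns a code object; nothing is written to the file system.
    code = compile(source, "<in-memory>", "exec")
    namespace = dict(variables)
    # exec() hands the code object to the Python virtual machine for interpretation.
    exec(code, namespace)
    return code, namespace


code, result = run_on_the_fly("total = sum(i * i for i in range(n))", {"n": 4})
total = result["total"]
```

    The code object exists only in memory for the lifetime of the program, which matches the
    definition of on-the-fly compilation given above.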

    Virtual Machine

    As mentioned above, a virtual machine is an abstraction of a family of real machines. More
    precisely, a virtual machine is the fictitious target machine of an intermediate language. It
    specifies a somewhat idealized machine chosen for convenience: either one for which it is easy to
    write a simple-minded compiler, or one that is close to most real machines.

    Most computers now have a set of general-purpose registers. Usually, operations take one of
    their operands from a register and the other from memory. In most cases, only some of the
    registers can be used for addressing. [2] A register-based virtual machine is therefore closer to
    most real machines, and emulating it on a real machine needs fewer native machine instructions
    and thus has less overhead.

    Most virtual machines today, however, are stack-based. A stack machine has few actual registers,
    but has an operand stack where operations find their operands and put their results. The
    advantage of stack machines is that they can be totally independent of the computer. [2] The
    compilation process is also relatively simple for a stack machine.

    Again, the usefulness of a virtual machine stems from the fact that it allows the majority of the
    compilation process to be isolated from dependency on a specific machine.

    Examples of intermediate languages and related abstract machines [2]

    - P-code and related abstract machine: The abstract machine associated with Pascal-P is very
      conventional and flexible. It is a stack machine with five registers: top of stack, base of
      global variables, top of heap (used for dynamic variables), base of local variables, and
      instruction counter.
    - EM-1 intermediate language and related machine: EM-1 is a more sophisticated language than
      P-code and is closer to an actual assembly language. It contains 130 instructions, whereas
      P-code has only 60, and it also contains a dozen pseudo-instructions. The EM-1 machine is very
      similar to the P-code machine, with a stack of local variable areas whose top is used as an
      execution stack, a heap, a global variable area, and a program area.
    - Janus and its abstract machine: The abstract machine associated with the intermediate language
      Janus has a memory that is divided into several independent areas organized as tree
      structures. It uses a stack for expression evaluation, a processing unit to execute Janus
      instructions, and three specialized registers: condition code, instruction counter, and index
      register.

    We will address two other important virtual machines, the Java Virtual Machine and Omniware, in
    the Case Study section of this paper.

    Abstract Machine to Actual Machine

    To obtain an actual implementation, the abstract machine must be mapped onto an actual machine.
    Generally, it is the interpreter that executes the abstract machine's instruction set on the
    actual machine and thereby gives the abstract machine an actual implementation. The two general
    ways of implementing an interpreter are:

    - If the intermediate language resembles an assembly language, like Pascal's P-code, the basic
      operations of the abstract machine can be implemented using a macroprocessor. But unless the
      macroprocessor is a very powerful one, the resulting code is usually rather inefficient.
    - An interpreter can be programmed or microprogrammed, which amounts to direct execution of the
      abstract machine's instruction set, as in Java's interpreter. [2]
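
    The macroprocessor approach, and the inefficiency it tends to produce, can be illustrated with
    a few lines of template expansion. Everything concrete here is hypothetical: the three abstract
    instructions, the register-machine mnemonics in TEMPLATES, and the function name expand were
    invented for the example and do not describe any real P-code implementation.

```python
# Each abstract stack instruction expands to a fixed sequence of instructions
# for an imaginary register machine (mnemonics are invented for illustration).
TEMPLATES = {
    "ldc": ["LOAD R0, #{arg}", "PUSH R0"],
    "add": ["POP R1", "POP R0", "ADD R0, R1", "PUSH R0"],
    "mul": ["POP R1", "POP R0", "MUL R0, R1", "PUSH R0"],
}


def expand(program):
    """Macro-expand a list of (op, arg) abstract instructions, one at a time."""
    output = []
    for op, arg in program:
        for line in TEMPLATES[op]:
            output.append(line.format(arg=arg))
    return output


target_code = expand([("ldc", 2), ("ldc", 3), ("add", None)])
```

    Because each instruction is expanded in isolation, the output contains a PUSH R0 that is
    immediately undone by a POP; a simple macroprocessor has no view across instruction boundaries
    to remove such pairs.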

    For performance reasons, an interpreter is not the only route from abstract machine to actual
    machine. The intermediate code can also be recompiled, usually on the fly, into the target
    machine's native code.

    Portability of Interpreters

    The functionality of the front end of a language's compilation process is identical for
    different target machines. With the self-compilation technique, the implementation of the front
    end can be made totally portable across platforms, as with Java's compiler javac. The
    non-portable part of a language implementation is the interpreter or code generator.

    Like a code generator, an interpreter has to make use of the operating system facilities of the
    target machine: performing input and output, using graphics or window systems, making storage
    allocation requests, etc. That is, an interpreter has to deal with the run-time library, which
    is a convenient way of providing an interface between the compiled (abstract) machine code and
    the operating system. It includes a set of routines that the compiled code can call to perform
    all the machine-dependent and operating-system-dependent functions required by the user's
    high-level language program. It is possible to write part of the run-time library in a
    high-level language, as Java's API is written in Java. But at least part of the run-time library
    has to be written in a low-level language to make use of particular machine and operating system
    facilities.

    4. Scripting Languages and Interpreted Languages

    Scripting languages are good at providing variables, flow control and procedures around
    commands, and at serving as glue between commands.

    Scripting languages are all interpreted. UNIX shell languages are simple scripting languages;
    they are interpreted directly, with no intermediate language or virtual machine involved. The
    interpreter for a UNIX shell language is just a single executable, such as /usr/local/bin/sh for
    the Bourne shell or /usr/local/bin/ksh for the Korn shell.

    An interpreter that interprets a high-level language directly has to include the lexical and
    syntax analysis phases of the compilation front end. But most directly interpreted high-level
    languages, such as the UNIX shell programming languages, are simple enough that their
    interpreters can still be kept simple.

    Modern scripting languages such as Perl, however, are no longer interpreted directly. The design
    goal of Perl is a scripting language that is easy to develop in and portable, so the
    implementation of Perl is two-phased: Perl is both compiled and interpreted. It is compiled
    because the program is completely read and parsed before the first statement is executed. It is
    interpreted because there is no object code sitting around filling up disk space. In a way, it is
    the best of both worlds: a typical on-the-fly compiler-interpreter process.

    The compilation does take time. It is inefficient to have a voluminous Perl program that does
    one small quick task and then exits, because the run time of the program will be dwarfed by the
    compile time. For heavy tasks, such as those with a large loop body, the approach is more
    efficient, because compilation saves the time of reparsing. That is why Tcl, another directly
    interpreted scripting language, is switching to the same on-the-fly bytecode
    compiler-interpreter style as Perl. To take further advantage of this style, both Perl and Tcl
    cache the compiled object code between invocations. We will address Tcl's on-the-fly bytecode
    compilation in the Tcl case study.

    The on-the-fly compilation of Tcl or Perl differs from Java's on-the-fly compilation, since it is
    on-the-fly bytecode compilation. The whole compiler-interpreter process happens at run time; the
    target of the on-the-fly compilation is bytecode, which is cached in memory and then interpreted
    dynamically.

    While all scripting languages are interpreted, not all interpreted languages are scripting
    languages. An example is the popular page description language PostScript. PostScript is
    typically interpreted and is stack-based. Being stack-based makes PostScript source code natural
    to interpret and portable. It also makes PostScript device independent, meaning that the image
    is described without reference to any specific device features. So PostScript files can be
    transferred in source form from machine to machine, even by email in ASCII form, and then be
    interpreted without any modification by interpreters such as ghostview and those built into
    printers.
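
    To show why a stack-based notation is natural to interpret directly, here is a sketch of an
    interpreter for a tiny subset of PostScript-style postfix arithmetic. It handles only integer
    literals and the operators add, sub, mul, dup, exch and pop; the function name run_postfix is an
    assumption, and none of PostScript's graphics operators, dictionaries or procedures are
    modelled.

```python
def run_postfix(source):
    """Interpret whitespace-separated postfix source text; return the final stack."""
    binary = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
    }
    stack = []
    for word in source.split():
        if word in binary:
            b = stack.pop()              # top of stack is the second operand
            a = stack.pop()
            stack.append(binary[word](a, b))
        elif word == "dup":              # duplicate the top element
            stack.append(stack[-1])
        elif word == "exch":             # exchange the top two elements
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif word == "pop":              # discard the top element
            stack.pop()
        else:
            stack.append(int(word))      # anything else is an integer literal
    return stack


result = run_postfix("3 4 add 2 mul")
```

    No parser is needed: each word is executed as soon as it is read, and the source text is plain
    ASCII that any machine with such an interpreter can run unchanged.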

    5. Case Study

    a) Java

    Overview

    Java is a simple object-oriented language that is familiar to its users. That is because Java
    adopts a syntax very similar to that of C and C++, while being a cleaned-up version of C++. It
    supports garbage collection and removes a number of features that make C and C++ complex, such
    as pointers, automatic coercions, operator overloading and multiple inheritance.

    Other important reasons for Java's success are its Internet-related features, including the
    following:

    - Dynamic: In Java, classes are linked only as needed. New code modules can be linked in on
      demand from a variety of sources, even from sources across a network. Instead of simply
      downloading static pages of text and images, a web browser can download Java applets and run
      them on the client machine. This supports image animation and real-time interaction between
      user and program.
    - Threaded: Modern network-based applications, such as the HotJava Web browser, typically need
      to do several things at the same time. A user can run several animations concurrently while
      downloading an image and scrolling the page. Java's multithreading capability provides the
      means to support this. [11]

    The reason Java is a popular mobile language is that it is architecture neutral and portable. To
    accommodate the diversity of operating environments, the Java compiler generates bytecode, an
    architecture-neutral intermediate format designed to transport code efficiently to multiple
    hardware and software platforms. The interpreted nature of Java solves both the binary
    distribution problem and the version problem; the same Java bytecode will run on any platform.

    Java's portability also relies on its fixed definitions of the basic data types and of the
    behavior of its arithmetic operators, which make programs behave the same on every platform.
    There are no data type incompatibilities across hardware and software architectures.

    The self-compilation feature of Java is another factor that makes Java more portable. Java's
    compiler is written in Java and exists as Java bytecode. Furthermore, the Java API and the
    HotJava browser also exist as bytecode. In the end, only the interpreter in the Java system is
    left dependent on the run-time system.

    Java Virtual Machine

    The architecture-neutral and portable platform of Java is the Java Virtual Machine. It is the
    specification of an abstract machine for which a Java compiler can generate code. Specific
    implementations of the Java Virtual Machine for specific hardware and software platforms then
    provide the concrete realization of the virtual machine. The Java Virtual Machine is based
    primarily on the POSIX interface specification, an industry-standard definition of a portable
    system interface. Implementing the Java Virtual Machine on new architectures is a relatively
    straightforward task as long as the target platform meets basic requirements such as support for
    multithreading. [11]

    The Java VM has been called a "soft CPU". It is a stack-based machine. The JVM supports about
    248 bytecodes, each of which performs a basic CPU operation such as adding an integer to a
    register, combining the numbers in two registers, jumping to a subroutine, storing a result, or
    incrementing or decrementing a register. In effect, the JVM is a stack-based arithmetic logic
    unit with local and global variables.

    To add two numbers, the VM works as follows: it first pushes them onto its stack and then adds
    them. After completing the addition, the VM leaves the result on the stack for the next step in
    the process. Emulating this on a real machine, most probably a register-based one, takes quite a
    few real machine instructions and memory references. So there is overhead in mapping the
    stack-based VM onto a register-based real machine. We address this further under Performance in
    section 6 of this paper.
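
    The push, push, add pattern can be observed in any stack-based virtual machine. As an analogy
    only, the sketch below uses Python's standard dis module to list the bytecode that CPython,
    itself a stack-based VM, generates for a two-operand addition. This is CPython bytecode, not
    JVM bytecode, and the exact instruction names vary between CPython versions; the helper name
    opnames is an assumption made for the example.

```python
import dis


def add(a, b):
    return a + b


def opnames(function):
    """Return the names of the bytecode instructions compiled for `function`."""
    return [instruction.opname for instruction in dis.get_instructions(function)]


names = opnames(add)
# Typically: instructions that load a and b onto the operand stack, a binary
# operation that pops both and pushes their sum, and a return of the top of stack.
```

    Printing names on a given interpreter shows the concrete instruction sequence for that version.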

    At first, the Java Virtual Machine was the target machine for the Java source language only.
    Recently, people have been trying to support other languages on top of the same Java Virtual
    Machine. According to Java's creator James Gosling, languages like Visual Basic, COBOL, Dylan
    and Scheme are fairly reasonable bets for the Java VM. So, although the JVM was not designed as
    a generic virtual machine, it is now intended to serve as one to meet existing requirements.
    [28]

    Java Language Constructs and Java's Interpreter

    A Java programmer can create:

    Applets: Programs that are included in HTML pages through the APP tag and displayed in the

    HotJava browser. The simple "hello world" program shown in "A Simple Java Program" is an applet.

    The HotJava browser is invoked by the hotjava command included in the Java code distribution.

    Applications: Stand-alone programs written in Java and executed independently of the HotJava

    browser. This is done using the Java interpreter, java, included in the Java code distribution.

    Protocol handlers: Programs that are loaded into the user's HotJava browser and interpret a

    protocol. These protocols include standard ones such as HTTP as well as programmer-defined

    protocols.

    Content handlers: Programs loaded into the user's HotJava browser that interpret files of a type

    defined by the Java programmer. The Java programmer provides the necessary code for the user's

    HotJava browser to display or interpret this special format.

    Native methods: Methods that are declared in a Java class but implemented in C. These

    native methods essentially allow a Java programmer to access C code from Java. [10]

    The JDK includes another tool, AppletViewer, for testing and running applets. AppletViewer also

    embeds the Java interpreter, java.

    A Java interpreter is built into every Java-enabled Web browser. A practical way to understand the

    technical description of Java is to look at the process that occurs when a user with a

    Java-enabled browser requests a page containing a Java applet:

    1. The user sends a request for an HTML document to the information provider's server.

    2. The HTML document is returned to the user's browser. The document contains the APP tag,

    which identifies the applet.

    3. The corresponding applet bytecode is transferred to the user's host. This bytecode was

    previously created by the Java compiler from the Java source code for that applet.

    4. The Java-enabled browser on the user's host interprets the bytecode and provides the display.

    5. The user may have further interaction with the applet but with no further downloading from

    the provider's Web server. This is because the bytecode contains all the information necessary

    to interpret the applet.

    b) Tcl/Tk

    Overview

    Tcl stands for Tool Command Language, an extensible, embeddable command language or scripting

    language. It was implemented by John Ousterhout, originally at the University of California,

    Berkeley, and now working for Sun.

    What makes Tcl different from other scripting languages is the ease with which a Tcl interpreter

    can be added to an application. A Tcl interpreter consists of a set of commands, a set of variable

    bindings and a command execution state. It is the basic unit manipulated by most of the Tcl

    library procedures. An application may have one or more interpreters, depending on its complexity,

    and multiple interpreters may be responsible for different purposes. Tcl commands may be built-in

    commands, such as the flow control keywords for, if, case and eval, or application-specific

    commands defined by users. There is no limit on the application-specific commands that developers

    and users can add. Since programmers can structure their applications using a set of primitive

    operations, any existing commands and any new commands they develop to best suit their needs,

    there is no need to invent a command language for each new application. Commands are embedded by

    creating interpreter objects inside the application through calls to library procedures, similar

    to declaring an extern function in C. It is natural for Tcl to create an interpreter inside the

    source code, since an interpreter is essentially a set of commands. This is unlike other

    languages, such as Java, where the interpreter is a separate executable whose execution costs

    memory and CPU time concurrently with interpreting the bytecode.
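
    The three parts of an interpreter named above (commands, variable bindings, execution state) can

    be sketched as a small embeddable object. This is a toy model written for this report's

    discussion, not the Tcl C library: the class TinyInterp, its register and eval methods, and the

    application-specific dial command are all hypothetical names, and the sketch does no variable

    substitution or quoting.

```python
class TinyInterp:
    """Toy embeddable interpreter: commands, variable bindings, execution state."""

    def __init__(self):
        self.commands = {}    # command name -> Python callable
        self.variables = {}   # variable bindings
        self.depth = 0        # execution state: current nesting of eval calls
        self.register("set", self._set)  # one built-in command

    def register(self, name, func):
        """Add a command; built-in and application commands look the same."""
        self.commands[name] = func

    def _set(self, interp, name, value=None):
        # "set name value" binds a variable; "set name" reads it back.
        if value is not None:
            interp.variables[name] = value
        return interp.variables[name]

    def eval(self, script):
        """Run one command per line and return the result of the last one."""
        result = ""
        self.depth += 1
        try:
            for line in script.splitlines():
                words = line.split()
                if not words:
                    continue
                name, args = words[0], words[1:]
                if name not in self.commands:
                    raise NameError(f'invalid command name "{name}"')
                result = self.commands[name](self, *args)
        finally:
            self.depth -= 1
        return result


def dial(interp, number):
    # Hypothetical application-specific command supplied by the host program.
    return f"dialing {number}"


interp = TinyInterp()
interp.register("dial", dial)
print(interp.eval("set who 555-0100\ndial 555-0100"))  # dialing 555-0100
```

    The host application and the interpreter live in the same process here, which is the property

    the paragraph above contrasts with a separately executed interpreter.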

    The aspects that set Tcl apart from other extension languages, such as Scheme, Elisp and Python,

    are: (1) Tcl has simple constructs somewhat like C, and Tcl primitives are written as C or C++

    procedures. (2) The Tcl C library provides a clean interface to native C code. (3) Most extensions

    add new functionality such as socket access for network programming, database access, telephone

    control and expected interactive features. (4) Tcl is open to development by its community. [21]

    The most notable extension of Tcl is Tk, a toolkit for the X Window System as well as for Windows

    and the Macintosh. Tk provides a convenient way for users to build Motif-based GUIs because of

    its higher-level interface to X and its rapid turnaround in development.

    Safe-Tcl is a subset of Tcl in which access to system resources is controlled. Because it is

    secure, Safe-Tcl is suited to running network agents. With the combination of Tk and Safe-Tcl, a

    web browser called TkWWW is now available for free. [12]

    On-the-fly Bytecode compiler for Tcl [19]

    Although Tcl has a number of advantages as a new scripting language, its lack of structure and its

    slowness make it unsuitable for large applications. To improve Tcl's performance, people at Sun

    Microsystems Laboratories are working on an on-the-fly bytecode compiler for Tcl. Below are some

    direct quotations from the paper "An On-the-fly Bytecode Compiler for Tcl" by Brian T. Lewis of

    that lab:

    So far Tcl is interpreted directly. Although the current Tcl interpreter is fast enough for most

    Tcl uses, there are many applications that need greater speed. The two main performance problems

    in the current Tcl system (Tcl 7.5) are script reparsing and conversions between strings and other

    data representations. The current interpreter spends as much as 50% of its time in parsing. It

    reparses the body of a loop, for example, on each iteration. Data conversions also consume a great

    deal of time. It is reported that 92% of the time in incr's command procedure Tcl_incrCmd() was

    spent converting between strings and integers.

    To solve these performance problems, a new Tcl compiler and interpreter are being developed at

    Sun Microsystems Laboratories. Their goal for the bytecode compiler is to improve the speed of

    compute-intensive Tcl scripts by a factor of 10.

    The compiler translates Tcl scripts at program runtime, or on-the-fly, into a sequence of bytecode

    instructions that are then interpreted. The compiler eliminates most runtime script parsing. It also

    makes many decisions at compile time that are now made only at runtime. It can tell, for example,

    whether a variable name refers to a scalar or an array element. It also compiles away many

    type conversions. As an example, it can recognize whether the argument string specifying the

    increment amount in an incr command represents a constant integer.

    The bytecode interpreter uses "dual-ported" objects extensively. These objects contain both a

    string and an internal representation appropriate for some data type. For example, a Tcl list is

    now represented as an object that holds the list's string representation as well as an array of

    pointers to the objects for each list element. Dual-ported objects avoid most runtime type

    conversions. They also improve the speed of many operations since an appropriate representation is

    available. The compiler itself uses dual-ported objects to cache the bytecode resulting from the

    compilation of each script.
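
    The idea of an object carrying both a string and an internal representation can be sketched as

    follows. This is a minimal illustration under our own assumptions, not the data structure used

    in the Sun Labs compiler: the class DualObject, its methods and the conversions counter are

    hypothetical names introduced only to show how cached representations avoid repeated

    string-to-integer conversion in a command like incr.

```python
class DualObject:
    """Holds a string form and a lazily cached internal (integer) form."""

    def __init__(self, string_rep):
        self.string_rep = string_rep
        self.int_rep = None
        self.conversions = 0  # counts string<->integer conversions performed

    def as_int(self):
        # Convert from the string only if no integer form is cached yet.
        if self.int_rep is None:
            self.int_rep = int(self.string_rep)
            self.conversions += 1
        return self.int_rep

    def set_int(self, value):
        # Update the internal form and mark the string form as stale.
        self.int_rep = value
        self.string_rep = None

    def as_string(self):
        # Regenerate the string form only when somebody asks for it.
        if self.string_rep is None:
            self.string_rep = str(self.int_rep)
            self.conversions += 1
        return self.string_rep


def incr(obj, amount=1):
    """Increment in place, working on the cached integer form."""
    obj.set_int(obj.as_int() + amount)


counter = DualObject("0")
for _ in range(1000):
    incr(counter)
print(counter.as_string(), counter.conversions)  # 1000 2
```

    A purely string-based variable would need two conversions per increment; in this sketch the

    whole loop needs two in total.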

    c) Both are Web Programming Languages

    As we mentioned in a) of this section, Java is an Internet-oriented language. Tcl/Tk is also

    closely related to Web programming. Sun has recently released a Tcl/Tk plug-in for Netscape

    Navigator. It allows Web pages to contain Tcl/Tk scripts and display interfaces in the browser

    window. The plug-in uses the Safe-Tcl mechanism to ensure that even untrusted scripts can be

    executed safely.

    So what is the difference between Java and Tcl/Tk?

    Tcl is a high-level scripting language. It is good for creating small and medium-sized

    applications quickly and for gluing existing things together. It has a simple syntax and almost no

    structure, which makes it good for scripting. However, at least so far, Tcl is a directly

    interpreted language, so it may not perform well for very large tasks. Think of Tcl as something

    like a UNIX shell, except that it is embeddable and portable and can be used for Internet

    scripting, including CGI implementation.

    Java, on the other hand, is a system programming language like C or C++. It is much more

    structured than Tcl, which makes it easier to build large, complex applications in Java than in

    Tcl. Java is also compiled, which results in great efficiency. Java also supports multithreading,

    whereas Tcl does not. Think of Java as something like C++, except simpler and more powerful, and

    with facilities for sending Java programs around the Internet as executable content. [20]

    Since both Java and Tcl belong to Sun and both are web programming languages, people are

    considering a marriage of Java and Tk, using Tk as the GUI-building part of Java. It is said that

    Sun has an early version of a Tcl-to-Java interface.

    d) Another Mobile Language: Omniware

    Mobile languages have become popular recently. The term denotes languages that can be easily

    ported and run widely on many nodes of a network. Since any programming language can be a web

    programming language and does not have to be portable, it is better to call Java, Perl and Tcl

    mobile languages.

    From the point of view of this report, another notable mobile language is Omniware.

    Omniware is an interpreted language with a two-phase compiler-interpreter process. It defines a

    virtual machine called OmniVM.

    The advantages of Omniware are:

    1. OmniVM is a register-based virtual machine, and thus it is closer to most real machines. So

    the translation from OmniVM to a real machine is a shorter and lighter process than from the

    Java Virtual Machine, which is stack-based.

    2. OmniVM was designed with all languages that have C/C++ constructs in mind, so it can be the

    compiler target for C/C++ and many other languages. In this sense, Omniware serves as something

    of a generic virtual machine.

    Omniware uses a technique called Software-based Fault Isolation, which adds instructions that

    check at runtime that addresses fall within the legal address space, to provide security. But as

    with many other mobile languages, access to the host's system resources still remains a big

    problem in Omniware. [12]
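
    The runtime address check described above can be illustrated with a small sketch. It is an

    assumption-laden model, not Omniware's implementation: real software-based fault isolation

    inserts machine instructions around loads and stores, whereas here a hypothetical Sandbox class

    with a base address and size simply validates each address before touching its memory.

```python
class Sandbox:
    """Toy model of a fault domain: every access is checked against its bounds."""

    def __init__(self, base, size):
        self.base = base                # first legal address (hypothetical)
        self.size = size                # number of legal bytes
        self.memory = bytearray(size)   # backing store for the legal range

    def _check(self, address):
        # The inserted check: refuse any address outside the legal range.
        if not (self.base <= address < self.base + self.size):
            raise MemoryError(f"address {address:#x} is outside the sandbox")
        return address - self.base

    def store(self, address, value):
        self.memory[self._check(address)] = value

    def load(self, address):
        return self.memory[self._check(address)]


box = Sandbox(base=0x1000, size=256)
box.store(0x1010, 7)
print(box.load(0x1010))  # 7
```

    A store to an address such as 0x2000 raises MemoryError in this sketch instead of corrupting

    memory that belongs to the host.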

    6. Why Interpreted Languages?

    Now the hottest languages such as Java, Tcl/Tk and Perl are all interpreted languages. Why? An

    important reason, we think, is that they are all closely related to the Internet. For an

    Internet-oriented language, the most important feature is portability. Such a language also has

    to operate in a distributed environment, which means that security is of paramount importance.

    Interpreted languages have an advantage in supporting both of these features.

    Portability

    A program is portable if the effort required for its transport is much less than the effort

    required for its initial implementation and if its initial qualities remain the same after the

    transport. The portability of a program can be evaluated by measuring the transport effort. For

    example, if I is the work involved in the initial implementation and T is the work involved in

    transport, then the program's portability can be evaluated as (I-T)/I. By this measure a program

    is 100 percent portable only when no transport effort is involved at all, which is impossible in

    practice. [2]
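
    As a worked example of the (I-T)/I measure, the short function below computes it directly. The

    function name portability and the effort figures (100 and 5 units of work) are hypothetical

    values chosen for illustration, not measurements from any real project.

```python
def portability(initial_effort, transport_effort):
    """Return (I - T) / I, the fraction of the initial effort saved by porting."""
    if initial_effort <= 0:
        raise ValueError("initial effort must be positive")
    return (initial_effort - transport_effort) / initial_effort


# Hypothetical figures: 100 units to implement, 5 units to port.
print(portability(100, 5))  # 0.95
```

    The result approaches 1.0 as T shrinks toward zero and falls to 0.0 when porting costs as much

    as rewriting the program.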

    A mechanism that supports software portability is thus one that reduces the effort of

    transporting software. Some significant mechanisms of this kind are:

    A compiler generates intermediate code that is independent of the target computer. If the

    compiler is self-compiling, the compiler itself is also portable. This is the typical

    compiler-interpreter mechanism, with Pascal-P, Snobol4 and Java as typical examples.

    A compiler can also be divided into two parts: the front end, which depends on the source

    language, and the back end, which depends on the object language, which in turn depends on the

    target machine. The interface between these two parts, if well designed, can be independent of

    both languages. Ongoing studies of generic virtual machines focus on this mechanism. The mobile

    language Omniware mentioned above is a good attempt in this category.

    Isolating the platform-dependent parts of the software, then using configuration tools such as

    imake to enable the code to be compiled and installed on different platforms.

    The first two mechanisms are typically realized by interpreted languages with virtual machines.

    The virtual machines of interpreted languages are the platforms for architecture-neutral and

    portable languages. Java and Omniware are typical examples.

    Security

    Part of Java's security mechanism comes from its language design policy: simplicity. It excludes

    many dangerous features of C++, such as pointers, with which a programmer could directly

    manipulate memory by accident. At the same time, Java provides automatic garbage collection. But

    the more important security mechanisms come from its compiler-interpreter nature mentioned

    above.

    The compiler-interpreter mechanism with bytecode provides several levels of security defense for

    Java. The first level is provided by extensive compile-time checking. A trustworthy compiler

    ensures that Java source code does not violate the safety rules. The second level is provided by

    the bytecode verifier, which operates at run time. Java does not trust an applet coming from

    anywhere on the Internet, so the bytecode verifier has to ensure that the code passed to the Java

    interpreter is in a fit state to be executed and can run without breaking the interpreter. The

    third level of defense is provided by the class loader. The class loader dynamically partitions

    each network class source into its own private namespace and then prevents classes in one

    namespace from polluting other namespaces. [13]

    While Java's security is mainly provided by its compiler-interpreter mechanism, Tcl's security is

    provided by Safe-Tcl.

    Safe-Tcl is a mechanism that initializes a Tcl interpreter to a safe subset of Tcl commands so

    that Tcl scripts cannot harm their hosting machine or application. There are also mechanisms to

    grant privileges to a safe interpreter so the script can do non-trivial things.

    So the basic approach to ensuring safety is first to remove the file command completely from safe

    interpreters and then to replace it with command aliases. The Netscape Tcl plug-in supports

    Tcl/Tk applets, also called Tclets. The Tcl plug-in implements the standard Safe-Tcl subset, plus

    a limited version of Tk.

    Command aliases are the primary mechanism provided by Safe-Tcl to grant privileges. An alias is

    a command in the untrusted interpreter that is really implemented by a different, fully trusted

    interpreter. This is much like user mode and kernel mode in multiuser operating systems. In

    Safe-Tcl, an untrusted script is isolated in its interpreter context and given a few extra

    commands that are carefully implemented by another Tcl interpreter to ensure safety.
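
    The alias mechanism can be modeled in a few lines. This is a toy model, not the Safe-Tcl API:

    the Interp class, the command names read, file_read and checked_read, the in-memory FAKE_FILES

    table and the ALLOWED_PREFIX policy are all assumptions invented for this sketch. The untrusted

    interpreter has no file command of its own; its only route to a file is an alias whose

    implementation runs, with a check, in the trusted interpreter.

```python
# Toy model of command aliases between interpreters; not the real Safe-Tcl API.
FAKE_FILES = {"/tmp/tclet/data.txt": "hello"}  # stand-in for the host file system
ALLOWED_PREFIX = "/tmp/tclet/"                  # hypothetical access policy


class Interp:
    def __init__(self, commands=None):
        self.commands = dict(commands or {})

    def eval(self, name, *args):
        # Only commands present in this interpreter's own table can run.
        if name not in self.commands:
            raise NameError(f'invalid command name "{name}"')
        return self.commands[name](*args)

    def alias(self, name, target, target_command):
        """Expose `name` here, implemented by `target_command` in `target`."""
        self.commands[name] = lambda *args: target.eval(target_command, *args)


def file_read(path):
    # Unrestricted read; available only in the trusted interpreter.
    return FAKE_FILES[path]


def checked_read(path):
    # Carefully implemented wrapper: validate the request, then do the work.
    if not path.startswith(ALLOWED_PREFIX) or ".." in path:
        raise PermissionError(f"access to {path} denied")
    return file_read(path)


trusted = Interp({"file_read": file_read, "checked_read": checked_read})
safe = Interp()                              # starts with no file access at all
safe.alias("read", trusted, "checked_read")  # grant one narrow privilege

print(safe.eval("read", "/tmp/tclet/data.txt"))  # hello
```

    In this sketch, calling file_read from the safe interpreter fails because the command does not

    exist there, and read refuses any path outside the allowed prefix.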

    Reusability

    Scripting languages as interpreted languages typically provide glue for commands. A shared, uni-

    versal scripting language like Tcl serves as a powerful and flexible glue for assembling reusable

    components.

    Tcl is a reusable command language because almost everything in the language is a command, from

    flow control (for, if, case, continue, etc.) to variables and procedures (global, proc, return,

    set). These built-in commands provide programmability and extensibility for free. Users of Tcl

    are free to develop application-specific commands, much as UNIX commands extend the UNIX shell,

    and these commands appear the same as the built-in commands in Tcl.

    The most important design goal of Tcl is reusability, so it takes a component-based approach.

    Rather than building a new application as a self-contained monolith with hundreds of thousands

    of lines of code, a Tcl application is a combination of many smaller reusable components. Each

    component would be small enough to be implemented by a small group, and interesting applications

    could be created by assembling existing components. [17]

    Rapid Development

    Reusability provides a way to develop software applications rapidly. The scripting or interpreted

    nature of interpreted languages is obviously good for rapid development. Instead of going through

    heavyweight compile, link, crash, debug cycles, programs in interpreted languages can be

    interpreted directly, and it is easier to trace what is happening during interpretation.

    Performance

    Currently, Java runs about 30 times slower than an equivalent C program. This does not seem very

    bad considering the advantages Java has. In fact, performance has always been a consideration for

    Java's designers. They believe they have achieved superior performance by adopting a scheme in

    which the interpreter can run at full speed without needing to check the runtime environment.

    Also, the automatic garbage collector runs as a low-priority background thread, ensuring a high

    probability that memory is available when required, which leads to better performance. What is

    more, Sun has also been improving performance by providing "just in time" compilation of the

    bytecode into native code. Applications requiring large amounts of computing power can be

    designed so that compute-intensive sections are rewritten in native machine code as required and

    interfaced with the Java platform.

    In general, Java's interactive applications respond quickly even though they are interpreted. But

    the effort to improve performance will never come to an end. The current performance of Java

    still cannot meet the needs of some categories of applications.

    Typically, an interpreted language has relatively low performance because of the overhead of

    fetching and decoding each virtual command or virtual instruction before performing the work it

    specifies. Most virtual machines at present are stack-based, while most real machines are

    register-based. Interpreting the intermediate code to emulate the virtual machine on a real

    machine is thus a heavier process than in the situation where the virtual machine and the real

    machine have a similar structure, either both stack-based or both register-based.

    In Java, interpreting consists of token threading, with one token-threading step per bytecode

    executed. Each step requires about three instructions and five memory references, and each

    virtual instruction requires several real machine instructions. For example, executing an integer

    add (IADD) of the JVM on most general-purpose processors (SPARC, 80x86, 680x0, PowerPC, ARM and

    MIPS) requires at least seven conventional processor instructions when using an interpreter

    written in C.

    To improve performance, a just-in-time compilation technique has been applied to Java, which

    translates Java bytecode into instructions for the host processor at runtime. This technique

    improves Java's performance several times over. Since native code compilers (or code generators)

    are usually complex software that costs both memory and execution time, JIT compilation uses

    less aggressive optimization, which just translates each bytecode to in-line machine code or

    keeps the top of the stack in a register.

    The performance improvement from JIT compilation is limited, and it comes at a cost in memory.

    There are arguments that the most efficient execution vehicle for many Java applications would

    be a dedicated Java chip that directly executes the bytecode. Sun is now building a picoJava

    chip, a microcontroller intended to execute Java bytecodes directly. It is a simple, stack-based

    processor. Rather than being a pure stack architecture, the machine would have specific hardware

    features for dealing with bytecode and other hardware features to fit the garbage-collected,

    object-oriented, multithreaded nature of Java.

    Setting aside the rare and newly designed stack-based real machines, let us consider only

    stack-based virtual machines running on register-based real machines.

    As mentioned above, the execution time of an interpreted program depends on the number of

    commands interpreted, the fetching and decoding cost of each command, and the time spent actually

    executing the operation specified by each command. Since the number of commands required to

    accomplish a given task depends on the level of the virtual machine of the language, the

    performance of an interpreted language depends mainly on the level of the virtual machine defined

    for that language. A simple virtual machine, like Java's, might require the execution of a large

    number of commands, but the overhead of each virtual command is small and nearly fixed. In

    contrast, Perl and Tcl each define complex virtual machines, which result in non-uniform

    slowdowns relative to C implementations even though their virtual machines can execute a given

    program in fewer commands. [26]
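
    The three factors listed above can be combined into a rough cost model. The sketch below is a

    simplification under our own assumptions: it treats every command as having the same

    fetch/decode and execution cost, which the paragraph above notes is not true for Perl or Tcl,

    and all of the numbers are hypothetical, in arbitrary time units.

```python
def interpreted_time(commands, fetch_decode_cost, execute_cost):
    """Estimate run time as commands * (fetch/decode cost + execution cost)."""
    return commands * (fetch_decode_cost + execute_cost)


# Hypothetical low-level VM: many commands, each cheap to decode and execute.
low_level = interpreted_time(commands=1_000_000, fetch_decode_cost=5, execute_cost=2)

# Hypothetical high-level VM: few commands, each expensive to decode and execute.
high_level = interpreted_time(commands=10_000, fetch_decode_cost=200, execute_cost=400)

print(low_level, high_level)  # 7000000 6000000
```

    With these made-up figures the two designs end up in the same range, which illustrates that

    command count alone does not determine performance.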

    As mentioned in Section 5 b), an on-the-fly bytecode compiler for Tcl is being implemented to

    improve Tcl's performance. Caching the compiled bytecode will be very helpful in improving the

    performance of interpreted languages such as Perl and Tcl. [24]

    Other Advantages of Interpreted Languages [27]

    The type of a variable can change dynamically during execution

    Compiling efficient code to handle dynamic typing, where the type of a variable can change during

    execution, is hard because the type of a variable is not known at compile time, whereas an

    interpreter can handle this situation easily and efficiently.

    An interpreter can be very good for debugging

    The interpreter can access the source program in its original form or in an internal form at any

    time. It also maintains a symbol table containing variable names and values. So programmers can

    get diagnostic information in easily understandable forms.

    7. Summary

    Most interpreted languages have a virtual machine, either explicitly defined, such as Java's

    virtual machine, or implicitly defined, such as Tcl's and Perl's. Some simple scripting

    languages, such as the UNIX shell languages, are interpreted directly and do not have a virtual

    machine.

    Virtual machines play a major role in interpreted programming languages. With the assistance of

    a virtual machine, the compiler-interpreter mechanism provides portability, security and better

    performance for interpreted programming languages.

    Scripting languages, as interpreted languages, are good for gluing programming components

    together. When the set of components is open to extension, as with Tcl's commands (built-in plus

    application-specific commands), the language can provide great reusability.

    The development processes of interpreted languages are relatively lightweight compared with the

    compile-link-test cycles in a traditional compiled language. So, interpreted languages are good for

    rapid development.

    Acknowledgment

    We would like to thank Professor Benjamin Zorn for his guidance on the topics in this paper.

    Without his help we would still be lost in a maze.

    References

    [1]. Wirth N. From Programming Language Design to Computer Construction, ACM, February

    1985, Vol 28, No. 2

    [2]. Lecarm O., Cart M. P., Gart M. Software Portability, McGraw-Hill Publishing Company,

    1989

    [3]. Kamin S. N. Programming Languages: an Interpreter-Based Approach, Addison-Wesley Pub.

    Co., 1990

    [4]. Pemberton S., Daniels M. Pascal Implementation: The P4 Compiler and Interpreter, ISBN:

    0-13-653-0311

    [5]. Newsgroup: comp.lang.smalltalk

    [6]. Byrne S. B. GNU Smalltalk User's Guide, http://www.cs.utah.edu/csinfo/texinfo/mst/mst_toc.html

    [7]. Goldberg, Robson, Smalltalk-80: The Language and Its Implementation, Addison Wesley,

    1983, ISBN 0-201-11371-6

    [8]. Sun Microsystems. The Java Virtual Machine Specification.

    http://java.sun.com/doc/vmspec/html/vmspecl.html, 1995

    [9]. Gosling, J, Java Intermediate Bytecodes, ACM SIGPLAN Workshop on Intermediate Repre-

    sentation, Jan. 1995

    [10]. Sun JavaSoft: Getting Started: The Java Developer's Kit

    [11]. Sun JavaSoft: Design Goals of Java 1.2

    [12]. Caron J. Java: Status Report and Language Overview, CSCI 5535 Project, Dec. 1995, Uni-

    versity of Colorado at Boulder

    [13]. Wang W, An Y, Zang L, Security --- How is it implemented in the Java language?, CSCI 5535

    Project, Dec. 1995, University of Colorado at Boulder

    [14]. Sun JavaSoft: A Look Inside the Java Platform

    [15]. Sun JavaSoft: The Java language Environment, a White paper

    [16]. Abelson, H. and Sussman, G.J. Structure and Interpretation of Computer Programs, MIT Press,

    Cambridge, MA, 1985

    [17]. Ousterhout J. K. Tcl and Tk Toolkit, Addison-Wesley, ISBN 0-201-63337-X

    [18]. Ousterhout J. K. Tcl: An Embeddable Command language, USENIX Conference Proceed-

    ings, 1990

    [19]. Lewis B. T. An On-the-fly Bytecode Compiler for Tcl. http://www.sunlabs.com/people/brian.lewis/

    [20]. Ousterhout J. K. What's Happening at Sun Labs.

    http://www.sunlabs.com/research/tcl/team.html, April 1996

    [21]. Welch B. Practical Programming in Tcl and Tk, Prentice-Hall, 1995, ISBN 0-13-182007-9

    [22]. newsgroup: comp.lang.tcl

    [23]. Ousterhout J. K. An Introduction To Tcl Scripting,

    http://www.sunlabs.com/people/john.ousterhout/

    [24]. Schwartz R. L. Learning Perl, O'Reilly & Associates, Inc. 1993

    [25]. Perl Documentation, http://www.csc.tntech.edu/docs/perl.html

    [26]. Romer, T. H., Lee D., et al. The Structure and Performance of Interpreters, ACM, Oct. 1996

    [27]. Watson, D High-level Languages and Their Compilers, Addison-Wesley Publishing Com-

    pany, 1989

    [28]. Gosling on Java, DATAMATION, March 1, 1996