Post on 19-Feb-2020

Programming Language Concepts: Lexical and Syntactic Analysis

Janyl Jumadinova

24 January, 2017

Most Important Steps in Compilation

- Optional preprocessing
- Lexical analysis (scanning)
- Syntax analysis (parsing)
- Semantic analysis
- Intermediate code generation
- Optimization (usually machine-independent)
- Final code generation
- Optional final optimization

Lexical Analysis

For each token type, give a description:

- either a literal string (e.g., “≤” or “while” to describe an operator or reserved word),

- or a < rule > (e.g., the rule < unsigned int > might stand for “a sequence of one or more digits”; the rule < identifier > might stand for “a letter followed by a sequence of zero or more letters or digits”).

Lexical analysis produces a “token stream” in which the program is reduced to a sequence of token types, each with its identifying number and the actual string (in the program) corresponding to it.
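A minimal sketch of such a token stream in Python. The token names, their numbering, the use of "<=" in place of "≤", and the function `tokenize` are assumptions made for illustration; they are not taken from the slides.

```python
import re

# Hypothetical token types: each pairs a name with a regular expression.
# Order matters: the reserved word is tried before the identifier rule.
TOKEN_SPEC = [
    ("WHILE",        r"while\b"),               # literal string (reserved word)
    ("UNSIGNED_INT", r"[0-9]+"),                # one or more digits
    ("IDENTIFIER",   r"[A-Za-z][A-Za-z0-9]*"),  # letter, then letters or digits
    ("LE",           r"<="),                    # literal string (operator)
    ("SKIP",         r"\s+"),                   # whitespace, not reported
]
# Identifying number of each token type: its position in the list above.
TOKEN_NUMBER = {name: i for i, (name, _) in enumerate(TOKEN_SPEC)}
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (type number, type name, matched string) triples."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((TOKEN_NUMBER[m.lastgroup], m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for token in tokenize("while count <= 10"):
    print(token)
```

Each printed triple carries the token type's number, its name, and the actual string from the program.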

Syntactic Analysis

- The syntax of a language is described by a grammar that specifies the legal combinations of tokens.

- Grammars are often specified in BNF notation (“Backus-Naur Form”):

<item1> ::= valid replacements for <item1>

<item2> ::= valid replacements for <item2>

Grammars (Context-free Grammars)

- Collection of VARIABLES (things that can be replaced by other things), also called NON-TERMINALS.

- Collection of TERMINALS (“constants”, strings that can’t be replaced).

- One special variable called the START SYMBOL.

- Collection of RULES, also called PRODUCTIONS.

variable → rule1 | rule2 | rule3 | ...

You can also write each rule on a separate line (as in the book).

Grammar

A, B, and C are non-terminals. 0, 1, and 2 are terminals. The start symbol is A. The rules are:

A → 0A | 1C | 2B | 0
B → 0B | 1A | 2C | 1
C → 0C | 1B | 2A | 2

Can 2011020 be parsed?

Can 1112202 be parsed? Can 00102 be parsed? Can 2120 be parsed?
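One way to check such questions mechanically is sketched below in Python. Every rule in this grammar is a terminal optionally followed by one non-terminal, so a string can be tested by trying each alternative of the current non-terminal against the next character. The `RULES` table and the function `can_parse` are hypothetical names introduced for this sketch.

```python
# Rules of the example grammar. Each alternative is a terminal optionally
# followed by one non-terminal, e.g. A -> 0A | 1C | 2B | 0.
RULES = {
    "A": ["0A", "1C", "2B", "0"],
    "B": ["0B", "1A", "2C", "1"],
    "C": ["0C", "1B", "2A", "2"],
}

def can_parse(string, symbol="A"):
    """Return True if `string` can be derived from the non-terminal `symbol`."""
    if not string:
        return False  # every rule produces at least one terminal
    for alternative in RULES[symbol]:
        terminal, rest = alternative[0], alternative[1:]
        if string[0] != terminal:
            continue
        if rest == "":
            # Rule such as A -> 0: it must account for exactly the last character.
            if len(string) == 1:
                return True
        elif can_parse(string[1:], rest):
            return True
    return False

for candidate in ("2011020", "1112202", "00102", "2120"):
    print(candidate, can_parse(candidate))
```

The sketch relies on the special shape of these rules; a general context-free grammar needs a more capable parsing method.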

Syntactic Analysis

- The process of verifying that a token stream represents a valid application of the rules is called parsing.

- Using the BNF rules we can construct a parse tree:

[Figure: Sample Parse Tree (portion)]

[Figure: Sample Parse Tree (failed)]
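The construction of a parse tree from BNF rules can be sketched in Python as follows. The two-rule grammar in the comment, the tuple representation of the tree, and the function names are assumptions made for this illustration; they do not reproduce the trees shown on the slides.

```python
# Hypothetical grammar, written in BNF:
#   <expr> ::= <term> | <term> + <expr>
#   <term> ::= <unsigned int> | <identifier>

def parse_expr(tokens, pos=0):
    """Parse <expr> starting at tokens[pos]; return (tree, next position)."""
    term, pos = parse_term(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        rest, pos = parse_expr(tokens, pos + 1)
        return ("expr", term, "+", rest), pos
    return ("expr", term), pos

def parse_term(tokens, pos):
    """Parse <term>: a single unsigned integer or identifier token."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return ("term", ("unsigned int", tok)), pos + 1
    if tok[:1].isalpha() and tok.isalnum():
        return ("term", ("identifier", tok)), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")

def parse(tokens):
    """Return the parse tree, or raise SyntaxError if parsing fails."""
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse(["j", "+", "count"]))
```

A token sequence that matches no rule, such as `["j", "+"]`, raises `SyntaxError`; that is the failed-parse case.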

Grammar for Java (version 8)

- Overview of notation used: https://docs.oracle.com/javase/specs/jls/se8/html/jls-2.html

- The full syntax grammar: https://docs.oracle.com/javase/specs/jls/se8/html/jls-19.html

Compiling

So far, we have looked at:

- the scanner (lexical analysis), which tokenizes input

- the parser (syntactic analysis), which validates structure

Next, we do semantic analysis: is the program meaningful?

EXAMPLE:

In Java,

int i, i, i;

has the right structure for a declaration, but it’s not legal to redeclare i within the same block of code.

Semantic Analysis

- During lexical analysis and parsing, as we process tokens we gather the user-defined names into a symbol table.

- The symbol table contains information such as:
  - where the symbol first appeared (usually in a declaration)
  - whether it has an initial value (parsing will tell us this)
  - what its type is (parsing tells us), etc.

As the parser encounters names, it looks them up to see if they are already declared; if not, it creates a table entry. (Some names are pre-declared as part of the language.)
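A minimal sketch of such a symbol table in Python, assuming a single block of code (no nested scopes). The class `SymbolTable`, its method names, and the fields stored per entry are hypothetical choices made for this illustration.

```python
class SymbolTable:
    """Maps each declared name to the information gathered about it."""

    def __init__(self, predeclared=()):
        # Some names are pre-declared as part of the language.
        self.entries = {
            name: {"type": "builtin", "line": 0, "initialized": True}
            for name in predeclared
        }

    def declare(self, name, type_, line, initialized=False):
        """Create an entry; redeclaring a name in the same block is an error."""
        if name in self.entries:
            raise NameError(f"line {line}: '{name}' is already declared")
        self.entries[name] = {
            "type": type_,               # what its type is
            "line": line,                # where the symbol first appeared
            "initialized": initialized,  # whether it has an initial value
        }

    def lookup(self, name):
        """Return the entry for `name`, or None if it has not been declared."""
        return self.entries.get(name)


# Processing the declaration `int i, i, i;` name by name:
table = SymbolTable()
table.declare("i", "int", line=1)
try:
    table.declare("i", "int", line=1)
except NameError as error:
    print("semantic error:", error)
```

The second `declare` call fails, which is how the redeclaration in `int i, i, i;` would be caught even though the statement parses correctly.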

Intermediate Code Generation

- During this phase, the parsed program is converted into a simpler, step-by-step description in some intermediate language.

- The intermediate language may exist only as an internal representation within the compiler (it does not need to be an “actual” language).

- A simple example is something called “three-address code.”
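As an illustration of three-address code, the Python sketch below flattens a nested expression into instructions that each name at most three addresses: two operands and one result. The tuple representation of the parsed expression, the temporary names `t1`, `t2`, ..., and the function `to_three_address` are assumptions made for this example.

```python
import itertools

def to_three_address(expr):
    """Translate a nested (op, left, right) tuple into three-address code.

    Leaves are strings (variable names or constants).
    Returns (list of instructions, name that holds the final result).
    """
    counter = itertools.count(1)
    code = []

    def walk(node):
        if isinstance(node, str):       # a variable name or constant
            return node
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = f"t{next(counter)}"      # fresh temporary for this result
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = walk(expr)
    return code, result

# The expression b + c * d, already parsed into a tree:
code, result = to_three_address(("+", "b", ("*", "c", "d")))
for instruction in code:
    print(instruction)
```

The multiplication is emitted first because its result is an operand of the addition.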

Why Intermediate Code Generation?

Optimization

Here is a list of the many, many kinds of optimizations: http://www.compileroptimizations.com/

(The examples show the effects on source code, but the optimizations are usually made on the intermediate code.)
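To illustrate an optimization applied to intermediate code rather than source code, here is a Python sketch of constant folding, with the folded values propagated into later instructions. The `(dest, left, op, right)` instruction format and the function `fold_constants` are assumptions made for this example, and the sketch assumes straight-line code in which each destination is assigned only once.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold_constants(code):
    """Constant folding on (dest, left, op, right) instructions.

    Operands are names or unsigned integer literals written as strings.
    Returns (remaining instructions, values computed at compile time).
    """
    known = {}    # names whose value is known at compile time
    result = []
    for dest, left, op, right in code:
        # Substitute operands whose values were already computed.
        left = known.get(left, left)
        right = known.get(right, right)
        if left.isdigit() and right.isdigit() and op in OPS:
            # Both operands are constants: compute now, emit nothing.
            known[dest] = str(OPS[op](int(left), int(right)))
        else:
            result.append((dest, left, op, right))
    return result, known

# t1 = 2 * 3 ; t2 = x + t1
optimized, known = fold_constants([("t1", "2", "*", "3"), ("t2", "x", "+", "t1")])
print(optimized)
```

The multiplication disappears from the instruction list because its value is computed by the compiler instead of at run time.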

Code Generation

- It is usually very straightforward to generate machine instructions from intermediate code, since the intermediate code is simple.

- Some further machine-specific optimizations may take place during or after this stage.

Pipelining

- The steps in compilation don’t need to be done in whole phases, one after the other, but can be “pipelined”:

int count = 1;

create tokens, pass to parser, generate some intermediate code

j = j + count;

create tokens, pass to parser, generate some intermediate code

... etc. ...
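The pipelining idea can be sketched with Python generators: each stage pulls one statement at a time from the stage before it, so the first statement is fully processed before the second is even scanned. The three stage functions and their greatly simplified handling of statements are assumptions made for this illustration.

```python
import re

def scan(lines):
    """Scanner stage: yield the tokens of one statement at a time."""
    for line in lines:
        tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=+;]", line)
        print("scanned  ", tokens)
        yield tokens

def parse_statements(token_lists):
    """Parser stage (greatly simplified): split 'target = expression ;'."""
    for tokens in token_lists:
        eq = tokens.index("=")
        target, expr = tokens[eq - 1], tokens[eq + 1:-1]
        print("parsed   ", target, expr)
        yield target, expr

def generate(statements):
    """Code generation stage: emit one line of intermediate code."""
    for target, expr in statements:
        line = f"{target} = {' '.join(expr)}"
        print("generated", line)
        yield line

source = ["int count = 1;", "j = j + count;"]
intermediate = list(generate(parse_statements(scan(source))))
```

The printed trace interleaves the stages statement by statement instead of finishing all scanning before any parsing begins.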
