
Page 1: Lexical and syntax analysis

LEXICAL AND SYNTAX ANALYSIS

CSci210.BA4

Page 2: Lexical and syntax analysis

Chapter 4 Topics
Introduction
Lexical and Syntax Analysis
The Parsing Problem
Recursive-Descent Parsing
Bottom-Up Parsing

Page 3: Lexical and syntax analysis

Introduction
Syntax analyzers are almost always based on a formal description of the syntax of the source language (a grammar)

Almost all compilers separate the analysis of syntax into two parts:
Lexical analysis – the low-level part
Syntax analysis – the high-level part

Page 4: Lexical and syntax analysis

Reasons to Separate Syntax and Lexical Analysis

Simplicity – lexical analysis is less complex, so the process is simpler when separated
Efficiency – allows for selective optimization
Portability – the lexical analyzer is somewhat platform dependent, whereas the syntax analyzer is more platform independent

Page 5: Lexical and syntax analysis

Lexical Analysis
A pattern matcher for character strings
Performs syntax analysis at the lowest level of the program structure
Extracts lexemes from a given input string and produces the corresponding tokens

Page 6: Lexical and syntax analysis

Lexical Analysis (continued)

result = oldsum - value / 100;

Token       Lexeme
IDENT       result
ASSIGN_OP   =
IDENT       oldsum
SUB_OP      -
IDENT       value
DIV_OP      /
INT_LIT     100
SEMICOLON   ;
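As a small illustration of the lexeme/token pairing above, a lexer might hand each pair to the parser as a record like the following C sketch; the type and field names here are assumptions, not definitions from the slides.

/* Token codes matching the table above (numeric values are arbitrary). */
enum token_code {
    IDENT, ASSIGN_OP, SUB_OP, DIV_OP, INT_LIT, SEMICOLON
};

/* One lexeme/token pair, e.g. { IDENT, "oldsum" } or { INT_LIT, "100" }. */
struct token {
    enum token_code code;
    char lexeme[64];
};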

Page 7: Lexical and syntax analysis

Building a Lexical Analyzer

Write a formal description of the tokens and use a software tool that constructs lexical analyzers when given such a description

Design a state transition diagram that describes the tokens and write a program that implements the diagram

Design a state transition diagram that describes the tokens and hand-construct a table-driven implementation of the state diagram

Page 8: Lexical and syntax analysis

State (Transition) Diagram Design
A directed graph with nodes labeled with state names and arcs labeled with input characters
Including states and transitions for each and every token pattern would make the diagram too large and complex
Transitions can be combined (for example, treating all letters as a single character class) to simplify the state diagram
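The following C fragment is a minimal sketch of a lexer built from such a diagram, using two combined character classes (LETTER and DIGIT) so that one arc covers all letters and one covers all digits. The helper names (getChar, addChar, lex, lexeme) and the token codes are assumptions made for this sketch, not code given in the slides.

#include <ctype.h>
#include <stdio.h>

/* Character classes: one arc per class instead of one arc per character. */
enum { LETTER, DIGIT, UNKNOWN, END_OF_INPUT };

/* Illustrative token codes (assumed, not fixed by the slides). */
enum { IDENT = 10, INT_LIT = 11, UNKNOWN_TOK = 99, EOF_TOK = -1 };

static FILE *in;                    /* input stream                      */
static int  charClass = UNKNOWN;    /* class of the current character    */
static char nextChar  = ' ';        /* current input character           */
static char lexeme[100];            /* lexeme accumulated so far         */
static int  lexLen;

static void getChar(void) {         /* read the next character and classify it */
    int c = getc(in);
    if (c == EOF) { charClass = END_OF_INPUT; return; }
    nextChar = (char)c;
    if (isalpha(c))      charClass = LETTER;
    else if (isdigit(c)) charClass = DIGIT;
    else                 charClass = UNKNOWN;
}

static void addChar(void) {         /* append nextChar to the lexeme (bounded) */
    if (lexLen < (int)sizeof lexeme - 1) {
        lexeme[lexLen++] = nextChar;
        lexeme[lexLen] = '\0';
    }
}

int lex(void) {                     /* one trip through the state diagram */
    lexLen = 0;
    while (charClass != END_OF_INPUT && isspace((unsigned char)nextChar))
        getChar();                  /* skip white space */
    switch (charClass) {
    case LETTER:                    /* identifiers: letter {letter | digit} */
        addChar(); getChar();
        while (charClass == LETTER || charClass == DIGIT) { addChar(); getChar(); }
        return IDENT;
    case DIGIT:                     /* integer literals: digit {digit} */
        addChar(); getChar();
        while (charClass == DIGIT) { addChar(); getChar(); }
        return INT_LIT;
    case END_OF_INPUT:
        return EOF_TOK;
    default:                        /* any other single character */
        addChar(); getChar();
        return UNKNOWN_TOK;
    }
}

int main(void) {                    /* demo driver: tokenize standard input */
    int tok;
    in = stdin;
    while ((tok = lex()) != EOF_TOK)
        printf("token code %d, lexeme \"%s\"\n", tok, lexeme);
    return 0;
}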

Page 9: Lexical and syntax analysis

The Parsing Problem
Two goals of syntax analysis:
Check the input program for any syntax errors, produce a diagnostic message if an error is found, and recover
Produce the parse tree, or at least a trace of the parse tree, for the program

Two classes of parsers:
Top-down
Bottom-up

Page 10: Lexical and syntax analysis

Top-Down Parsers
Trace or build a parse tree in preorder (corresponding to a leftmost derivation)
The most common top-down parsing algorithms:
Recursive descent
LL parsers

Page 11: Lexical and syntax analysis

Bottom-Up Parsers
Produce the parse tree by beginning at the leaves and progressing towards the root
Most common bottom-up parsers are in the LR family

Page 12: Lexical and syntax analysis

Complexity of Parsing
Parsing algorithms that work for any unambiguous grammar are complex and inefficient: O(n³)
Compilers use parsers that only work for a subset of all unambiguous grammars, but do it in linear time: O(n)

Page 13: Lexical and syntax analysis

Recursive-Descent Parsing
A top-down parser
EBNF is an ideal basis for a recursive-descent parser
Each nonterminal maps to a function
For a nonterminal with more than one RHS, look at the next token to determine which RHS to choose
If no RHS matches the next token, it is a syntax error

Page 14: Lexical and syntax analysis

Recursive-Descent Parsing
Grammar for an expression:

<expr>   → <term> {+ <term>}
<term>   → <factor> {* <factor>}
<factor> → id | int_constant | ( <expr> )

How do we parse the expression 1 + 2?

<expr> → <term> + <term>
       → <factor> + <term>
       → 1 + <term>
       → 1 + <factor>
       → 1 + 2

Page 15: Lexical and syntax analysis

Recursive-Descent Parsing
Grammar for an expression:

<expr>   → <term> {+ <term>}
<term>   → <factor> {* <factor>}
<factor> → id | int_constant | ( <expr> )

What does the code look like?

/* <expr> -> <term> {+ <term>} */
void expr() {
    term();                         /* parse the first <term>        */
    while (nextToken == ADD_OP) {   /* zero or more "+ <term>" parts */
        lex();                      /* consume the + operator        */
        term();
    }
}
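The slide shows only expr(); the sketch below fills in matching term() and factor() functions under the same conventions. The token codes (MULT_OP, LEFT_PAREN, RIGHT_PAREN, and so on) and the error() helper are assumptions made for this sketch, not names given in the slides.

/* Assumed token codes and helpers, provided by the lexical analyzer. */
enum { IDENT, INT_LIT, ADD_OP, MULT_OP, LEFT_PAREN, RIGHT_PAREN };
extern int nextToken;               /* code of the current token          */
extern void lex(void);              /* advances to the next token         */
extern void error(const char *msg); /* hypothetical syntax-error reporter */
void expr(void);
void factor(void);

/* <term> -> <factor> {* <factor>} */
void term(void) {
    factor();
    while (nextToken == MULT_OP) {  /* zero or more "* <factor>" parts */
        lex();
        factor();
    }
}

/* <factor> -> id | int_constant | ( <expr> ) */
void factor(void) {
    if (nextToken == IDENT || nextToken == INT_LIT) {
        lex();                      /* consume the identifier or literal */
    } else if (nextToken == LEFT_PAREN) {
        lex();
        expr();
        if (nextToken == RIGHT_PAREN)
            lex();
        else
            error("expected )");    /* no matching RHS = syntax error */
    } else {
        error("expected id, int_constant, or (");
    }
}

The key property is that each nonterminal of the grammar has its own function, and the next token alone decides which branch of factor() to take.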

Page 16: Lexical and syntax analysis

Recursive-Descent Parsing
The Left Recursion Problem (for LL parsers)
A left-recursive rule makes a recursive-descent parser call itself forever:

<expr> → <expr> + <term>
<expr> → <expr> + <term> + <term>
<expr> → <expr> + <term> + <term> + <term>

How do we fix it? Modify the grammar to remove the left recursion.
Before: <expr> → <expr> + <term>
After:  <expr> → <term> + <term>
        <term> → id | int_constant | <expr>
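A general note (an addition here, not from the slides): an immediately left-recursive rule of the form A → A α | β can be rewritten in EBNF as A → β {α}. Applied to the rule above, <expr> → <expr> + <term> | <term> becomes <expr> → <term> {+ <term>}, which is the form the earlier grammar slides use and which the while loop in expr() implements.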

Page 17: Lexical and syntax analysis

Recursive-Descent Parsing
The Pairwise Disjointness Problem
If a nonterminal's RHSs are not pairwise disjoint, how do you know which RHS to pick based only on the next token?

<variable> → identifier | identifier [ <expr> ]

How do we fix it? Left factoring:

<variable> → identifier <new>
<new>      → ε | [ <expr> ]
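A hedged C sketch of the left-factored rule, reusing the conventions of the earlier parsing functions; the token codes (IDENT, LEFT_BRACKET, RIGHT_BRACKET) and the error() helper are assumptions for illustration, not names given in the slides.

/* Assumed token codes and helpers (as in the earlier sketches). */
enum { IDENT, LEFT_BRACKET, RIGHT_BRACKET };
extern int nextToken;
extern void lex(void);
extern void error(const char *msg);
extern void expr(void);

/* <variable> -> identifier <new>
   <new>      -> epsilon | [ <expr> ]       */
void variable(void) {
    if (nextToken == IDENT) {
        lex();                           /* consume the identifier    */
        if (nextToken == LEFT_BRACKET) { /* optional [ <expr> ] part  */
            lex();
            expr();
            if (nextToken == RIGHT_BRACKET)
                lex();
            else
                error("expected ]");
        }
        /* otherwise <new> derives the empty string: consume nothing */
    } else {
        error("expected identifier");
    }
}

Left factoring pushes the decision past the shared identifier prefix, so a single token of lookahead is again enough.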

Page 18: Lexical and syntax analysis

Bottom-Up Parsing
Parsing is based on reduction, the reverse of a rightmost derivation
At each step, find the RHS whose reduction produces the previous step in the derivation

Example grammar:        Parsing the input ab:
<S> → <A>b              Input:  ab
<A> → a                 Step 1: <A>b
<A> → b                 Step 2: <S>

Page 19: Lexical and syntax analysis

Bottom-Up Parsing
Most bottom-up parsers are shift-reduce algorithms:
Shift – move the next input token onto the parse stack
Reduce – replace the RHS on top of the stack with its LHS
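Applied to the example grammar above (<S> → <A>b, <A> → a, <A> → b) and the input ab, the two actions interleave roughly as follows (a sketch; the decision of when to shift and when to reduce comes from the ACTION table described on the later slides):

Stack      Remaining input   Action
(empty)    ab                shift a
a          b                 reduce by <A> → a
<A>        b                 shift b
<A>b       (empty)           reduce by <S> → <A>b
<S>        (empty)           accept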

Page 20: Lexical and syntax analysis

Bottom-Up Parsing
Handles

Def: β is the handle of the right sentential form γ = αβw if and only if S =>*rm αAw =>rm αβw

The handle of a right sentential form is its leftmost simple phrase
Bottom-up parsing is essentially looking for handles and replacing them with their LHSs

Page 21: Lexical and syntax analysis

Bottom-Up Parsing
Advantages of Shift-Reduce (LR) Parsers

They can be built for all programming languages

They can detect syntax errors as soon as it is possible in a left-to-right scan

The LR class of grammars is a proper superset of the class parsable by LL parsers (for example, many left-recursive grammars are LR, but none are LL)

Page 22: Lexical and syntax analysis

Bottom-Up Parsing
Shift-Reduce Algorithms

Input sequence – the input to be parsed
Parse stack – input tokens are shifted onto the parse stack
ACTION table – tells the parser what to do
GOTO table – holds the state symbols to be pushed onto the parse stack when a reduction is completed

Page 23: Lexical and syntax analysis

Bottom-Up Parsing
ACTION Table (part of the parse table)

Rows = state symbols
Columns = terminal symbols

Values:
Shift – push the next token onto the stack
Reduce – replace the handle with its LHS
Accept – the stack holds only the start symbol and the input is empty
Error – the original input is invalid

Page 24: Lexical and syntax analysis

Bottom-Up Parsing
GOTO Table (part of the parse table)

Rows = state symbols
Columns = nonterminal symbols

Values indicate which state symbol should be pushed onto the parse stack after a reduction has been completed
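To show how the parse stack, ACTION table, and GOTO table fit together, here is a minimal, hand-built C sketch of the driver loop for the tiny example grammar from the earlier slide (<S> → <A>b, <A> → a, <A> → b). The state numbering, the table contents, and all names are assumptions constructed for this sketch, not material from the slides; for simplicity the stack holds state numbers only.

#include <stdio.h>

enum { TOK_a, TOK_b, TOK_END, NUM_TOKENS };     /* terminals                */
enum { NT_S, NT_A, NUM_NONTERMS };              /* nonterminals             */

/* ACTION table entries: what the parser does in (state, terminal). */
typedef enum { ERR, SHIFT, REDUCE, ACCEPT } Kind;
typedef struct { Kind kind; int arg; } Action;  /* arg = target state or rule number */

/* Rules: 0: S -> A b (length 2),  1: A -> a,  2: A -> b. */
static const int ruleLhs[] = { NT_S, NT_A, NT_A };
static const int ruleLen[] = { 2, 1, 1 };

static const Action ACTION[6][NUM_TOKENS] = {
    /* state 0 */ { {SHIFT,3}, {SHIFT,4},  {ERR,0}    },
    /* state 1 */ { {ERR,0},   {ERR,0},    {ACCEPT,0} },
    /* state 2 */ { {ERR,0},   {SHIFT,5},  {ERR,0}    },
    /* state 3 */ { {ERR,0},   {REDUCE,1}, {ERR,0}    },
    /* state 4 */ { {ERR,0},   {REDUCE,2}, {ERR,0}    },
    /* state 5 */ { {ERR,0},   {ERR,0},    {REDUCE,0} },
};

/* GOTO table: state pushed after reducing to a nonterminal. */
static const int GOTO_TBL[6][NUM_NONTERMS] = {
    /* state 0 */ { 1, 2 },
    /* others  */ { -1, -1 }, { -1, -1 }, { -1, -1 }, { -1, -1 }, { -1, -1 },
};

int main(void) {
    int input[] = { TOK_a, TOK_b, TOK_END };    /* the input sequence "ab" */
    int stack[100], top = 0, pos = 0;
    stack[top] = 0;                             /* start in state 0 */

    for (;;) {
        Action act = ACTION[stack[top]][input[pos]];
        if (act.kind == SHIFT) {                /* push the token's state, advance */
            stack[++top] = act.arg;
            pos++;
        } else if (act.kind == REDUCE) {        /* pop the RHS, push the GOTO state */
            top -= ruleLen[act.arg];
            stack[top + 1] = GOTO_TBL[stack[top]][ruleLhs[act.arg]];
            top++;
        } else if (act.kind == ACCEPT) {
            printf("accepted\n");
            return 0;
        } else {
            printf("syntax error\n");
            return 1;
        }
    }
}

Running the sketch prints "accepted" for the built-in input ab; changing the input array to, say, { TOK_b, TOK_a, TOK_END } drives the loop into an Error entry instead.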