layered combinator parsers with a unique state

28
Layered Combinator Layered Combinator Parsers with a Unique Parsers with a Unique State State Pieter Koopman Rinus Plasmeijer Nijmegen, The Netherlands

Upload: taariq

Post on 05-Jan-2016

30 views

Category:

Documents


1 download

DESCRIPTION

Layered Combinator Parsers with a Unique State. Pieter Koopman Rinus Plasmeijer Nijmegen, The Netherlands. Overview. conventional parser combinators requirements new combinators system-architecture new parser combinators separate scanner and parser error handling. parser combinators. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Layered Combinator Parsers with a Unique State

Layered Combinator Layered Combinator Parsers with a Unique Parsers with a Unique

StateState

Pieter KoopmanRinus Plasmeijer

Nijmegen, The Netherlands

Page 2: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 2

Overview

conventional parser combinators requirements new combinators system-architecture new parser combinators separate scanner and parser error handling

Page 3: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 3

parser combinators

Non deterministic, list of results:: Parser s r :== [s] -> [ ParseResult s r ]:: ParseResult s r :== ([s],r)

fail & yieldfail = \ss = []yield r = \ss = [(ss,r)]

recognize symbolsatisfy :: (s->Bool) -> Parser s ssatisfy f = pwhere p [s:ss] | f s = [(ss,s)] p _ = []

symbol sym :== satisfy ((==) sym)

Page 4: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 4

parser combinators 2 sequence-combinators(<&>) infixr 6::(Parser s r)(r->Parser s t)->Parser s t(<&>) p1 p2 = \ss1 = [ tuple \\ (ss2,r1) <- p1 ss1 , tuple <- p2 r1 ss2 ] (<+>)infixl 6::(Parser s(r->t))(Parser s r)->Parser s t(<+>) p1 p2 = \ss1 = [ (ss3,f r) \\ (ss2,f) <- p1 ss1 , (ss3,r) <- p2 ss2 ]

choose-combinator(<||>) infixr 4::(Parser s r) (Parser s r)->Parser s r(<||>) p1 p2 = \ss = p1 ss ++ p2 ss

Page 5: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 5

parser combinators 3

some useful abbreviations(@>) infixr 7(@>) f p :== yield f <+> p

(<:>) infixl 6(<:>) p1 p2 :== (\h t=[h:t]) @> p1 <+> p2

Page 6: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 6

parser combinators 4

Kleene starstar p = p <:> star p

<||> yield []

plus p = p <:> star p

parsing an identifieridentifier :: Parser Char String

identifier = toString @> satisfy isAlpha

<:> star (satisfy isAlphanum)

Page 7: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 7

parser combinators 5

context sensitive parserstwice the same character

doubleChar = satisfy isAlpha <&> \c -> symbol c

arbitrary look aheadlookAhead

= symbol 'a' +> symbol 'b' <||> symbol 'a' +> symbol 'c'

Page 8: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 8

parser combinators 5

context sensitive parserstwice the same character

doubleChar = satisfy isAlpha <&> \c -> symbol c

arbitrary look aheadlookAhead

= symbol 'a' +> symbol 'b' <||> symbol 'a' +> symbol 'c' <||> star (satisfy isSpace) +> symbol 'a' <||> symbol 'x'

Page 9: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 9

properties of combinators

+ concise and clear parsers+ full power of fpl available+ context sensitive + arbitrary look-ahead+ can be efficient, continuations IFL '98

- no error handling (messages & recovery)

- no unique symbol tables- separate scanner yields problems

scan entire file before parser starts

Page 10: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 10

Requirements parse state with

error file notion of position user-defined extension e.g. symbol table

possibility to add separate scanner efficient implementation, continuations

for programming languages we want a single result (deterministic grammar)

Page 11: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 11

Uniqueness

files and windows that should be single-threaded are unique in Clean

fwritec :: Char *File -> *File

data-structures can be updated destructively when they are unique

only unique arrays can be changed

Page 12: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 12

System-architecture

replace the list of symbols by a structure containing actual input position error administration user defined part of the state

use a type constructor class to allow multiple levels

Page 13: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 13

Type constructor class

Reading a symbolclass PSread ps s st :: (*ps s *st)->(s, *ps s *st)

Copying the state is not allowed,use functions to manipulate the input

class PSsplit ps s st :: (s, *ps s *st)->(s, *ps s *st)

class PSback ps s st :: (s, *ps s *st)->(s, *ps s *st)

class PSclear ps s st :: (s, *ps s *st)->(s, *ps s *st)

Minimal parser state requires Clean 2.0class ParserState ps symbol state

| PSread, PSsplit, PSback, PSclear ps symbol state

Page 14: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 14

New parser combinators

Parsers have three arguments1. success-continuation

determines action upon successSuccCont :== Item failCont State -> (Result, State)

2. fail-continuation specifies what to do if parser failsFailCont :== State -> (Result, State)

3. current input stateState :== (Symbol, ParserState)

Page 15: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 15

New parser combinators 2 yield and fail, apply appropriate

continuationyield r = \succ fail tuple = succ r fail tuple

failComb = \succ fail tuple = fail tuple

sequence of parsers, change continuation<&> p1 p2 = \sc fc t -> p1 (\a _ -> p2 a sc fc) fc t

choice, change continuations(<|>) p1 p2

= \succ fail tuple = p1 (\r f t = succ r fail (PSclear t)) (\t2 = p2 succ fail (PSback t2)) (PSsplit tuple)

Page 16: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 16

string input a very simple instance of ParserState:: *StringInput symbol state = { si_string :: String // string holds input , si_pos :: Int // index of current char , si_hist :: [Int] // to remember old positions , si_state :: state // user-defined extension , si_error :: ErrorState }instance PSread StringInput Char statewhere PSread si=:{si_string,si_pos} = (si_string.[si_pos],{si & si_pos = si_pos+1})

instance PSsplit StringInput Char statewhere PSsplit (c,si=:{si_pos,si_hist}) = (c,{si & si_hist = [si_pos:si_hist]})

instance PSback StringInput Char statewhere PSback (_,si=:{si_string,si_hist=[h:t]}) = (si_string.[h-1],{si & si_pos = h, si_hist = t})

Page 17: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 17

Separate scanner and parser sometimes it is convenient to have a

separate scannere.g. to implement the offside rule

task of scanner and parser is similar.So, use the same combinators

due to the type constructor class we can nest parser states

Page 18: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 18

a simple scanner

use of combinators doesn’t change produces tokens (algebraic datatype)

scanner = skipSpace +> ( generateOffsideToken <|> satisfy isAlpha <:> star (satisfy isAlphanum) <@ testReserved o toString <|> plus (satisfy isDigit)<@ IntToken o to_number 0 <|> symbol '=' <@ K EqualToken <|> symbol '(' <@ K OpenToken <|> symbol ')' <@ K CloseToken )

Page 19: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 19

generating offside tokens use an ordinary parse functiongenerateOffsideToken

= pAcc getCol <&> \col -> // get current coloumn

pAcc getOffside <&> \os_col -> // get offside position

handleOS col os_col

where

handleOS col os_col

| EndGroupGenerated os_col

| col < os_col

= pApp popOffside (yield EndOfGroupToken)

= pApp ClearEndGroup failComb

| col <= os_col

= pApp SetEndGroup (yield EndOfDefToken)

= failComb

Page 20: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 20

Parser state for nesting parser state contains scanner and its state:: *NestedInput token state = E. .ps sym scanState: { ni_scanSt :: (ps sym scanState) , ni_scanner :: (ps sym scanState) -> *(token, ps sym scanState)) , ni_buffer :: [token] , ni_history :: [[token]] , ni_state :: state }

can be nested to any depth we can, but doesn’t have to, use this

Page 21: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 21

Parser state for nesting 2

NestedInput

*File

*ErrorState

*OffsideState

ScanState

scanner

*HashTable

Page 22: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 22

Parser state for nesting 3

apply scanner to read token

instance PSread NestedState token state

where

PSread ns=:{ns_scanner, ns_scanSt}

# (tok, state) = ns_scanner ns_scanSt

= (tok, {ns & ns_scanSt = state})

here, we ignored the buffer define instances for other functions in class ParserState

Page 23: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 23

error handling general error correction is difficult correct simple errors skip to new definition otherwise

Good error messages: location: position in file what are we parsing: stack of

contexts

Error [t.icl,20,[caseAlt,Expression]]: ) expected instead of =

Page 24: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 24

error handling 2 basic error generationparseError expected val = \succ fail (t,ps) = let msg = toString expected +++ " expected instead of " +++ toString t in succ val fail (PSerror msg (PSread ps))

useful primitiveswantSymbol sym = symbol sym <|> parseError sym sym

want p msg value = p <|> parseError msg value

skipToSymbol sym = symbol sym <|> parseError sym sym +> star (satisfy ((<>) sym)) +> symbol sym

Page 25: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 25

Parser

Parsing expressionspExpression = "Expression" ::> BV @> match mBasicValue <|> pIdentifier <|> symbol CaseToken +> pDeter Case @> pCompoundExpression <+ wantSymbol OfToken <+> star pCaseAlt <+ skipToSymbol EndOfGroupToken

<|> symbol OpenToken +> pCompoundExpression <+ wantSymbol CloseToken

Page 26: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 26

identifiers in hashtable

use a parse-function hashtable is user defined state in

ParserStatepIdentifier

= match mIdentToken

<&> \ident = pAccSt (putNameInHashTable ident)

<@ \name={app_symb=UnknownSymbol name, app_args=[]}

the function pAccSt applies a function to the user defined state

Page 27: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 27

limitations of this approach syntax specified by parse

functions grammar is not a datastructure no detection of left recursion

runtime error instead of nice message no automatic left-factoring

do it by hand, or runtime overhead

p1 = p <&> q1 <|> p <&> q2

p2 = p <&> (q1 <|> q2)

Page 28: Layered Combinator Parsers with a Unique State

Parser Combinators Pieter Koopman 28

discussiondiscussion old advantages

concise, fpl-power, arbitrary look ahead, context sensitve

new advantages unique and extendable parser state one or more layers decent error handling,

simple error correction can be added still efficient, overhead < 2 non-determinism only when needed