Functional Programming guest lecture by Tim Sheard Parsing in Haskell Defining Parsing Combinators.

Post on 20-Jan-2016

TRANSCRIPT

  • Functional Programming: guest lecture by Tim Sheard
    Parsing in Haskell: Defining Parsing Combinators

  • Find these slides at www.cs.pdx.edu/~sheard/course/guest/ParsingInHaskell.ppt

    Example can be found at www.cs.pdx.edu/~sheard/course/guest/ParsingInHaskell.hs

  • Parsing
    Parsing is imposing tree structure on linear text (usually in strings or files).
    Plan of this lecture:
    Introduce the Parsec library
    Write some simple parsers
    Test them
    Define a simple version of the parsers to see how they work. Parsec is a much more sophisticated library.

  • Include the following
    module ParsingInHaskell where

    import Text.ParserCombinators.Parsec
    import Text.ParserCombinators.Parsec.Token
    import Text.ParserCombinators.Parsec.Language

  • Parsec
    Type: data Parser a =

    Function
    parse :: Parser b -> String -> String -> Either ParseError b

    run :: Show a => Parser a -> String -> IO ()
    run p input =
      case parse p "" input of
        Left err -> do { putStr "parse error at "; print err }
        Right x  -> print x

  • Operations
    char :: Char -> CharParser a Char

    string :: String -> CharParser a String

    satisfy :: (Char -> Bool) -> CharParser a Char

    (<|>) :: Parser c -> Parser c -> Parser c

  • test1
    test1 = do { string "A"; char ' '; string "big"; char ' '; string "cat" }

  • test2
    test2 = do { a

  • test3
    word s = lexeme haskell (string s)

    test3 = do { a

  • A Simple Grammar for English
    Example taken from Floyd & Beigel.

    I | we | you | he | she | it | they
    a | an | the
    me | us | you | him | her | it | them
    . . .

  • As a parsec grammar
    sentence = do { subject; verb; predicate }
    pronoun1 = word "I" <|> word "we" <|> word "you" <|> word "he" <|> word "she" <|> word "it" <|> word "they"
    pronoun2 = word "me" <|> word "us" <|> word "you" <|> word "him" <|> word "her" <|> word "it" <|> word "them"
    subject = pronoun1 <|> pronoun2
    article = word "a" <|> word "the"
    predicate = do { article; (noun <|> simpleNounPhrase) }
    adjective = word "red" <|> word "pretty"
    noun = word "cat" <|> word "ball"
    simpleNounPhrase = do { adjective; simpleNounPhrase } <|> return ""
    object = pronoun2 <|> nounPhrase
    nounPhrase = simpleNounPhrase <|> do { article; noun }
    verb = word "ate" <|> word "hit"

    test4 = run sentence "I hit the pretty red cat"

  • Some simple combinators
    many :: Parser c -> Parser [c]

    sepBy :: Parser c -> Parser d -> Parser [c]

    option :: a -> Parser a -> Parser a

    chainl1 :: GenParser a -> GenParser (a->a->a) -> GenParser a

    (chainl1 p op) parses one or more occurrences of p, separated by op. It returns the value obtained by a left-associative application of all functions returned by op to the values returned by p.
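
    The left-associative behaviour of chainl1 is easiest to see with subtraction, where grouping changes the answer. The following is a minimal sketch using Parsec's own chainl1; the names number, subtraction and evalSub are hypothetical helpers introduced here for illustration and do not come from the slides.

```haskell
import Text.ParserCombinators.Parsec

-- A run of decimal digits read as an Integer.
number :: Parser Integer
number = do { ds <- many1 digit; return (read ds) }

-- Numbers separated by '-', combined left-associatively:
-- "10-3-2" is read as (10-3)-2, not 10-(3-2).
subtraction :: Parser Integer
subtraction = chainl1 number (char '-' >> return (-))

-- Hypothetical helper: Nothing on a parse error, Just the value otherwise.
evalSub :: String -> Maybe Integer
evalSub s = either (const Nothing) Just (parse subtraction "" s)
```

    In GHCi, evalSub "10-3-2" gives Just 5. A right-associative reading would have produced 9.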

  • Making Parse Trees
    data Variable = Var String deriving (Show,Eq)

    data Expression
      = Constant Integer               -- 5
      | Contents Variable              -- x
      | Minus Expression Expression    -- x - 6
      | Greater Expression Expression  -- 6 > z
      | Times Expression Expression    -- x * y
      deriving (Show,Eq)

  • Variables
    parens x = between (char '(') (char ')') x

    pVar = lexeme haskell (do { c

  • Simple Terms
    simpleExp :: Parser Expression
    simpleExp = (do { n

  • Complex terms
    factor = chainl1 simpleExp (lexeme haskell (char '*') >> return Times)

    summand = chainl1 factor (lexeme haskell (char '-') >> return Minus)

    relation = chainl1 summand (lexeme haskell (char '>') >> return Greater)

    test4 = run pExp "x - 2 > 5"

  • Defining our own Type of a Parser
    data Parser a = Parser (String -> [(a,String)])

    A function inside a data definition.
    The output is a list of successful parses.
    This type can be made into a monad. A monad supplies the sequencing operator in Haskell.
    It can also be made into a monad with zero and plus (++).

  • Defining the Monad
    Technical details, can be ignored when using combinators
    instance Monad Parser where
      return v = Parser (\inp -> [(v,inp)])
      p >>= f = Parser (\inp -> concat [applyP (f v) out | (v,out) <- applyP p inp])

    instance MonadPlus Parser where
      mzero = Parser (\inp -> [])
      mplus (Parser p) (Parser q) = Parser (\inp -> p inp ++ q inp)

    instance Functor Parser where . . .

    where applyP undoes the constructor
    applyP (Parser f) x = f x
    Note the comprehension syntax
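
    Current versions of GHC require Functor and Applicative instances before a Monad instance is accepted, and MonadPlus now sits on top of Alternative. The sketch below shows one way the list-of-successes parser might be written under those rules. The instance bodies are an assumption consistent with the definitions on this slide, not the lecture's own file, and twoItems is a hypothetical example parser.

```haskell
import Control.Applicative (Alternative (..))
import Control.Monad (MonadPlus)

data Parser a = Parser (String -> [(a, String)])

-- applyP undoes the constructor.
applyP :: Parser a -> String -> [(a, String)]
applyP (Parser f) = f

instance Functor Parser where
  fmap g p = Parser (\inp -> [(g v, out) | (v, out) <- applyP p inp])

instance Applicative Parser where
  pure v = Parser (\inp -> [(v, inp)])
  pf <*> pa = Parser (\inp ->
    [(g v, out') | (g, out) <- applyP pf inp, (v, out') <- applyP pa out])

instance Monad Parser where
  -- Run p, then feed each result and its leftover input to f.
  p >>= f = Parser (\inp -> concat [applyP (f v) out | (v, out) <- applyP p inp])

instance Alternative Parser where
  empty = Parser (const [])                                 -- always fails
  Parser p <|> Parser q = Parser (\inp -> p inp ++ q inp)   -- all results of both

instance MonadPlus Parser  -- mzero and mplus default to empty and (<|>)

-- Consume one character; fail on empty input.
item :: Parser Char
item = Parser (\inp -> case inp of
  ""       -> []
  (x : xs) -> [(x, xs)])

-- Hypothetical example: two characters in sequence, returned as a pair.
twoItems :: Parser (Char, Char)
twoItems = do { a <- item; b <- item; return (a, b) }
```

    For example, applyP twoItems "abc" yields [(('a','b'),"c")], while applyP twoItems "a" yields [] because the second item has nothing left to consume.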

  • Typical Parser
    Because the parser is a monad, we can use the do syntax.

    do { x1

  • Running the Parser

    papply :: Parser a -> String -> [(a,String)]
    papply p = applyP (do {junk; p})

    junk skips over white space and comments. We'll see how to define it later

  • Simple Primitives
    applyP :: Parser a -> String -> [(a,String)]
    applyP (Parser p) = p

    item :: Parser Char
    item = Parser (\inp -> case inp of
                     ""     -> []
                     (x:xs) -> [(x,xs)])

    sat :: (Char -> Bool) -> Parser Char
    sat p = do {x

  • Examples
    ? papply item "abc"
    [('a',"bc")]

    ? papply (sat isDigit) "123"
    [('1',"23")]

    ? papply (sat isDigit) "abc"
    []

  • Useful Parsers
    char :: Char -> Parser Char
    char x = sat (x ==)

    digit :: Parser Int
    digit = do { x

  • Examples
    char x = sat (x ==)

    ? papply (char 'z') "abc"
    []

    ? papply (char 'a') "abc"
    [('a',"bc")]

    ? papply digit "123"
    [(1,"23")]

    ? papply upper "ABC"
    [('A',"BC")]

    ? papply lower "ABC"
    []

  • More Useful Parsers
    letter :: Parser Char
    letter = sat isAlpha

    Can even use recursion
    string :: String -> Parser String
    string "" = return ""
    string (x:xs) = do {char x; string xs; return (x:xs) }

    Helps define even more useful parsers
    identifier :: Parser String
    identifier = do {x

  • Examples
    ? papply (string "tim") "tim is red"
    [("tim"," is red")]

    ? papply identifier "tim is blue"
    [("tim"," is blue")]

    ? papply identifier "x5W3 = 12"
    [("x5W3"," = 12")]

  • Choice -- 1 parser or another
    Note that the ++ operator (from MonadPlus) gives non-deterministic choice.

    instance MonadPlus Parser where
      (Parser p) ++ (Parser q) = Parser (\inp -> p inp ++ q inp)

    Sometimes we'd like to prefer one choice over another, and take the second only if the first fails.

    We don't need an explicit sequencing operator because the monad sequencing plays that role.

  • Efficiency
    force :: Parser a -> Parser a
    force p = Parser (\inp -> let x = applyP p inp
                              in (fst (head x), snd (head x)) : tail x)

    Deterministic Choice
    (+++) :: Parser a -> Parser a -> Parser a
    p +++ q = Parser (\inp -> case applyP (p `mplus` q) inp of
                                []     -> []
                                (x:xs) -> [x])

  • Example

    ? papply (string "x" +++ string "b") "abc"
    []

    ? papply (string "x" +++ string "b") "bcd"
    [("b","cd")]
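
    The difference between the two kinds of choice shows up when both alternatives can succeed on the same input. The sketch below is self-contained and deliberately avoids the monad instance: prefix is a hypothetical stand-in for the slides' string parser, and plus plays the role of mplus.

```haskell
import Data.List (stripPrefix)

data Parser a = Parser (String -> [(a, String)])

applyP :: Parser a -> String -> [(a, String)]
applyP (Parser f) = f

-- Hypothetical stand-in for `string`: match a fixed prefix of the input.
prefix :: String -> Parser String
prefix s = Parser (\inp -> case stripPrefix s inp of
  Just rest -> [(s, rest)]
  Nothing   -> [])

-- Non-deterministic choice: keep the results of both parsers.
plus :: Parser a -> Parser a -> Parser a
plus (Parser p) (Parser q) = Parser (\inp -> p inp ++ q inp)

-- Deterministic choice: keep only the first result, if there is one.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser (\inp -> case applyP (p `plus` q) inp of
  []      -> []
  (x : _) -> [x])
```

    Here applyP (prefix "a" `plus` prefix "ab") "abc" returns both parses, [("a","bc"),("ab","c")], whereas applyP (prefix "a" +++ prefix "ab") "abc" commits to the first and returns [("a","bc")].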

  • Sequences (more recursion)
    many :: Parser a -> Parser [a]
    many p = force (many1 p +++ return [])

    many1 :: Parser a -> Parser [a]
    many1 p = do {x

    sepby :: Parser a -> Parser b -> Parser [a]
    p `sepby` sep = (p `sepby1` sep) +++ return []

    sepby1 :: Parser a -> Parser b -> Parser [a]
    p `sepby1` sep = do { x

  • Example
    ? papply (many (char 'z')) "zzz234"
    [("zzz","234")]

    ? papply (sepby (char 'z') spaceP) "z z z 34"
    [("zzz"," 34")]
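
    The repetition combinators can be tried out with a small self-contained sketch. The bodies of many1 and sepby1 below are the usual mutually recursive definitions and are an assumption, since the slide text for them is incomplete. force is left out because it only affects laziness, and sat is written directly on the input rather than through item. The Functor and Applicative instances are derived from the Monad instance to keep the sketch short.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isSpace)

data Parser a = Parser (String -> [(a, String)])

applyP :: Parser a -> String -> [(a, String)]
applyP (Parser f) = f

instance Functor Parser where fmap = liftM
instance Applicative Parser where
  pure v = Parser (\inp -> [(v, inp)])
  (<*>) = ap
instance Monad Parser where
  p >>= f = Parser (\inp -> concat [applyP (f v) out | (v, out) <- applyP p inp])

-- Deterministic choice: the first result of p, otherwise the first result of q.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser (\inp -> take 1 (applyP p inp ++ applyP q inp))

-- One character satisfying the predicate.
sat :: (Char -> Bool) -> Parser Char
sat ok = Parser (\inp -> case inp of
  (x : xs) | ok x -> [(x, xs)]
  _               -> [])

char :: Char -> Parser Char
char x = sat (x ==)

-- Zero or more occurrences of p.
many :: Parser a -> Parser [a]
many p = many1 p +++ return []

-- One or more occurrences of p.
many1 :: Parser a -> Parser [a]
many1 p = do { x <- p; xs <- many p; return (x : xs) }

-- Zero or more occurrences of p separated by sep; separators are dropped.
sepby :: Parser a -> Parser b -> Parser [a]
p `sepby` sep = (p `sepby1` sep) +++ return []

sepby1 :: Parser a -> Parser b -> Parser [a]
p `sepby1` sep = do { x <- p; xs <- many (do { _ <- sep; p }); return (x : xs) }

spaceP :: Parser ()
spaceP = do { _ <- many1 (sat isSpace); return () }
```

    With these definitions, applyP (sepby (char 'z') spaceP) "z z z 34" gives [("zzz"," 34")]: the space before 34 stays in the remaining input because the separator is only consumed when another 'z' follows it.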

  • Sequences separated by operators

    chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
    chainl p op v = (p `chainl1` op) +++ return v

    chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
    p `chainl1` op = do {x
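
    One common way to write chainl1 for the list-of-successes parser is with a local loop that carries the value accumulated so far. The sketch below assumes that formulation; the helper name rest and the example parsers nat and minus are hypothetical, and sat is written directly on the input to keep the block short.

```haskell
import Control.Monad (ap, liftM)
import Data.Char (isDigit, ord)

data Parser a = Parser (String -> [(a, String)])

applyP :: Parser a -> String -> [(a, String)]
applyP (Parser f) = f

instance Functor Parser where fmap = liftM
instance Applicative Parser where
  pure v = Parser (\inp -> [(v, inp)])
  (<*>) = ap
instance Monad Parser where
  p >>= f = Parser (\inp -> concat [applyP (f v) out | (v, out) <- applyP p inp])

-- Deterministic choice: the first result of p, otherwise the first result of q.
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = Parser (\inp -> take 1 (applyP p inp ++ applyP q inp))

sat :: (Char -> Bool) -> Parser Char
sat ok = Parser (\inp -> case inp of
  (x : xs) | ok x -> [(x, xs)]
  _               -> [])

-- A single decimal digit as an Int.
digit :: Parser Int
digit = do { x <- sat isDigit; return (ord x - ord '0') }

-- One or more p separated by op, combined left-associatively.
chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do { x <- p; rest x }
  where
    -- Try to extend the accumulated value x; stop when op or p fails.
    rest x = (do { f <- op; y <- p; rest (f x y) }) +++ return x

-- Like chainl1, but yields the default v when no p is present.
chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op v = (p `chainl1` op) +++ return v

-- Hypothetical examples: digits chained by an operator that consumes nothing,
-- and naturals chained by '-'.
nat :: Parser Int
nat = digit `chainl1` return (\m n -> 10 * m + n)

minus :: Parser Int
minus = nat `chainl1` (do { _ <- sat (== '-'); return (-) })
```

    applyP nat "123 abc" gives [(123," abc")], and applyP minus "10-3-2" gives [(5,"")], confirming the left-to-right grouping.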

  • Tokens and Lexical Issues
    spaceP :: Parser ()
    spaceP = do {many1 (sat isSpace); return ()}

    comment :: Parser ()
    comment = do {string "--"; many (sat p); return ()}
      where p x = x /= '\n'

    junk :: Parser ()
    junk = do {many (spaceP +++ comment); return ()}

    A Token is any parser followed by optional white space or a comment

    token :: Parser a -> Parser a
    token p = do {v

  • Using Tokens
    symb :: String -> Parser String
    symb xs = token (string xs)

    ident :: [String] -> Parser String
    ident ks = do { x 10*m + n)

  • Example
    ? papply (token (char 'z')) "z 123"
    [('z',"123")]

    ? papply (symb "tim") "tim is cold"
    [("tim","is cold")]

    ? papply natural "123 abc"
    [(123," abc")]

    ? papply (many identifier) "x d3 23"
    [(["x"]," d3 23")]

    ? papply (many (token identifier)) "x d3 23"
    [(["x", "d3"],"23")]

  • More Parsers
    int :: Parser Int
    int = token integer

    integer :: Parser Int
    integer = (do {char '-'; n

  • Example: Parsing Expressions
    data Term = Add Term Term
              | Sub Term Term
              | Mult Term Term
              | Div Term Term
              | Const Int

    addop :: Parser (Term -> Term -> Term)
    addop = do {symb "+"; return Add} +++ do {symb "-"; return Sub}

    mulop :: Parser (Term -> Term -> Term)
    mulop = do {symb "*"; return Mult} +++ do {symb "/"; return Div}

  • Constructing a Parse tree
    expr :: Parser Term
    addop :: Parser (Term -> Term -> Term)
    mulop :: Parser (Term -> Term -> Term)

    expr = term `chainl1` addop
    term = factor `chainl1` mulop
    factor = (do { n
  • Array Based Parsers
    type Subword = (Int,Int)

    newtype P a = P (Array Int Char -> Subword -> [a])
    unP (P z) = z

    emptyP :: P ()
    emptyP = P f
      where f z (i,j) = [() | i == j]

    notchar :: Char -> P Char
    notchar s = P f
      where f z (i,j) = [z!j | i+1 == j, z!j /= s]

    charP :: Char -> P Char
    charP c = P f
      where f z (i,j) = [c | i+1 == j, z!j == c]

  • anychar :: P Char
    anychar = P f
      where f z (i,j) = [z!j | i+1 == j]

    anystring :: P(Int,Int)
    anystring = P f
      where f z (i,j) = [(i,j) | i

    symbol :: String -> P (Int,Int)
    symbol s = P f
      where f z (i,j) = if j-i == length s then [(i,j)| and [z!(i+k) == s!!(k-1) | k

  • Combinators
    infixr 6 |||
    (|||) :: P b -> P b -> P b
    (|||) (P r) (P q) = P f
      where f z (i,j) = r z (i,j) ++ q z (i,j)

    infix 8

  • run :: String -> P b -> [b]
    run s (P ax) = ax (s2a s) (0,length s)

    s2a s = (array bounds (zip [1..] s))
      where bounds = (1,length s)

    instance Monad P where
      return x = P (\ z (i,j) -> if i==j then [x] else [])
      (>>=) (P f) g = P h
        where h z (i,j) = concat[ unP (g a) z (k,j) | k

  • Examples
    p1 = do { symbol "tim"; c ex4
    "5"

    Main> ex5
    ""

  • Exercise in class
    Write a parser for regular expressions
