solving an old problem: how do we get a stack trace in a lazy functional language? simon marlow

Solving an old problem:How do we get a stack trace in a lazy functional

language?

Simon Marlow

Motivation

General problem

• Maintaining a link between the original source code and the final executable code– In such a way that we can still optimise– Important for profiling: not much point in profiling

an unoptimised program– Important (but a bit less so) for debugging

Stack traces

• A stack trace is a great point in the design space– Abstraction of execution that is• cheap to maintain (sometimes even free)• gives great insight for debugging: “how did I get here?”• is ideal for profiling too:

– divides up the profile as a tree– allows aggregation both bottom-up, and across the tree: “find

the total cost of all calls to map and their children”

But.. lazy evaluation?

• In Haskell we’re envious of strict languages for their easy access to stack traces– and nothing else – In Python, Erlang, etc. every exception prints out the stack trace by

default• zero effort by the programmer, “always on”• Even if you’re just a visitor to a web site written in that language

• Claim:– the stack traces you get for free in a strict functional language are

not useful enough• interference from tail calls• functional abstractions lead to strange results

– So in practice you want some explicitly-managed stack trace anyway

More motivation

• In GHC we had (at least) three users for this functionality:– Profiling– Debugging (in GHCi)– Coverage analysis– (maybe) Flat profiling: “where is the program counter?”

• We had two forms of Core annotation– one for profiling (SCC)– one used by Coverage and the debugger (Tick), but it was

wrong in the case of the debugger• Goal: use a unified infrastructure

Constraints

• Manageable overhead– e.g. 2x for profiling– less for flat profiling, more for coverage– results consistent with optimised compilation – i.e. the

profile has the “same shape”• Sensible, predictable semantics– you get the stack trace you expect– simple refactorings don’t have surprising effects– all the usual Core-to-Core transformations apply and don’t

mess up the results• Ideally: always-on, low usage barrier

A source-language annotation

• “push label L on the stack while evaluating E”• (precise semantics later)• Main point: this is a construct of the source language and the

intermediate language (Core)• Compiler can add these automatically, or the user can add them• We get to choose how detailed we want to be:

– exported functions only– top-level functions only– all functions (good for profiling)– call sites (good for debugging)– all sub-expressions (fine-grained debugging or profiling)

• Similar to SCC in GHC now

push L E

Another annotation

• “Count evaluations of E; assign the result to label L”– for coverage, we will use tick exclusively.– for debugging and space profiling, just use push.– for time profiling, we often want to count entries

too, so we can use both push and tick, with "tick L E" meaning 'bump the count of L pushed on the current stack".

tick L E

Why separate tick and push?

• They have different semantics, and therefore support different transformations.

Semantics of tick

• A valid transformation is one that maintains the counts.

𝐶 [𝑡𝑖𝑐𝑘𝐿𝐸 ]𝐿++¿→𝐶 [𝐸 ]¿

Tick is friendly to the optimiser

• If we’re definitely going to tick L later, we can do it now.

• Allows the optimiser to bring together subexpressions for– beta-reduction– case-of-known-constructor

(tick L f) x -> tick L (f x)

case tick L E of { .. } -> tick L (case E of { .. })

But tick is difficult to get rid of

• Cannot reduce tick L x• Cannot do anything with \x. tick L \y. E• Unlike push, tick “sticks” and cannot be

completely optimised away, even if it surrounds code with zero cost.– Another reason to separate tick from push: don’t

use tick if you don’t have to– We added an option to GHC’s profiler to switch off

tick

Changing the number of evals

• Does GHC ever change the number of times an expression is evaluated, compared to the standard lazy semantics?– 0 -> 1– 1 -> many– many -> 1– 1 -> 0 (oh no!)

0-> 1 and 1 -> many

• Cost must be bounded & small• eta expansion is often beneficial

• cost of doing “a +# b” is far less than building the closure for “\y. case ..” and applying it.

• But if there’s a tick in there, we cannot do this optimisation because it would change the counts

• -fno-state-hack!

λ x. case a +# b of c -> λ y. ... -> λ x . λ y . case a +# b of c -> ...

many -> 1

• Does GHC ever reduce the number of evals?– yes of course: full laziness

– We do not automatically disable full laziness when using ticks:• FL can make a big difference to runtime• the user gets to see how many times the expression was really evaluated.

– Tradeoff between predictability and being faithful to the real operational behaviour• Right now: -fno-state-hack –fno-full-laziness gives predictable results• Results are always correct for zero/non-zero (coverage works)

λ x . f x (tick L (fib y)) -> let z = tick L (fib y) in λ x . f x z

tick was straightforward, now for push

• “push label L on the stack while evaluating E”• So let’s write a semantics for that• Define stacks:

push L E

type Stack = [Label]push :: Label -> Stack -> Stackcall :: Stack -> Stack -> Stack

stack at the call site stack of the function

stack for the call

Executable semantics

eval :: Stack -> Expr -> E (Stack,Expr)

eval stk (EInt i) = return (stk, EInt i)eval stk (ELam x e) = return (stk, ELam x e)

eval stk (EPush l e) = eval (push l stk) e

eval stk (ELet (x,e1) e2) = do insertHeap x (stk,e1) eval stk e2

eval stk (EPlus e1 e2) = do (_,EInt x) <- eval stk e1 (_,EInt y) <- eval stk e2 tick stk return (stk, EInt (x+y))

eval stk (EApp f x) = do (lam_stk, ELam y e) <- eval stk f eval lam_stk (subst y x e)

current stack

E is a State monad containing the Heap: a mapping from Var

to (Stack,Expr)

Values are straightforward

push L on the stack, evaluate

the bodysuspend the computation e1 on the

heap, capture the current stack

A side effect, used to record the value of the

current stack

Application continues with the stack returned

by evaluating the lambda

Executable semantics (variables)

eval stk (EVar x) = do r <- lookupHeap x case r of (stk', EInt i) -> return (stk', EInt i) (stk', ELam y e) -> return (call stk stk’, ELam y e) (stk',e) -> do deleteHeap x (stkv, v) <- eval stk' e insertHeap x (stkv,v) eval stk (EVar x) Here’s where we are

“calling” a function

Given this semantics, define push & call

• The problem now is to find suitable definitions of push and call that– Behave like a call stack– Have nice properties: • transformation-friendly• predictable/robust• implementable

Aside: cost-centres

• This is a slight generalisation of Sansom’s cost-centre profiling semantics, where– stacks have one element– push L S = [L]– call Sapp Slam = Sapp “lexical scoping”

– call Sapp Slam = Slam “evaluation scoping”– (Sansom used a hybrid scheme, choosing between

the two alternatives in different situations)

Aside(2): lazy evaluation is not the problem

• Lazy evaluation is dealt with by– capturing the current stack

when we suspend a computation as a thunk in the heap

– temporarily restoring the stack when the thunk is evaluated

• Nothing controversial at all – we just need a mechanism for capturing and restoring the stack.

eval stk (ELet (x,e1) e2) = do insertHeap x (stk,e1) eval stk e2

eval stk (EVar x) = do r <- lookupHeap x case r of ... (stk',e) -> do deleteHeap x (stkv, v) <- eval stk' e insertHeap x (stkv,v) eval stk (EVar x)

Aside(3): tail calls

• The semantics says nothing about tail calls – there is no call stack, so no way to express the difference.

• Any implementation should respect the semantics.

Examples

• The heap is initialised with the top-level bindings (give each the stack <CAF>)

• When we get to (f y), current stack is <main>• f is already evaluated• call <main> <CAF> = <main>• eval <main> (push f y+y)• eval <main,f> (y+y)• at the +, the current stack is <main,f>

f = λ x. push “f” x+x

main = λ x. push “main” let y = 1 in f y

Let’s assume, for now,call Sapp Slam = Sapp

Use the call-site stack?

• Previous example suggests this might be a good choice?

• After all, this gives exactly the call stack you would get in a strict language

call sapp slam = sapp

But we have to be careful

• If instead of this:

• We wrote this:

• Now it doesn’t work so well: the “f” label is lost.• In this semantics, the scope of push does not

extend into lambdas

f = λ x. push “f” x+x


f = push “f” (λ x . x+x)


Just label all the lambdas?

• Idea: make the compiler label all the lambdas automatically

• e.g. the compiler inserts a push inside any lambda:

• Now we get a useful stack again: <main,f1>

f = push “f” (λ x . push “f1” x+x)


Properties

• This semantics has some nice properties.

push L x => x

push L (λ x . e) => λ x . epush L (C x1 .. xn) => C x1 .. xn

let x = λ y . e in push L e' => push L (let x = λ y . e in e') push L (let x = e in e') => let x = push L e in push L e

O(1) change to cost attribution, no change to profile shape

Note if e is a value, the push L will disappear

since the stack attached to a lambda is irrelevant (except for heap profiling)

Inlining

• We expect to be able to substitute a function’s definition for its name without affecting the stack. e.g.

• should be the same as

• and indeed it is in this semantics.– (inlining functions is crucial for optimisation in GHC)

f = λ x . push “f1” x+x


main = λ x. push “main” let y = 1 in (λ x . push “f1” x+x) y

More properties

• Adding an extra binding doesn’t change the stack

• recall ‘push L x == x’• arguably useful: the stack is robust with

respect to this transformation (by the compiler or user)


g = push “g” f

main = λ x. push “main” let y = 1 in g y

But...

• eta-expansion loses this property

• Now the stack at the + will be <main,g,f>


g = λ x . push “g” f x

main = λ x. push “main” let y = 1 in g y

Concrete example

• When we tried this for real, we found that in functions like

• h does not appear on the stack, although in

• now it does. This is surprising and undesirable.

h = f . g

h x = (f . g) x

Worse...

• Let’s make a state monad:

newtype M s a = M { unM :: s -> (s,a) }

instance Monad (M s) where (M m) >>= k = M $ λ s -> case m s of (s',a) -> unM (k a) s' return a = M $ λ s -> (s,a)

errorM :: String -> M s aerrorM s = M $ λ _ -> error s

runM :: M s a -> s -> arunM (M m) s = case m s of (_,a) -> a

Suppose we want the stack when error is

called, for debugging

Using a monad

• Simple example:

• We are looking for a stack like <main,runM,bar,mapM,foo,errorM>

• Stack we get: <runM>

main = print (runM (bar ["a","b"]) "state")

bar :: [String] -> M s [String]bar xs = mapM foo xs

foo :: String -> M s Stringfoo x = errorM x

Why?

• Take a typical monadic function:

• Desuraging gives

• Adding push:

• Expanding out (>>):

• recall that push L (λ x . e) = λ x. e

f = do p; q

f = p >> q

f = push “f” (p >> q)

f = push “f” (λ s -> case p s of (a,s’) -> b s’)

The IO monad

• In GHC the IO monad is defined like the state monad given earlier.

• We found that with this stack semantics, we get no useful stacks for IO monad code at all.

• When profiling, all the costs were attributed to main.

call Sapp Slam = Sapp?

• Summary:– attractive: easy to explain, easy to implement– but results are sometimes surprising, at worst

useless– Note: nothing to do with tail-calls or laziness– don’t strict languages have the same problem?

Seems to be a consequence of functional abstractions.

Can we find a better semantics?

• call Sapp Slam = ?

• non-starter: call Sapp Slam = Slam

– ignores the calling context– gives a purely lexical stack, not a call stack– (possibly useful for flat profiling though)

• Clearly we want to take into account both Sapp and Slam somehow.

Think about what properties we want

• Push inside lambda:

– (recall that the previous semantics allowed dropping the push here)

– This will give us a push that scopes over the inside of lambdas, not just outside.• which will in turn give us that stacks are robust to eta-

expansion/contraction

push L (λ x. e) == λ x. push L e

What does it take to make this true?

• Consider

• If we work through the details, we find that we need

• Not difficult: e.g.

let f = push “f” λ x . e in ... f ...

call S (push L Sf) == push L (call S Sf)

let f = λ x . push “f” e in ... f ...

type Stack = [Label]push = (:)call = foldr push

like flip (++), but useful to define it

this way

Recursion?

• We do want finite stacks– the mutator is using tail recursion

• Simplest approach: push is a no-op if the label is already on the stack somewhere:

• still satisfies the push-inside-lambda property• but: not so good for profiling or debugging– the label on top of the stack is not necessarily

where the program counter is

push l s | l `elem` s = s | otherwise = l : s

Inlining of functions

• (remember, allowing inlining is crucial)• Consider

• Work through the details, and we need that

• interesting: calling a function whose stack is a prefix of the current stack should not change the stack.

let g = λ x.e inlet f = push “f” g inf y

let f = push “f” λ x.e inf y

call (push L S) S == push L S

Inlining

• if push == (:), this doesn’t hold• does hold if push ignores duplicate labels• The definitions I want to use:

• more useful for profiling/debugging: the top-of-stack label is always correct, we just truncate the stack on recursion.

call (push L S) S == push L S

call Sapp Slam = foldr push Spre Slam’ where (Spre, Sapp’, Slam’) = commonPrefix Sapp Slam

push l s | l `elem` s = dropWhile (/= l) s | otherwise = l : s

Break out the proof tools

• QuickCheck.

• but this corresponds to something very strange:

prop_append2 = forAllShrink stacks shrinkstack $ \s -> forAll Main.labels $ \x -> call (s `push` x) s == s `push` x

*** Failed! Falsifiable (after 8 tests and 2 shrinks): (E :> "e") :> "b""e"

push “f”...let g = λ x.e inlet f = push “f” g inf y

A more restricted property

• This is a limited form of the real property we need for inlining

• The push-inside-lambda property behaves similarly: we need to restrict the use of duplicate labels to make it go through.

prop_stack2a = forAllShrink stacks shrinkstack $ \s -> forAll Main.labels $ \x -> x `elemstack` s || call (s `push` x) s == s `push` x

*Main> quickCheck prop_stack2a+++ OK, passed 100 tests.

Open questions

• Can we formulate the properties in such a way that admits a useful definition of call/push?

• Are there other, more detailed, definitions of call/push that we can use?– e.g. Allwood’s “finding the needle”, where

recursion results in part of the stack being elided as “...”

Status

• GHC 7.4.1 has a new implementation of profiling and coverage using push & tick– Optimiser assumes that inlining and push-inside-lambda are safe, along

with a collection of other transformations– compile-time switchable between two semantics for call/push:

1. drop duplicate labels2. (default) call using “common-prefix”, push truncates on recursion

– a bit naughty: we know that (1) is correct but not very useful, (2) is more useful and we think correct in the cases we care about

– Results look good• we’ve already found and fixed performance bugs in GHC using the new profiler

– Have a bunch of test cases to verify that we get the same stacks with and without optimisation

– Profiling now works on parallel programs too

Programmatic access to stack trace

• The GHC.Stack module provides runtime access to the stack trace

• On top of which is built this:

• e.g. now when GHC panics it emits a stack trace (if it was compiled with profiling)

-- | like 'trace', but additionally prints a call stack if one is-- available.traceStack :: String -> a -> a

solving an old problem: how do we get a stack trace in a lazy functional language? simon marlow

Documents

profiling debugging

motivation slide

profiling scc

flat profiling

space profiling

stack traces

time profiling

current stack