functional programming concepts for data processing · fp is naturally suited for expressing...

70
Functional Programming Concepts for Data Processing Chris Lindholm UCAR SEA 2018 1 / 70

Upload: others

Post on 08-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Functional Programming Concepts for Data Processing

Chris Lindholm UCAR SEA 2018

1 / 70

Page 2: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

About MeI work at CU LASP

I work primarily with Scala

I teach Haskell at work

Goal: leverage FP to improve scientific software

2 / 70

Page 3: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

The PlanConvince you Functional Programming is worth learning about

Introduce Functional Programming

Refactor some code to be more Functional

3 / 70

Page 4: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Why Functional Programming?

4 / 70

Page 5: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Why Functional Programming?FP helps us write good software

5 / 70

Page 6: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Side E�ects

6 / 70

Page 7: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Side E�ectsSide effects are difficult to reason about

7 / 70

Page 8: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Side E�ectsSide effects are difficult to reason about

Side effects are difficult to test

8 / 70

Page 9: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Side E�ectsSide effects are difficult to reason about

Side effects are difficult to test

Side effects are difficult to scale

9 / 70

Page 10: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Side E�ectsSide effects are difficult to reason about

Side effects are difficult to test

Side effects are difficult to scale

Side effects are unnecessary

10 / 70

Page 11: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Why Functional Programming?FP helps us write good software

FP is naturally suited for expressing data-oriented workflows

11 / 70

Page 12: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Data Processing is Function-yFunctions are math's way of processing data.

12 / 70

Page 13: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Data Processing is Function-yFunctions are math's way of processing data.

We can build pipelines with function composition and application.

13 / 70

Page 14: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Data Processing is Function-yFunctions are math's way of processing data.

We can build pipelines with function composition and application.

a :: Ad :: D -- D -- \f :: A -> B -- Eg :: B -> C -- /h :: C -> D -> E -- A -> B -> C

14 / 70

Page 15: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Data Processing is Function-yFunctions are math's way of processing data.

We can build pipelines with function composition and application.

a :: Ad :: D -- D -- \f :: A -> B -- Eg :: B -> C -- /h :: C -> D -> E -- A -> B -> C

(g . f) :: A -> C -- (.) :: (b -> c) -> (a -> b) -> (a -> c)

(h . g . f) :: A -> D -> E

(h . g . f) a d :: E

15 / 70

Page 16: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Why Functional Programming?FP helps us write good software

FP is naturally suited for expressing data-oriented workflows

Learning FP makes you a better programmer

16 / 70

Page 17: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Introduction to Functional Programming

17 / 70

Page 18: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Imperative ProgrammingExpress computations using statements that perform side effects.

18 / 70

Page 19: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Imperative ProgrammingExpress computations using statements that perform side effects.

// Declare a variable to mutate later.int var;

// Change control flow depending on p.if (p) { var = 0; // If p is true, execute a statement that mutates var.} else { var = 1; // If p is false, execute a statement that mutates var.}

19 / 70

Page 20: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Imperative ProgrammingExpress computations using statements that perform side effects.

// Declare a variable to mutate later.int var;

// Change control flow depending on p.if (p) { var = 0; // If p is true, execute a statement that mutates var.} else { var = 1; // If p is false, execute a statement that mutates var.}

Functional ProgrammingExpress computations using expressions that evaluate to values.

20 / 70

Page 21: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Imperative ProgrammingExpress computations using statements that perform side effects.

// Declare a variable to mutate later.int var;

// Change control flow depending on p.if (p) { var = 0; // If p is true, execute a statement that mutates var.} else { var = 1; // If p is false, execute a statement that mutates var.}

Functional ProgrammingExpress computations using expressions that evaluate to values.

-- Declare var to be 0 if p is true, 1 otherwise.var = if p then 0 else 1

21 / 70

Page 22: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Lambda CalculusThe theory of computation underlying functional programming.

22 / 70

Page 23: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Lambda CalculusThe theory of computation underlying functional programming.

Lambda calculus has only

VariablesFunctionsExpressions

but can express anything that is computable.

23 / 70

Page 24: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

(λpq. pqp)

AND

(λxy. x)

TRUE

(λab. b)

FALSE

TRUE ∧ FALSE = FALSE

(λpq. pqp)(λxy. x)(λab. b) →β λab. b

24 / 70

Page 25: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

(λpq. pqp)(λxy. x)(λab. b)

→β (λ[p := (λxy. x)]q. pqp)(λab. b)

→β (λq. (λxy. x)q(λxy. x))(λab. b)

→β λ[q := (λab. b)]. (λxy. x)q(λxy. x)

→β (λxy. x)(λab. b)(λxy. x)

→β (λ[x := (λab. b)]y. x)(λxy. x)

→β (λy. (λab. b))(λxy′. x)

→β (λ[y := (λxy′. x)]. (λab. b))

→β λab. b

25 / 70

Page 26: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Pure FunctionsPure functions take inputs, produce outputs using only those inputs, and donothing else.

26 / 70

Page 27: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Pure FunctionsPure functions take inputs, produce outputs using only those inputs, and donothing else.

def pure(x): return x + 1

27 / 70

Page 28: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Pure FunctionsPure functions take inputs, produce outputs using only those inputs, and donothing else.

def pure(x): return x + 1

def maybe_pure(x): return f(x) + 1

28 / 70

Page 29: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

First-class FunctionsFirst-class functions are values rather than syntactic constructs.

29 / 70

Page 30: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

First-class FunctionsFirst-class functions are values rather than syntactic constructs.

f = lambda x: x + 1

30 / 70

Page 31: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

First-class FunctionsFirst-class functions are values rather than syntactic constructs.

f = lambda x: x + 1

(+) :: Int -> Int -> Int

addOne :: Int -> IntaddOne = (+) 1

31 / 70

Page 32: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Higher-order FunctionsHigher-order functions operate on other functions.

32 / 70

Page 33: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Higher-order FunctionsHigher-order functions operate on other functions.

(.) :: (b -> c) -> (a -> b) -> (a -> c)

33 / 70

Page 34: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Referential TransparencyA referentially transparent expression is one that can be replaced with theresult of its evaluation and not change the meaning of the program.

34 / 70

Page 35: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Referential TransparencyA referentially transparent expression is one that can be replaced with theresult of its evaluation and not change the meaning of the program.

Expressions that are not referentially transparent have side effects.

35 / 70

Page 36: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Are these the same?def program1(): # expr is the expression we're testing expr = 1 return (expr, expr)

def program2(): return (1, 1)

36 / 70

Page 37: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Are these the same?def program1(): # expr is the expression we're testing expr = 1 return (expr, expr)

def program2(): return (1, 1)

>>> program1()(1,1)>>> program2()(1,1)

37 / 70

Page 38: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Are these the same?def program1(): expr = datetime.now() return (expr, expr)

def program2(): return (datetime.now(), datetime.now())

38 / 70

Page 39: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Are these the same?def program1(): expr = datetime.now() return (expr, expr)

def program2(): return (datetime.now(), datetime.now())

>>> program1()(datetime.datetime(2018, 4, 1, 17, 16, 17, 489959), datetime.datetime(2018, 4, 1, 17, 16, 17, 489959))>>> program2()(datetime.datetime(2018, 4, 1, 17, 16, 19, 304800), datetime.datetime(2018, 4, 1, 17, 16, 19, 304815))

39 / 70

Page 40: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Are these the same?def my_length(x): res = 0 for i in range(0, len(x)): res += 1 return res

def program1(): expr = my_length([1,2,3]) return (expr, expr)

def program2(): return (my_length([1,2,3]), my_length([1,2,3]))

40 / 70

Page 41: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Are these the same?def my_length(x): res = 0 for i in range(0, len(x)): res += 1 return res

def program1(): expr = my_length([1,2,3]) return (expr, expr)

def program2(): return (my_length([1,2,3]), my_length([1,2,3]))

>>> program1()(3, 3)>>> program2()(3, 3)

41 / 70

Page 42: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Equational ReasoningReasoning about code through substitution rather than execution.

42 / 70

Page 43: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Equational ReasoningReasoning about code through substitution rather than execution.

-- Apply a function to every element of a list.map :: (a -> b) -> [a] -> [b]map _ [] = []map f x:xs = f x : map f xs

43 / 70

Page 44: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Equational ReasoningReasoning about code through substitution rather than execution.

-- Apply a function to every element of a list.map :: (a -> b) -> [a] -> [b]map _ [] = []map f x:xs = f x : map f xs

map (+ 1) [1, 2, 3]

44 / 70

Page 45: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Equational ReasoningReasoning about code through substitution rather than execution.

-- Apply a function to every element of a list.map :: (a -> b) -> [a] -> [b]map _ [] = []map f x:xs = f x : map f xs

map (+ 1) [1, 2, 3]

map (+ 1) [1, 2, 3]= (1 + 1) : map (+ 1) [2, 3]= (1 + 1) : (2 + 1) : map (+ 1) [3]= (1 + 1) : (2 + 1) : (3 + 1) : map (+ 1) []= (1 + 1) : (2 + 1) : (3 + 1) : []= 2 : 3 : 4 : []= [2, 3, 4]

45 / 70

Page 46: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Recap

46 / 70

Page 47: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

RecapFunctional programs are expressions.

47 / 70

Page 48: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

RecapFunctional programs are expressions.

We run them by evaluating the expressions.

48 / 70

Page 49: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

RecapFunctional programs are expressions.

We run them by evaluating the expressions.

We try to keep functions and expressions free from side effects.

49 / 70

Page 50: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

RecapFunctional programs are expressions.

We run them by evaluating the expressions.

We try to keep functions and expressions free from side effects.

We manipulate functions like any other values.

50 / 70

Page 51: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

RecapFunctional programs are expressions.

We run them by evaluating the expressions.

We try to keep functions and expressions free from side effects.

We manipulate functions like any other values.

These things enable us to apply mathematical reasoning to ourprograms.

51 / 70

Page 52: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

RecapFunctional programs are expressions.

We run them by evaluating the expressions.

We try to keep functions and expressions free from side effects.

We manipulate functions like any other values.

These things enable us to apply mathematical reasoning to ourprograms.

There's a lot more to functional programming.

52 / 70

Page 53: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functional

53 / 70

Page 54: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functionaldef process(): # Read input data. data = read_data("input")

# Run our processing algorithm. data = [f(x) for x in data] data = [g(x) for x in data]

# Write our output. write_data(data, "output")

54 / 70

Page 55: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functionaldef process(): # Read input data. data = read_data("input")

# Run our processing algorithm. data = [f(x) for x in data] data = [g(x) for x in data]

# Write our output. write_data(data, "output")

Testing this is hard.

55 / 70

Page 56: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functionaldef process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. data = [f(x) for x in data] data = [g(x) for x in data]

# Write our output. write_data(data, output_file)

Make testing easier by taking arguments.

56 / 70

Page 57: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functionaldef process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. data = [g(x) for x in data] data = [f(x) for x in data]

# Write our output. write_data(data, output_file)

57 / 70

Page 58: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functionaldef process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. output_f = [f(x) for x in data] output_g = [g(x) for x in output_f]

# Write our output. write_data(output_g, output_file)

Keep data immutable.

58 / 70

Page 59: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functionaldef process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. output_f = process_f(data) output_g = process_g(output_f)

# Write our output. write_data(output_g, output_file)

def process_f(data): return [f(x) for x in data]

def process_g(data): return [g(x) for x in data]

Separate pure algorithms from side-effecting infrastructure.

59 / 70

Page 60: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

def process(): # Read input data. data = read_data("input")

# Run our processing algorithm. data = [f(x) for x in data] data = [g(x) for x in data]

# Write our output. write_data(data, "output")

def process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. output_f = process_f(data) output_g = process_g(output_f)

# Write our output. write_data(output_g, output_file)

def process_f(data): return [f(x) for x in data]

def process_g(data): return [g(x) for x in data]

60 / 70

Page 61: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

def process(): # Read input data. data = read_data("input")

# Run our processing algorithm. data = [f(x) for x in data] data = [g(x) for x in data]

# Write our output. write_data(data, "output")

def process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. output_f = process_f(data) output_g = process_g(output_f)

# Write our output. write_data(output_g, output_file)

def process_f(data): return [f(x) for x in data]

def process_g(data): return [g(x) for x in data]

61 / 70

Page 62: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

def process(): # Read input data. data = read_data("input")

# Run our processing algorithm. data = [f(x) for x in data] data = [g(x) for x in data]

# Write our output. write_data(data, "output")

def process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. output_f = process_f(data) output_g = process_g(output_f)

# Write our output. write_data(output_g, output_file)

def process_f(data): return [f(x) for x in data]

def process_g(data): return [g(x) for x in data]

62 / 70

Page 63: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functionaldef process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. output_f = map(f, data) output_g = map(g, output_f)

# Write our output. write_data(output_g, output_file)

#def process_f(data):# return [f(x) for x in data]

#def process_g(data):# return [g(x) for x in data]

Build larger programs from smaller ones.

63 / 70

Page 64: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring to be More Functionaldef process(input_file, output_file): # Read input data. data = read_data(input_file)

# Run our processing algorithm. output = map(compose(g, f), data)

# Write our output. write_data(output, output_file)

def compose(g, f): return lambda x: g(f(x))

Steal from math.

-- Functor composition lawmap g . map f = map (g . f)

64 / 70

Page 65: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Refactoring Example in Haskellmain :: IO ()main = let inputFile = "input" outputFile = "output" in pipeline inputFile outputFile

pipeline :: FilePath -> FilePath -> IO ()pipeline inputFile outputFile = readData inputFile <&> map (g . f) >>= writeData outputFile

-- (<&>) :: Functor f => f a -> (a -> b) -> f b-- (>>=) :: Monad m => m a -> (a -> m b) -> m b

65 / 70

Page 66: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Guidelines for Writing Functional Code

66 / 70

Page 67: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Guidelines for Writing Functional CodeWrite functions that take inputs and return outputs

67 / 70

Page 68: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Guidelines for Writing Functional CodeWrite functions that take inputs and return outputs

Keep data immutable

68 / 70

Page 69: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Guidelines for Writing Functional CodeWrite functions that take inputs and return outputs

Keep data immutable

Reach for side effects last

69 / 70

Page 70: Functional Programming Concepts for Data Processing · FP is naturally suited for expressing data-oriented workflows Learning FP makes you a better programmer I . Introduction to

Guidelines for Writing Functional CodeWrite functions that take inputs and return outputs

Keep data immutable

Reach for side effects last

Limit side effects to the outer layers of your program

70 / 70