why functional why scala

26
Jul 2014 WHY FUNCTIONAL? WHY SCALA? Neville Li @sinisa_lyh

Upload: neville-li

Posted on 08-Sep-2014


DESCRIPTION

Why every data engineer should know something about functional programming and Scala. Properly formatted slides at http://www.lyh.me/slides/pitch.html

TRANSCRIPT

Page 1

Jul 2014

WHY FUNCTIONAL? WHY SCALA?

Neville Li @sinisa_lyh

Page 2

MONOID! Actually it's a semigroup; monoid just sounds more interesting :)

A Little Teaser

Crunch: CombineFns are used to represent the associative operations...

PGroupedTable<K,V>::combineValues(CombineFn<K,V> combineFn, CombineFn<K,V> reduceFn)

Scalding: reduce with fn which must be associative and commutative

KeyedList[K, T]::reduce(fn: (T, T) => T)

Spark: Merge the values for each key using an associative reduce function

PairRDDFunctions[K, V]::reduceByKey(fn: (V, V) => V)

All of them apply the function on the mapper side (as a map-side combine) as well as on the reducer side
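To see why all three APIs insist on associativity, here is a minimal local sketch in plain Scala (no Crunch, Scalding or Spark involved; the `Semigroup` trait, `mapMerge` and the two-partition split are hypothetical, for illustration only). Each simulated mapper pre-combines its own partition, the simulated reducer combines the partial results, and because the operation is associative the answer matches a single global reduce. No identity element is needed, since `reduce` never starts from an empty value, which is why a semigroup suffices.

```scala
// A minimal semigroup: a type with an associative binary operation.
// (Hypothetical trait for illustration, not a library API.)
trait Semigroup[T] {
  def combine(a: T, b: T): T
}

// Integer addition is associative, so (Int, +) is a semigroup.
val intSum: Semigroup[Int] = new Semigroup[Int] {
  def combine(a: Int, b: Int): Int = a + b
}

// Merging maps key-wise with an associative value operation is associative too.
def mapMerge[K, V](v: Semigroup[V]): Semigroup[Map[K, V]] = new Semigroup[Map[K, V]] {
  def combine(a: Map[K, V], b: Map[K, V]): Map[K, V] =
    b.foldLeft(a) { case (acc, (k, bv)) =>
      acc.updated(k, acc.get(k).map(av => v.combine(av, bv)).getOrElse(bv))
    }
}

val counts = mapMerge[String, Int](intSum)

// Simulated input split across two "mappers".
val partitions: List[List[String]] = List(
  List("We", "all", "live", "in", "Amerika"),
  List("Amerika", "ist", "wunderbar")
)

// Mapper side: each partition is pre-combined locally, like a combiner.
val partial: List[Map[String, Int]] =
  partitions.map(ws => ws.map(w => Map(w -> 1)).reduce((x, y) => counts.combine(x, y)))

// Reducer side: combine the partial results.
val twoPhase: Map[String, Int] = partial.reduce((x, y) => counts.combine(x, y))

// One global reduce over all records, for comparison.
val onePhase: Map[String, Int] =
  partitions.flatten.map(w => Map(w -> 1)).reduce((x, y) => counts.combine(x, y))

println(twoPhase == onePhase) // true, because the operation is associative
```

With a non-associative function (subtraction, say), the two-phase and one-phase results could differ, which is exactly what the Crunch, Scalding and Spark docs warn about.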


Page 3

MY STORY

Before

Mostly Python/C++ (and PHP...)
No Java experience at all
Started using Scala early 2013

Now

Discovery's* Java backend/Riemann guy
The Scalding/Spark/Storm guy
Contributor to Spark, chill, cascading.avro

* Spotify's machine learning and recommendation team

Page 4

WHY THIS TALK?

Not a tutorial
Discovery's experience
Why FP matters
Why Scala matters
Common misconceptions

Page 5

WHAT WE ALREADY USE

Kafka
Scalding
Spark / MLlib
Stratosphere
Storm / Riemann (Clojure)

Page 6

WHAT WE WANT TO INVESTIGATE

Summingbird (Scala for Storm + Hadoop)
Spark Streaming
Shark / Spark SQL
GraphX (Spark)
BIDMach (GPU-accelerated ML)

Page 7

DISCOVERY

Mid 2013: 100+ Python jobs
10+ hires since (half since the new year)
Few with Java experience, none with Scala
As of May 2014: ~100 Scalding jobs & 90 tests
More uncommitted ad-hoc jobs
12+ committers, 4+ using Spark

Page 8

DISCOVERY

rec-sys-scalding.git

Page 9

DISCOVERY

GUESS HOW MANY JOBS WERE WRITTEN BY YOURS TRULY?

3

Page 10

WHY FUNCTIONAL

Immutable data
Copy and transform
Not mutate in place
HDFS with M/R jobs
Storm tuples, Riemann streams
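A small plain-Scala illustration of "copy and transform, not mutate in place". The `Track` case class and its fields are hypothetical; the point is that `map` and `copy` build new values and leave the input untouched, much like an M/R job that writes new output to HDFS instead of rewriting its input.

```scala
// Hypothetical record type; case class fields are immutable vals.
case class Track(id: String, plays: Int)

val original = List(Track("a", 1), Track("b", 5))

// Transform: build a new list of updated copies; `copy` returns a new instance.
val updated = original.map(t => t.copy(plays = t.plays + 1))

println(original) // List(Track(a,1), Track(b,5))  (unchanged)
println(updated)  // List(Track(a,2), Track(b,6))
```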

Page 11

WHY FUNCTIONAL

Higher-order functions
Expressions, not statements
Focus on problem solving
Not solving programming problems
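A quick sketch of "expressions, not statements" and higher-order functions in Scala. In Scala, `if` and `match` yield values, so there is no need to declare a variable and assign to it in each branch. The function names `label`, `describe` and `applyTwice` are illustrative only.

```scala
// `if` is an expression: the whole chain evaluates to a String.
def label(count: Int): String =
  if (count == 0) "none"
  else if (count < 10) "few"
  else "many"

// `match` is an expression too, with guards and type patterns.
def describe(x: Any): String = x match {
  case n: Int if n < 0 => "negative int"
  case n: Int          => s"int $n"
  case s: String       => s"string of length ${s.length}"
  case _               => "something else"
}

// Higher-order function: takes another function as an argument.
def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

println(label(3))             // few
println(describe("Amerika"))  // string of length 7
println(applyTwice(_ * 2, 5)) // 20
```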

Page 12

WHY FUNCTIONAL

Word count in Python

from collections import defaultdict

lyrics = ["We all live in Amerika", "Amerika ist wunderbar"]
wc = defaultdict(int)
for l in lyrics:
    for w in l.split():
        wc[w] += 1

Screen too small for the Java version

Page 13

WHY FUNCTIONAL

Map and reduce are key concepts in FP

val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")
lyrics.flatMap(_.split(" "))            // map
  .groupBy(identity)                    // shuffle
  .map { case (k, g) => (k, g.size) }   // reduce

(def lyrics ["We all live in Amerika" "Amerika ist wunderbar"])

(->> lyrics
     (mapcat #(clojure.string/split % #"\s"))
     (group-by identity)
     (map (fn [[k g]] [k (count g)])))

import Control.Arrow
import Data.List

let lyrics = ["We all live in Amerika", "Amerika ist wunderbar"]
map words >>> concat >>> sort >>> group >>> map (\x -> (head x, length x)) $ lyrics
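The `// map`, `// shuffle` and `// reduce` comments in the Scala version can be made explicit by splitting the pipeline into three plain functions. This is an in-memory sketch that assumes a "shuffle" is simply grouping by key; `mapPhase`, `shufflePhase` and `reducePhase` are hypothetical names, and a real framework would distribute each phase across machines.

```scala
// Map phase: each input record emits zero or more (key, value) pairs.
def mapPhase(line: String): Seq[(String, Int)] =
  line.split(" ").toSeq.map(word => (word, 1))

// Shuffle phase: bring all values for the same key together.
def shufflePhase(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
  pairs.groupBy(_._1).map { case (k, kvs) => (k, kvs.map(_._2)) }

// Reduce phase: collapse each key's values with an associative function.
def reducePhase(grouped: Map[String, Seq[Int]]): Map[String, Int] =
  grouped.map { case (k, vs) => (k, vs.reduce(_ + _)) }

val lyrics = List("We all live in Amerika", "Amerika ist wunderbar")
val wordCounts = reducePhase(shufflePhase(lyrics.flatMap(mapPhase)))

println(wordCounts("Amerika")) // 2
```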

Page 14

WHY FUNCTIONAL

Linear equation in ALS matrix factorization

$$x_u = \left(Y^T Y + Y^T (C^u - I) Y + \lambda I\right)^{-1} Y^T C^u p(u)$$

vectors.map { case (id, vec) => (id, vec * vec.T) }   // YtY
  .map(_._2).reduce(_ + _)

ratings.keyBy(fixedKey).join(outerProducts)   // YtCuIY
  .map { case (_, (r, op)) => (solveKey(r), op * (r.rating * alpha)) }
  .reduceByKey(_ + _)

ratings.keyBy(fixedKey).join(vectors)   // YtCupu
  .map { case (_, (r, vec)) =>
    val Cui = r.rating * alpha + 1
    val pui = if (Cui > 0.0) 1.0 else 0.0
    (solveKey(r), vec * (Cui * pui))
  }
  .reduceByKey(_ + _)
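Once those three aggregates are available, each user's factor vector comes from solving one small k x k linear system. The sketch below does this for a single user in plain Scala with k = 2, computing Y^T Y, Y^T (C^u - I) Y and Y^T C^u p(u) exactly as in the equation and then solving with Cramer's rule instead of a linear-algebra library. The item vectors, ratings, `alpha` and `lambda` are made-up values; a real job would use a library such as Breeze and solve for all users in parallel.

```scala
// Made-up item factor vectors Y (k = 2 latent factors) and one user's implicit ratings.
val k = 2
val itemVectors: Map[Int, Array[Double]] = Map(
  0 -> Array(1.0, 0.5),
  1 -> Array(0.2, 1.0),
  2 -> Array(0.7, 0.3)
)
val userRatings: Map[Int, Double] = Map(0 -> 3.0, 2 -> 1.0) // item id -> r_ui
val alpha = 40.0  // confidence scaling (assumed value)
val lambda = 0.1  // regularization (assumed value)

type Mat = Array[Array[Double]]

// s * y * y^T as a k x k matrix.
def outer(y: Array[Double], s: Double): Mat =
  Array.tabulate(k, k)((i, j) => s * y(i) * y(j))

def plus(p: Mat, q: Mat): Mat =
  Array.tabulate(k, k)((i, j) => p(i)(j) + q(i)(j))

// Y^T Y: sum of y y^T over all items (shared by every user).
val yty: Mat = itemVectors.values.map(y => outer(y, 1.0)).reduce((p, q) => plus(p, q))

// Y^T (C^u - I) Y: only rated items contribute, each with weight alpha * r_ui.
val ytCuIY: Mat = userRatings.map { case (i, r) => outer(itemVectors(i), alpha * r) }
  .reduce((p, q) => plus(p, q))

// Y^T C^u p(u): for rated items p_ui = 1 and c_ui = 1 + alpha * r_ui; others have p_ui = 0.
val b: Array[Double] = userRatings.toSeq
  .map { case (i, r) => itemVectors(i).map(_ * (1 + alpha * r)) }
  .reduce((p, q) => p.zip(q).map { case (x, y) => x + y })

// A = Y^T Y + Y^T (C^u - I) Y + lambda * I
val lambdaI: Mat = Array.tabulate(k, k)((i, j) => if (i == j) lambda else 0.0)
val a: Mat = plus(plus(yty, ytCuIY), lambdaI)

// Solve the 2 x 2 system A x_u = b with Cramer's rule (only valid because k = 2).
val det = a(0)(0) * a(1)(1) - a(0)(1) * a(1)(0)
val xu: Array[Double] = Array(
  (b(0) * a(1)(1) - a(0)(1) * b(1)) / det,
  (a(0)(0) * b(1) - b(0) * a(1)(0)) / det
)

println(xu.mkString("x_u = [", ", ", "]"))
```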

Page 15

WHY SCALA

JVM: libraries and tools
Pythonesque syntax
Static typing with inference
Transition from imperative to FP

Page 16

WHY SCALA

Performance vs. agility

http://nicholassterling.wordpress.com/2012/11/16/scala-performance/

Page 17

WHY SCALA

Type inference

class ComplexDecorationService {
  public List<ListenableFuture<Map<String, Metadata>>> lookupMetadata(List<String> keys) {
    /* ... */
  }
}

val data = service.lookupMetadata(keys)

type DF = java.util.List[ListenableFuture[java.util.Map[String, Metadata]]]
def process(data: DF) = { /* ... */ }
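The same idea in self-contained Scala, assuming the standard library's `scala.concurrent.Future` as a stand-in for Guava's `ListenableFuture` to avoid the Guava dependency. `Metadata`, `lookupMetadata` and `process` are hypothetical. The call site never spells out the nested type, and the alias keeps the one signature that needs it short.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

// Hypothetical value type and lookup service.
case class Metadata(title: String)

def lookupMetadata(keys: List[String]): List[Future[Map[String, Metadata]]] =
  keys.map(k => Future.successful(Map(k -> Metadata(s"title-$k"))))

// The nested type is inferred at the call site.
val data = lookupMetadata(List("a", "b"))

// Where a type must be written (e.g. in a signature), an alias keeps it readable.
type DF = List[Future[Map[String, Metadata]]]

def process(data: DF): Int =
  data.map(f => Await.result(f, 1.second).size).sum

println(process(data)) // 2
```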

Page 18

WHY SCALA

Higher-order functions

List<Integer> list = Lists.newArrayList(1, 2, 3);
Lists.transform(list, new Function<Integer, Integer>() {
  @Override
  public Integer apply(Integer input) {
    return input + 1;
  }
});

val list = List(1, 2, 3)
list.map(_ + 1) // List(2, 3, 4)

And then imagine having to chain or nest several of these functions
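A short sketch of the chained and composed case in Scala; `inc`, `double` and `incThenDouble` are illustrative names. With the anonymous-class style above, every step would need its own `Function` instance.

```scala
val list = List(1, 2, 3)

// Chaining: each step is a small function passed to a higher-order method.
val chained = list.map(_ + 1).filter(_ % 2 == 0).map(_ * 10)
println(chained) // List(20, 40)

// Composing: functions are values and can be combined directly.
val inc: Int => Int = _ + 1
val double: Int => Int = _ * 2
val incThenDouble = inc andThen double
println(list.map(incThenDouble)) // List(4, 6, 8)
```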

Page 19

WHY SCALA

Collections API

val l = List(1, 2, 3, 4, 5)
l.map(_ + 1)      // List(2, 3, 4, 5, 6)
l.filter(_ > 3)   // List(4, 5)

l.zip(List("a", "b", "c")).toMap   // Map(1 -> a, 2 -> b, 3 -> c)
l.partition(_ % 2 == 0)            // (List(2, 4),List(1, 3, 5))
List(l, l.map(_ * 2)).flatten      // List(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)

l.reduce(_ + _)      // 15
l.fold(100)(_ + _)   // 115

"We all live in Amerika".split(" ").groupBy(_.size)
// Map(2 -> Array(We, in), 4 -> Array(live),
//     7 -> Array(Amerika), 3 -> Array(all))
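These operations compose into a small word-frequency table. This sketch uses `foldLeft`, the left-to-right variant of `fold` that lets the accumulator type (here a `Map`) differ from the element type; the input is simply the two lyric lines joined into one string.

```scala
val words = "We all live in Amerika Amerika ist wunderbar".split(" ").toList

// foldLeft threads an immutable accumulator through the list.
val freq: Map[String, Int] =
  words.foldLeft(Map.empty[String, Int]) { (acc, w) =>
    acc.updated(w, acc.getOrElse(w, 0) + 1)
  }

// Sort by descending count, then alphabetically, and keep the top 2.
val top2 = freq.toList.sortBy { case (w, n) => (-n, w) }.take(2)
println(top2) // List((Amerika,2), (We,1))
```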

Page 20

WHY SCALA

Scalding field-based word count

TextLine(path)
  .flatMap('line -> 'word) { line: String => line.split("""\W+""") }
  .groupBy('word) { _.size }

Scalding type-safe word count

TextLine(path).read.toTypedPipe[String](Fields.ALL)
  .flatMap(_.split("""\W+"""))
  .groupBy(identity).size

Scrunch word count

read(from.textFile(file))
  .flatMap(_.split("""\W+"""))
  .count

Page 21

WHY SCALA

Summingbird word count

source
  .flatMap { line: String => line.split("""\W+""").map((_, 1)) }
  .sumByKey(store)

Spark word count

sc.textFile(path)
  .flatMap(_.split("""\W+"""))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

Stratosphere word count

TextFile(textInput)
  .flatMap(_.split("""\W+"""))
  .map(word => (word, 1))
  .groupBy(_._1)
  .reduce { (w1, w2) => (w1._1, w1._2 + w2._2) }
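All of these pipelines share one shape: split lines into words, key each word, and combine counts per key with an associative function. On local collections, Scala 2.13's `groupMapReduce` expresses the Spark-style `map(word => (word, 1)).reduceByKey(_ + _)` in a single call. This is an in-memory sketch only (it assumes Scala 2.13 or later), with the lyrics from earlier as input.

```scala
val lines = List("We all live in Amerika", "Amerika ist wunderbar")

// Local analogue of: flatMap(split).map(word => (word, 1)).reduceByKey(_ + _)
val counts: Map[String, Int] =
  lines
    .flatMap(_.split("""\W+"""))
    .groupMapReduce(identity)(_ => 1)(_ + _) // key, value, associative reduce

println(counts("Amerika")) // 2
```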

Page 22

WHY SCALA

Many patterns are also common in Java

Java 8 lambdas and streams
Guava, Crunch, etc.
Optional, Predicate
Collection transformations
ListenableFuture and transform
parallelDo, DoFn, MapFn, CombineFn
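For readers coming from Guava or Java 8, a rough mapping of those patterns onto the Scala standard library: `Option` for `Optional`, a plain `A => Boolean` function for `Predicate`, `Future.map` for transforming a `ListenableFuture`, and `flatMap` for a `DoFn` that emits zero or more outputs per input. This is an informal sketch, not an API equivalence; `parseRating` and `isPositive` are hypothetical helpers, and `toIntOption` assumes Scala 2.13 or later.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Optional -> Option: absence is a value, not a null.
def parseRating(s: String): Option[Int] = s.toIntOption

// Predicate -> plain function returning Boolean.
val isPositive: Int => Boolean = _ > 0

// DoFn-like flatMap (zero or one output per input), then a filter.
val ratings = List("5", "x", "-1", "3").flatMap(parseRating).filter(isPositive)

// ListenableFuture + transform -> Future.map (runs on an ExecutionContext).
val total: Future[Int] = Future(ratings.sum).map(_ * 2)

println(ratings)                       // List(5, 3)
println(Await.result(total, 1.second)) // 16
```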

Page 23

COMMON MISCONCEPTIONS

It's complex

True for the language features
Not from the user's perspective
We only use 20% of the features
Not more than what's needed in Java

Page 24

COMMON MISCONCEPTIONS

It's slow

No slower than Python
Depends on how purely functional the code is
Trade-off with productivity
Drop down to Java or native libraries
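A sketch of what "depends on how purely functional" and "drop down" can mean in practice: the same sum of squares written with collection combinators (which allocate an intermediate array) and as a `while` loop over a primitive array (no intermediate allocation), plus a direct call into a Java standard library method. The function names are illustrative; both versions return the same result, and any speed difference depends on the JVM and workload and is not measured here.

```scala
val xs: Array[Double] = Array.tabulate(1000)(i => i.toDouble)

// Functional version: concise, but `map` allocates a new array.
def sumSquaresFp(a: Array[Double]): Double = a.map(x => x * x).sum

// Dropping down: an imperative loop with local vars and no intermediate array.
def sumSquaresLoop(a: Array[Double]): Double = {
  var i = 0
  var acc = 0.0
  while (i < a.length) {
    acc += a(i) * a(i)
    i += 1
  }
  acc
}

// Java libraries are callable directly; Array[Double] is a Java double[].
val sorted = xs.reverse
java.util.Arrays.sort(sorted) // sorts the primitive array in place

println(sumSquaresFp(xs) == sumSquaresLoop(xs)) // true
```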

Page 25

COMMON MISCONCEPTIONS

I don't want to learn a new language

How about flatMap, reduce, fold, etc.?
Unnecessary overhead interfacing with Python or Java
You've used monoids, monads, or higher-order functions already
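If you've written `flatMap` chains, you've used monads: Scala's `for` comprehension is syntactic sugar for those `flatMap`/`map` calls. The sketch below, with hypothetical `userId`/`lookupAge` lookups backed by made-up maps, shows both forms producing the same `Option`, and a `fold` whose zero value and `+` form a monoid (an associative operation with an identity), which is what lets it handle an empty list.

```scala
// Hypothetical lookups that may fail, modeled with Option.
val users = Map("alice" -> 1)
val ages = Map(1 -> 30)

def userId(name: String): Option[Int] = users.get(name)
def lookupAge(id: Int): Option[Int] = ages.get(id)

// Explicit flatMap/map chain ...
val viaFlatMap: Option[Int] = userId("alice").flatMap(id => lookupAge(id).map(_ + 1))

// ... and the equivalent for comprehension, which desugars to the chain above.
val viaFor: Option[Int] = for {
  id  <- userId("alice")
  age <- lookupAge(id)
} yield age + 1

// A missing user short-circuits to None.
val missing: Option[Int] = for { id <- userId("nobody"); age <- lookupAge(id) } yield age

// Monoid (0, +): the identity element makes folding an empty list well defined.
val emptySum = List.empty[Int].fold(0)(_ + _)

println((viaFlatMap, viaFor, missing, emptySum)) // (Some(31),Some(31),None,0)
```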

Page 26

THE END
THANK YOU