python map reduce vs scalding
Python Map Reduce vs Scalding
Emily Samuels
Overview
● Background
● What is Python M/R
● What is Scalding
● Examples
● Pros & Cons
● Questions
My Background
● Data Engineer at Spotify
What is Python M/R
● Luigi
● Python support for running MapReduce jobs on Hadoop is built into Luigi
What is Scalding
● Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs.
Word Count in Python
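The code from this slide did not survive in the transcript. As a stand-in, here is a minimal sketch of what a Python word count in the mapper/reducer style typically looks like. It is not the speaker's original code. The `mapper` and `reducer` functions follow the usual shape of a Python MapReduce job; `run_local` is a hypothetical helper added only so the sketch runs without a cluster, standing in for Hadoop's shuffle and sort.

```python
from collections import defaultdict


def mapper(line):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    for word in line.strip().split():
        yield word, 1


def reducer(key, values):
    """Sum all the counts emitted for one word."""
    yield key, sum(values)


def run_local(lines):
    """Hypothetical local driver: groups mapper output by key, the way
    Hadoop's shuffle and sort step would, then calls the reducer per key."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)

    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


if __name__ == "__main__":
    # Small in-memory input instead of a file on HDFS.
    print(run_local(["to be or not to be", "be quick"]))
```

In a real Luigi job the mapper and reducer would be methods on a task class and Hadoop would do the grouping; only the two small functions carry the word-count logic.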
Word Count in Scalding
Pros
Python M/R
● Small learning curve
● Integrates seamlessly with Luigi
Scalding
● Less code to do powerful things
● Easy to write unit tests for jobs
● Run locally
Cons
Python M/R
● More code to do the same things as Scalding
● Not easy to test without running jobs on the cluster
Scalding
● Hard to debug
● High learning curve
Thank You