Uploaded by andreea-bodnari-phd on 13-Jul-2015

Page 1: Python Map Reduce vs Scalding

Python Map Reduce

vs

Scalding

Emily Samuels

Page 2

Overview

● Background

● What is Python M/R

● What is Scalding

● Examples

● Pros & Cons

● Questions

Page 3

My Background

● Data Engineer at Spotify

Page 4

What is Python M/R

● Luigi, Spotify's open-source Python framework for building pipelines of batch jobs

● Built-in support for running Python MapReduce jobs on Hadoop
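Python jobs usually reach Hadoop through Hadoop Streaming, which runs the mapper and reducer as ordinary processes and passes records to them as text lines on stdin/stdout; Luigi's Hadoop jobs use this mechanism as well. The sketch below shows the reducer side of that contract, assuming tab-separated `key<TAB>value` records with integer values. Because the framework sorts mapper output by key before the reduce phase, equal keys arrive adjacent and can be grouped in a single pass. The function names `parse` and `sum_reducer` are hypothetical.

```python
from itertools import groupby


def parse(line):
    """Split one streaming record 'key<TAB>value\\n' into (key, value)."""
    key, value = line.rstrip("\n").split("\t", 1)
    return key, value


def sum_reducer(sorted_lines):
    """Streaming-style reducer that sums integer values per key.

    Assumes input is already sorted by key (Hadoop's shuffle guarantees this
    for a streaming reducer), so equal keys are adjacent and can be grouped
    without holding the whole input in memory.
    """
    records = (parse(line) for line in sorted_lines)
    for key, group in groupby(records, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"
```

In a real streaming job, a small script would print each item of `sum_reducer(sys.stdin)` and be passed to Hadoop Streaming through its `-reducer` option.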

Page 5

What is Scalding

● Scalding is a Scala library that makes it easy to specify Hadoop MapReduce jobs.

Page 6

Word Count in Python
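The code on this slide did not survive extraction. The sketch below shows the general shape of a word count written for Luigi, where a Hadoop job defines a `mapper(self, line)` generator that yields key/value pairs and a `reducer(self, key, values)` generator that yields results. Assumption: a real job would subclass `luigi.contrib.hadoop.JobTask` and also define `requires()` and `output()`. Those parts are omitted here so the example runs without Luigi or a cluster. `run_locally` is a hypothetical helper that stands in for Hadoop's shuffle. It is not part of Luigi.

```python
from collections import defaultdict


class WordCount:
    """Word count with the mapper/reducer shape of a Luigi Hadoop job.

    Assumption: a real job would subclass luigi.contrib.hadoop.JobTask and
    also define requires() and output(); both are omitted here.
    """

    def mapper(self, line):
        # Called once per input line; yields (key, value) pairs.
        for word in line.strip().split():
            yield word, 1

    def reducer(self, key, values):
        # Called once per key with every value emitted for that key.
        yield key, sum(values)


def run_locally(job, lines):
    """Hypothetical stand-in for Hadoop: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in job.mapper(line):
            grouped[key].append(value)  # the "shuffle": collect values per key
    results = []
    for key in sorted(grouped):  # a reducer sees its keys in sorted order
        results.extend(job.reducer(key, grouped[key]))
    return results
```

For example, `run_locally(WordCount(), ["to be or", "not to be"])` returns `[("be", 2), ("not", 1), ("or", 1), ("to", 2)]`.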

Page 7

Word Count in Scalding

Page 8

Pros

Python M/R

● Small learning curve

● Integrates seamlessly with Luigi

Scalding

● Less code to do powerful things

● Easy to write unit tests for jobs

● Run locally

Page 9

Cons

Python M/R

● More code to do the same things as Scalding

● Not easy to test without running jobs on the cluster

● Hard to debug

Scalding

● High learning curve

Page 10

Thank You
