The Little Warehouse That Couldn’t Or: How We Learned to Stop Worrying and Move to Spark

Yandu Oppacher (@yandu), Data Infrastructure

Uploaded by: spark-summit

Posted on 16-Aug-2015

TRANSCRIPT

Page 1: The Little Warehouse That Couldn't Or: How We Learned to Stop Worrying and Move to Spark-(Yandu Oppacher, Shopify)

Page 2

Shopify Stores

Page 3

ETL Warehouse Reporting

August 2013

Tilller, Ruby, Vertica

Page 4

Why we had to move

• Data volume

• Data/Query complexity

• Performance issues

Page 5

Couple of false starts

• Pig + Luigi

• Pig + Oozie

• Platfora

Page 6

“Without coding or ETL, data warehousing, BI tools, or breaking a sweat.”

–platfora.com

Page 7

Enter Spark

• Fast

• Nice development model

• Python API (PySpark)

Page 8

The Good Book

Page 9

GMVA Case Study

165,000+ active Shopify merchants

$8 billion+ cumulative GMV
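
The slide gives only headline numbers. As an illustration of the grouping work behind a figure like cumulative GMV, here is a minimal pure-Python sketch (no Spark needed) of the reduceByKey pattern: each partition first sums order totals per shop locally, and only those partial sums are merged. PySpark's `RDD.reduceByKey` works this way, combining values on the map side before the shuffle, whereas `groupByKey` ships every individual record. The order records, shop IDs, and two-partition split below are hypothetical, not data from the talk.

```python
from collections import defaultdict
from decimal import Decimal
from functools import reduce

# Hypothetical order records (shop_id, order_total), split into two "partitions"
# the way an RDD is split across executors.
partitions = [
    [("shop_a", Decimal("20.00")), ("shop_b", Decimal("5.50")), ("shop_a", Decimal("4.25"))],
    [("shop_b", Decimal("10.00")), ("shop_c", Decimal("99.99"))],
]

def combine_partition(records):
    """Map-side combine: sum order totals per shop inside a single partition."""
    partial = defaultdict(Decimal)  # Decimal() starts at Decimal("0")
    for shop_id, total in records:
        partial[shop_id] += total
    return dict(partial)

def merge_partials(left, right):
    """Reduce side: merge two dicts of partial per-shop sums."""
    merged = dict(left)
    for shop_id, total in right.items():
        merged[shop_id] = merged.get(shop_id, Decimal("0")) + total
    return merged

def gmv_by_shop(parts):
    """Combine each partition locally, then merge only the partial sums."""
    partials = [combine_partition(p) for p in parts]
    return reduce(merge_partials, partials, {})

gmv = gmv_by_shop(partitions)
total_gmv = sum(gmv.values(), Decimal("0"))
```

Here the five raw records shrink to four partial sums before merging; on real data with many orders per shop the reduction is far larger. In PySpark the equivalent would be roughly `orders_rdd.reduceByKey(operator.add)` on an RDD of `(shop_id, total)` pairs, where `orders_rdd` is a hypothetical name.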

Page 10

Growing pains

• Joins

• Groupings

• General data skew

• Getting to know Python’s performance quirks
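
The slide names joins and data skew as pain points without saying how they were handled. One widely used mitigation (not necessarily the one used at Shopify) is key salting: rows for a hot key on the large side are spread across several sub-keys, and each row on the small side is replicated once per sub-key so the join result is unchanged. The sketch below is plain Python run in one process; `SALT_BUCKETS`, the record layouts, and all helper names are hypothetical, and `hash_join` merely stands in for a Spark shuffle join.

```python
import random
from collections import Counter, defaultdict

SALT_BUCKETS = 4  # hypothetical number of sub-keys per join key

def salt_large_side(records, buckets=SALT_BUCKETS, seed=0):
    """Attach a random salt to each key on the large, skewed side."""
    rng = random.Random(seed)
    return [((key, rng.randrange(buckets)), value) for key, value in records]

def replicate_small_side(records, buckets=SALT_BUCKETS):
    """Copy each small-side row once per salt so every salted key finds its match."""
    return [((key, salt), value) for key, value in records for salt in range(buckets)]

def hash_join(left, right):
    """Inner join on key (stand-in for a shuffle join)."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    return [(key, lv, rv) for key, lv in left for rv in index.get(key, [])]

def unsalt(rows):
    """Drop the salt to recover the original join key."""
    return [(key[0], lv, rv) for key, lv, rv in rows]

# Hypothetical data: one shop dominates the orders table.
orders = [("big_shop", i) for i in range(1000)] + [("small_shop", 1)]
shops = [("big_shop", "Plus"), ("small_shop", "Basic")]

salted_orders = salt_large_side(orders)
joined = unsalt(hash_join(salted_orders, replicate_small_side(shops)))
plain = hash_join(orders, shops)

# Rows per join key after salting: "big_shop" is now spread over several keys,
# so in Spark its rows could land in different shuffle partitions.
load = Counter(key for key, _ in salted_orders)
```

The trade-off is that the small side grows by a factor of `SALT_BUCKETS`, so salting pays off only when one side is small and the other is heavily skewed.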

Page 11

Starscream

• specialized joins

• resolvers

• range

• Cassandra

• overby

• contracts

• incrementalized fact builds
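
The slide lists "incrementalized fact builds" without describing how Starscream implements them. As a hedged sketch of the general idea only: rather than rebuilding a fact table from all history on every run, a job processes just the day partitions that arrived after the last watermark and merges them into the existing facts. The daily per-shop GMV fact layout, the watermark, and every function name below are hypothetical, and the code is plain Python, not Starscream or Spark.

```python
from datetime import date
from decimal import Decimal

# Hypothetical existing fact table: (day, shop_id) -> GMV, plus the last day already built.
facts = {
    (date(2015, 6, 1), "shop_a"): Decimal("100.00"),
    (date(2015, 6, 1), "shop_b"): Decimal("40.00"),
}
watermark = date(2015, 6, 1)

# Hypothetical raw orders partitioned by day, as they might land in HDFS.
raw_orders = {
    date(2015, 6, 1): [("shop_a", Decimal("60.00")), ("shop_b", Decimal("40.00")), ("shop_a", Decimal("40.00"))],
    date(2015, 6, 2): [("shop_a", Decimal("12.50")), ("shop_c", Decimal("7.00"))],
    date(2015, 6, 3): [("shop_b", Decimal("3.00"))],
}

def build_day(day, orders):
    """Aggregate one day's raw orders into (day, shop_id) -> GMV fact rows."""
    rows = {}
    for shop_id, total in orders:
        key = (day, shop_id)
        rows[key] = rows.get(key, Decimal("0")) + total
    return rows

def incremental_build(facts, watermark, raw):
    """Build only the days after the watermark and merge them into the facts."""
    new_days = sorted(day for day in raw if day > watermark)
    updated = dict(facts)
    for day in new_days:
        # Each day produces keys no other day produces, so a plain update is safe.
        updated.update(build_day(day, raw[day]))
    new_watermark = new_days[-1] if new_days else watermark
    return updated, new_watermark

facts, watermark = incremental_build(facts, watermark, raw_orders)
```

Only June 2 and 3 are aggregated here; June 1 is reused as-is. A second run with no new partitions leaves both the facts and the watermark unchanged, which is the property that makes such builds safe to rerun.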

Page 12

Our current stack

Kafka

OLTP

HDFS

Cassandra

Spark

Frontroom

Backroom

Redshift

Tableau

Page 13

Thank you

Yandu Oppacher (@yandu), Data Infrastructure