let's build a service oriented data pipeline!

Let’s Build a Service Oriented Data Pipeline! June 2016 Software Developer | Hootsuite Yasha Podeswa

Upload: yasha-podeswa

Post on 10-Jan-2017


TRANSCRIPT

Page 1: Let's Build a Service Oriented Data Pipeline!

Let’s Build a Service Oriented Data Pipeline!

June 2016

Yasha Podeswa

Software Developer | Hootsuite

Page 2: Let's Build a Service Oriented Data Pipeline!

Before: Oceanographer

Me!

Page 3: Let's Build a Service Oriented Data Pipeline!

Now: Software Developer at Hootsuite

Me!

Page 4: Let's Build a Service Oriented Data Pipeline!

Introduce a problem that requires a new data pipeline

Design it in a service oriented style

Build it on stage!

This Talk

Page 5: Let's Build a Service Oriented Data Pipeline!

Passive Aggressive Inc. just cancelled their subscription!

Desperate Dan in trouble!

The Problem

Page 6: Let's Build a Service Oriented Data Pipeline!

Want to Build a Tool Like This

Page 7: Let's Build a Service Oriented Data Pipeline!

Want to Build a Tool Like This

Page 8: Let's Build a Service Oriented Data Pipeline!

Want to Build a Tool Like This

Page 9: Let's Build a Service Oriented Data Pipeline!
Page 10: Let's Build a Service Oriented Data Pipeline!

What We’re Starting With

Page 11: Let's Build a Service Oriented Data Pipeline!

What We’re Starting With

Things Users Did

Page 12: Let's Build a Service Oriented Data Pipeline!

What We’re Starting With

Things Organizations Did

Page 13: Let's Build a Service Oriented Data Pipeline!

What We’re Starting With

Crap

Page 14: Let's Build a Service Oriented Data Pipeline!

High Level Plan

JSON files

Calculate stats about organizations

DB

Page 15: Let's Build a Service Oriented Data Pipeline!

High Level Plan

JSON files

Calculate stats about organizations

DB

Extract

Transform

Load
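The extract, transform, load split above can be sketched in a few lines. This is a minimal illustration only, not the code from the talk: the log field names (`org_id`, `event`), the `org_stats` table, and the in-memory SQLite database standing in for the DB are all assumptions.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical raw log lines; the field names are assumptions for illustration.
RAW_LINES = [
    '{"org_id": 1, "event": "message_sent"}',
    '{"org_id": 1, "event": "login"}',
    '{"org_id": 2, "event": "login"}',
]


def extract(lines):
    """Extract: parse each JSON line into a dict."""
    return [json.loads(line) for line in lines]


def transform(events):
    """Transform: count events per organization."""
    return Counter(event["org_id"] for event in events)


def load(stats, conn):
    """Load: write one row per organization into a stats table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS org_stats "
        "(org_id INTEGER PRIMARY KEY, event_count INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO org_stats VALUES (?, ?)", stats.items())
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real DB
load(transform(extract(RAW_LINES)), conn)
rows = conn.execute(
    "SELECT org_id, event_count FROM org_stats ORDER BY org_id"
).fetchall()
print(rows)  # [(1, 2), (2, 1)]
```

Keeping each stage as its own function makes it easier to later split the stages into separate services.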

Page 16: Let's Build a Service Oriented Data Pipeline!

High Level Plan

JSON files

Calculate stats about organizations

DB

Extract

Transform

Load

Page 17: Let's Build a Service Oriented Data Pipeline!

JSON files

Calculate stats about organizations

DB

Clean and organize data

Calculate stats per organization

Page 18: Let's Build a Service Oriented Data Pipeline!

JSON files

Calculate stats about organizations

DB

Clean and organize data

Calculate stats per organization

Useful for lots of things!

Page 19: Let's Build a Service Oriented Data Pipeline!

JSON files

Calculate stats about organizations

DB

Clean and organize data

Calculate stats per organization

Shouldn’t run until the job it depends on is done
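One way to picture this ordering constraint is as a small dependency graph that gets sorted before anything runs. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the job names are hypothetical, and this is not how the talk's orchestration layer is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each job to the set of jobs it depends on. Job names are hypothetical.
dependencies = {
    "clean_and_load_events": set(),
    "calculate_org_stats": {"clean_and_load_events"},
}

# static_order() yields every job only after all of its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)  # ['clean_and_load_events', 'calculate_org_stats']
```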

Page 20: Let's Build a Service Oriented Data Pipeline!

Need a “Service” Communication and Orchestration Layer!

Page 21: Let's Build a Service Oriented Data Pipeline!
Page 22: Let's Build a Service Oriented Data Pipeline!

Let’s build it!

Page 23: Let's Build a Service Oriented Data Pipeline!

First App: Event Cleaning and Loading

Read logs from S3, clean and sort into different types of events, load into data warehouse

Vanilla Scala app

AWS Lambda
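A rough sketch of the "clean and sort" step, written in Python for brevity even though the app in the talk is Scala. It leaves out reading from S3 and loading into the warehouse. The `type` field and the sample events are assumptions, chosen to mirror the user events, organization events, and junk from the earlier slides.

```python
import json
from collections import defaultdict


def clean_and_sort(lines):
    """Parse JSON log lines, drop unusable ones, and group the rest by event type."""
    by_type = defaultdict(list)
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: skip it
        if not isinstance(event, dict) or "type" not in event:
            continue  # valid JSON, but not an event we can classify
        by_type[event["type"]].append(event)
    return dict(by_type)


# Hypothetical log lines: one user event, one organization event, two junk lines.
lines = [
    '{"type": "user", "user_id": 7, "action": "login"}',
    '{"type": "organization", "org_id": 1, "action": "member_added"}',
    "not json at all",
    '{"action": "orphan"}',
]
sorted_events = clean_and_sort(lines)
print(sorted(sorted_events))  # ['organization', 'user']
```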

Page 24: Let's Build a Service Oriented Data Pipeline!

Second App: Organization Stat Calculation

Read cleaned/sorted events from the data warehouse, calculate stats about organizations, load stats into the data warehouse

Vanilla Scala app

AWS Lambda
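The read, calculate, load loop can be sketched against an in-memory SQLite database standing in for the data warehouse. The table names, columns, and the two stats computed here are assumptions for illustration, and the sketch is Python rather than the Scala used in the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
conn.executescript(
    """
    CREATE TABLE org_events (org_id INTEGER, action TEXT);
    INSERT INTO org_events VALUES
        (1, 'message_sent'), (1, 'message_sent'), (1, 'member_added'),
        (2, 'message_sent');
    CREATE TABLE org_stats (
        org_id INTEGER PRIMARY KEY,
        total_events INTEGER,
        messages_sent INTEGER
    );
    """
)

# Read cleaned events, aggregate per organization, and load the result
# back into the warehouse in a single statement.
conn.execute(
    """
    INSERT INTO org_stats
    SELECT org_id,
           COUNT(*),
           SUM(CASE WHEN action = 'message_sent' THEN 1 ELSE 0 END)
    FROM org_events
    GROUP BY org_id
    """
)
conn.commit()

stats = conn.execute("SELECT * FROM org_stats ORDER BY org_id").fetchall()
print(stats)  # [(1, 3, 2), (2, 1, 1)]
```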

Page 25: Let's Build a Service Oriented Data Pipeline!

Third App: Airflow

Hook up the Lambda apps in a dependency graph
● Scheduling
● Retries
● Monitoring
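To make the "Retries" bullet concrete, here is a plain-Python sketch of retrying a flaky job. It does not use the Airflow API, which is what the talk relies on for retries; `run_with_retries` and `flaky_job` are hypothetical helpers that only illustrate the idea.

```python
import time


def run_with_retries(task, retries=2, delay_seconds=0.0):
    """Call task(); on failure retry up to `retries` more times, then re-raise."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception as error:
            attempt += 1
            if attempt > retries:
                raise  # out of retries: surface the last error
            # Minimal "monitoring": report each failed attempt.
            print(f"attempt {attempt} failed ({error}); retrying")
            time.sleep(delay_seconds)


calls = {"count": 0}


def flaky_job():
    """Hypothetical job that fails twice and then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


result = run_with_retries(flaky_job, retries=2)
print(result)  # done
```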

Page 26: Let's Build a Service Oriented Data Pipeline!

Steal my code!

https://github.com/yashap/etl-load-events
https://github.com/yashap/etl-organization-stats
https://github.com/yashap/airflow

Page 27: Let's Build a Service Oriented Data Pipeline!

Questions?