let's build a service oriented data pipeline!

Post on 10-Jan-2017

Category:

Technology

Let’s Build a Service Oriented Data Pipeline!

June 2016

Yasha Podeswa

Software Developer | Hootsuite

Before: Oceanographer

Me!

Now: Software Developer at Hootsuite

Me!

Introduce a problem that requires a new data pipeline

Design it in a service oriented style

Build it on stage!

This Talk

Passive Aggressive Inc. just cancelled their subscription!

Desperate Dan in trouble!

The Problem

Want to Build a Tool Like This

What We’re Starting With

● Things Users Did
● Things Organizations Did
● Crap

High Level Plan

JSON files → Calculate stats about organizations → DB

Extract → Transform → Load

JSON files → Calculate stats about organizations → DB

● Clean and organize data: useful for lots of things!
● Calculate stats per organization: shouldn’t run until dependent job done

Need a “Service” Communication and Orchestration Layer!

Let’s build it!

First App: Event Cleaning and Loading

Read logs from S3, clean and sort into different types of events, load into data warehouse

Vanilla Scala app

AWS Lambda
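The clean-and-sort step of the first app could look roughly like the sketch below. The talk's app is written in Scala; this is an illustrative Python version, not the author's code. The `event_type` field and the `user_event` / `organization_event` names are hypothetical, and reading from S3 and loading into the warehouse are left out.

```python
import json
from collections import defaultdict


def clean_and_sort(raw_lines):
    """Parse JSON log lines, drop unusable ones, and group the rest by event type.

    Assumes (hypothetically) that each valid event is a JSON object
    carrying a non-empty string field named "event_type".
    """
    events_by_type = defaultdict(list)
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines (the "crap")
        if not isinstance(event, dict):
            continue  # valid JSON, but not an event object
        event_type = event.get("event_type")
        if not isinstance(event_type, str) or not event_type:
            continue  # no usable type, so it cannot be sorted
        events_by_type[event_type].append(event)
    return dict(events_by_type)
```

Each group returned here would then be loaded into its own table in the data warehouse, for example one table for things users did and one for things organizations did.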

Second App: Organization Stat Calculation

Read cleaned/sorted events from data warehouse, calculate stats about organizations, load stats to data warehouse

Vanilla Scala app

AWS Lambda
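As a rough illustration of the stat calculation, the sketch below counts events per organization. It is a Python stand-in for the talk's Scala app, and the `organization_id` and `name` fields, as well as the choice of stats, are assumptions made for the example.

```python
from collections import Counter, defaultdict


def organization_stats(events):
    """Count total events and events per name for each organization.

    Assumes (hypothetically) that cleaned events are dicts with an
    "organization_id" field and an optional "name" field.
    """
    totals = Counter()
    by_name = defaultdict(Counter)
    for event in events:
        org_id = event.get("organization_id")
        if org_id is None:
            continue  # not tied to an organization
        totals[org_id] += 1
        by_name[org_id][event.get("name", "unknown")] += 1
    return {
        org_id: {
            "total_events": totals[org_id],
            "events_by_name": dict(by_name[org_id]),
        }
        for org_id in totals
    }
```

In the real pipeline the input would come from a warehouse query and the result would be written back as one row per organization.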

Third App: Airflow

Hook up the Lambda apps in a dependency graph

● Scheduling
● Retries
● Monitoring
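To make the dependency graph and retry ideas concrete, here is a minimal standard-library sketch of what an orchestrator does. It is not Airflow's API: `run_pipeline`, its parameters, and the task names are hypothetical, and scheduling and monitoring are omitted.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of names that must finish first
    Returns the task names in the order they completed.
    """
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    completed = []
    # static_order() yields every task after all of its dependencies
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                break  # success, stop retrying
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries; downstream tasks never start
        completed.append(name)
    return completed
```

With two tasks named, say, `load_events` and `organization_stats`, declaring `{"organization_stats": {"load_events"}}` ensures the stats job only starts once event loading has finished.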

Steal my code!

https://github.com/yashap/etl-load-events
https://github.com/yashap/etl-organization-stats
https://github.com/yashap/airflow

Questions?
