scaling data infrastructure @ spotify - qcon · scaling data infrastructure @ spotify...

42
Scaling Data Infrastructure @ Spotify [email protected] [email protected]

Upload: others

Post on 30-Jul-2020

19 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Scaling Data Infrastructure @ [email protected]

[email protected]

Page 2: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Mārtiņš Kalvā[email protected]

Matti [email protected]

Page 3: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Agenda

1. Data at Spotify

2. Summer of 2015

3. Challenges & Victory

○ Datamon

○ Styx

○ GABO

Page 4: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Spotify big-data context

● Over 100 million monthly active users

● Over 30 million song

● Over 2 billion playlists

● Active in 60 markets

Page 5: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Data is at the heart of Spotify

In 2007

- Monthly Royalty Report

In 2016

- Monthly Royalty Report- Weekly Billboard- Daily reports to partners- ...

- AB-Testing- Discover weekly- Daily Mix- ...

Page 6: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Our growth in Data

Users

+50 TB/day+100M Users

Developers

+60 TB/day+10k M/R jobs

Page 7: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Hadoop

Autonomy & Dependencies

Team A

Team B

Team C

Page 8: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Autonomy & Dependencies

Page 9: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Autonomy & Dependencies

Page 10: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Autonomy & Dependencies

Page 11: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Summer of Incidents

Page 12: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● A strain of incidents

Summer of Incidents

Page 13: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● A strain of incidents

● War-room

Summer of Incidents

Page 14: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● A strain of incidents

● War-room

● Hadoop on it’s knees

Summer of Incidents

Page 15: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● A strain of incidents

● War-room

● Hadoop on it’s knees

● Event Delivery Catch up

Summer of Incidents

Page 16: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● A strain of incidents

● War-room

● Hadoop on it’s knees

● Event Delivery Catch up

● Reprocessing of data

Summer of Incidents

Page 17: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● A strain of incidents

● War-room

● Hadoop on it’s knees

● Event Delivery Catch up

● Reprocessing of data

● Hard to debug data issues

Summer of Incidents

Page 18: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Challenges and the path to victory...

Page 19: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

1. Early Warning Datamon - Data monitoring

Challenges and the path to victory...

Page 20: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

1. Early Warning Datamon - Data monitoring

2. Debuggability & Control Styx - Scheduling and control

Challenges and the path to victory...

Page 21: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

1. Early Warning Datamon - Data monitoring

2. Debuggability & Control Styx - Scheduling and control

3. Automate Capacity GABO - Event Delivery

Challenges and the path to victory...

Page 22: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

1. Early Warning Datamon - Data monitoring

2. Debuggability & Control Styx - Scheduling and control

3. Automate Capacity GABO - Event Delivery

Challenges and the path to victory...

Page 23: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Early Warning - Datamon

Page 24: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Unified view○ Alignment between teams

● Ownership○ Clear ownership of data

● SLA○ Alert on late data

Early Warning - Datamon

Page 25: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Define terminology

● Provide metadata language

● Implement a Datamon service

Early Warning - Datamon

Page 26: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

1. Early Warning Datamon - Data monitoring

2. Debuggability & Control Styx - Scheduling and control

3. Automate Capacity GABO - Event Delivery

Challenges and the path to victory...

Page 27: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

- Execution control- Self service for data users

- Execution information- Expose debug information

- Execution isolation- Docker for data jobs

Debuggability & Control - Styx

The river Styx

Page 28: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Execution control

○ Centralized execution API

Debuggability & Control - Styx

Page 29: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Debuggability & Control - Styx

● Execution control

○ Centralized execution API

○ Backfilling and reprocessing

Page 30: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Execution control

● Execution information

○ Timeline

Debuggability & Control - Styx

Page 31: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Debuggability & Control - Styx

● Execution control

● Execution information

○ Timeline

○ Google Cloud Logging

Page 32: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Debuggability & Control - Styx

● Execution control

● Execution information

● Execution isolation○ Docker

Page 33: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

1. Early Warning Datamon - Data monitoring

2. Debuggability & Control Styx - Scheduling and control

3. Automate Capacity GABO - Event Delivery

Challenges and the path to victory...

Page 34: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Complex and manual config

Automate Capacity - GABO/Event Delivery

Page 35: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Complex and manual config

● Pubsub & Dataflow streaming

Automate Capacity - GABO/Event Delivery

Page 36: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Complex and manual config

● Pubsub & Dataflow streaming

● Pubsubs at scale

Automate Capacity - GABO/Event Delivery

Page 37: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Complex and manual config

● Pubsub & Dataflow streaming

● Pubsubs at scale

● Dataflow streaming

Automate Capacity - GABO/Event Delivery

Page 38: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Complex and manual config

● Pubsub & Dataflow streaming

● Pubsubs at scale

● Dataflow streaming :-(

● 2 micro services + 1 Map/Reduce job

Automate Capacity - GABO/Event Delivery

Page 39: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Complex and manual config

● Pubsub & Dataflow streaming

● Pubsubs at scale

● Dataflow streaming :-(

● 2 micro services + 1 Map/Reduce job

● Autoscaling & The Stuffer

Automate Capacity - GABO/Event Delivery

Page 40: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Handles at least 10x our load

● Darkloading

● Autoscale everything

● Self service

GABO - WIP

Page 41: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

● Make sure you have the right tools to deal with data incidents○ Make sure you have time to

implement the tools you need

● Remember that your capacity model can fail at larger scale○ Keep track of your scale and

Automate, automate, automate...

Summary

Page 42: Scaling Data Infrastructure @ Spotify - QCon · Scaling Data Infrastructure @ Spotify matti@spotify.com kalvans@spotify.com. Mārtiņš Kalvāns kalvans@spotify.com Matti Pehrs matti@spotify.com

Thank [email protected]@spotify.com

Want to join the band? http://spoti.fi/jobs