ghostferry: the swiss army knife of live data migrations ... · how to automate database...

24
Ghostferry: the swiss army knife of live data migrations with minimum downtime Shuhao Wu Shopify April 24, 2018

Upload: others

Post on 09-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Ghostferry: the swiss army knife of live data migrations with minimum downtime

Shuhao WuShopify

April 24, 2018

Page 2: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Problems with Existing Tools

Cloud limitations

No access to the filesystem.

No direct access to commands like CHANGE MASTER.

Performance impact of mysqldump.

Must copy a whole table at a time.

CHANGE MASTER …? mysqldump --what?

Page 3: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Ghostferry: The Solution

Easy: single binary solution to moving data.

Customizable: a library to implement arbitrary migration flows.

Proven: used to migrate 70 TiBs of data at Shopify.

Confident: algorithm modeled and understood with formal methods (TLA+)

Open source: MIT, https://github.com/Shopify/ghostferry

Page 4: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Ghostferry: the Swiss Army Knife of Live Data Migrations with Minimum Downtime

General Session

▪ Tuesday

▪ 4:50 – 5:15 PM

▪ Room G

Page 5: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

VitessHigh performance, scalable, and available MySQL

clustering system for the Cloud

Sugu Sougoumarane

CTO, PlanetScale

@ssougou

Page 6: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Database trends

● Transactional data explosion

● Move to the cloud

● DBAs transitioning to DBEs

Page 7: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Vitess capabilities

● Leverage MySQL

● Take away the pain of sharding

● Make resharding robust and easy

● Pluggable sharding schemes

● Cloud-ready

● Observability

Page 8: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

The Community

In production Evaluating

Quiz of Kings

Page 9: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

In conclusion

● Scale out MySQL

● Run in the cloud

● Vitess sessions○ Migrating to Vitess at (Slack) Scale

○ Designing and launching the next-generation database system @

Slack: from whiteboard to production

○ Observability features of Vitess

Page 10: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Automated DBA

Nikolay Samokhvalovtwitter: @postgresmenemail: [email protected]

Page 11: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Hacker News “Who is hiring” – April 2018

https://news.ycombinator.com/item?id=16735011

List of job postings, popular among startups. 1068 messages (as of Apr 17 2018)

2

Page 12: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Already automated:

● Postgres parameters tuning

● Query analysis and optimization

● Index set optimization

● Detailed monitoring

● Verify optimization ideas

● Setup/tune hardware, OS, FS

● Provision Postgres instances

● Create replicas

● High Availability:detect failures and switch to replicas

● Create backups

● Basic monitoring

Little to zero automatization:

3

Page 13: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Meet postgres_dbapostgres_dba – The missing set of useful tools for Postgres https://github.com/NikolayS/postgres_dba

4

Page 14: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Back to full-fledged automation

● Detect performance bottlenecks

● Predict performance bottlenecks

● Prevent performance bottlenecks

5

The ultimate goal

of automatization

Page 15: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

DIY automated pipeline for DB optimizationHow to automate database optimization using ecosystem tools and AWS?

Analyze:● pg_stat_statements● auto_explan● pgBadger to parse logs, use JSON output● pg_query to group queries better

Configuration:● annotated.conf● pgtune, pgconfigurator, postgresqlco.nf (wip)● ottertune

Suggested indexes● (useful: pgHero, POWA, HypoPG, dexter, plantuner)

Conduct experiments:● pgreplay to replay logs (different log_line_prefix, you need to handle it)● EC2 spot instances

Machine learning● MADlib 6

Page 16: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Meet PostgreSQL.support

AI-based cloud-friendly platform to automate database administration

7

SteveAI-based expert in database tuning

MaxAI-based expert in query optimization and Postgres indexes

NancyAI-based expert in resource planning.Conducts experiments with benchmarks

Sign up for early access:

http://PostgreSQL.support

Page 17: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

Thanks!

Come hear more:Wednesday, 11:00 a.m.

Nikolay Samokhvalov

[email protected]

twitter: @postgresmen

http://PostgreSQL.support

8

Page 18: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

How to GetTenure in

Databases@andy_pavlo

Andy's Guide on

Page 19: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

2

Research PapersClasses TaughtGrants Funded

Page 20: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

3

# of Crazy Emails!→Physics: E≠mc2

→Math: Fermat's Thm→ComSci: P=NP

Page 23: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

6

Self-Driving DBMS→What to change?→When to change it?→Was it helpful?

Page 24: Ghostferry: the swiss army knife of live data migrations ... · How to automate database optimization using ecosystem tools and AWS? Analyze: pg_stat_statements auto_explan pgBadger

7

Today @ 11:30amRoom 203@andy_pavlo