Managing (Schema) Migrations in Cassandra


Upload: datastax-academy

Post on 13-Apr-2017


TRANSCRIPT

Page 1: Managing (Schema) Migrations in Cassandra

© 2015 GridPoint, Inc. Proprietary and Confidential 1

Managing (Schema) Migrations in Cassandra

Mitch Gitman, senior software engineer, GridPoint, Inc.

Page 2: Managing (Schema) Migrations in Cassandra


05/03/2023

Page 3: Managing (Schema) Migrations in Cassandra


Page 4: Managing (Schema) Migrations in Cassandra

migration

A word with many meanings.

Page 5: Managing (Schema) Migrations in Cassandra


disclaimer…

image © Ana Camamiel

Page 6: Managing (Schema) Migrations in Cassandra

What I mean by migrations

• Live-data migrations

One-off as opposed to ETL

Page 7: Managing (Schema) Migrations in Cassandra

What I mean by migrations

• Source-driven migrations
− Schema migrations
− Reference data migrations
− Test/sample data migrations
• CQL commands as opposed to real data (sstables), generally

source control versioning

artifact versioning

publish

Page 8: Managing (Schema) Migrations in Cassandra

Database refactoring

Page 9: Managing (Schema) Migrations in Cassandra

Source-driven DB refactoring: the benefits

• Integration test & functional test automation (bootstrap-ability)
• CI server pipelines
• Containerization??
• Consistency & repeatability across environments
− Local developer box
− Dev environments
− Integration & QA environments
− Staging
− Production

Page 10: Managing (Schema) Migrations in Cassandra

We need tools!

• Built into web application frameworks
• Standalone

Page 11: Managing (Schema) Migrations in Cassandra

What do (perhaps) all these tools have in common?

They’re relational. They’re for SQL.

Page 12: Managing (Schema) Migrations in Cassandra

NoSQL Distilled

Chapter 12. Schema Migrations

"We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration."

either/or:
• RDBMS = strong schema
• NoSQL = no schema

Page 13: Managing (Schema) Migrations in Cassandra

Does Cassandra like weak schemas?

• partition keys & clustering keys
• table-per-query denormalization
• shift from Thrift to CQL
• Thrift: super columns & super column families
• CQL: collection types

THE EXCEPTION

“metadata-driven documents in columnar storage”:

CREATE TABLE entities (
    doc_id int,
    attribute_name text,
    attribute_value text,
    ...
    PRIMARY KEY (doc_id, attribute_name)
);

So how have teams been managing their keyspace & table definitions?

Page 14: Managing (Schema) Migrations in Cassandra

The Cassandra migration tools landscape

• Flyway: First-class Cassandra support.
− Requires JDBC.
− https://github.com/flyway/flyway/issues/823
• Pillar: Scala tool.
• mutagen-cassandra: Java tool, Astyanax driver.
• Trireme: Python tool.
• cql-migrate: Python tool.
• mschematool: Python tool.

Page 15: Managing (Schema) Migrations in Cassandra

What’s the secret behind DB migration tools?

The migrations version tracking table
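
A minimal sketch of how a version tracking table lets a tool decide what still needs to run, assuming an in-memory set stands in for the table. The names `TrackingTableSketch` and `migrate` are hypothetical; they are not taken from Flyway or from the tooling described in this talk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for a migrations version tracking table. */
final class TrackingTableSketch {
    // Versions already recorded as applied; a real tool keeps these in a database table.
    private final Set<Long> appliedVersions = new TreeSet<>();

    /**
     * Runs every migration whose version is not yet recorded, in ascending
     * version order, and records each one. Returns the versions applied by this call.
     */
    List<Long> migrate(Map<Long, Runnable> availableMigrations) {
        // Copy into a TreeMap so iteration is in ascending version order.
        Map<Long, Runnable> ordered = new TreeMap<>(availableMigrations);
        List<Long> newlyApplied = new ArrayList<>();
        for (Map.Entry<Long, Runnable> entry : ordered.entrySet()) {
            Long version = entry.getKey();
            if (appliedVersions.contains(version)) {
                continue; // recorded by an earlier run, so skip it
            }
            entry.getValue().run();       // execute the migration
            appliedVersions.add(version); // record it only after it ran without throwing
            newlyApplied.add(version);
        }
        return newlyApplied;
    }

    public static void main(String[] args) {
        TrackingTableSketch tracker = new TrackingTableSketch();
        Map<Long, Runnable> migrations = new TreeMap<>();
        migrations.put(1L, () -> System.out.println("create keyspace"));
        migrations.put(2L, () -> System.out.println("create table"));
        System.out.println("first run applied: " + tracker.migrate(migrations));

        // A later release adds one migration; only that one runs.
        migrations.put(3L, () -> System.out.println("add column"));
        System.out.println("second run applied: " + tracker.migrate(migrations));
    }
}
```

Because the record is written only after a migration completes, a failed migration is retried on the next run in this sketch; real tools differ in how they mark and recover from failures.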

Page 16: Managing (Schema) Migrations in Cassandra

Migration tool philosophies

[Figure: two contrasting philosophies, labeled “> SQL” and “SQL >”; images not reproduced]

© Martha Stewart Living Omnimedia Inc. © Harpo Print, LLC

Page 17: Managing (Schema) Migrations in Cassandra

Flyway for Cassandra

• First-class Flyway
• Faked-out Flyway

migrations (in SQL)

CQL

Page 18: Managing (Schema) Migrations in Cassandra

The tradeoff

• Store the migrations tracking table in an RDBMS

Page 19: Managing (Schema) Migrations in Cassandra

Programmatically invoke Flyway

Page 20: Managing (Schema) Migrations in Cassandra


Page 21: Managing (Schema) Migrations in Cassandra

CassandraFlywayCallback implements FlywayCallback

Page 22: Managing (Schema) Migrations in Cassandra

Two-step process

source control

artifact repository

build time

deploy time

MigrationsBuilder

FlywayMigrator

Page 23: Managing (Schema) Migrations in Cassandra

The migrations source

The input to MigrationsBuilder

Page 24: Managing (Schema) Migrations in Cassandra


build time

Run MigrationsBuilder for CQL:

Run MigrationsBuilder for SQL:

Page 25: Managing (Schema) Migrations in Cassandra

The generated migrations

The output from MigrationsBuilder

build time

Page 26: Managing (Schema) Migrations in Cassandra

The generated SQL script

Faking out Flyway

build time

Page 27: Managing (Schema) Migrations in Cassandra

deploy time

Run FlywayMigrator for CQL:

java -classpath /…/flyway-migrator-cassandra.jar \
    com.gridpoint.tools.migrator.flyway.FlywayMigrator cassandra

Run FlywayMigrator for SQL:

java -classpath /…/flyway-migrator-postgresql.jar \
    com.gridpoint.tools.migrator.flyway.FlywayMigrator postgresql

Page 28: Managing (Schema) Migrations in Cassandra

deploy time

flyway-migrator-postgresql.jar

flyway-migrator-cassandra.jar

Page 29: Managing (Schema) Migrations in Cassandra

The migrations version tracking table

The Cassandra incarnation

Page 30: Managing (Schema) Migrations in Cassandra

Best practices

• Variations on versions
− Version control: f94c7d7f8b130df360a4e9e4f586eafc618ddc50
− Artifact repository: 3.5.1
− Migration tool: 201505270800 or 10 or whatever you want
− Effective contract versions: multiple versions can coexist at runtime
• Consistent deployment across environments
• Failure handling
• Baselining
• Rollbacks?
• Check schema agreement
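
Whatever scheme the migration tool version follows (a timestamp such as 201505270800 or a plain counter such as 10), it has to be compared as a number, or 10 sorts before 9. A minimal sketch, assuming migration files are named `V<digits>__<description>.cql`. That naming is an assumption modeled on Flyway's convention for SQL files, and `MigrationVersionSketch` is a hypothetical class, not part of the tooling described here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for ordering migration files by numeric version. */
final class MigrationVersionSketch {
    // Assumed naming scheme: V<digits>__<description>.cql
    private static final Pattern NAME = Pattern.compile("V(\\d+)__(.+)\\.cql");

    /** Extracts the numeric version from a migration file name. */
    static long versionOf(String fileName) {
        Matcher matcher = NAME.matcher(fileName);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Not a migration file name: " + fileName);
        }
        return Long.parseLong(matcher.group(1));
    }

    /** Returns a copy of the file names ordered by numeric version, not by string comparison. */
    static List<String> inVersionOrder(List<String> fileNames) {
        List<String> sorted = new ArrayList<>(fileNames);
        sorted.sort(Comparator.comparingLong(MigrationVersionSketch::versionOf));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        files.add("V10__add_index.cql");
        files.add("V9__create_entities.cql");
        System.out.println(inVersionOrder(files));
    }
}
```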

Page 31: Managing (Schema) Migrations in Cassandra

Schema agreement

https://datastax.github.io/java-driver/2.1.8/features/metadata/
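
A minimal sketch of waiting for schema agreement between migrations, assuming the check itself is supplied by the caller. The `BooleanSupplier` stands in for the driver's schema agreement check described at the URL above; `SchemaAgreementSketch` and `waitForAgreement` are hypothetical names, and the retry count and pause are illustrative values only.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; the supplier stands in for a real driver check. */
final class SchemaAgreementSketch {
    /**
     * Polls the check until it reports agreement or the attempts run out.
     * Returns true if agreement was observed, false otherwise.
     */
    static boolean waitForAgreement(BooleanSupplier inAgreement, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (inAgreement.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis); // give the nodes time to converge
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated cluster that reaches agreement on the third poll.
        int[] polls = {0};
        boolean agreed = waitForAgreement(() -> ++polls[0] >= 3, 5, 10L);
        System.out.println("agreed=" + agreed + " after " + polls[0] + " polls");
    }
}
```

A migrator built this way could refuse to run the next migration when the method returns false, rather than applying further schema changes to a cluster whose nodes still disagree.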

Page 32: Managing (Schema) Migrations in Cassandra

Cassandra… migrations… limitations

• Limitations of our Flyway-based solution
− You need a relational database
− Not open-sourced
• Limitations of source-driven migrations, in general

Page 33: Managing (Schema) Migrations in Cassandra

Static vs. dynamic tables

Page 34: Managing (Schema) Migrations in Cassandra

Deploy time vs. runtime

Dedicated migration application vs. part of main application

Page 35: Managing (Schema) Migrations in Cassandra

Source-driven, but…

• The orchestration is in source control
• Actual data rather than CQL commands
− Not necessarily live data
− Maybe doesn’t need to be in source control

Page 36: Managing (Schema) Migrations in Cassandra

Embracing polyglot persistence

A unified migrations solution

Page 37: Managing (Schema) Migrations in Cassandra

Takeaways

• challenging
• exciting

+

• routine
• boring

[Figure: the two lists are combined with “+” and “=”; images not reproduced]

Page 38: Managing (Schema) Migrations in Cassandra


Thank you!

Mitch Gitman

presence @ LinkedIn