managing (schema) migrations in cassandra

Post on 13-Apr-2017

© 2015 GridPoint, Inc. Proprietary and Confidential 1

Managing (Schema) Migrations in Cassandra

Mitch Gitman, senior software engineer, GridPoint, Inc.

05/03/2023

migration

A word with many meanings.

disclaimer…

image © Ana Camamiel

What I mean by migrations

• Live-data migrations

− One-off, as opposed to ETL

What I mean by migrations

• Source-driven migrations

− Schema migrations

− Reference data migrations

− Test/sample data migrations

• Generally, CQL commands as opposed to real data (SSTables)

[Diagram: source control versioning → publish → artifact versioning]

Database refactoring

Source-driven DB refactoring: the benefits

• Integration test & functional test automation (bootstrap-ability)

• CI server pipelines

• Containerization??

• Consistency & repeatability across environments

− Local developer box

− Dev environments

− Integration & QA environments

− Staging

− Production

We need tools!

• Built into web application frameworks

• Standalone

What do (perhaps) all these tools have in common?

They’re relational. They’re for SQL.

NoSQL Distilled

Chapter 12. Schema Migrations

"We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration."

either/or:

• RDBMS = strong schema

• NoSQL = no schema

Does Cassandra like weak schemas?

• partition keys & clustering keys

• table-per-query denormalization

• shift from Thrift to CQL

− Thrift: super columns & super column families

− CQL: collection types

THE EXCEPTION: “metadata-driven documents in columnar storage”

CREATE TABLE entities (
    doc_id int,
    attribute_name text,
    attribute_value text,
    -- ...
    PRIMARY KEY (doc_id, attribute_name)
);
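The exception can be illustrated with a small, stdlib-only Java sketch. The class names `EntityRow` and `EntityDocuments` are hypothetical, not from the talk. The sketch mimics how rows of the `entities` table group into one document per `doc_id` partition, with attributes ordered by the `attribute_name` clustering key, so adding a new attribute means writing another row rather than altering the schema.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical model of one row of the "entities" table.
final class EntityRow {
    final int docId;             // partition key
    final String attributeName;  // clustering key
    final String attributeValue;

    EntityRow(int docId, String attributeName, String attributeValue) {
        this.docId = docId;
        this.attributeName = attributeName;
        this.attributeValue = attributeValue;
    }
}

final class EntityDocuments {
    // Groups rows by partition key and orders each document's attributes by
    // the clustering key, the way Cassandra keeps a partition's rows sorted.
    // Writing the same (doc_id, attribute_name) twice overwrites the value,
    // much like a CQL upsert.
    static Map<Integer, SortedMap<String, String>> assemble(List<EntityRow> rows) {
        Map<Integer, SortedMap<String, String>> docs = new TreeMap<>();
        for (EntityRow row : rows) {
            docs.computeIfAbsent(row.docId, id -> new TreeMap<>())
                .put(row.attributeName, row.attributeValue);
        }
        return docs;
    }
}
```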

So how have teams been managing their keyspace & table definitions?

The Cassandra migration tools landscape

• Flyway: First-class Cassandra support.

− Requires JDBC.

− https://github.com/flyway/flyway/issues/823

• Pillar: Scala tool.

• mutagen-cassandra: Java tool, Astyanax driver.

• Trireme: Python tool.

• cql-migrate: Python tool.

• mschematool: Python tool.

What’s the secret behind DB migration tools?

The migrations version tracking table
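The idea behind the version tracking table fits in a few lines of Java. This is an in-memory illustration under stated assumptions, not how any particular tool stores its table: `VersionTrackingTable` and `SimpleMigrator` are hypothetical names, versions are plain `long` values, and the `executor` callback stands in for whatever actually runs a statement (SQL over JDBC, or CQL through a driver session).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a migrations version tracking table.
// A real tool persists this in the database it is migrating (or elsewhere).
final class VersionTrackingTable {
    private final SortedMap<Long, String> applied = new TreeMap<>();

    boolean isApplied(long version) {
        return applied.containsKey(version);
    }

    void record(long version, String description) {
        applied.put(version, description);
    }
}

final class SimpleMigrator {
    // Runs every migration whose version is not yet recorded, in ascending
    // version order. A version is recorded only after its statement runs; if
    // the executor throws, the exception stops the run and nothing further
    // is recorded. Returns the versions applied during this call.
    static List<Long> migrate(SortedMap<Long, String> available,
                              VersionTrackingTable table,
                              Consumer<String> executor) {
        List<Long> newlyApplied = new ArrayList<>();
        for (Map.Entry<Long, String> m : available.entrySet()) {
            if (table.isApplied(m.getKey())) {
                continue; // already applied in this environment
            }
            executor.accept(m.getValue());
            table.record(m.getKey(), m.getValue());
            newlyApplied.add(m.getKey());
        }
        return newlyApplied;
    }
}
```

Running `migrate` a second time against the same table applies nothing, which is what lets one command both bootstrap a fresh environment and bring an existing one up to date.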

Migration tool philosophies

[Figure: two panels labeled "SQL", compared with ">". Images © Martha Stewart Living Omnimedia Inc., © Harpo Print, LLC]

Flyway for Cassandra

• First-class Flyway

• Faked-out Flyway

[Diagram labels: migrations, (in SQL), CQL]

The tradeoff

• Store the migrations tracking table in an RDBMS

Programmatically invoke Flyway

CassandraFlywayCallback implements FlywayCallback
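The deck names a `CassandraFlywayCallback` that implements Flyway's `FlywayCallback`, but its body is not shown. The sketch below is only a guess at the general shape, written against a hypothetical `MigrationCallback` interface rather than Flyway's actual callback signatures: before the SQL side records a version, the callback looks up the matching CQL and sends it to Cassandra. `CqlExecutingCallback`, `CallbackDrivenMigrator`, and the version-keyed map of CQL are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical per-migration hook. This is NOT Flyway's FlywayCallback API.
interface MigrationCallback {
    void beforeEachMigrate(String version, String description);
}

// Assumption: the CQL for each version is available in a map, and the
// Consumer wraps something like a driver session's execute(...).
final class CqlExecutingCallback implements MigrationCallback {
    private final Map<String, String> cqlByVersion;
    private final Consumer<String> cqlSession;

    CqlExecutingCallback(Map<String, String> cqlByVersion, Consumer<String> cqlSession) {
        this.cqlByVersion = cqlByVersion;
        this.cqlSession = cqlSession;
    }

    @Override
    public void beforeEachMigrate(String version, String description) {
        String cql = cqlByVersion.get(version);
        if (cql == null) {
            throw new IllegalStateException("No CQL migration for version " + version);
        }
        cqlSession.accept(cql);
    }
}

// Hypothetical driver loop: the "SQL" side only records versions, while the
// real schema change happens in the callback.
final class CallbackDrivenMigrator {
    static List<String> run(List<String> pendingVersions, MigrationCallback callback) {
        List<String> recorded = new ArrayList<>();
        for (String version : pendingVersions) {
            callback.beforeEachMigrate(version, "cql migration " + version);
            recorded.add(version); // stands in for the tracking-table insert
        }
        return recorded;
    }
}
```

Failing in the before-hook means a version whose CQL could not be found or executed is never recorded as applied.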

Two-step process

[Diagram: source control → MigrationsBuilder (build time) → artifact repository → FlywayMigrator (deploy time)]

The migrations source

The input to MigrationsBuilder
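The deck does not spell out how the migration source files are named. If they follow Flyway's documented convention for versioned migrations, `V<version>__<description>.sql` (assumed here to be reused with a `.cql` extension for Cassandra), a build-time tool could pull out the version and description as in this sketch. `MigrationFileName` is a hypothetical class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Parses names like "V1_1__create_readings_table.cql". Version parts may be
// separated by '.' or '_' (normalized to '.'); underscores in the
// description become spaces.
final class MigrationFileName {
    private static final Pattern NAME =
        Pattern.compile("V([0-9]+(?:[._][0-9]+)*)__(.+)\\.(cql|sql)");

    final String version;
    final String description;
    final String extension;

    private MigrationFileName(String version, String description, String extension) {
        this.version = version;
        this.description = description;
        this.extension = extension;
    }

    static MigrationFileName parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a versioned migration: " + fileName);
        }
        return new MigrationFileName(
            m.group(1).replace('_', '.'),
            m.group(2).replace('_', ' '),
            m.group(3));
    }
}
```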

build time

Run MigrationsBuilder for CQL:

Run MigrationsBuilder for SQL:

The generated migrations

The output from MigrationsBuilder

build time

The generated SQL script

Faking out Flyway

build time
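The generated SQL script itself appears only as an image. One plausible way to fake out a SQL-only tool, sketched here purely as an assumption, is to emit a `.sql` file with the same version and description as each `.cql` migration, whose only executable statement is a harmless `SELECT 1;`. `PlaceholderSqlGenerator` is hypothetical, and `SELECT 1;` assumes a database such as PostgreSQL that accepts a bare `SELECT`.

```java
// Hypothetical build-time step: for each CQL migration, emit a same-named
// SQL placeholder that a SQL-only migration tool can "run" and record.
final class PlaceholderSqlGenerator {
    static String sqlFileName(String cqlFileName) {
        if (!cqlFileName.endsWith(".cql")) {
            throw new IllegalArgumentException("Expected a .cql file: " + cqlFileName);
        }
        return cqlFileName.substring(0, cqlFileName.length() - ".cql".length()) + ".sql";
    }

    // The CQL is copied in as SQL comments; the only executable statement is
    // a no-op SELECT.
    static String placeholderSql(String cqlFileName, String cqlBody) {
        StringBuilder sb = new StringBuilder();
        sb.append("-- Generated placeholder for ").append(cqlFileName).append('\n');
        sb.append("-- The real change is applied to Cassandra by a separate step.\n");
        for (String line : cqlBody.split("\n")) {
            sb.append("-- CQL: ").append(line).append('\n');
        }
        sb.append("SELECT 1;\n");
        return sb.toString();
    }
}
```

Copying the CQL into comments is a deliberate choice in this sketch: if the tool checksums the whole script file, as we understand Flyway does, then editing an already-applied CQL migration would surface as a checksum mismatch instead of going unnoticed.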

deploy time

Run FlywayMigrator for CQL:

java -classpath /…/flyway-migrator-cassandra.jar \
    com.gridpoint.tools.migrator.flyway.FlywayMigrator cassandra

Run FlywayMigrator for SQL:

java -classpath /…/flyway-migrator-postgresql.jar \
    com.gridpoint.tools.migrator.flyway.FlywayMigrator postgresql
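Both commands run the same main class and differ only in the jar and the `postgresql`/`cassandra` argument. A hypothetical sketch of that kind of argument-driven dispatch follows; `MigrationTarget`, `MigratorLauncher`, and the `db/migration/...` locations are assumptions, not the actual `FlywayMigrator` internals.

```java
import java.util.Locale;

// Hypothetical migration targets, chosen by the single command-line argument.
enum MigrationTarget {
    POSTGRESQL("db/migration/postgresql"),
    CASSANDRA("db/migration/cassandra");

    // Hypothetical classpath location of each target's generated migrations.
    final String location;

    MigrationTarget(String location) {
        this.location = location;
    }

    // Enum.valueOf throws IllegalArgumentException for an unknown target.
    static MigrationTarget fromArgs(String[] args) {
        if (args.length != 1) {
            throw new IllegalArgumentException("Usage: <migrator> postgresql|cassandra");
        }
        return MigrationTarget.valueOf(args[0].toUpperCase(Locale.ROOT));
    }
}

final class MigratorLauncher {
    public static void main(String[] args) {
        MigrationTarget target = MigrationTarget.fromArgs(args);
        // A real launcher would configure and run the migration tool here.
        System.out.println("Migrating " + target + " from " + target.location);
    }
}
```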

deploy time

• flyway-migrator-postgresql.jar

• flyway-migrator-cassandra.jar

The migrations version tracking table

The Cassandra incarnation
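The Cassandra version of the tracking table is shown only as an image. As a hedged sketch, the CQL for such a table might look like the strings built below; the `schema_version` name and its columns are assumptions loosely modeled on what a relational schema-history table typically carries (version, description, script, checksum, install time, success).

```java
// Hypothetical CQL for a Cassandra-resident migrations tracking table.
// Table and column names are illustrative assumptions.
final class CassandraTrackingTableCql {
    static String createTable(String keyspace) {
        return "CREATE TABLE IF NOT EXISTS " + keyspace + ".schema_version (\n"
            + "    version text PRIMARY KEY,\n"
            + "    description text,\n"
            + "    script text,\n"
            + "    checksum int,\n"
            + "    installed_on timestamp,\n"
            + "    success boolean\n"
            + ")";
    }

    // Prepared-statement INSERT with bind markers, meant to run only after
    // the migration's own CQL has succeeded.
    static String recordMigration(String keyspace) {
        return "INSERT INTO " + keyspace + ".schema_version "
            + "(version, description, script, checksum, installed_on, success) "
            + "VALUES (?, ?, ?, ?, ?, ?)";
    }
}
```

Keying by `version` alone keeps the table simple but means rows come back unordered, so a client would sort versions itself. Because Cassandra cannot commit a schema change and a tracking row atomically, a failure between the two steps needs manual attention.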

Best practices

• Variations on versions

− Version control: f94c7d7f8b130df360a4e9e4f586eafc618ddc50

− Artifact repository: 3.5.1

− Migration tool: 201505270800 or 10 or whatever you want

− Effective contract versions: multiple versions can coexist at runtime

• Consistent deployment across environments

• Failure handling

• Baselining

• Rollbacks?

• Check schema agreement
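The migration tool's version can be `201505270800`, `10`, or anything else, provided the tool orders versions correctly. Flyway, for example, compares versions numerically part by part, so `1.10` comes after `1.9`. The comparator below is a hypothetical sketch of that kind of ordering, not Flyway's implementation.

```java
import java.math.BigInteger;
import java.util.Comparator;

// Compares migration versions part by part as numbers. Parts may be
// separated by '.' or '_'; missing trailing parts count as zero, so "1"
// equals "1.0". BigInteger avoids overflow for timestamp-style versions.
final class MigrationVersionComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        String[] pa = a.split("[._]");
        String[] pb = b.split("[._]");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            BigInteger x = i < pa.length ? new BigInteger(pa[i]) : BigInteger.ZERO;
            BigInteger y = i < pb.length ? new BigInteger(pb[i]) : BigInteger.ZERO;
            int c = x.compareTo(y);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }
}
```

Mixing schemes in one history is risky: with numeric ordering, `201505270800` sorts after `10`, so a timestamp-style migration would run after every small-integer one regardless of when it was written.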

Schema agreement

https://datastax.github.io/java-driver/2.1.8/features/metadata/
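The linked driver documentation describes checking schema agreement after schema changes, including an on-demand check through `Metadata.checkSchemaAgreement()`. A migration runner might wait for agreement before issuing its next DDL statement; the sketch below does that with a plain `BooleanSupplier` so it runs without a cluster. `SchemaAgreementWaiter` and its timeout parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Polls an agreement check until it reports true or the timeout elapses.
// Assumption based on the linked docs: with the DataStax Java driver 2.1.x,
// the supplier could be cluster.getMetadata()::checkSchemaAgreement.
final class SchemaAgreementWaiter {
    static boolean awaitAgreement(BooleanSupplier inAgreement,
                                  long timeoutMillis,
                                  long pollIntervalMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (inAgreement.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without agreement
            }
            Thread.sleep(pollIntervalMillis);
        }
    }
}
```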

Cassandra… migrations… limitations

• Limitations of our Flyway-based solution

− You need a relational database

− Not open-sourced

• Limitations of source-driven migrations, in general

Static vs. dynamic tables

Deploy time vs. runtime

Dedicated migration application vs. part of main application

Source-driven, but…

• The orchestration is in source control

• Actual data rather than CQL commands

− Not necessarily live data

− Maybe doesn’t need to be in source control

Embracing polyglot persistence

A unified migrations solution

Takeaways

[Figure: two word groups, "challenging, exciting" and "routine, boring", joined by "+" and "=" signs]

Thank you!

Mitch Gitman

mgitman@gridpoint.com

mgitman@nilistics.net

mgitman@gmail.com

skeletal presence @ LinkedIn
