© 2015 GridPoint, Inc. Proprietary and Confidential 1
Managing (Schema) Migrations in Cassandra
Mitch Gitman, Senior Software Engineer, GridPoint, Inc.
05/03/2023
Migration: a word with many meanings.
disclaimer…
image © Ana Camamiel
What I mean by migrations
• Live-data migrations
− One-off, as opposed to ETL
What I mean by migrations
• Source-driven migrations
− Schema migrations
− Reference data migrations
− Test/sample data migrations
• Generally, CQL commands as opposed to real data (SSTables)
[Diagram: source control versioning → publish → artifact versioning]
Database refactoring
Source-driven DB refactoring: the benefits
• Integration test & functional test automation (bootstrap-ability)
• CI server pipelines
• Containerization??
• Consistency & repeatability across environments
− Local developer box
− Dev environments
− Integration & QA environments
− Staging
− Production
We need tools!
• Built into web application frameworks
• Standalone
What do (perhaps) all these tools have in common?
They’re relational. They’re for SQL.
NoSQL Distilled
Chapter 12. Schema Migrations
"We have seen that developing and maintaining an application in the brave new world of schemaless databases requires careful attention to be given to schema migration."
Either/or:
• RDBMS = strong schema
• NoSQL = no schema
Does Cassandra like weak schemas?
• partition keys & clustering keys
• table-per-query denormalization
• shift from Thrift to CQL
• Thrift: super columns & super column families
• CQL: collection types
THE EXCEPTION
“metadata-driven documents in columnar storage”:
CREATE TABLE entities (
    doc_id int,
    attribute_name text,
    attribute_value text,
    ...
    PRIMARY KEY (doc_id, attribute_name)
);
So how have teams been managing their keyspace & table definitions?
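The following minimal Java sketch shows how the “metadata-driven documents in columnar storage” pattern maps a document onto the entities table above. Every attribute becomes one row in the document's partition (doc_id), and rows are clustered, and therefore sorted, by attribute_name. The class names EntityRow and EntityMapper are hypothetical. The sketch only models the rows in memory and does not talk to Cassandra.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One row of the entities table: (doc_id, attribute_name, attribute_value).
final class EntityRow {
    final int docId;
    final String attributeName;
    final String attributeValue;

    EntityRow(int docId, String attributeName, String attributeValue) {
        this.docId = docId;
        this.attributeName = attributeName;
        this.attributeValue = attributeValue;
    }
}

// Hypothetical mapper between a free-form document and its rows.
final class EntityMapper {
    // Flattens one document into rows. All rows share the partition key doc_id;
    // iterating a TreeMap yields them in attribute_name (clustering key) order.
    static List<EntityRow> toRows(int docId, Map<String, String> document) {
        List<EntityRow> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : new TreeMap<>(document).entrySet()) {
            rows.add(new EntityRow(docId, e.getKey(), e.getValue()));
        }
        return rows;
    }

    // Reassembles a document from the rows of one partition.
    static Map<String, String> toDocument(List<EntityRow> rows) {
        Map<String, String> doc = new TreeMap<>();
        for (EntityRow r : rows) {
            doc.put(r.attributeName, r.attributeValue);
        }
        return doc;
    }
}
```

Because the attributes live in rows rather than in columns, adding a new attribute requires no ALTER TABLE. That is why this pattern is the exception to Cassandra's otherwise strong schemas.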
The Cassandra migration tools landscape
• Flyway: First-class Cassandra support.
− Requires JDBC.
− https://github.com/flyway/flyway/issues/823
• Pillar: Scala tool.
• mutagen-cassandra: Java tool, Astyanax driver.
• Trireme: Python tool.
• cql-migrate: Python tool.
• mschematool: Python tool.
What’s the secret behind DB migration tools?
The migrations version tracking table
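The tracking table is what lets a tool decide what to run, and the idea fits in a few lines of plain Java. This is an in-memory stand-in built on assumptions, not Flyway's implementation. The hypothetical MigrationTracker keeps a version → CRC32 checksum entry for each applied script. It reports which source-controlled migrations are still pending and refuses to continue if an already-applied script was edited. Flyway's own tracking table also records a checksum per script, along with the description, installer, timestamp, and a success flag.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.zip.CRC32;

// Hypothetical in-memory stand-in for a migrations version tracking table.
final class MigrationTracker {
    // version -> checksum of the script as it was when applied
    private final Map<Long, Long> applied = new LinkedHashMap<>();

    static long checksum(String script) {
        CRC32 crc = new CRC32();
        crc.update(script.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Returns the versions from source control that are not yet recorded, in version order.
    // Throws if an already-applied script has been modified since it ran.
    List<Long> pending(SortedMap<Long, String> available) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, String> e : available.entrySet()) {
            Long recorded = applied.get(e.getKey());
            if (recorded == null) {
                result.add(e.getKey());
            } else if (recorded.longValue() != checksum(e.getValue())) {
                throw new IllegalStateException(
                        "Migration " + e.getKey() + " was modified after it was applied");
            }
        }
        return result;
    }

    // Records a successful run, as a migration tool would insert a row into its table.
    void markApplied(long version, String script) {
        applied.put(version, checksum(script));
    }
}
```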
Migration tool philosophies
© Martha Stewart Living Omnimedia Inc. © Harpo Print, LLC
Flyway for Cassandra
• First-class Flyway
• Faked-out Flyway
[Diagram labels: migrations (in SQL), CQL]
The tradeoff
• Store the migrations tracking table in an RDBMS
Programmatically invoke Flyway
CassandraFlywayCallback
implements FlywayCallback
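The slide shows only the class declaration. The sketch below illustrates one plausible reading of the faked-out approach; it uses a hypothetical MigrationHook interface, not Flyway's actual FlywayCallback API. Flyway applies and records a versioned migration on the relational side, and a per-migration hook then runs the CQL that belongs to that version. The CQL executor is a plain Consumer<String> so the sketch runs without a Cassandra driver. In real code it would wrap a driver session. CqlRunningHook is likewise a hypothetical name.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical per-migration hook; NOT Flyway's FlywayCallback interface.
interface MigrationHook {
    void afterEachMigrate(String version);
}

// Hypothetical Cassandra-side hook: once version V has been applied and recorded
// on the relational side, run the CQL statements that belong to V, in order.
final class CqlRunningHook implements MigrationHook {
    private final Map<String, List<String>> cqlByVersion;
    private final Consumer<String> cqlExecutor; // stand-in for a driver session

    CqlRunningHook(Map<String, List<String>> cqlByVersion, Consumer<String> cqlExecutor) {
        this.cqlByVersion = cqlByVersion;
        this.cqlExecutor = cqlExecutor;
    }

    @Override
    public void afterEachMigrate(String version) {
        List<String> statements = cqlByVersion.get(version);
        if (statements == null) {
            // A recorded version with no CQL counterpart means the two sides are out of sync.
            throw new IllegalStateException("No CQL found for version " + version);
        }
        statements.forEach(cqlExecutor);
    }
}
```

In this sketch, a CQL failure surfaces only after the relational side has recorded the version. How to reconcile the two is exactly the "failure handling" question raised under best practices.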
Two-step process
Build time: source control → MigrationsBuilder → artifact repository
Deploy time: artifact repository → FlywayMigrator
The migrations source
The input to MigrationsBuilder
build time
Run MigrationsBuilder for CQL:
Run MigrationsBuilder for SQL:
The generated migrations
The output from MigrationsBuilder
build time
The generated SQL script
Faking out Flyway
build time
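The slides don't show the generated script itself. The sketch below is an assumption about what "faking out Flyway" could look like, not GridPoint's code. Each CQL migration named with Flyway's default V<version>__<description> convention gets a SQL file with the same version and an inert body. The CQL is kept only as comments, followed by SELECT 1;, so running the script against PostgreSQL changes nothing except the row Flyway adds to its tracking table. FakeSqlGenerator and its methods are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical build-time helper that pairs CQL migrations with inert SQL placeholders.
final class FakeSqlGenerator {
    // Flyway's default versioned-migration naming: V<version>__<description>.<suffix>
    private static final Pattern CQL_NAME = Pattern.compile("V([0-9._]+)__(\\w+)\\.cql");

    // Returns the name of the SQL placeholder with the same version and description.
    static String sqlFileName(String cqlFileName) {
        Matcher m = CQL_NAME.matcher(cqlFileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a versioned CQL migration: " + cqlFileName);
        }
        return "V" + m.group(1) + "__" + m.group(2) + ".sql";
    }

    // Builds a harmless SQL body: every CQL line becomes a SQL comment,
    // followed by a no-op statement.
    static String sqlBody(String cql) {
        StringBuilder sb = new StringBuilder();
        for (String line : cql.split("\\R")) {
            sb.append("-- ").append(line).append('\n');
        }
        sb.append("SELECT 1;\n");
        return sb.toString();
    }
}
```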
deploy time
Run FlywayMigrator for CQL:
java -classpath /…/flyway-migrator-cassandra.jar \
    com.gridpoint.tools.migrator.flyway.FlywayMigrator cassandra
Run FlywayMigrator for SQL:
java -classpath /…/flyway-migrator-postgresql.jar \
    com.gridpoint.tools.migrator.flyway.FlywayMigrator postgresql
deploy time
flyway-migrator-postgresql.jar
flyway-migrator-cassandra.jar
The migrations version tracking table
The Cassandra incarnation
Best practices
• Variations on versions
− Version control: f94c7d7f8b130df360a4e9e4f586eafc618ddc50
− Artifact repository: 3.5.1
− Migration tool: 201505270800 or 10 or whatever you want
− Effective contract versions: multiple versions can coexist at runtime
• Consistent deployment across environments
• Failure handling
• Baselining
• Rollbacks?
• Check schema agreement
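"201505270800 or 10 or whatever you want" works only because the migration tool orders versions numerically, part by part, rather than as strings; as text, "10" sorts before "9". The sketch below shows that comparison. It assumes dot-separated numeric parts with '_' treated as '.', loosely following Flyway's file-name convention (V1_1__… means version 1.1). VersionOrder is a hypothetical name.

```java
import java.math.BigInteger;

// Hypothetical numeric, part-by-part comparison of migration versions.
final class VersionOrder {
    // Returns -1, 0, or 1. Parts are compared as arbitrary-size integers, so a
    // timestamp-style version such as 201505270800 is simply a large number.
    // Missing trailing parts count as zero, so "1" equals "1.0".
    static int compare(String a, String b) {
        String[] pa = a.replace('_', '.').split("\\.");
        String[] pb = b.replace('_', '.').split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            BigInteger x = i < pa.length ? new BigInteger(pa[i]) : BigInteger.ZERO;
            BigInteger y = i < pb.length ? new BigInteger(pb[i]) : BigInteger.ZERO;
            int c = x.compareTo(y);
            if (c != 0) {
                return Integer.signum(c);
            }
        }
        return 0;
    }
}
```

Whichever scheme a team picks, it should be applied consistently. Timestamps avoid collisions between developers, while small integers make gaps and ordering easy to read.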
Schema agreement
https://datastax.github.io/java-driver/2.1.8/features/metadata/
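Each schema change has to propagate to every node, so a migrator should wait for the cluster to agree on the schema before it issues the next DDL statement. The polling loop below is a driver-independent sketch. The BooleanSupplier stands in for whatever agreement check the driver provides (see the page linked above), and SchemaAgreement.awaitAgreement is a hypothetical helper, not a driver API.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper: poll an agreement check until it succeeds or time runs out.
final class SchemaAgreement {
    // Returns true as soon as the check reports agreement, or false once
    // timeoutMillis has elapsed (or the thread is interrupted).
    static boolean awaitAgreement(BooleanSupplier inAgreement, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (inAgreement.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }
}
```

A migrator built this way would call the helper after every DDL statement and treat a false result as a failed migration rather than pressing on.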
Cassandra… migrations… limitations
• Limitations of our Flyway-based solution
− You need a relational database
− Not open-sourced
• Limitations of source-driven migrations, in general
Static vs. dynamic tables
Deploy time vs. runtime
Dedicated migration application vs. part of main application
Source-driven, but…
• The orchestration is in source control
• Actual data rather than CQL commands
− Not necessarily live data
− Maybe doesn’t need to be in source control
Embracing polyglot persistence
A unified migrations solution
Takeaways
[Graphic with the labels: challenging, exciting; routine, boring]
Thank you!
Mitch [email protected]@[email protected] presence @ LinkedIn