Scylla Summit 2017: Migrating to Scylla From Cassandra and Others With No Downtime
TRANSCRIPT
PRESENTATION TITLE ON ONE LINE AND ON TWO LINES
First and last namePosition, company
Migration To Scylla From Cassandra
Senior Solutions Architect, ScyllaDB
Alexander Sicular
Alexander "Sasha" Sicular
● Over 16 years at Columbia University, the last seven as Director of Medical Informatics, working in the field of clinical informatics building EMRs, billing, data integration and research systems.
● Having extensive experience in relational, non-relational and distributed databases, Alexander helps customers get the most out of Scylla as a Senior Solutions Architect at ScyllaDB.
Agenda
+ Compatibility
+ DB Migration 101
+ Offline migration
+ Live migration
+ Migration From Cassandra to Scylla
+ Migration Tools
+ Best Practices
Compatibility
Scylla Compatibility
+ SSTable file format (compatible with Cassandra 2.1)
+ Configuration file format (compatible with Cassandra 2.1)
+ CQL language (CQL version 3.3.1)
+ CQL native protocol (CQL version 3.3.1)
+ JMX management protocol (compatible with Cassandra 2.1)
+ Management command line (nodetool from C* 3.0)
+ All drivers (Java, C++, Python, Node, Ruby, Go…)
DB Migration 101
DB Migration Steps
+ Schema Migration
+ Migrating Historical Data (Forklifting)
+ Migrating Live Data (Dual Writes)
+ Validation (Offline and/or Dual Reads)*
+ Fade out old DB
* Optional step
Offline Migration From DB-OLD to DB-NEW
[Figure: offline migration timeline. Data steps: Migrate Schema, Forklifting Historical Data, Validation*, DBs in Sync, Fade out DB-OLD. Application traffic: Read / Write to DB-OLD, then Down Time, then Write to DB-NEW and Read from DB-NEW.]
Live Migration From DB-OLD to DB-NEW
[Figure: live migration timeline. Data steps: Migrate Schema, Forklifting Historical Data, Validation*, DBs in Sync, Fade out DB-OLD. Writes: Write to DB-OLD and Write to DB-NEW (Dual Writes). Reads: Read from DB-OLD, Dual Reads*, then Read from DB-NEW.]
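The live-migration timeline can be read as a routing table: at each phase the application writes to some set of databases and reads from another. The sketch below encodes that idea. The phase names are hypothetical, and it assumes (this is not stated on the slide) that dual writes continue after reads move to DB-NEW, so DB-OLD stays current until it is faded out.

```python
from enum import Enum

OLD, NEW = "DB-OLD", "DB-NEW"


class Phase(Enum):
    # Hypothetical phase names, in timeline order.
    OLD_ONLY = 1     # before the migration: all traffic on DB-OLD
    DUAL_WRITES = 2  # schema migrated; forklifting runs while writes go to both
    DUAL_READS = 3   # optional validation: read both DBs and compare
    READ_NEW = 4     # DBs in sync: reads move to DB-NEW, DB-OLD still written (assumption)
    NEW_ONLY = 5     # DB-OLD faded out


# phase -> (databases to write to, databases to read from)
ROUTES = {
    Phase.OLD_ONLY: (frozenset({OLD}), frozenset({OLD})),
    Phase.DUAL_WRITES: (frozenset({OLD, NEW}), frozenset({OLD})),
    Phase.DUAL_READS: (frozenset({OLD, NEW}), frozenset({OLD, NEW})),
    Phase.READ_NEW: (frozenset({OLD, NEW}), frozenset({NEW})),
    Phase.NEW_ONLY: (frozenset({NEW}), frozenset({NEW})),
}


def route(phase):
    """Return (write_targets, read_targets) for the given migration phase."""
    return ROUTES[phase]
```

Keeping DB-OLD in the write set during READ_NEW is what makes a rollback to the origin DB cheap: it never falls behind.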
Migration Tools
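Migrating a Multi-DC Cluster
[Figure: SSTableLoader loads the SSTables of a Cassandra cluster (DC A, DC B, DC C) into a Scylla cluster (DC A, DC B) over CQL; within each cluster, the DCs replicate via internal communication.]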
If every Cassandra DC holds the same data, loading the SSTables from one of the DCs is sufficient.
Dual writes need to be implemented in all regions.
The number of DCs and their replication factor (RF) do not have to be preserved.
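Since DC names and RFs may change, the keyspace definitions carried over from Cassandra may need to be rewritten for the Scylla topology. A minimal sketch, assuming NetworkTopologyStrategy; the keyspace name `shop` and DC names `DC_A`/`DC_B` are hypothetical placeholders.

```python
def keyspace_cql(keyspace, dc_replication):
    """Build a CREATE KEYSPACE statement for the target cluster's DCs.

    dc_replication maps each target data center name to its replication
    factor; DCs are emitted in sorted order for stable output.
    """
    if not dc_replication:
        raise ValueError("at least one data center is required")
    opts = ", ".join(f"'{dc}': {rf}" for dc, rf in sorted(dc_replication.items()))
    return (f"CREATE KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {opts}}};")


if __name__ == "__main__":
    # Three Cassandra DCs collapse into two Scylla DCs with different RFs.
    print(keyspace_cql("shop", {"DC_A": 3, "DC_B": 2}))
```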
Migrate Schema
+ Use DESCRIBE to export each Cassandra keyspace, table and UDT (excluding system tables)
+ Cassandra: cqlsh -e "DESC SCHEMA" > schema.cql
+ Scylla: cqlsh --file 'schema.cql'
+ When migrating from Cassandra 3.x, some schema updates are required
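As an alternative to cqlsh, the same export can be scripted with the DataStax Python driver, whose keyspace metadata objects provide `export_as_string()`. This is a sketch: the list of system keyspaces to skip is an assumption to adjust for your Cassandra version, and the output file name simply mirrors the slide.

```python
# Assumed set of internal keyspaces to skip; adjust for your versions.
SYSTEM_KEYSPACES = frozenset({
    "system", "system_schema", "system_auth", "system_distributed",
    "system_traces", "system_views", "system_virtual_schema",
})


def user_keyspaces(names):
    """Return the non-system keyspace names, sorted for stable output."""
    return sorted(n for n in names if n not in SYSTEM_KEYSPACES)


def export_schema(contact_points, path="schema.cql"):
    """Write the DDL of every user keyspace (tables, UDTs, ...) to a file."""
    # Imported here so user_keyspaces() works without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points)
    try:
        cluster.connect()  # connecting populates cluster.metadata
        keyspaces = cluster.metadata.keyspaces
        ddl = [keyspaces[name].export_as_string()
               for name in user_keyspaces(keyspaces)]
    finally:
        cluster.shutdown()
    with open(path, "w") as f:
        f.write("\n\n".join(ddl) + "\n")
```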
Dual Write
+ Update the application logic to send each write to both clusters (Cassandra and Scylla) in parallel
+ Recommendations:
  + Compare the results and log inconsistencies, if any
  + Use client-side timestamps
  + Create knobs for each DB writer, allowing you to stop/start writing to each DB at runtime
+ Upgrade the application logic with a rolling upgrade for zero downtime
+ Dual reads can follow the same logic
[Figure: the client sends CQL writes to both Cassandra and Scylla]
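One way to implement the "knob per DB writer" recommendation is a small wrapper whose per-database switches can be flipped at runtime (for example from an admin endpoint). This is a hypothetical sketch: `DualWriter` and the writer callables are illustrative stand-ins for real driver calls, and for brevity it calls the writers one after another rather than in parallel.

```python
import logging
import threading


class DualWriter:
    """Send each write to several databases, with a runtime on/off knob per DB."""

    def __init__(self, writers):
        # writers: {db_name: callable performing one write} (hypothetical)
        self._writers = dict(writers)
        self._enabled = {name: True for name in self._writers}
        self._lock = threading.Lock()

    def set_enabled(self, name, enabled):
        """The knob: start/stop writing to one DB without a redeploy."""
        if name not in self._writers:
            raise KeyError(name)
        with self._lock:
            self._enabled[name] = enabled

    def write(self, *args):
        """Write to every enabled DB; return {db_name: succeeded}."""
        with self._lock:
            active = [n for n, on in self._enabled.items() if on]
        outcome = {}
        for name in active:
            try:
                self._writers[name](*args)
                outcome[name] = True
            except Exception:
                # Log the inconsistency so it can be reconciled later.
                logging.exception("write to %s failed", name)
                outcome[name] = False
        return outcome
```

Because disabled writers are simply skipped, turning off writes to the origin DB at the end of the migration becomes a configuration change rather than a code change.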
Dual Writes
Use two different cluster sessions.
from cassandra.cluster import Cluster
# connect to cluster 1 (contact points must be a list, not a bare string)
db1 = Cluster([IP_C1]).connect()
# connect to cluster 2
db2 = Cluster([IP_C2]).connect()
Two prepared statements, one for each DB session.
# insert statement with an explicit write TIMESTAMP
insert_statement = ("INSERT INTO keyspace.table (c1, c2) "
                    "VALUES (?, ?) USING TIMESTAMP ?")
# prepared statements: each session prepares its own copy
prepared_statement_1 = db1.prepare(insert_statement)
prepared_statement_2 = db2.prepare(insert_statement)
Create sample values, then execute the insert statements asynchronously.
import random
import time
import uuid
# random values, with an explicit write time in microseconds
values = [random.randrange(0, 1000), str(uuid.uuid4()), int(time.time() * 1000000)]
# build a list of futures, one per cluster
inserts = []
# insert the 1st statement into the 1st session
inserts.append(db1.execute_async(prepared_statement_1, values))
# insert the 2nd statement into the 2nd session
inserts.append(db2.execute_async(prepared_statement_2, values))
Wait for the results, then record success (1) or failure (0) for each write, followed by the values written.
# loop over the futures and record success/failure
results = []
for future in inserts:
    try:
        future.result()
        results.append(1)
    except Exception:
        results.append(0)
results.append(values)
Check for failures in either write.
import logging
# did we have failures?
if results[0] == 0:
    # do something
    logging.error('Write to cluster 1 failed')
if results[1] == 0:
    # do something
    logging.error('Write to cluster 2 failed')
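The dual-write steps (one session and one prepared statement per cluster, a shared client-side timestamp, async execution, then checking each future) can be folded into one reusable function. A sketch under the slides' assumptions: `sessions` are driver sessions, `prepared[i]` was prepared on `sessions[i]`, and `values` already ends with the timestamp. `dual_write` and `now_micros` are hypothetical helper names.

```python
import logging
import time


def now_micros():
    """Client-side write time in microseconds, as used by USING TIMESTAMP."""
    return int(time.time() * 1_000_000)


def dual_write(sessions, prepared, values):
    """Write the same values to every cluster in parallel.

    Returns one bool per cluster (True = write succeeded). Using the same
    client-side timestamp in `values` means both clusters resolve any
    conflicting writes to this row in the same way.
    """
    # Issue all writes first so they are in flight concurrently...
    futures = [s.execute_async(p, values) for s, p in zip(sessions, prepared)]
    ok = []
    # ...then wait on each one and record the outcome.
    for i, future in enumerate(futures):
        try:
            future.result()
            ok.append(True)
        except Exception:
            logging.exception("Write to cluster %d failed: %r", i + 1, values)
            ok.append(False)
    return ok
```

A call would look like `dual_write([db1, db2], [prepared_statement_1, prepared_statement_2], [c1, c2, now_micros()])`.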
Forklifting Historical Data
+ Install Scylla's sstableloader on the Cassandra nodes, or on intermediate servers
+ Create a snapshot of each Cassandra node
+ Run sstableloader from each Cassandra node:
sstableloader -x -d [Scylla IP] .../[ks]/[table]
Or, from intermediate servers with the Cassandra filesystem mounted, using a path in /[ks]/[table] format:
sstableloader -x -d [Scylla IP] .../[mount point]/[ks]/[table]
+ Watch for an effect on the Cassandra nodes, and use throttling (-t) to limit the loader throughput
[Figure: sstableloader reads the Cassandra SSTables and writes them to Scylla over CQL]
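With many tables, it helps to generate the loader invocations rather than type them. The sketch below only builds command lines, one per table. Assumptions: `root` is a hypothetical local or mounted directory laid out as `root/<keyspace>/<table>` (sstableloader takes the keyspace and table from the last two path components), `-x` is copied verbatim from the slide, and `-t` is assumed to be in Mbit/s as in Cassandra's loader.

```python
import shlex


def loader_commands(scylla_ip, root, tables, throttle_mbits=None):
    """Build one sstableloader command (as an argv list) per (keyspace, table)."""
    commands = []
    for keyspace, table in tables:
        cmd = ["sstableloader", "-x", "-d", scylla_ip]
        if throttle_mbits is not None:
            # Throttle to protect the Cassandra nodes being read from.
            cmd += ["-t", str(throttle_mbits)]
        cmd.append(f"{root.rstrip('/')}/{keyspace}/{table}")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical IP, mount point and table names, for illustration only.
    for cmd in loader_commands("10.0.0.5", "/mnt/cassandra",
                               [("shop", "orders"), ("shop", "users")],
                               throttle_mbits=50):
        print(shlex.join(cmd))
```

Running the printed commands one table at a time also fits the iterative, validate-as-you-go approach.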
Best Practices
+ Clean up the origin database in advance. Don't waste time on old data!
+ More data = longer migration time
+ Migrate and validate iteratively, for example one table, one region or one user prefix at a time. After validation, keep that dataset, or delete it and restart
+ At any point: verify and validate. You can always roll back to the origin DB for any reason
Best Practices… Continued
+ Make sure to have a monitoring stack in place for both DBs and the application during the entire migration
+ Validate the process by sampling data at different points
+ Before fading out the origin DB, make sure there are no live connections to it
+ Make sure all relevant users are aware of the process and its limitations (don't update your schema!)
+ Get Scylla involved. We want to help!
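Sampling-based validation can be as simple as reading the same random set of primary keys from both databases and reporting the keys whose rows differ. A minimal sketch: `read_origin` and `read_target` are hypothetical callables (for example a prepared SELECT by primary key on each session) that return `None` for a missing row, and the key list comes from wherever your application already enumerates keys.

```python
import random


def sample_mismatches(keys, read_origin, read_target, sample_size, seed=None):
    """Compare a random sample of rows between the origin and target DBs.

    Returns the sampled keys whose rows differ, including rows that are
    missing on one side. Pass a seed to make a validation run repeatable.
    """
    keys = list(keys)
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    return [k for k in sample if read_origin(k) != read_target(k)]
```

Running it after forklifting, during dual writes, and again just before fading out the origin DB gives the "different points" the checklist asks for.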
THANK YOU!
Please stay in touch: @siculars
Any questions?