©Continuent Ltd 2017
Spread the Database Love with Heterogeneous Replication
MC Brown, VP, Products
Heterogeneous Replication is NOT
• Exporting and Importing Data
• Moving to a different database platform
• One Time Exports
• ETL
Heterogeneous Replication IS
• Live, constant, low-latency movement of data
• For analytics
• For migration
• For upgrades
• For Caching
• Data/format matching
• Effective target reproduction
Know Your Databases
Not all Databases are Created Equal
• Transactional versus non-transactional
• Object Reference
• Rows
• Columns
• Documents
• Free text
• Unstructured
Is that a record, a field, a row, a column?
• Row of data?
• Collection of related tables?
• What does it look like as a document?
• What does a document look like as a row?
• Databases, tables, collections, objects, buckets…
Related Tables or Document?
Mapping DB Compatibility
                                 | RDBMS                | Columnar Store       | Document Database    | Freetext/unstructured data store
RDBMS                            | Vendor specific only | Vendor specific only | Field mappings only  | Application specific
Columnar Store                   | Vendor specific only | Vendor specific only | Field mappings only  | Application specific
Document Database                | Field mappings only  | Field mappings only  | Vendor specific only | Application specific
Freetext/unstructured data store | Application specific | Application specific | Application specific | Application specific

Vendor specific - i.e. unique data types
Field mappings - how we map the data
App Specific - how the data is used
Know Your Data
Hetero Replication Challenges
• Effective data replication
• nothing lost or removed
• low latency
• Automatic mapping
• Data typing
• Indexing and native use
Challenges: Data Typing
• Data types are not supported everywhere
• For some, the type does not matter
• Even if the type does matter, the format, precision, structure might be different
• Numbers, Dates, Strings, Compound Data Types all cause problems
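As a minimal sketch of what this means in practice, the standalone JavaScript below normalises a few source values for a target that only accepts strings, numbers and null. The type names and the `normalizeValue` helper are hypothetical, chosen for illustration, and are not part of any replicator API.

```javascript
// Hypothetical helper: normalise a source value for a loosely typed target.
// The type names ("DATE", "DECIMAL", ...) are illustrative assumptions.
function normalizeValue(type, value) {
  if (value === null || value === undefined) {
    return null;
  }
  switch (type) {
    case "DATE":
    case "TIMESTAMP":
      // MySQL-style zero dates have no valid equivalent on most targets.
      if (value === "0000-00-00" || value === "0000-00-00 00:00:00") {
        return null;
      }
      return String(value);
    case "DECIMAL":
      // Keep decimals as strings so precision is not lost in a float.
      return String(value);
    case "INTEGER":
      return Number(value);
    default:
      return String(value);
  }
}
```

Keeping decimals as strings is one possible design choice: it trades native numeric handling on the target for exact precision.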
Extraction/Apply Rates
• Data extraction rates vary
• Data apply rates
• Different solutions handle data loading at different rates
• Rows-based extraction/bulk apply
• Bulk extraction/row apply
• Non-destructive
What's the solution?
Replicator Needs
• Native, neutral format
• Ability to change, reformat, restructure information
• Standalone nature
• Two-way
• Handle impedance problem
Guess What?
• Tungsten Replicator does this
• High Performance
• Flexible storage interchange format
• Built-in filtering
• Operates standalone
• Stop and restart
• Transactionally consistent
• Open Source
Applying Data
Native is Best, Batch an Alternative
• Native:
• Applying to JDBC
• Adapt JDBC Applier to construct statement
• Or apply a record to target using API
• Batch
• Use CSV for data interchange
• Call scripts to import
How Batch Apply Works
[Diagram: transactions from the master reach the Replicator (service ora2vr), which writes CSV files. The CSV files are loaded either with COPY to stage tables, followed by a merge script that SELECTs to the base tables, or with COPY directly to the base tables.]
How Batch Apply Works
• Works on one table at a time
• Five functions in JavaScript
• Prepare - Run when going online
• Begin - Start of transaction
• Apply - During transaction
• Commit - After transaction
• Release - When going offline
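The lifecycle above can be sketched as a standalone script. The five function names follow the slide; the `csvInfo` object shape and the `runBatch` driver are hypothetical stand-ins so the example runs outside the replicator, and they do not represent the replicator's real objects.

```javascript
// Standalone sketch of the five-function batch lifecycle.
// `csvInfo` ({table, file}) and `runBatch` are illustrative assumptions.
var log = [];

function prepare() { log.push("prepare"); }   // run when going online
function begin()   { log.push("begin"); }     // start of transaction
function apply(csvInfo) {                     // during transaction, per table
  log.push("apply " + csvInfo.table + " from " + csvInfo.file);
}
function commit()  { log.push("commit"); }    // after transaction
function release() { log.push("release"); }   // run when going offline

// Hypothetical driver showing the order in which the hooks would run.
function runBatch(csvFiles) {
  log = [];
  prepare();
  begin();
  for (var i = 0; i < csvFiles.length; i++) {
    apply(csvFiles[i]);                       // works on one table at a time
  }
  commit();
  release();
  return log;
}
```

In a long-running service, prepare and release would bracket many begin/apply/commit cycles; the driver here runs a single cycle only to keep the sketch short.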
During a transaction
• Copy, import, load the CSV
• Have access to column, key and transaction information
• Merge the data
• Delete and Insert, or
• Delete, Update and Insert
• Done
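A simplified in-memory version of the delete-and-insert merge might look like the sketch below. It assumes each staged row carries an `optype` of "D" or "I" plus `seqno` and `uniqno` for ordering, as in the staging columns used in this deck; the `mergeStaged` function and the object-based "table" are illustrative only, and a real merge would run as SQL on the target.

```javascript
// Base table modelled as an object keyed by primary key (an assumption
// made so the sketch is runnable without a database).
function mergeStaged(base, staged, keyName) {
  // Apply changes in sequence order so later operations win.
  var ordered = staged.slice().sort(function (a, b) {
    return (a.seqno - b.seqno) || (a.uniqno - b.uniqno);
  });
  for (var i = 0; i < ordered.length; i++) {
    var change = ordered[i];
    var key = change.row[keyName];
    if (change.optype === "D") {
      delete base[key];            // remove the old version of the row
    } else if (change.optype === "I") {
      base[key] = change.row;      // insert the new version
    }
  }
  return base;
}
```

An update is represented here as a delete followed by an insert with the same key, which is why ordering by `seqno` and then `uniqno` matters.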
Case Study: Cassandra/CQL
• Load table data:
• COPY staging_tablename (optype,seqno,uniqno,id,message) from 'FILENAME'
• Delete:
• delete from sample where id in (#{deleteidlist})
• Insert:
• insert into sample ("+collist+") values ("+substlist+")
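The statements above are assembled by string concatenation. A hedged sketch of how `collist`, `substlist` and `deleteidlist` could be built follows. The helper names are hypothetical, and treating `substlist` as a list of "?" bind placeholders is an assumption, since the slide does not show what it contains.

```javascript
// Hypothetical helpers that assemble statements of the shape shown above.
function buildInsert(table, columns) {
  var collist = columns.join(",");
  // Assumption: one "?" placeholder per column, bound later by the client.
  var substlist = columns.map(function () { return "?"; }).join(",");
  return "insert into " + table + " (" + collist + ") values (" + substlist + ")";
}

function buildDelete(table, keyName, ids) {
  // Numeric ids only in this sketch; string keys would need quoting.
  var deleteidlist = ids.map(Number).join(",");
  return "delete from " + table + " where " + keyName + " in (" + deleteidlist + ")";
}
```

For example, `buildInsert("sample", ["id", "message"])` yields `insert into sample (id,message) values (?,?)`.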
Filters
Filter Execution
[Diagram: two pipeline stages, each running Extract, Filter, Apply. Elements shown: MySQL Master, Binlog, In-Memory Queue, Transaction History Log, Slave Replicators connected over tcp/ip.]
Filter Operation
• Always get one transaction at a time
• Transaction must be processed inline
• Metadata
• Data blocks
• SQL or ROW Info
• Always returns the transaction
JS Filters
• prepare() - called when going online
• filter() - does the work
• release() - called when going offline
• Access to:
• Connection to DB
• Full Java class environment
• Bunch of utility functions
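A skeleton with the three entry points might look like the sketch below. The `event.getData()` call and the `size()`/`get()` list accessors are assumptions modelled on Java list access, and `mockList` is a hypothetical stand-in so the example runs outside the replicator.

```javascript
// Skeleton of a JS filter with the three entry points named above.
var seen = 0;

function prepare() { seen = 0; }       // called when going online

function filter(event) {               // does the work, one transaction at a time
  var data = event.getData();          // assumed accessor for the data blocks
  for (var i = 0; i < data.size(); i++) {
    var block = data.get(i);           // a statement or a set of row changes
    if (block !== null) {
      seen++;                          // placeholder for real processing
    }
  }
}

function release() { }                 // called when going offline

// Hypothetical stand-in for a Java list, used only to exercise the sketch.
function mockList(items) {
  return {
    size: function () { return items.length; },
    get: function (i) { return items[i]; }
  };
}
```

Calling `filter({ getData: function () { return mockList(["a", "b"]); } })` after `prepare()` leaves `seen` at 2.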
Data Structure
ReplDBMSEvent
    DBMSData
        StatementData
    DBMSData
        StatementData
    DBMSData
        RowChangeData
            OneRowChange
            OneRowChange
            ...
    StatementData
ReplDBMSEvent
    DBMSData
        RowChangeData
            OneRowChange
            OneRowChange
            ...
Get/Set Values
for (j = 0; j < rowChanges.size(); j++) {
    oneRowChange = rowChanges.get(j);
    columns = oneRowChange.getColumnSpec();
    columnValues = oneRowChange.getColumnValues();
    for (c = 0; c < columns.size(); c++) {
        columnSpec = columns.get(c);
        type = columnSpec.getType();
        if (type == TypesDATE || type == TypesTIMESTAMP) {
            for (row = 0; row < columnValues.size(); row++) {
                values = columnValues.get(row);
                value = values.get(c);
                if (value.getValue() == 0) {
                    value.setValueNull();
                }
            }
        }
    }
}
What you can do in a filter
• Anything
Case Study: Building a Kafka Applier
Kafka?
• Message queue/bus
• Full publish/subscribe model
• Hugely flexible
• Very practical
• High performance
• Not a database
Message Format for Data
• Embedded JSON
• CSV Row
• Encoded binary fields
• Message topic?
• Schema/table/primary key?
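One way to answer these questions is sketched below: one message per row change, with embedded JSON as the payload. The `toMessage` helper, the `schema.table` topic naming and the "|"-joined primary key are all assumptions for illustration, not a format the deck commits to.

```javascript
// Hypothetical message builder: one message per row change.
function toMessage(schema, table, keyNames, row) {
  var keyParts = keyNames.map(function (k) { return String(row[k]); });
  return {
    topic: schema + "." + table,     // assumed topic naming scheme
    key: keyParts.join("|"),         // assumed primary-key encoding
    value: JSON.stringify(row)       // embedded JSON payload
  };
}
```

Using the primary key as the message key would keep all changes to one row on the same partition, which preserves their order for consumers.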
Impedance
• What happens with multi-row transactions?
• What happens when a multi-row transaction is not applied?
• Should we split data into chunks?
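If the answer to the last question is yes, each chunk needs enough metadata for a consumer to detect an incomplete transaction. The sketch below is one possible approach; `chunkTransaction` and its `fragment`/`lastFragment` fields are hypothetical names, and `chunkSize` is assumed to be a positive integer.

```javascript
// Hypothetical chunker: split one transaction's rows into numbered fragments.
function chunkTransaction(seqno, rows, chunkSize) {
  var chunks = [];
  var total = Math.ceil(rows.length / chunkSize);
  for (var i = 0; i < total; i++) {
    chunks.push({
      seqno: seqno,                    // ties every fragment to its transaction
      fragment: i,                     // position within the transaction
      lastFragment: i === total - 1,   // lets a consumer know when it is complete
      rows: rows.slice(i * chunkSize, (i + 1) * chunkSize)
    });
  }
  return chunks;
}
```

A consumer that has not yet seen a fragment with `lastFragment` set for a given `seqno` can hold back the earlier fragments rather than apply a partial transaction.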
What we do already
Sources: MySQL, Oracle
Targets: MySQL, Oracle, RedShift, Vertica, Hadoop, Text, SQLite, RabbitMQ, S3, MongoDB
What are we adding?
Sources: REST API Input, MongoDB, Couchbase, CouchDB, PostgreSQL
Targets: Cassandra, Amazon Athena, Couchbase, CouchDB, ElasticSearch, Flume, Kafka, Native JDBC to Hadoop, PostgreSQL
Where Next
• github.com/continuent/tungsten-replicator
• mcb.guru