Post on 12-Aug-2020

©Continuent Ltd 2017

Spread the Database Love with Heterogeneous Replication

MC Brown, VP, Products

Heterogeneous Replication is NOT

• Exporting and Importing Data

• Moving to a different database platform

• One Time Exports

• ETL

Heterogeneous Replication IS

• Live, constant, low-latency movement of data

• For analytics

• For migration

• For upgrades

• For Caching

• Data/format matching

• Effective target reproduction

Know Your Databases

Not all Databases are Created Equal

• Transactional over non transactional

• Object Reference

• Rows

• Columns

• Documents

• Free text

• Unstructured

Is that a record, a field, a row, or a column?

• Row of data?

• Collection of related tables?

• What does it look like as a document?

• What does a document look like as a row?

• Databases, tables, collections, objects, buckets…
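
As a rough illustration of the row/document question, the sketch below folds two related tables into one nested document and flattens it back again. The table names, column names, and helper functions are hypothetical and are not taken from the slides; it only shows one possible field mapping.

```javascript
// Hypothetical relational rows: one customer and its related orders.
const customers = [{ id: 1, name: "Ada" }];
const orders = [
  { id: 10, customer_id: 1, total: 25.5 },
  { id: 11, customer_id: 1, total: 4.0 },
];

// Fold the child rows into each parent row to form one document per parent.
function rowsToDocuments(parents, children, foreignKey, field) {
  return parents.map((parent) => {
    const nested = children
      .filter((child) => child[foreignKey] === parent.id)
      .map((child) => {
        const copy = Object.assign({}, child);
        delete copy[foreignKey]; // the link is implied by the nesting
        return copy;
      });
    return Object.assign({}, parent, { [field]: nested });
  });
}

// Reverse direction: flatten a document back into a parent row and child rows.
function documentToRows(doc, foreignKey, field) {
  const parent = Object.assign({}, doc);
  delete parent[field];
  const children = doc[field].map((child) =>
    Object.assign({}, child, { [foreignKey]: doc.id })
  );
  return { parent, children };
}

const docs = rowsToDocuments(customers, orders, "customer_id", "orders");
console.log(JSON.stringify(docs[0]));
```

The round trip only works here because the document keeps the parent key; a target that drops or renames it would need an explicit mapping rule.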

Related Tables or Document?

Mapping DB Compatibility

                                  RDBMS                 Columnar Store        Document Database     Freetext/unstructured data store
RDBMS                             Vendor specific only  Vendor specific only  Field mappings only   Application specific
Columnar Store                    Vendor specific only  Vendor specific only  Field mappings only   Application specific
Document Database                 Field mappings only   Field mappings only   Vendor specific only  Application specific
Freetext/unstructured data store  Application specific  Application specific  Application specific  Application specific

Vendor specific - i.e. unique data types
Field mappings - how we map the data
App Specific - how the data is used

Know Your Data

Hetero Replication Challenges

• Effective data replication

• nothing lost or removed

• low latency

• Automatic mapping

• Data typing

• Indexing and native use

Challenges: Data Typing

• Data types are not supported everywhere

• For some, the type does not matter

• Even if the type does matter, the format, precision, structure might be different

• Numbers, Dates, Strings, Compound Data Types all cause problems
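
To make the data typing problem concrete, here is a minimal sketch of per-type conversion between a source and a target. The type names, the converter table, and the choice to treat timestamps as UTC are all assumptions made for illustration; a real mapping depends on the specific source and target databases.

```javascript
// Hypothetical per-type converters from a source value to a target-safe value.
const converters = {
  // Keep exact decimals as strings so a float-based target does not lose precision.
  DECIMAL: (v) => String(v),
  // Zero dates are accepted by some sources but rejected by many targets.
  DATE: (v) => (v === "0000-00-00" ? null : v),
  // Rewrite "YYYY-MM-DD hh:mm:ss" as ISO 8601, assuming the value is in UTC.
  DATETIME: (v) =>
    v.startsWith("0000-00-00") ? null : v.replace(" ", "T") + "Z",
};

// Convert one row; columns whose type has no converter pass through unchanged.
function convertRow(row, columnTypes) {
  const out = {};
  for (const [name, value] of Object.entries(row)) {
    const convert = converters[columnTypes[name]];
    out[name] = value === null || !convert ? value : convert(value);
  }
  return out;
}

const converted = convertRow(
  { price: "19.990", shipped: "0000-00-00", created: "2017-04-01 12:30:00", qty: 5 },
  { price: "DECIMAL", shipped: "DATE", created: "DATETIME", qty: "INT" }
);
console.log(converted);
```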

Extraction/Apply Rates

• Data extraction rates vary

• Data apply rates

• Different solutions handle data loading at different rates

• Row-based extraction/bulk apply

• Bulk extraction/row apply

• Non-destructive
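
One way to bridge row-based extraction and bulk apply is to buffer rows and hand them to the target in batches. The sketch below is a generic illustration under that assumption; `createBatcher` and the batch size are hypothetical and do not describe how any particular replicator schedules its loads.

```javascript
// Buffer individual rows and pass them to applyBulk in groups of batchSize.
function createBatcher(batchSize, applyBulk) {
  let pending = [];

  // Send whatever is buffered, for example at a transaction boundary.
  function flush() {
    if (pending.length === 0) return;
    const batch = pending;
    pending = [];
    applyBulk(batch);
  }

  // Accept one extracted row; flush automatically when the batch is full.
  function add(row) {
    pending.push(row);
    if (pending.length >= batchSize) flush();
  }

  return { add, flush };
}

// Usage: seven extracted rows, applied in batches of at most three.
const applied = [];
const batcher = createBatcher(3, (batch) => applied.push(batch));
for (let i = 1; i <= 7; i++) batcher.add({ id: i });
batcher.flush(); // apply the remainder
console.log(applied.map((batch) => batch.length));
```

A larger batch raises throughput on bulk-oriented targets but also raises latency, which is the trade-off the slide points at.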

What's the solution?

Replicator Needs

• Native, neutral format

• Ability to change, reformat, restructure information

• Standalone nature

• Two-way

• Handle impedance problem

Guess What?

• Tungsten Replicator does this

• High Performance

• Flexible storage interchange format

• Built-in filtering

• Operates standalone

• Stop and restart

• Transactionally consistent

• Open Source

Applying Data

Native is Best, Batch an Alternative

• Native:

• Applying to JDBC

• Adapt JDBC Applier to construct statement

• Or apply a record to target using API

• Batch

• Use CSV for data interchange

• Call scripts to import

How Batch Apply Works

[Diagram: the replicator service (ora2vr) receives transactions from the master and writes CSV files. The CSV files are either loaded with COPY into staging tables and then moved into the base tables by a merge script (SELECT to base tables), or loaded with COPY directly into the base tables.]

How Batch Apply Works

• Works on one table at a time

• Five functions in JavaScript

• Prepare - Run when going online

• Begin - Start of transaction

• Apply - During transaction

• Commit - After transaction

• Release - When going offline
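
The five functions can be pictured as the skeleton below. The function names follow the slide; the `csvinfo` argument with its `table` and `file` fields, and the `runTransaction` driver that stands in for the replicator, are assumptions made so the sketch runs on its own.

```javascript
const calls = []; // records the order in which the functions are invoked

// Prepare - run when going online (open connections, create staging tables).
function prepare() { calls.push("prepare"); }

// Begin - start of transaction.
function begin() { calls.push("begin"); }

// Apply - during the transaction, once per table with CSV data to load.
// csvinfo is a hypothetical descriptor: { table, file }.
function apply(csvinfo) { calls.push("apply " + csvinfo.table); }

// Commit - after the transaction.
function commit() { calls.push("commit"); }

// Release - run when going offline (close connections, clean up).
function release() { calls.push("release"); }

// Stand-in for the replicator: one transaction touching several tables.
function runTransaction(csvFiles) {
  begin();
  for (const csvinfo of csvFiles) apply(csvinfo); // one table at a time
  commit();
}

prepare();
runTransaction([
  { table: "sample", file: "/tmp/sample.csv" },
  { table: "orders", file: "/tmp/orders.csv" },
]);
release();
console.log(calls.join(" -> "));
```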

During a transaction

• Copy, import, load the CSV

• Have access to column, key and transaction information

• Merge the data

• Delete and Insert, or

• Delete, Update and Insert

• Done
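
The delete-and-insert merge can be sketched in memory as follows. It assumes staged rows carry `optype` ("D" or "I"), `seqno` and `uniqno` columns, that an update arrives as a delete followed by an insert, and that only the last operation per key should survive. The `Map` stands in for a base table; none of this is the replicator's actual merge script.

```javascript
// Merge staged change rows into a base table keyed by id.
function mergeStaged(baseTable, stagedRows) {
  // Process changes in commit order.
  const ordered = stagedRows
    .slice()
    .sort((a, b) => a.seqno - b.seqno || a.uniqno - b.uniqno);

  // Step 1: delete every key that has a delete row in the batch.
  for (const row of ordered) {
    if (row.optype === "D") baseTable.delete(row.id);
  }

  // Step 2: insert, but only where the last operation on the key is an insert,
  // so a row inserted and then deleted in the same batch does not reappear.
  const lastByKey = new Map();
  for (const row of ordered) lastByKey.set(row.id, row);
  for (const row of lastByKey.values()) {
    if (row.optype === "I") baseTable.set(row.id, row.message);
  }
  return baseTable;
}

const base = new Map([[1, "old"]]);
mergeStaged(base, [
  { optype: "D", seqno: 1, uniqno: 1, id: 1, message: null },
  { optype: "I", seqno: 1, uniqno: 2, id: 1, message: "new" }, // update of id 1
  { optype: "I", seqno: 2, uniqno: 1, id: 2, message: "temp" },
  { optype: "D", seqno: 3, uniqno: 1, id: 2, message: null }, // id 2 removed again
  { optype: "I", seqno: 4, uniqno: 1, id: 3, message: "hello" },
]);
console.log([...base.entries()]);
```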

Case Study: Cassandra/CQL

• Load table data:

• COPY staging_tablename (optype,seqno,uniqno,id,message) from 'FILENAME'

• Delete:

• delete from sample where id in (#{deleteidlist})

• Insert:

• insert into sample ("+collist+") values ("+substlist+")
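
The statement templates above could be assembled roughly like this. `buildDelete` and `buildInsert` are hypothetical helpers; the sketch assumes integer keys and uses `?` placeholders for the insert values rather than interpolating data into the statement.

```javascript
// Build the delete statement from a list of integer ids (the deleteidlist).
function buildDelete(table, ids) {
  for (const id of ids) {
    if (!Number.isInteger(id)) throw new TypeError("id must be an integer");
  }
  return "delete from " + table + " where id in (" + ids.join(",") + ")";
}

// Build the insert statement: collist names the columns, substlist holds
// one placeholder per column for the values bound at execution time.
function buildInsert(table, columns) {
  const collist = columns.join(",");
  const substlist = columns.map(() => "?").join(",");
  return "insert into " + table + " (" + collist + ") values (" + substlist + ")";
}

console.log(buildDelete("sample", [1, 2]));
console.log(buildInsert("sample", ["id", "message"]));
```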

Filters

Filter Execution

[Diagram: replication pipeline made of stages, each running Extract, Filter and Apply. Labels: MySQL Master, Binlog, Transaction History Log, In-Memory Queue, Slave Replicators, tcp/ip.]

Filter Operation

• Always get one transaction at a time

• Transaction must be processed inline

• Metadata

• Data blocks

• SQL or ROW Info

• Always returns the transaction

JS Filters

• prepare() - called when going online

• filter() - does the work

• release() - called when going offline

• Access to:

• Connection to DB

• Full Java class environment

• Bunch of utility functions
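
A filter with this lifecycle might look like the sketch below, which renames a schema on every transaction that passes through. The event is a plain object standing in for the replicator's event class, and the schema names and the driver loop are hypothetical; the function names come from the slide.

```javascript
let renameFrom, renameTo, seen;

// prepare() - called when going online: set up configuration and state.
function prepare() {
  renameFrom = "sales";
  renameTo = "sales_archive";
  seen = 0;
}

// filter() - does the work: receives one transaction, changes it inline,
// and hands the same transaction back.
function filter(event) {
  seen++;
  for (const block of event.data) {
    if (block.schema === renameFrom) block.schema = renameTo;
  }
  return event;
}

// release() - called when going offline: report and clean up.
function release() {
  console.log("filtered " + seen + " transactions");
}

// Stand-in driver: the replicator would call these functions itself.
prepare();
const incoming = [
  { seqno: 1, data: [{ schema: "sales", table: "orders" }] },
  { seqno: 2, data: [{ schema: "hr", table: "staff" }] },
];
const outgoing = incoming.map((event) => filter(event));
release();
```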

Data Structure

ReplDBMSEvent
  DBMSData: StatementData
  DBMSData: StatementData
  DBMSData: RowChangeData
    OneRowChange
    OneRowChange
    ...
  StatementData

ReplDBMSEvent
  DBMSData: RowChangeData
    OneRowChange
    OneRowChange
    ...

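Get/Set Values

    // Fragment from inside a filter: rowChanges is assumed to hold the
    // OneRowChange list of a RowChangeData block, and TypesDATE and
    // TypesTIMESTAMP are assumed to be defined elsewhere in the filter.
    for (j = 0; j < rowChanges.size(); j++) {
      oneRowChange = rowChanges.get(j);
      columns = oneRowChange.getColumnSpec();
      columnValues = oneRowChange.getColumnValues();
      for (c = 0; c < columns.size(); c++) {
        columnSpec = columns.get(c);
        type = columnSpec.getType();
        // Only date and timestamp columns are inspected
        if (type == TypesDATE || type == TypesTIMESTAMP) {
          for (row = 0; row < columnValues.size(); row++) {
            values = columnValues.get(row);
            value = values.get(c);
            // Replace a zero date with NULL
            if (value.getValue() == 0) {
              value.setValueNull();
            }
          }
        }
      }
    }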

What you can do in a filter

• Anything

Case Study: Building a Kafka Applier

Kafka?

• Message queue/bus

• Full publish/subscribe model

• Hugely flexible

• Very practical

• High performance

• Not a database

Message Format for Data

• Embedded JSON

• CSV Row

• Encoded binary fields

• Message topic?

• Schema/table/primary key?
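
One possible answer to these questions is sketched below: the row is embedded as JSON, the topic is derived from schema and table, and the message key from the primary key. The change record layout, the `toMessage` helper, and the `schema.table` topic naming are assumptions for illustration, not the format of any shipped applier.

```javascript
// Turn one row change into a { topic, key, value } message.
function toMessage(change) {
  // Key on the primary key so all changes to one row share a partition key.
  const key = change.primaryKey
    .map((column) => String(change.values[column]))
    .join("|");
  return {
    topic: change.schema + "." + change.table,
    key: key,
    // Embedded JSON keeps column names and the operation type with the data.
    value: JSON.stringify({
      optype: change.optype,
      seqno: change.seqno,
      row: change.values,
    }),
  };
}

const message = toMessage({
  schema: "shop",
  table: "orders",
  primaryKey: ["id"],
  optype: "I",
  seqno: 42,
  values: { id: 7, total: "19.99" },
});
console.log(message.topic, message.key, message.value);
```

A CSV row or encoded binary value would be smaller, but the consumer would then need the column list from somewhere else.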

Impedance

• What happens with multi-row transactions?

• What happens when a multi-row transaction is not applied?

• Should we split data into chunks?
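
If a multi-row transaction is split into chunks, each chunk needs enough metadata for a consumer to tell whether the whole transaction has arrived. The sketch below shows one such scheme; the `chunk` and `last` fields and both helper functions are hypothetical.

```javascript
// Split the rows of one transaction into numbered chunks.
function chunkTransaction(seqno, rows, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const total = Math.ceil(rows.length / chunkSize);
  const chunks = [];
  for (let i = 0; i < total; i++) {
    chunks.push({
      seqno: seqno,            // identifies the source transaction
      chunk: i,                // position of this chunk within it
      last: i === total - 1,   // marks the final chunk
      rows: rows.slice(i * chunkSize, (i + 1) * chunkSize),
    });
  }
  return chunks;
}

// Consumer side: the transaction is complete only when chunks 0..n are all
// present and the highest-numbered one is flagged as last.
function isComplete(received) {
  const sorted = received.slice().sort((a, b) => a.chunk - b.chunk);
  return (
    sorted.length > 0 &&
    sorted[sorted.length - 1].last &&
    sorted.every((c, i) => c.chunk === i)
  );
}

const parts = chunkTransaction(42, ["r1", "r2", "r3", "r4", "r5"], 2);
console.log(parts.map((p) => p.rows.length), isComplete(parts));
```

A consumer that sees an incomplete set can hold the rows back instead of applying part of a transaction.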

What we do already

Sources   Targets
MySQL     MySQL
Oracle    Oracle
          RedShift
          Vertica
          Hadoop
          Text
          SQLite
          RabbitMQ
          S3
          MongoDB

What are we adding?

Sources          Targets
REST API Input   Cassandra
MongoDB          Amazon Athena
Couchbase        Couchbase
CouchDB          CouchDB
PostgreSQL       ElasticSearch
                 Flume
                 Kafka
                 Native JDBC to Hadoop
                 PostgreSQL

Where Next

• github.com/continuent/tungsten-replicator

• mcb.guru