Breaking the Database Type Barrier: Replicating Across Different DBMS

Posted on 24-May-2015

DESCRIPTION

Sharing data between different DBMS types is an inevitable need in today's diverse IT environments. Real-time data integration, seamless migration, and data warehousing are the main reasons driving demand for heterogeneous replication. In this talk we'll review how the open source Tungsten Replicator can replicate data in real time between databases such as MySQL, PostgreSQL, Oracle, MongoDB and others. Join us for a talk that is both technical and enlightening. We'll cover the fundamental steps behind configuring heterogeneous replication, the importance of transaction-transforming filters, and common challenges that arise when replicating across DBMS types. We'll conclude with in-line demos to show you how it looks in action.

TRANSCRIPT

© Continuent 2010

Linas Virbalas, Continuent, Inc.

/  Definition & Motivation
/  Scoping the Challenge
/  MySQL ->
   •  PostgreSQL
   •  Oracle
   •  MongoDB
/  Demo 1
/  PostgreSQL ->
   •  MySQL
/  Demo 2
/  Q&A

Heterogeneous Replication

Replication between different types of DBMS


1.  Real-time integration of data between different DBMS types
2.  Seamless migration out of one DBMS type to another
3.  Data warehousing (real-time) from different DBMS types
4.  Leveraging specific SQL power of other DBMS types

/  Name: Linas Virbalas
/  Country: Lithuania
/  Implementing for Tungsten:
   •  MySQL -> PostgreSQL
   •  MySQL -> Greenplum
   •  MySQL -> Oracle
   •  PostgreSQL WAL
   •  PostgreSQL Streaming Replication
   •  PostgreSQL Logical Replication via Slony logs
/  Blog: http://flyingclusters.blogspot.com

1.  MySQL -> …
    •  Replicating from MySQL to PostgreSQL/Greenplum, Oracle, MongoDB
2.  PostgreSQL -> …
    •  Replicating from PostgreSQL to MySQL

With Tungsten Replicator


/  Open Source GPL v2
/  Java
/  Interfaces to implement new:
   •  Extractors
   •  Filters
   •  Appliers
/  Multiple replication services per process
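The three plug-in points listed above (extractors, filters, appliers) can be pictured as a simple chain. The sketch below is only an illustration in Python, not Tungsten Replicator's Java API: the `Event` shape, `run_pipeline`, `demo_extractor`, and `skip_system_schema` are all hypothetical names invented for this example.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical stand-ins for the three plug-in points; these are not
# Tungsten Replicator's real Java interfaces.
Event = dict
Extractor = Callable[[], Iterable[Event]]      # reads changes from the source DBMS
Filter = Callable[[Event], Optional[Event]]    # transforms an event, or returns None to drop it
Applier = Callable[[Event], None]              # writes an event to the target DBMS


def run_pipeline(extractor: Extractor, filters: List[Filter], applier: Applier) -> int:
    """Pass every extracted event through the filters in order and apply
    whatever survives. Returns the number of events applied."""
    applied = 0
    for event in extractor():
        current: Optional[Event] = event
        for transform in filters:
            current = transform(current)
            if current is None:
                break
        if current is not None:
            applier(current)
            applied += 1
    return applied


def demo_extractor() -> Iterable[Event]:
    """Yield two made-up events, one of them from a system schema."""
    yield {"schema": "shop", "sql": "INSERT INTO t VALUES (1)"}
    yield {"schema": "mysql", "sql": "FLUSH PRIVILEGES"}


def skip_system_schema(event: Event) -> Optional[Event]:
    """Example filter: drop events that belong to MySQL's own schema."""
    return None if event["schema"] == "mysql" else event


applied_events: List[Event] = []
count = run_pipeline(demo_extractor, [skip_system_schema], applied_events.append)
print(count, applied_events)
```

Swapping the extractor or applier for another DBMS leaves the filter chain untouched, which is the point of keeping the three roles separate.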


Technology: Replication Pipelines


/  Statement Based Replication

/  Row Based Replication


[Diagram: MySQL -> PostgreSQL pipeline. Master Replicator (MySQL Extractor, Filters, Transaction History Log) -> Slave Replicator (Transaction History Log, Filters, PostgreSQL Applier)]

/  Provisioning
/  Data Type Differences
/  Database vs. Schema
/  Default (Implicitly Defined) Schema Selection
/  SQL Dialect Differences
   •  Statement Replication vs. Row Replication
/  Character Sets and Binary Data
/  Old Versions of MySQL

Provisioning

/  Harder way: Dump data explicitly

/  Easier way: Replicate a mysqldump backup


    MySQL                 PostgreSQL
!   TINYINT               SMALLINT
    SMALLINT              SMALLINT
    INTEGER               INTEGER
    BIGINT                BIGINT
!   CHAR(1)               CHAR(5) = {'true', 'false'}
    CHAR(x)               CHAR(x)
    VARCHAR(x)            VARCHAR(x)
    DATE                  DATE
    TIMESTAMP             TIMESTAMP
!   TEXT (diff. sizes)    TEXT
!   BLOB                  BYTEA

/  Note the type differences between MySQL and PG (rows marked !)
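The mapping in the table above can be written down as a small lookup. This is a minimal sketch, assuming type names arrive as plain strings; `MYSQL_TO_PG` and `map_type` are hypothetical names, and the TINYTEXT/MEDIUMTEXT/LONGTEXT entries are my reading of "TEXT (diff. sizes)".

```python
import re

# Direct mappings from the slide's table. The three sized TEXT variants are
# an interpretation of "TEXT (diff. sizes)".
MYSQL_TO_PG = {
    "TINYINT": "SMALLINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
    "TEXT": "TEXT",
    "TINYTEXT": "TEXT",
    "MEDIUMTEXT": "TEXT",
    "LONGTEXT": "TEXT",
    "BLOB": "BYTEA",
}

_SIZED_CHAR = re.compile(r"(CHAR|VARCHAR)\((\d+)\)")


def map_type(mysql_type: str) -> str:
    """Return the PostgreSQL type for a MySQL column type, per the table."""
    normalized = mysql_type.strip().upper()
    if normalized == "CHAR(1)":
        # The table maps CHAR(1) to CHAR(5) so it can hold 'true' / 'false'.
        return "CHAR(5)"
    if _SIZED_CHAR.fullmatch(normalized):
        # CHAR(x) and VARCHAR(x) carry over unchanged.
        return normalized
    try:
        return MYSQL_TO_PG[normalized]
    except KeyError:
        raise ValueError(f"no mapping known for MySQL type {mysql_type!r}")


print(map_type("tinyint"), map_type("VARCHAR(25)"), map_type("blob"))
```

Raising on an unknown type, rather than passing it through, keeps a missing mapping from turning into a failed statement on the PostgreSQL side.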


Database vs. Schema

/  In MySQL these are the same:
      CREATE DATABASE foo
      CREATE SCHEMA foo
/  In PostgreSQL these are very different:
      CREATE DATABASE foo
      CREATE SCHEMA foo
/  Tungsten uses filters to map MySQL databases to PostgreSQL schemas


MySQL Implicit                MySQL Explicit
CREATE SCHEMA s;              CREATE SCHEMA s;
USE s;
CREATE TABLE t (i int);       CREATE TABLE s.t (i int);
INSERT INTO t VALUES (1);     INSERT INTO s.t VALUES (1);

/  MySQL: Trivial to use `USE`
/  MySQL: Going without `USE` generates different events
/  PG: Extract the default schema from the event
/  PG: Set it before applying

MySQL                         PostgreSQL
USE s;                        SET search_path TO s, "$user";
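The two PG steps above (extract the default schema from the event, set it before applying) might look like this on the applier side. This is a sketch only: `with_search_path` is a hypothetical helper, and a real applier would also need to quote the schema name safely.

```python
from typing import List, Optional


def with_search_path(default_schema: Optional[str], sql: str) -> List[str]:
    """Return the statements to run on PostgreSQL for one replicated event.

    If the event carries a default schema (what MySQL set with USE), emit a
    SET search_path first so unqualified table names resolve the same way.
    """
    if not default_schema:
        # Fully qualified statements need no schema switch.
        return [sql]
    return [f'SET search_path TO {default_schema}, "$user";', sql]


for statement in with_search_path("s", "CREATE TABLE t (i int);"):
    print(statement)
```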


MySQL:      CREATE TABLE complex (id INTEGER AUTO_INCREMENT PRIMARY KEY, i INT);
PostgreSQL: CREATE TABLE complex (id SERIAL PRIMARY KEY, i INT);

MySQL:      CREATE TABLE dt (i TINYINT);
PostgreSQL: CREATE TABLE dt (i SMALLINT);
…

/  DDL and DML statements differ between SQL dialects
/  Row replication resolves issues arising from differences in DML, but still leaves DDL to handle
/  Tungsten Replicator filters come to the rescue!
   •  Simple to develop Java or JavaScript extensions
   •  Event structure IN -> Filter -> Event structure OUT
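The "event IN -> filter -> event OUT" shape can be sketched with the two DDL rewrites shown above. Real Tungsten filters are written in Java or JavaScript; this Python version only illustrates the idea, and the event dictionary, `_RULES`, and `rewrite_ddl` are hypothetical. Regex rewriting like this covers only the simple cases from the slide.

```python
import re

# Rewrite rules taken from the two examples on the slide.
_RULES = [
    (re.compile(r"\bINT(?:EGER)?\s+AUTO_INCREMENT\b", re.IGNORECASE), "SERIAL"),
    (re.compile(r"\bTINYINT\b", re.IGNORECASE), "SMALLINT"),
]


def rewrite_ddl(event: dict) -> dict:
    """Return a copy of the event whose SQL uses PostgreSQL spellings."""
    sql = event["sql"]
    for pattern, replacement in _RULES:
        sql = pattern.sub(replacement, sql)
    return {**event, "sql": sql}


incoming = {"sql": "CREATE TABLE complex (id INTEGER AUTO_INCREMENT PRIMARY KEY, i INT);"}
print(rewrite_ddl(incoming)["sql"])
```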


MySQL:      INSERT INTO embedded_blob (key, data) VALUES (1, '?\0^Es\0^\0\'')
PostgreSQL: ARGH!!! (SQL statement fails)

MySQL:      create table xlate(id int, d1 varchar(25) character set latin1, d2 varchar(25) character set utf8);
PostgreSQL: ARGH!!! (no way to translate to common charset)

/  Statement replication: MySQL syntax is "permissive"
/  Embedded binary / alternate charsets
/  Different charsets for different clients
/  Row replication: database/table/column charsets may differ
/  Answer: Stick with one character set throughout; use row replication to move binary data
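A small illustration of why mixed character sets hurt: the same text has different bytes in latin1 and utf8, and latin1 bytes are generally not valid utf8. The sample string and the `decodes_as` helper are made up for this sketch.

```python
def decodes_as(raw: bytes, charset: str) -> bool:
    """Report whether the bytes are valid text in the given charset."""
    try:
        raw.decode(charset)
        return True
    except UnicodeDecodeError:
        return False


text = "caf\u00e9"                      # "cafe" with an accented final letter
latin1_bytes = text.encode("latin-1")   # one byte for the accented letter
utf8_bytes = text.encode("utf-8")       # two bytes for the accented letter

print(latin1_bytes != utf8_bytes)           # the encodings differ
print(decodes_as(utf8_bytes, "utf-8"))      # utf8 bytes read back fine
print(decodes_as(latin1_bytes, "utf-8"))    # latin1 bytes are not valid utf8
```

A statement that embeds both kinds of bytes cannot be converted as a whole with a single charset, which is why the slide recommends one character set throughout.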


MySQL Versions

/  Problem: Data stored on hard-to-replicate MySQL versions or configurations
   •  Row replication not enabled (5.1)
   •  No row replication support (5.0, 4.1)
   •  Tungsten cannot read binlog (4.1)
/  Answer: MySQL blackhole replication
   •  (Blackhole = no store, just a binlog)
   •  Caveat: Check MySQL docs carefully


[Diagram: MySQL -> Oracle pipeline. Master Replicator (MySQL Extractor, Filters, Transaction History Log) -> Slave Replicator (Transaction History Log, Filters, Oracle Applier)]

/  TEXT length limitation
   •  VARCHAR(4000) => CLOB
/  Primary Keys and PrimaryKeyFilter
   •  Goal:
      UPDATE t SET c1 = x1, c2 = x2, c3 = x3 WHERE p = p1
   •  NOT:
      UPDATE t SET c1 = x1, c2 = x2, c3 = x3 WHERE p = p1 AND c1 = x1 AND c2 = x2 AND c3 = x3 AND …
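The goal stated above, a WHERE clause that names only the primary key, can be sketched as follows. This shows the intended effect, not how PrimaryKeyFilter is implemented; `build_update` and its arguments are hypothetical, and `?` placeholders are assumed for the values.

```python
from typing import Dict, List, Tuple


def build_update(
    table: str,
    new_values: Dict[str, object],
    key_columns: List[str],
    old_row: Dict[str, object],
) -> Tuple[str, list]:
    """Build an UPDATE whose WHERE clause uses only the key columns.

    new_values holds the row image after the change, old_row the image
    before it; only the key columns of old_row are used for matching.
    """
    set_columns = [c for c in new_values if c not in key_columns]
    set_clause = ", ".join(f"{c} = ?" for c in set_columns)
    where_clause = " AND ".join(f"{k} = ?" for k in key_columns)
    params = [new_values[c] for c in set_columns] + [old_row[k] for k in key_columns]
    return f"UPDATE {table} SET {set_clause} WHERE {where_clause}", params


sql, params = build_update(
    "t",
    {"c1": "x1", "c2": "x2", "c3": "x3"},
    ["p"],
    {"p": "p1", "c1": "a", "c2": "b", "c3": "c"},
)
print(sql, params)
```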


> use mydb
switched to db mydb
> db.test.insert( {"test": "test value", "anumber" : 5 } )
> db.test.find()
{ "_id" : ObjectId("4dce9a4f3d6e186ffccdd4bb"), "test" : "test value", "anumber" : 5 }
> exit


/  MySQL binary log doesn't hold column names
   •  mysql> INSERT INTO foo (id, data) VALUES (1, 'hello from MySQL!');
   •  If nothing is done, this becomes:
      > db.foo.find();
      { "_id" : ObjectId("4dc55e45ad90a25b9b57909d"), " " : "1", " " : "hello from MySQL!" }
   •  Solution: fill in column names on the master side. Then:
      > db.foo.find();
      { "_id" : ObjectId("4dc55e45ad90a25b9b57909d"), "id" : "1", "data" : "hello from MySQL!" }
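Filling in column names before the event leaves the master could work roughly like this. It is a sketch under assumptions: the `COLUMN_NAMES` cache, the event layout, and both function names are hypothetical, and a real master-side filter would read the names from the source database's metadata.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata cache: (schema, table) -> ordered column names.
COLUMN_NAMES: Dict[Tuple[str, str], List[str]] = {
    ("mydb", "foo"): ["id", "data"],
}


def fill_column_names(event: dict) -> dict:
    """Attach column names to a row event that only carries values."""
    names = COLUMN_NAMES[(event["schema"], event["table"])]
    if len(names) != len(event["values"]):
        raise ValueError("column count does not match row width")
    return {**event, "columns": names}


def to_document(event: dict) -> dict:
    """Turn a named row event into a MongoDB-style document."""
    return dict(zip(event["columns"], event["values"]))


row_event = {"schema": "mydb", "table": "foo", "values": ["1", "hello from MySQL!"]}
print(to_document(fill_column_names(row_event)))
```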


MySQL -> MongoDB: The Pipeline


                                             Logical   Physical
MySQL Statement Based                           x
MySQL Row Based                                 x
MySQL Mixed                                     x
PostgreSQL WAL Shipping                                   x
PostgreSQL Streaming Replication                          x
Filters (data transformation) possible          +         -
Different data/structure on slave possible      +         -

/  A transaction is not accessible to the replicator under physical replication
/  Tungsten Replicator manages WAL/Streaming Replication


                                                   Logical   Physical
MySQL Statement Based                                 x
MySQL Row Based                                       x
MySQL Mixed                                           x
PostgreSQL WAL Shipping                                         x
PostgreSQL Streaming Replication                                x
Tungsten Replicator w/ PostgreSQLSlonyExtractor       x
Filters (data transformation) possible                +         -
Different data/structure on slave possible            +         -

/  With PostgreSQLSlonyExtractor the transaction goes through the Replicator pipeline

[Diagram: PostgreSQL -> MySQL pipeline. Master Replicator (PostgreSQL SlonyExtractor, Filters, Transaction History Log) -> Slave Replicator (Transaction History Log, Filters, MySQLApplier)]

/  We’ve reviewed an open source heterogeneous replicator (professional services available upon request)

/  Tungsten Replicator encapsulates the complexity and corner cases of the subject

/  Replicating:
   •  out of MySQL – now;
   •  out of PostgreSQL – prototype;
   •  out of Oracle – designs ready, awaiting sponsorship.


Open Source: http://tungsten-replicator.org, #tungsten @ irc.freenode.net

My Blog: http://flyingclusters.blogspot.com

Commercial: sales@continuent.com

Continuent Web Site: http://www.continuent.com
