Building Hybrid Data Cluster Using PostgreSQL and MongoDB

Building a Hybrid Data Cluster with MongoDB and Postgres: A solution based on PostgreSQL’s Foreign Data Wrapper (27 April 2015)

Upload: ashnikbiz

Post on 15-Jul-2015


TRANSCRIPT

Page 1: Building Hybrid data cluster using PostgreSQL and MongoDB

Building a Hybrid Data Cluster with MongoDB and Postgres

A solution based on PostgreSQL’s Foreign Data Wrapper

27 April 2015

Page 2: Building Hybrid data cluster using PostgreSQL and MongoDB

Context and Customer scenario

Page 3: Building Hybrid data cluster using PostgreSQL and MongoDB

Customer Requirements for Hybrid Cluster

- More and more unstructured data is being generated
- Increasing use of, and demand for, NoSQL databases because of:
  - the usage scenario
  - the ability to scale horizontally
- Challenges:
  - Many administrators and developers still prefer SQL as an easy and intuitive tool for querying the available data
  - Few NoSQL databases support complex queries the way SQL does, e.g. JOINs and subqueries

Page 4: Building Hybrid data cluster using PostgreSQL and MongoDB

Real Life Use Cases

- NoSQL as an archive store for the RDBMS
  - The RDBMS stores the operational and transactional data
  - NoSQL acts as an archive store for historical data
- NoSQL for receiving the write stream
  - NoSQL databases accumulate data from various sources, with high write throughput across multiple shards
  - The RDBMS stores the filtered data after it has been transformed into proper structures
  - The RDBMS makes it easier for users to query the data using SQL and JOINs

Page 5: Building Hybrid data cluster using PostgreSQL and MongoDB

Hybrid Data Cluster is the ‘need of the hour’

Page 6: Building Hybrid data cluster using PostgreSQL and MongoDB

PostgreSQL

- Most advanced open source database
- Supports the relational model of storing data
- Supports ACID transactions
  - Multi-Version Concurrency Control (MVCC)
  - Write-Ahead Log (WAL) files
- Scalability with tablespaces and partitions/child tables
- Supports unstructured data types (JSON, JSONB, HSTORE) and full-text search

Page 7: Building Hybrid data cluster using PostgreSQL and MongoDB

MongoDB

- Most popular NoSQL database for a vast set of workloads
- Best for storing unstructured data
- Horizontal scalability through sharding
- Support for secondary indexes
- Aggregation and map-reduce features

Page 8: Building Hybrid data cluster using PostgreSQL and MongoDB

Foreign Data Wrappers of PostgreSQL

- Get the best of both worlds
- Based on SQL/MED (Management of External Data)
- Allow you to create FOREIGN TABLEs that map to external entities
- These entities could be:
  - a table in an RDBMS
  - a collection in MongoDB
  - the respective entities in HDFS or a file system
- More about FDW in Postgres: https://wiki.postgresql.org/wiki/Foreign_data_wrappers

Page 9: Building Hybrid data cluster using PostgreSQL and MongoDB

FDW for MongoDB

Page 10: Building Hybrid data cluster using PostgreSQL and MongoDB

MongoDB FDW

- Started by Citus Data (CitusDB) and then forked by EnterpriseDB
- More details: https://github.com/EnterpriseDB/mongo_fdw
- The example discussed here is based on a blog post from EnterpriseDB: http://www.enterprisedb.com/postgres-plus-edb-blog/jason-davis/tales-trenches-new-mongodb-fdw
- Let’s go through the demo

Page 11: Building Hybrid data cluster using PostgreSQL and MongoDB

Preparing the MongoDB

Page 12: Building Hybrid data cluster using PostgreSQL and MongoDB

Prepare for a MongoDB Cluster

- Platform: Windows 7
- Create the directories that you will need:
  - cd /d d:\mongodb
  - mkdir a0
  - mkdir b0
  - mkdir c0
  - mkdir c1
  - mkdir c2
  - mkdir d0
  - mkdir d1
  - mkdir d2
  - mkdir cfg0
  - mkdir cfg1
  - mkdir cfg2

Page 13: Building Hybrid data cluster using PostgreSQL and MongoDB

Create the services for MongoDB Cluster: ConfigServer

mongod --configsvr --dbpath d:\mongodb\cfg0 --port 26050 --install --logpath d:\mongodb\cfg0.log --serviceName new_mongod_cfg0 --serviceDisplayName new_mongod_cfg0

net start new_mongod_cfg0

mongod --configsvr --dbpath d:\mongodb\cfg1 --port 26051 --install --logpath d:\mongodb\cfg1.log --serviceName new_mongod_cfg1 --serviceDisplayName new_mongod_cfg1

net start new_mongod_cfg1

mongod --configsvr --dbpath d:\mongodb\cfg2 --port 26052 --install --logpath d:\mongodb\cfg2.log --serviceName new_mongod_cfg2 --serviceDisplayName new_mongod_cfg2

net start new_mongod_cfg2

Page 14: Building Hybrid data cluster using PostgreSQL and MongoDB

Create the services for MongoDB Cluster: Create Shards

mongod --shardsvr --replSet a --dbpath d:\mongodb\a0 --logpath d:\mongodb\a0.log --port 27000 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a0 --serviceDisplayName new_mongod_shrd_a0

net start new_mongod_shrd_a0

mongod --shardsvr --replSet b --dbpath d:\mongodb\b0 --logpath d:\mongodb\b0.log --port 27100 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_b0 --serviceDisplayName new_mongod_shrd_b0

net start new_mongod_shrd_b0

mongod --shardsvr --replSet c --dbpath d:\mongodb\c0 --logpath d:\mongodb\c0.log --port 27200 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_c0 --serviceDisplayName new_mongod_shrd_c0

net start new_mongod_shrd_c0
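The three shard services above differ only in the replica set name, the member number, and the port. As a rough sketch of that pattern, the following JavaScript (runnable under Node.js) generates the same command lines. The function name shardServiceCommand is a hypothetical helper introduced here for illustration; it is not part of MongoDB or of the original demo.

```javascript
// Hypothetical helper: builds the "mongod ... --install" command line for one
// shard member, following the naming and port pattern used on the slide.
function shardServiceCommand(setName, member, port) {
  var name = "new_mongod_shrd_" + setName + member;
  var base = "d:\\mongodb\\" + setName + member;
  return [
    "mongod", "--shardsvr",
    "--replSet", setName,
    "--dbpath", base,
    "--logpath", base + ".log",
    "--port", String(port),
    "--smallfiles", "--oplogSize", "50",
    "--install",
    "--serviceName", name,
    "--serviceDisplayName", name
  ].join(" ");
}

// Shards a, b and c on ports 27000, 27100 and 27200, as in the commands above.
["a", "b", "c"].forEach(function (setName, i) {
  console.log(shardServiceCommand(setName, 0, 27000 + i * 100));
  console.log("net start new_mongod_shrd_" + setName + "0");
});
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the output can be pasted into an elevated Windows command prompt.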

Page 15: Building Hybrid data cluster using PostgreSQL and MongoDB

Create the services for MongoDB Cluster: Optionally Create the Replicas

- For simplicity we have skipped creating additional replica set members here, but you can add them, e.g.:
  - mkdir a1
  - mongod --shardsvr --replSet a --dbpath d:\mongodb\a1 --logpath d:\mongodb\a1.log --port 27001 --smallfiles --oplogSize 50 --install --serviceName new_mongod_shrd_a1 --serviceDisplayName new_mongod_shrd_a1
  - net start new_mongod_shrd_a1

Page 16: Building Hybrid data cluster using PostgreSQL and MongoDB

Initiate the Mongos

- mongos --configdb sameer:26050,sameer:26051,sameer:26052 --install --serviceName new_mongos_svc0 --serviceDisplayName new_mongos_svc0 --logpath d:\mongodb\mongos0.log --port 26060
- net start new_mongos_svc0

Page 17: Building Hybrid data cluster using PostgreSQL and MongoDB

Initiate the Replica Set

- I am going to initiate a one-member replica set for each of my shards
- Shard A
  mongo --port 27000
  > rs.initiate()
  a:OTHER> rs.conf()
  a:PRIMARY> exit
- Shard B
  mongo --port 27100
  > rs.initiate()
  b:OTHER> rs.conf()
  b:PRIMARY> exit
- Shard C
  mongo --port 27200
  > rs.initiate()
  c:OTHER> rs.conf()
  c:PRIMARY> exit
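When a shard has more than one member, for example the optional a1 instance on port 27001, rs.initiate() can be given an explicit configuration document instead of relying on the default. The following is a minimal sketch in plain JavaScript. The function buildReplSetConfig is a hypothetical helper introduced here, and the host name sameer is assumed from the mongos command on the earlier slide.

```javascript
// Hypothetical helper: builds a replica set configuration document of the
// shape { _id: <set name>, members: [ { _id: <n>, host: <host:port> }, ... ] }.
function buildReplSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map(function (host, index) {
      return { _id: index, host: host };
    })
  };
}

// Assumed layout: shard "a" with its primary on 27000 and the optional
// second member on 27001.
var configA = buildReplSetConfig("a", ["sameer:27000", "sameer:27001"]);
console.log(JSON.stringify(configA, null, 2));
```

The sketch runs under Node.js as written. In a mongo shell connected to port 27000, the same object would be passed as rs.initiate(configA).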

Page 18: Building Hybrid data cluster using PostgreSQL and MongoDB

Set Up Sharding

mongo --port 26060 test

mongos> sh.addShard("b/sameer:27100")

mongos> sh.addShard("c/sameer:27200")

mongos> sh.addShard("a/sameer:27000")

mongos> sh.enableSharding("db")

mongos> sh.shardCollection("db.warehouse", {warehouse_created: 1}, true)

Page 19: Building Hybrid data cluster using PostgreSQL and MongoDB

Set Up Users and Security

mongos> use db

mongos> db.createUser(
... {
...   user: "superuser",
...   pwd: "password",
...   roles: [ { role: "root", db: "admin" } ]
... }
... )

Page 20: Building Hybrid data cluster using PostgreSQL and MongoDB

Creating FDW Extension in Postgres

Page 21: Building Hybrid data cluster using PostgreSQL and MongoDB

Build MongoDB FDW

- Download MongoDB FDW from GitHub
- Installation is quite easy when you use autogen.sh:
  - cd $PATH_WHERE_FDW_IS_EXTRACTED
  - ./autogen.sh
- It will automatically install all the required components:
  - libbson
  - libmongoc
- Once that is done, you can build and install:
  - make -f Makefile.meta && make -f Makefile.meta install

Page 22: Building Hybrid data cluster using PostgreSQL and MongoDB

Features of mongo_fdw

- Can be built with either the legacy driver or the master branch driver
- Read and write capability for foreign tables
- Connection pooling, which reuses the same MongoDB connection for queries in the same session
- Build with MongoDB's legacy branch driver:
  - ./autogen.sh --with-legacy
- Build with MongoDB's master branch driver:
  - ./autogen.sh --with-master

Page 23: Building Hybrid data cluster using PostgreSQL and MongoDB

Using mongo_fdw

- Create the mongo_fdw extension in the PostgreSQL database
  - You may create the extension in a template database
- Create a foreign server
- Create a user mapping that maps a Postgres user to a MongoDB user
- Create a foreign table that maps to a MongoDB collection

Page 24: Building Hybrid data cluster using PostgreSQL and MongoDB

Create Foreign Server: Example

- psql=# CREATE EXTENSION mongo_fdw;

- psql=# CREATE SERVER mongo_server
           FOREIGN DATA WRAPPER mongo_fdw
           OPTIONS (address '192.168.160.1', port '26060');

- psql=# CREATE USER MAPPING FOR postgres
           SERVER mongo_server
           OPTIONS (username 'superuser', password 'password');

Page 25: Building Hybrid data cluster using PostgreSQL and MongoDB

Create Foreign Table: Example

- psql=# CREATE FOREIGN TABLE warehouse(
           _id NAME,
           warehouse_id int,
           warehouse_name text,
           warehouse_created timestamptz)
         SERVER mongo_server
         OPTIONS (database 'db', collection 'warehouse');

Page 26: Building Hybrid data cluster using PostgreSQL and MongoDB

‘_id’ column of MongoDB

- It stores a unique Object ID
- By default, if you skip this field, MongoDB will insert a 12-byte BSON ObjectId
- While inserting data into MongoDB you may choose the value of this field
- In mongo_fdw you have to define the _id column with the data type NAME
- mongo_fdw will ignore the value inserted in the _id column and let MongoDB generate it
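The 12-byte ObjectId is not opaque: its first 4 bytes hold the creation time as seconds since the Unix epoch. The sketch below, in plain JavaScript, shows how that time can be read from the id's 24-character hex form. The function objectIdTimestamp is a hypothetical helper, and the sample id is chosen only for illustration; it does not come from the demo data.

```javascript
// Hypothetical helper: extracts the creation time embedded in an ObjectId.
// Assumes the id is given as its usual 24-character hex string (12 bytes).
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error("expected a 24-character hex string");
  }
  // The first 4 bytes (8 hex digits) are seconds since the Unix epoch.
  var seconds = parseInt(hexId.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// Sample id for illustration only; it is not taken from the warehouse data.
var created = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(created.toISOString());
```

In the mongo shell the same information is available directly through the getTimestamp() method of an ObjectId value.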

Page 27: Building Hybrid data cluster using PostgreSQL and MongoDB

DML on Foreign Tables

- INSERT INTO warehouse VALUES (0, 1, 'UPS', '2014-12-12T07:12:10Z');
- INSERT INTO warehouse VALUES (0, 2, 'EMS', '2013-12-12T07:12:10Z');
- INSERT INTO warehouse VALUES (0, 3, 'ASX', '2013-11-12T07:12:10Z');
- UPDATE warehouse SET warehouse_name = 'UPS_NEW' WHERE warehouse_id = 1;

Page 28: Building Hybrid data cluster using PostgreSQL and MongoDB

Operations on MongoDB

- Connect to MongoDB, naming the database db in which the user was created:
  - mongo --port 26060 --username superuser --password password db
- Check the data in the collection:
  - db.warehouse.find()

Page 29: Building Hybrid data cluster using PostgreSQL and MongoDB

Operations in Postgres on Foreign Data

- You can run ANALYZE on the foreign table to collect statistics
- You can run queries with a WHERE clause
- You can run JOIN queries against other foreign tables or native PostgreSQL tables

Page 30: Building Hybrid data cluster using PostgreSQL and MongoDB

Live walkthrough of the Hybrid Cluster

Leverage complex SQL with sharded MongoDB

Page 31: Building Hybrid data cluster using PostgreSQL and MongoDB

Benefits of this Setup

- Build a sharded MongoDB cluster with an SQL interface
- Query MongoDB data using SQL
- Join MongoDB collections with each other or with tables in Postgres
- Combine and process MongoDB data with data from other data sources with the help of the respective FDWs, e.g. Hadoop, Oracle, MySQL
- Add more shards on the go
- Add replicas for MongoDB on the go
- Use Postgres as a front end to insert/update/delete data in MongoDB using SQL

Page 32: Building Hybrid data cluster using PostgreSQL and MongoDB

Send us your suggestions and questions

[email protected]

Stay Tuned!

Website: www.ashnik.com