Oracle GoldenGate for Big Data: Real Time Big Data Analytic Platform


Page 1: Oracle GoldenGate for Big Data - RainFocus · PDF fileReal Time Big Data Analytic Platform with Oracle GoldenGate for Big Data RajitSaha Principal Big Data Engineer Vengata(Venky)

Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |

Oracle GoldenGate for Big Data
CON6898

Thomas Vengal, Director of Product Management, Oracle Data Integration (@thomasvengal)

Vengata Guruswamy, Principal Data Administrator, LendingClub

Rajit Saha, Principal Engineer, LendingClub

October 04, 2017


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Confidential – Oracle Internal/Restricted/Highly Restricted 3


Program Agenda

1. Big Data Challenges & Market Overview

2. GoldenGate for Big Data – What's New?

3. GoldenGate for Big Data – Roadmap

4. Customer Spotlight – LendingClub

Big Data: Creating Value

Big Data?

[Figure: kinds of data plotted against Velocity and Volume axes. Category labels: Transaction Data, Simple Interaction Data, Complex Interaction Data. Other labels in the figure:]

• Sensors, RFIDs, Devices
• Mobile data, Location data
• E-commerce, Weblogs, Simple Social Media, Search Marketing
• Sentiment Analysis, Natural Language Processing, Unstructured data
• Short Texts, Image, Audio/Video, Biometrics
• ERP data: Inventory, Payables, Financials
• CRM data: Customer data, Sales Pipeline, Sales Orders
• Predictive Analysis, Complex Event Processing
• Complex Social Media Data, Social Media Marketing


Transition of High Volume Data Processing

• 1960-1990: Mainframes

• 1990-2010: Data Warehousing

• 2010-2030: Agile Data Marts & Big Data Lakes


Data is New Age Capital

84%
Percentage of the market value of S&P 500 companies that comes from intangible assets, including data and software [2]

$8 Trillion
Possible value of intangible assets, including data, in the United States [2]

“For most companies, their data is their single biggest asset. Many CEOs in the Fortune 500 don’t fully appreciate this fact.”
Andrew W. Lo, Director, MIT Laboratory for Financial Engineering [1]

Sources:
[1] MIT Technology Review, 2016, http://www.oracle.com/us/dm/mit-oracle-report-2952388.pdf
[2] Ocean Tomo LLC, 2015 Intangible Asset Market Value Study; The Wall Street Journal


How to Create Value from Your Data?

“…there are no industries in which the ability to continuously integrate new sources of data of any format and quality would not generate massive improvements.”

McKinsey Global Institute, December 2016

http://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world

GOLDENGATE FOR BIG DATA


GoldenGate for Big Data
(Running On-Premises or Cloud)

[Figure: architecture diagram. Labels: Capture; Trail Files; Network / Firewall / Cloud; Trail Files; Native Java Replicat; Replicat Parameters; Big Data Properties; JAR; Oracle GoldenGate for Big Data, Modular & Pluggable Architecture. Targets shown: Kafka, Hive, HDFS, HBase, Flume, JMS, Mongo, Elastic, Cassandra, JDBC, Kinesis, OSA.]

• High Performance
• Low Impact and Non-Intrusive
• Flexible and Heterogeneous
• Resilient and FIPS Secure
• Big Data and Cloud


GoldenGate for Big Data: Use Cases

[Figure: GoldenGate pipeline stages (Capture, Trail, Pump, Route, Deliver) feeding real-time data delivery to the use cases below. Adjacent labels in the figure are shown as pairs:]

• New DB/HW/OS/APP: Zero Downtime Upgrades & Data Migration
• Fully Active Distributed DB: High Availability & Disaster Recovery
• Application Offloading: Query & Report Offloading
• Big Data, DW & Marts: Real-time BI, Hadoop Data Staging, Data Ingestion
• Message Bus & Data Grid (JMS): Databus, Event Driven Architecture, SOA/MQ
• Global Data Centers: Data Synchronization Across the Enterprise
• Data Streaming: Real-time Analytics & Massive Parallelization


Real-Time Data Ingestion of Big Data

[Figure: data-flow diagram. Labels: Applications; User Updates; DBMS Updates; GoldenGate (Capture, Trail, Pump, Route, Deliver); Speed Layer; Batch Layer; Oracle EventHub; Oracle Stream Analytics; Streaming Analytics; Oracle Data Integrator; Data Marts; Data Science Workbench; Consumers: Applications, REST Services, Visualization Tools, Reporting Tools.]


Oracle GoldenGate Big Data 12.3.0: New Features (Dec 2016)

• Generic JDBC Targets: statement caching, REPERROR and HANDLECOLLISIONS, metadata mapping

• Highly Functional NoSQL Targets: Cassandra & MongoDB

• Improved Performance: Coordinated Apply

• New Cloud and Data Warehousing Targets: AWS Redshift, IBM PureData (Netezza)


Oracle GoldenGate Big Data 12.3.1: New Features (Aug 2017)

• OGG Core 12.3: latest OGG technology and trail formats with Oracle DB 12.2

• New Targets: Elasticsearch, AWS Kinesis, Confluent Kafka (uses Kafka Connect API, Avro Converter and Schema Registry)

• Dynamic Streaming: mapping templates for dynamic mapping of topic names and partition key

• New Certifications: Apache Hadoop 2.8, CDH 5.10, 5.11, HDP 2.6, Hive 2.x, Kafka 0.10.x, 0.11.x, Cassandra 3.11… & more

• Performance: improved performance up to 20%

GoldenGate for Big Data: Road Ahead


Roadmap: Oracle GoldenGate Big Data, Next 12 Months

• Data Integration Platform Cloud: seamless Big Data integration across ETL, Replication, Data Quality and Metadata Management

• Big Data Sources: Kafka, MongoDB, Cassandra, Hadoop

• New Big Data and Cloud Targets: AWS (S3, EMR), Kudu, Flink, Solr, Hive Streaming, NiFi

• New Formats: ORC, Parquet

• New Integrations: Microservices, Veridata for Big Data

• Workflow Processing for Higher Performance: AWS S3 to Redshift, Netezza (NZLoad), Teradata (TPT)


Continued Focus on Our Vision: Integrate Any Data Shape, Speed, Action, Volume & Location

• Any Data Location: Cloud Infrastructure

• Any Data Volume: Open Source Platforms

• Any Data Action: Dataflow | Pipes

• Any Data Speed: Lambda

• Any Data Shape: Polyglot


Real Time Big Data Analytic Platform with Oracle GoldenGate for Big Data

Rajit Saha, Principal Big Data Engineer
Vengata (Venky) Guruswamy, Principal Database Administrator

Agenda

• Introduction: LendingClub

• Real-time Big Data Platform Use Cases

• Lambda Architecture: On-Premise / AWS

• Data Processing Flow/Algorithm

• GoldenGate for Big Data Implementation Architecture, Configuration and Troubleshooting Scenarios

LendingClub

LendingClub is America’s largest online marketplace connecting borrowers and investors.

• Headquartered in San Francisco, CA

• Office in Westborough, MA


Product Lines

Personal Loans
• Loans up to $40K
• 600+ FICO
• 36 and 60 mo. terms

Small Business
• Business loans up to $300K
• At least $75,000 in annual sales
• At least 2 years in business

Patient Financing
• Extended plans up to $50K; no-interest plans up to $32K

Auto Refinance
• Must have an outstanding balance of $5K-$55K, initiated in the last 3 months, with 24 months of remaining payments

Use Cases of Oracle GoldenGate @ LendingClub

• Maintain reporting databases

• Implement transaction history for key tables (Change Data Capture)

• Enable a real-time feed to the Big Data platform

Why Big Data Platform?

• LendingClub’s Big Data Platform is responsible for generating thousands of reports for the company: daily, monthly, quarterly
  • Investor Reports, Financial Reports, Collection Reports, Risk Reports, Marketing Reports etc.

Why Real-time Data Needed?

• Near real-time availability of OLTP data in the Hadoop-based OLAP warehouse

• Thousands of tables and rapidly increasing storage needs

• Hundreds of additional internal ad hoc users added every month

Real-time Use Cases: NO CALL List Generation

The NO CALL list is a list of phone numbers we upload to the dialer system in order to suppress Collections calls due to loan and/or customer status changes throughout the day.

• Typical changes are:
  • Positive response from customers
  • DON’T CALL indicated on a particular phone number from previous contacts
  • Under certain circumstances, legal or regulatory reasons can also limit how many times we can contact a borrower.

• The NO CALL list needs to be uploaded and fed into the dialer in a very timely fashion, ideally in near real time as the event/change occurs.

Real-time Use Cases: Marketing Dashboard

What is it?
Track the progress of all the marketing channels with respect to the quarterly forecast.

Why are we using real-time?
The business needs numbers every hour so it can change strategy when any channel's performance deviates from target.

Who is it for?
Marketing Team

Lambda Architecture: On-Premise HortonWorks Cluster

[Figure: architecture diagram. Labels: EXTRACT, TRAIL, REPLICAT, INCREMENTAL TRANSACTIONS, DATA, SCHEMA, Hive Metadata, Data Science, CONSUMERS.]

Lambda Architecture: AWS S3/EMR Clusters

[Figure: architecture diagram. Labels: EXTRACT, TRAIL, REPLICAT, INCREMENTAL TRANSACTIONS, DATA, SCHEMA, Hive Metadata, Data Science, CONSUMERS.]


SPEED LAYER - DATA FLOW DIAGRAM

Few Important Steps in Speed Layer Processing

Process Flow

• DDL verification from Oracle source to GG table before processing starts

• Data inserted from Avro GG table to CDC partition table

• Hive dynamic partition on OP_TS

• ORC (Optimized Row Columnar) format

• Nightly snapshot creation: latest updates from CDC tables, and the rest from the previous day's snapshot

• Create Hive and Presto real-time view with CDC and nightly snapshot

• Data validation: full-table checksum with Sqoop snapshot
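The nightly snapshot step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not LendingClub's implementation: rows are plain dictionaries keyed by PKEY, `build_snapshot` is a hypothetical helper name, and the handling of a delete marker (`OP_TYPE == "D"`) is an assumption, since the slides only show `I` and `U` operations.

```python
from typing import Dict, Iterable

Row = Dict[str, object]


def build_snapshot(previous: Iterable[Row], cdc_latest: Iterable[Row]) -> Dict[object, Row]:
    """Merge yesterday's snapshot with the latest CDC row per key.

    Keys that have a CDC row take the CDC version; every other key is
    carried over unchanged from the previous day's snapshot.
    """
    snapshot = {row["PKEY"]: row for row in previous}
    for row in cdc_latest:
        if row["OP_TYPE"] == "D":
            # Assumption: a delete removes the key from the new snapshot.
            snapshot.pop(row["PKEY"], None)
        else:
            snapshot[row["PKEY"]] = row
    return snapshot


previous_day = [
    {"PKEY": 1, "VAL": "a"},
    {"PKEY": 2, "VAL": "b"},
    {"PKEY": 3, "VAL": "c"},
]
todays_cdc = [
    {"PKEY": 2, "VAL": "b2", "OP_TYPE": "U"},
    {"PKEY": 4, "VAL": "d", "OP_TYPE": "I"},
    {"PKEY": 3, "OP_TYPE": "D"},
]
new_snapshot = build_snapshot(previous_day, todays_cdc)
```

In the deck this merge is done in Hive over full tables; the dictionary version only shows the rule "latest from CDC, rest from the previous snapshot".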

How are we getting the latest update for a table, within a transaction, for a row

Input table from OGGBD:

CREATE EXTERNAL TABLE SCHEMA_GG.FOO
( TABLE STRING, OP_TYPE STRING, OP_TS STRING, CURRENT_TS STRING,
POS STRING,
PRIMARY_KEYS ARRAY<STRING>, TOKENS MAP<STRING,STRING>, PKEY STRING, ......

SELECT TOKENS FROM SCHEMA_GG.FOO LIMIT 1;

{"TKN-OPTYPE":"INSERT","TKN-RECORDLENGTH":"1400","TKN-LOGRBA":"1774157","TKN-OBJECTNAME":"SCHEMA.FOO","TKN-FILERBA":"","TKN-LAGMSEC":"54420",
"TKN-USERNAME":"XXX","TKN-SCN":"7633890649912","TKN-RSN":"7633890649838",
"TKN-TRANSACTIONINDICATOR":"BEGIN","TKN-RECORDTIMESTAMP":"2017-07-30 23:49:14","TKN-LOGPOSITION":"19784208","TKN-FILESEQNO":"","TKN-ROWID":"AAA46FACCAAD4CtAAS","TKN-XID":"184.13.1380795","TKN-COMMITTIMESTAMP":"2017-07-30 23:49:15.000000","TKN-REDOTHREAD":"1"}

All the transaction records go to the Hive partitioned ORC CDC table, SCHEMA_RT_CDC.FOO.

SELECT DISTINCT VW.PKEY, ..
FROM (SELECT FOO.ID, ..
             RANK() OVER (PARTITION BY FOO.PKEY
                          ORDER BY FOO.TKN_SCN DESC, FOO.TKN_RSN DESC, FOO.POS DESC, FOO.CURRENT_TS DESC) AS DEDUP_RANK
      FROM SCHEMA_RT_CDC.FOO
      WHERE FOO.OP_TS_DATE >= '2017-07-30') VW
WHERE VW.DEDUP_RANK = 1
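The ranking logic of the query above can be mirrored in a few lines of Python. This is only a sketch of the ordering rule (highest SCN, then RSN, then trail position, then CURRENT_TS), using the column names from the CDC verification query and three sample rows taken from the verification slide; `latest_per_key` is a hypothetical helper. Unlike `RANK()` with `DISTINCT`, it keeps one arbitrary row when two rows tie on every ordering column, and it compares POS and CURRENT_TS as strings, which assumes consistently formatted values.

```python
def latest_per_key(rows):
    """Return the most recent CDC row for each PKEY."""
    best = {}
    for row in rows:
        # SCN and RSN are compared numerically; POS is a zero-padded string.
        order = (int(row["TKN_SCN"]), int(row["TKN_RSN"]), row["POS"], row["CURRENT_TS"])
        key = row["PKEY"]
        if key not in best or order > best[key][0]:
            best[key] = (order, row)
    return {key: pair[1] for key, pair in best.items()}


SAMPLE_ROWS = [
    {"PKEY": 2181322736, "TKN_SCN": "7630205764444", "TKN_RSN": "7630205764072",
     "POS": "00004187440038392981", "CURRENT_TS": "2017-06-28 10:04:13.76",
     "MODIFIED_D": "2017-06-28 10:02:58.0", "OP_TYPE": "I"},
    {"PKEY": 2181322736, "TKN_SCN": "7630245451950", "TKN_RSN": "7630245451909",
     "POS": "00004193750007692575", "CURRENT_TS": "2017-06-28 19:33:53.778",
     "MODIFIED_D": "2017-06-28 19:32:33.0", "OP_TYPE": "U"},
    {"PKEY": 2181322736, "TKN_SCN": "7630236411132", "TKN_RSN": "7630236411130",
     "POS": "00004191270019251264", "CURRENT_TS": "2017-06-28 17:41:09.519",
     "MODIFIED_D": "2017-06-28 17:39:16.0", "OP_TYPE": "U"},
]

latest = latest_per_key(SAMPLE_ROWS)
```

For the sample key, the surviving row is the update with the highest SCN, which matches the MODIFIED_D value the real-time view returns on the verification slide.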

Verification of Real Time View definition

Records from CDC

SELECT PKEY, OP_TS, TKN_SCN, TKN_RSN, POS, CURRENT_TS, MODIFIED_D, OP_TYPE
FROM SCHEMA_RT_CDC.FOO
WHERE PKEY = 2181322736
ORDER BY TKN_SCN DESC, TKN_RSN DESC, POS DESC, CURRENT_TS DESC

PKEY        OP_TS                    TKN_SCN        TKN_RSN        POS                   CURRENT_TS               MODIFIED_D             OP_TYPE
2181322736  2017-06-29 02:32:33.0    7630245451950  7630245451909  00004193750007692575  2017-06-28 19:33:53.778  2017-06-28 19:32:33.0  U
2181322736  2017-06-29 02:32:33.0    7630245451950  7630245451909  00004193750007692575  2017-06-28 19:33:53.778  2017-06-28 19:32:33.0  U
2181322736  2017-06-29 00:39:16.001  7630236411132  7630236411130  00004191270019251264  2017-06-28 17:41:09.519  2017-06-28 17:39:16.0  U
2181322736  2017-06-29 00:39:16.001  7630236411132  7630236411130  00004191270019251264  2017-06-28 17:41:09.519  2017-06-28 17:39:16.0  U
2181322736  2017-06-28 17:02:59.002  7630205764444  7630205764072  00004187440038392981  2017-06-28 10:04:13.76   2017-06-28 10:02:58.0  I
2181322736  2017-06-28 17:02:59.002  7630205764444  7630205764072  00004187440038392981  2017-06-28 10:04:13.76   2017-06-28 10:02:58.0  I
2181322736  2017-06-28 17:02:59.002  7630205764444  7630205764072  00004187440038392981  2017-06-28 10:04:13.76   2017-06-28 10:02:58.0  I
2181322736  2017-06-28 17:02:59.002  7630205764444  7630205764072  00004187440038392981  2017-06-28 10:04:13.76   2017-06-28 10:02:58.0  I

Records from Real Time View

SELECT PKEY, MODIFIED_D FROM SCHEMA_RT.FOO WHERE PKEY = 2181322736

PKEY        MODIFIED_D
2181322736  2017-06-28 19:32:33.0

SLAs

• Transaction -> Import Process Start: Avg ~5 mins

• Transaction -> Import Process End: Avg ~5 – 15 mins

Spikes are for NIGHTLY SNAPSHOT creation:

1. Full table creation

2. TEZ is not good for Skew JOIN

3. Cluster Busy

GoldenGate for Big Data Implementation Architecture

GoldenGate for Big Data Pipeline

[Figure: pipeline diagram. Labels: Db Node 1; DB; Extract; Data Pump; TRAIL FILE 1, TRAIL FILE 2, TRAIL FILE 3; GoldenGate Adapter Servers (HA); TRAIL FILE 1, TRAIL FILE 2, TRAIL FILE 3; Replicat for HDFS Hive.]


GoldenGate for Big Data Infrastructure Architecture


GoldenGate for Big Data Infrastructure Architecture – Site Failover

GoldenGate for Big Data Configuration

Source Extract Parameter - eprod.prm

NOCOMPRESSUPDATES
GETUPDATEBEFORES
TABLE <schema name>.*, tokens (
TKN-COMMITTIMESTAMP = @GETENV('GGHEADER', 'COMMITTIMESTAMP'),
TKN-FILESEQNO = @GETENV('RECORD', 'FILESEQNO'),
TKN-FILERBA = @GETENV('RECORD', 'FILERBA'),
TKN-LAGMSEC = @GETENV('LAG', 'MSEC'),
TKN-LOGPOSITION = @GETENV('GGHEADER', 'LOGPOSITION'),
TKN-OBJECTNAME = @GETENV('GGHEADER', 'OBJECTNAME'),
TKN-OPTYPE = @GETENV('GGHEADER', 'OPTYPE'),
TKN-RECORDLENGTH = @GETENV('GGHEADER', 'RECORDLENGTH'),
TKN-RECORDTIMESTAMP = @GETENV('RECORD', 'TIMESTAMP'),
TKN-ROWID = @GETENV('RECORD', 'ROWID'),
TKN-RSN = @GETENV('RECORD', 'RSN'),
TKN-SCN = @GETENV('TRANSACTION', 'CSN'),
TKN-TRANSACTIONINDICATOR = @GETENV('GGHEADER', 'TRANSACTIONINDICATOR'),
TKN-USERNAME = @GETENV('TRANSACTION', 'USERNAME'),
TKN-XID = @GETENV('TRANSACTION', 'XID'));

Source Data Pump Extract Parameter - pprod.prm

EXTRACT PPROD
PASSTHRU
RMTHOST <Goldengate adapter server>, MGRPORT <port no>
-- Trail file written by Pump to the remote host
RMTTRAIL /filesystem/pumptrail_ALL/pt
-- Pass data through without mapping, filtering, conversion:
PASSTHRU
-- Specify tables to be captured:
TABLE *.*;

Target – Replicat Parameter - rhdfs.prm

REPLICAT RHDFS
TARGETDB LIBFILE libggjava.so SET property=dirprm/hdfs.props
REPORTCOUNT EVERY 1 MINUTES, RATE
GROUPTRANSOPS 10000
-- Schemas
MAP SCHEMA1.*, TARGET SCHEMA1_GG.*;
MAP SCHEMA2.*, TARGET SCHEMA2_GG.*;
-- Heartbeat
MAP DBASCHEMA.HEARTBEAT, TARGET DBASCHEMA_GG.HEARTBEAT;

Target – Replicat Parameters - hdfs.props

gg.handler.hdfs.type=hdfs
## performance
gg.handler.hdfs.maxFileSize=256m
gg.handler.hdfs.fileRollInterval=5m
gg.handler.hdfs.inactivityRollInterval=5m
# config
gg.handler.hdfs.fileSuffix=.avro
gg.handler.hdfs.partitionByTable=true
gg.handler.hdfs.rollOnMetadataChange=true
gg.handler.hdfs.format=avro_row_ocf
## hive jdbc config
gg.handler.hdfs.schemaFilePath=/ogg/schema
gg.handler.hdfs.rootFilePath=/ogg/data
gg.handler.hdfs.hiveJdbcUrl=jdbc:hive2://@Hive_Server_Name:Hive_Port

Oracle GoldenGate to AWS S3 Connector

s3replicat.props

gg.handlerlist=hdfs
gg.handler.hdfs.type=hdfs
gg.handler.hdfs.rootFilePath=s3://LCBUCKET/OGG

Custom-built Hadoop client from trunk: hadoop-3.0.0-alpha3-SNAPSHOT

The Hadoop client talks to S3 through the hadoop-aws package, using the s3a protocol:

<property><name>fs.s3a.access.key</name><value><< AWS KEY >></value></property>
<property><name>fs.s3a.secret.key</name><value><< AWS SECRET KEY >></value></property>
<property><name>fs.s3a.proxy.host</name><value><< PROXY SERVER >></value></property>
<property><name>fs.s3a.proxy.port</name><value><< PROXY PORT >></value></property>
<property><name>fs.s3a.connection.ssl.enabled</name><value>true</value></property>
<property><name>fs.s3a.server-side-encryption-algorithm</name><value>SSE-KMS</value></property>
<property><name>fs.s3a.server-side-encryption-key</name><value>arn:aws:kms:<< KEY >></value></property>

EMR only understands s3, so the s3a-to-s3 mapping is done by:

<property><name>fs.s3.impl</name><value>org.apache.hadoop.fs.s3a.S3AFileSystem</value></property>

Security - Authentication

### Kerberos params
gg.handler.name.authType=kerberos
gg.handler.name.kerberosKeytabFile=/etc/security/keytabs/lcapp.headless.keytab

[email protected]

Security - Username/password Encryption

ORACLEWALLETUSERNAME ggadalias ggadapters
ORACLEWALLETPASSWORD ggadalias ggadapters

With the credential store alias:
gg.handler.hdfs.hiveJdbcUsername=ORACLEWALLETUSERNAME[ggadalias ggadapters]
gg.handler.hdfs.hiveJdbcPassword=ORACLEWALLETPASSWORD[ggadalias ggadapters]

Unencrypted:
gg.handler.hdfs.hiveJdbcUsername=unencrypted_username
gg.handler.hdfs.hiveJdbcPassword=unencrypted_password

Security - Credential Store

Password encryption steps:

1. Add the credential store:

GGSCI > ADD CREDENTIALSTORE
Credential store created in ./dircrd/.

2. Add the credential for these users:

ALTER CREDENTIALSTORE ADD USER <Hive_username>, password <Hive_password> alias ggadalias domain ggadapters

How HDFS Handler handles the Data Flow?

• The output format chosen is Avro row OCF (avro_row_ocf), for three reasons:

  • Integration with Hive

  • Seamless schema evolution

  • More compact than Avro op OCF

• Two files are produced: Avro (.avro) and Avsc (.avsc) files.

Configure High Availability for GoldenGate Resource

As the root user:

agctl add goldengate gg1 --gg_home /bdggvol/ggaws --instance_type dual --nodes node1,node2 --network 1 --ip 99.99.99.99 --user appuser --group oinstall --filesystems ora.registry.acfs --environment_vars "LD_LIBRARY_PATH=/opt/java/default/jre/lib/amd64/server:/lib, PATH=/opt/java/default/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin, JAVA_HOME=/opt/java/default"

$GRID_HOME/bin/crsctl setperm resource xag.gg1-vip.vip -u user:oracle:rwx

$GRID_HOME/bin/crsctl setperm resource xag.gg1-vip.vip -u group:oinstall:rwx

$GRID_HOME/bin/crsctl setperm resource xag.gg1.goldengate -u group:oinstall:rwx

$GRID_HOME/bin/crsctl setperm resource ora.net1.network -u user:appuser:rwx

Schema Regex

Avro doesn’t like the ($) symbol! No problem:

gg.schemareplaceregex=[$:]

gg.schemareplacestring=_
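The effect of these two properties can be previewed with an equivalent substitution in Python. This is only an illustration of what the character class `[$:]` matches; the handler itself applies the pattern in Java, and the sample names and the `sanitize_name` helper below are hypothetical.

```python
import re

# Values copied from the two properties above.
SCHEMA_REPLACE_REGEX = r"[$:]"
SCHEMA_REPLACE_STRING = "_"


def sanitize_name(name: str) -> str:
    """Replace every '$' or ':' in a schema, table or column name with '_'."""
    return re.sub(SCHEMA_REPLACE_REGEX, SCHEMA_REPLACE_STRING, name)


examples = {name: sanitize_name(name) for name in ("SYS$USERS", "A:B$C", "FOO")}
```

Names without either character pass through unchanged, so the setting is safe to apply across all mapped tables.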

GoldenGate for Big Data – Things To Remember

• The upgrade from 12.2 to 12.3 changed the timestamp format from [YYYY-MM-DD:HH24:MI:SS.FFF] to [YYYY-MM-DD HH24:MI:SS.FFF].

• To avoid the dreaded "ERROR OGG-15050 Error loading Java VM runtime library: (2 No such file or directory)", make sure JAVA_HOME, LD_LIBRARY_PATH and PATH are set properly. Remember that the manager passes the ENV variables.

• It is a multi-threaded process, so allocate sufficient Unix resources such as NPROC.

• Monitor the heartbeat table from Hive.

• Design the application to be idempotent.

• Run the HDFS handler from a dedicated Hadoop client node.
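Downstream code that reads timestamps written before and after the 12.2 to 12.3 upgrade has to accept both layouts. A minimal Python sketch, assuming the values carry a fractional-seconds part as in the two formats quoted above; `parse_op_ts` is a hypothetical helper, not part of GoldenGate.

```python
from datetime import datetime

# 12.3 layout first (space separator), then the older 12.2 layout (colon separator).
_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d:%H:%M:%S.%f")


def parse_op_ts(value: str) -> datetime:
    """Parse a timestamp in either the 12.2 or the 12.3 layout."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp format: {value!r}")


old_style = parse_op_ts("2017-06-29:02:32:33.000")
new_style = parse_op_ts("2017-06-29 02:32:33.000")
```

Both calls yield the same `datetime`, so records from either side of the upgrade can be ordered together.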


Troubleshooting Scenarios

Rewind – Functionality

Before Patch - 17995064 ☹

After Patch - 17995064 ☺

Lobs-Clobs-Blobs

• The data team needed the CLOBs even when the CLOB is not updated.

• GETUPDATEBEFORES doesn’t work for CLOB.

• Lobs: Always Include LOBS In Trail File (Doc ID 1639717.1)

• Include the extract parameter NOCOMPRESSUPDATES and the FETCHCOLS option in the TABLE parameter, for example:

NOCOMPRESSUPDATES
TABLE schema.mytable, fetchcols (mylob1, mylob2);

• SR 3-15688859001: Data loss after a glitch between the GoldenGate adapter and the NameNode.

• Test the rewind functionality and the ability to handle duplicate data.

• Upgrade to the latest release, OGGBD 12.3.1.1.0, which contains the fix for the abend issue.

• Monitor the heartbeat table in Hive.

• Monitor ERROR messages in the log files, for example RHDFS_info_log4j.log [Splunk].

• java.io.EOFException: Premature EOF: no length prefix available

Ability to Handle Hadoop Failures
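
The log-monitoring advice on this slide can be sketched in a few lines of Python. The slide indicates the monitoring was done with Splunk, so this is only a stand-in showing the idea: count ERROR lines and occurrences of the EOFException message quoted above. The function name, the sample lines, and the assumption that the log level appears as the word ERROR are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: log4j output carries the level as a standalone word.
ERROR_LINE = re.compile(r"\bERROR\b")

# Exception text quoted on the slide; the dictionary key is a made-up label.
KNOWN_SIGNATURES = {
    "premature_eof": "java.io.EOFException: Premature EOF: no length prefix available",
}


def summarize_errors(lines: Iterable[str]) -> Counter:
    """Count ERROR lines and known exception signatures in a stream of log lines."""
    counts: Counter = Counter()
    for line in lines:
        if ERROR_LINE.search(line):
            counts["error_lines"] += 1
        for name, signature in KNOWN_SIGNATURES.items():
            if signature in line:
                counts[name] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; a real run would iterate over the log file,
    # e.g. open("RHDFS_info_log4j.log").
    sample = [
        "2017-10-04 10:00:00 INFO handler started",
        "2017-10-04 10:00:05 ERROR write to HDFS failed",
        "java.io.EOFException: Premature EOF: no length prefix available",
    ]
    print(summarize_errors(sample))
```

A non-zero count for the EOF signature would be a prompt to check the heartbeat table and confirm that no data was lost.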

GoldenGate for Big Data - Monitoring [Wavefront]

Q & A

Thank You

Get a sneak peek at cutting-edge data integration designs and receive a free gift!

• Oracle is constantly developing new software and features that will make your work easier, and Oracle's User Experience team would love to get your feedback on new data integration designs.

• Feedback sessions will take place at a date and time of your own choice.

• You can take part via web conference, from the comfort and convenience of your own office.

• If you’re interested, please fill out the 1-page form at http://bit.ly/2vIHlSg (the URL contains an uppercase I and a lowercase l).

• To show our appreciation, we will send all participants their choice from a wide selection of thank-you gifts.

Data Integration Programming – FOCUS ON DOC LINK

Presentations on:
• Oracle Enterprise Data Quality
• Oracle GoldenGate
• Oracle Data Integrator
• Oracle Data Integration Platform Cloud
• Oracle Enterprise Metadata Management

Demo Stations:
• The EXchange, Integration Area - Moscone West
• The EXchange, Analytics & Big Data Area - Moscone West
• The EXchange, Data Management Area - Moscone West

Hands-on Labs:
• Oracle GoldenGate Real-Time Data Replication in the Cloud [HOL7715]
• Oracle Enterprise Data Quality [HOL7653]
• ODI and OGG for Big Data [HOL7708]
• Oracle Data Integration Platform Cloud [HOL7673]

Data Integration Programming – FOCUS ON DOC LINK

Sunday, October 1
• Lift and Shift Workloads to Cloud with Oracle Data Integration Platform Cloud [SUN6653]
• Data Movement between On-Prem, Fusion ERP Cloud, Fusion HCM Cloud and Salesforce [SUN7286]
• Accelerate Migration to Cloud Infrastructure with Data Integration Platform [SUN6896]

Monday, October 2
• Oracle Data Integration Platform Strategy and Roadmap [CON6646]
• Filling Your Data Lake with Potable Data, Using Data Integration [CON5465]
• GoldenGate: Deep Dive into Automating OGG using the new Microservices [CON6569]
• Oracle Data Integration Platform: Foundation for Cloud Integration [CON6650]
• Oracle Data Integration Platform Empowers Enterprise Grade Big Data Solutions [CON6893]
• Oracle Data Integration Platform Cloud Deep Dive [CON6651]
• Oracle GoldenGate Cloud Service: Real-Time Data Replication in the Cloud [HOL7715]

Tuesday, October 3
• Oracle Data Integrator Product Update and Strategy [CON6654]
• Oracle Enterprise Data Quality: Product Overview and Roadmap [CON6656]
• Accelerate Cloud On-Boarding Using Oracle GoldenGate Cloud Service [CON6894]
• Oracle Enterprise Data Quality for All Types of Data [HOL7653]
• Oracle Data Integration Platform: a Cornerstone for Big Data [CON6655]
• GoldenGate: MAA and Best Practices for Oracle GoldenGate Microservices [CON6570]
• Oracle GoldenGate Product Update and Strategy [CON6897]

Wednesday, October 4
• A Practical Path to Enterprise Data Governance at Energy Australia [CON6657]
• Oracle Data Integrator and Oracle GoldenGate for Big Data [HOL7708]
• Introduction to Oracle Data Integration Platform Cloud [HOL7673]
• An Enterprise Databus: GoldenGate in the Cloud Working with Kafka and Spark [CON6895]
• GoldenGate: Best Practices & Deep Dive on OGG 12.3 Microservices at Cloud [CON6568]
• Oracle GoldenGate for Big Data [CON6898]
• Oracle Data Integration Platform Cloud Service Governance Edition [CON6652]

Connect with Oracle Integration

@OracleDI

Blogs.oracle.com/DataIntegration/

Oracle Data Integration

Oracle Data Integration

Oracle FMW

@OracleIntegrate

Blogs.oracle.com/Integration/

Oracle SOA

Stay Informed During and After OpenWorld

Twitter: @OracleExadata, @OracleBigData, @Infrastructure Follow #CloudReady

LinkedIn: Oracle IT Infrastructure – Oracle Showcase Page; Oracle Big Data – Oracle Showcase Page

Safe Harbor Statement

The preceding is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
