modernize your edw into big data warehousefiles.meetup.com/1444478/modernize your edw into big...

32
© 2016 Impetus Technologies - Confidential 1 Modernize your EDW into Big Data Warehouse Feb 2016

Upload: others

Post on 09-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential1

Modernize your EDW into Big Data Warehouse

Feb 2016

Page 2: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential2

• Sanjay Sharma– Principal Architect @– 18+ years in Software, 6+ with Hadoop– Started Hadoop User Group in 2009– Hortonworks, Cloudera, MapR certified

• Email - [email protected]• Tweet - @indoos

• Headquartered in Los Gatos; established in 1996; 1700+ employees; presences in Silicon Valley, Atlanta, NYC, and offshore.

• Unique products accelerated, innovation and R&D driven, vendor neutral consulting and solution services

Page 3: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential3

FIRST CAR

Early 1900s ~2015

Source: Google images

Page 4: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential4

SABRE -1960s

Ingres, IBM

System R-1970s

Entity-Relationship, or

ER-1976

SQL -1980

Early 1990s-ODBC

and Excel/Ac

cess, ODBMS

Internet driven

databases – late 1990s

Oracle, DB2,

Microsoft lead 2000s

EDW -distribut

ed RDBMS

-Teradat

a, Netezza, Exadata, PDW -

late 2000s

Hadoop - 2010s

NoSQL -2012s

NewSQL -

2014s

1970s RDBMS became a recognized term MS SQL Server, Sybase, Wang’s PACE, and Britton-LeeSQL/DS, DB2, Allbase, Oracle, and Non-Stop

Reference- http://quickbase.intuit.com/articles/timeline-of-database-history

Some Timelines

Page 5: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential5

Inmon EDW Approach - Top-down Design

A normalized data model is designed first. Then the dimensional data marts, which contain data required for specific business processes or

specific departments are created from the data warehouse.http://searchbusinessintelligence.techtarget.in/tip/Inmon-vs-Kimball-Which-approach-is-suitable-for-your-data-warehouse

Page 6: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential6

Kimball EDW Approach - Bottom-up Design

The data marts facilitating reports and analysis are created first; these are then combined together to create a broad data warehouse.

http://searchbusinessintelligence.techtarget.in/tip/Inmon-vs-Kimball-Which-approach-is-suitable-for-your-data-warehouse

Page 7: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential7

Logical Data Warehouse Reference Framework

Page 8: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential8

Data Warehouse- needs?

Data Sources Data Store

Reports

Dashboards

Adhoc Analytics

Advanced Analytics

Ingest Any Derive Value

Data Governance & Security(Authentication/Authorization/Audit/Data at Rest)

(Lineage, ILM)

ETL?

Streaming?Data Cleansing?

CDC/SCD?Metadata/Reference

Data Model?

Data Integration Access

(Physical/Virtualization)?

Intelligent Data Identification?

Intelligent Data Discovery?

Real Time Feedback Interface

Page 9: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential9

Data Warehouse- Ingest Side

Data Sources Data StoreIngest Any

Data Governance & Security(Authentication/Authorization/Audit/Data at Rest)

(Lineage, ILM)

Any source ETL?

Streaming Ready?

CDC/SCD?Metadata/Reference

Easy Metadata Identification?

Faster ETL?

Easy Data Modelling?

Lineage?

Mainframe/ OLAP DW?

Social Media/ Textual Data

OLTP/ ESB/Messa

ging Queue/SO

A

Page 10: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential10

Data Warehouse- Storage?

Data Store

Data Governance & Security(Authentication/Authorization/Audit/Data at Rest)

(Lineage, ILM)

CDC/SCD?Metadata/ReferenceData Model?

Intelligent Data Identification?

DR/Backup/Business Continuity

Data Preparation Processing?

Metadata Management (Ref data)?

Page 11: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential11

Data Warehouse- BI/ Analytics/ Integration?

Data Store

Reports

Dashboards

Adhoc Analytics

Advanced Analytics

Derive Value

Data Governance & Security(Authentication/Authorization/Audit/Data at Rest)

(Lineage, ILM)

Data Model?Data Integration

Access (Physical/Virtualiz

ation)?

Intelligent Data Discovery?

Real Time Feedback Interface

Data Science?

Fast Queries?

Metadata/Schema Discovery?

Search?

JDBC/ODBC/ReSTIntegration?

Big Data Queries/Scalable

Processing?

Page 12: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential12

Source: http://blogs.teradata.com/data-points/how-illy-is-cost-per-terabyte/

Page 13: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential13

Source: Hortonworks.com

Page 14: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential14

Source: Mapr.com

Page 15: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential15

MODERN DATA WAREHOUSE STRATEGY

Transformation for scale

BI DASHBOARDS REPORTING

PLANNED ANALYTICS

LIMITED ANALYSIS

Legacy EDW/ RDBMS

COSTEFFICIENCY

LINEAR SCALABILITY

LIMITLESS DEMOCATIZED

ANALYTICS

ALL FORMSOF DATA

StructuredUnstructured

Semi-structured

HADOOP + ETL/ Analytics Offloading

Limited ROI

Expanding ROI

Page 16: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential16

Source: http://brightmags.com/wp-content/uploads/2015/04/hybrid-car-working.gif

Page 17: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential17

Traditional EDW Augmentation ExampleEDW Key Challenges-• Storage and processing scalability costs• Static non-agile BI/Analytics• Processing capacity hogged resulting in slow analytics

ETL Storage Analytics/BI

Data Sources

Big Data Approach

BDW Advantages-• Cost effective scalable storage and processing (10X-100X)• Agile Self Service BI/Analytics• Offload resources from Netezza/EDW and SAS for freeing analytical

capabilities• Exploratory environment and Swiss knife toolsets for Machine

learning driven analytics• In time ETL, BI and Analytics• Streaming and Real time Analytics• Unstructured data Analytics

Transition Challenges solved by Workload Migration• Manual process and so error prone• Time taking and risky• Migration Validation/ QA problem• Hadoop best practices not known/missing• Know what/how to offload• Real time data

ETL Storage Analytics/BI

Data Sources(+ Streaming)

New Unstructured Sources

No “BIG BANG Replace” approach- Identify and Offload High Latency, Cold/Archival

Data Loads from Traditional EDW

Streaming Sources – Move from Batch to Real Time Mode

Page 18: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential18

Big Data Analytics Triple Play

BASIC ANALYTICS ADVANCED ANALYTICS

Fraud Detection in Real time Personalized/Adaptable Fraud Detection in Real time

Historical Transaction Analysis Fraud Analytics based on Historical Data

Enhance Static Rules

Intelligent input for Dynamic Rules

1 2

3 4

STATIC Applications Rules

Traditional BI/Analytics Machine

Learning/ Data Science

Big Data Analytics - Streaming + Real-time + Batch

Lambda/KapdaArchitectures

Page 19: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential19

EDW/RDBMS Modernization- Key Areas

Scalable cost effective Storage

and Query Engine

Streaming/ real time capability

Unstructured/semi structured with

Structured mashup

Schema on Fly ETL Offloading Analytics Offloading

Democratized access• Self service

BI/Analytics/ELTL

Information Discovery/Search

Security, Governance, Compliance

Page 20: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential20

Scalable cost effective Storage

and Query Engine

Unstructured/semi structured with

Structured mashup

Schema on Fly

Page 21: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential21

Scalable cost effective Storage

and Query Engine

Streaming/ real time capability

Page 22: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential22

ETL Offloading

Big data Big data Edition

Page 23: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential23

Analytics (SAS/SPSS)

Page 24: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential24

Democratized accessSelf service BI/ Analytics/ ELTL

Page 25: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential25

Augmenting EDW with Hadoop (Reference Arch.)

Mainframes

RDBMS

External File System

BI/Analytics Tools

Use Hadoop as a foundational data warehouse solution along with a relational big data warehouse for handling both structured and unstructured data analytics.

Enterprise data warehouse*

Staging/ETLData Sources

Advanced Analytics

Scalable ETL/Data Quality/Data Enrichment,

Machine Learning/ Predictive

Analytics/Recommendation Engines

Real Time Streaming Engine/ETL, Rule

Engine, CEP, Alerting and Dash

boarding

Streaming Analytics

IDW

Workload Migration

Metadata, Governance

SSRSSSAS

Page 26: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential26

Big Data Reference Architecture

Page 27: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential27

Universal BI on Big-Data

Real-time ingestion and analytics

Impetus Big Data Services & Products

Technology Evaluation& Piloting

Big Data Advisory,Strategy & Training

Big Data Labs , POV, Benchmarking,

Training

Data Visualization & Analytics

Idea toImplementation

DevOps , Big-Data as a Service

Big Data QA Services

Big Data Services Big Data Products/Solutions

EDW Modernization

Data ScienceEnterprise Data Governance

Metadata ManagementData Virtualization/Access

Hadoop Management/Integration

Page 28: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential28

ELTL Workload Migration: Automation Steps

The workload migration consists for the following major tasks– Identification of ideal workload

Usage based tables classificationWorkload based tables classification

– Migration of Schema from current RDBMS to HDFSRecommendation for partitioning, clustering and number of bucketsMigrate role based security

– Query/SQL Logic MigrationImpetus UDF LibraryAutomatic conversation of SQL and Stored Procedure scripts

– Query/SQL Logic ExecutionSchedule migrated code

– Data Quality Enhancements Patented Machine Learning Algorithms

– Data RestorationBI/ Analytical data can be synced back with EDW if required

The above tasks have flexible implementation sequence to facilitate the nature of the workload and customer’s implementation strategy

60%-80% - Automation!30%-60% - Reduced Manual Time!Reduced Risk!

Page 29: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential29

Impetus Workload Migration Architecture

Step 1: Identification

Step 2: Schema, Security and Data Migration

Step 3: Logic Migration

Step 4: Logic Execution

Step 5: Data Quality Enhancement

Step 6: Analytical Data Moved toEDW Datamart

Reduced Time!Reduced Risk!Automation!

Security Access Replicated by

Impetus Product

Identify and Offload High Latency, Cold/Archival Data Loads from Traditional EDW

SQL/ Stored Procedures/ BTEQ

Page 30: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential30

Thank you.Questions?

WE ARE HIRING - http://www.impetus.com/careers

Page 31: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential31

Impetus Contributions• Implement data provisioning for UDW and

non-UDW databases• Provide parametric security for Data Fabric

platform• Proposed reference architecture to handle

schema evolution, versions management, data consolidation & transformation

• Metadata Management & Data Lineage

The Challenge• Optimize space usage & processing on

Teradata• Schema evolution• Versioning of data to enable capability of

adding attributes to base• Immunity to ‘data load sequence’• Enable UDW equivalent security• Delta detection for snapshot acquisition

Benefits Realized• Performance for on-line and batch processes• Near real time processing of input

• Provisioning through web-services•

ETL Offloading to Hadoop- Healthcare Payer Case Study

Page 32: Modernize your EDW into Big Data Warehousefiles.meetup.com/1444478/Modernize your EDW into Big Data... · 2016-02-26 · DW? Social Media/ Textual Data OLTP/ ESB/Messa ging Queue/SO

© 2016 Impetus Technologies - Confidential32

Data Fabric/Hub

Source Data

X12 837/835

CIC

HL7 V3 OCIE

Custom ECG

SMTP Email

Ingest Integrate Load/ Virtualize Provision

Data Consumers

Teradata EDW

Target Output

Cloud based Provisioning-DaaS

Internal Consumers

ETL Offload

Unified Format

• Targeting 30%-60% reduction in Teradata Cost• Schema evolution and version management challenges• Compliances strategy

Unified Format

Hadoop + HBase + Hive/PIG

ETL Offloading to Hadoop- Healthcare Payer Contd.

Close to $50 million savings!!!