The Path to Digital Transformation

The Road to Digital Transformation Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution December 2016

Upload: syncsort

Post on 09-Jan-2017


TRANSCRIPT

Page 1: The Path to Digital Transformation

The Road to Digital Transformation

Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution

December 2016

Page 2: The Path to Digital Transformation

Armando Acosta, Dell EMC

Sean Anderson, Cloudera

Mark Muncy, Syncsort

Ted Arden, Dell EMC

Page 3: The Path to Digital Transformation

Dell - Internal Use - Confidential

The digital transformation will cause disruption

48% don’t know what their industry will look like in 3 years

78% feel threatened by digital startups

45% fear they may become obsolete in 3-5 years

Business leaders see a chaotic, uncertain future ahead

Source: Digital Transformation Index, October 2016. Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are transforming to meet changing customer demands, and business leaders’ plans to succeed in the connected future.

Page 4: The Path to Digital Transformation


Businesses still have a huge opportunity to get this right

73% say a centralized tech strategy needs to be a priority

72% plan to expand their software development capabilities

66% are incentivized to invest in IT infrastructure and digital skills leadership

This is how leaders plan to leap ahead

Source: Digital Transformation Index, October 2016. Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are transforming to meet changing customer demands, and business leaders’ plans to succeed in the connected future.

Page 5: The Path to Digital Transformation


Leaders agreed the following digital business attributes are imperatives to success

Predictively spot new opportunities

Demonstrate transparency and trust

Deliver unique and personalized experiences

Innovate in agile ways

Operate in real time

Big Data and Analytics will be at the core of enabling all these attributes

Source: Digital Transformation Index, October 2016. Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are transforming to meet changing customer demands, and business leaders’ plans to succeed in the connected future.

Page 6: The Path to Digital Transformation


Data-driven organizations are more effective

50% greater revenue growth for businesses that leverage data effectively

But 44% of organizations do not know how to start…

Become data-driven. A journey begins with a single step.

• Align IT / Business goals

• Improve operational efficiency

• Transform your organization

Data from Dell Global Technology Adoption Index, November 2015

Page 7: The Path to Digital Transformation


Align business and IT

Organizational goals and how Dell helps:

Empower end users: Providing secure anywhere, anytime access to data and analytics for improved productivity.

Control costs: Reducing TCO and seamlessly integrating with existing investments to enable greater ROI.

Improve outcomes: Utilizing ALL data to deliver deeper insights and enhanced data-driven decision making.

Page 8: The Path to Digital Transformation

Ted Arden, Dell EMC


Page 9: The Path to Digital Transformation


Traditional tools are not working

#1 Challenge: Organizations cite TCO as the biggest obstacle to data integration tools

70% of all data warehouses are performance and capacity constrained (*Gartner)

80%: data integration and transformation drive a majority of the EDW capacity

Dell accelerates time to value by lowering data transformation costs and improving performance by augmenting the Enterprise Data Warehouse (EDW)

The Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution reduces Hadoop deployment to weeks, lets teams develop Hadoop ETL jobs within hours, and helps them become fully productive within days after deployment

Page 10: The Path to Digital Transformation


Too many workloads in the EDW

Modernize the data pipeline with Hadoop

Traditional data pipeline

Enterprise data warehouse + ETL: data transformation jobs, business reporting, query

Data staging tool: extract and load data, clean and parse data

Disparate data sources

The results

• Longer data transformation job times

• Not meeting SLAs for business reporting

• Slow ad hoc query

• Too costly to scale

Modern data pipeline

Enterprise data warehouse: business reporting, query

Hadoop + ETL: data transformation jobs (clean, parse, transform)

Disparate data sources

The results

• Reduced data transformation job times

• Improved SLAs for business reporting

• Fast ad hoc query

• Scales economically
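The difference between the two pipelines on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the Dell EMC, Cloudera, or Syncsort products: the raw records, the function names, and the `daily_sales` table are hypothetical, an in-memory SQLite database stands in for the enterprise data warehouse, and a plain Python function stands in for the Hadoop + ETL tier.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw records from "disparate data sources" (illustrative only).
RAW_ROWS = [
    "2016-12-01, east ,100.50",
    "2016-12-01,WEST,20",
    "bad row",
    "2016-12-02,East,9.50",
]

def transform_outside_warehouse(rows):
    """Clean, parse, and aggregate before anything reaches the warehouse."""
    totals = defaultdict(float)
    for row in rows:
        parts = [p.strip() for p in row.split(",")]
        if len(parts) != 3:
            continue  # clean: drop malformed records
        day, region, amount = parts
        try:
            value = float(amount)  # parse the amount field
        except ValueError:
            continue
        totals[(day, region.lower())] += value  # normalize and aggregate
    return sorted((d, r, round(v, 2)) for (d, r), v in totals.items())

def load_into_warehouse(conn, summary):
    """The warehouse stand-in receives only the finished summary."""
    conn.execute("CREATE TABLE daily_sales (day TEXT, region TEXT, total REAL)")
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", summary)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the EDW
summary = transform_outside_warehouse(RAW_ROWS)
load_into_warehouse(conn, summary)
east_total = conn.execute(
    "SELECT SUM(total) FROM daily_sales WHERE region = 'east'"
).fetchone()[0]
print(east_total)
```

In the traditional layout the cleaning and aggregation would run inside the warehouse itself and compete with reporting queries; here the warehouse stand-in only stores and serves the result.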

Page 11: The Path to Digital Transformation


Customer value

Solution stack | Components | Customer value

Dell Services

Reference Architecture: ETL Offload

PE R730XD, Networking

Faster deployment, from months to weeks

Hadoop Distribution: Cloudera 5.9. Data management and security

Data Transformation: Syncsort DMX-h version 9.1. Convert SQL jobs into native Hadoop execution

Deployment business application

Build operational efficiency with Hadoop

No other vendor offers this solution

Page 12: The Path to Digital Transformation


Dell data solutions drive operational efficiency

Control costs: Reduce data warehouse administrative costs up to 76%

Improve productivity: Transform data 60% faster for analysis

Simplify ongoing operations: Develop and design complex data transformation jobs up to 54% faster

Page 13: The Path to Digital Transformation


Dell EMC Cloudera Syncsort ETL offload Hadoop Solution

Solution benefits

• Integrates easily with Hadoop®

• No coding necessary for easy deployment

• No need for expertise on Apache Pig™, Hive™, and Sqoop™

• Closes the skills gap using Syncsort

Differentiation

• Reduces EDW admin costs up to 76%¹

• Transforms data 60% faster for analysis²

• Designs transformation jobs up to 54% faster³

Primary use case: Scale-out solution to optimize data management, processing and analytics

Pod Network

2x Dell EMC Networking S4048 10GbE Pod Switches

1x S3124 iDRAC Switch

Data Nodes

10x Dell EMC PowerEdge R730xd with 3.5” Drives – 48 TB or

10x PowerEdge R730xd with 2.5” Drives – 24TB or 20x PowerEdge FC630 / FD332 – 32 TB

Infrastructure Nodes

1x Dell EMC PowerEdge™ R630 Admin Node

3x PowerEdge R730xd Name Nodes

1x PowerEdge R730xd Edge Node or

1x PowerEdge FC630 Admin Node

3x PowerEdge FC630 Name Nodes

1x PowerEdge FC630 Edge Node

Cluster Network

2x Dell EMC Networking S6000 40GbE Cluster Switches

Cloudera™ Enterprise

Syncsort™ DMX-h™

¹ Cost advantages report

² Performance advantages report

³ Design advantages report


Page 14: The Path to Digital Transformation


Operational Efficiency: From use case to action

Source, 1. Connect, 2. Analyze, 3. Act

Preventive Maintenance

IT Resource Capacity and Utilization

Operational Process Improvement

Business Process Cost Optimization

Cyber Security Analytics

Improved Forecasting

Compliance and Reporting

Operational data sources

Extract, transform, load

Parse, Clean, Translate, Sort, Aggregate, Group

Compute + Data

Enterprise data warehouse

Relational management database

Data mart

Business reporting and query

Services • Management • Infrastructure • Security • Dell Financial Services

Page 15: The Path to Digital Transformation

Sean Anderson, Cloudera


Page 16: The Path to Digital Transformation

© Cloudera, Inc. All rights reserved.

Traditional Monolithic Analytic Databases

No Cloud Elasticity or Cloud Storage Integration

Rigid Data Model with Tightly Coupled Storage/Compute

Limited to SQL with Data Movement Necessary

Static Sizing

COMPUTE

STORE

Page 17: The Path to Digital Transformation


Challenges Across the Business

Enterprise Architect

Existing Systems Hitting Their Limits

• How long does it take to bring in more data/use cases? And what would the cost be?

• What is your process for scaling today?

• What is your plan for cloud?

Missed SLAs & Overloaded Bottleneck

• How much time do you spend troubleshooting vs developing new uses?

• How long does it take to deliver on business requests?

Limited Data & Insights of Latent Value

• What limits on users, data, and time period exist?

• How long does it take to get new reports/data?

• Are you able to run actionable real-time analysis?

Meet Compliance Needs & Protect Data

• How do you manage siloed security & governance across workloads and systems?

• Is sensitive data available for analysis?

IT/DBA

Security Team & Data Steward

SQL Developer & Business Analyst

Page 18: The Path to Digital Transformation


Cloudera’s Analytic Database Solution

Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop

Hue: Intelligent SQL editor

Navigator: Audit, lineage, encryption, key management, & policy lifecycles

BI Partners: Integration with the leading BI tools

Impala: Interactive query engine for BI & SQL analytics

Hive-on-Spark: Large-scale ETL & batch processing engine

Multi-Storage, Multi-Environment

Page 19: The Path to Digital Transformation


The DCC Rule

D C C

Complexity

Maximize your optimization opportunities by exposing complex access patterns that make the best use of Hadoop’s architecture

Compatibility

Reduce development time by leveraging existing query compatibilities with Hadoop tools and get guidance for query rewrites

Duplication

Improve performance by easily detecting workload duplication and recommending top queries to optimize
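To make the "Duplication" part of the rule concrete, here is a minimal Python sketch of one way duplicate queries in a workload could be spotted: reduce each query to a normalized shape and count repeats. This is an assumed illustration, not how Cloudera Navigator Optimizer works internally; the function names and the sample workload are hypothetical.

```python
import re
from collections import Counter

def normalize(sql):
    """Reduce a query to its shape: lowercase, literals masked, whitespace collapsed."""
    s = sql.lower()
    s = re.sub(r"'[^']*'", "?", s)          # mask string literals
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)  # mask numeric literals
    s = re.sub(r"\s+", " ", s).strip().rstrip(";")
    return s

def top_duplicates(queries, n=3):
    """Return the most frequent query shapes that occur more than once."""
    counts = Counter(normalize(q) for q in queries)
    return [(shape, c) for shape, c in counts.most_common(n) if c > 1]

# Hypothetical workload log (illustrative only).
WORKLOAD = [
    "SELECT * FROM orders WHERE id = 42;",
    "select *  from orders where id = 7",
    "SELECT name FROM customers WHERE region = 'EAST'",
    "SELECT name FROM customers WHERE region = 'west'",
    "SELECT * FROM orders WHERE id = 1001",
    "SELECT COUNT(*) FROM shipments",
]

for shape, count in top_duplicates(WORKLOAD):
    print(count, shape)
```

Queries that differ only in their literal values collapse to the same shape, so the most repeated shapes become the first candidates for optimization.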

Page 20: The Path to Digital Transformation


Cloudera Navigator Optimizer

Unlock Your Best Hadoop Strategy, Instantly

Active Data Optimization for Hadoop to save you time and money

• Instant workload insights

• Intelligent optimization guidance

• Reduce Hadoop workload development effort

Page 21: The Path to Digital Transformation

Mark Muncy, Syncsort


Page 22: The Path to Digital Transformation


Page 23: The Path to Digital Transformation

Goals of the Modern Data Architecture

• Centralize all your data: Collect raw data from every source within the enterprise, regardless of complexity. Only when you can collect and retain all your data can you see the full picture.

• Turn raw data into insight: Cleanse, blend and transform your data, and give it context and meaning so decision makers can execute.

• Maintain governance, compliance and security standards: Increase consistency and confidence in decision making by preserving the confidentiality, integrity and availability of information. Protect data from unauthenticated and unauthorized access.

• Eliminate complexities within IT: Your Modern Data Architecture should automate and optimize your data needs, keep pace with the evolution of technology, and homogenize platforms and infrastructures.

Syncsort Confidential and Proprietary - do not copy or distribute

Page 24: The Path to Digital Transformation

Shift Data and ELT Workloads out of Data Warehouses


Page 25: The Path to Digital Transformation

Simplify Big Data Integration with Syncsort


Access: Get best-in-class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.

Integrate: Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.

Comply: Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.

Simplify: Design once, deploy anywhere, and insulate your organization from a rapidly changing ecosystem. Future-proof your applications for new compute frameworks, on premises or in the cloud.

Page 26: The Path to Digital Transformation

Simplify Big Data Integration with Syncsort


Access


Page 27: The Path to Digital Transformation

Access: Bring ALL Enterprise Data Securely to the Data Lake

• Collect virtually any data from mainframe to relational, cloud and NoSQL sources

• Batch & streaming sources

• Access, re-format and load data directly into Hive & Parquet. No staging required!

• Pull hundreds of tables at once into your data hub, whole DB schemas in one invocation

• Load more data into Hadoop in less time


Build Your Enterprise Data Hub

Page 28: The Path to Digital Transformation

Access: Get Your Database Data into Hadoop, at the Press of a Button

• Pull multiple data sources and funnel into your data lake: extract and move whole DB schemas in one invocation

• One-step data movement, auto-generating jobs

• Process multiple funnels in parallel on your edge node or from data nodes

‒ Leverages DMX-h high speed data engine via DTL

‒ Generated applications can be imported into GUI

• In-flight transformations

‒ Filtering, funnel dependency ordering, mixed source/target, data type filtering, table exclusion/inclusion

DMX DataFunnel™
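The ideas of table inclusion/exclusion and of moving many tables in parallel can be sketched generically in Python. This is not the DMX DataFunnel interface, which the slide does not show; `select_tables`, `copy_table`, `run_funnel`, and the sample schema are hypothetical names chosen for illustration, and the copy step is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from fnmatch import fnmatchcase

def select_tables(tables, include=("*",), exclude=()):
    """Keep tables that match any include pattern and no exclude pattern."""
    return [
        t for t in tables
        if any(fnmatchcase(t, p) for p in include)
        and not any(fnmatchcase(t, p) for p in exclude)
    ]

def copy_table(table):
    """Placeholder for the real extract-and-load step of a single table."""
    return f"{table}: copied"

def run_funnel(tables, include=("*",), exclude=(), workers=4):
    """Filter the schema, then move the selected tables in parallel."""
    selected = select_tables(tables, include, exclude)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(copy_table, selected))  # map keeps input order

# Hypothetical source schema (illustrative only).
SCHEMA = ["orders", "order_items", "customers", "tmp_scratch", "audit_log"]
result = run_funnel(SCHEMA, include=("order*", "customers"), exclude=("tmp_*",))
print(result)
```

Glob-style patterns keep the filter declarative, so a whole schema can be requested in one call and only the exceptions need to be listed.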

Page 29: The Path to Digital Transformation

Simplify Big Data Integration with Syncsort


Access Integrate


Page 30: The Path to Digital Transformation

Integrate: Achieve the Fastest Path from Raw Data to Insight

• Prepare data on-the-fly

• Load into Hadoop without staging

• Write directly into Big Data formats (Parquet, Hive, etc.)

• Connect fast to NoSQL databases (Cassandra, HBase, etc.)

• Cloud Connectivity: Amazon AWS, Google Cloud Platform, Microsoft Azure

• Get the fastest, most efficient data joins and sorts

• Dynamic planning/optimization at runtime

• Create Tableau & QlikView files with one click

• Fastest parallel loads to Amazon Redshift, Greenplum, Netezza, Oracle, Teradata & Vertica


Feed Business Intelligence Visualization

Page 31: The Path to Digital Transformation

A single tool for designing both streaming and batch jobs

Integrate: Single Interface for Streaming & Batch

• Kafka, Spark, Apache NiFi, HDF

• Combine legacy batch and cutting-edge streaming data sources

• Easy development in GUI – no need to write Scala, C or Java code


Simplify Streaming Data Integration
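The idea of one job design serving both batch and streaming inputs can be illustrated with a small Python sketch. It assumes nothing about how DMX-h implements this: the `transform` function, the record fields, and `event_stream` (a stand-in for a streaming source such as a Kafka topic) are all hypothetical.

```python
from typing import Iterable, Iterator

def transform(records: Iterable[dict]) -> Iterator[dict]:
    """One pipeline definition that works lazily on any iterable of records."""
    for rec in records:
        if rec.get("amount") is None:
            continue  # drop records with no amount
        yield {"id": rec["id"], "amount_cents": round(rec["amount"] * 100)}

def event_stream() -> Iterator[dict]:
    """Hypothetical stand-in for a streaming source; yields records one at a time."""
    yield {"id": 3, "amount": 0.5}
    yield {"id": 4, "amount": None}
    yield {"id": 5, "amount": 12.0}

# Batch input: a finite collection that is already in memory.
batch = [{"id": 1, "amount": 1.25}, {"id": 2, "amount": None}]
batch_out = list(transform(batch))

# Streaming input: the same function consumes records as they arrive.
stream_out = list(transform(event_stream()))

print(batch_out)
print(stream_out)
```

Because the transformation is written against a generic iterable, the same definition is reused unchanged whether the source is a stored file or a live feed.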

Page 32: The Path to Digital Transformation

Simplify Big Data Integration with Syncsort


Access Integrate Comply


Page 33: The Path to Digital Transformation

Comply: Secure, Manage & Monitor Your Cluster

• Kerberos-secured clusters

– Authenticated browsing

– Authenticated sampling

• Apache Sentry security certified

• Cloudera Manager

– Deploy DMX-h across cluster

– Monitor DMX-h jobs


Page 34: The Path to Digital Transformation

Comply: Get Governance, Metadata and Lineage

• Metadata and data lineage for Hive, Avro and Parquet through HCatalog

• Metadata lineage export from DMX

– Simplify audits, analytics dashboards, metrics

– Integrate with enterprise metadata repositories

• Cloudera Navigator certified integration

– Extends HCatalog metadata

– HDFS, YARN, Spark and other metadata

– Lineage, tagging

– Business and structural metadata


Page 35: The Path to Digital Transformation

Simplify Big Data Integration with Syncsort


Access Integrate Comply Simplify


Page 36: The Path to Digital Transformation

Simplify: Design Once, Deploy Anywhere

• Use existing ETL skills

• No need to worry about mappers, reducers, big side or small side of joins, and so on

• Automatic optimization for best performance, load balancing, etc.

• No changes or tuning required, even if you change execution frameworks

• Future-proof job designs for emerging compute frameworks, e.g. Spark

Single GUI Execute Anywhere!


Intelligent Execution - Insulate your organization from underlying complexities of Hadoop.

Page 37: The Path to Digital Transformation

Using the Dell | Cloudera | Syncsort solution for Hadoop, an entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time than a Hadoop expert

Simplify: Reclaim days of valuable time

Fact dimension load with type 2 SCD

Data validation and pre-processing

Vendor mainframe file integration

Load Validate Int.

Source: http://en.community.dell.com/techcenter/blueprints/m/resources


Cut Development Time in Half!

8.3 Days

3.8 Days
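The first benchmark job named on this slide is a fact dimension load with type 2 SCD (slowly changing dimension), where a changed attribute is recorded by closing the old row and adding a new version instead of overwriting. The Python sketch below shows that general technique only; it is not the benchmark job itself, and the row layout, the `apply_scd2` function, and the sample data are hypothetical.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional "still current" end date

def apply_scd2(dimension, incoming, load_date):
    """Apply type 2 changes: expire changed rows and append new versions.

    dimension: list of dicts with keys id, city, valid_from, valid_to, current
    incoming:  dict mapping id -> city (latest values from the source)
    """
    current = {row["id"]: row for row in dimension if row["current"]}
    for key, city in incoming.items():
        old = current.get(key)
        if old is not None and old["city"] == city:
            continue  # unchanged: keep the existing version
        if old is not None:
            old["valid_to"] = load_date  # close the old version
            old["current"] = False
        dimension.append({
            "id": key, "city": city,
            "valid_from": load_date, "valid_to": OPEN_END, "current": True,
        })
    return dimension

# Hypothetical customer dimension with one existing row.
dim = [{
    "id": 1, "city": "Austin",
    "valid_from": date(2016, 1, 1), "valid_to": OPEN_END, "current": True,
}]
apply_scd2(dim, {1: "Dallas", 2: "Boston"}, date(2016, 12, 1))
for row in dim:
    print(row["id"], row["city"], row["valid_from"], row["valid_to"], row["current"])
```

After the load, customer 1 has two rows (the expired Austin version and the current Dallas version), so history stays queryable by date range.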

Page 38: The Path to Digital Transformation

Thank You


Page 39: The Path to Digital Transformation