Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Upload: denodo

Post on 21-Jan-2018

Page 1: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Confidential. Not to be copied, distributed, or reproduced without prior approval.

Integrating Big Data and Streaming Data with Enterprise Data

October 16, 2017

Page 2: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Multiple Mighty Businesses

Capital
Aviation
Power
Healthcare
Oil & Gas
Transportation
Lighting
Global Ops
Digital
Additive
Renewables

Page 3: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Machines, Chips, Sensors & Data everywhere

Page 4: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Need for Data Integration & Data Pipelines

Page 5: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Integrating Streaming Data with Enterprise Data

Integration needs will depend on the use cases

Page 6: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Real-Time Data & Data Pipeline

Data orchestration strategies will vary depending on use cases

Page 7: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

A Look at a Few Data Collection & Processing Tools

Akka

Kafka

Spark Streaming

Page 8: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Akka

Akka is a toolkit for building highly concurrent, distributed, and resilient message-driven applications for Java and Scala. It uses a reactive streaming model in which message flow is controlled by back-pressure.

Reactive Streams – Pull based back Pressure
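To make pull-based back-pressure concrete, here is a minimal sketch in plain Python. It is not Akka code and uses no Akka API; the names `Subscription`, `SlowSubscriber`, and `run` are hypothetical. It only illustrates the idea that the subscriber signals demand with `request(n)` and the source never emits more than was requested.

```python
from collections import deque


class Subscription:
    """Hypothetical link between a source and one subscriber."""

    def __init__(self, source, on_next):
        self._items = iter(source)
        self._on_next = on_next

    def request(self, n):
        """Emit at most n items; return how many were actually delivered."""
        delivered = 0
        for _ in range(n):
            try:
                item = next(self._items)
            except StopIteration:
                break  # source exhausted
            self._on_next(item)
            delivered += 1
        return delivered


class SlowSubscriber:
    """Consumer that only accepts batch_size items before it must drain."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.buffer = deque()
        self.max_buffered = 0  # high-water mark of the buffer
        self.processed = []

    def on_next(self, item):
        self.buffer.append(item)
        self.max_buffered = max(self.max_buffered, len(self.buffer))

    def drain(self):
        while self.buffer:
            self.processed.append(self.buffer.popleft())


def run(source, batch_size):
    """Pull items in demand-sized steps until the source is exhausted."""
    subscriber = SlowSubscriber(batch_size)
    subscription = Subscription(source, subscriber.on_next)
    # The subscriber, not the source, decides how much may be in flight.
    while subscription.request(subscriber.batch_size) > 0:
        subscriber.drain()
    return subscriber
```

Because demand is bounded by `batch_size`, the subscriber's buffer cannot grow beyond that size regardless of how fast the source could produce.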

Page 9: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Kafka

Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable. Like many publish-subscribe messaging systems, Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from topics. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.
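The topic, partition, and consumer model can be sketched with an in-memory toy in Python. This is not the Kafka client API: `Topic` and `Consumer` are hypothetical classes, the CRC32-based routing is an assumption chosen for determinism (Kafka's own partitioner differs), and replication across nodes is not modeled.

```python
import zlib


class Topic:
    """Toy topic: a fixed number of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: route by CRC32 of the key so the same key
        # always lands in the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a record; return (partition, offset) of the new record."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


class Consumer:
    """Toy consumer that tracks its own read offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self):
        """Return all records appended since this consumer's last poll."""
        records = []
        for partition, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[partition]:])
            self.offsets[partition] = len(log)
        return records
```

Records are retained in the log after being read, so each consumer advances independently; that is the main difference from a queue where a message disappears once consumed.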

Page 10: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
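Spark Streaming processes a live stream as a sequence of small batches, one per batch interval. The sketch below imitates that micro-batching idea in plain Python. It does not use Spark; `micro_batches` and `count_per_batch` are hypothetical helpers, and the event tuples are made-up sample data.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-width batches.

    Returns a list of (batch_start_time, values), including empty
    intervals between the first and the last event.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        buckets[int(timestamp // batch_interval)].append(value)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    return [
        (index * batch_interval, buckets.get(index, []))
        for index in range(first, last + 1)
    ]


def count_per_batch(events, batch_interval):
    """Run the same aggregation (a value count) on every batch."""
    return [
        (start, dict(Counter(values)))
        for start, values in micro_batches(events, batch_interval)
    ]
```

Each batch is processed with ordinary batch logic, which is why the same aggregation code can serve both historical and live data in this model.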

Page 11: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Multiple Data Integration Techniques

What option would we choose?

Page 12: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

Data Integration using DV

Virtual data stitching can integrate streaming data with enterprise datasets regardless of data velocity, variety, and volume.

[Diagram labels: Data Warehouse(s) & Data Mart(s); Spark Streaming; Data Analytics; SparkSQL; Data Lake(s); NoSQL Database(s)]

Data Virtualization layer to connect big data based data sources
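As a rough illustration of virtual stitching, the Python sketch below joins streaming records with enterprise reference rows at read time instead of copying the stream into a warehouse first. It is not Denodo's API; `stitch`, the `asset_id` key, and the sample field names are all hypothetical.

```python
def stitch(stream_records, enterprise_rows, key):
    """Lazily join each streaming record with its matching enterprise row.

    stream_records may be any iterable, including an unbounded generator.
    Records with no matching enterprise row are skipped (inner join).
    """
    lookup = {row[key]: row for row in enterprise_rows}
    for record in stream_records:
        master = lookup.get(record[key])
        if master is not None:
            # Streaming fields are applied last, so they override
            # enterprise fields that share a name.
            yield {**master, **record}


def sample_readings():
    """Stand-in for a live feed; a real feed would not be a fixed list."""
    yield {"asset_id": "A1", "temp": 71.5}
    yield {"asset_id": "A3", "temp": 60.0}  # no enterprise match
    yield {"asset_id": "A2", "temp": 68.0}
```

A caller would iterate over `stitch(sample_readings(), assets, "asset_id")` and see each combined row as soon as the corresponding reading arrives, since the generator produces results one record at a time.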

Page 13: Denodo DataFest 2017: Integrating Big Data and Streaming Data with Enterprise Data

If you have these scenarios…

Streaming datasets

Disparate data sources within enterprise

Structured or Unstructured datasets

Need for stitching historical and new datasets

Real-Time Analytical solutions

Need for an agile solution