denodo datafest 2017: integrating big data and streaming data with enterprise data

Confidential. Not to be copied, distributed, or reproduced without prior approval.

Integrating Big Data and Streaming Data with Enterprise Data

October 16, 2017


Multiple Mighty Businesses

Capital

Aviation

Power

Healthcare

Oil & Gas

Transportation

Lighting

Global Ops

Digital

Additive

Renewables



Machines, Chips, Sensors & Data everywhere



Need for Data Integration & Data Pipelines


Integrating Streaming Data with Enterprise Data

Integration needs depend on the use case.

Real-Time Data & Data Pipeline

Data orchestration strategies vary by use case.

A Look at a Few Data Collection & Processing Tools

Akka

Kafka

Spark Streaming




Akka

Akka is a toolkit for building highly concurrent, distributed, and resilient message-driven applications for Java and Scala. It uses a reactive streaming model in which the flow of messages is controlled by back-pressure.

Reactive Streams: Pull-Based Back-Pressure
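To make "pull-based back-pressure" concrete, here is a minimal plain-Python model of the idea. It is not the Akka Streams or Reactive Streams API; the class and function names (`Publisher`, `SlowSubscriber`, `run`) are hypothetical, and the `request(n, on_next)` signature is a simplification. The key property it demonstrates is that the consumer signals how many items it can accept, and the producer never emits more than that.

```python
from collections import deque


class Publisher:
    """Hypothetical publisher that emits only what downstream has requested."""

    def __init__(self, items):
        self._pending = deque(items)
        self.completed = False

    def request(self, n, on_next):
        """Pull up to n items into on_next; returns how many were emitted."""
        emitted = 0
        while emitted < n and self._pending:
            on_next(self._pending.popleft())
            emitted += 1
        if not self._pending:
            self.completed = True
        return emitted


class SlowSubscriber:
    """Hypothetical consumer with a bounded buffer that signals demand in batches."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = []
        self.processed = []
        self.max_buffered = 0

    def on_next(self, item):
        self.buffer.append(item)
        self.max_buffered = max(self.max_buffered, len(self.buffer))

    def drain(self):
        # Simulate finishing the (slow) work, which frees buffer space.
        self.processed.extend(self.buffer)
        self.buffer.clear()


def run(items, buffer_size):
    pub = Publisher(items)
    sub = SlowSubscriber(buffer_size)
    while not pub.completed:
        # Demand equals free buffer space, so the producer cannot overflow the consumer.
        free = sub.buffer_size - len(sub.buffer)
        pub.request(free, sub.on_next)
        sub.drain()
    return sub
```

For example, `run(range(10), buffer_size=3)` processes all ten items while never holding more than three in the subscriber's buffer. In a push-based design the producer would decide the rate; here demand flows upstream, which is the property Reactive Streams standardizes.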



Kafka

Kafka is a distributed publish-subscribe messaging system that is designed to be fast, scalable, and durable. Like many publish-subscribe messaging systems, Kafka maintains feeds of messages in topics. Producers write data to topics and consumers read from topics. Since Kafka is a distributed system, topics are partitioned and replicated across multiple nodes.
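The topic/partition/consumer relationship can be sketched with a small in-memory model. This is an illustration only, not the Kafka client API: `Topic`, `Consumer`, and the example names are hypothetical, replication is not modeled, and `zlib.crc32` stands in for Kafka's default partitioner (which hashes keys with murmur2). What it does show is that records with the same key land in the same partition in order, each record gets an offset within its partition, and a consumer tracks how far it has read.

```python
import zlib
from collections import defaultdict


class Topic:
    """In-memory stand-in for a partitioned topic (not the Kafka client API)."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Same key -> same partition, which preserves per-key ordering.
        # crc32 is only a stand-in for Kafka's murmur2-based default partitioner.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class Consumer:
    """Tracks its own read position (offset) per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)

    def poll(self):
        records = []
        for p, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[p]:])
            self.offsets[p] = len(log)
        return records
```

Usage: `Topic("sensor-readings", 3)` with two `produce("turbine-7", ...)` calls yields consecutive offsets in the same partition, and a second `poll()` with no new writes returns nothing because the consumer's offsets have advanced.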



Spark Streaming

Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
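Spark Streaming's classic DStream model processes a live stream as a sequence of small batches cut at a fixed batch interval. The sketch below models that idea in plain Python; it is not PySpark code, and `micro_batches`/`count_per_batch` are hypothetical helpers. Events are assumed to be `(timestamp_seconds, key)` pairs.

```python
from collections import Counter, defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp_seconds, key) events into fixed-length batches.

    Models the idea behind Spark Streaming's DStreams, which discretize a live
    stream into small batches; this is plain Python, not the Spark API.
    """
    batches = defaultdict(list)
    for ts, key in events:
        batch_start = int(ts // batch_interval) * batch_interval
        batches[batch_start].append(key)
    return dict(sorted(batches.items()))


def count_per_batch(batches):
    # A per-batch transformation, analogous to running a count on each batch.
    return {start: dict(Counter(keys)) for start, keys in batches.items()}
```

With a 2-second interval, events at 0.5 s, 1.2 s, and 1.9 s fall into the batch starting at 0, and an event at 2.1 s starts the next batch. Each batch is then processed with ordinary batch operations, which is how the same engine serves both batch and streaming work.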



Multiple Data Integration Techniques

Which option should we choose?




Data Integration using DV

Virtual data stitching can integrate streaming data with enterprise datasets, regardless of data velocity, variety, and volume.

Figure: Data Virtualization layer to connect big data based data sources (components shown: Data Warehouse(s) & Data Mart(s), Spark Streaming, Data Analytics, SparkSQL, Data Lake(s), NoSQL Database(s)).
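The "virtual data stitching" idea can be illustrated with a small sketch: a view that joins each streaming record with enterprise master data at query time, instead of first copying both into a new store. This is a hypothetical plain-Python model, not the Denodo or Spark API; `Reading`, `stitched_view`, and `lookup_asset` are illustrative names, and the warehouse lookup is assumed to be any callable that returns a dictionary of attributes for an asset ID.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator


@dataclass
class Reading:
    """A hypothetical streaming record, e.g. a sensor value for an asset."""
    asset_id: str
    value: float


def stitched_view(
    stream: Iterable[Reading],
    lookup_asset: Callable[[str], Dict[str, str]],
) -> Iterator[Dict[str, object]]:
    """Join each streaming record with enterprise master data at query time.

    Nothing is materialized: the enterprise source is queried through
    lookup_asset only when a record flows through the view. Lookups are
    cached per view instance, an assumption that suits slowly changing
    master data.
    """
    cache: Dict[str, Dict[str, str]] = {}
    for r in stream:
        if r.asset_id not in cache:
            cache[r.asset_id] = lookup_asset(r.asset_id)
        yield {"asset_id": r.asset_id, "value": r.value, **cache[r.asset_id]}
```

Because the view is a generator, consumers pull only the rows they need, and each source stays where it is. The per-view cache is a design choice that trades freshness of the master data for fewer round trips to the warehouse.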



If you have these scenarios…

Streaming datasets

Disparate data sources within enterprise

Structured or Unstructured datasets

Need for stitching historical and new datasets

Real-Time Analytical solutions

Need for an agile solution


Data & Analytics