Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Upload: streamsets-inc

Post on 16-Apr-2017

Category: Data & Analytics


Page 1: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Enabling Next Gen Analytics with Azure Data Lake

Page 2: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Microsoft Azure

Page 3: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Microsoft Cloud

Global, Trusted, Hybrid

Page 4: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Big Data Definition

Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.

– Gartner, Big Data Definition*

* Gartner, Big Data (Stamford, CT.: Gartner, 2016), URL: http://www.gartner.com/it-glossary/big-data/

Page 5: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Big Data as a Cornerstone of Cortana Intelligence

Action

People

Automated Systems

Apps

Web

Mobile

Bots

Intelligence

Dashboards & Visualizations

Cortana

Bot Framework

Cognitive Services

Power BI

Information Management

Event Hubs

Data Catalog

Data Factory

Machine Learning and Analytics

HDInsight (Hadoop and Spark)

Stream Analytics

Intelligence

Data Lake Analytics

Machine Learning

Big Data Stores

SQL Data Warehouse

Data Lake Store

Data Sources

Apps

Sensors and devices

Data

Page 6: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

However, there are challenges to Big Data…

Obtaining skills and capabilities

Determining how to get value

Integrating with existing IT investments*

* Gartner, Survey Analysis: Hadoop Adoption Drivers and Challenges (Stamford, CT.: Gartner, 2015)

Page 7: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Azure HDInsight

A Cloud Spark and Hadoop service for the Enterprise

Reliable with an industry leading SLA

Enterprise-grade security and monitoring

Productive platform for developers and scientists

Cost effective cloud scale

Integration with leading ISV applications

Easy for administrators to manage

63% lower TCO than deploying your own Hadoop on-premises*

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

Page 8: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

HDInsight Application Platform

• One-click deploy experience for installing apps.

• Fully managed PaaS offering.

• Access to entire cluster and secure by default.

• Install apps on new or existing clusters.

• Ease of authoring and deployment.

• Certified partners only.

Page 9: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Hybrid cloud, a reality today

74%

Enterprises believe a hybrid cloud will enable business growth¹

82%

Enterprises have a hybrid cloud strategy, up from 74 percent a year ago²

Workload requirements

Regulation

Sensitive data

Customization

Latency

Legacy support

Page 10: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Introduction to StreamSets for Microsoft Azure

Page 11: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Who is StreamSets?

Enterprise Data DNA

StreamSets Mission

Empower enterprises to harness their data in motion.

Top-tier Investors

Commercial Customers Across Verticals

Strong Partner Ecosystem

Open Source Success

150,000 downloads

⅓ of the Fortune 100

Products

StreamSets Dataflow Performance Manager™ (DPM)

StreamSets Data Collector™ (open source)

Page 12: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

StreamSets Solution

Desired Business Outcomes

● Developer & operator efficiency

● On-time delivery

● Data trust & governance

Data in motion middleware that ensures data trust.

Page 13: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

StreamSets Products

StreamSets Data Collector (SDC)

Open source tooling and engine to build complex any-to-any dataflows.

StreamSets Dataflow Performance Manager (DPM)

Cloud service to map, measure and master dataflow operations.

DATAFLOW LIFECYCLE

DEVELOP

● Developers

● Scientists

● Architects

OPERATE

● Operators

● Stewards

● Architects

EVOLVE (Proactive)

REMEDIATE (Reactive)

Page 14: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

StreamSets Deployment Models

Install on Local Machine

Install on Azure VM

Page 15: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

StreamSets Deployment Models

Page 16: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

StreamSets and Microsoft Azure in Use in a Major Bank

Page 17: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

The Customer

● Forbes Global 500 financial services company.

● Adopting and moving to the cloud at a rapid pace.

● Growing rapidly both via acquisitions and organic growth.

Page 18: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Key Challenges Related to Data Movement

● A number of legacy tools, both customer-built and vendor-built.

● Security policy changes are very hard to manage.

● Lack of security governance due to fragmentation of tools and lack of standardization.

● Difficulty onboarding new data sources as soon as they are created (technology change).

● Data drift (unexpected changes) very hard to manage at scale.
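As a rough illustration of the data drift challenge listed above, the sketch below compares an incoming record against an expected set of fields and types. It is not StreamSets code; the function name `detect_drift`, the field names, and the sample record are all hypothetical and chosen only for illustration.

```python
from typing import Any


def detect_drift(expected: dict[str, type], record: dict[str, Any]) -> dict[str, list[str]]:
    """Compare one record against an expected field -> type mapping.

    Returns the fields that appeared, disappeared, or changed type.
    """
    added = sorted(set(record) - set(expected))
    missing = sorted(set(expected) - set(record))
    retyped = sorted(
        name
        for name in set(expected) & set(record)
        if not isinstance(record[name], expected[name])
    )
    return {"added": added, "missing": missing, "retyped": retyped}


# Hypothetical schema and a record from a source that changed unexpectedly:
# "currency" was renamed to "ccy" and "amount" started arriving as a string.
expected_schema = {"account_id": str, "amount": float, "currency": str}
drifted_record = {"account_id": "A-1", "amount": "12.50", "ccy": "USD"}

report = detect_drift(expected_schema, drifted_record)
print(report)
```

A check like this is easy to run on one source; the difficulty named in the slide is doing it continuously across many sources and tools.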

Page 19: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Key Factors for the Customer to Consider StreamSets

● KPIs

● Delivery guarantees

● Multiple types of origins and destinations using a single tool.

● Works natively with Microsoft Azure as part of HDInsight, on an Azure Virtual Machine, or deployed on-premises.

● Visualization of actual data transfers.

● Define security boundaries, actors, etc.

● Repeating pattern

Page 20: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Customer’s Business Objectives

● Short-lived compute and long-lived storage (ADLS, Azure Blob), which in turn gives fine-grained cost control.

● Ability to build a microanalytics framework. For instance, instead of taking the entire dataset, build micro datasets, run microanalytics on them, and derive results faster (faster iteration).

● Move away from a traditional Data Lake to Azure Data Lake to manage cost and scale.
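The microanalytics objective above can be pictured with a small sketch: draw a reproducible micro dataset from a larger one and compute the same metric on both. This is only one assumed reading of the idea; the helper names `micro_dataset` and `mean_amount`, the `amount` field, and the synthetic rows are hypothetical and not taken from the customer's system.

```python
import random
import statistics


def micro_dataset(rows, size, seed=0):
    """Draw a reproducible random sample of at most `size` rows."""
    rng = random.Random(seed)  # fixed seed so repeated runs see the same sample
    return rng.sample(rows, min(size, len(rows)))


def mean_amount(rows):
    """Average of the (hypothetical) 'amount' field across rows."""
    return statistics.fmean(row["amount"] for row in rows)


# Synthetic stand-in for a large dataset; real data would be read from storage.
full = [{"id": i, "amount": float(i % 100)} for i in range(10_000)]

micro = micro_dataset(full, size=500)
estimate = mean_amount(micro)  # fast answer from the micro dataset
exact = mean_amount(full)      # slower answer from the entire dataset

print(f"micro estimate: {estimate:.2f}, full dataset: {exact:.2f}")
```

The trade-off is precision for iteration speed: the micro dataset gives an approximate answer quickly, and the full dataset is only needed once an analysis is worth running at scale.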

Page 21: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Use Cases for StreamSets

1. Data Movement from On-Premises to Azure Data Lake

2. Consolidating migration tools into a single tool

3. Building DR for HDInsight Kafka workloads.

Page 22: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Resources / Q & A

StreamSets Data Collector @ Azure Marketplace: https://azure.microsoft.com/en-us/marketplace/partners/streamsets/streamsets-data-collector/

Ingest Data into Microsoft Azure Data Lake (YouTube): https://www.youtube.com/watch?v=c1dVnOK7Luw

StreamSets Community: https://streamsets.com/community/

StreamSets Dataflow Performance Manager Product Information: https://streamsets.com/products/dpm/

Page 23: Enabling Next Gen Analytics with Azure Data Lake and StreamSets

Thanks!