© Cloudera, Inc. All rights reserved.

Post on 09-Sep-2019


Data Analytics 2018

CDSW – Teamwork and Governance in Data Science Development

Thomas Friebel – Partner Sales Engineer

[email protected]


We believe data can make what is impossible today, possible tomorrow


Cloudera at-a-glance

Customer success: large enterprises fueling growth
• 48% customer growth (last 4 years)
• 140%+ net expansion (Global 8000 customers)
• Expansion driven by data and new use cases

Open partner network: best-of-breed solutions
• 3000+ partners
• Vast ecosystem of solution & service providers

First to market: open source innovation
• Founded 2008
• 1600+ Clouderans
• Global team doing business in 28 countries
• Big data innovators from Google, Yahoo and Oracle


Adoption driven by large enterprises

1000+ customers across all verticals

~500 Global 8000 customers

BANKING: 7/10 Top Global
TELCO: 9/10 Top Global
PUBLIC: 27 countries with government customers
HEALTHCARE: 6/10 Top Global
TECHNOLOGY: 8/10 Top Global


Portfolio of Joint Product Collaboration powered by Cloudera

On-Premises: Big Data Appliance
• Customer Data Center
• Purchased
• Customer Managed

Cloud @ Customer: Big Data Cloud Machine
• Customer Data Center
• Subscription
• Oracle Managed

Public Cloud: Big Data Cloud Service
• Oracle Cloud
• Subscription
• Oracle Managed


Cloudera Enterprise

The modern platform for machine learning and analytics optimized for the cloud

[Diagram: core services (Data Engineering, Operational Database, Analytic Database, Data Science); extensible services; Data Catalog; Ingest & Replication; Security; Governance; Workload Management; storage services (Amazon S3, Microsoft ADLS, HDFS, Kudu)]


We are in the age of machine learning

Data has never been more plentiful

Open source data science and machine learning libraries are rapidly evolving

Flexible commodity storage and compute make scalable production machine learning affordable

Data | Analytics | Deployment


But there are practical challenges

Most data science is done at small scale, individually, and is difficult to replicate

Very few models reach production

Teams have different, conflicting requests for languages & libraries

Data needs to move across multiple different systems

Data | Analytics | Deployment


Help more data scientists use the power of Cloudera

Use a powerful, familiar environment with direct access to Cloudera data and compute

Data Scientist, Data Engineer

Make it easy and secure to add new users, use cases

Offer secure self-service analytics and a faster path to production on common, affordable infrastructure

Enterprise Architect, Hadoop Admin

Our goal: Open data science at enterprise scale


Balancing the needs of data scientists and IT

IT: drive adoption, maintain compliance

Data Scientists: explore, experiment, collaborate


Support the complete data science workflow: from data to exploration to action

[Diagram: workflow stages Data Engineering, Data Science and Deployment. Labels: Acquisition, Processing, Curation, Data Governance, Data Wrangling, Visualization and Analysis, Model Training & Testing, Batch Scoring, Online Scoring, Serving, Reports, Dashboards.]

Dev: Collaboration, Version Control

Ops: Deployment, Scheduling, Orchestration

Shared: Data, Operations, Governance, Security, Metadata


Cloudera Data Science Workbench – Data Science at Scale
Runs and is certified on BDA

CDSW interface brings data scientists to the data
• Web-based notebook interface
• R, Python or Scala
• One-time Kerberos authentication
• Isolated, individual environments allow self-service
• Visualization, team-based sharing
• Access to governed and secured data
• …

Powerful combination but…
• Data scientists want a notebook-like interface
• Security often interferes with productivity
• Dependencies are very complicated
• Collaboration is difficult


Demo


Integration with Oracle Big Data Appliance

Technical requirements:
• Physical nodes must be available for the CDSW application; dedicated edge nodes are required
• CDSW 1.2.x supports Oracle Linux 7.3
• Either use free nodes in the BDA, order additional BDA nodes or add “non-BDA” edge nodes

Licensing requirements:
• Edge nodes need to be licensed for Cloudera Enterprise (covered by BDA or ordered directly from Cloudera)
• An additional user-based CDSW license is required, ordered from Cloudera directly (available as a 10-user pack with a 1-year subscription)


A modern data science architecture

[Diagram labels: BDA; Cloudera Manager; EDH BDA nodes; gateway nodes]

● Built on Docker and Kubernetes
● Runs on dedicated gateway nodes
● User sessions run in isolated “engine” containers which:
  ○ Host Kerberos-authenticated Python/R/Scala runtimes
  ○ Interact with Spark via YARN client mode (driver runs in the container, workers on CDH)
● Single-cluster only (for now)
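
From a Python session, the YARN client mode arrangement in the bullets above (driver in the engine container, executors on the cluster) might look roughly like the following. This is only a sketch under assumptions: the helper names `yarn_client_conf` and `build_session` and the executor count are illustrative and not part of CDSW, PySpark is assumed to be installed in the engine, and the configuration keys are standard Spark properties.

```python
def yarn_client_conf(app_name, executor_instances=2):
    """Return Spark properties for YARN client mode (hypothetical helper)."""
    return {
        "spark.app.name": app_name,
        "spark.master": "yarn",                # let YARN schedule the executors
        "spark.submit.deployMode": "client",   # driver stays in this process
        "spark.executor.instances": str(executor_instances),  # assumed value
    }


def build_session(conf):
    """Create a SparkSession from a dict of properties.

    PySpark is imported lazily so that the configuration can be inspected
    on a machine where Spark is not installed.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Only prints the properties; call build_session(conf) inside an engine.
    conf = yarn_client_conf("cdsw-sketch")
    for key, value in sorted(conf.items()):
        print(key, "=", value)
```

Calling `build_session(conf)` inside an engine would start the driver in that container, while the executors are requested from YARN on the cluster.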

[Diagram labels: Hive, HDFS, ...; CDSW master; CDSW engines]


Accelerated deep learning on-demand with GPUs

“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”

● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud

[Diagram: Data Science Workbench (GPU, CPU), BDA (CPU), BDA (CPU); captions: single-node training; distributed training, scoring]

Multi-tenant GPU support on-premises or cloud


More flexible automation with the Jobs API

curl -XPOST http://cdsw.company.com/api/v1/projects/mbrandwein/sample/jobs/1/start \
  --user "$USERNAME:$PASSWORD" -H "Content-type: application/json" \
  -d '{"environment": {"FISCAL_QUARTER": "Q3"}}'

● Orchestrate jobs from 3rd party workflow tools
● Parameterization via job environment variables
● View outputs in CDSW or receive email notification
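
The same call as the curl example can be prepared from Python, for instance inside a third-party workflow tool. The following is a minimal sketch using only the standard library. The function name `build_start_job_request` is hypothetical, and the URL path and payload shape are copied from the curl example rather than from an API reference. The request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_start_job_request(base_url, project_path, job_id,
                            username, password, environment):
    """Build (but do not send) a POST request that starts a job.

    `environment` is a dict of job environment variables, mirroring the
    {"environment": {...}} payload of the curl example.
    """
    url = "{}/api/v1/projects/{}/jobs/{}/start".format(
        base_url.rstrip("/"), project_path, job_id)
    body = json.dumps({"environment": environment}).encode("utf-8")

    # HTTP basic authentication, equivalent to curl's --user option.
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")

    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", "application/json")
    request.add_header("Authorization", "Basic " + token)
    return request


if __name__ == "__main__":
    req = build_start_job_request(
        "http://cdsw.company.com", "mbrandwein/sample", 1,
        "user", "secret", {"FISCAL_QUARTER": "Q3"})
    print(req.get_method(), req.full_url)
    # To actually start the job: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes the parameterization easy to check before a scheduler triggers the job.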


Thank you

Thomas Friebel