© Cloudera, Inc. All rights reserved.
Data Analytics 2018
CDSW: Teamwork and Governance in Data Science Development
Thomas Friebel – Partner Sales Engineer
We believe data can make what is impossible
today, possible tomorrow
Cloudera at-a-glance
Customer success: large enterprises fueling growth
● 48% customer growth (last 4 years)
● 140%+ net expansion (Global 8000 customers)
● Expansion driven by data and new use cases
Open partner network: best-of-breed solutions
● 3000+ partners
● Vast ecosystem of solution & service providers
First to market: open source innovation
● Founded 2008
● 1600+ Clouderans
● Global team doing business in 28 countries
● Big data innovators from Google, Yahoo and Oracle
Adoption driven by large enterprises
1000+ customers across all verticals
~500 Global 8000 customers
BANKING: 7/10 top global
TELCO: 9/10 top global
PUBLIC: 27 countries with government customers
HEALTHCARE: 6/10 top global
TECHNOLOGY: 8/10 top global
Portfolio of Joint Product Collaboration, powered by Cloudera
On-Premises: Big Data Appliance (customer data center, purchased, customer managed)
Cloud @ Customer: Big Data Cloud Machine (customer data center, subscription, Oracle managed)
Public Cloud: Big Data Cloud Service (Oracle Cloud, subscription, Oracle managed)
Cloudera Enterprise
The modern platform for machine learning and analytics optimized for the cloud
[Diagram: platform layers. Core and extensible services: Data Engineering, Operational Database, Analytic Database, Data Science, Data Catalog, Ingest & Replication, Security, Governance, Workload Management. Storage services: Amazon S3, Microsoft ADLS, HDFS, Kudu]
We are in the age of machine learning
Data has never been more plentiful
Open source data science and machine learning libraries are rapidly evolving
Flexible commodity storage and compute make scalable production machine learning affordable
Data Analytics Deployment
But there are practical challenges
Most data science is done at small scale, individually, and is difficult to replicate
Very few models reach production
Teams have different, conflicting requests for languages & libraries
Data needs to move across multiple different systems
Data Analytics Deployment
Our goal: Open data science at enterprise scale
Help more data scientists use the power of Cloudera
● Data Scientist, Data Engineer: Use a powerful, familiar environment with direct access to Cloudera data and compute
Make it easy and secure to add new users, use cases
● Enterprise Architect, Hadoop Admin: Offer secure self-service analytics and a faster path to production on common, affordable infrastructure
Balancing the needs of data scientists and IT
IT: drive adoption, maintain compliance
Data Scientists: explore, experiment, collaborate
Support the complete data science workflow: from data to exploration to action
Shared: Data, Operations, Governance, Security, Metadata
Dev: Collaboration, Version Control
Ops: Deployment, Scheduling, Orchestration
[Diagram: workflow across Data Engineering, Data Science and Deployment, with the labels Acquisition, Curation, Processing, Data Governance, Data Wrangling, Visualization and Analysis, Model Training & Testing, Batch Scoring, Online Scoring, Serving, and Reports, Dashboards]
Cloudera Data Science Workbench: Data Science at Scale
Runs and is certified on BDA
CDSW interface brings Data Scientists to the data
• Web-based notebook interface
• R, Python or Scala
• One-time Kerberos authentication
• Isolated, individual environments allow self-service
• Visualization, team-based sharing
• Access to governed and secured data
• …
Powerful combination but…
• Data scientists want a notebook-like interface
• Security often interferes with productivity
• Dependencies are very complicated
• Collaboration is difficult
Integration with Oracle Big Data Appliance
Technical requirements:
● Available physical nodes for the CDSW application; dedicated edge nodes required
● CDSW 1.2.x supports Oracle Linux 7.3
● Either use free nodes in BDA, order additional BDA nodes or add “non-BDA” edge nodes
Licensing requirements:
● Edge nodes need to be licensed for Cloudera Enterprise (covered by BDA or ordered directly from Cloudera)
● Additional user-based CDSW license required, ordered from Cloudera directly (available as a 10-user pack with a 1-year subscription)
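For sizing, the 10-user pack mentioned above implies rounding the user count up to whole packs. A minimal sketch of that arithmetic, assuming packs cannot be split (the function name `packs_needed` is hypothetical, and no pricing is implied):

```python
PACK_SIZE = 10  # users per CDSW license pack, as stated on the slide


def packs_needed(user_count):
    """Return how many whole 10-user packs cover user_count users.

    Assumption: packs are sold only as whole units.
    """
    if user_count < 0:
        raise ValueError("user_count must not be negative")
    # Integer ceiling division avoids floating-point rounding.
    return (user_count + PACK_SIZE - 1) // PACK_SIZE


packs_for_team = packs_needed(23)  # 23 users round up to 3 packs
```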
A modern data science architecture
● Built on Docker and Kubernetes
● Runs on dedicated gateway nodes
● User sessions run in isolated “engine” containers which:
  ○ Host Kerberos-authenticated Python/R/Scala runtimes
  ○ Interact with Spark via YARN client mode (driver runs in container, workers on CDH)
● Single-cluster only (for now)
[Diagram: BDA with Cloudera Manager; EDH BDA nodes (Hive, HDFS, ...) and gateway nodes running CDSW (Master, Engines)]
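The YARN client mode point above (driver in the engine container, executors on the cluster) can be sketched as a set of Spark properties. This is an illustration under assumptions, not CDSW's actual configuration: the helper name `yarn_client_spark_conf`, the application name and the executor sizing are hypothetical, while the property keys are standard Spark settings.

```python
def yarn_client_spark_conf(app_name, executor_instances=2, executor_memory="2g"):
    """Return Spark properties for a session whose driver stays local.

    In client deploy mode the driver runs where the session is started
    (here: the engine container) and YARN schedules the executors on
    the cluster nodes.
    """
    return {
        "spark.app.name": app_name,
        "spark.master": "yarn",
        "spark.submit.deployMode": "client",
        # Hypothetical sizing values, chosen only for illustration.
        "spark.executor.instances": str(executor_instances),
        "spark.executor.memory": executor_memory,
    }


conf = yarn_client_spark_conf("cdsw-session-sketch")

# With PySpark available in the engine, the properties could be applied as:
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

The PySpark part is left as a comment so the sketch runs without a cluster.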
Accelerated deep learning on-demand with GPUs
Multi-tenant GPU support on-premises or cloud
“Our data scientists want GPUs, but we can’t find a way to deliver multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
● Extend existing CDSW benefits to GPU-optimized deep learning tools
● Schedule & share GPU resources
● Train on GPUs, deploy on CPUs
● Works on-premises or cloud
[Diagram: Data Science Workbench (GPU, CPU) for single-node training; BDA (CPU) for distributed training, scoring]
More flexible automation with the Jobs API
curl -XPOST http://cdsw.company.com/api/v1/projects/mbrandwein/sample/jobs/1/start \
  --user $USERNAME:$PASSWORD \
  -H "Content-type: application/json" \
  -d '{"environment": {"FISCAL_QUARTER": "Q3"}}'
● Orchestrate jobs from 3rd party workflow tools
● Parameterization via job environment variables
● View outputs in CDSW or receive email notification
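For workflow tools that call the Jobs API from Python rather than curl, the same request could be built roughly as follows. This is a sketch under assumptions: the URL, project path, job id and `FISCAL_QUARTER` variable are taken from the slide, the helper name `build_start_job_request` and the placeholder credentials are hypothetical, and the request is only constructed, not sent.

```python
import base64
import json
import urllib.request


def build_start_job_request(base_url, project_path, job_id,
                            username, password, environment):
    """Build (but do not send) the POST request that starts a job.

    Mirrors the curl call: HTTP basic auth, a JSON content type, and a
    body carrying the job's environment variables.
    """
    url = f"{base_url}/api/v1/projects/{project_path}/jobs/{job_id}/start"
    body = json.dumps({"environment": environment}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token.decode("ascii"),
        },
    )


request = build_start_job_request(
    "http://cdsw.company.com", "mbrandwein/sample", 1,
    "user", "secret",  # placeholder credentials
    {"FISCAL_QUARTER": "Q3"},
)

# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs offline.
```

Keeping construction separate from sending makes the parameterization easy to check before a scheduler triggers the job.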