Is Your Cloud Ready for Big Data? (Strata NY 2013)
© 2009 VMware Inc. All rights reserved
Is Your Cloud Ready for Big Data?
Richard McDougall
CTO, Storage and Application Services
Not Just for the Web Giants – The Intelligent Enterprise
Real-time analysis allows instant understanding of market dynamics.
Retailers can gain an intimate understanding of their customers' needs and use direct, targeted marketing.
Market Segment Analysis → Personalized Customer Targeting
The Emerging Pattern of Big Data Systems: Retail Example
Real-Time Streams
Exa-scale Data Store
Parallel Data Processing
Real-Time Processing
Machine Learning
Data Science
Cloud Infrastructure
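As a rough illustration of how the stages above fit together, the sketch below uses in-memory stand-ins: a Python list plays the exa-scale data store and a `Counter` plays the real-time view. The names `PurchaseEvent` and `RetailPipeline` are hypothetical and nothing here is taken from the talk. A real-time path updates per-product counts as each event arrives, while a batch path recomputes the same counts from the stored history, so the two views should agree.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class PurchaseEvent:
    """Hypothetical event arriving on a real-time retail stream."""
    customer_id: str
    product: str

class RetailPipeline:
    """Toy stand-in for the pattern: stream -> store + real-time view -> batch job."""

    def __init__(self):
        self.store = []               # stands in for the exa-scale data store
        self.live_counts = Counter()  # real-time processing: an incrementally updated view

    def ingest(self, event):
        # Every event is persisted for later batch analysis...
        self.store.append(event)
        # ...and applied immediately to the real-time view.
        self.live_counts[event.product] += 1

    def batch_counts(self):
        # Parallel data processing would scan the whole store in parallel; here, one pass.
        return Counter(e.product for e in self.store)

pipeline = RetailPipeline()
for cid, product in [("c1", "shoes"), ("c2", "hats"), ("c1", "shoes")]:
    pipeline.ingest(PurchaseEvent(cid, product))

print(pipeline.live_counts["shoes"])                    # 2
print(pipeline.batch_counts() == pipeline.live_counts)  # True
```

In a real deployment the two paths run on different systems, and the batch path is what feeds machine learning and data science over the full history.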
Storage: Plan for Peta-scale Data Storage and Processing
[Chart: PB of data (log scale, 0.01 to 1,000) by year, 2000 to 2015, for Online Apps vs. Analytics]
Analytics Rapidly Outgrows Traditional Data Size by 100x
Unprecedented Scale
“Data transparency, amplified by Social Networks
generates data at a scale never seen before”
- The Human Face of Big Data
We are creating an Exabyte of data every minute in 2013
Yottabyte by 2030
A single GE jet engine produces 10 terabytes of data in one hour, or about 90 petabytes per year.
This enables early detection of faults and common-mode failures, and provides feedback to product engineering.
Post-Mortem → Proactively Maintained Connected Product
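The yearly figure follows from the hourly one if the engine is assumed to run around the clock; the slide does not state a duty cycle, so that is an assumption here, as is the use of decimal units:

```python
# Hourly figure from the slide; continuous 24x365 operation is an assumption.
TB_PER_HOUR = 10
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years

tb_per_year = TB_PER_HOUR * HOURS_PER_YEAR  # 87,600 TB
pb_per_year = tb_per_year / 1000            # decimal units: 1 PB = 1,000 TB

print(f"{pb_per_year:.1f} PB per year")  # 87.6 PB per year
```

The result, about 88 PB, rounds to the slide's 90 PB; an engine that operates fewer hours per year would produce proportionally less.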
The Emerging Pattern of Big Data Systems: Manufacturing
Exa-scale Data Store
Parallel Data Processing
Real-Time Processing
Machine Learning
Data Science
Cloud Infrastructure
Real-Time Sensor
Analytics Support Product Engineering
Cloud Platform
Cloud Platform: Supporting Mixed Big Data Workloads
Cloud Infrastructure
Machine Learning
Hadoop
Real-Time Analytics
Management
Network/Security
Storage/Availability
Compute
Cloud Platform: Supporting Multiple Tenants
Cloud Infrastructure
Management
Network/Security
Storage/Availability
Compute
Web User Analytics
Financial Analysis
Historical Customer Behavior
What if you could…
[Figure: separate Test/Dev, Experimentation, and Production clusters for a recommendation engine and for ad targeting, consolidated onto shared infrastructure]
One physical platform to support multiple virtual big data clusters
Values of a Cloud Platform for Big Data
1. Agility / rapid deployment
2. Lower capex
3. Isolation for resource control and security
4. Operational efficiency
Management
Network/Security
Storage/Availability
Compute
Hadoop as a Service
Elasticity & Multi-tenancy
• Shrink and expand cluster on demand
• Independent scaling of compute and data
• Strong multi-tenancy
High Availability
• High availability for the entire Hadoop stack
• One click to set up
• Battle-tested
Operational Simplicity
• Rapid deployment
• One-stop command center
• Easy to configure/reconfigure
Self Service Access to Big Data Environments
Developer • 3 Hadoop nodes • Cloudera, Pivotal, MapR • Small VM • Local storage • No HA • …
Data Scientist • 5 Hadoop nodes • Cloudera, Pivotal • Hive, Pig • Medium VM • HA • …
High priority • 50 Hadoop nodes • Cloudera • Hive, Pig • Large VM • HA • …
… • … • …
Templates for Different Cloud Users
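The templates above amount to a small configuration catalog. A minimal sketch, assuming a hypothetical `ClusterTemplate` record and `request_cluster` helper (neither comes from the talk or any product API), encodes the three listed templates with only the attributes the slide spells out:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical record holding the template attributes listed on the slide."""
    hadoop_nodes: int
    distributions: tuple
    tools: tuple
    vm_size: str
    high_availability: bool
    storage: Optional[str] = None  # only the Developer template names a storage type

# Values copied from the slide; attributes it elides with "…" are left out.
TEMPLATES = {
    "Developer": ClusterTemplate(3, ("Cloudera", "Pivotal", "MapR"), (), "Small", False, "Local"),
    "Data Scientist": ClusterTemplate(5, ("Cloudera", "Pivotal"), ("Hive", "Pig"), "Medium", True),
    "High priority": ClusterTemplate(50, ("Cloudera",), ("Hive", "Pig"), "Large", True),
}

def request_cluster(user_type, distribution):
    """Hypothetical self-service check: return the template if it offers the distribution."""
    template = TEMPLATES[user_type]
    if distribution not in template.distributions:
        raise ValueError(f"{distribution} is not offered to {user_type} users")
    return template

print(request_cluster("Data Scientist", "Pivotal").hadoop_nodes)  # 5
```

Keeping the catalog as data rather than code lets an administrator add or adjust a template without touching the provisioning logic.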
Big Data Needs a Mix of Workloads
Compute layer:
• Hadoop batch analysis
• HBase real-time queries
• NoSQL: Cassandra, Mongo, etc.
• Big SQL: Impala, Pivotal HAWQ
• Other: Spark, Shark, Solr, Platfora, etc.
File System/Data Store
Platform Virtualization Technology
Hosts
Strong Isolation between Workloads is Key
Hungry Workload 1
Reckless Workload 2
Nosy Workload 3
Virtualization Platform
Community activity in Isolation and Resource Management
• YARN
  • Goal: support workloads other than MapReduce on Hadoop
  • Initial need is for MPI/MapReduce from Yahoo
  • Non-POSIX file system self-selects workload types
• Mesos
  • Distributed resource broker
  • Mixed workloads with some resource management
  • Active project, in use at Twitter
  • Leverages OS virtualization, e.g. cgroups
• Virtualization
  • Virtual machine as the primary isolation, resource management, and versioned deployment container
  • Basis for Project Serengeti
Use case: Elastic Hadoop with Tiered SLA
• Production workloads have high priority
• Experimentation workloads have lower priority
[Figure: on vSphere, a shared data layer serves a compute layer of Compute VMs split between Production MapReduce (the production recommendation engine) and Experimentation MapReduce, which runs in a dynamic resource pool]
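One way to read "tiered SLA" is that production demand is met first and experimentation absorbs whatever capacity is left. The sketch below applies that strict-priority rule to a pool of compute VMs; the function name and the 16-VM pool are hypothetical, and vSphere itself expresses such policies through resource-pool settings like shares and reservations, so strict priority here is a simplification.

```python
def allocate_compute_vms(total_vms, production_demand, experimentation_demand):
    """Split a shared pool of compute VMs under a strict-priority policy (hypothetical).

    Production is served first; experimentation only gets what is left,
    so it shrinks automatically when production demand grows.
    """
    if min(total_vms, production_demand, experimentation_demand) < 0:
        raise ValueError("counts must be non-negative")
    production = min(production_demand, total_vms)
    experimentation = min(experimentation_demand, total_vms - production)
    return production, experimentation

# An arbitrary 16-VM pool.
print(allocate_compute_vms(16, 10, 10))  # (10, 6)
print(allocate_compute_vms(16, 16, 10))  # (16, 0)
print(allocate_compute_vms(16, 4, 10))   # (4, 10)
```

When production is quiet, experimentation can use most of the pool; when production spikes, experimentation is squeezed to zero rather than the other way around.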
Cloud-Enabled Auto-Elastic Hadoop
[Figure: three ESX hosts running Hadoop HDFS data VMs (with the NN) on local disks, Hadoop compute VMs (JT, TTs, and the VHM), and non-Hadoop VMs, with SAN/NAS storage]
JT: JobTracker; TT: TaskTracker; NN: NameNode; VHM: Virtual Hadoop Manager
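The slide does not describe how the Virtual Hadoop Manager decides how many compute VMs to run, so the following is not its algorithm. It is a hypothetical sizing rule showing the kind of decision auto-elastic Hadoop implies: grow the TaskTracker VM count to cover queued tasks, but stay within a configured range and within what the hosts can spare once non-Hadoop VMs are served. All parameter names are illustrative.

```python
import math

def target_compute_vms(pending_tasks, slots_per_vm, min_vms, max_vms, vms_host_can_spare):
    """Hypothetical sizing rule for elastic Hadoop compute VMs.

    Grow to cover queued tasks, but never beyond the configured maximum
    or what the hosts can spare for Hadoop; never below the configured minimum.
    """
    wanted = math.ceil(pending_tasks / slots_per_vm)
    ceiling = min(max_vms, vms_host_can_spare)
    return max(min_vms, min(wanted, ceiling))

print(target_compute_vms(25, 4, 1, 10, 10))  # 7: ceil(25 / 4)
print(target_compute_vms(25, 4, 1, 10, 5))   # 5: capped by spare host capacity
print(target_compute_vms(0, 4, 1, 10, 10))   # 1: the floor keeps one VM running
```

Separating compute VMs from HDFS data VMs is what makes this kind of resizing cheap: compute VMs can be powered on or off without moving stored data.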
Hadoop Performance with Virtualization
[http://www.vmware.com/resources/techresources/10360, Jeff Buell, Apr 2013]
[Chart: Hadoop performance with virtualization (lower is better)]
Test cluster: 32 hosts; 3.6 GHz, 8 cores; 15K RPM 146 GB SAS disks; 10 GbE; 72-96 GB RAM
Network Platform
A Typical Network Architecture
[Figure: hosts connect to top-of-rack switches, which connect through L2 switches to two aggregating switches]
Traditional Networks: Core Switch is the Choke Point
[Figure: network topology (core, aggregation, rack, hosts) beside its modeled bandwidth, which is non-uniform, ranging from 100s of Gbits to 10s of Gbits]
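Why the core becomes a choke point can be seen with a quick oversubscription calculation: compare the bandwidth the hosts in a rack can offer with the uplink capacity leaving the rack. The rack below (40 hosts on 10 GbE, four 10 GbE uplinks) is a hypothetical example, not a figure from the talk.

```python
def oversubscription_ratio(hosts_per_rack, host_link_gbps, uplinks, uplink_gbps):
    """Ratio of traffic the hosts can offer to the capacity leaving the rack."""
    downstream = hosts_per_rack * host_link_gbps
    upstream = uplinks * uplink_gbps
    return downstream / upstream

# Hypothetical rack: 40 hosts at 10 GbE, four 10 GbE uplinks toward the core.
ratio = oversubscription_ratio(40, 10, 4, 10)
print(f"{ratio:.0f}:1")  # 10:1
```

At 10:1, if every host sends off-rack at once, each gets about 1 Gb/s of its 10 Gb/s link, which is why cross-rack traffic such as a Hadoop shuffle piles up at the upper tiers.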
Modern Networks: Great for Big Data
[Figure: spine-leaf network topology (spine, leaf, hosts) beside its modeled bandwidth, which is uniform]
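To make "uniform bandwidth" concrete, the sketch below models a small spine-leaf fabric as a graph (4 leaves and 3 spines are arbitrary choices) and uses breadth-first search to count hops and equal-cost shortest paths between leaf switches. Every pair of leaves comes out the same two hops apart, with one path per spine; switches typically spread traffic across such equal-cost paths (ECMP), so adding spines adds cross-rack bandwidth evenly.

```python
from collections import deque
from itertools import combinations

def leaf_spine(num_leaves, num_spines):
    """Adjacency sets for a spine-leaf fabric: every leaf links to every spine."""
    adj = {f"leaf{i}": set() for i in range(num_leaves)}
    adj.update({f"spine{j}": set() for j in range(num_spines)})
    for i in range(num_leaves):
        for j in range(num_spines):
            adj[f"leaf{i}"].add(f"spine{j}")
            adj[f"spine{j}"].add(f"leaf{i}")
    return adj

def shortest_paths(adj, src, dst):
    """BFS returning (hop count, number of distinct shortest paths) from src to dst."""
    dist = {src: 0}
    count = {src: 1}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:  # first time reached: record distance
                dist[nxt] = dist[node] + 1
                count[nxt] = count[node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:  # another shortest path into nxt
                count[nxt] += count[node]
    return dist[dst], count[dst]

fabric = leaf_spine(num_leaves=4, num_spines=3)
for a, b in combinations([f"leaf{i}" for i in range(4)], 2):
    print(a, b, shortest_paths(fabric, a, b))  # (2, 3) for every pair
```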
Flat Networks Allow for New Infrastructure Models
[Figure: infrastructure models on a flat network: racks of converged storage/compute hosts, and racks with separated storage nodes alongside compute hosts]
Storage Platform
Use Local Disk where it’s Needed
SAN storage: $2 to $10/GB; $1M buys 0.5 PB; 200,000 IOPS; 8 GB/s
NAS filers: $1 to $5/GB; $1M buys 1 PB; 200,000 IOPS; 10 GB/s
Local storage: $0.05/GB; $1M buys 10 PB; 400,000 IOPS; 250 GB/s
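The "$1M buys" figures can be checked against the quoted prices by dividing the budget by the price per GB. The sketch below uses the lowest price in each range and decimal units. SAN and NAS reproduce the slide's 0.5 PB and 1 PB; for local storage, $0.05/GB gives 20 PB of raw capacity, twice the slide's 10 PB, and the slide does not explain the difference.

```python
BUDGET_USD = 1_000_000

# Lowest $/GB quoted on the slide for each storage type.
PRICE_PER_GB = {"SAN": 2.00, "NAS": 1.00, "Local": 0.05}

def petabytes_for(budget_usd, price_per_gb):
    """Raw capacity a budget buys at a given price, in decimal petabytes."""
    gigabytes = budget_usd / price_per_gb
    return gigabytes / 1_000_000  # 1 PB = 1,000,000 GB

for kind, price in PRICE_PER_GB.items():
    print(f"{kind}: {petabytes_for(BUDGET_USD, price):.1f} PB")
# SAN: 0.5 PB
# NAS: 1.0 PB
# Local: 20.0 PB
```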
Storage Economics: Traditional vs. Scale-out
[Chart: cost per GB ($0 to $5.50) vs. petabytes deployed (0.5 to 128) for Traditional SAN/NAS, Distributed Object Storage (HDFS, MapR, Ceph, Gluster, Scality), and Scale-out NAS (Isilon)]
Big Data Storage Architectures
• External SAN with HDFS
• Local disks with HDFS or other
• External scale-out NAS
File systems: HDFS, Ceph, MapR, Gluster, Scality, …
Features from New Storage Solutions
• Snapshots
• Clones
• Erasure coding
• NFS access
• Universal file store
• Geo-replication
• POSIX support
• SSD capability
• QoS controls
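Erasure coding matters for storage economics because it cuts the raw capacity needed for the same usable data. The comparison below sets three-way replication (the HDFS default replication factor) against a 6+3 Reed-Solomon layout, chosen here only as an example scheme:

```python
def raw_capacity_replication(usable_pb, copies):
    """Raw capacity needed when every block is stored `copies` times."""
    return usable_pb * copies

def raw_capacity_erasure(usable_pb, data_units, parity_units):
    """Raw capacity for a (data + parity) erasure code such as Reed-Solomon 6+3."""
    return usable_pb * (data_units + parity_units) / data_units

usable = 1.0  # PB of user data
print(raw_capacity_replication(usable, 3))  # 3.0 PB; survives losing 2 of 3 copies
print(raw_capacity_erasure(usable, 6, 3))   # 1.5 PB; survives losing any 3 of 9 units
```

The trade-off is that reading or rebuilding erasure-coded data requires fetching several units over the network, which is another reason flat, high-bandwidth networks suit these storage designs.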
Summary
Customers Winning from Consolidated Big Data Platforms
“Dedicated hardware makes no sense”
“Software-defined Datacenter enables rapid deployment multiple tenants and labs”
“Our mixed workloads include Hadoop, Database, ETL and App-servers”
“Any performance penalties are minor”
Management
Network/Security
Storage/Availability
Compute
Cloud Infrastructure is Ready for Big Data – Are you?
Cloud Infrastructure
@richardmcdougll