Technical Aspects of Cloud Computing
Chaitan Baru
San Diego Supercomputer Center
Competitive Advantage Through Cloud Computing
Exec Education Course on Cloud Computing, Nov 15, 2011
News flash…
• “Now Available in the Cloud..”
• What does this mean?
  ▫ For the developers
  ▫ For the users
Outline
• The changing context
• Cloud computing definitions
  ▫ Implementation considerations
• Application case studies
• Future directions
• Course materials:
  ▫ http://clds.sdsc.edu, click on Education
  ▫ Files stored in the SDSC Cloud
The Changing Context
• Rapid growth in data
  ▫ Data-driven business decisions
• Scientific workloads as a predictor of future business workloads
  ▫ Sensor-based systems, remote sensing, genome sequencing
• A point of inflexion
  ▫ Changing software: from RDBMS, to NoSQL, to streaming, to scientific data, …
  ▫ Changing hardware: multi-core, solid-state disk, large memory and new types of memory
  ▫ Changing platforms: wholly-owned (“on premises”) systems vs clouds
  ▫ Changing business costs / models: ultra-high productivity, energy efficiency, rent vs own, …
New sources of increasing data volumes
• “Sensor Networks Top Social Networks for Big Data”, Stacey Higginbotham, Sep. 13, 2010, http://gigaom.com/cloud/sensor-networks-top-social-networks-for-big-data-2/
Gene Sequencing
• ~2TB/experiment in next-generation gene sequencing
  ▫ Sequencing of individuals
  ▫ Multiple runs per individual
  ▫ Multiple sequencing over time
• Managing and Analyzing Next-Generation Sequence Data, Richter BG, Sexton DP (2009). PLoS Comput Biol 5(6): e1000369. doi:10.1371/journal.pcbi.1000369
Remote Sensing
• ~1TB of high-resolution topographic data for San Andreas Fault
  ▫ 10x more for imagery
  ▫ Repeated scans for ecological applications
  ▫ OpenTopography.org
• LaSDI Initiative: Laser Spatial Data Infrastructure
  ▫ Sub-meter to 10cm scale 3D models of earth
Data is the new oil!
• The data ecosystem
  ▫ From acquisition, to transfer, storage, creation of derived products, and exploitation
• Those with data are better off than those without
• Those who can exploit data have the competitive advantage
  ▫ Walmart, FedEx, Wall Street trading, Internet companies (Google, Amazon, Facebook, Twitter, ...)
• And you cannot find oil without data!
  ▫ Oil exploration data growing from tens of TB to PBs over the next few years
Where should all the data reside?
• All in your private systems (private cloud)?
• All in a public cloud?
• Hybrid model: Private + Public?
Cloud Computing: Definition
• Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction
Another turn of the screw in our push towards productivity
Cloud Computing: NIST Definition
• On-demand self-service
  ▫ You get the resource when you ask for it, using APIs
• Broad network access
  ▫ Accessible from anywhere
• Resource pooling
  ▫ Shared resource provisioning
• Rapid elasticity
  ▫ Uniform scale-out
• Measured service
  ▫ Monitoring of usage, reporting of usage
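Three of these characteristics (on-demand self-service, rapid elasticity, measured service) can be illustrated with a toy model. The sketch below is only an illustration: `ResourcePool` and its methods are hypothetical names invented here and do not correspond to any provider's API.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Toy shared pool (hypothetical; not a real provider API)."""
    capacity: int                                   # units shared by all tenants
    allocated: dict = field(default_factory=dict)   # tenant -> units currently held
    usage_log: list = field(default_factory=list)   # (tenant, units, hours) records

    def free(self) -> int:
        """Units not currently held by any tenant."""
        return self.capacity - sum(self.allocated.values())

    def provision(self, tenant: str, units: int) -> bool:
        """On-demand self-service: grant at once if the shared pool has room."""
        if units <= 0 or units > self.free():
            return False
        self.allocated[tenant] = self.allocated.get(tenant, 0) + units
        return True

    def release(self, tenant: str, units: int, hours: float) -> None:
        """Rapid elasticity: hand units back and record how long they were used."""
        held = self.allocated.get(tenant, 0)
        units = min(units, held)
        self.allocated[tenant] = held - units
        self.usage_log.append((tenant, units, hours))

    def unit_hours(self, tenant: str) -> float:
        """Measured service: total unit-hours recorded for one tenant."""
        return sum(u * h for t, u, h in self.usage_log if t == tenant)


pool = ResourcePool(capacity=10)
pool.provision("team-a", 4)
pool.provision("team-b", 5)
pool.release("team-a", 4, hours=2.0)
print(pool.free(), pool.unit_hours("team-a"))
```

Both tenants draw from the same capacity (resource pooling), and the usage log is what a provider would bill or report from.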
Delivery Models
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
• Importance?
  ▫ Implications for what type of programming work needs to be done
  ▫ Who do you have (or want to hire) to work on this?
• http://thoughtsoncloud.com/
Delivery Model: Software as a Service
• Software as a Service (SaaS)
  ▫ The capability provided to the consumer is to use the provider’s applications running on a cloud infrastructure
  ▫ Accessible from various client devices through a thin client interface such as a Web browser (e.g., web-based email)
  ▫ The consumer does not manage or control the underlying cloud infrastructure, with the possible exception of limited user-specific application configuration settings
Software as a Service: Example
• Google Maps API
  ▫ http://code.google.com/apis/maps/
  ▫ Users are provided with simple APIs for maps
  ▫ Uses cloud resources at the back-end
• Facebook
  ▫ http://www.facebook.com
  ▫ Social networking site using a suite of cloud-based tools at the back-end
• Animoto
  ▫ http://www.animoto.com
  ▫ Service that makes videos from user-uploaded images
  ▫ Uses Amazon EC2 at the back-end
Salesforce.com: SaaS
Delivery Models: Platform as a Service
• Platform as a Service (PaaS)
  ▫ The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created applications using programming languages and tools supported by the provider (e.g., Java, Python, .NET)
  ▫ The consumer does not manage or control the underlying cloud infrastructure, network, servers, operating systems, or storage, but has control over the deployed applications and possibly application hosting environment configurations
Platform as a Service: Google AppEngine
*Credit: http://rdn-consulting.com/blog/2009/02/07/exploring-cloud-computing-development/
Delivery Model: Infrastructure as a Service
• Infrastructure as a Service (IaaS)
  ▫ The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources
  ▫ Consumer is able to deploy and run arbitrary software, which can include operating systems and applications
  ▫ The consumer does not manage or control the underlying cloud infrastructure, but has control over operating systems, storage, deployed applications, and possibly select networking components (e.g., firewalls, load balancers)
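The three delivery models differ mainly in what the consumer controls. As a rough summary, the lookup table below transcribes the "has control over" clauses of the three definitions; the layer names are a simplification chosen for this sketch, and the helper functions are hypothetical.

```python
# What the consumer controls under each delivery model, taken from the
# SaaS, PaaS, and IaaS definitions. Layer names are simplified labels.
CONSUMER_CONTROLS = {
    "SaaS": {"user-specific application settings"},
    "PaaS": {"deployed applications", "hosting environment configuration"},
    "IaaS": {
        "operating systems",
        "storage",
        "deployed applications",
        "select networking components",
    },
}


def consumer_controls(model: str, layer: str) -> bool:
    """True if the consumer controls `layer` under the given delivery model."""
    return layer in CONSUMER_CONTROLS[model]


def models_allowing(layer: str) -> list:
    """Delivery models under which the consumer controls `layer`."""
    return sorted(m for m, layers in CONSUMER_CONTROLS.items() if layer in layers)


print(models_allowing("deployed applications"))
print(consumer_controls("SaaS", "storage"))
```

Read this way, the choice of model also indicates the skills needed in-house: the more layers the consumer controls, the more systems and operations work falls on the consumer's own staff.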
Infrastructure as a Service: Amazon Web Services (AWS)
• Amazon Elastic Compute Cloud (EC2)
  ▫ A web service that provides resizable compute capacity in the cloud
  ▫ Configure an Amazon Machine Image (AMI) and load it into the Amazon EC2 service
  ▫ Quickly scale capacity, both up and down, as your computing requirements change
• Amazon Simple Storage Service (S3)
  ▫ A simple web services interface that can be used to store and retrieve large amounts of data, at any time, from anywhere on the web
  ▫ It gives any developer access to the same highly scalable, reliable, fast, inexpensive data storage infrastructure that Amazon uses to run its own global network of web sites
AWS, aws.amazon.com
Microsoft Azure, www.microsoft.com/windowsazure/
OpenStack, www.openstack.org
• Receiving attention, e.g., Cisco support for OpenStack
• http://www.slideshare.net/CiscoSP360/velocity-2011-cisco-and-open-stack
Eucalyptus
• Target market
  ▫ On-premises (private) IaaS
  ▫ Use existing infrastructure to create AWS-compatible cloud
• Products:
  ▫ Eucalyptus IaaS
  ▫ Eucalyptus OpenSource
  ▫ Eucalyptus RightScale
Eucalyptus IaaS
Nirvanix
• CloudComplete
  ▫ Can vary among private, hybrid, public cloud implementations, using Nirvanix’s public cloud
Virtualization and Cloud Computing
• Virtualization is the ability to run “virtual machines” on top of a “hypervisor.”
  ▫ A virtual machine (VM) is a software implementation of a machine (i.e., a computer) that executes programs like a physical machine.
  ▫ Each VM includes its own kernel, operating system, supporting libraries and applications.
  ▫ A hypervisor provides a uniform abstraction of the underlying physical machine. Multiple VMs can execute simultaneously on a single hypervisor.
  ▫ The decoupling of the VM from the underlying physical hardware allows the same VM to be started on different physical machines.
• Virtualization is an enabler for cloud computing
  ▫ Gives the cloud computing provider the flexibility to move and allocate the computing resources requested by the user wherever the physical resources are available.
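The last point, that a provider can start a VM wherever physical resources are available, can be sketched as a placement routine. This is a minimal first-fit illustration with invented names (`Host`, `place`); real cloud schedulers weigh many more factors, such as memory, network, and affinity.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A physical machine running a hypervisor (hypothetical model)."""
    name: str
    cpus: int
    vms: list = field(default_factory=list)   # (vm_name, cpus) pairs

    def free_cpus(self) -> int:
        return self.cpus - sum(c for _, c in self.vms)


def place(vm_name: str, cpus: int, hosts: list):
    """First-fit placement: start the VM on the first host with enough free CPUs."""
    for host in hosts:
        if host.free_cpus() >= cpus:
            host.vms.append((vm_name, cpus))
            return host.name
    return None   # no physical capacity left anywhere


hosts = [Host("host-1", 8), Host("host-2", 8)]
placements = [place(f"vm-{i}", 3, hosts) for i in range(5)]
print(placements)
```

Because each VM carries its own operating system and libraries, the user does not need to know or care which host was chosen.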
SNIA CDMI
• Cloud Data Management Interface
• Standardizing at the IaaS level
Some Take Home Lessons
• Cloud providers are providing you a service, not just a product
  ▫ Product model: sell product, support product
  ▫ Service model: provide service, become intimately exposed to all aspects of the service that the customer sees (seeing this from a customer’s viewpoint)
Cloud Computing Costs*
* Source: McKinsey & Co
Cloud Computing: The Rationale
• Flatten out the peaks and valleys of utilization to get higher overall utilization of entire infrastructure
• Bring together workloads with different valley / peak behaviors
• But… is running a high utilization operation the same as running a low utilization operation?
• Velocity 2010: Datacenter Infrastructure Innovation, James Hamilton, VP & Distinguished Engineer, Amazon AWS, http://www.youtube.com/watch?v=kHW-ayt_Urk
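A small worked example may help show why combining workloads with different peak times raises utilization. The demand figures below are invented for illustration, and the case is deliberately idealized: the two workloads are exact complements, which real workloads rarely are.

```python
# Hypothetical demand (in servers) over six time slots for two workloads
# whose peaks fall at opposite times.
daytime = [2, 2, 8, 8, 8, 2]
nightly = [8, 8, 2, 2, 2, 8]

combined = [a + b for a, b in zip(daytime, nightly)]
slots = len(combined)

# Separate infrastructure: each workload is provisioned for its own peak.
separate_capacity = max(daytime) + max(nightly)

# Shared infrastructure: provision once, for the peak of the combined demand.
pooled_capacity = max(combined)

separate_util = sum(combined) / (separate_capacity * slots)
pooled_util = sum(combined) / (pooled_capacity * slots)

print(f"separate: {separate_capacity} servers, {separate_util:.1%} utilized")
print(f"pooled:   {pooled_capacity} servers, {pooled_util:.1%} utilized")
```

The same total work is done on fewer servers once the peaks of one workload fill the valleys of the other.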
Dealing with Peaks
• Old approach:
  ▫ Provision for peak workload. Low utilization at other times
• HPC approach:
  ▫ Build a machine for a certain max job size. Provide job queue and “on-demand”, pre-emptible access at other times.
• Cloud approach:
  ▫ Charge different rates for use at different times, based on usage
  ▫ E.g. Amazon Spot Instance
• Typical server utilization: 10-15%
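The cloud approach of charging different rates at different times can be sketched as follows. The pricing rule here is an assumption made up for illustration (price rising linearly with pool utilization); it is not how Amazon sets Spot Instance prices. The point is only that a deferrable job can shift to cheap, low-utilization hours, which fills the valleys.

```python
def hourly_price(utilization: float, base: float = 0.10, surge: float = 2.0) -> float:
    """Hypothetical usage-based rate: price rises linearly with pool utilization."""
    return base * (1.0 + surge * utilization)


def cheapest_hours(prices: list, hours_needed: int) -> list:
    """A deferrable job picks the lowest-priced hours, returned in time order."""
    ranked = sorted(range(len(prices)), key=lambda h: prices[h])
    return sorted(ranked[:hours_needed])


# Hypothetical utilization of the shared pool over six hours (0.0 to 1.0).
utilization_by_hour = [0.9, 0.8, 0.3, 0.1, 0.2, 0.7]
prices = [hourly_price(u) for u in utilization_by_hour]

chosen = cheapest_hours(prices, hours_needed=3)
print(chosen)
```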
Some Application Case Studies from XLDB
• XLDB11: 5th Extremely Large Databases Workshop, Oct 18-19, SLAC, Palo Alto, CA
• “State of practice” workshop
• Presentations on current system implementation and challenges, and needs and requirements
  ▫ E.g. presentations from: Facebook, LinkedIn, eBay, Google, Netflix, Novartis, Quora, Metamarkets, Microsoft, …
• http://www-conf.slac.stanford.edu/xldb2011/Program.asp
Quora.com
• Scaling up Quickly in the Cloud, Edmond Lau, Quora, XLDB Workshop
Metamarkets, Michael Driscoll, co-founder, CTO
Metamarkets: Performance at scale
• Evaluating the online ad market
• Billions of microtransactions per day
• Requires billion-rows-per-second performance
• Fast analytics over hundreds of terabytes
• Metamarkets’ Druid system
  ▫ Partial aggregates + In-memory data + Indexes
  ▫ Distributed data + Parallelizable Queries = Horizontal Scalability
  ▫ Real-time analytics
  ▫ Implemented in the cloud (AWS)
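The idea of partial aggregates over distributed data can be sketched in a few lines. This is a generic illustration of the technique and not Druid's implementation: the data, the partitioning, and the function names are all invented, and a thread pool stands in for separate nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ad events (publisher, impressions), already split across
# three partitions that stand in for three nodes.
partitions = [
    [("pub-a", 3), ("pub-b", 1)],
    [("pub-a", 2), ("pub-c", 5)],
    [("pub-b", 4), ("pub-c", 1)],
]


def partial_aggregate(rows) -> Counter:
    """Each node sums impressions per publisher over its own rows only."""
    totals = Counter()
    for publisher, impressions in rows:
        totals[publisher] += impressions
    return totals


def query_totals(parts) -> dict:
    """Run the partial aggregates in parallel, then merge the small results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_aggregate, parts))
    merged = Counter()
    for partial in partials:
        merged.update(partial)   # Counter.update adds counts together
    return dict(merged)


totals = query_totals(partitions)
print(totals)
```

Since each partition is aggregated independently and only the small partial results are merged, adding nodes and partitions lets the same query cover more data, which is the horizontal scalability named above.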
Cloud Data Analytics, Roger Barga, Microsoft Azure
Netflix
• Presentation by Eric Colson, VP, Netflix
• All Netflix processing (DVD rentals and streaming video) is in the cloud (AWS)
• Total cost of Netflix implementation may be higher than an in-house solution, but
  ▫ Netflix made a business strategy decision. They are not in the business of running IT infrastructure
  ▫ Cloud computing required them to build a distributed IT team, which did not match their culture of building close teams.
Application Examples: Bioinformatics
• Crossbow
  ▫ Genotyping from short reads using cloud computing
  ▫ http://bowtie-bio.sourceforge.net/crossbow/index.shtml
• SDSC Project to implement Hadoop-based processing for next generation sequencing on SDSC’s HPC systems as well as clouds (AWS)
Role of Big Data
• What is the connection between cloud and big data?
• Cloud → Scaling → Lots of data
• Big data → Lots of data
• Hadoop
  ▫ A software (eco)system for efficient processing of very large data
  ▫ Uses MapReduce, which has become a convenient programming model for low-entry-barrier, very large-scale data processing
• Could use cloud computing resources to implement Hadoop
  ▫ $1M Question: data movement and data locality
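For readers who have not seen MapReduce, the following is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to word counting. It does not use Hadoop or its API; the function names are chosen for illustration. In Hadoop the map and reduce steps run on many machines, ideally on the nodes that already hold the data, which is where the data locality question comes from.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs) -> dict:
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents: list) -> dict:
    pairs = [kv for doc in documents for kv in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data in the cloud", "the cloud scales", "big cloud"])
print(counts)
```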
Discussion of Survey
• Results from “Cloud Storage: Adoption, Practice and Deployment”, survey conducted for the Storage Networking Industry Association (SNIA)
Research and Markets Survey
Cloud Computing in HPC: Rationale for Adoption
• Top reason: Access to extra resources to meet peak system load requirements
• Cost Avoidance
  ▫ Continued demand for HPC compute cycles. Cloud computing could deliver low-cost computing cycles.
• Capacity Management
  ▫ Deal with periodic demand peaks and better management of data center growth, power, and cooling issues.
• Collaboration
  ▫ Integration of internet-based applications and communications may allow HPC users to better work with those both inside and outside of their organizations.
• Evaluation of Cloud Systems
  ▫ Looking at the cloud system alternative to determine if and how they can make use of the technology and concepts.
• Organizational Requirement
  ▫ Making sure that the competition does not “steal a march” with a new technology.
Monitoring and Benchmarking
• Monitoring of resource usage is essential for cloud environments
• What about SLAs and QoS?
• “Cloud computing means get your legal teams lined up”!
• Resource monitoring is a well-recognized need
  ▫ But need to ensure the right level of monitoring and reporting is available
• Benchmarking is a new frontier
Amazon CloudWatch
http://aws.amazon.com/cloudwatch/
Azurescope
See: http://azurescope.cloudapp.net/BenchmarkTestCases
Azurescope application “probe”
Azurescope write “probe”
Future Directions: Benchmarking
• The need for Big Data and Cloud benchmarks
• The changing software landscape
  ▫ From RDBMS and Data Warehousing to NoSQL, Hadoop, Unstructured data, Stream Processing, Graph Processing, …
• The changing hardware landscape
  ▫ Multi-core, SSD, new types of memory, large memory, different networking options, commodity vs high-end, …
• Multiple platform choices
  ▫ Dedicated data platforms
  ▫ Cloud platforms
Benchmarking Issues
• “Reference benchmarks” for big data (TPC style)
  ▫ Define modalities of big data
  ▫ Define end-to-end flows of big data
  ▫ Identify key real-world characteristics, e.g. multi-rack, heterogeneous hardware
  ▫ Identify which existing benchmarks can be reused
• “Probe” benchmarks for clouds
  ▫ Cloud performance can be variable
  ▫ “Application-level” performance probes
  ▫ The Cloud Weather Service™
• Benchmarking workshop being planned for Feb 2012 timeframe
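An application-level probe can be as simple as timing a representative operation repeatedly and reporting the spread, since variability is the quantity of interest. The sketch below is one possible shape for such a probe, not the design of Azurescope or any existing tool; `probe` and `write_probe` are names invented here, and the local temporary-file write merely stands in for a real cloud storage call.

```python
import statistics
import tempfile
import time


def probe(operation, repeats: int = 20) -> dict:
    """Time an application-level operation several times and summarize the spread."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
        "stdev_s": statistics.stdev(samples),   # needs at least two samples
    }


def write_probe() -> None:
    """Stand-in operation: write 1 MB to a temporary file and read it back."""
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 1_000_000)
        f.seek(0)
        f.read()


report = probe(write_probe, repeats=5)
print(report)
```

Run at regular intervals against a cloud service, a probe like this would yield the kind of time series a "weather service" for cloud performance could publish.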
Future Directions: “Vertical” Clouds
• Clouds that are aimed at major markets
  ▫ Should have something unique about them, and a sizable market
• E.g.
  ▫ Collocated clouds for performance, e.g. Wall Street trading systems, online advertisement systems, etc.
  ▫ Collocated clouds for security / privacy