hitchhiker's guide to open source cloud computing

49
Hitchhiker's Guide to Open Source Cloud Computing By Mark R. Hinkle Senior Director, Cloud Computing Community Citrix Inc. Twitter: @mrhinkle Blog: www.socializedsoftware.com Email: [email protected]

Upload: mark-hinkle

Post on 11-May-2015

5.534 views

Category:

Technology


1 download

DESCRIPTION

Imagine it’s eight o’clock on a Thursday morning and you awake to see a bulldozer out your window ready to plow over your data center. Normally you may wish to consult the Encyclopedia Galáctica to discern the best course of action but your copy is likely out of date. And while the Hitchhiker’s Guide to the Galaxy (HHGTTG) is a wholly remarkable book it doesn’t cover the nuances of cloud computing. That’s why you need the Hitchhiker’s Guide to Cloud Computing (HHGTCC) or at least to attend this talk understand the state of open source cloud computing. Specifically this talk will cover infrastructure-as-a-service, platform-as-a-service and developments in big data and how to more effectively take advantage of these technologies using open source software. Technologies that will be covered in this talk include Apache CloudStack, Chef, CloudFoundry, NoSQL, OpenStack, Puppet and many more. Specific topics for discussion will include: Infrastructure-as-a-Service - The Systems Cloud - Get a comparision of the open source cloud platforms including OpenStack, Apache CloudStack, Eucalyptus, OpenNebula Platform-as-a-Service - The Developers Cloud - Find out what tools are availble to build portable auto-scaling applications including CloudFoundry, OpenShift, Stackato and more. Data-as-a-Service - The Analytics Cloud - Want to figure out the who, what , where , when and why of big data ? You get an overview of open source NoSQL databases and technologies like MapReduce to help crunch massive data sets in the cloud. Finally you'll get a overview of the tools that can help you really take advantage of the cloud? Want to auto-scale virtual machiens to serve millions of web pages or want to automate the configuration of cloud computing environments. You'll learn how to combine these tools to provide continous deployment systems that will help you earn DevOps cred in any data center. [Finally, for those of you that are Douglas Adams fans please accept the deepest apologies for bad analogies to the HHGTTG.]

TRANSCRIPT

Page 1: Hitchhiker's Guide to Open Source Cloud Computing

Hitchhiker's Guide to Open Source Cloud Computing

By Mark R. HinkleSenior Director, Cloud Computing CommunityCitrix Inc. Twitter: @mrhinkleBlog: www.socializedsoftware.comEmail: [email protected]

Page 2: Hitchhiker's Guide to Open Source Cloud Computing

%whoami

• Dedicated to the success of the Apache CloudStack & Xen Open Source Cloud Communities at Citrix

• Conduct Build A Cloud learning activities all over the world• Joined Citrix via Cloud.com acquisition July 2011• Zenoss Open Source project to 100,000 users, 1.5 million

downloads• Former Linux Desktop Advocate (Zealot?)• Former LinuxWorld Magazine Editor-in-Chief• Open Management Consortium organizer• Author - “Windows to Linux Business Desktop Migration” –

Thomson• NetDirector Project - Open Source Configuration

Management Project• Sometimes Author and Blogger at SocializedSoftware.com• NetworkWorld Open Source Subnet

Page 3: Hitchhiker's Guide to Open Source Cloud Computing

Why Open Source and the Cloud?

• User-Driven Context from Solving Real Problems• Lower Barrier to Participation• Larger user base, users helping users • Aggressive release cycles stay current with the state-of-

the-art• Open Source innovating faster than commercial• Open data, Open standards, Open APIs

Page 4: Hitchhiker's Guide to Open Source Cloud Computing

Quick Cloud Computing Overview or the Obligatory “What is the Cloud Explanation”

Infinite Probability Drive

Page 5: Hitchhiker's Guide to Open Source Cloud Computing

Five Characteristics of Cloud

1. On-Demand Self-Service

2. Broad Network Access

3. Resource Pooling

4. Rapid Elasticity

5. Measured Service

Page 6: Hitchhiker's Guide to Open Source Cloud Computing

Cloud Computing Service Models

USER CLOUD a.k.a. SOFTWARE AS A SERVICE

Single application, multi-tenancy, network-based, one-to-many delivery of applications, all users have same access to features.

Examples: Salesforce.com, Google Docs, Red Hat Network/RHEL

DEVELOPMENT CLOUD a.k.a. PLATFORM-AS-A-SERVICE

Application developer model, Application deployed to an elastic service that autoscales, low administrative overhead. No concept of virtual machines or operating system. Code it and deploy it.

Examples: VMware CloudFoundry, Google AppEngine, Windows Azure, Rackspace Sites, Red Hat OpenShift, Active State Stackato, Appfog

SYSTEMS CLOUD a.k.a INFRASTRUCTURE-AS-A-SERVICE

Servers and storage are made available in a scalable way over a network.

Examples: EC2,Rackspace CloudFiles, OpenStack, CloudStack, Eucalyptus, OpenNebula

SaaS

PaaS

IaaS

Page 7: Hitchhiker's Guide to Open Source Cloud Computing

Deployment Models: Public, Private & Hybrid

Page 8: Hitchhiker's Guide to Open Source Cloud Computing

Building Open Source Clouds

Page 9: Hitchhiker's Guide to Open Source Cloud Computing

Cloud Architecture

Page 10: Hitchhiker's Guide to Open Source Cloud Computing

Hypervisors

Open Source• Xen, Xen Cloud Platform (XCP)• KVM – Kernel-based Virtualization• VirtualBox* - Oracle supported Virtualization Solutions • OpenVZ* - Container-based, Similar to Solaris Containers or BSD Zones• LXC – User Space chrooted installs

Proprietary• VMware• Citrix Xenserver (based • Microsoft Hyper-V• OracleVM (Based on OS Xen)

Page 11: Hitchhiker's Guide to Open Source Cloud Computing

Open Virtual Machine Formats

Open Virtualization Format (OVF) is an open standard for packaging and distributing virtual appliances or more generally software to be run in virtual machines.

Formats for hypervisors/cloud technologies:

• Amazon - AMI• KVM – QCOW2• VMware – VMDK• Xen – IMG• VHD – Virtual Hard Disk - Hyper-V

Page 12: Hitchhiker's Guide to Open Source Cloud Computing

Sourcing Cloud Appliances Tool/Project What you can do with them

Bitnami BitNami provides free, ready to run environments for your favorite open source web applications and frameworks, including Drupal, Joomla!, Wordpress, PHP, Rails, Django and many more.

Boxgrinder BoxGrinder is a set of projects that help you grind out appliances for multiple virtualization and Cloud providers

Oz Command-line tool that has the ability to create images for common Linux distributions to run on KVM

SUSE Studio SUSE Studio supports building and deploying directly to cloud services such as Amazon EC2.

Page 13: Hitchhiker's Guide to Open Source Cloud Computing

Scale-Up or Scale-Out

Vertical Scaling (Scale-Up) – Allocate additional resources to VMs, requires a reboot, no need for distributed app logic, single-point of OS failure

Horizontal Scaling (Scale-Out) –

Application needs logic to work in distributed fashion (e.g. HA-Proxy and Apache, Hadoop)

Page 14: Hitchhiker's Guide to Open Source Cloud Computing

Compute Clouds (IaaS)

Year Started License Virtualization Technologies

Apache CloudStack

2008 Apache Xenserver, Xen Cloud Platform, KVM, VMware (Hyper-V developing)

Eucalyptus 2006 GPL Xen, KVM, VMware (commercial version)

OpenNebula 2005 Apache Xen, KVM, VMware

OpenStack 2010 (Developed by NASA by Anso Labs previously)

Apache VMware ESX and ESXi, , Xen, Xen Cloud Platform KVM, LXC, QEMU and Virtual Box

Numerous companies are building cloud software on OpenStack including Nebula, Piston Inc., CloudScaling

Page 15: Hitchhiker's Guide to Open Source Cloud Computing

OpenStack – Ecosystem of Projects

Enterprise Message Queue based on Rabbit MQ (ESB)

Object Storage“Swift”

Image Service“Glance

Compute“Nova”

Dashboard “Horizon”

KVM, VMware, Xen Cloud PlatformCeph, Gluster

Advanced Cloud and Networking services accessing the Quantum API

Firewall Service

Gateway Service

Quantum

Netw

orking Fabric REST API

PluginsOpenvSwitch

QuantumPlugin-insId

entit

y Se

rvic

es “

Keys

tone

” API

20 Collective projects hosted at:

https://launchpad.net/openstack

Page 16: Hitchhiker's Guide to Open Source Cloud Computing

Cloud APIs

• jclouds• libcloud• deltacloud• fog

Page 17: Hitchhiker's Guide to Open Source Cloud Computing

Cloud Computing Storage

Project Description

GlusterFS Scale Out NAS system aggregating storage over Ethernet or Infiniband

Ceph Distributed file storage system developed by DreamHost

OpenStack Storage Long-term object storage system

Sheepdog Distributed storage for KVM hypervisors

Page 18: Hitchhiker's Guide to Open Source Cloud Computing

Platform-as-a-Service (PaaS)Project Year Started Sponsors Languages/Frameworks

CloudFoundry 2011 VMware Spring for Java, Ruby for Rails and Sinatra, node.js, Grails, Scala on Lift and more via partners (e.g. Python, PHP)

Cloudify 2012 Gigaspaces

OpenShift ** 2011 Red Hat Java, Ruby, PHP, Perl and Python

Stackato* 2012 ActiveState Java, Python, PHP, Ruby, Perl, Node.js, others

WSO2 Stratus 2010 WSO2 Jboss, Java EE6

Page 19: Hitchhiker's Guide to Open Source Cloud Computing

Software Defined Networking (SDN)

Page 20: Hitchhiker's Guide to Open Source Cloud Computing

Overview of Software Defined Networking

Business Applications

Network Services

SDN Control Software

API API

Network DevicesNetwork DevicesNetwork Devices

Network DevicesNetwork DevicesNetwork Devices

ApplicationLayer

Control Layer

InfrastructureLayer

Control Data Plane Interface (e.g. OpenFlow)

Page 21: Hitchhiker's Guide to Open Source Cloud Computing

Why SDN?

Cloud Promise Cloud Reality

Centralized Configuration and Automation

Without true virtualization, network devices must still be manually configured.

Instant Self-Service Provisioning

In a physical network, it could take a long time for network engineer to provision new services.

Elasticity and Scalability By horizontally scaling up the physical network, elasticity is lost.

Designed for Failure Failover can be automated and physical network limitations can be alleviated.

Source: Midokura

Page 22: Hitchhiker's Guide to Open Source Cloud Computing

Open Flow

OpenFlow enables networks to evolve, by giving a remote controller the power to modify the behavior of network devices, through a well-defined "forwarding instruction set". The growing OpenFlow ecosystem now includes routers, switches, virtual switches, and access points from a range of vendors.

Image from http://www.openflow.org/documents/openflow-wp-latest.pdf

Page 23: Hitchhiker's Guide to Open Source Cloud Computing

Software Defined NetworkingProject Description

Floodlight The Floodlight controller is an enterprise-class, Apache-licensed, Java-based OpenFlow Controller.

Indigo Indigo is an open source project to support OpenFlow on a range of physical switches. By leveraging hardware features of Ethernet switch ASICs, Indigo supports high rates for high port counts, up to 48 10-gigabit ports. Multiple gigabit platforms with 10-gigabit uplinks are also supported.

OpenStack Networking“Quantum”

Pluggable, scalable, API-driven network and IP management

Open vSwitch Open vSwitch is a open source (ASL 2.0), multilayer virtual switch designed to enable massive network automation through programmatic extension, while still supporting standard management interfaces and protocols (e.g. NetFlow, sFlow, SPAN, RSPAN, CLI, LACP, 802.1ag).

Page 24: Hitchhiker's Guide to Open Source Cloud Computing

Big Data

Deep Thought

Page 25: Hitchhiker's Guide to Open Source Cloud Computing

1 Billion Facebook Users - October 2012

Dec-04

Mar-05

Jun-05

Sep-05

Dec-05

Mar-06

Jun-06

Sep-06

Dec-06

Mar-07

Jun-07

Sep-07

Dec-07

Mar-08

Jun-08

Sep-08

Dec-08

Mar-09

Jun-09

Sep-09

Dec-09

Mar-10

Jun-10

Sep-10

Dec-10

Mar-11

Jun-11

Sep-11

Dec-11

Mar-12

Jun-12

Sep-12

0

200

400

600

800

1000

1200

Faceb

ook U

sers

in

M

illion

s

Source: Benphoster.com

Page 26: Hitchhiker's Guide to Open Source Cloud Computing

Twitter at 400M Tweets Per Day – June 2012

Jan-07

Mar-07

May-07

Jul-07

Sep-07

Nov-07Jan

-08

Mar-08

May-08

Jul-08

Sep-08

Nov-08Jan

-09

Mar-09

May-09

Jul-09

Sep-09

Nov-09Jan

-10

Mar-10

May-10

Jul-10

Sep-10

Nov-10Jan

-11

Mar-11

May-11

Jul-11

Sep-11

Nov-11Jan

-12

Mar-12

May-12

0

50

100

150

200

250

300

350

400

450

Tweets

in

Million

s

Source :TheBigDataGroup.com

Page 27: Hitchhiker's Guide to Open Source Cloud Computing

Data is growing faster than storage capacity and computing power. Legacy systems hold organizations back; storage software must include multi-petabyte capacity, support potentially billions of objects, and provide application performance awareness and agile provisioning.

-Gartner, Big Data Challenges for the IT Infrastructure Team

Big Data and Storage Infrastructure

Page 28: Hitchhiker's Guide to Open Source Cloud Computing

Big Data LandscapeSource

: Big

Data

Gro

up.co

m

Page 29: Hitchhiker's Guide to Open Source Cloud Computing

Open Source NoSQL DatabasesName Type DescriptionApache Cassandra

Wide Column Store/Families

API: many » Query Method: MapReduce, Replicaton: , Written in: Java, Concurrency: eventually consistent , Misc: like "Big-Table on Amazon Dynamo alike", initiated by Facebook

CouchDB Document Store API: Memcached API+protocol (binary and ASCII) , most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, Misc: Transparent topology changes during operation, provides memcached-compatible caching buckets

HBase Wide Column Store/Families

API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java

Hypertable Wide Column Store/Families

PI: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent Misc: High performance C++ implementation of Google's Bigtable.

MongoDB Document Store API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++,Concurrency

Redis Key Value/ Tuple Store

API: Tons of languages, Written in: C, Concurrency: in memory and saves asynchronous disk after a defined time. Append only mode available. Different kinds of fsync policies. Replication: Master / Slave, Misc: also lists, sets, sorted sets, hashes, queues.

Riak Key Value / Tuple Store

API: JSON, Protocol: REST, Query Method: MapReduce term matching , Scaling: Multiple Masters; Written in: Erlang, Concurrency: eventually consistent (stronger then MVCC via Vector Clocks)

Page 30: Hitchhiker's Guide to Open Source Cloud Computing

MapReduce

Problem Data

Master Node

WorkerNode 1

Worker Node 2

Worker Node 3

Solution Data

Map

Reduce

Page 31: Hitchhiker's Guide to Open Source Cloud Computing

Apache Hadoop

Overview• Handles large amounts of data• Stores data in native format• Delivers linear scalability at low cost• Resilient in case of infrastructure failures• Transparent application scalability

Facts • Apache top-level open source project• One framework for storage and compute

– HDFS – Scalable storage in Hadoop Distributed File System (HDFS)– Compute via the MapReduce distributed processing platform

• Domain Specific Language (DSL) - Java

Page 32: Hitchhiker's Guide to Open Source Cloud Computing

Hadoop Architecture

Hadoop Common

HDFSDistributes & replicates data

across machines

MapReduceDistributes & monitors tasks

Hive Data warehouse that

provides SQL interface. Ad hoc projection of

data structure to unstructured

MapReduce

• Parallel programming

• Handles large data blocks

Non-Relational DB

HBase Column-oriented

schema-less distributed DB modeled after Google’s BigTableRandom real time

read/write.

Scripting

PigPlatform for

manipulating and analyzing large data sets.

Scripting language for analysts.

Mahout Machine learning

libraries for recommendations ,

clustering, classifications and item sets.

Machine Learning

Chuc

kwa

Zook

eepe

r

Managem

en

t

Page 33: Hitchhiker's Guide to Open Source Cloud Computing

Big Data Summary• Quantity of Machine Created Data Increasing Drastically (examples: networked

sensor data from mobile phones and GPS devices)• Data manipulation moving from batched to real-time• Cloud services giving everyone Big Data tools • Consumer company speed and scale requirements driving efficiencies in Big Data

storage and analytics• New and broader number of data sources being meshed together• Big Data Apps means using Big Data is faster and easier

Page 34: Hitchhiker's Guide to Open Source Cloud Computing

Cloud Management Tools

Page 35: Hitchhiker's Guide to Open Source Cloud Computing

Automation in the Cloud

Meat Cloud Cloud Operations

Page 36: Hitchhiker's Guide to Open Source Cloud Computing

4 Types of Management Tools

ProvisioningInstallation of operating systems and other software

Configuration ManagementSets the parameters for servers, can specify installation parameters

Orchestration/AutomationAutomate tasks across systems

MonitoringRecords errors and health of IT infrastructure

Page 37: Hitchhiker's Guide to Open Source Cloud Computing

Management Toolchains

Configuration

Patching and

Provisioning

Monitoring

Toolchain (n):

A set of tools where the output of one tool becomes the input of another tool

Page 38: Hitchhiker's Guide to Open Source Cloud Computing

ProvisioningProject Installation Targets

Axemblr Provisionr Can provision 10s to 1000s of machines on various clouds.

Cobbler Distributed virtual infrastructure using koan (kickstart of a network to PXE boot VMs) for Red Hat, OpenSUSE Fedora, Debian, Ubuntu VMs

JuJu Public Clouds - Amazon Web Services HP Cloud, Private OpenStack clouds, Bare Metal via MAAS.

Salt Cloud Tool to provision “salted” VMs that can then be updated by a central server via ZeroMQ

Crowbar (Bare metal provisioning)

Page 39: Hitchhiker's Guide to Open Source Cloud Computing

Configuration Management Tools

Project Year Started Language License Client/Server

Cfengine 1993 C Apache Yes

Chef 2009 Ruby Apache Chef Solo – No Chef Server - Yes

Puppet 2004 Ruby GPL Yes & standalone

Salt 2011 Python Apache yes

Page 40: Hitchhiker's Guide to Open Source Cloud Computing

Automation/Orchestration ToolsProject Description

Capistrano Utility and framework for executing commands in parallel on multiple remote machines, via SSH. It uses a simple DSL that allows you to define tasks, which may be applied to machines in certain roles

RunDeck Rundeck is an open-source process automation and command orchestration tool with a web console.

Func Func provides a two-way authenticated system for generically executing tasks, integrations with puppet and cobbler.

MCollective The Marionette Collective AKA MCollective is a framework to build server orchestration or parallel job execution systems.

Salt Execute arbitrary shell commands or choose from dozens of pre-built modules of common (or complex) commands.

Scalr Provide scaling across multiple cloud computing platforms, integrates with Chef.

Page 41: Hitchhiker's Guide to Open Source Cloud Computing

Conceptual Automated Toolchain

BootStrapped ImageCloudStackOpenStack

ConfigurationPuppet

Chef

Start/Stop ServicesRunDeck

CapistranoMCollective

ProvisionCobbler

SUSE Stuido

MonitoringNagiosZenoss Cacti

Generate ImagesSUSE StudioBoxGrinder

Page 42: Hitchhiker's Guide to Open Source Cloud Computing

NetFlix Open Source - ToolBag for AWS

http://netflix.github.com

Page 43: Hitchhiker's Guide to Open Source Cloud Computing
Page 44: Hitchhiker's Guide to Open Source Cloud Computing

Goodbye and thanks for all the fish!

Page 45: Hitchhiker's Guide to Open Source Cloud Computing

Questions?

Slides Can be Viewed and Downloaded at:http://www.slideshare.net/socializedsoftware/

Copyright Mark R. Hinkle, available under the CCbySA license some rights reserved. 2012 -2013

Page 46: Hitchhiker's Guide to Open Source Cloud Computing

Contact Me

Professional: [email protected]: [email protected]

Professional: 919.228.8049

Professional: http://www.cloudstack.orgPersonal: http://www.socializedsoftware.comTwitter: @mrhinkle

Mark R. Hinkle

Senior Director,Cloud Computing CommunityCitrix Systems Inc. Open Source Enthusiast

Page 47: Hitchhiker's Guide to Open Source Cloud Computing

Appendix

Page 49: Hitchhiker's Guide to Open Source Cloud Computing

Monitoring Tools

License Type of Monitoring Collection Methods

Cacti / RRDTool GPL Performance SNMP, syslog

Graphite Apache 2.0 Performance Agent

Nagios GPL Availability SNMP,TCP, ICMP, IPMI, syslog

Zabbix GPL Availability/ Performance and more

SNMP, TCP/ICMP, IPMI, Synthetic Transactions

Zenoss GPL Availability, Performance, Event Management

SNMP, ICMP, SSH, syslog, WMI