[email protected]| 20121112
Building a real-world public cloud from the ground up
Vangelis Koukis, [email protected]
Coordinator, okeanos Project
Greek Research and Technology Network (GRNET), DeIC konference 2012
Outline
okeanos?
Rationale
Design – Platform ‐ Features
Unity ‐ Automation
Opensource – Upcoming
What is okeanos?
‘okeanos’ is Greek for ‘ocean’.
Oceans capture, store and deliver energy, oxygen and life around the planet.
Compute: Virtual Machines
Network: Virtual Ethernets
Storage: Virtual Disks
Security: Virtual Firewalls
okeanos service
Goal: Production‐quality IaaS
Beta in Dec, current Alpha: >1600 VMs / >1000 users
Target group: GRNET’s customers
direct: IT depts of connected institutions
indirect: university students, researchers in academia
Users manage resources over
a simple, elegant UI, or
a REST API, for full programmatic control
okeanos features
Compute/Network Service: Cyclades
File Storage Service: Pithos+
Image Service: Plankton
Identity Service: Astakos
Volume Service: Archipelago
How it all started
Need for easy, secure access to GRNET’s datacenters
User friendliness, simplicity
Scalable to the thousands
#VMs, TBs, users (Pithos: 10k)
running within GRNET’s AAI Federation
Resell or build your own?
IaaS cloud provider, vendor, or own infrastructure?
It all depends on your needs
Build on commercial IaaS?
Commercial IaaS
Amazon EC2: not an end-user service
Need to develop custom UI, AAI layers
Vendor lock‐in
Unsuitable for IT depts
• persistent, long‐term servers
• custom networking requirements
GRNET has invested heavily in its core network
> 8000km of dark fiber
Bring vendor into datacenter?
Hypervisor lock‐in
Is a turn-key solution suitable for a public cloud?
Building public clouds is an ongoing process
Manageable by GRNET’s operation
Integrated into the rest of the infrastructure
Scaling to thousands of users
Build on existing know‐how
Gain know-how, build own IaaS, reuse for own services
What about opensource?
OpenStack, Eucalyptus, OpenNebula
Need a mature opensource core to build around
Maturity, production-readiness?
proven in production environments, predictable
Extensibility?
Flexibility?
Upgradeability, maintainability?
okeanos design decisions
Reuse existing components
Build on Google Ganeti
target commodity hardware
release to the community as opensource
okeanos design principles
No need to make the world
No need to support everything
Service developed and maintained by 10-15 people
Start from the architecture…
…then discover, combine, reuse the right components
And for everything that’s not already available:
Do it yourself!
Jigsaw puzzle
Synnefo
custom cloud management software to power okeanos
Google Ganeti backend
VM cluster management: physical nodes, VMs, migrations
OpenStack APIs: Compute API v1.1, Object Storage API
with custom extensions whenever necessary
Then everything comes together:
UI, Networking, Images, Storage, Monitoring, Identity management, Accounting, Billing, Clients, Helpdesk
Why Ganeti?
No need to reinvent the wheel
Scalable, proven software infrastructure
Built with reliability and redundancy in mind
Combines open components (KVM, LVM, DRBD)
Well‐maintained, readable code
VM cluster management in production is
serious business
reliable VM control, VM migrations, resource allocation
handling node downtime, software upgrades
Why Ganeti?
GRNET already had long experience with Ganeti
provides 280 VMs to NOCs through the ViMa service
involved in development, contributing patches upstream
Build on existing know-how for okeanos
Common backend, common fixes
reuse of experience and operational procedures
simplified, less error‐prone deployment
Software Stack
REST API
Synnefo: multiple users, multiple resources
Ganeti: multiple VMs on a cluster
KVM: single VM
Platform Design
[Figure: user@home and admin@home use a Web Client, CLI Client or Web Client 2 over the OpenStack Compute API v1.1 or a GRNET proprietary API to reach the GRNET datacenter, where the Synnefo cloud management software runs over Google Ganeti on Debian, with KVM providing the virtual hardware and direct out-of-band access]
Virtual Machine Actions
Example VM: My_Windows_desktop
Start, Console, Reboot, Shutdown, Destroy
IaaS – Compute (1)
Virtual Machines
powered by KVM
• Linux and Windows guests, on Debian hosts
Google Ganeti for VM cluster management
accessible by the end-user over the Web or programmatically (OpenStack Compute v1.1)
IaaS – Compute (2)
User has full control over own VMs
Create
• Select # CPUs, RAM, System Disk
• OS selection from pre‐defined or custom Images
• popular Linux distros (Fedora, Debian, Ubuntu)
• Windows Server 2008 R2
Start, Shutdown, Reboot, Destroy
Out‐of‐Band console over VNC for troubleshooting
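The user actions above behave like transitions between VM states. A minimal sketch, assuming three hypothetical states (stopped, running, destroyed); Synnefo's real status names and rules are not given in this talk.

```python
from enum import Enum


class VMState(Enum):
    # Hypothetical states, for illustration only.
    STOPPED = "stopped"
    RUNNING = "running"
    DESTROYED = "destroyed"


# (current state, action) -> resulting state; any pair not listed is rejected.
TRANSITIONS = {
    (VMState.STOPPED, "start"): VMState.RUNNING,
    (VMState.RUNNING, "shutdown"): VMState.STOPPED,
    (VMState.RUNNING, "reboot"): VMState.RUNNING,
    (VMState.STOPPED, "destroy"): VMState.DESTROYED,
    (VMState.RUNNING, "destroy"): VMState.DESTROYED,
}


def apply_action(state: VMState, action: str) -> VMState:
    """Return the state reached by `action`, or raise ValueError if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a VM that is {state.value}") from None
```

Keeping the rules in one table makes it easy for both the UI and the API layer to reject, for example, a reboot of a stopped VM before anything reaches the backend.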
IaaS – Compute (3)
REST API for VM management
OpenStack Compute v1.1 compatible
3rd party tools and client libraries
custom extensions for yet‐unsupported functionality
Python & Django implementation
Full‐featured UI in JS/jQuery
UI is just another API client
All UI operations happen over the API
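Because the UI is just another API client, anything it does can also be scripted. A minimal sketch, assuming the OpenStack Compute v1.1 create-server body and the X-Auth-Token header. The endpoint is the one used in the kamaki example later in this talk. The token, VM name and image/flavor IDs are hypothetical placeholders. The request is only built here, not sent.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/api/v1.1"  # endpoint from the kamaki example
TOKEN = "EXAMPLE-TOKEN"                     # hypothetical placeholder, not a real token


def create_server_request(name: str, image_ref: str, flavor_ref: str) -> urllib.request.Request:
    """Build (but do not send) an OpenStack Compute v1.1 'create server' request."""
    body = {"server": {"name": name, "imageRef": image_ref, "flavorRef": flavor_ref}}
    return urllib.request.Request(
        url=f"{API_URL}/servers",
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Auth-Token": TOKEN, "Content-Type": "application/json"},
        method="POST",
    )


req = create_server_request("my-vm", image_ref="5", flavor_ref="3")
# urllib.request.urlopen(req) would send it; an HTTP 202 Accepted reply means
# the request was queued, not that the VM is already running.
```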
IaaS – Network (Virtual Ethernets)
[Figure: Internet, Private Network 1, Private Network 2, Private Network 3]
IaaS – Network ‐ Functionality
Dual IPv4/IPv6 connectivity for each VM
Easy, platform-provided firewalling
Array of pre‐configured firewall profiles
Or roll‐your‐own firewall inside VM
Multiple private, virtual L2 networks
Construct arbitrary network topologies
e.g., deploy VMs in multi‐tier configurations
Exported all the way to the API and the UI
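Platform-provided firewalling can be pictured as a per-VM profile that decides which inbound connections reach the VM at all. The profile names and port sets below are hypothetical, not the actual okeanos profiles. The "allow-all" profile corresponds to rolling your own firewall inside the VM.

```python
# Hypothetical profiles: profile name -> allowed inbound TCP ports.
# None means "allow every inbound port" (filtering is left to the VM itself).
PROFILES = {
    "allow-all": None,
    "ssh-only": frozenset({22}),
    "web-and-ssh": frozenset({22, 80, 443}),
}


def inbound_allowed(profile: str, tcp_port: int) -> bool:
    """Decide whether a new inbound TCP connection to `tcp_port` is accepted."""
    allowed = PROFILES[profile]
    return allowed is None or tcp_port in allowed
```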
Images
[Figure: Spawn; my own Ubuntu; Freeze]
Custom Images: snf‐image
Untrusted images
Host cannot touch user-provided data
Resize fs, change hostname, change passwords, inject files
Split design
snf-image-host
snf‐image‐helper
All customization in helper VM
OpenStack Object Storage API
Block storage
Content‐based addressing for blocks
Every file is a collection of blocks
Web‐based, command‐line, and native clients
Synchronization, deduplication
An integral part of okeanos: user files, Image registry for VM Images
Goal: use common backend with Archipelago
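Content-based addressing means a block is named by a hash of its contents. A file then reduces to an ordered list of block hashes, and identical blocks are stored only once. A toy sketch, assuming SHA-256 and fixed-size 4 MiB blocks (both are assumptions, not confirmed by this talk). `BlockStore` is a hypothetical class, not the Pithos+ API.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumed block size; the deployed value may differ


class BlockStore:
    """Toy content-addressed store: each distinct block is kept once, keyed by its hash."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}

    def put_file(self, data: bytes) -> list[str]:
        """Split data into blocks, store unseen ones, return the file's hashmap."""
        hashmap = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # identical blocks deduplicate here
            hashmap.append(digest)
        return hashmap

    def get_file(self, hashmap: list[str]) -> bytes:
        """Reassemble a file from its ordered list of block hashes."""
        return b"".join(self.blocks[h] for h in hashmap)
```

A client that already knows a file's hashmap only needs to send the blocks whose hashes the server does not yet hold. This is what makes synchronization and deduplication cheap.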
Images – Storage
[Figure: Clone; Ubuntu + user data; Snapshot]
Images – Golden Image
IaaS – Storage
[Figure: Archipelago storage architecture: Volume, Volume Composer, Maps, object I/O, RADOS monitor nodes, object storage nodes]
IaaS – Storage (1)
First‐phase deployment
System-provided and custom user Images
Redundant storage based on DRBD
VMs survive physical node downtime or failure
Currently under testing:
Reliable distributed storage over RADOS
Combined with custom software for snapshotting, cloning
Dynamic virtual storage volumes
IaaS – Storage (2)
Multi‐tier storage architecture
Dedicated Storage Nodes (SSD, SAS, and SATA storage)
OSDs, e.g., for RADOS
Custom storage layer: Archipelago
manages snapshots, creates clones over block pools
OS Images held as snapshots
VMs created as clones of snapshots
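Holding OS Images as snapshots and creating VMs as clones is cheap when a volume is just a map from block index to stored object. Snapshot and clone copy only the map, and a write allocates a new object private to the writing volume (copy-on-write). The `Volume` class below is a hypothetical illustration of that idea, not Archipelago's interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Volume:
    """A volume as a map from block index to the name of the object holding that block."""
    name: str
    block_map: dict[int, str] = field(default_factory=dict)
    frozen: bool = False  # snapshots are read-only

    def snapshot(self, name: str) -> Volume:
        # Only the map is copied; no block data moves.
        return Volume(name, dict(self.block_map), frozen=True)

    def clone(self, name: str) -> Volume:
        # A clone starts out sharing every block with its source.
        return Volume(name, dict(self.block_map))

    def write(self, index: int, store: dict[str, bytes], data: bytes) -> None:
        if self.frozen:
            raise PermissionError(f"{self.name} is a read-only snapshot")
        obj = f"{self.name}:{index}"  # copy-on-write: a new object owned by this volume
        store[obj] = data
        self.block_map[index] = obj
```

For example, a golden Ubuntu snapshot can be cloned into any number of VM disks. Each clone only pays for the blocks it later overwrites.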
Support services
Identity: Astakos
Provides the user base for okeanos
Once authenticated, the user retrieves a common auth token for programmatic access
./kamaki
$ ./kamaki
Usage: kamaki <group> <command> [options]
…
  --api=API      API can be either openstack or synnefo
  --url=URL      API URL
  --token=TOKEN  use token TOKEN
…
Commands:
  flavor info    get flavor details
  flavor list    list flavors
…
  image create   create image
  image delete   delete image
$ ./kamaki server shutdown 101 --url=http://localhost:8000/api/v1.1 --token=1234527db2…
./kamaki
$ ipython
In [1]: from kamaki.client import Client
In [2]: c = Client('http://localhost:8000/api/v1.1', "1234527db2…")
In [3]: c.list_flavors()
…
In [4]: i = c.list_images()
In [5]: i[5]
{u'created': u'2011-06-09T00:00:00+00:00',
 u'id': 7,
 u'metadata': {u'values': {u'OS': u'windows', u'size': u'11000'}},
 u'name': u'Windows',
 u'progress': 100,
 u'status': u'ACTIVE',
 u'updated': u'2011-09-12T14:47:12+00:00'}
In [6]: c.create_server('mywin1', 3, 5)
Live Demo
Prepare and upload Image from local template VM
Spawn compute cluster to run MPI app
Make local modifications and repeat
… What if it was over a 3G connection?
Time needed to upload 1GB Image file?
Time needed to prepare and spawn virtual nodes?
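Here is a back-of-the-envelope answer to the upload question. The 1 Mbit/s uplink is an assumed, illustrative 3G rate, and 1GB is taken as 10^9 bytes. The second figure assumes a client that re-uploads only the blocks it changed, as block-level deduplication allows. It also assumes a hypothetical 4 MiB block size and 12 modified blocks.

```python
def upload_seconds(num_bytes: int, uplink_bits_per_s: float) -> float:
    """Transfer time for num_bytes at a steady uplink rate, ignoring protocol overhead."""
    return num_bytes * 8 / uplink_bits_per_s


UPLINK_3G = 1_000_000          # assumed 3G uplink: 1 Mbit/s (illustrative)
IMAGE_BYTES = 10**9            # "1GB Image file", taken as 10^9 bytes
BLOCK_BYTES = 4 * 1024 * 1024  # assumed block size
CHANGED_BLOCKS = 12            # hypothetical number of blocks touched by local changes

full_upload = upload_seconds(IMAGE_BYTES, UPLINK_3G)                    # 8000 s, about 2.2 hours
delta_upload = upload_seconds(CHANGED_BLOCKS * BLOCK_BYTES, UPLINK_3G)  # about 403 s
```

Under these assumptions, the full Image takes over two hours, while the incremental update takes under seven minutes.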
Deployment
[Figure: Synnefo deployment: Web Server (ui, api, aai) exposing the REST API; API Server logic with an SQL DB; RAPI to the Ganeti Master; snf-dispatcher and a message Queue; Ganeti nodes running KVM, snf-gnt-eventd and snf-gnt-hook]
Upcoming goals
Credit-based resource allocation
Abstract away the Ganeti backend, replace with backend connector behind the MQ
Release to community as reference implementation of OpenStack Compute v1.1
Support live modification of VMs in Ganeti
Snapshots, clones in storage layer
Dramatic decrease in VM initialization time
Support workloads with 100s of ephemeral VMs
• e.g. for scientific computation, MPI jobs
Current and Upcoming features
Now: Alpha2
Common user base, custom user images on Pithos+
short-term: Synnefo v0.12, Beta
Ultra‐lightweight VMs on Archipelago with RADOS backend
medium‐term
Volumes: clonable / snapshottable / attachable disks
Network and storage hotplugging
Upcoming beta in fully populated datacenter
Opensource
Synnefo: Cyclades / Pithos+ / Astakos
https://code.grnet.gr/projects/synnefo
https://code.grnet.gr/projects/pithos
https://code.grnet.gr/projects/astakos
kamaki
https://code.grnet.gr/projects/kamaki
pip install or apt‐get install everything!
http://okeanos.io
Thank You!
Questions?
Asynchronous design
DB contains all state needed to handle API queries
no need to reach the backend
Ganeti GetInstanceInfo() is a proper job, too slow
Two distinct paths, effect and update
Effect changes to VMs when servicing API requests to modify VM state
issue commands to Ganeti backend, over RAPI
ACK reception of request to user
Update DB when interesting things happen (user- or admin-initiated)
Queue notifications to Message Queue, over AMQP
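The split between the two paths can be sketched with two in-process queues standing in for RAPI and AMQP. Everything here (function names, status strings, the toy backend) is hypothetical. The point is that the API answers reads from the DB and acknowledges writes immediately. The DB changes only when a notification arrives.

```python
import queue

db = {}                        # VM id -> status; API reads are served from here only
backend_jobs = queue.Queue()   # stands in for commands sent to Ganeti over RAPI
notifications = queue.Queue()  # stands in for the AMQP message queue


def effect(vm_id, action):
    """'Effect' path: hand the request to the backend and acknowledge at once."""
    backend_jobs.put((vm_id, action))
    return 202  # HTTP 202 Accepted: queued, not yet done


def fake_backend_step():
    """Toy backend: complete one queued job and publish a notification about it."""
    vm_id, action = backend_jobs.get()
    new_status = {"start": "ACTIVE", "shutdown": "STOPPED"}[action]
    notifications.put((vm_id, new_status))


def update():
    """'Update' path: drain pending notifications and record them in the DB."""
    while True:
        try:
            vm_id, status = notifications.get_nowait()
        except queue.Empty:
            return
        db[vm_id] = status
```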
The “effect” path
Reception of API request to modify VM state (e.g., PUT /servers over HTTP)
API enforces access rights and policy
Ganeti knows no cloud users or access rights
Need to translate from OpenStack Compute to backend ops (e.g., CreateInstance())
Asynchronous request processing
Return HTTP 202 Accepted
it’s up to the API client to poll for completion
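On the client side, "poll for completion" usually means re-reading the VM status until it reaches the expected value. A minimal sketch with capped exponential backoff follows. `get_status` is a hypothetical callable, for example one that GETs the server and returns its status field. The status strings used with it are assumptions.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str], target: str,
                    timeout: float = 300.0, interval: float = 1.0,
                    max_interval: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_status() until it returns `target` or `timeout` seconds of sleeping pass.

    The delay doubles after each miss (capped at max_interval), so a slow
    operation does not flood the API with requests.
    """
    waited = 0.0
    while True:
        if get_status() == target:
            return True
        if waited >= timeout:
            return False
        delay = min(interval, timeout - waited)
        sleep(delay)
        waited += delay
        interval = min(interval * 2, max_interval)
```

Injecting `sleep` keeps the function testable without real delays.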
The “update” path
May run at any time
Completely decoupled from “effect” path
Design goal:
Ganeti admins free to bypass frontend
Synnefo adapts
Synnefo logic triggered on backend events
Ganeti operation progressing in the queue
Synnefo hook running inside Ganeti
• Hooks run at various phases in a VM’s lifecycle
The Ganeti event daemon
Ganeti master manages job queue
Jobs pass through Queued, Waiting and Running, and end up in Canceled, Success or Error.
Need a way for Synnefo to monitor job progress
Synnefo‐specific solution: Ganeti event daemon
Passively monitor the Ganeti job queue
Notifications over AMQP on job progress
Synnefo logic listens to Message Queue, updates DB
inotify()‐based mechanism, no code changes to Ganeti
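Reduced to its core, the daemon reacts to each change in the job queue. It compares old and new job states and publishes a message for every job whose status moved. A sketch of that comparison follows. It assumes lower-case status strings and a hypothetical JSON message format. The inotify() trigger and the AMQP publish are left out.

```python
import json

# Final job states as named on the slide (lower-cased here, an assumption).
FINAL = {"canceled", "success", "error"}


def job_notifications(previous: dict, current: dict) -> list:
    """Compare two {job_id: status} snapshots and emit one JSON message per change.

    In the real daemon the trigger is an inotify() event on the job queue;
    here the snapshots are passed in directly.
    """
    messages = []
    for job_id, status in sorted(current.items()):
        if previous.get(job_id) != status:
            messages.append(json.dumps({
                "type": "ganeti-op-status",  # hypothetical message type
                "job_id": job_id,
                "status": status,
                "final": status in FINAL,
            }, sort_keys=True))
    return messages
```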
The Synnefo hook in Ganeti
Different phases in a VM’s lifecycle
{pre, post} – {add, start, stop, reboot, modify}
Run Synnefo‐specific hook in post‐*
Pushes VM configuration notifications to MQ
e.g., NIC setup
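A post-* hook is an executable that Ganeti runs with details of the operation in its environment. The sketch below turns such an environment into a NIC-configuration message. The GANETI_* variable names are our assumption about Ganeti's hooks interface and should be checked against the Ganeti version in use. The message layout is hypothetical.

```python
import json
import os
from typing import Mapping, Optional


def hook_message(env: Mapping[str, str]) -> Optional[str]:
    """Build a VM-configuration notification from a (assumed) Ganeti hook environment.

    Returns None outside the post phase, since the Synnefo hook acts only in post-*.
    """
    if env.get("GANETI_HOOKS_PHASE") != "post":
        return None
    nic_count = int(env.get("GANETI_INSTANCE_NIC_COUNT", "0"))
    nics = [
        {"mac": env.get(f"GANETI_INSTANCE_NIC{i}_MAC"),
         "ip": env.get(f"GANETI_INSTANCE_NIC{i}_IP")}
        for i in range(nic_count)
    ]
    return json.dumps({"instance": env.get("GANETI_INSTANCE_NAME"),
                       "operation": env.get("GANETI_OP_CODE"),
                       "nics": nics}, sort_keys=True)


if __name__ == "__main__":
    msg = hook_message(os.environ)
    if msg is not None:
        print(msg)  # a real hook would publish this to the message queue instead
```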
IaaS – Network ‐ Implementation
Custom modifications to Ganeti
IP pool management for the public network
Custom‐written DHCP server over NFQUEUE
Custom interface handling scripts
Enforce VM networking configuration
Private Networks
Alpha: pre‐provisioned bridges to 802.1Q VLANs
Later on: MAC‐prefix based filtering
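Two of these mechanisms are shown here in toy form. The first is a public-network IP pool that hands out and reclaims addresses. The second is MAC-prefix filtering: VMs on the same private network share a MAC prefix, and frames are forwarded only between matching prefixes. The class, the function and the 3-octet prefix length are hypothetical, not Synnefo's code.

```python
from __future__ import annotations

import ipaddress


class IPPool:
    """Toy IP pool for a public network: hands out free host addresses of a subnet."""

    def __init__(self, cidr: str, reserved: int = 1) -> None:
        hosts = list(ipaddress.ip_network(cidr).hosts())
        # The first `reserved` host addresses (e.g. the gateway) are never handed out.
        self.free = hosts[reserved:]
        self.used: set = set()

    def allocate(self):
        if not self.free:
            raise RuntimeError("IP pool exhausted")
        addr = self.free.pop(0)
        self.used.add(addr)
        return addr

    def release(self, addr) -> None:
        self.used.remove(addr)  # raises KeyError if addr was never allocated
        self.free.append(addr)


def same_private_network(src_mac: str, dst_mac: str, prefix_octets: int = 3) -> bool:
    """MAC-prefix filtering: let a frame through only if both MACs share the
    network's prefix (assumed here to be the first `prefix_octets` octets)."""
    src = src_mac.lower().split(":")
    dst = dst_mac.lower().split(":")
    return src[:prefix_octets] == dst[:prefix_octets]
```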
Reconciliation with Ganeti
What if the MQ is down, and messages are lost?
Ganeti is the Single Source of Truth for VM state
Reconcile DB state asynchronously
On success notification for a Ganeti GetInstanceInfo() op
Triggered periodically, e.g., using cron
or even by the administrator,
running gnt-instance info manually
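Reconciliation boils down to diffing the DB against what Ganeti reports, with Ganeti winning every disagreement. A minimal sketch, assuming both sides are available as VM-name-to-status dictionaries. The status strings are illustrative.

```python
def reconcile(db: dict, ganeti: dict) -> dict:
    """Compare DB state with Ganeti's (the source of truth) and return the fixes.

    Both arguments map VM name -> status. The result maps every VM whose DB
    entry is stale or missing to the status Ganeti reports, and every VM that
    Ganeti no longer knows about to None (to be marked deleted).
    """
    fixes = {}
    for vm, status in ganeti.items():
        if db.get(vm) != status:
            fixes[vm] = status
    for vm in db.keys() - ganeti.keys():
        fixes[vm] = None
    return fixes
```

A VM that appears only on the Ganeti side, for example one created by an admin who bypassed the frontend, is picked up the same way as a stale status.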