cloud computing with nimbus fnal, january 2009 kate keahey ([email protected]) university of...
TRANSCRIPT
Cloud Computing with Nimbus FNAL, January 2009
Kate Keahey
University of Chicago
Argonne National Laboratory
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Cloud Computing
Science Clouds
Elastic computing,Pay-as-you-go,
Capital expense
operational expense
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Everything-as-a-Service
IaaS
PaaS
SaaS
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
The Quest Begins
Code complexity Resource control
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
“Workspaces”
Dynamically provisioned environments Environment control Resource control
Hardware implementations vs virtualization
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
A Brief History of Nimbus
Research on agreement-based
services
Xen released
FirstWorkspace Service
release
EC2 goes online
EC2 gatewayavailable
STAR productionruns on EC2
Nimbus Cloudcomes online
Support for EC2 interfaces
2003 20092006
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Nimbus Overview
Goal: open source, extensible, IaaS implementation and tools Specifically targeting scientific community A platform for experimentation with features for
scientific needs Set up private clouds (privacy, expense
considerations) Tools
IaaS layer (Workspace Service) Orchestration layer (Context Broker, gateway)
http://workspace.globus.org/
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
VWSService
The Workspace Service
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
The Workspace Service
Poolnode
Trusted Computing Base (TCB)
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
The workspace service publishesinformation on each workspace
as standard WSRF ResourceProperties.
Users can query thoseproperties to find out
information about theirworkspace (e.g. what IP
the workspace was bound to)
Users can interact directly with their
workspaces the same way the would with a
physical machine.
VWSService
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Workspace Service Interfaces and Clients
Web Services based Web Service Resource Framework (WSRF)
GT-based Elastic Computing Cloud (EC2)
Supported: ec2-describe-images, ec2-run-instances, ec2-describe-instances, ec2-terminate-instances, ec2-reboot-instances, ec2-add-keypair, ec2-delete-keypair
Unsupported: availability zones, security groups, elastic IP assignment, REST
Used alongside WSRF interfaces E.g., the University of Chicago cloud allows you to connect
via the cloud client or via the EC2 client
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Security
GSI authentication and authorization PKI credential required Works with Grid proxies VOMS, Shibboleth (via GridShib), custom PDPs
Secure access to VMs EC2 key generation or accessed from .ssh
Validating images and image data Collaboration with Vienna University of Technology
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Networking
Network configuration External: public IPs or private IPs (via VPN) Internal: private network via a local cluster
network Each VM can specify multiple NICs mixing
private and public networks (WSRF only) E.g., cluster worker nodes on a private
network, headnode on both public and private network
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
The Back Story
Poolnode
Trusted Computing Base (TCB)
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
Poolnode
VWSService
Workspace WSRF front-end that allows clients
to deploy and manage virtual workspaces
Resource manager for a pool of physical nodesDeploys and manages
Workspaces on the nodes
Each node must have a VMM (Xen) installed, as
well as the workspace control program that manages
individual nodes
Workspace back-end:
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Workspace Components
workspacecontrol
workspaceresourcemanager
workspacepilot
workspaceclient
workspaceservice
EC
2W
SR
F
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Workspace Control
VM image propagation Image management and reconstruction
Creating blank partitions, sharing partitions VM control
Starting, stopping, pausing, etc. Integrating a VM into the network
Assigning MAC addresses and IP addresses DHCP delivery tool Building up a trusted (non-spoofable) networking layer
Contextualization information management Talks to the workspace service via ssh Standalone component Some functionality overlap with libvirt Implementations in Xen and KVM (queued up for release)
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
The Workspace Resource Manager
Basic slot fitting Implements “immediate leases” Extensible vehicle to experiment with different leases Open source resource manager for multiple different
VMMs Datacenter technology equivalent
Can be replaced by OpenNebula or other datacenter technologies
Deployment University of Chicago, University of Florida, Purdue,
Masaryk University and all the other Science Cloud sites
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
The Workspace Pilot Challenge: how can I provide a virtualization
solution without disrupting the current operation of my cluster?
Flying Low: the Workspace Pilot Integrates with popular LRMs (such as PBS, SGE) Implements “best effort” leases Glidein approach: submits a “pilot” program that claims a
resource slot Includes administrator tools
Deployment Testing @ U of Victoria (Atlas), Ian Gable and collaborators Adapting for the use of the Atlas experiment @ CERN,
Omer Khalid
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Cloud Closure
workspacecontrol
workspaceresourcemanager
workspacepilot
workspaceclient
cloudclient
storageservice
workspaceservice
EC
2W
SR
F
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
IaaS Gateway
Goals Access to different IaaS infrastructures Account management Facilitate movement between academic and
commercial clouds and creation of meta-clouds Combine higher-level tools and IaaS
Released as service, not as code First online in June 2007, currently in a rewrite Used to move e.g., HEP STAR experiments between
Science Clouds and EC2
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
The IaaS Gateway
IaaSgateway
EC2potentially other providers
workspacecontrol
workspaceresourcemanager
workspacepilot
workspaceclient
cloudclient
storageservice
workspaceservice
EC
2W
SR
F
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
IP1IP1 HK1HK1 IP2IP2 HK2HK2 IP3IP3 HK3HK3
MPIMPI
Reciprocal exchange of information: networking and security
One-click Virtual Clusters
Parameterizable appliance Tightly-coupled clusters
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Context Broker
IP1IP1 HK1HK1
IP1IP1
IP2IP2
IP3IP3
HK1HK1
HK2HK2
HK3HK3
IP1IP1 HK1HK1
IP1IP1
IP1IP1
IP1IP1
IP1IP1
IP1IP1
IP1IP1
IP1IP1 HK1HK1
IP1IP1
IP1IP1
IP1IP1
IP1IP1
IP1IP1
IP1IP1
Context Context BrokerBroker
IP2IP2 HK2HK2
IP1IP1
IP2IP2
IP3IP3
HK1HK1
HK2HK2
HK3HK3
IP3IP3 HK3HK3
IP1IP1
IP2IP2
IP3IP3
HK1HK1
HK2HK2
HK3HK3
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Goals for Context Broker
Can work with every appliance Appliance schema, can be implemented in
terms of many configuration systems Can work with every cloud provider
Simple and minimal conditions on generic context delivery
Can work across multiple cloud providers, in a distributed environment
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Status for Context Broker
Release history: In alpha testing since August ‘07 First released summer July ‘08 (v 1.3.3) Latest update January ‘09 (v 2.2)
Used to contextualize 100s of nodes for EC2 STAR runs
Contextualized images on workspace marketplace Working with rPath to make contextualizatin easier
for the user
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
End of Nimbus Tour
workspacecontrol
workspaceresourcemanager
workspacepilot
workspaceservice
workspaceclient
cloudclient
IaaSgateway
cont
ext
brok
er
contextclient
EC2potentially other providers
storageservice
EC
2W
SR
F
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Science Clouds
Make it easy for scientific projects to experiment with cloud computing
Can cloud computing be used for science?
Evolve software in response to the needs of scientific projects
Start with EC2-like functionality and evolve to serve scientific projects: virtual clusters, diverse resource leases
Federating clouds: moving between cloud resources in academic and commercial space
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Science Cloud Resources
University of Chicago (Nimbus): first cloud, online since March 4th 2008 16 nodes of UC TeraPort cluster, public IPs
University of Florida Online since 05/08 16-32 nodes, access via VPN
Other Science Clouds Masaryk University, Brno, Czech Republic (08/08), Purdue
(09/08) Installations in progress: IU, Grid5K, others
Using EC2 for overflow Minimal governance model http://workspace.globus.org/clouds
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Cloud Use
~100 DNs Utilization:
Overall: 16% Peak pw: 86%
(week of 7/14) Requests rejected:
None untill 7/14 Lots afterwards ;-)
Data scaled to the number of days
Nimbus utilization
0
10
20
30
40
50
60
3/08 4/08 5/08 6/08 7/08 8/08 9/08 10/08 11/08 12/08 1/09
utilization (%)
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Who Runs on Nimbus?
HadoopAliEnGT-scalabilitySTARMontage workflowsGridFTP testingworkspace-teamTestingOSGgeofestbioinformaticsOther
Project diversity: Science, CS, education, build&test…
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Hadoop over ManyClouds
CS research: investigate latency-sensitive apps, e.g. Hadoop Need access to distributed resources, and high level of privilege
to run a ViNE router Virtual workspace: ViNE router + application VMs Paper: “CloudBLAST: Combining MapReduce and Virtualization
on Distributed Resources for Bioinformatics Applications” by Andréa Matsunaga, Maurício Tsugawa and José Fortes. eScience 2008.
U of FloridaU of Chicago
ViNErouter
ViNErouter
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Alice HEP Experiment at CERN
CHEP paper in preparation
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
STAR
STAR: a high-energy physics experiment Need resources with the right configuration
Complex environments: correct versions of operating systems, libraries, tools, etc all have to be installed.
Consistent environments: require validation A virtual OSG STAR cluster
OSG cluster OSG CE (headnode), gridmapfiles, host certificates, NSF, PBS
STAR worker nodes: SL4 + STAR conf Requirements
One-click virtual cluster deployment Migration: Science Clouds -> EC2
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
STAR (cntd) From proof-of-concept to production runs
~2 years ago: proof-of-concept Last September: EC2 runs of up to 100 nodes (production
scale, non-critical codes) Testing for critical production deployment
Performance Within 10% of expected performance for applications
Work by Jerome Lauret, Doug Olson, Leve Hajdu, Lidia Didenko
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Scalability Testing Motivation
Test scalability of various Globus components Test on a different platforms
Workspaces Globus 101 + others
Requirements very short-term but flexible access to diverse platforms
Work by various members of the Globus community (Tom Howe and John Bresnahan)
Resulted in provisioning a private cloud for Globus Typically very short-lived communities of one
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Montage Workflows
Evaluating a cloud from user’s perspective Paper: “Exploration of the Applicability of Cloud
Computing to Large-Scale Scientific Workflows”, C. Hoffa, T. Freeman, G. Mehta, E. Deelman, K. Keahey, SWBES08: Challenging Issues in Workflow Applications
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
User Environments
Cloud Computing Ecosystem
VMM/datacenter/IaaSVMM/datacenter/IaaS
Appliance Providersmarketplaces
commercial providerscommunities
Deployment Orchestratororchestrate the deployment of environments across possibly
many cloud providers
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Open Source IaaS Implementations
OpenNebula Open source datacenter implementation University of Madrid, I. Llorente & team, 03/2008
Eucalyptus Open source implementation of EC2 UCSB, R. Wolski & team, 06/2008
Cloud-enabled Nimrod-G Open source implementation of EC2 Monash University, MeSsAGE Lab, 01/2009
Industry efforts openQRM, Enomalism
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
Friends and Family
Committers: Kate Keahey & Tim Freeman (ANL/UC), Ian Gable (UVIC)
A lot of help from the community, see: http://workspace.globus.org/people.html
Collaborations: Cumulus: S3 implementation (Globus team) EBS implementation with IU Appliance management: rPath and Bcfg2 project Virtual network overlays: University of Florida Security: Vienna University of Technology
10/20/08 The Nimbus Toolkit: http//workspace.globus.org
To the Future and Beyond
Increasing Importance of Appliance Providers Cloud computing tools Increased interest in cloud interoperability
Standards: “rough consensus & working code” Image formats, contextualization capabilities, cloud
interfaces, etc. Cloud markets