Page 1: Cloud Computing with Nimbus FNAL, January 2009 Kate Keahey (keahey@mcs.anl.gov) University of Chicago Argonne National Laboratory

Cloud Computing with Nimbus FNAL, January 2009

Kate Keahey

(keahey@mcs.anl.gov)

University of Chicago

Argonne National Laboratory

Page 2:

10/20/08 The Nimbus Toolkit: http://workspace.globus.org

Cloud Computing

Science Clouds

Elastic computing, pay-as-you-go

Capital expense

operational expense

Page 3:

Everything-as-a-Service

IaaS

PaaS

SaaS

Page 4:

The Quest Begins

Code complexity Resource control

Page 5:

“Workspaces”

Dynamically provisioned environments
  Environment control
  Resource control

Hardware implementations vs virtualization

Page 6:

A Brief History of Nimbus

[Figure: timeline from 2003 to 2009, with 2006 marked. Events in the order shown:]

Research on agreement-based services
Xen released
First Workspace Service release
EC2 goes online
EC2 gateway available
STAR production runs on EC2
Nimbus Cloud comes online
Support for EC2 interfaces

Page 7:

Nimbus Overview

Goal: open source, extensible IaaS implementation and tools
  Specifically targeting scientific community
  A platform for experimentation with features for scientific needs
  Set up private clouds (privacy, expense considerations)

Tools
  IaaS layer (Workspace Service)
  Orchestration layer (Context Broker, gateway)

http://workspace.globus.org/

Page 8:

The Workspace Service

[Figure: the VWS service and a pool of twelve nodes]

Page 9:

The Workspace Service

[Figure: the VWS service and a pool of nodes, with a Trusted Computing Base (TCB) label]

The workspace service publishes information on each workspace as standard WSRF Resource Properties.

Users can query those properties to find out information about their workspace (e.g. what IP the workspace was bound to).

Users can interact directly with their workspaces the same way they would with a physical machine.

Page 10:

Workspace Service Interfaces and Clients

Web Services based

Web Service Resource Framework (WSRF)
  GT-based

Elastic Compute Cloud (EC2)
  Supported: ec2-describe-images, ec2-run-instances, ec2-describe-instances, ec2-terminate-instances, ec2-reboot-instances, ec2-add-keypair, ec2-delete-keypair
  Unsupported: availability zones, security groups, elastic IP assignment, REST
  Used alongside WSRF interfaces
    E.g., the University of Chicago cloud allows you to connect via the cloud client or via the EC2 client
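As a rough illustration of the supported/unsupported split on this slide, the Python sketch below checks a client command name against the supported list. The names `SUPPORTED_EC2_COMMANDS`, `is_supported` and `split_by_support` are hypothetical; the code only compares strings and does not call Nimbus or EC2.

```python
# Hypothetical helper: the command names are copied from the slide's
# "Supported" list; nothing here talks to Nimbus or to EC2.
SUPPORTED_EC2_COMMANDS = frozenset({
    "ec2-describe-images",
    "ec2-run-instances",
    "ec2-describe-instances",
    "ec2-terminate-instances",
    "ec2-reboot-instances",
    "ec2-add-keypair",
    "ec2-delete-keypair",
})


def is_supported(command: str) -> bool:
    """Return True if the command name appears in the slide's supported list."""
    return command.strip().lower() in SUPPORTED_EC2_COMMANDS


def split_by_support(commands):
    """Partition command names into (supported, unsupported), keeping order."""
    supported = [c for c in commands if is_supported(c)]
    unsupported = [c for c in commands if not is_supported(c)]
    return supported, unsupported
```

A script written for EC2 could run its command names through `split_by_support` first to see which steps would need the WSRF interface instead.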

Page 11:

Security

GSI authentication and authorization
  PKI credential required
  Works with Grid proxies
  VOMS, Shibboleth (via GridShib), custom PDPs

Secure access to VMs
  EC2 key generation or accessed from .ssh

Validating images and image data
  Collaboration with Vienna University of Technology

Page 12:

Networking

Network configuration
  External: public IPs or private IPs (via VPN)
  Internal: private network via a local cluster network

Each VM can specify multiple NICs mixing private and public networks (WSRF only)
  E.g., cluster worker nodes on a private network, headnode on both public and private network
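The headnode/worker example on this slide can be written down as a small data structure. This is a minimal Python sketch under stated assumptions: the classes `Nic` and `VmSpec`, the interface names, and the labels "public" and "private" are illustrative and are not the Nimbus network configuration format.

```python
from dataclasses import dataclass, field


@dataclass
class Nic:
    name: str     # interface name inside the VM, e.g. "eth0" (assumed)
    network: str  # "public" or "private" (labels assumed for this sketch)


@dataclass
class VmSpec:
    role: str
    nics: list = field(default_factory=list)


def build_cluster(num_workers: int):
    """Headnode on both networks, workers on the private network only."""
    head = VmSpec("headnode", [Nic("eth0", "public"), Nic("eth1", "private")])
    workers = [
        VmSpec(f"worker{i}", [Nic("eth0", "private")])
        for i in range(num_workers)
    ]
    return [head] + workers


def publicly_reachable(vm: VmSpec) -> bool:
    """A VM is reachable from outside only if one of its NICs is public."""
    return any(nic.network == "public" for nic in vm.nics)
```

With this layout only the headnode is exposed, and the workers are reached through it over the private network.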

Page 13:

The Back Story

[Figure: the VWS service and a pool of nodes, with a Trusted Computing Base (TCB) label]

Workspace WSRF front-end that allows clients to deploy and manage virtual workspaces

Workspace back-end: resource manager for a pool of physical nodes. Deploys and manages workspaces on the nodes.

Each node must have a VMM (Xen) installed, as well as the workspace control program that manages individual nodes

Page 14:

Workspace Components

[Figure: component diagram with the following labels]

workspace control
workspace resource manager
workspace pilot
workspace client
workspace service
Interface labels: EC2, WSRF

Page 15:

Workspace Control

VM image propagation

Image management and reconstruction
  Creating blank partitions, sharing partitions

VM control
  Starting, stopping, pausing, etc.

Integrating a VM into the network
  Assigning MAC addresses and IP addresses
  DHCP delivery tool
  Building up a trusted (non-spoofable) networking layer

Contextualization information management

Talks to the workspace service via ssh

Standalone component

Some functionality overlap with libvirt

Implementations in Xen and KVM (queued up for release)
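To make the "assigning MAC addresses and IP addresses" step concrete, here is a minimal Python sketch that pairs each VM with a MAC and the next free address in a subnet. Everything in it is an assumption for illustration: the function names, the use of the `00:16:3e` prefix (the one conventionally used for Xen guests), and the sequential allocation policy. It is not how workspace control itself is implemented.

```python
import ipaddress

# Prefix conventionally used for Xen guest MACs; using it here is an assumption.
XEN_OUI = "00:16:3e"


def mac_for_index(index: int) -> str:
    """Derive a deterministic MAC from a small integer index (0 .. 2**24 - 1)."""
    if not 0 <= index < 2**24:
        raise ValueError("index out of range")
    tail = index.to_bytes(3, "big")
    return XEN_OUI + ":" + ":".join(f"{b:02x}" for b in tail)


def assign_addresses(vm_names, cidr):
    """Pair each VM name with a (MAC, IP) tuple from the given subnet."""
    hosts = iter(ipaddress.ip_network(cidr).hosts())
    table = {}
    for index, name in enumerate(vm_names):
        try:
            ip = next(hosts)
        except StopIteration:
            raise RuntimeError("address pool exhausted") from None
        table[name] = (mac_for_index(index), str(ip))
    return table
```

A fixed MAC-to-IP table of this kind is what a DHCP delivery step would serve from, and binding the two together is one ingredient of a non-spoofable networking layer.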

Page 16:

The Workspace Resource Manager

Basic slot fitting

Implements “immediate leases”

Extensible vehicle to experiment with different leases

Open source resource manager for multiple different VMMs

Datacenter technology equivalent
  Can be replaced by OpenNebula or other datacenter technologies

Deployment
  University of Chicago, University of Florida, Purdue, Masaryk University and all the other Science Cloud sites
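One way to picture "basic slot fitting" with "immediate leases" is a first-fit placement that either starts every requested VM now or rejects the request. The Python sketch below assumes memory is the only resource and that requests are all-or-nothing; the `Node` class and `fit_immediate_lease` function are hypothetical and do not reproduce the Nimbus resource manager.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_mem_mb: int


def fit_immediate_lease(nodes, vm_mem_mb, count):
    """First-fit: place `count` VMs now, or reject the whole request.

    Returns a list of node names (one per VM), or None if the request
    does not fit. Node state is updated only when the request succeeds.
    """
    remaining = {n.name: n.free_mem_mb for n in nodes}
    placement = []
    for _ in range(count):
        for n in nodes:
            if remaining[n.name] >= vm_mem_mb:
                remaining[n.name] -= vm_mem_mb
                placement.append(n.name)
                break
        else:
            return None  # no node can host this VM: reject immediately
    for n in nodes:
        n.free_mem_mb = remaining[n.name]
    return placement
```

Other lease types (for example best-effort or advance reservation) would replace the immediate rejection with queueing or scheduling, which is the kind of experimentation the slide refers to.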

Page 17:

The Workspace Pilot

Challenge: how can I provide a virtualization solution without disrupting the current operation of my cluster?

Flying Low: the Workspace Pilot
  Integrates with popular LRMs (such as PBS, SGE)
  Implements “best effort” leases
  Glidein approach: submits a “pilot” program that claims a resource slot
  Includes administrator tools

Deployment
  Testing @ U of Victoria (Atlas), Ian Gable and collaborators
  Adapting for the use of the Atlas experiment @ CERN, Omer Khalid

Page 18:

Cloud Closure

[Figure: component diagram with the following labels]

workspace control
workspace resource manager
workspace pilot
workspace client
cloud client
storage service
workspace service
Interface labels: EC2, WSRF

Page 19:

IaaS Gateway

Goals
  Access to different IaaS infrastructures
  Account management
  Facilitate movement between academic and commercial clouds and creation of meta-clouds
  Combine higher-level tools and IaaS

Released as service, not as code
  First online in June 2007, currently in a rewrite
  Used to move, e.g., HEP STAR experiments between Science Clouds and EC2

Page 20:

The IaaS Gateway

[Figure: component diagram with the following labels]

IaaS gateway
EC2, potentially other providers
workspace control
workspace resource manager
workspace pilot
workspace client
cloud client
storage service
workspace service
Interface labels: EC2, WSRF

Page 21:

One-click Virtual Clusters

[Figure: three cluster nodes labeled IP1/HK1, IP2/HK2 and IP3/HK3, with an MPI label]

Reciprocal exchange of information: networking and security

Parameterizable appliance

Tightly-coupled clusters

Page 22:

Context Broker

[Figure: three nodes labeled IP1/HK1, IP2/HK2 and IP3/HK3 around the Context Broker; each node is also shown with the full set IP1, IP2, IP3 and HK1, HK2, HK3]
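The exchange pictured on this slide, where each node contributes its own IP and HK entry and ends up with everyone's, can be sketched as a toy in-memory broker. This Python example is a simplification under stated assumptions: the `ContextBroker` class, its methods, and the "wait until all expected nodes have registered" rule are illustrative and are not the Nimbus Context Broker protocol.

```python
class ContextBroker:
    """Toy broker: collects (IP, HK) from each node, then shares the full set."""

    def __init__(self, expected_nodes: int):
        self.expected_nodes = expected_nodes
        self._entries = {}  # node id -> (ip, hk)

    def register(self, node_id: str, ip: str, hk: str) -> None:
        """A node reports its own IP and HK value."""
        self._entries[node_id] = (ip, hk)

    def is_complete(self) -> bool:
        return len(self._entries) >= self.expected_nodes

    def retrieve(self):
        """Return the full context, or None until every node has registered."""
        if not self.is_complete():
            return None
        return dict(self._entries)


def context_lines(context):
    """Render the shared context as one 'ip hk' line per node, sorted by IP string."""
    return [f"{ip} {hk}" for ip, hk in sorted(context.values())]
```

Each node would call `register` once at boot and then poll `retrieve` until it returns the complete set, at which point it can configure networking and security for its peers.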

Page 23:

Goals for Context Broker

Can work with every appliance
  Appliance schema, can be implemented in terms of many configuration systems

Can work with every cloud provider
  Simple and minimal conditions on generic context delivery

Can work across multiple cloud providers, in a distributed environment

Page 24:

Status for Context Broker

Release history:
  In alpha testing since August ‘07
  First released in summer (July) ‘08 (v 1.3.3)
  Latest update January ‘09 (v 2.2)

Used to contextualize 100s of nodes for EC2 STAR runs

Contextualized images on workspace marketplace

Working with rPath to make contextualization easier for the user

Page 25:

End of Nimbus Tour

[Figure: component diagram with the following labels]

workspace control
workspace resource manager
workspace pilot
workspace service
workspace client
cloud client
IaaS gateway
context broker
context client
EC2, potentially other providers
storage service
Interface labels: EC2, WSRF

Page 26:

Science Clouds

Make it easy for scientific projects to experiment with cloud computing

Can cloud computing be used for science?

Evolve software in response to the needs of scientific projects

Start with EC2-like functionality and evolve to serve scientific projects: virtual clusters, diverse resource leases

Federating clouds: moving between cloud resources in academic and commercial space

Page 27:

Science Cloud Resources

University of Chicago (Nimbus): first cloud, online since March 4th 2008
  16 nodes of UC TeraPort cluster, public IPs

University of Florida
  Online since 05/08
  16-32 nodes, access via VPN

Other Science Clouds
  Masaryk University, Brno, Czech Republic (08/08), Purdue (09/08)
  Installations in progress: IU, Grid5K, others

Using EC2 for overflow

Minimal governance model

http://workspace.globus.org/clouds

Page 28:

Cloud Use

~100 DNs

Utilization:
  Overall: 16%
  Peak pw: 86% (week of 7/14)

Requests rejected:
  None until 7/14
  Lots afterwards ;-)

Data scaled to the number of days

[Figure: Nimbus utilization (%) by month, 3/08 through 1/09; vertical axis from 0 to 60]
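The slide does not define how utilization is computed. One plausible reading of "data scaled to the number of days" is used node-hours divided by the node-hours available in the period; the Python sketch below works under that assumption only. The function name and the sample figures in the usage note are hypothetical.

```python
def utilization_percent(node_hours_used: float, nodes: int, days: int) -> float:
    """Used node-hours as a percentage of node-hours available in the period.

    Assumes every node is available 24 hours a day for the whole period.
    """
    if nodes <= 0 or days <= 0:
        raise ValueError("nodes and days must be positive")
    available = nodes * 24 * days
    return 100.0 * node_hours_used / available
```

For example, on a 16-node cloud a 25-day period offers 16 x 24 x 25 = 9600 node-hours, so 1536 node-hours of use would correspond to 16% under this definition.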

Page 29:

Who Runs on Nimbus?

[Figure: chart of usage by project. Legend: Hadoop, AliEn, GT-scalability, STAR, Montage workflows, GridFTP testing, workspace-team, Testing, OSG, geofest, bioinformatics, Other]

Project diversity: Science, CS, education, build&test…

Page 30:

Hadoop over ManyClouds

CS research: investigate latency-sensitive apps, e.g. Hadoop

Need access to distributed resources, and high level of privilege to run a ViNE router

Virtual workspace: ViNE router + application VMs

Paper: “CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications” by Andréa Matsunaga, Maurício Tsugawa and José Fortes. eScience 2008.

[Figure: U of Florida and U of Chicago, each with a ViNE router]

Page 31:

Alice HEP Experiment at CERN

CHEP paper in preparation

Page 32:

STAR

STAR: a high-energy physics experiment

Need resources with the right configuration
  Complex environments: correct versions of operating systems, libraries, tools, etc. all have to be installed.
  Consistent environments: require validation

A virtual OSG STAR cluster
  OSG cluster: OSG CE (headnode), gridmapfiles, host certificates, NFS, PBS
  STAR worker nodes: SL4 + STAR conf

Requirements
  One-click virtual cluster deployment
  Migration: Science Clouds -> EC2

Page 33:

STAR (cont'd)

From proof-of-concept to production runs
  ~2 years ago: proof-of-concept
  Last September: EC2 runs of up to 100 nodes (production scale, non-critical codes)
  Testing for critical production deployment

Performance
  Within 10% of expected performance for applications

Work by Jerome Lauret, Doug Olson, Leve Hajdu, Lidia Didenko

Page 34:

Scalability Testing

Motivation
  Test scalability of various Globus components
  Test on different platforms

Workspaces
  Globus 101 + others

Requirements
  Very short-term but flexible access to diverse platforms

Work by various members of the Globus community (Tom Howe and John Bresnahan)

Resulted in provisioning a private cloud for Globus

Typically very short-lived communities of one

Page 35:

Montage Workflows

Evaluating a cloud from user’s perspective

Paper: “Exploration of the Applicability of Cloud Computing to Large-Scale Scientific Workflows”, C. Hoffa, T. Freeman, G. Mehta, E. Deelman, K. Keahey, SWBES08: Challenging Issues in Workflow Applications

Page 36:

Cloud Computing Ecosystem

[Figure: ecosystem diagram with the following labels]

User Environments

VMM/datacenter/IaaS

Appliance Providers: marketplaces, commercial providers, communities

Deployment Orchestrator: orchestrate the deployment of environments across possibly many cloud providers

Page 37:

Open Source IaaS Implementations

OpenNebula
  Open source datacenter implementation
  University of Madrid, I. Llorente & team, 03/2008

Eucalyptus
  Open source implementation of EC2
  UCSB, R. Wolski & team, 06/2008

Cloud-enabled Nimrod-G
  Open source implementation of EC2
  Monash University, MeSsAGE Lab, 01/2009

Industry efforts
  openQRM, Enomalism

Page 38:

Friends and Family

Committers: Kate Keahey & Tim Freeman (ANL/UC), Ian Gable (UVIC)

A lot of help from the community, see: http://workspace.globus.org/people.html

Collaborations:
  Cumulus: S3 implementation (Globus team)
  EBS implementation with IU
  Appliance management: rPath and Bcfg2 project
  Virtual network overlays: University of Florida
  Security: Vienna University of Technology

Page 39:

To the Future and Beyond

Increasing importance of appliance providers

Cloud computing tools

Increased interest in cloud interoperability
  Standards: “rough consensus & working code”
  Image formats, contextualization capabilities, cloud interfaces, etc.

Cloud markets