First Steps in the Clouds

Kate Keahey

keahey@mcs.anl.gov

University of Chicago

Argonne National Laboratory

Virtual Workspaces: http://workspace.globus.org

Why Clouds?

Resource consumers: individual users or Virtual Organizations

Requirements: customized environments for their services/applications; services/applications can be short-lived; new environments/services are deployed quickly and often

Resource providers: own and operate physical resources

Requirements: ability to monitor and control their resources; provide resources at reasonable operational cost; protection from activities performed by resource consumers

Consumers need to be able to lease (potentially short-term) platforms that they can customize and control

Cloud Computing for Grid Communities:

The STAR Application Use Case

The STAR Application

Complex experimental application codes

Developed over more than 10 years by more than 100 scientists; comprises ~2 M lines of C++ and Fortran code (www.star.bnl.gov)

Require complex, customized environments

Rely heavily on the right combination of compiler versions and available libraries

Dynamically load external libraries depending on the task to be performed

Environment validation

To ensure reproducibility and result uniformity across environments

Why do we need a cloud?

Resources with the right configuration are hard to find

A VM-based cloud gives us the required control
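Environment validation, as described above, amounts to checking a node against a known-good configuration before trusting its results. The sketch below illustrates that idea. It assumes a simple name-to-version manifest. The component names and version strings are hypothetical placeholders, not STAR's actual requirements, and the function name is illustrative.

```python
# Hypothetical sketch: compare an observed software environment against a
# reference manifest, in the spirit of "environment validation".
# Component names and version strings are illustrative placeholders only.
from typing import Dict, List

REFERENCE_MANIFEST: Dict[str, str] = {
    "gcc": "3.4.6",   # placeholder C++ compiler version
    "g77": "3.4.6",   # placeholder Fortran compiler version
    "root": "5.12",   # placeholder analysis-library version
}


def validate_environment(observed: Dict[str, str],
                         reference: Dict[str, str] = REFERENCE_MANIFEST) -> List[str]:
    """Return human-readable mismatches; an empty list means the
    environment matches the reference manifest."""
    problems: List[str] = []
    for name in sorted(reference):
        wanted = reference[name]
        have = observed.get(name)
        if have is None:
            problems.append(f"{name}: missing (expected {wanted})")
        elif have != wanted:
            problems.append(f"{name}: found {have}, expected {wanted}")
    return problems


if __name__ == "__main__":
    # A node whose library version drifted and whose Fortran compiler is absent.
    for line in validate_environment({"gcc": "3.4.6", "root": "5.14"}):
        print(line)
```

Components that are present on the node but absent from the manifest are ignored here. A stricter check could also flag them. With a VM image, the same manifest can be validated once per image rather than once per site.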

Running STAR in a Cloud

First challenge: finding VM-enabled resources

Amazon Elastic Compute Cloud (EC2)

More challenges:

Can we use X.509 certs to submit to a cloud?

Can we use Grid access protocols?

How much manual configuration do we need to do for a cluster that we need for only 4 hours?

How do we integrate the cluster into the Grid infrastructure?

Workspace Service

X.509 certificates are mapped to a project account

Grid access protocols

Creating a virtual cluster dynamically

Contextualization (cluster context): the cluster node VMs find out about each other and integrate that information at boot time

Integrating the cluster into the Grid

Contextualization (grid context): the cluster is configured with the appropriate host certs, gridmapfiles, etc.
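The two contextualization steps above can be pictured as turning boot-time facts into configuration files. The sketch assumes that some context broker has already collected each node's hostname, IP address, and role. The `NodeInfo` type, the role names, and all function names are hypothetical. The only real format used is the Globus grid-mapfile line, which is a quoted subject DN followed by a local account name.

```python
# Hypothetical sketch of the two contextualization steps. It assumes a
# context broker has already collected each node's identity at boot; the
# NodeInfo type, role names, and function names are illustrative only.
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class NodeInfo:
    hostname: str
    ip: str
    role: str  # assumed roles: "head" or "worker"


def cluster_context(nodes: List[NodeInfo]) -> List[str]:
    """Cluster context: /etc/hosts-style lines so every node can resolve
    every other node once this list is distributed at boot."""
    return [f"{n.ip}\t{n.hostname}" for n in sorted(nodes, key=lambda n: n.hostname)]


def head_node(nodes: List[NodeInfo]) -> NodeInfo:
    """Cluster context: the node that workers should point at (e.g. a batch server)."""
    heads = [n for n in nodes if n.role == "head"]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one head node, found {len(heads)}")
    return heads[0]


def grid_context(dn_to_account: Dict[str, str]) -> List[str]:
    """Grid context: grid-mapfile lines, each a quoted X.509 subject DN
    followed by the local account it maps to (here a shared project account)."""
    return [f'"{dn}" {account}' for dn, account in sorted(dn_to_account.items())]


if __name__ == "__main__":
    nodes = [NodeInfo("node2", "10.0.0.12", "worker"),
             NodeInfo("node1", "10.0.0.11", "head")]
    print("\n".join(cluster_context(nodes)))
    print(head_node(nodes).hostname)
    print("\n".join(grid_context({"/DC=org/DC=example/CN=Jane Doe": "starproj"})))
```

The DN, the addresses, and the `starproj` account are made-up examples. The point of the sketch is that none of these files has to be edited by hand for a cluster that only lives for a few hours.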

[Figure: animated display of running STAR jobs at PDSF, Fermi, VWS/EC2, BNL and WSU, with Job Completion and File Recovery panels]

With thanks to Jerome Lauret and Doug Olson of the STAR project; presented at CHEP’07

[Figure: accelerated display of workflow job state at NERSC PDSF, EC2 (via the Workspace Service) and WSU; Y = job number, X = job state]

With thanks to Jerome Lauret and Doug Olson of the STAR project; presented at CHEP’07

What Did We Learn?

Performance was not an issue

The real comparison is between having a resource to run on and not having one

Contextualization is key for dynamic virtual cluster deployment

Next steps: a more challenging application

Cloud Computing for Grid Providers:

Building the Science Cloud at the University of Chicago

Challenges

Virtualization adoption has been relatively slow among Grid providers

Challenge: integrating VMs into current provisioning models

Integrate into a site without disrupting the current operation of resources, i.e., be able to run jobs as well as VMs

Non-invasive from the perspective of currently used tools, e.g., no modification to the currently used schedulers and resource managers

Can be used alongside the current mode of operation (batch jobs)

Represent as small a change as possible

Operate within familiar metaphors

Avoid error-generating complexity

Roll Your Own Cloud: The Workspace Pilot

Operates on resources that can support jobs as well as VMs, e.g., nodes that have been booted into Xen domain 0

Non-invasive extension to batch schedulers (e.g., PBS)

Wrappers for the submission operation; scheduler signals are used to operate on VMs

Glidein approach: submits a “pilot program” that prepares a resource slot for VM deployment, e.g., adjusts Xen domain 0 memory

Comes with administrator tools, e.g., kill-all

Workspace Pilot in Action

[Figure: the Workspace Service and the LRM (PBS) provisioning Xen dom0 nodes that host VMs. Level 1: provision raw resources. Level 2: provision VMs. On teardown, VMs are decommissioned and raw resources are decommissioned.]
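The two provisioning levels in the figure can be modelled as a small state machine. Level 1 acquires raw nodes, which correspond to pilot jobs holding dom0 slots. Level 2 places VMs into those slots, following the default one-VM-per-node policy. Teardown decommissions the VMs and then returns the raw nodes. This is a toy, in-memory sketch. The class and method names are hypothetical. A real deployment would submit pilot jobs through the site's LRM (e.g. PBS) rather than call these methods.

```python
# Toy model of two-level provisioning. Class/method names are hypothetical;
# a real deployment would go through the site's LRM (e.g. PBS) instead.
from typing import Dict, List, Optional


class TwoLevelProvisioner:
    def __init__(self, free_nodes: List[str]) -> None:
        self.free_nodes = list(free_nodes)           # nodes the LRM could hand out
        self.slots: Dict[str, Optional[str]] = {}    # acquired node -> VM id or None

    def acquire_raw(self, count: int) -> List[str]:
        """Level 1: provision raw resources (a pilot job holds each node)."""
        if count > len(self.free_nodes):
            raise RuntimeError("not enough free nodes")
        acquired, self.free_nodes = self.free_nodes[:count], self.free_nodes[count:]
        for node in acquired:
            self.slots[node] = None
        return acquired

    def deploy_vm(self, vm_id: str) -> str:
        """Level 2: place a VM into an empty slot (one VM per node)."""
        for node, vm in self.slots.items():
            if vm is None:
                self.slots[node] = vm_id
                return node
        raise RuntimeError("no empty slot; acquire more raw resources first")

    def release_all(self) -> List[str]:
        """Teardown: decommission VMs first, then the raw resources."""
        for node in self.slots:
            self.slots[node] = None                  # VMs are decommissioned
        released = list(self.slots)
        self.free_nodes.extend(released)             # raw resources are decommissioned
        self.slots.clear()
        return released


if __name__ == "__main__":
    p = TwoLevelProvisioner(["n1", "n2", "n3"])
    p.acquire_raw(2)
    print(p.deploy_vm("vm-a"), p.deploy_vm("vm-b"))
    print(p.release_all(), p.free_nodes)
```

Keeping the two levels separate is what lets the site's batch scheduler stay unmodified. The scheduler only ever sees Level 1 jobs, and VM placement happens inside slots it has already granted.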

The Pilot Program

Uses the Xen balloon driver to reduce/restore domain 0 memory so that guest domains (VMs) can be deployed

Secure VM deployment

The pilot requires sudo privileges and can therefore be used only with the site administrator’s approval

The workspace service provides fine-grained authorization for all requests

Signal handling

SIGTERM: the pilot has exceeded its allotted time

The pilot notifies the VWS, allowing it to clean up; after a configurable time period, the pilot takes matters into its own hands
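The SIGTERM behaviour described above (notify, wait a configurable period, then act alone) is a common grace-period pattern. The sketch below uses Python's standard `signal` and `threading` modules. The `notify_service` and `force_cleanup` callables are hypothetical stand-ins for "notify the VWS" and "take matters into its own hands". They are not functions from the Workspace release, and the 30-second default is an assumed value.

```python
# Hypothetical sketch of the pilot's SIGTERM handling. notify_service and
# force_cleanup are placeholder callables, not Workspace release functions.
import signal
import threading
from typing import Callable

DEFAULT_GRACE_SECONDS = 30.0  # assumed value for the configurable time period


class Pilot:
    def __init__(self,
                 notify_service: Callable[[], None],
                 force_cleanup: Callable[[], None],
                 grace_seconds: float = DEFAULT_GRACE_SECONDS) -> None:
        self.notify_service = notify_service
        self.force_cleanup = force_cleanup
        self.grace_seconds = grace_seconds
        self.terminating = threading.Event()  # set when SIGTERM arrives
        self.cleaned_up = threading.Event()   # set when the service reports cleanup

    def install(self) -> None:
        """Register the handler (must be called from the main thread)."""
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Keep the handler tiny: just record that the allotted time is over.
        self.terminating.set()

    def handle_termination(self) -> str:
        """Notify the service, give it a grace period, then clean up ourselves."""
        self.notify_service()
        if self.cleaned_up.wait(timeout=self.grace_seconds):
            return "service cleaned up"
        self.force_cleanup()
        return "forced cleanup"

    def run(self, poll_seconds: float = 1.0) -> str:
        """Hold the resource slot until SIGTERM, then handle termination."""
        while not self.terminating.wait(timeout=poll_seconds):
            pass  # the VM keeps running in the prepared slot
        return self.handle_termination()
```

In a real pilot, `install()` would be called at start-up and `run()` would block for the lifetime of the batch job. The handler only sets an event. All notification and cleanup work happens in normal code after the signal, which avoids doing complex work inside a signal handler.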

Default policy: one VM per physical node

Available for download:

Workspace Release 1.3.1: http://workspace.globus.org/downloads/index.html
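The balloon step and the one-VM-per-node default together reduce to some simple arithmetic. Domain 0 is shrunk so that the node's memory, minus a hypervisor reserve, fits the single guest. The original size is restored after the VM is gone. This is a sketch under stated assumptions. The reserve and minimum-dom0 figures are made-up values, not settings from the Workspace release. The code only builds the argument list for Xen 3.x's `xm mem-set` command and does not execute it.

```python
# Hypothetical sketch of the memory arithmetic behind ballooning dom0 down for
# a single guest VM under the default one-VM-per-node policy. The reserve and
# minimum values are assumptions, not settings from the Workspace release.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeMemory:
    total_mb: int          # physical memory on the node
    dom0_current_mb: int   # memory dom0 holds before the pilot runs
    dom0_min_mb: int       # floor below which dom0 must not be ballooned
    reserve_mb: int = 0    # headroom kept for the hypervisor itself


def dom0_target_mb(node: NodeMemory, vm_mb: int, vms_on_node: int) -> int:
    """Memory to leave in dom0 so that one VM of vm_mb can be booted."""
    if vms_on_node >= 1:
        raise ValueError("default policy allows one VM per physical node")
    target = node.total_mb - node.reserve_mb - vm_mb
    if target < node.dom0_min_mb:
        raise ValueError(f"VM too large: dom0 would drop to {target} MB")
    # Never grow dom0 here; only shrink it if needed.
    return min(target, node.dom0_current_mb)


def mem_set_command(target_mb: int) -> List[str]:
    """Argument list for Xen 3.x's `xm mem-set` (built, not executed)."""
    return ["xm", "mem-set", "Domain-0", str(target_mb)]


if __name__ == "__main__":
    node = NodeMemory(total_mb=4096, dom0_current_mb=3968, dom0_min_mb=512, reserve_mb=128)
    shrink = mem_set_command(dom0_target_mb(node, vm_mb=2048, vms_on_node=0))
    restore = mem_set_command(node.dom0_current_mb)  # used after the VM is gone
    print(shrink)   # ['xm', 'mem-set', 'Domain-0', '1920']
    print(restore)  # ['xm', 'mem-set', 'Domain-0', '3968']
```

Refusing to go below a minimum dom0 size is a design choice of this sketch. It keeps the control domain usable for the cleanup that the pilot performs when it receives SIGTERM.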

Nimbus @ UC

What is it?

The Science Cloud at the University of Chicago: the UC TeraPort cluster configured with the workspace pilot (currently 16 nodes)

What can it do for me?

It allows you to “lease out” a cluster of VMs

Who can use it?

Members of the scientific community, insofar as usage policies allow

What do I need to do if I want to use it?

Contact us: [email protected]. You will need a VM image (we can help, and know others who can), a certificate, and a simple client.

Cloud Interoperability

Moving an app from a hardware platform to a cloud is relatively hard: you need to develop a VM image, learn about cloud computing, and figure out the logistics

Moving between clouds, e.g., the STAR app from EC2 to the Science Cloud and vice versa, is very easy

Rough consensus on the interfaces needed to provision resources in the cloud

OGF GridVirt-WG (chairs: Erol Bozak, Wolfgang Reichert)

Define the requirements for integration of Grid architecture with system virtualization platforms

Exploring the impact of virtualization on Grid use cases

Exploring the relationship with standards (DMTF, etc.)