DRM/Computational Grids, Bill DeSalvo, April 14, 2004


DRM/Computational Grids

Bill DeSalvo

April 14, 2004

Computational Grids

© Platform Computing Inc. 2003

Ian Foster’s Three-Point Grid Checklist

Coordinates resources

Not subject to centralized control

One or more (virtual) organizations

Geographic distribution of users/resources is common

Standard, open, general-purpose protocols and interfaces

Delivers nontrivial qualities of service

SLAs vs. policies vs. QoS

Translates business objectives into IT objectives

Enables effective utilization, resource aggregation, and remote access to specialized resources

Clusters are NOT grids! A cluster is a local-area, logical arrangement of independent entities that collectively provide a service.

Virtual Organizations

Evolution of the Grid

Everyone’s Aware of “The Grid”

Platform Grid Competencies

Resource Leasing

Job Forwarding

Account Mapping

Grid Fairshare Scheduling

Advance Reservations

User Authentication

Reliable Data Transfer

Outgrowth of Platform’s experience in Grid and Distributed Computing

Platform MultiCluster

Three-Point Grid Checklist & Platform MultiCluster

Coordinates resources

Not subject to centralized control

‘Single’ organization (“Enterprise Grid”)

Geographic distribution of users/resources is common

Proprietary protocols and interfaces

Delivers nontrivial qualities of service

SLAs vs. policies

Common queues

Advance reservation

Resource leasing

Fairshare

SLAs

Translates business objectives into IT objectives

Enables effective utilization, resource aggregation, and remote access to specialized resources

Why MultiCluster

Global Sharing, Local Ownership (“politics of the grid”)

Providing increased capacity, increased capability and increased scalability for growing computational needs, while maintaining local autonomy

[Diagram: Dept A, Dept B, Dept C, Dept D]

Job Forwarding Model

“HPC Center” configuration: enhanced transparency

FCFS guarantee, pending reason support, chunk jobs, host type/queue status aware scheduling, checkpoint/migration

[Diagram: Cluster A, Cluster B, Cluster C and the HPC Center]

Job Forwarding Model

[Diagram: compute servers at Site A and Site B, connected by a send queue and a receive queue]

You submit. We do:
• Job transfer
• Data staging
• Account mapping
• Accounting
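The job forwarding flow on this slide (the user submits to a local send queue; the remote side handles job transfer, data staging, account mapping and accounting) can be sketched as a small Python model. This is a minimal sketch under stated assumptions, not LSF MultiCluster's implementation or configuration format: the class names `Job`, `SendQueue` and `ReceiveQueue`, the per-site `account_map` dictionary and the accounting record layout are all hypothetical, and data staging is only simulated as a list of file names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    user: str                      # submitter's account at the originating site
    command: str
    input_files: list[str] = field(default_factory=list)


@dataclass
class ReceiveQueue:
    """Remote side: accepts forwarded jobs and does the bookkeeping."""
    site: str
    account_map: dict[str, str]    # hypothetical map: submitting user -> local account
    staged: list[str] = field(default_factory=list)
    accepted: list[tuple[int, str, str]] = field(default_factory=list)
    accounting: list[dict[str, object]] = field(default_factory=list)

    def accept(self, job: Job, origin: str) -> str:
        # Account mapping: the job runs under this site's local account.
        local_account = self.account_map[job.user]
        # Data staging (simulated): record which input files were pulled in.
        self.staged.extend(job.input_files)
        self.accepted.append((job.job_id, local_account, job.command))
        # Accounting: charge usage back to the original user and site.
        self.accounting.append(
            {"job": job.job_id, "origin": origin, "user": job.user, "ran_as": local_account}
        )
        return local_account


@dataclass
class SendQueue:
    """Local side: the only queue the user ever submits to."""
    site: str
    remote: ReceiveQueue

    def submit(self, job: Job) -> str:
        # Job transfer: hand the job to the remote receive queue.
        return self.remote.accept(job, origin=self.site)


site_b = ReceiveQueue("Site B", account_map={"alice": "b_alice01"})
site_a = SendQueue("Site A", remote=site_b)
print(site_a.submit(Job(1, "alice", "./simulate", ["input.dat"])))  # prints b_alice01
```

The point of the split is that the user only ever sees `SendQueue.submit`; everything in the "We do" list happens on the receiving side.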

Resource Leasing Model

Accelerating enterprise grid adoption: single system image, ease of administration, scalability

Enable fairshare, preemption, pending reason support, chunk jobs, advance reservation, interactive jobs, parallel jobs, … across clusters

Common Resource Leases

[Figure: three panels of utilization over time (t), one per leasing trigger]

By admin: lease 528 CPUs to Site A (Site B project completes)

By load: IF (load < threshold(X)) lease 528 CPUs to Site A ELSE reclaim (Site B hits an extended low-utilization period, then utilization goes up again)

By user request: lease based on an advance reservation request (Site B is always loaded)
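The "By load" rule above translates directly into code. The sketch below applies it to a series of Site B load samples. The `min_quiet` parameter is a hypothetical addition, not on the slide: it models the "extended low utilization period" by requiring several consecutive below-threshold samples before leasing, while reclaiming as soon as load rises. With `min_quiet=1` it is the slide's rule verbatim.

```python
from typing import List, Sequence

LEASE_CPUS = 528  # CPU count used on the slide


def lease_by_load(loads: Sequence[float], threshold: float, min_quiet: int = 1) -> List[int]:
    """Return, for each Site B load sample, how many CPUs are leased to Site A.

    Slide rule: IF (load < threshold(X)) lease 528 CPUs to Site A ELSE reclaim.
    min_quiet (hypothetical): consecutive low samples needed before leasing.
    """
    leased = []
    quiet = 0  # length of the current run of below-threshold samples
    for load in loads:
        quiet = quiet + 1 if load < threshold else 0
        # Lease once the quiet run is long enough; otherwise reclaim (0 CPUs).
        leased.append(LEASE_CPUS if quiet >= min_quiet else 0)
    return leased


samples = [0.9, 0.2, 0.1, 0.8]
print(lease_by_load(samples, threshold=0.5))               # [0, 528, 528, 0]
print(lease_by_load(samples, threshold=0.5, min_quiet=2))  # [0, 0, 528, 0]
```

Requiring a run of quiet samples is one simple way to avoid leasing CPUs away during a brief dip, only to reclaim them a moment later.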

Advance Reservation

Nodes dedicated to User A for time duration

Reserve nodes for exclusive access for user or user group

Ensures critical work is done without interference

Useful for benchmarking or system maintenance

One-time and recurring reservations

Administrator defines reservations for users
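An advance reservation system has to refuse any request that would give the same node to two users for overlapping time windows. The following is a minimal sketch of that check, assuming half-open `[start, end)` windows and node names as strings. `Reservation`, `ReservationBook` and their methods are hypothetical names, not an LSF API. Recurring reservations are expanded into individual occurrences and accepted all-or-nothing.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reservation:
    user: str
    nodes: frozenset[str]
    start: datetime
    end: datetime  # exclusive: the window is [start, end)

    def conflicts(self, other: Reservation) -> bool:
        # Conflict = at least one shared node AND overlapping time windows.
        shares_node = bool(self.nodes & other.nodes)
        overlaps = self.start < other.end and other.start < self.end
        return shares_node and overlaps


class ReservationBook:
    def __init__(self) -> None:
        self.reservations: list[Reservation] = []

    def reserve(self, r: Reservation) -> bool:
        """One-time reservation; rejected if it clashes with an existing one."""
        if any(r.conflicts(e) for e in self.reservations):
            return False
        self.reservations.append(r)
        return True

    def reserve_recurring(self, user: str, nodes: set[str], first: datetime,
                          duration: timedelta, period: timedelta, count: int) -> bool:
        """Recurring reservation, accepted only if every occurrence is free."""
        occurrences = [
            Reservation(user, frozenset(nodes), first + i * period, first + i * period + duration)
            for i in range(count)
        ]
        if any(o.conflicts(e) for o in occurrences for e in self.reservations):
            return False
        self.reservations.extend(occurrences)
        return True

    def may_run(self, user: str, node: str, when: datetime) -> bool:
        """A node inside a reservation window is exclusive to the reserving user."""
        for r in self.reservations:
            if node in r.nodes and r.start <= when < r.end:
                return r.user == user
        return True


t0 = datetime(2004, 4, 14, 9, 0)
book = ReservationBook()
print(book.reserve(Reservation("userA", frozenset({"n1", "n2"}), t0, t0 + timedelta(hours=4))))  # True
print(book.reserve(Reservation("userB", frozenset({"n2"}), t0 + timedelta(hours=1),
                               t0 + timedelta(hours=2))))  # False: n2 is already reserved
```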

Use Cases

DoD HPCMP Grid

DoD HPCMP Challenge

Initiative to share HPCMP resources easily and transparently: SMDC, TACOM, NRL, NAVO and WSMR, …

Build a meta-queuing system to integrate the centers

Primary Benefit: the capability to submit a job to a single, common queue, from which it is sent to the best available computer in the Grid

DoD HPCMP Grid

DoD HPCMP Solution: Platform LSF MultiCluster

Requirement: Fire and forget
Solution: Resource reservation protocol; transparent job control; accounting

Requirement: Full Kerberos 5 support
Solution: Kerberized client-server interactions; ticket forwarding/renewal; multi-realm support; account mapping

Requirement: Reliable, secure file transfer
Solution: Platform FTA (Kerberized, fault tolerant)

[Diagram: DoD HPCMP sites connected over DREN]

Site | System | PEs
NAVO | SUN E10K | 64
AEDC | Origin 2000 | 64
NRL | Origin 2000 | 128
TACOM/TARDEC | Onyx2 | 32
RTTC | Origin 2000 | 32
SMDC | Origin 2000 | 64
SSCSD | HP Superdome | 44
AFFTC | Origin 3000 | 64
WSMR | Origin 2000 | 64

Grid Challenges: Logistics / Coordination

People; user accounts; geographic locations; site configurations; time zones/schedules

Network security/firewalls; introduction of batch queuing systems to environments; training and skills transfer

DoD HPCMP Grid

SHARCNET

External Grids/Portal

SHARCNET

The network is no longer ‘passive plumbing’

True resource that can be managed in real time – with guaranteed QoS

Potential projects

-based resource leasing, advance reservation

IP-based topology awareness

Enables new classes of Grid applications

Operational results

Real-time, remote visualization

Virtual storage

Persistent/pervasive

On demand

The Globus Toolkit V2

Sharing pains… physical login

[Diagram: compute servers at Site A and Site B]

You have to:
• Get and maintain multiple accounts
• Use different batch systems
• Do without consolidated accounting
• Move files manually

The Globus Toolkit™ Version 2 (GT2)

A software toolkit that addresses key technical problems in the development of Grid-enabled tools, services, and applications

Offers a modular “bag of technologies”

Enables incremental development of grid-enabled tools and applications

Implements standard Grid protocols and APIs

Made available under a liberal open source license

Provided by The Globus Alliance

http://www.globus.org

Globus Toolkit: Evaluation (+)

Good technical solutions for key problems, e.g.

Authentication and authorization

Resource discovery and monitoring

Reliable remote service invocation

High-performance remote data access

This, together with good engineering, is enabling progress

Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support

Growing community code base built on tools

Globus Toolkit: Evaluation (-)

Protocol deficiencies, e.g.

Heterogeneous basis: HTTP, LDAP, FTP

No standard means of invocation, notification, error propagation, authorization, termination, …

Significant missing functionality, e.g.

Databases, sensors, instruments, workflow, …

Virtualization of end systems (hosting envs.)

Little work on total system properties, e.g.

Dependability, end-to-end QoS, …

Reasoning about system properties

Scalability

LSF MC & Globus

MC: Transparent, dynamic, intelligent, scalable inter-cluster sharing

User does not need to know about clusters: total transparency

MC dynamically chooses the “best cluster” to run the job

Globus: static, non-intelligent sharing

User chooses which cluster to submit the job to via the Globus interface

Lacks transparency

[Diagram: Cluster A, Cluster B and Cluster C, joined through Globus and through inter-cluster protocols]
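The contrast on this slide is between the scheduler choosing the cluster and the user choosing it. The sketch below shows both under assumptions: each cluster is taken to advertise its host types, free job slots and pending job count, and the ranking (most free slots, then shortest pending queue) is an illustrative choice, not MultiCluster's actual scheduling policy. `Cluster`, `pick_best_cluster` and `user_chosen_cluster` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    name: str
    host_types: set[str]   # e.g. {"linux", "sparc"}
    free_slots: int
    pending_jobs: int


def pick_best_cluster(clusters: list[Cluster], host_type: str) -> Optional[str]:
    """Transparent placement: the user names no cluster; the scheduler decides."""
    candidates = [c for c in clusters if host_type in c.host_types and c.free_slots > 0]
    if not candidates:
        return None  # nothing suitable right now: the job stays pending
    # Illustrative ranking: most free slots first, then the shortest pending queue.
    return max(candidates, key=lambda c: (c.free_slots, -c.pending_jobs)).name


def user_chosen_cluster(clusters: list[Cluster], name: str) -> str:
    """Static placement: the user's choice is honoured whatever the load is."""
    for c in clusters:
        if c.name == name:
            return c.name
    raise KeyError(f"unknown cluster {name!r}")


clusters = [
    Cluster("A", {"linux"}, free_slots=0, pending_jobs=10),
    Cluster("B", {"linux", "sparc"}, free_slots=16, pending_jobs=2),
    Cluster("C", {"linux"}, free_slots=16, pending_jobs=5),
]
print(pick_best_cluster(clusters, "linux"))   # B
print(user_chosen_cluster(clusters, "A"))     # A, even though A has no free slots
```

The second function is deliberately naive: it reflects the "static, non-intelligent" sharing the slide attributes to user-directed submission.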

Globus Toolkit 3 (OGSA)

Prelude to OGSA: An Analogy

Every product an island unto itself

Prelude to OGSA: An Analogy

Differentiated products, integrated stack

Open Grid Services Architecture (OGSA)

Next-generation architecture

Consequence of technology refresh (i.e., refactoring the Globus Toolkit) and research into Autonomic Computing

Convergence of Grid Computing and Web Services

Globus Toolkit

Access services – e.g., CLIs, GUIs, portals and CoGs

Resource and allocation management

Monitoring and discovery services – e.g., sensing and indexing

Data management services – e.g., file transfer, replica management, etc.

Security – e.g., the Grid Security Infrastructure

Initially SOAP, WSDL and WS-Inspection

The Global Grid Forum (GGF) serves as the standards authority

Two layers

Core Grid platform – OGSA platform interfaces and models

Core Grid infrastructure – Open Grid Services Infrastructure (OGSI)

http://www.gridforum.org http://www.globus.org/ogsa

Importance of OGSA to Customers

Grid-enabled Web Services transforming IT

Analyst feedback (e.g., Gartner)

Customer experience

Customers demand standards-compliant products, solutions and services – why?

Vendors guilty of over-promising and under-delivering

Avoid single-vendor lock-in

Proprietary implementations based on open standards

Seek multi-vendor deliverables

Framework for partner collaboration

Demanding professionalism in software engineering

Seek to be engaged in the process

Platform Embraces Open Standards

Platform has been developing software for over 11 years

Standards efforts are recent activities

Existing implementations are proprietary

Platform is an NPi founder

NPi merged with GGF (4/02)

NPi being leveraged in OGSA

Platform committed to open standards

Proprietary implementations based on open standards

Platform experienced in Open Source arena

Offering Linux solutions for over 6 years

Offering Globus Toolkit solutions for about 2 years

Source-code available for components of Platform LSF

Platform and Globus

Platform Globus Toolkit

CSF Plus: advanced CSF-based metascheduler

Job persistence; enhanced scalability (6x GT 3); cluster load balancing and host type matching (LSF only)

Globus Toolkit 3

Community Scheduler Framework (CSF): round robin job scheduling; advance reservation booking, query and control; reservation-based scheduling; job throttling for increased reliability

Connectors for third-party workload management systems (e.g., SGE, PBS)

Native command line interface support

Platform Globus Toolkit

One step installation

Open Source

Platform Enhancements
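CSF's base scheduling is described as round robin with job throttling. One way to combine the two is sketched below, under assumptions: dispatch cycles through resource managers in order, skips any that already have `max_in_flight` jobs running, and holds the job when all of them are at their cap. The class and parameter names are hypothetical and do not come from the CSF code base.

```python
from typing import Iterable, Optional


class RoundRobinScheduler:
    """Round-robin dispatch with a per-resource-manager cap on running jobs."""

    def __init__(self, resource_managers: Iterable[str], max_in_flight: int) -> None:
        self.rms = list(resource_managers)
        self.max_in_flight = max_in_flight
        self.in_flight = {rm: 0 for rm in self.rms}
        self._next = 0  # index of the next resource manager to try

    def dispatch(self) -> Optional[str]:
        """Return the resource manager that gets the next job, or None if throttled."""
        for _ in range(len(self.rms)):  # try each resource manager at most once
            rm = self.rms[self._next]
            self._next = (self._next + 1) % len(self.rms)
            if self.in_flight[rm] < self.max_in_flight:
                self.in_flight[rm] += 1
                return rm
        return None  # every resource manager is at its cap: keep the job queued

    def complete(self, rm: str) -> None:
        """Called when a job finishes, freeing one slot under the throttle."""
        self.in_flight[rm] -= 1


sched = RoundRobinScheduler(["LSF", "PBS", "SGE"], max_in_flight=1)
print([sched.dispatch() for _ in range(4)])  # ['LSF', 'PBS', 'SGE', None]
sched.complete("PBS")
print(sched.dispatch())                      # PBS
```

Throttling here simply bounds how much work is outstanding at any one resource manager, which is one plausible reading of "job throttling for increased reliability".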

CSF

What is CSF?

CSF (Community Scheduler Framework)

• Not a Platform product

• Contributed the industry's first open source meta-scheduler enhancement to Globus Toolkit V3.x

• Developed with the latest version of OGSI, the grid guideline being developed with the Global Grid Forum

• Open source "meta-scheduler" framework

- Provides basic protocols and interfaces to help resources work together in heterogeneous environments

- Enables global access and maintains local control of resources

Key Benefits of OGSA Compliance

• Future-proof and protect grid investment using standards-based solutions

• Standardized approach to access Platform LSF

• Interoperate with third-party systems

Metaschedulers

Scheduler that co-ordinates communication between heterogeneous schedulers that operate at a local level

Enables global access and coordination while maintaining local control and ownership of resources

Future: possible to schedule not only workload execution but also storage, network bandwidth, etc.

CSF Grid Services

Job Service creates, monitors and controls compute jobs

Reservation Service guarantees resources are available for running a job

Queueing Service lets administrators customize and define scheduling policies at the VO level and/or at the level of the individual resource managers

RM Adaptor Service provides a Grid service interface that bridges the Grid service protocol and resource managers (LSF, PBS, SGE, Condor and other RMs)
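The RM Adaptor Service bridges grid-level requests to heterogeneous local resource managers. The sketch below illustrates that adapter pattern in Python: a grid-facing `JobService` delegates to one adapter per resource manager, and each adapter builds the native submit command (`bsub -q` for LSF, `qsub -q` for PBS and SGE). All class names are hypothetical. The real CSF services are OGSI grid services that talk to GRAM rather than building command lines, and this sketch only returns the argument vector instead of executing it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GridJob:
    executable: str  # a command for LSF, or a job script path for PBS/SGE
    queue: str


class RMAdapter(ABC):
    """Bridges a grid-level job request to one local resource manager."""

    @abstractmethod
    def submit_argv(self, job: GridJob) -> List[str]:
        """Build the native submit command for this resource manager."""


class LSFAdapter(RMAdapter):
    def submit_argv(self, job: GridJob) -> List[str]:
        return ["bsub", "-q", job.queue, job.executable]


class PBSAdapter(RMAdapter):
    def submit_argv(self, job: GridJob) -> List[str]:
        return ["qsub", "-q", job.queue, job.executable]


class SGEAdapter(RMAdapter):
    def submit_argv(self, job: GridJob) -> List[str]:
        return ["qsub", "-q", job.queue, job.executable]


class JobService:
    """Grid-facing service: one entry point, one adapter per resource manager."""

    def __init__(self, adapters: Dict[str, RMAdapter]) -> None:
        self.adapters = adapters

    def create(self, rm_name: str, job: GridJob) -> List[str]:
        # A real service would run this command and return the local job ID;
        # the sketch only returns the argument vector so it can be inspected.
        return self.adapters[rm_name].submit_argv(job)


service = JobService({"LSF": LSFAdapter(), "PBS": PBSAdapter(), "SGE": SGEAdapter()})
print(service.create("LSF", GridJob("./run.sh", "normal")))  # ['bsub', '-q', 'normal', './run.sh']
print(service.create("PBS", GridJob("job.pbs", "batch")))    # ['qsub', '-q', 'batch', 'job.pbs']
```

Adding support for another resource manager (Condor, say) would mean adding one adapter class; the grid-facing `JobService` does not change.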

CSF Architecture

[Diagram: CSF architecture. Users: Platform LSF user, Globus Toolkit user. Grid Service Hosting Environment: Job Service, Reservation Service, Queuing Service, Meta-Scheduler Global Information Service, RM Adapter. Back ends: two Platform LSF clusters, each with a meta-scheduler plugin, and RIPS; SGE and PBS (third-party workload management systems), each reached through GRAM and RIPS.]

RIPS = Resource Information Provider Services

GRAM = Grid Resource & Allocation Management

[Chart: profile from Low to High across Awareness/Knowledge, Liking/Preference/Conviction and Commitment, with Grid Canada and OMII marked]

What are the Multi-Domain Tools and What Do They Do?

Platform MultiCluster

Enables global access and coordination while maintaining local control and ownership of resources

Join geographically dispersed clusters

Production quality solution to build enterprise grids

Platform proprietary solution that is standards-based & OGSA compliant

Globus Toolkit

Tools to join geographically dispersed clusters

A bunch of “bricks” to build grids (that’s why it’s called a toolkit)

Users have to specify which cluster they would like their job to be sent to – not transparent

Open source solution

Platform adds commercial support: documentation, training, tech support, professional services

Data Grids

Data Grid Spectrum

[Figure: update frequency (no updates, periodic updates, frequent updates) plotted against sharing scope (user private, intra-project sharing, inter-project sharing)]

Example grids placed on the spectrum: GOV/EDU Grid, Life Sciences Grid, Auto Grid, HEP Grid, Aero Grid, EDA Grid

Capabilities along the spectrum: partial replication; efficient and reliable file transfer; intelligent transfer; workload-directed caching; cache-aware scheduling; data pipeline; efficient data sync; intelligent data scheduling
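Cache-aware scheduling, one of the capabilities on the spectrum, means sending a job to where its input data already resides. A minimal sketch, assuming the scheduler knows each cluster's cached file set and the size of each input file; it picks the cluster that minimizes the bytes still to be transferred. The function name, data layout and the example file names and sizes are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def cache_aware_pick(
    job_inputs: Iterable[str],
    file_sizes: Dict[str, int],
    cluster_caches: Dict[str, Set[str]],
) -> Optional[Tuple[str, int]]:
    """Pick the cluster that needs the fewest bytes moved before the job can start.

    Returns (cluster name, bytes still to move), or None if no clusters are given.
    Ties go to the cluster listed first.
    """
    inputs = list(job_inputs)
    best: Optional[Tuple[str, int]] = None
    for cluster, cached in cluster_caches.items():
        # Bytes missing on this cluster = inputs not already in its cache.
        missing = sum(file_sizes[f] for f in inputs if f not in cached)
        if best is None or missing < best[1]:
            best = (cluster, missing)
    return best


sizes = {"genome.fa": 3000, "reads.fq": 500, "ref.idx": 200}  # illustrative sizes in MB
caches = {"A": {"reads.fq"}, "B": {"genome.fa", "ref.idx"}, "C": set()}
print(cache_aware_pick(["genome.fa", "reads.fq", "ref.idx"], sizes, caches))  # ('B', 500)
```

A production data grid would weigh transfer cost against queue wait and CPU availability; this sketch deliberately looks at data movement alone.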

Data Grid Spectrum

[Figure: update frequency (no updates, periodic updates, frequent updates) plotted against sharing scope (user private, intra-project sharing, inter-project sharing)]

Technologies placed on the spectrum: GridFTP; GridFTP with Replica Catalog; FTA; DataGrid

Summary

Summary

OGSA applies to e-Science and e-Business

Rich architectural framework

Existing, emerging and planned specifications

Ultimately resulting in Open Standards

Existing, emerging and planned implementations

The Community Scheduler Framework

Standards-based

Choice of implementations

Ushers existing grids towards OGSA compliance

Spectrum of potential use cases

Thank you.