DRM/Computational Grids

Bill DeSalvo

April 14, 2004

Computational Grids

© Platform Computing Inc. 2003

Ian Foster’s Three-Point Grid Checklist

Coordinates resources

Not subject to centralized control

One or more (virtual) organizations

Geographic distribution of users/resources is common

Standard, open, general-purpose protocols and interfaces

Delivers nontrivial qualities of service

SLAs vs. policies vs. QoS

Translates business objectives into IT objectives

Enables effective utilization, resource aggregation, and remote access to specialized resources

Clusters are NOT grids!

A cluster is a local-area, logical arrangement of independent entities that collectively provide a service.

Virtual Organizations

Evolution of the Grid

Everyone’s Aware of “The Grid”

Platform Grid Competencies

Resource Leasing

Job Forwarding

Account Mapping

Grid Fairshare Scheduling

Advance Reservations

User Authentication

Reliable Data Transfer

Outgrowth of Platform’s experience in Grid and Distributed Computing

Platform MultiCluster

Three-Point Grid Checklist & Platform MultiCluster

Coordinates resources

Not subject to centralized control

‘Single’ organization (“Enterprise Grid”)

Geographic distribution of users/resources is common

Proprietary protocols and interfaces

Delivers nontrivial qualities of service

SLAs vs. policies

Common queues

Advance reservation

Resource leasing

Fairshare

SLAs

Translates business objectives into IT objectives

Enables effective utilization, resource aggregation, and remote access to specialized resources

Why MultiCluster

Global Sharing, Local Ownership (“politics of the grid”)

Providing … while maintaining …

Increased Capacity

Increased Capability

Increased Scalability

Growing Computational Needs

Local Autonomy

Dept A, Dept B, Dept C, Dept D

Job Forwarding Model

“HPC Center” Configuration

Enhanced transparency

FCFS guarantee, pending reason support, chunk jobs, host type/queue status aware scheduling, checkpoint/migration

Cluster A

HPC Center

Cluster B

Cluster C

Job Forwarding Model

Diagram: compute servers at Site A and Site B, linked by a send queue and a receive queue

You submit. We do:

• Job transfer

• Data staging

• Account mapping

• Accounting
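
The division of labor on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Platform LSF MultiCluster code: the `Job` class, the `ACCOUNT_MAP` table, the account names and the `forward` function are all hypothetical, and data staging is left out.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical mapping from a Site A account to the Site B account it runs as.
ACCOUNT_MAP = {"alice@siteA": "grid_alice@siteB"}


@dataclass
class Job:
    job_id: int
    user: str
    command: str


def forward(send_queue, receive_queue, account_map, accounting_log):
    """Move every job from the local send queue to the remote receive queue."""
    while send_queue:
        job = send_queue.popleft()
        remote_user = account_map[job.user]  # account mapping
        # Job transfer: the remote site sees the job under the mapped account.
        receive_queue.append(Job(job.job_id, remote_user, job.command))
        # Accounting: record who submitted and which account ran the job.
        accounting_log.append((job.job_id, job.user, remote_user))


send_q = deque([Job(1, "alice@siteA", "./simulate")])
recv_q = deque()
log = []
forward(send_q, recv_q, ACCOUNT_MAP, log)
print(recv_q[0].user)  # grid_alice@siteB
```

The user only touches the send queue; everything after submission happens inside `forward`, which is the point the slide makes with "You submit. We do".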

Resource Leasing Model

Accelerating Enterprise Grid Adoption

Single system image, ease of administration, scalability

Enable fairshare, preemption, pending reason support, chunk jobs, advance reservation, interactive jobs, parallel jobs, … across clusters

Common Resource Leases

Each panel plots utilization against time (t).

By Admin: Lease 528 CPUs to Site A

Scenario: Site B project completes

By Load: IF (load < threshold(X)) lease 528 CPUs to Site A, ELSE reclaim

Scenario: Site B hits an extended low-utilization period, then load goes up

By User Request: Lease based on advance reservation request

Scenario: Site B is always loaded
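
The load-driven lease rule from this slide, IF (load < threshold(X)) lease, ELSE reclaim, is small enough to write out. The sketch below is a hedged illustration only: the function name `lease_decision`, the 0.0 to 1.0 load scale, the threshold value of 0.5 and the sample load series are assumptions made for the example; only the 528-CPU figure comes from the slide.

```python
def lease_decision(load, threshold, cpus=528):
    """Return how many CPUs Site B should have on lease to Site A.

    load      -- current Site B utilization, 0.0 to 1.0 (assumed scale)
    threshold -- the X in the slide's rule; the value used below is an assumption
    """
    if load < threshold:
        return cpus  # lease (or keep leasing) the CPUs to Site A
    return 0         # reclaim them for local work


# Site B goes through a low-utilization period, then load rises again.
samples = [0.9, 0.3, 0.2, 0.25, 0.8]
history = [lease_decision(load, threshold=0.5) for load in samples]
print(history)  # [0, 528, 528, 528, 0]
```

A production policy would probably wait for the load to stay low for some time before leasing, which is what "extended" suggests, but the slide does not give that detail, so it is not modeled here.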

Advance Reservation

Nodes dedicated to User A for time duration

Reserve nodes for exclusive access for user or user group

Ensures critical work is done without interference

Useful for benchmarking or system maintenance

One-time and recurring reservation

Administrator defines reservation for users
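
A rough sketch of what exclusive, one-time or recurring reservations imply for a scheduler is given below. It is an assumption-laden illustration, not how Platform LSF implements reservations: the `Reservation` class, the `may_use` function, the node and user names, and the choice of hours as the time unit are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reservation:
    nodes: frozenset
    users: frozenset
    start: int                          # hours since an arbitrary epoch (assumed unit)
    duration: int                       # hours
    repeat_every: Optional[int] = None  # None = one-time; 24 = daily

    def active_at(self, t):
        """True if the reservation window covers time t."""
        if t < self.start:
            return False
        offset = t - self.start
        if self.repeat_every is not None:
            offset %= self.repeat_every
        return offset < self.duration


def may_use(node, user, t, reservations):
    """A node is usable unless an active reservation holds it for someone else."""
    for r in reservations:
        if node in r.nodes and r.active_at(t) and user not in r.users:
            return False
    return True


# Nodes n1 and n2 are dedicated to userA from hour 22 for 4 hours, every day.
nightly = Reservation(frozenset({"n1", "n2"}), frozenset({"userA"}),
                      start=22, duration=4, repeat_every=24)
print(may_use("n1", "userA", 23, [nightly]))  # True
print(may_use("n1", "userB", 23, [nightly]))  # False
print(may_use("n1", "userB", 30, [nightly]))  # True
print(may_use("n1", "userB", 47, [nightly]))  # False
```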

Use Cases

DoD HPCMP Grid

DoD HPCMP

Challenge

Initiative to share HPCMP’s resources easily & transparently: SMDC, TACOM, NRL, NAVO and WSMR, …

Build a meta-queuing system to integrate the centers

Primary Benefit

The capability to submit a job to a single, common queue, from which it will be sent to the best available computer in the Grid

DoD HPCMP Grid

DoD HPCMO Solution: Platform LSF MultiCluster

Requirement: Fire and Forget

Resource reservation protocol

Transparent job control

Accounting

Requirement: Full Kerberos 5 Support

Client-server interactions Kerberized

Ticket forwarding/renewal

Multi-realm support

Account mapping

Requirement: Reliable, Secure File Transfer

Platform FTA

Kerberized

Fault tolerant

DoD HPCMP Grid

NAVO: Sun E10K, 64 PEs

AEDC: Origin 2000, 64 PEs

NRL: Origin 2000, 128 PEs

TACOM/TARDEC: Onyx2, 32 PEs

RTTC: Origin 2000, 32 PEs

SMDC: Origin 2000, 64 PEs

SSCSD: HP Superdome, 44 PEs

AFFTC: Origin 3000, 64 PEs

WSMR: Origin 2000, 64 PEs

Network: DREN

GRID Challenges

Logistics / Coordination

People

User accounts

Geographic locations

Site configurations

Time zones / schedules

Network security / firewalls

Intro of batch queuing systems to environments

Training & skills transfer

SHARCNET

External Grids/Portal

SHARCNET

The network is no longer ‘passive plumbing’

True resource that can be managed in real time – with guaranteed QoS

Potential projects

-based resource leasing, advance reservation

IP-based topology awareness

Enables new classes of Grid applications

Operational results

Real-time, remote visualization

Virtual storage

Persistent/pervasive

On demand

The Globus Toolkit V2

Sharing pains… physical login

Diagram: compute servers at Site A and Site B

You have to

• Get and maintain multiple accounts

• Use different batch systems

• Work without consolidated accounting

• Move files manually

The Globus Toolkit™ Version 2 (GT2)

A software toolkit that addresses key technical problems in the development of Grid-enabled tools, services, and applications

Offers a modular “bag of technologies”

Enables incremental development of grid-enabled tools and applications

Implements standard Grid protocols and APIs

Made available under liberal Open Source license

Provided by The Globus Alliance

http://www.globus.org

Globus Toolkit: Evaluation (+)

Good technical solutions for key problems, e.g.

Authentication and authorization

Resource discovery and monitoring

Reliable remote service invocation

High-performance remote data access

This & good engineering is enabling progress

Good quality reference implementation, multi-language support, interfaces to many systems, large user base, industrial support

Growing community code base built on tools

Globus Toolkit: Evaluation (-)

Protocol deficiencies, e.g.

Heterogeneous basis: HTTP, LDAP, FTP

No standard means of invocation, notification, error propagation, authorization, termination, …

Significant missing functionality, e.g.

Databases, sensors, instruments, workflow, …

Virtualization of end systems (hosting envs.)

Little work on total system properties, e.g.

Dependability, end-to-end QoS, …

Reasoning about system properties

Scalability

LSF MC & Globus

MC: Transparent, dynamic, intelligent, scalable inter-cluster sharing

User does not need to know about clusters: total transparency

MC dynamically chooses the “best cluster” to run the job

Globus: User chooses which cluster to submit job to via Globus interface

Static, non-intelligent sharing

Lacks transparency

Diagram: Cluster A, Cluster B and Cluster C, linked by inter-cluster protocols
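
To make the contrast concrete, here is a minimal sketch of dynamic cluster selection. The slide does not say how MultiCluster defines the "best cluster", so the criterion used here (the cluster with the most free slots that can still fit the job) is an assumption, as are the function name `pick_best_cluster` and the slot counts.

```python
def pick_best_cluster(free_slots, slots_needed):
    """Return the cluster with the most free slots that can fit the job, or None."""
    candidates = {name: free for name, free in free_slots.items()
                  if free >= slots_needed}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical snapshot of free job slots per cluster.
free_slots = {"Cluster A": 4, "Cluster B": 32, "Cluster C": 16}

# Dynamic: the scheduler decides, the user never names a cluster.
print(pick_best_cluster(free_slots, slots_needed=8))  # Cluster B

# Static: the user names a cluster up front, whatever its current load.
user_choice = "Cluster A"
print(free_slots[user_choice] >= 8)  # False: the job would have to wait
```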

Globus Toolkit 3 (OGSA)

Every product an island unto itself

Prelude to OGSA: An Analogy

Prelude to OGSA: An Analogy

Differentiated products, integrated stack

Open Grid Services Architecture (OGSA)

Next-generation architecture

Consequence of technology refresh (i.e., refactoring the Globus Toolkit) and research into Autonomic Computing

Convergence of Grid Computing and Web Services

Globus Toolkit

Access services – e.g., CLIs, GUIs, portals and CoGs

Resource and allocation management

Monitoring and discovery services – e.g., sensing and indexing

Data management services – e.g., file transfer, replica management, etc.

Security – e.g., the Grid Security Infrastructure

Initially SOAP, WSDL and WS-Inspection

The Global Grid Forum (GGF) serves as the standards authority

Two layers

Core Grid platform – OGSA platform interfaces and models

Core Grid infrastructure – Open Grid Services Infrastructure (OGSI)

http://www.gridforum.org

http://www.globus.org/ogsa

Importance of OGSA to Customers

Grid-enabled Web Services transforming IT

Analyst feedback (e.g., Gartner)

Customer experience

Customers demand standards-compliant products, solutions and services – why?

Vendors guilty of over-promising and under-delivering

Avoid single-vendor lock-in

Proprietary implementations based on open standards

Seek multi-vendor deliverables

Framework for partner collaboration

Demanding professionalism in software engineering

Seek to be engaged in the process

Platform Embraces Open Standards

Platform developing software for over 11 years

Standards efforts are recent activities

Existing implementations are proprietary

Platform is an NPi founder

NPi merged with GGF (4/02)

NPi being leveraged in OGSA

Platform committed to open standards

Proprietary implementations based on open standards

Platform experienced in Open Source arena

Offering Linux solutions for over 6 years

Offering Globus Toolkit solutions for about 2 years

Source-code available for components of Platform LSF

Platform and Globus

Platform Globus Toolkit

CSF Plus Advanced CSF-based metascheduler

Job persistence; enhanced scalability (6x GT 3); Cluster load balancing and host type matching (LSF only)

Globus Toolkit 3

Community Scheduler Framework (CSF)

Round robin job scheduling; Advance reservation booking, query, & control; Reservation based scheduling; Job throttling for increased reliability

Connectors for 3rd party workload management systems (e.g., SGE, PBS)

Native command line interface support
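
Round robin scheduling and job throttling, both named in the CSF feature list above, can be illustrated together. The sketch below is not CSF code: the function `round_robin_dispatch` is hypothetical, and reading "throttling" as a per-resource-manager cap on dispatched jobs is an assumption made for the example.

```python
from itertools import cycle


def round_robin_dispatch(jobs, resource_managers, max_per_rm):
    """Assign jobs to resource managers in turn, capping each one's share."""
    assignments = {rm: [] for rm in resource_managers}
    held = []  # jobs held back because every resource manager is at its cap
    rm_cycle = cycle(resource_managers)
    for job in jobs:
        # Try each resource manager at most once, starting where we left off.
        for _ in range(len(resource_managers)):
            rm = next(rm_cycle)
            if len(assignments[rm]) < max_per_rm:
                assignments[rm].append(job)
                break
        else:
            held.append(job)
    return assignments, held


assignments, held = round_robin_dispatch(
    ["j1", "j2", "j3", "j4", "j5"], ["LSF", "SGE", "PBS"], max_per_rm=1)
print(assignments)  # {'LSF': ['j1'], 'SGE': ['j2'], 'PBS': ['j3']}
print(held)         # ['j4', 'j5']
```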

Platform Globus Toolkit

One step installation

Open Source

Platform Enhancements

CSF

What is CSF?

CSF (Community Scheduler Framework)

• Not a Platform product

• The industry’s 1st open source meta-scheduler, contributed as an enhancement to Globus Toolkit V3.X

• Developed with the latest version of OGSI – grid guideline being developed with Global Grid Forum

• Open source “meta-scheduler” framework

- Provides basic protocols and interfaces to help resources work together in heterogeneous environments

- Enables global access and maintains local control of resources

Key Benefits of OGSA Compliance

• Future-proof & protect grid investment using standards-based solutions

• Standardized approach to access Platform LSF

• Interoperate with 3rd party systems

Metaschedulers

Scheduler that co-ordinates communication between heterogeneous schedulers that operate at a local level

Enables global access and coordination while maintaining local control and ownership of resources

Future – possible to schedule not only workload execution but also storage, network bandwidth, etc.

CSF Grid Services

Job Service creates, monitors and controls compute jobs

Reservation Service guarantees resources are available for running a job

Queueing Service provides a service where administrators can customize and define scheduling policies at the VO level and/or at the different resource manager level

RM Adaptor Service provides a Grid service interface that bridges the Grid service protocol and resource managers (LSF, PBS, SGE, Condor and other RMs)
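
The relationship between the Job Service and the RM Adaptor Service is essentially the adapter pattern, and a small sketch may help. Nothing below is the CSF API: `RMAdapter`, `FakeLSFAdapter`, `FakePBSAdapter` and `JobService` are hypothetical names, and the fake adapters only generate job IDs rather than calling any real resource manager.

```python
from abc import ABC, abstractmethod


class RMAdapter(ABC):
    """Common interface the meta-scheduler talks to (hypothetical)."""

    @abstractmethod
    def submit(self, command: str) -> str:
        """Submit a job and return a resource-manager-specific job ID."""


class FakeLSFAdapter(RMAdapter):
    """Stand-in for an LSF bridge; it does not invoke LSF."""

    def __init__(self):
        self._next = 100

    def submit(self, command):
        self._next += 1
        return f"lsf-{self._next}"


class FakePBSAdapter(RMAdapter):
    """Stand-in for a PBS bridge; it does not invoke PBS."""

    def __init__(self):
        self._next = 0

    def submit(self, command):
        self._next += 1
        return f"pbs-{self._next}"


class JobService:
    """Creates and tracks jobs through whichever adapter is requested."""

    def __init__(self, adapters):
        self.adapters = adapters
        self.jobs = {}

    def create(self, rm_name, command):
        job_id = self.adapters[rm_name].submit(command)
        self.jobs[job_id] = "SUBMITTED"
        return job_id

    def status(self, job_id):
        return self.jobs[job_id]


service = JobService({"lsf": FakeLSFAdapter(), "pbs": FakePBSAdapter()})
first = service.create("lsf", "hostname")
second = service.create("pbs", "hostname")
print(first, second, service.status(first))  # lsf-101 pbs-1 SUBMITTED
```

The caller uses the same `create` call for every resource manager; only the adapter behind it differs, which is the bridging role the slide assigns to the RM Adaptor Service.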

CSF Architecture

Diagram components: Platform LSF User; Globus Toolkit User; LSF Meta-scheduler Plugins; Grid Service Hosting Environment; Job Service; Reservation Service; Queuing Service; Meta-Scheduler; Global Information Service; RM Adapter; RIPS; GRAM SGE RIPS; GRAM PBS RIPS; Platform LSF; Third Party Workload Management Systems

RIPS = Resource Information Provider Services

GRAM = Grid Resource & Allocation Management

Chart: Profile (High to Low) against Awareness/Knowledge, Liking/Preference/Conviction and Commitment; labels include Grid Canada and OMII

What are the Multi-Domain Tools and What Do They Do?

Platform MultiCluster

Enables global access and coordination while maintaining local control and ownership of resources

Join geographically dispersed clusters

Production quality solution to build enterprise grids

Platform proprietary solution that is standards-based & OGSA compliant

Globus Toolkit

Tools to join geographically dispersed clusters

A bunch of “bricks” to build grids (that’s why it’s called a toolkit)

Users have to specify which cluster they would like their job to be sent to – not transparent

Open source solution

Platform adds commercial support: documentation, training, tech support, professional services

Data Grids

Data Grid Spectrum

Figure: grids placed by update frequency (No Updates, Periodic Updates, Frequent Updates) and sharing scope (User private, Intra-project sharing, Inter-project sharing)

Grids shown: GOV/EDU Grid, Life Sciences Grid, Auto Grid, HEP Grid, Aero Grid, EDA Grid

Capabilities shown: partial replication; efficient & reliable file transfer; intelligent transfer; workload-directed caching; cache-aware scheduling; data pipeline; efficient data sync; intelligent data scheduling

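Data Grid Spectrum

Figure: the same axes, update frequency (No Updates, Periodic Updates, Frequent Updates) and sharing scope (User private, Intra-project sharing, Inter-project sharing), with technologies placed on them

Technologies shown: GridFTP; GridFTP with Replica Catalog; FTA; DataGrid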

Summary

Summary

OGSA applies to e-Science and e-Business

Rich architectural framework

Existing, emerging and planned specifications

Ultimately resulting in Open Standards

Existing, emerging and planned implementations

The Community Scheduler Framework

Standards-based

Choice of implementations

Ushers existing grids towards OGSA compliance

Spectrum of potential use cases

Thank you.
