CS 591x Grid Computing

Upload: cynthia-miller

Post on 27-Dec-2015



Observe that -

Today’s processors are tremendously powerful, even compared to a few years ago
Millions of computers in the world
Most are not busy at any one time

…Observe that -

Large percentage of computers are interconnected via the Internet
Networking technology has made tremendous progress
Millions of computers have access to relatively high performance networking
Networking performance progressing rapidly
  Internet-2
  Lambda Rail – DWDM 10 Gs/fiber

…Observe that -

Large number of computing problems have become increasingly complex
Computational demands of computing programs have outstripped the computational capability of any one computer
Yet, world-wide there appears to be a surplus of computational capacity (idle machines)

Recall that…

Clusters came about by tying together a group of desktop computers…
… to harness the computational power of these computers as a collective whole…
… physically in one place…
… with a single common interconnect…

But what if…

Grid Computing

Why not tie computational resources (desktop computers, supercomputers, etc.) together…
… and harness their collective computational power.
… thus Grid Computing

Grid Computing

“A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” (Foster and Kesselman, 1998)

A Grid is…

… A collection of computational processing elements…
… possibly organized dynamically…
… utilizing relatively high performance networking…
… to provide computational resources beyond those normally available

Grid Computing

Primarily accomplished through middleware: software layers that tie discrete computers together into a grid
Must be based on standards – why?
*** participating elements are administratively autonomous ***

The Virtual Organization

Important concept in grid computing
Enabled by and part of a grid
Dynamically “convening” expertise around a problem
Dynamically “constructing” resources to support the approach to a problem
May go away when problem is solved or project is completed

Middleware Issues

Security
  transaction/data security
  authentication
Resource Management
  authorization
  resource allocation
Information services
  resource monitoring
  job monitoring
Data Management
  data access
  data caching

Grids -- Come in many “flavors”

Cluster of clusters, grids of high performance systems
  well known, stable resources
  under administrative management
Dynamic grids
  Cycle “stealing”
  not so stable resources
  not always well known
  little or no communications among processes - sometimes

Standards

OGSA – Open Grid Services Architecture
OGSI – Open Grid Services Infrastructure
  Infrastructure around which OGSA is built
  Core grid service specification
On-going development through the Global Grid Forum
  www.ggf.org

Globus

Implementation of OGSA/OGSI
Middleware for deploying a grid

TeraGrid

from: http://www.teragrid.org/userinfo/guide_hardware_table.html

TeraGrid

Extensible Terascale Computational Facility
Ties together HPCs from major national supercomputing centers in the U.S.
Massive computational resources
Well known, controlled computing environment
see http://www.teragrid.org

The Sabre Grid

Overall managed by PSC
Composed of clusters from PSC…
… and WVU (Energy)…
… and the Department of Energy (NETL)…
… and a Condor flock
Early stages

Einstein@home

Cycle stealing
Searches astronomy data for gravitational-wave sources such as pulsars
Runs as a screen saver – when computer is not used
Berkeley Open Infrastructure for Network Computing – BOINC
BOINC – “An open-source software platform for computing using volunteered resources.” from: http://boinc.berkeley.edu/
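
The cycle-stealing idea above can be sketched in a few lines. This is only an illustration, not the BOINC client: the `is_idle` and `compute` callables and the `max_polls` limit are hypothetical names chosen for the example, and a real client would check keyboard/mouse activity or system load instead of a canned pattern.

```python
from collections import deque
from typing import Callable, Deque, List


def run_when_idle(
    work_units: Deque[int],
    is_idle: Callable[[], bool],
    compute: Callable[[int], int],
    max_polls: int,
) -> List[int]:
    """Process queued work units only while the host reports itself idle.

    All names here are hypothetical; this is not the BOINC client API.
    """
    results: List[int] = []
    for _ in range(max_polls):
        if not work_units:
            break  # nothing left to compute
        if is_idle():
            # The owner is away: "steal" the otherwise wasted cycles.
            results.append(compute(work_units.popleft()))
        # Otherwise the owner is using the machine, so stay out of the way.
    return results


# Simulated idle pattern: idle, busy, idle, idle.
pattern = iter([True, False, True, True])
done = run_when_idle(deque([1, 2, 3]), lambda: next(pattern), lambda x: x * x, max_polls=4)
print(done)  # [1, 4, 9]
```

The busy poll costs the volunteer nothing: the work unit simply waits in the queue until the machine is idle again.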

Other BOINC based projects

SETI@home – search for extraterrestrial intelligence
Climateprediction.net – study climate change
Predictor@home – investigate protein related diseases

Global Grid Exchange

Uses central server
Deploys tasks to “common” computers
  from a large pool of available computers
  potentially massive pool of computers
Primarily Java based
No inter-task communications
Has process fail-over capability

Global Grid Exchange

Operated by the WV High Technology Consortium Foundation
Potentially thousands of computers
Can run non-Java code
  requires special “intervention” to bypass security
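
The central-server model with fail-over described on the two slides above can be sketched as follows. This is a toy illustration in Python, not Global Grid Exchange code: `farm_tasks`, `flaky`, and `steady` are hypothetical names, workers are plain functions standing in for remote machines, and task values are assumed to be unique identifiers.

```python
from typing import Callable, Dict, List


def farm_tasks(
    tasks: List[int],
    workers: List[Callable[[int], int]],
    max_attempts: int = 3,
) -> Dict[int, int]:
    """Central-server loop: hand each independent task to a worker and
    re-dispatch it to the next worker if that one fails (fail-over)."""
    results: Dict[int, int] = {}
    for index, task in enumerate(tasks):
        for attempt in range(max_attempts):
            worker = workers[(index + attempt) % len(workers)]
            try:
                results[task] = worker(task)
                break  # task finished; no other task depends on it
            except RuntimeError:
                continue  # worker dropped out; try another machine
        else:
            raise RuntimeError(f"task {task} failed on {max_attempts} attempts")
    return results


def flaky(task: int) -> int:
    """Stand-in for a pool machine that has gone away."""
    raise RuntimeError("machine left the pool")


def steady(task: int) -> int:
    """Stand-in for a healthy pool machine."""
    return task * 2


print(farm_tasks([1, 2, 3], [flaky, steady]))  # {1: 2, 2: 4, 3: 6}
```

Because the tasks never talk to each other, re-running one task on another machine is safe, which is what makes fail-over simple in this model.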

Condor

Developed and maintained by the University of Wisconsin – Madison
Originally – a cycle-stealing approach to gathering high performance computational resources
Can function like a cluster
  or like a grid (flocking)
  … can be part of a Globus based grid (Condor-G)
Supports message passing

Others

United Devices
Unicore

Grid Computing

further thoughts

Types of Grids

Desktop Grids
  collections of computers
  office grids
  volunteer compute elements
  Can be heterogeneous
  Unreliable

Types of Grids

Cluster Grids
  Cluster of Clusters
  Single system image
  “completely compiled” code
  Stable resources
  Known environment
  Sabre

Types of Grids

HPC Grids
  Grid of “Big Iron” supercomputers
  Very high performance
  Stable platform
  reliable
  known environment
  not so many organizational/human issues
  TeraGrid

Types of Grids

Data Grids
  access to distributed data resource
  global and local resource management
  common access protocol
  resources can be very large
  National Virtual Observatory

Requirements for a Grid

Interface
  should provide the user community with a familiar, understandable interface
  command-line command (like qsub) and tools the user community is familiar with
Job Scheduling
  Should be done in a manner similar to other parallel paradigms
  Known queuing algorithms
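
One example of a “known queuing algorithm” is priority scheduling with first-come-first-served ordering inside each priority level. The sketch below is an assumption-laden illustration, not the scheduler of any particular grid: `JobQueue`, `submit`, and `next_job` are hypothetical names, and the "lower number runs sooner" convention is chosen only for the example.

```python
import heapq
import itertools
from typing import List, Tuple


class JobQueue:
    """Priority queue with first-come-first-served order within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # submission order breaks ties

    def submit(self, name: str, priority: int = 0) -> None:
        # Lower number = runs sooner (an assumption of this sketch).
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_job(self) -> str:
        # Pop the job with the smallest (priority, submission order) pair.
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = JobQueue()
queue.submit("render")
queue.submit("urgent-fix", priority=-1)
queue.submit("simulate")
order = [queue.next_job() for _ in range(len(queue))]
print(order)  # ['urgent-fix', 'render', 'simulate']
```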

Requirements for a Grid

Data Management
  Access to data by distributed processes
  Grid global file system: does not scale beyond a point
  Staging/Caching data
  Consistent namespace
Remote Execution Environment
  User should have control of the execution environment
  environment variables/parameters
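
Staging and caching under a consistent namespace can be illustrated with a small sketch. Everything here is hypothetical: `StagingCache`, the `fetch_remote` transfer function, and the `grid://` logical names are invented for the example and do not correspond to a specific grid data service.

```python
from typing import Callable, Dict, List


class StagingCache:
    """Stage remote data to the local node once, then serve it from the cache."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote    # hypothetical wide-area transfer
        self._local: Dict[str, bytes] = {}   # logical name -> staged copy
        self.transfers: List[str] = []       # record of wide-area transfers

    def read(self, logical_name: str) -> bytes:
        # Every process uses the same logical name, wherever the data lives.
        if logical_name not in self._local:
            self.transfers.append(logical_name)
            self._local[logical_name] = self._fetch_remote(logical_name)
        return self._local[logical_name]


cache = StagingCache(lambda name: name.upper().encode())
first = cache.read("grid://survey/catalog")
second = cache.read("grid://survey/catalog")
print(len(cache.transfers))  # 1: the second read is served locally
```

The point of the design is that only the first read crosses the wide-area network; later reads by processes on the same node hit the staged copy.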

Grid Requirements

Security
  Authentication – positively identify users, devices, other resources
  Confidentiality – information is not disclosed to unauthorized people, systems, …
  Data integrity – data not modified accidentally, maliciously
  Non-repudiation – trusted confirmation – “return receipt”
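
As a small illustration of the data integrity requirement above, a keyed hash (HMAC) lets a receiver detect whether a result was modified in transit. This is a sketch under the assumption that both parties already share a secret key; `sign` and `verify` are hypothetical helper names. It does not provide confidentiality, and because the key is shared it does not provide non-repudiation either, which would need digital signatures.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> str:
    """Return an HMAC-SHA256 tag for the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, tag: str) -> bool:
    """Check the tag in constant time to avoid leaking timing information."""
    return hmac.compare_digest(sign(key, payload), tag)


key = b"shared-secret-for-this-example"
result = b"work unit 42: energy = -7.31"
tag = sign(key, result)

print(verify(key, result, tag))                           # True
print(verify(key, b"work unit 42: energy = -9.99", tag))  # False
```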

Grid Requirements

Gang Scheduling
  process/thread scheduling must be managed grid wide
  all processes/threads must start/stop at the same time
  if a process/thread fails, grid must manage the entire job
    stop job, restart job
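
The all-or-nothing behavior of gang scheduling can be sketched as below. This is a simplified model, not a real scheduler: `run_gang`, `free_slots`, and `max_restarts` are hypothetical names, and the members run one after another here only to keep the example deterministic, whereas a real grid would start them concurrently.

```python
from typing import Callable, List


def run_gang(
    members: List[Callable[[int], None]],
    free_slots: int,
    max_restarts: int = 2,
) -> str:
    """Start every member of a job together or not at all; if any member
    fails, treat the whole job as failed and restart all members."""
    if free_slots < len(members):
        return "queued"  # never start a partial gang
    for attempt in range(max_restarts + 1):
        try:
            for member in members:
                member(attempt)
            return "completed"
        except RuntimeError:
            continue  # stop the whole job, then restart every member
    return "failed"


def healthy(attempt: int) -> None:
    """Member process that always succeeds."""


def fails_once(attempt: int) -> None:
    """Member process whose node is lost on the first attempt only."""
    if attempt == 0:
        raise RuntimeError("node lost")


print(run_gang([healthy, fails_once], free_slots=2))  # completed (after one restart)
print(run_gang([healthy, healthy], free_slots=1))     # queued
```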

Grid Requirements

Checkpointing and Job Migration
  Fault-tolerance – Failure recovery
  Load balancing
  Checkpointing – automatic, user-induced, none
Management
  tools to manage grid as a system
  must respect rights, autonomy, authority of components
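
Checkpointing for failure recovery, as listed above, can be illustrated with a toy job that saves its state periodically and resumes from the last saved state after a failure. The function names, the JSON file format, and the `fail_at` parameter that simulates a failure are all assumptions made for this sketch; production systems checkpoint whole process images or application-defined state.

```python
import json
import os
import tempfile
from typing import Optional


def run_with_checkpoints(path: str, total_steps: int, every: int,
                         fail_at: Optional[int] = None) -> int:
    """Sum 0..total_steps-1, saving progress to `path` every `every` steps."""
    state = {"step": 0, "acc": 0}
    if os.path.exists(path):
        with open(path) as fh:
            state = json.load(fh)  # resume from the last checkpoint
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % every == 0:
            with open(path, "w") as fh:
                json.dump(state, fh)  # automatic checkpoint
    return state["acc"]


def demo() -> int:
    """Fail once mid-run, then resume from the saved checkpoint."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "job.ckpt")
        try:
            run_with_checkpoints(path, total_steps=10, every=5, fail_at=7)
        except RuntimeError:
            pass  # the "node" died after the step-5 checkpoint was written
        return run_with_checkpoints(path, total_steps=10, every=5)


print(demo())  # 45, the same answer as an uninterrupted run
```

The restarted run repeats only steps 5 and 6, the work done since the last checkpoint. The same saved state is what would be shipped to another machine for job migration.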

Some Barriers

Resource Sharing
  call for sharing corporate resources
  things that have cost to companies/organizations
System Integrity
  once someone has code running on your computer…?
Data Integrity
  confidence in results – are they correct
  architecture
  software environment
  tampering

Some Barriers

Availability
  Critical Grid App vs. Critical Corporate App
  who gets priority
  how to assert that priority
Ownership
  who owns the discovery if it was discovered on my computer
  Intellectual Property – does the U of X own a piece of my work
Licensing
  calls for new licensing models (no named seats)

Some Barriers

Culpability/Liability
  if it’s wrong – who’s to blame
Propriety
  Commercial code running on a state-owned computer
  inappropriate code