application classification - web browsers are replacing command-line tools enabling users to...

Download Application Classification - Web browsers are replacing command-line tools enabling users to remotely

Post on 24-Sep-2020

0 views

Category:

Documents

0 download

Embed Size (px)

TRANSCRIPT

  • CS693 – Grid Computing Unit - V

    MTech CSE (PT, 2011-14) SRM, Ramapuram 1 hcr:innovationcse@gg

    UNIT – V APPLICATIONS, SERVICES AND

    ENVIRONMENTS Application integration – application classification – Grid requirements – Integrating applications with

    Middleware platforms – Grid enabling Network services – managing Grid environments – Managing Grids –

    Management reporting – Monitoring – Data catalogs and replica management – portals – Different application

    areas of Grid computing

    Application Classification  The dimensions relevant to enabling the application to run on a grid are

    o parallelism

    o granularity

    o communications

    o dependency

     affects the way in which it can be migrated to a grid environment

     not mutually exclusive—a specific application can be characterized along most, of these dimensions.

    Parallelism

     classification scheme attributed to Flynn

     has a significant impact on how the application is integrated to run on a grid.

    Single Program, Single Data (SPSD)

     simple sequential programs that take a single input set and generate a single output set

    Motivations

    1) A vast number of computing resources are immediately available

     grid is used as a throughput engine when many instances of this single program need to be executed.

     Leveraging resources outside the local domain can vastly improve the throughput of these jobs.

    Example: Logic simulation of a microprocessor design

     The design needs to be verified by running millions of test cases against the design.

     By leveraging grid resources, each test case can be run on a different remote machine, thereby improving

    the overall throughput of the simulation.

    2) Remotely available shared data can be used

     program can be made to execute where the data reside or the data can be accessed via a “data grid.”

     grid environment enables controlled access to shared data.

     It is simple to integrate these applications to run on a grid, as no additional code development is required.

    Single Program, Multiple Data (SPMD)

     the input data can be partitioned and processed concurrently using the same program.

     comprises the majority of applications that utilize the grid today and covers a wide range of domains.

    Examples

     Finite Element Method evaluations using an MPI-based program

     Large-scale Internet applications such as SETI@home

     UD-Cancer project

    Motivation

     to significantly improve performance and/or scope by scaling the application out to as many resources on

    the grid as possible

  • CS693 – Grid Computing Unit - V

    MTech CSE (PT, 2011-14) SRM, Ramapuram 2 hcr:innovationcse@gg

    Multiple Program Multiple Data (MIMD)

     broadest category of parallel/distributed applications

     both the program and the data can be partitioned and processed concurrently.

    Motivation

     to improve performance

     to ensure that resources are more optimally matched to application needs

    Example

    there might be partitions of an application that require a very tightly-coupled SMP system while another partition

    could be run in a more loosely-coupled environment or on a single machine.

    Multiple Program Single Data (MPSD)

     require different transformations to be applied to the same set of input data

     can be carried out concurrently

     very rare

    Communications

     very dependent on the performance of the underlying grid network infrastructure.

    The communications cost associated with an application is characterized by the following:

     The cost of initial movement of data and programs to grid resources prior to the computation.

    o This cost can be avoided on the grid in cases where the data or program is already available at the

    remote grid resource.

     The cost of communication while the application execution executes.

    o This cost is a function of the frequency and size of communication.

    Granularity

     based on the granularity of their programs as

    o coarse-grained

    o fine-grained

     specifies how long a program can execute before it needs to communicate with other programs

     will dictate the amount of overhead associated with running the application remotely on the grid.

    Dependency

     programs within an application have no dependencies between them and each program can be scheduled

    and executed independently of the others.

     the application would be a good candidate for deployment on the grid

    Grid requirements 1. Interfaces

    2. Job Scheduling

    Interfaces

     Users and applications have accessed grids using simple command-line tools and programming APIs

     Command line tools such as ‘qsub’ and ‘qstat’ offer ways for users to interactively submit and query the

    status of their jobs.

    o These tools can be further extended through wrapper scripts

    o hide the job creation and management complexities from the user.

     programming APIs provide a way for developers to embed job submission and management functions into

    an application-specific wrapper program.

     users execute an application-specific wrapper script or program and input the data required for the job.

  • CS693 – Grid Computing Unit - V

    MTech CSE (PT, 2011-14) SRM, Ramapuram 3 hcr:innovationcse@gg

     usually limited to running SPSD-type applications

     more sophisticated interfaces that allow the creation of single- and multi-dimensional job arrays to support

    SPMD-, MPMD-, and MPSD-type applications

     Web browsers are replacing command-line tools

     enabling users to remotely access the grid from anywhere and at any time.

     The use of a Web Services description language (WSDL) to describe jobs, applications, and data in

    conjunction with Web Services provides a very powerful programmatic interface to the grid.

     Web Services provides a language-independent interface for programmers to develop complex wrappers

    using their favorite programming language.

     Web Services is a well-defined standard, grids based on it will easily integrate into existing environments

     The Open Grid Services Architecture (OGSA) is in the process of defining a set of Web Services

    Description Language (WSDL) interfaces for creating, managing, and securely accessing large

    computational grids.

    Job Scheduling

     To provide transparent and efficient access to remote and geographically distributed resources.

     A scheduling service is necessary to coordinate access to the different resources such as network, data,

    storage, software, and computing elements that are available on the grid.

     the heterogeneous and dynamic nature of the grid makes scheduling complicated.

     Schedulers need to automate the three essential tasks viz

    o resource discovery,

    o system selection

    o job execution.

     At a very basic level, jobs, once submitted into the grid, are

    o queued by the scheduler

    o dispatched to compute nodes as they become available.

     grid schedulers are typically required to dispatch jobs based on a well-defined algorithm such as

    o First-Come-First-Serve (FCFS),

    o Shortest-Job-First (SJF),

    o Round-Robin (RR), or

    o Least-Recently-Serviced (LRS).

     Schedulers have to support advanced features such as:

    o User requested job priority

    o Allocation of resources to users based on percentages

    o Concurrency limit on the numbers jobs a user is allowed to run

    o User specifiable resource requirements

    o Advanced reservation of resources

    o Resource usage limits enforced by administrators

    Data Management

    Data processed by jobs are typically

     sourced at submission time

     obtained by the compute node from a shared file system (NFS, AFS, DFS) at execution time.

    o the location of the data is input at submission

    o each node is required to have access to the specified location

     work well when the input data size is small or when all nodes in the grid have access to a global shared file

    system.

     Typical data sizes in both commercial and R&D are in gigabytes

     Several hardware and software solutions are available

    o such as distributed data caching, replication, uniform namespace

     virtualize access to data and provide a seamless way to store and retrieve data.

  • CS693 – Grid Computing Unit - V

    MTech CSE (PT, 2011-14) SRM, Ramapuram 4 hcr:innovationcse@gg

    Remote Execution Environment

     Jobs executing on a grid require the same environment as they would if executed on submitters’ machine

     These include environment variables, runtime environments such as Java, Small Talk, or .NET CLR, and

    libraries such as C language runtime and operating system libraries.

     This total environment may be already available on the compute node or may have to be created by the job

    on node prior to execut