
Page 1: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Computing:Model, Architecture, Infrastructure

Manish ParasharThe Applied Software Systems Laboratory

Rutgers, The State University of New Jerseyhttp://automate.rutgers.edu

Ack: NSF (CAREER, KDI, ITR, NGS), DoE (ASCI)

UPP – Autonomic ComputingMt. St. Michel, France, September 15 – 17, 2004

Page 2: Autonomic Computing: Model, Architecture, Infrastructure

Unprecedented Complexity, Uncertainty …

• Very large scales – millions of entities
• Ad hoc (amorphous) structures/behaviors – p2p/hierarchical architectures
• Dynamic – entities join, leave, move, change behavior
• Heterogeneous – capability, connectivity, reliability, guarantees, QoS
• Unreliable – components, communication
• Lack of common/complete knowledge – number, type, location, availability, connectivity, protocols, semantics, etc.

Page 3: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Computing

• Our system programming paradigms, methods, and management tools seem inadequate for handling the scale, complexity, dynamism, and heterogeneity of emerging systems
– requirements and objectives are dynamic and not known a priori
– requirements, objectives, and solutions (algorithms, behaviors, interactions, etc.) depend on state, context, and content
• Nature has evolved to cope with scale, complexity, heterogeneity, dynamism, unpredictability, and lack of guarantees
– self-configuring, self-adapting, self-optimizing, self-healing, self-protecting, highly decentralized, heterogeneous architectures that work!
• The goal of autonomic computing is to build self-managing systems that address these challenges using high-level policies

Page 4: Autonomic Computing: Model, Architecture, Infrastructure

Ashby’s Ultrastable System Model of the Human Autonomic Nervous System

[Diagram: Ashby's ultrastable system – the environment is coupled to the reacting part R through sensor and motor channels; essential variables monitor the interaction and drive step mechanisms (input parameter S) that reconfigure R.]

Page 5: Autonomic Computing: Model, Architecture, Infrastructure

Programming Distributed Systems

• A distributed system is a collection of logically or physically disjoint entities that have established a process for making collective decisions.

if (Decision(CurrentState,Request)) then TransitionState(CurrentState,Request)

– Central/Distributed Decision & Transition
– Programming System
  • programming model, languages/abstraction – syntax + semantics
    – entities, operations, rules of composition, models of coordination/communication
  • abstract machine, execution context and assumptions
  • infrastructure, middleware and runtime
– Conceptual and Implementation Models
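Below is a minimal C++ sketch of the Decision/TransitionState pattern above (C++ is one of the implementation languages named later in the talk). The Entity, Request, and State types and the sample policy are illustrative assumptions, not part of AutoMate or Accord, and a single entity's local decision stands in for what would really be a collective decision process.

```cpp
// Illustrative sketch only: an entity applies the slide's
// "if Decision(state, request) then TransitionState(state, request)" pattern on each request.
#include <functional>
#include <iostream>
#include <string>

struct Request { std::string op; int value; };
struct State   { int load; bool degraded; };

using DecisionFn   = std::function<bool(const State&, const Request&)>;
using TransitionFn = std::function<void(State&, const Request&)>;

class Entity {
public:
    Entity(DecisionFn d, TransitionFn t) : decide_(std::move(d)), transition_(std::move(t)) {}

    // Apply the state transition only if the decision predicate accepts the request.
    void handle(Request r) {
        if (decide_(state_, r)) transition_(state_, r);
    }
    const State& state() const { return state_; }
private:
    State state_{0, false};
    DecisionFn decide_;
    TransitionFn transition_;
};

int main() {
    // Example policy: accept "add_load" requests only while the entity is not degraded.
    Entity e(
        [](const State& s, const Request& r) { return r.op == "add_load" && !s.degraded; },
        [](State& s, const Request& r)       { s.load += r.value; s.degraded = s.load > 100; });

    e.handle({"add_load", 60});
    e.handle({"add_load", 60});   // pushes the entity into a degraded state
    e.handle({"add_load", 10});   // rejected by the decision predicate
    std::cout << "load=" << e.state().load << " degraded=" << e.state().degraded << "\n";
}
```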

Page 6: Autonomic Computing: Model, Architecture, Infrastructure

UPP 2004 – Autonomic Computing

• Objective: Investigate conceptual and implementation models for Autonomic Computing
– Models, Architectures and Infrastructures for Autonomic Computing • Manish Parashar et al.
– Grassroots Approach to Self-Management in Large-Scale Distributed Systems • Ozalp Babaoglu et al.
– Autonomic Runtime System for Large Scale Applications • Salim Hariri et al.

Page 7: Autonomic Computing: Model, Architecture, Infrastructure

Outline

• Programming emerging distributed systems
• Project AutoMate and the Accord programming system
• Sample applications in science and engineering
• Conclusion

Page 8: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Computing Architecture

• Autonomic elements (components/services)
– Responsible for policy-driven self-management of individual components
• Relationships among autonomic elements
– Based on agreements established/maintained by autonomic elements
– Governed by policies
– Give rise to resiliency, robustness, and self-management of the system

Page 9: Autonomic Computing: Model, Architecture, Infrastructure

Project AutoMate: Enabling Autonomic Applications (http://automate.rutgers.edu)

• Conceptual models and implementation architectures for autonomic computing
– programming models, frameworks and middleware services
  • autonomic elements
  • dynamic and opportunistic composition
  • policy, content and context driven execution and management

[Architecture diagram: Autonomic Grid Applications sit on a layered stack – the Accord Programming Framework (Programming System: autonomic components, dynamic composition, opportunistic interactions, collaborative monitoring/control), the Rudder Coordination Middleware (Decentralized Coordination Engine: agent framework, decentralized reactive tuple space), and the Meteor/Squid Content-based Middleware (Semantic Middleware Services: content-based discovery, associative messaging; Content Overlay: content-based routing engine, self-organizing overlay) – cross-cut by the Sesame/DAIS Protection Service and an ontology/taxonomy.]

Page 10: Autonomic Computing: Model, Architecture, Infrastructure

Accord: A Programming System for Autonomic Applications

• Specification of applications that can detect and dynamically respond, during execution, to changes in both the execution environment and the application state
– applications are composed from discrete, self-managing components that carry separate specifications of their functional, non-functional, and interaction/coordination behaviors
– separation of the specifications of computational (functional) behaviors, interaction and coordination behaviors, and non-functional behaviors (e.g., performance, fault detection and recovery) so that their combinations are composable
– separation of policy and mechanism – policies in the form of rules orchestrate a repertoire of mechanisms to achieve context-aware, adaptive runtime computational behaviors and coordination/interaction relationships based on functional, performance, and QoS requirements
– extends existing distributed programming systems

Page 11: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Elements in Accord

[Diagram: an autonomic element wraps a computational element and an embedded element manager, exposing a functional port (function interfaces), a control port (sensor/actuator invocations), and an operational port (context/content rules), with the manager maintaining element state.]

– The functional port defines the set of functional behaviors the element provides and uses
– The control port defines sensors/actuators for externally monitoring and controlling the autonomic element, and a set of guards that control access to those sensors and actuators
– The operational port defines interfaces to formulate, inject, and manage the rules used to manage the runtime behaviors and interactions of the element
– The autonomic element embeds an element manager that is delegated to evaluate and execute rules in order to manage the execution of the element, and that cooperates with other element managers to fulfill application objectives (a minimal sketch follows this list)
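A minimal sketch, in C++, of an autonomic element with the three ports and embedded element manager described above; the class and method names are assumptions made for illustration and are not Accord's actual interfaces.

```cpp
// Illustrative sketch only: functional, control, and operational ports plus an embedded
// element manager that evaluates injected rules against sensor readings.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Rule {                         // injected through the operational port
    std::function<bool(const std::map<std::string, double>&)> condition;
    std::function<void()> action;
};

class AutonomicElement {
public:
    // Functional port: the computational behavior the element provides.
    double compute(double x) { invocations_++; return x * x; }

    // Control port: sensors expose state, actuators change it; a guard mediates access.
    double sensor(const std::string& name, const std::string& credential) const {
        if (!guard(credential)) return 0.0;
        return name == "invocations" ? invocations_ : 0.0;
    }
    void actuator_reset(const std::string& credential) {
        if (guard(credential)) invocations_ = 0;
    }

    // Operational port: formulate/inject rules that the element manager will evaluate.
    void inject_rule(Rule r) { rules_.push_back(std::move(r)); }

    // Element manager: evaluates rules against current sensor readings and fires actions.
    void manage() {
        std::map<std::string, double> readings{{"invocations", double(invocations_)}};
        for (auto& r : rules_)
            if (r.condition(readings)) r.action();
    }
private:
    bool guard(const std::string& credential) const { return credential == "trusted"; }
    int invocations_ = 0;
    std::vector<Rule> rules_;
};

int main() {
    AutonomicElement e;
    // Rule injected via the operational port: reset the element after 3 invocations.
    e.inject_rule({[](const std::map<std::string, double>& s) { return s.at("invocations") >= 3; },
                   [&e] { e.actuator_reset("trusted"); }});
    for (int i = 0; i < 4; ++i) { e.compute(i); e.manage(); }
    std::cout << "invocations after management: " << e.sensor("invocations", "trusted") << "\n";
}
```

The guard on the control port reflects the point that sensor/actuator access is mediated, while rules injected through the operational port are evaluated by the element manager against sensor readings.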

Page 12: Autonomic Computing: Model, Architecture, Infrastructure

Rules In Accord

IF condition THEN then_actions ELSE else_actions

The condition is a logical combination of sensors, events, and functional interfaces; the then/else actions are sequences of sensor, actuator, and functional interface invocations.

– Behavior rules manage the runtime behaviors of a component
– Interaction rules manage the interactions between components, between components and their environments, and the coordination within an application
  • control structure, interaction pattern, communication mechanism
– Security rules control access to the functional interfaces, sensors/actuators, and rule interfaces
– Conflicts are resolved using a simple priority mechanism (see the sketch below)
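A minimal sketch, assuming a priority-ordered rule list, of how such IF/THEN/ELSE rules and priority-based conflict resolution might be represented; the structs and the single-winner evaluation policy are illustrative simplifications, not Accord's internal design.

```cpp
// Illustrative sketch: each rule carries a condition, then/else action sequences, and a
// priority used to resolve conflicts.
#include <algorithm>
#include <functional>
#include <iostream>
#include <vector>

struct AccordRule {
    int priority;                                  // higher value wins a conflict
    std::function<bool()> condition;               // logical combination of sensor/event tests
    std::vector<std::function<void()>> then_actions, else_actions;
};

// Sort by priority and fire only the highest-priority rule
// (a crude stand-in for "conflicts resolved by a simple priority mechanism").
void evaluate(std::vector<AccordRule> rules) {
    std::sort(rules.begin(), rules.end(),
              [](const AccordRule& a, const AccordRule& b) { return a.priority > b.priority; });
    const AccordRule& winner = rules.front();
    const auto& actions = winner.condition() ? winner.then_actions : winner.else_actions;
    for (const auto& act : actions) act();
}

int main() {
    double load = 0.9;   // pretend sensor reading
    std::vector<AccordRule> rules = {
        {1, [&] { return load > 0.8; }, {[] { std::cout << "shed load\n"; }}, {}},
        {2, [&] { return load > 0.95; }, {[] { std::cout << "migrate component\n"; }},
            {[] { std::cout << "keep placement\n"; }}},
    };
    evaluate(rules);     // priority 2 wins; its condition is false, so its else_action runs
}
```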

Page 13: Autonomic Computing: Model, Architecture, Infrastructure

Dynamic Composition/Coordination In Accord

[Diagram: workflow manager(s) inject interaction rules into the element managers of the composed elements.]

• A relationship is defined by a control structure (e.g., loop, branch) and/or a communication mechanism (e.g., RPC, shared space)
– the composition manager translates the workflow into a suite of interaction rules injected into the element managers (see the sketch after this list)
– element managers execute the rules to establish control and communication relationships among elements in a decentralized manner
  • rules can be used to add or delete elements
  • a library of rule-sets is defined for common control and communication relationships between elements
– interaction rules must be based on the core primitives provided by the system
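A sketch of the translation step described above, under the assumption of a simple pipeline workflow; the ElementManager class and compose_pipeline helper are hypothetical names, not AutoMate components.

```cpp
// Illustrative sketch: a composition step turns a pipeline workflow into
// "on completion of X, invoke Y" interaction rules injected into element managers.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

struct InteractionRule { std::string on_event; std::function<void()> action; };

class ElementManager {
public:
    explicit ElementManager(std::string name) : name_(std::move(name)) {}
    void inject(InteractionRule r) { rules_.push_back(std::move(r)); }
    void raise(const std::string& event) {            // e.g. "completed"
        for (const auto& r : rules_)
            if (r.on_event == event) r.action();
    }
    void invoke() { std::cout << name_ << " invoked\n"; raise("completed"); }
private:
    std::string name_;
    std::vector<InteractionRule> rules_;
};

// Composition step: turn a pipeline ["A","B","C"] into completion-triggered invocation rules.
void compose_pipeline(std::map<std::string, ElementManager>& managers,
                      const std::vector<std::string>& pipeline) {
    for (size_t i = 0; i + 1 < pipeline.size(); ++i) {
        ElementManager* next = &managers.at(pipeline[i + 1]);
        managers.at(pipeline[i]).inject({"completed", [next] { next->invoke(); }});
    }
}

int main() {
    std::map<std::string, ElementManager> managers;
    for (const std::string& n : {"A", "B", "C"}) managers.emplace(n, ElementManager(n));
    compose_pipeline(managers, {"A", "B", "C"});
    managers.at("A").invoke();   // A -> B -> C via the injected interaction rules
}
```

Once the rules are injected, invoking A drives B and then C without a central coordinator, which mirrors the decentralized establishment of relationships described on this slide.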

Page 14: Autonomic Computing: Model, Architecture, Infrastructure

Accord Implementation Issues

• Current implementations
– C++ + MPI, DoE CCA, XCAT/OGSA
– XML used for control/operational ports and rules
– common ontology for specifying interfaces, sensors/actuators, rules, content, context, …
– timed behavior, fail-stop semantics
– of course, there is a performance impact, but in our experience it has not been a show stopper
• Accord assumes an execution environment that provides
– an agent-based control network
– support for associative coordination
– services for content-based discovery and messaging
– support for context-based access control
– the execution environment of the underlying programming system

Page 15: Autonomic Computing: Model, Architecture, Infrastructure

Accord Neo-CCA

[Diagram: an original Neo-CCA application – a driver component (GoPort) connected to components A and B through usePort/providePort links – shown next to the Neo-CCA based Accord application, in which an element manager and a composition agent are attached to the components through additional use/provide ports.]

Page 16: Autonomic Computing: Model, Architecture, Infrastructure

Accord Neo-CCA

[Diagram: deployment across nodes x, y, and z – each node runs a Neo-CCA framework hosting a composition agent (CA), an element manager (EM), a driver, and components A and B.]

Page 17: Autonomic Computing: Model, Architecture, Infrastructure

Accord Application Infrastructure

• Rudder Decentralized Coordination Framework – supports autonomic compositions, adaptations, optimizations, and fault-tolerance
  • context-aware software agents
  • decentralized tuple space coordination model
• Meteor Content-based Middleware – services for content routing, content discovery, and associative interactions
  • a self-organizing content overlay
  • content-based routing engine and decentralized information discovery service – flexible routing and querying with guarantees and bounded costs
  • Associative Rendezvous messaging – content-based decoupled interactions with programmable reactive behaviors (see the sketch below)

• Details in IEEE IC 05/04, ICAC 04, SMC 05
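A conceptual sketch of Associative Rendezvous-style interaction: messages are posted with content profiles, interests register a selector plus a programmable reaction, and a match triggers the reaction, decoupling sender and receiver. The RendezvousPoint and Interest types are illustrative assumptions, not the Meteor API.

```cpp
// Conceptual sketch only: content-based, decoupled interactions with programmable reactions.
#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

using Profile = std::map<std::string, std::string>;

struct Interest {
    Profile selector;                                 // attributes a posted message must match
    std::function<void(const Profile&, const std::string&)> reaction;
};

class RendezvousPoint {
public:
    void register_interest(Interest i) { interests_.push_back(std::move(i)); }
    void post(const Profile& profile, const std::string& payload) {
        for (const auto& i : interests_)
            if (matches(profile, i.selector)) i.reaction(profile, payload);
    }
private:
    static bool matches(const Profile& profile, const Profile& selector) {
        for (const auto& [key, value] : selector) {
            auto it = profile.find(key);
            if (it == profile.end() || it->second != value) return false;
        }
        return true;
    }
    std::vector<Interest> interests_;
};

int main() {
    RendezvousPoint rv;
    // Receiver: react to any message tagged as a sensor reading from the "pressure" field.
    rv.register_interest({{{"type", "sensor"}, {"field", "pressure"}},
                          [](const Profile&, const std::string& data) {
                              std::cout << "reacting to pressure update: " << data << "\n";
                          }});
    // Sender posts by content, with no knowledge of who (if anyone) will react.
    rv.post({{"type", "sensor"}, {"field", "pressure"}, {"node", "17"}}, "42.7 MPa");
}
```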

Page 18: Autonomic Computing: Model, Architecture, Infrastructure

Data-Driven Optimization of Oil Production

Page 19: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Oil Well Placement (VFSA)

[Figures: permeability and pressure contours for 3 wells (2D profile); contours of NEval(y,z,500); exhaustive search requires NY×NZ (450) evaluations, with the minimum marked; the VFSA solution "walk" is found after 20 (81) evaluations.]

Page 20: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Oil Well Placement (VFSA)

Page 21: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Oil Well Placement (SPSA)

Permeability field showing the positioning of current wells. The symbols “*” and “+” indicate injection and producer wells, respectively.

Search space response surface: Expected revenue - f(p) for all possible well locations p. White marks indicate optimal well locations found by SPSA for 7 different starting points of the algorithm.

Page 22: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Oil Well Placement (SPSA)

Page 23: Autonomic Computing: Model, Architecture, Infrastructure

CH4Air/H2Air Simulations

• Simulate the chemical reaction with the elements O, H, C, N, and Ar under dynamic conditions – CRL/SNL, Livermore, CA
• The objective is to use current sensor data and simulation state to choose the "best" algorithm, i.e., the one that accelerates convergence by decreasing nfe (the number of function evaluations)

Page 24: Autonomic Computing: Model, Architecture, Infrastructure

Rule Generation for CH4Air Problem

[Chart: comparison of the BDF2–BDF5 algorithms on the CH4Air problem – nfe (0–1400) versus temperature (1000–3000).]

Page 25: Autonomic Computing: Model, Architecture, Infrastructure

Rules for CH4Air Problem

• IF 1000 <= temperature < 2000 THEN BDF 3
• IF 2000 <= temperature < 2200 THEN BDF 4
• IF 2200 <= temperature < 3000 THEN BDF 3
• IF 3000 <= temperature THEN BDF 3
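A minimal sketch of how these temperature rules could be encoded as a runtime selection function; the function name and the default used below the tabulated range are illustrative assumptions.

```cpp
#include <iostream>

// Returns the BDF order prescribed by the CH4Air rules on this slide for a given temperature.
int select_bdf_order_ch4air(double temperature) {
    if (temperature >= 2000.0 && temperature < 2200.0) return 4;
    if (temperature >= 1000.0) return 3;   // 1000-2000, 2200-3000, and >= 3000 all prescribe BDF3
    return 3;                              // below the tabulated range: fall back to BDF3 (assumption)
}

int main() {
    for (double t : {1500.0, 2100.0, 2500.0, 3100.0})
        std::cout << "T=" << t << " -> BDF" << select_bdf_order_ch4air(t) << "\n";
}
```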

Page 26: Autonomic Computing: Model, Architecture, Infrastructure

Experiment Results of CH4Air Problem

[Chart: comparison of rule-based and non-rule-based execution of the CH4Air problem – nfe (0–1400) versus temperature.]

Page 27: Autonomic Computing: Model, Architecture, Infrastructure

Rule Generation for H2Air Problem

[Chart: comparison of the BDF2–BDF5 algorithms on the H2Air problem – nfe (0–700) versus temperature (1000–2400).]

Page 28: Autonomic Computing: Model, Architecture, Infrastructure

Rules for H2Air Problem

• IF 1000 <= temperature < 1200 THEN BDF 2
• IF 1200 <= temperature < 1800 THEN BDF 4
• IF 1800 <= temperature < 2400 THEN BDF 3
• IF 2400 <= temperature THEN BDF 4

Page 29: Autonomic Computing: Model, Architecture, Infrastructure

Experiment Results of H2Air Problem

[Chart: comparison of rule-based and non-rule-based execution of the H2Air problem – nfe (0–700) versus temperature (1000–2000).]

Page 30: Autonomic Computing: Model, Architecture, Infrastructure

Computational Modeling of Physical Phenomena

• Realistic, physically accurate computational modeling
– Large computation requirements
  • e.g., simulating the core collapse of supernovae in 3D at reasonable resolution (500^3) would require ~10-20 teraflops for 1.5 months (i.e., ~100 million CPUs!) and about 200 terabytes of storage
  • e.g., turbulent flow simulations using active flow control in aerospace and biomedical engineering require 5000x1000x500 = 2.5x10^9 points and approximately 10^7 time steps; with 1 GFlop processors this amounts to a runtime of ~7x10^6 CPU hours, or about one month on 10,000 CPUs (with perfect speedup); at 700 B/pt the memory requirement is ~1.75 TB of runtime memory and ~800 TB of storage (the arithmetic is checked in the sketch after this list)
– Dynamically adaptive behaviors
– Complex couplings
  • multi-physics, multi-model, multi-resolution, …
– Complex interactions
  • application – application, application – resource, application – data, application – user, …
– Software/systems engineering/programmability
  • volume and complexity of code, community of developers, …
  • scores of models, hundreds of components, millions of lines of code, …
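A back-of-the-envelope check of the turbulent-flow estimate above; the ~10^3 floating-point operations per grid point per time step is an assumed figure (not stated on the slide), included only to make the units explicit.

```latex
% Back-of-the-envelope check of the turbulent-flow estimate.
% The ~10^3 flop per grid point per time step is an assumed figure.
\begin{align*}
  \text{work} &\approx 2.5\times10^{9}\ \text{points}\times 10^{7}\ \text{steps}\times 10^{3}\ \text{flop/(point step)} = 2.5\times10^{19}\ \text{flop},\\
  \text{runtime at 1 GFlop/s} &\approx \frac{2.5\times10^{19}\ \text{flop}}{10^{9}\ \text{flop/s}\times 3600\ \text{s/h}} \approx 7\times10^{6}\ \text{CPU hours},\\
  \text{memory} &\approx 700\ \text{B/point}\times 2.5\times10^{9}\ \text{points} = 1.75\times10^{12}\ \text{B}\approx 1.75\ \text{TB}.
\end{align*}
```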

Page 31: Autonomic Computing: Model, Architecture, Infrastructure

A Selection of SAMR Applications

Multi-block grid structure and oil concentration contours (IPARS, M. Peszynska, UT Austin)

Blast wave in the presence of a uniform magnetic field – 3 levels of refinement (Zeus + GrACE + Cactus, P. Li, NCSA, UCSD)

Mixture of H2 and air in stoichiometric proportions with a non-uniform temperature field (GrACE + CCA, Jaideep Ray, SNL, Livermore)

Richtmyer-Meshkov – detonation in a deforming tube – 3 levels; the Z=0 plane is visualized on the right (VTF + GrACE, R. Samtaney, CIT)

Page 32: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Runtime Management

[Architecture diagram: the virtual grid autonomic runtime manager couples self-observation & analysis with self-optimization & execution over a heterogeneous, dynamic computational environment. Application and resource monitoring services (monitoring & context-aware services) feed an application state characterization (nature of adaptation, application dynamics, computation/communication) and a natural region characterization with a system state synthesizer (performance prediction, system capability – CPU, memory, bandwidth, availability, access policy – and resource history modules). Normalized work and resource metrics (NWM/NRM) and an objective function synthesizer produce prescriptions (mapping, distribution, redistribution, execution) that drive autonomic partitioning (partition/compose, repartition/recompose of virtual computation units, VCUs) and autonomic scheduling – global and local grid scheduling via virtual grid time scheduling (VGTS) and virtual grid space scheduling (VGSS) – with deduction engines operating on the current application and system state.]

Page 33: Autonomic Computing: Model, Architecture, Infrastructure

Autonomic Forest Fire Simulation

[Figure: simulated fire front with a high-computation zone highlighted.]

Predicts fire spread (the speed, direction, and intensity of the forest fire front) as the fire propagates, based on both dynamic and static environmental and vegetation conditions.

Page 34: Autonomic Computing: Model, Architecture, Infrastructure

Conclusion

• Autonomic applications are necessary to address scale/complexity/heterogeneity/dynamism/reliability challenges

• Project AutoMate and the Accord programming system address key conceptual and implementation issues to enable the development of autonomic applications

• More information, publications, software, conference
– http://automate.rutgers.edu
– [email protected] / [email protected]
– http://www.autonomic-conference.org

Page 35: Autonomic Computing: Model, Architecture, Infrastructure

The Team

• TASSL, Rutgers University
– Autonomic Computing Research Group
  • Viraj Bhat
  • Nanyan Jiang
  • Hua Liu (Maria)
  • Zhen Li (Jenny)
  • Vincent Matossian
  • Cristina Schmidt
  • Guangsen Zhang
– Autonomic Applications Research Group
  • Sumir Chandra
  • Xiaolin Li
  • Li Zhang
• CS Collaborators
– HPDC, University of Arizona
  • Salim Hariri
– Biomedical Informatics, The Ohio State University
  • Tahsin Kurc, Joel Saltz
– CS, University of Maryland
  • Alan Sussman, Christian Hansen
• Applications Collaborators
– CSM, University of Texas at Austin
  • Malgorzata Peszynska, Mary Wheeler
– IG, University of Texas at Austin
  • Mrinal Sen, Paul Stoffa
– ASCI/CACR, Caltech
  • Michael Aivazis, Julian Cummings, Dan Meiron
– CRL, Sandia National Laboratory, Livermore
  • Jaideep Ray, Johan Steensland