venugopal adec

37
Autonomic Decentralised Elasticity Management of Cloud Applications Srikumar Venugopal School of Computer Science and Engineering University of New South Wales, Sydney, Australia E: [email protected] W:

Upload: srikumarv

Post on 06-May-2015

93 views

Category:

Technology


2 download

TRANSCRIPT

Page 1: Venugopal adec

Autonomic Decentralised Elasticity Management of Cloud

ApplicationsSrikumar VenugopalSchool of Computer Science and EngineeringUniversity of New South Wales, Sydney, AustraliaE: [email protected]: http://www.cse.unsw.edu.au/~srikumarv

Page 2: Venugopal adec

Agenda

Background & Motivation

Problem Statement

Solution Overview

Evaluation: Methodology and Results

Conclusion

Page 3: Venugopal adec

The Promise of Cloud Computing

Page 4: Venugopal adec

Background & Motivation

Page 5: Venugopal adec

State-of-the-art in Auto-scaling

Product/Project Trigger Controller Actions

Amazon Autoscaling

Cloudwatch metrics/ Threshold

Rule-based/Schedule-based

Add/Remove Capacity

WASABi Azure Diagnostics/Threshold

Rule-based Add/Remove Capacity, Custom

RightScale/Scalr Load monitoring Rule-based/Schedule-based

Add/Remove Capacity, Custom

Google Compute Engine

CPU Load, etc. Rule-based Add/Remove Capacity

Academic

CloudScale Demand Prediction Control theory Voltage-scaling

Cataclysm Threshold-based Queueing-model Admission Control

IBM Unity Application Utility Utility functions/RL Add/Remove Capacity

Page 6: Venugopal adec

Cons of Rule-based Auto-scaling

• Currently, the most popular mechanisms for auto-scaling are rule-based mechanisms

• The effectiveness of rule-based autoscaling is determined by the trigger conditions

• Setting up the triggers is a trial-and-error process.

Page 7: Venugopal adec

Cons of Rule-based Autoscaling

• Commercial products are rule-based– Gives “illusion of control” to users– Leads to the problem of defining the

“right” thresholds

• Centralised controllers– Communication overhead increases with

size– Processing overhead also increases (Big

Data!)

• Limited to One application per VM

Page 8: Venugopal adec

Challenges of large-scale elasticity

• Large numbers of instances and apps– Deriving solutions takes time

• Dynamic conditions– Apps are going into critical all the time

• Shifting bottlenecks– Greedy solutions may create bottlenecks

in other places

• Network partitions, fault tolerance…

H. Li, S. Venugopal, Using Reinforcement Learning for Controlling an Elastic Web Application Hosting Platform, Proceedings of 8th ICAC '11.

Page 9: Venugopal adec

Problem Statement

Page 10: Venugopal adec

Initial Conditions

Instance1App Server1

app1 app2

Instance2App Server2

app3 app4

IaaS Provider

Page 11: Venugopal adec

A Critical Event

Instance1App Server1

app1 app2

IaaS Provider

Instance2App Server2

app3 app4

Page 12: Venugopal adec

Placement 1

Instance1App Server1

app2

IaaS Provider

Instance2App Server2

app3 app4 app1

Page 13: Venugopal adec

Placement 2

Instance1App Server1

app1

IaaS Provider

Instance2App Server2

app3 app4 app2

Page 14: Venugopal adec

Placement 3

Instance1App Server1

app2

IaaS Provider

Instance2App Server2

app3 app4

Instance3App Server3

app1

Page 15: Venugopal adec

Placements 4 & 5

Instance1App Server1

app2

IaaS Provider

Instance2App Server2

app3 app4

Instance1App Server1

app2

IaaS Provider

Instance2App Server2

app3 app4

Instance3App Server3

app1 app1

app1 app1

Page 16: Venugopal adec

Challenges of App Placement

• Load shifts are dynamic

• Multiple applications may go critical simultaneously

• Instance provisioning should be controlled

• Service QoS must be maintained

Page 17: Venugopal adec

Twin Objectives

• Provisioning Problem– To determine the smallest number of

servers required to satisfy resource requirements of all the applications

• Dynamic Placement Problem– To distribute the applications so as to

maximise utilisation yet meet each app’s response time and availability requirements

Page 18: Venugopal adec

Solution Overview

Page 19: Venugopal adec

Decentralised Elastic Control

• Instances control their own utilisation–Monitoring, management and feedback

• Local controllers are learning agents– Reinforcement Learning

• Servers are linked by Zookeeper– Agility, Flexibility, Co-ordination

• We call our system ADEC (Autonomic Decentralised Elasticity Control)

Page 20: Venugopal adec

Software Architecture of ADEC

Page 21: Venugopal adec

Reinforcement Learning

• Learn optimal management policies over time– vs. Model-based policies

• Learn long-term effects of short-term actions– If the state-action pairs are chosen correctly

• We have applied Q-Learning to this problem– Initial actions are drawn using Boltzmann dist.

Page 22: Venugopal adec

Abstract View of the Control Scheme

Page 23: Venugopal adec

States

Page 24: Venugopal adec

Basic Actions

Server

Application

create terminate find

move duplicate merge

(-3.5) (3.5) (3.5)

(0.5) (0.5) (0.5)

Page 25: Venugopal adec

Actions and Rewards

• Actual actions are a combination of a server and an application action– E.g. find and move, merge and

terminate

• 11 pre-defined actions– Reducing complexity

• Each action is associated with a reward– -ve rewards for actions incurring costs

(e.g. start server)–+ve rewards for actions that save (e.g.

terminate

Page 26: Venugopal adec

Co-ordination using find

• Server looks up other servers with the least load– Zookeeper lookup

• Sends a move message to the selected server

• Replies with accept or reject– accept has a +ve reward

Page 27: Venugopal adec

Shrinking

• The controller is always reward maximising– Highest Reward is for merge+terminate

• A controller initiates its own shutdown– Low load on its applications

• Gets exclusive lock on termination– Only one instance can terminate at a

time

• Transfers state before shutdown

Page 28: Venugopal adec

Information on the DHT

• Server event notification

• List of applications on each server

• Server status updates (load information)

• Q-value updates

Page 29: Venugopal adec

Evaluation

Page 30: Venugopal adec

Experiment 1: Testing ADEC

• IaaS provider: Amazon EC2– small instances and high CPU instance

• Load-tester: Apache Jmeter• Application server: Tomcat 6.0– JVM with 1 GB RAM

• Server thresholds: 60% and 85%

Page 31: Venugopal adec

Experiment 1: Testing

• Six web applications– Test Application: Hotel Management– Search Book Confirm

• Five were subjected to a background load– Uniform Random

• One was subjected to the test load• Application threshold: 200 and 500

ms• Metrics– Average Response Time, Drop Rate,

Servers

Page 32: Venugopal adec

Peaking Workload

Page 33: Venugopal adec

Poisson Workload

Page 34: Venugopal adec

Conclusion

Page 35: Venugopal adec

Conclusion

• Demonstrated a co-ordination architecture for provisioning web applications

• Each server is independent and the system is managed by set of simple states and actions

• Instances start and shutdown on their own to meet application objectives

Page 36: Venugopal adec

Ongoing Work

• Imrpoved performance modeling for quick detection of slowdowns

• Using utility functions for defining application priorities

• Extension to SOA and BPM – Collaboration with Technical Univ of

Vienna, Austria

• Scaling the database– ElasCass project

Page 37: Venugopal adec

Questions ?

[email protected]

Thank you!