model based transaction-aware cloud resources management case study and methodology


Model-based transaction-aware cloud resources management: case study and methodology

Leonid Grinshpan, Ph.D., Consulting Technical Director

2

Disclaimer

The views expressed in this presentation are the author’s own and do not reflect the views of the companies he has worked for, nor those of Oracle Corporation.

All brands and trademarks mentioned are the property of their owners.

3

Presentation’s goal

The presentation includes:

- A case study in which the number of database servers was reduced from 40 to 21 while deploying an enterprise application for an Oracle customer

- An outline of the model-based transaction-aware cloud management methodology that made this hardware reduction possible

4

Presentation’s structure

Section 1

Subject

Section 2

Transaction-aware servers allocation

Section 3

How to implement model-based transaction-aware management?

5

Section 1

Subject

6

Deployment of enterprise application on a network of functional clusters

7

Load balancing algorithms

1. Round robin

2. Algorithms based on assessment of hardware metrics (CPU utilization, etc.)

The load balancing approach discussed in this presentation is based on business transaction metrics

8

Definitions 1

• Transaction - a request from an enterprise application (EA) user to be processed by the system.

• Transaction (response) time - the time the application takes to process a transaction.

• Transaction rate - the number of transaction requests submitted by one user during one hour.

• Transaction service demand - the time interval during which a transaction is processed by a particular infrastructure component (network, hardware appliance, hardware server).

• Transaction profile - the set of transaction service demands for the system resources needed to process a transaction.

9

Definitions 2

• Workload - a flow of transactions generated by EA users.

• Workload characterization - a specification of the workload that includes three components:

  - List of business transactions
  - Transaction rate
  - For each transaction, the number of users requesting it

• Transaction stretch factor - a parameter defined by the formula:

  Transaction stretch factor = transaction time with N concurrent users / transaction time with a single user

A scalable system has stretch factors equal to 1 for all transactions.

10

Sizing project requirements. System architecture

11

Sizing project requirements. Number of users and SLA

• Provide an estimate of the number of hardware servers on each layer, and the number of CPUs on each server, for the architecture presented on the previous slide and the anticipated workload from 400 business users.

• Service level: transaction time degradation while increasing the number of users up to 400 is acceptable if it does not exceed 7%.
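The service level can be expressed as a check on the stretch factor. The sketch below assumes the stretch factor is the ratio of a transaction's time under load to its single-user time; the function names and the sample times are illustrative and not taken from the author's tooling.

```python
def stretch_factor(time_under_load: float, time_single_user: float) -> float:
    """Ratio of transaction time under load to its single-user time."""
    if time_single_user <= 0:
        raise ValueError("single-user time must be positive")
    return time_under_load / time_single_user


def meets_sla(time_under_load: float, time_single_user: float,
              max_degradation: float = 0.07) -> bool:
    """True if the transaction slows down by no more than max_degradation."""
    # A small tolerance guards against floating-point round-off at the boundary.
    limit = 1.0 + max_degradation + 1e-9
    return stretch_factor(time_under_load, time_single_user) <= limit


# Illustrative values: a 10-second transaction measured under two loads.
sf = stretch_factor(10.68, 10.0)      # 6.8% slower than single-user time
ok = meets_sla(10.68, 10.0)           # within the 7% limit
too_slow = meets_sla(11.8, 10.0)      # 18% slower, outside the limit
```

A stretch factor of 1.0 means the transaction is as fast under load as for a single user; the 7% service level corresponds to a stretch factor of at most 1.07.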

12

Sizing project requirements. Workload characterization

| Transaction name | Average transaction time for single user (seconds) | Transaction rate | Number of users executing each transaction |
|---|---|---|---|
| OLAP maintenance small | 10 | 1 | 14 |
| OLAP maintenance medium | 10 | 1 | 2 |
| OLAP maintenance large | 10 | 4 | 1 |
| OLAP restructure small | 60 | 1 | 2 |
| OLAP restructure medium | 600 | 1 | 1 |
| OLAP restructure large | 3600 | 2 | 1 |
| OLAP update small | 10 | 1 | 12 |
| OLAP update medium | 10 | 1 | 11 |
| OLAP update large | 10 | 3 | 6 |
| OLAP calculation small | 500 | 2 | 187 |
| OLAP calculation medium | 2250 | 3 | 34 |
| OLAP calculation large | 10000 | 5 | 6 |
| Maintenance report small | 15 | 1 | 60 |
| Maintenance report medium | 25 | 8 | 9 |
| Maintenance report large | 200 | 9 | 1 |
| Update report small | 10 | 1 | 8 |
| Update report medium | 10 | 3 | 6 |
| Update report large | 10 | 3 | 1 |
| Sales report small | 15 | 3 | 30 |
| Sales report medium | 25 | 4 | 7 |
| Sales report large | 200 | 12 | 1 |

13

Sizing project requirements. Transaction profile (time in seconds)

| Transaction name | Average transaction time for single user | Time on Web/App server | Time on OLAP server | Time on RDBMS server |
|---|---|---|---|---|
| OLAP maintenance small | 10 | 1 | 7 | 2 |
| OLAP maintenance medium | 10 | 1 | 7 | 2 |
| OLAP maintenance large | 10 | 1 | 7 | 2 |
| OLAP restructure small | 60 | 1 | 56 | 3 |
| OLAP restructure medium | 600 | 1 | 595 | 4 |
| OLAP restructure large | 3600 | 1 | 3594 | 5 |
| OLAP update small | 10 | 1 | 7 | 2 |
| OLAP update medium | 10 | 1 | 7 | 2 |
| OLAP update large | 10 | 1 | 7 | 2 |
| OLAP calculation small | 500 | 1 | 496 | 3 |
| OLAP calculation medium | 2250 | 1 | 2245 | 4 |
| OLAP calculation large | 10000 | 1 | 9995 | 5 |
| Maintenance report small | 15 | 3 | 11 | 1 |
| Maintenance report medium | 25 | 6 | 17 | 2 |
| Maintenance report large | 200 | 49 | 147 | 4 |
| Update report small | 10 | 1 | 8 | 1 |
| Update report medium | 10 | 1 | 8 | 1 |
| Update report large | 10 | 1 | 8 | 1 |
| Sales report small | 15 | 3 | 11 | 1 |
| Sales report medium | 25 | 6 | 17 | 2 |
| Sales report large | 200 | 49 | 147 | 4 |

14

What-if scenarios

• Several what-if scenarios with different numbers of servers on each layer were modeled

• All analyzed deployments indicated that one Web/Application server and one RDBMS server are sufficient

• Configurations with different numbers of OLAP servers were analyzed

15

Architecture with 40 OLAP servers and original workload

This architecture keeps transaction time deterioration for 400 users under 7% …

… but each OLAP server has low utilization (only 36%)

The natural next step was to check server utilization for deployments with fewer OLAP servers. We did that for 34 servers, and the result violated the SLA: some transaction times increased by an unacceptable 18%

16

Transaction time degradation under original workload

| Transaction | Transaction time, 1 user (seconds) | Transaction time, 400 users (seconds) | Stretch factor, 40 OLAP servers (CPU utilization of each one 36%) | Stretch factor, 34 OLAP servers (CPU utilization of each one 42%) |
|---|---|---|---|---|
| OLAP maintenance small | 10.0 | 10.68 | 1.07 | 1.18 |
| OLAP maintenance medium | 10.0 | 10.68 | 1.07 | 1.18 |
| OLAP maintenance large | 10.0 | 10.68 | 1.07 | 1.18 |
| OLAP restructure small | 60.0 | 60.68 | 1.01 | 1.03 |
| OLAP restructure medium | 600.0 | 600.68 | 1.00 | 1.00 |
| OLAP restructure large | 3600.0 | 3600.66 | 1.00 | 1.00 |
| OLAP update small | 10.0 | 10.68 | 1.07 | 1.18 |
| OLAP update medium | 10.0 | 10.68 | 1.07 | 1.18 |
| OLAP update large | 10.0 | 10.68 | 1.07 | 1.18 |
| OLAP calculation small | 500.0 | 500.66 | 1.00 | 1.00 |
| OLAP calculation medium | 2250.0 | 2250.66 | 1.00 | 1.00 |
| OLAP calculation large | 10001.2 | 10001.66 | 1.00 | 1.00 |
| Maintenance report small | 15.0 | 15.68 | 1.05 | 1.12 |
| Maintenance report medium | 25.0 | 25.68 | 1.03 | 1.07 |
| Maintenance report large | 200.0 | 200.67 | 1.00 | 1.01 |
| Update report small | 10.0 | 10.68 | 1.07 | 1.18 |
| Update report medium | 10.0 | 10.68 | 1.07 | 1.18 |
| Update report large | 10.0 | 10.68 | 1.07 | 1.18 |
| Sales report small | 15.0 | 15.68 | 1.05 | 1.12 |
| Sales report medium | 25.0 | 25.68 | 1.03 | 1.07 |
| Sales report large | 200.0 | 200.67 | 1.00 | 1.01 |

17

What causes degradation of transaction time?

• Short transactions degrade because they wait in a server’s queues until long transactions (like OLAP calculation) release a CPU.

• This observation suggests a hypothesis: segmenting transactions into groups by hourly service demand, and processing each group on dedicated OLAP servers, might minimize the total number of OLAP servers.

Transaction hourly service demand = time a single transaction spends in a server * number of transactions per hour per user * number of the transaction's users
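The hourly service demand formula is a plain product of three workload parameters. A minimal sketch follows; the function name is illustrative, and the inputs are the OLAP server times, transaction rates, and user counts listed for two transactions in the sizing tables.

```python
def hourly_service_demand(time_in_server_s: float,
                          rate_per_user_per_hour: float,
                          users: int) -> float:
    """Seconds of server time requested per hour by one transaction type."""
    return time_in_server_s * rate_per_user_per_hour * users


# OLAP calculation small: 496 s on the OLAP server, 2 per user per hour, 187 users
calc_small = hourly_service_demand(496, 2, 187)

# OLAP maintenance small: 7 s on the OLAP server, 1 per user per hour, 14 users
maint_small = hourly_service_demand(7, 1, 14)

print(calc_small, maint_small)
```

The two results differ by more than three orders of magnitude, which is the gap the segmentation hypothesis relies on.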

18

Low and high demand workloads

| Transaction name | Transaction rate | Number of users executing each transaction | Time on OLAP server | Hourly service demand for OLAP server (low- or high-demand transaction) |
|---|---|---|---|---|
| OLAP maintenance small | 1 | 14 | 7 | 98 |
| OLAP maintenance medium | 1 | 2 | 7 | 14 |
| OLAP maintenance large | 4 | 1 | 7 | 28 |
| OLAP restructure small | 1 | 2 | 56 | 112 |
| OLAP restructure medium | 1 | 1 | 595 | 595 |
| OLAP restructure large | 2 | 1 | 3594 | 7188 |
| OLAP update small | 1 | 12 | 7 | 84 |
| OLAP update medium | 1 | 11 | 7 | 77 |
| OLAP update large | 3 | 6 | 7 | 126 |
| OLAP calculation small | 2 | 187 | 496 | 185504 |
| OLAP calculation medium | 3 | 34 | 2245 | 228990 |
| OLAP calculation large | 5 | 6 | 9995 | 299850 |
| Maintenance report small | 1 | 60 | 11 | 660 |
| Maintenance report medium | 8 | 9 | 17 | 1224 |
| Maintenance report large | 9 | 1 | 147 | 1323 |
| Update report small | 1 | 8 | 8 | 64 |
| Update report medium | 3 | 6 | 8 | 144 |
| Update report large | 3 | 1 | 8 | 24 |
| Sales report small | 3 | 30 | 11 | 990 |
| Sales report medium | 4 | 7 | 17 | 476 |
| Sales report large | 12 | 1 | 147 | 1764 |

19

Architecture with segmented workload

This configuration delivered the same transaction times and stretch factors as a system with 40 OLAP servers handling the non-segmented original workload.

1 OLAP server processing low demand workload

20 OLAP servers processing high demand workload

20

Take away from case study

• The presentation describes a method of server allocation based on business transaction metrics. The method minimizes the number of servers assigned to an application without compromising transaction times

• The approach classifies transactions into groups by their hourly service demand and processes each group on dedicated servers

• Transaction-aware cloud management might significantly improve cloud profitability without any additional investment in the hardware platform

• Cloud management based on business transaction metrics merits further research, as it might bring significant economic benefits to cloud providers and to their customers

21

Section 3

How to implement model-based transaction-aware management?

22

What is needed?

• Application model

• Workload specification

• Transaction profiles

• Model solver

23

Mapping application into queuing model

Hardware server representation

Total time in node = time in waiting queue + time in processing unit
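A minimal sketch of this relation for a single node. It assumes an M/M/1 queue (random arrivals, exponentially distributed service times, one processing unit); the presentation does not state which queue type its model solver uses, and the function name and sample numbers are illustrative.

```python
def time_in_node(service_time_s: float, arrival_rate_per_s: float):
    """Return (time in waiting queue, total time in node) for an M/M/1 node."""
    utilization = arrival_rate_per_s * service_time_s
    if not 0.0 <= utilization < 1.0:
        raise ValueError("node is saturated: utilization must be below 1")
    # Mean waiting time in an M/M/1 queue grows sharply as utilization nears 1.
    waiting = service_time_s * utilization / (1.0 - utilization)
    # Total time in node = time in waiting queue + time in processing unit.
    total = waiting + service_time_s
    return waiting, total


# A request needing 2 s of processing, arriving once every 4 s on average.
waiting, total = time_in_node(service_time_s=2.0, arrival_rate_per_s=0.25)
print(waiting, total)
```

At 50% utilization the request waits as long as it is processed, which illustrates why transaction times stretch as servers get busier.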

24

Mapping application into queuing model (cont 2)

25

Mapping application into queuing model (cont 3)

The relationships between the components of a real system and the components of its model

| Component of application | Matching object in queuing model |
|---|---|
| Users | Node “Users” |
| Web server | Node “Web server” |
| Application and Database server | Node “A&D server” |
| Requests from users | Cars |

26

Transaction response time and transaction profile

Transaction time is the time spent in the “cloud”

27

Transaction response time and transaction profile (cont 2)

Active resources:

• CPU time (data processing)
• I/O time (data transfer)

Passive resources:

• Web server connections
• Database connections
• Software threads
• Storage space
• Memory space

28

Model’s input data

1. Workload characterization

• List of business transactions
• Number of users for each business transaction
• For each transaction, the number of transactions per user per hour (transaction rate)

| Transaction name | Number of users | Transaction rate |
|---|---|---|
| Report ABC | 20 | 12 |
| Business Rule X | 10 | 8 |
| Consolidation Y | 5 | 3 |

29

Model’s input data (cont 2)

2. Transaction profiles

The transaction profile in this example is comprised of the time intervals a transaction spent in each system server it visited while the application was serving only that single transaction

| Transaction name | Service demand on Web server (seconds) | Service demand on A&D server (seconds) |
|---|---|---|
| Report ABC | 0.5 | 0.5 |
| Business Rule X | 0.5 | 2.5 |
| Consolidation Y | 0.5 | 9.5 |
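The two inputs, workload characterization and transaction profiles, are enough for a first sanity check before running a full model solver. The sketch below applies the utilization law (busy seconds per hour divided by available seconds per hour) to the example numbers from the two input tables. The single-CPU default and all names are assumptions made for illustration, and the check ignores queuing, so it does not replace the solver.

```python
# Workload characterization: name -> (number of users, transactions per user per hour)
workload = {
    "Report ABC": (20, 12),
    "Business Rule X": (10, 8),
    "Consolidation Y": (5, 3),
}

# Transaction profiles: name -> service demand in seconds on each server
profiles = {
    "Report ABC": {"Web server": 0.5, "A&D server": 0.5},
    "Business Rule X": {"Web server": 0.5, "A&D server": 2.5},
    "Consolidation Y": {"Web server": 0.5, "A&D server": 9.5},
}


def server_utilization(workload, profiles, cpus_per_server=1):
    """Estimate utilization of each server from hourly service demand."""
    busy_seconds = {}
    for name, (users, rate) in workload.items():
        for server, demand in profiles[name].items():
            busy_seconds[server] = busy_seconds.get(server, 0.0) + users * rate * demand
    capacity = 3600.0 * cpus_per_server  # CPU-seconds available per hour
    return {server: busy / capacity for server, busy in busy_seconds.items()}


utilization = server_utilization(workload, profiles)
for server, value in utilization.items():
    print(f"{server}: {value:.1%}")
```

With these inputs the A&D server carries 462.5 busy seconds per hour against 167.5 for the Web server.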

30

Workload segmentation

Transaction hourly service demand = time a single transaction spends in a server * number of transactions per hour per user * number of the transaction's users

Transaction-aware management is based on classifying transactions into groups by their hourly service demand and processing each group on dedicated servers
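The classification step can be sketched as a split on a demand threshold. The presentation does not say how the boundary between groups is chosen, so the threshold below is a hypothetical value picked only for illustration; the demand figures are four of the hourly service demands from the case study.

```python
def segment_by_demand(demands: dict, threshold: float):
    """Split transactions into (low, high) groups by hourly service demand."""
    low = {name: d for name, d in demands.items() if d < threshold}
    high = {name: d for name, d in demands.items() if d >= threshold}
    return low, high


# Hourly service demand for the OLAP server, in seconds.
demands = {
    "OLAP maintenance small": 98,
    "Sales report large": 1764,
    "OLAP calculation small": 185504,
    "OLAP calculation large": 299850,
}

# Hypothetical threshold, not a value from the presentation.
low, high = segment_by_demand(demands, threshold=100_000)
print(sorted(low), sorted(high))
```

Each group would then be routed to its own pool of servers, sized with the queuing model.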

31

How to obtain workload characterization and transaction profiles?

• Analysis of business process – creating process flowcharts based on interviews with key process participants http://www.wikihow.com/Analyze-a-Business-Process

• Business transaction management software – tracking transactions across the application http://en.wikipedia.org/wiki/Business_transaction_management

• Application instrumentation at the software development stage – making the application manageable http://en.wikipedia.org/wiki/Instrumentation_(computer_programming)

• Big data analysis – forensic analysis of transactional data collected over time http://en.wikipedia.org/wiki/Big_data

32

Application instrumentation is the most potent technology

• Transactions are defined at the application development stage

• A unique ID can be assigned to each transaction

• A unique transaction ID enables tracking the transaction's path among servers

• A unique transaction ID enables measuring the passive and active resources each server allocates to the transaction

• A unique transaction ID enables logging to a file information on all executed transactions, with their parameters

33

Parameters of a transaction saved in log file

• Unique transaction ID

• ID of a user who initiated transaction

• Transaction start and stop dates and times

• Transaction total execution time

• For each server:

  - time the transaction entered the server
  - time the transaction exited the server
  - time the transaction spent in the server
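One possible shape for such a log record, sketched as Python dataclasses. The class and field names are hypothetical and the timestamps are made up; the slide only lists which parameters are saved, not how they are stored.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ServerVisit:
    server: str
    entered: datetime
    exited: datetime

    @property
    def time_spent_s(self) -> float:
        """Time the transaction spent in this server, in seconds."""
        return (self.exited - self.entered).total_seconds()


@dataclass
class TransactionLogRecord:
    user_id: str
    start: datetime
    stop: datetime
    visits: list = field(default_factory=list)
    # A unique ID is generated when the record is created.
    transaction_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    @property
    def total_time_s(self) -> float:
        """Total execution time of the transaction, in seconds."""
        return (self.stop - self.start).total_seconds()


t = lambda s: datetime(2024, 1, 1, 9, 0, s)  # helper for made-up timestamps
record = TransactionLogRecord(
    user_id="user-17",
    start=t(0),
    stop=t(10),
    visits=[
        ServerVisit("Web/App server", t(0), t(1)),
        ServerVisit("OLAP server", t(1), t(8)),
        ServerVisit("RDBMS server", t(8), t(10)),
    ],
)
per_server = {v.server: v.time_spent_s for v in record.visits}
print(record.transaction_id, record.total_time_s, per_server)
```

The per-server times of a transaction executed alone are exactly the service demands that make up its transaction profile.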

34

Transaction profile data generated by instrumented application

[Figure: transaction profile data keyed by transaction ID. For each server (server 1, server 2, … server N): CPU time, I/O time, and quantities of passive resources 1, 2, … M]

35

Application instrumentation technologies

• Application Response Measurement (ARM) https://collaboration.opengroup.org/tech/management/arm

• Apache Commons Monitoring http://commons.apache.org/sandbox/monitoring/instrumentation.html

• Tracing and Instrumenting Applications in Visual Basic and Visual C# http://msdn.microsoft.com/en-us/library/aa984115(v=vs.71).aspx

• SystemTap for Linux http://sourceware.org/systemtap/tutorial.pdf

• Java Management Extensions (JMX) http://docs.oracle.com/javase/tutorial/jmx/index.html

36

Application instrumentation is a must-have component of efficient cloud management


Instrumented application provides transaction profiles and transactional log files

Big data analysis - extracting workload characterization from transactional log files

Application queuing models generate estimates of system performance for different what-if scenarios

Cloud management implements the best scenario
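The log-analysis step of this chain can be sketched in a few lines. The sketch assumes each log entry carries a transaction name in addition to the user ID (the name is not among the logged parameters listed earlier, so it is an assumption), and it stands in for what would be a large-scale data analysis in practice.

```python
from collections import defaultdict


def characterize_workload(log, hours: float) -> dict:
    """Derive users and transaction rate per transaction from log entries.

    log: iterable of (transaction_name, user_id), one pair per executed transaction.
    hours: length of the observation window covered by the log.
    """
    counts = defaultdict(int)
    users = defaultdict(set)
    for name, user_id in log:
        counts[name] += 1
        users[name].add(user_id)
    return {
        name: {
            "users": len(users[name]),
            # transaction rate = transactions per user per hour
            "rate": counts[name] / len(users[name]) / hours,
        }
        for name in counts
    }


# Made-up entries covering a two-hour window.
log = [
    ("Report ABC", "u1"), ("Report ABC", "u1"),
    ("Report ABC", "u2"), ("Report ABC", "u2"),
    ("Consolidation Y", "u3"),
]
result = characterize_workload(log, hours=2)
print(result)
```

The output has the same three components as the workload characterization fed to the queuing model: the list of transactions, the number of users per transaction, and the transaction rate.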

37

Research areas

• Enterprise application instrumentation as a provider of transactional data

• Big data analysis delivering transactional workload characterizations and workload variability patterns for proactive cloud management

• Queuing models of enterprise applications enabling analysis of different what-if scenarios

38

To learn more about queuing models of enterprise applications, see the author’s book “Solving Enterprise Applications Performance Puzzles: Queuing Models to the Rescue” (available in bookstores and from Web booksellers)

http://www.amazon.com/Solving-Enterprise-Applications-Performance-Puzzles/dp/1118061578/ref=sr_1_1?ie=UTF8&qid=1326134402&sr=8-1

https://www.amazon.com/author/leonid.grinshpan

Contact Leonid Grinshpan at:

[email protected]
