Model-based transaction-aware cloud resources management: case study and methodology
DESCRIPTION
The presentation introduces a method of allocating cloud resources to enterprise applications (EA) based on business transaction metrics. The approach uses queuing models; it was devised while working on a real-life EA capacity planning project requested by an Oracle customer. Implementing the proposed solution reduced the number of database servers from 40 to 21 without compromising transaction times. The presentation describes the components of the proposed methodology: building the application’s queuing model, obtaining input data for modeling (workload characterization and transaction profiles), solving the model, and analyzing what-if scenarios. It compares ways and means of collecting input data, identifies instrumentation of software at its development stage as the ultimate solution, and encourages research into technologies that deliver instrumented EAs. Takeaway: model-based transaction-aware cloud resources management significantly improves cloud profitability by minimizing the number of hardware servers hosting applications while delivering the required service level.
TRANSCRIPT
Model-based transaction-aware cloud resources management: case study and methodology
Leonid Grinshpan, Ph.D., Consulting Technical Director
2
Disclaimer
The views expressed in this presentation are the author’s own and do not reflect the views of the companies he has worked for, nor those of Oracle Corporation.
All brands and trademarks mentioned are the property of their owners.
3
Presentation’s goal
The presentation includes:
- A case study in which the number of database servers was reduced from 40 to 21 while deploying an enterprise application for an Oracle customer
- An outline of the model-based transaction-aware cloud management methodology that made such hardware minimization possible
4
Presentation’s structure
Section 1
Subject
Section 2
Transaction-aware server allocation
Section 3
How to implement model-based transaction-aware management?
5
Section 1
Subject
6
Deployment of enterprise application on a network of functional clusters
7
Load balancing algorithms
1. Round robin
2. Algorithms based on assessment of hardware metrics (CPU utilization, etc.)
The load balancing approach discussed in this presentation is based on business transaction metrics
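For contrast, the two conventional algorithms listed above can be sketched in a few lines of Python. This is an illustrative sketch only: the server names and utilization figures are hypothetical, and neither function looks at which business transaction is being routed.

```python
from itertools import cycle


def round_robin(servers):
    """Return an iterator that hands out servers in a fixed rotation."""
    return cycle(servers)


def least_utilized(cpu_utilization):
    """Pick the server with the lowest CPU utilization (a hardware metric)."""
    return min(cpu_utilization, key=cpu_utilization.get)


# Hypothetical server names and utilization readings
rotation = round_robin(["olap-1", "olap-2", "olap-3"])
first_four = [next(rotation) for _ in range(4)]
print(first_four)  # ['olap-1', 'olap-2', 'olap-3', 'olap-1']

choice = least_utilized({"olap-1": 0.62, "olap-2": 0.35, "olap-3": 0.48})
print(choice)  # olap-2
```

A transaction-aware balancer would instead take the transaction's identity (and its expected service demand) as an input to the routing decision.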
8
Definitions 1
• Transaction - a request from an EA user to be processed by the system.
• Transaction (response) time - the time the application takes to process a transaction.
• Transaction rate - the number of transaction requests submitted by one user during one hour.
• Transaction service demand - the time interval during which a transaction was processed by a particular infrastructure component (network, hardware appliance, hardware server).
• Transaction profile - the set of transaction service demands for the system resources needed to process a transaction.
9
Definitions 2
• Workload - a flow of transactions generated by EA users.
• Workload characterization - a specification of the workload that includes three components: the list of business transactions; the rate of each transaction; and, for each transaction, the number of users requesting it.
• Transaction stretch factor - a parameter defined by the formula:
Transaction stretch factor = transaction time when N users work concurrently / transaction time for a single user
A scalable system has stretch factors equal to 1 for all transactions.
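As a quick check of the definition, the stretch factor can be computed directly from measured transaction times. The sample values are the single-user and 400-user times of "OLAP maintenance small" from the case study's 40-server table; the function names and the tolerance in `is_scalable` are illustrative assumptions.

```python
def stretch_factor(time_n_users: float, time_single_user: float) -> float:
    """Transaction time under load divided by transaction time for one user."""
    return time_n_users / time_single_user


def is_scalable(factors, tolerance=0.005):
    """A scalable system keeps every stretch factor at (about) 1.

    The tolerance is a hypothetical allowance for measurement noise.
    """
    return all(abs(f - 1.0) <= tolerance for f in factors)


s = stretch_factor(10.68, 10.0)  # OLAP maintenance small, 400 users vs 1 user
print(round(s, 2))               # 1.07
print(is_scalable([s]))          # False: about 6.8% slower under load
```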
10
Sizing project requirements. System architecture
11
Sizing project requirements. Number of users and SLA
• Provide an estimate of the number of hardware servers on each layer, and of the number of CPUs on each server, for the architecture presented on the previous slide under an anticipated workload of 400 business users.
• Service level: transaction time degradations as the number of users increases to 400 are acceptable if they do not exceed 7%.
12
Sizing project requirements. Workload characterization
Columns: transaction name; average transaction time for a single user (seconds); transaction rate (transactions per user per hour); number of users executing the transaction
OLAP maintenance small 10 1 14
OLAP maintenance medium 10 1 2
OLAP maintenance large 10 4 1
OLAP restructure small 60 1 2
OLAP restructure medium 600 1 1
OLAP restructure large 3600 2 1
OLAP update small 10 1 12
OLAP update medium 10 1 11
OLAP update large 10 3 6
OLAP calculation small 500 2 187
OLAP calculation medium 2250 3 34
OLAP calculation large 10000 5 6
Maintenance report small 15 1 60
Maintenance report medium 25 8 9
Maintenance report large 200 9 1
Update report small 10 1 8
Update report medium 10 3 6
Update report large 10 3 1
Sales report small 15 3 30
Sales report medium 25 4 7
Sales report large 200 12 1
13
Sizing project requirements. Transaction profile (time in seconds)
Columns: transaction name; average transaction time for a single user; time on Web/App server; time on OLAP server; time on RDBMS server
OLAP maintenance small 10 1 7 2
OLAP maintenance medium 10 1 7 2
OLAP maintenance large 10 1 7 2
OLAP restructure small 60 1 56 3
OLAP restructure medium 600 1 595 4
OLAP restructure large 3600 1 3594 5
OLAP update small 10 1 7 2
OLAP update medium 10 1 7 2
OLAP update large 10 1 7 2
OLAP calculation small 500 1 496 3
OLAP calculation medium 2250 1 2245 4
OLAP calculation large 10000 1 9995 5
Maintenance report small 15 3 11 1
Maintenance report medium 25 6 17 2
Maintenance report large 200 49 147 4
Update report small 10 1 8 1
Update report medium 10 1 8 1
Update report large 10 1 8 1
Sales report small 15 3 11 1
Sales report medium 25 6 17 2
Sales report large 200 49 147 4
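One property of the transaction profile table can be checked mechanically: because the profile was measured with a single user, there is no queuing, so each transaction's single-user time equals the sum of its times on the three servers. The sketch below encodes a few rows of the table (the tuple layout is an assumption) and verifies that property.

```python
# Seconds per transaction: (single-user time, Web/App, OLAP, RDBMS), rows from the table
profiles = {
    "OLAP maintenance small": (10, 1, 7, 2),
    "OLAP restructure large": (3600, 1, 3594, 5),
    "OLAP calculation medium": (2250, 1, 2245, 4),
    "Maintenance report large": (200, 49, 147, 4),
    "Update report small": (10, 1, 8, 1),
}


def consistent(profile):
    """True if the single-user time equals the sum of per-server service demands."""
    single_user_time, *service_demands = profile
    return single_user_time == sum(service_demands)


mismatches = [name for name, p in profiles.items() if not consistent(p)]
print(mismatches)  # []
```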
14
What-if scenarios
• Several what-if scenarios with different numbers of servers on each layer were modeled
• All analyzed deployments showed that one Web/Application server and one RDBMS server are sufficient
• Configurations with different numbers of OLAP servers were analyzed
15
Architecture with 40 OLAP servers and original workload
This architecture keeps transaction time deterioration for 400 users under 7% …
… but it features low utilization of each OLAP server (only 36%)
The obvious next step was to model deployments with fewer OLAP servers. We did just that for 34 servers, and the result violated the SLA: some transaction times increased by an unacceptable 18%
16
Transaction time degradation under original workload
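Columns: transaction; transaction time for 1 user (seconds); transaction time for 400 users with 40 OLAP servers (seconds); stretch factor with 40 OLAP servers (CPU utilization of each server 36%); stretch factor with 34 OLAP servers (CPU utilization of each server 42%)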
OLAP maintenance small 10.0 10.68 1.07 1.18
OLAP maintenance medium 10.0 10.68 1.07 1.18
OLAP maintenance large 10.0 10.68 1.07 1.18
OLAP restructure small 60.0 60.68 1.01 1.03
OLAP restructure medium 600.0 600.68 1.00 1.00
OLAP restructure large 3600.0 3600.66 1.00 1.00
OLAP update small 10.0 10.68 1.07 1.18
OLAP update medium 10.0 10.68 1.07 1.18
OLAP update large 10.0 10.68 1.07 1.18
OLAP calculation small 500.0 500.66 1.00 1.00
OLAP calculation medium 2250.0 2250.66 1.00 1.00
OLAP calculation large 10001.2 10001.66 1.00 1.00
Maintenance report small 15.0 15.68 1.05 1.12
Maintenance report medium 25.0 25.68 1.03 1.07
Maintenance report large 200.0 200.67 1.00 1.01
Update report small 10.0 10.68 1.07 1.18
Update report medium 10.0 10.68 1.07 1.18
Update report large 10.0 10.68 1.07 1.18
Sales report small 15.0 15.68 1.05 1.12
Sales report medium 25.0 25.68 1.03 1.07
Sales report large 200.0 200.67 1.00 1.01
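The SLA from the sizing requirements (at most 7% degradation, i.e. a stretch factor of at most 1.07) can be applied to this table mechanically. The sketch uses a subset of rows; because the published stretch factors are rounded to two decimals, treating a value of exactly 1.07 as compliant is an assumption about the boundary.

```python
SLA_MAX_STRETCH = 1.07  # 7% allowed transaction time degradation

# (transaction, stretch factor with 40 OLAP servers, stretch factor with 34 OLAP servers)
rows = [
    ("OLAP maintenance small", 1.07, 1.18),
    ("OLAP restructure small", 1.01, 1.03),
    ("Maintenance report small", 1.05, 1.12),
    ("Maintenance report medium", 1.03, 1.07),
    ("OLAP calculation large", 1.00, 1.00),
]


def sla_violations(rows, column):
    """Names of transactions whose stretch factor in the given column exceeds the SLA."""
    return [row[0] for row in rows if row[column] > SLA_MAX_STRETCH]


print(sla_violations(rows, 1))  # 40 servers -> []
print(sla_violations(rows, 2))  # 34 servers -> ['OLAP maintenance small', 'Maintenance report small']
```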
17
What causes degradation of transaction time?
• Short transactions degrade because they wait in the server’s queues until long transactions (such as OLAP calculation) release a CPU.
• This observation leads to a hypothesis: segmenting transactions into groups by their hourly service demand, and processing each group on dedicated OLAP servers, might minimize the total number of OLAP servers.
Transaction hourly service demand = time a single transaction spends in a server * number of transactions per hour per user * number of the transaction's users
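The hourly service demand formula translates directly into code. The example inputs are the OLAP server times, transaction rates, and user counts from the case study tables; the function name is illustrative.

```python
def hourly_service_demand(time_in_server: float, rate_per_user_per_hour: float, users: int) -> float:
    """Seconds of server time per hour generated by one transaction type."""
    return time_in_server * rate_per_user_per_hour * users


# OLAP calculation small: 496 s on the OLAP server, 2 per user per hour, 187 users
print(hourly_service_demand(496, 2, 187))  # 185504
# OLAP maintenance small: 7 s on the OLAP server, 1 per user per hour, 14 users
print(hourly_service_demand(7, 1, 14))     # 98
```

The gap between these two values (a factor of almost 1,900) is what makes separating the two kinds of transactions attractive.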
18
Low and high demand workloads
Columns: transaction name; transaction rate (transactions per user per hour); number of users executing the transaction; time on OLAP server (seconds); hourly service demand for the OLAP server (seconds), shown in the original slide in separate columns for low demand and high demand transactions
OLAP maintenance small 1 14 7 98
OLAP maintenance medium 1 2 7 14
OLAP maintenance large 4 1 7 28
OLAP restructure small 1 2 56 112
OLAP restructure medium 1 1 595 595
OLAP restructure large 2 1 3594 7188
OLAP update small 1 12 7 84
OLAP update medium 1 11 7 77
OLAP update large 3 6 7 126
OLAP calculation small 2 187 496 185504
OLAP calculation medium 3 34 2245 228990
OLAP calculation large 5 6 9995 299850
Maintenance report small 1 60 11 660
Maintenance report medium 8 9 17 1224
Maintenance report large 9 1 147 1323
Update report small 1 8 8 64
Update report medium 3 6 8 144
Update report large 3 1 8 24
Sales report small 3 30 11 990
Sales report medium 4 7 17 476
Sales report large 12 1 147 1764
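The segmentation step can be sketched as a threshold split over hourly service demand. The flattened table above does not preserve which rows the author placed in each group, and the presentation does not state a cut-off, so THRESHOLD below is purely hypothetical; the sketch only shows the mechanics of separating the workload into a low demand and a high demand group.

```python
# Hourly OLAP service demand in seconds, a subset of the table above
demand = {
    "OLAP maintenance small": 98,
    "OLAP restructure large": 7188,
    "Sales report large": 1764,
    "OLAP calculation small": 185504,
    "OLAP calculation medium": 228990,
    "OLAP calculation large": 299850,
}

THRESHOLD = 100_000  # hypothetical cut-off (seconds per hour), not from the case study


def segment(demand, threshold):
    """Split transactions into low and high demand groups by hourly service demand."""
    low = {name: d for name, d in demand.items() if d < threshold}
    high = {name: d for name, d in demand.items() if d >= threshold}
    return low, high


low, high = segment(demand, THRESHOLD)
print(sorted(high))                           # the three OLAP calculation transactions
print(sum(low.values()), sum(high.values()))  # 9050 714344
```

Each group would then be routed to its own pool of OLAP servers and modeled separately to size that pool.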
19
Architecture with segmented workload
This configuration delivered the same transaction times and stretch factors as a system with 40 OLAP servers handling the original, non-segmented workload.
1 OLAP server processing low demand workload
20 OLAP servers processing high demand workload
20
Takeaways from the case study
• The presentation describes a method of server allocation based on business transaction metrics. The method minimizes the number of servers assigned to an application without compromising transaction times
• The approach classifies transactions into groups depending on their hourly service demand and processes each group on dedicated servers
• Transaction-aware cloud management might significantly improve cloud profitability without any additional investment in the hardware platform
• Further research of cloud management based on business transaction metrics is worthwhile, as it might bring significant economic benefits to cloud providers and to customers
21
Section 3
How to implement model-based transaction-aware management?
22
What is needed?
• Application model
• Workload specification
• Transaction profiles
• Model solver
23
Mapping application into queuing model
Hardware server representation
Total time in node = time in waiting queue + time in processing unit
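The relation above can be made concrete with the textbook M/M/1 node (Poisson arrivals, exponentially distributed service, one processing unit). This is a standard model swapped in for illustration, not necessarily the node type used in the author's models: its mean total time is service time / (1 - utilization), and the time in the waiting queue is the difference between total and processing time.

```python
def mm1_node_times(service_time: float, arrival_rate: float):
    """Mean (waiting, processing, total) times for an M/M/1 node."""
    utilization = arrival_rate * service_time
    if utilization >= 1:
        raise ValueError("node is saturated: utilization must stay below 1")
    total = service_time / (1 - utilization)
    waiting = total - service_time  # total = waiting + processing
    return waiting, service_time, total


# 0.5 s of processing per request, 1 request per second -> 50% utilization
waiting, processing, total = mm1_node_times(0.5, 1.0)
print(waiting, processing, total)  # 0.5 0.5 1.0
```

At 50% utilization a request spends as long waiting as being processed; as utilization approaches 1, the waiting term dominates.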
24
Mapping application into queuing model (cont 2)
25
Mapping application into queuing model (cont 3)
The relationships between the components of a real system and the components of its model
Component of application - Matching object in queuing model
Users - Node “Users”
Web server - Node “Web server”
Application and Database server - Node “A&D server”
Requests from users - Cars
26
Transaction response time and transaction profile
Transaction time is the time spent in the “cloud”
27
Transaction response time and transaction profile (cont 2)
Active resources:
• CPU time (data processing)
• I/O time (data transfer)
Passive resources:
• Web server connections
• Database connections
• Software threads
• Storage space
• Memory space
28
Model’s input data
1. Workload characterization
The workload characterization lists the business transactions, the number of users of each business transaction, and, for each transaction, the number of transactions per user per hour (transaction rate).
Columns: transaction name; number of users; transaction rate (transactions per user per hour)
Report ABC 20 12
Business Rule X 10 8
Consolidation Y 5 3
29
Model’s input data (cont 2)
2. Transactions profiles
The transaction profile in this example consists of the time intervals a transaction spent in the system servers it visited while the application was serving only that single transaction
Columns: transaction name; service demand on Web server (seconds); service demand on A&D server (seconds)
Report ABC 0.5 0.5
Business Rule X 0.5 2.5
Consolidation Y 0.5 9.5
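The presentation does not say which solver was used. As a hedged illustration of the "model solver" step, the sketch below feeds the example workload and profiles into an open multiclass queuing network with processor-sharing servers, a standard product-form model swapped in here for brevity: each transaction type arrives at users × rate / 3600 per second, server utilization is the sum of arrival rate × service demand over all types, and each type's response time is the sum over servers of demand / (1 - utilization). It ignores the "Users" node and passive resources; a closed model that includes them would need a different algorithm, such as mean value analysis.

```python
# Workload characterization: (number of users, transactions per user per hour)
workload = {
    "Report ABC": (20, 12),
    "Business Rule X": (10, 8),
    "Consolidation Y": (5, 3),
}
# Transaction profiles: service demand in seconds per server
profile = {
    "Report ABC": {"Web server": 0.5, "A&D server": 0.5},
    "Business Rule X": {"Web server": 0.5, "A&D server": 2.5},
    "Consolidation Y": {"Web server": 0.5, "A&D server": 9.5},
}


def solve_open_model(workload, profile):
    """Open multiclass product-form network with processor-sharing servers."""
    # Arrival rate of each transaction type, in transactions per second
    rates = {t: users * per_hour / 3600 for t, (users, per_hour) in workload.items()}
    servers = {s for demands in profile.values() for s in demands}
    utilization = {
        s: sum(rates[t] * profile[t].get(s, 0.0) for t in workload) for s in servers
    }
    if any(u >= 1 for u in utilization.values()):
        raise ValueError("at least one server is saturated")
    # Response time of a type = sum of its stretched demands on every server it visits
    response = {
        t: sum(d / (1 - utilization[s]) for s, d in profile[t].items()) for t in workload
    }
    return utilization, response


utilization, response = solve_open_model(workload, profile)
for server, u in sorted(utilization.items()):
    print(f"{server}: utilization {u:.1%}")
for t, r in response.items():
    print(f"{t}: {r:.2f} s")
```

Rerunning the function with a changed workload or a modified profile is a minimal form of what-if analysis.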
30
Workload segmentation
Transaction hourly service demand = time a single transaction spends in a server * number of transactions per hour per user * number of the transaction's users
Transaction-aware management is based on classifying transactions into groups depending on their hourly service demand and processing each group on dedicated servers
31
How to obtain workload characterization and transaction profiles?
•Analysis of business process – creating process flowcharts based on interviews of key process participants
http://www.wikihow.com/Analyze-a-Business-Process
•Business transaction management software - tracking transactions across the application http://en.wikipedia.org/wiki/Business_transaction_management
•Application instrumentation at the software development stage – making the application manageable
http://en.wikipedia.org/wiki/Instrumentation_(computer_programming)
•Big data analysis – forensic analysis of transactional data collected over time
http://en.wikipedia.org/wiki/Big_data
32
Application instrumentation is the most potent technology
• Transactions are defined at application development stage
• It is possible to assign unique ID to each transaction
• Unique transaction ID enables tracking transaction path among servers
•Unique transaction ID enables measurement of each server’s active and passive resources allocated to the transaction
•Unique transaction ID enables logging of all executed transactions, with their parameters, to a file
33
Parameters of a transaction saved in log file
• Unique transaction ID
• ID of a user who initiated transaction
• Transaction start and stop dates and times
• Transaction total execution time
• For each server:
time transaction entered server
time transaction exited server
time transaction spent in server
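A log record with the parameters listed above might look like the following dataclasses. The field names, the added business transaction name (useful later for workload characterization), and the use of a random UUID as the unique transaction ID are assumptions for illustration; the per-server times reuse the "Sales report small" profile from the case study.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ServerVisit:
    server: str
    entered: datetime
    exited: datetime

    @property
    def time_in_server(self) -> timedelta:
        return self.exited - self.entered


@dataclass
class TransactionRecord:
    user_id: str
    name: str  # business transaction name (assumed field, not in the list above)
    start: datetime
    stop: datetime
    visits: list = field(default_factory=list)
    transaction_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def execution_time(self) -> timedelta:
        return self.stop - self.start


t0 = datetime(2024, 1, 1, 9, 0, 0)  # arbitrary start time


def at(seconds: float) -> datetime:
    return t0 + timedelta(seconds=seconds)


record = TransactionRecord(user_id="user-042", name="Sales report small", start=t0, stop=at(15))
record.visits += [
    ServerVisit("Web/App", at(0), at(3)),
    ServerVisit("OLAP", at(3), at(14)),
    ServerVisit("RDBMS", at(14), at(15)),
]
print(record.execution_time.total_seconds())                      # 15.0
print([v.time_in_server.total_seconds() for v in record.visits])  # [3.0, 11.0, 1.0]
```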
34
Transaction profile data generated by instrumented application
Figure: for each transaction ID and each of server 1, server 2, … server N, the profile records CPU time, I/O time, and the quantity of each passive resource 1 … M
35
Application instrumentation technologies
• Application Response Measurement – ARM https://collaboration.opengroup.org/tech/management/arm
• Apache Commons Monitoring http://commons.apache.org/sandbox/monitoring/instrumentation.html
• Tracing and Instrumenting Applications in Visual Basic and Visual C#
http://msdn.microsoft.com/en-us/library/aa984115(v=vs.71).aspx
• Systemtap for Linux http://sourceware.org/systemtap/tutorial.pdf
• Java Management Extensions (JMX)
http://docs.oracle.com/javase/tutorial/jmx/index.html
36
Application instrumentation is a must-have component of efficient cloud management
Instrumented application provides transaction profiles and transactional log files
Big data analysis - extracting workload characterization from transactional log files
Application queuing models generate estimates of system performance for different what-if scenarios
Cloud management implements the best scenario
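The "big data analysis" step can be reduced, for illustration, to an aggregation over log rows: the number of distinct users per transaction, and the transaction rate as the number of executions per user per hour over the observation window. The log rows and the two-hour window below are hypothetical, and averaging the rate over all users of a transaction is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical log extract: (transaction name, user ID) for every execution
log = [
    ("Report ABC", "u1"), ("Report ABC", "u1"), ("Report ABC", "u1"),
    ("Report ABC", "u2"), ("Report ABC", "u2"), ("Report ABC", "u2"),
    ("Business Rule X", "u3"), ("Business Rule X", "u3"),
    ("Business Rule X", "u3"), ("Business Rule X", "u3"),
]


def characterize(log, hours_observed):
    """Return {transaction: (number of users, transactions per user per hour)}."""
    users = defaultdict(set)
    executions = defaultdict(int)
    for name, user_id in log:
        users[name].add(user_id)
        executions[name] += 1
    return {
        name: (len(users[name]), executions[name] / (len(users[name]) * hours_observed))
        for name in executions
    }


print(characterize(log, hours_observed=2))
# {'Report ABC': (2, 1.5), 'Business Rule X': (1, 2.0)}
```

The resulting tuples have the same shape as the workload characterization table among the model's input data (number of users, transaction rate).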
37
Research areas
• Enterprise application instrumentation as a provider of transactional data
• Big data analysis delivering transactional workload characterizations and workload variability patterns for proactive cloud management
• Queuing models of enterprise applications enabling analysis of different what-if scenarios
38
To learn more about queuing models of enterprise applications, see the author’s book
“Solving Enterprise Applications Performance Puzzles: Queuing Models to the Rescue”
(available in bookstores and from Web booksellers)
http://www.amazon.com/Solving-Enterprise-Applications-Performance-Puzzles/dp/1118061578/ref=sr_1_1?ie=UTF8&qid=1326134402&sr=8-1
https://www.amazon.com/author/leonid.grinshpan
Contact Leonid Grinshpan at:
39