© 2010 D.A. Menasce. All Rights Reserved.
An Introduction to Autonomic Computing
Daniel A. Menasce Department of Computer Science
George Mason University
Based on the paper “The Vision of Autonomic Computing,” by Jeff Kephart and D. Chess, IEEE Computer, January 2003, and on other papers by D.A. Menasce, his students, and colleagues.
Motivation for AC
• “…main obstacle to further progress in IT is a looming software complexity crisis.” (from an IBM manifesto, Oct. 2001).
  – Tens of millions of lines of code
  – Skilled IT professionals required to install, configure, tune, and maintain
  – Need to integrate many heterogeneous systems
  – Limit of human capacity being reached
Motivation for AC (cont’d)
• Harder to anticipate interactions between components at design time:
  – Need to defer decisions to run time
• Computer systems are becoming too massive and complex to be managed even by the most skilled IT professionals
• The workload and environment conditions tend to change very rapidly with time
[Figure: Multi-scale time workload variation of a Web server, shown at time scales of 3600 sec, 60 sec, and 1 sec.]
Autonomic Computing
• Systems that can manage themselves given high-level objectives.
  – High-level objectives can be expressed in terms of service-level objectives or utility functions.
• Autonomic computing is inspired by the human autonomic nervous system:
  – “The autonomic nervous system consists of sensory neurons and motor neurons that run between the central nervous system and various internal organs such as the: heart, lungs, viscera, glands. It is responsible for monitoring conditions in the internal environment and bringing about appropriate changes in them. The contraction of both smooth muscle and cardiac muscle is controlled by motor neurons of the autonomic system.” http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/P/PNS.html
  – The autonomic nervous system functions in an involuntary, reflexive manner.
Evolution of AC Systems
• Automated data collection and aggregation in support of decisions by human administrators
• Provide advice to humans suggesting possible courses of action
• Take lower level actions automatically
• Increase the scope and impact of actions taken automatically by systems in support of their AC behavior
• Self-managing systems and devices will come to seem completely natural
Autonomic Computing
• Inspiration also comes from self-governing systems, such as social and economic systems, in addition to purely biological ones.
• Dimensions of Self-Management:
  – Self-configuration
  – Self-optimization
  – Self-healing
  – Self-protection
Self-Configuring Systems
• Automatic configuration of components within larger systems
• Component registration
  – Need to advertise behavior
  – Need to advertise configuration options and mechanisms
• Automatic component discovery and integration
Self-Optimizing Systems
• Complex middleware and database systems have a very large number of configurable parameters.
  – Web Server (IIS 5.0): HTTP KeepAlive, Application Protection Level, Connection Timeout, Number of Connections, Logging Location, Resource Indexing, Performance Tuning Level, Application Optimization, MemCacheSize, MaxCachedFileSize, ListenBacklog, MaxPoolThreads
  – Application Server (Tomcat 4.1): acceptCount, minProcessors, maxProcessors, worker.ajp13.cachesize
  – Database Server (SQL Server 7.0): Cursor Threshold, Fill Factor, Locks, Max Worker Threads, Min Memory Per Query, Network Packet Size, Priority Boost, Recovery Interval, Set Working Set Size, Max Server Memory, Min Server Memory, User Connections
Self-optimizing systems: motivation
[Figure: queuing diagram with the labels: arriving requests, queue for threads, software threads, cpu, disk.]
Q: How does the response time vary with the number of software threads?
[Figure: Response Time (sec), 0.0 to 2.5, vs. Number of Threads, 0 to 30; curves for L=1, L=2, L=3, and L=3.5, plus the SLA.]
Q: Why does the response time have this shape?
Obs: For each workload level there is an optimal number of threads.
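The observation above can be made concrete with a minimal sketch. It assumes that, for a given workload level, the average response time has already been measured at a few thread counts. The function name `best_thread_count` and every number in `MEASURED` are hypothetical illustrations; they are not read off the figure.

```python
# Hypothetical measurements, NOT taken from the figure:
# workload level L -> {number of threads: average response time in seconds}
MEASURED = {
    1.0: {2: 1.9, 5: 0.6, 10: 0.4, 20: 0.7, 30: 1.1},
    3.5: {2: 2.5, 5: 1.8, 10: 1.2, 20: 1.0, 30: 1.6},
}


def best_thread_count(measurements, sla):
    """Return (threads, response_time, meets_sla) for the thread count
    with the lowest measured response time."""
    threads, response_time = min(measurements.items(), key=lambda kv: kv[1])
    return threads, response_time, response_time <= sla


for level, measurements in MEASURED.items():
    print(level, best_thread_count(measurements, sla=0.9))
```

In this made-up data the best thread count shifts with the workload level, and at the higher level even the best setting misses the assumed SLA of 0.9 sec.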
Workload and QoS metrics
[Figure: a computer system, shown with its workload and its QoS metrics.]
Workload:
• transactions
• HTTP requests
• video downloads
• calls to Call Center
QoS metrics:
• response time
• throughput
• availability
• page download time
• revenue throughput
• abandonment rate
Self-optimizing System
[Figure, built up over three slides: the computer system with its workload and QoS metrics (same lists as on the previous slide); a Controller that takes the Service Level Objectives as input; two Monitoring arrows into the Controller; and a Parameter Change arrow from the Controller to the computer system.]
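The loop in the diagram (monitor, compare with the service level objective, change a parameter) can be sketched as follows. This is an illustration under stated assumptions, not the controller used in the cited work: `response_time_model` is a hypothetical stand-in for the real system, the single tuned parameter is a thread count, and the search is a simple one-step hill climb.

```python
def response_time_model(threads, load):
    """Hypothetical model (an assumption, not from the slides): response
    time in seconds. The first term falls as threads are added (less
    queuing for threads); the second grows (more contention)."""
    return load * (4.0 / threads + 0.02 * threads)


def run_controller(threads, load, slo, max_intervals=50, lo=1, hi=30):
    """Adjust the thread count until the response-time SLO is met or no
    neighboring setting improves on the current one."""
    for _ in range(max_intervals):
        observed = response_time_model(threads, load)      # monitoring
        if observed <= slo:                                 # SLO is met
            break
        neighbors = [t for t in (threads - 1, threads + 1) if lo <= t <= hi]
        best = min(neighbors, key=lambda t: response_time_model(t, load))
        if response_time_model(best, load) >= observed:     # local optimum
            break
        threads = best                                      # parameter change
    return threads, response_time_model(threads, load)


print(run_controller(threads=2, load=1.0, slo=0.65))
```

Stopping as soon as the SLO is met keeps the number of parameter changes small; a controller that instead maximizes QoS would keep climbing to the local optimum.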
Engineering with QoS in Mind
• Poor QoS may lead to loss of human life (e.g., homeland security and disaster relief support systems) and financial losses due to customer dissatisfaction.
• QoS has to be an integral part of the design of any computer system.
• Online computer systems (e.g., Web sites, e-commerce sites, call centers) have workloads that can be widely varying.
• QoS autonomic control should be part of the design of complex online computer systems.
Results of a Self-Optimizing E-commerce Site
[Figure: Arrival rate. Arrival Rate (requests/sec), 0 to 100, vs. Time (Controller Intervals), 0 to 35.]
"Preserving QoS of E-commerce Sites Through Self-Tuning: A Performance Model Approach," Menasce, Dodge and Barbara, Proc. 2001 ACM Conference on E-commerce, Tampa, FL, October 14-17, 2001.
Results of QoS Controller
[Figure: Controlled QoS and Uncontrolled QoS, on a QoS axis from -0.8 to 0.8, plotted against Arrival Rate (req/sec). The arrival-rate labels rise from 14.4 to a peak of 90.3 and fall back to 64.4.]
Experiment Results
[Figure: Arrival rate. Arrival Rate (requests/sec), 0 to 100, vs. Time (Controller Intervals), 0 to 35, with the annotation “QoS is not met!”]
Dynamic Variation of Process Pool Size in Web Servers
[Figure: Avg. No. Idle Processes, 0 to 45, vs. Avg. Arrival Rate (requests/sec), 1 to 5; curves for 20, 30, and 40 processes.]
Dynamic Variation of Pool Size
Scenario   Number of processes   Arrival Rate (req/sec)   Avg. No. Idle Processes
1          20                    4.5                      10.3
2          20                    4.8                      5.3
3          30                    4.8                      11.3
4          30                    4.9                      6.6
5          40                    4.9                      11.4
Self-Healing Systems
• Automatic mechanisms to:
  – Identify failures
  – Identify their root causes
  – Determine how to repair the system
  – Take into account complex dependencies between hardware and software failures
• Tools include:
  – Logs from various types of monitors
  – Data aggregation mechanisms
  – Statistical techniques to determine correlation between events and diagnose faults
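As one illustration of the statistical techniques mentioned above, the sketch below ranks logged event types by how strongly their per-interval counts correlate with failure counts, using the Pearson correlation coefficient. The event names and all counts are hypothetical, and a high correlation only flags a candidate for further diagnosis; it does not establish a root cause.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series.
    Assumes neither series is constant (otherwise the divisor is zero)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (dev_x * dev_y)


def rank_suspects(event_counts, failure_counts):
    """Sort event types by correlation with failures, highest first."""
    scores = {name: pearson(counts, failure_counts)
              for name, counts in event_counts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-interval counts aggregated from monitor logs.
FAILURES = [0, 1, 0, 3, 2, 0]
EVENTS = {
    "disk_errors": [0, 2, 0, 6, 4, 0],
    "logins": [5, 4, 6, 5, 4, 6],
}

for name, score in rank_suspects(EVENTS, FAILURES):
    print(f"{name}: {score:+.2f}")
```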
Self-Protecting Systems
• Automatic mechanisms to:
  – Defend the system against malicious security attacks
  – Prevent security compromises from occurring due to component failures
  – Predict the onset of security attacks by analyzing events recorded in various types of logs and analyzing correlations that may lead to attacks.
Utility Functions
[Figure: four utility curves, each with utility values between 0 and 1: U(R) vs. Response Time (R), U(X) vs. Throughput (X), U(A) vs. Availability (A), and U(S) vs. Security Level (S), where the security level is Low, Medium, or High.]
Ug(R, X, A, S) = f(U(R), U(X), U(A), U(S))
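The slide leaves the combining function f and the shapes of the individual utility curves unspecified. The sketch below is one possible instantiation, with every choice an assumption made for illustration: piecewise-linear utilities for R, X, and A, a lookup table for S, and f taken to be a weighted sum. All thresholds and weights are hypothetical.

```python
def decreasing_utility(value, best, worst):
    """1.0 at or below `best`, 0.0 at or above `worst`, linear in between.
    Suits metrics where lower is better, such as response time."""
    if value <= best:
        return 1.0
    if value >= worst:
        return 0.0
    return (worst - value) / (worst - best)


def increasing_utility(value, worst, best):
    """0.0 at or below `worst`, 1.0 at or above `best`, linear in between.
    Suits metrics where higher is better, such as throughput."""
    if value >= best:
        return 1.0
    if value <= worst:
        return 0.0
    return (value - worst) / (best - worst)


# Hypothetical utilities for the three security levels.
SECURITY_UTILITY = {"Low": 0.2, "Medium": 0.6, "High": 1.0}


def global_utility(r, x, a, s, weights=(0.25, 0.25, 0.25, 0.25)):
    """Ug(R, X, A, S), with f assumed to be a weighted sum."""
    parts = (
        decreasing_utility(r, best=0.5, worst=2.5),    # U(R), seconds
        increasing_utility(x, worst=0.0, best=100.0),  # U(X), requests/sec
        increasing_utility(a, worst=0.95, best=1.0),   # U(A)
        SECURITY_UTILITY[s],                           # U(S)
    )
    return sum(w * u for w, u in zip(weights, parts))


print(global_utility(r=1.5, x=50.0, a=0.99, s="Medium"))
```

A weighted sum lets a strong attribute compensate for a weak one; a product or a minimum would be a stricter choice of f.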
Architectural Considerations
[Figure: an Autonomic Element, made up of an Autonomic Manager and a Managed Element. The Autonomic Manager runs the Monitor, Analyze, Plan, Execute cycle around shared Knowledge (the MAPE-K cycle).]
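The MAPE-K structure can be sketched as a pair of classes. This is a minimal illustration, not a reference architecture: the class and method names, the single `threads` parameter, the "add one thread" plan, and the response-time formula inside `sense` are all assumptions introduced for the example.

```python
class ManagedElement:
    """Hypothetical managed resource with one sensor and one effector."""

    def __init__(self, threads=2, load=1.0):
        self.threads = threads
        self.load = load

    def sense(self):
        # Made-up response-time formula standing in for a real measurement.
        rt = self.load * (4.0 / self.threads + 0.02 * self.threads)
        return {"response_time": rt}

    def apply(self, threads):
        self.threads = threads


class AutonomicManager:
    """Runs Monitor, Analyze, Plan, Execute over shared Knowledge."""

    def __init__(self, element, slo, max_threads=30):
        self.element = element
        self.knowledge = {"slo": slo, "max_threads": max_threads, "history": []}

    def monitor(self):
        metrics = self.element.sense()
        self.knowledge["history"].append(metrics)
        return metrics

    def analyze(self, metrics):
        # True when the objective is violated and a change is needed.
        return metrics["response_time"] > self.knowledge["slo"]

    def plan(self):
        # Naive plan: add one thread, up to the cap; None means no plan.
        if self.element.threads >= self.knowledge["max_threads"]:
            return None
        return self.element.threads + 1

    def execute(self, threads):
        self.element.apply(threads)

    def run_once(self):
        """One pass of the cycle; returns True if a change was made."""
        metrics = self.monitor()
        if not self.analyze(metrics):
            return False
        change = self.plan()
        if change is None:
            return False
        self.execute(change)
        return True


element = ManagedElement(threads=2, load=1.0)
manager = AutonomicManager(element, slo=0.65)
changes = 0
while manager.run_once():
    changes += 1
print(element.threads, changes)
```

Keeping the objective, limits, and measurement history in one `knowledge` dictionary mirrors the shared Knowledge component that all four phases consult.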
Architectural Considerations
• Manage internal behavior and relationships with other autonomic elements guided by human-specified policies
• Distributed service-oriented architecture is useful to support AC
  – SOA
  – Web Services
  – Grid Computing
• Autonomic Elements (AEs) may need resources from other autonomic elements
  – Need for robust and secure negotiation protocols for obtaining and releasing resources from other AEs
  – Need to monitor consumers so that they do not overuse the AE’s resources
  – Behavior of one AE may depend on behavior of other loosely-coupled AEs
  – Need formal languages (machine and human-readable) to express service contracts and SLAs with other AEs
Engineering Challenges
• How to program AEs?
  – Need tools to acquire and represent policies
  – Need tools to translate from high-level to low-level goals
• How to test AEs?
  – Repeatability issues
  – Interconnected AEs
  – AE interconnection and binding determined at run-time
  – Difficult to put together controlled test environments
• Installation and configuration issues
  – Need directory services and brokers that locate AEs based on policies and SLAs
• Monitoring and problem determination
  – Continuously monitor suppliers to ensure compliance with SLAs and policies
  – Monitoring of consumers
  – Need statistical and aggregation techniques to be able to cope with huge amounts of data
Engineering Challenges
• Relationship with other AEs
  – Interoperability can be achieved through ontologies
  – Need to assess reliability and trustworthiness of other AEs
  – Negotiation with other AEs:
    • Demand-for-service
    • FCFS
    • Posted-price
    • Bi-lateral or multi-lateral negotiations over multiple attributes
    • Third-party arbiter running auctions
• Provisioning of resources after agreements are reached
• Monitoring to check for compliance
• Security and privacy issues
• Robustness with respect to erroneous or infeasible policies
• Mapping from high-level objectives to low-level ones