
Page 1: Achieving Self-Management Capabilities in Autonomic Systems …prr.hec.gov.pk/jspui/bitstream/123456789/617/1/1801S.pdf · 2018-07-17 · Achieving Self-Management Capabilities in

Achieving Self-Management Capabilities in Autonomic Systems using Case-Based Reasoning

PhD Thesis

Malik Jahan Khan

2004-03-0056

Advisors

Dr. Mian Muhammad Awais

Dr. Shafay Shamail

Department of Computer Science

School of Science and Engineering

Lahore University of Management Sciences


Dedicated to my beloved parents


CERTIFICATE

We hereby recommend that the thesis titled “Achieving Self-Management Capabilities in Autonomic Systems using Case-Based Reasoning” by Malik Jahan Khan be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science.

————————————

Dr. Mian Muhammad Awais

————————————

Dr. Shafay Shamail

————————————

Dr. Shahid Masud

————————————

Dr. Asim Karim

————————————

Dr. Omer F. Rana


Acknowledgements

I am highly grateful to Allah Almighty Who enabled me to achieve this milestone

of my life. He is the One I always seek guidance and help from whenever I am in

a difficult situation. Allah has bestowed countless blessings upon me. Whatever I

have achieved in my life is just due to His special favors upon me, otherwise I would

never have been able to do anything.

I am very thankful and would like to acknowledge the continuous support and

valuable guidance provided by my advisors Dr. Mian Muhammad Awais and Dr.

Shafay Shamail throughout the course of this research. Their sincere mentoring led

me to achieve this goal. They have always been a great source of technical help and

motivation. I would also like to extend my sincere thanks to the faculty members

of Computer Science Department at LUMS who have been very supportive in the

completion of my PhD work, especially Dr. Naveed Arshad, Dr. Shahab Baqai, Dr.

Ashraf Iqbal, Dr. Sohaib Khan, Dr. Shahid Masud and Dr. Asim Karim. I am

highly grateful to my friends at LUMS whose valuable feedback on my research has helped me complete my PhD work, especially Zeeshan Ali Rana, Umar

Suleman, Junaid Akhtar, Saqib Ilyas, Khawaja Fahd, Mubashar Baig, Ijaz Akhtar,

Hafsa Zafar and Maria Zubair. LUMS in general and CS Department in particular

have provided me a great learning opportunity. I have enjoyed and learnt a lot here

and it has been my spiritual home. Thanks LUMS!

This research was funded by the Higher Education Commission of Pakistan and Lahore University of Management Sciences (LUMS). Their support is gratefully acknowledged.

Last but not least, I thank my entire family, whose continuous and unconditional support for my studies can never be repaid. I pay gratitude and love to my dear parents, brother, sisters, wife and my little angel Fatima Malik. They have borne my absence and missed me for a long time during the course of my PhD.

Malik Jahan Khan


Abstract

Autonomic systems promise to inject self-managing capabilities in software systems.

The major objectives of autonomic computing are to minimize human intervention

and to enable a seamless self-adaptive behavior in software systems. To achieve

self-managing behavior, various methods have been exploited in the past. Case-Based Reasoning (CBR) is a problem-solving paradigm of artificial intelligence which exploits past experience, stored in the form of problem-solution pairs. Although CBR has been applied at a limited scale in the externalization architecture of self-healing systems, it has not been fully exploited in autonomic systems in general.

We have proposed and applied CBR to achieve autonomicity in software systems.

The proposed approach has been described and evaluated through CBR implementations for the externalization and internalization architectures of autonomic systems. The study

highlights the effect of ten different similarity measures, the role of adaptation and

the effect of changing nearest neighborhood cardinality for a CBR solution cycle in

autonomic managers. The results show that the proposed CBR based autonomic

systems exhibit 90 to 98% accuracy in diagnosing the problem and planning the

solution.

The learning process improves as more experience is added to the case-base. This results in a larger case-base, which reduces efficiency in terms of computational cost. To overcome this efficiency problem, this research work suggests clustering the case-base, classifying a reported problem into the appropriate cluster and devising the solution there. This approach reduces the search complexity by confining a new case to a relevant cluster in the case-base. Clustering the case-base is a one-time process and does not need to be repeated regularly. The proposed approach

has been outlined in the form of a new clustered CBR framework. A comparison of the performance of the conventional CBR approach and the clustered CBR approach has been presented in terms of their Accuracy, Recall and Precision (ARP) and

computational efficiency. The proposed approach exhibits up to 90% accuracy. This indicates that accuracy does not degrade under the clustered CBR approach while, at the same time, the time complexity of the retrieval process improves.

As the case-base grows in size, it is partitioned into different clusters in order

to improve the retrieval efficiency. Deciding an appropriate number of clusters for a

case-base is not a trivial problem. This research work proposes an approach to cluster the case-base into a random number of clusters. Two versions of the randomized approach have been presented. One of them guarantees success, but its computational cost is a function of a random variable. The other guarantees a deterministic computational cost, but success is not guaranteed. To bound the retrieval time, a binary-search-based retrieval strategy has also been proposed. The randomized approach guarantees the same level of accuracy as the clustered CBR approach and simplifies the clustering process by reducing its time complexity.

The proposed approaches have been implemented on the Rice University Bidding System (RUBiS) and a simulation study of the Autonomic Forest Fire Application (AFFA).

Their theoretical and empirical results have been compared. The statistical analysis

shows that the empirical and theoretical results are significantly similar.


Contents

List of Figures vii

List of Tables ix

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.3 Solution Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4 Research Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.4.1 CBR Based Analysis and Planning Algorithms for Autonomic

Managers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

1.4.2 Efficiency Improvement through Clustered CBR Approach . . 11

1.4.3 Improving Efficiency of Clustered CBR Approach through Randomized Algorithms and Euclidean Norm Based Efficient Classifier . . . . . . 12

1.5 Outline of Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2 Background 14

2.1 Autonomic Computing . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2 Self-Management Properties . . . . . . . . . . . . . . . . . . . . . . . 15

2.2.1 Self-Configuration . . . . . . . . . . . . . . . . . . . . . . . . . 15


2.2.2 Self-Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2.3 Self-Protection . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2.4 Self-Healing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2.5 Self-Awareness . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.3 Components of an Autonomic System . . . . . . . . . . . . . . . . . . 17

2.3.1 Autonomic Manager . . . . . . . . . . . . . . . . . . . . . . . 17

2.3.2 Managed Element . . . . . . . . . . . . . . . . . . . . . . . . . 18

2.3.3 Sensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.3.4 Actuators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.3.5 Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.4 Architectures of Autonomic Systems . . . . . . . . . . . . . . . . . . 20

2.4.1 Externalization . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.4.2 Internalization . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

2.5 Case-Based Reasoning . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2.5.1 Retrieve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.5.2 Reuse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.5.3 Revise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.5.4 Retain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.5.5 Repair . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.6 Applications of CBR . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.6.1 Document Retrieval using CBR . . . . . . . . . . . . . . . . . 29

2.6.2 CBR for Product Selection Tasks in E-Commerce . . . . . . . 29

2.6.3 CBR for Help-Desk Applications . . . . . . . . . . . . . . . . 29

2.6.4 CBR for Medical Diagnosis . . . . . . . . . . . . . . . . . . . 30

2.6.5 CBR for Planning and Design . . . . . . . . . . . . . . . . . . 30

2.6.6 CBR for Configuration . . . . . . . . . . . . . . . . . . . . . . 31


2.6.7 CBR for Software Quality Prediction . . . . . . . . . . . . . . 31

2.6.8 CBR for Software Reuse . . . . . . . . . . . . . . . . . . . . . 32

2.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3 Literature Review 33

3.1 Literature Survey of Autonomic Computing . . . . . . . . . . . . . . 33

3.1.1 Self-Management Approaches for Externalization . . . . . . . 33

3.1.2 Self-Management Approaches for Internalization . . . . . . . . 35

3.2 CBR for Autonomic Computing . . . . . . . . . . . . . . . . . . . . . 37

3.3 Applications of Clustering in CBR . . . . . . . . . . . . . . . . . . . . 38

3.4 Limitations of the Existing Self-Management Approaches . . . . . . . 39

3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4 Achieving Self-Management Capability using Case-Based Reasoning 41

4.1 Self-Management in Autonomic Systems: A CBR Based Generic Approach . . . . . . . . . . . . 42

4.1.1 CBR-Based Externalization . . . . . . . . . . . . . . . . . . . 43

4.1.2 CBR-Based Internalization . . . . . . . . . . . . . . . . . . . . 44

4.1.3 CaseBasedAutonome (CBA): CBR-Based Autonomic Solution Finder . . . . . . . . . . . . 46

4.2 Research Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.2.1 Nature of the Case-base . . . . . . . . . . . . . . . . . . . . . 48

4.2.2 Similarity Functions . . . . . . . . . . . . . . . . . . . . . . . 49

4.2.3 Cardinality of Nearest Neighborhood . . . . . . . . . . . . . . 50

4.2.4 Solution and Adaptation Algorithms . . . . . . . . . . . . . . 50

4.2.5 Selection of Model . . . . . . . . . . . . . . . . . . . . . . . . 50


4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5 Improving the Retrieval Efficiency using Clustered CBR 53

5.1 Proposed Clustered CBR Approach . . . . . . . . . . . . . . . . . . . 56

5.1.1 Construction of Clustered Case-base . . . . . . . . . . . . . . 56

5.1.2 Devising the Solution of the New Problem . . . . . . . . . . . 58

5.1.3 Computational Complexity . . . . . . . . . . . . . . . . . . . . 59

5.1.4 Limitations of Clustered CBR Approach . . . . . . . . . . . . 61

5.2 Randomized Approach to Estimate Number of Clusters . . . . . . . . 61

5.2.1 Upper Bound on the Cluster Size . . . . . . . . . . . . . . . . 63

5.2.2 New Classification Process Based on Binary Search . . . . . . 63

5.2.3 Las Vegas Randomized Algorithm for Searching the Optimal

Value of k . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.2.4 Monte Carlo Randomized Algorithm for Searching the Optimal

Value of k . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

6 Results 70

6.1 Applying CBR Based Self-Management Algorithms on Externalization Based Systems . . . . . . . . . . . . 70

6.1.1 Case Study: RUBiS . . . . . . . . . . . . . . . . . . . . . . . . 70

6.1.2 Problem Injector . . . . . . . . . . . . . . . . . . . . . . . . . 72

6.1.3 Autonomic Manager . . . . . . . . . . . . . . . . . . . . . . . 72

6.1.4 Results and Discussion for RUBiS Testbed . . . . . . . . . . . 76

6.2 Applying CBR Based Self-Management Algorithms on Internalization

Based Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

6.2.1 Case Study: Autonomic Forest Fire Application . . . . . . . . 80


6.2.2 Computational Component in AFFA . . . . . . . . . . . . . . 81

6.2.3 CBA Implementation for Element Manager of AFFA . . . . . 82

6.2.4 Results and Discussion for AFFA . . . . . . . . . . . . . . . . 84

6.3 Comparison of CBR with Other Machine Learning Approaches . . . . 88

6.4 Applying Clustered CBR Approach for Self- Management Capabilities 89

6.4.1 Clustering the Case-Base . . . . . . . . . . . . . . . . . . . . . 89

6.4.2 Implementing the Clustered-CBR Cycle . . . . . . . . . . . . 90

6.4.3 Performance Analysis . . . . . . . . . . . . . . . . . . . . . . . 91

6.5 Performance of The Clustered CBR Approach on CTG Case Study . 98

6.5.1 Case Study: Cardiotocography (CTG) . . . . . . . . . . . . . 98

6.5.2 Attributes of CTG Dataset . . . . . . . . . . . . . . . . . . . . 99

6.5.3 Recommended Model for CTG Dataset . . . . . . . . . . . . . 99

6.6 Applying Randomized Approach to Improve The Efficiency of Clustered CBR Approach . . . . . . . . . . . . 102

6.6.1 Estimating Prior Knowledge . . . . . . . . . . . . . . . . . . . 102

6.6.2 Theoretical Estimate of Expected Number of Iterations . . . . 105

6.6.3 Empirical Estimate of Number of Iterations . . . . . . . . . . 105

6.6.4 Validation of Results using The t-test . . . . . . . . . . . . . . 106

6.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

7 Conclusion and Future Research Directions 107

7.1 Concluding Remarks on Current Research . . . . . . . . . . . . . . . 107

7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

7.2.1 Service Oriented Architectures for Autonomic Systems . . . . 109

7.2.2 Solution Composition for Autonomic Systems . . . . . . . . . 110

7.2.3 Case-base Maintenance Strategies . . . . . . . . . . . . . . . . 111


7.2.4 Identification and Exploitation of Penalty Based Schemes for

Case-base Partitioning Problem . . . . . . . . . . . . . . . . . 111

7.2.5 Testing the CBR based Approach on Large Scale Testbeds . . 111

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112


List of Figures

1.1 Outline of Research Problems and Proposed Solutions . . . . . . . . . 11

2.1 Externalization Architecture of Autonomic Systems . . . . . . . . . . 21

2.2 Internalization Architecture of Autonomic Systems [62] . . . . . . . . 22

2.3 The CBR Cycle [3] . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

2.4 Possible Variation in Cardinality Selection of Nearest Neighborhood

of a New Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

4.1 Proposed Architecture for Externalization using CBR . . . . . . . . . 42

4.2 Algorithm 1 - Self-Management Algorithm for Externalization using

CBR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

4.3 Proposed Architecture for Internalization using CBR . . . . . . . . . 46

4.4 Algorithm 2 - Self-Management Algorithm for Internalization using

CBR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.5 Algorithm 3 - CaseBasedAutonome (CBA) . . . . . . . . . . . . . . . 48

4.6 Algorithm 4 - Rollback Strategy . . . . . . . . . . . . . . . . . . . . . 49

4.7 Algorithm 5 - Check for Undo Need . . . . . . . . . . . . . . . . . . . 49

5.1 Confinement of Solution Space: Conventional CBR Approach vs Clustered CBR Approach . . . . . . . . . . . . 54

5.2 Proposed Clustered CBR Approach for Autonomic Managers . . . . . 56

5.3 Algorithm to Construct the Clustered Case-base . . . . . . . . . . . . 57


5.4 Algorithm to Devise Solution of the New Problem in Clustered CBR

Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

5.5 Sample Acceptable Accuracy Region of Clustered Approach . . . . . 62

5.6 A New Efficient Classifier: Binary Search Classifier (BSC) . . . . . . 65

5.7 Randomized Algorithm to Cluster the Case-base into k Clusters: Las

Vegas Version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.8 Randomized Algorithm to Cluster the Case-base into k Clusters: Monte Carlo Version . . . . . . . . . . . . 68

6.1 RUBiS Architecture with CBA . . . . . . . . . . . . . . . . . . . . . 71

6.2 Majority Voting Algorithm . . . . . . . . . . . . . . . . . . . . . . . . 75

6.3 Solution Adaptation Algorithm . . . . . . . . . . . . . . . . . . . . . 75

6.4 Effect of Nn on Maximum Accuracy . . . . . . . . . . . . . . . . . . . 76

6.5 AFFA Architecture with CBA . . . . . . . . . . . . . . . . . . . . . . 81

6.6 Effect of Nn on Maximum Accuracy in AFFA Testbed . . . . . . . . 87

6.7 Accuracy of Clustered Approach vs Unclustered Approach for n = 500 95

6.8 Accuracy of Clustered Approach vs Unclustered Approach for n = 2000 95

6.9 Accuracy of Clustered Approach vs Unclustered Approach for n = 10000 96

6.10 Recall of Clustered Approach vs Unclustered Approach for n = 500 . 96

6.11 Recall of Clustered Approach vs Unclustered Approach for n = 2000 . 97

6.12 Recall of Clustered Approach vs Unclustered Approach for n = 10000 97

6.13 Precision of Clustered Approach vs Unclustered Approach for n = 500 98

6.14 Precision of Clustered Approach vs Unclustered Approach for n = 2000 98

6.15 Precision of Clustered Approach vs Unclustered Approach for n = 10000 99

6.16 Accuracy of Clustered vs Unclustered Approach on CTG Dataset . . 101

6.17 Implementation to Compute Actual Number of Iterations . . . . . . . 103


List of Tables

4.1 Similarity Functions Used . . . . . . . . . . . . . . . . . . . . . . . . 51

6.1 Sample Bootstrap Case-base for RUBiS . . . . . . . . . . . . . . . . . 74

6.2 Accuracy of Selected Similarity Functions for RUBiS . . . . . . . . . 76

6.3 RMSE of Selected Similarity Functions for RUBiS . . . . . . . . . . . 77

6.4 AAE of Selected Similarity Functions for RUBiS . . . . . . . . . . . . 77

6.5 Accuracy of Adaptation Algorithm for RUBiS . . . . . . . . . . . . . 79

6.6 RMSE of Adaptation Algorithm for RUBiS . . . . . . . . . . . . . . . 79

6.7 AAE of Adaptation Algorithm for RUBiS . . . . . . . . . . . . . . . 79

6.8 Accuracy of Selected Similarity Functions for AFFA . . . . . . . . . . 85

6.9 RMSE of Selected Similarity Functions for AFFA . . . . . . . . . . . 85

6.10 AAE of Selected Similarity Functions for AFFA . . . . . . . . . . . . 86

6.11 Performance Comparison of CBR with Other Machine Learning Approaches on RUBiS . . . . . . . . . . . . 88

6.12 Performance Comparison of CBR with Other Machine Learning Approaches on AFFA . . . . . . . . . . . . 89

6.13 Format of a Confusion Matrix . . . . . . . . . . . . . . . . . . . . . . 92

6.14 Sample Confusion Matrix for CS1 . . . . . . . . . . . . . . . . . . . . 92

6.15 Sample Confusion Matrix for CS2 . . . . . . . . . . . . . . . . . . . . 93

6.16 Sample Confusion Matrix for CS3 . . . . . . . . . . . . . . . . . . . . 93

6.17 Sample Confusion Matrix for CS4 . . . . . . . . . . . . . . . . . . . . 94


6.18 Sample ARP Calculations . . . . . . . . . . . . . . . . . . . . . . . . 94

6.19 Percentage Improvement in Retrieval Efficiency for Optimal Values of k 99

6.20 Description of the Attributes of CTG Dataset [25] . . . . . . . . . . . 100

6.21 Expected Number of Iterations for Random k using Case-base of Size

n = 500 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6.22 Expected Number of Iterations for Random k using Case-base of Size

n = 2000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

6.23 Expected Number of Iterations for Random k using Case-base of Size

n = 10000 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

6.24 Comparison of Expected and Observed Number of Iterations using

Case-base of Size n = 500 . . . . . . . . . . . . . . . . . . . . . . . . 104

6.25 Comparison of Expected and Observed Number of Iterations using

Case-base of Size n = 2000 . . . . . . . . . . . . . . . . . . . . . . . . 104

6.26 Comparison of Expected and Observed Number of Iterations using

Case-base of Size n = 10000 . . . . . . . . . . . . . . . . . . . . . . . 104


Chapter 1

Introduction

With the passage of time, reliance on computers has increased tremendously due to the automation and computerization of manual processes. This has improved accuracy and processing speed, but has added an additional layer of complexity. Software systems have grown in size and functionality, enabling us to solve large-scale complex problems. However, when something unexpected occurs in such systems, it becomes difficult to resolve the abnormal behavior. In large and complex systems, diagnosing a problem and then solving it manually is not a trivial job: it requires significant time and effort to diagnose the problem in the underlying system and then find a solution. We need immediate solutions that minimize both the duration of abnormal system behavior and the role of human administrators.

Inspired by the human nervous system, in which most activities inside the human body are performed without explicit direction from the human, autonomic computing [9], [28], [34], [43], [72] is a concept which recommends injecting self-managing capabilities into large systems so that the human intervention needed to manage them can be minimized. This additional layer thus allows many problem-solving responsibilities to be carried out automatically and autonomously: an automatic process runs without human intervention, while an autonomous process performs its job seamlessly. An application unit called the autonomic manager is responsible for injecting the self-managing behavior.

There are two architectural options for adding an autonomic manager to a computing system: externalization and internalization. In the externalization architecture, the autonomic manager resides outside the managed component; in the internalization architecture, it is built within the managed component. There are eight properties of autonomic systems, collectively known as the self-* properties: self-configuration, self-optimization, self-protection, self-healing, self-awareness, context-awareness, open and anticipatory [34], [43]. The self-configuration capability enables seamless adaptation to a new component or a new execution environment without significant human intervention. Self-configuring systems also need to configure themselves according to user-defined high-level goals: a user specifies only what is needed, not how it should be achieved. The autonomic manager prepares a configuration plan to seamlessly adapt to changes in the environment according to user-defined specifications [8], [40], [96]. A self-optimizing system is supposed to continuously seek possibilities for optimization in the computing environment. The self-optimization capability tunes various parameters at run time according to environmental conditions [21], [65]; it enables monitoring, experimenting with and tuning various parameters to make the best use of existing resources [43]. Systems having the self-protection capability are expected to defend against malicious attacks and risks causing large-scale system-wide problems or localized problems within components [16], [101]. The self-healing property detects various system failures and recovers from them to minimize system downtime [6], [67], [73], [75]. A self-healing system is supposed to remain conscious of failures or abnormal behaviors of services and should diagnose them and plan a recovery action [43].

Autonomic systems should be self-aware of their state, contextually aware of environmental changes, open for portability and common standards, and able to anticipate their needs, context and behavior [34], [72]. Self-configuration in autonomic systems is a continuous process: the system has to keep configuring itself with the passage of time and varying environmental conditions.

In the literature, various intelligent techniques such as rule-based systems [15], [40], artificial intelligence planning [8], hot swapping [7], PeerPressure [88], process query systems based on Hidden Markov Models (HMM) and state machines [77], active probing [76] and control-theoretic approaches [27], [31] have been used to enable various self-management capabilities in autonomic systems. These techniques benefit from past experience to some extent, but recent experiences are not considered while devising a solution. Most of these techniques are eager learning approaches: they learn from training data and calibrate a decision-making model. When a new example is provided, they cannot incorporate the new knowledge at run time; to do so, they need to rebuild their decision models. The experience base is therefore not continuously updated and utilized. Such approaches limit learning growth with the passage of time and the addition of new experiences, and they do not learn from recent failures or successes. Conversational case-based reasoning [35] has been applied to the diagnosis of system failures and their recovery, exploiting the knowledge gained from recent successes and failures, but only at a very limited level in the context of autonomic systems. Rule-based systems require rich, well-formalized domain knowledge to construct the rule-base. Planning-based techniques require the complete state space before searching for the goal state. Hot swapping has been applied to self-configuration and the dynamic adaptation of code snippets [7], but its application scope has been limited. Control theory is a model-based approach whose mathematical models are developed under certain assumptions, which reduces their applicability. Other probabilistic approaches, such as PeerPressure, HMMs, state machines and active probing, also rely on the availability of rich prior knowledge for learning their models: they need to be provided the prior probabilities of different events within the system, and estimating these prior probabilities without a rich knowledge base is difficult.
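The contrast between eager and lazy learning drawn above can be made concrete with a toy sketch. The learners, the one-dimensional "load" feature and the fault labels below are hypothetical illustrations, not any of the cited techniques:

```python
class EagerLearner:
    """Fits a model once; incorporating new knowledge needs a full rebuild."""
    def __init__(self):
        self.threshold = None

    def fit(self, examples):
        # examples: list of (load_value, label) pairs
        faulty = [v for v, label in examples if label == "faulty"]
        healthy = [v for v, label in examples if label == "healthy"]
        # Place the decision threshold midway between the classes.
        self.threshold = (min(faulty) + max(healthy)) / 2

    def predict(self, value):
        return "faulty" if value >= self.threshold else "healthy"


class LazyLearner:
    """Stores raw experience; a new case is usable immediately (1-NN)."""
    def __init__(self):
        self.examples = []

    def retain(self, value, label):
        self.examples.append((value, label))

    def predict(self, value):
        # Defer all computation to query time: find the nearest stored case.
        nearest = min(self.examples, key=lambda ex: abs(ex[0] - value))
        return nearest[1]


data = [(0.2, "healthy"), (0.3, "healthy"), (0.9, "faulty")]
eager = EagerLearner()
eager.fit(data)
lazy = LazyLearner()
for v, label in data:
    lazy.retain(v, label)

# A new experience arrives at run time:
lazy.retain(0.6, "faulty")     # immediately available to the lazy learner
print(lazy.predict(0.62))      # -> faulty
# The eager learner must call fit() again over all data to use the new case.
```

The lazy learner answers the query using the freshly retained case without any retraining, which is exactly the property that motivates CBR in this setting.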

Case-Based Reasoning (CBR) [3], [10], [18], [64], [93] is a lazy learning paradigm that is intuitively well suited to the domain of autonomic computing because of the recurring nature of problems in autonomic computing in particular and in computing systems in general. Human administrators of computing systems analyze, diagnose and solve problems using their experience. Experienced human administrators are more effective and efficient at solving problems inside computing systems than inexperienced administrators; the reason behind this capability is the domain knowledge they accumulate with the passage of time. CBR is a

reasoning methodology which uses past experience to find the solution of a new problem. More similar cases contribute more towards the solution of the new problem. A set of nearest neighbors of the current case is retrieved based on similarity values. An appropriate solution algorithm is selected to aggregate the solutions of the nearest neighbors and devise the final solution of the problem. With the intervention of a human administrator, this solution may be revised within certain limits, and the new solution is then retained in the case-base for future use. The case-base is thus continuously updated with the arrival of each new problem, and each recently proposed solution is made immediately available for future use. This capability distinguishes CBR from other learning techniques of artificial intelligence and makes it a good candidate in the domain of autonomic computing.
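As a hedged illustration of this cycle (this is not the thesis implementation, and all names are invented), the retrieve, reuse and retain steps for numeric cases might look like:

```python
# Minimal illustrative CBR cycle: cases are (problem, solution) pairs,
# neighbours are retrieved by Euclidean distance, and their solutions
# are aggregated by distance-weighted averaging. All names are made up.
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(case_base, problem, k=3):
    """Return the k most similar (problem, solution) cases."""
    return sorted(case_base, key=lambda c: euclidean(c[0], problem))[:k]

def reuse(neighbours, problem):
    """Distance-weighted aggregation of the neighbours' solutions."""
    weights = [1.0 / (1e-9 + euclidean(p, problem)) for p, _ in neighbours]
    total = sum(weights)
    return sum(w * s for w, (_, s) in zip(weights, neighbours)) / total

def retain(case_base, problem, solution):
    """Store the (possibly revised) solution for future reuse."""
    case_base.append((problem, solution))

case_base = [((1.0, 2.0), 10.0), ((2.0, 1.0), 12.0), ((8.0, 9.0), 40.0)]
new_problem = (1.5, 1.5)
solution = reuse(retrieve(case_base, new_problem, k=2), new_problem)
retain(case_base, new_problem, solution)
```

The revise step is omitted here; in practice an adaptation algorithm or an administrator would adjust `solution` before it is retained.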

We selected the Rice University Bidding System (RUBiS) [1] for our experiments on the externalization architecture. RUBiS is used as a benchmark for the performance and optimization of distributed systems. It is a Java-based three-tier application which provides the basic functionality of an auction site like eBay, and it uses MySQL as the back-end database management system. Inherently, RUBiS is a non-autonomic system. We implemented an external autonomic manager based on CaseBasedAutonome (CBA) to add self-configuration and self-healing capabilities to RUBiS.

The second case study, selected for our experiments on the internalization architecture, is the Autonomic Forest Fire Application (AFFA) [61], [62]. It is a simulation of a forest fire and originally has autonomic self-optimization capabilities implemented using a rule agent. The limitations of a rule-based system include its dependence on complete prior domain knowledge and its rigidity in terms of adaptability and revision of a proposed solution. We have replaced the rule agent with CBA and studied its behavior for self-protection and self-configuration.

This research work focuses on the novelty of applying CBR to two alternate architectures of autonomic systems: externalization and internalization. Along with its many advantages, conventional CBR also suffers an efficiency bottleneck due to continuous updating of the case-base. As new cases are retained after revision, the size of the case-base continuously increases, resulting in a large number of comparisons required before a solution is presented [47], [51]. As self-configuration is a continuous process, the case-base keeps growing in size and incurs a high computational cost in finding the similarity of the current case with each existing case; in real-time applications, this incremental growth in the number of comparisons needed violates efficiency constraints. This research work proposes to cluster the case-base into a reasonable number of clusters using an appropriate clustering algorithm. When a new problem arrives, it is classified into one of these clusters, and the cases within the relevant cluster are used to devise its solution. The solution-devising step is thus limited to a single cluster instead of the whole case-base, which significantly reduces the number of comparisons needed to find similar cases.

Two algorithms are discussed in this regard. The case-base partitioning algorithm presents the method for clustering the case-base, and the solution-devising algorithm outlines the clustered CBR-based solution-finding methodology using classification. As part of the case-base partitioning algorithm, an implementation and comparison of three different clustering algorithms is presented: k-Means, FarthestFirst and density-based clustering. A performance comparison of the conventional CBR approach with the clustered CBR-based approach is presented in terms of Accuracy, Recall and Precision (ARP) and computational cost. The performance of the proposed approach has been tested on a simulation of AFFA.

The clustered CBR approach works in two phases. The first phase is computationally expensive: it searches for the optimal number of clusters. The search space is limited to between 2 and n/2, where n is the size of the case-base. The lower bound is kept at 2 because k = 1 implies the unclustered approach, while k > n/2 implies clusters of size 1, which limits the cardinality of the nearest neighborhood to 1. For each value of k in this range, the case-base is clustered into k clusters and a test case-base is used to estimate the performance of the clustered approach for that value of k. During this exhaustive search, an optimal k is found which yields the highest accuracy, and the case-base is subsequently clustered into that optimal number of clusters. In the second phase, when a new case is encountered, its relevant cluster is predicted using a conventional classification process. The predicted cluster acts as the search space for the new case: the new case is compared only with the members of the predicted cluster. Hence, the k-nearest neighborhood of the new case is confined within the predicted cluster.
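A minimal sketch of this first phase, assuming a simple k-means clusterer and a placeholder quality measure standing in for the ARP evaluation (all helper names are hypothetical):

```python
# Hedged sketch of phase one: exhaustively try every k in [2, n/2],
# cluster the case-base with k-means, and keep the k that scores best
# on a held-out test set. accuracy() is a toy stand-in for ARP scoring.
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(xs) / n for xs in zip(*points))

def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm returning k centroids."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist2(p, centroids[j]))
            clusters[i].append(p)
        # keep the old centroid if a cluster went empty
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

def accuracy(centroids, test_cases):
    """Placeholder quality measure: fraction of test cases whose
    nearest centroid lies within a fixed squared radius."""
    return sum(min(dist2(t, c) for c in centroids) < 4.0
               for t in test_cases) / len(test_cases)

def optimal_k(case_base, test_cases):
    """Exhaustive search over k = 2 .. n/2 (n/2 - 1 re-clustering runs)."""
    n = len(case_base)
    best_k, best_acc = 2, -1.0
    for k in range(2, n // 2 + 1):
        acc = accuracy(kmeans(case_base, k), test_cases)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k
```

The loop over k is exactly the O(n) cost that the randomized variants described below are designed to avoid.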

The first phase of the clustered CBR approach involves learning the optimal number of clusters, which demands n/2 − 1 runs of re-clustering the case-base and applying it to the test cases to compute the performance. For large values of n, this takes O(n) iterations, which makes it a computationally expensive solution. This research work aims to reduce this computational time without significantly compromising the performance of the clustered CBR approach. It proposes to choose a random k; the randomized approach to clustering the case-base is presented in the form of Las Vegas and Monte Carlo variants of randomized algorithms. The random process may be repeated a few times in order to ensure acceptable performance; however, the number of iterations (t) needed in this process is very small compared to O(n). The relatively best value of k among the t iterations gives a good approximation of the optimal number of clusters.
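The two randomized variants can be sketched as follows; `evaluate` is a stand-in for clustering the case-base into k clusters and scoring the result, and the bounds mirror the 2 to n/2 range used by the exhaustive search (all names are illustrative):

```python
# Hedged sketch of the randomized alternatives to the exhaustive scan.
import random

def random_k(n, evaluate, t=5, seed=None):
    """Monte Carlo style: a fixed budget of t iterations, returning the
    best-of-t candidate k. Cost is deterministic; success is not guaranteed."""
    rng = random.Random(seed)
    best_k, best_score = None, float("-inf")
    for _ in range(t):
        k = rng.randint(2, n // 2)   # same bounds as the exhaustive search
        score = evaluate(k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def random_k_las_vegas(n, evaluate, threshold, seed=None):
    """Las Vegas style: repeat until an acceptable k is found. Success is
    guaranteed (if one exists), but the iteration count is a random variable."""
    rng = random.Random(seed)
    while True:
        k = rng.randint(2, n // 2)
        if evaluate(k) >= threshold:
            return k
```

Either way, t draws replace the n/2 − 1 re-clustering runs of the exhaustive search.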

The proposed approach has been implemented on a simulated case-base of AFFA. A comparison between the theoretical and empirical results has been presented.

1.1 Motivation

With the evolution of information technology, software systems, networks and associated hardware have become complex in structure and large in size. This increased complexity raises both the chances of errors in managing the underlying systems and the cost of systems management. Skilled human resources are costly and cannot deal with all kinds of software management problems, such as configuration, healing, optimization and protection. It has been reported in [78] that 20% to 50% of total resources are consumed in diagnosing problems in software systems, and that almost 50% of the total budget is spent on efforts to avoid software failures and to recover from them. These costs are alarmingly high, and autonomic computing claims to significantly reduce them by enabling automatic, adaptive and self-aware behavior in software systems.


There are many possible alternate approaches to enabling autonomic behavior in software systems. CBR is a good candidate technique to be adapted for autonomic systems for various reasons. It works well where well-structured knowledge is not available beforehand, because it offers a flexible case structure. Its proposed solution is not rigid and can be revised using adaptation algorithms or human intervention; the revised solution is then adopted as the solution of the current problem. The current problem-solution pair may be retained in the case-base to add new knowledge for future use. Past experience is maintained in a repository called a case-base. When a new problem is encountered, its description is compared with the cases in the case-base, and the similarity between the current case and all the cases in the case-base is calculated using certain similarity measures.

Due to its added problem-solving capability, CBR promises many potential benefits compared to other conventional methods used for self-management in autonomic systems. These include:

1. Flexibility in case representation due to its non-structural approach, as opposed to AI planning, rule-based systems and control theory. A case can be represented as a simple vector of parameters representing the problem and the solution, or as a graph whose nodes represent the parameters of the domain and whose edges represent the dependencies between parameters.

2. Simple and natural self-managing behavior due to its nature-inspired solution strategy.

3. Comparatively lower input pre-processing overhead.

4. The capability of exploiting past and the most recent experiences to suggest a solution to the current problem.


5. A flexible revision mechanism for updating and upgrading a suggested solution.

6. Minimal off-line learning requirements compared to most probabilistic and statistical learning techniques. CBR does not demand calibration of the decision-making model in advance; its model keeps evolving with the addition of new experience.
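The simple vector representation mentioned in point 1 might be sketched as follows; the field names and parameter choices are invented purely for illustration:

```python
# Hypothetical case structure using the flat vector representation:
# a problem vector of monitored parameters paired with a solution
# vector of configuration actions. All field names are made up.
from dataclasses import dataclass

@dataclass
class Case:
    problem: tuple   # e.g. (cpu_load, memory_use, response_time_ms)
    solution: tuple  # e.g. (thread_pool_size, cache_size_mb)

case_base = [
    Case(problem=(0.9, 0.7, 250.0), solution=(32, 512)),
    Case(problem=(0.2, 0.3, 40.0), solution=(8, 128)),
]
```

A graph-based representation would instead store parameters as nodes and their dependencies as edges, at the cost of more complex similarity computation.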

1.2 Problem Statement

Autonomic systems must be able to exhibit at least three fundamental characteristics: they must be automatic, adaptive and aware. The problem diagnosis and solution planning phases should be carried out without human intervention. Autonomic systems must be capable of controlling internal operations, such as updating configurations and taking system-level actions with administrative rights, and they must remain aware of the external context on a continuous basis. To enable the first characteristic, achieving self-managing behavior of autonomic managers through continuous learning from experience is one of the research problems tackled in this thesis. Employing an appropriate problem diagnosis and solution planning approach in an autonomic manager is an open challenge; this thesis advocates CBR for this purpose.

The application of CBR to autonomic managers leads to the inherent open issues of case-base maintenance, so the second research problem of this thesis is CBR-focused. If case-base maintenance is overlooked, it causes retrieval efficiency bottlenecks. CBR is an attractive learning approach which incorporates recent knowledge through its retention strategy; consequently, the growing size of the case-base leads to increased computational complexity of retrieval from the case-base. This problem has been handled using a clustered CBR approach.


A brute-force learning approach has been employed for deciding the optimal number of clusters into which to partition the case-base. Choosing the optimal number of clusters is again a non-trivial problem, which this thesis handles using a randomized approach. This research problem is generic in nature and is also a challenge in other problem domains where CBR can potentially be applied.

1.3 Solution Methodology

We have investigated the efficiency and effectiveness of our proposed CBR-based approaches empirically and theoretically. The empirical study is based on a multi-dimensional research methodology; the dimensions include the similarity measure for retrieval of nearest neighbors, the cardinality of the set of nearest neighbors, the solution algorithm, the adaptation algorithm, two different case studies pertaining to externalization and internalization respectively, and three clustering schemes. The best model for the self-management algorithms is selected on the basis of the combination of these parameters that yields the highest accuracy. The comparison of the different clustering schemes in the proposed architecture is based on ARP analysis, which involves accuracy, recall and precision comparisons. The theoretical study involves analysis of the proposed algorithms in terms of their computational complexity and expected benefit. An outline of the research problems and corresponding solutions presented in the thesis is visualized in Figure 1.1.

1.4 Research Contributions

The following are the main research contributions of this work:


[Figure 1.1 maps the research problems (problem diagnosis and solution planning in autonomic managers; the efficiency bottleneck caused by inherent issues of CBR; efficiency bottlenecks in extreme cases of clustered CBR) to the corresponding proposed solutions (enabling self-learning in autonomic managers using CBR; clustered CBR; randomized clustered CBR with a Euclidean norm based classifier), annotated with the publication venues ICIC 2007, ICAS 2008, Simulation Modelling Practice and Theory (Elsevier, 2011), ICAC 2008, IEICE Transactions on Information and Systems (2010), and The Computer Journal (Oxford University Press, 2012).]

Figure 1.1: Outline of Research Problems and Proposed Solutions

1.4.1 CBR Based Analysis and Planning Algorithms for Autonomic Managers

In software systems, similar problems re-occur, and similar problems have similar solutions. We propose applying case-based reasoning to enable autonomic behavior in software systems by exploiting the past experience of the software systems themselves. In this research work, we describe the proposed algorithms and discuss the implementation results on two implementation architectures of autonomic systems, externalization and internalization, using two different applications: RUBiS and a simulated Autonomic Forest Fire Application [45], [46], [49].

1.4.2 Efficiency Improvement through Clustered CBR Approach

The case-based reasoning approach exploits past experience that can be helpful in achieving autonomic capabilities. The learning process improves as more experience is added to the case-base in the form of cases. This results in a larger case-base, which reduces efficiency in terms of computational cost. To overcome this efficiency problem, this research work suggests clustering the case-base and subsequently finding the solution of a reported problem within the relevant cluster. This approach reduces the search complexity by confining a new case to a relevant cluster in the case-base. Clustering the case-base is a one-time process and does not need to be repeated regularly, because the clustering scheme finds the optimal number of partitions through brute-force search. The proposed approach has been outlined in the form of a new clustered CBR framework, which has been evaluated on a simulation of the Autonomic Forest Fire Application (AFFA). This work presents an outline of the simulated AFFA and results of three different clustering algorithms for clustering the case-base in the proposed framework. A comparison of the performance of the conventional CBR approach and the clustered CBR approach has been presented in terms of their Accuracy, Recall and Precision (ARP) and computational efficiency [51], [47].

1.4.3 Improving Efficiency of Clustered CBR Approach through Randomized Algorithms and Euclidean Norm Based Efficient Classifier

The case-base is partitioned into clusters in order to improve retrieval efficiency. Deciding an appropriate number of clusters for a case-base is not a trivial problem. This work proposes a randomized approach which clusters the case-base into a random number of clusters. Two versions of the randomized approach have been presented. One of them (Las Vegas) guarantees success, but its computational cost is a function of a random variable representing the number of iterations needed to achieve that success. The other (Monte Carlo) guarantees a deterministic computational cost, but success is not guaranteed. In order to bound the retrieval time, a binary-search-based retrieval strategy has also been proposed. The proposed approaches have been implemented on a simulation study of the Autonomic Forest Fire Application, and their theoretical and empirical results have been compared [48].


1.5 Outline of Thesis

The thesis is organized as follows: Chapter 2 presents the background of autonomic computing and CBR. Chapter 3 presents the literature review, which includes related work on enabling self-managing behavior in different kinds of systems, various applications of CBR in related and other domains, and applications of clustering to support CBR; it ends with a critical summary of the literature. Chapter 4 outlines the proposed CBR-based framework for autonomic systems. Chapter 5 presents the clustered CBR approach for autonomic computing and its theoretical comparison with the conventional CBR approach, and discusses the randomized algorithm to improve the performance of the clustered CBR approach. Chapter 6 presents and discusses the case studies and results of the proposed approaches. Finally, Chapter 7 concludes the contributions of the thesis and highlights prominent future directions.


Chapter 2

Background

2.1 Autonomic Computing

With the passage of time, our reliance on computers has increased tremendously. We have been eager to make everything automated, fast and accurate, but we have not taken care of the additional layer of complexity we add to existing systems. With increased functionality, software systems have grown in size as well as complexity. Today, we are able to solve many problems using these large, complex systems, but within them, complexity itself is a huge problem. Whenever something happens unexpectedly or accidentally, solving the problem in very little time becomes a miserable task. Such large systems are becoming hard to install, configure, optimize, maintain, protect and recover from failures. Sometimes we succeed, and sometimes we don't. In such huge and complex systems, diagnosing a problem and then solving it manually is not an easy job. Because of our significant dependence on such systems, we cannot afford delays in problem solving: we need immediate solutions to minimize the downtime, or time of abnormal behavior, of the systems.

Inspired by the human nervous system, in which most of the activities inside the human body are performed without conscious human control, autonomic computing [9], [28], [34], [43], [89] is a concept which recommends injecting self-managing capabilities into large systems so that human intervention in managing those systems can be minimized. Systems should be able to manage themselves, given high-level goals and objectives by human administrators. Although we are adding another layer of complexity, this layer ensures that many problem-solving responsibilities will be performed automatically and autonomously. There are several fundamental properties, collectively known as self-* properties, which introduce autonomic behavior within a system. To implement or enable any of these properties, an autonomic manager continuously monitors the managed element, analyzes the monitored data, plans intelligent actions and then executes those actions on the managed element. These properties include self-configuration, self-optimization, self-protection, self-healing and self-awareness [27], [39], [43], [65].

2.2 Self-Management Properties

Autonomic systems exhibit various self-management capabilities, each of which is

discussed below:

2.2.1 Self-Configuration

As in human body, when a new cell is generated or an existing cell dies, remaining

body adapts the change seamlessly. Similarly, in large systems, human defines poli-

cies at high level and system configures itself appropriately. If a new component is

to be added in the existing system, we need certain configurations to be done within

the existing system to enable seamless adaptation of the new component [39], [43].

Autonomic systems must be capable of dealing with varying environmental condi-

tions which may be unpredictable, and configure themselves appropriately. Any of

the system resources should be capable of self-configuring without disrupting the

15

Page 32: Achieving Self-Management Capabilities in Autonomic Systems …prr.hec.gov.pk/jspui/bitstream/123456789/617/1/1801S.pdf · 2018-07-17 · Achieving Self-Management Capabilities in

running services [28].

2.2.2 Self-Optimization

In human body, muscles become stronger during exercise. Brain modifies its cir-

cuitry by learning new things. Similarly the objective of self-managed systems is to

continuously monitor, experiment and tune various parameters to make best use of

them [27], [28], [43], [65]. Objective of self-optimization is to enable hardware and

software resources to maximize their utilization and minimize the computational

cost to provide best possible services without significant human intervention. IBM

has already introduced some technologies exhibiting self-optimization like dynamic

workload management and dynamic server clustering [28]. These existing capabili-

ties need to be extended in a generic way across heterogenous systems so that various

resources like databases, networks, web servers, storage devices etc could be used in

an optimized fashion.

2.2.3 Self-Protection

Human body has some built in mechanism to protect itself from various diseases.

Similarly, autonomic systems promise to have some self security mechanisms against

large malicious attacks [16], [43]. There may be some internal risks resulting from

failures in self-healing process in addition to outside malicious attacks. So autonomic

systems should be capable of handling different kinds of risks and attacks.

2.2.4 Self-Healing

If you get a cut on your body, it is healed autonomously. You don’t need to dic-

tate or control the process of healing that cut. Similarly, in autonomic systems, if

some problem occurs, then the process to detect, diagnose and repair the problem

is carried out automatically [6], [73]. These problems may be routine as well as

16

Page 33: Achieving Self-Management Capabilities in Autonomic Systems …prr.hec.gov.pk/jspui/bitstream/123456789/617/1/1801S.pdf · 2018-07-17 · Achieving Self-Management Capabilities in

extraordinary events that cause malfunctioned behavior of the system [28]. To diag-

nose such problems, a system should be capable to analyze the available information

properly using some probabilistic or artificial intelligence technique. After the diag-

nosis, it should be able to prepare a corrective plan. This planning phase also needs

some intelligent approach which should identify the current situation and propose

an appropriate solution on the basis of some pre-learnt model or experience. The

objective of self-healing is to minimize the downtime of autonomic systems.

2.2.5 Self-Awareness

A human senses its environment like heat, fire, cold etc, and switches its state ac-

cordingly so that next state should be relatively more desirable and safe. Similarly,

autonomic systems must be aware of their environmental conditions and their own

state. They should be able to predict their next possible state after analyzing the

environment [28], [34].

2.3 Components of an Autonomic System

Autonomic systems are composed of several autonomic elements. Each autonomic

element is usually dedicated for a particular task or set of tasks. Various autonomic

elements can interact with each other as defined in the high level goals or objectives

by human administrators. An autonomic element consists of five major components:

autonomic manager, managed element, sensors, actuators and knowledge. All of

these components are discussed below [43]:

2.3.1 Autonomic Manager

Autonomic manager is the core component of the autonomic element which dis-

tinguishes between autonomic and non-autonomic systems. As shown in figure 2.1,

17

Page 34: Achieving Self-Management Capabilities in Autonomic Systems …prr.hec.gov.pk/jspui/bitstream/123456789/617/1/1801S.pdf · 2018-07-17 · Achieving Self-Management Capabilities in

autonomic manager is responsible for monitoring the managed element, analyzing the

monitored data, planning actions against the analyzed results and finally executing

the plans onto the managed element.

The autonomic manager performs the following four steps of the autonomic computing cycle:

1. Monitor: The sensor senses the managed resource periodically for the lifetime of the system, and every sensed state is tested for stability. If the state is found stable, no further action is taken and the next sensed state is monitored. If it is found unstable, its data is handed over to the diagnose phase.

2. Diagnose: This phase receives the data of an unstable state and analyzes it to diagnose the cause of instability, which may be internal or external. This diagnostic step needs to be intelligent enough to correctly diagnose the problem, and it may be carried out using any pre-determined algorithm.

3. Plan: The diagnosed problem is reported to the plan phase, which searches for a suitable solution to the reported problem. This search may be conducted using a pre-determined algorithm. The solution found is prepared in the form of a solution plan.

4. Execute: The solution plan is handed over to the execute phase, which prepares an execution plan and hands it over to the actuator of the controller interface.
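These four steps can be sketched as a single control-loop function; `sense`, `is_stable`, `diagnose`, `plan` and `act` are placeholders for the sensor, analysis, planning and actuator components of a real manager:

```python
# Minimal sketch of one pass of the monitor-diagnose-plan-execute cycle.
# All callables are hypothetical stand-ins for real components.
def autonomic_cycle(sense, is_stable, diagnose, plan, act):
    state = sense()                  # 1. Monitor: read the managed element
    if is_stable(state):
        return None                  # stable: wait for the next sensed state
    cause = diagnose(state)          # 2. Diagnose: find the cause of instability
    solution_plan = plan(cause)      # 3. Plan: search for a suitable solution
    return act(solution_plan)        # 4. Execute: hand the plan to the actuator

# Toy usage: an overloaded server triggers a scale-up action.
result = autonomic_cycle(
    sense=lambda: {"load": 0.95},
    is_stable=lambda s: s["load"] < 0.8,
    diagnose=lambda s: "overload",
    plan=lambda c: {"action": "scale_up"} if c == "overload" else {},
    act=lambda p: p["action"],
)
```

In a real manager this function would run continuously, with the planner backed by the knowledge component described below.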

2.3.2 Managed Element

Managed element is essentially the non-autonomic system. All other components of

the autonomic element work collaboratively to make the whole system autonomic.

18

Page 35: Achieving Self-Management Capabilities in Autonomic Systems …prr.hec.gov.pk/jspui/bitstream/123456789/617/1/1801S.pdf · 2018-07-17 · Achieving Self-Management Capabilities in

This element may be a hardware resource being continuously monitored or may be

a software application.

2.3.3 Sensors

As far as architectural considerations are concerned, managed element and autonomic

manager are well-separated components. So an autonomic manager needs some

mechanism to sense and gather data from the managed element. Sensors are used

for this purpose. The nature of the sensor depends on the nature of the managed

element. It may be a software component or some hardware sensor.

2.3.4 Actuators

Actuator is an interface which enables execution of the planning actions on the

managed element. Like sensors, nature of an executor also depends on the nature of

the managed element. It may be as simple as a function which passes certain values

to the managed element, or may be as complex as a robot which has to explicitly

perform some physical task at the problematic point, like removing a physical hurdle.

2.3.5 Knowledge

Autonomic manager maintains a knowledge-base. It needs some mechanism to ana-

lyze the gathered data through monitoring. It may analyze the gathered data using

a statistical approach, pre-learnt model using any machine learning technique or ex-

perience. Knowledge-base is used to store all such information which is essential for

proper analysis and planning. Once new actions are planned and executed, their

results may be observed and the knowledge-base is continuously updated to make

the future decisions more accurate.


2.4 Architectures of Autonomic Systems

There are two major architectural options available in literature for building an

autonomic system: externalization and internalization [34], [43], [72].

2.4.1 Externalization

In externalization approach, autonomic manager resides outside the managed ele-

ment and is built as a well-separated implementation layer as shown in Figure 2.1.

Usually, this approach is adapted when an existing non-autonomic system has to be

made autonomic. This architecture is quite useful in applications where autonomic

behavior is enabled at some later stage of the application life cycle or internal im-

plementation details of the application are not that much clear. Legacy systems are

a good candidate for externalization option.

2.4.2 Internalization

In internalization approach, there is no clear separation between the autonomic man-

ager and the managed element as shown in Figure 2.2. This approach makes more

sense when we have to implement a system with built-in self-management capabil-

ities. For the internalization architecture, various programming frameworks have

been proposed in literature like Accord [62], Rudder [60], vGrid [52] etc. Accord

programming framework enables internalization using three kinds of ports: control

port, operational port and functional port. Control port is a shared interface between

element manager and the computational component. Operational port contains rules

to be used by the rule agent for self-management of computational component. Func-

tional port is used to expose the functionality to other autonomic components. To

implement or enable any of the self-* properties, an autonomic computing system has to perform many different steps which collectively constitute the autonomic computing cycle, as shown in Figure 2.1 [34], [43], [72].

[Figure 2.1 depicts an Autonomic Manager with Monitor, Analyze, Plan and Execute components sharing Knowledge, connected via a Sensor and an Actuator to the Managed Element (to be self-managed).]

Figure 2.1: Externalization Architecture of Autonomic Systems [43]

2.5 Case-Based Reasoning

Case-based reasoning (CBR) [3], [18], [58], [84], [91], [93] is a problem-solving methodology which utilizes past experience with similar kinds of problems to solve the current problem in an elegant fashion. Past experience is maintained in a repository in the form of problem-solution pairs; each pair is known as a case, and the whole repository is referred to as the case-base. When a new problem arises, it is compared with the existing problems in the case-base, and a similarity value is computed between the current case and every case in the case-base. Cases with a reasonable amount of similarity are taken into account. Various similarity metrics are used for this purpose, and their selection depends upon the level of accuracy needed and the domain of the problem. An important observation on which CBR is based is that similar problems have similar solutions, and many real-world domains have empirically supported this hypothesis [64].

Figure 2.2: Internalization Architecture of Autonomic Systems [62] (an autonomic component whose element manager interacts with it through control, operational and functional ports)

The classical model of the CBR problem-solving cycle consists of four major phases,

collectively known as the 4 R's of CBR, as shown in Figure 2.3. A fifth phase is

not part of the regular CBR cycle but may be executed periodically. Each phase

is discussed below:

2.5.1 Retrieve

The current case is compared with the existing cases in the case-base and the most

closely matching cases are retrieved. To accomplish this task, there are two

alternatives: k-Nearest Neighbors (k-NN) and decision trees.

The k-NN method uses various similarity metrics to retrieve the closest neighbors.

Some of the commonly used similarity metrics for this purpose include the city-block

distance, Euclidean distance, Mahalanobis distance [33], geometric similarity metrics

[32], probabilistic similarity measures [70] and many more. This gives a set of

nearest neighbors of the current case. To determine the significance of each

attribute, different feature weighting schemes exist in the literature

[95], [41], [100].

Many case retrieval methods exist to find the similarity between the current case

and a case in the case-base. Some well-known methods include the Manhattan distance,

Euclidean distance [33], Mahalanobis distance [2], geometric similarity measures [32]

and probabilistic similarity measures [70]. Some of the widely used similarity

measures are discussed below:

Manhattan Distance

The Manhattan distance is used to retrieve similar cases from the case-base. It takes

the weighted sum of absolute differences between the current case and any other case in

the case-base, where the weights are set by the user or analyst. It is given by [33], [54]:

d_ij = Σ_k w_k |x_ik − c_jk|   (2.1)

where d_ij represents the distance between the ith and jth cases with respect to all parameters, x_ik and c_jk are the kth attribute values of the two cases, and w_k is the weight of the kth attribute.

Euclidean Distance

Similarly, the Euclidean distance is applied as a similarity measure between the current

case and any previous case. Like the Manhattan distance, it computes a weighted

distance. It is given by [33], [54]:

d_ij = √( Σ_k w_k (x_ik − c_jk)^2 )   (2.2)

where d_ij represents the distance between the ith and jth cases with respect to all parameters.

Mahalanobis Distance

The Mahalanobis distance is often of greater interest because it takes the

correlation between parameters into account. It is given by [2], [54]:

d_ij = (x_i − c_j)^t S^{-1} (x_i − c_j)   (2.3)

where d_ij is the distance between the ith and jth cases, t denotes the transpose and S^{-1}

is the inverse of the variance-covariance matrix of the independent variables over the whole

case-base.
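As an illustration, the weighted Manhattan and Euclidean distances (Equations 2.1 and 2.2) can be implemented in a few lines of Python. This is a minimal sketch; the attribute values and weights below are hypothetical examples, with the weights assumed to be supplied by the analyst as noted above:

```python
import math

def manhattan(x, c, w):
    """Weighted Manhattan distance (Eq. 2.1) between cases x and c."""
    return sum(wk * abs(xk - ck) for wk, xk, ck in zip(w, x, c))

def euclidean(x, c, w):
    """Weighted Euclidean distance (Eq. 2.2) between cases x and c."""
    return math.sqrt(sum(wk * (xk - ck) ** 2 for wk, xk, ck in zip(w, x, c)))

x = [1.0, 2.0, 3.0]   # current case attributes (hypothetical)
c = [2.0, 0.0, 3.0]   # a stored case (hypothetical)
w = [1.0, 0.5, 2.0]   # analyst-supplied attribute weights (hypothetical)

print(manhattan(x, c, w))  # 1*1 + 0.5*2 + 2*0 = 2.0
print(euclidean(x, c, w))  # sqrt(1*1 + 0.5*4 + 2*0) = sqrt(3)
```

The Mahalanobis distance additionally requires inverting the variance-covariance matrix of the case-base attributes, for which a numerical library would normally be used.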


Geometric Similarity Measure

The geometric similarity metric benefits from various geometric and graph algorithms

by transforming the problem into the geometric domain, so that a case is represented

by a case graph. The algorithm in [32] suggests finding the Delaunay triangulation of

the case graph, computing edge costs and applying Dijkstra's algorithm. Metrics of

this kind are usually applied in route planning for aircraft or robots.

Probabilistic Similarity Measure

The problem domain is represented by m discrete attributes. All cases (C_1, ..., C_l)

are represented by binary random variables: C_k = 1 indicates that C_k is the case

being compared, and C_k = 0 that it is not. An attribute A_i may have n_i possible

values, represented as a_i1, ..., a_in_i. A case C_k is coded as a vector in the

following form:

C_k = ( P_k(a_11), ..., P_k(a_1n_1), P_k(a_21), ..., P_k(a_2n_2), ..., P_k(a_m1), ..., P_k(a_mn_m) )   (2.4)

(the components are grouped as P_k(A_1), P_k(A_2), ..., P_k(A_m))

where P_k(A_i) is the probability distribution over the values of attribute A_i when the

case C_k is under consideration, given as:

P_k(a_ij) = P(A_i = a_ij | C_k = 1)   (2.5)

The case-base is represented as a Bayesian belief network comprised of the variables

A_1, ..., A_m and the cases C_1, ..., C_l [70].

The input case is represented in the form of initial probabilities as given below:

c_0 = ( P_0(a_11), ..., P_0(a_1n_1), P_0(a_21), ..., P_0(a_2n_2), ..., P_0(a_m1), ..., P_0(a_mn_m) )   (2.6)

Bayesian case matching assigns each case C_k a score given by the conditional

probability P(C_k = 1 | C_0 = c_0). Each part of the output vector is

represented by the conditional probability P(A_i = a_ij | C_0 = c_0) [70].


Rule-Based Similarity Measures

Sebag et al. [79] have proposed two rule-based similarity measures. The problem domain

consists of rules, each comprising a hypothesis and a conclusion. Testing a rule means

checking whether it is applicable to a given example: the rule fires if its hypothesis is

satisfied, and otherwise does not. Since this firing decision is boolean, the problem

domain is mapped into a boolean domain. Counting the number of identical bits between

two examples in this boolean domain gives their similarity, so both the rules that are

applicable to both examples and the rules that are applicable to neither contribute to

the similarity between the two examples. If the problem domain is not represented in

rule-based form, rules first need to be learned from the given data set using a

machine learning technique.
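The bit-counting idea can be illustrated with a small Python sketch. The rules and examples below are hypothetical: each rule is a predicate that either fires on an example or does not, and similarity is the count of rules with identical firing behavior on both examples:

```python
# Hypothetical rules: each maps an example (a dict of attributes) to True/False.
rules = [
    lambda e: e["temp"] > 30,
    lambda e: e["humidity"] < 50,
    lambda e: e["wind"] > 10,
]

def to_bits(example):
    """Map an example into the boolean domain of rule firings."""
    return [rule(example) for rule in rules]

def similarity(e1, e2):
    """Number of rules with identical firing behavior on both examples."""
    return sum(b1 == b2 for b1, b2 in zip(to_bits(e1), to_bits(e2)))

a = {"temp": 35, "humidity": 40, "wind": 5}
b = {"temp": 32, "humidity": 60, "wind": 5}
print(similarity(a, b))  # rules 1 and 3 agree, rule 2 differs -> 2
```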

Other Similarity Measures

Many other similarity measures are available in the literature. Selection of a particular

measure depends on the type of data and the domain of the problem; the computational

complexity of the similarity algorithm also matters in the selection process.

Other similarity measures include fuzzy-rule-based similarity [98], utility-based

similarity [99], the Minkowski distance, Canberra distance, Bray-Curtis

distance, chord distance, Hellinger distance, Jaccard's coefficient, the simple

matching coefficient, etc.

2.5.2 Reuse

The solutions of the nearest neighbors are used to devise the solution of the current

case. Various solution algorithms can be applied to the solutions of the nearest

neighbors to compute the solution of the case at hand; these include the arithmetic

average, the weighted average, etc. [33].

Figure 2.3: The CBR Cycle [3] (a new problem/case is the input to Retrieve, which returns a set of nearest neighbors from the case-base; Reuse produces a planned solution, Revise a revised solution, and Retain stores the solved case back into the case-base)

Selection among these solution algorithms

depends on the priorities of the neighbors. With large case-bases, it is quite

common that the new problem and the retrieved cases are very similar, so the solutions

can be reused confidently. In some domains, however, the new problem is not closely

similar to the retrieved cases, and the solution then needs to be adapted to fit

the current scenario. In medical decision making, for example, the solution often

needs to be adapted [64].
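The reuse step can be sketched as follows: retrieve the k nearest neighbors and combine their solutions by an inverse-distance weighted average. This is a minimal illustration; the case-base, the choice of k and the unweighted Euclidean distance are hypothetical assumptions:

```python
import math

# Hypothetical case-base of (attribute vector, known solution value) pairs.
case_base = [
    ([1.0, 1.0], 10.0),
    ([2.0, 2.0], 20.0),
    ([8.0, 8.0], 80.0),
]

def euclidean(x, c):
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

def reuse(new_case, k=2, eps=1e-9):
    """Inverse-distance weighted average of the k nearest neighbors' solutions."""
    neighbors = sorted(case_base, key=lambda pair: euclidean(new_case, pair[0]))[:k]
    weights = [1.0 / (euclidean(new_case, attrs) + eps) for attrs, _ in neighbors]
    return sum(w * sol for w, (_, sol) in zip(weights, neighbors)) / sum(weights)

print(reuse([1.5, 1.5]))  # equidistant from the first two cases -> 15.0
```

Replacing the weighted average with a plain arithmetic average of the neighbors' solutions gives the simpler alternative mentioned above.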

Selecting the size of the nearest neighborhood that should contribute to the

computation of the solution of the new case is an open problem in conventional

CBR systems. The radius of the nearest neighborhood is typically chosen manually and

intuitively, so it can range from a single case to the entire case-base, and agreeing

upon an appropriate number of nearest neighbors is non-trivial. In Figure 2.4, the

increasing circles around the new case (represented as a ⋆) represent various choices

of possible nearest neighborhoods; the neighborhood can be as large as the entire case-base.

Figure 2.4: Possible Variation in Cardinality Selection of the Nearest Neighborhood of a New Case (concentric circles of increasing radius around the new case)

2.5.3 Revise

For better adaptation, the solution of the current problem may need to be revised.

However, to keep CBR sensible, the adaptation phase should be kept minimal:

substantial case adaptation or revision may undermine the knowledge engineering

advantages obtained from CBR [18]. In constructive problem-solving tasks like

configuration, planning or design, however, adaptation is usually an essential part of

the CBR cycle because differences between the new problem and already solved problems

are quite likely. There are three common ways of carrying out the adaptation task:

substitution adaptation replaces a part of the solution, transformation adaptation

changes the solution structure, and generative adaptation replays the solution-finding

process [64]. Adaptation can also take place after feedback is received once the

initial solution has been applied.


2.5.4 Retain

Once the solution of the current case has been devised, the new problem-solution pair

is retained in the case-base. A strength of the CBR approach is that a recently solved

problem is immediately available in the case-base for future use [18], [93].

Deciding when to retain a case and when not to is not a trivial task.

Usually, the new problem-solution pair is retained under the assumption that the

proposed solution was successful. One possible way to track the effectiveness of a

retained solution is to add an attribute to the case-base that records how well

the solution met the desired goals. This retention measure then contributes to

devising solutions for new problems in the future. This approach is useful when

decision making about retention is complex [3], [64].
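A retain step with the retention-measure attribute described above might look like the following sketch. The case representation and the goal-attainment score are illustrative assumptions, not a prescribed format:

```python
case_base = []

def retain(problem, solution, goal_attainment):
    """Store a solved case together with a measure of how well the
    solution met the desired goals (0.0 = failed, 1.0 = fully met)."""
    case_base.append({
        "problem": problem,
        "solution": solution,
        "goal_attainment": goal_attainment,  # retention measure used in future reuse
    })

retain([1.0, 2.0], "increase-buffer", 0.9)
retain([4.0, 1.0], "restart-service", 0.2)  # kept, but flagged as a weak solution
print(len(case_base))  # 2
```

During future reuse, the goal-attainment score could, for example, be folded into the neighbor weights so that weak solutions contribute less.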

2.5.5 Repair

Case-base growth is a continuous process due to the retain phase of the CBR cycle, and

as the case-base grows it causes computational bottlenecks. The case-base therefore needs

to be repaired periodically so that the efficiency of the CBR cycle remains unaffected.

The repair phase is not a regular phase of the CBR cycle; it can be executed using

different schemes given in the literature [47]. This process is conducted to obtain a

competent and compact case-base. Various case-base maintenance strategies exist

in the literature [81], [82], [74], [102], [59], [80], [71], [29] to repair the case-base.

For example, this thesis proposes a clustered approach to repair the efficiency

bottlenecks of the case-base.
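As a simple illustration of repair, one generic redundancy-removal scheme (an illustrative assumption here, not the clustered approach proposed by this thesis) periodically drops cases that add no new competence to the case-base:

```python
import math

def euclidean(x, c):
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

def repair(case_base, radius=1.0):
    """Drop a case if another retained case lies within `radius` and
    carries the same solution, i.e. the case adds no new competence."""
    kept = []
    for attrs, sol in case_base:
        redundant = any(
            euclidean(attrs, a2) <= radius and sol == s2 for a2, s2 in kept
        )
        if not redundant:
            kept.append((attrs, sol))
    return kept

cases = [([0.0, 0.0], "A"), ([0.5, 0.0], "A"), ([5.0, 5.0], "B")]
print(repair(cases))  # the second case is redundant and is dropped
```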

2.6 Applications of CBR

CBR has been applied in different contexts in the existing literature. Some

well-known applications of CBR are presented below:


2.6.1 Document Retrieval using CBR

CBR draws its inspiration from the fact that humans use their experience to solve

problems, and such experience is usually stored in natural language. Deriving

meaningful information from text documents is therefore a natural process, and the

question arises of how this knowledge can be used effectively by CBR. Existing documents

are treated as cases, and on arrival of a new query, relevant documents are retrieved.

An interesting application of textual CBR is an intelligent FAQ system: for a user

query, the most closely relevant documents are retrieved and, by applying various layers

of knowledge extraction, an answer is found. The nature and depth of the knowledge

layers vary from domain to domain [10].

2.6.2 CBR for Product Selection Tasks in E-Commerce

Due to the enormous growth of e-commerce applications over the web, it has become

really difficult for customers to make effective use of the web and take feasible

decisions. Customers may interact with these applications before, during and after a

purchase. At every stage, they may not know the exact specification, name or brand of

the product they are looking for, and they may be stuck deciding among many offers.

CBR can facilitate easy and feasible decisions: customers state their choices,

priorities, etc., and CBR retrieves the relevant feasible cases or products

[10], [91], [92].

2.6.3 CBR for Help-Desk Applications

People serving at the help-desks of different manufacturers often do not know

the exact engineering or maintenance details of the equipment. When a problem is

reported by a user, an immediate solution is needed rather than digging out the causes

and analyzing them. So, a question/answer dialogue is conducted to learn about the

problem; the collection of answers forms the case, and a solution is suggested using

the CBR cycle. The solution may be textual advice, a drawing, a video demonstration,

etc. CBR thus helps reduce the need for highly skilled and experienced people in

help-desk services [10], [91].

2.6.4 CBR for Medical Diagnosis

It has been recognized in the literature [10], [84] that medical diagnosis is a reasoning

process which makes extensive use of experience. Objective and subjective knowledge are

used together, and this experience can be maintained in the forms usually used in

hospitals. Besides these observations, there are some obstacles to CBR: multiple

disorders need special attention, and doctors cannot dedicate much time to interacting

with the system before suggesting a treatment. To avoid such situations, the CBR system

should be integrated with the existing patient information system. This encourages

on-demand use of CBR while the case-base is continuously updated, also benefiting

from manual experience.

2.6.5 CBR for Planning and Design

There are three categories of design tasks: creative, innovative and routine design.

CBR cannot be applied to creative design, which creates new artifacts each time and

in which experience is not much involved. Innovative design is an open planning problem

in which there is no predictable exact solution, and the suggested solution needs to be

properly adapted. Routine design is a closed planning problem in which an exact solution

can be found. CBR supports both open and closed planning problems [10], [84].


2.6.6 CBR for Configuration

Configuration problems are closed planning problems: all components to be configured

are known, along with their rules, so any knowledge engineering approach can help in

this context. CBR can exploit previous solutions and may suggest a new solution.

This new solution may not be exactly the desired one, but it will at least be

considerably cheaper than constructing a solution from scratch; if it is not exact, an

appropriate adaptation algorithm can be used to make it applicable [10].

2.6.7 CBR for Software Quality Prediction

T. M. Khoshgoftaar et al. [53], [54] have proposed a methodology to predict software

quality using CBR. In their approach, the case-base contains software module

descriptions and numbers of faults, where a module description is stored as the set of

software metrics representing the module. The current module, whose quality is to be

predicted, is matched with all case modules, and its number of faults is predicted based

on the number of faults in closely matched cases and the degree of matching. For quality

prediction, three major components are used: the similarity function, the number of

nearest-neighbor cases used for fault prediction, and the solution algorithm. Software

metrics are used as the independent variables and the number of faults is treated as the

dependent variable. The target case is compared with each case in the case library using

a similarity function. According to the investigation of Khoshgoftaar et al. [54], the

Mahalanobis distance performs better than other similarity measures, and the

inverse-distance weighted average performs better than the unweighted average

approach [50].


2.6.8 CBR for Software Reuse

Software reuse is an important software engineering concept which promotes reusing

software components for similar problems. In the literature [23], [24], [30], [86],

software engineering researchers have suggested CBR to promote software reuse practice.

Each piece of software can be specified uniquely using some specification language, and

code is written against these specifications; a specification and its code fragment

collectively form a case. When a new request to build software comes in, its

specification is formalized and a case is prepared, and CBR is applied to generate

relevant program code. Since this automatically generated code may not be the exact

solution, adaptation is needed most of the time, and computer programmers can play

their role in the adaptation.

2.7 Summary

This chapter has presented the background of the two main pillars of this research:

autonomic computing and case-based reasoning. Autonomic computing enables

self-managing behavior in conventional computing systems. Case-based reasoning is a

machine learning methodology which enables the learner to learn from its own

experience.


Chapter 3

Literature Review

3.1 Literature Survey of Autonomic Computing

In this chapter, various architectures, models and techniques applied under the

externalization and internalization approaches are discussed with respect to their

relevant self-management properties.

3.1.1 Self-Management Approaches for Externalization

In order to autonomize an existing application or system, autonomic managers are

built as a separate layer and interact with the managed resource through sensors

and actuators. An overview of various techniques and infrastructures proposed in the

literature for enabling autonomic behavior in existing systems is presented below.

Self-Healing Approaches

In [75], Pip, an architecture for detecting unexpected behavior in a distributed

system, has been proposed; it compares the expected behavior with the current behavior,

where expected behavior can be declared by users through a declarative interface.

Rish et al. [76] have used a statistical approach for problem determination in

distributed systems using active probing. A probe is an end-to-end test transaction

whose output depends on the system's components; appropriate probes are selected

to diagnose the problem and their results are analyzed. In [14], a decision tree

learning approach has been presented to diagnose failures in large distributed systems. At

the nodes of the tree, various parameters are ranked according to their information

gain values. In [6], [68], [69], CBR has been suggested to diagnose failures in a

service delivery system and to plan remedial actions. Each problem scenario and its

corresponding remedial action are represented as a case, and a collection of such

scenarios constitutes a case-base; the simple CBR cycle is followed to diagnose and

resolve a new problem. In [35], environment knowledge is represented in the form of a

case-base and a rule-base: conversational CBR is used to detect failures and diagnose

problems, while the rule-based system resolves the detected problem. Conversational

CBR keeps the user in a continuous loop, asking questions for case generation, and

most of the time it needs the user's feedback to diagnose the problem.

Self-Configuration Approaches

For existing legacy systems, service-oriented architectures (SOAs) have been proposed

for enabling autonomic behavior [12]; autonomic web services are developed and deployed

using SOAs. PeerPressure is a statistical solution suggested for diagnosing problems

resulting from misconfigurations [88]. A large number of snapshots of machine

configurations are stored in a database called the GeneBank database, and PeerPressure

uses a statistical analyzer that calculates the probability of each suspected component

being sick using Bayesian estimation. In [5], an adaptive action selection technique

has been proposed for autonomic software systems; it selects an appropriate action

from a finite pool of actions using reinforcement learning.

Self-Optimization Approaches

A decentralized approach to enabling autonomic computing has been presented in an

agent-based architecture known as Unity [87]. This architecture supports optimal

autonomic resource allocation by computing many local resource-level utility functions

and then aggregating them into a global utility function. Similarly, policy-based

access control frameworks have been proposed to devise a global solution after analyzing

the local states [57]. Control theory has also been exploited in the literature for

monitoring and fine-tuning various attributes in an autonomic environment [4], [20],

[27], [42], [85]; the controller with its feedback loop acts as an autonomic manager.

3.1.2 Self-Management Approaches for Internalization

Various approaches for built-in mechanisms of autonomic computing have also been

proposed in the literature and are presented below.

Self-Healing Approaches

In [66], an autonomic failure detection algorithm has been proposed which suggests

a self-regulating mechanism for failure detection by setting bounds on resource usage

and failure detection latency. In [13], a technique for cheap recovery known as

Microreboot has been presented; it starts the recovery process by rebooting components

at a granular level and incrementally grows the reboot scope if recovery is not

achieved. Rx, another failure recovery technique, is presented in [73]; it treats bugs

as allergies, rolling the program back to a previous safe checkpoint and re-executing

it in a modified environment.

Self-Configuration Approaches

Dynamic reconfiguration has been suggested in [40] as a building block for workload

managers in IBM pSeries servers, providing self-optimization, self-healing and

self-configuration capabilities. To achieve dynamic reconfiguration in distributed

systems, a component framework has been presented in [15] to manage the interactions

between components during dynamic reconfiguration. Hot swapping [7] is another

built-in approach to autonomic computing which enables dynamic adaptation of the

execution environment by interpositioning or replacing code: interpositioning inserts

a new component between two existing components, and replacement enables a passive

component to take over from an executing component. In [44], an artificial intelligence

approach has been proposed for autonomic computing policies, built on the state-search

model of artificial intelligence. In [8], dynamic reconfiguration planning has been

suggested: the planner is given descriptions of the initial state, the final state and

the domain of possible transitions, and the goal state is then reached using search

mechanisms such as breadth-first search, depth-first search and best-first search.

An agent-based architecture has been proposed in [11] to build autonomic systems, in

which various functionalities of an autonomic system, such as the controller, sensor

and actuator, are delegated to individual agents which collaborate in a multi-agent

environment. In [63], a framework for the development of autonomic web services has

been proposed which enables self-configuring capability in web services and is based

on the adoption of a simulation engine. The literature survey reveals that CBR has not

been used in the internalization architecture of autonomic systems so far.

Self-Optimization Approaches

In [65], an autonomic query optimizer known as LEO has been presented which validates

the mathematical model of query execution without human intervention and is capable of

correcting wrong optimization estimates. In [61], [62], Accord, a component-based

framework for building autonomic applications, has been proposed. It extends

capabilities of existing conventional programming frameworks and introduces the

formulation of autonomic components, their composition and interaction.


3.2 CBR for Autonomic Computing

In [6], [68], [69], Anglano et al. have suggested for the first time applying CBR in

autonomic computing. Inspired by the role of CBR in various diagnosis systems, they

suggested applying it to self-healing in autonomic systems. They adopted externalization

and proposed embedding the CBR cycle in the autonomic manager: a CBR case preparation

module prepares a case for the newly reported problem and hands it over to the CBR

analyzer, which finds similar cases in the case-base and suggests a solution to the

autonomic manager, which applies it to the managed element through an actuator. To

enable autonomous behavior, a sufficient number of cases must be available in the

case-base, but when the system is started for the first time, it has no experience

knowledge. So a bootstrap phase is introduced in which the system is repaired manually

and all manually solved problems are stored in the case-base; once a sufficient number

of cases is available, the system is shifted to the proposed paradigm. Though the work

done in [6], [68], [69] has set the foundations for applying CBR in autonomic systems,

it does not exploit the complete CBR cycle. It also provides no justification for

choosing an appropriate set of similarity measures, solution algorithms, revision

algorithms or case-base maintenance strategies, and the impact of the choice of

nearest-neighborhood cardinality has not been investigated.

The literature survey reveals that most of the techniques do not support continuous

learning in a changing execution environment. CBR, however, is a lazy learning model

which continuously improves the learner's learning curve through its retain phase.

The tradeoff observed with this option is the continuously increasing size of the

case-base due to the retain phase, which incurs high computational costs once the

case-base has grown significantly. To overcome this efficiency bottleneck, this

thesis proposes a clustered CBR approach for self-configurable autonomic systems.

3.3 Applications of Clustering in CBR

In the literature, clustering has been applied with CBR in other problem domains for

case indexing purposes. Kim and Han [55] proposed an indexing method using competitive

ANNs such as self-organizing maps (SOM) and learning vector quantization (LVQ). Both of

their methods find the centroid values of the clusters representing all cases. The

number of clusters is determined manually by initializing the weight vectors, which

converge to the centroid values of the clusters. Once the centroid values have been

calculated, a decision tree scheme decides the membership of each case in a particular

cluster based on its distance from each centroid. The solution of a new case is

computed by indexing the case into one of the clusters and confining the solution

space to that cluster. Though this approach improves retrieval efficiency, it is

static in nature: the number of clusters is decided in advance and remains constant,

and there is no guarantee that the user-defined number of clusters yields an accurate

solution. As the case-base grows through retention, the efficiency bottleneck arises

again and this scheme may fail to provide a remedy. Similarly, Chiu and Tsai [17]

proposed a weighted-feature C-means clustering algorithm to define k clusters with the

objective of minimizing the distances between all objects and their corresponding

centers; it is explicitly stated in [17] that the user has to decide the value of k.

The purpose of the clustering procedure is to determine the feature weights: the

features which play a vital role in the clustering process are assigned higher weights

in the solution-finding process. The process of determining the cluster for a new case

is similar to that of [55]. This approach also exhibits an efficiency bottleneck as the

case-base grows, and no criterion has been proposed to determine the number of clusters.

Moreover, the retrieval strategy depends on a dissimilarity threshold which may expand

the solution space to many clusters; this flexibility goes against the study's stated

objective of enhancing retrieval efficiency. Hong and Liou [38] also used clustering

with case-based reasoning, but their objective was to determine the significance of

features through clustering so that the overall feature space is reduced, which may

improve retrieval efficiency.

3.4 Limitations of the Existing Self-Management Approaches

Autonomic computing is a nature-inspired phenomenon. In nature, the environment is dynamic and humans observe diverse scenarios with the passage of time, so there is a demand for continuous learning. As human administrators are replaced by autonomic managers, the managers' learning curve should not be static: autonomic systems may encounter new experiences as time passes. Most of the self-management techniques discussed above enforce learning on the current dataset and leave no room for dynamic learning over time. Such techniques will therefore not be able to handle unseen scenarios, whereas human administrators learn dynamically, are capable of handling such scenarios, and also learn from their mistakes, so their learning curve continuously improves. Another limitation of most of these artificial intelligence and probabilistic approaches is that they need well-formalized and structured knowledge for training purposes, which is rarely available in the case of autonomic systems.

CBR is an effective candidate technique for enabling self-management capabilities in autonomic systems, where rich and well-formalized background knowledge is not easily available. As discussed earlier, CBR has been applied to self-healing and failure diagnosis in the literature. The self-healing approach suggested in [6],


[68], [69] is based on strong assumptions that limit its applicability. The first assumption is that applying the solution proposed by the CBR cycle will not cause another problem in the managed element; this strict assumption does not hold in real-world scenarios. The second assumption is that a solution algorithm can take an indefinite amount of time, which is also impractical. A further restriction is that the approach applies only to the self-healing capability and cannot be generalized to self-management of autonomic systems at large; it is also strictly applicable to the externalization architecture only. In [35], CBR has been applied to failure diagnosis. This approach has two major limitations. First, like the approaches proposed in [6], [68], [69], it is specific to failure detection and cannot be generalized to detect optimization, protection or configuration problems in autonomic systems. Second, it uses CBR only to diagnose the problem and does not exploit it for remedial actions; instead, a static rule-base is used for failure recovery, which limits the continuous learning of the autonomic manager.

3.5 Summary

This chapter has presented a literature review of the two architectural options for autonomic systems: externalization and internalization. For both options, various existing approaches to enabling autonomic behavior have been discussed, one of which is CBR. Existing ways of exploiting CBR in autonomic managers have been presented and their shortcomings highlighted. These limitations lead to the proposal of an improved, generic CBR-based approach for autonomic managers.


Chapter 4

Achieving Self-Management Capability using Case-Based Reasoning

The recurring nature of problems in the software systems domain makes CBR an attractive option for detecting and resolving them. Flexibility in case representation, due to its non-structural approach as opposed to other techniques, makes CBR easy to apply in the problem diagnosis and analysis phases of autonomic managers. The simple and self-managing behavior of the CBR cycle promises a nature-inspired solution strategy for the planning phase of autonomic managers. CBR incurs comparatively low input pre-processing overhead owing to its flexible problem representation format. It also exhibits the capability of exploiting past and the most recent experiences to suggest a solution to the current problem. A flexible solution revision mechanism for updating a suggested solution is one of the unique characteristics of CBR compared to other self-management algorithms in the literature. This chapter presents the proposed CBR-based self-management algorithms.


[Figure: block diagram of the CBR-based autonomic manager for externalization. The Monitor, Analyze, Plan and Execute phases are connected to the managed resource through a dedicated sensor and actuator; a new monitored problem is prepared as a case, nearest neighbors are retrieved from the case-base, a solution algorithm aggregates them, and decisions to reuse/refine and to retain are taken before applying the solution.]

Figure 4.1: Proposed Architecture for Externalization using CBR

4.1 Self-Management in Autonomic Systems: A CBR-Based Generic Approach

In this work, a CBR-based approach is proposed to achieve self-management capabilities in autonomic systems using the externalization and internalization architectures. The proposed approach addresses most of the assumptions and limitations presented in Section 3.4 and promises a generic solution that can be applied to achieve any of the self-management capabilities in autonomic systems. This section presents the CBR-based autonomic approach in the context of the two candidate architectures, externalization and internalization, along with their scope, the proposed CBR-based externalization and internalization algorithms, and the CBR-based autonomic solution-finding framework CaseBasedAutonome (CBA).


4.1.1 CBR-Based Externalization

In our proposed CBR-based externalization architecture, shown in Figure 4.1, we present a CBR-based autonomic manager. In this approach, the Monitor phase of the autonomic manager detects a new problem using a dedicated sensor and prepares a case as input to the CBR cycle. The Analyze phase retrieves the nearest neighbors from the case-base, which in our approach resides inside the knowledge repository of the autonomic manager. The solutions of the nearest neighbors are aggregated using a solution algorithm in the Plan phase. Within the same phase, the decisions to reuse or refine the solution and to retain the current experience in the form of a case are also taken; external adaptation and retention through limited human intervention is allowed for revising and retaining the current solution in specific scenarios. The Execute phase finally applies the solution to the managed resource through a dedicated actuator. To implement the CBR-based externalization architecture, we propose Algorithm 1, given in Figure 4.2.

In this algorithm, the autonomic manager acts as a monitoring entity residing outside the actual system R. Sensor S is used to execute a diagnostic test on R periodically, every T time units. The current state statec of R is captured through S by the autonomic manager, then compared to the known target state statet, and the deviation dev of statec from statet is computed. If dev exceeds the deviation tolerance limit ε, statec is transformed into a case Cp, a vector of significant parameters representing the current problem, and handed over to the CBR-based autonomic solution finder CaseBasedAutonome (CBA), which finds a solution solp. This solution is executed on R by the autonomic manager using the actuator E. A cushion for the rollback option is also incorporated in case of any worse


Input: Monitored resource R, sensor S, executor E, time delay T, tolerance limit ε
Output: Autonomic behavior in R
Method:
1. Repeat steps 2 to 5 after every T time units
2.   statec = readInterface(S, R)
3.   dev = compareTo(statec, statet)
4.   If (dev > ε)
     a. Cp = prepareCase(statec)
     b. solp = CBA(Cp)
     c. executeSolver(solp, E, R)
     d. staten = readInterface(S, R)
     e. If (needUndo(staten))
        i. rollBack(statec)
     f. End If
5. End If
6. End Repeat

Figure 4.2: Algorithm 1 - Self-Management Algorithm for Externalization using CBR
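Algorithm 1 can be sketched as a plain Python loop. All of the callables below (read_interface, compare_to, prepare_case, execute_solver, need_undo, roll_back) are hypothetical stand-ins for the interfaces the algorithm leaves abstract; this is a sketch of the control flow, not the thesis implementation.

```python
import time

def self_manage_external(resource, sensor, executor, target_state,
                         period, tolerance, cba, helpers, max_cycles=None):
    """Sketch of Algorithm 1: poll the managed resource through a sensor
    and run the CBR cycle whenever the observed state deviates from the
    target beyond the tolerance limit. max_cycles bounds the loop for
    testing; the thesis algorithm repeats indefinitely."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        state_c = helpers.read_interface(sensor, resource)     # step 2
        dev = helpers.compare_to(state_c, target_state)        # step 3
        if dev > tolerance:                                    # step 4
            case_p = helpers.prepare_case(state_c)             # 4a
            sol_p = cba(case_p)                                # 4b
            helpers.execute_solver(sol_p, executor, resource)  # 4c
            state_n = helpers.read_interface(sensor, resource) # 4d
            if helpers.need_undo(state_n):                     # 4e
                helpers.roll_back(state_c)                     # 4e-i
        if period:
            time.sleep(period)                                 # step 1 delay
        cycles += 1
```

In use, `cba` would be the CaseBasedAutonome routine of Algorithm 3, and `helpers` wraps the sensor/actuator interfaces of the managed resource.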

undesired new state. The human administrator may decide whether to retain the current state or roll back to the previous state statec; in that case, the undo option may be availed. So, in our proposed architecture, the human administrator plays a role at the abstract decision-making level. Figures 4.6 and 4.7 give Algorithms 4 and 5, respectively, to implement the rollback strategy and the undo-need check.

4.1.2 CBR-Based Internalization

In the internalization architecture, the autonomic manager is implemented as a built-in component of the managed element. We propose a CBR-based extension of the Accord [62] programming framework for autonomic applications. Accord uses a static rule-agent as part of the autonomic manager for analysis and planning purposes, which limits the dynamic learning capabilities of the autonomic manager. In this architecture, the autonomic manager and the managed element are not well-separated layers in the implementation, because internalized autonomic applications inherently exhibit autonomic behavior. We propose to integrate our CBR-based autonomic solution finder CaseBasedAutonome (CBA) with the control and operational ports of this programming framework, using the strategy shown in Figure 4.3.

In our proposed approach, the control and operational ports of the Accord [62] programming framework play a vital role. The control port is responsible for detecting the problem and presenting it to the CBR cycle as a case. The operational port contains the current knowledge in the form of a case-base. The Analyze and Plan functionalities of the autonomic manager are performed by the CBA using similar steps to those in Algorithm 1, but the implementation process, architecture and scope of the two approaches are different. As the CBA is part of the autonomic component and not well separated from the computational component, no dedicated interfaces are needed to act as sensors or actuators. The functional port exposes the core functionality of the autonomic component to other components and has no direct concern within the proposed architecture. For the CBR-based internalization architecture, we propose Algorithm 2, shown in Figure 4.4. Algorithm 2 is similar to Algorithm 1 except for the architectural difference and scope: the operational and control ports in Algorithm 2 play their role in the computational component as well as the autonomic manager, whereas the computational component and the autonomic manager in Algorithm 1 are two separate layers that can interact only through dedicated sensors and actuators. Algorithm 2 cannot be retrofitted to existing systems but is a good candidate architecture for building new autonomic applications.

As the CBA is a built-in component of this proposed architecture, it monitors the control port Φ periodically, every T time units, and the current state statec of the autonomic component AC is captured. The CBA plays a similar role as in Algorithm 1, except that it retrieves similar cases from the operational port.


[Figure: the autonomic component with its control, operational and functional ports. The control port feeds the monitored problem as a case into the CBR cycle (Monitor, Retrieve, Reuse, Revise, Retain); the operational port holds the case-base; the planned and revised solutions are applied back to the component; the functional port exposes the computational component.]

Figure 4.3: Proposed Architecture for Internalization using CBR

4.1.3 CaseBasedAutonome (CBA): CBR-Based Autonomic Solution Finder

CaseBasedAutonome (CBA) is a generic CBR-based framework used as a basic building block in Algorithms 1 and 2. This algorithm is the crux of the CBR-based self-management approach and is implemented within the autonomic manager for the Analyze and Plan phases. The idea is to maintain a case-base CB containing n cases. Each case represents a configuration, failure, protection or optimization problem of the managed element, and is a vector of parameters extracted from the captured state of the monitored resource. The solution of each case is maintained in the form of a code snippet, which is an effective way to solve problems dynamically in autonomic computing [43].

Before executing the CBA, we have to select an appropriate similarity measure SM,


Input: Autonomic component AC, time delay T, tolerance limit ε
Output: Autonomic behavior in AC
Method:
1. Repeat steps 2 to 5 after every T time units
2.   statec = CBA_Monitor(Φ, AC)
3.   dev = compareTo(statec, statet)
4.   If (dev > ε)
     a. Cp = prepareCase(statec)
     b. solp = CBA(Cp)
     c. CBA_Solver(solp, Φ)
     d. staten = CBA_Monitor(Φ, AC)
     e. If (needUndo(staten))
        i. rollBack(statec)
     f. End If
5. End If
6. End Repeat

Figure 4.4: Algorithm 2 - Self-Management Algorithm for Internalization using CBR

a solution algorithm SA, and the cardinality Nn of the set of nearest neighbors NN that contribute to finding the final solution. The current case Cp is compared with every case Cj in CB using SM, and the similarity simj is computed, which may be treated as the weight wj of Cj in finding the solution solp of Cp. We may also keep the same weight for all compared cases; such decisions may vary from domain to domain. The set of nearest neighbors NN is extracted based on the computed similarities, and NN is aggregated using SA and w to find solp. A human administrator or an automatic adaptation mechanism can be used to revise the solution if needed, which may be the case when Cp is an entirely new experience whose close matches did not exist in CB. Again, a human administrator or an automatic mechanism may decide whether to retain the vector <Cp, solp> as a new experience in CB; this decision depends on the amount of new knowledge contributed by the experience. Implementation-level details are given in Algorithm 3, shown in Figure 4.5.


Input: A case-base CB containing n cases, case representing the new problem Cp, similarity measure SM, solution algorithm SA, cardinality of nearest neighborhood |NN| = Nn
Output: Planned solution solp of Cp
Method:
1. For each Cj ∈ CB
   a. simj = findSimilarity(Cp, Cj, SM)
   b. wj = simj
2. End For
3. NN = extractNeighbors(Nn)
4. solp = aggregate(NN, SA, w)
5. If (needRevision())
   a. solp = revise(solp)
6. End If
7. If (needRetention())
   a. CB = CB ∪ <Cp, solp>
   b. Increment n
8. End If

Figure 4.5: Algorithm 3 - CaseBasedAutonome (CBA)
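The CBA cycle of Algorithm 3 can be sketched as a weighted nearest-neighbor routine in Python. The representation of a stored case as a (case, solution) pair, and the hook functions for revision and retention, are our own assumptions standing in for the SM, SA and decision mechanisms the thesis leaves abstract.

```python
def case_based_autonome(case_base, c_p, similarity, solution_alg, n_n,
                        need_revision=None, revise=None, need_retention=None):
    """Sketch of Algorithm 3 (CBA): weight every stored case by its
    similarity to the new case c_p, aggregate the Nn nearest solutions,
    then optionally revise and retain the new experience."""
    # Retrieve: similarity of the new case to every stored case.
    scored = [(similarity(c_p, c_j), c_j, sol_j) for c_j, sol_j in case_base]
    scored.sort(key=lambda t: t[0], reverse=True)
    neighbors = scored[:n_n]                       # nearest neighborhood NN
    # Reuse: aggregate neighbor solutions, weighted by similarity.
    sol_p = solution_alg([(w, sol) for w, _, sol in neighbors])
    # Revise: external adaptation hook, if supplied.
    if need_revision and need_revision(c_p, sol_p):
        sol_p = revise(sol_p)
    # Retain: append <Cp, solp> when it contributes new knowledge.
    if need_retention and need_retention(c_p, sol_p):
        case_base.append((c_p, sol_p))
    return sol_p
```

With a weighted-average `solution_alg`, this corresponds to the configuration the thesis uses for the AFFA case-base; majority voting would be substituted for the RUBiS case-base.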

4.2 Research Methodology

In order to analyze the performance of the proposed approach effectively, an empirical research methodology has been adopted. This methodology has several dimensions along which the performance of the proposed algorithms is tested and empirically validated. We used accuracy, root mean squared error (RMSE) and average absolute error (AAE) as the evaluation criteria in our empirical study. This section discusses these dimensions.

4.2.1 Nature of the Case-base

We constructed two case-bases representing two different case studies, RUBiS and AFFA. Each case of the RUBiS case-base is represented in the form of nominal and binary attributes, whereas each case of the AFFA case-base is represented in the form of numeric attributes scaled between 0 and 1. The first case-base was generated using an emulator for RUBiS, whereas the second was synthetically generated


Input: Desired rollback state statec
Output: statec recovered
Method:
1. For each pi ∈ statec
   a. executeScript(pi)
2. End For

Figure 4.6: Algorithm 4 - Rollback Strategy

Input: New state staten

Output: TRUE or FALSE

Method:

1. devn = compareTo(staten, statec)

2. If (devn > devt) Then

a. return TRUE

3. Else

a. return FALSE

4. End If

Figure 4.7: Algorithm 5 - Check for Undo Need
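Algorithms 4 and 5 are small enough to sketch directly. The `compare_to` and `execute_script` callables are hypothetical stand-ins for the state-comparison and recovery-script interfaces the algorithms assume.

```python
def need_undo(state_n, state_c, compare_to, dev_t):
    """Algorithm 5: undo is needed when the new state deviates from the
    pre-solution state state_c by more than the threshold dev_t."""
    return compare_to(state_n, state_c) > dev_t

def roll_back(state_c, execute_script):
    """Algorithm 4: restore the desired state by re-executing the
    recovery script associated with each recorded parameter p_i."""
    for p_i in state_c:
        execute_script(p_i)
```

In Algorithm 1, `need_undo` guards the call to `roll_back` after the planned solution has been executed on the managed resource.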

using various rules extracted from the description of the AFFA ports in [61], [62]. Based on the nature of each case-base, we used majority voting as the solution algorithm for the RUBiS case-base and weighted average as the solution algorithm for the AFFA case-base. We used the simple hold-out validation technique for the RUBiS case-base and the leave-one-out (LOO) cross-validation technique for the AFFA case-base.

4.2.2 Similarity Functions

Ten commonly used similarity functions for retrieving the set of nearest neighbors of the current case have been used and evaluated in this research work. The analysis compares these similarity measures on the basis of RMSE, AAE and accuracy against a varying number of nearest neighbors. The selected similarity functions are Hamming distance, Manhattan distance, Euclidean distance, Sim1 [45], Sim2 [45], Canberra distance, Bray-Curtis distance, Squared Chord distance, Squared Chi-Squared distance and the Jaccard similarity coefficient. These similarity functions are given in Table 4.1.

4.2.3 Cardinality of Nearest Neighborhood

The set of nearest neighbors (NN) is used as the input to the solution algorithm to devise the solution in the CBA. We vary the cardinality of NN from 1 to the size of the case-base. The purpose of this analysis is to assess the impact of the cardinality of NN on the performance of the proposed approach.

4.2.4 Solution and Adaptation Algorithms

In order to estimate the solution of the monitored problem in the managed element, weighted average and majority voting have been used as solution algorithms. In order to adapt the proposed solution to the current context, a simple revision mechanism and changes to the cardinality of NN have been applied.

4.2.5 Selection of Model

We implemented and evaluated our proposed approach using the various parameters mentioned above and compared accuracy, RMSE and average absolute error (AAE). The experimental setup giving maximum accuracy, minimum RMSE and minimum AAE at the least value of the cardinality of the nearest neighborhood (Nn) is recommended for selection as the effective model for continuous use.


Table 4.1: Similarity Functions Used

Similarity Measure            Definition
Hamming similarity            sim_ij = (1/m) Σ_{k=1..m} matches(c_ik, c_jk)
Manhattan distance            d_ij = Σ_{k=1..m} w_k |c_ik − c_jk|
Euclidean distance            d_ij = sqrt( Σ_{k=1..m} (w_k (c_ik − c_jk))^2 )
Sim1                          Sim_ij = 1 − max(IP_ij, OP_ij), where
                              IP_ij = max_k(min(c_ik, c_jk)), OP_ij = min_k(max(c_ik, c_jk))
Sim2                          Sim_ij = 1 − t1 · t2, where
                              t1 = min(IP_ij, 1 − OP_ij), t2 = (IP_ij + (1 − OP_ij)) / 2
Canberra distance             d_ij = Σ_{k=1..m} |c_ik − c_jk| / (c_ik + c_jk)
Bray-Curtis distance          d_ij = ( Σ_{k=1..m} |c_ik − c_jk| ) / ( Σ_{k=1..m} (c_ik + c_jk) )
Squared Chord distance        d_ij = Σ_{k=1..m} ( sqrt(c_ik) − sqrt(c_jk) )^2
Squared Chi-Squared distance  d_ij = Σ_{k=1..m} (c_ik − c_jk)^2 / (c_ik + c_jk)
Jaccard similarity coeff.     sim_ij = |c_i ∩ c_j| / |c_i ∪ c_j|
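A few of the distance functions in Table 4.1 translate directly into Python. The weighted forms follow the Manhattan and Euclidean definitions in the table; the zero-denominator handling in the Canberra and Bray-Curtis functions is our own convention, not something the thesis specifies.

```python
import math

def manhattan(ci, cj, w):
    # d_ij = sum_k w_k * |c_ik - c_jk|
    return sum(wk * abs(a - b) for wk, a, b in zip(w, ci, cj))

def euclidean(ci, cj, w):
    # d_ij = sqrt( sum_k (w_k * (c_ik - c_jk))^2 )
    return math.sqrt(sum((wk * (a - b)) ** 2 for wk, a, b in zip(w, ci, cj)))

def canberra(ci, cj):
    # Terms with a zero denominator are skipped (conventionally taken as 0).
    return sum(abs(a - b) / (a + b) for a, b in zip(ci, cj) if a + b != 0)

def bray_curtis(ci, cj):
    num = sum(abs(a - b) for a, b in zip(ci, cj))
    den = sum(a + b for a, b in zip(ci, cj))
    return num / den if den else 0.0

def squared_chord(ci, cj):
    # Assumes non-negative attribute values (the AFFA cases are in [0, 1]).
    return sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(ci, cj))
```

These distance values would be converted into similarity weights (e.g. by negation or inversion) before aggregation in the CBA, a step the thesis leaves to the chosen similarity measure SM.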


4.3 Summary

The empirical study conducted on two different case studies using the proposed algorithms shows promising performance and capability in solving autonomic problems. The results show that accuracies of 98% and 90% were achieved in the RUBiS and AFFA case studies, respectively. However, CBR has inherent issues that arise as the size of the case-base grows with the exploration of new problems in the autonomic domain. As CBR is an online learning paradigm, this growth results in a retrieval efficiency bottleneck. The next chapter handles this bottleneck by focusing on the inherent maintenance problem of CBR.


Chapter 5

Improving the Retrieval Efficiency using Clustered CBR

To enable self-managing capabilities in autonomic systems, the conventional CBR cycle has been proposed. Whenever a new problem is observed by the sensor of the autonomic manager, its description is compared with all the cases of the case-base using an appropriate similarity measure, and a set of nearest neighbors is extracted from the complete case-base. Computing the similarity values is very costly in terms of computational complexity, especially when a large number of cases are present in the case-base. Once the set of nearest neighbors is found, the solution strategy is applied to devise the solution for the new configuration problem. Based on the above discussion, the following three major limitations have been identified:

1. The new case needs to be compared with all the cases present in the case-base, irrespective of their similarity to the new case. The inclusion of less similar cases in the solution-finding process reduces accuracy, recall and precision.

2. The size of the complete case-base increases due to the retention phase of the CBR cycle, thus increasing the computational complexity of the solution-finding process for each new problem.

3. The size of the nearest neighborhood to be considered cannot be decided automatically in the conventional CBR approach; it may vary from 1 neighbor to the entire case-base.

[Figure: the radius of the nearest neighborhood under the conventional CBR approach spans the whole case-base, whereas the clustered approach confines it to a single cluster.]

Figure 5.1: Confinement of Solution Space: Conventional CBR Approach vs Clustered CBR Approach

The current work proposes a remedy to the above-mentioned drawbacks by modifying the conventional CBR cycle presented in Chapter 4. The modifications are highlighted as follows:

1. M1: Convert the standard case-base into a clustered case-base containing k clusters.

2. M2: Before applying the retrieve phase of the conventional CBR cycle, classify the new case into one of the identified clusters, i.e., predictedCluster.

3. M3: Find the solution for the new case within predictedCluster using the CBR cycle, which confines the solution space of the new case as shown in Figure 5.1.

By incorporating the above-mentioned modifications into the CBR cycle, the architecture of the autonomic system changes to the clustered CBR-based framework shown in Figure 5.2. Based on the proposed architecture, two algorithms are presented in this chapter to accomplish these modifications.


As discussed in the literature review, CBR has already been combined with clustering in some other problem domains, but the existing approaches have the following common limitations:

1. The number of clusters is not guaranteed to be optimal. Although finding the optimal number of clusters is an open problem in general, it has not been addressed appropriately in the CBR context in the literature. An inappropriate choice may also lead to poor performance in terms of accuracy, recall and precision, as evident from our empirical study.

2. The efficiency bottleneck may reappear once the sizes of individual clusters grow explosively and approach the original size of the case-base.

3. The selection of the clustering algorithm to be applied is biased, and none of the conventional algorithms have been exploited.

This work addresses the above-mentioned limitations in the following ways:

1. To find the optimal number of clusters for the given case-base, an accuracy-based empirical decision criterion has been devised and presented. This is a brute-force search for an optimal number of clusters.

2. A re-clustering decision criterion has been devised in order to overcome this common drawback of the existing approaches.

3. The proposed approach offers the flexibility to select any clustering algorithm

and exploit it.

An additional contribution of this work is that a clustered CBR-based approach has been proposed and applied in the domain of autonomic computing to handle the efficiency bottleneck in computing systems, improving their computational efficiency.


[Figure: the externalization architecture of Figure 4.1 with the flat case-base replaced by a clustered case-base inside the autonomic manager; the new monitored problem is prepared as a case, nearest neighbors are retrieved from the relevant cluster, the solution algorithm aggregates them, and the solution is applied to the managed resource through the actuator after the reuse/refine and retain decisions.]

Figure 5.2: Proposed Clustered CBR Approach for Autonomic Managers

5.1 Proposed Clustered CBR Approach

Given a large case-base, a clustering algorithm is selected and applied to it, dividing the cases among a reasonable number of clusters that can yield a performance improvement. The performance of the clustered approach is compared with the non-clustered CBR approach using Accuracy, Recall and Precision (ARP) analysis [83], [94], and efficiency analysis.

5.1.1 Construction of Clustered Case-base

In order to incorporate modification M1 into the conventional CBR-based autonomic cycle, the algorithm presented in Figure 5.3 has been proposed. This algorithm takes as inputs a training case-base CB, a similarity measure SM and a clustering algorithm


Input: A case-base CB containing n cases, similarity measure SM, solution algorithm SA, clustering algorithm CA, test case-base TCB, kin
Output: Clustered case-base CCB
Method (ClusteringCB):
1. currentAccuracy := 0
2. For k := kin to n/2 do
   a. accuracyk := 0
   b. CCB := CA(CB, k)
   c. For each cj ∈ TCB
      i. clusteri := classify(cj, CCB)
      ii. For each cp ∈ clusteri and p ≠ j
          1. simp := findSimilarity(cj, cp, SM)
          2. wp := simp
      iii. End For
      iv. solj := aggregate(SA, w)
      v. update accuracyk
   d. End For
   e. If (accuracyk > currentAccuracy)
      i. currentAccuracy := accuracyk
      ii. kopt := k
   f. End If
3. End For
4. CCB := CA(CB, kopt)

Figure 5.3: Algorithm to Construct the Clustered Case-base
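The optimal-k search of Figure 5.3 can be sketched as a loop in Python. The `cluster_alg`, `classify` and `solve_in_cluster` callables are abstract stand-ins for CA, the classification step and the in-cluster CBR cycle; they are not thesis artifacts.

```python
def clustering_cb(cb, tcb, cluster_alg, classify, solve_in_cluster, k_in):
    """Sketch of the case-base partitioning algorithm (Figure 5.3):
    try each k from k_in up to n/2, cluster the training case-base CB,
    measure solution accuracy on the test case-base TCB, and keep the
    k that maximizes accuracy."""
    n = len(cb)
    best_acc, k_opt = -1.0, k_in
    for k in range(k_in, n // 2 + 1):
        ccb = cluster_alg(cb, k)                  # partition into k clusters
        correct = 0
        for case, true_sol in tcb:
            cluster = classify(case, ccb)         # pick the relevant cluster
            if solve_in_cluster(case, cluster) == true_sol:
                correct += 1
        acc = correct / len(tcb)
        if acc > best_acc:                        # keep the best k seen so far
            best_acc, k_opt = acc, k
        # (step e of the figure; ties keep the smaller, earlier k)
    return cluster_alg(cb, k_opt), k_opt
```

As the thesis notes, on a re-clustering run `k_in` would be the previously known optimal k rather than 2, so the search resumes instead of restarting.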

CA. The algorithm outputs a clustered case-base containing k clusters. The choice of k is based on the solution-prediction accuracy of the clustered case-base: an optimal k is searched for in this iterative process, namely the one that results in the highest accuracy. The proposed algorithm can be adapted to any clustering algorithm; the empirical study conducted on the proposed approach investigates the choice of a particular one. As the preprocessing phase of the proposed approach, this algorithm determines an appropriate number of clusters for the case-base and does not need to be executed frequently. The decision to re-cluster the case-base primarily depends upon the size of the largest cluster.


The size of the clusters, in turn, depends on the distribution of the training data. The algorithm to construct the clustered case-base is re-executed, after merging the pre-built clusters, once the size of the largest cluster approaches the previous value of n (the value used during the previous run of the same algorithm). This criterion is checked on the retention of every new case, which is implemented in the next algorithm. If the algorithm is executed for the very first time, the value of optimalk is taken as 2; otherwise it starts from the currently known value of optimalk from the previous run. The selection of the clustering algorithm and the number of clusters for a particular case-base may vary from problem to problem; it also depends on the size of the original case-base and the problem domain.

5.1.2 Devising the Solution of the New Problem

The algorithm implementing the cluster-based CBR cycle within the autonomic manager is outlined in Figure 5.4. Step 1 implements modification M2, and steps 2 to 5 implement modification M3 of the proposed approach. When a new case Cp comes in, it is classified into the relevant cluster of the clustered case-base. The conventional CBR cycle is then applied to the new case and its relevant cluster: the similarity between Cp and all the cases present in the relevant cluster is computed, the weights are found, and finally the solution of Cp is predicted. As a result, the comparison space for Cp is limited to its relevant cluster, reducing the computational complexity and increasing the efficiency. Step 6 implements the re-clustering criterion. The re-clustering process does not need to start from k = 2; it starts from the currently known optimal number of clusters, optimalk. It is intuitive that optimalk ∝ n, as is evident from the results presented in Chapter 6.


Input: A clustered case-base CCB, new case cp, similarity measure SM, classification algorithm CLA, solution algorithm SA
Output: Solution solp of cp
Method:
1. predictedCluster := classify(cp, CCB, CLA)
2. For each cq ∈ predictedCluster
   a. simq := findSimilarity(cp, cq, SM)
   b. wq := simq
3. End For
4. solp := aggregate(SA, w)
5. predictedCluster := predictedCluster ∪ <cp, solp>
6. If (size(predictedCluster) ≥ nprev) Then
   a. CB := Merge all clusters
   b. kin := numberOfClusters(CCB)
   c. Apply ClusteringCB(CB, SM, SA, CA, TCB, kin)
7. End If

Figure 5.4: Algorithm to Devise Solution of the New Problem in Clustered CBRApproach
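The solution-devising steps of Figure 5.4 can be sketched as follows. The representation of a cluster as a list of (case, solution) pairs, and the `classify`, `similarity`, `solution_alg` and `recluster` callables, are assumptions standing in for CLA, SM, SA and the ClusteringCB re-run.

```python
def solve_clustered(ccb, c_p, similarity, classify, solution_alg,
                    n_prev, recluster):
    """Sketch of Figure 5.4: classify the new case into one cluster (M2),
    run the conventional CBR cycle inside that cluster only (M3), retain
    the solved case there, and trigger re-clustering when the cluster
    grows back to the previous case-base size n_prev (step 6)."""
    cluster = classify(c_p, ccb)                      # M2: confine the search
    weighted = [(similarity(c_p, c_q), sol_q) for c_q, sol_q in cluster]
    sol_p = solution_alg(weighted)                    # M3: solve in-cluster
    cluster.append((c_p, sol_p))                      # step 5: retain
    if len(cluster) >= n_prev:                        # step 6: re-cluster check
        recluster(ccb)
    return sol_p
```

Only the cases inside `cluster` are compared against the new case, which is precisely where the n-to-n/k reduction in similarity computations comes from.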

If there are n cases in the case-base and they are divided among k clusters, then instead of n comparisons, on average only n/k comparisons are needed. Given that this approach is applied to large datasets, clustering the case-base significantly reduces the search space for similarity calculations and for retrieval of the set of nearest neighbors. This reduction in the size of the search space enhances the efficiency of the CBR solution devising procedure. Also, only relevant cases contribute towards the solution of the new case, which promises better ARP performance.
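As a sketch of this reduced search, a minimal clustered retrieval in Python, assuming Euclidean distance as the similarity measure and centroid-based classification (all function and variable names here are illustrative, not from the thesis):

```python
import math

def distance(a, b):
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(case, centroids):
    """Step 1: pick the cluster whose centroid is nearest to the new case."""
    return min(range(len(centroids)), key=lambda i: distance(case, centroids[i]))

def retrieve(case, clusters, centroids):
    """Steps 2-4: compare the new case only against its predicted cluster,
    so on average n/k similarity computations instead of n."""
    idx = classify(case, centroids)
    return min(clusters[idx], key=lambda stored: distance(case, stored))

# Two toy clusters around (0, 0) and (10, 10).
clusters = [[(0.0, 1.0), (1.0, 0.0)], [(9.0, 10.0), (10.0, 9.0)]]
centroids = [(0.5, 0.5), (9.5, 9.5)]
print(retrieve((9.5, 9.0), clusters, centroids))  # nearest case within cluster 1
```

Only the two cases of the predicted cluster are compared against the query, while the other cluster is never examined.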

5.1.3 Computational Complexity

Computational complexity of the solution devising process for each new case without clustering is O(nm), where n is the number of cases in the case-base and m is the complexity of the similarity function. If the k-Means clustering algorithm is used as part of the first


algorithm, then the computational complexity of the k-Means algorithm is O(nkt) [33], where t is the number of iterations required to complete the clustering process. For each k, the clustering algorithm is applied and each case is treated as a test case using the Leave-One-Out (LOO) validation process. The testing process takes n²/k steps,

as n/k is the average cluster size. If O(m) is the computational complexity of the similarity measure, then the testing process takes O(n²m/k) steps. If we deploy the k-Means clustering algorithm as part of the case-base partitioning process, then the overall computational complexity of the algorithm becomes O(n(nkt + n²m/k)). Similarly,

if FarthestFirst or density-based clustering algorithms are selected, their computational complexity in the worst case is O(n²). Hence, the complexity of the case-base partitioning algorithm will be O(n(n² + n²m/k)). As the case-base partitioning algorithm is not executed on a continuous basis, the computational cost of the clustering algorithm does not harm the case retrieval efficiency. It is executed once a significant number of new cases have been adapted and re-clustering of the case-base is needed. However, an appropriate clustering algorithm has to be selected

which results in improved performance in terms of ARP analysis. The computational complexity of the proposed approach to devise the solution of each new case using the second algorithm is O(nm/k), where k is the number of clusters; for large case-bases, this is a significant reduction in the total cost of the solution devising process. The value of k depends on the size of the case-base n, i.e. k = f(n). Hence, the computational complexity of the second algorithm is O(nm/f(n)). The empirical investigation of our experiments reveals that k ≫ √n, which supports the significant decrease in the computational cost of the proposed approach as compared to the unclustered approach.
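A rough numeric illustration of this saving, with illustrative values of n, m and k (not taken from the thesis's experiments):

```python
# Cost of devising one solution, counted in elementary similarity operations.
n, m = 10_000, 8           # cases in the case-base, cost of one similarity call
k = 100                    # number of clusters; the experiments suggest k >= sqrt(n)

unclustered = n * m        # O(nm): compare the new case against every stored case
clustered = (n // k) * m   # O(nm/k): compare only within the predicted cluster

print(unclustered, clustered)  # 80000 800
```

With these assumed numbers the per-query retrieval cost drops by a factor of k, which is exactly the n versus n/k comparison count discussed above.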


5.1.4 Limitations of Clustered CBR Approach

The clustered CBR method is based on a brute force approach which searches for the optimal value of k between 2 and n/2. The case-base is then clustered into the optimal number of clusters (kopt). On arrival of a new case, it is classified into one of the existing clusters and its search space is limited to that cluster. The expected benefit of the approach is improved retrieval efficiency. However, this approach suffers from the following limitations:

1. One extreme of the clustering process is that the brute force approach yields a very small constant as the optimal number of clusters, e.g. kopt = 2. Then the average cluster size will be n/2. When a new case is classified into one of the two clusters, its average retrieval cost will be O(n), yielding no significant benefit over the conventional CBR approach.

2. The other extreme is that the brute force approach yields a very large value as the optimal number of clusters, e.g. kopt = n/2. This leads to a very small average cluster size, but the classification process becomes costly. In the conventional classification process, the new instance is compared to all centroids, and the centroid with minimum distance from the new instance represents its predicted cluster. This scenario therefore requires O(n) comparisons to predict the relevant cluster, so no significant advantage is gained over the conventional CBR approach.

5.2 Randomized Approach to Estimate Number

of Clusters

It has been observed that deciding an appropriate number of clusters k for a case-base is not a trivial problem. Conventionally, it adds an O(n) factor to the complexity


Figure 5.5: Sample Acceptable Accuracy Region of Clustered Approach (plot of accuracy, 0 to 1, against k from 1 to n/2; original image not recoverable)

of the clustering and testing process. Though the clustered CBR approach provides improved retrieval efficiency once the case-base has been clustered into an appropriate number of clusters, this pre-condition is a computationally expensive job. The cost becomes more significant when the re-clustering decision is taken after the retention of many new cases.

Our experiments with the clustered approach have revealed that, during the brute-force search for an optimal number of clusters, certain acceptable accuracy regions are observed. In these regions, the accuracy of the clustered approach remains above the acceptable threshold, so any particular value of k within these regions leads to a near-optimal point. As shown in Figure 5.5, the accuracy regions cover a significant span of the variation of k. If a random toss of k ends up with a value within these regions, then the brute-force approach need not be applied.

In this research work, the above-mentioned limitations have been handled in the following ways:


5.2.1 Upper Bound on the Cluster Size

The existing lower bound on kopt may yield a large average cluster size and lead to a retrieval efficiency bottleneck. It is proposed to set the upper bound on the average cluster size to log₂n. This bound gives a definite reduction in retrieval cost. In order to ensure this average cluster size, the minimum value of kopt is set to n/log₂n. Any larger value of kopt will result in a smaller average cluster size. Hence, this lower bound on kopt guarantees O(log₂n) retrieval cost for comparing the current case with all the cases in the relevant cluster.
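These bounds can be computed directly; a small sketch (the function name is illustrative):

```python
import math

def k_bounds(n):
    """Bounds on the number of clusters from Section 5.2.1: the lower bound
    n/log2(n) caps the average cluster size at log2(n); the upper bound n/2
    keeps every cluster non-trivial (average size at least 2)."""
    k_min = math.ceil(n / math.log2(n))
    k_max = n // 2
    return k_min, k_max

print(k_bounds(1024))  # (103, 512): average cluster size at most 1024/103 < 10
```

For n = 1024 the average cluster size stays below log₂(1024) = 10, matching the O(log₂n) retrieval cost argued above.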

5.2.2 New Classification Process Based on Binary Search

In order to handle the second limitation of the brute force approach, all clusters are represented by a tuple <i, normi>. Here, i represents the index of a cluster and normi is the Euclidean norm of the centroid of cluster i, where xij is the j-th parameter of the center of the i-th cluster. normi is computed as:

normi = √( Σ_{j=1}^{m} xij² )    (5.1)

All clusters are maintained in sorted order with respect to normi, which represents the distance of the center of cluster i from the zero vector at the origin. When a new case comes in, a binary search based probable classification algorithm is used: the norm of the new case, normc, is computed, and the clusters whose normi is closest to normc are identified using the binary search classification algorithm given in Figure 5.6. In the worst-case scenario, the number of clusters will be O(n) and the new case will be classified in O(log₂n) steps. In this extreme case, the cluster size will be a very small constant, and hence the overall computational cost of the retrieval process will be O(log₂n). In Figure 5.6, predicted


is the cluster whose center is one of the most probable closest cluster centers to the current case. Subsequently, all other clusters whose Euclidean norms are equal to that of the predicted cluster are trivially found as candidates for classification. The current case is then matched with the centers of all candidate clusters and classified into the closest one.

The lower bound on kopt may yield n/log₂n clusters, with an average cluster size of log₂n. In this scenario, the classification process takes O(log₂(n/log₂n)) steps, which is smaller than O(log₂n). The retrieval cost will be O(log₂n), hence leading to an overall computational cost of O(log₂n).

The average scenario also cannot exceed this overall retrieval cost, because there is a tradeoff between the average cluster size and the number of clusters. Our proposed approach guarantees O(log₂n) computational cost for both factors.

5.2.3 Las Vegas Randomized Algorithm for Searching the Optimal Value of k

Las Vegas randomized algorithms guarantee success, but their computational cost depends on the number of iterations needed to attain that success. As the number of iterations is a random variable, the complexity is O(T). This research work proposes to pick a random value of k between n/log₂n and n/2 and cluster the case-base into k clusters. In order to pick a reasonable value of k, this randomized process may be repeated t times. Each iteration has an associated success probability p, estimated through prior knowledge. The objective is to keep generating random values of k until a kacc is reached, where kacc is an acceptable k which is not necessarily kopt but yields an acceptable accuracy. As each iteration generates a random k regardless of the k generated in the previous iteration, the probability distribution is memoryless. The random variable, the number of iterations t, takes integer values, so it exhibits a


Input: Current start index start, current end index end, search item key, clusters vector x
Output: Predicted cluster index predicted
Method (BSC):
1. mid := (start + end) / 2
2. If (x[mid] = key)
   a. predicted := mid
   b. Exit
3. Else if (x[mid] > key)
   a. If (start ≤ mid − 1)
      i. predicted := BSC(start, mid − 1, key, x)
      ii. Exit
   b. Else
      i. If (|x[start] − key| ≤ |x[mid − 1] − key|)
         1. predicted := start
         2. Exit
      ii. Else
         1. predicted := mid − 1
         2. Exit
4. Else
   a. If (mid + 1 ≤ end)
      i. predicted := BSC(mid + 1, end, key, x)
      ii. Exit
   b. Else
      i. If (|x[mid + 1] − key| ≤ |x[end] − key|)
         1. predicted := mid + 1
         2. Exit
      ii. Else
         1. predicted := end
         2. Exit

Figure 5.6: A New Efficient Classifier: Binary Search Classifier (BSC)
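A Python transcription of BSC might look as follows; the comparison operators lost in the figure are assumed to be ≤, the norms vector is assumed sorted in ascending order, and two boundary guards that the pseudocode leaves implicit are added:

```python
def bsc(start, end, key, x):
    """Binary Search Classifier (after Figure 5.6): return the index of the
    cluster whose sorted centroid norm x[i] is closest to key."""
    mid = (start + end) // 2
    if x[mid] == key:
        return mid
    if x[mid] > key:
        if start <= mid - 1:                     # room to recurse left
            return bsc(start, mid - 1, key, x)
        if mid - 1 < 0:                          # guard: key below smallest norm
            return start
        return start if abs(x[start] - key) <= abs(x[mid - 1] - key) else mid - 1
    if mid + 1 <= end:                           # room to recurse right
        return bsc(mid + 1, end, key, x)
    if mid + 1 >= len(x):                        # guard: key beyond largest norm
        return end
    return mid + 1 if abs(x[mid + 1] - key) <= abs(x[end] - key) else end

norms = [1.0, 2.5, 4.0, 7.5, 9.0]                # sorted centroid norms
print(bsc(0, len(norms) - 1, 4.2, norms))        # 2: norm 4.0 is closest to 4.2
```

Each call halves the index range, so the candidate cluster is found in O(log₂k) comparisons, as argued above.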


geometric distribution. The geometric distribution models the number of independent Bernoulli trials needed to get one acceptable value of k; getting an acceptable value of k is a success, and the success probability in each trial is fixed at p. This algorithm is outlined in Figure 5.7.

Let T be the random variable representing the number of iterations needed to

achieve the acceptable performance. The expected value of T is given as:

E(T) = 1/p    (5.2)
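Equation 5.2 can be checked by simulating the memoryless search; a small sketch assuming a per-trial success probability p = 0.2:

```python
import random

random.seed(7)
p, runs = 0.2, 10_000

def trials_until_success(p):
    """Draw Bernoulli(p) trials until the first success; the trial count T
    follows a geometric distribution with E(T) = 1/p."""
    t = 1
    while random.random() >= p:
        t += 1
    return t

mean_t = sum(trials_until_success(p) for _ in range(runs)) / runs
print(round(mean_t, 2))  # close to E(T) = 1/p = 5.0
```

With p = 0.2 the empirical mean over many runs settles near the predicted 5 iterations.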

5.2.4 Monte Carlo Randomized Algorithm for Searching the Optimal Value of k

The Monte Carlo randomized algorithm is a variation of the Las Vegas randomized algorithm which completes its execution in a deterministic number of iterations. The accuracy of the algorithm depends on the best accuracy achieved in t iterations. The proposed algorithm, outlined in Figure 5.8, runs the search process t times. Each randomly chosen k is evaluated on a test case-base and its accuracy is computed. The output of the algorithm is the best k among the t iterations. Through random sampling, the frequency of different values of k can be computed against different accuracy thresholds. We may set an acceptable accuracy threshold and then compute the frequency of k's lying in the acceptable bin. This process lays down one possible way to incorporate prior knowledge.

Assume that p is the probability of yielding an acceptable accuracy when a random k is picked; p can be estimated using prior knowledge. Then (1 − p) is the probability of not yielding an acceptable accuracy when a random k is picked. As each experiment of choosing a random k is independent of any other, (1 − p)^t is the probability of not yielding an acceptable accuracy in any of t independent iterations. The success rate of the proposed approach can


Input: A case-base CB containing n cases, similarity measure SM, solution algorithm SA, clustering algorithm CA, test case-base TCB, accuracy threshold x
Output: Clustered case-base CCB
Method:
1. currentAccuracy := 0
2. While (currentAccuracy < x)
   a. k := randomGenerate(n/log₂n, n/2)
   b. accuracyk := 0
   c. CCB := CA(CB, k)
   d. For each cj ∈ TCB
      i. clusterj := classify(cj, CCB)
      ii. For each cp ∈ clusterj and p ≠ j
         1. simp := findSimilarity(cj, cp, SM)
         2. wp := simp
      iii. End For
      iv. solj := aggregate(SA, w)
      v. update accuracyk
   e. End For
   f. If (accuracyk > currentAccuracy)
      i. currentAccuracy := accuracyk
      ii. kopt := k
   g. End If
3. End While
4. CCB := CA(CB, kopt)

Figure 5.7: Randomized Algorithm to Cluster the Case-base into k Clusters: Las Vegas Version
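The outer loop of Figure 5.7 can be sketched as follows; the toy accuracy function stands in for the full clustering-plus-evaluation step (steps 2.c to 2.e), and all names and numbers are illustrative:

```python
import math
import random

def las_vegas_k(n, threshold, evaluate_k, rng=random.Random(0)):
    """Keep drawing k uniformly from [n/log2(n), n/2] until the clustered
    case-base evaluated at k reaches the acceptable accuracy threshold."""
    k_min, k_max = math.ceil(n / math.log2(n)), n // 2
    best_k, best_acc = None, 0.0
    while best_acc < threshold:
        k = rng.randint(k_min, k_max)
        acc = evaluate_k(k)             # stand-in for CA(CB, k) + LOO testing
        if acc > best_acc:
            best_acc, best_k = acc, k
    return best_k, best_acc

# Toy stand-in for the real evaluation: accuracy peaks near k = 150.
toy_accuracy = lambda k: max(0.0, 1.0 - abs(k - 150) / 500)
k, acc = las_vegas_k(1024, 0.8, toy_accuracy)
print(k, round(acc, 3))
```

As with any Las Vegas algorithm, the answer is guaranteed to meet the threshold, but the number of draws needed is random; the loop only terminates if some k in the sampling range actually reaches the threshold.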


Input: A case-base CB containing n cases, similarity measure SM, solution algorithm SA, clustering algorithm CA, test case-base TCB, number of iterations t
Output: Clustered case-base CCB
Method:
1. currentAccuracy := 0
2. For i := 1 to t do
   a. k := randomGenerate(n/log₂n, n/2)
   b. accuracyk := 0
   c. CCB := CA(CB, k)
   d. For each cj ∈ TCB
      i. clusterj := classify(cj, CCB)
      ii. For each cp ∈ clusterj and p ≠ j
         1. simp := findSimilarity(cj, cp, SM)
         2. wp := simp
      iii. End For
      iv. solj := aggregate(SA, w)
      v. update accuracyk
   e. End For
   f. If (accuracyk > currentAccuracy)
      i. currentAccuracy := accuracyk
      ii. kopt := k
   g. End If
3. End For
4. CCB := CA(CB, kopt)

Figure 5.8: Randomized Algorithm to Cluster the Case-base into k Clusters: Monte Carlo Version


be computed as:

1 − (1 − p)^t = x%    (5.3)

where x% is the predicted level of accuracy. As p may be estimated from prior knowledge, the proposed approach gives the flexibility of deciding the number of iterations t over which to choose a random k in order to achieve x% accuracy.
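Equation 5.3 can be inverted to choose t for a target success rate; a small sketch with assumed values of p and x:

```python
import math

def iterations_needed(p, x):
    """Smallest t with 1 - (1 - p)**t >= x, inverted from Equation 5.3."""
    return math.ceil(math.log(1 - x) / math.log(1 - p))

# With p = 0.3 per draw, 7 random draws of k suffice for 90% success.
print(iterations_needed(0.3, 0.9))  # 7
```

Solving (1 − p)^t ≤ 1 − x for t gives t ≥ log(1 − x)/log(1 − p), and rounding up yields the deterministic iteration count for the Monte Carlo version.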

5.3 Summary

This chapter presents a clustering-based approach for case-based reasoning to improve retrieval performance. To find the optimal number of clusters for a given case-base, an accuracy-based empirical decision criterion has been devised and presented. This is a brute force approach to search for an optimal number of clusters. To avoid the brute force search, a randomized approach has been proposed to decide a near-optimal number of clusters. For the random generation of kopt, lower and upper bounds have been derived in order to maintain the performance of the clustered CBR approach. To improve the retrieval performance, a new Euclidean norm based binary classification scheme has been proposed.


Chapter 6

Results

This chapter presents the empirical and theoretical results of the research presented

in the previous chapters. Section 6.1 discusses the experimental setup and empiri-

cal results of the proposed externalization based algorithms presented in Chapter 5.

Section 6.2 discusses the experimental setup and empirical results of the proposed

internalization based algorithms presented in Chapter 5. Traditional CBR based au-

tonomic managers suffer form the performance bottlenecks which have been tackled

using clustered CBR approach in Chapter 5. The results of the clustered approach

have been presented in Section 6.4. Section 6.6 presents the empirical and theoretical results of the randomized approach presented in Chapter 5, which addresses the limitations of the clustered CBR approach.

6.1 Applying CBR Based Self-Management Algo-

rithms on Externalization Based Systems

6.1.1 Case Study: RUBiS

We selected Rice University Bidding System (RUBiS) [1] as the case study for our

CBR-based externalization architecture of autonomic computing. RUBiS is an auc-

tion site prototype equipped with all core functionalities of an auction site. It is

an open-source initiative, used as a benchmark in research for evaluation of various


Figure 6.1: RUBiS Architecture with CBA (diagram: the autonomic manager, containing the Service Monitor, CaseBasedAutonome and Problem Resolution Manager modules, plus a Problem Injector, attached to the managed element RUBiS with its presentation layer (web browser), application layer (Apache web server with JSP, Servlets and Java Beans) and data layer (MySQL); original image not recoverable)

scalability and performance concerns of application servers. The core functionalities

of this bidding prototype include registration, selling, bidding and browsing. There

are three versions of RUBiS: PHP based, Java Servlets based and EJB based. We

used the Java Servlets version as our testbed. Servlets are used at the presentation tier and generate the HTML response after retrieving data from Java Beans. MySQL is used as the database system at the data tier. We implemented an emulator for RUBiS, a service monitor and a problem resolution manager. The emulator emulates user actions which characterize the system state. The service monitor observes the current state of the system periodically. The problem resolution manager executes the planned remedial action. We generated the case-base by executing this emulator.


This case-base represents various configuration and healing problems in the deployed

RUBiS testbed. If a problem is reported, its solution is suggested by CBA as depicted in Figure 4.1. Finally, the planned solution is executed on the managed resource (RUBiS) through the problem resolution manager. The implementation details of CBA, given in Algorithm 3, with RUBiS are discussed below and shown in Figure 6.1.

6.1.2 Problem Injector

The Problem Injector has a pool of possible problems of diverse natures which can occur in the RUBiS testbed. It randomly picks a problem from the pool after a random interval and invokes a script which causes that problem to occur in the managed resource. It injects configuration problems and service failures in the Apache web server and MySQL.

6.1.3 Autonomic Manager

The autonomic manager interacts with RUBiS through a dedicated sensor and actuator. It contains the following three modules:

Service Monitor

The Service Monitor periodically captures the state of the RUBiS testbed through the sensor by executing various health tests. The captured state is compared with the healthy state and, if a deviation is observed, it is handed over to the CBA.

CBA Implementation for RUBiS

This section discusses the implementation details of the CBA module for RUBiS:

i. Case Representation:

In general, a case ci can be represented as a set of m parameters:


ci = {pi1, pi2, pi3, ..., pim}    (6.1)

The complete case-base CB is represented as a set of n cases:

CB = {c1, c2, c3, ..., cn}    (6.2)

In the testbed deployment, seven parameters were used to describe a RUBiS

problem along with one solution parameter. These variables were extracted

from the documentation of RUBiS [1]. These variables include Service Name

to be monitored (SN), Permissions on the User table of the database (PU),

Permissions on the Item table of the database (PI), Permissions on the Bid

table of the database (PB), Apache Server Configuration (SC), Apache Server

Health (AH) and MySQL Server Health (MH). The solution parameter is

Configuration Script (CS) that represents the name of a code snippet to be ex-

ecuted by the Problem Resolution Manager. These parameters are the control

parameters in the autonomic context. PU , PI, PB and SC are the control pa-

rameters for self-configuration capability in the RUBiS testbed. AH and MH

are the control parameters for self-healing capability in the RUBiS testbed.

Each case of RUBiS is represented as a set of the above mentioned parameters:

ci = {SN, PU, PI, PB, SC, AH, MH, CS}    (6.3)
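A case of Equation 6.3 could be represented as a simple record; a minimal sketch in which the field types are assumptions based on the parameter descriptions above:

```python
from dataclasses import dataclass

@dataclass
class RubisCase:
    """One case: seven problem parameters plus the solution script (Eq. 6.3)."""
    sn: str   # Service Name being monitored (nominal)
    pu: int   # Permissions on the User table (assumed binary flag)
    pi: int   # Permissions on the Item table (assumed binary flag)
    pb: int   # Permissions on the Bid table (assumed binary flag)
    sc: int   # Apache Server Configuration (assumed binary flag)
    ah: int   # Apache Server Health (assumed binary flag)
    mh: int   # MySQL Server Health (assumed binary flag)
    cs: str   # Configuration Script to execute (solution)

# First row of the sample bootstrap case-base (Table 6.1).
case = RubisCase("Sell", 1, 1, 1, 1, 0, 1, "CS1")
print(case.sn, case.cs)  # Sell CS1
```

The case-base CB of Equation 6.2 is then simply a collection of such records.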

ii. Case Retrieval:

The holdout validation technique [56] was applied for the empirical evaluation

of our proposed approach on RUBiS. A bootstrap case-base was generated us-

ing the experience of human experts and was used as the training case-base.

A sample bootstrap case-base is shown in Table 6.1. The problem injector on


Table 6.1: Sample Bootstrap Case-base for RUBiS

Sr. No.  SN       PU  PI  PB  SC  AH  MH  Solution
1        Sell     1   1   1   1   0   1   CS1
2        Sell     0   1   1   1   1   1   CS2
3        AboutMe  1   0   1   1   1   1   CS3
4        Home     1   1   1   1   1   0   CS4
5        Browse   1   1   0   1   1   1   CS5

RUBiS was executed to generate the test cases. Appropriate similarity measures were selected from Table 4.1 for the retrieval of a varying number of nearest neighbors from the training case-base. As SN is a nominal attribute (NA), we

used a heterogeneous version of all the similarity measures by assuming equal

weights of all attributes, as given in Equation 6.4.

f(Cp, Cq) = ( g(NAp, NAq) + (γ − 1) h(Dp, Dq) ) / γ    (6.4)

Where

Dp = Cp − {NAp}    (6.5)

Dq = Cq − {NAq}    (6.6)

g(NAp, NAq) = 1 if NAp = NAq, and 0 otherwise    (6.7)

Here f is a similarity function, NA = SN , γ = 7 and h(Dp, Dq) is the similarity

between two cases using one of the selected similarity measures on numerical

attributes.
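Equation 6.4 can be implemented directly; a minimal sketch that assumes for h a Manhattan distance mapped into [0, 1] (the thesis's actual h comes from the measures selected in Table 4.1, so this mapping is an assumption):

```python
def hetero_similarity(cp, cq, gamma=7):
    """Equation 6.4: combine the nominal match g on SN with (gamma - 1)
    times the numeric similarity h on the remaining attributes."""
    g = 1.0 if cp[0] == cq[0] else 0.0        # nominal attribute SN (Eq. 6.7)
    dp, dq = cp[1:], cq[1:]                   # numeric attributes D (Eqs. 6.5-6.6)
    # h: assumed Manhattan distance turned into a similarity in [0, 1].
    h = 1.0 - sum(abs(a - b) for a, b in zip(dp, dq)) / len(dp)
    return (g + (gamma - 1) * h) / gamma

c1 = ("Sell", 1, 1, 1, 1, 0, 1)   # cases as (SN, PU, PI, PB, SC, AH, MH)
c2 = ("Sell", 0, 1, 1, 1, 1, 1)
print(round(hetero_similarity(c1, c2), 3))  # 0.714
```

Here the two cases agree on SN (g = 1) and differ on two of the six numeric attributes (h = 2/3), giving (1 + 6 · 2/3)/7 = 5/7.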

iii. Case Reuse:

The solution parameter in this case study is a nominal variable. Therefore, the majority voting algorithm, given in Figure 6.2, was used as the solution aggregation algorithm during the reuse phase of the CBR cycle.

iv. Case Revision:

In this case study, a simple revision algorithm, shown in Figure 6.3, was implemented and yielded improved results. The solutions of the nearest neighbors were applied iteratively until a specific neighbor rectified the problem. As Nn = 1 gave the highest accuracy, the solution of the nearest neighbor was applied first; if the first nearest neighbor failed, the second nearest neighbor was used, and so on. The sequence of iterations was determined by the value of similarity.

Input: Solution set SS, nearest neighbors NN
Output: Solution with majority vote
Method:
1. For each i ∈ SS
   a. count[i] := 0
2. End For
3. For each j ∈ NN
   a. count[solj] := count[solj] + 1
4. End For
5. return arg max_{k ∈ SS} count[k]

Figure 6.2: Majority Voting Algorithm

Input: NN(Cp), error threshold ε
Output: Adapted solution solp
Method:
1. Sort NN(Cp) in descending order with respect to similarity
2. For each Ci ∈ NN(Cp)
   a. If (|soli − solactual| ≤ ε)
      i. return soli
   b. End If
3. End For

Figure 6.3: Solution Adaptation Algorithm
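The majority voting step of Figure 6.2 can be sketched compactly in Python; Counter here replaces the explicit count array, and the neighbor representation is illustrative:

```python
from collections import Counter

def majority_vote(nearest_neighbors):
    """Reuse phase: return the solution proposed by the most nearest
    neighbors (Figure 6.2); ties resolve to the solution seen first."""
    votes = Counter(sol for _, sol in nearest_neighbors)
    return votes.most_common(1)[0][0]

# Neighbors as (similarity, suggested solution script) pairs.
neighbors = [(0.9, "CS1"), (0.8, "CS2"), (0.7, "CS1")]
print(majority_vote(neighbors))  # CS1
```

For the iterative adaptation of Figure 6.3, the same neighbor list sorted by similarity supplies the fallback order when a suggested script fails to rectify the problem.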

v. Case Retention:

Two cases are treated as different if they vary by at least one attribute in this case-base. If the current case is different from all of the existing cases in the case-base, it is retained in the case-base as new knowledge. In this way, the experience base gets richer incrementally, which improves the performance of the proposed approach.

Figure 6.4: Effect of Nn on Maximum Accuracy (plot of maximum accuracy, 0 to 0.9, against Nn = 1 to 6; original image not recoverable)

Table 6.2: Accuracy of Selected Similarity Functions for RUBiS

Similarity Measure       Nn=1   Nn=2   Nn=3   Nn=4   Nn=5   Nn=6
Hamming Distance         0.76   0.56   0.5    0.45   0.3    0.29
Manhattan Distance       0.8    0.5    0.37   0.36   0.3    0.29
Euclidean Distance       0.8    0.5    0.37   0.36   0.3    0.29
Sim 1                    0.34   0.28   0.23   0.28   0.29   0.29
Sim 2                    0.16   0.1    0.1    0.22   0.29   0.29
Bray-Curtis Distance     0.17   0.11   0.11   0.23   0.29   0.29
Squared Chord Distance   0.8    0.5    0.37   0.36   0.3    0.29
Jaccard Distance         0.29   0.29   0.29   0.29   0.29   0.29

Problem Resolution Manager

The Problem Resolution Manager acts as the tool for applying the proposed and revised solution of CBA. It invokes an appropriate solution script against the suggested plan through the actuator, which is executed on the deployed testbed.

6.1.4 Results and Discussion for RUBiS Testbed

The CBR-based externalization approach was applied in the above experimental setup and the results were analyzed. Three performance measures have been used in this case


Table 6.3: RMSE of Selected Similarity Functions for RUBiS

Similarity Measure       Nn=1   Nn=2   Nn=3   Nn=4   Nn=5   Nn=6
Hamming Distance         0.489  0.663  0.707  0.742  0.837  0.843
Manhattan Distance       0.447  0.707  0.794  0.799  0.837  0.843
Euclidean Distance       0.447  0.707  0.794  0.799  0.837  0.843
Sim 1                    0.812  0.848  0.877  0.848  0.843  0.843
Sim 2                    0.916  0.948  0.948  0.883  0.843  0.843
Bray-Curtis Distance     0.911  0.943  0.943  0.877  0.843  0.843
Squared Chord Distance   0.447  0.707  0.794  0.799  0.837  0.843
Jaccard Distance         0.843  0.843  0.843  0.843  0.843  0.843

Table 6.4: AAE of Selected Similarity Functions for RUBiS

Similarity Measure       Nn=1   Nn=2   Nn=3   Nn=4   Nn=5   Nn=6
Hamming Distance         0.239  0.439  0.5    0.55   0.699  0.709
Manhattan Distance       0.2    0.5    0.629  0.639  0.699  0.709
Euclidean Distance       0.2    0.5    0.629  0.639  0.699  0.709
Sim 1                    0.66   0.72   0.769  0.72   0.709  0.709
Sim 2                    0.839  0.899  0.899  0.779  0.709  0.709
Bray-Curtis Distance     0.829  0.889  0.889  0.769  0.709  0.709
Squared Chord Distance   0.2    0.5    0.629  0.639  0.699  0.709
Jaccard Distance         0.709  0.709  0.709  0.709  0.709  0.709

study, namely accuracy, root mean squared error (RMSE) and average absolute error (AAE). The results are discussed with reference to the following dimensions:

Role of Nn

If Nn is increased and more neighbors are allowed to contribute to the solution of cp, then accuracy decreases, and RMSE and AAE increase, as shown in Tables 6.2, 6.3 and 6.4. The maximum accuracy obtained using varying values of Nn is shown in Figure 6.4. The decreasing trend is due to the nature of the RUBiS case-base, which contains nominal and binary attributes. For these data types, every mismatch between two values of an attribute is treated equally, i.e. all mismatches are equidistant.


Selection of Similarity Measure

This empirical investigation shows that three of the selected similarity measures perform equally well, as shown in Tables 6.2, 6.3 and 6.4: Manhattan distance, Euclidean distance and Squared Chord distance. Hamming distance also performed close to these similarity measures. In this investigation, no adaptation algorithm was applied, and the comparison was made on the basis of the first solution suggested by CBA.

Role of Adaptation Algorithm

The iterative adaptation strategy adopted in this case study yielded promising results. The experiment was repeated three times and an accuracy of up to 98% was achieved. The results are shown in Tables 6.5, 6.6 and 6.7. As the number of iterations in the adaptation strategy increases, accuracy increases, and RMSE and AAE decrease: no adaptation, one iteration and two iterations yielded 80%, 93% and 98% accuracy respectively.

Lower Bound in CBA Performance

An interesting observation in this study is that the performance of CBA becomes constant after Nn = 6. Another observation is that all similarity measures used in this

case study exhibit similar performance at the maximum value of Nn.

Recommended Model of CBA for RUBiS

On the basis of this empirical study, we selected the CBA model with Nn = 1; Manhattan, Euclidean or Squared Chord distance as the similarity measure; and the iterative adaptation strategy with up to three iterations.


Table 6.5: Accuracy of Adaptation Algorithm for RUBiS

Similarity Measure       No Adaptation  One Iteration  Two Iterations
Hamming Distance         0.76           0.92           0.98
Manhattan Distance       0.8            0.93           0.98
Euclidean Distance       0.8            0.93           0.98
Sim 1                    0.34           0.51           0.69
Sim 2                    0.16           0.32           0.51
Bray-Curtis Distance     0.17           0.33           0.52
Squared Chord Distance   0.8            0.93           0.98
Jaccard Distance         0.29           0.45           0.72

Table 6.6: RMSE of Adaptation Algorithm for RUBiS

Similarity Measure       No Adaptation  One Iteration  Two Iterations
Hamming Distance         0.489          0.283          0.141
Manhattan Distance       0.447          0.245          0.141
Euclidean Distance       0.447          0.245          0.141
Sim 1                    0.812          0.7            0.557
Sim 2                    0.916          0.824          0.7
Bray-Curtis Distance     0.911          0.818          0.693
Squared Chord Distance   0.447          0.245          0.141
Jaccard Distance         0.843          0.742          0.529

Table 6.7: AAE of Adaptation Algorithm for RUBiS

Similarity Measure       No Adaptation  One Iteration  Two Iterations
Hamming Distance         0.239          0.079          0.019
Manhattan Distance       0.2            0.07           0.019
Euclidean Distance       0.2            0.07           0.019
Sim 1                    0.66           0.49           0.31
Sim 2                    0.839          0.68           0.49
Bray-Curtis Distance     0.829          0.67           0.479
Squared Chord Distance   0.2            0.07           0.019
Jaccard Distance         0.709          0.55           0.28


6.2 Applying CBR Based Self-Management Algo-

rithms on Internalization Based Systems

6.2.1 Case Study: Autonomic Forest Fire Application

Autonomic Forest Fire Application (AFFA) [61], [62] was selected for the experiments based on the proposed CBR-based internalization architecture of autonomic

computing. The application forecasts the strength, speed and direction of the forest

fire as it propagates under various conditions. This prediction process is based on

various dynamically changing environmental conditions. The application consists of

four components as shown in Figure 6.5: Data Space Manager (DSM), Computa-

tional Resource Manager (CRM), Rothermel and Wind Model. DSM represents the

forest as a two-dimensional data space of cells. CRM keeps records of the available

resources (like the information about burning, burnt or unburnt cells which simulate

the load across a grid) and keeps DSM informed about them. Each cell may have

unburned, burning, or burnt status. Rothermel computes the value and direction of

fire spread based on wind direction and intensity provided by Wind Model. We have

customized this application to predict the probability of a particular cell to be burnt.

On the basis of this probability, CRM may run a configuration script to update its configurations, adopt precautionary protective measures and reconfigure the forecasted parameters. AFFA is a rule-based self-configuration

application. These rules have been defined in the operational port of the autonomic

component. The case-base CB was created by recording ⟨α, β, γ⟩ triplets obtained by executing the rule-base, where α is the attribute, β is the corresponding value of α

and γ is the outcome of the rule. The direct and indirect attributes were extracted

from a description of the rules and ports of the autonomic component given in [61],

[62].


Figure 6.5: AFFA Architecture with CBA (the CaseBasedAutonome (CBA) attached to the DSM, CRM, Wind Model and Rothermel components through the functional, control and operational ports)

AFFA uses the Accord programming framework [62] and comprises two major parts [61], [62]: the computational component and the element manager, discussed below.

6.2.2 Computational Component in AFFA

The computational part is the managed element and constitutes the core business functionality of the whole autonomic application. In the conventional architecture of the autonomic element, it is also known as the managed resource. In our simulation, the computational component represents the dynamic fire growth under different environmental conditions in the forest. The forest is represented by a two-dimensional grid, in our simulation a 6 × 6 grid. Following the Accord framework [62], it defines

the following three ports:

Functional Port

This part of the autonomic application represents and manages the interactions of

an autonomic component with other components and is represented as Π. It defines

an input-output set of functionalities η ∈ Ω × Λ, where Ω represents the functionalities being used by the autonomic component and Λ represents the functionalities

being exposed by this component to other components. This port makes no contribution to this case study because handling the functionality exposed by an autonomic component to the outside environment is outside the scope of this research work.

Control Port

The control port (Φ) of the autonomic component defines the relation between the managed resource and the element manager for observing the current state and actuating the self-management plan. As the autonomic manager in this testbed is CBA, it interacts with the computational component using Φ.

Operational Port

This is another core part of the autonomic component which maintains rules and

constraints used by the element manager and is represented as Γ. In this testbed, it

maintains the case-base CB to plan the remedial action of the current case Cp. CBA

interacts with the knowledge-base through Γ during various steps of the CBR-based

problem solving process.

6.2.3 CBA Implementation for Element Manager of AFFA

The element manager is an embedded autonomic manager responsible for invoking appropriate rules from the operational port. The element manager is implemented as a rule agent in the Accord framework [62]; the rule agent has been replaced by CBA in this case study, as shown in Figure 6.5.

As AFFA is an internalization-based application, the autonomic manager, which is essentially CBA with the application ports, is not implemented as an isolated layer.

A synthetic case-base of 500 cases of AFFA simulation was prepared using the rules

and information of various ports given in [61], [62]. Details of CBA implementation


in this testbed are discussed as follows:

Case Representation

For testing purposes, five parameters were used to describe the problem along with

two dependent parameters to represent the solution. Selected parameters include

Number of Burning Cells in the Grid (N), Wind Direction with reference to X (WD),

Fire Speed (FS), Wind Intensity (WI) and Minimum Distance from the Burning

Cell (MD). The two dependent attributes to be predicted are the Predicted Probability (PP) that cell X will be exposed to fire and the Configuration Script (CS) based on the PP value. PP values are distributed into four equidistant bins, where each bin represents a CS.

Each case of the Autonomic Forest Fire Application is represented as a tuple of the above-mentioned parameters:

Ci = ⟨N, WD, FS, WI, MD, PP, CS⟩    (6.8)
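As a concrete illustration, the case structure of Equation 6.8 can be sketched as a small record type. The Python rendering, the field types and the helper method are assumptions for illustration; the thesis does not prescribe an implementation language.

```python
from dataclasses import dataclass

# Hypothetical sketch of one AFFA case (Equation 6.8); field names
# follow the thesis, numeric scaling to [0, 1] is assumed.
@dataclass
class Case:
    n: float    # Number of Burning Cells in the Grid (N)
    wd: float   # Wind Direction with reference to X (WD)
    fs: float   # Fire Speed (FS)
    wi: float   # Wind Intensity (WI)
    md: float   # Minimum Distance from the Burning Cell (MD)
    pp: float   # Predicted Probability that cell X is exposed to fire
    cs: int     # Configuration Script bin derived from PP

    def problem(self) -> tuple:
        """The five independent attributes describing the problem part."""
        return (self.n, self.wd, self.fs, self.wi, self.md)
```

The split between `problem()` and the two solution fields mirrors the problem/solution pair stored in the case-base.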

Case Retrieval

The leave-one-out (LOO) cross-validation technique was used for the experiments

in this case study. Only those similarity measures were selected from Table 4.1 that are applicable to numeric datasets. We tested these similarity measures using different numbers of nearest neighbors.
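A minimal sketch of this retrieval step, assuming cases are stored as (problem-vector, solution) pairs and using Euclidean distance, one of the measures tested from Table 4.1, as the similarity measure:

```python
import math

def retrieve(case_base, problem, nn=1):
    # Return the Nn cases most similar to the new problem
    # (smallest Euclidean distance first).
    return sorted(case_base, key=lambda case: math.dist(case[0], problem))[:nn]
```

Under leave-one-out cross-validation, each case in turn would be removed from `case_base` and used as the query `problem`.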

Case Reuse

The case reuse phase devises the solution of the current problem using the retrieved set NN. We applied the weighted arithmetic average (WAA) as the solution algorithm to devise the solution of Cp. The WAA is given in Equation 6.9.

solp = Σq∈NN wpq solq    (6.9)


Where,

wpq = simpq / Σq∈NN simpq    (6.10)
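Equations 6.9 and 6.10 can be sketched as follows. Taking sim_pq as the inverse of the Euclidean distance is an assumption made here for illustration, since the thesis evaluates several similarity measures:

```python
import math

def waa(neighbors, problem, eps=1e-9):
    # neighbors: (problem_vector, solution) pairs, the retrieved set NN.
    # sim_pq is taken as inverse Euclidean distance (an assumption);
    # eps guards against division by zero on exact matches.
    sims = [1.0 / (math.dist(p, problem) + eps) for p, _ in neighbors]
    total = sum(sims)                        # denominator of Eq. 6.10
    # Eq. 6.9: sol_p = sum over q in NN of w_pq * sol_q
    return sum((s / total) * sol for s, (_, sol) in zip(sims, neighbors))
```

Because the weights are normalized by the total similarity, the result always lies within the range of the neighbors' solutions.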

Case Revision

The case revision strategy adopted in this case study involves varying Nn. If the current solution does not fit the current scenario, we re-iterate Algorithm 3 with a different Nn until an optimal value of Nn is found. It is evident from the empirical results that this approach yields up to 90% accuracy, as shown in Table 6.8. This revision strategy differs from that of RUBiS due to the difference between the two datasets.
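The revision loop described above might be sketched as follows. The acceptance test is application-specific, and the helper bundles retrieval and WAA reuse for brevity; all names are illustrative, not the thesis's own code:

```python
import math

def waa_solution(case_base, problem, nn):
    # Retrieve the nn nearest cases and combine their solutions with
    # inverse-distance weights (an illustrative similarity choice).
    nearest = sorted(case_base, key=lambda c: math.dist(c[0], problem))[:nn]
    sims = [1.0 / (math.dist(p, problem) + 1e-9) for p, _ in nearest]
    return sum(s * sol for s, (_, sol) in zip(sims, nearest)) / sum(sims)

def revise(case_base, problem, acceptable, max_nn=8):
    # Re-iterate retrieval + reuse with a different Nn until the proposed
    # solution fits the current scenario (acceptance is app-specific).
    sol = None
    for nn in range(1, max_nn + 1):
        sol = waa_solution(case_base, problem, nn)
        if acceptable(sol):
            return sol, nn
    return sol, max_nn  # fall back to the last attempt
```

The cap of eight neighbors reflects the empirical finding that no optimal point beyond Nn = 8 was observed.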

Case Retention

Once a solution of a new problem has been suggested, the problem-solution pair is retained in the case-base as a new case if it proved successful. In this way, the experience base keeps growing as well as getting richer. We retain a new problem-solution pair only if it does not match exactly with any of the existing cases. This affects retrieval efficiency when the case-base grows too large; we intend to investigate this problem in future work.
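A sketch of this retention policy, again with cases stored as (problem, solution) pairs:

```python
def retain(case_base, problem, solution, successful):
    # Retain only successful cases whose problem part does not exactly
    # match an existing case, so the experience base grows richer
    # without storing duplicates.
    if successful and all(p != problem for p, _ in case_base):
        case_base.append((problem, solution))
    return case_base
```

Note that only exact matches are rejected; near-duplicates still accumulate, which is the growth problem flagged above.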

6.2.4 Results and Discussion for AFFA

Accuracy, root mean squared error (RMSE) and average absolute error (AAE) have

been selected as the performance measures of the proposed methodology for this case

study. The results of this empirical investigation have been analyzed across the following dimensions.


Table 6.8: Accuracy of Selected Similarity Functions for AFFA

Similarity Measure    Nn=1  Nn=2  Nn=3  Nn=4  Nn=5  Nn=6  Nn=7  Nn=8
Manhattan Distance    0.84  0.84  0.88  0.87  0.89  0.88  0.89  0.9
Euclidean Distance    0.88  0.88  0.89  0.89  0.89  0.89  0.9   0.9
Sim 1                 0.58  0.57  0.65  0.64  0.69  0.7   0.71  0.72
Sim 2                 0.58  0.57  0.61  0.61  0.61  0.62  0.61  0.6
Canberra Distance     0.79  0.79  0.82  0.82  0.82  0.83  0.83  0.83
Bray-Curtis Distance  0.83  0.83  0.84  0.86  0.84  0.85  0.84  0.85
Sq. Chord Distance    0.85  0.85  0.87  0.86  0.87  0.87  0.87  0.88
Sq. Chi-Sq. Distance  0.85  0.85  0.87  0.86  0.87  0.88  0.87  0.88
Jaccard Distance      0.6   0.59  0.67  0.68  0.66  0.67  0.68  0.68

Table 6.9: RMSE of Selected Similarity Functions for AFFA

Similarity Measure    Nn=1   Nn=2   Nn=3   Nn=4   Nn=5   Nn=6   Nn=7   Nn=8
Manhattan Distance    0.101  0.08   0.078  0.078  0.077  0.076  0.077  0.077
Euclidean Distance    0.09   0.08   0.076  0.076  0.076  0.076  0.075  0.075
Sim 1                 0.181  0.145  0.141  0.137  0.134  0.131  0.13   0.128
Sim 2                 0.173  0.148  0.137  0.134  0.134  0.135  0.135  0.136
Canberra Distance     0.117  0.098  0.092  0.092  0.092  0.092  0.091  0.091
Bray-Curtis Distance  0.1    0.09   0.085  0.084  0.084  0.085  0.085  0.085
Sq. Chord Distance    0.1    0.082  0.078  0.078  0.079  0.078  0.078  0.079
Sq. Chi-Sq. Distance  0.099  0.083  0.078  0.078  0.078  0.077  0.078  0.078
Jaccard Distance      0.169  0.144  0.135  0.135  0.131  0.129  0.129  0.128


Table 6.10: AAE of Selected Similarity Functions for AFFA

Similarity Measure    Nn=1   Nn=2   Nn=3   Nn=4   Nn=5   Nn=6   Nn=7   Nn=8
Manhattan Distance    0.04   0.041  0.044  0.048  0.049  0.051  0.053  0.055
Euclidean Distance    0.031  0.038  0.041  0.044  0.046  0.048  0.05   0.052
Sim 1                 0.114  0.112  0.108  0.104  0.098  0.096  0.096  0.094
Sim 2                 0.111  0.112  0.111  0.112  0.116  0.118  0.119  0.12
Canberra Distance     0.054  0.058  0.06   0.062  0.064  0.066  0.068  0.07
Bray-Curtis Distance  0.042  0.046  0.047  0.051  0.054  0.056  0.058  0.059
Sq. Chord Distance    0.039  0.039  0.043  0.046  0.049  0.051  0.053  0.054
Sq. Chi-Sq. Distance  0.038  0.039  0.043  0.046  0.048  0.05   0.052  0.053
Jaccard Distance      0.106  0.103  0.101  0.101  0.103  0.103  0.103  0.103

Role of Nn

In this experimental setup, the value of Nn was varied from 1 to the size of the case-base to find the value of Nn at which optimal accuracy is obtained for each similarity measure. Optimal values for all similarity functions were found for Nn in the range 1 to 8. As the results in Tables 6.8, 6.9 and 6.10 depict, the variation in accuracy stays within 2%, which is not significant as the value of Nn increases. This behavior, with the maximum possible accuracy, is shown in Figure 6.6. It is due to the numeric nature of the AFFA case-base, in which all numeric attributes are scaled between 0 and 1.

Selection of Similarity Measure

Euclidean distance can be inferred as the best similarity measure for this case study, as it gives the maximum accuracy and the minimum RMSE and AAE among all similarity measures, as shown in the Nn = 1 columns of Tables 6.8, 6.9 and 6.10. Other close


Figure 6.6: Effect of Nn on Maximum Accuracy in AFFA Testbed

similarity measures are Manhattan distance, Bray-Curtis distance, Squared Chord

distance and Squared Chi-Squared distance. In this investigation, no adaptation

algorithm was applied, and the comparison was made on the basis of the first solution suggested by CBA using Nn.

Role of Adaptation Algorithm

The adaptation strategy adopted in this case study was to vary the value of Nn. This strategy yielded accuracy up to 90% at Nn = 7 for Euclidean distance and Nn = 8 for Manhattan distance. The results shown in Tables 6.8, 6.9 and 6.10 also indicate that this adaptation algorithm yielded up to 88% accuracy for Squared Chord distance and Squared Chi-Squared distance. With the increase of Nn in the adaptation strategy, no further optimal point beyond Nn = 8 was observed in this empirical investigation.

Recommended Model for CBA of AFFA

On the basis of this empirical study, we selected the CBA model with Nn = 1 and Manhattan distance, Euclidean distance, Squared Chord distance or Squared Chi-Squared distance as the similarity measure. The adaptation strategy may give better performance, but the improvement observed is only up to 2%. We recommend Nn = 1 because it yields up to 88% accurate results and offers minimum


Table 6.11: Performance Comparison of CBR with Other Machine Learning Approaches on RUBiS

Algorithm      Accuracy  Recall  Precision
Decision Tree  70%       80%     95.83%
Naive Bayes    80%       87.5%   100%
ID3            70%       80%     95.83%
Bayes Net      80%       87.5%   100%
SVM            70%       80%     95.83%
Cart           70%       80%     63.89%
J48            70%       80%     95.83%
RandomForest   70%       80%     63.89%
CBR            70%       79.75%  63.89%

computational overhead. We recommend all four similarity measures as good candidates for the selected model because there is no abrupt major change in their behavior.

6.3 Comparison of CBR with Other Machine Learning Approaches

For both case studies of autonomic computing, CBR has been compared with existing machine learning approaches, for which we used the Weka implementations [37], [90], [97]. 30% of the data was used for training and 70% for testing. The performance of CBR was compared with the other approaches in terms of accuracy, recall and precision. It has been found to be among the well-performing approaches on both datasets, as shown in Tables 6.11 and 6.12 respectively.

The RUBiS dataset is a discrete dataset containing nominal and binary attributes. A discrete problem domain results in a better estimation of prior probabilities. Therefore, the probabilistic approaches, including Naive Bayes and Bayes network, performed better than CBR on the RUBiS dataset. However, the performance of CBR keeps improving continuously as more knowledge is added to the case-base.


Table 6.12: Performance Comparison of CBR with Other Machine Learning Approaches on AFFA

Algorithm      Accuracy  Recall  Precision
Decision Tree  46%       25%     46%
Naive Bayes    60%       39.53%  55.49%
ID3            46%       25%     46%
Bayes Net      60.67%    39.86%  44.41%
SVM            65.33%    34.23%  65%
Cart           54.67%    28.51%  54.55%
J48            46%       25%     46%
RandomForest   46%       25%     46%
CBR            86%       70%     80%

The AFFA dataset is a continuous dataset containing real-valued attributes, which makes the problem domain more complicated. Learning in a continuous domain is a difficult task, and CBR outperformed the other approaches on the AFFA dataset.

6.4 Applying Clustered CBR Approach for Self-Management Capabilities

For AFFA, a case consists of seven parameters: five of these parameters define the

current state of the forest, the sixth parameter defines the predicted probability of the

spread of the forest fire, and the seventh parameter represents the name of a configuration script which is executed based on the value of the predicted probability.

purposes, three case-bases of AFFA of sizes 500, 2000 and 10000 cases were generated

and used as part of the knowledge repository in autonomic computing cycle.

6.4.1 Clustering the Case-Base

After generating the case-bases of n = 500, n = 2000 and n = 10000, the case-base partitioning algorithm given in Figure 5.3 was applied to cluster each of the case-bases. Nine separate case studies were conducted to evaluate the performance of the proposed approach, employing a separate clustering algorithm and a separate


case-base for each case-study. The clustering algorithms include k-Means, Farthest-

First and density-based algorithms. These algorithms were run using their Weka

implementations [37], [90], [97]. The value of k was varied from 2 to n/2 for each

case-study. Various simulations were undertaken using the three clustering algo-

rithms and an analysis to find the most appropriate value of k that satisfied the

objective was conducted.
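The thesis runs the Weka implementations of these algorithms; purely to illustrate the partitioning step, a minimal k-Means sketch in Python (all names assumed) could look like this:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    # Minimal k-Means sketch of case-base partitioning: repeatedly
    # assign each point to its nearest centroid, then move each
    # centroid to the mean of its cluster.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(centroids[i], p))
            clusters[nearest].append(p)
        # Keep a centroid in place if its cluster came out empty.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters
```

Sweeping `k` from 2 to n/2, as in the experiments, would simply call this partitioner once per candidate value of k.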

6.4.2 Implementing the Clustered-CBR Cycle

As the first step of the clustered CBR cycle, the new case was classified into the most relevant cluster. The similarity of the current case with all cases within that particular cluster was then computed. A Euclidean distance based similarity measure, given in Equation 6.11, was used for this purpose because it was among the similarity measures that gave the highest accuracy with the conventional CBR approach.

dij = √( Σk zk (cik − cjk)² )    (6.11)

Where dij is the distance between the ith and jth cases, cik and cjk are the kth attributes of cases ci and cj respectively, zk is the weight of the kth attribute, and the sum runs over the case attributes. The similarity between the new case and each case in the cluster was used to weigh the contribution of each case towards the solution, as given in Equation 6.12.

wi = (1 / dij) / Σk=1..size(cluster) (1 / djk)    (6.12)

Where wi is the weight of the ith case in the solution finding process. The solution of the new case is computed using the weighted average [54] of all the cases in the corresponding cluster, as shown in Equation 6.13.

Solp = ( Σi=1..size(cluster) wi Soli ) / ( Σi=1..size(cluster) wi )    (6.13)
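The three steps of the clustered cycle (classify the new case into a cluster, weigh the cluster's cases by inverse distance, combine their solutions by the weighted average) can be sketched as follows; uniform attribute weights z_k are assumed, and the data layout is illustrative:

```python
import math

def clustered_solution(centroids, clusters, new_case):
    # Step 1: classify the new case into the nearest cluster (by centroid).
    idx = min(range(len(centroids)),
              key=lambda i: math.dist(centroids[i], new_case))
    cluster = clusters[idx]  # list of (problem_vector, solution) pairs
    # Step 2: inverse-distance weights over the cases of that cluster
    # (Euclidean distance as in Eq. 6.11, normalized as in Eq. 6.12).
    w = [1.0 / (math.dist(p, new_case) + 1e-9) for p, _ in cluster]
    # Step 3: weighted average of the stored solutions (Eq. 6.13).
    return sum(wi * sol for wi, (_, sol) in zip(w, cluster)) / sum(w)
```

Confining steps 2 and 3 to one cluster is what reduces the retrieval cost relative to the unclustered cycle.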


6.4.3 Performance Analysis

The proposed approach has been implemented and tested using the Leave-One-Out (LOO) validation technique. Each of the n cases was treated in turn as the test case, and the remaining n − 1 cases acted as the training cases. There are two dependent

solution parameters: PP and CS. The CS parameter is a nominal variable, so it is

treated as a class label. There are four different configuration scripts which result in

a four-class problem. We construct a binary confusion matrix for every class versus

all other classes. For each experiment, four confusion matrices are constructed and

accuracy, recall and precision are computed individually. Later, they are averaged to

compute the overall performance of the experiment. Format of a confusion matrix

is shown in Table 6.13. In this matrix, a represents positive examples classified cor-

rectly, b represents positive examples misclassified, c represents negative examples

misclassified and d represents negative examples classified correctly. Based on the

values of a, b, c and d, three evaluation measures Accuracy, Recall and Precision are

computed as shown in the Equations 6.14, 6.15 and 6.16 [83], [94] for the positive

class.

Accuracy = (a + d) / (a + b + c + d)    (6.14)

Recall = a / (a + c)    (6.15)

Precision = a / (a + b)    (6.16)
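Equations 6.14 to 6.16 over the Table 6.13 layout can be checked directly; feeding in the Table 6.14 entries for CS1 reproduces the CS1 row of Table 6.18 after rounding:

```python
def arp(a, b, c, d):
    # Table 6.13 layout: a = true positives, b = false positives,
    # c = false negatives, d = true negatives.
    accuracy = (a + d) / (a + b + c + d)   # Eq. 6.14
    recall = a / (a + c)                   # Eq. 6.15
    precision = a / (a + b)                # Eq. 6.16
    return accuracy, recall, precision

# Table 6.14 (CS1): a=19, b=2, c=24, d=455
# arp(19, 2, 24, 455) ≈ (0.948, 0.442, 0.905),
# i.e. 0.95 / 0.44 / 0.90 rounded, matching the CS1 row of Table 6.18.
```

The per-class results would then be averaged over the four configuration-script classes, as described above.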

Sample computations for one experimental setup with n = 500, using the k-Means clustering algorithm at k = 60, are outlined in Tables 6.14, 6.15, 6.16, 6.17 and 6.18. Similar experiments were conducted for each n and each clustering algorithm while


Table 6.13: Format of a Confusion Matrix

                    Actual Positive  Actual Negative
Predicted Positive  a                b
Predicted Negative  c                d

Table 6.14: Sample Confusion Matrix for CS1

                    Actual Positive  Actual Negative
Predicted Positive  19               2
Predicted Negative  24               455

varying the value of k from 2 to n/2; the results are aggregated in Figures 6.7 to 6.15.

Accuracy, recall and precision capture different aspects of the performance of

the prediction approach. No individual performance measure can depict the actual

performance [83], [94]; considering any single measure in isolation may lead to incorrect conclusions. For example, in a case-base of size n = 10000, if there are 100 pos-

itive cases and the proposed approach predicts all of the cases as negative then

Precision = 100% and Accuracy = 99%. Both of these measures show excellent

results for a poor classifier whose Recall = 0%. Therefore, Recall plays significant

role to measure the effectiveness of classifier in this scenario. Similarly, if the pro-

posed approach predicts all of the cases as positive then its Recall = 100% but

Accuracy = 1% and Precision = 1%. In another extreme case, 600 cases out of

n = 11, 100 cases are positive. If the proposed approach is poor in predicting the

positives and predicts 500 positive cases as positives, 500 negative cases as positives

and 10000 negative cases are predicted as negative, then its Accuracy = 95% and Recall = 83% whereas Precision = 50%. In this case, precision plays the critical role in judging the quality of the prediction approach. Therefore, we have opted to use these three evaluation measures collectively, referred to as the ARP analysis.

As stated earlier, the performance of three clustering algorithms was studied.


Table 6.15: Sample Confusion Matrix for CS2

                    Actual Positive  Actual Negative
Predicted Positive  177              63
Predicted Negative  43               217

Table 6.16: Sample Confusion Matrix for CS3

                    Actual Positive  Actual Negative
Predicted Positive  151              57
Predicted Negative  44               248

The k-Means clustering algorithm [22], [33] partitions the objects based on the mean value of each object cluster. The FarthestFirst clustering algorithm [36] is a fast, simple, approximate clusterer which clusters the given dataset heuristically; it initializes the cluster centers at the farthest possible distances. The density-based clustering algorithm, a Weka implementation, follows an abstract probabilistic approach that estimates the probability of an instance belonging to a cluster from the conditional density of that cluster [26], [90]. In this approach, a cluster is defined as a maximal set of density-connected points. All of these clustering algorithms were applied for different values of k, starting from 2 up to n/2.

Performance Comparison of Clustered versus Unclustered Approach

As discussed in the earlier sections, the main objective of the proposed approach is

performance improvement. The performance is measured in terms of ARP analysis

and efficiency analysis. The detailed ARP and efficiency analysis is presented below:

1. ARP Analysis: Figures 6.7, 6.8 and 6.9 represent the behavior of the three

clustering algorithms in terms of accuracy with different cluster sizes (k) for

three datasets of sizes 500, 2000 and 10000 respectively. The FarthestFirst clustering algorithm performs best of the three clustering algorithms across all experimental setups. The performance of the unclustered approach has also been marked in these figures using a ⋆ sign. These results reveal that the clustered approach


Table 6.17: Sample Confusion Matrix for CS4

                    Actual Positive  Actual Negative
Predicted Positive  26               5
Predicted Negative  16               453

Table 6.18: Sample ARP Calculations

Class Name  Accuracy  Recall  Precision
CS1         0.95      0.44    0.90
CS2         0.79      0.80    0.74
CS3         0.80      0.77    0.73
CS4         0.96      0.62    0.84
Overall     0.87      0.66    0.80

performs almost the same as, or marginally better than, the unclustered approach, reaching up to 94% accuracy (the performance of the FarthestFirst clustering algorithm for n = 10000 at k = 2012). In terms of recall, the behavior of the FarthestFirst clustering algorithm is much better than that of the other clustering algorithms and the unclustered approach, as shown in Figures 6.10, 6.11 and 6.12, resulting in up to 87% recall (again for FarthestFirst with n = 10000 at k = 2012). Here again, the conventional unclustered CBR results deteriorate in comparison with the clustered approach. Our proposed approach exhibits considerable improvement over the unclustered approach in terms of recall because the solution space is now confined to the relevant cluster. The precision of the clustered approach is also much higher than that of the conventional unclustered approach, reaching up to 85% precision, as evident from Figures 6.13, 6.14 and 6.15 (FarthestFirst with n = 10000 at k = 2012). In terms of the overall ARP analysis, the FarthestFirst-based clustered CBR approach performed much better than the other two clustered approaches and the conventional CBR approach, and based on these empirical results it is recommended for use as part of Algorithm 1.


Figure 6.7: Accuracy of Clustered Approach vs Unclustered Approach for n = 500 (unclustered accuracy: 0.853)

Figure 6.8: Accuracy of Clustered Approach vs Unclustered Approach for n = 2000 (unclustered accuracy: 0.894)

2. Efficiency Analysis: The efficiency analysis presented in this research work

is based on the computational performance, specifically cost incurred in the

retrieval phase of the CBR cycle, of the clustered versus unclustered approach.

In the case of the unclustered approach, 500, 2000 and 10000 comparisons were required to find a new solution for n = 500, n = 2000 and n = 10000 respectively, whereas retrieval of similar cases in the proposed clustered CBR approach exhibited a significant improvement, as shown in Table 6.19. The table presents the improvement in retrieval efficiency in

terms of best case, worst case and average case analysis for the three clustering


Figure 6.9: Accuracy of Clustered Approach vs Unclustered Approach for n = 10000 (unclustered accuracy: 0.905)

Figure 6.10: Recall of Clustered Approach vs Unclustered Approach for n = 500 (unclustered recall: 0.420)

algorithms used in this study. The best, worst and average case analyses are

based on the smallest cluster size, largest cluster size and average cluster size

respectively obtained through the application of Algorithm 1 of the proposed

approach at optimal values of k. Optimal values of k in each experimental

setup were determined using Algorithm 1.

Efficiency improvement is defined and computed as the percentage decrease in the number of comparisons needed in the retrieval phase of the CBR cycle. The results presented in Table 6.19 reveal that the FarthestFirst clustering algorithm is the best candidate in terms of efficiency as well, performing better in all experimental setups.
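The percentage decrease reported in Table 6.19 can be computed as follows; whether the centroid comparisons of the classification step are counted is not stated, so they are ignored here (an assumption):

```python
def retrieval_improvement(n, comparisons):
    # Percentage decrease in retrieval comparisons: the unclustered
    # approach needs n comparisons, the clustered approach roughly
    # the size of the matched cluster.
    return 100.0 * (n - comparisons) / n
```

Plugging in the smallest, largest and average cluster sizes at the optimal k yields the best-case, worst-case and average-case columns of Table 6.19.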


Figure 6.11: Recall of Clustered Approach vs Unclustered Approach for n = 2000 (unclustered recall: 0.470)

Figure 6.12: Recall of Clustered Approach vs Unclustered Approach for n = 10000 (unclustered recall: 0.489)

3. Effect of k on Performance: For the proposed approach to work, a reasonable

value of k has to be determined. The value of k should be such that the overall

retrieval cost in the CBR cycle reduces to a reasonable level without affecting

the accuracy of the new solution significantly. For the case study, k was varied from 2 to n/2 and a record of the average absolute error obtained for each scenario was maintained. This study revealed that k ≫ √n, which gives an indication of the impact of k on the improvement of the overall efficiency of the proposed approach. This estimate is based on comparing the optimal number of clusters with the size of the case-base.


Figure 6.13: Precision of Clustered Approach vs Unclustered Approach for n = 500 (unclustered precision: 0.359)

Figure 6.14: Precision of Clustered Approach vs Unclustered Approach for n = 2000 (unclustered precision: 0.396)

6.5 Performance of the Clustered CBR Approach on CTG Case Study

6.5.1 Case Study: Cardiotocography (CTG)

The theoretical justification for the performance of the clustered CBR approach versus the conventional CBR approach has been discussed in Chapter 5, and it is clear that the approach promises sustainable performance not only in the autonomic context but also on general case-bases. This case study [19], [25] is aimed at supporting the theoretical justification and empirical results of the previous sections. The Cardiotocog-

raphy (CTG) dataset contains different measurements of fetal heart rate (FHR) and

uterine contraction (UC) features. The instances have been classified into 10 classes


Figure 6.15: Precision of Clustered Approach vs Unclustered Approach for n = 10000 (unclustered precision: 0.406)

Table 6.19: Percentage Improvement in Retrieval Efficiency for Optimal Values of k

n      Clustering Algorithm  kopt  Best Case  Average Case  Worst Case
500    k-Means               36    99.80%     96.40%        93.00%
500    FarthestFirst         123   99.80%     98.60%        97.40%
500    Density-based         63    99.00%     62.10%        25.20%
2000   k-Means               500   99.95%     99.47%        99.00%
2000   FarthestFirst         440   99.95%     99.55%        99.15%
2000   Density-based         305   99.90%     52.75%        5.60%
10000  k-Means               3021  99.99%     99.88%        99.77%
10000  FarthestFirst         2012  99.99%     99.91%        99.84%
10000  Density-based         4001  99.99%     99.08%        98.16%

by domain experts. The dataset contains 20 real-valued attributes representing patient statistics for disease diagnosis and has 2126 instances.

6.5.2 Attributes of CTG Dataset

A brief description of the attributes, as given in the UCI machine learning repository [25], is presented in Table 6.20.

6.5.3 Recommended Model for CTG Dataset

The results of the clustered CBR approach on CTG are presented in Figure 6.16. The following experimental setup has been used in this implementation:


Table 6.20: Description of the Attributes of CTG Dataset [25]

Sr. No.  Attribute  Description
1        AC         Number of accelerations per second
2        FM         Number of fetal movements per second
3        UC         Number of uterine contractions per second
4        DL         Number of light decelerations per second
5        DS         Number of severe decelerations per second
6        DP         Number of prolonged decelerations per second
7        ASTV       Percentage of time with abnormal short term variability
8        MSTV       Mean value of short term variability
9        ALTV       Percentage of time with abnormal long term variability
10       MLTV       Mean value of long term variability
11       Width      Width of FHR histogram
12       Min        Minimum of FHR histogram
13       Max        Maximum of FHR histogram
14       Nmax       Number of histogram peaks
15       Nzeros     Number of histogram zeros
16       Mode       Histogram mode
17       Mean       Histogram mean
18       Median     Histogram median
19       Variance   Histogram variance
20       Tendency   Histogram tendency
21       CLASS      FHR pattern class code (1 to 10)


[Plot: accuracy vs number of clusters k (0 to 350) for k-Means clustering against the unclustered baseline]

Figure 6.16: Accuracy of Clustered vs Unclustered Approach on CTG Dataset

• Clustering Algorithm: k-Means Clustering Algorithm

• Cardinality of Nearest Neighbors: Size of the Cluster

• Optimal Results Achieved At: k = 10

• Average Cluster Size: 213

• Highest Accuracy Achieved at Optimal Point: 67%

• Accuracy of Unclustered Approach: 64%

The above experimental setup was selected based on lessons learnt during the implementation of the clustered CBR approach on AFFA, as presented in Section 6.4. Therefore, in this case study, not all clustering algorithms and similarity measures have been implemented and compared.

At the optimal point, there is a 3% improvement in the accuracy of the clustered approach as compared to the conventional CBR approach. Moreover, the average number of comparisons has been reduced to 213, whereas the conventional approach needed 2126 comparisons. This amounts to a 90% improvement in


Table 6.21: Expected Number of Iterations for Random k using Case-base of Size n = 500

x      p      E(T) = 1/p
0.86   0.92   1.087
0.87   0.92   1.087
0.88   0.76   1.316
0.89   0.68   1.470
0.90   0.18   5.556

Table 6.22: Expected Number of Iterations for Random k using Case-base of Size n = 2000

x      p      E(T) = 1/p
0.88   0.94   1.064
0.89   0.92   1.087
0.90   0.92   1.087
0.91   0.86   1.162
0.92   0.52   1.923

computational performance. The empirical investigation presented in this section thus supports the theoretical justification of the clustered CBR approach, which can be exploited in any general framework where CBR is applicable.
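The retrieval flow of the clustered approach evaluated above (cluster the case-base offline, classify a new problem into its nearest cluster, then run similarity-based retrieval only within that cluster) can be sketched as follows; the toy two-dimensional case-base, k = 2, and the plain k-means routine are illustrative assumptions, not the thesis data or code:

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members)
                                     for xs in zip(*members))
    return centroids, assign

def clustered_retrieve(query, case_base, centroids, assign):
    """Classify the query into its nearest cluster, then compare it only
    against the cases of that cluster instead of the whole case-base."""
    c = min(range(len(centroids)),
            key=lambda i: euclidean(query, centroids[i]))
    cluster = [case for case, a in zip(case_base, assign) if a == c]
    best = min(cluster, key=lambda case: euclidean(query, case[0]))
    return best, len(cluster)  # retrieved case, number of comparisons

# Toy case-base: (problem features, solution label)
cases = [((0.1, 0.2), "sol-A"), ((0.2, 0.1), "sol-A"),
         ((0.9, 0.8), "sol-B"), ((0.8, 0.9), "sol-B")]
points = [c[0] for c in cases]
centroids, assign = kmeans(points, k=2)
best, ncomp = clustered_retrieve((0.15, 0.15), cases, centroids, assign)
print(best[1], ncomp)
```

On this toy case-base the query is compared against 2 cases instead of 4, mirroring the 213-versus-2126 reduction reported for CTG.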

6.6 Applying a Randomized Approach to Improve the Efficiency of the Clustered CBR Approach

We have used three versions of the training case-bases of AFFA, containing 500, 2000 and 10000 cases respectively. Besides these three case-bases, a fourth case-base, also containing 2000 cases, has been used as the test case-base.

6.6.1 Estimating Prior Knowledge

Prior knowledge can be incorporated in many different ways. In this work, we have used random sampling to estimate prior knowledge. For


Table 6.23: Expected Number of Iterations for Random k using Case-base of Size n = 10000

x      p      E(T) = 1/p
0.88   0.94   1.064
0.89   0.93   1.079
0.90   0.92   1.087
0.91   0.80   1.243
0.92   0.69   1.450

Input: A case-base CB containing n cases, a test case-base TCB,
       an accuracy threshold d
Output: Observed number of iterations t

Method:
1. currentAccuracy := 0
2. t := 0
3. While (currentAccuracy < d)
   a. t := t + 1
   b. k := random(2, n/2)
   c. CCB := cluster(CB, k)
   d. For each cj ∈ TCB
      i.   clusterj := classify(cj, CCB)
      ii.  For each cp ∈ clusterj
           1. simp := findSimilarity(cj, cp, SM)
           2. wp := simp
           3. solp := solution(cp)
      iii. End For
      iv.  solj := solutionAlgorithm(sol, w)
      v.   update currentAccuracy
   e. End For
4. End While
5. Output t

Figure 6.17: Implementation to Compute Actual Number of Iterations
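A runnable sketch of this estimator, with the clustering and per-case CBR cycle collapsed into a hypothetical evaluate_accuracy(k) function (in the thesis this is a full clustered CBR run over the test case-base); the fake_accuracy stub and its roughly 20% success rate are assumptions for illustration only:

```python
import random

def observed_iterations(evaluate_accuracy, n, d, seed=0):
    """Keep drawing a random k in [2, n/2] until the clustered CBR
    accuracy reaches threshold d; the number of draws is one sample
    from the geometric distribution of Section 6.6.2."""
    rng = random.Random(seed)
    t = 0
    while True:
        t += 1
        k = rng.randint(2, n // 2)
        if evaluate_accuracy(k) >= d:
            return t

def average_over_runs(evaluate_accuracy, n, d, runs=10):
    """Observed average over independent runs, as in Tables 6.24-6.26."""
    return sum(observed_iterations(evaluate_accuracy, n, d, seed=r)
               for r in range(runs)) / runs

# Illustrative stand-in: about one in five k's reaches the threshold.
def fake_accuracy(k):
    return 0.91 if k % 5 == 0 else 0.85

avg = average_over_runs(fake_accuracy, n=500, d=0.90)
print(avg)  # observed average number of iterations (roughly 1/p)
```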


Table 6.24: Comparison of Expected and Observed Number of Iterations using Case-base of Size n = 500

Accuracy Threshold   Observed   Expected   σ      SE     t-stat   p-value
0.86                 1.0        1.087      0      0      -        -
0.87                 1.4        1.087      0.69   0.22   1.41     0.19
0.88                 2.2        1.316      1.62   0.51   1.73     0.12
0.89                 2.1        1.470      1.10   0.35   1.81     0.10
0.90                 7.6        5.556      6.74   2.13   0.96     0.36

Table 6.25: Comparison of Expected and Observed Number of Iterations using Case-base of Size n = 2000

Accuracy Threshold   Observed   Expected   σ      SE     t-stat   p-value
0.88                 1.0        1.064      0      0      -        -
0.89                 1.7        1.087      0.95   0.30   2.04     0.07
0.90                 1.9        1.087      1.10   0.35   2.34     0.04
0.91                 2.1        1.162      0.99   0.31   2.98     0.01
0.92                 2.2        1.923      1.23   0.39   0.71     0.49

Table 6.26: Comparison of Expected and Observed Number of Iterations using Case-base of Size n = 10000

Accuracy Threshold   Observed   Expected   σ      SE     t-stat   p-value
0.88                 1.0        1.064      0      0      -        -
0.89                 1.4        1.079      0.52   0.16   1.97     0.08
0.90                 1.4        1.087      0.52   0.16   1.92     0.08
0.91                 1.6        1.243      0.70   0.22   1.61     0.14
0.92                 1.7        1.450      0.67   0.21   1.18     0.27


this purpose, accuracy thresholds have been defined, and the frequency of values of k that yield at least the bin accuracy has been computed. The bins cover accuracies from 86% to 92%. For these case-bases, 50 random values of k have been generated and their memberships in the accuracy bins observed. This gives an approximation of the prior knowledge.
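This sampling step can be sketched as follows; the evaluate_accuracy oracle and the fake_accuracy stand-in are hypothetical placeholders for the full clustered CBR evaluation:

```python
import random

def estimate_prior(evaluate_accuracy, n, thresholds, samples=50, seed=0):
    """Estimate p(threshold) = fraction of random k's whose clustered-CBR
    accuracy reaches at least that threshold; this is the success
    probability used in the geometric-distribution analysis below."""
    rng = random.Random(seed)
    ks = [rng.randint(2, n // 2) for _ in range(samples)]
    accs = [evaluate_accuracy(k) for k in ks]
    return {t: sum(a >= t for a in accs) / samples for t in thresholds}

# Illustrative stand-in oracle: accuracy peaks near k = n/10.
def fake_accuracy(k, n=500):
    return 0.95 - abs(k - n // 10) / n

priors = estimate_prior(fake_accuracy, n=500, thresholds=[0.86, 0.90])
print(priors)
```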

6.6.2 Theoretical Estimate of Expected Number of Iterations

The proposed approach follows a geometric distribution, because each trial of a random number of clusters is memoryless. The geometric distribution therefore gives the expected number of iterations until the first success: E(T) = 1/p, where p is the probability that a random k meets the accuracy threshold. The expected value is not the most probable value and does not guarantee the number of iterations of a single experiment; it is the average number of iterations over a series of experiments, which makes it a safe estimate. As the number of iterations is an integer, we take the ceiling of the expected value. Expected numbers of iterations are shown in Tables 6.21, 6.22 and 6.23 for the three training case-bases.
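As a concrete check, the last row of Table 6.21 (p = 0.18 at threshold 0.90 for n = 500) follows directly from the mean of the geometric distribution:

```python
import math

def expected_iterations(p):
    """Mean of a geometric distribution: expected number of random k's
    tried until one first reaches the accuracy threshold."""
    return 1.0 / p

# p = 0.18 for threshold 0.90 on the n = 500 case-base (Table 6.21)
e = expected_iterations(0.18)
print(round(e, 3), math.ceil(e))  # 5.556 and a ceiling of 6
```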

6.6.3 Empirical Estimate of Number of Iterations

Figure 6.17 outlines the implemented estimator of the observed number of iterations needed to reach a user-defined accuracy. The estimator samples the geometric distribution: it keeps generating a new random k until a success occurs, where an outcome counts as a success if the current k yields accuracy above the threshold on the test cases. We repeated this experiment 10 times and took the average, called the observed average. Observed averages over 10 runs for the three case-bases are shown in Tables 6.24, 6.25 and 6.26 respectively.


6.6.4 Validation of Results Using the t-test

The t-test has been used to validate the proposed approach. Five accuracy bins have been considered for each of the case-bases. For each accuracy bin, 10 random experiments have been conducted and the observed mean number of iterations computed. In order to apply the t-test, the standard deviation (σ) has also been computed. The standard error (SE) is then:

SE = σ / √n    (6.17)

The t-statistic is computed as:

t-value = (observed − expected) / SE    (6.18)

As there were 10 runs for each experiment, we take the degrees of freedom (DF) as n − 1 = 9, and the p-value is read from the t-table for DF = 9. Among the experimental setups across the case-bases, nine out of ten show that the difference between the observed mean number of iterations and the expected number of iterations is not significant. This significance test substantiates the proposed approach.
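The computation behind one row of Table 6.24 (threshold 0.87: observed mean 1.4, expected 1.087, σ = 0.69, 10 runs) can be reproduced as follows; the small difference from the tabulated t-stat of 1.41 is rounding in the thesis tables:

```python
import math

def t_statistic(observed_mean, expected, sigma, n):
    """One-sample t-statistic against the geometric-distribution
    expectation, with SE = sigma / sqrt(n) as in Eqs. (6.17)-(6.18)."""
    se = sigma / math.sqrt(n)
    return (observed_mean - expected) / se, se

t, se = t_statistic(1.4, 1.087, 0.69, 10)
print(round(se, 2), round(t, 2))
```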

6.7 Summary

This chapter has presented the empirical results of the proposed approaches. The effectiveness of CBR in the decision-making process of the autonomic manager has been shown on externalization- and internalization-based applications. The clustered CBR approach in the autonomic context and its utilization have been empirically investigated, and the results show improvements in efficiency as well as performance. To resolve the bottlenecks of the clustered approach, the randomized approach exhibits promising performance; its statistical and theoretical validation has been presented.


Chapter 7

Conclusion and Future Research Directions

7.1 Concluding Remarks on Current Research

In the current research work, a CBR-based approach has been presented for the externalization and internalization architectural options of autonomic systems. A CBR-based solution finder for autonomic systems, CaseBasedAutonome (CBA), has been proposed and discussed in implementation-level detail. An empirical investigation has been conducted to explore the effect of different CBR parameters in the externalization and internalization domains. Two case studies, namely RUBiS and AFFA, have been used in this empirical study. The results show that the proposed framework promises up to 98% accuracy in the externalization architecture and 90% accuracy in the internalization architecture. The study also reveals that Manhattan distance and

Squared Chord distance perform as well as Euclidean distance in the externalization case study, whereas Euclidean distance is the best similarity measure for the internalization architecture across all cardinalities of the nearest neighborhood and different adaptation schemes. This variation in performance is due to the nature of the datasets.

The study also discusses the effect of the cardinality (Nn) of the nearest neighborhood and of the adaptation algorithms. No significant improvement in the performance of CBA has been observed for larger values of Nn; smaller values of Nn show adequate performance and have been used to confine the solution space of the monitored problem. The adaptation mechanisms of CBR applied in this research have resulted in 18% and 6% performance improvements in the externalization and internalization architectures respectively.

Conventional CBR-based systems present an efficiency bottleneck in terms of computational cost, especially for self-managing applications. A new clustered

CBR based framework for autonomic systems has been introduced in this research

work. This approach reduces the computational cost and retrieval time without ad-

versely affecting the overall performance of the system in terms of Accuracy, Recall

and Precision (ARP). In the proposed framework, when a new problem is presented

as a case, it is classified into one of the predetermined clusters that acts as a compar-

ison space for the present case rather than the complete case-base. The CBR cycle

is then applied to the selected cluster only, thus considerably reducing the number of

comparisons. This reduces the time complexity and enhances the case retrieval effi-

ciency. To show the effectiveness of the proposed approach, the proposed framework

has been applied to three experimental setups of AFFA. For clustering the case-base,

experiments with three different clustering algorithms have been conducted and com-

pared using their ARP performance and efficiency performance. At an optimal value

of k, Accuracy, Recall and Precision of the proposed approach can be achieved up to

94%, 87% and 85% respectively. Our results also show that the proposed approach

outperforms the conventional unclustered approach with an efficiency improvement

of up to 99%. The empirical investigation conducted in this research work reveals

that FarthestFirst clustering algorithm is the optimal option to be used as building

block of the case-base partitioning algorithm. The proposed approach has been com-

pared with the conventional CBR approach which has been used as the comparison

benchmark.


Deciding the optimal number of clusters for a given set of points is an open problem, and various approximations have been exploited in the literature to deal with it. The problem domain focused on in this research work is CBR-based autonomic systems, where the accuracy of the planned solution is critical. Therefore, accuracy-based selection has been used to partition the case-base into the optimal number of clusters. To avoid a brute-force search, a randomized approach has been proposed to decide a near-optimal number of clusters. For the random generation of kopt, lower and upper bounds have been derived in order to maintain the performance of the clustered CBR approach. Theoretical performance estimates of the proposed approach have been compared with an empirical experimental setup on the case-bases. Empirical results show that the theoretical estimates agree with the empirical estimates in up to 90% of the experimental setups.

7.2 Future Work

The current research work can be extended in various directions; some salient ones are highlighted below:

7.2.1 Service Oriented Architectures for Autonomic Systems

The software engineering literature is rich in architectural options for building conventional software systems, but much less effort has been put into generic architectures for building autonomic systems. Various paradigms have been proposed in the literature to enable self-management capabilities in conventional software systems, but most of them are not generic in nature. Most existing autonomic architectures exhibit three major aspects of specificity: the specific self-management capability, the specific approach used to enable that capability, and the specific application domain.


Some architectures and design issues of autonomic systems have been highlighted in the literature from the software engineering perspective, but implementation-level frameworks have not been widely discussed. A possible future direction would be to propose and implement a hierarchical, service-based generic architecture for enabling self-management capabilities in conventional applications. Such an architecture should consist of a hierarchy of services, with each level representing an abstraction level of the self-management effort. The highest level is the autonomic manager, which directly interfaces with the managed element. The second level contains the local self-* services dedicated to different self-management capabilities; control is delegated to one of them through an autonomic service integrator that is part of the global autonomic manager. The third level constitutes intelligent services (ISs), which implement various diagnostic and planning algorithms. ISs are published by an intelligent services manager (ISM), and an intelligent services selector (ISS) is used to pick the IS that seems appropriate for the particular scenario at hand.
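A minimal sketch of this three-level hierarchy is shown below; every class and method name is hypothetical, invented for illustration rather than taken from an existing framework:

```python
class IntelligentService:
    """Level 3: a diagnostic/planning algorithm published by the ISM."""
    def __init__(self, name, plan):
        self.name, self.plan = name, plan

class IntelligentServicesManager:
    """Publishes ISs and (in the ISS role) selects one per scenario."""
    def __init__(self):
        self._services = {}
    def publish(self, service):
        self._services[service.name] = service
    def select(self, scenario):
        return self._services[scenario["preferred_is"]]

class SelfStarService:
    """Level 2: a local self-* service (e.g. self-healing)."""
    def __init__(self, capability, ism):
        self.capability, self.ism = capability, ism
    def handle(self, scenario):
        return self.ism.select(scenario).plan(scenario)

class AutonomicManager:
    """Level 1: interfaces with the managed element and delegates
    control to the relevant self-* service (integrator role)."""
    def __init__(self, services):
        self.services = {s.capability: s for s in services}
    def manage(self, scenario):
        return self.services[scenario["capability"]].handle(scenario)

ism = IntelligentServicesManager()
ism.publish(IntelligentService("restart-planner",
                               lambda sc: f"restart {sc['element']}"))
am = AutonomicManager([SelfStarService("self-healing", ism)])
print(am.manage({"capability": "self-healing",
                 "preferred_is": "restart-planner",
                 "element": "web-tier"}))
```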

7.2.2 Solution Composition for Autonomic Systems

In CBR-based autonomic managers, the current research reuses existing solutions in the form of configuration scripts, which are executed on the managed element once the decision-making process of CBR drills down to a specific script. An interesting future research direction would be to investigate the need to combine existing solution scripts to solve system problems that might not be solvable by individual scripts. Such work would require a solution composition mechanism, analogous to service composition techniques in service-oriented architectures.


7.2.3 Case-base Maintenance Strategies

The current research presented and exploited an innovative clustering-based case-base maintenance mechanism. However, determining the optimal number of partitions for a problem space is, in general, an open problem, and the current approximation may be improved. An interesting research direction would be to estimate the utility of the case-base knowledge once it has grown to a certain limit. If a desired level of knowledge utility has been achieved, other machine learning techniques can be exploited to calibrate a learnt model, whose utility can then be compared with that of the conventional CBR approach.

7.2.4 Identification and Exploitation of Penalty-Based Schemes for the Case-base Partitioning Problem

The current research presented a clustered CBR approach that estimates the optimal k based on the accuracy of that particular k versus other values. However, each value of k may bring some inherent complexity besides its accuracy. Such complexity may be incorporated into the clustered CBR approach as a penalty, and a revised scheme for selecting a specific value of k can then be introduced.
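A minimal sketch of such a penalty-based selection scheme, where the penalty weight, the complexity scores and the candidate accuracies are all invented for illustration:

```python
def penalized_score(accuracy, complexity, lam=0.1):
    """Hypothetical penalty-based criterion: trade accuracy against the
    inherent complexity of a candidate k (lam is a tunable weight)."""
    return accuracy - lam * complexity

# candidates: k -> (accuracy, normalized complexity)
candidates = {10: (0.67, 0.1), 200: (0.69, 0.8), 50: (0.68, 0.3)}
best_k = max(candidates, key=lambda k: penalized_score(*candidates[k]))
print(best_k)
```

Note that the penalty flips the outcome: k = 200 has the highest raw accuracy here, but the small k wins once complexity is charged against it.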

7.2.5 Testing the CBR-based Approach on Large-Scale Testbeds

The current research has been validated on small-scale applications. Implementing, deploying and testing large-scale autonomic testbeds is one of the great research challenges faced by the autonomic computing community. One potential research direction would be to design, implement and deploy a large-scale testbed that can serve to validate different proposed self-management algorithms. Such a testbed may also yield a public dataset repository.


Bibliography

[1] RUBiS: Rice university bidding system. http://rubis.objectweb.org/.

[2] Data mining in MATLAB (mahalanobis distance).

http://matlabdatamining.blogspot.com/2006/11/mahalanobis-distance.html,

2006.

[3] A. Aamodt and E. Plaza. Case-based reasoning: Foundational issues, method-

ological variations, and system approaches. In AI Communications, volume

7:1, pages 39–59. IOS Press, March 1994.

[4] S. Abdelwahed, N. Kandasamy, and S. Neema. A control-based framework for

self-managing distributed computing systems. In Proceedings of Workshop on

Self-healing Systems, pages 3–7. ACM Press, 2004.

[5] M. Amoui, M. Salehie, S. Miararab, and L. Tahvildari. Adaptive action se-

lection in autonomic software using reinforcement learning. In Proceedings

of Fourth International Conference on Autonomic and Autonomous Systems

(ICAS’08), pages 175–181. IEEE Computer Society, 2008.

[6] C. Anglano and S. Montani. Achieving self-healing in autonomic software sys-

tems: a case-based reasoning approach. In Proceedings of International Con-

ference on Self-Organization and Adaptation of Multi-agent and Grid Systems,

pages 267–281. IOS Press, 2005.


[7] J. Appavoo, K. Hui, C. A. N. Soules, R. W. Wisniewski, D. M. D. Silva,

O. Krieger, M. A. Auslander, D. J. Edelsohn, B. Gamsa, G. R. Ganger,

P. McKenney, M. Ostrowski, B. Rosenburg, M. Stumm, and J. Xenidis. En-

abling autonomic behavior in systems software with hot swapping. IBM Sys-

tems Journal, 42(1):60–76, 2003.

[8] N. Arshad, D. Heimbigner, and A. L. Wolf. Deployment and dynamic re-

configuration planning for distributed software systems. In Proceedings of

15th IEEE International Conference on Tools with Artificial Intelligence (IC-

TAI’03), 2003.

[9] R. Barrett, P. P. Maglio, E. Kandogan, and J. Bailey. Usable autonomic

computing systems: The administrator’s perspective. In Proceedings of 1st

International Conference on Autonomic Computing. IEEE Press, 2004.

[10] B. Bartsch-Sporl, M. Lenz, and A. Hubner. Case-based reasoning: Survey and

future directions. Lecture Notes In Computer Science, 1570, 1999.

[11] J. P. Bigus, D. A. Schlosnagle, J. R. Pilgrim, W. N. Mills, and Y. Diao. Able:

A toolkit for building multiagent autonomic systems. IBM Systems Journal,

41(3):350–371, 2002.

[12] R. Calinescu. Implementation of a generic autonomic framework. In Pro-

ceedings of International Conference on Autonomic and Autonomous Systems

(ICAS’08), pages 124–129. IEEE Computer Society, 2008.

[13] G. Candea, S. Kawamoto, Y. Fujiki, G. Friedman, and A. Fox. Microreboot -

a technique for cheap recovery. In Proceedings of 6th Symposium on Operating

Systems Design and Implementation (OSDI). USENIX Association, 2004.


[14] M. Chen, A. X. Zheng, J. Lloyd, M. I. Jordan, and E. Brewer. Failure diagnosis

using decision trees. In Proceedings of International Conference on Autonomic

Computing (ICAC’04), pages 36–43. IEEE Computer Society, 2004.

[15] X. Chen and M. Simons. A component framework for dynamic reconfiguration

of distributed systems. Lecture Notes in Computer Science, Springer Verlag,

pages 82–96, 2002.

[16] D. M. Chess, C. C. Palmer, and S. R. White. Security in autonomic computing

environment. IBM Systems Journal, 42(1):107–118, 2003.

[17] C.-C. Chiu and C.-Y. Tsai. A weighted feature c-means clustering algorithm for

case indexing and retrieval in case-based reasoning. Lecture Notes in Computer

Science, Springer, 4570, 2007.

[18] P. Cunningham. CBR: Strengths and weaknesses. In Proceedings of 11th Int.

Conf. on Industrial and Engineering Applications of Artificial Intelligence and

Expert Systems, pages 517–523. Springer Verlag, 1998.

[19] A. de Campos, J. Bernardes, A. Garrido, J. M. de S J, and L. Pereira-Leite. Sis-

porto 2.0: A program for automated analysis of cardiotocograms. The Journal

of Maternal-Fetal Medicine, PubMed.gov, 9(5):311–318, 2000.

[20] Y. Diao, J. L. Hellerstein, S. Parekh, R. Griffith, G. Kaiser, and D. Phung.

Self-managing systems: A control theory foundation. In Proceedings of 12th In-

ternational Conference and Workshops on the Engineering of Computer-Based

Systems (ECBS’05). IEEE Computer Society, 2005.


[21] K. Dias, M. Ramacher, U. Shaft, V. Venkataramani, and G. Wood. Automatic

performance diagnosis and tuning in Oracle. In Proceedings of the 2005 CIDR

Conference, 2005.

[22] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. John Wiley

and Sons, Inc., 2001.

[23] G. Fouque and S. Matwin. CAESAR: a system for case based software reuse.

In Proceedings of 7th Knowledge-Based Software Engineering Conference. IEEE

Press, 1992.

[24] G. Fouque and S. Matwin. Compositional software reuse with case-based rea-

soning. In Proceedings of the 9th International Conference on Artificial Intel-

ligence for Applications, pages 128–134. IEEE Press, 1993.

[25] A. Frank and A. Asuncion. UCI machine learning repository, 2010.

[26] E. Frank and R. Bouckaert. Conditional density estimation with class prob-

ability estimators. In Proceedings of the 1st Asian Conference on Machine

Learning. Springer, 2009.

[27] N. Gandhi, J. L. Hellerstein, S. Parekh, and D. M. Tilbury. Managing the

performance of lotus notes: A control theoretic approach. In Proceedings of

the Computer Measurement Group, 2001.

[28] A. G. Ganek and T. A. Corbi. The dawning of the autonomic computing era.

IBM Systems Journal, 42(1):5–17, 2003.

[29] P. Gomes, F. C. Pereira, P. Paiva, N. Seco, P. Carreiro, J. L. Ferreira, and

C. Bento. Evaluation of case-based maintenance strategies in software design.

Lecture Notes in Artificial Intelligence, Springer-Verlag, 2689:186–200, 2003.


[30] P. A. Gonzalez. Applying knowledge modelling and case-based reasoning to

software reuse. IEE Proceedings Software, 147(5):169–177, October 2000.

[31] A. Gounaris, C. Yfoulis, R. Sakellariou, and M. D. Dikaiakos. A control the-

oretical approach to self-optimizing block transfer in web service grids. ACM

Transactions on Autonomous and Adaptive Systems, 3(2), May 2008.

[32] K. Z. Haigh and J. R. Shewchuk. Geometric similarity metrics for case-based

reasoning. In Case-Based Reasoning: Working Notes from the AAAI-94 Work-

shop, pages 182–187. AAAI Press, 1994.

[33] J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan

Kaufmann Publishers, 2002.

[34] S. Hariri, B. Khargharia, H. Chen, J. Yang, Y. Zhang, M. Parashar, and H. Liu.

The autonomic computing paradigm. Cluster Computing, 9, 2006.

[35] S. Hassan, D. McSherry, and D. Bustard. Autonomic self healing and recovery

informed by environment knowledge. Artificial Intelligence Review, Springer,

2007.

[36] D. S. Hochbaum and D. B. Shmoys. A best possible heuristic for the k-center

problem. Mathematics of Operations Research, 10(2):180–184, May 1985.

[37] G. Holmes, A. Donkin, and I. H. Witten. Weka: A machine learning workbench.

In Proceedings of 2nd Australia and New Zealand Conference on Intelligent

Information Systems, 1994.

[38] T.-P. Hong and Y.-L. Liou. Case-based reasoning with feature clustering. In

Proceedings of 7th International Conference on Cognitive Informatics, pages

449–454. IEEE Press, 2008.

116

Page 133: Achieving Self-Management Capabilities in Autonomic Systems …prr.hec.gov.pk/jspui/bitstream/123456789/617/1/1801S.pdf · 2018-07-17 · Achieving Self-Management Capabilities in

[39] A. C. Huang and P. Steenkiste. Building self-adapting services using service-

specific knowledge. In Proceeding of IEEE High Performance Distributed Com-

puting (HPDC), 2005.

[40] J. Jann, L. M. Browning, and R. S. Burugula. Dynamic reconfiguration: Basic

building blocks for autonomic computing on IBM pSeries servers. IBM Systems

Journal, 42(1):29–37, 2003.

[41] J. Jarmulak, S. Craw, and R. Rowe. Genetic algorithms to optimize CBR retrieval. Lecture Notes in Artificial Intelligence, Springer, 1898:136–147, 2000.

[42] N. Kandasamy, S. Abdelwahed, and J. P. Hayes. Self-optimization in computer

systems via online control: Application to power management. In Proceedings

of the International Conference on Autonomic Computing. IEEE Press, 2004.

[43] J. O. Kephart and D. M. Chess. The vision of autonomic computing. IEEE

Computer, pages 41–50, January 2003.

[44] J. O. Kephart and W. E. Walsh. An artificial intelligence perspective on auto-

nomic computing policies. In Proceedings of the 5th International Workshop on

Policies for Distributed Systems and Networks. IEEE Computer Society, 2004.

[45] M. J. Khan, M. M. Awais, and S. Shamail. Achieving self-configuration capa-

bility in autonomic systems using case-based reasoning with a new similarity

measure. Communications in Computer and Information Science, Springer

Berlin Heidelberg, 2:97–106, August 2007.

[46] M. J. Khan, M. M. Awais, and S. Shamail. Enabling self-configuration in

autonomic systems using case-based reasoning with improved efficiency. In


Proceedings of 4th International Conference on Autonomic and Autonomous

Systems (ICAS’08), pages 112–117. IEEE Computer Society, March 2008.

[47] M. J. Khan, M. M. Awais, and S. Shamail. Improving efficiency of self-

configurable autonomic systems using clustered CBR approach. IEICE Trans-

actions on Information and Systems, E93-D(11), November 2010.

[48] M. J. Khan, M. M. Awais, and S. Shamail. A randomized partitioning approach

for cbr based autonomic systems to improve retrieval performance. Accepted

in The Computer Journal, Oxford University Press, 2012.

[49] M. J. Khan, M. M. Awais, S. Shamail, and I. Awan. An empirical study of

modeling self-management capabilities in autonomic systems using case-based

reasoning. Simulation Modelling Practice and Theory, Elsevier, 19(10):2256–

2275, 2011.

[50] M. J. Khan, M. M. Awais, S. Shamail, and T. Hussain. Comparative study of

various artificial intelligence techniques to predict software quality. In Proceed-

ings of 10th IEEE International MultiTopic Conference, pages 173–177. IEEE

Press, 2006.

[51] M. J. Khan, S. Shamail, and M. M. Awais. Self-configuration in autonomic

systems using clustered CBR. In Proceedings of 5th International Conference on

Autonomic Computing (ICAC’08). IEEE Computer Society, 2008.

[52] B. Khargharia, S. Hariri, M. Parashar, L. Ntaimo, and B. uk Kim. vgrid: A

framework for building autonomic applications. In Proceedings of the Interna-

tional Workshop on Challenges of Large Applications in Distributed Environ-

ments. IEEE Computer Society, 2003.


[53] T. M. Khoshgoftaar, K. Ganesan, E. B. Allen, F. D. Ross, N. Goel,

and A. Nandi. Predicting fault-prone modules with case-based reasoning. In

Proceedings 8th International Symposium On Software Reliability Engineering,

pages 27–35. IEEE Press, 1997.

[54] T. M. Khoshgoftaar, N. Seliya, and N. Sundaresh. An empirical study of

predicting software faults with case-based reasoning. Software Quality Journal,

Springer Verlag, pages 85–111, 2006.

[55] K.-S. Kim and I. Han. The cluster-indexing method for case-based reasoning using self-organizing maps and learning vector quantization for bond rating cases. Expert Systems with Applications, Elsevier, 21(3):147–156, October 2001.

[56] R. Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI). Morgan Kaufmann, 1995.

[57] H. Koshutanski and F. Massacci. Interactive access control for autonomic systems: From theory to implementation. ACM Transactions on Autonomous and Adaptive Systems, 3(3), August 2008.

[58] D. B. Leake. Case-Based Reasoning: Experiences, Lessons and Future Directions. AAAI Press / MIT Press, 1996.

[59] D. B. Leake and D. C. Wilson. Remembering why to remember: Performance-guided case-base maintenance. Lecture Notes in Artificial Intelligence, Springer-Verlag, 1898:161–172, 2000.


[60] Z. Li and M. Parashar. Rudder: An agent-based infrastructure for autonomic composition of grid applications. Multiagent and Grid Systems: An International Journal, IOS Press, 1(4):183–195, 2005.

[61] H. Liu and M. Parashar. A component based programming framework for autonomic applications. In Proceedings of the International Conference on Autonomic Computing, 2004.

[62] H. Liu and M. Parashar. Accord: A programming framework for autonomic applications. IEEE Transactions on Systems, Man and Cybernetics - Part C: Applications and Reviews, 36(3):341–352, May 2006.

[63] E. Mancini, U. Villano, M. Rak, and R. Torella. A simulation-based framework for autonomic web services. In Proceedings of the 11th International Conference on Parallel and Distributed Systems. IEEE Computer Society, 2005.

[64] R. L. D. Mantaras, D. Mcsherry, D. Bridge, D. Leake, B. Smyth, S. Craw, B. Faltings, M. L. Maher, M. T. Cox, K. Forbus, M. Keane, A. Aamodt, and I. Watson. Retrieval, reuse, revision, and retention in case-based reasoning. The Knowledge Engineering Review, 20(3), 2005.

[65] V. Markl, G. M. Lohman, and V. Raman. LEO: An autonomic query optimizer for DB2. IBM Systems Journal, 42(1):98–106, 2003.

[66] K. Mills, S. Rose, S. Quirolgico, M. Britton, and C. Tan. An autonomic failure-detection algorithm. ACM SIGSOFT Software Engineering Notes, 29(1):79–83, January 2004.


[67] N. H. Minsky. On conditions for self-healing in distributed software systems. In Proceedings of the International Autonomic Computing Workshop. IEEE Press, 2003.

[68] S. Montani and C. Anglano. Case-based reasoning for autonomous service failure diagnosis and remediation in software systems. Lecture Notes in Artificial Intelligence, Springer, 4106:489–503, 2006.

[69] S. Montani and C. Anglano. Achieving self-healing in service delivery software systems by means of case-based reasoning. Applied Intelligence, Springer, 2007.

[70] P. Myllymaki and H. Tirri. Massively parallel case-based reasoning with probabilistic similarity metrics. Lecture Notes in Artificial Intelligence, Springer Verlag, 837:144–154, 1994.

[71] S. Ontanon and E. Plaza. Collaborative case retention strategies for cbr agents. Lecture Notes in Artificial Intelligence, Springer-Verlag, 2689:392–406, 2003.

[72] M. Parashar and S. Hariri. Autonomic computing: An overview. Lecture Notes in Computer Science, Springer, 3566:247–259, 2005.

[73] F. Qin, J. Tucek, J. Sundaresan, and Y. Zhou. Rx: Treating bugs as allergies - a safe method to survive software failures. In Proceedings of 20th ACM Symposium on Operating Systems Principles. ACM Press, 2005.

[74] K. Racine and Q. Yang. Maintaining unstructured case-bases. Lecture Notes in Artificial Intelligence, Springer-Verlag, 1266:553–564, 1997.

[75] P. Reynolds, C. Killian, and J. L. Wiener. Pip: Detecting the unexpected in distributed systems. In Proceedings of 3rd Symposium on Networked Systems Design and Implementation, 2006.


[76] I. Rish, M. Brodie, and N. Odintsova. Real time problem determination in distributed systems using active probing. In Network Operations and Management Symposium, pages 133–146. IEEE Press, 2004.

[77] C. Roblee, V. Berk, and G. Cybenko. Implementing large-scale autonomic server monitoring using process query systems. In Proceedings of 2nd International Conference on Autonomic Computing. IEEE Press, 2005.

[78] M. Salehie and L. Tahvildari. Autonomic computing: Emerging trends and open problems. ACM SIGSOFT Software Engineering Notes, 30(4), July 2005.

[79] M. Sebag and M. Schoenauer. A rule-based similarity measure. Lecture Notes in Computer Science, 837:119–131, 1993.

[80] B. Smyth and M. T. Keane. Remembering to forget: A competence-preserving case deletion policy for case-based reasoning systems. In Proceedings of International Joint Conference on Artificial Intelligence, pages 377–383. AAAI, 1995.

[81] B. Smyth and E. McKenna. Modeling the competence of case-bases. Lecture Notes in Artificial Intelligence, Springer-Verlag, 1488:208–220, 1998.

[82] B. Smyth and E. McKenna. Building compact competent case-bases. Lecture Notes in Artificial Intelligence, Springer-Verlag, 1650, 1999.

[83] M. Sokolova. Assessing invariance properties of evaluation measures. In Proceedings of NIPS'06 Workshop on Testing Deployable Learning and Decision Systems, 2006.

[84] L. Spalazzi. A survey on case-based planning. Artificial Intelligence Review, 16:3–36, 2001.


[85] T. F. Abdelzaher, K. G. Shin, and N. Bhatti. Performance guarantees for web server end-systems: A control-theoretical approach. IEEE Transactions on Parallel and Distributed Systems, 13(1):80–96, 2002.

[86] C. Tautz and K.-D. Althoff. Using case-based reasoning for reusing software knowledge. Lecture Notes in Computer Science, 1266:156–165, 1997.

[87] G. Tesauro, D. M. Chess, W. E. Walsh, and R. Das. A multi-agent systems approach to autonomic computing. Autonomous Agents and Multi-Agent Systems, 2004.

[88] H. J. Wang, J. C. Platt, Y. Chen, R. Zhang, and Y.-M. Wang. Automatic misconfiguration troubleshooting with peerpressure. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation. IEEE Press, 2004.

[89] R. Want, T. Pering, and D. Tennenhouse. Comparing autonomic and proactive computing. IBM Systems Journal, 42(1):129–135, 2003.

[90] R. D. Was and D. L. Neal. Weka machine learning project: Cow culling. Technical report, The University of Waikato, Computer Science Department, Hamilton, New Zealand, 1994.

[91] I. Watson. Applying Case-Based Reasoning: Techniques for Enterprise Systems. Morgan Kaufmann Publishers, 1997.

[92] I. Watson. A case-based reasoning application for engineering sales support using introspective reasoning. AAAI/IAAI, pages 1054–1059, 2000.

[93] I. Watson and F. Marir. Case-based reasoning: A review. The Knowledge Engineering Review, 9(4):327–354, 1994.


[94] C. G. Weng and J. Poon. A new evaluation measure for imbalanced datasets. In Proceedings of 7th Australasian Data Mining Conference, pages 27–32. Australian Computer Society, 2008.

[95] D. Wettschereck and D. Aha. Weighting features. In Proceedings of International Conference on Case-Based Reasoning, pages 347–358, 1995.

[96] J. Wildstrom, P. Stone, E. Witchel, R. J. Mooney, and M. Dahlin. Towards self-configuring hardware for distributed computer systems. In Proceedings of International Conference on Autonomic Computing, pages 241–249. IEEE Press, 2005.

[97] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.

[98] N. Xiong. Learning fuzzy rules for similarity assessment in case-based reasoning. Expert Systems with Applications, Elsevier, 38(9):10780–10786, 2011.

[99] N. Xiong and P. Funk. Building similarity metrics reflecting utility in case-based reasoning. Intelligent and Fuzzy Systems, IOS Press, 17(4):407–416, 2006.

[100] N. Xiong and P. Funk. Combined feature selection and similarity modelling in case-based reasoning using hierarchical memetic algorithm. In Proceedings of IEEE World Congress on Computational Intelligence, pages 1537–1542, 2010.

[101] Z. Yu, J. J. P. Tsai, and T. Weigert. An adaptive automatically tuning intrusion detection system. ACM Transactions on Autonomous and Adaptive Systems, 3(3), August 2008.


[102] J. Zhu and Q. Yang. Remembering to add: A competence-preserving case addition policy for case-base maintenance. In Proceedings of International Joint Conference on Artificial Intelligence, pages 234–241. Morgan Kaufmann Publishers, 1999.
