

On Assessing Self-Adaptive Systems

Stefan Taranu, Jens Tiemann

Fraunhofer FOKUS Kaiserin-Augusta-Allee 31

10589 Berlin, Germany +49-30-3463-7341

{stefan-liviu.taranu | jens.tiemann}@fokus.fraunhofer.de

Abstract- Self-managed communication systems use self-adaptive algorithms to control the services they offer. It is assumed that autonomy of system operation helps to reduce operational expenses, to increase performance or even to allow the emergence of new functionalities. The comparison and the understanding of operation (jointly termed here "assessment") of different self-managed systems require their complete description. In addition to traditional performance metrics (which these new systems still need to satisfy), this comparison must include new characteristics based on their adaptive nature. Due to the huge amount and variety of operational conditions assumed for these systems, the focus should be on the abilities of systems to solve problems rather than on their exhaustive testing. We describe this ability to solve a problem as a new metric. We also present the process to determine the metric based on a series of performance measurements. We demonstrate that the metric helps appraise the ability of a system to cope with new and never experienced situations and to build confidence in autonomous system operation.

Keywords: Assessment, Benchmarking, Context, Evaluation, Self-Adaptation, Self-Management

I. INTRODUCTION

In autonomic communication, self-adaptive mechanisms are used for the self-management of services and devices. The adaptation process is based on information about user requirements, available resources and environmental conditions. The self-adaptation process takes all these requirements into account and, as a result, the system better matches its requirements, whether functional, such as performance, or non-functional.

The study of the elementary impact of context information on the evaluation of self-adaptive communication systems is our starting point in the assessment of context-aware systems (CAS). Isolated assessment of a system or of its self-management (also termed controller) enables us to evaluate traditional performance characteristics as well as new metrics that need to be introduced for describing or characterizing the self-adaptation property of context-aware systems.

Unlike the evaluation of traditional systems, the evaluation of self-adaptive systems needs to be refined in two dimensions. First, to cope with the variety and amount of information that can influence the control and decision process, the controller of the evaluated system needs to be isolated from the surrounding infrastructure. Second, a new metric is needed to describe the characteristics of self-adaptation - characteristics that are beyond well-established performance metrics.

We propose a metric to describe the ability of a system to solve domain-specific problems and self-adapt, even to upcoming situations. The goal is to estimate these characteristics as the ability to solve domain-specific problems (hereafter ability) related to the adaptability of such a system. This includes untested situations that a system has never experienced before. In self-managed networking, this ability translates into a set of tasks such as self-configuration, self-healing, self-optimization and self-protection. This new metric must be added to the traditional performance metrics that can be found in device data sheets, where it can help estimate the performance of the system in new and complex situations. Because the proposed metric is intrinsic, it cannot be measured directly and requires the assessment process we propose below.

For the assessment of CAS we follow the black-box approach known from conventional testing, using context information as our main stimulus. From the observed system behavior and from the measured performance we infer the new metric by applying an assessment process. This process is based on an ordered sequence of benchmarks: the difficulty of the hardest benchmark whose associated problem is correctly solved indicates the value of the ability metric.

The rest of the paper is organized as follows. In the next section we give an overview of available approaches for the evaluation of self-managed systems, with an emphasis on test configurations and metrics. In section 3 we outline our proposed evaluation framework. In section 4 we introduce an example to illustrate our approach. We use the "always best connected" (ABC) scenario and evaluate different algorithms for the purposes of illustration. This paper concludes with a discussion of results and future research.

II. EVALUATION OF SELF-ADAPTIVE SYSTEMS

Self-managed systems are expected to adapt to complex situations. To identify these situations, multiple environmental dimensions and user requirements must be measured. These CAS can be described by their main control loop.

For this work we use the following definitions to describe and limit the scope of systems under investigation [1]: Self-adapting systems are systems that detect relevant changes in relevant contexts and adapt their behavior to those changes in a timely manner, while cognitive systems are systems that learn to adapt, which means that they are able to discover new relevant contexts.

There is a multitude of testing methodologies available, such as conformance and performance testing. The former is used to show the correctness of the system operation with respect to specification and implementation, while the latter shows the characteristics of a system operation where the results depend on operating conditions as well. Other evaluation methodologies might be portability testing or sanity testing. Some of these methodologies can be applied to CAS.

Assessment allows us to show the changes in the system's operation and internal abilities. It expands on traditional testing procedures: its results depend not only on specification, implementation and situation but also on the internal, complex state of the system (e.g. knowledge) that makes this possible.

A complex self-managed system is expected to include the capabilities of learning or planning, but this is not a prerequisite for the assessment method we present.

The goal of this work is to assess ability, so that the system's behavior over time can be evaluated from the viewpoint of process correctness [4].

Our objective is not to evaluate the amount of knowledge (or to detect its presence) within a system. This is because, first, knowledge is based on past situations to which the system was exposed, with no guarantee that these situations will repeat, and, second, the system might have been influenced by situations atypical of normal operation.

A. Related work

The first approaches to the evaluation of self-management features originate from autonomic computing. Indeed, there are some interesting relations between the evaluation of autonomic communications and the evaluation of self-management features. First, both rely on performance evaluation, with benchmarking for computing and performance measurement for communications. Second, since the usage of adaptive distributed algorithms is better known in communications than in computing, the tasks of network self-management are better defined and understood.

Brown et al. [3] describe a first approach to benchmarking autonomic capabilities. The goal is to build a benchmark suite for the self-CHOP features (Configuration, Healing, Optimization, Protection) of a system in order to influence the progress of developments in the area of self-managing computing systems, and to allow for cross-system comparison. The suggested method includes the setup of a benchmark environment and the generation of a synthetic workload with an additional injection of changes. Metrics are related to system response, such as levels of responsiveness, quality, impact and cost, and are presented in a scorecard. The challenges identified by this paper lie in the injection of changes, e.g. repeatability and representativeness. Our Assessment Framework includes answers to these challenges: we identified system interfaces for communication systems to facilitate the assessment process and to make the tests repeatable. Additionally, we argue that repeated runs of a single test or of an entire benchmark are needed to address decisions based on imprecise and possibly incomplete information in dynamic environments.

The FP7 project SOCRATES derives partial metrics important for the operation and self-management of (cellular) networks: performance, coverage, capacity, business aspects (e.g., CAPEX and OPEX) and, finally, additional algorithm-specific metrics (e.g., convergence time and robustness) [2]. The goal is the assessment of gains based on the utilization of new algorithms. The metrics are grouped according to their domains and used in a benchmark. For the final result - that is, to arrive at the integrated assessment metric - the groups of the above partial metrics are weighted according to their importance. For instance, the weights might differ between business models. Since we are working with similar partial metrics, as explained below, the approach of grouping and weighting metrics is applicable within the calculation and presentation of our Assessment Framework.
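For illustration, a minimal Python sketch of this grouping and weighting step is given below; the group names, scores and weights are invented and are not taken from SOCRATES or from our framework.

# Hypothetical illustration of combining grouped partial metrics into a
# single integrated assessment value; names and numbers are invented.

def integrated_metric(scores, weights):
    # Weighted average of per-group scores, each normalized to [0, 1].
    total_weight = sum(weights[g] for g in scores)
    return sum(weights[g] * s for g, s in scores.items()) / total_weight

scores = {
    "performance": 0.82,
    "coverage_capacity": 0.74,
    "business": 0.60,            # e.g. derived from CAPEX/OPEX indicators
    "algorithm_specific": 0.68,  # e.g. convergence time, robustness
}
weights = {"performance": 3, "coverage_capacity": 2,
           "business": 2, "algorithm_specific": 1}

print(integrated_metric(scores, weights))   # weighted overall score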

In our approach we focus neither on a single performance metric nor on a group of them. We want to rate the overall system from an external viewpoint according to the system's overall "level of autonomicity" [5], ensuring that a user of such a system is aware of the level of complexity (of management tasks) the system can handle on its own. Because of the high number of possible application areas for self-managed systems and the absence of a theoretical apparatus, we use performance metrics in our Assessment Framework as indicators of ability.

III. ASSESSMENT PROCESS

In the following we highlight the components and the interfaces for the assessment process that we later use in the CAS example. We also define available contexts and the associated management of context information.

Figure 1. Assessment Components and System Interfaces [6]

The System under Test (SuT) represents a system, sub-system or an algorithm to be evaluated. The overall system is expected to be controlled by policies. The algorithmic core will be based on a cognitive loop (not shown here) and on the internal, complex system state (shown as knowledge). To evaluate the system in a black-box fashion and to repeat the tests we need to be able to control this part of the system.

In our view, the backbone of future autonomic or self-managed systems is their knowledge plane / internal context management, which offers a flexible platform to exchange required information with related systems and the environment.


The Foreground Testers (FT) stimulate the SuT in the same way a real environment or counterpart, e.g. a peer system, would. In most cases, local decisions made within the SuT will influence other systems or the environment in general. This local feedback will be absorbed by the FT and reflected in a new stimulus.

The Background Testers (BT) stimulate the SuT without interacting with it. In our approach this stimulation includes basic information from the environment that cannot be influenced by the SuT, such as environmental temperature, geo-location and traffic situation.

Note that we are following the black-box testing approach for system evaluation, so we are not able to access internal structures of the SuT. This makes the following interfaces necessary in order to capture certain states and behaviors of the system:

• Ac - interface for context handling (helps isolate the appropriate part of the SuT for evaluation)

• Ap - interface for policy handling (to be implemented only if policies are dynamically injected into the SuT)

• AK - interface for knowledge handling (enables the system state to be reset to the initial state, the current state to be captured, a previous state to be enforced, or tests to be repeated, the last being a fundamental requirement for any test procedure)
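As a rough sketch, these interfaces could be captured as abstract classes like the following (Python; the method names and argument types are our own assumptions, the paper only fixes the roles of Ac, Ap and AK):

# Minimal sketch of the black-box test interfaces around the SuT.
from abc import ABC, abstractmethod
from typing import Any, Dict

class ContextInterface(ABC):          # Ac - context handling
    @abstractmethod
    def inject_context(self, context: Dict[str, Any]) -> None:
        """Deliver a (sub-)situation, i.e. context stimuli, to the SuT."""

class PolicyInterface(ABC):           # Ap - policy handling (optional)
    @abstractmethod
    def inject_policy(self, policy: Dict[str, Any]) -> None:
        """Install a policy if policies are injected dynamically."""

class KnowledgeInterface(ABC):        # AK - knowledge handling
    @abstractmethod
    def capture_state(self) -> Any:
        """Snapshot the SuT's internal knowledge."""

    @abstractmethod
    def restore_state(self, state: Any) -> None:
        """Enforce a previously captured state (or the initial one),
        which is what makes test runs repeatable."""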

Our assessment process [7] has been developed in the FP7 project E3 and can be summarized as follows: The process starts with the isolation of the SuT from the environment. This step requires that all the necessary interfaces between the SuT and the components be implemented.

The creation of benchmarks starts with the selection of the metrics that describe the performance of the algorithm; in our case they are selected from performance metrics. This is the starting point for identifying ability. We then define a minimum threshold on the selected performance metric. When the SuT reaches this threshold in a series of assessment tests, it signals that the algorithm no longer delivers the desired performance.

Next, the assessment process describes and implements a number of situations (i.e., problems to be solved). These artificial situations are actually emulated by the context used to stimulate the algorithm. The goal is to have a complete coverage of the situations, from "easy to solve" to "hard to solve".

These situations, which represent typical operational problems for a system or function, are ordered incrementally by level of difficulty and placed on a benchmark scale. In a first approach, experts create the situations and rate them based on their expected difficulty. The assessment process "translates" the system's abilities to solve these benchmark problems into an overall rating of the system.

Because of the changeable and probabilistic nature of self-adaptation, the benchmarks are run multiple times. If we assume that an algorithm is learning, it is important that too much information not be obtained from a single run. If the algorithm passes (i.e., finds a correct solution) in more than 50% of the runs, then the algorithm is assessed as being able to solve the specific problem of that specific hardness.
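A minimal sketch of this decision rule, assuming a run_once callable that executes one benchmark run against the SuT and reports whether the monitored KPI met the defined threshold:

# Sketch of the repeated-run decision rule described above; run_once is a
# placeholder for one execution of the benchmark against the SuT.

def solves_benchmark(run_once, runs=10):
    # The SuT is judged able to solve the benchmark problem of this hardness
    # if it passes more than 50% of the runs.
    passed = sum(1 for _ in range(runs) if run_once())
    return passed > runs / 2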


The tester monitors the system behavior while running the benchmarks. The response of the system is monitored based on a key performance indicator (KPI) indicated by the benchmark.

We propose a new class of metrics characterizing the system's ability. Such a metric contributes directly to users' trust in context-aware autonomous algorithms: if an algorithm has a high ability, the system hosting it has a high probability of handling unexpected situations of the kind the algorithm is designed to solve.

IV. EXAMPLE

To illustrate our approach we provide an example demonstrating how a particular sample metric of the new class can be computed for a particular problem-solving algorithm. Our focus is on the creation of benchmarks and the difficulty of the presented situation.

A. Scenario

We have chosen the "Always Best Connected" (ABC) problem in the wireless network domain as our example. The generic scenario pictured in Figure 2 can be described as follows. Consider a mobile phone user moving within an area covered by different networks while using different services with different requirements. The algorithm, a component of the user's mobile device, has to decide, based on the available context, which network it should attach to. Each available network topology in combination with the used service(s) constitutes a new situation for the algorithm to deal with.

(Sketch of a mobile user moving through an area with overlapping coverage of a UMTS network, a paid WLAN and a free WLAN.)

Figure 2. Always Best Connected Scenario and Situation

We want to determine to what extent we can trust an algorithm, and by "trust an algorithm" we mean to what extent the algorithm provides good solutions in unexpected or unusual situations. Another way to put it: what is the ability of the algorithm and what is its expected performance using the described assessment process?

B. Assumptions

In the real world, the scenario and the algorithm or the implemented system are given. We start our assessment work with the creation of the benchmark by understanding the needed context. Furthermore we consider:


Services - a list of all the possible services. In our example, Services = {Mail, Browsing, Audio, Video} is a set ordered by the required QoS:

• Mail (M) - requires only connectivity
• Web (W) - is more interactive and can require higher bandwidth
• Audio (A) - is a real-time service that requires low bandwidth
• Streaming Video (V) - is also a real-time service that requires high bandwidth

Networks - a list of all possible network types. In our example, Networks = {WLAN_F, WLAN_P, U} where:

• U - UMTS, offers a paid service with high coverage
• WLAN_P - offers a paid service with lower coverage
• WLAN_F - offers free service with lower coverage

NetworksWithQoS - the Cartesian product of the Networks set and a set of QoS levels. In our example the QoS levels are normalized and the set is defined as Networks × {GoodQoS, BadQoS}. This is a complementary set: a network with good QoS is considered a totally different network from the same network with bad QoS.

SubSituation - a tuple (S, {net | net ∈ NetworksWithQoS}), where S ∈ Services. This is written in the shorter form

(A, WLAN_F^B + U^G)

where A represents the Audio service and WLAN_F^B + U^G represents a list composed of two networks: the free WLAN with bad QoS and UMTS with good QoS.

Hypothetically, we consider SituationSet = {SubSit | SubSit is a SubSituation}, the set of all possible sub-situations. A Situation is then an ordered subset of sub-situations, ordered by the time of their occurrence, and is represented as:

SubSituation_1 → SubSituation_2 → ... → SubSituation_n
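One possible encoding of these definitions as data structures is sketched below (Python; the paper does not prescribe any concrete representation, so class and field names are our own):

# Sketch of the data model used in the example.
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Tuple

class Service(Enum):
    MAIL = "M"
    WEB = "W"
    AUDIO = "A"
    VIDEO = "V"

class Network(Enum):
    WLAN_F = "free WLAN"
    WLAN_P = "paid WLAN"
    UMTS = "UMTS"

class QoS(Enum):
    GOOD = "G"
    BAD = "B"

NetworkWithQoS = Tuple[Network, QoS]     # element of Networks x {GoodQoS, BadQoS}

@dataclass(frozen=True)
class SubSituation:                      # (S, {net | net in NetworksWithQoS})
    service: Service
    networks: FrozenSet[NetworkWithQoS]

Situation = List[SubSituation]           # ordered by time of occurrence

# The short form (A, WLAN_F^B + U^G) then reads:
example = SubSituation(Service.AUDIO,
                       frozenset({(Network.WLAN_F, QoS.BAD),
                                  (Network.UMTS, QoS.GOOD)}))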

The relation between the context types in our example, and to some extent the "experience" of a developer of a self-x algorithm, is defined in Table I:

TABLE I. INITIAL INTERNAL KNOWLEDGE

Network    Price   Available services
WLAN_F     0       mail, web, voice, video
WLAN_P     5       mail, web, voice
U (UMTS)   10      mail, web, voice, video

C. Example Algorithms

We now demonstrate how a self-adaptive algorithm resolving the ABC problem can be evaluated. For purposes of illustration we designed some ad hoc algorithms and compared them. For simplicity all our algorithms are based on a generic table structure. The left half of the table includes all the inputs, which can be seen, basically, as the context; in our case the context is the application used and the set of available networks. The right half of the table indicates the mobile device's configuration, which is the result of solving the ABC problem, and the monitored KPI values, which we interpret as the network's feedback on the chosen solution.

(The figure sketches the table structure: each row maps a context - the application in use (audio, video, ...) and the set of available networks (UMTS, WLANpaid, WLANfree) - to the configuration, i.e. the selected network, and the experienced QoS.)

Figure 3. Illustration of Algorithm Behavior

The algorithms (SuTs) that will be assessed can be briefly described as follows:

1. Always UMTS algorithm (UA) - ignores any context and always selects UMTS, which is assumed to have ubiquitous network coverage.

2. Self-X algorithm (SA) - uses a read-only table as shown in Figure 3 (pre-defined or pre-loaded knowledge) to select the pre-defined network when the pre-defined context is detected. The AK interface was implemented in such a way that the knowledge is saved to, and loaded from, a table format.

3. Cognitive algorithm (CA) - has the same table as the SA; however, the CA can rewrite values in the table's cells. This algorithm is able to learn by modifying its internal knowledge (i.e. the table).
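The following Python sketch captures the behavioral difference between the three algorithms; the table keys, the UMTS fallback and the CA's concrete learning rule are our own illustrative assumptions:

# Ad hoc sketches of the three algorithms (SuTs). The table-driven behaviour
# follows the structure of Figure 3, but the details are invented.

class AlwaysUMTS:                              # UA: ignores any context
    def select(self, app, available):
        return "UMTS"
    def feedback(self, app, available, chosen, experienced_qos):
        pass

class SelfX:                                   # SA: pre-loaded, read-only table
    def __init__(self, table):
        # table maps (app, frozenset of available networks) -> network to select
        self.table = dict(table)
    def select(self, app, available):
        return self.table.get((app, frozenset(available)), "UMTS")
    def feedback(self, app, available, chosen, experienced_qos):
        pass                                   # knowledge is never modified

class Cognitive(SelfX):                        # CA: same table, but writable
    def feedback(self, app, available, chosen, experienced_qos):
        # Invented learning rule: after bad QoS, rewrite the cell with another
        # available network so the bad choice is not repeated.
        if experienced_qos == "bad":
            alternatives = [n for n in available if n != chosen]
            if alternatives:
                self.table[(app, frozenset(available))] = alternatives[0]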

D. Benchmarks

For the example described in section A we defined the following four situations:

• Situation 1 - the user is alternately using audio and video services and is moving in an area with different types of network coverage. The situation is defined as:

Sit1 = (A, WLAN_F^G) → (A, U^G) → (A, WLAN_F^G + U^G) → (A, U^G) → (V, WLAN_F^G + U^G) → (V, U^G) → (V, WLAN_F^G + U^G)

• Situation 2 - the user is using audio and video services and is either moving or not in an area with the same network configuration. The situation is defined as:

Sit2 = (V, WLAN_F^G + U^G) → (A, WLAN_F^G + U^G) → (V, WLAN_F^G + U^G)

• Situation 3 - a situation similar to Sit1 but with a network configuration that includes networks with which the user experiences bad QoS. The situation is defined as:

Sit3 = (A, WLAN_F^B + WLAN_P^G) → (A, WLAN_F^B + WLAN_P^G + U^G) → (A, U^G) → (A, WLAN_F^B + WLAN_P^G + U^G) → (V, WLAN_F^B + U^G) → (V, WLAN_P^G + U^G)


• Situation 4 - the user is using only one service but in an area populated by many networks. The situation is defined as:

Sit4 = (V, WLAN_F^G + U^G + WLAN_P^G)

These situations are realized by the FT and BT from Section III, implemented in CCDS [8], and are transported to the SuT through the Ac interface.

E. Evaluation

A number of situations are placed on a benchmark scale and ordered according to their complexity.

The device should adapt its decisions based on the available context. Given the current SubSituation as the context, which consists of the service used and the available networks, the algorithm should choose the network that has a good QoS and the lowest cost. The cost of the sub-situation is in this case the cost of the service in the chosen network. If the algorithm does not make the right decision, the cost of the sub-situation is a penalty cost. After all the costs have been computed, the performance of a given situation is calculated according to the following definition:

perf(sit) = 1 / ( Σ_{SubSit ∈ sit} c_SubSit )

where c_SubSit is the cost of the sub-situation SubSit.

Next, for each situation the assessed algorithm runs 10 times. That is to say, the same network configuration and service are presented to the algorithm 10 times under variances of the background context. It is obvious that if an algorithm is not able to achieve problem-solving stability for the current hardness under slight variances in background context, it will not exhibit better abilities for harder problems. For each situation a minimum level of performance is tolerated. If this threshold of performance is exceeded at the end of a run, the algorithm is considered to pass the run. If the algorithm passed more than 50% of the runs (in our case 5 runs) for the current situation, then the algorithm is considered able to resolve the problem presented in the situation and, subsequently, all the situations with lower complexity. Otherwise, if the algorithm passed fewer than 50% of the runs, it is considered unable to provide a good solution in this situation or in any other situation with higher complexity than the current one. In this case, the assessment process terminates. Ability is given by the number of passed tests divided by the total number of tests.
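A compact sketch of this evaluation loop is given below; run_situation, thresholds and the exact reading of "passed tests divided by the total number of tests" are our own assumptions about the test harness:

# Sketch of the evaluation step described above. run_situation(sit) performs
# one run (with varying background context) and returns its perf value;
# thresholds holds the tolerated minimum performance per situation.

def perf(costs):
    # perf(sit) = 1 / sum of the sub-situation costs c_SubSit
    return 1.0 / sum(costs)

def assess_ability(situations, thresholds, run_situation, runs=10):
    # situations is ordered by complexity, lowest first.
    passed_tests = total_tests = 0
    for sit, threshold in zip(situations, thresholds):
        passed_runs = 0
        for _ in range(runs):
            total_tests += 1
            if run_situation(sit) > threshold:
                passed_runs += 1
                passed_tests += 1
        if passed_runs <= runs / 2:     # fewer than half of the runs passed:
            break                       # harder situations are not attempted
    # one possible reading of "passed tests divided by total tests"
    return passed_tests / total_tests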

One problem that remains is the ordering of situations based on complexity. We will address this issue in the next section.

F. Computing complexity

In this section we discuss a way of computing the complexity of a situation. If a new situation is created, its complexity can be computed and there is no need to run the above procedure again. The new situation is placed on the benchmarking scale according to its complexity, and its performance is computed by evaluating the performance function at that complexity point. For instance, in Figure 4, if a new situation is created and its complexity is determined to be 2.1, the performance of each of the algorithms can be read directly from the graph: for UA it is about 32%, for SA about 45%, etc.

(Plot of the algorithms' performance, in percent, over situation complexity; the complexity axis ranges from about 1.8 to 3.)

Figure 4. Algorithms' Performances
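Once the performance-over-complexity curve of Figure 4 is available, the performance at a new complexity point can be obtained by simple interpolation; the following sketch uses invented curve points rather than the measured ones:

# Reading off the expected performance at a new complexity value, as done in
# the text for complexity 2.1. The curve points are placeholders.

def performance_at(complexity, curve):
    # Linear interpolation on a list of (complexity, performance) points
    # sorted by complexity.
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if x0 <= complexity <= x1:
            return y0 + (y1 - y0) * (complexity - x0) / (x1 - x0)
    raise ValueError("complexity lies outside the benchmark scale")

ua_curve = [(1.8, 20.0), (2.0, 28.0), (2.3, 35.0), (3.0, 40.0)]  # invented values
print(performance_at(2.1, ua_curve))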

We assumed that performance and difficulty are complementary. That is to say, if in certain situations the performance of the algorithms tends to be high, this leads to a low degree of difficulty for those situations.

First, we computed the achievable performance of the algorithms for each sub-situation as follows:

for each sub-situation:
    for each algorithm i:
        run the algorithm in the sub-situation
        perf[i] ← compute the performance
    sub-situation-perf ← avg(perf[i])

Using this algorithm, the general performance for a sub-situation was computed. Then we considered the difficulty:

(Bar chart of the difficulty, on a scale from 0 to 100, of each VIDEO-sub-situation, indexed from 1 to 61.)

Figure 5. Difficulty of the VIDEO-Sub-situations

dif: Situations → ℝ,  dif(sub_situation) = 100 - perf(sub_situation)

Figure 5 shows a graph with the difficulties of the VIDEO-sub-situations (i.e., the tuples (V, *)). Using the difficulties of the sub-situations, we can calculate the general difficulty of the situations Sit1 to Sit4:

gendif(sit) = ( Σ_{sub_situation ∈ sit} dif(sub_situation) ) / card(sit),   sit ⊆ Situations

In the given example we have:


gendif(Sit1) = 45.56, gendif(Sit2) = 46.67, gendif(Sit3) = 64.77, gendif(Sit4) = 13.34

The situations can now be ordered by their complexity on the benchmark scale.
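Putting the averaging step, dif and gendif together, a small sketch (with placeholder performance numbers rather than our measured results) looks as follows:

# Sketch of the difficulty computation of this section.

def sub_situation_perf(perf_per_algorithm):
    # Achievable performance of a sub-situation: average over the algorithms.
    return sum(perf_per_algorithm) / len(perf_per_algorithm)

def dif(perf_value):
    # dif(sub_situation) = 100 - perf(sub_situation)
    return 100.0 - perf_value

def gendif(sub_situation_perfs):
    # gendif(sit) = sum of the sub-situation difficulties / card(sit)
    difficulties = [dif(p) for p in sub_situation_perfs]
    return sum(difficulties) / len(difficulties)

# Hypothetical situation with two sub-situations, each run with three algorithms:
sit_perfs = [sub_situation_perf([60.0, 55.0, 48.0]),
             sub_situation_perf([40.0, 35.0, 30.0])]
print(round(gendif(sit_perfs), 2))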

V. CONCLUSION AND FURTHER WORK

We presented a new approach for an advanced, abstract characterization of self-adaptive systems, especially in the area of self-managed networks and communication devices. Assuming that well-known performance metrics are no longer sufficient for describing all aspects of self-managed networks, devices and their functions, we have proposed a class of metrics to describe the ability of these systems to solve domain-specific problems. Because the metric is hidden and therefore cannot be measured directly, our assessment process infers ability from observed system behavior and from the system's measured performance in different situations. The assessment process is based on a series of benchmarks - sequences of situations ordered as increasingly difficult to solve - and on the system's ability to solve a specific benchmark problem. This ability characterizes the system even for situations that it has not yet faced. In this way, our metric helps to build confidence in the operation of autonomous and self-adaptive systems.

While self-adaptation and control theory are well-known research fields, the application of these paradigms to the self-management of computer and communication systems is still a hot topic, especially in industrial research and development. The absence of a complete and well-founded system theory in this area led us to develop a pragmatic, partially analytic approach for determining the missing description of a system characteristic. While intended for the test and evaluation of communication systems (accompanying conformance tests and performance measurements), the developed method can also be used by developers of algorithms (inside systems or in simulations) and can contribute to an emerging system theory for self-adaptive systems.

With the presented example we were able to illustrate our approach and to highlight some room for improvement. In our approach we transferred the attribution of the ability of a system to the attribution of the difficulty of a benchmark. While we assume that this difficulty can be determined or measured more easily than the system's ability itself (e.g. by expert opinion or by trials with several implementations), more work is needed to derive this value automatically, especially considering the wide variety of possible contexts. In this paper we presented an approach to calculate the difficulty of a situation starting from a performance metric. A related task is to ensure complete coverage of the benchmark scale - presenting a number of benchmarks ranging from "very easy to solve" to "incredibly hard" or a "theoretical maximum bound on hardness". While in some cases this is easily achieved, it is usually very hard to reach a high level of coverage. We believe that our proposed assessment process makes the input context the starting point in reaching this target. Another task is to determine how many runs per situation are adequate. One possible approach is to run the algorithm until the observed system behavior becomes stable. However, more investigation is needed on the influence of the number of runs during a single benchmark. Finally, we need to understand the dynamic behavior of the system being tested during the application of the assessment process.

ACKNOWLEDGMENTS

The authors wish to thank Mikhail Smirnov for his comments on this paper.

Some of this research was carried out as part of the E3 project, which was funded by the Community's Seventh Framework programme. This paper reflects only the authors' views. The Community is not liable for any use of the information contained therein. The authors would like to acknowledge the contributions of colleagues at the E3 consortium.

REFERENCES

[1] M. Smirnov, J. Tiemann, R. Chaparadza, Y. Rebahi, et al., "Demystifying self-awareness of autonomic systems," ICT-MobileSummit 2009 Conference Proceedings, 10-12 June 2009, Santander, Spain, IIMC, Dublin, Ireland, ISBN 978-1-905824-12-0.

[2] M. Amirijoo, R. Litjens, K. Spaey, M. Döttling, T. Jansen, N. Scully and U. Türke, "Use cases, requirements and assessment criteria for future self-organising radio access networks," IWSOS 2008, Vienna, Austria, December 10-12, 2008.

[3] A. B. Brown, J. Hellerstein, M. Hogstrom, T. Lau, S. Lightstone, P. Shum and M. Peterson Yost, "Benchmarking autonomic capabilities: promises and pitfalls", First International Conference on Autonomic Computing (ICAC'04), 266-267, 2004.

[4] S. Dobson, S. Denazis, A. Fernandez, D. Gaïti, E. Gelenbe, F. Massacci, P. Nixon, F. Saffre, N. Schmidt and F. Zambonelli, "A survey of autonomic communications," ACM Trans. Auton. Adapt. Syst. 1, 2 (Dec. 2006), 223-259.

[5] D. Lewis, D. O'Sullivan and J. Keeney, "Towards the knowledge-driven benchmarking of autonomic communications," in Proceedings of the 2006 International Symposium on a World of Wireless, Mobile and Multimedia Networks (June 26-29, 2006), International Workshop on Wireless Mobile Multimedia, IEEE Computer Society, Washington, DC, USA.

[6] S. Taranu and J. Tiemann, "General method for testing context aware applications," in Proceedings of the 6th International Workshop on Managing Ubiquitous Communications and Services (Barcelona, Spain, June 15, 2009), MUCS '09, ACM, New York, NY, USA.

[7] Project E3 Deliverable D2.3, "Architecture, information model and reference points, assessment framework, platform independent programmable interfaces", September 2009.

[8] Fraunhofer FOKUS libccds available at http://www.fokus.fraunhofer.de/go/ccds.