
EVALUATION AND DECISION MODELS:
a critical perspective

Denis Bouyssou, ESSEC
Thierry Marchant, Ghent University
Marc Pirlot, SMRO, Faculté Polytechnique de Mons
Patrice Perny, LIP6, Université Paris VI
Alexis Tsoukias, LAMSADE - CNRS, Université Paris Dauphine
Philippe Vincke, SMG - ISRO, Université Libre de Bruxelles

KLUWER ACADEMIC PUBLISHERS, Boston/London/Dordrecht


Contents

1 Introduction
1.1 Motivations
1.2 Audience
1.3 Structure
1.4 Outline
1.5 Who are the authors?
1.6 Conventions
1.7 Acknowledgements

2 Choosing on the basis of several opinions
2.1 Analysis of some voting systems
2.1.1 Uninominal election
2.1.2 Election by rankings
2.1.3 Some theoretical results
2.2 Modelling the preferences of a voter
2.2.1 Rankings
2.2.2 Fuzzy relations
2.2.3 Other models
2.3 The voting process
2.3.1 Definition of the set of candidates
2.3.2 Definition of the set of the voters
2.3.3 Choice of the aggregation method
2.4 Social choice and multiple criteria decision support
2.4.1 Analogies
2.5 Conclusions

3 Building and aggregating evaluations
3.1 Introduction
3.1.1 Motivation
3.1.2 Evaluating students in Universities
3.2 Grading students in a given course
3.2.1 What is a grade?
3.2.2 The grading process
3.2.3 Interpreting grades
3.2.4 Why use grades?
3.3 Aggregating grades
3.3.1 Rules for aggregating grades
3.3.2 Aggregating grades using a weighted average
3.4 Conclusions

4 Constructing measures
4.1 The human development index
4.1.1 Scale normalisation
4.1.2 Compensation
4.1.3 Dimension independence
4.1.4 Scale construction
4.1.5 Statistical aspects
4.2 Air quality index
4.2.1 Monotonicity
4.2.2 Non-compensation
4.2.3 Meaningfulness
4.3 The decathlon score
4.3.1 Role of the decathlon score
4.4 Indicators and multiple criteria decision support
4.5 Conclusions

5 Assessing competing projects
5.1 Introduction
5.2 The principles of CBA
5.2.1 Choosing between investment projects in private firms
5.2.2 From Corporate Finance to CBA
5.2.3 Theoretical foundations
5.3 Some examples in transportation studies
5.3.1 Prevision of traffic
5.3.2 Time gains
5.3.3 Security gains
5.3.4 Other effects and remarks
5.4 Conclusions

6 Comparing on several attributes
6.1 Thierry's choice
6.1.1 Description of the case
6.1.2 Reasoning with preferences
6.2 The weighted sum
6.2.1 Transforming the evaluations
6.2.2 Using the weighted sum on the case
6.2.3 Is the resulting ranking reliable?
6.2.4 The difficulties of a proper usage of the weighted sum
6.2.5 Conclusion
6.3 The additive value model
6.3.1 Direct methods for determining single-attribute value functions
6.3.2 AHP and Saaty's eigenvalue method
6.3.3 An indirect method for assessing single-attribute value functions and trade-offs
6.3.4 Conclusion
6.4 Outranking methods
6.4.1 Condorcet-like procedures in decision analysis
6.4.2 A simple outranking method
6.4.3 Using ELECTRE I on the case
6.4.4 Main features and problems of elementary outranking approaches
6.4.5 Advanced outranking methods: from thresholding towards valued relations
6.5 General conclusion

7 Deciding automatically
7.1 Introduction
7.2 A system with explicit decision rules
7.2.1 Designing a decision system for automatic watering
7.2.2 Linking symbolic and numerical representations
7.2.3 Interpreting input labels as scalars
7.2.4 Interpreting input labels as intervals
7.2.5 Interpreting input labels as fuzzy intervals
7.2.6 Interpreting output labels as (fuzzy) intervals
7.3 A system with implicit decision rules
7.3.1 Controlling the quality of biscuits during baking
7.3.2 Automatising human decisions by learning from examples
7.4 A hybrid approach for automatic decision-making
7.5 Conclusion

8 Dealing with uncertainty
8.1 Introduction
8.2 The context
8.3 The model
8.3.1 The set of actions
8.3.2 The set of criteria
8.3.3 Uncertainties and scenarios
8.3.4 The temporal dimension
8.3.5 Summary of the model
8.4 A didactic example
8.4.1 The expected value approach
8.4.2 Some comments on the previous approach
8.4.3 The expected utility approach
8.4.4 Some comments on the expected utility approach
8.4.5 The approach applied in this case: first step
8.4.6 Comment on the first step
8.4.7 The approach applied in this case: second step
8.5 Conclusions

9 Supporting decisions
9.1 Preliminaries
9.2 The decision process
9.3 Decision support
9.3.1 Problem formulation
9.3.2 The evaluation model
9.3.3 The final recommendation
9.4 Conclusions
Appendix A
Appendix B

10 Conclusion
10.1 Formal methods are all around us
10.2 What have we learned?
10.3 What can be expected?

Bibliography

Index


1 INTRODUCTION

1.1 Motivations

Deciding is a very complex and difficult task. Some people even argue that our ability to make decisions in complex situations is the main feature that distinguishes us from animals (it is also common to say that laughing is the main difference). Nevertheless, when the task is too complex or the interests at stake are too important, it quite often happens that we do not know or are not sure what to decide and, in many instances, we resort to a decision support technique: an informal one (we toss a coin, we ask an oracle, we visit an astrologer, we consult an expert, we think) or a formal one. Although informal decision support techniques can be of interest, in this book we will focus on formal ones. Among the latter, we find some well-known decision support techniques: cost-benefit analysis, multiple criteria decision analysis, decision trees, ... But there are many others, sometimes not presented as decision support techniques, that help in making decisions. Let us cite but a few examples.

• When the director of a school must decide whether a given student will pass or fail, he usually asks each teacher to assess the merits of the student by means of a grade. The director then sums the grades and compares the result to a threshold.

• When a bank must decide whether a given client will obtain a credit or not, a technique called credit scoring is often used.

• When the mayor of a city decides to temporarily forbid car traffic because of air pollution, he probably takes the value of some indicators, e.g. the air quality index, into account.

• Groups or committees must also make decisions. In order to do so, they often use voting procedures.

All these formal techniques are what we call (formal) decision and evaluation models, i.e. sets of explicit and well-defined rules to collect, assess and process information in order to be able to make recommendations in decision and/or evaluation processes. They are so widespread that almost no one can pretend he is not using or suffering the consequences of one of them. These models, probably because of their formal character, inspire respect and trust: they look scientific. But are they really well founded? Do they perform as well as we want? Can we safely rely on them when we have to make important decisions?

That is why we try to look at formal decision and evaluation models with a critical eye in this book. You guessed it: this book is more than 200 pages long. So, there is probably a lot of criticism. You are right.

None of the evaluation and decision models that we examined is perfect or the best. They all suffer from limitations. For each one, we can find situations in which it will perform very poorly. This is not really new: most decision models have had contenders for a long time. Do we want to contest all models at the same time? Definitely not! Our conviction is that there cannot be a best decision or evaluation model; this has been proved in some contexts (e.g. in voting) and seems empirically correct in other contexts. But we are convinced as well that formal evaluation and decision models are useful in many circumstances, and here is why:

• Formal models provide explicit and, to a large extent, unambiguous representations of a given problem; they offer a common language for communicating about the problem. They are therefore particularly well suited for facilitating communication among the actors of a decision or evaluation process.

• Formal models require that the decision maker make a substantial effort to structure his perception or representation of the problem. This effort can only be beneficial as it forces the decision maker to think harder and more deeply about his problem.

• Once a formal model has been established, a battery of formal techniques (often implemented on a computer) becomes available for drawing any kind of conclusion that can be drawn from the model. For example, hundreds of what-if questions can be answered in a flash. This can be of great help if we want to devise robust recommendations.

For all these reasons (complexity, usefulness, importance of the interests at stake, popularity), plus the fact that formal models lend themselves easily to criticism, we think that it is important to deepen our understanding of evaluation and decision models and encourage their users to think more thoroughly about them.

Our aim with this book is to foster reflection and critical thinking among all individuals who use decision and evaluation models, whether for research or for applications.

1.2 Audience

Most of us are confronted with formal evaluation and decision models. Very often, we use them without even thinking about it. This book is intended for the aware or enlightened practitioner, for anyone who uses decision or evaluation models (for research or for applications) and is willing to question his practice, to have a deeper understanding of what he does. We have tried to keep mathematics and formalism at a very low level so that, hopefully, most of the material will be accessible to readers who are not mathematically inclined. A rich bibliography will allow the interested reader to locate the more technical literature easily.

1.3 Structure

There are so many decision and evaluation models that it would be impossible to deal with all of them within a single book. As will become apparent later, most of them rely on similar kinds of principles. We decided to present seven examples of such models. These examples, chosen from a wide variety of domains, will hopefully allow the reader to grasp these principles. Each example is presented in a chapter (Chapters 2 to 8), almost independent of the other chapters. Each of these seven chapters ends with a conclusion, placing what has been discussed in a broader context and indicating links with other chapters. Chapter 9 is somewhat different from the seven previous ones: it does not focus on a decision model but presents a real-world application. The aim of this chapter is to emphasise the importance of the decision aiding process (the context of the problem, the position of the actors and their interactions, the role of the analyst, ...), to show that many difficulties arise there as well, and that coherence between the decision aiding process and the formal model is necessary.

Some examples have been chosen because they correspond to decision models that everyone has experienced and can understand easily (student grades and voting). We chose some models because they are not often perceived as decision or evaluation models (student grades, indicators and rule-based control). The other examples (cost-benefit analysis, multiple criteria decision support and choice under uncertainty) correspond to well identified and popular evaluation and decision models.

1.4 Outline

Chapter 2 is devoted to the problem of voting. After showing the analogy between voting and multiple criteria decision support, we present a sequence of twelve short examples, each one illustrating a problem that arises with a particular voting method. We begin with simple methods based on pairwise comparisons and we end up with the Borda method. Although the goal of this book is not to overwhelm the reader with theory, we informally present two theorems (Arrow and Gibbard-Satterthwaite) that in one way or another explain why we encountered so many difficulties in our twelve examples.

Then we turn to the way voters' preferences are modelled. We present many different models, each one trying to outdo the previous one but suffering from its own weaknesses. Finally, we explore some issues that are often neglected: who is going to vote? Who are the candidates? These questions are difficult and we show that they are important. The construction of the set of voters and the set of candidates, as well as the choice of a voting method, must be considered as part of the voting process.


After examining voting, we turn in Chapter 3 to another very familiar topic for the reader: students' marks or grades. Marks are used for different purposes (e.g. ranking the students, deciding whether a student is allowed to begin the next level of study, deciding whether a student gets a degree, ...). Students are assessed in a huge variety of ways in different countries and schools. This seems to indicate that assessing students might not be trivial. We use this familiar topic to discuss operations such as evaluating a performance and aggregating evaluations.

In Chapter 4, three particular indicators are considered: the Human Development Index (used by the United Nations), the ATMO index (an air pollution indicator used by the French government) and the decathlon score. We present a few examples illustrating some problems occurring with indicators. We assert that some difficulties are the consequences of the fact that the role of an indicator is often manifold and not well defined. An indicator is a measure but, often, it is also a tool for controlling or managing (in a broad sense).

Chapter 5 deals with cost-benefit analysis (CBA), a decision aiding method that is extremely popular among economists. Following the CBA approach, a project should only be undertaken when its benefits outweigh its costs. First we present the principles of CBA and its theoretical foundations. Then, using an example in transportation studies, we illustrate some difficulties encountered with CBA. Finally, we clarify some of the hypotheses at the heart of CBA and criticise the relevance of these hypotheses in some decision aiding processes.

In Chapter 6, using a well documented example, we present some difficulties that arise when one wants to choose from or rank a set of alternatives considered from different viewpoints. We examine several aggregation methods that lead to a value function on the set of alternatives, namely the weighted sum, the sum of utilities (direct and indirect assessment) and AHP (the Analytic Hierarchy Process). Then we turn to the so-called outranking methods. Some of these methods can be used even when the data are not very rich or precise. The price we pay for this is that the results provided by these methods are not rich either, in the sense that the conclusions that can be drawn regarding a decision are not clear-cut.

Chapter 7 is dedicated to the study of automatic decision systems. These systems concern the execution of repetitive decision tasks and the great majority of them are based on more or less explicit decision rules aimed at reflecting the usual decision policy of humans. The goal of this chapter is to show the interest of some formal tools (e.g. fuzzy sets) for modelling decision rules, but also to clarify some problems arising when simulating the rules. Three examples are presented: the first one concerns the control of an automatic watering system while the others are about the control of a food process. The first two examples describe decision systems based on explicit decision rules; the third one addresses the case of implicit decision rules.

The goal of Chapter 8 is to raise some questions about the modelling of uncertainty. We present a real-life problem concerning the planning of electricity production. This problem is characterised by many different uncertainties: for example, the price of oil or the electricity demand in 20 years' time. This problem is classically described by using a decision tree and solved with an expected utility approach. After recalling some well known criticisms directed against this approach, we present the approach that has been used by the team that "solved" this problem. Some of the drawbacks of this approach are discussed as well. The relevance of probabilities is criticised and other modelling tools, such as belief functions, fuzzy set theory and possibility theory, are briefly mentioned.

Convinced that there is more to decision aiding than just number crunching, we devote the last chapter to the description of a real-world decision aiding process that took place in a large Italian company a few years ago. It concerns the evaluation of offers following a call for tenders for a GIS (Geographical Information System) acquisition. Some important elements, such as the participating actors, the problem formulation, the construction of the criteria, etc., deserve greater consideration. One should ideally never consider these elements separately from the aggregation process because they can impact the whole decision process and even the way the aggregation procedure behaves.

1.5 Who are the authors?

The authors of this book are European academics working in six different universities, in France and in Belgium. They teach in engineering, business, mathematics, computer science and psychology schools. Their backgrounds are quite varied as well: mathematics, economics, engineering, law and geology, but they are all active in decision support and more particularly in multiple criteria decision support. Among their special interests are preference modelling, fuzzy logic, aggregation techniques, social choice theory, artificial intelligence, problem structuring, measurement theory, operations research, ... Besides their interest in multiple criteria decision support, they share a common view on this field. Five of the six authors of the present volume presented their thoughts on the past and the objectives of future research in multiple criteria decision support in the Manifesto of the new MCDA era (Bouyssou, Perny, Pirlot, Tsoukias and Vincke 1993).

The authors are very active in theoretical research on the foundations of decision aiding, mainly from an axiomatic point of view, but have been involved in a variety of applications, ranging from software evaluation to the location of a nuclear repository, through the rehabilitation of a sewer network and the location of high-voltage lines.

In spite of the large number of co-authors, this book is not a collection of papers. It is a joint work.

1.6 Conventions

To refer to a decision maker, a voter or an individual whose sex is not determined, we decided not to use the politically correct "he/she" but just "he", in order to make the text easy to read. The fact that all of the authors are male has nothing to do with this choice. The same applies to "his/her".

None of the authors is a native English speaker. Therefore, even if we did our best to write in correct English, the reader should not be surprised to find some mistakes or inelegant expressions. We beg the reader's leniency for any incorrectness that might remain.

The adopted spelling is the British and not the American one.

1.7 Acknowledgements

We are ggreatly indebted to our /////////collEague friend Philippe Fortemps \cite{Fortemps99}.

Without him and his knowledge of Late-x, this book would look like this paragraph.%\newline

The authors also wish to thank J.-L. Ottinger, who contributed to Chapter 8, H. Melot, who laid out the complex diagrams of that chapter, and Stefano Abruzzini, who gave us a number of references concerning indicators. Chapter 6 is based on a report by Sebastien Clement written to fulfil the requirements of a course on multiple criteria decision support. A large part of Chapter 9 uses material already published in (Paschetta and Tsoukias 1999).

Special thanks go to Marjorie and Diane Gassner, who had the patience to read and correct our continental approximation of the English language, and to Francois Glineur, who helped in solving a great number of LaTeX problems.

We thank Gary Folven from Kluwer Academic Publishers for his constant support during the preparation of this manuscript.


2 CHOOSING ON THE BASIS OF SEVERAL OPINIONS: THE EXAMPLE OF VOTING

Voting is easy! You've voted hundreds of times in committees, in presidential elections, for the senate, ... Is there much to say about voting? Well, just think about the way heads of state or members of parliament are elected in Australia, France, the UK, ...

United Kingdom's members of parliament The territory of the UK is divided into about 650 constituencies. One representative is elected in each constituency. Each voter chooses one of the candidates in his constituency. The winner is the candidate that is chosen by more voters than any other one. Note that the winner does not have to win an overall majority of votes.

France's members of parliament As in the UK, the French territory is divided into single-seat constituencies. In a constituency, each voter chooses one of the candidates. If one candidate receives more than 50 % of the votes, he is elected. Otherwise a second stage is organised. During the second stage, all candidates that were chosen by more than 12.5 % of the registered voters may compete. Once more, each voter chooses one of the candidates. The winner is the candidate that received the most votes.

France's president Each voter chooses one of the candidates. If one candidate has been chosen by more than 50 % of the voters, he is elected. Otherwise a second stage is organised. During the second stage, only two candidates remain: those with the highest scores. Once again, each voter chooses one of the candidates. The winner is the candidate that has been chosen by more voters than the other one.

Australia's members of parliament The territory is divided into single-seat constituencies called divisions. In a division, each voter is asked to rank all candidates: he puts a 1 next to his preferred candidate, a 2 next to his second preferred candidate, then a 3, and so on until his least preferred candidate. Then the ballot papers are sorted according to the first preference votes. If a candidate has more than 50 % of the ballot papers, he is elected. Otherwise, the candidate that received fewer papers than any other is eliminated and the corresponding ballot papers are transferred to the candidates that got a 2 on these papers. Once more, if a candidate has more than 50 % of the ballot papers, he is elected. Otherwise, the candidate that received fewer papers than any other is eliminated and the corresponding ballot papers are transferred to the candidates that got a 3 on these papers, etc. In the worst case, this process ends when all but two candidates are eliminated because, unless they are tied, one of the candidates necessarily has more than 50 % of the papers. Note that, as far as we know, the case of a tie is seldom considered in electoral laws. (A small sketch of this counting procedure is given after these country descriptions.)

Canada's members of parliament and prime minister Every five years, the Canadian parliament is elected as follows. The territory is divided into about 270 constituencies called counties. In each county, each party can present one candidate. Each voter chooses one candidate. The winner in a county is the candidate that is chosen by more voters than any other one. He is thus the county's representative in the parliament. The leader of the party that has the most representatives becomes prime minister.
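To make the Australian counting rule concrete, here is a minimal sketch of it as an instant-runoff count (in Python). The representation of a profile as (count, ranking) pairs, the function name and the arbitrary tie-breaking when eliminating are our own simplifications; as noted above, electoral laws rarely specify what happens in case of a tie.

    # A sketch of preferential (instant-runoff) counting: repeatedly
    # eliminate the candidate holding the fewest papers and transfer those
    # papers to their next preference, until someone holds more than 50 %.
    def instant_runoff(profile):
        """profile: list of (count, ranking) pairs, rankings best to worst."""
        remaining = {c for _, ranking in profile for c in ranking}
        while True:
            # Each ballot counts for its best-placed remaining candidate.
            tally = {c: 0 for c in remaining}
            for count, ranking in profile:
                top = next(c for c in ranking if c in remaining)
                tally[top] += count
            leader = max(tally, key=tally.get)
            if 2 * tally[leader] > sum(tally.values()) or len(remaining) == 2:
                return leader
            remaining.remove(min(tally, key=tally.get))  # ties broken arbitrarily

    # A made-up 5-voter profile: b trails on first preferences but wins
    # once c is eliminated and c's ballots transfer to b.
    print(instant_runoff([(2, ['a', 'b', 'c']),
                          (2, ['b', 'a', 'c']),
                          (1, ['c', 'b', 'a'])]))   # prints b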

Those interested in voting methods and the way they are applied in various countries will find valuable information in Farrell (1997) and Nurmi (1987). The diversity of the methods applied in practice probably reflects some underlying complexity and, in fact, if you take a closer look at voting, you will be amazed by the incredible complexity of the subject. In spite of its apparent simplicity, thousands of papers have been devoted to the problem of voting (Kelly 1991) and our guess is that many more are to come.

Our aim in this chapter is, on the one hand, to show that many difficult and interesting problems arise in voting and, on the other hand, to convince the reader that a formal study of voting might be enlightening. This chapter is organised as follows. In Section 1, we make the following basic assumption: each voter's preferences can accurately be represented by a ranking of all candidates from best to worst, without ties. Then we show some problems occurring when aggregating the rankings, using classical voting systems such as those applied in France or the United Kingdom. We do this through the use of small and classical examples. In Section 2, we consider other preference models than the linear ranking of Section 1. Some models are poorer in information but more realistic. Some are richer and less realistic. In most cases, the aggregation remains a difficult task. In Section 3, we change the focus and try to examine voting in a much broader context. Voting is not instantaneous. It is not just counting the votes and performing some mathematical operation to find the winner. It is a process that begins when somebody decides that a vote should occur (or even earlier) and ends when the winner begins his mandate (or even later). In Section 4, we discuss the analogy with multiple criteria decision support. The chapter ends with a conclusion.

2.1 Analysis of some voting systems

From now on, we will distinguish between the election (the process by which the voters express their preferences about a set of candidates) and the aggregation


method (the process used to extract the best candidate or a ranking of the candidates from the result of the election). In many cases, the election is uninominal, i.e. each voter votes for one candidate only.

2.1.1 Uninominal election

Let us recall the assumption that we mentioned earlier and that will hold throughout Section 1: each voter, consciously or not, ranks all candidates from best to worst, without ties, and, when voting, each voter sincerely (or naively) reports his preferences. Thus, in a uninominal election, we shall assume that each voter votes for the candidate that he ranks in first position. For example, suppose that a voter prefers candidate a to b and b to c (in short, aPbPc). He votes for a. We are now ready to present a first example that illustrates a difficulty in voting.

Example 1. Dictatorship of majority

Let {a, b, c, ..., y, z} be a set of 26 candidates for an election with 100 voters. Suppose that

51 voters have preferences aPbPcP...PyPz
and 49 voters have preferences zPbPcP...PyPa.

It is clear that 51 voters will vote for a while 49 vote for z. Thus a has an absolute majority and, in all uninominal systems we are aware of, a wins. But is a really a good candidate? Almost half of the voters perceive a as the worst one. And candidate b seems to be a good candidate for everyone. Candidate b could be a good compromise. As shown by this example, a uninominal election combined with the majority rule allows a dictatorship of majority and doesn't favour a compromise. A possible way to avoid this problem might be to ask the voters to provide their whole ranking instead of just their preferred candidate. This will be discussed later. Let us continue with some strange problems arising when using a uninominal election.

Example 2. Respect of majority in the British system

The voting system in the United Kingdom is plurality voting, i.e. the election is uninominal and the aggregation method is simple majority. Let {a, b, c} be the set of candidates for an election with 21 voters. Suppose that

10 voters have preferences aPbPc,
6 voters have preferences bPcPa

and 5 voters have preferences cPbPa.

Then a (resp. b and c) obtains 10 votes (resp. 6 and 5). Thus a is chosen. Nevertheless, this might be different from what a majority of voters wanted. Indeed, an absolute majority of voters prefers any other candidate to a (11 out of 21 voters prefer b and c to a).
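This can be checked mechanically. The following sketch (our own notation: a profile is a list of (count, ranking) pairs) computes the plurality winner and the pairwise counts for Example 2.

    # Example 2 in code: plurality elects a, yet 11 of the 21 voters rank
    # each of b and c above a.
    profile = [(10, ['a', 'b', 'c']), (6, ['b', 'c', 'a']), (5, ['c', 'b', 'a'])]

    def plurality(profile):
        tally = {}
        for count, ranking in profile:
            tally[ranking[0]] = tally.get(ranking[0], 0) + count
        return max(tally, key=tally.get)

    def prefer(profile, x, y):
        # number of voters ranking x before y
        return sum(count for count, r in profile if r.index(x) < r.index(y))

    print(plurality(profile))         # a
    print(prefer(profile, 'b', 'a'))  # 11: a majority prefers b to a
    print(prefer(profile, 'c', 'a'))  # 11: a majority prefers c to a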


Let us see, using the same example, if such a problem would be avoided by the two-stage French system. After the first stage, as no candidate has an absolute majority, a second stage is run between candidates a and b. We suppose that the voters keep the same preferences on {a, b, c}. Thus a obtains 10 votes and b, 11 votes, so that candidate b is elected. This time, none of the beaten candidates (a and c) is preferred to b by a majority of voters. Nonetheless, we cannot conclude that the two-stage French system is superior to the British system from this point of view, as shown by the following example.

Example 3. Respect of majority in the two-stage French system

Let {a, b, c, d} be the set of candidates for an election with 21 voters. Suppose that

10 voters have preferences bPaPcPd,
6 voters have preferences cPaPdPb

and 5 voters have preferences aPdPbPc.

After the first stage, as no candidate has an absolute majority, a second stage is run between candidates b and c. Candidate b easily wins with 15 out of 21 votes, though an absolute majority (11/21) of voters prefer a and d to b. Because it is not necessary to be a mathematician to figure out such problems, some voters might be tempted not to report their preferences sincerely, as shown in the next example.

Example 4. Manipulation in the two-stage French system

Let us continue with the example used above. Suppose that the six voters having preferences cPaPdPb decide not to be sincere and vote for a instead of c. Then candidate a wins after the first stage because there is an absolute majority for him (11/21). If they had been sincere (as in the previous example), b would have been elected. Thus, casting a non-sincere vote is useful for those 6 voters as they prefer a to b. Such a system, which may encourage voters to falsely report their preferences, is called manipulable. This is not the only weakness of the French system, as attested by the three following examples.
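The two-stage mechanics of Examples 3 and 4 can be replayed with a few lines of code. This is a simplified sketch of the French rule (our simplification: the second stage always opposes the two candidates with the most first-stage votes, ignoring the 12.5 % threshold).

    # A simplified two-round rule: absolute majority at stage one wins;
    # otherwise the top two candidates meet in a runoff.
    def two_round(profile):
        candidates = list(profile[0][1])
        tally = {c: 0 for c in candidates}
        for count, ranking in profile:
            tally[ranking[0]] += count
        total = sum(tally.values())
        leader = max(tally, key=tally.get)
        if 2 * tally[leader] > total:
            return leader
        x, y = sorted(tally, key=tally.get, reverse=True)[:2]
        x_votes = sum(count for count, r in profile if r.index(x) < r.index(y))
        return x if 2 * x_votes > total else y

    sincere = [(10, ['b', 'a', 'c', 'd']),
               (6,  ['c', 'a', 'd', 'b']),
               (5,  ['a', 'd', 'b', 'c'])]
    print(two_round(sincere))      # b: wins the runoff against c, 15 to 6

    # Example 4: the six cPaPdPb voters report a first instead.
    manipulated = [(10, ['b', 'a', 'c', 'd']),
                   (6,  ['a', 'c', 'd', 'b']),
                   (5,  ['a', 'd', 'b', 'c'])]
    print(two_round(manipulated))  # a: elected outright with 11 of 21 votes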

Example 5. Monotonicity in the two-stage French system

Let {a, b, c} be the set of candidates for an election with 17 voters. A few days before the election, the results of a survey are as follows:

6 voters have preferences aPbPc,
5 voters have preferences cPaPb,
4 voters have preferences bPcPa

and 2 voters have preferences bPaPc.

With the French system, a second stage would be run between a and b, and a would be chosen, obtaining 11 out of 17 votes. Suppose that candidate a, in order to increase his lead over b and to lessen the likelihood of a defeat, decides to strengthen his electoral campaign against b. Suppose that the survey did exactly reveal the preferences of the voters and that the campaign has the right effect on the last two voters. Hence we observe the following preferences.

8 voters have preferences aPbPc,
5 voters have preferences cPaPb

and 4 voters have preferences bPcPa.

After the first stage, b is eliminated, due to the campaign of a. The second stage opposes a to c, and c wins, obtaining 9 votes. Candidate a thought that his campaign would be beneficial. He was wrong. Such a method is called non-monotonic because an improvement of a candidate's position in some of the voters' preferences can lead to a deterioration of his position after the aggregation. It is clear with such a system that it is not always interesting or efficient to sincerely report one's preferences. You will note in the next example that some manipulations can be very simple.

Example 6. Participation in the two-stage French system

Let {a, b, c} be the set of candidates for an election with 11 voters. Suppose that

4 voters have preferences aPbPc,
4 voters have preferences cPbPa

and 3 voters have preferences bPcPa.

Using the French system, a second stage should oppose a to c, and c should win the election, obtaining 7 out of 11 votes. Suppose that 2 of the 4 first voters (with preferences aPbPc) decide not to vote because c, the worst candidate according to them, is going to win anyway. What will happen? There will be only 9 voters.

2 voters have preferences aPbPc,
4 voters have preferences cPbPa

and 3 voters have preferences bPcPa.

Contrary to all expectations, candidate c will lose while b will win, obtaining 5 out of 9 votes. Our two lazy voters can be proud of their abstention since they prefer b to c. Clearly, such a method does not encourage participation.

Example 7. Separability in the two-stage French system

Let {a, b, c} be the set of candidates for an election with 26 voters. The voters are located in two different areas: countryside and town. Suppose that the 13 voters located in the town have the following preferences.

4 voters have preferences aPbPc,
3 voters have preferences bPaPc,
3 voters have preferences cPaPb

and 3 voters have preferences cPbPa.

Suppose that the 13 voters located in the countryside have the following preferences.


4 voters have preferences aPbPc,
3 voters have preferences cPaPb,
3 voters have preferences bPcPa

and 3 voters have preferences bPaPc.

Suppose now that an election is organised in the town, with 13 voters. Candidates a and c will go to the second stage and a will be chosen, obtaining 7 votes. If an election is organised in the countryside, a will defeat b in the second stage, obtaining 7 votes. Thus a is the winner in both areas. Naturally, we expect a to be the winner in a global election. But it is easy to observe that in the global election (26 voters) a is defeated during the first stage. Such a method is called non-separable.

The previous examples showed that, when there are more than 2 candidates, it is not an easy task to imagine a system that would behave as expected. Note that, in the presence of 2 candidates, the British system (uninominal and one-stage) is equivalent to all other systems and it suffers from none of the above-mentioned problems (May 1952). Thus we might be tempted by a generalisation of the British system (restricted to 2 candidates): if there are two candidates, we use the British system; if there are more than two candidates, we arbitrarily choose two of them and we use the British system to select one. The winner is opposed (using the British system) to a new arbitrarily chosen candidate, and so on until no more candidates remain. This would require n − 1 votes between 2 candidates. Unfortunately, this method suffers from severe drawbacks.

Example 8. Influence of the agenda in sequential voting

Let {a, b, c} be the set of candidates for an election with 3 voters. Suppose that

1 voter has preferences aPbPc,
1 voter has preferences bPcPa

and 1 voter has preferences cPaPb.

The 3 candidates will be considered two by two in the following order, or agenda: a and b first, then c. During the first vote, a is opposed to b and a wins with an absolute majority (2 votes against 1). Then a is opposed to c and c defeats a with an absolute majority. Thus c is elected.

If the agenda is a and c first, it is easy to see that c defeats a and is then opposed to b. Hence, b wins against c and is elected.

If the agenda is b and c first, it is easy to see that, finally, a is elected. Consequently, in this example, any candidate can be elected and the outcome depends completely on the agenda, i.e. on an arbitrary decision. Let us note that sequential voting is very common in different parliaments. The different amendments to a bill are considered one by one in a predefined sequence. The first one is opposed to the status quo, using the British system; the second one is opposed to the winner, and so on. Clearly, such a method lacks neutrality: it doesn't treat all candidates in a symmetric way. Candidates (or amendments) appearing at the end of the agenda are more likely to be elected than those at the beginning.
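A short sketch makes the agenda effect of Example 8 easy to reproduce; the helper names and profile representation are ours.

    # Sequential pairwise voting: the current winner meets the next
    # candidate on the agenda, by simple majority.
    profile = [(1, ['a', 'b', 'c']), (1, ['b', 'c', 'a']), (1, ['c', 'a', 'b'])]

    def pairwise_winner(profile, x, y):
        x_votes = sum(count for count, r in profile if r.index(x) < r.index(y))
        total = sum(count for count, _ in profile)
        return x if 2 * x_votes > total else y

    def sequential(profile, agenda):
        winner = agenda[0]
        for challenger in agenda[1:]:
            winner = pairwise_winner(profile, winner, challenger)
        return winner

    for agenda in (['a', 'b', 'c'], ['a', 'c', 'b'], ['b', 'c', 'a']):
        print(agenda, sequential(profile, agenda))   # c, then b, then a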


Example 9. Violation of unanimity in sequential voting

Let {a, b, c, d} be the set of candidates for an election with 3 voters. Suppose that

1 voter has preferences bPaPdPc,
1 voter has preferences cPbPaPd

and 1 voter has preferences aPdPcPb.

Consider the following agenda: a and b first, then c and finally d. Candidate a is defeated by b during the first vote. Candidate c wins the second vote and d is finally elected, though all voters unanimously prefer a to d. Let us remark that this cannot happen with the French and British systems.

Up to now, we have assumed that the voters are able to rank all candidates from best to worst without ties, but the only information that we collected was the best candidate. Why not try to palliate the many problems we have encountered by asking voters to explicitly rank the candidates? This idea, though interesting, will lead us to many other pitfalls that we discuss just below.

2.1.2 Election by rankings

In this kind of election, each voter provides a ranking without ties of the candidates. Hence the task of the aggregation method is to extract from all these rankings the best candidate or a ranking of the candidates reflecting the preferences of the voters as much as possible.

At the end of the 18th century, two aggregation methods for election by rankings appeared in France. One was proposed by Borda, the other by Condorcet. Although other methods have been proposed, their methods are still at the heart of many scientists' concerns. In fact, many methods are variants of the Borda and Condorcet methods.

The Condorcet method

Condorcet (1785) suggested comparing all candidates pairwise in the following way. A candidate a is preferred to b if and only if the number of voters ranking a before b is larger than the number of voters ranking b before a. In case of a tie, candidates a and b are indifferent. A candidate that is preferred to all other candidates is called a (Condorcet) winner. In other words, a winner is a candidate that, opposed to each of the n − 1 other candidates, wins by a majority. It can be shown that there is never more than one Condorcet winner.
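Stated as code, the rule is short. The sketch below (profile representation as before; the function name is ours) returns the Condorcet winner, or None when there is none.

    # x beats y when more voters rank x before y than y before x (strict
    # rankings, so the two counts always sum to the number of voters).
    def condorcet_winner(profile):
        candidates = profile[0][1]
        total = sum(count for count, _ in profile)
        def beats(x, y):
            x_votes = sum(count for count, r in profile if r.index(x) < r.index(y))
            return 2 * x_votes > total
        for x in candidates:
            if all(beats(x, y) for y in candidates if y != x):
                return x
        return None   # no Condorcet winner (a Condorcet paradox)

    # On the profile of Example 2, b is the Condorcet winner even though
    # plurality elects a.
    profile = [(10, ['a', 'b', 'c']), (6, ['b', 'c', 'a']), (5, ['c', 'b', 'a'])]
    print(condorcet_winner(profile))   # b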

Note that both the British and the two-stage French methods are different from the Condorcet method. In Example 2, candidate a is elected by the British method but b is the Condorcet winner. In Example 3, a is the Condorcet winner although b is chosen by the French method.

Although the principle underlying the Condorcet method (the candidate that beats all other candidates in a pairwise contest is the winner) seems very natural, close to the concept of democracy and hence very appealing, it is worth noting that, in some instances, this principle might be questioned: in Example 1, a is the Condorcet winner, although almost half of the voters consider him to be the worst candidate. Consider also Example 10, taken from Fishburn (1977).

Example 10. Critique of the majority principle

Let {a, b, c, d, e, f, g, x, y} be a set of 9 candidates for an election with 101 voters. Suppose that

19 voters have preferences yPaPbPcPdPePfPgPx,
21 voters have preferences ePfPgPxPyPaPbPcPd,
10 voters have preferences ePxPyPaPbPcPdPfPg,
10 voters have preferences fPxPyPaPbPcPdPePg,
10 voters have preferences gPxPyPaPbPcPdPePf

and 31 voters have preferences yPaPbPcPdPxPePfPg.

Candidate x wins against every other candidate with a majority of 51 votes. Thus x is the Condorcet winner. But let us focus on candidates x and y. Let us summarise their results in Table 2.1. In view of Table 2.1, it seems that y should be elected.

k     1   2   3   4   5   6   7   8   9
x     0  30   0  21   0  31   0   0  19
y    50   0  30   0  21   0   0   0   0

Table 2.1: Number of voters who rank the candidate in k-th place in their preferences

Furthermore, there are cases (called Condorcet paradoxes) where there is no Condorcet winner. Consider Example 8: a is preferred to b, b is preferred to c and c is preferred to a. No candidate is preferred to all others. In such a case, the Condorcet method fails to elect a candidate. One might think that Example 8 is very bizarre and very unlikely to happen. Unfortunately, it isn't. If you consider an election with 25 voters and 11 candidates, the probability of such a paradox is significantly high, as it is approximately 1/2 (Gehrlein 1983), and the more candidates or voters, the higher the probability of such a paradox. Note that, in order to obtain this result, all rankings are supposed to have the same probability. Such a hypothesis is clearly questionable (Gehrlein 1983).
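The quoted figure is easy to approximate by simulation. Here is a rough Monte Carlo sketch under the same impartial-culture assumption (all rankings equally likely); the trial count and seed are arbitrary choices of ours.

    # Estimating the probability that no Condorcet winner exists, for 25
    # voters and 11 candidates, with rankings drawn uniformly at random.
    import random

    def has_winner(rankings, m):
        pos = [{c: r.index(c) for c in range(m)} for r in rankings]
        n = len(rankings)
        return any(all(2 * sum(p[x] < p[y] for p in pos) > n
                       for y in range(m) if y != x)
                   for x in range(m))

    def paradox_frequency(voters=25, m=11, trials=2000, seed=0):
        rng = random.Random(seed)
        paradoxes = 0
        for _ in range(trials):
            rankings = [rng.sample(range(m), m) for _ in range(voters)]
            paradoxes += not has_winner(rankings, m)
        return paradoxes / trials

    print(paradox_frequency())   # close to 1/2, as quoted above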

Many methods have been designed that elect the Condorcet winner, if he exists, and choose a candidate in any case (Fishburn 1977, Nurmi 1987).

The Borda method

Borda (1781) proposed the following aggregation method. In each voter's preference, each candidate has a rank: 1 for the first candidate in the ranking, 2 for the second, ... and n for the last. Compute the Borda score of each candidate, i.e. the sum over all voters of that candidate's ranks. Then choose the candidate with the lowest Borda score.


Note that there can be several such candidates. In these cases, the Borda method does not tell us which one to choose; they are considered equivalent. But the likelihood of indifference is rather small and decreases as the number of candidates or voters increases. For example, for 3 candidates and 2 voters, the probability of all candidates being tied is 1/3; for 3 candidates and 50 voters, it is less than 1 %. Note that, once again, we have supposed that all rankings have the same probability.

Note that the Borda method not only allows us to choose one candidate but also to rank them (by increasing Borda scores). If two candidates have the same Borda score, then they are indifferent.

Example 11. Comparison of the Borda and Condorcet methods

Let {a, b, c, d} be the set of candidates for an election with 3 voters. Suppose that

2 voters have preferences bPaPcPd
and 1 voter has preferences aPcPdPb.

The Borda score of a is 5 = 2 × 2 + 1 × 1. For b, it is 6 = 2 × 1 + 1 × 4. Candidates c and d receive 8 and 11. Thus a is the winner. Using the Condorcet method, the conclusion is different: b is the Condorcet winner. Thus, when a Condorcet winner exists, he is not always chosen by the Borda method. Nevertheless, it can be shown that the Borda method never chooses a Condorcet loser, i.e. a candidate that is beaten by all other candidates by an absolute majority (contrary to the British system, see Example 2).

Suppose now that candidates c and d decide not to compete because they are almost sure to lose. With the Borda method, the new winner is b. Thus b now defeats a just because c and d dropped out. Thus the fact that a defeats or is defeated by b depends upon the presence of other candidates. This can be a problem as the set of candidates is not always fixed. It can vary because candidates withdraw, because feasible solutions become infeasible or the converse, because new solutions emerge during discussions, ...

With the Condorcet method, b remains the winner and it can be shown that this is always the case: if a candidate is a Condorcet winner, then he is still a Condorcet winner after the elimination of some candidates.
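Both computations in Example 11 can be replayed directly. In the sketch below (representation as before, helper name ours), Borda scores are computed on the full set of candidates and then on {a, b} after c and d withdraw.

    # Borda scores on Example 11: a wins on {a, b, c, d}, but b wins once
    # c and d drop out, although no individual ranking of a vs. b changed.
    def borda_scores(profile, candidates):
        scores = {c: 0 for c in candidates}
        for count, ranking in profile:
            kept = [c for c in ranking if c in candidates]
            for rank, c in enumerate(kept, start=1):
                scores[c] += count * rank
        return scores

    profile = [(2, ['b', 'a', 'c', 'd']), (1, ['a', 'c', 'd', 'b'])]
    print(borda_scores(profile, ['a', 'b', 'c', 'd']))
    # {'a': 5, 'b': 6, 'c': 8, 'd': 11}: a has the lowest score and wins
    print(borda_scores(profile, ['a', 'b']))
    # {'a': 5, 'b': 4}: b now wins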

Example 12. Borda and the independence of irrelevant alternatives

Let {a, b, c} be the set of candidates for an election with 2 voters. Suppose that

1 voter has preferences aPcPb
and 1 voter has preferences bPaPc.

The alternative with the lowest Borda score is a. Now consider a new election where the alternatives and voters are identical but the voters have changed their preferences about c. Suppose that

1 voter has preferences aPbPc
and 1 voter has preferences bPcPa.


It turns out that b has the lowest Borda score. However, neither of the two voters changed his opinion about the pair {a, b}. The first (resp. second) voter prefers a (resp. b) in both cases. Only the relative position of c changed, and this was enough to turn b into a winner and a into a loser. This can be seen as a shortcoming of the Borda method. One says that the Borda method does not satisfy the independence of irrelevant alternatives. It can be shown that the Condorcet method satisfies this property.
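The violation is two lines of arithmetic; here is a quick check of both profiles of Example 12 (helper name ours).

    # Borda on the two profiles of Example 12: the winner flips from a to
    # b although no voter changed his mind on the pair {a, b}.
    def borda_winner(profile):
        scores = {}
        for count, ranking in profile:
            for rank, c in enumerate(ranking, start=1):
                scores[c] = scores.get(c, 0) + count * rank
        return min(scores, key=scores.get)   # lowest Borda score wins

    print(borda_winner([(1, ['a', 'c', 'b']), (1, ['b', 'a', 'c'])]))  # a
    print(borda_winner([(1, ['a', 'b', 'c']), (1, ['b', 'c', 'a'])]))  # b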

2.1.3 Some theoretical results

We could go on and on with examples showing that any method you can think of suffers from severe problems. But we think it is time to stop, for at least two reasons. First, it is not very constructive and, second, each example is related to a particular method; hence this approach lacks generality. A more general (and thus theoretic) approach is needed. We should find a way to answer questions like:

• Do non-manipulable methods exist?

• Is it possible for a non-separable method to satisfy unanimity?

• . . .

In another book, in preparation, we will follow such a general approach but, in the present volume, we try to present various problems arising in evaluation and decision models in an informal way and to show the need for formal methods. Nevertheless, we cannot resist the desire to present now, in an informal way, some of the most famous results of social choice theory.

Arrow’s theorem

Arrow (1963) was interested in the aggregation of rankings with ties into a ranking, possibly with ties. We will call this ranking the overall ranking. He examined the methods verifying the following properties.

Universal domain. This property implies that the aggregation method must be applicable to all cases. Whatever the rankings provided by the voters, the method must yield an overall ranking of the candidates. This property rules out methods that would impose some restrictions on the preferences of the voters.

Transitivity. The result of the aggregation must always be a ranking, possibly with ties. This implies that, if aPb and bPc in the overall ranking, then aPc in the overall ranking. Example 8 showed that the Condorcet method doesn't verify transitivity: a is preferred to b, b is preferred to c and c is preferred to a.

Unanimity. If all voters are unanimous about a pair of candidates, e.g. if all voters rank a before b, then a must be ranked before b in the overall preference. This seems quite reasonable, but Example 9 showed that some commonly used aggregation methods fail to respect unanimity. This property is often called the Pareto condition.

Independence. The relative position of two candidates in the overall ranking depends only on their relative positions in the individuals' preferences. Therefore other alternatives are considered as irrelevant with respect to that pair. Note that we observed in Example 12 that the Borda method violates the independence property. This property is often called independence of irrelevant alternatives.

Non-dictatorship. None of the voters can systematically impose his preferences on the other ones. This rules out aggregation methods such that the overall ranking is always identical to the preference ranking of a given voter. This may be seen as a minimal requirement for a democratic method.

These five conditions allow us to state Arrow's celebrated theorem.

Theorem 2.1 (Arrow) When the number of candidates is at least 3, there exists no aggregation method satisfying simultaneously the properties of universal domain, transitivity, unanimity, independence and non-dictatorship.

To a large extent, this theorem explains why we encountered so many difficulties when trying to find a satisfying aggregation method. For example, let us observe that the Borda method satisfies the universal domain, transitivity, unanimity and non-dictatorship properties. Therefore, as a consequence of Theorem 2.1, we can deduce that it cannot satisfy the independence condition. What about the Condorcet method? It satisfies the universal domain, unanimity, independence and non-dictatorship properties. Hence it cannot verify transitivity (see Example 8). Note that Arrow's theorem uses only five conditions that, in addition, are quite weak (at least at first glance). Yet, the result is powerful. If, in addition to these five conditions, we wish to find a method satisfying neutrality, separability, monotonicity, non-manipulability, ..., we face an even more puzzling problem.

Gibbard-Satterthwaite’s theorem

Gibbard (Gibbard 1973) and Satterthwaite (Satterthwaite 1975) were very interested in the (non-)manipulability of aggregation methods, especially those leading to the election of a unique candidate. Informally, a method is non-manipulable if, in no case, a voter can improve the result of the election by not reporting his true preferences. They proved the following result.

Theorem 2.2 (Gibbard-Satterthwaite) When the number of candidates is larger than two, there exists no aggregation method satisfying simultaneously the properties of universal domain, non-manipulability and non-dictatorship.

Example 4 concerning the two-stage French system can be revisited bearing in mind theorem 2.2. The French system satisfies universal domain and non-dictatorship. Therefore, it is not surprising that it is manipulable.
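
Manipulability is just as easy to exhibit for the Borda method. In the hypothetical profile below (our own toy example, not one of the book's numbered examples), the third voter, whose true preference is b > a > c > d, obtains an outcome he likes better by insincerely "burying" candidate a:

```python
def borda(ballots):
    """Borda count: with k candidates, a ballot gives k-1, k-2, ..., 0 points."""
    scores = {}
    for ballot in ballots:
        for points, c in enumerate(reversed(ballot)):
            scores[c] = scores.get(c, 0) + points
    return max(scores, key=scores.get), scores

sincere = [("a", "b", "c", "d"),
           ("a", "b", "c", "d"),
           ("b", "a", "c", "d")]   # voter 3 reports his true preference
print(borda(sincere)[0])    # 'a' wins (scores: a=8, b=7, c=3, d=0)

insincere = sincere[:2] + [("b", "c", "d", "a")]  # voter 3 buries a
print(borda(insincere)[0])  # 'b' wins (scores: b=7, a=6, c=4, d=1)
```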

Many other impossibility results can be found in the literature. But this is not the place to review them. Besides impossibility results, many characterisations are available. A characterisation of a given aggregation method is a set of properties simultaneously satisfied by only that method. These results help to understand the fundamental principles of a method and to compare different methods.

At the beginning of this chapter, we decided to focus on elections of a unique candidate. Some voting systems lead to the election of several candidates and aim towards achieving a kind of proportional representation. One might think that those systems are the solution to our problems. In fact, they are not. Those systems raise as many questions (perhaps more) as the ones we considered (Balinski and Young 1982). Furthermore, suppose that a parliament has been elected, using proportional representation. This parliament will have to vote on many different issues and, very often, only one candidate or law or project will have to be chosen.

2.2 Modelling the preferences of a voter

Let us consider the assumption that we made in Section 1: the preferences of each voter can accurately be represented by a ranking of all candidates from best to worst, without ties. We all know that this is not always realistic. For example, in some instances, there are several candidates that a voter cannot rank, just because he considers them as equivalent. Those candidates are tied. There are many other reasons to question our assumption. In some cases, a voter is not able to rank the candidates; in others, he is able to rank them but another kind of modelling of his preferences would be more accurate. In this section, we list different cases in which our initial assumption is not valid.

2.2.1 Rankings

To model the preferences of a voter, we can use a ranking without ties. This model corresponds to the assumption of Section 1. This implies that when you present a pair of candidates (a, b) to a voter, he is always able to tell if he prefers a to b or the converse. Furthermore, if he prefers a to b and b to c, he necessarily prefers a to c (transitivity of preference).

Indifference: rankings with ties

In some cases, a voter is unable to state if he prefers a to b or the converse because he thinks that both candidates are of equal value. He is indifferent between a and b. Thus, we need to model his preferences by a ranking with ties. For each pair of candidates (a, b), we have "a is preferred to b", the converse or "a is indifferent to b" (which is equivalent to "b is indifferent to a"). Preference still is transitive. Suppose that a voter prefers a to b, c and d, that he is indifferent between b and c and that, finally, he prefers a, b and c to d. We can model his preferences by a ranking with ties. A graphic representation of this model is given in Fig. 2.1, where an arrow between two candidates (e.g. a and b) means that a is preferred to b and a line between them means that a is indifferent to b. Note that, in a ranking with ties, indifference also is transitive. If a voter is indifferent between a and b and between b and c, he is also indifferent between a and c.

Figure 2.1: A complete pre-order. Arrows implied by transitivity are not represented.

Incomparability: partial rankings

It can also occur that a voter is unable to rank the candidates, not because he thinks that some of them are equivalent but because he cannot compare some of them. There can be several reasons for this.

Poor information Suppose that a voter must compare two candidates a and b about which he knows almost nothing, except that their names are a and b and that they are candidates. Such a voter cannot declare that he prefers a to b nor the converse. If he is forced to express his preferences by means of a ranking with ties, he will probably rank a and b tied rather than ranking one above the other. But this would not really reflect his preferences because he has no reasons to consider that they are equivalent. It is very likely that one is better than the other but, as he doesn't know which one, he is better off not stating any preferences about them.

Conflicting information Suppose that a voter has to compare two candidates a and b about which he knows a lot. He might be embarrassed when asked to tell which candidate he prefers because, in some respects, a is far better than b but, in other respects, b is far better than a. And he does not know how to balance the pros and cons or he does not want to do so for the moment.

Confidential information Suppose that your mother invited you and your wife for dinner. At the end of the meal, your mother says "I have never eaten such a good pie! Does NameOfYourWife prepare it as well as I do ?" No matter what your preference is, you would probably be very embarrassed to answer. And your answer is very likely to be "Well, it is difficult to say. In fact they are different. I like both but I cannot compare them." Such situations are very common in real life where people do not tell the truth, all the truth and nothing but the truth about their preferences.

Of course, this list is not exhaustive. We therefore need to introduce a new model in which voters are allowed to express incomparabilities. Hence, when comparing two candidates a and b, four situations can arise:

1. a is preferred to b,

2. b is preferred to a,

3. a is indifferent to b or

4. a and b are incomparable.

If we keep the transitivity of preference (and indifference), the structure we obtain is called a partial ranking.

Example 13. Transitivity and coffee: semiorders

Consider a voter who is indifferent between a and b as well as between b and c. If we use a ranking with ties to model his preferences, he is necessarily indifferent between a and c, because of the transitivity of indifference. Is this what we want ? We are going to borrow a small example from Luce (1956) to show that transitivity of indifference should be dropped, at least in some cases. Let us suppose that I present two cups of coffee to a voter: one cup without sugar, the other one with one grain of sugar. Let us also suppose that he likes his coffee with sugar. If I ask him which cup he prefers, he will tell me that he is indifferent (because he is not able to detect one grain of sugar). He equally dislikes both. I will then present him a cup with one grain and another with two. He will still be indifferent. Next, two grains and three grains, and so on until nine hundred and ninety-nine grains and one thousand grains. The voter will always be indifferent between the two cups that I present to him because they differ by just one grain of sugar. Because of the transitivity of indifference, he must also be indifferent between a cup without sugar and a cup with one thousand grains (2 full spoons). But of course, if I ask him which one he prefers, he will choose the cup with one thousand grains. Thus transitivity of indifference is violated. A possible objection to this is that the voter will be tired before he reaches the cup with one thousand grains. Furthermore (and this is more serious), the coffee will be cold and he hates that.

There is a structure that keeps transitivity of preference and drops it for indifference. Consequently, it can model the preferences of our coffee drinker. It is called a semiorder. For details about semiorders, see Pirlot and Vincke (1997).
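
A standard numerical representation of a semiorder (a sketch with made-up numbers, not a construction taken from Pirlot and Vincke (1997)) attaches a value to each alternative together with a discrimination threshold: strict preference requires exceeding the threshold, and indifference holds otherwise. The coffee story then reads:

```python
Q = 1  # discrimination threshold: a one-grain difference is undetectable

def prefers(x, y, q=Q):
    """Strict preference only beyond the discrimination threshold q."""
    return x > y + q

def indifferent(x, y, q=Q):
    return not prefers(x, y, q) and not prefers(y, x, q)

# Every pair of consecutive cups (n and n+1 grains) is indifferent...
print(all(indifferent(n, n + 1) for n in range(1000)))  # True
# ...yet the two ends of the chain are not: indifference is not
# transitive, while strict preference remains transitive.
print(prefers(1000, 0))  # True
```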

Example 14. Transitivity and ponies: more semiorders

Do we need semiorders only when a voter cannot distinguish between two very similar objects ? The following example, adapted from (Armstrong 1939), will give the answer. Suppose that you ask your child to choose between two presents for his birthday: a pony and a blue bicycle. As he likes both of them equally, he will say he is indifferent. Suppose now that you present him a third candidate: a red bicycle with a small bell. He will probably tell you that he prefers the red one to the blue one. "So, you prefer the red bicycle to the pony, is that right ?" you would say if you consider a transitive indifference. However, it is obvious that the child can still be indifferent between the pony and the red bicycle.

Figure 2.2: The pony vs. bicycles semiorder

Other binary relations

Rankings with or without ties, partial rankings and semiorders are all binary relations. Many other families of binary relations have been considered in the literature in order to formally model the preferences of individuals as faithfully as possible (e.g. Roubens and Vincke 1985, Abbas, Pirlot and Vincke 1996). Note that even the transitivity of strict preference can be questioned due to empirical observations (e.g. Fishburn 1988, Fishburn 1991, Tversky 1969, Sen 1997). Let us now focus on another kind of mathematical structure used to model the preferences of a voter.

2.2.2 Fuzzy relations

Fuzzy relations can be used to model preferences in at least two very different situations.

Fuzzy relations and uncertainty

When a voter is asked to express his preferences by means of a binary relation, he has to examine each pair and choose "a is preferred to b", "b is preferred to a", "a is indifferent to b" or "a and b are incomparable" (if indifference and incomparability are allowed). In fact, reality is more subtle. When facing a question like "do you prefer a to b ?", a voter might hesitate. It is easy to imagine situations where a voter would like to say "perhaps". And it is just a step further to imagine different situations where a voter would hesitate but with various degrees of confidence: almost yes but not completely sure, perhaps but more on the side of yes, perhaps, perhaps but more on the side of no, ... There can be many reasons for his hesitations.

• He does not have full knowledge about the candidates. For example, in a legislative election, a voter does not necessarily know what the position of all candidates is regarding a particular issue.

• He does have full knowledge about the candidates but not about some events that might occur in the future and affect the way he compares the candidates. For example, again in a legislative election, a voter might ideally know everything about all candidates. But he does not know if, during the forthcoming mandate, the representatives will have to vote on a particular issue. If such a vote is to occur, a voter might prefer candidate a to candidate b. In the other case, he might prefer b to a because there is just one thing that he disapproves of in the policy of b: his position about that particular issue.

• He does not fully know his preferences. Suppose that the community in which you live has decided to build a new recreational facility. There are two options: a tennis court or a playground. You have to vote. You perfectly know the two options (budget, time to completion, plan, ...). You like tennis and your children would love that playground. You will have access to both facilities under the same conditions. Can you tell which one you will choose ? What will you enjoy more ? To play tennis or to let your children play in the playground ?

These three cases can be seen as three facets of a single problem. The voter is uncertain about the final consequences of his choice.

Fuzzy relations can be used to model such preferences. The voter must still answer the above mentioned question (do you prefer a to b ?), but by numbers, no longer by yes or no. If he feels that "a is preferred to b" is definitely true, he answers 1. If he feels that "a is preferred to b" is definitely false, he answers 0. For intermediate situations, he chooses intermediate numbers. For example, perhaps could be 0.5 and almost yes, 0.9. A typical fuzzy relation on three candidates is illustrated by Fig. 2.3, where a number on the arrow between two candidates (e.g. a and b) is the answer of the voter to the question "is a preferred to b ?".

Figure 2.3: A fuzzy relation
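
In practice such a fuzzy relation is simply a mapping from ordered pairs of candidates to degrees in [0, 1]. The sketch below uses illustrative degrees (the assignment of the numbers of Fig. 2.3 to particular arcs is our own guess) and shows how a crisp relation can be recovered by keeping only the assertions whose degree reaches a chosen cut level (a standard device in fuzzy preference modelling, not one advocated by this chapter):

```python
# Degree of the assertion "x is preferred to y", one value per ordered pair.
# Illustrative values loosely based on Fig. 2.3.
fuzzy = {("a", "b"): 0.6, ("b", "a"): 0.4,
         ("a", "c"): 1.0, ("c", "a"): 0.0,
         ("b", "c"): 0.8, ("c", "b"): 0.3}

def cut(relation, alpha):
    """Crisp relation keeping the pairs asserted with degree >= alpha."""
    return sorted(pair for pair, degree in relation.items() if degree >= alpha)

print(cut(fuzzy, 0.5))  # [('a','b'), ('a','c'), ('b','c')]: a full ranking
print(cut(fuzzy, 0.7))  # [('a','c'), ('b','c')]: a and b become incomparable
```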

Note that, in some cases, a probability distribution on the possible consequences is assumed to exist. In such cases, the problem faced by the voter is no longer uncertainty but risk, and probabilities of preference might be assigned to each pair.

Fuzzy relations and preference intensity

In some cases, when a voter is asked to tell if he prefers a to b, he will tend to express faint differences in his judgement, not because he is uncertain about his judgement, but because the concept of preference is vague and not well defined. For example, a voter might say "I definitely prefer a to b but not as much as I prefer c to d". This is due to the fact that preference is not a clear-cut concept. We might then model his preferences by a fuzzy relation and choose 0.5 for (a, b) and 0.8 for (c, d). A value of 0 would correspond to no preference.

Note that in many cases, uncertainty and vagueness are probably simultaneously present. For a thorough review of fuzzy preference modelling, see (Perny and Roubens 1998).

2.2.3 Other models

Many other models can be conceived or have been described in the literature. An important one is the utilitarian one: a voter assigns to each candidate a number (the utility of the candidate). The position of a candidate with respect to any other candidate is a function only of the utilities of the two candidates. If the utilities of a and b are respectively 50 and 40, the implication is that a is preferred to b. In addition, if the utilities of c and d are respectively 30 and 10, it implies that the preference between c and d is twice as large as the preference between a and b.
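
In code, the utilitarian model reduces every comparison to arithmetic on the assigned numbers; a toy illustration with the utilities just quoted:

```python
utility = {"a": 50, "b": 40, "c": 30, "d": 10}

# Ordinal comparison: a is preferred to b because u(a) > u(b).
print(utility["a"] > utility["b"])  # True

# Cardinal comparison: the preference of c over d is twice as large as
# that of a over b, since (30 - 10) = 2 * (50 - 40).
print((utility["c"] - utility["d"]) / (utility["a"] - utility["b"]))  # 2.0
```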

Another important model is used in approval voting (Brams and Fishburn 1982). In this voting system, every voter votes for as many candidates as he wants or approves. Consequently, the preferences of a voter are modelled by a partition of the set of candidates into two subsets: a subset of approved candidates and a subset of disapproved candidates. Approval voting received a lot of attention during the last twenty years and has been adopted by a number of committees.
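
Aggregation is then elementary: elect the candidate approved by the largest number of voters. A minimal sketch with an invented four-voter profile:

```python
from collections import Counter

# Each ballot is just the set of candidates the voter approves of.
ballots = [{"a", "b"}, {"b"}, {"b", "c"}, {"a"}]

tally = Counter(c for ballot in ballots for c in ballot)
print(tally.most_common())         # [('b', 3), ('a', 2), ('c', 1)]
print(tally.most_common(1)[0][0])  # 'b' is elected
```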

We will not continue our list of preference models any further. Our aim was just to give a small overview of the many problems that can arise when trying to model the preferences of a voter. But there is an important issue that we still must address. We encountered many problems in Section 2.1, where we were using complete orders to model the voters' preferences. We then examined alternative models. Is it easier to aggregate individual preferences modelled by means of complete pre-orders, semiorders, fuzzy relations, ... ? Unfortunately, the answer is no. Many examples, similar to those in Section 1, can be built to demonstrate this (Sen 1986, Salles, Barrett and Pattanaik 1992).

2.3 The voting process

Until now, we considered only modelling the preferences of a voter and aggregating the preferences of several voters. But voting is much more than that. Here are a few points that are included in the voting process, even if they are often left aside in the literature.

2.3.1 Definition of the set of candidates

Who is going to define the candidates or alternatives that will be submitted to a vote ? All the voters, some of them or one of them ? In some cases, e.g. presidential elections, the candidates are voters that become candidates on a voluntary basis. Nevertheless, there are often some rules: not everyone can be a candidate. Who should fix these rules and how ? There is an even more fundamental question: who should decide that voting should occur, on what issue, according to which rules ? All these questions received different answers in different countries and committees. This may indicate that they are far from trivial.

Let us now be more pragmatic. The board of directors of a company asks the executive committee to prepare a report on the future investment strategies. A vote on the proposed strategies will be held during the next board of directors meeting. How should the executive committee prepare its report ? Should they include all strategies, even infeasible ones ? If infeasible ones are to be avoided, who should decide that they are infeasible ? To find all feasible strategies might be prohibitively resource and time consuming. And one can never be sure that all feasible strategies have been explored. There is no systematic way, no formal method to do that. Creativity and imagination are needed during this process.

Finally, suppose that the executive committee decides to explore only some strategies. A more or less arbitrary selection needs to be made. Even if they do make this selection in a perfectly honest way, it can have far reaching consequences on the outcome of the process. Remember example 11 in which we showed that, for some aggregation methods, the relative ranking of two candidates depends on the presence (or absence) of some other candidates. Furthermore, some studies show that an individual can prefer a to b or b to a depending on the presence or absence of some other candidate (Sen 1997).

2.3.2 Definition of the set of the voters

Who is going to vote ? As in the previous subsection, let us look at different democracies, past or present. Citizens, rich people, noble people, men, men and women, everyone, white men, experts who have some knowledge about the discussed problem, one representative for each faction, a number of representatives proportional to the size of that faction, ... There is no universal answer.

2.3.3 Choice of the aggregation method

Even the choice of the aggregation method can be considered as part of the voting process for, in some cases, the aggregation method is at least as important as the result of the vote. Consider two countries, A and B: A is ruled by a dictator, B is a democracy. Suppose that each time a policy is chosen by voting in B, the dictator of A applies the same policy in his country, without voting. Hence, all governmental decisions are the same in A and B. The only difference is that the people in A do not vote; their benevolent dictator decides alone. In what country would you prefer to live ? I guess you would choose B, unless you are the dictator. And you would probably choose B even if the decisions taken in B were a little bit worse than the decisions taken in A. What we value in B is freedom of choice. Some references or more details on this topic can be found in (Sen 1997, Suzumura 1999).

2.4 Social choice and multiple criteria decision support

2.4.1 Analogies

There is an interesting analogy between voting and multiple criteria decision support. Replace criteria by voters, alternatives by candidates and you get it. Let us be more explicit. In multiple criteria decision support, most papers consider an entity, called the decision-maker, that wants to choose an alternative from a set of available alternatives. The decision-maker is often assumed to be an individual, a person. To make his choice, the decision-maker takes several viewpoints, called criteria, into account. These criteria are often conflicting, i.e. according to a criterion, a given alternative is the best one while, according to another criterion, other alternatives are better.

In a large part of the literature on voting, there is an entity, called group or society, that has to choose a candidate from a set of candidates. This entity consists of individuals and, for some reasons that can vary largely in different groups, the choice made by this entity must reflect in some way the opinion of the individuals. And, of course, the individuals often have conflicting views about the candidates. In other words, the preferences of an individual play the same role, in social choice, as the preferences along a single viewpoint or criterion in multiple criteria decision support. The collective or social preferences, in social choice theory, and the global or multiple criteria preferences, in multiple criteria decision support, can be compared in the same way.

The main interest of this analogy lies in the fact that voting has been studied for a long time. The seminal works by Borda (1781), Condorcet (1785) and Arrow (1963) have led to an important stream of research in the 20th century. Hence we have a huge amount of results on voting at our disposal for use in multiple criteria decision support. Besides, this similarity has widely been used (see e.g. Arrow and Raynaud 1986, Vansnick 1986).

In this chapter, we only discussed elections in which only one candidate must be chosen (single-seat constituencies, prime ministers or presidents). However, it is often the case that several candidates must be chosen. For example, in Belgium and Germany, in each constituency, several representatives are elected so as to achieve a proportional representation. A committee that must select projects from a list often selects several ones, according to the available resources. In multiple criteria decision support, such cases are common. An investor usually invests in a portfolio of stocks. A human resources manager chooses amongst the candidates those that will form an efficient team, etc.

In fact, the comparison can be extended to the processes of voting and decision-making. In multiple criteria decision support, the decision process is much broader than just the extraction, by some aggregation method, of the best alternative from a performance tableau.

The very beginning of the process, the problem definition, is a crucial step. When a decision-maker enters a decision process, he has no clearly defined problem. He just feels unsatisfied with his current situation. He then tries to structure his view of the situation, to put labels on different entities, to look for relationships between entities, etc. Finally he obtains a "problem", as one can find in books. It is a description, in formal language or not, of the current situation. It usually contains a description of the reasons for which that situation is not satisfying and it contains an implicit description of the potential solutions to the problem. That is, the problem statement contains information that allows one to recognise whether a given action or course of action is a potential solution or not. The problem statement must not be too broad, otherwise anything can be a solution and the decision-maker is not helped. On the contrary, if the statement is too narrow, some actions are not recognised as potential solutions even if they would be good ones.

Some authors, mainly in the United Kingdom, have developed methods to help decision-makers to better structure their problem (Rosenhead 1989, Daellenbach 1994).

When the problem has been stated, the decision-maker has a problem, but no solution. He must construct the set of alternatives, like the set of candidates in social choice. Brainstorming and other techniques promoting and stimulating creativity have been developed to support this step.

The criteria, like the voters, are not given in a decision process. The decision-maker needs to identify all the viewpoints that are relevant with respect to his problem. He then must define a set of criteria that reflect all relevant viewpoints and that fulfil some conditions. There must not be several criteria reflecting the same viewpoint. All criteria should be independent, except if the aggregation method to be used thereafter allows dependence between criteria. Depending on the aggregation method, the scales corresponding to the criteria must have some properties. And so on. See e.g. Roy (1996) and Keeney and Raiffa (1976).

Last but not least, the aggregation method itself must be chosen by the analyst and/or the decision-maker. It is hard to imagine how an aggregation procedure could be scientifically proven to be the best one. The decision-maker must thus make a choice. He should choose the one that satisfies some properties he judges important, the one he can understand, the one he trusts.

2.5 Conclusions

In this chapter, we have shown that the operation of voting is far from simple. In the first section, using small examples describing very simple situations, we found that intuition and common sense are not sufficient to avoid the many traps that await us when using aggregation procedures. In fact, in this domain, common sense is of very little help. We also presented two theoretical results indicating that there is no hope of finding a perfect voting procedure. Therefore, if we still want to use a voting procedure (this seems hardly avoidable), we must accept to use an imperfect one. But this does not mean that we can use any procedure in any circumstance and any way. The flaws of a particular procedure are probably less damaging in some instances than in others. Some features of a voting procedure may be highly desirable in a given context while not so important in another one. So, for each voting context, we have to choose the procedure that best matches our needs. And, when we have made this choice, we must be aware that this match is not perfect, that we must use the procedure in such a way that the risk of facing a problematic situation is kept as low as possible.

In Section 2, we found that even the inputs of voting procedures (the preferences of the voters) are not simple things. Many different models for preferences exist and can be used in aggregation procedures. This shows that what is usually considered as data is not really data. When we feed our aggregation procedures with preferences, these are not given. They are constructed in some more or less arbitrary way. The choice of a particular model (ranking with ties, fuzzy relations, ...) is itself arbitrary. Nothing in the "problem" tells us what model to use.

Finally, in Section 3, we showed that the voting process itself is highly complex.

Voting procedures are decision models, just like student grades, indicators, cost-benefit analysis, multiple criteria decision support (this has already been discussed in Section 4), ... They are decision models devoted to the special case where a decision must be taken by a group of voters and are mainly concerned with the case of a finite and small set of alternatives. This peculiarity doesn't make voting procedures very different from other decision and evaluation models. As you will see in the following chapters, most decision models suffer from the same kind of problems that we have met in this chapter: there is no perfect aggregation procedure; the data are not data, they are imperfect and arbitrary models; the decision models are too narrow, they do not take into account the fact that decision support occurs in a human process (the decision making process) and in a complex environment.


3 BUILDING AND AGGREGATING EVALUATIONS: THE EXAMPLE OF GRADING STUDENTS

3.1 Introduction

3.1.1 Motivation

In chapter 2, we tried to show that "voting", although being a familiar activity to almost everyone, raises many important and difficult questions that are closely connected to the subject of this book. Our main objective in this chapter is similar. We all share the (more or less pleasant) experience of having received "grades" in order to evaluate our academic performances. The authors of this book spend part of their time evaluating the performance of students through grading several kinds of work, an activity that you may also be familiar with. The purpose of this chapter is to build upon this shared experience. This will allow us to discuss, based on simple and familiar situations, what is meant by "evaluating a performance" and "aggregating evaluations", both activities being central to most evaluation and decision models. Although the entire chapter is based on the example of grading students, it should be stressed that "grades" are often used in contexts unrelated to the evaluation of the performance of students: employees are often graded by their employers, products are routinely tested and graded by consumer organisations, experts are used to rate the feasibility or the riskiness of projects, etc. The findings of this chapter are therefore not limited to the realm of a classroom.

As with voting systems, there is much variance across countries in the way "education" is organised. Curricula, grading scales, rules for aggregating grades and granting degrees are seldom similar from place to place (for information on the systems used in the European Union, see www.eurydice.org).

This diversity is even increased by the fact that each "instructor" (a word that we shall use to mean the person in charge of evaluating students) has generally developed his own policy and habits. The authors of this book have studied in four different European countries (Belgium, France, Greece and Italy) and obtained degrees in different disciplines (Maths, Operational Research, Computer Science, Geology, Management, Physics) and in different Universities. We were not overly astonished to discover that the rules that governed the way our performances were assessed were quite different. We were perhaps more surprised to realise that although we all teach similar courses in comparable institutions, our "grading policies" were quite different, even after having accounted for the fact that these policies are partly contingent upon the rules governing our respective institutions. Such diversity might indicate that evaluating students is an activity that is perhaps more complex than it appears at first sight.

3.1.2 Evaluating students in Universities

We shall restrict our attention in this chapter to education programmes with which we are familiar. Our general framework will be that of a programme at University level in which students have to take a number of "courses" or "credits". In each course the performance of students is graded. These grades are then collected and form the basis of a decision to be taken about each student. Depending on the programme, this decision may take various forms, e.g. success or failure, success or failure with possible additional information such as distinctions, ranks or average grades, success or failure with the possibility of a deferred decision (e.g. the degree is not granted immediately but there is still a possibility of obtaining it). Quite often the various grades are "summarised", "amalgamated", we shall say "aggregated", in some way before a decision is taken.

In what follows, we shall implicitly have in mind the type of programmes in which we teach (Mathematics, Computer Science, Operational Research, Engineering) that are centred around disciplines which, at least at first sight, seem to raise fewer "evaluation problems" than if we were concerned with, say, Philosophy, Music or Sports.

Dealing only with "technically-oriented" programmes at University level will clearly not allow us to cover the immense literature that has been developed in Education Science on the evaluation of the performance of students. For good accounts in English, we refer to Airaisian (1991), Davis (1993), Lindheim, Morris and Fitz-Gibbon (1987), McLean and Lockwood (1996), Moom (1997) and Speck (1998). Note that in Continental Europe, the Piagetian influence, different institutional constraints and the popularity of the classic book by Pieron (1963) have led to a somewhat different school of thought, see Bonboir (1972), Cardinet (1986), de Ketele (1982), de Landsheere (1980), Merle (1996) and Noizet and Caverini (1978). As we shall see, this will however allow us to raise several important issues concerning the evaluation and the aggregation of performances. Two types of questions prove to be central for our purposes:

• how to evaluate the performance of students in a given "course", what is the meaning of the resulting "grades" and how to interpret them?

• how to combine the various grades obtained by a student in order to arrive at an overall evaluation of his academic performance?

These two sets of questions structure this chapter into sections.

3.2 Grading students in a given course

Most of you have probably been in the situation of an "instructor" having to attribute grades to students. Although this is clearly a very important task, many instructors share the view that this is far from being the easiest and most pleasant part of their jobs. We shall try here to give some hints on the process that leads to the attribution of a grade as well as on some of its pitfalls and difficulties.

3.2.1 What is a grade?

We shall understand a grade as an evaluation of the performance of a student in a given course, i.e. an indication of the level to which a student has fulfilled the objectives of the course.

This very general definition calls for some remarks.

1. A grade should always be interpreted in connection with the objectives of a course. Although it may appear obvious, this implies a precise statement of the objectives of the course in the syllabus, a condition that is unfortunately not always perfectly met.

2. All grades do not have a similar function. Whereas usually the final grade of a course in Universities mainly has a "certification" role, intermediate grades, on which the final grade may be partly based, have a more complex role that is often both "certificative" and "formative", e.g. the result of a mid-term exam is included in the final grade but is also meant to be a signal to a student indicating his strengths and weaknesses.

3. Although this is less obvious in Universities than in elementary schools, it should be noticed that grades are not only a signal sent by the instructor to each of his students. They have many other potentially important "users": other students using them to evaluate their position in the class, other instructors judging your severity and/or performance, parents watching over their child, administrations evaluating the performance of programmes, employers looking for all possible information on an applicant for a job.

Thus, it appears that a grade is a complex "object" with multiple functions (see Chatel 1994, Laska and Juarez 1992, Lysne 1984, McLean and Lockwood 1996). Interpreting it necessarily calls for a study of the process that leads to its attribution.

3.2.2 The grading process

What is graded and how?

The types of work that are graded, the scale used for grading and the way of amalgamating these grades may vary in significant ways for similar types of courses.

1. The scale that is used for grading students is usually imposed by the programme. Numerical scales are often used in Continental Europe with varying bounds and orientations: 0-20 (in France or Belgium), 0-30 (in Italy), 6-1 (in Germany and parts of Switzerland), 0-100 (in some Universities). American and Asian institutions often use a letter scale, e.g. E to A or F to A. Obviously we would not want to conclude from this that Italian instructors have come to develop much more sensitive instruments for evaluating performance than German ones or that the evaluation process is in general more "precise" in Europe than it is in the USA. Most of us would agree that the choice of a particular scale is mainly conventional. It should however be noted that since grades are often aggregated at some point, such choices might not be totally without consequences. We shall come back to that point in section 3.3.

2. Some courses are evaluated on the basis of a single exam. But there are many possible types of exams. They may be written or oral; they may be open-book or closed-book. Their duration may vary (45 minute exams are not uncommon in some countries whereas they may last up to 8 hours in some French programmes). Their content for similar courses may vary from multiple choice questions to exercises, case-studies or essays.

3. In most courses the final grade is based on grades attributed to multiple tests. The number and type of work may vary a lot: final exam, mid-term exam, exercises, case-studies or even "class participation". Furthermore, the way these various grades are aggregated is diverse: simple weighted average, grade only based on exams with group work (e.g. case-studies or exercises) counting as a bonus, imposition of a minimal grade at the final exam, etc. (an overview of grading policies and practices in the USA can be found in Riley, Checca, Singer and Worthington 1994).

4. Some instructors use "raw" grades. For reasons to be explained later, others modify the "raw" grades in some way before aggregating and/or releasing them, e.g. standardising them.

Preparing and grading a written exam

Within a given institution, suppose that you have to prepare and grade a written, closed-book exam. We shall take the example of an exam for an "Introduction to Operational Research (OR)" course, including Linear Programming (LP), Integer Programming and Network models, with the aim of giving students a basic understanding of the modelling process in OR and an elementary mastering of some basic techniques (Simplex Algorithm, Branch and Bound, elementary Network Algorithms). Many different choices interfere with such a task.

1. Preparing a subject. All instructors know that preparing the subject of an exam is a difficult and time consuming task. Is the subject of adequate difficulty? Does it contain enough questions to cover all parts of the programme? Do all the questions clearly relate to one or several of the announced objectives of the course? Will it make it possible to discriminate between students? Is there a good balance between modelling and computational skills? What should the respective parts of closed vs. open questions be?

2. Preparing a marking scale. The preparation of the marking scale for a given subject is also of utmost importance. A "nice-looking" subject might be impractical in view of the associated marking scale. Will the marking scale include a bonus for work showing good communication skills and/or will misspellings be penalised? How to deal with computational errors? How to deal with computational errors that lead to inconsistent results? How to deal with computational errors influencing the answers to several questions? How to judge an LP model in which the decision variables are incompletely defined? How to judge a model that is only partially correct? How to judge a model which is inconsistent from the point of view of units? Although much expertise and/or many "rules of thumb" are involved in the preparation of a good subject and its associated marking scale, we are aware of no instructor who has never had to revise his judgement after correcting some work and realising his severity, or who has never had to correct work again after discovering some frequently given half-correct answers that were unanticipated in the marking scale.

3. Grading. A grade evaluates the performance of a student in completing the tasks implied by the subject of the exam and, hopefully, will give an indication of the extent to which a student has met the various objectives of the course (in general an exam is far from dealing with all the aspects that have been dealt with during the course). Although this is debatable, such an evaluation is often thought of as a "measure" of performance. For this kind of "measure", the psychometric literature (see Ebel and Frisbie 1991, Kerlinger 1986, Popham 1981) has traditionally developed at least two desirable criteria. A measure should be:

• reliable, i.e. give similar results when applied several times in similar conditions,

• valid, i.e. measure what was intended to be measured and only that.

Extensive research in Education Science has found that the process of giving grades to students is seldom perfect in these respects (a basic reference remains the classic book of Pieron (1963); Airaisian (1991) and Merle (1996) are good surveys of recent findings). We briefly recall here some of the difficulties that were uncovered.

The crudest reliability test that can be envisaged is to give similar works to correct to several instructors and to record whether or not these works are graded similarly. Such experiments were conducted extensively in various disciplines and at various levels. Not overly surprisingly, most experiments have shown that even in the more "technical" disciplines (Maths, Physics, Grammar), in which it is possible to devise rather detailed marking scales, there is much difference between correctors. On average the difference between the more generous and the more severe correctors on Maths work can be as high as 2 points on a 0-20 scale. Even more strikingly, on some work in Maths the difference can be as high as 9 points on a 0-20 scale (see Pieron 1963).

In other experiments the same correctors are asked to correct a work that they have already corrected earlier. These auto-reliability tests give similar results since in more than 50% of the cases the second grade is "significantly" different from the first one. Although few experiments have been conducted with oral exams, it seems fair to suppose that they are no more reliable than written ones.

Other experiments have shown that many extraneous factors may interfere in the process of grading a paper and therefore question the validity of grades. Instructors accustomed to grading papers will not be surprised to note that:

• grades usually show much autocorrelation: similar papers handed in by a usually "good" student and by a usually "uninterested" student are likely not to receive similar grades,

• the order in which papers are corrected greatly influences the grades. Near the end of a correction task, most correctors are less generous and tend to give grades with a higher variance,

• "anchoring effects" are pervasive: it is always better to be corrected after a remarkably poor work than after a perfect one,

• misspellings and poor hand-writing prove to have a non-negligible influence on the grades, even when the instructor declares not to take these effects into account or is instructed not to.

4. The influence of correction habits. Experience shows that "correction habits" tend to vary from one instructor to another. Some of them will tend to give an equal percentage of all grades and will tend to use the whole range of the scale. Some will systematically avoid the extremes of the range and the distribution of their marks will have little variability. Others will tend to give only extreme marks, e.g. arguing that either the basic concepts are understood or they are not. Some are used to giving the lowest possible grade after having spotted a mistake which, in their minds, implies that "nothing has been understood" (e.g. proposing a "non linear LP model"). The distribution of grades for similar papers will tend to be highly different according to the corrector. In order to cope with such effects, some instructors will tend to standardise the grades before releasing them (the so-called "z-scores"), others will tend to equalise average grades from term to term and/or use a more or less ad hoc procedure.
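
The "z-score" transformation just mentioned is purely mechanical: subtract the class mean and divide by the class standard deviation, which forces the distributions of two correctors onto a common location and spread. A minimal sketch with invented grades:

```python
from statistics import mean, pstdev

def z_scores(grades):
    """Centre the grades on 0 and rescale them to unit standard deviation."""
    m, s = mean(grades), pstdev(grades)
    return [round((g - m) / s, 2) for g in grades]

generous = [14, 15, 16, 17]  # high mean, little variability
severe = [4, 8, 12, 16]      # low mean, wide spread

print(z_scores(generous))  # [-1.34, -0.45, 0.45, 1.34]
print(z_scores(severe))    # [-1.34, -0.45, 0.45, 1.34]
# After standardisation only the relative positions of the students remain;
# any information about the "absolute" level of the grades is lost.
```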

Defining a grading policy

A syllabus usually contains a section entitled "grading policy". Although instructors do not generally consider it as the most important part of their syllabus, they are aware that it is probably the part that is read first and most attentively by all students. Besides useful considerations on "ethics", this section usually describes in detail the process that will lead to the attribution of the grades for the course. On top of describing the type of work that will be graded, the nature of exams and the way the various grades will contribute to the determination of the final grade, it usually also contains many "details" that may prove important in order to understand and interpret grades. Among these "details", let us mention:

• the type of preparation and correction of the exams: who will prepare the subject of the exam (the instructor or an outside evaluator)? Will the work be corrected once or more than once (in some Universities all exams are corrected twice)? Will the names of the students be kept secret?

• the possibility of revising a grade: are there formal procedures allowing the students to have their grades reconsidered? Do the students have the possibility of asking for an additional correction? Do the students have the possibility of taking the same course at several moments in the academic year? What are the rules for students who cannot take the exam (e.g. because they are sick)?

• the policy towards cheating and other dishonest behaviour (exclusion from the programme, attribution of the lowest possible grade for the course, attribution of the lowest possible grade for the exam),

• the policy towards late assignments (no late assignment will be graded, minus x points per hour or day).

Determining final grades

The process of the determination of the final grades for a given course can hardly be understood without a clear knowledge of the requirements of the programme in order to obtain the degree. In some programmes students are only required to obtain a "satisfactory grade" (it may or may not correspond to the "middle" of the grading scale that is used) for all courses. In others, an "average grade" is computed and this average grade must be over a given limit to obtain the degree. Some programmes attribute different kinds of degrees through the use of "distinctions". Some courses (e.g. "core courses") are sometimes treated apart; a "dissertation" may have to be completed.

The freedom of an instructor in arranging his own grading policy is highly conditioned by this environment. A grade can hardly be interpreted without a clear knowledge of these rules (note that this sometimes creates serious problems in institutions allowing students pertaining to different programmes with different sets of rules to attend the same courses). Within a well defined set of rules, however, many degrees of freedom remain. We examine some of them below.

Weights We mentioned that the final grade for a course was often the combination of several grades obtained throughout the course: mid-term exam, final exam, case-studies, dissertation, etc. The usual way to proceed is to give a (numerical) weight to each of the works entering into the final grade and to compute a weighted average, more important works receiving higher weights. Although this process is simple and almost universally used, it raises some difficulties that we shall examine in section 3.3. Let us simply mention here that the interpretation of "weights" in such a formula is not obvious. Most instructors would tend to compensate for a very difficult mid-term exam (weight 30%) by preparing a comparatively easier final exam (weight 70%). However, if the final exam is so easy that most students obtain very good grades, the differences in the final grades will be attributable almost exclusively to the mid-term exam, although it has a much lower weight than the final exam; a small numerical illustration follows this paragraph. The same is true if the final grade combines an exam with a dissertation. Since the variance of the grades is likely to be much lower for the dissertation than for the exam, the former may only marginally contribute towards explaining differences in final grades, independently of the weighting scheme. In order to avoid such difficulties, some instructors standardise grades before averaging them. Although this might be desirable in some situations, it is clear that the more or less arbitrary choice of a particular measure of dispersion (why use the standard deviation and not the interquartile range? should we exclude outliers?) may have a crucial influence on the final grades. Furthermore, the manipulation of such "distorted grades" seriously complicates the positioning of students with respect to a "minimal passing grade" since their use amounts to abandoning any idea of "absolute" evaluation in the grades.
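
Here is the announced illustration (with invented grades on a 0-20 scale): the final exam carries 70% of the weight, but since everybody scores almost the same on it, the ranking of the weighted averages is driven entirely by the 30%-weighted mid-term.

```python
midterm = {"Ann": 6, "Bob": 11, "Carl": 16}   # weight 0.3, widely spread
final   = {"Ann": 18, "Bob": 17, "Carl": 18}  # weight 0.7, nearly identical

for student in midterm:
    grade = 0.3 * midterm[student] + 0.7 * final[student]
    print(student, round(grade, 1))
# Ann 14.4, Bob 15.2, Carl 17.4: the final order is exactly the mid-term
# order, despite the final exam's much higher weight.
```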

Passing a course In some institutions, you may either "pass" or "fail" a course and the grades obtained in several courses are not averaged. An essential problem for the instructor is then to determine which students are above the "minimal passing grade". When the final grade is based on a single exam, we have seen that it is not easy to build a marking scale. It is even more difficult to conceive a marking scale in connection with what is usually the minimal passing grade according to the culture of the institution. The question boils down to deciding what amount of the programme a student should master in order to obtain a passing grade, given that an exam only gives partial information about the amount of knowledge of the student.

The problem is clearly even more difficult when the final grade results from the aggregation of several grades. The use of weighted averages may give undesirable results since, for example, an excellent group case-study may compensate for a very poor exam. Similarly, weighted averages do not take the progression of the student during the course into account.

It should be noted that the problem of positioning students with respect to a minimal passing grade is more or less identical to positioning them with respect to any other "special grades", e.g. the minimal grade for being able to obtain a "distinction", to be cited on the "Dean's honour list" or the "Academic Honour Roll".

3.2.3 Interpreting grades

Grades from other institutions

In view of the complexity of the process that leads to the attribution of a grade, it should not be a surprise that most instructors find it very difficult to interpret grades obtained in another institution. Consider a student joining your programme after having obtained a first degree at another University. Arguing that he has already passed a course in OR with 14 on a 0-20 scale, he wants to have the opportunity to be dispensed from your class. If you are not aware of the grading policy of the instructor and of the culture and rules of the previous University this student attended, knowing that he obtained 14 offers you little information. The knowledge of his rank in the class may be more useful: if he obtained one of the highest grades, this may be a good indication that he has mastered the contents of the course sufficiently. However, if you were to know that the lowest grade was 13 and that 14 is the highest, you would perhaps be tempted to conclude that the difference between 13 and 14 may not be very significant and/or that you should not trust grades that are so generous and exhibit so little variability.

Grades from colleagues

Being able to interpret the grade that a student obtained in your own institution is quite important, at least as soon as some averaging of the grades is performed in order to decide on the attribution of a degree. This task is clearly easier than the preceding one: the grades that are to be interpreted here have been obtained in a similar environment. However, we would like to argue that this task is not an easy one either. First, it should be observed that there is no clear implication in having obtained a similar grade in two different courses. Is it possible or meaningful to assert that a student is "equally good" in Maths and in Literature? Is it possible to assert that, given the level of the programme, he has satisfied to a greater extent the objectives of the Maths course than the objectives of the Literature course? Our experience as instructors would lead us to answer negatively to such questions, even when talking of programmes in which all objectives are very clearly stated. Secondly, in section 3.2.2 we mentioned that, even within fixed institutional constraints, each instructor still has many degrees of freedom to choose his grading policy. Unless there is a lot of co-ordination between colleagues, they may apply quite different rules, e.g. in dealing with late assignments or in the nature and number of exams. This seriously complicates the interpretation of the profile of grades obtained by a student.

Interpreting your own grades

The numerical scales used for grades throughout Europe tend to give the impression that grades are "real measures" and that, consequently, these numbers may be manipulated like any other numbers. There are many possible kinds of "measure" and having a numerical scale is no guarantee that the numbers on that scale may be manipulated in all possible ways. In fact, before manipulating numbers supposedly resulting from "measurements", it is always important to try to figure out on which type of scale they have been "measured". Let us notice that this is true even in Physics. Saying that Mr. X weighs twice as much as Mr. Y "makes sense" because this assertion is true whether mass is measured in pounds or in kilograms. Saying that the average temperature in city A is twice as high as the average temperature in city B may be true but makes little sense since the truth value of this assertion clearly depends on whether temperature is measured using the Celsius or the Fahrenheit scale.
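
This invariance test is easy to run mechanically. The sketch below (our own illustration) converts average temperatures from Celsius to Fahrenheit and shows that "A is three times as high as B" does not survive the change of scale, whereas a comparison of differences does:

```python
def to_fahrenheit(celsius):
    # An affine change of scale: new unit (factor 1.8) and new origin (+32).
    return 1.8 * celsius + 32

a, b, c = 30.0, 10.0, 0.0  # three average temperatures in degrees Celsius

print(a / b)                                # 3.0: "three times as high"...
print(to_fahrenheit(a) / to_fahrenheit(b))  # 1.72: ...not preserved

# Ratios of differences, by contrast, survive any affine rescaling:
print((a - b) / (b - c))                    # 2.0
print((to_fahrenheit(a) - to_fahrenheit(b)) /
      (to_fahrenheit(b) - to_fahrenheit(c)))  # 2.0
```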

The highest point on the scale An important feature of all grading scales is that they are bounded above. It should be clear that the numerical value attributed to the highest point on the scale is somewhat arbitrary and conventional. No loss of information would be incurred using a 0-100 or a 0-10 scale instead of a 0-20 one. At best it seems that grades should be considered as expressed on a ratio scale, i.e. a scale in which the unit of measurement is arbitrary (such scales are frequent in Physics, e.g. length can be measured in meters or inches without loss of information).

If grades can be considered as measured on a ratio scale, it should be recognised that this ratio scale is somewhat awkward because it is bounded above. Unless you admit that knowledge is bounded or, more realistically, that “perfectly fulfilling the objectives of a course” makes clear sense, problems might appear at the upper bound of the scale. Consider two excellent, but not necessarily “equally excellent”, students. They cannot obtain more than the perfect grade 20/20. Equality of grades at the top of the scale (or near the top, depending on grading habits) does not necessarily imply equality in performance (after a marking scale is devised, it is not exceptional that we would like to give some students more than the maximal grade, e.g. because some bonus is added for particularly clever answers, whereas the computer system of most Universities would definitely reject such grades!).

The lowest point on the scale It should be clear that the numerical value that is attributed to the lowest point of the scale is no less arbitrary and conventional than was the case for the highest point. There is nothing easier than to transform grades expressed on a 0-20 scale into grades expressed on a 100-120 scale and this involves no loss of information. Hence it would seem that a 0-20 scale might be better viewed as an interval scale, i.e. a scale in which both the origin and the unit of measurement are arbitrary (think of temperature scales in Celsius or Fahrenheit). An interval scale allows comparisons of “differences in performance”; it makes sense to assert that the difference between 0 and 10 is similar to the difference between 10 and 20 or that the difference between 8 and 10 is twice as large as the difference between 10 and 11, since changing the unit and origin of measurement clearly preserves such comparisons.

Let us notice that using a scale that is bounded below is also problematic. In some institutions the lowest grade is reserved for students who did not take the exam. Clearly this does not imply that these students are “equally ignorant”. Even when the lowest grade can be obtained by students having taken the exam, some ambiguity remains. “Knowing nothing”, i.e. having completely failed to meet any of the objectives of the course, is difficult to define and is certainly contingent upon the level of the course (this is all the more true since in many institutions the lowest grade is also granted to students having cheated during the exam, with obviously no guarantee that they are “equally ignorant”). To a large extent “knowing nothing” — in the context of a course — is somewhat as arbitrary as “knowing everything”. Therefore, if grades are expressed on interval scales, care should be taken when manipulating grades close to the bounds of the scale.

In between We already mentioned that on an interval scale, it makes sense to compare differences in grades. The authors of this book (even if their students should know that they spend a lot of time and energy grading them!) do not consider that their own grades always allow for such comparisons. First, we already mentioned that a lot of care should be taken in manipulating grades that are “close” to the bounds. Second, in between these bounds, some grades are very particular in the sense that they play a special role in the attribution of the degree. Let us consider a programme in which all grades must be above a minimal passing grade, say 10 on a 0-20 scale, in order to obtain the degree. If it is clear that an exam is well below the passing grade, few instructors will claim that there is a highly significant difference between 4/20 and 5/20. Although the latter exam seems slightly better than the former, the essential idea is that they are both well below the minimal passing grade. On the contrary, the gap between 9/20 and 10/20 may be much more important since, before putting a grade just below the passing grade, most instructors usually make sure that they will have good arguments in case of a dispute (some systematically avoid using grades just below the minimal passing grade). In some programmes, not only the minimal passing grade has a special role: some grades may correspond to different possible levels of distinction, others may correspond to a minimal acceptable level below which there is no possibility of compensation with grades obtained in other courses. In between these “special grades” it seems that the reliable information conveyed by grades is mainly ordinal. Some authors have been quite radical in emphasising this point, e.g. Cross (1995) stating that: “[...] we contend that the difficulty of nearly all academic tests is arbitrary and regardless of the scoring method, they provide nothing more than ranking information” (but see French 1993, Vassiloglou and French 1982). At first sight this would seem to be a strong argument in favour of the letter system in use in most American Universities, which only distinguishes between a limited number of classes of grades (usually from F or E to A with, in some institutions, the possibility of adding “+” or “–” to the letters). However, since these letter grades are usually obtained via the manipulation of a distribution of numerical grades of some sort, the distinction between letter grades and numerical grades is not as deep as it appears at first sight. Furthermore, the aggregation of letter grades is often done via a numerical transformation, as we shall see in section 3.3.

Finally, it should be observed that, in view of the lack of reliability and validity of some aspects of the grading process, it might well be possible to assert that small differences in grades that do not cross any special grades may not be significant at all. A difference of 1 point on a 0-20 scale may well be due only to chance, via the position of the work, the quality of the preceding papers or the time of correction.


Once more grades appear as complex objects. While they seem to mainly convey ordinal information (with the possibility of the existence of non-significant small differences) that is typical of a relative evaluation model, the existence of special grades complicates the situation by introducing some “absolute” elements of evaluation in the model (on the measurement-theoretic interpretation of grades see French 1981, Vassiloglou 1984).

3.2.4 Why use grades?

Some readers, and most notably instructors, may have the impression that we have been overly pessimistic about the quality of the grading process. We would like to mention that the literature in Education Science is even more pessimistic, leading some authors to question the very necessity of using grades (see Sager 1994, Tchudi 1997). We suggest the following simple experiment to sceptical instructors. Having prepared an exam, ask some of your colleagues to take it with the following instructions: prepare what you would think to be an exam that would just be acceptable for passing, prepare an exam that would clearly deserve distinction, prepare an exam that is well below the passing grade. Then apply your marking scale to these papers prepared by your colleagues. It is extremely likely that the resulting grades will show some surprises!

However, none of us would be prepared to abandon grades, at least for the type of programmes in which we teach. The difficulties that we mentioned would be quite problematic if grades were considered as “measures” of performance that we would tend to make more and more “precise” and “objective”. We tend to consider grades as an “evaluation model” trying to capture aspects of something that is subject to considerable indetermination, the “performance of students”.

As is the case with most evaluation models, their use greatly contributes to transforming the “reality” that we would like to “measure”. Students cannot be expected to react passively to a grading policy; they will undoubtedly adapt their work and learning practice to what they perceive to be its severity and consequences. Instructors are likely to use a grading policy that will depend on their perception of the policy of the Faculty (on these points, see Sabot and Wakeman 1991, Stratton, Myers and King 1994). The resulting “scale of measurement” is unsurprisingly awkward. Furthermore, as with most evaluation models of this type, aggregating these evaluations will raise even more problems.

This is not to say that grades cannot be a useful evaluation model. If these lines have led some students to consider that grades are useless, we suggest they try to build up an evaluation model that would not use grades without, of course, relying too much on arbitrary judgements. This might not be an impossible task; we, however, do not find it very easy.


3.3 Aggregating grades

3.3.1 Rules for aggregating grades

In the previous section, we hope to have convinced the reader that grading a student in a given course is a difficult task and that the result of this process is a complex object.

Unfortunately, this is only part of the evaluation process of students enrolled in a given programme. Once they have received a grade in each course, a decision still has to be made about each student. Depending on the programme, we already mentioned that this decision may take different forms: success or failure, success or failure with possible additional information, e.g. distinctions, ranks or average grades, success or failure with the additional possibility of partial success (the degree is not granted immediately but there remains a possibility of obtaining it), etc. Such decisions are usually based on the final grades that have been obtained in each course but may well use some other information, e.g. verbal comments from instructors or extra-academic information linked to the situation of each student.

What is required from the students in order to obtain a degree is usually described in a lengthy and rather opaque set of rules that few instructors—but generally all students—know perfectly (as an interesting exercise we might suggest that you investigate whether you are perfectly aware of the rules that are used in the programmes in which you teach or, if you do not teach, whether you are aware of such rules for the programmes in which your children are enrolled). These rules exhibit such variety that it is obviously impossible to examine them exhaustively here. However, it appears that they are often based on three kinds of principles (see French 1981).

Conjunctive rules

In programmes of this type, students must pass all courses, i.e. obtain a grade above a “minimal passing grade” in all courses, in order to obtain the degree. If they fail to do so after a given period of time, they do not obtain the degree. This very simple rule has the immense advantage of avoiding any amalgamation of grades. It is however seldom used as such because:

• it is likely to generate high failure rates,

• it does not make it possible to discriminate between grades just below the passing grade and grades well below it,

• it offers no incentive to obtain grades well above the minimal passing grade,

• it does not make it possible to discriminate (e.g. using several kinds of distinctions) between students obtaining the degree.

Most instructors and students generally violently oppose such simple systems since they generate high failure rates and do not promote “academic excellence”.


Weighted averages

In many programmes, the grades of students are aggregated using a simple weighted average. This average grade (the so-called “GPA” in American Universities) is then compared to some standards, e.g. the minimal average grade for obtaining the degree, the minimal average grade for obtaining the degree with a distinction, the minimal average grade for being allowed to stay in the programme, etc. Whereas conjunctive rules do not allow for any kind of compensation between the grades obtained for several courses, all sorts of compensation effects are at work with a weighted average.

Minimal acceptable grades

In order to limit the scope of compensation effects allowed by the use of weighted averages, some programmes include rules involving “minimal acceptable grades” in each course. In such programmes, the final decision is taken on the basis of an average grade provided that all grades entering this average are above some minimal level.

The rules that are used in the programmes we are aware of often involve a mixture of these three principles, e.g. an average grade is computed for each “category” of courses provided that the grade of each course is above a minimal level, and such average grades per category of courses are then used in a conjunctive fashion. Furthermore, it should be noticed that the final decision concerning a student is very often taken by a committee that has some degree of freedom with respect to the rules and may, for instance, grant the degree to someone who does not meet all the requirements of the programme, e.g. because of serious personal problems.

All these rules are based on “grades” and we saw in section 3.2 that the very nature of the grades is highly influenced by these rules. This amounts to aggregating evaluations that are themselves highly influenced by the aggregation rule, which makes aggregation a difficult task. We study below some aspects of the most common aggregation rule for grades: the weighted average (more examples and comments will be found in chapters 4 and 6).

3.3.2 Aggregating grades using a weighted average

The purpose of rules for aggregating grades is to know whether the overall performance of a student is satisfactory taking his various final grades into account. Using a weighted average system amounts to assessing the performance of a student by combining his grades using a simple weighting scheme. We shall suppose that all final grades are expressed on similar scales and note g_i(a) the final grade for course i obtained by student a. The average grade obtained by student a is then computed as g(a) = ∑_{i=1}^{n} w_i g_i(a), the (positive) weights w_i reflecting the “importance” (in “academic” terms and/or as a function of the length of the course) of the course for the degree. The weights w_i may, without loss of generality, be normalised in such a way that ∑_{i=1}^{n} w_i = 1. Using such a convention, the average grade g(a) will be expressed on a scale having the same bounds as the scale used for the g_i(a). The simplest decision rule consists in comparing g(a) with some standards in order to decide on the attribution of the degree and on possible distinctions. A number of examples will allow us to understand the meaning of this rule better and to emphasise its strengths and weaknesses (we shall suppose throughout this section that students have all been evaluated on the same courses; for the problems that arise when this is not so, see Vassiloglou (1984)).
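Before turning to the examples, a minimal sketch of this rule may help. The Python code below is ours, not part of the original definition, and the grades and weights are purely illustrative.

# A minimal sketch of the weighted-average rule g(a) = sum_i w_i * g_i(a).
# The weights are normalised so that they sum to 1.

def average_grade(grades, weights):
    total = sum(weights)
    return sum(w * g for w, g in zip(weights, grades)) / total

# Two courses of equal importance, grades on a 0-20 scale (illustrative).
print(average_grade([5, 19], [1, 1]))   # 12.0
print(average_grade([11, 11], [1, 1]))  # 11.0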

Example 1

Consider four students enrolled in a degree consisting of two courses. For each course, a final grade between 0 and 20 is allocated. The results are as follows:

     g1   g2
a     5   19
b    20    4
c    11   11
d     4    6

Student c has performed reasonably well in all courses whereas d has a consistently very poor performance; both a and b are excellent in one course while having a serious problem in the other. Casual introspection suggests that if the students were to be ranked, c should certainly be ranked first and d should be ranked last. Students a and b should be ranked in between, their relative position depending on the relative importance of the two courses. Their very low performance in 50% of the courses does not make them good candidates for the degree. The use of a simple weighted average of grades leads to very different results. Considering that both courses are of equal importance gives the following average grades:

     average grade
a    12
b    12
c    11
d     5

which leads to having both a and b ranked before c. As shown in figure 3.1, we can say even more: there is no vector of weights (w, 1−w) that would rank c before both a and b. Ranking c before a implies that 11w + 11(1−w) > 5w + 19(1−w), which leads to w > 8/14. Ranking c before b implies 11w + 11(1−w) > 20w + 4(1−w), i.e. w < 7/16 (figure 3.1 should make clear that there is no loss of generality in supposing that weights sum to 1). The use of a simple weighted sum is therefore not in line with the idea of promoting students performing reasonably well in all courses. The exclusive reliance on a weighted average might therefore be an incentive for students to concentrate their efforts on a limited number of courses and benefit from the compensation effects at work with such a rule. This is a consequence of the additivity hypothesis embodied in the use of weighted averages.

[Figure 3.1: Use of a weighted sum for aggregating grades. Students a, b, c and d are plotted in the (g1, g2) plane.]
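This impossibility is easy to check numerically. The following sketch is ours and uses a simple grid search over w in [0, 1] rather than the exact argument given above:

# For which w does c = (11, 11) beat both a = (5, 19) and b = (20, 4)?
def g(grades, w):
    return w * grades[0] + (1 - w) * grades[1]

winners = [w / 100 for w in range(101)
           if g((11, 11), w / 100) > g((5, 19), w / 100)
           and g((11, 11), w / 100) > g((20, 4), w / 100)]
print(winners)  # []: c before a needs w > 8/14, c before b needs w < 7/16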

It should finally be noticed that the addition of a “minimal acceptable grade” for all courses can decrease but not suppress the occurrence of such effects (unless the minimal acceptable grade is so high that it turns the system into a nearly conjunctive one).

A related consequence of the additivity hypothesis is that it makes it impossible to account for “interaction” between grades, as shown in the following example.

Example 2

Consider four students enrolled in an undergraduate programme consisting of three courses: Physics, Maths and Economics. For each course, a final grade between 0 and 20 is allocated. The results are as follows:

     Physics   Maths   Economics
a    18        12       6
b    18         7      11
c     5        17       8
d     5        12      13

On the basis of these evaluations, it is felt that a should be ranked before b. Although a has a low grade in Economics, he has reasonably good grades in both Maths and Physics, which makes him a good candidate for an Engineering programme; b is weak in Maths and it seems difficult to recommend him for any programme with a strong formal component (Engineering or Economics). Using a similar type of reasoning, d appears to be a fair candidate for a programme in Economics. Student c has two low grades and it seems difficult to recommend him for a programme in Engineering or in Economics. Therefore d is ranked before c.

Although these preferences appear reasonable, they are not compatible with the use of a weighted average to aggregate the three grades. It is easy to observe that:

• ranking a before b implies putting more weight on Maths than on Economics (18w1 + 12w2 + 6w3 > 18w1 + 7w2 + 11w3 ⇒ w2 > w3),

• ranking d before c implies putting more weight on Economics than on Maths (5w1 + 17w2 + 8w3 > 5w1 + 12w2 + 13w3 ⇒ w3 > w2),

which is contradictory.

In this example it seems that “criteria interact”. Whereas Maths does not outweigh any other course (see the ranking of d vis-à-vis c), having good grades in both Maths and Physics or in both Maths and Economics is better than having good grades in both Physics and Economics. Such interactions, although not infrequent, cannot be dealt with using weighted averages; this is another consequence of the additivity hypothesis. Taking such interactions into account calls for the use of more complex aggregation models (see Grabisch 1996).

Example 3

Consider two students enrolled in a degree consisting of two courses. For each course a final grade between 0 and 20 is allocated; both courses have the same weight and the required minimal average grade for the degree is 10. The results are as follows:

     g1   g2
a    11   10
b    12    9

It is clear that both students will receive an identical average grade of 10.5: the difference between 11 and 12 on the first course exactly compensates for the opposite difference on the second course. Both students will obtain the degree, having performed equally well.

It is not unreasonable to suppose that, since the minimal required average for the degree is 10, this grade will play the role of a “special grade” for the instructors, a grade above 10 indicating that a student has satisfactorily met the objectives of the course. If 10 is a “special grade” then it might be reasonable to consider that the difference between 10 and 9, which crosses a special grade, is much more significant than the difference between 12 and 11 (it might even be argued that the small difference between 12 and 11 is not significant at all). If this is the case, we would have good grounds to question the fact that a and b are “equally good”. The linearity hypothesis embodied in the use of weighted averages has the inevitable consequence that a difference of one point has a similar meaning wherever it occurs on the scale and therefore does not allow for such considerations.

Example 4

Consider a programme similar to the one envisaged in the previous example. We have the following results for three students:

     g1   g2
a    14   16
b    15   15
c    16   14

All students have an average grade of 15 and they will all receive the degree. Furthermore, if the degree comes with the indication of a rank or of an average grade, these three students will not be distinguished: their equal average grade makes them indifferent. This appears desirable since these three students have very similar profiles of grades.

The use of linearity and additivity implies that if a difference of one point on the first grade compensates for an opposite difference on the other grade, then a difference of x points on the first grade will compensate for an opposite difference of −x points on the other grade, whatever the value of x. However, if x is chosen to be large enough this may appear dubious since it could lead, for instance, to viewing the following three students as perfectly equivalent with an average grade of 15:

      g1   g2
a′    10   20
b     15   15
c′    20   10

whereas we already argued that, in such a case, b could well be judged preferable to both a′ and c′ even though b is indifferent to a and c. This is another consequence of the linearity hypothesis embodied in the use of weighted averages.

Example 5

Consider three students enrolled in a degree consisting of three courses. For each course a final grade between 0 and 20 is allocated. All courses have identical importance and the minimal passing grade is 10 on average. The results are as follows:


     g1   g2   g3
a    12    5   13
b    13   12    5
c     5   13   12

It is clear that all students have an average equal to the minimal passing grade 10. They all end up tied and should all be awarded the degree.

As argued in section 3.2, it might not be unreasonable to consider that final grades are only recorded on an ordinal scale, i.e. only reflect the relative rank of the students in the class, with the possible exception of a few “special grades” such as the minimal passing grade. This means that the following table might as well reflect the results of these three students:

     g1   g2   g3
a    11    4   12
b    13   13    6
c     4   14   11

since the ranking of students within each course has remained unchanged, as well as the position of grades vis-à-vis the minimal passing grade. In this case, only b (say the Dean’s nephew) gets an average above 10 and both a and c fail (with respective averages of 9 and 9.7). Note that using different transformations, we could have favoured any of the three students.
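The computation is easy to replay. The sketch below is ours and purely illustrative; it applies the two tables above and shows that an order-preserving rescaling changes who passes:

# Same students under the original grades and under a rescaling that keeps
# the ranks within each course and the positions relative to the passing
# grade (10) unchanged.
original = {"a": [12, 5, 13], "b": [13, 12, 5], "c": [5, 13, 12]}
rescaled = {"a": [11, 4, 12], "b": [13, 13, 6], "c": [4, 14, 11]}

for s in "abc":
    print(s, sum(original[s]) / 3, round(sum(rescaled[s]) / 3, 2))
# a 10.0 9.0 ; b 10.0 10.67 ; c 10.0 9.67 -> only b now passes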

Not surprisingly, this example shows that a weighted average makes use of the “cardinal properties” of the grades. This is hardly compatible with grades that would only be indicators of “ranks”, even with some added information (a view that is very compatible with the discussion in section 3.2). As shown by the following example, it does not seem that the use of “letter grades”, instead of numerical ones, helps much in this respect.

Example 6

In many American Universities the Grade Point Average (GPA), which is nothing more than a weighted average of grades, is crucial for the attribution of degrees and the selection of students. Since courses are evaluated on letter scales, the GPA is usually computed by associating a number with each letter grade. A common “conversion scheme” is the following:

A   4   (outstanding or excellent)
B   3   (very good)
C   2   (good)
D   1   (satisfactory)
E   0   (failure)


in which the difference between two consecutive letters is assumed to be equal.

Such a practice raises several difficulties. First, letter grades for a given course are generally obtained on the basis of numerical grades of some sort. This implies using a first “conversion scheme” of numbers into letters. The choice of such a scheme is not obvious. Note that when there are no “holes” in the distribution of numerical grades, it is possible that a very small (and possibly non-significant) difference in numerical grades results in a significant difference in letter grades.

Secondly, the conversion scheme of letters into numbers used to compute the GPA is somewhat arbitrary. Allowing for the possibility of adding “+” or “–” to the letter grades generally results in conversion schemes maintaining an equal difference between two consecutive letter grades. This can have a significant impact on the ranking of students on the basis of the GPA.

To show how this might happen, suppose that all courses are first evaluated on a 0–100 scale (e.g. indicating the percentage of correct answers to a multiple choice questionnaire). These numbers are then converted into letter grades using a first conversion scheme. These letter grades are further transformed, using a second conversion scheme, into a numerical scale and the GPA is computed. Now consider three students evaluated on three courses on a 0-100 scale in the following way:

     g1    g2   g3
a     90   69   70
b     79   79   89
c    100   70   69

Using an E to A letter scale, a common conversion scheme (that is used in many Universities) is

A   90–100%
B   80–89%
C   70–79%
D   60–69%
E   0–59%

This results in the following letter grades:

     g1   g2   g3
a    A    D    C
b    C    C    B
c    A    C    D

Supposing the three courses of equal importance and using the conversion scheme of letter grades into numbers given above, the calculation of the GPA is as follows:


     g1   g2   g3   GPA
a    4    1    2    2.33
b    2    2    3    2.33
c    4    2    1    2.33

making the three students equivalent.

Now another common (and actually used) scale for converting percentages into letter grades is as follows:

A+   98–100%
A    94–97%
A–   90–93%
B+   87–89%
B    83–86%
B–   80–82%
C+   77–79%
C    73–76%
C–   70–72%
D    60–69%
F    0–59%

This scheme would result in the following letter grades:

     g1   g2   g3
a    A–   D    C–
b    C+   C+   B+
c    A+   C–   D

Maintaining the usual hypothesis of a constant “difference” between two consecutive letter grades, we obtain the following conversion scheme:

A+   10
A     9
A–    8
B+    7
B     6
B–    5
C+    4
C     3
C–    2
D     1
F     0


which leads to the following GPA:

     g1   g2   g3   GPA
a     8    1    2   3.67
b     4    4    7   5.00
c    10    2    1   4.33

In this case, b (again the Dean’s nephew) gets a clear advantage over a and c. It should be clear that standardisation of the original numerical grades before conversion offers no clear solution to the problem uncovered.
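The whole chain of conversions is short enough to replay in a few lines. The sketch below is ours, with the two schemes above hard-coded; it reproduces both GPA tables:

# Percentages -> letters -> numbers -> GPA, under the two schemes above.
scores = {"a": [90, 69, 70], "b": [79, 79, 89], "c": [100, 70, 69]}

def coarse(p):  # five-letter scheme, equal steps 4..0
    return 4 if p >= 90 else 3 if p >= 80 else 2 if p >= 70 else 1 if p >= 60 else 0

def fine(p):    # eleven-grade scheme with "+" and "-", equal steps 10..0
    cuts = [(98, 10), (94, 9), (90, 8), (87, 7), (83, 6), (80, 5),
            (77, 4), (73, 3), (70, 2), (60, 1)]
    return next((v for c, v in cuts if p >= c), 0)

for s, marks in scores.items():
    print(s,
          round(sum(map(coarse, marks)) / 3, 2),   # 2.33 for all three
          round(sum(map(fine, marks)) / 3, 2))     # a 3.67, b 5.0, c 4.33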

Example 7

We argued in section 3.2 that small differences in grades might not be significant at all provided they do not involve crossing any “special grade”. The explicit treatment of such imprecision is problematic using a weighted average; most often, it is simply ignored. Consider the following example in which three students are enrolled in a degree consisting of three courses. For each course a final grade between 0 and 20 is allocated. All courses have the same weight and the minimal passing grade is 10 on average. The results are as follows:

     g1   g2   g3
a    13   12   11
b    11   13   12
c    14   10   12

All students will receive an average grade of 12 and will all be judged indifferent. If all instructors agree that a difference of one point in their grades (away from 10) should not be considered as significant, student a has good grounds to complain. He can argue that he should be ranked before b: he has a significantly higher grade than b on g1 while there is no significant difference between the other two grades. The situation is the same vis-à-vis c: a has a significantly higher grade on g2 and this is the only significant difference.

In a similar vein, using the same hypotheses, the following table appears even more problematic:

     g1   g2   g3
a    13   12   11
b    11   13   12
c    12   11   13

since, while all students clearly obtain a similar average grade, a is significantly better than b (he has a significantly higher grade on g1 while there are no significant differences on the other two grades), b is significantly better than c and c is significantly better than a (the reader will have noticed that this is a variant of the Condorcet paradox mentioned in chapter 2).
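The cycle can be checked mechanically. In the sketch below (ours), a student “beats” another when he is significantly better (a difference strictly larger than one point) on some course and significantly worse on none:

grades = {"a": [13, 12, 11], "b": [11, 13, 12], "c": [12, 11, 13]}

def beats(x, y):
    diffs = [gx - gy for gx, gy in zip(grades[x], grades[y])]
    return any(d > 1 for d in diffs) and not any(d < -1 for d in diffs)

for x, y in [("a", "b"), ("b", "c"), ("c", "a")]:
    print(x, "beats", y, ":", beats(x, y))  # True, True, True: a cycle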

Aggregation rules using weighted sums will be dealt with again in chapters 4 and 6. In view of these few examples, we hope to have convinced the reader that although the weighted sum is a very simple and almost universally accepted rule, its use may be problematic for aggregating grades. Since grades are a complex evaluation model, this is not overly surprising. If it is admitted that there is no easy way to evaluate the performance of a student in a given course, there is no reason why there should be an obvious one for an entire programme. In particular, the necessity and feasibility of using rules that completely rank order all students might well be questioned.

3.4 Conclusions

We have all been accustomed to seeing our academic performances in courses evaluated through grades and to seeing these grades amalgamated in one way or another in order to judge our “overall performance”. Most of us routinely grade various kinds of work, prepare exams, write syllabi specifying a grading policy, etc. Although they are very familiar, we have tried to show that these activities may not be as simple and as unproblematic as they appear to be. In particular, we discussed the many elements that may obscure the interpretation of grades and argued that the common weighted sum rule used to amalgamate them is not without difficulties. We expect such difficulties to be present in the other types of evaluation models that will be studied in this book.

We would like to emphasise a few simple ideas to be drawn from this example that we should keep in mind when working on different evaluation models:

• building an evaluation model is a complex task even in simple situations. Actors are most likely to modify their behaviour in response to the implementation of the model;

• “evaluation operations” are complex and should not be confused with “measurement operations” in Physics. When they result in numbers, the properties of these numbers should be examined with care; using “numbers” may be only a matter of convenience and does not imply that any operation can be meaningfully performed on these numbers;

• the aggregation of the results of several evaluation models should take the nature of these models into account. The information to be aggregated may itself be the result of more or less complex aggregation operations (e.g. aggregating the grades obtained at the mid-term and the final exams) and may be affected by imprecision, uncertainty and/or inaccurate determination;

• aggregation models should be analysed with care. Even the simplest and most familiar ones may in some cases lead to surprising and undesirable conclusions.


Finally, we hope that this brief study of the evaluation procedures of students will also be the occasion for instructors to reflect on their current grading practices. This has surely been the case for the authors.


4 CONSTRUCTING MEASURES: THE EXAMPLE OF INDICATORS

Our daily life is filled with indicators: I.Q., Dow Jones, GNP, air quality, physicians per capita, poverty index, social position index, consumer price index, rate of return, . . . If you read a newspaper, you could feel that these magic numbers rule the world.

The EU countries with a deficit/GNP ratio lower than 3% will be allowed to enter the EURO.

Today’s air quality is 7: older persons, pregnant women and young children should stay indoors.

The World Bank threatens to suspend its help to country x if it doesn’t succeed in bringing indicator y to level z.

Note that in many cases, the decisions of the World Bank to withdraw help are not motivated by economic or financial reasons. Violations of human rights are often presented as the main factor. But it is worth noting that indicators of human rights also exist (see e.g. Horn (1993)).

Why are these indicators (often called indices) so powerful? Probably because it is commonly accepted that they faithfully reflect reality. This forces us to raise several questions.

1. Is there one reality, several realities or no reality? Many philosophers nowadays consider that reality is not unique. Each person has a particular perception of the world and, hence, a particular reality. One could argue that these particular realities are just particular views of the same reality but, as it is impossible to consider reality independently of our perception of it, it might be meaningless to consider that reality exists per se (Roy 1990). As a consequence, an indicator might only be relevant for the person who constructed it.

2. Whatever the answer to the previous question, can we hope that an indicator faithfully reflects reality (the reality or a reality)? Reality is so complex that this is doubtful. Therefore, we must accept that an indicator accounts only for some aspects of reality. Hence, an indicator must be designed so as to reflect those aspects that are relevant with respect to our concerns. As an illustration, the Human Development Index (HDI), defined by the United Nations Development Programme (UNDP) to measure development (United Nations Development Programme 1997), is used by many different people in different continents and in different areas of activity (politicians, economists, businessmen, . . . ). Can we assume that their concerns are similar?

In the Human development report 1997, UNDP proudly reports that

The HDI has been used in many countries to rank districts or counties as a guide to identifying those most severely disadvantaged in terms of human development. Several countries, such as the Philippines, have used such analysis as a planning tool. [. . . ] The HDI has been used especially when a researcher wants a composite measure of development. For such uses, other indicators have sometimes been added to the HDI.

This clearly shows that many people used the HDI in completely different ways.

Furthermore, are the concerns of UNDP itself with respect to the HDI clearly defined? Why do they need the human development index? To cut subsidies to nations evolving in the wrong direction? To share subsidies among the poorest countries (according to what key)? To put some pressure on the governments performing the worst? To prove that Western democracies have the best political systems?

3. Suppose that the purpose of an indicator is clearly defined. Are we sure that this indicator indicates what we want it to? Do the arithmetic operations performed during the computation of the indicator lead to something that makes sense?

Let us now discuss in detail three well-known indicators arising in completely different areas of our lives: the human development index, the air quality index and the decathlon score.

4.1 The human development index

As stated by the United Nations Development Programme (1997), page 14,

The human development index measures the average achievements in a country in three basic dimensions of human development–longevity, knowledge and a decent standard of living. A composite index, the HDI thus contains three variables: life expectancy, educational attainment (adult literacy and combined primary, secondary and tertiary enrollment) and real GDP (Gross Domestic Product) per capita expressed in PPP$ (Purchasing Power Parity $).


The HDI’s precise definition is presented on page 122 of the 1997 Human Development Report. The HDI is a simple average of the life expectancy index, the educational attainment index and the adjusted real GDP per capita (PPP$) index. Here is how each index is computed.

Life Expectancy Index (LEI) This index measures life expectancy at birth. In order to normalise the scale of this index, a minimum value (25 years) and a maximum one (85 years) have been defined. The index is defined as

(life expectancy at birth − 25) / (85 − 25).

Hence, it is a value between 0 and 1.

Educational Attainment Index (EAI) It is a combination of two other indicators: the Adult Literacy Index (ALI) and the combined primary, secondary and tertiary Enrollment Ratio Index (ERI). The first one is the proportion of literate adults while the second one is the proportion of children of primary, secondary or tertiary school age who really go to school. The EAI is a weighted average of ALI and ERI; it is equal to

(2 ALI + ERI) / 3.

Adjusted real GDP per capita (PPP$) Index (GDPI) This index aims at measuring the income per capita. As the value of one dollar for someone earning $100 is much larger than the value of one dollar for someone earning $100 000, the income is first transformed using Atkinson’s formula (Atkinson 1970). The transformed value of y, i.e. W(y), is given by one of the following:

W(y) = y                                                               if 0 < y < y*,
W(y) = y* + 2(y − y*)^{1/2}                                            if y* ≤ y < 2y*,
W(y) = y* + 2(y*)^{1/2} + 3(y − 2y*)^{1/3}                             if 2y* ≤ y < 3y*,
  ...
W(y) = y* + 2(y*)^{1/2} + 3(y*)^{1/3} + ... + n(y − (n−1)y*)^{1/n}     if (n−1)y* ≤ y < ny*.

In this formula, y represents the income, W(y) the transformed income and y* is set at $5 835 (PPP$), which was the world average annual income per capita in 1994.

Thereafter, the income scale is normalised, using the maximum value of $40 000, the minimum value of $100 and the formula

(transformed income − W(100)) / (W(40 000) − W(100)).

Hence, it is a value between 0 and 1. Note that W(40 000) = 6 154 and W(100) = 100.


Some words about the data and their collection time: the Human Development Report is a yearly publication (since 1990). Obviously, the 1997 report does not contain the 1997 data. Indeed, the HDI computed in the 97 report is considered by the UNDP as the HDI of 1994. To make things more complicated, the 199i HDI (in the 199j report) is an aggregate of data from 199i (for some dimensions) and from earlier years (for other dimensions). In this volume, we use only data from the 1997 Human Development Report. We refer to them as HDR97, irrespective of the collection year.

To illustrate how the HDI works, let’s compute the HDI for Greece (HDR97). Life expectancy in Greece is 77.8 years. Hence, LEI = (77.8 − 25)/(85 − 25) = 0.880. The ALI is 0.967 and the ERI is 0.820. Hence, EAI = (2 × 0.967 + 0.820)/3 = 0.918. Greece’s real GDP per capita, at $11 265, lies between y* and twice y*. Thus the adjusted real GDP per capita for Greece is $5 982 (PPP$) because 5 982 = 5 835 + 2(11 265 − 5 835)^{1/2}. Hence GDPI = (5 982 − W(100))/(W(40 000) − W(100)) = (5 982 − 100)/(6 154 − 100) = 0.972. Finally, Greece’s HDI is (0.880 + 0.918 + 0.972)/3 = 0.923.
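The whole computation is easy to automate. The sketch below is ours, written directly from the definitions above, and reproduces the Greece figures, including Atkinson’s piecewise transformation:

Y_STAR = 5835.0  # 1994 world average annual income per capita (PPP$)

def atkinson(y, y_star=Y_STAR):
    # Atkinson-adjusted income W(y), branch by branch.
    if y <= y_star:
        return y
    w, k = y_star, 2
    while y >= k * y_star:                # slices of income fully used up
        w += k * y_star ** (1.0 / k)
        k += 1
    return w + k * (y - (k - 1) * y_star) ** (1.0 / k)

def hdi(life_expectancy, ali, eri, real_gdp):
    lei = (life_expectancy - 25) / (85 - 25)
    eai = (2 * ali + eri) / 3
    gdpi = (atkinson(real_gdp) - atkinson(100)) / (atkinson(40000) - atkinson(100))
    return (lei + eai + gdpi) / 3

print(round(hdi(77.8, 0.967, 0.820, 11265), 3))  # Greece (HDR97): 0.923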

4.1.1 Scale Normalisation

To obtain the LEI and the GDPI, maximum and minimum values have been defined so that, after normalisation, the range of the index is [0,1]. The choice of these bounds is quite arbitrary. Why 25 and 85 years? Is 25 years the smallest observed value? No, the lowest observed value is 22.6 (Rwanda, HDR97). Therefore the LEI is negative for Rwanda. The value of 25 was chosen for the first report (1990), when the lowest observed value was above 35. At that time, no one would ever have thought that life expectancy could be lower than 25. To avoid this problem, they could have chosen a much lower value: 20 or 10. The likelihood of observing a value smaller than the minimum would have been much smaller. But the choice of the bounds is not without consequences. Consider the following example.

Suppose that the EAI and GDPI have been computed for South Korea and Costa Rica (HDR97). We also know the life expectancy at birth for South Korea and Costa Rica (see Table 4.1).

              life expectancy   EAI   GDPI
South Korea   71.5              .93   .97
Costa Rica    76.6              .86   .95

Table 4.1: Bounds: life expectancy, EAI and GDPI for South Korea and Costa Rica (HDR97)

If the maximum and minimum for life expectancy are set to 85 and 25, then the HDI is 0.890 for South Korea and 0.889 for Costa Rica. But if the maximum and minimum for life expectancy are set to 80 and 25, then the HDI is 0.915 for South Korea and 0.916 for Costa Rica. In the first case, Costa Rica is less developed than South Korea while in the second one, we obtain the converse: Costa Rica is more developed than South Korea. Hence, the choice of the bounds matters.
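The reversal is easy to reproduce. The sketch below is ours and recomputes the HDI from Table 4.1 for the two choices of the upper bound; the recomputed values may differ from those in the text in the third decimal because of rounding, but the reversal is the same:

# EAI and GDPI are taken as given; only the LEI bounds change.
data = {"South Korea": (71.5, 0.93, 0.97), "Costa Rica": (76.6, 0.86, 0.95)}

for upper in (85, 80):
    for country, (life, eai, gdpi) in data.items():
        lei = (life - 25) / (upper - 25)
        print(upper, country, round((lei + eai + gdpi) / 3, 3))
# upper = 85: South Korea ahead of Costa Rica
# upper = 80: Costa Rica ahead of South Korea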


In fact, narrowing the range of life expectancy from [25,85] to [25,80] increases the difference between any two values of LEI by a factor (85−25)/(80−25). Hence it amounts to increasing the weight of LEI by the same factor. In our example, Costa Rica performed better than South Korea on life expectancy. Therefore, it is not surprising that its position is improved when life expectancy is given more weight (by narrowing its range).

Note that, apparently, no bounds were fixed for the ALI and the ERI. In reality, this is equivalent to choosing 1 for the maximum and 0 for the minimum. This is also an arbitrary choice. It is obvious that the values 0 and 1 have not been observed and are not likely to be observed in a foreseeable future. Hence the range of these scales is narrower than [0,1] and the scales could be normalised using other values than 0 and 1.

4.1.2 Compensation

Consider Table 4.2 where the data for two countries (Gabon and the Solomon Islands, HDR97) are presented.

                  life expectancy   ALI   ERI   real GDP
Gabon             54.1              .63   .60   3 641
Solomon Islands   70.8              .62   .47   2 118

Table 4.2: Compensation: performances of Gabon and Solomon Islands (HDR97)

The Solomon Islands perform quite well on all dimensions; Gabon is slightly better than the Solomon Islands on all dimensions except life expectancy, where it is very bad. For us, this very short life expectancy is clearly a sign of severe underdevelopment, even if other dimensions are good. Nevertheless, the HDI is equal to 0.56 for both Gabon and the Solomon Islands. Hence, in spite of the informal analysis we performed on the table, we should conclude that Gabon and the Solomon Islands are at the same development level. This problem is due to the fact that we used the usual average to aggregate our data into one number. Weaknesses on some dimensions are compensated by strengths on other dimensions. This is probably desirable, to some extent. Yet, extreme weaknesses should not be compensated, even by very good performances on other dimensions.

Let us go further with compensation. As any weakness can be compensated by a strength, a decrease in life expectancy by one year can be compensated by some increase in adjusted real GDP (income transformed by Atkinson’s formula). Let us compute this increase. A decrease by one year yields a decrease of LEI by 1/(85−25) = 0.016667. To compensate this, the GDPI must increase by the same amount. Hence, the adjusted real GDP must be increased by 0.016667 × (6 154 − 100) = $100.9 (recall that W(40 000) = 6 154). Accordingly, a decrease in life expectancy by 2 years can be compensated by an increase in adjusted real GDP by 2 times $100.9; a decrease in life expectancy by n years can be compensated by an increase in adjusted real GDP by n times $100.9. The value of one year of life is thus $100.9 (adjusted by Atkinson’s formula). The value 100.9 is called the substitution rate between life expectancy and adjusted real GDP.
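In code, the rate is a one-liner; the snippet below is ours and is purely a restatement of the arithmetic above:

delta_lei = 1 / (85 - 25)             # LEI lost with one year of life
rate = delta_lei * (6154 - 100)       # GDPI range is W(40000) - W(100)
print(round(rate, 1))                 # 100.9 adjusted PPP$ per year of life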


Other substitution rates are easy to compute: e.g. the substitution rate between life expectancy and adult literacy is 0.016667 × (1 − 0) × (3/2) = 0.025. To compensate a decrease of n years of life expectancy, you need an increase of the adult literacy index of n times 0.025.

Let us now think in terms of real GDP (not adjusted). In a country where real GDP is $13 071 (Cyprus, HDR97), a decrease in life expectancy of one year can be compensated by an increase in real GDP of $21 084. In a country where real GDP is $700 (Chad, HDR97), a decrease of life expectancy by one year can be compensated by an increase in real GDP of $100.9. Hence, poor people’s life expectancy has much less value than that of rich ones.

4.1.3 Dimension independence

Consider the example of Table 4.3.

     life expectancy   ALI   ERI   real GDP
x    30                .80   .65     500
y    30                .35   .40   3 500

Table 4.3: Independence: performances of x and y

Countries x and y perform equally badly on life expectancy; y is much lower than x on adult literacy but much higher than x on income. As life expectancy is very short, one might consider that adult literacy is not very important (because there are almost no adults) but income is more important because it improves quality of life in other respects. Furthermore, health conditions and life expectancy can be expected to improve rapidly due to a higher income. Hence, one could conclude that y is more developed than x. Our conclusion is confirmed by the HDI: 0.30 for x and 0.34 for y.

Let us now compare two countries, w and z, similar to x and y except that life expectancy is equal to 70 for both w and z (see Table 4.4).

     life expectancy   ALI   ERI   real GDP
w    70                .80   .65     500
z    70                .35   .40   3 500

Table 4.4: Independence: performances of w and z

In such conditions, the performance of z on adult literacy is really bad compared to that of w. The adult population is very important and its illiteracy is a severe problem. Even if the high income of z is used to foster education, it will take decades before a significant part of the population is literate. On the contrary, w’s low income doesn’t seem to be a problem for the quality of life, as life expectancy is high as well as education. Hence, it might not be unreasonable to conclude that w is more developed than z. But if we compute the HDI, we obtain 0.52 for w and 0.56 for z! This should not be a surprise; there is no difference between x and y on the one hand and w and z on the other hand, except for life expectancy. But the differences in life expectancy between x and w and between y and z are equal. Hence, this results in the same increase of the HDI (compared to x and y) for both w and z.

When a sum (or an average) is used to aggregate different dimensions, identical performances of two items (countries or whatever) on one or more dimensions are not relevant for the comparison of these items. The identical performances can be changed in any direction; as long as they remain identical, they do not affect the way both items compare to each other. This is called dimension independence; it is inherent to sums and averages. But we saw that this property is not always desirable. When we compare countries on the basis of life expectancy, education and income, dimension independence might not be desirable.
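This property is visible in a few lines of code. The sketch below is ours; the EAI and GDPI values are derived from Tables 4.3 and 4.4, and the point is that the HDI difference between the two profiles is the same whatever the shared life expectancy:

def average(lei, eai, gdpi):
    return (lei + eai + gdpi) / 3

for life in (30, 70):                  # the dimension on which x,y (or w,z) tie
    lei = (life - 25) / 60
    diff = average(lei, 0.750, 0.066) - average(lei, 0.367, 0.562)
    print(life, round(diff, 3))        # the same difference both times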

4.1.4 Scale construction

In a way, we have already discussed this topic in Section 4.1.1 (Scale Normalisation). But there is more to scale construction than scale normalisation. For example, concerning real GDP, before normalising this scale, the real GDP is adjusted using Atkinson’s formula. The goal of this adjustment is obvious: if you earn 40 000 dollars, one more dollar is negligible. If you earn 100 dollars, one more dollar is considerable. Atkinson’s formula reflects this. But why choose y* = $5 835? Why choose Atkinson’s formula? Other formulas and other values for y* would work just as well. Once more, an arbitrary choice has been made and we could easily build a small example showing that another arbitrary (but defendable) choice would yield a different ranking of the countries.

Note that the fact that life expectancy, adult literacy and enrollment have not been adjusted is also an arbitrary choice. One could argue that improving life expectancy by one year in a country where life expectancy is 30 is a huge achievement while it is a moderate one in a country where life expectancy is 70. Some could even argue that increasing life expectancy above a certain threshold is no longer an improvement. It increases the health budget in such proportions that no more resources are available for other important areas: education, employment policy, . . .

4.1.5 Statistical aspects

Let us consider the four indices of the HDI from a statistical point of view. The life expectancy index is the average, over the population and for a determined time period, of the length of the lives of the individuals in the population. It is well known that averages, even if they are useful, cannot reflect the variety present in the population. A country where approximately everyone lives until 50 has a life expectancy of 50 years. A country where a part of the population (rural or poor or of some race) dies early and where another part of the population lives until 80 might also have a life expectancy of 50 years.

Note that this kind of average is quite particular. It is very different from the average that we perform when, for example, we have several measures of the weight of an object and we consider the average as a good estimate of its actual weight. The weight of an object really exists (as far as reality exists). On the contrary, even if reality exists, the average of the length of life doesn’t correspond to something real. It is the length of life of a kind of average or ideal human, as if we (the real humans) were imperfect, irregular or noisy copies of that average human. Until the 19th century, both kinds of averages were called by different names (moyenne proportionnelle for different measures of one object, and valeur commune for different objects, each measured once) and considered as completely different. During the 19th century the Belgian astronomer and statistician Quetelet (1796-1874) invented the concept of the average human and unified both averages (Desrosieres 1995).

To convince you that the concept of the average human is quite strange (though possibly useful), consider a country where all inhabitants are right triangles of different sizes and shapes (example borrowed from Warusfel (1961)). To make it easy, let us suppose that there are just two kinds of right triangles (see Fig. 4.1), in the same proportion. A statistician wants to measure the average right triangle. In order to do so, he computes the average length of each edge. What he gets is a triangle with edges of length 4, 8 and 9, i.e. a triangle which is not right-angled, for 4² + 8² ≠ 9². The average right triangle is no longer a right triangle! What looks like a right angle is in fact approximately a 91 degree angle. In the same spirit, Quetelet measured the average size of humans, in all dimensions, including the liver, heart, spleen and other organs. What he got was an average human in which it was impossible to fit all its average organs. They were too large!

[Figure 4.1: Two right triangles and their average: a (3, 4, 5) triangle and a (5, 12, 13) triangle, whose edge-wise average is the (4, 8, 9) triangle.]
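A quick check of the arithmetic (the code is ours):

import math

a, b, c = 4, 8, 9   # edge-wise averages of (3, 4, 5) and (5, 12, 13)
print(a**2 + b**2 == c**2)  # False: 80 != 81, so not a right triangle
angle = math.degrees(math.acos((a**2 + b**2 - c**2) / (2 * a * b)))
print(round(angle, 1))      # 90.9: the "right angle" is about 91 degrees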

The adult literacy index is quite different: it is just the number of literate adults, divided by the total adult population to allow comparisons between countries. Hence one could think it is not an average. In fact it depends on how we interpret it. If we consider that an ALI of 0.60 means that 60% of the population is literate, then it is not an average. If we consider that an ALI of 0.60 means that the average literacy level is 60%, then it is an average. And this last interpretation is not more silly than computing a life expectancy index. Consider a variable whose value is 0 for an illiterate adult and 1 for a literate one. Compute the average of this variable over the population and over some time period. What do you get? The adult literacy index!

We can analyse the enrolment ratio index and the adjusted real GDP index in the same way as the ALI. They are quantities that are measured at country level. The first one being a proportion and the second one being normalised, they can also be interpreted at individual level, like averages.

What about the HDI itself? According to the United Nations Development Programme (1997), it is designed to


[. . . ] measure the average achievements in a country [. . . ]

Furthermore, the HDI contains an index (LEI) which can only be interpreted bearing in mind Quetelet’s average human. Therefore the ALI, GDPI and HDI should be interpreted in this way as well. The HDI somehow describes how developed the average human in a country is.

4.2 Air quality index

Due to the alarming increase in air pollution, mainly in urban areas, during the last decades, several governments and international organisations issued norms concerning pollutant concentrations in the air (e.g., the Clean Air Act in the US). Usually these norms specify, for each pollutant, a concentration that should not be exceeded. Naturally, these norms are just norms and they are often exceeded. Therefore, as good air quality is not guaranteed by norms, different monitoring systems have been developed in order to provide governments as well as citizens with some information about air pollution. Two examples of such systems are the Pollutant Standards Index (PSI), developed by the US Environmental Protection Agency ((Ott 1978) or http://www.epa.gov/oar/oaqps/psi.html), and the ATMO Index, developed by the French Environment Ministry (http://www-sante.ujf-grenoble.fr/SANTE/paracelse/envirtox/Pollatmo/Surveill/atmo.html). These two indicators are very similar and we will discuss the French ATMO.

The ATMO index is based on the concentration of 4 major pollutants: sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3) and particulate matter (soot, dust, particles). For each pollutant, a sub-index is computed and the final ATMO index is defined as being equal to the largest sub-index. Here is how each sub-index is defined. For each pollutant, the concentration is converted into a number on a scale from 1 to 10. Level 1 corresponds to air of excellent quality; levels 5 and 6 are just around the EU long term norms, level 8 corresponds to the EU short term norms and 10 indicates hazardous conditions.

To illustrate, suppose that the sub-indices are as in Table 4.5.

pollutant    NO2   SO2   O3   dust
sub-index     3     3    2     8

Table 4.5: Sub-indices of the ATMO index

The resulting ATMO index is the largest value, that is 8. Hence the air quality is very bad. In the following paragraphs, we discuss some problems arising with the ATMO index.

4.2.1 Monotonicity

Suppose that, due to heavy traffic, the absence of wind and a very sunny day, the ozone sub-index increases from 2 to 8 for the air described in Table 4.5. Clearly, this corresponds to worse air: no pollutant decreased and one of them increased. In these conditions, we expect the ATMO index to worsen as well. In fact the ATMO index does not change. The maximum is still 8. Thus some changes, even significant ones, are not reflected by the index. In our example, the change is very significant, as the ozone sub-index was almost perfect and became very bad.

Note that if the ozone sub-index decreases from 8 to 2, the ATMO index does not change either, though the air quality improves. This shows that the ATMO index is not strictly monotonic: some changes, in both directions, are not reflected by the index.
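A few lines of Python (a sketch using the sub-indices of Table 4.5) make both observations concrete:

```python
# The ATMO index is the largest of the four sub-indices.
air = {"NO2": 3, "SO2": 3, "O3": 2, "dust": 8}

def atmo(sub_indices):
    return max(sub_indices.values())

print(atmo(air))  # 8

# The ozone episode discussed above: O3 jumps from 2 to 8.
# The air is clearly worse, yet the index does not move.
air["O3"] = 8
print(atmo(air))  # still 8
```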

4.2.2 Non compensation

Let us consider the ATMO index for two different airs (x and y), as described by Table 4.6. Air x is perfect for all measurements but one: it scores just above the EU long term norm for ozone.

pollutant    NO2   SO2   O3   dust
x             1     1    6     1
y             5     4    5     5

Table 4.6: Sub-indices for x and y

Air y is not good on any dimension: it is of average quality on all dimensions and close to the EU long term norms for three of them. The ATMO index is 6 for air x and 5 for air y. Hence, the quality of air x is considered to be lower than that of air y. Contrary to what we observed with the HDI, no compensation at all occurs between the different dimensions. The small weakness of x (6 compared to 5, for ozone) is not compensated by its large strengths (1 compared to 4 or 5, for nitrogen dioxide, sulfur dioxide and dust). In the case of human development, the compensation between dimensions was too strong. Here, we face the other extreme: no compensation at all, which is probably not better.

4.2.3 Meaningfulness

Let us forget our criticism of the ATMO index and suppose that it works well. Consider the statement "Today's ATMO index (6) is twice as high as yesterday's index (3)". What does it mean ? We are going to show that it is meaningless, in a certain sense. Let us come back to the definition of the sub-indices. For a given pollutant, the concentration is measured in µg/m3. The concentration figures are then transformed into numbers between 1 and 10. This is done in an arbitrary way. For example, instead of choosing 5-6 for the EU long term norms and 8 for the short term ones, 6-7 and 9 could have been chosen. The index would work just as well. The relevant information provided by the index is not the figure itself; it is some information about the fact that we are above or below some norms that are related to the effects of the pollutants on health (a somewhat similar situation was encountered in Chapter 3). But in such a case, the values of today's and yesterday's index would be different, say 7 and 4, and 7 is not twice as large as 4. To conclude, the statement "Today's ATMO index (6) is twice as high as yesterday's index (3)" would be valid, or meaningful, only in a particular context, depending upon arbitrary choices. Such a statement is said to be meaningless.

On the contrary, the statement "Today's ATMO sub-index for ozone (6) is higher than yesterday's sub-index for ozone (3)" is meaningful. Any reasonable transformation of the concentration figures into numbers between 1 and 10 would lead to the same conclusion: today's sub-index is higher than yesterday's. By "reasonable transformation" we mean a transformation that preserves the order: a concentration cannot be transformed into an index value lower than the index value corresponding to a lower concentration. Concentrations of 110 and 180 µg/m3 can be transformed into 3 and 6, or 4 and 6, or 2 and 4, but not into 4 and 2.

More subtle: "Today's ATMO index (6) is larger than yesterday's ATMO index (3)". Is this sentence meaningful ? In the previous paragraph, we saw that the arbitrariness involved in the construction of the 1 to 10 scale of a sub-index is not a problem when we want to compare two values of the same sub-index. But if we want to compare two values of two different sub-indices, it is no longer true. A value of 3 on a sub-index could be more dangerous for health than a 6 on another sub-index. Of course, the scales have been constructed with care: 5 corresponds to the EU long term norms on all sub-indices and 8 to the short term norms. This is intended to make all sub-indices commensurable. Comparisons should thus be meaningful. But can we really assume that a 5 (or the corresponding concentration in µg/m3) is equivalent on two different sub-indices ? Equivalent in what terms ? Some pollutants might have short term effects and other pollutants, long term effects. They can have effects on different parts of the organism. Should we compare the effects in terms of discomfort, mortality after n years, health care costs, . . . ?
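The meaningfulness argument can be replayed numerically. In the sketch below, both conversion scales (and the cut-off at 150 µg/m3) are invented, but both preserve the order of concentrations; they agree on order statements and disagree on ratio statements:

```python
# Two admissible (order-preserving) conversions of a concentration
# in micrograms per cubic metre into a sub-index value.
def scale_a(c):
    return 3 if c < 150 else 6

def scale_b(c):
    return 4 if c < 150 else 7

yesterday, today = 110, 180
print(scale_a(today) / scale_a(yesterday))  # 2.0  -> "twice as high"
print(scale_b(today) / scale_b(yesterday))  # 1.75 -> not twice as high
print(scale_a(today) > scale_a(yesterday))  # True
print(scale_b(today) > scale_b(yesterday))  # True: order survives
```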

4.3 The decathlon score

The decathlon is a 10-event athletic contest. It consists of 100-meter, 400-meter, and 1 500-meter runs, a 110-meter high hurdles race, the javelin and discus throws, shot put, pole vault, high jump, and long jump. It is usually contested over two or three days. It was introduced as a three-day event at the Olympic Games of Stockholm in 1912. To determine the winner of the competition, a score is computed for each athlete and the athlete with the best score is the winner. This score is the sum of the single-event scores. The single-event scores are not just times and distances. It doesn't make sense to add the time of a 100-meter run to the time of a 1 500-meter run. It is even worse to add the time of a run to the length of a jump. This should be obvious for everyone.

Until 1908, the single-event scores were just the rank of an athlete in that event. For example, if an athlete performed the third best high jump, his single-event score for the high jump was 3. The winner was thus the athlete with the lowest overall score. Note that this amounts to using the Borda method (see p. 14) to elect the best athlete when there are ten voters and the preferences of each voter are the rankings defined by each event.

The main problem with these single-event scores is that they very poorly reflect the performances of the athletes. Suppose that an athlete arrived 0.1 second before the next athlete in the 100-meter run. They have ranks i and i+1, so the difference in the scores that they receive is 1. Suppose now that the delay between these two athletes is 1 second. Their ranks are unchanged. Thus the difference in the scores that they receive is still 1, though a larger difference would be more appropriate. That is why other tables of single-event scores have been used since 1908 (de Jongh 1992, Zarnowsky 1989). In the tables used after 1908, high scores are associated with good performances (contrary to the scores before 1908). Hence, the winner is the athlete with the highest overall score.

Figure 4.2: Decathlon tables for distances: general shape of a convex (left) and a concave (right) table (points plotted against distance)

Some of these tables (different versions, in use between 1934 and 1962) are based on the idea that improving a performance by some amount (e.g. 5 centimetres in a long jump) is more difficult if the performance is close to the world record. Hence, it deserves more points. The general shape of these tables, for distances, is given in Figure 4.2 (convex table). For times (in runs), the shape is different as an improvement is a decrease in time.

A problem raised by convex tables is the following: if an athlete decides to focus on some events (for example the four kinds of runs) and to do much more training for them than for the other ones, he will have an advantage. He will come closer to the world record for runs and earn many points. At the same time, he will be further away from the world record in the other disciplines, but that will make him lose fewer points, as the slope of the curve is gentler in that direction. The balance will be positive. Thus these tables encourage athletes to focus on some disciplines, which is contrary to the spirit of the decathlon.

That is why, since 1962, different concave tables (see Figure 4.2) have been used. These tables strongly encourage the athletes to be excellent in all disciplines. An example of a real table, in use in 1998, is presented in Figure 4.3. Note that a new change occurred: this table is no longer concave. It is almost linear but slightly convex.
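To make the shape argument concrete, here is a stylised scoring rule of our own (an illustration, not the official tables): points grow as a power of the performance above some floor, so an exponent above 1 gives a convex table and an exponent below 1 a concave one.

```python
# Stylised single-event table for a distance event (illustration only):
# d0 is a floor performance, d_max is near the world record.
def points(distance, exponent, d0=4.0, d_max=9.0, max_points=1000):
    d = min(max(distance, d0), d_max)
    return max_points * ((d - d0) / (d_max - d0)) ** exponent

for e in (2.0, 0.5):  # convex, then concave
    gain_far = points(6.0, e) - points(5.5, e)   # far from the record
    gain_near = points(8.5, e) - points(8.0, e)  # close to the record
    print(e, round(gain_far), round(gain_near))
# exponent 2.0: 70 points far from the record, 170 close to it (convex);
# exponent 0.5: 85 far from the record, only 54 close to it (concave).
```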

There are many interesting points to discuss about the decathlon score.

Figure 4.3: A plot of the 100 meters run score table in 1998 (score, from 400 to 1 200 points, against time, from 9.5 to 13 seconds)

• How are the minimum and maximum values set ? They can highly influence the score, as was shown with the HDI (in Section 4.1.1). Obviously, the maximum value must somehow be related to the world record. But as everyone knows, world records are objects that athletes like to break.

• Why add the single-event scores ? Other operations might work as well. For example, multiplication may favour athletes that perform equally well in all disciplines. To illustrate this point very simply, consider a 3-event contest where single-event scores are between 0 and 10. An athlete, say x, obtains 8 in all three events. Another one, y, obtains 9, 8 and 7. If we add the scores, x and y obtain the same score: 24. If we multiply the scores, x gets 512 while y loses with 504 (see the sketch after this list).

• . . .
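A quick check of the addition-versus-multiplication point, for the hypothetical 3-event contest above:

```python
from math import prod

x = (8, 8, 8)  # perfectly balanced athlete
y = (9, 8, 7)  # same total, less balanced

print(sum(x), sum(y))    # 24 24   -> addition cannot separate them
print(prod(x), prod(y))  # 512 504 -> multiplication favours balance
```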

The point on which we will focus, in this decathlon example, is the role of the indicator.

4.3.1 Role of the decathlon score

Although one might think that the role of the overall score is clearly to designate the winner, we are going to show that it plays many roles (like student grades, see Chapter 3) and that this is one of the reasons why it changes so often. Of course, one of the roles is to designate the winner and it was probably the only purpose that the first designers of the score had in mind. But we can be quite sure that immediately after the first contest, another role arose. Many people probably used the scores to assess the performance of the athletes. Such an athlete has a score very close to that of the winner and is thus a good athlete. Another one is far from the winner and is consequently not a good athlete.

Not much later (after the second competition), a third role appeared. How did the athletes evolve ? This athlete has improved his score, or x has a better score in this contest than the score of y in the previous contest. This kind of comparison is not meaningful: suppose that an athlete wins a contest with a score of 16. In the next contest, he performs very poorly: short jumps, slow runs, short throws. But his main opponents are absent or perform equally poorly. He might still win the contest, and even with a better score, although his performance is worse than the previous time.

After some time, the organisers of decathlons became aware of the second and third roles. It was probably part of the motivation to abandon the sum of ranks and to use convex tables. These tables, to some extent, made the comparisons of scores across athletes and/or competitions meaningful. At the same time, the score found a new role as a monitoring tool during training. Before 1908, the scores could be computed only during competitions as they were sums of ranks. And it was not long before a wise coach used the score as a strategic tool, advising his athlete to focus on some events. For this reason, since 1962, the organisers have conferred a new role on the score: to foster excellence in all disciplines. This was achieved by the introduction of concave tables. But it is most likely that the score is still used as a strategic tool, hopefully in a less perverse way.

It is worth noting that this new role doesn't replace any of the previous ones. The score aims at rewarding equal performances in all disciplines but it is also used to assess the performance of an athlete. Even if we consider only these two roles (the other ones could be seen as side effects), it is amazing to see how incompatible they are.

4.4 Indicators and multiple criteria decision support

Classically, in a decision aiding process, a decision-maker wants to rank the elements of the set of alternatives (or to choose the best element). In order to rank, he selects several dimensions (criteria) that seem relevant with respect to his problem. Each alternative is characterised by a performance on each criterion (this is the evaluation matrix or performance tableau). An MCDA method is then used to rank the alternatives, with respect to the preferences of the decision-maker.

When an indicator is built, several dimensions are also selected. Each item is characterised by a performance on each dimension. An index that can be used to rank the items is computed. The analogy between a decision support method and an index is obvious: both aim at aggregating multi-dimensional information about a set of objects. But there is a tremendous difference as well: when an indicator is built, it is often the case that there is no clearly defined decision problem, no decision-maker and, a fortiori, no preferences. To cope with the absence of preferences, one could consider that the preferences are those of the potential users of the indicator. To some extent, this is possible because very often the preferences of the users go in the same direction for each dimension taken separately. For example, for each dimension of the ATMO index, everyone prefers a lower concentration. But it is definitely not reasonable to assume that the global preferences are similar. Furthermore, even if single-dimensional preferences go in the same direction, it does not mean that they are identical. Those who are not very sensitive to a pollutant will value a decrease in concentration much more if it occurs at a high concentration than at a low one. On the contrary, sensitive people might value concentration decreases at low and high levels equally.

The relevance of measurement theory

The absence of preferences is crucial. In decision support, many studies and concepts relate to measurement theory. Measurement theory studies how we can measure objects (assign a number to an object) so as to reflect a relation on these objects. E.g., how can we assign numbers to physical objects so as to reflect the relation "heavier than" ? That is, how do we assign a number (called weight) to each object so that "x's weight > y's weight" implies "x is heavier than y" ? Additional properties may be required. For example, in the case of weight measurement, one wishes that the number assigned to x and y taken together be the sum of their individual weights.

Another example is that of distance. How do we assign numbers to points in space so as to reflect the relation "more distant than" with respect to some reference point ? Contrary to the previous example, this one has several dimensions (usually two or three: x, y or x, y, z or altitude, longitude, latitude, etc.). Each object (point) is characterised by a performance (co-ordinate) on each dimension and one tries to aggregate these performances into one indicator: the distance to the reference point. This problem is at the core of geometry. Note that the answer is not unique. Very often the Euclidean distance is chosen (assuming that the shortest path between two points is the straight line). Sometimes, a geodesic distance is more relevant (when you consider points on the earth's surface, unless you are a mole, the shortest path is no longer a straight line but a curve). In other circumstances, the Manhattan distance is more appropriate (between two points in Manhattan, if you are not flying, the shortest path is neither a straight line nor a curve; it is a succession of perpendicular straight lines). And there are many other distances.
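For instance (a two-dimensional sketch with made-up points), the Euclidean and Manhattan distances give different answers for the same pair of points, while both reflect "more distant than":

```python
import math

p, q = (0.0, 0.0), (3.0, 4.0)
euclidean = math.dist(p, q)                        # straight-line distance
manhattan = sum(abs(a - b) for a, b in zip(p, q))  # city-block distance
print(euclidean, manhattan)  # 5.0 7.0
```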

As far as physical properties are concerned (larger than, warmer than, faster than, . . . ), the problem is easy: good measurements were carried out in Antiquity without any theory of measurement. But when we consider other kinds of relations, things are more complex. How to assign numbers to people or alternatives so as to reflect the relations "more loveable than", "preferable to" or "more risky than" ? In such cases, measurement theory can be of great assistance but is insufficient to solve all problems.

In decision support, measuring objects with respect to the relation "is preferred to" can be of some help because, once the objects have been measured, it is rather easy to handle numbers. It is often assumed that a preference relation over the alternatives exists but is not well known, and one tries to measure the alternatives so as to discover the preference relation. Sometimes, the preference relation is not assumed to completely exist a priori. Preferences can emerge and evolve during the decision aid process, but some characteristics of the preference relation still exist a priori. Measurement theory can therefore be used to build or to analyse a decision support method.

Many indices are built without the assumption that a relation over the items exists a priori, or without trying to reflect a pre-existent relation. On the contrary, it seems that, in many cases, the aim of an index is precisely to build or create a relation over the items. Therefore, in such a case, measurement theory cannot tell us much about the index. Measurement theory loses some of its power when there is no a priori relation to be reflected.

Indicators and reality

The index does not help to uncover reality, that is, a pre-existent relation. It institutes or settles reality (Desrosieres 1995). This is very obvious with the decathlon score. Between 1908 and 1962, the scores were designed to assess performances and to compare them. As one of the most important things for a professional athlete is to win (contrary to the opinion of de Coubertin), the score is considered as the true measure of performance. Any athlete who was not convinced of this had to change his mind and behave accordingly if he wanted to compete. This is not particular to the decathlon score. Many governments probably try to exhibit a good HDI for their country in order to keep international subsidies or to legitimise their authority to the population of the country or to foreign governments. Some city councils, willing to attract high-salaried residents, claim, among other things, to have high air quality. The most efficient way for them to make their claim credible is to exhibit a good ATMO index (or any other index in countries other than France), even if other policies might be more beneficial to the country.

One might be tempted to reject any indicator that does not reflect reality, that, in some arbitrary way, institutes reality. Nevertheless, such indicators are not useless. An indicator can be considered as a kind of language. It is based on some (more or less arbitrary) conventions and helps us to communicate efficiently about different topics or perform different tasks. By "efficiently", we mean "more efficiently than without any language"; not necessarily in the most efficient way. Like any language, it is not always precise and leaves room for ambiguities and contradictions. If the people who created the decathlon had decided to wait until a sound theory showed them how to designate the winner, it is very likely that no decathlon contest would ever have taken place.

But this does not mean that all indicators are equally good. Ambiguities and contradictions are certainly adequate for poetry; otherwise we could never enjoy things like this:

Mis pasos en esta calle
Resuenan
en otra calle
donde
oigo mis pasos
pasar en esta calle
donde
Sólo es real la niebla1

or

Wenn ich mich lehn' an deine Brust,
kommt's über mich wie Himmelslust;
doch wenn du sprichst: ich liebe dich!
so muss ich weinen bitterlich.2

1 Octavio Paz, Here, translated by Nims (1990): My footsteps in this street / Re-echo / in another street / where / I hear my footsteps / passing in this street / where / Nothing is real but the fog.

2 Heinrich Heine, Ich liebe dich, translated by Louis Untermeyer (van Doren 1928): And when I lean upon your breast / My soul is soothed with godlike rest; / But when you swear: I love but thee! / Then I must weep and bitterly.

But, when it comes to decision-making, ambiguities and contradictions should generally be kept to a minimum. When possible, they should be avoided. When certain elements of preferences are known for sure, all indicators should reflect them.

Back to multiple criteria decision support

In a decision aiding process, preferences are not perfectly known a priori. Otherwise, it would be very unlikely that any aid would be required. Therefore, relying solely on measurement theory is not possible. Most decision aiding processes, like most indicators, probably cannot avoid some arbitrary elements. These can occur at different steps of the process: the choice of an analyst, of the criteria, of the aggregation scheme, to mention a few.

But unlike cases where indicators are built without any decision problem in mind, most decision aiding processes relate to a more or less precisely defined decision problem. Consequently, at least some elements of preferences are present. Therefore, if some measurement (associating numbers to alternatives) is performed during the aiding process, measurement theory can be used to ensure that the model built during the aiding process does not contradict these elements of preferences, that it reflects them and that all sound conclusions that can be drawn from the conjunction of these elements are actually drawn.

4.5 Conclusions

Among evaluation and decision models, indicators are probably more widespread than any other model (this is definitely true if you think of cost-benefit analysis or multiple criteria decision support). Student grades are also very popular (almost everyone has faced them at some point of his life) but, besides the fact that most people use and/or encounter them, indicators are pervasive in many domains of human activity, contrary to student grades, which are confined to education (note that student grades could be considered as special cases of indicators).

Indicators are not often thought of as decision support models but, actually, in many circumstances, they are. Indicators are usually presented as an efficient way to synthesise information. But what do we need information for ? For making decisions !

In this chapter, we analysed three different indicators: the human development index, the ATMO (an air quality index) and the decathlon score.

On the one hand, all three indicators have been shown to present flaws: they do not always reflect reality or what we consider as reality. This is due to an excess or a lack of compensation, to non-monotonicity, to an inability to deal with dependence between dimensions, . . . These problems are not specific to indicators. Some of them have already been discussed in Chapter 3 and/or will be met in Chapter 6.

On the other hand, we saw that an indicator does not necessarily need to reflect reality or, at least, it does not need to reflect only reality.


5 ASSESSING COMPETING PROJECTS: THE EXAMPLE OF COST-BENEFIT ANALYSIS

5.1 Introduction

Decision-making inevitably implies, at some stage, the allocation of scarce resources to some alternatives rather than to others (e.g. deciding how to use one's income). It is therefore not at all surprising that the question of helping a decision-maker to choose between competing alternatives, projects, courses of action and/or to evaluate them has attracted the attention of economists. Cost-Benefit Analysis (CBA) is a set of techniques that economists have developed for this purpose. It is based on the following simple and apparently inescapable idea: a project should only be undertaken when its "benefits" outweigh its "costs".

CBA is particularly oriented towards the evaluation of public sector projects. Decisions made by governments, public agencies and firms or international organisations are complex and have a huge variety of consequences. Some examples of areas in which CBA has been applied will give a hint of the type of projects that are evaluated:

• Economics: determining investment strategies for developing countries, allocating budgets among agencies, developing an energy policy for a nation (Dinwiddy and Teal 1996, Kirkpatrick and Weiss 1996, Little and Mirlees 1968, Little and Mirlees 1974),

• Transportation: building new roads or motorways (Willis, Garrod and Harvey 1998), building a high-speed train, reorganising the bus lines in a city (Adler 1987, Schofield 1989),

• Health: building new hospitals, setting up prevention policies, buying new diagnosis tools, choosing standard treatments for certain types of illnesses (Folland, Goodman and Stano 1997, Johannesson 1996),

• Environment: establishing pollution standards, creating national parks, approving the human consumption of genetically-modified organisms or irradiated food (Hanley and Spash 1993, International Atomic Energy Agency 1993, Johansson 1993, Toth 1997).


These types of decision are immensely complex. They affect our everyday life and are likely to affect that of our children. Most economists view CBA as the standard way of evaluating such projects and of supporting public decision-making (numerous examples of practical studies using CBA can easily be found in applied economics journals, e.g. American Journal of Agricultural Economics, Energy Economics, Environment and Planning, Journal of Environmental Economics and Management, Journal of Health Economics, Journal of Policy Analysis and Management, Journal of Public Finance and Public Choice, Journal of Transport Economics and Policy, Land Economics, Pharmaco-Economics, Public Budgeting and Finance, Regional Science and Urban Economics, Water Resources Research). Since fairly different approaches to these problems have been advocated, it is important to have a clear idea of what CBA is; if the claim of economists were perfectly well-founded, there would be hardly any need for other decision/evaluation models.

Although it has distant origins (see Dupuit 1844), the development of CBA has unsurprisingly coincided with the more active involvement of governments in economic affairs that started after the great depression and climaxed after World War II in the 50's and 60's. A good overview of the early history of CBA can be found in Dasgupta and Pearce (1972). After having started in the USA in the field of Water Resource Management (see Krutilla and Eckstein (1958) for an overview of these pioneering developments), the principles of CBA were soon adopted in other areas and countries, the UK being the first and most active one. While research on (and applications of) CBA grew at a very fast rate during the 50's and 60's, the principles of CBA were entrenched in a series of very influential "manuals for project evaluation" produced by several international organisations (OECD: Little and Mirlees (1968), Little and Mirlees (1974), ONUDI: Dasgupta, Marglin and Sen (1972) and, more recently, World Bank: Adler (1987), Asian Development Bank: Kohli (1993)). In many countries nowadays, the Law makes it an obligation to evaluate projects using the principles of CBA. Research on CBA is still active and economists have spent considerable time and energy investigating its foundations and refining the various tools that it requires in practical applications (recent references include Boardman 1996, Brent 1996, Nas 1996).

It would be impossible to give a fair account of the immense literature on CBA in a few pages. Although somewhat old, two excellent introductory references are Dasgupta and Pearce (1972) and Lesourne (1975). Less ambitiously, we shall try here to:

• give a brief and informal account of the principles underlying CBA,

• give an idea of how these principles are applied in practice,

• give a few hints on the scope and limitations of CBA.

These three objectives structure the rest of this chapter into sections. While our aim is clearly not to promote the use of CBA, neither is it to support the nowadays-fashionable claim (especially among environmentalists) that CBA is an outdated, useless technique. In pointing out what we believe to be some limitations of CBA, we only want to give arguments refuting the claim of some economists that, under all circumstances, it is the only "consistent" way to support decision/evaluation processes (Boiteux 1994).

5.2 The principles of CBA

5.2.1 Choosing between investment projects in private firms

The idea that a project should only be undertaken if its "benefits" outweigh its "costs" is at the heart of CBA. This claim may seem so obvious that it need not be discussed any further. It is of little practical content, however, unless we define more precisely what "costs" and "benefits" are and how to evaluate and compare them. Some discussion will therefore prove useful.

A simple starting point is to be found in the literature on Corporate Finance on the choice between "investment projects" in private firms. An investment project may usefully be seen as an operation in which money is spent today (the "costs"), with the hope that this money will produce even more money (the "benefits") tomorrow.

A useful way to evaluate such an investment project is the following. First a time horizon for its evaluation must be chosen. While the very nature of the project may command this choice (e.g. because after a certain date the Law will change, or equipment will have to be replaced), in the general case the duration of the project is more or less conventionally chosen as the period of time for which it seems reasonable and useful to perform the evaluation.

Although a "continuous" evaluation is theoretically possible, real-world applications imply dividing the duration of the project into time periods of equal length. This involves some arbitrariness (should we choose years or semesters?) as well as trade-offs between the depth and the complexity of the evaluation model.

Suppose now that a project is to be evaluated on T time periods of equal length. The next step is to try to evaluate the consequences of the project in each of these time periods. Such a task may be more or less easy depending on the nature of the project, the environment of the firm and the duration of the project. We seek to obtain an evaluation of the amount of cash that is generated by the project during each time period, this amount being the difference between the "benefits" and the "expenses" generated by the project (including the residual value of the project in the last period). Note that these evaluations are relative: they aim at capturing the influence of the project on the firm and not its overall situation. Let us denote by b(i) (resp. c(i)) the benefits (resp. the expenses) generated by the project during the ith period of time. The net effect of the project in period i is therefore a(i) = b(i) - c(i).

At this stage, the evaluation model of the project has the form of an evaluation vector with T+1 components (a(0), a(1), . . . , a(T)), where 0 conventionally denotes the starting time of the project. In general, some of the components of this vector (most notably a(0)) will be negative (if not, you should enjoy the free lunch and there is hardly any evaluation problem). Although all components of the evaluation vector are expressed in identical monetary units (m.u.), the (algebraic) amount a(0) is to be received today while a(1) will only be received one time period ahead. Therefore these two numbers, although expressed in the same unit, are not directly comparable. There is a simple way, however, to summarise the components of the evaluation vector using a single number.

Suppose that there is a capital market on which the firm is able to lend or borrow money at a fixed interest rate of r per time period (this market is assumed to be perfect: borrowing and lending will not affect r and are not restricted). If you borrow 1 m.u. for one time period on this market today, you will have to spend (1 + r) m.u. in period 1 in order to respect your contract. Similarly, if you know that you will receive 1 m.u. in period 1, you can borrow an amount of $\frac{1}{1+r}$ m.u.: your revenue of 1 m.u. in period 1 will allow you to reimburse exactly what you have to, i.e. $\frac{1}{1+r}(1+r) = 1$ m.u. Hence, being sure of receiving 1 m.u. in period 1 corresponds to receiving, here and now, an amount of $\frac{1}{1+r}$ m.u. Using a similar reasoning and taking compound interest into account, receiving 1 m.u. in period i corresponds to an amount of $\frac{1}{(1+r)^i}$ m.u. now. This is what is called discounting and r is called the discounting rate.

This suggests a simple way of summarising the components of the vector (a(0), a(1), . . . , a(T)): the sum to be received now that is equivalent to this cash stream via borrowing and lending operations on the capital market. This sum, called the Net Present Value (NPV) of the project, is given by:

$$NPV = \sum_{i=0}^{T} \frac{a(i)}{(1+r)^i} = \sum_{i=0}^{T} \frac{b(i)-c(i)}{(1+r)^i} \qquad (5.1)$$

If NPV > 0, the cash stream of the project is equivalent to receiving money now, i.e. taking into account the costs and the benefits of the project and their dispersion in time, it appears that the project makes the firm richer and, thus, should be undertaken. The reverse conclusion obviously holds if NPV < 0. When NPV = 0, the firm is indifferent between undertaking the project or not.
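As a minimal numerical sketch of formula (5.1), with made-up cash flows and a made-up rate:

```python
# NPV of formula (5.1): a(0) = -100 m.u. invested now, then
# a(i) = +40 m.u. in each of four periods (invented figures).
def npv(cash_flows, r):
    return sum(a / (1 + r) ** i for i, a in enumerate(cash_flows))

project = [-100, 40, 40, 40, 40]
print(round(npv(project, r=0.08), 2))  # 32.49 > 0 -> undertake the project
```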

This simple reasoning underlies the following well-known rule for choosing between investment projects in Finance: "when projects are independent, choose all projects that have a strictly positive NPV". In deriving this simple rule, we have made various hypotheses. Most notably:

• a duration for the project was chosen,

• the duration was divided into conveniently chosen time periods of equal length,

• all consequences of the projects were supposed to be adequately modelled as benefits b(i) and costs c(i) expressed in m.u. for each time period,

• a perfect capital market was assumed to exist,

• the effect of uncertainty and/or imprecision was neglected,


• other possible constraints were ignored (e.g. projects may be exclusive or synergetic).

The literature in Finance is replete with extensions of this simple model that allow one to cope with less simplistic hypotheses.

5.2.2 From Corporate Finance to CBA

Although the projects that are usually evaluated using CBA are considerably more complex than the ones we implicitly envisaged in the previous paragraph, CBA may usefully be seen as using a direct extension of the rule used in Finance. The main extensions are the following:

• in CBA, "costs" and "benefits" are evaluated from the point of view of "society",

• in CBA, "costs" and "benefits" are not necessarily directly expressed in m.u.; when this happens, conveniently chosen "prices" are used to convert them into m.u.,

• in CBA, the discounting rate has to be chosen from the point of view of "society".

Retaining the spirit of the notations used above, the benefits b(i) and costs c(i) of a project in period i are seen in CBA as vectors with respectively $\ell$ and $\ell'$ components:

$$b(i) = (b(1,i), b(2,i), \ldots, b(\ell,i)),$$
$$c(i) = (c(1,i), c(2,i), \ldots, c(\ell',i))$$

where b(j,i) (resp. c(k,i)) denotes the "social benefits" (resp. the "social costs") on the jth dimension (resp. the kth dimension), evaluated in units that are specific to that dimension, generated by the project in period i.

In each period, "costs" and "benefits" are converted into m.u. using suitably chosen "prices". We denote by p(j) (resp. p'(k)) the price of one unit of social benefit on the jth dimension (resp. one unit of social cost on the kth dimension), expressed in m.u. (for simplicity, and consistently with real-world applications, prices are assumed to be independent of the time period). These prices are used to summarise the vectors b(i) and c(i) into single numbers expressed in m.u., letting:

$$b(i) = \sum_{j=1}^{\ell} p(j)\,b(j,i)$$

and

$$c(i) = \sum_{k=1}^{\ell'} p'(k)\,c(k,i)$$


where b(i) (resp. c(i)) denotes the social benefits (resp. costs) generated by the project in period i, converted into m.u.

After this conversion, and having suitably chosen a social discounting rate r, it is possible to apply the standard discounting formula to compute the Net Present Social Value (NPSV) of a project. We have:

$$NPSV = \sum_{i=0}^{T} \frac{b(i)-c(i)}{(1+r)^i} = \sum_{i=0}^{T} \frac{\sum_{j=1}^{\ell} p(j)\,b(j,i) - \sum_{k=1}^{\ell'} p'(k)\,c(k,i)}{(1+r)^i} \qquad (5.2)$$

and a project with NPSV > 0 will be interpreted as improving the welfare of society and, thus, should be implemented (in the absence of other constraints).
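A minimal sketch of formula (5.2), with invented dimensions, quantities and "social prices":

```python
# NPSV of formula (5.2): two benefit dimensions (say, time gains and
# security gains) and one cost dimension; all figures are made up.
def npsv(benefits, costs, p, p_prime, r):
    total = 0.0
    for i, (b_i, c_i) in enumerate(zip(benefits, costs)):
        b = sum(p[j] * b_i[j] for j in range(len(p)))              # b(i)
        c = sum(p_prime[k] * c_i[k] for k in range(len(p_prime)))  # c(i)
        total += (b - c) / (1 + r) ** i
    return total

benefits = [(0, 0), (5, 2), (5, 2)]  # quantities per period and dimension
costs = [(10,), (1,), (1,)]          # a large initial cost, then upkeep
print(round(npsv(benefits, costs, p=(2, 3), p_prime=(1,), r=0.08), 2))
# positive -> interpreted as improving the welfare of society
```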

It should be observed that the difficulties we mentioned concerning the computation of the NPV are still present here. Extra difficulties are easily seen to emerge:

• how can one evaluate “benefits” and “costs” from a “social point of view”?

• is it always possible to measure the value of "benefits" and "costs" in monetary units and how should the prices be chosen?

• how is the social discount rate chosen?

It is apparent that CBA is a "mono-criterion" approach that uses "money" as a yardstick. Clearly the foundations of such a method and the way of using it in practice deserve to be clarified. Section 5.2.3 presents an elementary theoretical model that helps in understanding the foundations of CBA. It may be skipped without loss of continuity.

5.2.3 Theoretical foundations

It is obviously impossible to give here a complete account of the vast literature on the foundations of CBA, which has deep roots in Welfare Economics. We would however like to give a hint of why CBA consistently insists on trying to "price out" every effect of a project. The important point here is that CBA conducts project evaluation within an "environment" in which markets are especially important instruments of social co-ordination.

An elementary theoretical model

Consider a one-period economy in which m individuals consume n goods that are exchanged on markets. Each individual j is supposed to have completely ordered preferences for consumption bundles. These preferences can be conveniently represented using a utility function $U_j(q_{j1}, q_{j2}, \ldots, q_{jn})$ where $q_{ji}$ denotes the quantity of good i consumed by individual j.


Social preferences are supposed to be well-defined in terms of the preferences of the individuals through a "social utility function" (or "social welfare function") $W(U_1, U_2, \ldots, U_m)$. It is useful to interpret W as representing the preferences of a "planner" regarding the various "social states".

Starting from an initial situation in the economy, consider a "project", interpreted as an external shock to the economy, consisting in a modification of the quantities of goods consumed by each individual. These modifications are supposed to be marginal; they will not affect the prices of the various goods. The impact of such a shock on social welfare is given by (assuming differentiability):

$$dW = \sum_{j=1}^{m} \sum_{i=1}^{n} W_j\,U_{ji}\,dq_{ji} \qquad (5.3)$$

where

$$W_j = \frac{\partial W}{\partial U_j} \quad \text{and} \quad U_{ji} = \frac{\partial U_j}{\partial q_{ji}}$$

Social welfare will increase following the shock if $dW > 0$.

The existence of markets for the various goods and the hypothesis that individuals operate on these markets so as to maximise utility ensure that, before the shock, we have, for all individuals j and for all goods i and k:

$$\frac{U_{ji}}{U_{jk}} = \frac{p_i}{p_k} \qquad (5.4)$$

where $p_i$ denotes the price of the ith good. Having chosen a particular good as numeraire (we shall call that good "money"), this implies that:

$$U_{ji} = \lambda_j\,p_i \qquad (5.5)$$

where $\lambda_j$ can be interpreted as the marginal effect on the utility of individual j of a marginal variation of the consumption of the numeraire good, i.e. as the marginal utility of "income" for individual j.

Using 5.5, 5.3 can be rewritten as:

$$dW = \sum_{j=1}^{m} \lambda_j\,W_j \sum_{i=1}^{n} p_i\,dq_{ji} \qquad (5.6)$$

In equation 5.6, the coefficient $\lambda_j W_j$ has a useful interpretation: it represents the increase in social welfare following a marginal increase of the income of individual j.

Under the hypothesis that, before the shock, the distribution of income is "optimal" in the society, the conclusion is that the coefficients $\lambda_j W_j$ are constant over individuals (otherwise income would have been reallocated in favour of the individuals for which $\lambda_j W_j$ is larger). Under this hypothesis, we may always normalise W in such a way that $\lambda_j W_j = 1$ for all j. We therefore rewrite equation 5.6 as:


$$dW = \sum_{j=1}^{m} \sum_{i=1}^{n} p_i\,dq_{ji} \qquad (5.7)$$

which amounts to saying that the social effects of the shock are measured as the sum over individuals of the variation of their consumption evaluated at market prices (i.e. the so-called consumer surplus). In this simple model, variations of social welfare are therefore conveniently measured in money terms using market prices.

Returning to CBA, relation 5.7 coincides with the computation of the NPSV when time is not an issue and the effects (costs or benefits) of a project can be expressed in terms of consumption of goods exchanged on markets. The general formula for computing the NPSV may be seen as an extension of 5.7 without these restrictions.

Extensions and remarks

The limitations of the elementary model presented above are obvious. The most important ones seem to be the following:

• the model only deals with marginal changes in the economy,

• the model considers a single-period economy without production,

• the economy is closed (no imports or exports) and there is no government (and in particular no taxes),

• the distribution of income was assumed to be optimal.

In spite of all its limitations, our model allows us to understand, through the simple derivation of equation 5.7, the rationale for trying to price out all the effects of a project in order to assess its contribution to social welfare.

A detailed treatment of the foundations of CBA without our simplifying hypotheses can be found in Dreze and Stern (1987). Although we shall not enter into details, it should be emphasised that the theoretical foundations of CBA are controversial on some important points. The appropriateness of equation 5.7 and of related formulas is particularly clear in situations that are fairly different from the ones in which CBA is currently used as an evaluation tool. The latter are often characterised by:

• non-marginal changes (think of the construction of a new underground line in a city),

• the presence of numerous public goods for which no market price is available (think of health services or education),

• the presence of numerous externalities (think of the pollution generated by a new motorway),

• markets in which competition is altered in many ways (monopolies, taxes, regulations),

• effects that are highly complex and may concern a very long period of time (think of a policy for storing used nuclear fuel),

• effects that are very unevenly distributed among individuals and raise important equity concerns (think of your reaction if a new airport were to be built close to your second residence in the middle of the countryside),

• the overwhelming presence of uncertainty (technological changes, future prices, long term effects of air pollution on health),

• the difficulty of evaluating some effects in well-defined units (think of the aesthetic value of the countryside) and, thus, of pricing them out.

In spite of these difficulties, CBA still mainly rests on the use of the NPSV (or some of its extensions) to evaluate projects. Economists have indeed developed an incredible variety of tools in order to use the NPSV even in situations in which it would a priori seem difficult to do so. It is impossible to review here the immense literature that these efforts have generated. It includes: the determination of prices for "goods" without markets, e.g. contingent valuation techniques or hedonic prices (see Scotchmer 1985, Loomis, Peterson, Champ, Brown and Lucero 1998), the determination of an appropriate social discounting rate (useful references on this controversial topic include Harvey 1992, Harvey 1994, Harvey 1995, Keeler and Cretin 1983, Weitzman 1994), the inclusion of equity considerations in the calculation of the NPSV (Brent 1984), the treatment of uncertainty, and the consideration of irreversible effects (e.g. through the use of option values). An overview of this literature may be found in Sugden and Williams (1983) and in Zerbe and Dively (1994). We will simply illustrate some of these points in section 5.3.

5.3 Some examples in transportation studies

Public investment in transportation facilities amounts to over $80 \times 10^9$ FRF annually in France (around $14 \times 10^9$ USD or $14 \times 10^9$ EUR). CBA is presently the standard evaluation technique for such projects. It is impossible to give a detailed account of how CBA is currently applied in France for the evaluation of transportation investment projects; this would take an entire book even for a project of moderate importance. In order to illustrate the type of work involved in such studies, we shall only take a few examples (for more details, see Boiteux (1994) and Syndicat des Transports Parisiens (1998); a useful reference in English is Adler (1987)) based on a number of real-world applications. For concreteness, we shall envisage a project consisting in the extension of an underground line in the suburbs of Paris. The effects of such a project are clearly very diverse. We will concentrate on some of them here, leaving direct financial effects aside (construction costs, maintenance costs, operating costs) although their evaluation may raise problems.


5.3.1 Traffic forecasting

An inevitable step in all studies of this type is to forecast the modification of the volume and the structure of the traffic that would follow the implementation of the project. The project's main "benefits" consist in "time gains", which are obviously directly related to traffic forecasts (time gains converted into m.u. frequently account for more than 50% of the benefits of these types of projects).

Implementing such forecasting models is obviously an enormous task. Local modifications in the supply of public transportation may have consequences on the traffic in the whole region. Furthermore, such forecasts are usually made at an early stage of the development of the project, a stage at which all details (concerning e.g. the tariffing of the new infrastructure or the frequency of the trains) may not be completely decided yet.

Traffic forecast models usually involve highly complex modal choice modules coupled with forecasting and/or simulation techniques. Their outputs are clearly crucial for the rest of the study. Nearly all public transportation firms and governmental agencies in France have developed their own tools for generating traffic forecasts. They differ on many points, e.g. the statistical tools used for modal choice or the segmentation of the population that is used (Boiteux 1994). Unsurprisingly, these models lead to very different results.

As far as we know, all these models forecast the traffic for a period of time that is not too distant from the installation of the new infrastructure. These forecasts are then more or less mechanically updated (e.g. increased following the observed rate of growth of the traffic in the past few years) in order to obtain figures for all the periods of study. None of them seems to integrate the potential modifications of behaviour of a significant proportion of the population in reaction to the new infrastructure (e.g. moving away from the centre of the city), whereas such effects are well-known and have proved to be overwhelming in the past.

These models are not part of CBA and indicating their limitations should not be seen as a criticism of CBA. Their results, however, form the basis of the evaluation model.

5.3.2 Time gains

Traffic forecasts are used to evaluate the time that inhabitants of the Paris region would gain with the extension of the metro line. Such evaluations, on top of being technically rather involved, raise some basic difficulties:

• is one minute equal to one minute? Such a question may not be as silly as it seems. In most models time gains are evaluated on the basis of what is called "generalised time", i.e. a measure of time that accounts for elements of (dis)comfort of the journey (e.g. temperature, stairs to be climbed, a more or less crowded environment). Although this seems reasonable, much less effort has been devoted to the study of models allowing to convert time into generalised time than to the "price of time" that will be used afterwards,


• is one hour worth 60 times one minute? Most models evaluating and pricing out time gains are strictly linear. This is dubious since some gains (e.g. 10 seconds per user-day) might well be considered insignificant. Furthermore, the loss of one hour daily for some users may have a much greater impact than 60 losses of 1 minute,

• what is the value of time and how should time gains be converted into monetary units? Should we take the fact that people have different salaries into account? Should we rather use prices based on "stated preferences"? Should we take into account the fact that most surveys using stated preferences have shown that the value of time highly depends on the motive of the journey (being much lower for journeys not connected to work)?

The present practice in the Paris region is to linearly evaluate all (generalised) time gains using the average hourly net salary in the Region (74 FRF/hour in 1994, or approximately 13 USD/hour or 13 EUR/hour). In view of the major uncertainties surrounding the traffic forecasts that are used to compute the time gains and the arbitrariness of the "price of time" that is used, it does not seem unfair to consider that such evaluations give, at best, interesting indications.

5.3.3 Security gains

Important benefits of projects in public transportation are "security gains" (hopefully, using the metro is far less risky than driving a car). A first step consists in evaluating, based on traffic forecasts, the gain of security in terms of the number of ("statistical") deaths and serious injuries that would be avoided annually by the project. The next step consists in converting these figures into monetary units through the use of a "price for human life". The following figures are presently used in France (in 1993 FRF; they should be divided by a little less than 6 in order to obtain 1993 USD):

Death              3 600 000 FRF
Serious injury       370 000 FRF
Other injury          79 000 FRF

these figures being based on several stated preference studies (it is not without interest to note that these figures were quite different before 1993, human life being, at that time, valued at 1 866 000 FRF). Using these figures and combining them with statistical information concerning the occurrence of car accidents and their severity leads to benefits in terms of security which amount to 0.08 FRF per vehicle-km avoided in the Paris region.
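To see how such a figure is assembled, here is a sketch using the official prices above but invented accident rates (the true rates are not given in the text):

```python
# 1993 FRF prices from the table above; the accident rates per 10**8
# vehicle-km are made up for the illustration.
price = {"death": 3_600_000, "serious": 370_000, "other": 79_000}
rate_per_1e8_vkm = {"death": 1.0, "serious": 8.0, "other": 60.0}

benefit_per_vkm = sum(price[k] * rate_per_1e8_vkm[k] for k in price) / 1e8
print(round(benefit_per_vkm, 3))  # FRF per vehicle-km avoided
```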

Although this might not appear to be a very pleasant subject of study, economists have developed many different methods for evaluating the value of human life, including methods based on "human capital", the value of life insurance contracts, the sums granted by courts following accidents, stated preference approaches, and revealed preference approaches including smoking and driving behaviour and wages for activities involving risk (Viscusi 1992). Besides raising serious ethical difficulties (Broome 1985), these studies exhibit incredible variations across techniques and across seemingly similar countries (this explains why, in many medical studies in which "benefits" mainly include lives saved, "cost-effectiveness" analysis is often preferred to CBA, since it does not require pricing out human life; see Johannesson 1995, Weinstein and Stason 1977). We reproduce below some significant figures for the value of life used in several European countries (this table is adapted from Syndicat des Transports Parisiens 1998; all figures are in 1993 European Currency Units (ECU), one 1993 ECU being approximately one 1993 USD):

Country     Price of human life
Denmark           628 147 ECU
Finland         1 414 200 ECU
France            600 000 ECU
Germany           406 672 ECU
Portugal           78 230 ECU
Spain             100 529 ECU
Sweden            984 940 ECU
UK                935 149 ECU

5.3.4 Other effects and remarks

The inclusion of other effects in the computation of the NPSV of a project in such studies raises difficulties similar to the ones mentioned for time gains and security gains. Their evaluation is subject to much uncertainty and inaccurate determination. Moreover, the "prices" that are used to convert them into monetary units can be obtained using many different methods leading to significantly different results.

As is apparent in Syndicat des Transports Parisiens (1998), the prices used to "monetarise" effects like:

• noise,

• local air pollution,

• contribution to the greenhouse effect,

are mainly conventional.

The social discounting rate used for such projects is determined by the government (the "Commissariat Général du Plan"). Presently a rate of 8% is used (note that this rate is about twice as high as the rate commonly used in Germany). A period of evaluation of 30 years is recommended for this type of project.

The conclusions and recommendations of a recent official report (Boiteux 1994) on the evaluation of public transportation projects stated that:

• although CBA has limitations, it remains the best way to evaluate such projects,


• all effects that can reasonably be monetarised should be included in the computation of the NPSV,

• all other effects should be described verbally. Monetarised effects and non-monetarised ones should not be included in a common table that would give the same status and, implicitly, the same importance to all. A multiple criteria presentation would furthermore attribute an unwarranted scientific value to such tables,

• extensive sensitivity analyses should be conducted,

• all public firms and administrations should use a similar methodology in order to allow meaningful comparisons,

• an independent group of CBA experts should evaluate all important projects,

• CBA studies should remain as transparent as possible.

In view of:

• the immense complexity of such evaluation studies,

• the unavoidable elements of uncertainty and inaccurate determination entering into the evaluation model,

• the rather unconvincing foundations of CBA for this type of project,

the conclusion that CBA remains the “best” method seems unwarranted. CBA has often been criticised on purely ideological grounds, which seems ridiculous. However, the insistence on seeing CBA as a “scientific”, “rational” and “objective” evaluation model, all words that are frequently spotted in texts on CBA (Boiteux 1994), seems no more convincing.

5.4 Conclusions

CBA is an important decision/evaluation method. We would like to note in particular that:

• it has a sound, although limited and controversial on some points, theoretical basis. Contrary to many other decision/evaluation methods that are more or less ad hoc, the users of CBA can rely on more than 50 years of theoretical and practical investigations,

• CBA emphasises the fact that decision and/or evaluation methods are not context-free. Having emerged from economics, it is not surprising that markets and prices are viewed as the essential parts of the environment in CBA. More generally, any decision/evaluation method that would claim to be context-free would seem of limited interest to us,


• CBA emphasises the need for consistency in decision-making. It aims at providing simple tools that make it possible, in a decentralised way, to ensure a minimal consistency between decisions taken by various public bodies. Any decision/evaluation model should tackle this problem,

• CBA explicitly acknowledges that the effects of a project may be diverse and that all effects should be taken into account in the model. In view of the popularity of purely financial analyses for public sector projects, this is worth recalling (Johannesson 1995),

• although the implementation of CBA may involve highly complex models (e.g. traffic forecasts), the underlying logic of the method is simple and easily understandable,

• CBA is a formal method of decision/evaluation. It is the belief and experience of the authors of this book that such methods may have a highly beneficial impact on the treatment of highly complex questions. Although other means of evaluation and of social co-ordination (e.g. negotiation, elections, exercise of power) clearly exist, formal methods based on an explicit logic can provide invaluable contributions, allowing sensitivity analyses, promoting constructive dialogue and pointing out crucial issues.

We already mentioned that we disagree with the view held by some economists that CBA is the only “rational”, “scientific” and “objective” method for helping decision-makers (such views are explicitly or implicitly present in Boiteux (1994) or Mishan (1982)). We strongly recommend Dorfman (1996) as an antidote to this radical position.

We shall stress here why we think that decision/evaluation models should not be confused with CBA:

• supporting decision/evaluation processes involves many more activities than just “evaluation”. As we shall see in Chapter 9, “formulation” is a basic activity of any analyst. The determination of the “frontiers” of the study and of the various stakeholders, the modelling of their objectives, the invention of alternatives, form an important, we would even say a crucial, part of any decision/evaluation support study. CBA offers little help at this stage. Even worse, too radical an interpretation of CBA might lead (Dorfman 1996) to an excessive attention given to monetarisation, which may be detrimental to an adequate formulation,

• having sound theoretical foundations, as CBA does, is probably a necessary but insufficient condition for building useful decision/evaluation tools (let alone the “best” ones). A recurrent theme in OR is that a successful implementation of a model is contingent on many factors other than just the quality of the underlying method. Creativity, flexibility and reactivity are essential ingredients of the process. They do not always seem to be compatible with a too rigid view of what a “good decision/evaluation model” should be. Furthermore, the foundations of CBA are especially strong in situations that are at variance with the usual context of public sector projects: non-marginal changes, public goods and externalities are indeed pervasive (see Brekke 1997, Holland 1995, Laslett 1995),

• a decision/evaluation tool will be all the more useful if it lends itself easily to insertion into a decision process. Decision processes involving public sector projects are usually extremely complex. They last for years and involve many stakeholders who generally have conflicting objectives. CBA tries to summarise the effects of complex projects into a single number. The complex calculations leading to the NPSV use a huge amount of “data” with varying levels of credibility. Merging rather uncontroversial information (e.g. the number of deaths per vehicle-km in a given area) with much more sensitive and debatable information (e.g. the price of human life) from the start might not give stakeholders many opportunities for reaching partial agreements and/or for starting negotiations. It might also result in a model that does not appear transparent enough to be really convincing (Nyborg 1998),

• CBA is a mono-criterion approach. Although this allows the production of outputs in simple terms (the NPSV), it might be argued that the efforts that have to be made in order to monetarise all effects may not always be needed. On the basis of less ambitious methods, it is not unlikely that some projects may be easily discarded and/or that some clearly superior project will emerge. Even when monetarisation is reasonably possible, it may not always be necessary,

• in CBA the use of “prices” supposedly revealed by markets (most often in “market-like” mechanisms) tends to obscure the implicit weighting of the various effects of a project. This leaves little room for political debate, which might be an incentive for some stakeholders to simply discard CBA,

• the additive linear structure of the implicit aggregation rule used in CBA can be subjected to the familiar criticisms already mentioned in Chapters 3 and 4. Probably all users of CBA would agree that an accident killing 10 000 people might result in a dramatic situation in which the “costs” incurred have little relation to the “costs” of 10 000 accidents each resulting in one loss of life (think of a serious nuclear accident compared to “ordinary” car accidents). Similarly, they might be prepared to accept that there may exist air pollution levels above which all mammal life on earth could be endangered and that, although these levels are multiples of those currently manipulated in the evaluation of transportation projects, they may have to be priced out quite differently. If there are limits to linearity, CBA offers almost no clue as to where to place these limits. It would seem to be a heroic hypothesis to suppose that such limits are simply never reached in practice,

• the implicit position of CBA vis-à-vis distributional considerations is puzzling. Although it is possible to include in the computation of the NPSV individual “weights” capturing the different impacts on social welfare of individual variations of income (Brent 1984), this is hardly ever done in practice. Furthermore, this possibility is very much at variance with more subtle views on equity and distributional considerations (see Fishburn 1984, Fishburn and Sarin 1991, Fishburn and Sarin 1994, Fishburn and Straffin 1989, Gafni and Birch 1997, Schneider, Schieber, Eeckhoudt and Gollier 1997, Weymark 1981),

• the use of a simple “social discounting rate” as a surrogate for taking a clear position on inter-generational equity issues is open to discussion. Even accepting the rather optimistic view of a continuous increase of welfare and of technical innovation, taking decisions today that will have important consequences in 1000 years (think of the storage of used nuclear fuel) while using a method that gives almost no weight to what will happen 60 years from now (1/1.08^60 ≈ 1%) seems debatable (see Harvey 1992, Harvey 1994, Weitzman 1994),

• the very idea that “social preferences” exist is open to question. We showed in Chapter 2 that “elections” were not likely to give rise to such a concept. It seems hard to think of other forms of social co-ordination that could do much better. We doubt that markets are such particular institutions that they always allow the problem to be solved or bypassed in an undebatable way. But if “social preferences” are ill-defined, the meaning of the NPSV of a project is far from obvious. We would argue that it gives, at best, a partial and highly conventional view of the desirability of the project,

• decision/evaluation models can hardly lead to convincing conclusions if elements of uncertainty and inaccurate determination entering the model are not explicitly dealt with. This is especially true in the context of the evaluation of public sector projects. Practical texts on CBA always insist on the need for sensitivity analysis before coming to conclusions and recommendations. Due to the amount of data of varying quality included in the computation of the NPSV, sensitivity analysis is often restricted to studying the impact of the variation of a few parameters on the NPSV, one parameter varying at a time. This is rather far from what we could expect in such situations; a true “robustness analysis” should combine simultaneous variations of all parameters in a given domain.

These limitations should not be interpreted as implying a condemnation of CBA. We consider them as arguments showing that, in spite of its many qualities, CBA is far from exhausting the activity of supporting decision/evaluation processes (Watson 1981). We are afraid to say that if you disagree on this point, you might find the rest of this book of extremely limited interest. On the other hand, if you expect to discover in the next chapters formal decision/evaluation tools and methodologies that would “solve all problems and avoid all difficulties”, you should also realise that your chances of being disappointed are very high.


6 COMPARING ON THE BASIS OF SEVERAL ATTRIBUTES: THE EXAMPLE OF MULTIPLE CRITERIA DECISION ANALYSIS

6.1 Thierry’s choice

How to choose a car is probably the multiple criteria problem example that has been most frequently used to illustrate the virtues and possible pitfalls of multiple criteria decision aiding methods. The main advantage of this example is that the problem is familiar to most of us (except for one of the authors of this book, who is definitely opposed to owning a car) and it is especially appealing to male decision-makers and analysts for some psychological reason. However, one can object that in many illustrations the problem is too roughly stated to be meaningful; the motivations, needs, desires and/or phantasms of the potential buyer of a new or second-hand car can be so diversified that it will be very difficult to establish a list of relevant points of view and build criteria on which everybody would agree; the price, for instance, is a very delicate criterion since the amount of money the buyer is ready to spend clearly depends on his social condition. The relative importance of the criteria also very much depends on the personal characteristics of the buyer: there are various ideal types of car buyers, for instance people who like sporty car driving, or large comfortable cars, or reliable cars, or cars that are cheap to run. One point should be made very clear: it is unlikely that a car could be universally recognised as the best, even if one restricts oneself to a segment of the market; this is a consequence of the existence of decision-makers with many different “value systems”.

Despite these facts, we have chosen to use the “Choosing a car” example, in a properly defined context, for illustrating the hypotheses underlying various elementary methods for modelling and aggregating evaluations in a decision aiding process. The case is simple enough to allow for a short but complete description; it also offers sufficient potential for reasoning on quite general problems raised by the treatment of multi-dimensional data in view of decision and evaluation. We describe the context of the case below and will invoke it throughout this chapter for illustrating a sample of decision aiding methods.


Nr  Trademark and type
1   Fiat Tipo 20 ie 16V
2   Alfa 33 17 16V
3   Nissan Sunny 20 GTI 16
4   Mazda 323 GRSI
5   Mitsubishi Colt GTI
6   Toyota Corolla GTI 16
7   Honda Civic VTI 16
8   Opel Astra GSI 16
9   Ford Escort RS 2000
10  Renault 19 16S
11  Peugeot 309 GTI 16V
12  Peugeot 309 GTI
13  Mitsubishi Galant GTI 16
14  Renault 21 20 turbo

Table 6.1: List of the cars selected as alternatives

6.1.1 Description of the case

Our example is adapted from an unpublished report by a Belgian engineering student who describes how he decided which car he would buy. The story dates back to 1993; our student, call him Thierry, aged 21, is passionate about sports cars and driving (he has taken lessons in sports car driving and participates in car races). Being a student, he cannot afford to buy either a new car or a luxury second-hand sports car; so he decides to explore the middle-range segment of 4-year-old cars with powerful engines. Thierry intends to use the car in everyday life and occasionally in competitions. His strategy is first to select the make and type of the car on the basis of its characteristics, estimated costs and performances, then to look for such a car in second-hand car sale advertisements. This is what he actually did, finding “the rare pearl” about twelve months after he made up his mind as to which car he wanted.

Selecting the alternatives

The initial list of alternatives was selected taking an additional feature into account. Thierry lives in town and does not have a garage to park the car in at night. So he does not want a car that would be too attractive to thieves. This explains why he discards cars like the VW Golf GTI or Honda CRX. He thus limits his selection of alternatives to the 14 cars listed in Table 6.1.

Selecting the relevant points of view, and looking for or constructing indices that reflect the performances of the alternatives for each of the viewpoints, often constitutes a long and delicate task; it is moreover a crucial one since the quality of the modelling will determine the relevance of the model as a decision aiding tool. Many authors have advocated a hierarchical approach to criteria building, each viewpoint being decomposed into sub-points that can be further decomposed (Keeney and Raiffa (1976), Saaty (1980)). A thorough analysis of the properties required of the family of criteria selected in any particular context (a consistent family, i.e. exhaustive, non-redundant and monotonic) can be found in Roy and Bouyssou (1993) (see also Bouyssou (1990) for a survey).

We shall not emphasise the process of selecting viewpoints in this chapter, although it is a matter of importance. It is sufficient to say that Thierry's concerns are very particular and that he accordingly selected five viewpoints related to cost (criterion 1), performance of the engine (criteria 2 and 3) and safety (criteria 4 and 5).

Evaluations of the cars on these viewpoints have been obtained from monthly journals specialised in the benchmarking of cars. The official quotation of second-hand vehicles of various ages is also published in such journals.

Evaluating the alternatives

Evaluating the expenses incurred by buying and using a specific car is not as straightforward as it may seem. Large variations from the estimation may occur due to several uncertainty and risk factors such as the actual life-length of the car, the actual selling price (in contrast to the official quotation), the actual mileage per year, etc. Thierry evaluates the expenses as the sum of an initial fixed cost and expenses resulting from using the car. The fixed costs are the amount paid for buying the car, estimated by the official quotation of the 4-year-old vehicle, plus various taxes. The yearly costs involve another tax, insurance and petrol consumption. Maintenance costs are considered roughly independent of the car and hence neglected. Petrol consumption is estimated on the basis of three figures that are highly conventional: the number of litres of petrol burned per 100 km is taken from the magazine benchmarks; Thierry somehow estimates his mileage at 12 000 km per year and the price of petrol at 0.9 € per litre (1 €, the European currency unit, is approximately equivalent to 1 USD). Finally he expects (hopes) to use the car for 4 years. On the basis of these hypotheses he gets the estimations of his expenses for using the car during 4 years that are reported in Table 6.2 (Criterion 1 = Cost). The resale value of the car after 8 years is not taken into account due to the high risk of accidents resulting from Thierry's offensive driving style. Note that the petrol consumption cost, which is estimated with a rather high degree of imprecision, counts for about one third of the total cost. The purchase cost is also highly uncertain.
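
A sketch of this cost construction (ours; the quotation, taxes and fuel consumption below are invented placeholders, while the mileage, petrol price and 4-year horizon come from the text):

    # Thierry's cost criterion: fixed costs plus 4 years of usage costs.
    def four_year_cost(quotation, fixed_taxes, yearly_tax_and_insurance,
                       litres_per_100km, km_per_year=12_000,
                       price_per_litre=0.9, years=4):
        yearly_petrol = litres_per_100km / 100 * km_per_year * price_per_litre
        return quotation + fixed_taxes + years * (yearly_tax_and_insurance + yearly_petrol)

    # Invented example: a car quoted at 11 000 euros burning 9 l/100 km.
    print(four_year_cost(11_000, 500, 800, 9.0))  # -> 18588.0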

For building the other criteria, Thierry has at his disposal a large number of performance indices whose values can be found in the magazine benchmarks. Thierry's particular interest in sporty cars is reflected in his definition of the other criteria. Car performances are evaluated by their acceleration; criterion 2 (“Accel” in Table 6.2) encodes the time (in seconds) needed to cover a distance of one kilometre starting from rest. One could alternatively have taken other indicators, such as the power of the engine or the time needed to reach a speed of 100 km/h or to cover 400 metres, which are also widely available. Some of these values may be imprecisely determined: they may be biased when provided by the car manufacturer (the procedures for evaluating petrol consumption are standardised but usually underestimate the actual consumption in everyday use); when provided by specialised journalists in magazines, the procedures for measuring are generally unspecified and might vary, since the cars are not all evaluated by the same person.

Nr  Name of cars        Crit1    Crit2   Crit3    Crit4    Crit5
                        Cost     Accel   Pick up  Brakes   Road-h
1   Fiat Tipo           18 342   30.7    37.2     2.33     3
2   Alfa 33             15 335   30.2    41.6     2        2.5
3   Nissan Sunny        16 973   29      34.9     2.66     2.5
4   Mazda 323           15 460   30.4    35.8     1.66     1.5
5   Mitsubishi Colt     15 131   29.7    35.6     1.66     1.75
6   Toyota Corolla      13 841   30.8    36.5     1.33     2
7   Honda Civic         18 971   28      35.6     2.33     2
8   Opel Astra          18 319   28.9    35.3     1.66     2
9   Ford Escort         19 800   29.4    34.7     2        1.75
10  Renault 19          16 966   30      37.7     2.33     3.25
11  Peugeot 309 16V     17 537   28.3    34.8     2.33     2.75
12  Peugeot 309         15 980   29.6    35.3     2.33     2.75
13  Mitsubishi Galant   17 219   30.2    36.9     1.66     1.25
14  Renault 21          21 334   28.9    36.7     2        2.25

Table 6.2: Data of the “choosing a car” problem

The third criterion that Thierry took into consideration is linked with the pick up or suppleness of the engine in urban traffic; this dimension is considered important since Thierry also intends to use his car in normal traffic. The indicator selected to measure this dimension (“Pick up” in Table 6.2) is the time (in seconds) needed for covering one kilometre when starting in fifth gear at 40 km/h. Again, other indicators could have been chosen (e.g. the torque). This dimension is not independent of the second criterion, since the two are generally positively correlated (powerful engines generally lead to quick response times on both criteria); cars that are specially prepared for competition may however lack suppleness in low operation conditions, which is quite unpleasant in urban traffic. So, from the point of view of the user, i.e. in terms of preferences, criteria 2 and 3 reflect different requirements and are thus both necessary. For a short discussion about the notions of independence and interaction, the reader is referred to Section 6.2.4.

In the magazine's evaluation report, several other dimensions are investigated, such as comfort, brakes, road-holding behaviour, equipment, body, boot, finish, maintenance, etc. For each of these, a number of aspects are considered: 10 for comfort, 3 for brakes, 4 for road-holding, . . . In view of Thierry's particular motivations, only the qualities of braking and of road-holding are of concern to him and lead to the building of criteria 4 and 5 (resp. “Brakes” and “Road-h” in Table 6.2). The 3 or 4 partial aspects of each viewpoint are evaluated on an ordinal scale the levels of which are labelled “serious deficiency”, “below average”, “average”, “above average”, “exceptional”. To get an overall indicator of braking quality (and also of road-holding), Thierry re-codes the ordinal levels with integers from 0 to 4 and takes the arithmetic mean of the 3 or 4 numbers; this results in the figures with 2 decimals provided in the last two columns of Table 6.2. Obviously these numbers are also imprecise, not necessarily because of imprecision in the evaluations but because of the arbitrary character of the cardinal re-coding of the ordinal information and its aggregation via an arithmetic mean (postulating implicitly that, in some sense, the components of each viewpoint are equally important and the levels of each of the scales are equally spaced). We shall however consider that these figures reflect, in some way, the behaviour of each car from the corresponding viewpoint; it is clear however that not too much confidence should be awarded to the precision of these “evaluations”.
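
A minimal sketch of this re-coding and averaging step (the three labels in the example are invented; the 0-4 coding and the arithmetic mean are as described above):

    # Re-code the ordinal levels with integers 0..4 and average them, as
    # Thierry does for the braking and road-holding criteria.
    levels = {"serious deficiency": 0, "below average": 1, "average": 2,
              "above average": 3, "exceptional": 4}

    def aggregate(labels):
        return sum(levels[l] for l in labels) / len(labels)

    print(round(aggregate(["above average", "average", "exceptional"]), 2))  # 3.0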

Note that the first 3 criteria have to be minimised while the last 2 must be maximised.

This completes the description of the “data” which, obviously, are not given but selected and elaborated on the basis of the available information. An intrinsic part of these data is an appreciation (more or less explicit) of their degree of precision and their reliability.

6.1.2 Reasoning with preferences

In the second part of the presentation of this case, Thierry will provide information about his preferences. In fact, in the relatively simple decision situation he was facing (“no wife, no boss”: Thierry decides for himself and the consequences of his decision should not affect him crucially), he was able to make up his mind without using any formal aggregation method. Let us follow his reasoning.

First of all he built a graphic representation of the data. Many types of representations can be thought of; popular spreadsheet software offers a large number of graphical options for representing multi-dimensional data. Figure 6.1 shows such a representation. Note that the evaluations for the various criteria have been re-scaled for better readability of the figure. The values for all criteria have been mapped (linearly) onto intervals of length 2, the first criterion being represented in the [0, 2] interval, the second criterion in the [2, 4] interval and so on. For each criterion, the lowest evaluation observed for the sample of cars is mapped onto the lower bound of the interval while the highest value is represented on the upper bound of the interval. Such a transformation of the data is not always innocent; we briefly discuss this point below.
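
The mapping just described can be sketched as follows (ours; it simply reproduces the linear re-scaling onto intervals of length 2):

    # Map criterion number i (counting from 0) onto [2*i, 2*i + 2]: the
    # lowest observed value goes to 2*i, the highest to 2*i + 2.
    def rescale(values, i):
        lo, hi = min(values), max(values)
        return [2 * i + 2 * (v - lo) / (hi - lo) for v in values]

    costs = [18342, 15335, 16973, 15460, 15131, 13841, 18971,
             18319, 19800, 16966, 17537, 15980, 17219, 21334]
    print([round(x, 2) for x in rescale(costs, 0)])  # criterion 1 on [0, 2]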

In view of reaching a decision, Thierry first discards the cars whose braking efficiency and road-holding behaviour are definitely unsatisfactory, i.e. car numbers 4, 5, 6, 8, 9 and 13. The reason for such an elimination is that a powerful engine is needless in competition if the chassis is not good enough and does not guarantee good road-holding; efficient brakes are also needed to keep the risk inherent to competition at a reasonable level. The rules for discarding the above-mentioned cars have not been made explicit by Thierry in terms of unattained levels on the corresponding scales. Rules that would restate the set of remaining cars are for instance:

criterion 4 ≥ 2

and

criterion 5 ≥ 2

with at least one strict inequality.

[Figure 6.1 omitted: Performance diagram of all cars along the first three criteria (above; to be minimised: Cost, Accel, Supple) and the last two (below; to be maximised: Brakes, Road-h).]

Looking at the performances of the remaining cars, those labelled 1, 2 and 10 are further discarded. The set of remaining cars is restated for instance by the rule:

criterion 2 < 30

Finally, the car labelled 14 is eliminated since it is dominated by car number 11. “Dominated by car 11” means that car 11 is at least as good on all criteria and better on at least one criterion (here all of them!). Notice that car number 14 would not have been dominated if other criteria had been taken into consideration, such as comfort or size: this car is indeed bigger and more classy than the other cars in the sample.

The cars left after the above elimination process are those labelled 3, 7, 11 and 12; their performances are shown in Figure 6.2. In these star-diagrams each car is represented by a pentagon; its values on each criterion have all been linearly re-scaled, being mapped onto the [1, 3] interval. The choice of the interval [1, 3] instead of the interval [0, 2] is dictated by the mode of representation: the value “0” plays a special role since it is common to all axes; if an alternative were to receive a 0 value on several criteria, those evaluations would all be represented by the origin, which makes the graph less readable. On each axis, the value 1 corresponds to the lowest value for one of the cars in the initial set of 14 alternatives on each criterion; the value 3 corresponds to the highest value for one of the 14 cars. In interpreting the diagrams, remember that criteria 1, 2 and 3 are to be minimised while the others have to be maximised.

Thierry did not use the latter diagram (Figure 6.2); he drew the same diagram as in Figure 6.1 instead, after reordering the cars; the 4 candidate cars were all put on the right of the diagram, as shown in Figure 6.3; in this way Thierry was still able to compare the difference in the performances of two candidate cars on a criterion to typical differences for that criterion in the initial sample. This suggests that the evaluations of the selected cars should not be transformed independently of the values of the cars in the initial set; these still constitute reference points in relation to which the selected cars are evaluated. In Figure 6.4, for the reader's convenience, we show a close-up of Figure 6.3 that is focused on the 4 selected cars only.

Thierry first eliminates car number 12 on the basis of its relative weakness on the second criterion (acceleration). Among the 3 remaining cars the one he chooses is number 11. Here are the reasons for this decision.

1. Comparing cars 3 and 11, Thierry considers that the price difference (about 500 €) is worth the gain (0.7 second) on the acceleration criterion.

2. Comparing cars 7 and 11, he considers that the cost difference (car 7 is about 1 500 € more expensive) is not balanced by the small advantage in acceleration (0.3 second) coupled with a definite disadvantage (0.8 second) in suppleness.

[Figure 6.2 omitted: Star graph of the performances of the 4 cars left after the elimination process (Nissan Sunny 20 GTI 16V, Honda Civic VTI 16, Peugeot 309 GTI 16, Peugeot 309 GTI); axes: crit 1 (cost), crit 2 (accel), crit 3 (supple), crit 4 (brakes), crit 5 (road-h).]

Nr  Name of car     Crit1    Crit2   Crit3    Crit4    Crit5
                    Cost     Accel   Pick up  Brakes   Road-h
3   Nissan Sunny    16 973   29      34.9     2.66     2.5
7   Honda Civic     18 971   28      35.6     2.33     2
11  Peugeot 16V     17 537   28.3    34.8     2.33     2.75
12  Peugeot         15 980   29.6    35.3     2.33     2.75

Table 6.3: Performances of the 4 candidate cars

[Figure 6.3 omitted: Performance diagram of all cars; the 4 candidate cars stand on the right.]

[Figure 6.4 omitted: Detail of Figure 6.3: the 4 cars remaining after initial screening.]

Comments

Thierry's reasoning process can be analysed as being composed of two steps. The first one is a screening process in which a number of alternatives are discarded on the basis of the fact that they do not reach aspiration levels on some criteria.

Notice that these levels have not been set a priori as minimal levels of satisfaction; they have been set after having examined the whole set of alternatives, at a value that could be described as both desirable and accessible. The rules that have been used for eliminating certain alternatives have been combined exclusively in conjunctive mode, since an alternative is discarded as soon as it fails to satisfy any one of the rules.

More sophisticated modes of combination may be envisaged, for instance mixing conjunctive and disjunctive modes with aspiration levels defined for subsets of criteria (see Fishburn (1978) and Roy and Bouyssou (1993), pp. 264-266). Another elementary method that has been used is the elimination of dominated alternatives (car 11 dominates car 14).

In the second step of Thierry’s reasoning,

1. Criteria 4 and 5 were not invoked; there are several possible reasons for this: criteria 4 and 5 might be of minor importance, or considered satisfactory once a certain level is reached; they could be insufficiently discriminating for the considered subset of cars (this is certainly the case for criterion 4); or the differences in values for the set of candidate cars could be such that they are not large enough to balance the differences on other criteria.

2. Subtle considerations were made on whether the balance of differences in performance between pairs of cars on 2 or 3 criteria results in an advantage to one of the cars in the pair.


3. The reasoning is not made on the basis of re-coded values like those used in the graphics; more intuition is needed, which is better supported by the original scales. Since criteria 4 and 5 are aggregates and, thus, are not expressed in directly interpretable units, this might also have been a reason for not exploiting them in the final selection.

This kind of reasoning, which involves comparisons of differences in evaluations, is at the heart of the activity of modelling preferences and aggregating them in order to have an informed decision process. In the simple case we are dealing with here, the small number of alternatives and criteria has allowed Thierry to make up his mind without having to build a formal model of his preferences. We have seen, however, that after the first step, consisting in the elimination of unsatisfactory alternatives, the analysis of the remaining four cars has been much more delicate.

Note also that if Thierry's goal had been to rank the cars in order of decreasing preference, it is not certain that the kind of reasoning he used for just choosing the best alternative for him would have fit the bill. In more complex situations (when more alternatives remain after an initial elimination, or more criteria have to be considered, or if a ranking of the alternatives is wanted), it may appear necessary to use tools for modelling preferences.

There is another rather frequent circumstance in which more formal methods are mandatory: if the decision-maker is bound to justify his decision to other persons (shareholders, colleagues, . . . ), the evaluation system should be more systematic, for instance being able to cope with new alternatives that could be suggested by the other people.

In the rest of this chapter, we discuss a few formal methods commonly used for aggregating preferences. We report on how Thierry applied some of them to his case and extrapolate on how he could have used the others. This can be viewed as an ex post analysis of the problem, since the decision was actually made well before Thierry became aware of multiple criteria methods. In his ex post justification study, Thierry has in addition tried to derive a ranking of the alternatives that would reflect his preferences.

6.2 The weighted sum

When dealing with multi-dimensional evaluations of alternatives, the basic and almost natural (or perhaps cultural?) attitude consists in trying to build a one-dimensional synthesis, which would reflect the value of the alternatives on a synthetic “super scale of evaluation”. This attitude is perhaps inherited from school practice, where all the performance evaluations of the pupils have long been (and often still are) summarised in a single figure, a weighted average of their grades in the various subjects. The problems raised by such a practice have been discussed in depth in Chapter 3. We discuss the application of the weighted sum to the car example below, emphasising the very strong hypotheses underlying the use of this type of approach.

Starting from the standard situation of a set of alternatives a ∈ A evaluated on n points of view by a vector g(a) = (g1(a), g2(a), . . . , gn(a)), we consider the value f(a) obtained by linearly combining the components of g, i.e.

f(a) = k1g1(a) + k2g2(a) + . . . + kngn(a)    (6.1)

Suppose, without loss of generality, that all criteria are to be maximised, i.e. the larger the value gi(a), the better the alternative a on criterion i (if, on the contrary, gi were to be minimised, substitute −gi for gi or use a negative weight ki). Once the weights ki have been determined, choosing an alternative becomes straightforward: the best alternative is the one associated with the largest value of f. Similarly, a ranking of the alternatives is obtained by ordering them in decreasing order of the value of f.

This simple and most commonly used procedure relies however on very strong hypotheses that can seldom be considered plausibly satisfied. These problems appear very clearly when trying to use the weighted sum approach on the car example.

6.2.1 Transforming the evaluations

A look at the evaluations of the cars (see Table 6.2) prompts a remark that was already made when we considered representing the “data” graphically. The ranges of variation on the scales are very heterogeneous: from 13 841 to 21 334 on the cost criterion; from 1.33 to 2.66 on criterion 4. Clearly, asking for values of the weights ki in terms of the relative importance of the criteria without referring to the scales would yield absurd results. The usual way out consists in normalising the values on the scales, but there are several manners of doing this. One consists in dividing gi by the largest value on the ith scale, gi,max; alternatively, one might subtract the minimal value gi,min and divide by the range gi,max − gi,min. These normalisations of the original gi functions are respectively denoted g′i and g′′i in the following formulae:

g′i(a) = gi(a) / gi,max    (6.2)

g′′i(a) = (gi(a) − gi,min) / (gi,max − gi,min)    (6.3)

For simplicity, we suppose here that the gi are positive. In the former case the maximal value of g′i will be 1 while the value 0 is kept fixed, which means that the ratio of the evaluations of any pair a, b of alternatives remains unaltered:

g′i(a) / g′i(b) = gi(a) / gi(b)    (6.4)

This transformation can be advocated when using ratio scales, in which the value 0 plays a special role. Statements such as “alternative a is twice as good as b on criterion i” remain valid after the transformation.

In the case of g′′i, the top evaluation is mapped onto 1 while the bottom one goes onto 0; ratios are not preserved, but ratios of differences of evaluations are: for all alternatives a, b, c, d,

(g′′i(a) − g′′i(b)) / (g′′i(c) − g′′i(d)) = (gi(a) − gi(b)) / (gi(c) − gi(d))    (6.5)

Such a transformation is appropriate for interval scales; it does not alter the validity of statements like “the difference between a and b on criterion i is twice the difference between c and d”.

Note that the above are not the only possible options for transforming the data; note also that these transformations depend on the set of alternatives: considering the 14 cars of the initial sample or the 4 cars retained after the first elimination would yield substantially different results, since the values gi,min and gi,max depend on the set of alternatives.

6.2.2 Using the weighted sum on the case

Suppose we consider that 0 plays a special role on all scales and we choose the first transformation option. The values of the g′i's that are obtained are shown in Table 6.4. A set of weights has been chosen which is, to some extent, arbitrary but seems compatible with what is known about Thierry's preferences and priorities. The first three criteria receive negative weights, namely and respectively −1, −2 and −1 (since they have to be minimised), while the last two are given the weight 0.5. The alternatives are listed in Table 6.4 in decreasing order of the values of f. As can be seen in the last column of Table 6.4, this rough assignment of weights yields car number 3 as first choice, followed immediately by car number 11, which was actually Thierry's choice. Moreover, the difference in the values of f for those two cars is tiny (less than 0.01), but we have no idea as to whether such a difference is meaningful; all we can do is be very prudent in using such a ranking, since the weights were chosen in a rather arbitrary manner. It is likely that by varying the weights slightly from their present values, one would readily get rank reversals, i.e. permutations of alternatives in the order of preference; in other words, the ranking is not very stable. Varying the values that are considered imprecisely determined is what is called sensitivity analysis; it helps to detect what the stable conclusions in the output of a model are; this is certainly a crucial activity in a decision aiding process.
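
The whole computation can be reproduced in a few lines (a sketch; tiny discrepancies with Table 6.4 are possible since the table displays rounded normalised values):

    # Normalise by the column maxima (formula 6.2), then rank by the
    # weighted sum (6.1) with weights (-1, -2, -1, 0.5, 0.5).
    cars = {
        1: (18342, 30.7, 37.2, 2.33, 3.00), 2: (15335, 30.2, 41.6, 2.00, 2.50),
        3: (16973, 29.0, 34.9, 2.66, 2.50), 4: (15460, 30.4, 35.8, 1.66, 1.50),
        5: (15131, 29.7, 35.6, 1.66, 1.75), 6: (13841, 30.8, 36.5, 1.33, 2.00),
        7: (18971, 28.0, 35.6, 2.33, 2.00), 8: (18319, 28.9, 35.3, 1.66, 2.00),
        9: (19800, 29.4, 34.7, 2.00, 1.75), 10: (16966, 30.0, 37.7, 2.33, 3.25),
        11: (17537, 28.3, 34.8, 2.33, 2.75), 12: (15980, 29.6, 35.3, 2.33, 2.75),
        13: (17219, 30.2, 36.9, 1.66, 1.25), 14: (21334, 28.9, 36.7, 2.00, 2.25),
    }
    weights = (-1, -2, -1, 0.5, 0.5)
    maxima = [max(v[i] for v in cars.values()) for i in range(5)]

    def f(nr):
        return sum(w * x / m for w, x, m in zip(weights, cars[nr], maxima))

    ranking = sorted(cars, key=f, reverse=True)
    print(ranking[:4])  # [3, 11, 12, 10] head the ranking, as in Table 6.4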

6.2.3 Is the resulting ranking reliable?

Weights depend on scaling

To illustrate the lack of stability of the ranking obtained, let us consider Table 6.5, where the set of alternatives is reduced to the 4 cars remaining after the elimination procedure; the re-scaling of the criteria yields values of g′i that are not the same as in Table 6.4, since gi,max depends on the set of alternatives. This perturbation, without any change in the values of the weights, is sufficient to cause a rank reversal between the two leading alternatives. Of course, one could prevent such a drawback by using a normalising constant that would not depend on the set of alternatives, for instance the worst acceptable value (minimal requirement for a performance to be maximised; maximal level of a variable to be minimised, a cost for instance) on each criterion; with such an option, the source of the lack of stability would be the imprecision in the determination of the worst acceptable value. Notice that the above problem has already been discussed in Chapter 4, Section 4.1.1.

Weights ki          −1      −2      −1      0.5     0.5     Value
Nr  Name of cars    Cost    Accel   Pick    Brak    Road    f
3   Nissan Sunny    0.80    0.94    0.84    1.00    0.77    −2.63
11  Peugeot 16V     0.82    0.92    0.84    0.88    0.85    −2.64
12  Peugeot         0.75    0.96    0.85    0.88    0.85    −2.66
10  Renault 19      0.80    0.97    0.91    0.88    1.00    −2.71
7   Honda Civic     0.89    0.91    0.86    0.88    0.62    −2.82
1   Fiat Tipo       0.86    1.00    0.89    0.88    0.92    −2.85
5   Mitsu Colt      0.71    0.96    0.86    0.62    0.54    −2.91
2   Alfa 33         0.72    0.98    1.00    0.75    0.77    −2.92
8   Opel Astra      0.86    0.94    0.85    0.62    0.62    −2.96
6   Toyota          0.65    1.00    0.88    0.50    0.62    −2.97
4   Mazda 323       0.72    0.99    0.86    0.62    0.46    −3.02
9   Ford Escort     0.93    0.95    0.83    0.75    0.54    −3.03
14  Renault 21      1.00    0.94    0.88    0.75    0.69    −3.04
13  Mitsu Galant    0.81    0.98    0.89    0.62    0.38    −3.15

Table 6.4: Normalising then ranking through a weighted sum

Weights ki          −1      −2      −1      0.5     0.5     Value
Nr  Name of car     Cost    Accel   Pick    Brak    Road    f
11  Peugeot 16V     0.92    0.96    0.98    0.88    1.00    −2.876
3   Nissan Sunny    0.89    0.98    0.98    1.00    0.91    −2.890
12  Peugeot         0.84    1.00    0.99    0.88    1.00    −2.896
7   Honda Civic     1.00    0.95    1.00    0.88    0.73    −3.090

Table 6.5: Normalising then ranking a reduced set of alternatives

Conventional codings

Another comment concerns the figures used for evaluating the performances of the cars on criteria 4 and 5. Recall that these were obtained by averaging equally spaced numerical codings of an ordinal scale of evaluation. The figures obtained presumably convey a less quantitative and more conventional meaning than, for instance, acceleration performances measured in seconds in standardisable (if not standardised) trials. These figures however are treated in the weighted sum just like the “more quantitative” ones associated with the first three criteria. In particular, other codings of the ordinal scale might have been envisaged, for instance codings with unequal intervals separating the levels on the ordinal scale. Some of these codings could obviously have changed the ranking.
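
As a sketch of this sensitivity to the coding (the unequally spaced coding below is our invention, one of many order-preserving possibilities):

    # Two order-preserving codings of the ordinal scale; averaging the
    # same judgments under each yields different criterion values.
    equal = {"serious deficiency": 0, "below average": 1, "average": 2,
             "above average": 3, "exceptional": 4}
    unequal = {"serious deficiency": 0, "below average": 1, "average": 2,
               "above average": 3.5, "exceptional": 5}  # invented spacing

    judgments = ["average", "above average", "above average"]  # invented car
    for coding in (equal, unequal):
        print(round(sum(coding[j] for j in judgments) / len(judgments), 2))
    # 2.67 versus 3.0: a shift easily large enough to reorder closely
    # ranked cars once fed into the weighted sum.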

6.2.4 The difficulties of a proper usage of the weighted sum

The meaning of the weights

What is the exact significance of the weights in the weighted sum model? The weights have a very precise and quantitative meaning; they are trade-offs: to compensate for a disadvantage of ki units on criterion j, you need an advantage of kj units on criterion i. An important consequence is that the weights depend on the determination of the unit on each scale. In a weighted sum model that would directly use the evaluations of the alternatives given in Table 6.2, it is clear that the weight of criterion 2 (acceleration time) has to be multiplied by 60 if times are expressed in minutes instead of seconds. This was implicitly a reason for normalising the evaluations as was done through formulae 6.2 and 6.3. After transformation, both g′i and g′′i are independent of the choice of a unit; yet they are not identical and, in a consistent model, their weights should be different. Indeed, we have

g′′i(a) = [gi,max / (gi,max − gi,min)] × g′i(a) + λi = κi × g′i(a) + λi    (6.6)

where λi is a constant. Additive constants do not matter since they do not alter the ranking. So, unless gi,min = 0, g′′i is essentially related to g′i by a multiplicative factor κi ≠ 1; in order to model the same preferences through a weighted sum of the g′′i and a weighted sum of the g′i, the weight k′′i of g′′i should be obtained by dividing the weight k′i by κi (for the cost criterion, for instance, κ1 = 21 334/(21 334 − 13 841) ≈ 2.85). Obviously, the weights have to be assessed in relation to a particular determination of the evaluations on each scale, and eliciting them in practice is a complex task. In any case, they certainly cannot be evaluated in a meaningful manner through naive questions about the relative importance of the criteria; reference to the underlying scale is essential.

Up to this point we have considered the influence on the weights of multiplying the evaluations by a positive constant. Note that translating the origin of a scale has no influence on the ranking of the alternatives provided by the weighted sum, since it results in adding a (positive or negative) constant to f, the same for all alternatives. There is still a very important observation to be made: all scales used in the model are implicitly considered linear, in the sense that equal differences in values on a criterion result in equal differences in the overall evaluation function f, and this does not depend on the position on the scale of the interval of values corresponding to that difference. For instance, in the car example, car number 12 is finally eliminated because it accelerates too slowly. The difference between car 12 and car 3 with respect to acceleration is 0.6 second, between 29 seconds and 29.6 seconds. Does Thierry perceive this difference as almost equally important as the difference of 0.7 second between cars 11 and 3, the latter difference being positioned between 28.3 seconds and 29 seconds on the acceleration scale? It seems rather clear from Thierry's motivations that coming close to a performance of 28 seconds is what matters to him, while cars above 29 seconds are unworthy. This means that the gain from passing from 29.6 seconds to 29 seconds has definitely less value than a gain of similar amplitude, say from 29 to 28.3 seconds. As will be confirmed in the sequel (see Section 6.3 below), it is very unlikely that Thierry's preferences are correctly modelled by a linear function of the current scales of performance.

Independence or interaction

The next issue is more subtle. Evaluations of the alternatives for the various points of view taken into consideration by the decision-maker often show correlations; this is because the attributes that are used to reflect these viewpoints are often linked by logical or factual interdependencies. For instance, indicators of cost, comfort and equipment, which may be used as attributes for assessing the alternatives from those viewpoints, are likely to be positively correlated. This does not mean that the corresponding points of view are redundant and that one should eliminate some of them. One is perfectly entitled to work with attributes that are (even strongly) correlated. That is the first point.

A second point is about independence. In order to use a weighted sum, the viewpoints should be independent, but not in the statistical sense implying that the evaluations of the alternatives should be uncorrelated! They should be independent with respect to preferences. In other words, if two alternatives that share the same profile on a subset of criteria compare in a certain way in terms of overall preferences, their relative position should not be altered when the profile they share on that subset of criteria is substituted by any other common profile. A famous example of dependence in the sense of preferences, in a gastronomic context, is the following: the preference for white wine or red wine usually depends on whether you are eating fish or meat. There are relatively simple tests for independence in the sense of preferences, which consist in asking the decision-maker about his preferences on pairs of alternatives that share the same profile on a subset of attributes; varying the common profile should not reverse the preferences when the points of view are independent. Independence is a necessary condition for the representation of preferences by a weighted sum; it is of course not a sufficient one.

There is a different concept that has recently been implemented for modelling preferences: the concept of interacting criteria, which was already discussed in Example 2 of Chapter 3. Suppose that, in the process of modelling the preferences of the decision-maker, he declares that the influence of positively correlated aspects should be dimmed and that conjoint good performances on negatively correlated aspects should be emphasised. In our case, for instance, criteria 2 and 3, respectively acceleration and suppleness, may be thought of as being positively correlated. It may then prove impossible to model some preferences by means of a weighted sum of the evaluations such as those in Table 6.2 (and even of transformations thereof such as those obtained through formulae like 6.3). This does not mean that no additive model would be suitable, and it does not imply that the preferences are not independent (in the above-defined sense). In the next section we shall study an additive model, more general than the weighted average, in which the evaluations gi may be “re-coded” through the use of “value functions” ui. With appropriate choices of u2 and u3 it may be possible to take the decision-maker's preferences about positively and negatively correlated aspects into account, provided they satisfy the independence property. If no re-coding is allowed (as in the assessment of students, see Chapter 3), there is a non-additive variant of the weighted average that could help model interactions among the criteria; in such a model the weight of a coalition of criteria may be larger or smaller than the sum of the weights of its components (see Grabisch (1996) for more detail on non-additive averages).

Arbitrariness, imprecision and uncertainty

In the above discussion, as well as in the presentation of our example, we have emphasised the many sources of uncertainty (lack of knowledge) and of imprecision that bear on the figures used as input to the weighted sum. Let us summarise some of them:

1. Uncertainty in the evaluation of the cost: the buying price as well as the life-length of a second-hand car are not known. This uncertainty can be considered of a stochastic nature; statistical data could help to master, to some extent, such a source of uncertainty; in practice, it will generally be very difficult to get sufficient relevant and reliable statistical information for this kind of problem.

2. Imprecision in the measurement of some quantities: for instance, how precise is the measurement of the acceleration? Such an imprecision can be reduced by making the conditions of the measurement as standard as possible and can then be estimated on the basis of the precision of the measurement apparatus.


3. Arbitrary coding of non-quantitative data: the re-coding of the ordinal scales of appreciation of braking and road-holding behaviour. Any re-coding that respects the order of the categories would in principle be acceptable. To master such an imprecision one could try to build quantitative indicators for the criteria or try to get additional information on the comparison between differences of levels on the ordinal scale: for instance, is the difference between “below average” and “average” larger than the difference between “above average” and “exceptional”?

4. Imprecision in the determination of the trade-offs (weights ki): the ratios of weights kj/ki must be elicited as conversion rates: a unit on criterion j is worth kj/ki units on criterion i; of course, the scales must first be re-coded so that a one-unit difference on a criterion has the same “value” everywhere on the scale (linearisation); these operations are far from obvious and, as a consequence, the imprecision of the linearisation process combines with the inaccuracy in the determination of the weights.

Making a decision

All these sources of imprecision have an effect on the precision of the determination of the value of f that is almost impossible to quantify; contrary to what can (often) be done in physics, there is generally little information on the size of the imprecisions; quite often, there is not even probabilistic information on the accuracy of the evaluations. As a consequence, the apparently straightforward decision (choosing the alternative with the highest value of f, or ranking the alternatives in decreasing order of the values of f) might be ill-considered, as illustrated above. The usual way out is extensive sensitivity analysis, which can be described as part of the validation of the model. This part of the job is seldom carried out with the required exhaustivity because it is a delicate task in at least two respects. On the one hand, there are many possible strategies for varying the values of the imprecisely determined parameters; usually parameters are varied one at a time, which is not sufficient but is possibly tractable; the range in which the parameters must be varied is not even clear, as suggested above. On the other hand, once the sensitivity analysis has been performed, one is likely to be faced with several almost equally valuable alternatives; in the car problem, for instance, the simple remarks made above strongly suggest that it will be very difficult to discriminate between cars 3 and 11.
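
A sketch of a slightly more ambitious sensitivity experiment (ours; the ±20% perturbation range is an arbitrary choice) that varies all five weights simultaneously and counts how often the two leading cars swap:

    # Perturb all weights simultaneously by up to +/- 20% and count how
    # often car 11 overtakes car 3 (normalised evaluations, formula 6.2).
    import random

    base = (-1, -2, -1, 0.5, 0.5)
    car3 = (0.796, 0.942, 0.839, 1.000, 0.769)
    car11 = (0.822, 0.919, 0.837, 0.876, 0.846)

    def f(values, weights):
        return sum(w * v for w, v in zip(weights, values))

    random.seed(1)
    swaps = 0
    for _ in range(10_000):
        ws = [k * random.uniform(0.8, 1.2) for k in base]
        swaps += f(car11, ws) > f(car3, ws)
    print(f"car 11 beats car 3 in {swaps / 100:.1f}% of the trials")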

In view of the previous discussion, there are two main approaches to solving the difficulties raised by the weighted sum:

1. Either one tries to prepare the inputs of the model (linearised evaluations and trade-offs) as carefully as possible, paying permanent attention to reducing imprecision and finishing with extensive sensitivity analysis;

2. Or one takes imprecision into account from the start, avoiding the exploitation of precise values when it is known that they are not reliable and rather working with classes of values and ordered categories. Note that the imprecision may well lie in the link between evaluations and preferences rather than in the evaluations themselves; detailed preferential information, even extracted from perfectly precise evaluations, may prove rather difficult to elicit.

The former option will lead us to the construction of multi-attribute value or utility functions, while the latter leads to the outranking approach. These two approaches will be developed in the sequel. There is however a whole family of methods that we shall not consider here, the so-called interactive methods (Steuer (1986), Vincke (1992), Teghem (1996)). These implement various strategies for exploring the efficient boundary, i.e. the set of non-dominated solutions; the exploration jumps from one solution to another; it is guided by the decision-maker, who is asked to tell, for instance, which characteristics of the current solution he would like to see improved. Such methods are mainly designed for dealing with infinite and even continuous sets of alternatives; moreover, they do not lead to an explicit model of the decision-maker's preferences. We have, on the contrary, settled on problems with a (small) finite number of alternatives, and we concentrate on obtaining explicit representations of the decision-maker's preferences.

6.2.5 Conclusion

The weighted sum is useful for obtaining a quick and rough draft of an overall evaluation of the alternatives. One should however keep in mind that there are rather restrictive assumptions underlying a proper use of the weighted sum. As a conclusion to this section we summarise these conditions; a small numerical sketch follows the list.

1. Cardinal character of the evaluations on all scales. The evaluations of the alternatives for all criteria are numbers, and these values are used as such even if they result from the re-coding of ordinal data.

2. Linearity of each scale. Equal differences between values on scale i, whatever the location of the corresponding intervals on the scale (at the bottom, in the middle or at the top of the scale), produce the same effect on the overall evaluation f: if alternatives a, b, c, d are such that gi(a) − gi(b) = gi(c) − gi(d) for all i, then f(a) − f(b) = f(c) − f(d).

3. The weights are trade-offs. Weights depend on the scaling of the criteria; transforming the (linearised) scales results in a related transformation of the weights. Weights tell how many units on the scale of criterion i are needed to compensate one unit of criterion j.

4. Preference independence. Criteria do not interact. This property, called preference independence, can be formulated as follows. Consider two alternatives that share the same evaluation on at least one criterion, say criterion i. Varying the level of that common value on criterion i does not alter the way the two alternatives compare in the overall ranking.
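To make these conditions concrete, here is a minimal sketch, in Python, of the weighted-sum rule; the evaluations and trade-off weights are invented for the illustration and are not Thierry's data. It also checks the third condition numerically: re-expressing a criterion in smaller units must be compensated by a proportional change of its weight.

```python
# Minimal sketch of the weighted sum f(a) = sum_i k_i * g_i(a).
# Evaluations and weights are illustrative; all evaluations are assumed
# already linearised and to be maximised.

def f(evaluations, weights):
    return sum(k * g for k, g in zip(weights, evaluations))

g = {"car A": [0.8, 0.5, 0.9],   # linearised evaluations on 3 criteria
     "car B": [0.6, 0.9, 0.7],
     "car C": [0.9, 0.4, 0.8]}
k = [0.5, 0.3, 0.2]              # trade-offs: one unit on criterion 2 is
                                 # worth k2/k1 = 0.6 units on criterion 1

ranking = sorted(g, key=lambda a: f(g[a], k), reverse=True)
print(ranking)

# Condition 3 in action: re-expressing criterion 1 in units 10 times
# smaller must be compensated by dividing its weight by 10; f is unchanged.
g10 = {a: [10 * v[0]] + v[1:] for a, v in g.items()}
k10 = [k[0] / 10] + k[1:]
assert all(abs(f(g[a], k) - f(g10[a], k10)) < 1e-12 for a in g)
```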


6.3 The additive multi-attribute value model

Our analysis of the weighted sum brought us very close to the requirements for additive multi-attribute value functions. The most common model in multiple criteria decision analysis is a formalisation of the idea that the decision-maker, when making a decision, behaves as if he was trying to maximise a quantity called utility or value (the term "utility" tends nowadays to be used preferably in the context of decision under risk, but we shall use it sometimes for "value"). This postulates that all alternatives may be evaluated on a single "super-scale" reflecting the value system of the decision-maker and his preferences. In other words, the alternatives can be "measured", in terms of "worth", on a synthetic dimension of value or utility. Accordingly, if we denote by ≿ the overall preference relation of the decision-maker on the set of alternatives, this relation relates to the values u(a), u(b) of the alternatives in the following way:

a ≿ b iff u(a) ≥ u(b)    (6.7)

As a consequence, the preference relation ≿ on the set of alternatives is a complete preorder, i.e. a complete ranking, possibly with ties. Of course, the value u(a) usually is a function of the evaluations {gi(a), i = 1, . . . , n}. If this function is a linear combination of the gi(a), i = 1, . . . , n, we get back to the weighted sum. A slightly more general case is the following additive model:

u(a) = ∑_{i=1}^{n} ui(gi(a))    (6.8)

where the function ui (single-attribute value function) is used to re-code the original evaluation gi in order to linearise it in the sense described in the previous section; the weights ki are incorporated in the ui functions. The additive value function model can thus be viewed as a clever version of the weighted sum, since it allows us to take into account some of the objections against a naive use of it (mainly the second hypothesis in Section 6.2.5). Note however that the imprecision issue is not dealt with inside the model (sensitivity analysis has to be performed in the validation phase, but it is neither part of the model nor straightforward in practice); the elicitation of the partial value functions ui may also be a difficult task.
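To illustrate what the additive model adds to the plain weighted sum, here is a small sketch with piece-wise linear single-attribute value functions; the breakpoints and values are invented for the illustration, with the trade-offs absorbed into the maximal values of the ui as in the text.

```python
import bisect

def piecewise(xs, ys):
    """Value function interpolating linearly between the points (xs, ys)."""
    def u(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        j = bisect.bisect_right(xs, x)
        t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return ys[j - 1] + t * (ys[j] - ys[j - 1])
    return u

# Invented re-codings: cost is almost flat below 17 500, then drops steeply;
# the weights are absorbed in the maximal values (0.45 and 0.25 here).
u_cost = piecewise([13500, 17500, 21500], [0.45, 0.40, 0.00])
u_acc = piecewise([28.0, 29.0, 31.0], [0.25, 0.10, 0.00])

def u(cost, acc):
    return u_cost(cost) + u_acc(acc)

# a is preferred to b iff u(a) >= u(b); the non-linear re-coding can
# reverse a verdict that a weighted sum on the raw evaluations would give.
print(u(16000, 29.5), u(18500, 28.4))
```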

Much effort has been devoted to characterising various systems of conditions under which the preferences of a decision-maker can be described by means of an additive value function model. Depending on the context, some systems of conditions may be interpretable and tested, at least partially, i.e. it may be possible to ask the decision-maker questions that will determine whether an additive value model is compatible with what can be perceived of his system of preferences. If the preferences of the decision-maker are compatible with an additive value model, a method of elicitation of the ui's may then be used; if not, another model should be looked for: a multiplicative model or, more generally, a non-additive one, a non-independent one, a model that takes imprecision more intrinsically into account, etc. (see Krantz, Luce, Suppes and Tversky (1971), Chapter 7, Luce, Krantz, Suppes and Tversky (1990), Vol. 3, Chapter 19).


6.3.1 Direct methods for determining single-attribute value functions

A large number of methods have been proposed to determine the ui's in an additive value function model. For an accessible account of such methods, the reader is referred to von Winterfeldt and Edwards (1986), Chapter 8.

There are essentially two families of methods, one based on direct numerical estimations and the other on indifference judgements. We briefly describe the application of a technique of the latter category, relying on what are called dual standard sequences (Krantz et al. (1971), von Winterfeldt and Edwards (1986), Wakker (1989)), which builds a series of equally spaced intervals on the scale of values.

An assessment method based on indifference judgements

Suppose we want to assess the ui's in an additive model for the Cars case. It is assumed that the suitability of such a model for representing the decision-maker's preferences has been established. Consider a pair of criteria, say Cost and Acceleration. We are going to outline a simulated dialogue between an analyst and a decision-maker that could yield an assessment of u1 and u2, the corresponding single-attribute value functions, for ranges of evaluations corresponding to acceptable cars. Note that we start the construction of the sequence from a "central point" instead of taking a "worst point" (see for instance von Winterfeldt and Edwards (1986), pp. 267 sq. for an example starting from a worst point).

The range for the cost will be the interval from 21 500 € to 13 500 €, and from 28 to 31 seconds for acceleration. First ask the decision-maker to select a "central point" corresponding to medium range evaluations on both criteria. In view of the set of alternatives selected by Thierry, let us start with (17 500, 29.5) as "average" values for cost and acceleration. Also ask the decision-maker to define a unit step on the cost criterion; this step will consist, say, of passing from a cost of 17 500 € to 16 500 €. Then the standard sequence is constructed by asking which value x1 for the acceleration would make a car costing 16 500 € and accelerating in 29.5 seconds indifferent to a car costing 17 500 € and accelerating in x1 seconds. Suppose the answer is 29.2, meaning that from the chosen starting point, a gain of 0.3 second on the acceleration time is worth an increase of 1 000 € in cost. The answer could be explained by the fact that at the starting level of performance for the acceleration criterion, the decision-maker is quite interested by a gain in acceleration time. Relativising the gains as percentages of the half range from the central to the best values on each scale, this means that the decision-maker is ready to lose 1 000/4 000 = 25% of the potential reduction in cost for gaining 0.3/1.5 = 20% of the acceleration time. We will say in the sequel that the parity is equal when the decision-maker agrees to exchange a percentage of the half range on a criterion against an equal percentage on another criterion.

The second step in the construction of the standard sequence is asking the decision-maker which value to assign to x2 to have (16 500, 29.2) ∼ (17 500, x2), where ∼ denotes "indifferent to". The answer might be, for instance, 28.9. Continuing along the same line would for instance yield the following sequence of indifferences:


[Figure: piece-wise linear curve of value (0 to 3.5) against acceleration (sec) on the half range 28 to 29.5]

Figure 6.5: Single-attribute value function for acceleration criterion (half range)

(16 500, 29.5) ∼ (17 500, 29.2)
(16 500, 29.2) ∼ (17 500, 28.9)
(16 500, 28.9) ∼ (17 500, 28.7)
(16 500, 28.7) ∼ (17 500, 28.5)
(16 500, 28.5) ∼ (17 500, 28.3)
(16 500, 28.3) ∼ (17 500, 28.1)

Such a sequence gives the analyst an approximation of the single-attribute value function u2 on the half range from 28 to 29.5 seconds, but it is easy to devise a similar procedure for the other half range, from 29.5 to 31. Figure 6.5 shows the re-coding u2 of the evaluations g2 on the interval [28, 29.5]; there are two linear parts in the graph: one ranging from 28 to 28.9, where the slope is proportional to 1/.2, and the other, valid between 28.9 and 29.5, with a slope proportional to 1/.3.

From there, using the same idea, one is able to re-code the scale of the cost criterion into the single-attribute value function u1. Then, considering (for instance) the cost criterion with criteria 3, 4 and 5 in turn, one obtains a re-coding of each gi into a single-attribute value function ui.

The trade-off between u1 and u2 is easily determined through solving the following equation, which just expresses the initial indifference in the standard sequence, (16 500, 29.5) ∼ (17 500, 29.2):

k1 u1(16 500) + k2 u2(29.5) = k1 u1(17 500) + k2 u2(29.2)

from which we get

k2/k1 = [u1(16 500) − u1(17 500)] / [u2(29.2) − u2(29.5)].


If we set k1 to 1, this formula yields k2, and the trade-offs k3, k4 and k5 are obtained similarly. Notice that the re-coding process of the original evaluations into value functions results in a formulation in which all criteria have to be maximised (in value).
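The bookkeeping of the dialogue can be summarised in a short sketch: the answers above define u2 by linear interpolation, and the trade-off follows from the formula just derived; with u1 and u2 scaled so that one elicitation step is one value unit, the ratio comes out as 1.

```python
# The elicited answers of the simulated dialogue, one value unit apart.
acc_points = [29.5, 29.2, 28.9, 28.7, 28.5, 28.3, 28.1]
u2_points = dict(zip(acc_points, range(len(acc_points))))  # 0, 1, 2, ... units

def u2(x):
    """Linear interpolation between the elicited points (value decreases in x)."""
    pts = sorted(u2_points.items())
    for (x0, v0), (x1, v1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return v0 + (v1 - v0) * (x - x0) / (x1 - x0)
    raise ValueError("outside the elicited range")

# u1 scaled so that the unit cost step 17 500 -> 16 500 is one value unit.
u1 = {17500: 0.0, 16500: 1.0}

k2_over_k1 = (u1[16500] - u1[17500]) / (u2(29.2) - u2(29.5))
print(k2_over_k1)  # 1.0: with these scalings, one step on each side matches
```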

The above procedure, although rather intuitive and systematic, is also quite complex; the questions are far from easy to answer; starting from one reference point or another (worst point instead of central point) may result in variations in the assessments. There are however many possibilities for checking for inconsistencies. Assume for instance that a single-attribute value function has been assessed by means of a standard sequence that links its scale to the cost criterion; one may validate this assessment by building a standard sequence that links its scale to another criterion and compare the two assessments of the same value function obtained in this way; hopefully they will be consistent; otherwise some sort of retroaction is required.

Note finally that such methods may not be used when the scale on which the assessments are made only has a finite number of degrees instead of being the set of real numbers; at the very least, numerous and densely spaced degrees are needed.

Methods relying on numerical judgements

In another line of methods, simplicity and direct intuition are more praised than scrupulous satisfaction of theoretical requirements, although the theory is not ignored. An example is SMART ("Simple Multi-Attribute Rating Technique"), developed by W. Edwards, which is more a collection of methods than a single one. We just outline here a variant, referring to von Winterfeldt and Edwards (1986), pp. 278 sq., for more details. In order to re-code, say, the evaluations for the acceleration criterion, one initially fixes two "anchor" points that may be the extreme values of the evaluations on the set of acceptable cars, here 28 and 31 seconds. On the value scale, the anchor points are associated with the endpoints of a conventional interval of values, for instance 31 to 0 and 28 to 100. Since 29 seconds seems to be the value under which Thierry considers that a car becomes definitely attractive from the acceleration viewpoint, the interval [28, 29] should be assigned a range of values larger than 1/3, its size (in relative terms) on the original scale. Thierry could for instance assign 29 seconds to 50 on the value scale. Then 28.5 and 30 could be located respectively at 70 and 10, yielding the initial sketch of a value function shown in Figure 6.6(a) (with linear interpolation between the specified values). This picture can be further improved by asking Thierry to check whether the relative spacings of the locations correctly reflect the strength of his preferences. Thierry might find it unfair that the gain in value from 30 seconds to 29 (40) is almost the same as the gain from 29 to 28 (50), and he could consequently propose to lower to 40 the value associated with 29 seconds; he also lowers to 65 the value of 28.5 seconds. Suppose he is then satisfied with all other differences of values; the final version is drawn in Figure 6.6(b). A similar work has to be carried out for all criteria, and the weights must be assessed.

The weights are usually derived through direct numerical judgements of relative attribute importance. Thierry would be asked to rank-order the attributes; an "importance" of 10 could be arbitrarily assigned to the least important criterion, and the importance of each other criterion would be assessed in relation to the least important one, directly as an estimation of the ratio of weights.


[Figure: two panels plotting value (0 to 100) against acceleration (sec) on the range 28 to 31]

Figure 6.6: Value function for acceleration criterion: (a) initial sketch; (b) final, with initial sketch in dotted line

This approach in terms of "importance" can be and has been criticised: in assessing the relative weights, no reference is made to the underlying scales. This is not appropriate, since weights are trade-offs between units on the various value scales and must vary with the scaling.

For instance, on the acceleration value scale that is normalised in the 0–100 range, the meaning of one unit varies depending on the range of original evaluations (acceleration measured in seconds) that are represented between value 0 and value 100 of the value scale. If we had considered that the acceleration evaluations of admissible cars range from 27 to 32 seconds, instead of from 28 to 31, we would have constructed a value function u′2 with u′2(32) = 0 and u′2(27) = 100; a difference of one unit of value on the scale u2 illustrated in Figure 6.6 corresponds to a (less-than-unit) difference of [u′2(28) − u′2(31)]/100 on the scale u′2. The weight attached to that criterion must vary in inverse proportion to the previous factor when passing from u2 to u′2. It is unlikely that a decision-maker would take the range of evaluations into account when asked to assess weights in terms of relative "importance" of criteria, a formulation that seems independent of the scalings of the criteria. A way of avoiding these difficulties is to give up the notion of importance, which seems misleading in this context, and to use a technique called swing-weighting: the decision-maker is asked to compare alternatives that "swing" between the worst and the best level on each attribute in terms of their contribution to the overall value. The argument of simplicity in favour of SMART is then lost, since the questions to be answered are similar, both in difficulty and in spirit, to those raised in the approach based on indifference judgements.


6.3.2 AHP and Saaty’s eigenvalue method

The eigenvalue method for assessing attribute weights and single-attribute value functions is part of a general methodology called "Analytic Hierarchy Process"; it consists in structuring the decision problem in a hierarchical manner (as is also advocated for building value functions, for instance in Keeney and Raiffa (1976)), constructing numerical evaluations associated with all levels of the hierarchy and aggregating them in a specific fashion, formally a weighted sum of single-attribute value functions (see Saaty (1980), Harker and Vargas (1987)).

In our case, the top level of the hierarchy is Thierry's goal of finding the best car according to his particular views. The second level consists of the 5 criteria into which his global goal can be decomposed. The last level can be described as the list of potential cars. Thus the hierarchical tree is composed of 1 first-level node, 5 second-level nodes and 5 times 14 third-level nodes, also called leaves. What we have to determine is the "strength" or priority of each element of a level in relation to its importance for an element in the next level.

The assessment of the nodes may start (as is usually done) from the bottom nodes; all nodes linked to the same parent node are compared pairwise; in our case this amounts to comparing all cars from the point of view of a criterion and repeating this for all criteria. The same is then done for all criteria in relation to the top node; the influences of all criteria on the global goal are also compared pairwise. At each level, the pairwise comparison of the nodes in relation to the parent node is done by means of a particular method that allows, to some extent, to detect and correct inconsistencies. For each pair of nodes a, b, the decision-maker is asked to assess the "priority" of a as compared to the "priority" of b. The questions are expressed in terms of "importance" or "preference" or "likelihood" according to the context. It is asked for instance how much alternative a is preferred to alternative b from a certain point of view. The answers may be formulated either on a verbal or a numerical scale. The levels of the verbal scale correspond to numbers and are dealt with as such in the computations. The conversion of verbal levels into numerical levels is described in Table 6.6. There are five main levels on the verbal scale, but 4 intermediary levels that correspond to numerical codings 2, 4, 6, 8 can also be used. For instance, the level "Moderate" corresponds to an alternative that is preferred 3 times more than another or a criterion that is 3 times more important than another. Such an interpretation of the verbal levels has very strong implications; it means that preference, importance and likelihood are considered as perceived on a ratio scale (much like sound intensity). This is indeed Saaty's basic assumption; what the decision-maker expresses as a level on the scale is postulated to be the ratio of values associated with the alternatives or the criteria. In other words, a number f(a) is assumed to be attached to each a; when comparing a to b, the decision-maker is assumed to give an approximation of the ratio f(a)/f(b). Since verbal levels are automatically translated into numbers in Saaty's method, we shall concentrate on assessing directly on the numerical scale.

Let α(a, b) denote the level of preference (or of relative importance) of a over b expressed by the decision-maker; the results of the pairwise comparisons may thus be encoded in a square matrix α. If Saaty's hypotheses are correct, there should


Verbal     Equal   Moderate   Strong   Very strong   Extreme
Numeric    1       3          5        7             9

Table 6.6: Conversion of verbal levels into numbers in Saaty's pairwise comparison method; e.g. "Moderate" means "3 times more preferred"

be some sort of consistency between elements of α, namely, for all a, b, c,

α(a, c) ≈ α(a, b) × α(b, c)    (6.9)

and in particular,

α(a, b) ≈ 1/α(b, a)    (6.10)

In view of the latter relation, only one half (roughly) of the matrix has to be elicited, which amounts to answering n(n−1)/2 questions.

Relation (6.9) implies that all columns of matrix α should be approximately proportional to f. The pairwise comparisons enable one to

1. detect departure from the basic hypothesis in case the columns of α are too far from proportional;

2. correct errors made in the estimation of the ratios; some sort of averaging of the columns is performed, yielding an estimation of f.

A test based on statistical considerations allows the user to determine whether the assessments in the pairwise comparison matrix show sufficient agreement with the hypothesis that they are approximations of f(a)/f(b), for an unknown f. If the test conclusion is negative, it is recommended either to revise the assessments or to choose another approach more suitable for the type of data.

If one wants to apply AHP in a multiple criteria decision problem, pairwise comparisons of the alternatives must be performed for each criterion; criteria must also be compared in a pairwise manner to model their importance. This process results in functions ui that evaluate the alternatives on each criterion i and in coefficients of importance ki. Each alternative a is then assigned an overall value v(a), computed as

v(a) = ∑_{i=1}^{n} ki ui(a)    (6.11)

and the alternatives can be ranked according to the values of v.
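The computation behind the method is easy to sketch: a plain power iteration extracts the principal eigenvector of a positive reciprocal matrix and normalises it so that the priorities sum to one. The 3×3 matrix below is a hypothetical, perfectly consistent example, not one of Thierry's assessments.

```python
def priorities(A, iterations=100):
    """Principal eigenvector of a positive matrix, by plain power iteration."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        w = [x / s for x in w]   # normalise: the priorities sum to one
    return w

alpha = [  # hypothetical reciprocal matrix: alpha[i][j] estimates f(i)/f(j)
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
w = priorities(alpha)
n = len(alpha)
lam = sum(sum(alpha[i][j] * w[j] for j in range(n)) / w[i] for i in range(n)) / n
print(w)                     # [4/7, 2/7, 1/7]: this matrix is fully consistent
print((lam - n) / (n - 1))   # Saaty's consistency index, 0 for this matrix
```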

Applying AHP to the case

Since Thierry did not apply AHP to his analysis of the case, we have answered the questions on pairwise comparisons on the basis of the information contained in his report. For instance, when comparing cars on the cost criterion, more weight will be put on a particular cost difference, say 1 000 €, when located in the range


from 17 500 € to 21 500 € than when lying between 13 500 € and 17 500 €. This corresponds to the fact that Thierry said he is rather insensitive to cost differences up to about 17 500 €, which is the amount of money he had budgeted for his car. For the sake of concision, we have restricted our comparisons to a subset of cars, namely the top four cars plus the Renault 19, Mazda 323 and Toyota Corolla.

A major issue in the assessment of pairwise comparisons, for instance of alternatives in relation to a criterion, is to determine how many times a is preferred to b on criterion i from looking at the evaluations gi(a) and gi(b). Of course the (ratio) scale of preference on i is not in general the scale of the evaluations gi. For example, Car 11 costs approximately 17 500 € and Car 12 costs about 16 000 €. The ratio of these costs, 17 500/16 000, is equal to 1.09375, but this does not necessarily mean that Car 12 is preferred 1.09375 times more than Car 11 on the cost criterion; this is because the cost evaluation does not measure the preferences directly. Indeed, a transformation (re-scaling) is usually needed to go from evaluations to preferences; for the cost, according to Thierry himself, the transformation is not linear, since equal ratios corresponding to costs located either below or above 17 500 € do not correspond to equal ratios of preference. But even in linear parts, the question is not easily answered. A decision-maker might very well say that Car 12 is 1.5 times more preferred than Car 11 for the cost criterion; or he could say 2 times or 4 times. All depends on what the decision-maker would consider as the minimum possible cost; for instance (supposing that the transformation of cost into preference is linear), if Car 12 is declared to be 1.5 times more preferred than Car 11, the zero x of the cost scale would be such that

(17 500 − x) / (16 000 − x) = 1.5,

i.e. x = 13 000 €. The problem is even more crucial for transforming scales such as those on which braking or road-holding are evaluated. For instance, how many times is Car 3 preferred to Car 10 with respect to the braking criterion? In other words, how many times is 2.66 better than (preferred to) 2.33?

Similar questions arise for the comparison of the importance of criteria. We discuss the determination of the "weights" ki of the criteria in formula 6.11 below. For computing those weights, the relative importance of each criterion with respect to all others must be assessed. Our assessments are shown in Table 6.7. We made them directly in numerical terms, taking into account a set of weights that Thierry considered as reflecting his preferences; those weights have been obtained using the Prefcalc software and a method that is discussed in the next section. By default, the blanks on the diagonal should be interpreted as 1's; the blanks below the diagonal are supposed to be 1 over the corresponding value above the diagonal, according to equation 6.10.

Once the matrix in Table 6.7 has been filled, several algorithms can be proposed to compute the "priority" of each criterion with respect to the goal symbolised by the top node of the hierarchy (under the hypothesis that the elements of the assessment matrix are approximations of the ratios of those priorities). The most famous algorithm, which was initially proposed by Saaty, consists in computing the eigenvector of the matrix corresponding to the largest eigenvalue (see Harker


Relative importance   Cost   Accel   Pick-up   Brakes   Road-h
Cost                         1.5     2         3        3
Acceleration                         1.5       2        2
Pick-up                                        1.5      1.5
Brakes                                                  1
Road-holding

Table 6.7: Assessment of the comparison of importance for all pairs of criteria. For instance, the number 2 at the intersection of the 1st row and 3rd column means that "Cost" is considered twice as important as "Pick-up"

and Vargas (1987), for an interpretation of the "eigenvector method" as a way of "averaging ratios along paths"). Since eigenvectors are determined up to a multiplicative factor, the vector of priorities is the normalised eigenvector whose components sum up to unity; the special structure of the matrix (reciprocal matrix) guarantees that all priorities will be positive. Alternative methods for correcting inconsistencies have been elaborated; most of them are based on some sort of a least squares criterion or on computing averages (see e.g. Barzilai, Cook and Golany (1987), who argue in favour of a geometric mean). Applying the eigenvector method to the matrix in Table 6.7, one obtains the following values that reflect the importance of the criteria:

(.352, .241, .172, .117, .117).

Note that only the lowest degrees of the 1 to 9 scale have been used in Table 6.7.

This means that the weights are not perceived as very contrasted; in order to get the sort of gradation of the weights as above (the ratio of the highest to the lowest value is about 3), some comparisons have been assessed by non-integer degrees, which normally are not available on the verbal counterpart of the 1 to 9 scale described in Table 6.6. When the assessments are made through this verbal scale, approximations should be made, for instance by saying that cost and acceleration are equally important and substituting 1.5 by 1. Note that the labelling of the degrees on the verbal scale may be misleading; one would quite naturally qualify the degree to which "Cost" is more important than "Acceleration" as "Moderate" until it is fully realised that "Moderate" means "three times as important"; using the intermediary level between "Equal" and "Moderate" would still mean "twice as important".

It should be emphasised that the "eigenvalue method" is not linear. What would have changed if we had scaled the importance differently, for instance assessing the comparisons of importance by degrees twice as large as those in Table 6.7 (except for 1's, which remain constant)? Would the coefficients of importance have been twice as large? Not at all! The resulting weights would have been much more contrasted, namely:

(.489, .254, .137, .060, .060).


Name of car       Nr   7     11    3     12    10    4     6
Honda Civic       7    1.0   1.0   2.0   4.0   4.0   5.0   5.0
Peugeot 309/16V   11   1.0   1.0   2.0   3.0   4.0   4.0   4.0
Nissan Sunny      3    0.50  0.50  1.0   1.50  2.0   3.0   3.0
Peugeot 309       12   0.25  0.33  0.67  1.0   1.0   2.0   2.0
Renault 19        10   0.25  0.25  0.5   1.0   1.0   1.0   1.5
Mazda 323         4    0.2   0.25  0.33  0.5   1.0   1.0   1.0
Toyota Corolla    6    0.2   0.25  0.33  0.5   0.67  1.0   1.0

Table 6.8: Pairwise comparisons of preferences of 7 cars on the acceleration criterion

Using the latter set of weights instead of the former would substantially change the values attached to the alternatives through formula 6.11 and might even alter their ordering. So, contrary to the determination of the trade-offs in an additive value model (which may be re-scaled through multiplying them by a positive number, without altering the way in which alternatives are ordered by the multi-attribute value function), there is no degree of freedom in the assessment of the ratios in AHP; in other words, these assessments are made on an absolute scale.
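The two weight vectors quoted above can be checked numerically: the sketch below completes Table 6.7 with its reciprocal lower half and runs a power iteration (as in the previous sketch) on the original and on the "doubled" matrix; the printed priorities should agree with the quoted vectors up to small rounding differences.

```python
def priorities(A, iterations=1000):
    """Principal eigenvector of a positive matrix, by power iteration."""
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(w)
        w = [x / s for x in w]
    return w

upper = [  # upper triangle of Table 6.7 (Cost, Accel, Pick-up, Brakes, Road-h)
    [1.5, 2, 3, 3],
    [1.5, 2, 2],
    [1.5, 1.5],
    [1],
]

def reciprocal(upper, double=False):
    """Complete the matrix: 1's on the diagonal, reciprocals below."""
    n = len(upper) + 1
    A = [[1.0] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for k, v in enumerate(row):
            v = 2 * v if double and v != 1 else v  # "doubled" degrees, 1's kept
            A[i][i + 1 + k] = v
            A[i + 1 + k][i] = 1.0 / v
    return A

print([round(x, 3) for x in priorities(reciprocal(upper))])
print([round(x, 3) for x in priorities(reciprocal(upper, double=True))])
```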

As a further example, we now apply the method to determine the evaluation of the alternatives in terms of preference on the "Acceleration" criterion. Suppose the pairwise comparison matrix has been filled as shown in Table 6.8, in a way that seems consistent with what we know of Thierry's preferences. Applying the eigenvalue method yields the following "priorities" attached to each of the cars in relation to acceleration:

(.2987, .2694, .1507, .0934, .0745, .0584, .0548).

A picture of the resulting re-scaling of that criterion is provided in Figure 6.7; the solid line is a linear interpolation of the priorities in the eigenvector. A re-scaling of the same criterion had been obtained through the construction of a standard sequence (see Figure 6.5). Comparing these scales is not straightforward. Notice that the origin is arbitrary in the single-attribute value model; one may add any constant number to the values without changing the ranking of the alternatives (a term equal to the constant number times the trade-off associated with the attribute would just be added to the multi-attribute value function); since trade-offs depend on the scaling of their corresponding single-attribute value function, changing the unit on the vertical axis amounts to multiplying ui by a positive number; the corresponding trade-off must then be divided by the same number. In the multi-attribute value model, the scaling of the single-attribute value function is related to the value of the trade-off; transformation of the former must be compensated for by transforming the latter. In AHP, since the assessments of all nodes are made independently, no transformation is allowed. In order to compare the two figures, one may transform the value function of Figure 6.5 so that it coincides with the AHP priority on the extreme values of the acceleration half range, i.e. 28 and 29.5. Figure 6.7 shows the transformed single-attribute value function superimposed


[Figure: priorities (solid line) and linearly transformed values (dotted line) against acceleration (sec) from 28 to 31]

Figure 6.7: Priorities relative to acceleration as obtained through the eigenvector method are represented by the solid line; the linearly transformed single-attribute values of Figure 6.5 are represented by the dotted line on the range from 28 to 29.5 seconds

(dotted line) on the graph of the priorities. There seems to be a good fit of the two curves, but this is only an example from which no general conclusion can be drawn.

Comments on AHP

Although the models for describing the overall preferences of the decision-maker are identical in multi-attribute value theory and in AHP, this does not mean that applying the respective methodologies of these theories normally yields the same overall evaluation of the alternatives. There are striking differences between the two approaches from the methodological point of view. The ambition of AHP is to help construct evaluations of the alternatives for each viewpoint (in terms of preferences) and of the viewpoints with regard to the overall goal (in terms of importance); these evaluations are claimed to belong to a ratio scale, i.e. to be determined up to a positive multiplicative constant. Since the eigenvalue method yields a particular determination of this constant and this determination is not taken into account when assessing the relative importance of the various criteria, the evaluations in terms of preference must be considered as if they were made on an absolute scale, which has been repeatedly criticised in the literature (see for instance Belton (1986) and Dyer (1990)). This weakness (which can also be blamed on direct rating techniques, as mentioned above) could be corrected by asking the decision-maker about the relative importance of the viewpoints in terms of passing from the least preferred value to the most preferred value on criterion i compared


to a similar change on criterion j (Dyer 1990). Taking this suggestion into account would however go against one of the basic principles of Saaty's methodology, i.e. the assumption that the assessments at all levels of the hierarchy can be made along the same procedure and independently of the other levels. That is probably why the original method, although seriously attacked, has remained unchanged.

AHP has been criticised in the literature in several other respects. Besides the fact, already mentioned, that it may be difficult to reliably assess comparisons of preferences or of importance on the standard scale described in Table 6.6, there is an issue about AHP that has been discussed quite a lot, namely the possibility of rank reversal. Suppose alternative x is removed from the current set and nothing is changed in the pairwise assessments of the remaining alternatives; it may happen that an alternative, say a, among the remaining ones could now be ranked below an alternative b whilst it was ahead of b in the initial situation. This phenomenon was discussed in Belton and Gear (1983) and Dyer (1990) (see also Harker and Vargas (1987) for a defense of AHP).

6.3.3 An indirect method for assessing single-attribute value functions and trade-offs

Various methods have been conceived in order to avoid direct elicitation of a multi-attribute value function. A class of such methods consists in postulating an additive value model (as described in formulae 6.7 and 6.8) and inferring all together the shapes of all single-attribute value functions and the values of all the trade-offs from declared global preferences on a subset of well-known alternatives. The idea is thus to infer a general preference model from partial holistic information about the decision-maker's preferences.

Thierry used a method of disaggregation of preferences described in Jacquet-Lagreze and Siskos (1982); it is implemented in a software called Prefcalc, which computes piece-wise linear single-attribute value functions and is based on linear programming (see also Jacquet-Lagreze (1990), Vincke (1992)). More precisely, the software helps to build a function

u(a) = ∑_{i=1}^{n} ui(gi(a))

such that a ≿ b ⇐⇒ u(a) ≥ u(b). Without loss of generality, the lowest (resp. highest) value of u is conventionally set to 0 (resp. 1); 0 (resp. 1) is the value of a (fictitious) alternative whose assessment on each criterion would be the worst (resp. best) evaluation attained for the criterion on the current set of alternatives. This fictitious alternative is sometimes called the anti-ideal (resp. ideal) point. In our example, the "anti-ideal" car costs 21 334 €, needs 30.8 seconds to cover 1 km starting from rest and 41.6 seconds starting in fifth gear at 40 km/h; its performances regarding brakes and road-holding are respectively 1.33 and 1.25. The "ideal" car, at the opposite end of the range, costs 13 841 €, needs 28 seconds to cover 1 km starting from rest and 34.7 seconds starting in fifth gear at 40 km/h; its performances regarding brakes and road-holding are respectively 2.66 and 3.25.


[Figure: five boxes with piece-wise linear value functions — Cost on 13.84–21.33 (breakpoint 17.59), trade-off .43; Acceleration on 28–30 (breakpoint 29), .23; Pick-up on 34–42 (breakpoint 38), .13; Brakes on 1.3–2.7 (breakpoint 2.0), .1; Road-holding on 1.2–3.2 (breakpoint 2.2), .1]

Figure 6.8: Single-attribute value functions computed by means of Prefcalc in the "Choosing a car" problem; the value of the trade-off is written in the right upper corner of each box

The shape of the single-attribute value function for the cost criterion, for instance, is modelled as follows. The user fixes the number of linear pieces; suppose that you decide to set it to 2 (which is a parsimonious option and the default value proposed in Prefcalc); the single-attribute value function of the cost could for instance be represented as in Figure 6.8. Note that the maximal value of the utility (reached for a cost of 13 841 €) is scaled in such a way that it corresponds to the value of the trade-off associated with the cost criterion, i.e. .43 in the example shown in Figure 6.8. Note also that with two linear pieces, one for each half of the cost range, the single-attribute value function is completely determined by two numbers, i.e. the utility value at mid-range and the maximal utility. Those values, say u1,1, u1,2, are variables of the linear program that Prefcalc writes and solves. The pieces of information on which the formulation of the linear program relies are obtained from the user. The user is asked to select a few alternatives that he is familiar with and feels able to rank-order according to his overall preferences. The ordering of these alternatives, which include the fictitious ideal and anti-ideal ones, induces the corresponding order on their overall values and hence generates the constraints of the linear program. Prefcalc then tries to find levels ui,1, ui,2 for each criterion i which will make the additive value function compatible with the declared information. If the program is not contradictory, i.e. if an additive value function (with 2-piece piece-wise linear single-attribute value functions) proves compatible with the preferences, the system tries to find a solution, among all feasible solutions, that maximises the discrimination between


the selected alternatives. If no feasible solution can be found, the system proposes to increase the number of variables of the model, for instance by using a higher number of linear pieces in the description of the single-attribute value functions.

This method could be described as a learning process; the system fits the parameters of the model on the basis of partial information about the user's preferences; the set of alternatives on which the user declares his global preferences may be viewed as a learning set. For more details on the method, the reader is referred to Vincke (1992), Jacquet-Lagreze and Siskos (1982).
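To give an idea of the kind of linear program involved, here is a toy disaggregation sketch in the spirit of Prefcalc, not its actual implementation: two criteria with two linear pieces each, a declared ranking on four reference alternatives, and scipy's linprog maximising the discrimination threshold eps. All numbers are invented.

```python
from scipy.optimize import linprog

# Scales: (worst, mid, best) for each criterion; both are to be minimised.
scales = {"cost": (21000, 17000, 13000), "acc": (31.0, 29.5, 28.0)}

def pair(g, worst, mid, best):
    """Coefficients of (u_mid, u_best) in the 2-piece linear value of g."""
    if (g - mid) * (worst - mid) > 0:          # g lies between worst and mid
        return ((worst - g) / (worst - mid), 0.0)
    s = (mid - g) / (mid - best)               # g lies between mid and best
    return (1.0 - s, s)

def row(cost, acc):
    """u(alt) as a linear form in the variables (c1, c2, a1, a2)."""
    return list(pair(cost, *scales["cost"])) + list(pair(acc, *scales["acc"]))

alternatives = {"ideal": (13000, 28.0), "x": (15000, 30.0),
                "y": (19000, 28.5), "anti": (21000, 31.0)}
ranking = ["ideal", "x", "y", "anti"]          # declared overall preferences

A_ub, b_ub = [], []
for p, q in zip(ranking, ranking[1:]):         # u(p) - u(q) >= eps
    diff = [a - b for a, b in zip(row(*alternatives[p]), row(*alternatives[q]))]
    A_ub.append([-d for d in diff] + [1.0]); b_ub.append(0.0)
A_ub.append([1, -1, 0, 0, 0]); b_ub.append(0.0)   # monotonicity: c1 <= c2
A_ub.append([0, 0, 1, -1, 0]); b_ub.append(0.0)   # monotonicity: a1 <= a2
A_eq, b_eq = [[0, 1, 0, 1, 0]], [1.0]             # normalisation: c2 + a2 = 1

# Variables: (c1, c2, a1, a2, eps); maximise eps, i.e. minimise -eps.
res = linprog(c=[0, 0, 0, 0, -1.0], A_ub=A_ub, b_ub=b_ub,
              A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 5)
print(res.x)  # fitted values at mid/best levels and the achieved eps
```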

In his ex post study, Thierry selects five cars, besides the ideal and anti-ideal ones, and ranks them in the following order:

1. Peugeot 309 GTI 16 (Car 11)

2. Nissan Sunny (Car 3)

3. Mitsubishi Galant (Car 13)

4. Ford Escort (Car 9)

5. Renault 21 (Car 14)

This ranking is compatible with an additive value function. Such a compatible value function is described in Figure 6.8.

Thierry examines this result and makes the following comments. He agrees with many features of the fitted single-attribute value functions, and in particular with:

1. the lack of sensitivity of the price in the range from 13 841 € to 17 576 € (he was a priori estimating his budget at about 17 500 €);

2. the high importance (weight = .23) given to approaching 28 seconds on the "acceleration" criterion (above 29 seconds, the car is useless, since a difference of 1 second in acceleration results in the faster car being two car lengths ahead of the slower one at the end of the test); Thierry declares this criterion to be the second most important after cost (weight = .43);

3. the importance (weight = .13) of getting as close as possible to 34 seconds in the acceleration test starting from 40 km/h (above 38 seconds, he agrees that the car loses all attractiveness); the car is not only used in competition; it must be pleasant in everyday use, and hence the third criterion has a certain importance, although it is of less importance than the second one;

4. the modelling of the road-holding criterion.

However, Thierry disagrees with the modelling of the braking criterion, which he considers equally important as road-holding. He believes that the relative importance of the fourth and fifth criteria should be revised. Thierry then looks at the ranking of the cars according to the computed value function. The ranking as well as the multi-attribute value assigned to each car are given in Table 6.9.


Rank   Cars                          Value
1      * Peugeot 309/16 (Car 11)     0.84
2      * Nissan Sunny (Car 3)        0.68
3      Renault 19 (Car 10)           0.66
4      Peugeot 309 (Car 12)          0.65
5      Honda Civic (Car 7)           0.61
6      Fiat Tipo (Car 1)             0.54
7      Opel Astra (Car 8)            0.54
8      Mitsubishi Colt (Car 5)       0.53
9      Mazda 323 (Car 4)             0.52
10     Toyota Corolla (Car 6)        0.50
11     Alfa 33 (Car 2)               0.49
12     * Mitsubishi Galant (Car 13)  0.48
13     * Ford Escort (Car 9)         0.32
14     * R 21 (Car 14)               0.16

Table 6.9: Ranking obtained using Prefcalc. The cars ranked by Thierry are those marked with a *

Thierry feels that Car 10 (Renault 19) is ranked too high, while Car 7 (Honda Civic) should be in a better position.

In view of these observations, Thierry modifies the single-attribute value functions for criteria 4 and 5. For the braking criterion, the utility (0.01) associated with 2 remains unchanged, while the utility of the level 2.7 is raised to 0.1 instead of 0.01. The road-holding criterion is also modified; the value (0.2) associated with the level 3.2 is lowered to 0.1 (see Figure 6.9). Note that Prefcalc normalises the value function in order that the ideal alternative is always assigned the value 1; of course, due to the number display format with two decimal positions, the sum of the maximal values of the single-attribute value functions may be only approximately equal to 1. Running Prefcalc with the altered value functions returns the ranking in Table 6.10 and the revised multi-attribute value after each car name.

After he sees the modified ranking yielded by Prefcalc, Thierry feels that the new ranking is fully satisfactory. He observes that if he had used Prefcalc a few years earlier, he would have made the same choice as he actually did; he considers this a good point as far as Prefcalc is concerned. He finally makes the following comments: "Using Prefcalc has enhanced my understanding of both the data and my own preferences; in particular I am more conscious of the relative importance I give to the various criteria".

Comments on the method

First let us emphasise an important psychological aspect of the empirical validation of a method or a tool, which is common in human practice: the fact that previous intuition or previous more informal analyses are confirmed by using a tool, here Prefcalc, contributes to raising the level of confidence the user puts in the tool.


[Figure: two boxes — Brakes on 1.3–2.7 (breakpoint 2.0), trade-off .1; Road-holding on 1.2–3.2 (breakpoint 2.2), .1]

Figure 6.9: Modified single-attribute value functions for the braking and road-holding criteria

Rank   Cars                          Value
1      * Peugeot 309/16 (Car 11)     0.85
2      * Nissan Sunny (Car 3)        0.75
3      Honda Civic (Car 7)           0.66
4      Peugeot 309 (Car 12)          0.65
5      Renault 19 (Car 10)           0.61
6      Opel Astra (Car 8)            0.55
7      Mitsubishi Colt (Car 5)       0.54
8      Mazda 323 (Car 4)             0.53
9      Fiat Tipo (Car 1)             0.51
10     Toyota Corolla (Car 6)        0.50
11     * Mitsubishi Galant (Car 13)  0.48
12     Alfa 33 (Car 2)               0.47
13     * Ford Escort (Car 9)         0.32
14     * R 21 (Car 14)               0.16

Table 6.10: Modified ranking using Prefcalc. The cars ranked by Thierry are those marked with a *


Observe that the user may well have a very vague understanding of the method itself; he simply validates the method by using it to reproduce results that he has confidence in. After such a successful empirical validation step, he will be more prone to use the method in new situations that he does not master that well.

What are the drawbacks and traps of Prefcalc? Obviously Prefcalc can only be used in cases where the overall preference of the decision-maker can be represented by an additive multi-attribute value function (as described by Equation 6.8). In particular, this is not the case when preferences are not transitive or not complete (for arguments supporting the possible observation of non-transitive preferences, see the survey by Fishburn (1991)). There are some additional restrictions due to the fact that the shapes of the single-attribute value functions that can be modelled by Prefcalc are limited to piece-wise linear functions. This is hardly a restriction when dealing with a finite set of alternatives; by adapting the number of linear pieces, one can obtain approximations of any continuous curve that are as accurate as desired. When bounded to a small number of pieces, this may however be a more serious restriction.

Stability of ranking

The main problem raised by the use of such a tool is the indetermination of the estimated single-attribute value functions (including the estimation of the trade-offs). Usually, if the preferences declared on the set of well-known alternatives are compatible with an additive value model, there will be several value functions that can represent these preferences. Prefcalc chooses one such representation according to the principles outlined above, i.e. the most discriminating (in a sense). Other choices of a model, albeit compatible with the declared preferences on the learning set, may lead to variations in the rankings of the remaining alternatives. Slight variations in the trade-off values can yield rank reversals. For instance, with all trade-offs within ±.02 of their value in Figure 6.9, changes already occur. Passing from the set of trade-offs (.43, .23, .13, .10, .10) to (.45, .21, .11, .12, .10) results in exchanging the positions of the Honda Civic and the Peugeot 309, which are ranked 3rd and 4th respectively after the change. This rank reversal is obtained by putting slightly more emphasis on cost and slightly less on performance. Note that such a slight change in the trade-offs has an effect on the ranking of the top 4 cars, those on which Thierry focused after his preliminary analysis (see Table 6.3). It should thus be very clear that, in practice, determining the trade-offs with sufficient accuracy could be both crucial and challenging. It is therefore of prime importance to carry out a lot of sensitivity analyses in order to identify which parts of the result remain reasonably stable.
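This kind of sensitivity check is easy to automate; the sketch below scans a ±.02 grid around a base set of trade-offs and reports the perturbations that reverse the order of two alternatives. The partial values are invented (they are not the Prefcalc outputs), chosen so that the two cars are nearly tied, as in the case just discussed.

```python
import itertools

# Hypothetical single-attribute values (rows: cars, columns: criteria),
# already scaled so that the overall value is the weighted sum below.
partial = {"Honda Civic": [0.55, 0.95, 0.9, 0.5, 0.4],
           "Peugeot 309": [0.80, 0.55, 0.5, 0.6, 0.7]}
base = [0.43, 0.23, 0.13, 0.10, 0.10]

def order(weights):
    score = {car: sum(w * v for w, v in zip(weights, vals))
             for car, vals in partial.items()}
    return sorted(score, key=score.get, reverse=True)

# Explore all +/- 0.02 perturbations of the first four trade-offs,
# renormalising so that the weights still sum to one.
for deltas in itertools.product([-0.02, 0.0, 0.02], repeat=4):
    w = [b + d for b, d in zip(base, list(deltas) + [0.0])]
    w = [x / sum(w) for x in w]
    if order(w) != order(base):
        print(deltas, order(w))  # a rank reversal within the +/-0.02 box
```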

Dependence on the learning set

In view of the fact that small variations of the trade-offs may even result in changes in the ranking of the top alternatives, one may question the influence of the selection of a learning set. In the case under examination, the top two alternatives were chosen to be in the learning set and hence are constrained to appear in the


correct order in the output of Prefcalc. What would have happened if the learning set had been different?

Let us take another subset of 5 cars and declare preferences that agree with the ranking validated by Thierry (Table 6.10). When substituting the top 2 cars (Peugeot 309/16V, Nissan Sunny) by the Renault 19 and the Mitsubishi Colt, two cars in the middle segment of the ranking, the vector of trade-offs is (.53, .06, .08, .08, .25) and the top four in the new ranking are Renault 19 (1), Peugeot 309 (2), Peugeot 309/16V (3), and Nissan Sunny (4); the Honda Civic is relegated to the 12th position. In the choice of the present learning set, stronger emphasis has been put on cost and safety (brakes and road-holding) and much less on performance (acceleration and pick-up); three of the former top cars remain in the top four; the Honda recedes due to its higher cost and its weakness on road-holding; the Renault 19 is heading the race mainly due to excellent road-holding.

Further experiments have been performed, reintroducing in turn one of the 4 top cars and removing the Renault 19. Clearly, the value of the trade-offs may depend drastically on the learning set. Some sort of preliminary analysis of the user's preferences can help to choose the learning set or to understand the variations in the ranking and the trade-offs a posteriori. In the present case, one can be relatively satisfied with the results, since the top 3 cars are usually well-ranked; the ranking of the Honda Civic is much more unstable and it is not difficult to understand why (weakness on road-holding and relatively high cost). The Renault 19 appears as an outsider due to excellent road-holding. Of course, for the rest of the cars huge variations may appear in their ranking, but one is usually more interested in the top ranked alternatives.

From a general point of view, the option implemented in the mathematical programming model to reduce the indeterminacies (essentially, by choosing to maximise the contrast between the evaluations of the alternatives in the learning set) is not aimed at being as insensitive as possible with regard to the selection of a learning set. Other options could be experimentally investigated in order to see whether some could consistently yield more stable evaluations. It should be noted however that stability, which may be a desirable property in the perspective of uncovering an objective model of preference measurement, is not necessarily a relevant requirement when the goal is to exploit partial available information. One may expect that the decision-maker will naturally choose alternatives that he considers as clearly distinct from one another as members of the learning set; the analyst might alternatively instruct the decision-maker to do so. In a learning process where, typically, information is incomplete, it must be decided how to complement the available facts by some arbitrary default assumptions. The information should then be collected while taking the assumptions made into account; one may consider that in the case of Prefcalc, the analyst's instruction of selecting alternatives that are as contrasted as possible is in good agreement with the implementation options.


6.3.4 Conclusion

This section has been devoted to the construction of a formal model that represents preferences on a numerical scale. Such a model can only be expected to exist when preferences satisfy rather demanding hypotheses; it thus relies on firm theoretical bases, which is undoubtedly part of the intellectual appeal of the method. There is at least one additional advantage to theoretically well-founded decision models: such models can be used to legitimate a decision to persons that have not been involved in the decision making process. Once the hypotheses of the model have been accepted or proved valid in a decision context, and provided the process of elicitation of the various parameters of the model has been conducted correctly, the decision becomes transparent.

The additive multi-attribute value model is rewarding, when established and accepted by the stake-holders, since it is directly interpretable in terms of decision: the best decision is the one the model values most (provided the imprecisions in the establishment of the model and the uncertainties in the evaluation information allow to discriminate at least between the top alternatives). The counterpart of the clear-cut character of the conclusions that can be drawn from the model is that establishing the model requires a lot of information, and of a very precise and particular type. This means that the model may be inadequate not only because the hypotheses might not be fulfilled but also because the respondents might feel unable to answer the questions or because their answers might not be reliable. Indirect methods based on exploiting partial information and extrapolating it (in a recursive validation process) may help when the information is not available in explicit form; it remains that the quality of the information is crucial and that a lot of it is needed. In conclusion, direct assessment of multi-attribute value functions is a narrow road between the practical problem of obtaining reliable answers to difficult questions and the risks involved in building a model on answers to simpler but ambiguous questions.

In the next section we shall explore a very different formal approach, one that may be less demanding with regard to the precision of the information, but that also provides less conclusive outputs.

6.4 Outranking methods

6.4.1 Condorcet-like procedures in decision analysis

Is there any alternative way of dealing with multiple criteria evaluation in view of a decision than the one described above, i.e. building a one-dimensional synthetic evaluation on some sort of super-scale? To answer this question (positively), inspiration can be gained from the voting procedures discussed in Chapter 2 (see also Vansnick (1986)). Suppose that each voter expresses his preferences through a complete ranking of the candidates. With Borda's method, each candidate is assigned a rank for each of the voters (rank 1 if the candidate is ranked first by a voter, rank 2 if he is ranked second, and so on); the Borda score of a candidate is the sum of the ranks assigned to him by the voters; the winner is the candidate with the smallest Borda score. This method can be seen as a method of construction of a synthetic evaluation of the alternatives in multiple criteria decision analysis, the points of view corresponding to the voters and the alternatives to the candidates; all criteria-voters have equal weight, and coding by the rank number of the position of the candidate in a voter's preference looks like a form of evaluation.


Cars  1  2  3  4  5  6  7  8  9  10  11  12  13  14
1     5  3  1  2  2  3  3  2  3  2   2   2   2   3
2     2  5  2  4  2  3  2  3  3  1   1   1   4   3
3     4  4  5  4  4  4  4  4  4  3   2   3   5   4
4     3  1  1  5  1  3  1  2  1  2   1   1   4   2
5     3  3  1  5  5  3  2  2  2  3   1   1   5   2
6     2  2  1  2  2  5  2  2  2  2   1   1   3   2
7     3  3  1  4  4  4  5  3  4  3   2   2   4   4
8     3  2  1  4  4  4  3  5  3  2   0   2   4   3
9     2  3  1  4  4  3  1  2  5  2   1   2   4   3
10    4  4  2  3  2  3  2  3  3  5   3   2   4   3
11    4  4  3  4  4  4  4  5  4  3   5   4   4   5
12    4  4  2  4  4  4  4  4  3  4   3   5   5   4
13    3  2  0  2  1  2  1  2  1  1   1   0   5   1
14    2  3  1  3  3  3  1  3  3  2   0   1   4   5

Table 6.11: Number of criteria in favour of a when compared to b, for all pairs of cars a, b in the "Choosing a car" problem


Condorcet's method consists of a kind of tournament where all candidates compete in pairwise "contests". A candidate is declared to be preferred to another according to a majority rule, i.e. if more voters rank him before the latter than the converse. The result of such a procedure is a preference relation on the set of candidates that in general is neither transitive nor acyclic. A further step is thus needed in order to exploit this relation in view of the selection of one or several candidates, or in view of ranking all the candidates. This idea can of course be transposed to the multiple criteria decision context. We do this below, using Thierry's case again for illustrative purposes; we show how the problems raised by a direct transposition rather naturally lead to elementary "outranking methods".

For each pair of cars a and b, we count the number of criteria according to which a is at least as good as b. This yields the matrix given in Table 6.11; the elements of the matrix are integers ranging from 0 to 5. Note that we might have alternatively decided to count the criteria for which a is better than b, not taking into account criteria for which a and b are tied.

What we could call the “Condorcet preference relation” is obtained by determining, for each pair of alternatives a, b, whether or not there is a (simple) majority of criteria for which a is at least as good as b. Since there are 5 criteria, the majority is reached as soon as at least 3 criteria favour alternative a when compared to b. The preference matrix is thus obtained by substituting 1 for any number greater than or equal to 3 in Table 6.11 and 0 for any number smaller than 3, yielding the relation described by the 0-1 matrix in Table 6.12.



Cars   1  2  3  4  5  6  7  8  9 10 11 12 13 14
  1    1  1  0  0  0  1  1  0  1  0  0  0  0  1
  2    0  1  0  1  0  1  0  1  1  0  0  0  1  1
  3    1  1  1  1  1  1  1  1  1  1  0  1  1  1
  4    1  0  0  1  0  1  0  0  0  0  0  0  1  0
  5    1  1  0  1  1  1  0  0  0  1  0  0  1  0
  6    0  0  0  0  0  1  0  0  0  0  0  0  1  0
  7    1  1  0  1  1  1  1  1  1  1  0  0  1  1
  8    1  0  0  1  1  1  1  1  1  0  0  0  1  1
  9    0  1  0  1  1  1  0  0  1  0  0  0  1  1
 10    1  1  0  1  0  1  1  1  1  1  1  0  1  1
 11    1  1  1  1  1  1  1  1  1  1  1  1  1  1
 12    1  1  0  1  1  1  1  1  1  1  0  1  1  1
 13    1  0  0  0  0  0  0  0  0  0  0  0  1  0
 14    0  1  0  1  1  1  0  1  1  0  0  0  1  1

Table 6.12: Condorcet preference relation for the “Choosing a car” problem. A “1” at the intersection of the a row and the b column means that a is rated not lower than b on at least 3 criteria

Note that a criterion counts both in favour of a and in favour of b only if a and b are tied on that criterion; the relation is reflexive since any alternative is at least as good as itself along all criteria.
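For readers who want to experiment with these two steps, here is a minimal Python sketch (ours, with a made-up 3 × 5 evaluation matrix; it is not code from the study): it counts, for every ordered pair of alternatives, the criteria on which the first is at least as good as the second, and then cuts this count at a majority level, exactly as Tables 6.11 and 6.12 were obtained. Raising the variable majority from 3 to 4 produces the more demanding relation studied below.

    # Condorcet-like pairwise counting, a minimal sketch with hypothetical data.
    # evals[a][i] is the evaluation of alternative a on criterion i,
    # re-coded so that larger is better on every criterion.
    evals = [
        [3, 2, 5, 1, 4],   # alternative 0
        [4, 1, 2, 3, 3],   # alternative 1
        [2, 4, 4, 2, 5],   # alternative 2
    ]
    n, m = len(evals), len(evals[0])   # number of alternatives, of criteria

    # count[a][b] = number of criteria on which a is at least as good as b
    # (ties count in favour of both a and b, as in the text)
    count = [[sum(1 for i in range(m) if evals[a][i] >= evals[b][i])
              for b in range(n)] for a in range(n)]

    # Majority cut: 1 if at least `majority` criteria favour a compared to b
    majority = 3
    relation = [[int(count[a][b] >= majority) for b in range(n)]
                for a in range(n)]

    print(count)      # analogue of Table 6.11
    print(relation)   # analogue of Table 6.12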

Majority rule and cycles

It is not immediately apparent that this relation has cycles, and even cycles that go through all the alternatives; an instance of such a cycle is 1, 7, 10, 11, 3, 12, 5, 2, 14, 8, 9, 4, 6, 13, 1. Obviously it is not straightforward to suggest a good choice on the basis of such a relation, since one can find 3 criteria (out of 5) saying that 1 is at least as good as 7, 3 (possibly different) criteria saying that 7 is at least as good as 10, . . . , and finally 3 criteria saying that 13 is at least as good as 1. How can we possibly obtain something from this matrix in view of our goal of selecting the best car? A closer look at the preference relation reveals that some alternatives are preferred to most others while some are preferred to only a few; among the former are alternatives 11 (preferred to all), 3 (preferred to all but one), 12 (preferred to all but 2), 7 and 10 (preferred to all but 3). The same alternatives appear as seldom beaten: 3 and 11 (only once, excluding by themselves), 12 (twice), then come 10 (5 times) and 7 (6 times).

To make things appear more clearly, by avoiding cycles as much as possible, one might decide to impose more demanding levels of majority in the definition of a preference relation. We might require that an alternative be at least as good as another on at least 4 criteria. The new preference relation is shown in Table 6.13.

All the cycles in the previous relation have disappeared.



Cars   1  2  3  4  5  6  7  8  9 10 11 12 13 14
  1    1  0  0  0  0  0  0  0  0  0  0  0  0  0
  2    0  1  0  1  0  0  0  0  0  0  0  0  1  0
  3    1  1  1  1  1  1  1  1  1  0  0  0  1  1
  4    0  0  0  1  0  0  0  0  0  0  0  0  1  0
  5    0  0  0  1  1  0  0  0  0  0  0  0  1  0
  6    0  0  0  0  0  1  0  0  0  0  0  0  0  0
  7    0  0  0  0  0  0  1  0  1  0  0  0  1  1
  8    0  0  0  1  1  0  0  1  0  0  0  0  1  0
  9    0  0  0  0  0  0  0  0  1  0  0  0  1  0
 10    1  1  0  0  0  0  0  0  0  1  0  0  1  0
 11    1  1  0  1  1  0  1  1  1  0  1  1  1  1
 12    1  1  0  1  1  1  1  1  0  1  0  1  1  1
 13    0  0  0  0  0  0  0  0  0  0  0  0  1  0
 14    0  0  0  0  0  0  0  0  0  0  0  0  0  1

Table 6.13: Condorcet preference relation for the “Choosing a car” problem. A “1” at the intersection of the a row and the b column means that a is rated not lower than b on at least 4 criteria

When ranking the alternatives by the number of those they beat (i.e. those they are at least as good as on 4 criteria or more), one sees that 3, 11 and 12 come in first position (each is preferred to 10 other cars); then there is a big gap, after which come 7, 8 and 10, which beat only 3 other cars. Conversely, there are two non-beaten cars, 3 and 11; then come 10 and 12 (beaten by one car); 7 is beaten by 3 cars.

In the present case, we see that this simple approach essentially makes the same cars emerge as the methods used so far. There are, however, at least two radical differences between this approach and those based on the weighted sum or on some more sophisticated way of assessing each alternative by a single number synthesising all the criteria values. One is that all criteria have been considered equally important; it is nevertheless possible to take information on the relative importance of the criteria into account, as will be seen in section 6.4.3.

The second difference lies more in the nature of the approach; the most striking point is that the size of the differences between the evaluations of a and b on the criteria does not matter; only the signs of those differences do. In other words, had the available information been rankings of the cars with respect to each criterion (instead of numeric evaluations), the result of the “Condorcet” procedure would have been exactly the same. More precisely, suppose that all we know (or all that Thierry considers relevant in terms of preferences) about the cost criterion is the ordering of the cars according to their estimated cost, i.e.

Car 6 ≻1 Car 5 ≻1 Car 2 ≻1 Car 4 ≻1 Car 12 ≻1 Car 10 ≻1 Car 3 ≻1 Car 13 ≻1 Car 11 ≻1 Car 8 ≻1 Car 1 ≻1 Car 7 ≻1 Car 9 ≻1 Car 14



where ≻1 represents “is preferred to . . . on Criterion 1”, i.e. “is cheaper than . . . ”. Suppose that similar hypotheses are made for the other 4 criteria; if this were the case, we would have obtained exactly the same matrices as in Tables 6.12 and 6.13. Of course, neglecting the size of the differences on a criterion such as cost may appear to be a misuse of the available information; there are at least two considerations that could mitigate this commonsense reaction:

• the assessments of the cars on the cost criterion are rather rough estimations of an expected cost (see section 6.1.1); in particular, it is presumed that on average the lifetimes of all alternatives are equal; is it reasonable, in those circumstances, to rely on precise values of differences between these estimations to select the “best” alternative?

• estimations of cost, even reliable ones, are not necessarily related to preferences on the cost criterion in a simple way.

Such issues were discussed extensively in section 6.2.4. The whole analysis carried out there was aimed at the construction of a multiple criteria value function, which implies making any difference in evaluations on a criterion equivalent to some uniquely defined difference on any other criterion. The many methods that can be used to build a value function by questioning a decision-maker about his preferences may well fail, however; let us list a few reasons for the possible failure of these methods:

• time pressure may be so intense that there is not enough time available to engage in the lengthy elicitation process of a multiple criteria value function;

• the importance of the decision to be made may not justify such an effort;

• the decision-maker might not know how to answer the questions, might try to answer but prove inconsistent, or might feel discomfort at being forced to give precise answers where things are vague to him;

• in the case of a group decision, the analyst may be unable to make the various decision-makers agree on the answers to be given to some of the questions raised in the elicitation process.

In such cases it may be inappropriate or inefficient to try building a value function, and other approaches may be preferred. This appears perhaps more clearly if we consider the more artificial scales associated with criteria 4 and 5 (see section 6.1.1 concerning the construction of these scales). Take, for instance, criterion 4 (Brakes). Does the difference between the levels 2.33 and 2.66 have a quantitative meaning? If it does, is this difference, in terms of preferences, more than, less than or equal to the difference between the levels 1.66 and 2? How much would you accept to pay (in terms of criterion 1) to raise the value of criterion 4 from 2.33 to 2.66, or from 1.33 to 2.33? Of course, the questions raised for eliciting value functions are more indirect, but they still require a precise perception by the decision-maker of the meaning of the levels on the scale of criterion 4. Such a perception could only be gained by having experienced the braking behaviour of specific cars rated at the various levels of the scale, but such knowledge cannot be expected from a decision-maker (otherwise there would be no room on the marketplace for all the magazines that evaluate goods in order to help consumers spend their money while making the best choice). Also remember that braking performance has been described by the average of 3 indices evaluating aspects of the cars’ braking behaviour; this does not favour a deep intuitive perception of what the levels on that scale may really mean. So, one has to admit that in many cases the definition of the levels on scales is quite far from precise in quantitative terms, and it may be “hygienic” not to use the fallacious power of numbers. This is definitely the option chosen in the methods discussed in the present section. Not that these methods are purely ordinal; but differences between levels on a scale are carefully categorised, usually in a coarse-grained fashion, in order not to take into account differences that are only due to the irrelevant precision of numbers.

6.4.2 A simple outranking method

The Condorcet idea for a voting procedure has been transposed to decision analysis under the name of outranking methods. Such a transposition takes the peculiarities of the decision analysis context into account, in particular the fact that criteria may be perceived as unequally important; additional elements, such as the notion of discordance, have also been added. The principle of these methods is as follows. Each pair of alternatives is considered in turn, independently of third-party alternatives; when looking at alternatives a and b, it is claimed that a “outranks” b if there are enough arguments to decide that a is at least as good as b, while there is no essential reason to refute that statement (Roy (1974), cited by Vincke (1992), p. 58). Note that taking strong arguments against declaring a preference into account is precisely what is called “discordance”, and is original with respect to the simple Condorcet rule. Such an approach has been operationalised through various procedures, particularly the family of ELECTRE methods associated with the name of B. Roy (for an overview of outranking methods, the reader is referred to the books by Vincke (1992) and Roy and Bouyssou (1993)). Below, we discuss an application of the simplest of these methods, ELECTRE I, to Thierry’s case. ELECTRE I is a tool designed to be used in the context of a choice decision problem; it builds up a set of which the best alternative (according to the decision-maker’s preferences) should be a member. Let us emphasise that this set cannot be described as the set of best alternatives, nor even as a set of good alternatives, but just as a set that contains the “best” alternatives. We shall then show how the fundamental ideas of ELECTRE I can be made more sophisticated, in particular in view of helping to rank the alternatives. Our goal is not to survey all outranking methods; we just want to present the basic ideas of such methods and illustrate some problems they may raise.



Lack of transitivity, acyclicity and completeness

As a preamble, it may be useful to emphasise that outranking methods (and more generally methods based on pairwise comparisons) do not generally yield preferences that are transitive (or even acyclic). This point was already made in Chapter 2 about Condorcet’s method. Since the hypotheses of Arrow’s theorem can be re-formulated to be relevant in the framework of multiple criteria decision analysis (through the correspondence candidate-alternative, voter-criterion; see also Bouyssou (1992) and Perny (1992)), it is no wonder that methods based on comparisons of alternatives by pairs, independently of the other alternatives, will seldom directly yield a ranking of the alternatives. The pairs of alternatives that belong to the outranking relation are normally those between which the preference is established with a high degree of confidence; contradictions are reflected either in cycles (a outranks b, which outranks c, which . . . outranks a) or in incomparabilities (neither a outranks b nor the opposite).

Let us emphasise that the lack of transitivity or of completeness, although raising operational problems, may be viewed not as a weakness but rather as a faithful reflection of preferences as they can be perceived at the end of the study. Defenders of the approach support the idea that forcing preferences to be expressed in the format of a complete ranking is in general too restrictive; there is experimental evidence that backs their viewpoint (Tversky (1969), Fishburn (1991)). Explicit recognition that some alternatives are incomparable may be an important piece of information for the decision-maker.

In addition, as repeatedly stressed in the writings of B. Roy, the outranking relation should be interpreted as what is clear-cut in the preferences of the decision-maker, something like the surest and most stable expression of a complex, vague and evolving object that is named, for simplicity, “the preferences of the decision-maker”. In this approach very few hypotheses (such as rationality hypotheses) are made about preferences; one may even doubt that preferences pre-exist the process from which they emerge.

The analysis of a decision problem is conceived as an informational process in which, carefully, prudently and interactively, models are built that reflect, to some extent, the way of thinking, the feelings and the values of a decision-maker. In this conception, the concern is not making a decision but helping a decision-maker to make up his mind, helping him to understand a decision problem while taking his own values into account in the modelling of the decision situation.

The approach could be called constructive; it has many features in common with a learning process. However, in contrast with most artificial intelligence practice, the model of preferences is built explicitly and formally; preferences are not simply described through rules extracted from partial information obtained on a learning set. For more about the constructive approach, including comparisons with the classical normative and descriptive approaches (see Bell, Raiffa and Tversky (1988)), the reader is referred to Roy (1993).

Once the outranking relation has been constructed, the job of suggesting a decision is thus not straightforward. A phase of exploitation of the outranking relation is needed in order to provide the decision-maker with information more directly interpretable in terms of a decision. Such a two-stage process offers the advantage of good control over the transformation of the multi-dimensional information into a model of the decision-maker’s preferences that includes a certain degree of inconsistency and incompleteness.

6.4.3 Using ELECTRE I on the case

We briefly review the principles of the ELECTRE I method. For each pair of alternatives a and b, the so-called concordance index is computed; it measures the strength of the coalition of criteria supporting the claim that a is at least as good as b. The strength of a coalition is just the sum of the weights associated with the criteria that constitute the coalition. The notion of weights will be discussed below. If all criteria are equally important, the concordance index is proportional to the number of criteria in favour of a as compared to b, as in the Condorcet-like method discussed above. The level from which a coalition is judged strong enough is determined by the so-called concordance threshold; in the Condorcet voting method with the simple majority rule, this threshold is just half the number of criteria, and in general one will choose a number above half the sum of the weights of all criteria. Another feature that contrasts ELECTRE with pure Condorcet, but also with purely ordinal methods, is that some large differences in evaluation, when in disfavour of a, may be pinpointed as preventing a from outranking b. One therefore checks whether there is any criterion on which b is so much better than a that it would be meaningless to declare a preferred overall to b; if this happens for at least one criterion, one says that there is a veto to the preference of a over b. If the concordance index passes some threshold (the “concordance threshold”) and there is no veto of b against a, then a outranks b. Note that the outranking relation is not asymmetric in general; it may happen that a outranks b and that b outranks a.

This process yields a binary relation on the set of alternatives, which may have cycles and be incomplete (neither a outranks b nor the opposite). In order to propose a set of alternatives of particular interest to the decision-maker, from which the best compromise alternative should emerge, one extracts the kernel of the graph of the outranking relation after having reduced its cycles; in other words, all alternatives in a cycle are considered to be equivalent and are substituted by a unique representative node. In the resulting relation without cycles, the kernel is defined as a subset of alternatives that do not outrank one another and such that each alternative not in the kernel is outranked by at least one alternative in the kernel; in particular, all non-outranked alternatives belong to the kernel. In a graph without cycles, a unique kernel always exists. It should be emphasised that the alternatives in the kernel are not all necessarily good candidates for selection: an alternative incomparable to all others is always in the kernel, and alternatives in the kernel may be beaten by alternatives not in the kernel. So, the kernel should rather be viewed as a set of alternatives on which the decision-maker’s attention should be focused.

In order to apply the method to Thierry’s case, we successively have to determine:



• weights for the criteria

• a concordance threshold

• ordered pairs of evaluations that lead to a veto (and this for every criterion)

Evaluating coalitions of criteria

The concordance index c(a, b), which measures the strength of the coalition of criteria along which a is at least as good as b, may be computed by the formula

c(a, b) = ∑_{i : gi(a) ≥ gi(b)} pi                    (6.12)

where the pi’s are normalised weights reflecting the relative importance of the criteria; gi(a) denotes, as usual, the evaluation of alternative a on criterion i (which is assumed to be maximised; if it were to be minimised, the weight pi would be added when the converse inequality holds, i.e. gi(a) ≤ gi(b)). So, whenever the evaluation of a reaches or exceeds that of b on a criterion, the weight of that criterion enters (additively) into the weight of the coalition in favour of a. A criterion can count both for a against b and the opposite if and only if gi(a) = gi(b).
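For the programming-minded reader, formula (6.12) amounts to a few lines of Python; the sketch below is ours (the evaluations are assumed to be re-coded so that every criterion is to be maximised, and the numbers in the example call are hypothetical).

    # Concordance index of formula (6.12), a minimal sketch.
    def concordance(g_a, g_b, p):
        # g_a, g_b: evaluations of a and b, criterion by criterion (maximised);
        # p: normalised weights summing to 1; returns c(a, b).
        return sum(p_i for ga_i, gb_i, p_i in zip(g_a, g_b, p)
                   if ga_i >= gb_i)

    # Hypothetical example: criteria 1 and 3 support a, so c(a, b) = .5 + .2
    print(concordance([10, 5, 7], [8, 6, 7], [.5, .3, .2]))   # 0.7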

In the context of outranking, the weights are not trade-offs; they are completely independent of the scales of the criteria. A practical consequence is that one may question the decision-maker in terms of the relative importance of the criteria without reference to the scales on which the evaluations for the various viewpoints are expressed. This does not mean, however, that the weights are independent of the method, nor that one could carelessly use values given spontaneously by the decision-maker, or obtained through questioning in terms of “importance” without reference to the evaluations, as is done in Saaty’s procedure. It is important to bear in mind how the weights will be used: in this case, to measure the strength of coalitions in pairwise comparisons and to decide on the preference only on the basis of these coalitions.

To be more specific and to contrast the meaning of these weights with those used in weighted sums, let us first consider the weights suggested by Thierry in section 6.2.2, i.e. (1, 2, 1, 0.5, 0.5). Note that these were not obtained through questioning on the relative importance of the criteria but in the context of the weighted sum, with Thierry bearing re-scaled evaluations in mind: the evaluations on each criterion had been divided by the maximal value gi,max attained on that criterion. Dividing the weights by their sum (= 5) yields the normalised weights (.2, .4, .2, .1, .1). Using these weights in outranking methods would lead to an overwhelming predominance of criteria 2 (Acceleration) and 3 (Pick-up), which are moreover linked, since both are facets of the cars’ performance. With such weights and a concordance threshold of at least .5, it is impossible for a car to be outranked when it is better on criteria 2 and 3, even if all the other criteria are in favour of an opponent. It was never Thierry’s intention that once a car is better on criteria 2 and 3 there is no need to look at the other criteria; the whole initial analysis shows, on the contrary, that a fast and powerful car is useless if, for instance, it is bad on the braking or road-holding criterion. Such a feature of the preference structure could indeed be reflected through the use of vetoes, but only in a negative manner, i.e. by removing the outranking of a safe car by a powerful one, not by allowing a safe car to outrank a powerful one. Note that the above weights may nevertheless be appropriate for a weighted sum because, in such a method, the weights are multiplied by the evaluations (or re-coded evaluations). To make this clearer, consider the following reformulation of the condition under which a is preferred to b in the weighted sum model (a similar formulation is straightforward in the additive value model)

a ≽ b iff ∑_{i=1}^{n} ki × (gi(a) − gi(b)) ≥ 0.                    (6.13)

If a is slightly better than b on a point of view i, the influence of this fact on the comparison between a and b is reflected by the term ki × (gi(a) − gi(b)), which is presumably small. Hence, important criteria count for little in pairwise comparisons when the differences between the evaluations of the alternatives are small enough. On the contrary, in outranking methods, the weights are not scaled down in this way: when a is better than b on some criterion, the full weight of that criterion counts in favour of a, whether a is slightly or by far better than b.
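A two-line numerical contrast (ours, with made-up numbers) may help: take a criterion of weight .4 on which a beats b by a whisker.

    # Contribution of a single criterion to the comparison of a and b:
    # hypothetical weight .4, evaluations 10.1 for a and 10.0 for b.
    k_i, g_a_i, g_b_i = .4, 10.1, 10.0
    print(k_i * (g_a_i - g_b_i))          # weighted sum: about 0.04, nearly nothing
    print(k_i if g_a_i >= g_b_i else 0.)  # concordance: the full weight 0.4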

Since the weights in a weighted sum depend on the scaling of each criterion and there is no acknowledged standard scaling, it makes no sense in principle to use the weights initially provided by Thierry as coefficients measuring the importance of the criteria in an outranking method. If we nevertheless try to use them, we might consider the weights used with the normalised criteria of Table 6.4. We see that the importance of the “safety coalition” (criteria 4 and 5) would be negligible (weight = .20), while the importance of the “performance coalition” (criteria 2 and 3) would be overwhelming (weight = .60). There is another reasonable normalisation of the criteria that does not fix the zero of the scale but rather maps the smallest attained value gi,min onto 0 and the largest gi,max onto 1. Transforming the weights accordingly (i.e. multiplying them by the inverse of the range of the values of the corresponding criterion prior to the transformation), one would obtain (.28, .14, .13, .20, .25) as a weight vector. With these values as coefficients of importance, the “safety coalition” (criteria 4 and 5; weight = .45) becomes more important than the “performance coalition” (criteria 2 and 3; weight = .27), which Thierry might consider unfair. As an additional conclusion, one may note that the values of the weights vary tremendously depending on the type of normalisation applied.

Now look at the weights (.35, .24, .17, .12, .12) obtained through Saaty’s questioning procedure in terms of “importance” (see section 6.3.2). Using these weights for measuring the strength of coalitions does not seem appropriate either, since the predominance of criteria 1 and 2 is too strong (joint weight = .35 + .24 = .59).

Due to the all-or-nothing character of the weights in ELECTRE I, one is inclined to choose less contrasted weights than those examined above. Although procedures have been proposed to elicit such weights (see Mousseau (1993), Roy and Bouyssou (1993)), we will just choose a set of weights in an intuitive manner; let us take weights proportional to (10, 8, 6, 6, 6) as reflecting the relative importance of the criteria. At least the ordering of the values seems to be in agreement with what is known about Thierry’s perceptions.



Cars    1    2    3    4    5    6    7    8    9   10   11   12   13   14
  1    1    .5   .17  .33  .33  .56  .61  .33  .61  .33  .33  .33  .33  .61
  2    .49  1    .44  .83  .33  .56  .44  .61  .61  .28  .28  .28  .83  .61
  3    .83  .73  1    .73  .73  .73  .78  .78  .83  .56  .44  .56  1    .78
  4    .66  .17  .28  1    .17  .56  .28  .44  .28  .44  .28  .28  .78  .44
  5    .66  .66  .28  1    1    .56  .44  .44  .44  .66  .28  .28  1    .44
  6    .44  .44  .28  .44  .44  1    .44  .44  .44  .44  .28  .28  .61  .44
  7    .56  .56  .22  .73  .73  .73  1    .56  .83  .56  .39  .39  .73  .83
  8    .66  .39  .22  .73  .73  .73  .61  1    .66  .39  0    .39  .73  .66
  9    .39  .56  .17  .73  .73  .56  .17  .33  1    .39  .17  .39  .73  .61
 10    .83  .73  .44  .56  .33  .56  .61  .61  .61  1    .61  .33  .83  .61
 11    .83  .73  .56  .73  .73  .73  .78  1    .83  .56  1    .73  .73  1
 12    .83  .73  .44  .73  .73  .73  .78  .78  .61  .83  .61  1    1    .78
 13    .66  .39  0    .39  .17  .39  .28  .44  .28  .17  .28  0    1    .28
 14    .39  .56  .22  .56  .56  .56  .17  .56  .56  .39  0    .22  .73  1

Table 6.14: Concordance index (rounded to two decimals) for the “Choosing a car” problem

Normalising the weight vector yields (.27, .22, .17, .17, .17), after rounding in such a way that the normalised weights sum to 1.00. The weights of the three groups of criteria are rather balanced: .27 for cost, .39 for performance and .34 for safety. The concordance matrix c(a, b) computed with these weights is shown in Table 6.14.

Determining which coalitions are “strong enough”

At this stage we have to build the concordance relation, a binary relation obtained by deciding which coalitions in Table 6.14 are strong enough; this is done by selecting a concordance threshold above which we consider that they are. If we set the concordance threshold at .60, we obtain a concordance relation with a cycle passing through all alternatives but one, Car 3. This tells us something about coalitions that we did not know. The previous analysis with equal weights (see Section 6.4.1) showed that the relation in Table 6.12, obtained by looking at concordant coalitions involving at least three criteria, had a cycle passing through all alternatives. With the weights we have now chosen, the “lightest” coalition of three criteria involves criteria 3, 4 and 5 and weighs .51; then, in increasing order, we have three different coalitions weighing .56 (two of the criteria 3, 4, 5 together with criterion 2) and three coalitions weighing .61 (two of the criteria 3, 4, 5 together with criterion 1); finally, there are three coalitions weighing .66 (one of the three criteria 3, 4, 5 together with criteria 1 and 2). Cutting the concordance index at .60 thus keeps only the 3-criteria coalitions that contain criterion 1, together with the coalitions involving at least 4 criteria.

What we learn here is the following: the relation obtained by looking at coalitions of at least 4 criteria, plus the coalitions of three criteria that involve criterion 1, has a big cycle. When we cut above .62, there is no longer a cycle. The “lightest” 4-criteria coalition weighs .73, and there is only one value of the concordance index between .61 and .73, namely .66. So cutting anywhere above .66 and up to .73 will yield the relation in Table 6.13, which we have already looked at; a poorer relation (i.e. one with fewer arcs) is obtained when cutting above .73. In the sequel we will concentrate on two values of the concordance threshold, .60 and .65, which lie on either side of the borderline separating concordance relations with and without cycles; above these values, concordance relations become increasingly poor; below, they are less and less discriminating.
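These borderline values need not be found by hand: enumerating the weights of all coalitions of criteria lists every level at which the concordance relation can change. The sketch below is ours; note that, because of rounding, a few of the listed values differ slightly from those displayed in Table 6.14 (e.g. .34 here against .33 in the table).

    # Distinct attainable coalition weights, a minimal sketch.
    from itertools import combinations

    weights = [.27, .22, .17, .17, .17]
    values = sorted({round(sum(c), 2)
                     for r in range(1, len(weights) + 1)
                     for c in combinations(weights, r)})
    print(values)
    # Any two concordance thresholds with no such value between them
    # yield exactly the same concordance relation.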

In the above presentation the weights sum to 1. Note that multiplying all the weights by a positive number would yield the same concordance relations, provided the concordance threshold were multiplied by the same factor; the weights in ELECTRE I may thus be considered as being assessed on a ratio scale, i.e. up to a positive scaling factor.

Supporting choice or ranking

Before studying discordance and veto, we show how a concordance relation, which is just an outranking relation without veto, can be used to support a choice or a ranking in a decision process. Introducing vetoes will just remove arcs from the concordance relation; the operations performed on the outranking relation during the exploitation phase are exactly those that are applied below to the concordance relation.

In view of supporting a choice process, the exploitation procedure of ELECTRE I firstly consists in reducing the cycles, which amounts to considering all alternatives in a cycle as equivalent. The kernel of the resulting acyclic relation is then searched for, and it is suggested that the kernel contains all the alternatives on which the attention of the decision-maker should be focused. Obviously, reducing the cycles involves some drawbacks. For example, cutting the concordance relation of Table 6.14 at .60 yields a concordance relation with cycles involving all alternatives but Car 3; there is no simple cycle passing once through all alternatives except Car 3; an example of a (non-simple) cycle is 1, 7, 9, 5, 10, 11, 12, 2, 14, 13, 1 plus, starting from 12, the branches 12, 8, 4, 1 and 12, 6, 13. Reducing the cycles of this concordance relation results in two classes of equivalent alternatives: one class is composed of Car 3 alone, while the other comprises all the other alternatives. Besides the fact that this partition is not very discriminating, it also treats as equivalent alternatives that are not in the same simple cycle. Moreover, the information on how the alternatives compare with all the others is completely lost; for instance Car 12, which beats almost all other alternatives in the cut at .60 of the concordance relation, would be considered as equivalent to Car 6, which beats almost none.

For illustrative purposes, we consider the cut at level .65 of the concordance index, which yields the largest acyclic concordance relation that can be obtained; this relation is shown in Table 6.15. Its kernel is composed of cars 3, 10 and 11. Cars 3 and 11 are not outranked, and car 10 is the only alternative that is outranked neither by car 3 nor by car 11. This seems to be an interesting set in a choice process, in view of the analysis of the problem carried out so far.
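The kernel of an acyclic relation can be computed by a simple recursive labelling: an alternative belongs to the kernel exactly when every alternative outranking it has already been excluded. Here is a minimal sketch (ours); it assumes a reflexive 0-1 matrix, such as Table 6.15, whose off-diagonal part is acyclic. Applied to Table 6.15 with the cars numbered from 0, it should return [2, 9, 10], i.e. cars 3, 10 and 11.

    # Kernel of an acyclic 0-1 relation, a minimal sketch.
    def kernel(relation):
        # relation[a][b] == 1 means "a outranks b"; the diagonal is ignored.
        n = len(relation)
        memo = {}

        def in_kernel(x):
            # x is in the kernel iff no y outranking x is itself in the kernel;
            # the recursion terminates because the relation is acyclic.
            if x not in memo:
                memo[x] = all(not in_kernel(y) for y in range(n)
                              if y != x and relation[y][x] == 1)
            return memo[x]

        return [x for x in range(n) if in_kernel(x)]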

Rankings of the alternatives may also be obtained from Table 6.15 in a rather simple manner. For instance, consider the alternatives either in decreasing order of the number of alternatives they beat in the concordance relation, or in increasing order of the number of alternatives by which they are beaten in the concordance relation.



Cars   1  2  3  4  5  6  7  8  9 10 11 12 13 14
  1    1  0  0  0  0  0  0  0  0  0  0  0  0  0
  2    0  1  0  1  0  0  0  0  0  0  0  0  1  0
  3    1  1  1  1  1  1  1  1  1  0  0  0  1  1
  4    1  0  0  1  0  0  0  0  0  0  0  0  1  0
  5    1  1  0  1  1  0  0  0  0  1  0  0  1  0
  6    0  0  0  0  0  1  0  0  0  0  0  0  0  0
  7    0  0  0  1  1  1  1  0  1  0  0  0  1  1
  8    1  0  0  1  1  1  0  1  1  0  0  0  1  1
  9    0  0  0  1  1  0  0  0  1  0  0  0  1  0
 10    1  1  0  0  0  0  0  0  0  1  0  0  1  0
 11    1  1  0  1  1  1  1  1  1  0  1  1  1  1
 12    1  1  0  1  1  1  1  1  0  1  0  1  1  1
 13    1  0  0  0  0  0  0  0  0  0  0  0  1  0
 14    0  0  0  0  0  0  0  0  0  0  0  0  1  1

Table 6.15: Concordance relation for the “Choosing a car” problem with weights .27, .22, .17, .17, .17 and concordance threshold .65

Class   1      2      3    4     5    6          7     8       9
A       11     3, 12  8    7     5    9, 10      2, 4  13, 14  1, 6
        (11)   (10)   (7)  (6)   (5)  (3)        (2)   (1)     (0)
B       3, 11  12     10   7, 8  9    2, 6, 14   5     1, 4    13
        (0)    (1)    (2)  (3)   (4)  (5)        (6)   (8)     (11)

Table 6.16: Rankings obtained by counting how many alternatives are beaten by (ranking “A”) or beat (ranking “B”) each alternative in the concordance relation (threshold .65); the numbers in parentheses in the second row of ranking A (resp. ranking B) are the numbers of beaten (resp. beating) alternatives for each alternative of the same column in the first row

This amounts to counting the 1’s respectively in the rows and in the columns of Table 6.15 and ranking the alternatives accordingly (we do not count the 1’s on the diagonal, since the coalition of criteria saying that an alternative is at least as good as itself always encompasses all criteria); the corresponding rankings are respectively labelled “A” and “B” in Table 6.16. We observe that the usual group of “good” alternatives forms the top two classes of these rankings.
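In code (our sketch), the two counts are immediate; the subtraction removes the diagonal 1, the matrix being reflexive.

    # Scores behind rankings "A" and "B" of Table 6.16, a minimal sketch.
    def scores(relation):
        # relation: reflexive 0-1 matrix such as Table 6.15.
        n = len(relation)
        beats = [sum(relation[a]) - 1 for a in range(n)]
        beaten_by = [sum(relation[b][a] for b in range(n)) - 1
                     for a in range(n)]
        return beats, beaten_by

    # Ranking A: sort by decreasing `beats`;
    # ranking B: sort by increasing `beaten_by`.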

There are more sophisticated ways of obtaining rankings from outranking relations. ELECTRE II, which we do not describe here, was designed to fulfil this goal. To some extent, it makes better use of the information contained in the concordance index, since the ranking is based on two cuts, one linked with a weak preference threshold, the other with a strong preference threshold; in our case, for instance, one could consider that the .60 cut corresponds to weak preference (or weak outranking) while the .65 cut corresponds to strong preference. In the above method, the information contained in the other cutting levels has been totally ignored, although the rankings obtained from them may not be identical. They may even differ significantly, as can be seen by deriving a ranking from the .60 cut using the method we applied to the .65 cut.

Thresholding

To this point, both in the Condorcet-like method and in the basic ELECTRE I method (without veto), we have treated the assessments of the alternatives as if they were ordinal data, i.e. we could have obtained exactly the same results (kernel or ranking) by working with the orders induced on the set of alternatives by their evaluations on the various criteria. Does this mean that outranking methods are purely ordinal? Not exactly! More sophisticated outranking methods exploit information that is richer than purely ordinal but not as demanding as cardinal. This is done through what we shall call “thresholding”. Thresholding amounts to identifying intervals on the criteria scales that represent the minimal difference in evaluation above which a particular property holds. For instance, suppose that the assessment of b on criterion i, gi(b), is given and that criterion i is to be maximised; from which value gi(b) + ti(gi(b)) onwards will an alternative a be said to be preferred to b? Implicitly, we have so far considered that b was preferred to a on criterion i as soon as gi(b) ≥ gi(a), i.e. we have considered that ti(gi(b)) = 0. In view of the imprecision in the assessments, and since it is not clear for all criteria that there is a marked preference when the difference |gi(a) − gi(b)| is small, one may be led to consider a non-null threshold to model preference. In our case, for instance, it is not likely that Thierry would really mark a preference between cars 3 and 10 on the Cost criterion, since their estimated costs are within 10 € of each other (see Table 6.2). Thresholding is all the more important since, as mentioned at the end of section 6.4.1, the size of the interval between the evaluations is not taken into account when deciding that a is overall preferred to b. Hence one should be prudent when deciding whether a criterion is or is not an argument for saying that a is at least as good as b; it is therefore reasonable to determine a threshold function ti and to say that criterion i is such an argument as soon as gi(a) ≥ gi(b) + ti(gi(b)); since we examine reasons for saying that a is at least as good as b, not for saying that a is (strictly) better than b, the function ti should be negatively valued.

Determining such a threshold function is not necessarily an easy task. One could ask the decision-maker to tell, ideally for each evaluation gi(a) of each alternative on each criterion, from which value onwards an evaluation should be considered at least as good as gi(a). Things become simpler if the threshold may be considered constant or proportional to gi(a) (e.g. ti(gi(a)) = .05 × gi(a)). Note that constant thresholds could be used when a scale is “linear” in the sense that equal differences throughout the scale have the same meaning and consequences (see the end of section 6.2.3); this is not a necessary condition, however, since some differences, but not all, need to be equivalent throughout the scale. In any case, formula (6.12) for the concordance index is adapted in a straightforward manner as follows, and the method for building an outranking relation remains unchanged:

c(a, b) = ∑_{i : gi(a) ≥ gi(b) + ti(gi(b))} pi.                    (6.14)

Note that preference thresholds, which lead to indifference zones, are used in a variant of the ELECTRE I method called ELECTRE IS (see Roy and Skalka (1984) or Roy and Bouyssou (1993)).
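In code, moving from formula (6.12) to formula (6.14) changes a single comparison. The sketch below is ours; the threshold values in the example are hypothetical, and the criteria are again assumed to be maximised.

    # Concordance index with thresholds, formula (6.14); a minimal sketch.
    def concordance_with_thresholds(g_a, g_b, p, t):
        # t[i] is the threshold function of criterion i; criterion i supports
        # "a at least as good as b" iff g_a[i] >= g_b[i] + t[i](g_b[i]).
        return sum(p_i for ga_i, gb_i, p_i, t_i in zip(g_a, g_b, p, t)
                   if ga_i >= gb_i + t_i(gb_i))

    # Hypothetical example: a constant threshold of -0.1 on every criterion,
    # so a still counts as "at least as good" when at most 0.1 below b.
    t = [lambda g: -0.1] * 3
    print(concordance_with_thresholds([10, 5, 7], [8, 6, 7.05],
                                      [.5, .3, .2], t))   # 0.7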

Thresholding is a key tool in the original outranking methods; it allows one to bypass the necessity of transforming the original evaluations to obtain linear scales. There is another occasion for invoking thresholds, namely in the analysis of discordance.

Discordance and vetoes

Remember that the principle of the outranking methods consists in examining the validity of the proposition “a outranks b”; the concordance index “measures” the arguments in favour of saying so, but there may be arguments strongly against that assertion (discordant criteria). These discordant voices can be viewed as vetoes; there is a veto against declaring that a outranks b if b is so much better than a on some criterion that it becomes disputable, or even meaningless, to pretend that a might be better overall than b. Let us emphasise that the effect of a veto is quite radical, just as in the voting context. If a veto threshold is passed on a criterion when comparing two alternatives, then the alternative against which there is a veto, say a, may not outrank the other one, say b; this may result in incomparabilities in the outranking relation if, in addition, b does not outrank a, either because the coalition of criteria stating that b is at least as good as a is not strong enough or because there is also a veto of a against b on another criterion.

To be more precise, a veto threshold on criterion i is in general a function vi encoding a difference in evaluations so large that it would be out of the question to say that a outranks b if

gi(a) > gi(b) + vi(gi(b))                    (6.15)

when criterion i is to be minimised, or

gi(a) < gi(b) − vi(gi(b))                    (6.16)

when criterion i is to be maximised. Of course, the function vi may simply be a constant.
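Assembling concordance and veto gives the skeleton of the ELECTRE I test. The following sketch is ours and is only meant to show the structure of the rule (all criteria maximised, constant veto thresholds as a special case of (6.16)); it does not reproduce any particular implementation.

    # Skeleton of the ELECTRE I outranking test, a minimal sketch.
    def outranks(g_a, g_b, p, threshold, veto):
        # Concordant coalition in favour of a (all criteria maximised).
        c = sum(p_i for ga_i, gb_i, p_i in zip(g_a, g_b, p) if ga_i >= gb_i)
        # No criterion vetoes: by (6.16), a veto occurs when
        # g_a[i] < g_b[i] - veto[i].
        no_veto = all(ga_i >= gb_i - v_i
                      for ga_i, gb_i, v_i in zip(g_a, g_b, veto))
        return c >= threshold and no_veto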

In our case, in view of Thierry’s particular interest in sporty cars, the criterion most likely to yield a veto is acceleration. Although there was no precise indication on setting vetoes in Thierry’s preliminary analysis (section 6.1.2), one might speculate that on the acceleration criterion, pairs such as (28, 29.6), (28.3, 30), (29, 30.4), (29, 30.7) (all evaluations expressed in seconds), and all intervals wider than those listed, lead to a veto (against claiming that the alternative with the higher evaluation could be preferred to the other one since, here, the criterion is to be minimised). If this seems reasonable, then we are not far from accepting a constant veto threshold of about 1.5 or 1.6 seconds. If we decide that there is a veto, with a constant threshold, on the acceleration criterion for differences of 1.5 seconds or more, it means that a car that accelerates from 0 to 100 km/h in 29.6 seconds (as is the case of the Peugeot 309 GTI) could not conceivably outrank a car which does it in 28 (as the Honda Civic does), whatever the evaluations on the other criteria might be. Of course, setting the veto threshold to 1.5 implies that a car needing 30.4 seconds (like the Mazda 323) may not outrank a car that accelerates in 28.9 (like the Opel Astra or the Renault 21) but might very well outrank a car that accelerates in 29 (like the Nissan Sunny) if the performances on the other criteria are superior. Using 1.5 as a veto threshold thus implies that differences of at least 1.5, whether from 28 to 29.6 or from 28.9 to 30.4, have the same consequences in terms of preference. Setting the value of the veto threshold obviously involves some degree of arbitrariness; why not set the threshold at 1.4 seconds, which would imply that the Mazda 323 may not outrank the Nissan Sunny? In such cases, it must be verified whether small variations around the chosen value of a parameter (such as a veto threshold) influence the conclusions in a dramatic manner; if small variations do have a strong influence, detailed investigation is needed in order to decide which setting of the parameter’s value is most appropriate. A related facet of using thresholds is that growing differences that are initially not significant brutally crystallise into significant ones as soon as a crisp threshold is passed; obviously, methods using thresholds may show discontinuities in their consequences, and that is why sensitivity analysis is even more crucial here than with more classical methods. However, the underlying logic is quite similar to that on which statistical tests are based; there as well, conventional levels of significance (like the famous 5% rejection intervals) are widely used to decide whether a hypothesis must be rejected or not. We will allude in the next section to more “gradual” methods that can be designed on the basis of concordance-discordance principles similar to those outlined above.

In order not to be too long, we do not develop the consequences of introducing veto thresholds in our example. Suffice it to say that the outranking relation, its kernel and the derived rankings are not dramatically modified in the present case.

6.4.4 Main features and problems of elementary outranking approaches

The ideas behind the methods analysed above may be summarised as follows. For each pair of alternatives (a, b), it is determined whether a outranks b by comparing their evaluations gi(a) and gi(b) on each point of view i. The pairs of evaluations are compared to intervals that can be viewed as typical of classes of ordered pairs of evaluations on each criterion (for instance, the classes “indifference”, “preference” and “veto”). On the basis of the list of classes to which it belongs for each criterion (its “profile”), the pair (a, b) is declared to be or not to be in the outranking relation.

Note that

• a credibility index of outranking (for instance “weak” and “strong” outranking) may be defined; to each value of the index corresponds a set of profiles; if the profile of the pair (a, b) is one of those associated with a particular value of the credibility of outranking, then the outranking of b by a is assigned this value of the credibility index; there are of course rationality requirements on the sets of profiles associated with the various values of the credibility index; this credibility index is to be interpreted in logical terms: it models the degree to which it is true that there are enough arguments in favour of saying that a is better than b, while there is no strong reason to refute this statement (see the definition of outranking in Section 6.4.2);

• thresholds may be used to determine the classes of differences of preference on each criterion, provided that differences gi(a) − gi(b) equal to such thresholds have the same meaning independently of their location on the scale of criterion i (linearity property);

• the rules for determining whether a outranks b (possibly to some degree of a credibility index) generally involve weights that describe the relative importance of the criteria; these weights are typically used additively, to measure the importance of coalitions of criteria independently of the evaluations of the alternatives.

The result of the construction, i.e. the outranking relation (possibly qualified by a degree of a credibility index), is then exploited in view of a specific type of decision problem (choice, ranking, . . . ). It is supposed to include all the relevant and sure information about preference that could be extracted from the data and from the questions answered by the decision-maker.

Due to the lack of transitivity and acyclicity of outranking relations, procedures are needed to derive a ranking or a choice set from them. In the process of deriving a complete ranking from the outranking relation, the property of independence of irrelevant alternatives (see Chapter 2, where this property is evoked) is lost; this property was satisfied in the construction of the outranking relation, since outranking is decided by looking in turn at the profiles of each pair of alternatives, independently of the rest. Since this independence is a hypothesis of Arrow’s theorem and it is violated, the conclusion of the theorem is not necessarily valid, and one may hope that no criterion plays the role of dictator.

The various procedures that have been proposed for exploiting the outranking relation (for instance, transforming it into a complete ranking) are not above criticism; it is especially difficult to justify them rigorously since they operate on a constructed object, the outranking relation. Since the decision-maker has no direct intuition of this object, one can hardly expect to get reliable answers when questioning him about the properties of this relation. On the other hand, a direct characterisation of the ranking produced by the exploitation of an outranking relation seems out of reach.

Non-compensation

The weights count entirely or not at all in the comparison of two alternatives; a smaller or larger difference in evaluations between alternatives does not matter once a certain threshold is passed. This fact, which was discussed in the second paragraph of section 6.4.3, is sometimes called the non-compensation property of outranking methods. A large difference in favour, say, of a over b on some criterion is of no use in compensating for small differences in favour of b on many criteria, since all that counts for deciding that a outranks b is the list of criteria in favour of a. Vetoes only have a “negative” action, impeding that an outranking be declared. The reader interested in the non-compensation property is referred to Fishburn (1976), Bouyssou and Vansnick (1986) and Bouyssou (1986).

Incomparability and indifference

For some pairs (a, b) it may be the case that neither a outranks b nor the opposite; this can occur not only because of the activation of a veto, but also because the credibility of the outranking of a by b and that of b by a are both insufficiently high. In such a case, a and b are said to be incomparable. This may be interpreted in two different ways. One may advance that some alternatives are too contrasted to be compared. It has been argued, for instance, that comparing a Rolls Royce with a small and cheap car proves impossible because the Rolls Royce is incomparably better on many criteria but is also incomparably more expensive. Another example concerns the comparison of projects that involve the risk of loss of human life: should one prefer a more expensive project with a lower risk or a less expensive one with a higher risk (see Chapter 5, Section 5.3.3, for evaluations of the cost of human losses in various countries)? Other people support the idea that incomparability results from insufficient information: the available information sometimes does not allow one to make up one’s mind on whether a is preferred to b or the converse.

In any case, incomparability should not be assimilated to indifference. Indifference occurs when alternatives are considered almost equivalent; incomparability is more concerned with strongly contrasted alternatives. The treatment of the two categories is quite different in the exploitation phase: indifferent alternatives should appear in the same class of a ranking or in neighbouring ones, while incomparable alternatives may be ranked in classes quite far apart.

6.4.5 Advanced outranking methods: from thresholding towards valued relations

Looking at the variants of the ELECTRE method suggests that there is a general pattern on which they are all built:

• alternatives are considered in pairs, and outranking is ultimately determined on the basis of the profiles of performance of the pair only;

• the differences between the evaluations of a pair of alternatives on each criterion are categorised in discrete classes delimited by thresholds (preference, veto, . . . );

• rules are invoked to decide which combinations of these classes lead to outranking; more generally, there are several grades of outranking (weak and strong in ELECTRE II, . . . ) and rules associate specific combinations of classes with each grade;



• specialised procedures are used to exploit the various grades of outranking in view of supporting the decision process.

Defining the classes through thresholding raises the problem of discontinuity alluded to in the previous section. It is thus appealing to work with continuous classes of differences of preference for each criterion, i.e. directly with valued relations. A value cj(a, b) on the arc (a, b) models the degree to which alternative a is preferred to alternative b on criterion j. These degrees are often interpreted, in a logical fashion, as degrees of credibility of the preference. Each combination of values of the credibility index on the various criteria may then be assigned an overall value of the credibility index for outranking; the outranking relation is thus also valued in such a context.

Dealing with valued relations, and especially combining “values”, raises a question: which operations may meaningfully (or just reasonably) be performed on them? Our analysis of the weighted sum in section 6.2 has taught us that operations that may appear natural rely on strong assumptions, presupposing very detailed information on the preferences.

Consider the following formula, which is used in ELECTRE III, a method leading to a valued outranking relation (see Roy and Bouyssou (1993) or Vincke (1992)), to compute the overall degree of credibility S(a, b) of the outranking of b by a:

S(a, b) = c(a, b)                    if Dj(a, b) ≤ c(a, b) for all j,

S(a, b) = c(a, b) × ∏_{j : Dj(a,b) > c(a,b)} (1 − Dj(a, b)) / (1 − c(a, b))                    otherwise.

In the above formula, Dj(a, b) is a degree of credibility of discordance. We do not enter into the details of how c(a, b) or Dj(a, b) are computed; just remember that they are valued between 0 and 1.

The justification of such a formula is mainly heuristic, in the sense that the response of the formula to the variation of its inputs is not counter-intuitive: when discordance rises, outranking decreases; the converse holds for concordance; when discordance is maximal, there may not be any degree of outranking at all. This does not mean that the formula is fully justified: other formulae with similarly good heuristic behaviour might have been chosen. The weighted sum also has good heuristic properties at first glance, but deeper investigation shows that the values it yields cannot be trusted as a valid representation of the preferences unless additional information is requested from the decision-maker and used to re-code the original evaluations gj. The formula above involves operations such as multiplication and division that suppose that the concordance and discordance indices are plainly cardinal numbers and not simply labels of ordered categories. This is indeed a strong assumption which does not seem to us to be supported by the rest of the approach, in particular by the manner in which the indices are elaborated. In the elementary outranking methods (ELECTRE I and II), much care was taken, for instance, to avoid performing arithmetical operations on the evaluations gi(a); only cuts of the concordance index were considered (which is typically an operation valid for ordinal data); vetoes were used in a very radical fashion. No special attention, comparable to what was needed to build value functions from the evaluations, was paid to the building of the concordance and discordance indices; in particular, nothing guarantees that these indices can be combined by means of arithmetic operations to produce an overall index S representative of a degree of credibility of an outranking. For instance, consider the following two cases, which both lead to an outranking degree of .40:

• the concordance index c(a, b) is equal to .40 and there is no discordance (i.e. Dj(a, b) = 0 for all j);

• the concordant coalition weighs .80 but there is a strong discordance on criterion 1: D1(a, b) = .90 while Dj(a, b) = 0 for all j ≠ 1.

In both cases, the formula yields a degree of outranking of .40. Obviously, another formula with similar heuristic behaviour might have resulted in quite different outputs. Consider for instance the following:

S(a, b) = min{ c(a, b), min{ 1 − Dj(a, b), j = 1, . . . , n } }

On the first case, it also yields an outranking degree of .40, but on the second, the degree falls to .10. It is likely that in some circumstances a decision-maker might find the latter model more appropriate. Note also that the latter formula does not involve arithmetic operations on c(a, b) and the 1 − Dj(a, b)’s, but only ordinal operations, namely taking minima. This means that transforming c(a, b) and the 1 − Dj(a, b)’s by an increasing transformation of the [0, 1] interval would just amount to transforming the original value of S(a, b) by the same transformation. This is not the case with the former formula. Hence, if the information content of c(a, b) and of the 1 − Dj(a, b)’s consists only in the ordering of their values in the [0, 1] interval, then the former formula is not suitable. For a survey of possible ways of aggregating preferences into a valued relation, the reader is referred to chapters 2 and 3 of the book edited by Słowiński (1998).
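The divergence between the two formulae is easy to check numerically; the sketch below (ours) evaluates both on the two cases of the bullet points above.

    # ELECTRE III degree of outranking versus the min-based (ordinal) variant,
    # on the two cases discussed in the text; a minimal sketch.
    def s_electre3(c, D):
        prod = 1.0
        for d in D:
            if d > c:
                prod *= (1 - d) / (1 - c)
        return c * prod

    def s_min(c, D):
        return min(c, min(1 - d for d in D))

    case1 = (.40, [0, 0, 0, 0, 0])     # c = .40, no discordance
    case2 = (.80, [.90, 0, 0, 0, 0])   # c = .80, strong discordance on crit. 1
    print(s_electre3(*case1), s_electre3(*case2))   # about 0.40 and 0.40
    print(s_min(*case1), s_min(*case2))             # 0.40 and 0.10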

The fact that the value obtained for the outranking degree may involve some degree of arbitrariness did not escape Roy and Bouyssou (1993), who explain (p. 417) that the value of the degree of outranking obtained by a formula like the one above should be handled with care; they advocate that thresholds be used when comparing two such values: the outranking of b by a can be considered more credible than the outranking of d by c only if S(a, b) is significantly larger than S(c, d). We agree with this statement, but unfortunately it seems quite difficult to assign a value to a threshold above which the difference S(a, b) − S(c, d) could be claimed to be “significant”.

There are thus two directions that can be followed for taking the objections to the formula of ELECTRE III into account. In the first option, one considers that the meaning of the concordance and discordance degrees is ordinal and one tries to determine a family of aggregation formulae that fulfil basic requirements including compatibility with the ordinal character of concordance and discordance. The other option consists in revising the way concordance and discordance indices are constructed in order to give them a quantitative meaning that allows arithmetic operations to be used for aggregating them. That is, at least tentatively, the option followed in the PROMETHEE methods (see Brans and Vincke (1985) or Vincke (1992)); these methods may be interpreted as aiming towards building a value function on the pairs of alternatives; this function would represent the overall difference in preference between any two alternatives. The way that this function is constructed in practice, however, leaves the door open to remarks analogous to those addressed to the weighted sum in Section 6.2.

6.5 General conclusion

This long chapter has enabled us to travel through the continent of formal methods of decision analysis; by "formal" we mean those methods relying on an explicit mathematical model of the decision-maker's preferences. We neither looked into all methods nor did we explore completely those we looked into. There are other continents that have been almost completely ignored, in particular all the methods that do not rely on a formal modelling of the preferences (see for instance the book edited by Rosenhead (1989) in which various approaches are presented for structuring problems in view of facilitating decision making).

On the particular topic of multi-attribute decision analysis, we may summarise our main conclusions as follows:

• Numbers do not always mean what they seem to. It makes no sense to manipulate raw evaluations without taking the context into account. Numbers may have an ordinal meaning, in which case it cannot be recommended to perform arithmetic operations on them; they may be evaluations on an interval scale or a ratio scale and there are appropriate transformations that are allowed for each type of scale. We have also suggested that the significance of a number may be intermediate between ordinal and cardinal; in that case, the interval separating two evaluations might be given an interpretation: one might take into consideration the fact that intervals are e.g. large, medium or small. Evaluations may also be imprecise and knowing that should influence the way they will be handled. Preference modelling is specifically the activity that deals with the meaning of the data in a decision context.

• Preference modelling does not only take into account objective information linked with the evaluations or with the data, such as the type of scale or the degree of precision or the degree of certainty. It also incorporates subjective information in relation to the preferences of the decision-maker. Even if numeric evaluations actually mean what they seem to, their significance is not immediately in terms of preferences: the interval separating two evaluations must be reinterpreted in terms of difference in preferences.

• The (vague) notion of importance of the criteria and its implementation are strongly model-dependent. Weights and trade-offs should not be elicited in the same manner for all types of models since, e.g., they may or may not depend on the scaling of the criteria.

• There are various types of models that can be used in a decision process.There is no best model; all have their strong points and their weak points.


The choice of a particular approach (including a type of model) should be the result of an evaluation, in a given decision situation, of the chances of being able to elicit the parameters of the corresponding model in a reliable manner; these "chances" obviously depend on several factors including the type and precision of the available data, the way of thinking of the decision-maker and his knowledge of the problem. Another factor that should be considered for choosing a model is the type of information that is wanted as output: the decision-maker needs different information when he has to rank alternatives, when he has to choose among alternatives or when he has to assign them to predefined (ordered) categories (we put the latter problem aside in our discussion of the car choosing case). So, in our view, the ideal decision analyst should master several methodologies for building a model. Notice that additional dimensions make the choice and the construction of a model in group decision making even more difficult; the dynamics of such decision processes is by far more complex, involving conflicts and negotiation aspects; constructing complete formal models in such contexts is not always possible, but it remains that using problem structuring tools (such as cognitive maps) may prove profitable.

• A direct consequence of the possibility of using different models is that the outputs may be discordant or even contradictory. We have encountered such a situation several times in the above study; cars may be ranked in different positions according to the method that is used. This does not puzzle us too much. First of all, because the observed differences appear more as variants than as contradictions; the various outputs are remarkably consistent and the variants can be explained to some extent. Second, the approaches use different concepts and the questions the decision-maker has to answer are accordingly expressed in different languages; this of course induces variability. This is no wonder since the information that decision analysis aims at capturing cannot usually be precisely measured. It is sufficient to recall that experiments have shown that there is much variability in the answers of subjects submitted to the same questions at time intervals. Does this mean that all methods are acceptable? Not at all. There are several criteria of validity. One is that the method has to be accepted in a particular decision situation; this means that the questions asked to the decision-maker must make sense to him and he should not be asked for information he is unable to provide in a reliable manner. There are also internal and external consistency criteria that a method should fulfil. Internal consistency implies making explicit the hypotheses under which data form an acceptable input for a method; the method should then perform operations on the input that are compatible with the supposed properties of the input; this in turn induces an output which enjoys particular properties. External consistency consists in checking whether the available information matches the requirements of acceptable inputs and whether the output may help in the decision process. The main goal of the above study was to illustrate the issue of internal and external validity on a few methods in a specific simple problem.


Besides the above points that are specific to multiple criteria preference models, more general lessons can also be drawn.

• If we consider our trip from the weighted sum to the additive multi-attribute value model in retrospect, we see that much self-confidence, and therefrom much convincing power, can be gained by eliciting conditions under which an approach such as the weighted sum would be legitimate. The analysis is worth the effort because precise concepts (like trade-offs and values) are sculptured through an analysis that also results in methods for eliciting the parameters of the model. Another advantage of theory is to provide us with limits, i.e. conditions under which a model is valid and a method is applicable. From this viewpoint, and although the outranking methods have not been fully characterised, it is worth noticing that their study has recently made theoretical progress (see e.g. Arrow and Raynaud (1986), Bouyssou and Perny (1992), Vincke (1992), Fodor and Roubens (1994), Tsoukias and Vincke (1995), Bouyssou (1996), Marchant (1996), Bouyssou and Pirlot (1997), Pirlot (1997)).

• An advantage of formal models that cannot be overemphasised is that they favour communication. In the course of the decision process, the construction of the model requires that pieces of information, knowledge and priorities that are usually implicit or hidden be brought to light and taken into account; also, the choice of the model reflects the type of available information (more or less certain, precise, quantitative). The result is often a synthesis of what is known and what has been learnt about the decision problem in the process of elaborating the model. The fact that a model is formal also allows for some sort of calculations; in particular, testing to what extent the conclusions are stable when the evaluations of imprecise data are varied is possible within formal models. Once a decision has been made, the model does not lose its utility. It can provide grounds for arguing in favour of or against a decision. It can be adapted to make ulterior decisions in similar contexts.

• The "decisiveness" of the output depends on the "richness" of the information available. If the knowledge is uncertain, imprecise or simply non-quantitative in nature, it may be difficult to build a very strong model; by "strong", we mean a model that clearly suggests a decision as, for instance, those that produce a ranking of the alternatives. Other models (and especially those based on pairwise comparisons of alternatives and verifying the independence of irrelevant alternatives property) are structurally unable to produce a ranking; they may nevertheless be the best possible synthesis of the relevant information in particular decision situations. In any case, even if the model leads to a ranking, the decision is to be taken by the decision-maker and it is not in general an automatic consequence of the model (due for instance to imprecisions in the data that call for a relativisation of the model's prescription). As will be illustrated in greater detail in Chapter 9, the construction of a model is not all of the decision process.


7 DECIDING AUTOMATICALLY: THE EXAMPLE OF RULE BASED CONTROL

7.1 Introduction

The increasing development of automatic systems in most sectors of human activities (e.g. manufacturing, management, medicine, etc.) has progressively led to involving computers in many tasks traditionally reserved to humans, even the more "strategic" ones such as control, evaluation and decision-making. The main function of automatic decision systems is to act as a substitute for humans (decision makers, experts) in the execution of repetitive decision tasks. Such systems can be in charge of all or part of the decision process. The main tasks to be performed by automatic decision systems are collecting information (e.g. by sensors), making a diagnosis of the current situation, selecting relevant actions, and executing and controlling these actions. Automatisation of these tasks requires the elaboration of computational models able to simulate human reasoning. Such models are, in many respects, comparable to those involved in the scientific preparation of human decisions. Indeed, deciding automatically is also a matter of representation, evaluation and comparison. For this reason, we introduce and discuss some very simple techniques used to design rule-based decision/control systems. This is one more opportunity for us to address some important issues linked to descriptive, normative and constructive aspects of mathematical modelling for decision support:

• descriptive aspects: the function of automatic decision systems is, to some extent, to be able to predict, simulate and extrapolate human reasoning and decision-making in an autonomous way. This requires different tasks such as the collection of human expertise, the representation of knowledge, the extraction of rules and the modelling of preferences. For all these activities, the choice of appropriate formal models, symbolical as well as numerical, is crucial in order to describe situations and process information.

• constructive aspects: in most fields of application, there is no completely fixed and well formalised body of knowledge that could be exploited by the analyst responsible for the implementation of a decision system. Valuable information can be obtained from human experts, but this expertise is often very complex and ill-structured, with a lot of "exceptions". Hence, the formal model handling the core of human skill in decision-making must be constructed by the analyst, in close cooperation with experts. They must decide together what type of input should be used, what type of output is needed, and what type of consideration should play a role in linking output to input. One must also decide how to link subjective symbolic information (close to the language of the expert) and objective numeric data that can be accessible to the system.

• normative aspects: it is generally not possible to ask the expert to produce an exhaustive list of situations with their adequate solution. Usually, this type of information is given only for a sample of typical situations, which implies that only a partial model can be constructed. To be fully efficient, this model must be completed with some general principles and rules used by the expert. In order to extrapolate examples as well as expert decision rules in a reasonable way, there is a need for normative principles putting constraints on inference so as to decide what can seriously be inferred by the system from any new input. Hence, the analysis of the formal properties of our model is crucial for the validation of the system.

These three points show how the use of formal models and the analysis of the mathematical properties of the models are crucial in automatic decision-making. In this respect, the modelling exercise discussed here is comparable to those treated in the previous chapters, concerning human decision-making, but includes special features due to the automatisation (stable pre-existing knowledge and preferences, real-time decision-making, a closed and completely autonomous system, etc.). We present a critical introduction to the use of simple formal tools such as fuzzy sets and rule-based systems to model human knowledge and decision rules. We also make explicit the multiple criteria aggregation problems arising in the implementation of these rules and discuss some important issues linked to rule aggregation.

For the sake of illustration, we consider two types of automatic decision systems in this chapter:

• decision systems based on explicit decision rules: such systems are used in practical situations where the decision-maker or the expert is able to make explicit the principles and rules he uses to make a decision. It is also assumed that these rules constitute a consistent body of knowledge, sufficiently exhaustive to reproduce, predict and explain human decisions. Such systems are illustrated in section 7.2 where the control of an automatic watering system is discussed, and in section 7.4 where a decision problem in the context of the automatic control of a food process is briefly presented. In the first case, the decision problem concerns the choice of an appropriate duration for watering, whereas in the second case, it concerns the determination of oven settings aimed at preserving the quality of biscuits.

• decision systems based on implicit decision rules: such systems are used in practical applications for which it is not possible to obtain explicit decision rules. This is very frequent in practice. The main possible reasons for it are the following:

– the decision-maker or the expert is unable to provide sufficiently clear information to construct decision rules, or his expertise is too complex to be simply representable by a consistent set of decision rules,

– the decision-maker or the expert is able to provide a set of decision rules, but these decision rules are not easily expressible using variables that can be observed by the system. A typical example of such a situation occurs in the domain of subjective evaluation (see Grabisch, Guely and Perny 1997) where the quality of a product is defined on the basis of human perception,

– the decision-maker or the expert does not want to reveal his own strategy for making decisions. This can be due to the existence of strategic or confidential information that cannot be revealed, or alternatively because this expertise represents his only competence, making him indispensable to his organisation.

Such systems are illustrated in section 7.3, also in the context of the automatic control of food processes. We will use the problem of controlling biscuit quality during baking as an illustrative case where numerical decision models based on pattern matching procedures can be used to perform a diagnosis of dysfunction and a regulation of the oven, without any explicit rule.

7.2 A System with Explicit Decision Rules

Automatising human decision-making is often a difficult task because of the complexity of the information involved in human reasoning. In some cases, however, the decision-making process is repetitive and well-known so that automatisation becomes feasible. In this section, we would like to consider an interesting subclass of "easy" problems where human decisions can be explained by a small set of decision rules of type:

if X is A and Y is B then Z is C

where the X and Y variables are used to describe the current decision context (input variables) and Z is a variable representing the decision (output variable). Whenever X and Y can be automatically observed by the decision system (e.g. using sensors), human skill and experience in problem solving can be approximated and simulated using the fuzzy control approach (see e.g. Nguyen and Sugeno 1998). Such an approach is based on the use of fuzzy sets and multiple criteria aggregation functions. Our purpose is to emphasise the interest as well as the difficulty of resorting to such formal notions on real practical examples.


7.2.1 Designing a decision system for automatic watering

Let us consider the following case: the owner of a nice estate has the responsibility of watering the family garden, and this task must be performed several times per week. Every evening, the man usually estimates the air temperature and the ground moisture so as to decide the appropriate time required for watering his garden. This amount of time is determined so as to satisfy a twofold objective: on the one hand he wants to preserve the nice aspect of his garden (especially the dahlias put in by his wife at the beginning of the summer) but on the other hand, he does not want to use too much water for this, preferring to allocate his financial resources to more essential activities. Because this small decision problem is very repetitive and also because the occasional gardener does not want to delegate the responsibility of the garden to somebody else, he decided to purchase an automatic watering system. The function of this system is first to check every evening whether watering is necessary or not, and second to determine automatically the watering time required. The implicit aim of the occasional gardener is to obtain a system that implements the same rules as he does; in his mind, this is the best way to really preserve the current beautiful aspect of the garden.

In this case, we need a system able to periodically measure the air temperature and the soil moisture, and a decision module able to determine the appropriate duration of watering, as shown in Figure 7.1.

Figure 7.1: The Decision Module of the Watering System

7.2.2 Linking symbolic and numerical representations

Let t denote the current temperature of the air (in degrees Celsius), and m the moisture of the ground, defined as the water content of the soil. This second quantity, expressed in centigrams per gram (cg/g), corresponds to the ratio:

m = 100 × (x1 − x2) / x2

where x1 is the weight of a soil sample and x2 the weight of the same sample after drying in a low-temperature oven (75–105°C). Assuming the quantities t and m can be observed automatically, they will constitute the input data of the decision module in charge of the computation of the watering time w (expressed in minutes), which is the sole output of the module.
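For instance, a hypothetical sample weighing x1 = 25 g before drying and x2 = 20 g after drying would give m = 100 × (25 − 20)/20 = 25 cg/g.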

Clearly, w must be defined as a function of the input parameters. Thus, we are looking for a function f such that w = f(t, m) that can simulate the usual decisions of the gardener. Function f must be defined so as to include the subjectivity of the gardener both in diagnosis steps (evaluation of the current situation) and in decision-making steps (choice of an appropriate action). A common way to achieve this task is to elicit decision rules from the gardener using a very simple language, as close as possible to the natural language used by the gardener to explain his decision. For instance, we can use propositional logic and define rules of the following form:

If T is A and M is B then W is C

where T and M are descriptive variables used for temperature and soil moisture, W is an output variable used to represent the decision, and A, B, C are linguistic values (labels) used to describe temperature, moisture and watering time respectively. For example, suppose the gardener is able to formulate the following empirical decision rules:

Decision rules provided by the gardener:

R1  if air temperature is Hot and soil moisture is Low then watering time is VeryLong;
R2  if air temperature is Warm and soil moisture is Low then watering time is Long;
R3  if air temperature is Cool and soil moisture is Low then watering time is Long;
R4  if air temperature is Hot and soil moisture is Medium then watering time is Long;
R5  if air temperature is Warm and soil moisture is Medium then watering time is Medium;
R6  if air temperature is Cool and soil moisture is Medium then watering time is Medium;
R7  if air temperature is Hot and soil moisture is High then watering time is Medium;
R8  if air temperature is Warm and soil moisture is High then watering time is Short;
R9  if air temperature is Cool and soil moisture is High then watering time is VeryShort;
R10 if air temperature is Cold then watering time is Zero.

Notice that the elicitation of such rules is usually not straightforward, even if it is the result of a close collaboration with experts in that domain. Indeed, general rules used by experts may appear to be partially inconsistent and must often include explicit exceptions to be fully operational. Even without any inconsistency, the individual acceptance of each rule is not sufficient to validate the whole set of rules. In some situations, unsuitable conclusions may appear, resulting from several inferences due to the coexistence of apparently "reasonable" rules. This makes the validation of a set of rules particularly difficult. Even in the case of control rules where there is no need for chaining inferences (we assume here that the rules directly link inputs (observations) to outputs (decisions)), structuring the expert knowledge so as to obtain a synthesis of the expert rules in the form of a decision table (a table linking outputs to inputs) requires a significant effort. We will show alternative approaches that do not require the explicit formulation of decision rules in Section 7.3.

Now, assuming that the above set of decision rules has been obtained, the problem is the following: supposing the current air temperature and soil moisture are known, how can a watering time be computed from these sentences? In other words, how can f be defined so as to properly reflect the strategy underlying these rules? Some partial answers could be obtained if we could define a formal relation linking the various labels occurring in the decision rules and the physical quantities observable by the system. We can observe that the decision rules are expressed using only three variables, i.e. the air temperature T, the soil moisture M, and the watering time W. Moreover, they all take the following form:

either if T is Ti then W is Wk

or if T is Ti and M is Mj then W is Wk

The possible labels Ti, Mj and Wk for temperature, moisture and watering time are given by the sets Tlabels, Mlabels and Wlabels respectively:

• Tlabels = {Cold, Cool, Warm, Hot}. These labels can be seen as different words used to specify different areas on the temperature scale.

• Mlabels = {Low, Medium, High}. These labels can be seen as different words used to specify different areas on the moisture scale.

• Wlabels = {Zero, VeryShort, Short, Medium, Long, VeryLong}. These labels can be seen as different words used to specify different areas on the time scale.

Using these labels, the rules can be synthesised by the following decision table (see Table 7.1):

Mj \ Ti   Cold         Cool             Warm           Hot
Low       Zero (R10)   Long (R3)        Long (R2)      VeryLong (R1)
Medium    Zero (R10)   Medium (R6)      Medium (R5)    Long (R4)
High      Zero (R10)   VeryShort (R9)   Short (R8)     Medium (R7)

Table 7.1: The decision table of the gardener

This decision table represents a symbolic function F linking Tlabels and Mlabels to Wlabels (Wk = F(Ti, Mj)). Now, we need to produce a numerical translation of function F in order to construct a numerical function f called the "transfer function", whose role is to compute a watering time w from any input (t, m). To build such a function, the standard process consists in the following stages:

1. identify the current state (diagnosis) and provide a symbolic description of this state,


2. activate the relevant decision rules for the current state (inference),

3. synthesise the recommendations induced from the rules and derive a numerical output (decision).

The diagnosis stage consists in identifying the current state of the system using numerical measures and describing this state in the language used by the expert to express his decision rules. The inference stage consists of an activation of the rules whose premises match the description of the current state. The decision stage consists of a synthesis of the various conclusions derived from the rules and the selection of the most appropriate action (at this stage, the selected action is precisely defined by numerical output values). Thus, the definition of the decision function f relies on a symbolic translation of the initial numerical information in the diagnosis stage, a purely symbolic inference implementing the usual decision-making reasoning, and then a numerical translation of the conclusions derived from the rules. The symbolic/numerical translation possibly includes the subjectivity of the decision-maker (perceptions, beliefs, etc.), both in the diagnosis and decision stages. For example, in the gardener example, the subjectivity of the decision-maker is not only expressed in choosing particular decision rules, but also in linking input labels (Tlabels and Mlabels) to observable values chosen on the basis of the temperature and moisture scales. In the decision step, the expert or decision-maker's subjectivity can also be expressed by linking output labels (Wlabels) with elements of the time scale. There are several ways of establishing the symbolic/numeric translation, first in the diagnosis stage and then in the decision stage. In both stages, symbols can be linked to scalars, intervals or fuzzy sets, depending on the level of sophistication of the model. In the following subsections, we present the main basic possibilities and discuss the associated representation and aggregation problems.

7.2.3 Interpreting input labels as scalars

A first and simple way of building the symbolic/numerical correspondence is by asking the decision-maker to associate a typical scalar value to each input label used in the rules. Note that the simplicity of the task is only apparent. An individual, expert or not, may feel uncomfortable in specifying the scalar translation precisely. This is particularly true concerning parameters like "soil moisture" which are not easily perceived by humans and whose qualification requires an important cognitive effort. Even for apparently simpler notions such as temperature and duration, the expert may be reluctant to make a categorical symbolic/scalar translation. If he is nevertheless constrained to produce scalars, he will have to sacrifice a large part of his expertise and the resulting model may lose much of its relevance to the real situation. We will see later how the difficulty can partly be overcome by the use of non-scalar translations of labels. Let us assume now, for the sake of illustration, that the following numerical information has been provided by the expert (see Tables 7.2, 7.3 and 7.4).

A possible way of constructing such tables is to put the expert in various situations, to ask him to qualify each situation with one of the admissible labels, and to measure the observable parameters with gauges so as to make the correspondence. Of course, the reliability of the information elicited with such a process is questionable. The analyst must be aware of the share of arbitrariness attached to such a symbolic/numerical translation. He must keep it in mind during the whole construction of the system and also later in interpreting the outputs of the system.

Tlabels            Cold   Cool   Warm   Hot
Temperatures (°C)  10     20     25     30

Table 7.2: Typical temperatures associated to labels Ti

Mlabels                    Low   Medium   High
Soil water content (cg/g)  10    20       30

Table 7.3: Typical moisture levels associated to labels Mj

Wlabels      VeryShort   Short   Medium   Long   VeryLong
Times (min)  5           10      20       35     60

Table 7.4: Typical times associated to labels Wk

From the above tables of scalars, the rules allow the following reference points to be constructed:

t   30   25   20   30   25   20   30   25   20   10   10   10
m   10   10   10   20   20   20   30   30   30   10   20   30
w   60   35   35   35   20   20   20   10    5    0    0    0

Table 7.5: Typical reference points

Hence, the "transfer function" f linking the watering time w to the pair (t, m) is known for a finite list of cases and must be extrapolated to the entire range of possible inputs (t, m). This leads to a well-known mathematical problem since function f must be defined so as to interpolate points of type (t, m, w) where w = f(t, m). Of course, the solution is not unique and some additional assumptions are necessary to define precisely the surface we are looking for. There is no space in this chapter to discuss the relative interest of the various possible interpolation methods that could be used to obtain f. The simplest method is to perform a linear interpolation from the reference points given in Table 7.5. This implies averaging the outputs associated to the reference points located in the neighbourhood of the observed parameters (t, m). For instance, if the observation is (t, m) = (29, 16), the neighbourhood is given by 4 reference points obtained from rules R1, R2, R4 and R5. This yields points P1 = (30, 10), P2 = (25, 10), P4 = (30, 20) and P5 = (25, 20) with the respective weights 0.32, 0.08, 0.48 and 0.12, the weight ωij of point (xi, yj) being defined by:

ωij = (1 − |29 − xi| / (30 − 25)) × (1 − |16 − yj| / (20 − 10))    (7.1)

The watering times associated to points P1, P2, P4 and P5 are 60, 35, 35 and 20, and therefore the final time obtained by a weighted linear aggregation is 41 minutes and 12 seconds. Performing the same approach for any possible input (t, m) leads to a piecewise linear approximation of function f, see Figure 7.2.

Figure 7.2: Approximation of f by linear interpolation
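As an illustration, here is a minimal Python sketch of this weighted linear interpolation; it is ours, restricted for brevity to the four reference points of Table 7.5 that surround the observation (29, 16):

    # Reference points of Table 7.5 falling in the cell 25 <= t <= 30,
    # 10 <= m <= 20 (rules R1, R2, R4, R5), with their watering times.
    points = {(30, 10): 60, (25, 10): 35, (30, 20): 35, (25, 20): 20}

    def watering_time(t, m):
        # Bilinear weights as in equation (7.1), then a weighted average.
        num = den = 0.0
        for (ti, mj), w in points.items():
            weight = (1 - abs(t - ti) / 5) * (1 - abs(m - mj) / 10)
            num += weight * w
            den += weight
        return num / den

    print(watering_time(29, 16))  # 41.2 minutes, i.e. 41 minutes and 12 seconds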

This piecewise linear interpolation method is however not completely satisfactory. First of all, no information justifies that function f be linear between the points to be interpolated. Many other interpolation methods could be used as well, making a non-linear f possible. For example, one can use more sophisticated interpolation methods based on B-spline functions that produce very smooth surfaces with good continuity and locality properties (see e.g. Bartels, Beatty and Barsky 1987). Moreover, as mentioned above, the definition of reference points from the gardener's rules is far from being easy and other relevant sets of scalar values could be considered as well. As a consequence, the need to interpolate the reference points given in Table 7.5 is itself questionable. Instead of performing an exact interpolation of these points, one may prefer to modify the link between symbols and numerical scales in order to allow symbols to be represented by subsets of plausible numerical values. Thus, reference points are replaced by reference areas in the parameter space (t, m, w), and the interpolation problem must be reformulated. This point is discussed below.


7.2.4 Interpreting input labels as intervals

In the gardener's example, substituting labels Ti and Mj by scalar values on the temperature and moisture scales has the advantage of simplicity. However, it does not provide a complete solution since function f is only known for a finite sample of inputs and requires interpolation to be extended to the entire set of possible inputs. Moreover, in many cases, each label represents a range of values rather than a single value on a numerical scale. In such cases, representing the different labels used in the rules by intervals seems preferable. If the intervals are defined so as to cover all plausible values, any possible input belongs to at least one interval and therefore can be translated into at least one label. Basically, we can distinguish two cases, depending on whether the intervals associated to labels partially overlap or not.

Labels represented by disjoint intervals

Suppose that the gardener is able to divide the temperature scale into consecutive intervals, each corresponding to the most plausible values attached to a label Ti. Assuming this is also possible for the moisture scale, these intervals form a partition of the temperature and moisture scales respectively. Hence, each input (t, m) corresponds to a pair {Ti, Mj} where Ti (resp. Mj) is the label associated to the interval containing t (resp. m). In this case, there is a unique active rule in Table 7.1 and the conclusion is easy to reach. For example, let us consider the following intervals:

Tlabels            Cold         Cool           Warm           Hot
Temperatures (°C)  (−∞, 17.5)   [17.5, 22.5)   [22.5, 27.5)   [27.5, +∞)

Table 7.6: Intervals associated to labels Ti

Mlabels                    Low       Medium     High
Soil water content (cg/g)  [0, 15)   [15, 25)   [25, 100]

Table 7.7: Intervals associated to labels Mj

If (t, m) = (29, 16), then the associated labels are {Hot, Medium} and therefore the only active rule is R4, whose conclusion is "watering time is Long". Thus, if we keep the interpretation of Long given in Table 7.4, the numerical output is 35 minutes.
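This crisp lookup is straightforward to implement. The following sketch is ours (the helper names are not from the book); it encodes Tables 7.6 and 7.7 together with decision table 7.1:

    import bisect

    def t_label(t):
        # Boundaries 17.5, 22.5, 27.5 separate Cold/Cool/Warm/Hot (Table 7.6).
        return ["Cold", "Cool", "Warm", "Hot"][bisect.bisect_right([17.5, 22.5, 27.5], t)]

    def m_label(m):
        # Boundaries 15, 25 separate Low/Medium/High (Table 7.7).
        return ["Low", "Medium", "High"][bisect.bisect_right([15, 25], m)]

    # Decision table 7.1; Cold always yields Zero (rule R10).
    table = {("Cool", "Low"): "Long", ("Warm", "Low"): "Long",
             ("Hot", "Low"): "VeryLong", ("Cool", "Medium"): "Medium",
             ("Warm", "Medium"): "Medium", ("Hot", "Medium"): "Long",
             ("Cool", "High"): "VeryShort", ("Warm", "High"): "Short",
             ("Hot", "High"): "Medium"}

    def decide(t, m):
        if t_label(t) == "Cold":
            return "Zero"
        return table[(t_label(t), m_label(m))]

    print(decide(29, 16))  # 'Long', i.e. 35 minutes via Table 7.4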

This process is simple but has serious drawbacks. The granularity of the language used to describe the current state of the system is poor and many significantly different states are seen as equivalent. This is the case, for example, of the two inputs (17.5, 15) and (22.4, 24.9) that both translate as (Cool, Medium). On the contrary, for some other pairs of inputs that are very similar, the translation diverges. This is the case of (17.4, 14.9) and (17.5, 15) that respectively give (Cold, Low) and (Cool, Medium). In the first case, rule R10 is activated and a zero watering time is decided. In the second case, rule R6 is activated and a medium watering time is recommended, 20 minutes according to Table 7.4. Such discontinuities cannot really be justified and make the output f(t, m) arbitrarily sensitive to the inputs (t, m). This is not suitable because such decision systems are often included in a permanent observation/reaction loop. Suppose for example that several consecutive measurements of temperature and moisture in a stable situation yield different values for parameters t and m due to the imperfection of gauges, and that these variations occur around a point of discontinuity of the system. This can produce alternated sequences of outputs such as Short, Zero, Medium, Zero, leading to alternate starts and stops of the system, and possibly to dysfunctions.

It is true that narrowing the intervals and multiplying the labels would reduce these drawbacks and refine the granularity of the description, but the number of rules necessary to characterise f would grow significantly with the number of labels. Expressing so many labels and rules requires a very important cognitive effort that cannot reasonably be expected from the expert. Nevertheless, reducing the discontinuity induced by interval boundaries without multiplying labels is possible. A first option for this is allowing for overlap between consecutive intervals, as shown below.

Labels represented by overlapping intervals

In order to improve on the previous solution, we have to specify more carefully the links between the values of the physical variables describing the system and the symbolic labels used to describe its current state. Since it is difficult to separate such intervals with precise boundaries, one can make them partially overlap. As a consequence, in some intermediary areas of the temperature scale, two consecutive labels are associated to a given temperature, reflecting the possible hesitation of the gardener in the choice of a unique label. Typically, if Warm and Hot are represented by the intervals [20, 30] and [25, +∞) respectively, 29°C becomes a temperature compatible with the two labels. More precisely, from 20°C to 25°C, Warm is a valid label (a possible source of rule activation) but not Hot, from 25°C to 30°C both labels are valid, and from 30°C on, Hot is valid but not Warm. This progressive transition refines the initial sharp transition from Warm to Hot by introducing an intermediary state corresponding to an hesitation between the two labels. This is more realistic, especially because there is no reasonable way of separating "warm" and "hot" with a precise boundary. Note however that measuring a temperature of 29°C possibly allows several rules to be active at the same time. This raises a new problem since these rules may yield diverging recommendations from which a synthesis must be derived. Any output label (the labels Wk in the example) must be translated into numbers and these numbers must be aggregated to obtain the numerical output of the system (the value of w in the example). Thus, the definition of a numerical output can be seen as an aggregation problem, where aggregation is used to interpolate between conflicting rules. As an illustration, we assume now that the labels are represented by the intervals given in Tables 7.8 and 7.9:

Tlabels            Cold       Cool       Warm       Hot
Temperatures (°C)  (−∞, 20]   [15, 25]   [20, 30]   [25, +∞)

Table 7.8: Intervals associated to labels Ti

Mlabels                    Low       Medium     High
Soil water content (cg/g)  [0, 20]   [10, 30]   [20, 100]

Table 7.9: Intervals associated to labels Mj

If the observation of the current situation is t = 29°C and m = 16 cg/g, the relevant labels are {Warm, Hot} for temperature and {Low, Medium} for moisture. These qualitative labels allow some of the gardener's rules to be activated, namely R1, R2, R4, R5. This gives several symbolic values for the watering duration, namely Medium (by R5), Long (by R2, R4) and VeryLong (by R1). Therefore, we can observe 3 conflicting recommendations and the final decision must be derived from a synthesis of these results. Of course, defining what could be a fair synthesis of conflicting qualitative outputs is not an easy task. Deriving a numerical duration from this synthesis is not any easier.

A simple idea is to process symbols as numbers. For this, one can link symbolic and numerical information using Table 7.4. In the example, we obtain three different durations, i.e. 20, 35 and 60 minutes, that must be aggregated. For example, one can calculate the arithmetic mean of the 3 outputs. More generally, we can define a weight ω(R) for each decision rule R in the gardener's database B. This weight represents the activity of the rule and, by convention, for any state (t, m), we set ω(R) = 1 when the decision rule R is activated and ω(R) = 0 otherwise. Let B(α) denote the subset of rules concluding to a watering time α. For any possible value α of w, a weight ω(α) measuring the activity or importance of the set B(α) can be defined as a continuous and increasing function of the quantities ω(R), R ∈ B(α). For example, we can choose:

ω(α) = sup_{R ∈ B(α)} ω(R)    (7.2)

Hence, each watering time activated by at least one rule receives the weight 1 and any other time receives the weight 0. For example, with the observation (t, m) = (29, 16), we have seen that the active rules are R1, R2, R4 and R5 and therefore ω(R1) = ω(R2) = ω(R4) = ω(R5) = 1 whereas ω(R) = 0 for any other rule R. Let us now present in detail the calculation of ω(35). Since 35 (minutes) is the scalar translation of Long, we obtain from the gardener's rules B(35) = {R2, R3, R4}. Hence ω(35) = sup{ω(R2), ω(R3), ω(R4)} = 1. Similarly, we get ω(20) = 1 thanks to R5 and ω(60) = 1 thanks to R1. Because there are no active rules left, ω(α) = 0 for all other α.

Another option, taking account of the number of rules supporting each time α, could be:

ω(α) = ∑_{R ∈ B(α)} ω(R)    (7.3)

Coming back to the example, we now obtain ω(35) = ω(R2) + ω(R3) + ω(R4) = 2 whereas the other weights ω(α) remain unchanged. This second option gives more importance to a time α supported by several rules than to a time α′ supported by a single rule. Everything works as if each active rule was voting for a time. The more a given time is supported by the set of active rules, the more important it becomes in the calculation of the final watering time. Option (7.3) could be preferred when the activations of the various rules are independent. On the contrary, when the activation of a subset of rules necessarily implies that another subset of rules is also active, one could prefer resorting to (7.2) so as to avoid possible overweighting due to redundancy in the set of rules. In a practical situation, one can easily imagine that the choice between these options is not easy to justify.

Since there is a finite number of rules, there is only a finite number of times activated by the rules in a given state. In order to synthesise these different times, the most popular approach is the "centre of gravity" method which amounts to performing a weighted sum (see also chapter 6) of all possible times α. Formally, the final output is defined by:

w = ( ∑_α ω(α) · α ) / ( ∑_α ω(α) )    (7.4)

From the observation (t, m) = (29, 16), equations (7.2) and (7.4) yield a watering time of (60 + 35 + 20)/3, i.e. 38 minutes and 20 seconds, whereas equation (7.3) yields w = 0.25 × (60 + 35 + 35 + 20), which amounts to 37 minutes and 30 seconds. Note that the choice of a weighted sum as final aggregator in equation (7.4) is questionable and one could formulate criticisms similar to those addressed to the weighted average in the previous chapters (especially in chapter 6).
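The two weighting options are easy to compare on this example. In the sketch below (ours; the dictionaries simply transcribe Table 7.4 and the rules active at (29, 16)), option (7.2) and option (7.3) are applied in turn before the centre of gravity of equation (7.4):

    # Scalar translation of the output labels (Table 7.4).
    times = {"VeryShort": 5, "Short": 10, "Medium": 20, "Long": 35, "VeryLong": 60}

    # Conclusions of the rules active at (t, m) = (29, 16): R1, R2, R4, R5.
    active = {"R1": "VeryLong", "R2": "Long", "R4": "Long", "R5": "Medium"}

    def centre_of_gravity(weights):
        # Equation (7.4): weighted average of the candidate times.
        return sum(w * a for a, w in weights.items()) / sum(weights.values())

    # Option (7.2): every time activated by at least one rule gets weight 1.
    w_sup = {times[label]: 1 for label in active.values()}
    print(centre_of_gravity(w_sup))  # (60 + 35 + 20) / 3 = 38.33... minutes

    # Option (7.3): each active rule votes, so Long is counted twice.
    w_sum = {}
    for label in active.values():
        w_sum[times[label]] = w_sum.get(times[label], 0) + 1
    print(centre_of_gravity(w_sum))  # (60 + 35 + 35 + 20) / 4 = 37.5 minutes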

In this approach, as in the linear interpolation approach used in the previous subsection, the final result has been obtained through the following sequence:

1. read the current values of input parameters t and m

2. find the symbolic qualifiers that best fit these values

3. detect the decision rules activated by these observations

4. collect the symbolic outputs resulting from the inferences

5. translate symbols into quantitative numerical outputs

6. aggregate these numerical outputs


This process is perhaps the most elementary way of using a set of symbolic decision rules to build a numerical decision function. It is a simple illustration of the so-called "computing with words" paradigm advocated by Zadeh (see Zadeh 1999). The main advantages of such a process are the following:

• it relies on simple decision rules expressed in a language close to the natural language used by the expert,

• it allows one to define a reasonable decision function allowing numerical outputs to be computed from any possible numerical input,

• if necessary, any decision can be explained very simply. The outputs can always be presented as a compromise between recommendations derived from several of the expert's decision rules.

Nevertheless, interpreting labels as intervals does not really prevent discontinuous transfers from inputs to outputs. In fact, it is not easy to describe a continuum of states (characterised by all pairs (t, m) in the gardener example) with a finite number of labels of type (Ti, Mj). This induces arbitrary choices in the description of the current state which could disrupt the diagnosis stage and make the automatic decision process discontinuous, as shown by the following example.

Example (1). Consider two very similar states s1 and s2 characterised by the observations (t, m) = (25.01, 19.99) and (t, m) = (24.99, 20.01). According to Tables 7.8 and 7.9, state s1 makes the labels {Warm, Hot} valid for temperature, and {Low, Medium} for soil moisture. This activates rules R1, R2, R4 and R5 whose recommendations are VeryLong, Long, Long, Medium respectively. The resulting watering time obtained by equation (7.4) is therefore 37 minutes and 30 seconds. Things are really different for s2 however. The valid labels are {Cool, Warm} for temperature, and {Medium, High} for soil moisture. This activates rules R5, R6, R8 and R9 whose recommendations are Medium, Medium, Short, VeryShort respectively. The resulting watering time obtained by equation (7.4) is therefore 13 minutes and 45 seconds. It is worth noting that, despite the close similarity between states s1 and s2, there is a significant difference in the watering times computed from the two input vectors. This is due to the discontinuity, at (t, m) = (25, 20), of the transfer function that defines the watering time from the input (t, m). In the right neighbourhood of this entry (t > 25 and m < 20), the decision rules R1, R2 and R4 are fully active, but this is no longer the case in the left neighbourhood of the point (t < 25 and m > 20) where they are replaced by rules R6, R8 and R9, thus leading to a much shorter time. The activations and computations performed for s1 and s2 differ significantly. They lead to very different outputs, despite the similarity of the states. ♦
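This jump can be reproduced in a few lines. The sketch below is ours; it assumes all-or-nothing activation with option (7.3), reads Zero as 0 minutes, and encodes Tables 7.8/7.9 together with decision table 7.1 translated by Table 7.4:

    t_iv = {"Cold": (float("-inf"), 20), "Cool": (15, 25),
            "Warm": (20, 30), "Hot": (25, float("inf"))}
    m_iv = {"Low": (0, 20), "Medium": (10, 30), "High": (20, 100)}
    table = {("Cold", "Low"): 0, ("Cold", "Medium"): 0, ("Cold", "High"): 0,
             ("Cool", "Low"): 35, ("Warm", "Low"): 35, ("Hot", "Low"): 60,
             ("Cool", "Medium"): 20, ("Warm", "Medium"): 20, ("Hot", "Medium"): 35,
             ("Cool", "High"): 5, ("Warm", "High"): 10, ("Hot", "High"): 20}

    def decide(t, m):
        # Each rule whose labels contain (t, m) votes; equation (7.4) averages.
        votes = [table[(ti, mj)]
                 for ti, (lo, hi) in t_iv.items() if lo <= t <= hi
                 for mj, (lo2, hi2) in m_iv.items() if lo2 <= m <= hi2]
        return sum(votes) / len(votes)

    print(decide(25.01, 19.99))  # 37.5  minutes (R1, R2, R4, R5 active)
    print(decide(24.99, 20.01))  # 13.75 minutes (R5, R6, R8, R9 active)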

This criticism is serious, but the difficulty can partly be overcome. It is true that, depending on the choice of the numerical encoding of the labels, the numerical outputs resulting from the decision rules may vary significantly. Since the numerical/symbolic and then symbolic/numerical translations are both sources of arbitrariness, the following question can be raised: why not use numbers directly?


There are two partial answers: first, in many decision contexts, the possibility of justifying decisions is a great advantage. Although this is not crucial in our illustrative example, the ability of automatic decision systems to simulate human reasoning and explain decisions by rules is generally seen as an important advantage. This argument often justifies the use of rule-based systems to automatise decision-making, even if each decision considered separately is of marginal importance. Second, there are several ways of improving the process proposed above and of refining the formal relationship between qualitative labels and numerical values. It is not our purpose to cover all possibilities in detail. We only present and discuss some very simple and intuitive ideas used to construct more sophisticated models and tools in this context.

7.2.5 Interpreting input labels as fuzzy intervals

One step back in the modelling process, we can redefine more precisely the relationship between a given label and the numerical scale associated to it. As an expert, the gardener can easily specify the typical temperatures associated with each label. He can also define areas that are definitely not concerned with each label. For example, he could explain that Warm means between 20 and 30 degrees with 25 as the most plausible value. More precisely, one can define the relative likelihood of each temperature when the temperature has been qualified as Hot, Warm, Cool or Cold. In this case, each label Ti is represented by a [0, 1]-valued function µTi defined on the temperature scale in such a way that µTi(t) represents the compatibility degree between temperature t and label Ti. As a convention, we set µTi(t) = 0 when temperature t is not connected to the label Ti, and µTi(t) = 1 when t is perfectly representative of the label. Thus, each label Ti is defined with fuzzy boundaries and characterised by the function µTi. These fuzzy labels can partially overlap but they must be defined in such a way that any part of the temperature scale is covered by at least one label. A simple example of such fuzzy labels is represented in Figures 7.3 and 7.4.

Figure 7.3: Fuzzy labels for the air temperature

Note that sometimes the fuzzy labels are defined in such a way that the memberships add up to 1 for any possible value of the numerical parameter. This is the case of the labels defined in Figure 7.4, for which we have:

∀m ≥ 0, µLow(m) + µMedium(m) + µHigh(m) = 1    (7.5)


Figure 7.4: Fuzzy labels for the soil moisture

Property (7.5) is the numerical translation of a natural condition requiring that the fuzzy labels Low, Medium and High form a partition of the set of possible moistures. Note however that this property makes sense only when membership values have a cardinal meaning.
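For instance, at m = 16 cg/g the memberships used below in Table 7.10, namely µLow(16) = 0.4, µMedium(16) = 0.6 and µHigh(16) = 0, indeed add up to 1.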

With such fuzzy labels, each decision rule can be activated to a certain degree. This is the degree to which the numerical inputs match the premises of the rule. More precisely, for any rule Rij of type:

if T is Ti and M is Mj then W is Wk

where Wk = F(Ti, Mj), and for any numerical observation (t, m), the weight (or activation degree) ωij of the rule Rij reflects the importance (or relevance) of the rule in the current situation. This importance depends on the matching of the input (t, m) and the premise (Ti, Mj). It is therefore natural to state:

ωij = h(µTi(t), µMj(m))    (7.6)

where h is an aggregation function representing the logical "and" used in the rule, e.g. h(x, y) = min(x, y).

As a numerical example, consider the gardener's rule R1. The observation (t, m) = (29, 16) leads to µHot(t) = 0.8 and µLow(m) = 0.4. Thus, the temperature is Hot to the degree 0.8 and the moisture is Low to the degree 0.4; therefore, the weight of rule R1 is min(0.8, 0.4) = 0.4. Using this approach for each rule with h = min yields the following activation weights (see Table 7.10):

ωij          Ti:   Cold      Cool      Warm       Hot
Mj   µMj \ µTi:    0         0         0.2        0.8
Low      0.4       0 (R10)   0 (R3)    0.2 (R2)   0.4 (R1)
Medium   0.6       0 (R10)   0 (R6)    0.2 (R5)   0.6 (R4)
High     0         0 (R10)   0 (R9)    0 (R8)     0 (R7)

Table 7.10: The weights of the rules when (t,m) = (29, 16)


Hence, from equation (7.4) we get

w = (0.4 × 60 + 0.2 × 35 + 0.6 × 35 + 0.2 × 20) / (0.4 + 0.2 + 0.6 + 0.2)

and therefore the watering time is 40 minutes.
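The 40-minute figure can be reproduced with a few lines of Python. In the sketch below, the triangular membership shapes are our assumption (the figures are not reproduced here); they are calibrated only to return the degrees 0.8, 0.2, 0.4 and 0.6 quoted above:

    def tri(x, a, b, c):
        # Triangular membership with support [a, c] and peak at b.
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    t, m = 29, 16
    mu_T = {"Warm": tri(t, 20, 25, 30),
            "Hot": min(1.0, max(0.0, (t - 25) / 5))}       # 0.2 and 0.8
    mu_M = {"Low": max(0.0, min(1.0, (20 - m) / 10)),
            "Medium": tri(m, 10, 20, 30)}                  # 0.4 and 0.6

    times = {"Medium": 20, "Long": 35, "VeryLong": 60}     # Table 7.4
    rules = [("Hot", "Low", "VeryLong"), ("Warm", "Low", "Long"),      # R1, R2
             ("Hot", "Medium", "Long"), ("Warm", "Medium", "Medium")]  # R4, R5

    num = den = 0.0
    for ti, mj, wk in rules:
        weight = min(mu_T[ti], mu_M[mj])  # equation (7.6) with h = min
        num += weight * times[wk]
        den += weight
    print(num / den)  # 56 / 1.4 = 40.0 minutes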

Note that the definition of an aggregation function yields a compromise solution between the various active decision rules whose outputs are partially conflicting. In the additive formulation characterised by equation (7.4), everything works as if each active rule was voting for one candidate chosen in the set Wlabels. The more the premise of the rule matches the current situation, the more important the rule is in the voting process. The activation level of each rule is graduated on the [0, 1] scale and the weights directly reflect the adequacy of the rule in the current situation. This enables a soft control of the output that can be perfectly illustrated by the example discussed at the end of subsection 7.2.4. If we consider the two neighbour states s1 and s2 introduced in this example, and if we choose h = min in equation (7.6), the resulting activation weights are those given in Tables 7.11 and 7.12.

ωij          Ti:   Cold      Cool      Warm         Hot
Mj   µMj \ µTi:    0         0         0.998        0.002
Low      0.001     0 (R10)   0 (R3)    0.001 (R2)   0.001 (R1)
Medium   0.999     0 (R10)   0 (R6)    0.998 (R5)   0.002 (R4)
High     0         0 (R10)   0 (R9)    0 (R8)       0 (R7)

Table 7.11: The weights of rules when (t,m) = (25.01, 19.99)

ωij          Ti:   Cold      Cool         Warm         Hot
Mj   µMj \ µTi:    0         0.002        0.998        0
Low      0         0 (R10)   0 (R3)       0 (R2)       0 (R1)
Medium   0.999     0 (R10)   0.002 (R6)   0.998 (R5)   0 (R4)
High     0.001     0 (R10)   0.001 (R9)   0.001 (R8)   0 (R7)

Table 7.12: The weights of rules when (t,m) = (24.99, 20.01)

Hence, using equation (7.4) and Table 7.4, we get w(s1) = 20 minutes and 5 seconds as the final output. Similarly, for state s2, the activations of the rules obtained from equation (7.6) are only slightly different from those for s1 and the final output derived from Table 7.12 using equation (7.4) gives w(s2) = 19 minutes and 58 seconds. Here, we notice that the activity of each rule does not vary significantly when passing from state s1 to state s2. This is due to the way activation weights are defined and used in the process. These weights depend continuously on the input parameters t and m, and the membership functions defining the labels have soft variations. As a consequence, since the aggregation function used to derive the final watering time w is also a continuous function of the quantities ω(R) (see equation (7.4)), the quantity w depends continuously on the input parameters t and m. This explains the observed improvement with respect to the previous model based on the use of all-or-nothing activation rules. Thus, the use of fuzzy labels to interpret input labels has a significant advantage: it makes it possible to define a continuous transformation of numerical input data (temperature, moisture) into the symbolic variables used in decision rules. The resulting decision system is more realistic and robust to slight variations of the inputs. This advantage is due to the use of fuzzy sets and has greatly contributed to the practical success of the fuzzy approach in automatic control (fuzzy control, see e.g. Mamdani 1981, Sugeno 1985, Bouchon 1995, Gacogne 1997, Nguyen and Sugeno 1998). However, several criticisms can be addressed to the small fuzzy decision module presented above. Among them, let us mention the following:

• the choice h = min in equation (7.6) requires that quantities of type µTi(t) and µMj(m) are commensurate. This assumption, which is rarely made explicit, is very strong because it requires much more than comparing the relative fit of two temperatures (resp. two moistures) to a label Ti (resp. Mj). It also requires comparing the fit of any temperature to any label Ti with the fit of any moisture to any label Mj. A perfectly sound definition of such membership values would require more information than can easily be obtained in practice. Moreover, the choice of min is often justified by the fact that h is used to evaluate a conjunction between the several premises of a given rule (a conjunction of the type “temperature is Ti and moisture is Mj”). Note however that the idea of conjunction is captured by any other t-norm (see for instance Fodor and Roubens (1994)). Thus, the product could perhaps replace the min, and the particular choice of the min is not straightforward. This is problematic because this choice is not without consequence on the definition of the watering time.

• the interpretation, as scalar values, of the symbolic labels used to describe the outputs of the rules is not easy to justify. Why not use a description of these labels as intervals, in the same way as for the input labels?

The last criticism suggests an improvement of the current system. We have to refine the previous construction so as to improve the output processing. Paralleling the treatment of symbolic inputs, we can use intervals or fuzzy intervals later in the process so as to continuously link the symbolic outputs of the rules (Wlabels) to numerical outputs (watering times). This point is discussed in the next subsection.

7.2.6 Interpreting output labels as (fuzzy) intervals

Suppose for example that the Wlabels are no longer described by scalar values but by subsets of the time scale. For instance, the labels Wk could be represented by a set of intervals (overlapping or not), with advantages similar to those mentioned for the input labels Ti and Mj. More generally, we assume here that the Wlabels are


represented by fuzzy intervals of the time scale. For the sake of illustration, let us consider the labels represented in Figure 7.5.

Figure 7.5: Fuzzy labels for the watering time

For any state (t, m) of the system, the range of relevant watering times is the union of all values compatible with the labels Wk derived from the active rules. In the example, the active rules are R1, R2, R4, R5, and the Wlabels concerned are therefore “Medium”, “Long” and “VeryLong”. Hence the set of relevant watering times is [10, 70]. However, all times are not equivalent inside this set. Each of them represents a possible numerical translation of a label Wk obtained by the activation of one or several rules. To be fully considered, a time must be perfectly representative of a label Wk that has been obtained by a fully active rule. In more nuanced situations, the weight attached to a possible time depends on how well that time represents the labels activated, to a certain degree, by the rules. For example, by analogy with Mamdani’s approach to fuzzy control (Mamdani 1981), the weight of any watering time α can be defined by:

ωt,m(α) = sup_{Rij ∈ B} h(µTi(t), µMj(m), µWk(α))     (7.7)

where B represents the set of rules (here the gardener’s rules) and Rij represents the rule:

If T = Ti and M = Mj then W = Wk

and h is a non-decreasing function of its arguments (in Mamdani’s approach, h = min). The idea in equation (7.7) is that a watering time α must receive an important weight when there is at least one rule Rij whose premises (Ti, Mj) are valid for the observation (t, m) and whose conclusion Wk is compatible with α. This explains why ωt,m(α) is defined as an increasing function of the quantities µTi(t), µMj(m) and µWk(α). Notice that equation (7.7) is a natural extension of equation (7.2). In our example, the observation (t, m) = (29, 16) leads to a function ω29,16(w) represented in Figure 7.6.

Figure 7.6: Weighted times induced by rules

In order to obtain a precise watering time, we can use an equation similar to (7.4). However, this equation must be generalised because there may be an infinity of times activated by the rules (e.g. a whole interval). The usual extension of the weighted average to an infinite set of values is given by the following integral:

w = ( ∫ α ωt,m(α) dα ) / ( ∫ ωt,m(α) dα )     (7.8)



that can be approximated by the following quantity:

w = ( ∑i ωt,m(αi) · αi ) / ( ∑i ωt,m(αi) )     (7.9)

where (αi) is a strictly increasing sequence of times resulting from a fine discretisation of the time scale. In our example, a discretisation with step 0.1 gives a final time of 37 minutes and 32 seconds.
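The following sketch illustrates this computation chain (equations (7.7) and (7.9) with h = min, in the spirit of Mamdani’s approach): each fuzzy output label is clipped at the activation level of its rule, the clipped labels are combined by taking the pointwise supremum, and the result is defuzzified by the discretised centroid. The trapezoidal label parameters and activation levels are illustrative (only the support and core of “Long” are taken from the text), so the printed time is not the 37 minutes and 32 seconds of the example.

```python
# A sketch of equations (7.7) and (7.9) with h = min: clip each fuzzy output
# label at its rule's activation level, take the pointwise supremum over the
# rules, then compute the discretised centroid. Labels are illustrative.

import numpy as np

def trapezoid(x, a, b, c, d):
    # membership in a fuzzy interval with support [a, d] and core [b, c]
    return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0.0, 1.0)

# (activation level h(...) of the rule, parameters of its fuzzy output label)
active_rules = [
    (0.4, (10, 20, 25, 35)),   # a "Medium"-like label (hypothetical)
    (0.6, (20, 30, 40, 55)),   # "Long": support [20, 55], core [30, 40]
]

alphas = np.arange(0.0, 70.0, 0.1)          # fine discretisation of the time scale
omega = np.zeros_like(alphas)
for level, label in active_rules:
    omega = np.maximum(omega, np.minimum(level, trapezoid(alphas, *label)))

w = np.sum(omega * alphas) / np.sum(omega)  # equation (7.9)
print(round(float(w), 1))                   # the watering time, in minutes
```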

This last sophistication meets our objective because it provides a transfer function f with good continuity properties. However, the use of equations (7.7–7.9) can be seriously criticised:

• the definition of ωt,m(α) proposed in equation (7.7) from an increasing aggregation function h is not very natural. Indeed, bearing in mind the form of rule Rij, the quantity h(µTi(t), µMj(m), µWk(α)) stands for the numerical translation of the proposition:

(Ti = t and Mj = m) implies Wk = α

In the fields of multi-valued logic and fuzzy set theory, the admissible functions used to translate implications are required to be non-increasing with respect to the value of the left-hand side of the implication and non-decreasing with respect to the value of the right-hand side (Fodor and Roubens 1994, Bouchon 1995, Perny and Pomerol 1999). As an example, the value attached to the sentence “A implies B” can be defined by the Lukasiewicz implication min(1 − v(A) + v(B), 1), where v(A) and v(B) are the values of A and B respectively. In our case, the joint use of the min operator to interpret the conjunction on the left-hand side and of the Lukasiewicz implication to interpret the implication would lead to the following h function:

h(x, y, z) = min(1 − min(x, y) + z, 1)

Note that this function is not increasing in its arguments, as required above in the text. However, resorting to implication operators instead of conjunctions in order to implement an inference via rule Rij also seems legitimate. This


is usual in the field of fuzzy inference and approximate reasoning, where a formula like (7.7) is used to generalise the so-called “modus ponens” inference rule (Zadeh 1979, Baldwin 1979, Dubois and Prade 1988, Bouchon 1995). To go further in this direction, one could also discuss the use of min to interpret a conjunction whereas the Lukasiewicz implication is used to interpret implications. A reasonable alternative to min(x, y) could be the Lukasiewicz t-norm max(x + y − 1, 0). As a conclusion, the definition of h is not straightforward and must be justified in the context of the application. Some general guidelines for choosing a suitable h are given in (Bouchon 1995).

• Equation (7.7) requires even more commensurability than equation (7.6). Now, inequalities of the type µTi(t) > µWk(α) play a role in the process. Thus, we should be able to determine whether any temperature t is a better representative of a label Ti than a time α is representative of a label Wk. This is a very strong assumption, especially if we consider the way these labels are represented in the model. Usually, a label thought of as a fuzzy interval is assessed on the basis of 3 elements:

– the support, i.e. the interval of all numerical values compatible with the label; their membership must be strictly positive,

– the core, i.e. the interval of all numerical values perfectly representative of the label (the core is a subset of the support); their membership is equal to 1,

– the membership function, making a continuous transition from the border of the support to the border of the core.

For example, the label Long in Figure 7.5 is defined by support [20, 55], core [30, 40] and two linear transitions (from membership to non-membership) in the range [20, 30] ∪ [40, 55]. One could expect the decision-maker to be able to specify the support and the core of each fuzzy label, as well as the trend of the membership function (increasing from the border of the support to the border of the core). Even with this information, however, the choice of a precise membership function often remains arbitrary: the above information leaves room for an infinity of functions. In practice, the shape of the membership function in the transition area is often chosen as linear or gaussian (for differentiability) but rarely justified by questioning the decision-maker. Thus, in many cases, the only reliable information contained in the membership function is the relative adequacy of each temperature, moisture or time to each label. For example, µLong(21) = 0.1 and µLong(25) = 0.5 only means that 25 minutes is a better numerical translation of the qualifier Long than 21 minutes. It does not necessarily mean that 25 minutes is more Long than 30 minutes is Medium, even if µMedium(30) = 0.4, nor that 25 minutes is more Long than 26°C is Hot, even if µHot(26°C) = 0.2. However, without such assumptions, the definition of the weights ωt,m(α) in equation (7.7) with h = min is difficult to justify.


• Bearing in mind that the weights ωt,m are used as cardinal weights in (7.4) while they are defined from the membership values µTi(t), µMj(m) and µWk(α), the membership values should have a cardinal interpretation. This is one more very strong hypothesis. For example, we need to consider that 25 minutes is 5 times better than 21 minutes at representing “Long”, because its membership value is 5 times larger. Even when the commensurability assumption between membership scales is realistic, the weights cannot necessarily be interpreted as cardinal values, and the weighted aggregation proposed in equation (7.8) is questionable.

As an illustration of the latter, consider the following example showing the impact of an increasing transformation of membership values on the output watering time:

Example 2. Consider the two following input vectors: i1 = (29, 29) and i2 = (18, 16). These two inputs lead to the activation weights given in Tables 7.13 and 7.14. For the sake of simplicity, we use the non-fuzzy labels given in Table 7.4 for the interpretation of the labels Wk. Then, assuming we use equations (7.2) and (7.4) to define the watering time w, we obtain the following result: w(i1) = 19 minutes and 33 seconds and w(i2) = 21 minutes and 40 seconds. Notice that the times are not so different, despite the important difference between inputs i1 and i2. This is easily explained by observing that, in the second case, the temperature is lower, but the soil water content is also lower, and the two aspects compensate each other. Now, we transform all membership functions of the labels by the function φ(x) = ∛x (the cube root). This preserves the support and the core of each label, as well as the slope (increasing or decreasing) of the membership functions; in fact, it represents the same ordinal information about membership degrees. However, the activation tables are altered as shown in Tables 7.15 and 7.16. This gives the following watering times: w(i1) = 20 minutes and 34 seconds, w(i2) = 19 minutes and 42 seconds. Note that we now have w(i1) > w(i2), whereas it was just the opposite before the transformation of the membership values. ♦

ωij                Ti:   Cold       Cool       Warm         Hot
Mj      µMj \ µTi        0          0          0.2          0.8
Low     0                0 (R10)    0 (R3)     0 (R2)       0 (R1)
Medium  0.1              0 (R10)    0 (R6)     0.1 (R5)     0.1 (R4)
High    0.9              0 (R10)    0 (R9)     0.2 (R8)     0.8 (R7)

Table 7.13: The weights of the rules for input i1



ωij                Ti:   Cold        Cool       Warm       Hot
Mj      µMj \ µTi        0.2         0.6        0          0
Low     0.4              0.2 (R10)   0.4 (R3)   0 (R2)     0 (R1)
Medium  0.6              0.2 (R10)   0.6 (R6)   0 (R5)     0 (R4)
High    0                0 (R10)     0 (R9)     0 (R8)     0 (R7)

Table 7.14: The weights of the rules for input i2

ωij                Ti:   Cold       Cool       Warm         Hot
Mj      µMj \ µTi        0          0          0.585        0.928
Low     0                0 (R10)    0 (R3)     0 (R2)       0 (R1)
Medium  0.464            0 (R10)    0 (R6)     0.464 (R5)   0.464 (R4)
High    0.965            0 (R10)    0 (R9)     0.585 (R8)   0.928 (R7)

Table 7.15: The modified weights of the rules for input i1

ωij                Ti:   Cold          Cool         Warm       Hot
Mj      µMj \ µTi        0.585         0.843        0          0
Low     0.737            0.585 (R10)   0.737 (R3)   0 (R2)     0 (R1)
Medium  0.843            0.585 (R10)   0.843 (R6)   0 (R5)     0 (R4)
High    0                0 (R10)       0 (R9)       0 (R8)     0 (R7)

Table 7.16: The modified weights of the rules for input i2

This example shows that the comparison of output values is not invariant under monotonic transformations of the membership values, which reveals the “more than ordinal” interpretation of membership values in the computation of w. Although this inversion of durations is not a crucial problem in the case of the watering system, it could be more problematic in other contexts. For instance, if we use a similar system (based on fuzzy rules) to rank candidates in a competition, the choice of a particular shape for the membership functions must be well justified because it may really change the winner.
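The reversal can be reproduced numerically. The sketch below recomputes the weighted average of equation (7.4) before and after the cube-root transformation, using the activation weights of Tables 7.13 and 7.14; since Table 7.4 is not reproduced here, the scalar watering times attached to the rules are hypothetical, so the exact outputs differ from those of the text, but the order reversal is the same phenomenon.

```python
# Non-invariance of the weighted average under a monotonic transformation of
# the membership values. Rule weights come from Tables 7.13 and 7.14; the
# watering times attached to the rules are hypothetical stand-ins for Table 7.4.

times = {"R3": 25, "R4": 60, "R5": 35, "R6": 25, "R7": 10, "R8": 10, "R10": 0}

w_i1 = {"R4": 0.1, "R5": 0.1, "R7": 0.8, "R8": 0.2}   # Table 7.13
w_i2 = {"R10": 0.2, "R3": 0.4, "R6": 0.6}             # Table 7.14

def watering_time(weights):
    return sum(w * times[r] for r, w in weights.items()) / sum(weights.values())

def cube_root(weights):                                # phi(x) = x^(1/3)
    return {r: w ** (1 / 3) for r, w in weights.items()}

print(watering_time(w_i1) < watering_time(w_i2))                        # True
print(watering_time(cube_root(w_i1)) < watering_time(cube_root(w_i2)))  # False
```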

Another possibility is to resort to other aggregation methods that do not require the same level of information. Several alternatives to the weighted sum are compatible with ordinal weights, e.g. Sugeno integrals (see Sugeno 1977, Dubois and Prade 1987), and could advantageously be used to process ordinal weights. However, they also have some limitations: they are not as discriminating as the weighted sum and they cannot completely avoid commensurability problems (see Dubois, Prade and Sabbadin 1998, Fargier and Perny 1999).

There is no room here to discuss the use of numerical representations in rule-based automatic decision systems further.

To go further with rule-based systems using fuzzy sets, the reader should consult the literature about fuzzy inference and fuzzy control, which has received much attention in the past decades. As a first set of references for theory and applications, one can consult (Mamdani 1981), (Sugeno 1985), (Bouchon 1995), (Gacogne 1997) and (Nguyen and Sugeno 1998) for a recent synthesis on the subject. These works present formal models but also empirical principles derived from practical applications and thus provide a variety of techniques that have proved

Page 179: Bouyssou,Marchant,Pirlot,Perny,Tsoukias,Vincke Evaluation and Decision Models - A Critical Perspective (Kluwer)

170 CHAPTER 7. DECIDING AUTOMATICALLY

efficient in practice. Moreover, some theoretical justifications for the choices of representations and operators are now available, supporting some methods used by engineers in practical applications and also suggesting multiple improvements (see Dubois, Prade and Ughetto 1999).

7.3 A System with Implicit Decision Rules

7.3.1 Controlling the quality of biscuits during baking

The control of food processes is a typical example where humans traditionally play an important role in preserving the standard quality of the product. The overall efficiency of production lines and the quality of the final product highly depend on the ability of human supervisors to identify a degradation of the quality of the final product and on their aptitude to best fit the control parameters to the current situation.

As an example, let us report some elements of an application concerning the control of the quality of biscuits through oven regulation during baking (for more details see Trystram, Perrot and Guely 1995, Perrot, Trystram, Le Guennec and Guely 1996, Perrot 1997, Grabisch et al. 1997).

In the field of biscuit manufacturing, human operators controlling biscuit baking lines have the possibility of regulating the ovens during the baking process. This implies periodic evaluation, diagnosis and decision tasks that could perhaps be automatised. However, such automatisation is not obvious, because human expertise in oven control during the baking of biscuits mainly relies on subjective evaluation, e.g. a visual inspection of the general aspect and the colour of the biscuits, and on the operator’s skill in reacting to possible perturbations of the baking process.

For instance, when an overcooked biscuit is detected, the operator properly retroacts on the oven settings after checking its current temperature. In the case of an automatic system, the only information accessible to the system consists of objective physical parameters obtained from measures and sensors, which are not easily linked to human perception.

In the example of automatic diagnosis during baking, the only available measures are the following:

• a sensor located in the oven measures the air moisture, within the oven, near the biscuit line. The evaluation m is given in cg/g (centigrams per gram of dry matter) in the range [0, 10], the desired values being around 4 cg/g.

• the thickness t of the biscuit is measured every 10 minutes. t is defined as the mean of 6 consecutive measures performed on biscuits and expressed in mm; the desired values are about 33 or 34 mm.

• concerning the biscuit aspect, a colour sensor is located in the oven. It measures colours with 3 parameters, which are the luminance L, a level a on


the red-green axis and a level b on the yellow-blue axis. The desired colour is not easy to specify.

Moreover, it is not always possible to obtain sufficiently explicit knowledge from the expert to construct a satisfactory rule database (in section 7.4, we will see an approach integrating expert rules in the control of baking). Sometimes, the only accessible information must be directly inferred from observation of the expert during his control activity. Hence, following the approach adopted in section 7.2 seems problematic, especially concerning the aspect of the biscuit, which cannot easily be linked by the expert to the physical parameters (L, a, b) measured by an automatic system. The following subsection presents an alternative way of establishing this link, using similarity to known examples.

7.3.2 Automatising human decisions by learning from examples

In performing oven control, the decision-making process consists of two consecutive stages: a diagnosis stage, which consists in evaluating the state of the last biscuits, and a decision stage, which must determine a regulation action on the oven, if necessary. As in many other domains, the diagnosis task performed by the expert controlling baking can be seen as a pattern recognition task. It is not unrealistic to assume that the usual dysfunctions have been identified and categorised by the expert and that, for each of them, a standard regulation action is known. Thus, assuming that a finite list of categories is implicitly used by the expert (each of them being associated with a pattern, i.e. a characteristic set of “irregular” biscuits), the diagnosis stage consists in identifying the relevant pattern for any irregular biscuit and the decision stage consists in performing the regulation action appropriate to the pattern.

In this context, the patterns are implicit and subjective. They can be approximated by observing the actions of a human controller on the oven in a variety of cases. However, we can construct an explicit representation of the patterns in a more “objective” space formed by the observable variables. In this space, the subjective evaluation of biscuits can be partially explained by their objective description.

Assuming a representative sample of biscuits is available, using sensors, we can represent each biscuit i of the sample by a vector xi = (mi, ti, Li, ai, bi) in the multiple attribute space of the physical variables used to describe biscuits. Then, each biscuit can be evaluated by the expert and a diagnosis of dysfunction d(xi) can be obtained for each description xi, explaining the bad quality of biscuit i (e.g. “oven too hot”, “oven not hot enough”). Hence, the pattern associated with each dysfunction z is defined by the set of points xi such that d(xi) = z. Determining the right pattern for any new input vector x is a classification problem where the categories C1, . . . , Cq are the q possible dysfunctions and the objects to be assigned are vectors x = (m, t, L, a, b).

Let X be the set of all possible vectors x = (m, t, L, a, b) describing an object (e.g. a biscuit). A classification procedure can then be seen as a function assigning to


each vector x ∈ X the vector (µC1(x), . . . , µCq(x)) giving the membership of x to each category (e.g. each possible dysfunction of the oven). One of the most popular classification methods is the so-called Bayes rule, which is known to minimise the expected error rate. However, this rule requires knowing the prior and conditional probability densities of all categories, which is rarely the case in practice. When this information is not available (as in our example), the nearest neighbour algorithm is very useful. The basic principle of the k-Nearest Neighbour assignment rule (k-NN), introduced in (Fix and Hodges 1951), is to assign an object to the class to which the majority of its k nearest neighbours belong.

More precisely, for any sample S ⊂ X of vectors whose correct assignment is known, if Nk(x) represents the subset of S formed by the k nearest neighbours of x within S, the k-NN rule is defined for any k ∈ {1, . . . , n} by:

µCj(x) = 1 if j = Arg max_i { ∑_{y ∈ Nk(x)} µCi(y) }, and µCj(x) = 0 otherwise     (7.10)

where Arg max_i g(i) represents the value of i for which g(i) is maximal. This supposes that the maximum is reached for a unique i. When this is not the case, one can use a second criterion for discriminating between the g-maximal solutions or, alternatively, choose all of them. In equation (7.10), the function g(i) equals ∑_{y ∈ Nk(x)} µCi(y) and represents the total number of vectors, among the k nearest neighbours of x, that have been assigned to category i.

It has been proved that the error rate of the k-NN rule tends towards the optimal Bayes error rate when both k and n tend to infinity while k/n tends to 0 (see Cover and Hart 1967). The main drawback of the k-NN procedure is that all elements of Nk(x) are equally weighted. Indeed, in most cases, the neighbours are not equally distant from x, and one may prefer to give less importance to neighbours very distant from x. For this reason, several weighted extensions of the k-NN algorithm have been proposed (see Keller, Gray and Givens 1985, Bezdek, Chuah and Leep 1986, Bereau and Dubuisson 1991). For example, the fuzzy k-NN rule proposed by Keller et al. (1985) is defined by:

µCj(x) = ( ∑_{y ∈ Nk(x)} µCj(y) / ‖x − y‖^(2/(m−1)) ) / ( ∑_{y ∈ Nk(x)} 1 / ‖x − y‖^(2/(m−1)) )     (7.11)

where m ∈ (1, +∞) is a technical parameter. Note that the membership induction for a new input x is also a matter of aggregation. Indeed, the membership value µCj(x) is defined as the weighted average of the quantities µCj(y), y ∈ Nk(x), weighted by coefficients inversely proportional to a power of the Euclidean distance between x and y. This formula seems natural, but several points are questionable. Firstly, the choice of the weighted sum as an aggregator of the membership values µCj(y) for all y in the neighbourhood Nk(x) is not straightforward. It includes several implicit assumptions that are not necessarily valid (see chapter 6), and alternative compromise aggregators could possibly be used advantageously. The choice of a compromise operator itself can be criticised, and one can readily imagine cases where a disjunctive or a conjunctive operator should be preferred.
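For illustration, here is a sketch of the fuzzy rule of equation (7.11), reusing the toy data of the previous sketch; the value m = 2 is a common but, as just noted, essentially arbitrary choice.

```python
# A sketch of the fuzzy k-NN rule of equation (7.11) (Keller et al. 1985),
# reusing numpy and `sample` from the previous sketch.

def fuzzy_knn(x, sample, memberships, k, m=2.0, eps=1e-12):
    """memberships[i, j] = mu_Cj of sample vector i; returns the vector for x."""
    dists = np.linalg.norm(sample - x, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] ** (2.0 / (m - 1.0)) + eps)  # inverse-distance weights
    return (w @ memberships[nearest]) / w.sum()            # weighted average

memberships = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
print(fuzzy_knn(np.array([0.8, 0.9]), sample, memberships, k=4))  # ~[0.015 0.985]
```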


Moreover, even when the weighted arithmetic mean seems convenient, the use of weights linked to distances of the type ‖x − y‖ and to the parameter m is not obvious. Indeed, the norm of x − y is not necessarily a good measure of the relative dissimilarity between the two biscuits represented by x and y. This is the case, for instance, when the units are different and non-commensurate on the various axes. In order to distinguish between significant and non-significant differences on each dimension, one may include discrimination thresholds (see chapter 6) in the comparison, making it possible to distinguish differences that are significant for the expert from those that are negligible. This is particularly suitable in the field of subjective evaluation, in which the preferences and perceptions of the expert (or decision-maker) are usually not linearly related to the observable parameters. For instance, one could define a fuzzy similarity relation ∼(x, y) as a function of quantities of the type |xi − yi| for each attribute i, representing the relative closeness of x and y for the expert. Then, we can use a general aggregation rule of the type:

µCj(x) = ψ(µCj(y1), . . . , µCj(yk); ∼(x, y1), . . . , ∼(x, yk))     (7.12)

where Nk(x) = {y1, . . . , yk} and ψ is an aggregation function.

This is the proposition made in (Henriet 1995), (Henriet and Perny 1996) and (Perny and Zucker 1999), where the membership µCj(x) is defined by:

µCj(x) = 1 − ∏_{i=1}^{k} ( 1 − ∼(x, yi) · µCj(yi) )     (7.13)

and ∼(x, y) is the weighted average of one-dimensional similarity indices (∼i(x, y), one per attribute i) defined as follows:

∼i(x, y) =  1                              if |xi − yi| ≤ qi,
            (pi − |xi − yi|) / (pi − qi)   if qi < |xi − yi| < pi,
            0                              if |xi − yi| ≥ pi.     (7.14)

In the above formula, qi and pi are thresholds (possibly varying with the level xi or yi) used to define a continuous transition from full similarity to dissimilarity, as shown in the example given in Figure 7.7. It should be noted, however, that the definition of the similarity indices ∼i(x, y) is very demanding: it requires assessing two thresholds for each attribute level xi. Moreover, the linear transition from similarity to non-similarity is not easy to justify, and a full justification of the shape of the similarity function ∼i would require a lot of information about differences of the type xi − yi. Usually, the construction of such similarity functions is only based on empirical evidence and common-sense principles.

Figure 7.7: One-dimensional similarity indices ∼i(x, y)
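A sketch of equations (7.13) and (7.14); the thresholds qi, pi, the attribute weights and the data are illustrative choices, not values from the application.

```python
# A sketch of the similarity-based membership of equations (7.13)-(7.14).
# Thresholds q, p, attribute weights and data are illustrative.

def sim_1d(d, q, p):
    """One-dimensional similarity index of equation (7.14), d = |xi - yi|."""
    if d <= q:
        return 1.0
    if d >= p:
        return 0.0
    return (p - d) / (p - q)         # linear transition between q and p

def membership(x, neighbours, mu, q, p, weights):
    """Equation (7.13): 1 - prod over neighbours of (1 - sim(x, y) * mu_Cj(y))."""
    prod = 1.0
    for y, mu_y in zip(neighbours, mu):
        sim = sum(w * sim_1d(abs(xa - ya), qa, pa)   # weighted average of sim_1d
                  for w, xa, ya, qa, pa in zip(weights, x, y, q, p))
        prod *= 1.0 - sim * mu_y
    return 1.0 - prod

# two attributes, two neighbours already assigned to category Cj
x = (4.2, 33.0)
neighbours = [(4.6, 34.5), (5.5, 36.0)]
mu = [1.0, 0.8]                      # their memberships to Cj
q, p = (0.3, 1.0), (1.0, 3.0)        # per-attribute thresholds (q_i, p_i)
print(membership(x, neighbours, mu, q, p, weights=(0.5, 0.5)))  # ~0.80
```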

Coming back to the example, the k-NN algorithm can be used to periodically compute two coefficients, µtoo hot(x) and µnot hot enough(x). These coefficients evaluate the necessity of a regulation action by analysing the measure x of the last biscuit. For instance, µtoo hot(x) = 1 and µnot hot enough(x) = 0 means that decreasing the oven temperature is necessary. The decision process



is improved if we use the fuzzy version of the k-NN algorithm in the diagnosis stage. In this case, the values µtoo hot(x) and µnot hot enough(x) can take any value within the unit interval; these values can be interpreted as indicators of the amplitude of the required regulation and help the system in choosing a soft regulation action. The main drawback of this automatic decision process is the absence of explicit decision rules explaining the regulation actions. This is not a real drawback in this context, because the quality of the biscuits is a sufficient argument for validation. However, in many other decision problems involving an automatic system, e.g. the automatic pre-filtering of loan files in a bank, the need for explanations is more crucial, first to validate the system a priori, and secondly to explain decisions a posteriori to the clients. The use of rules in the context of baking control is discussed in the next section.

7.4 A hybrid approach for automatic decision-making

In the case reported in (Perrot 1997) about the control of biscuits during baking, the diagnosis stage was not uniquely based on the k-NN algorithm. Indeed, in this application, it was possible to elicit decision rules for the diagnosis stage. Actually, the quality of the biscuit is evaluated by the expert on the basis of 3 subjectively evaluated attributes: the moisture (m), the thickness (t) and the aspect of the biscuit (colour). The qualifiers used for labelling these attributes are:

• moisture: “dry”, “normal”, “humid”

• thickness: “too thin”, “good”, “too thick”

• aspect: “burned”, “overdone”, “done”, “underdone”, “not done”.

Then, the human expertise in the diagnosis stage is expressed, using these labels, by rules of the type:

If moisture is normal or dry and colour is overdone
then the oven is too hot


If moisture is humid or normal and colour is underdone
then the oven is not hot enough

It has therefore been decided to construct membership functions linking the parameters (m, t, L, a, b) to the labels used in the rules, in order to implement a hybrid approach: k-NN algorithms are used to get a fuzzy symbolic description of the biscuit, and the fuzzy rule-based approach presented in section 7.2 is used to infer a regulation action. The numeric-symbolic translation is natural for moisture and thickness. The labels used for these two parameters are represented by the fuzzy sets shown in Figures 7.8 and 7.9.

Figure 7.8: Fuzzy labels used to describe biscuit moisture (“dry”, “normal”, “humid”; breakpoints at 3, 3.8, 4.7 and 5.8 cg/g)

Figure 7.9: Fuzzy labels used to describe biscuit thickness (“too thin”, “good”, “too thick”; breakpoints at 28, 32, 35 and 38 mm)

The translation is more difficult for the labels used for the biscuit aspect, because the aspect is represented by a fuzzy subset of the 3-dimensional space characterised by the components (L, a, b). This problem has been solved by the fuzzy k-NN algorithm. It is indeed sufficient to ask an expert in baking control to qualify, with a label yi, each element i of a representative sample of biscuits, using only the 5 labels introduced to describe aspect. At the same time, the sensors assess the vector xi = (Li, ai, bi) describing biscuit i in the physical space. Then the fuzzy k-NN algorithm is applied with reference points (xi, yi) for all biscuits i in the sample. For any input x = (L, a, b), it gives the membership values µyj(x)


for each label yj, j ∈ {1, . . . , 5}, used to describe the biscuit’s aspect. The fuzzy nearest neighbour algorithm thus provides a representation of the labels yj, j = 1, . . . , 5, by fuzzy subsets of the (L, a, b) space. This makes it possible to resort to the fuzzy control approach presented in section 7.2.

In the biscuit example, the integration of the k-NN algorithm into a fuzzy rule-based system provides a soft automatic decision system whose actions can be explained by the expert’s rules. This control system can be integrated within a continuous regulation loop, alternating action and retroaction steps, as illustrated in Figure 7.10.

Figure 7.10: The action-retroaction loop controlling baking (during baking, the measures x = (m, t, L, a, b) feed the diagnosis module, whose outputs µtoo hot(x) and µnot hot enough(x) feed the decision module, which acts on the oven settings)

7.5 Conclusion

We have presented simple examples illustrating some basic techniques used to simulate human diagnosis, reasoning and decision-making in the context of repeated decision problems that are convenient for automatisation. We have shown the importance of constructing suitable mathematical representations of knowledge and decision rules. The task is difficult because human diagnosis is mainly based on human perception whereas sensors naturally give numerical measures, and because human reasoning is mainly based on words and propositions drawn from natural language, whereas computers are basically suited to performing numerical computations. As shown in this chapter, some simple and intuitive formal models have been proposed, making it possible to establish a formal correspondence between symbolic and numeric information. They are based on the definition of fuzzy sets linking labels to observable numerical measures through membership functions. However, a proper use of these fuzzy sets requires a very careful analysis. Indeed, we have shown that many “apparently natural” choices in the modelling process may hide strong assumptions that can turn out to be false in practice. For instance, the small numerical examples given in this chapter show that, in the context of rule-based control systems, the output of the system highly depends on the choice of the numbers used to represent symbolic knowledge. In particular, one must be aware that multiplying arbitrary choices in the construction of membership functions can make the output of the system completely meaningless.


Moreover, we have shown that, at every level of computation, there is a need for weighting propositions and aggregating numerical information. This shows the great importance of mastering the variety of aggregation operations, their properties and the constraints to be satisfied in order to preserve the meaningfulness of the conclusions. It must be clear that, when these constraints are not thoroughly respected, the outputs of any automatic decision system are more the consequences of arbitrary choices in the modelling process than of a sound deduction justified by the observations and the decision rules. Designing an automatic decision process in which the arbitrary choice of the numbers used to represent knowledge is more decisive than the knowledge itself is certainly the main pitfall of the modelling exercise.

Since one cannot reasonably expect to avoid all arbitrary choices in the modelling process, both theoretical and empirical validations of the decision system are necessary. The theoretical validation consists in investigating the mathematical properties of the transfer function that forms the core of the decision module. This is the opportunity to control the continuity and the derivatives of the function, but also to check whether the computation of the outputs is meaningful with respect to the nature of the information given to the system as input. The empirical or practical validation consists in testing the decisional behaviour of the system in various typical states. It takes the form of trial-and-error sequences enabling a progressive tuning of the fuzzy rule-based model to better approximate the expected decisional behaviour. This can be used to determine suitable membership functions characterising the rules. It can even be used to learn the rules themselves. Indeed, when a sufficiently rich basis of examples is available, the rules and the membership values can be learned automatically (see e.g. Bouchon-Meunier and Marsala 1999, or Nauck and Kruse 1999 for neuro-fuzzy methods in fuzzy rule generation). The neuro-fuzzy approach is very interesting for designing an automatic decision system, because it takes advantage of the efficiency of neural networks while preserving the “easy to interpret” feature of a rule-based system. Notice however that, due to the need for learning examples showing the system what the right decisions are in a great number of situations, the learning-oriented approach is only possible when the decision task is completely understood and mastered by a human. This is usually the case when the automatisation of a decision task is expected, but one should be aware that this approach is not easily transposable to more complex decision situations where preferences as well as decision rules are still to be constructed.


8 DEALING WITH UNCERTAINTY: AN EXAMPLE IN ELECTRICITY PRODUCTION PLANNING

8.1 Introduction

In this chapter, we describe an application that was the theme of a research collaboration between an academic institution and a large company in charge of the production and distribution of electricity. We do not give an exhaustive description of the work that was done and of the decision-aiding tool that was developed. A detailed presentation of the first discussions, of the progressive formulation of the problem, of the assumptions chosen, of the hesitations and backtrackings, of the difficulties encountered, of the methodology adopted and of the resulting software would require nearly a whole book. Our purpose is to point out some characteristics of the problem, especially concerning the modelling of uncertainties. The description was thus voluntarily simplified and some aspects, of minor interest in the framework of this book, were neglected. The main purpose of this presentation is to show how difficult it is to build (or to improvise) a pragmatic decision model that is consistent and sound. It illustrates the interest and the importance of having well-studied formal models at our disposal when we are confronted with a decision problem. Sections 8.2 and 8.3 present the context of the application and the model that was established. Section 8.4 is based on a didactic example: it first illustrates and comments on some traditional approaches that could have been used in the application; it then gives a detailed description of the approach that was applied in the concrete case. Section 8.5 provides some general comments on the advantages and drawbacks of this approach.

8.2 The context

The company must periodically make some choices for the construction or closure



of coal, gas and nuclear power stations, in order to ensure the production of electricity and satisfy demand. Due to the diversity of the points of view to be taken into account, the managers of the production department wanted to develop a multiple criteria approach for evaluating and comparing potential actions. They considered that aggregating the financial, technical and environmental points of view into a type of generalised cost (see Chapter 5) was neither possible nor very serious. A collaboration was established between the company and an academic department (we will call it “the analyst”), which rapidly discovered that, besides the multiple criteria aspect, an enormous set of potential actions, a significant temporal dimension and a very high level of uncertainty on the data needed to be managed. The next section points out these aspects through the description of the model as it was formulated in collaboration with the company’s engineers.

8.3 The model

8.3.1 The set of actions

In this chapter, we call a decision a choice made at a specific point in time: it consists in choosing the number of production units of the different types of fuel (Nuclear, Coal, Gas) to be planned and in specifying whether the downgrade plan (previously defined by another department of the company) has to be followed, partially anticipated (A) or delayed (D). In terms of electricity production and delay, each unit and each modification of the downgrade plan has different specificities (see Table 8.1).

Type    Power (MW)    Delay (years)
N       900           9
C       400           6
G       350           3
A       −300          0
D       +300          0

Table 8.1: Power and construction delay for the different types of production unit

For simplicity, the decisions are only taken at chosen milestones, separated by a time period of about 3 years (the period between two decisions is called a block). At most one unit of each type per year may be ordered, and the choice concerning the downgrade plan (follow, anticipate or delay) is of course exclusive. A decision for a block of 3 years could thus be, for example,

{1N, 1C, 2G, A},

meaning that one nuclear, one coal and two gas production units are planned andthat the downgrade plan has to be anticipated.


Each decision is irrevocable and naturally has consequences for the future, not only on the production of electricity, as seen in Table 8.1, but also in terms of investment, exploitation cost, safety, environmental effects, ... (see Section 8.3.2).

An action is a succession of decisions over the whole time period concerned by the simulation (the horizon), i.e. a period of about 20-25 years or 7 blocks. An action is thus, for example,

({1N, 1C, 2G, A}, {1C}, {2G}, {}, {3G}, {1G, 1C}, {1N, 2G}).

The number of possible actions is of course enormous. Even after adding some simple rules (only one, or zero, nuclear unit is allowed, exclusively in the first and last blocks; anticipation and delay are only allowed in the first and second blocks; an anticipation followed by a delay, or the inverse, is forbidden), the number of actions is still around 10^8. Many of these actions are completely unrealistic, for example no new unit for 20 years, or 3G and 3C in every block: they can be eliminated by fixing reasonable limits on the power production of the park. In this problem, the decision-maker only kept the actions such that, for each block, the surplus is less than 1 000 MW and the deficit less than 200 MW. These limitations led to a set of approximately 100 000 potential actions. The temporal dimension of the problem naturally leads to a tree structure for these actions, built on decision nodes (represented by squares in Figure 8.1). Depending on the block considered, there are typically between 3 and 30 branches leaving each decision node.

8.3.2 The set of criteria

The list of criteria was defined by the industrial partner in order to avoid unbearable difficulties in data collection and to work on a sufficiently realistic situation. Remember that the purpose of the study was to build a decision-aiding methodology, not to make a decision. It was important to test the methodology with a realistic set of criteria, but it was also clear that the methodology should be independent of the criteria chosen. In the application described here, the following eight criteria were taken into account, for the time period of the simulation:

• fuel cost, in Belgian Francs (BEF), to minimise;

• exploitation cost, in BEF, to minimise;

• investment cost, in BEF, to minimise;

• marginal cost, i.e. the variation of the total cost for a variation of 1 GWh, in BEF, to minimise;

• deficient power in TWh, to minimise;



• CO2 emissions, in tons, to minimise;

• SO2 and NOx emissions, in tons, to minimise;

• purchase and sales balance, in BEF, to maximise.

The evaluations of the actions on these criteria are of course not known with certainty, because they depend on many factors that are not known, or not well known, by the decision-maker. The uncertainties have an impact on the evaluations, which can be direct (the prices of the raw materials influence their total costs) or indirect (if the gas price increases more than the coal price, the coal power stations will be more intensively exploited than the gas ones; this will have an impact on the fuel costs and the environmental impacts of the production park). Table 8.2 presents an example of evaluations for two particular actions, in a scenario where the fuel price is low and the demand for electricity is relatively weak. Other scenarios must be envisaged in order to improve the realism and usefulness of the model.

A : ({}, {2G}, {3G}, {2G}, {3G}, {}, {})
B : ({1N, 2G, 2C}, {2C, 1G}, {3C}, {2C}, {1N}, {}, {})

                            A          B
Fuel cost                33 500     31 000    MBEF
Exploitation cost        45 000     49 000    MBEF
Investment cost         360 000    770 000    MBEF
Marginal cost               730        620    KBEF/GWh
Deficient power            16.7       10.3    TWh
CO2 emissions            22 000     16 000    Ktons
SO2 + NOx emissions          70         48    Ktons
Sales balance            23 000     30 000    MBEF

Table 8.2: The evaluations of two particular actions

8.3.3 Uncertainties and scenarios

Generally speaking, the determination of the value of a parameter at a givenmoment can lead to the following situations:

• the value is not known: the value is relative to the past and was not measured; the value is relative to the present but is technically impossible or very expensive to obtain; the value is relative to the future for a parameter with a completely erratic evolution;


• the value can be approximated by an interval: the bounds result from the properties of the system considered; the interval is due to the imprecision of the measure or to the use of a forecasting method; sometimes, a probability, a possibility or a confidence index can be associated with each value of the interval;

• the value is not unique: several measures did not yield the same value, or several scenarios are possible; again, a probability, a possibility, a confidence index or the result of a voting process can be associated with each value;

• the value is unique but not reliable, with some information on the degree of reliability.

In the particular situation described here, the industrial partner was already using stochastic programming for the management of the production park. He wanted another methodology in order to take better account of the number of potential actions and of the multiple criteria aspects. For the uncertainties, however, the engineers were used to working with probabilities, and the framework of the study did not allow for suggesting anything else. So, scenarios were defined and subjective probabilities were assigned to them by the company’s experts. More precisely, two types of uncertainties were distinguished, respectively called “aleas” and “major uncertainties”; the difference between them is based on the more or less strong dependence between the past and the future. The industrial partner considered that nuclear availability in the future was completely independent of the knowledge of the past and called this type of uncertainty an “alea”: this means that the level of nuclear availability was completely open for each period of three years (a breakdown at a given time does not imply that there will be no breakdown in the near future). The selling price of electricity was also considered as an “alea”, in order to be able to capture the deregulation phenomena due to forthcoming new legislation.

The “major uncertainties” (for which some dependence can exist between the values at different moments) were the fuel price (the market presents global tendencies, and a high price for the first two blocks reinforces the probability of having a high price for the third one), the demand for electricity (same reasoning) and the legislation concerning pollution (in this example, the law may change for the third block, and the uncertain parameters after this block are thus strongly related: either the same as for the first blocks, or more severe, but in both cases constant over all blocks after block 2).

The “major uncertainties” allow for a learning process that must be taken into account in the analysis: each decision, at a given time, may use the previous values of the uncertain parameters and deduce information from them about the future. This information may modify the choices of the decision-maker. Suppose for instance that a variable x may be equal to 0 or 1 in the future. The corresponding probabilities are assessed as follows:


P(x = 0) > 0.5  after past scenario A,
P(x = 0) < 0.5  after past scenario B,

where the “past scenario” is known at the time of the decision. The decision-maker has to choose between two decisions, a and b. If he prefers a when x = 0 and b when x = 1, a reasonable strategy will be to choose a after scenario A and b after scenario B.

The previous explanation is not valid for “aleas”, because their independence does not allow for direct inference from the past.

Because of the statistical dependence and of the possible learning process in the case of major uncertainties, a complete treatment and a tree structure for these scenarios (a scenario is a succession of observed uncertainties) are necessary. If there are 3 levels for the fuel price, 3 levels for the demand and 2 levels for the legislation, and if the horizon is divided into 7 blocks, there are, a priori, (3 × 3 × 2)^7 ≈ 6 × 10^8 possible scenarios. Fortunately, most of these scenarios are negligible because the probability of a very fluctuating scenario is very small: the “major uncertainty” scenarios are rather strongly correlated, and a sequence of levels for the fuel price such as HHLMHLH (H for high, M for medium and L for low) is much less probable than a sequence HHHMMMM. In practice, two sequences were retained for legislation (MMMMMMM and MMHHHHH), it was imposed that scenarios could only change after two blocks, and each modification was penalised so that very fluctuating scenarios were hardly possible. The analyst finally retained around 200 representative scenarios that were gathered in a tree structure of major uncertainty nodes (represented by circles in Figure 8.1).

Of course, the complete scenario for a decision node at time t is not known, but a probability is associated with each scenario, making it possible to compute the conditional probability of each complete scenario knowing the partial scenario already observed at time t.

On the contrary, the “aleas” are by essence uncorrelated and there is no reason to neglect any scenario. If there are 3 levels for the selling price and 2 levels for the availability of nuclear units, then the number of scenarios is (3 × 2)^7 = 279 936. Fortunately, the tree structure of the “aleas” is obvious: each node gives rise to the same possibilities, with the same probability distribution. For these reasons, the aleas are much simpler to handle than the major uncertainties, and it is possible to take the whole set of scenarios into account.

8.3.4 The temporal dimension

Independently of the dependence between the past and the future in the modelling of the uncertainties, the temporal dimension plays an important role in this kind of problem.

First, the time period between the decision to build a certain type of power station and the beginning of the exploitation of that station is far from negligible. Second, some consequences of the decisions appear after a very long time (the environmental consequences, for example). Third, the consequences




themselves can be dispersed over rather long periods and vary within these periods. Fourth, the consequences of a decision can be different according to the moment that decision is taken. It is rather usual, in planning models, to introduce a discounting rate that decreases the weight of the evaluations of distant consequences (see Chapter 5), and the industrial partner did this here. However, for a long term decision problem with important consequences for future generations, such an approach may not be the best one, and the decision-maker could be more confident in the flexible approach and the richness of the scenarios. That is why the analyst kept the possibility of introducing discounting or not.

8.3.5 Summary of the model

The complete model can be described by a tree structure including decision nodes (squares) and uncertainty nodes (circles), as illustrated in Figure 8.1. At t = 0 (square node at the beginning of block 1), a first decision is made (a branch is chosen) without any information on the scenario, leading to a circle node. During block 1, one may observe the actual values of the uncertain parameters (nuclear availability, electricity selling price, fuel price, electricity demand and environmental legislation), determining one branch leaving the considered circle node and leading to one of the decision nodes at time t = 1. A new decision is then made, taking the previous information into account, and so on until the last decision (square) node and the last scenario (circle) node that determine the whole action and the whole observed scenario. In the resulting tree (Figure 8.1), the decision nodes (squares) correspond to active parts of the analysis, where the decision-maker has to establish his strategy, while the uncertainty nodes (circles) correspond to passive parts of the analysis, where the decision-maker undergoes the modifications of the parameters.

Figure 8.1: The decision tree

8.4 A didactic example

Consider Figure 8.2, describing two successive time periods. At time t = 0, two decisions A and B are eligible; during the first period, two events S and T are possible, each with probability 1/2. At the beginning of the second period, two decisions C and D are eligible if the first decision was A, and three decisions E, F, G are eligible if the first decision was B. During the second period, two events U and V are possible after S (with respective probabilities 1/4 and 3/4) and two events Y and Z are possible after T (with respective probabilities 3/4 and 1/4).

Figure 8.2 presents the tree and the evaluation of each action (set of decisions) for each complete scenario. Remark that this didactic example contains only one


evaluation for each action (a problem with a single criterion). We do not insist on the multiple criteria aspect of the problem here (this was treated in Chapter 6) and focus on the treatment of uncertainty.

Figure 8.2: A didactic example (the leaf evaluations read from the figure are: at N2, C yields 7 under U and 4.5 under V, while D yields 4.5 and 5.5; at N3, C yields 4.5 under both Y and Z, while D yields 1 and 5; at N4, E yields 3.5 under U and 5.5 under V, F yields 3 and 1, and G yields 1 under both; at N5, E yields 6 under Y and 1 under Z, F yields 2 under both, and G yields 5 under both)

8.4.1 The expected value approach

In the traditional approach, the nodes of the tree are considered from the leaves to the root (“folding back”) and the decisions are taken at each node in order to maximise their expected values, i.e. the mean of the corresponding probability distributions of the evaluations. Of course, this is only possible when the evaluations are elements of a numerical scale. At node N2 (beginning of the second period), the expected value of decision C is 1/4 × 7 + 3/4 × 4.5 = 41/8, while the expected value of decision D is 1/4 × 4.5 + 3/4 × 5.5 = 42/8. So, the best decision at node N2 is D and the expected value associated with N2 is 42/8. Making similar calculations for N3, N4 and N5, one obtains the tree represented in Figure 8.3.

Figure 8.3: The tree after folding back (best decision and expected value at each decision node: N2: D, 5.25; N3: C, 4.5; N4: E, 5; N5: G, 5)

At node N1, the expected values of decisions A and B are respectively 39/8and 5, so the best decision is B.

In conclusion, the “optimal action” obtained by the traditional approach consists in applying decision B at the beginning of the first period and decision E or G at the beginning of the second period, depending on whether the event that occurred in the first period was S or T.
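The folding-back recursion is easy to mechanise. The sketch below encodes the tree of Figure 8.2 (leaf values as read from the figure) and reproduces the expected values computed above; the representation of nodes as tagged tuples is ours.

```python
# Folding back the tree of Figure 8.2 by expected value (section 8.4.1).

def fold(node):
    """Expected value of a node, taking the best decision at decision nodes."""
    kind, content = node
    if kind == "leaf":
        return content
    if kind == "chance":                       # content = [(probability, child), ...]
        return sum(p * fold(child) for p, child in content)
    return max(fold(child) for child in content.values())   # decision node

leaf = lambda v: ("leaf", v)
chance = lambda *branches: ("chance", list(branches))

N2 = ("decision", {"C": chance((0.25, leaf(7)), (0.75, leaf(4.5))),
                   "D": chance((0.25, leaf(4.5)), (0.75, leaf(5.5)))})
N3 = ("decision", {"C": chance((0.75, leaf(4.5)), (0.25, leaf(4.5))),
                   "D": chance((0.75, leaf(1)), (0.25, leaf(5)))})
N4 = ("decision", {"E": chance((0.25, leaf(3.5)), (0.75, leaf(5.5))),
                   "F": chance((0.25, leaf(3)), (0.75, leaf(1))),
                   "G": chance((0.25, leaf(1)), (0.75, leaf(1)))})
N5 = ("decision", {"E": chance((0.75, leaf(6)), (0.25, leaf(1))),
                   "F": chance((0.75, leaf(2)), (0.25, leaf(2))),
                   "G": chance((0.75, leaf(5)), (0.25, leaf(5)))})
N1 = ("decision", {"A": chance((0.5, N2), (0.5, N3)),
                   "B": chance((0.5, N4), (0.5, N5))})

print(fold(N1))   # 5.0 = max(39/8, 5): decision B, then E after S and G after T
```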

8.4.2 Some comments on the previous approach

Just as the weighted sum (already discussed in other chapters of this book), the expected value presents some characteristics that the user must be aware of. For example, the probabilities intervene as tradeoffs between the values attached to different events: a difference of one unit in favour of C over D for event V, whose probability is 3/4, would be completely compensated by a difference of three units in favour of D over C for event U, because its probability is 1/4. A consequence is that a big difference in favour of a specific decision in some scenario can be sufficient to overcome a systematic advantage of another decision in all the other scenarios, as illustrated in the example presented in Figure 8.4. In this example, if the probabilities of S, T and U are all equal to 1/3, the expected value will give preference to A, although B is better than A in two scenarios out of three.

Remember the famous St. Petersburg game (see for example Sinn 1983), showing that the expected value approach does not always represent the attitude of the decision-maker towards risk very well. The game consists of tossing a coin repeatedly until the first time it lands on “heads”; if this happens on the kth toss, the player wins 2^k e. The question is to find out how much a player would be ready to bet in such a game. Of course, the answer depends on the player but, in any case, the amount would not be very big. However, applying the expected value approach, we see that the expected gain is

∑_{k=1}^{∞} 1/2^k × 2^k = +∞.


[Figure 8.2: A didactic example — the decision tree just described. Root N1 offers decisions A and B; events S and T (probability 1/2 each) follow; after A, decisions C and D are available, after B, decisions E, F and G; events U and V (probabilities 1/4 and 3/4) follow S, and events Y and Z (probabilities 3/4 and 1/4) follow T. The leaf values are those used in Tables 8.4 to 8.10.]


The expected utility model, which is the subject of the next section, makes it possible to resolve this paradox and, more generally, to take different possible attitudes towards risk into account.

8.4.3 The expected utility approach

As the preferences of the decision-maker are not necessarily linearly linked to the evaluations of the actions, it may be useful to replace these evaluations by the “psychological values” they have for the decision-maker through so-called utility functions (Fishburn 1970).

Denoting by u(x_i) the utility of the evaluation x_i, the expected utility of a decision leading to the evaluation x_i with probability p_i (i = 1, 2, ..., n) is given by

∑_i p_i u(x_i).

This model dates back at least to Bernoulli (1954) but the basic axioms, in terms of preferences, were only studied in the present century (see for instance von Neumann and Morgenstern 1944).

In the case of the St. Petersburg game, if we denote by u(x) the utility of “winning x e”, the expected utility of refusing the game is u(0), while the expected utility of betting an amount of s e in the game is

∑_{k=1}^{∞} 1/2^k × u(2^k − s).

As an exercise, the reader can verify that for a utility function defined by

u(x) = x/2^20 if x ≤ 2^20, and u(x) = 1 if x > 2^20,

the expected utility of betting s e in the game is positive (hence superior to the expected utility of refusing the game, which is u(0) = 0) as long as s is less than 21/(1 − 1/2^20) e, i.e. slightly more than 21 e, and is negative for larger values. The expected utility can also be finite with an unbounded utility function such as, for example, the logarithmic function.
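This threshold is easy to check numerically. A quick sketch (ours, not from the book), truncating the infinite series at a large K since the neglected tail is smaller than 1/2^K:

```python
def u(x, cap=2**20):
    """The bounded utility function used in the text."""
    return x / cap if x <= cap else 1.0

def eu_bet(s, K=200):
    """Expected utility of betting s, truncating the series at K terms."""
    return sum(u(2**k - s) / 2**k for k in range(1, K + 1))

threshold = 21 / (1 - 1 / 2**20)   # break-even stake derived above
print(eu_bet(threshold - 0.01))    # > 0: betting preferred to refusing
print(eu_bet(threshold + 0.01))    # < 0: refusing preferred
```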

In the example of Figure 8.2 and with a utility function defined by

u(1) = u(2) = 1,
u(3) = u(3.5) = 2,
u(4.5) = u(5) = u(5.5) = 3,
u(6) = u(7) = 4,

we obtain the tree given in Figure 8.5. The optimal action is then to apply decision A at the beginning of the first period and decision C at the beginning of the second period, contrary to what was obtained with the expected value approach.
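Reusing the fold_back sketch from the expected value section, the expected utility computation only requires first mapping the leaf evaluations through u (again a sketch of ours, with the utility function above):

```python
# Utility function of the text, as a lookup table on the leaf evaluations.
U = {1: 1, 2: 1, 3: 2, 3.5: 2, 4.5: 3, 5: 3, 5.5: 3, 6: 4, 7: 4}

def map_leaves(node):
    """Replace every leaf evaluation by its utility, keeping the tree shape."""
    if not isinstance(node, tuple):
        return U[node]
    kind, content = node
    if kind == 'chance':
        return (kind, [(p, map_leaves(sub)) for p, sub in content])
    return (kind, {label: map_leaves(sub) for label, sub in content.items()})

fold_back(map_leaves(tree))   # best decisions C, C, E, E, then A (25/8)
```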


[Figure 8.3: Application of the expected value approach — best decisions and values at each node: D at N2 (5.25), C at N3 (4.5), E at N4 (5), G at N5 (5).]

[Figure 8.4: Illustration of the compensation effect — decisions A and B under events S, T and U; A yields 10, 15 and 20 respectively, while B yields 15, 20 and 9, so B is better than A under S and T but A has a large advantage under U.]


[Figure 8.5: Application of the expected utility approach — best decisions and expected utilities at each node: C at N2 (13/4), C at N3 (3), E at N4 (11/4), E at N5 (13/4).]

8.4.4 Some comments on the expected utility approach

Much literature is devoted to this approach, the probabilities being objective or subjective: see for example Savage (1954), Luce and Raiffa (1957), Ellsberg (1961), Fishburn (1970) and Fishburn (1982), Allais and Hagen (1979), McCord and de Neufville (1983), Loomes (1988), Bell et al. (1988), Barbera, Hammond and Seidl (1998).

We simply recall one or two characteristics here that every user should be aware of. As in every model, the expected utility approach implicitly assumes that the preferences of the decision-maker satisfy some properties that can be violated in practice. The following example illustrates the well-known Allais paradox (see Allais 1953). It is not unusual to prefer a guaranteed gain of 500 000e to an alternative providing 500 000e with probability 0.89, 2 500 000e with probability 0.1 and 0e with probability 0.01. Applying the expected utility model leads to the following inequality:

u(500 000) > 0.89u(500 000) + 0.1u(2 500 000) + 0.01u(0),

hence, grouping terms,

0.11u(500 000) > 0.1u(2 500 000) + 0.01u(0).

At the same time, it is reasonable to prefer an alternative providing 2 500 000e with probability 0.1 and 0e with probability 0.9 to an alternative providing 500 000e with probability 0.11 and 0e with probability 0.89. In this case, the expected utility model yields

0.1u(2 500 000) + 0.9u(0) > 0.11u(500 000) + 0.89u(0),

hence, grouping the terms,

0.1u(2 500 000) + 0.01u(0) > 0.11u(500 000),

which is in contradiction with the inequality obtained above. So, the expected utility model cannot explain the two previous preference situations simultaneously.

A possible attitude in this case is to consider that the decision-maker should revise his judgement in order to be more “rational”, that is, in order to satisfy the axioms of the model. Another interpretation is that the expected utility approach sometimes implies unreasonable constraints on the preferences of the decision-maker (in the previous example, the violated property is the so-called independence axiom of von Neumann and Morgenstern). This last interpretation led scientists to propose many variants of the expected utility model, as in Kahneman and Tversky (1979), Machina (1982, 1987), Bell et al. (1988) and Barbera et al. (1998).

Before explaining why the expected utility model (or one of its variants) was not applied by the analyst in the electricity production planning problem, let us mention why using probabilities may cause some trouble in modelling uncertainty or risk. The following example illustrates the so-called Ellsberg paradox and is extracted from Fishburn (1970, p. 172). An urn contains one white ball (W) and two other balls. You only know that the two other balls are either both red (R), or both green (G), or one is red and one is green. Consider the two situations in Table 8.3, where W, R and G represent the three states according to whether the ball drawn at random is white, red or green. The figures are what you will be paid (in Euros) after you make your choice and a ball is drawn.

W R G
A 100 0 0
B 0 100 0

W R G
C 100 0 100
D 0 100 100

Table 8.3

Intuition leads many people to prefer A to B and D to C, while the expected utility approach leads to indifference between A and B as well as between C and D. Indeed, with the natural symmetric beliefs p(R) = p(G) = 1/3, A and B (and likewise C and D) have the same expected utility; more generally, any subjective beliefs making A preferable to B also make C preferable to D.

This type of situation shows that the use of the probability concept may bedebatable for representing attitude towards risk or uncertainty; other tools (pos-sibility theory, belief functions or fuzzy integrals) can also be envisaged.


Events Probab. C D
U 1/4 7 4.5
V 3/4 4.5 5.5

Table 8.4

8.4.5 The approach applied in this case: first step

We will now present the approach that was applied in the electricity production planning problem. This approach is certainly not ideal (some drawbacks will be pointed out in the presentation). However, it does not aggregate the multiple criteria consequences of the decisions into a single dimension, thus avoiding some of the pitfalls mentioned in Chapter 6 on multi-attribute value functions. Moreover, it does not introduce a discounting rate for the dynamic aspect (see Chapter 5) and it makes it possible to model the particular preferences of the decision-maker along each evaluation scale.

In the electricity production planning problem described in Section 8.3, the analyst did not know whether the probabilities given by the company were really probabilities (and not “plausibility coefficients”), and it was not certain that the consequences of one scenario were really comparable to the consequences of another. On the one hand, transforming all the consequences into money and aggregating them with a discounting rate (as in Chapter 5) was definitely excluded. On the other hand, the company was not prepared to devote much time to the clarification of the probabilities and to long discussions about the multiple criteria and dynamic aspects of the problem, so that it was impossible to envisage an enriched variant of the expected utility model. The analyst decided to propose a paired comparison of the actions, scenario by scenario, as illustrated below for the didactic example presented in Figure 8.2.

At node N2, we have to consider Table 8.4. The comparison between C and D was made on the basis of the differences in preference between them for each of the considered events, similarly to what is done in the Promethee method (Brans and Vincke 1985). Let us consider a preference function defined by

f(x) = 1 if x > 1, and f(x) = 0 otherwise,

where x is the difference in the evaluations of two decisions. Other functions can be defined, similarly to what is done in the Promethee method. This function expresses the fact that a difference smaller than or equal to 1 is considered not significant. As we see, an advantage of this approach is that it enables the introduction of indifference thresholds.

The analyst proposed the following index to measure the preference of C over D, on the basis of the data contained in Table 8.4:

1/4× f(7− 4.5) + 3/4× f(4.5− 5.5) = 1/4,


C D
C 0 1/4
D 0 0

Table 8.5

Events Probab. C D
Y 3/4 4.5 1
Z 1/4 4.5 5

Table 8.6

while the preference of D over C is given by

1/4× f(4.5− 7) + 3/4× f(5.5− 4.5) = 0.

These preference indices are summarised in Table 8.5. The score of each decision is then the sum of the preferences of this decision over the other minus the sum of the preferences of the other over it. In the case of Table 8.5, this trivially gives 1/4 and −1/4 as the respective scores of C and D. The maximum score determines the chosen decision. So, the chosen decision at node N2 is C. Remark that, despite the analyst’s doubt about the real nature of the “probabilities”, he used them to calculate a sort of expected index of preference for each decision over each other decision. This is certainly a weak point of the method, and other tools, which will be described in a volume in preparation, could have been used here. Note also that, in the multiple criteria case, a (possibly weighted) sum is computed over all the criteria in order to obtain the global score of a decision.
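The node-by-node computation is simple enough to state as code. Here is a minimal sketch (ours, not the analyst’s implementation) of the first-step procedure at one decision node, using the preference function f defined above:

```python
def f(x):
    """Preference function of the text: differences of at most one
    unit are considered not significant."""
    return 1.0 if x > 1 else 0.0

def best_decision(probs, table):
    """First step at one node: probs are the event probabilities and
    table maps each decision to its evaluations, one per event."""
    names = list(table)
    pref = {(a, b): sum(p * f(xa - xb)
                        for p, xa, xb in zip(probs, table[a], table[b]))
            for a in names for b in names if a != b}
    score = {a: sum(pref[a, b] - pref[b, a] for b in names if b != a)
             for a in names}
    return max(score, key=score.get), score

# Node N2 (Table 8.4) and node N5 (Table 8.8):
print(best_decision([1/4, 3/4], {'C': [7, 4.5], 'D': [4.5, 5.5]}))
# -> ('C', {'C': 0.25, 'D': -0.25})
print(best_decision([3/4, 1/4], {'E': [6, 1], 'F': [2, 2], 'G': [5, 5]}))
# -> ('G', {'E': 0.5, 'F': -1.75, 'G': 1.25})
```

The two calls reproduce the scores derived in the text for nodes N2 and N5.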

At node N3, we have to consider Table 8.6, leading to the preference indices presented in Table 8.7. For example, the preference index of C over D is

3/4× f(4.5− 1) + 1/4× f(4.5− 5) = 3/4.

The scores of C and D are respectively 3/4 and −3/4, so that the chosen decision at node N3 is also C.

C D
C 0 3/4
D 0 0

Table 8.7

At node N4, decision E dominates F and G and is thus chosen (where “dominates” means “is better in each scenario”).

At node N5, we must consider Table 8.8.

Probab. E F G
Y 3/4 6 2 5
Z 1/4 1 2 5

Table 8.8

The preference index of G over E (for example) is

3/4 × f(5 − 6) + 1/4 × f(5 − 1) = 1/4.

E F G
E 0 3/4 0
F 0 0 0
G 1/4 1 0

Table 8.9

The other preference indices are presented in Table 8.9; they yield 1/2, −7/4 and 5/4 as the respective scores of E, F and G, so that G is the chosen decision at node N5.

We can now consider Table 8.10, associated with N1. The values in this table are those that correspond to the chosen decisions at nodes N2 to N5 (indicated in parentheses).

On the basis of this table, the preference of A over B is

1/8f(3.5) + 3/8f(−1) + 3/8f(−0.5) + 1/8f(−0.5) = 1/8,

while the preference of B over A is

1/8f(−3.5) + 3/8f(1) + 3/8f(0.5) + 1/8f(0.5) = 0,

giving A as the best first decision.

In conclusion, the “optimal action” obtained through this first step consists in choosing A at the beginning of the first period and C at the beginning of the second period.

Scenarios Probab. A B
S-U 1/8 7 (C) 3.5 (E)
S-V 3/8 4.5 (C) 5.5 (E)
T-Y 3/8 4.5 (C) 5 (G)
T-Z 1/8 4.5 (C) 5 (G)

Table 8.10

This approach makes it possible to take into account the comparisons of the decisions separately for each scenario. Let us illustrate this point with the example of Figure 8.4, where 9 has been replaced by 10 in the evaluation of B for event U. If the probabilities of S, T and U are equal to 1/3, the expected utility approach gives the same value (1/3)(u(10) + u(15) + u(20)) to A and B, which are thus considered indifferent. However, if we compare A and B separately for each event, we see that B is better than A for events S and T, that is, with a total probability of 2/3. The approach described in this section gives a preference index of A over B equal to

1/3× f(10− 15) + 1/3× f(15− 20) + 1/3× f(20− 10)

and a preference index of B over A equal to

1/3× f(15− 10) + 1/3× f(20− 15) + 1/3× f(10− 20).

With the same function f as before, this will lead to the choice of B. Making the (natural) assumption that f(x) = 0 when x is negative, we see that this approach will lead to indifference between A and B only with a function f such that f(20 − 10) = f(15 − 10) + f(20 − 15).

8.4.6 Comment on the first step

As this approach is based on successive pairwise comparisons, it also presents some pitfalls, which must be mentioned. The example presented in Figure 8.6 illustrates a first drawback. In this example, three periods of time are considered, but there are no uncertainties during the first two periods. Two decisions A and B are possible at the beginning of the first period. At the beginning of the second period, two decisions C and D are possible after A and only one decision is possible after B. At the beginning of the third period, two decisions E and F are possible after C, while only one decision is possible in each of the other cases. During the last period, three events S, T and U can occur, each with a probability of 1/3. Let us apply the approach described in Section 8.4.5 with the same function f.

[Figure 8.6: A pitfall of the first step — the three-period tree just described; the terminal values are those reported in Tables 8.11 to 8.13.]

At node N4, the preference index of E over F will be

1/3× f(10− 15) + 1/3× f(15− 20) + 1/3× f(20− 0) = 1/3,

while the preference index of F over E will be

1/3× f(15− 10) + 1/3× f(20− 15) + 1/3× f(0− 20) = 2/3,

so that F will be the decision chosen at node N4.

At node N2, we must consider Table 8.11, where the values of C are those of F (the decision chosen at node N4).

Events Probab. C D
S 1/3 15 20
T 1/3 20 0
U 1/3 0 5

Table 8.11

On the basis of Table 8.11, we compute the preference index of C over D by

1/3× f(15− 20) + 1/3× f(20− 0) + 1/3× f(0− 5) = 1/3,

and the preference of D over C by


1/3× f(20− 15) + 1/3× f(0− 20) + 1/3× f(5− 0) = 2/3,

so that D will be the decision chosen at node N2.

At node N1, we must consider Table 8.12, where the values of A are those of D (the decision chosen at node N2).

Events Probab. A B
S 1/3 20 0
T 1/3 0 5
U 1/3 5 10

Table 8.12

On the basis of Table 8.12, the preference index of A over B is given by

1/3× f(20− 0) + 1/3× f(0− 5) + 1/3× f(5− 10) = 1/3,

while the preference index of B over A is

1/3× f(0− 20) + 1/3× f(5− 0) + 1/3× f(10− 5) = 2/3,

so that B will be chosen at node N1.

In conclusion, the methodology leads to the choice of action B despite the fact that it is dominated by the action (A,C,E), as shown in Table 8.13.

Events Probab. B (A,C,E)
S 1/3 0 10
T 1/3 5 15
U 1/3 10 20

Table 8.13

This is due to the fact that the comparisons are “too local” in the tree. In the concrete application described in this chapter, another drawback was that, for decisions at nodes relative to the last periods, the evaluations were not very different, owing to the large common part of the actions and scenarios preceding these decisions. The result was many indifferences between the decisions at each decision node.

To improve the methodology, the analyst proposed to introduce a second step, which is the subject of the next section.

8.4.7 The approach applied in this case: second step

In order to introduce more information into the comparisons of local decisions and to take the tree as a whole into account, a second step was added by the analyst.


Events Probab. C D E(N4)
U 1/4 7 4.5 3.5
V 3/4 4.5 5.5 5.5

Table 8.14

At each decision node, the local decisions are also compared to the best actions, in the same scenarios, in the other branches of the tree.

In Figure 8.2, at node N2, C and D are also compared to the best decision at N4, i.e. to E (after event S).

This leads to the consideration of Table 8.14. Using the same preference function as before, the preference of C over D is still 1/4 (see Section 8.4.5), the preference of D over C is still 0, the preference of C over E is [1/4 × f(3.5) + 3/4 × f(−1)] = 1/4, the preference of E over C is [1/4 × f(−3.5) + 3/4 × f(1)] = 0, the preference of D over E is [1/4 × f(1) + 3/4 × f(0)] = 0 and the preference of E over D is [1/4 × f(−1) + 3/4 × f(0)] = 0.

Table 8.15 summarises these values.

C D E
C 0 1/4 1/4
D 0 0 0
E 0 0 0

Table 8.15

The scores of C and D are respectively 1/2 and −1/4; C is therefore chosen at node N2.
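In code, the second step only changes which decisions receive a score: the reference actions enter the pairwise comparisons but not the final ranking. A minimal variant of the earlier sketch (ours):

```python
def best_local_decision(probs, table, local):
    """Second step at one node: the reference actions (extra columns of
    table) enter the comparisons, but only decisions in `local` are scored."""
    names = list(table)
    pref = {(a, b): sum(p * f(xa - xb)
                        for p, xa, xb in zip(probs, table[a], table[b]))
            for a in names for b in names if a != b}
    score = {a: sum(pref[a, b] - pref[b, a] for b in names if b != a)
             for a in local}
    return max(score, key=score.get), score

# Node N2 of Figure 8.2 with E (best decision at N4) as reference (Table 8.14):
print(best_local_decision([1/4, 3/4],
                          {'C': [7, 4.5], 'D': [4.5, 5.5], 'E': [3.5, 5.5]},
                          local=['C', 'D']))
# -> ('C', {'C': 0.5, 'D': -0.25})
```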

At node N3, we compare C and D with the best decision at N5, i.e. with G (after event T), on the basis of Table 8.16.

Events Probab. C D G
Y 3/4 4.5 1 5
Z 1/4 4.5 5 5

Table 8.16

Table 8.17 gives the preference indices.

C D G
C 0 3/4 0
D 0 0 0
G 0 3/4 0

Table 8.17

The scores of C and D are respectively 3/4 and −3/2, so that C is also chosen at N3.

The analysis of N4 (comparison of E, F, G and C (N2)) and of N5 (comparison of E, F, G and C (N3)) leads to the same conclusions as in the first step, so that, in this example, the second step does not change anything.

However, the interest of this second step is to choose, at each decision node, a decision leading to a final result that is strong not only locally, but also in comparison with the strongest results obtained during the first step in the other branches of the tree (always in the same scenarios). This is illustrated by the example in Figure 8.6, where the second step works as follows. At node N4, we compare E and F with D and B (the best actions in the other branches, as they are unique), through Table 8.18.

Prob. E F D B
1/3 10 15 20 0
1/3 15 20 0 5
1/3 20 0 5 10

Table 8.18

Table 8.19 presents the preference indices.

E F D B
E 0 1/3 2/3 1
F 2/3 0 1/3 2/3
D 1/3 2/3 0 1/3
B 0 1/3 2/3 0

Table 8.19

The scores of E and F respectively become 1 and 1/3, so that the best decision at N4 is now E.

At N2, we have to compare C (followed by E) with D and B (the best actions in the other branches): the scores of C and D are respectively 4/3 and −2/3, so that the best decision at N2 is now C.

At N1, we have to compare A (followed by C and E) with B, and we choose A (which dominates B). So we see that this second step somehow avoids choosing dominated actions, although this property is not guaranteed in all cases.

8.5 Conclusions

This approach (first and second steps) was successfully implemented and applied by the company (after many difficulties due to the combinatorial aspects of the problem) and some visual tools were developed in order to facilitate the decision-maker’s understanding of the problem.

Let us now summarise the characteristics of this approach. It presents the following advantages:

• it compares the consequences of a decision in a scenario with the consequences of another decision in the same scenario;

• it makes it possible to introduce indifference thresholds or, more generally, to model the preferences of the decision-maker on each evaluation scale.

However, this approach also presents some mysterious aspects that should be more thoroughly investigated:

• it computes a sort of expected preference index of each action over each other action, although the role of the so-called probabilities in the modelling of uncertainty is not that clear;

• it is a rather bizarre mixture of local (first step) and global (second step) comparisons of the actions, and it does not guarantee that the chosen action is non-dominated.

The literature on the management of uncertainty is probably one of the most abundant in decision analysis. Besides the expected utility model (the traditional approach), a lot of other approaches were studied by many authors, such as Dekel (1986), Jaffray (1989), Munier (1989), Quiggin (1993), Gilboa and Schmeidler (1993), . . . They pointed out more or less desirable properties: linearity, replacement separability, mixture separability, different kinds of independence, stochastic dominance, . . . Moreover, as mentioned by Machina (1989), it is important to make the distinction between what he calls static and dynamic choice situations. A dynamic choice problem is characterised by the fact that at least one uncertainty node is followed by a decision node (this is typically the case in the application described in this chapter). In such a context, an interesting property is the so-called dynamic consistency: a decision-maker is said to be dynamically inconsistent if his actual choice when arriving at a decision node differs from his previously planned choice for that node.

[Figure 8.7: The dynamic consistency — a chance node leads with probability 0.2 to decision node N1 and with probability 0.8 to a payoff of 0; at N1, decision A yields 50 with probability 0.5 and 0 with probability 0.5, while decision B yields 10 with certainty.]

Let us illustrate this concept by a short example. Assume that a decision-maker prefers a game where he wins 50e with probability 0.1 (and nothing with probability 0.9) to a game where he wins 10e with probability 0.2 (and nothing with probability 0.8). At the same time, he prefers to receive 10e with certainty to a game where he wins 50e with probability 0.5 (and nothing with probability 0.5). Note that these preferences violate the independence axiom of von Neumann and Morgenstern. Now consider the tree of Figure 8.7.

According to the previous information, the actual choice of the decision-maker, at node N1, will be B. However, if he has to plan the choice between A and B before knowing the first choice of nature, he can easily calculate that if he chooses A, he wins 50e with probability 0.1 (and nothing with probability 0.9), while if he chooses B, he wins 10e with probability 0.2 (and nothing with probability 0.8), so that the best choice for him (before knowing the first choice of nature) is A.

So, the actual choice at N1 differs from the planned choice for that node, illustrating the so-called dynamic inconsistency. It can be shown that any departure from the traditional approach can lead to dynamic inconsistency. However, Machina (1989) showed that this argument relies on a hidden assumption concerning behaviour in dynamic choice situations (the so-called consequentialism) and argued that this assumption is inappropriate when the decision-maker is a “non-expected utility maximiser”.

This example shows that no approach can be considered ideal in the context of decision under uncertainty. As for the other situations studied in this book, each model and each procedure can present some pitfalls that have to be known by the analyst. Knowing the underlying assumptions of the decision-aid model that will be used is probably the only way for the analyst to guarantee an approach to the decision problem that is as scientific as possible. It is a fact that, due to lack of time and other priorities, many decision tools are developed in real applications without taking enough precautions (this is also the case in the example presented in this chapter, due to the tight deadlines and to the necessity of overcoming the combinatorial aspects of the problem). This is why we consider providing some guidelines for modelling a decision problem important to the analysts: this will be the subject of a volume in preparation.

9 SUPPORTING DECISIONS: A REAL-WORLD CASE STUDY

Introduction

In this chapter¹ we report on a real-world decision aiding process which took place in a large Italian firm, in late 1996 and early 1997, concerning the evaluation of offers following a call for tenders for a very important software acquisition. We will try to present extensively the decision process for which the decision support was requested, the actors involved and the decision aiding process, including the problem structuring and formulation, the evaluation model created and the multiple criteria method adopted. The reader should be aware of the fact that very few real-world cases of decision support are reported in the literature, although many more occur in reality (for noteworthy exceptions see Belton, Ackermann and Shepherd 1997, Bana e Costa, Ensslin, Correa and Vansnick 1999, Vincke 1992, Roy and Bouyssou 1993).

We introduce such a real case description for two reasons.

1. The first reason is our wish to give an account of what providing decision support in a real context means and to show the importance of elements such as the participating actors, the problem formulation, the construction of the criteria etc., often neglected in many conventional decision aiding methodologies and in operational research. From this point of view the reader may find questions already introduced in previous chapters of the book, but here they are discussed from a decision aiding process perspective.

2. The second reason is our wish to introduce the reader to some concepts and problems that will be extensively discussed in a forthcoming volume by the authors. Our objective is to stimulate the reader to reflect on how decision support tools and concepts are used in real life situations and how theoretical research may contribute to aiding real decision-makers in real decision situations.

More precisely, the chapter is organised as follows. Section 1 introduces and defines some preliminary concepts that will be used in the rest of the chapter, such as decision process, actors, decision aiding process, problem formulation, evaluation model etc. Section 2 presents the decision process for which the decision support was requested, the actors involved and their concerns (stakes), the resources involved and the timing. Section 3 describes the decision aiding process, mainly through the different “products” of such a process, which are specifically analysed (the problem formulation, the evaluation model and the final recommendation), and discusses the experience conducted. The clients’ comments on the experience are also included in this section. Section 4 summarises the lessons learned from this experience. All technical details are included in Appendix A (an ELECTRE-TRI type procedure is used), while the complete list of the evaluation attributes is provided in Appendix B.

¹ A large part of this chapter uses material already published in Paschetta and Tsoukias (1999).

9.1 Preliminaries

We will make extensive use of some terms (like actor, decision process etc.) in this chapter that, although present in the literature (see Simon 1957, Mintzberg, Raisinghani and Theoret 1976, Jacquet-Lagreze, Moscarola, Roy and Hirsch 1978, Checkland 1981, Heurgon 1982, Masser 1983, Humphreys, Svenson and Vari 1993, Moscarola 1984, Nutt 1984, Rosenhead 1989, Ostanello 1990, Ostanello 1997, Ostanello and Tsoukias 1993), can have different interpretations. In order to help the reader understand how such terms are used in this presentation, we introduce some informal definitions.

• Decision Process: a sequence of interactions amongst persons and/or organisations characterising one or more objects or concerns (the “problems”).

• Actors: the participants in a decision process.

• Client: an actor in a decision process who asks for support in order to define his behaviour in the process. The term decision-maker is also used in the literature and in other chapters of this book, but in this context we prefer the term client.

• Analyst: an actor in a decision process who supports a client in a specific demand.

• Decision Aiding Process: part of the decision process; more precisely, the interactions occurring at least between the client and the analyst.

• Problem Situation: a descriptive model of what happens in the decision process when the decision support is requested and what the client is expecting to obtain from the decision support (this is one of the products of the decision aiding process).

• Problem Formulation: a formal representation of the problem for which the client asked the analyst to support him (this is one of the products of the decision aiding process).

• Evaluation Model: a model creating a specific instance of the problem formulation for which a specific decision support method can be used (this is one of the products of the decision aiding process).


9.2 The Decision Process

In early 1996 a very large Italian company operating a network-based service decided, as part of a strategic development policy, to equip itself with a Geographical Information System (GIS) to which all information concerning the structure of the network and the services provided all over the country was to be transferred. However, since (at that time) this was quite a new technology, the company’s Information Systems Department (ISD) asked the affiliated research and development agency (RDA), and more specifically the department concerned with this type of information technology (GISD), to perform a pilot study of the market in order to orient the company towards an acquisition. The GISD of the RDA noticed that:

• the market offered a very large variety of software which could be used as a GIS for the company’s purposes;

• the company required a very particular version of a GIS that did not exist as a ready-made product on the market, but had to be created by customising and combining different modules of existing software, with the addition of ad-hoc software written for the purposes of the company;

• the question asked by the ISD was very general, but also very committing, because it included an evaluation prior to an acquisition and not just a simple description of the different products;

• the GISD felt able to describe and evaluate different GIS products based on a set of attributes (in the end, several hundred), but was not able to provide a synthetic evaluation, the purpose of which was just as obscure (the use of a weighted sum was immediately set aside because it was perceived as “meaningless”).

At this point of the process the GISD found out that a unit concerned with the use of the MCDA (Multiple Criteria Decision Analysis) methodology in software evaluation (MCDA/SE) was operating within the RDA, and presented this problem as a case study opening a specific commitment. The person responsible for the MCDA/SE unit then decided to activate its links with an academic institution in order to get more insight and advice on the problem, which soon appeared to exceed the knowledge level of the unit at that time. At this point we can make the following remarks.

• The decision process for which the decision aid was provided concerned the “acquisition of a GIS for X (the company)”. The actors involved at this level are the company’s IS manager, the acquisition (AQ) manager, the RDA, different suppliers of GIS software, and some of the company’s external consultants concerned with software engineering.

• A first decision aiding process was established where the client was the IS manager and the analyst was the GIS department of the RDA.


• A second decision aiding process was established where the client was the GIS department of the RDA and the analyst was the MCDA/SE unit. A third actor involved in this process was the “supervisor” of the analyst, in the sense of someone supporting the analyst in different tasks, providing him with expert methodological knowledge and framing his activity.

We will focus our attention on this second decision aiding process, where four actors are involved: the IS manager, the GISD (or team of analysts) as the client (bear in mind their particular position of clients and analysts at the same time), the MCDA/SE unit as the analyst, and the supervisor.

The first piece of advice from the analyst to the GISD was to negotiate a more specific commitment, such that their task could be more precise and better defined with their client. After such a negotiation, the GISD’s activity was defined as “technical assistance to the IS manager in a bid, concerning the acquisition of a GIS for the company” and its specific task was to provide a “technical evaluation” of the offers that were expected to be submitted. For this purpose the GISD drafted a decision aiding process outline where the principal activities to be performed, as well as the timing, were specified, and submitted this draft to its client (see Figure 9.1). At this point it is important to note the following.

1. The call for tenders concerned the acquisition of hundreds of software licenses, plus the hardware platforms on which such software was expected to run, the whole budget being several million e. From a financial point of view it represented a large stake for the company and a high level of responsibility for the decision-makers.

2. From a procedural point of view, the administration of a bid of this type is delegated to a committee, which in this case included the IS manager, the AQ manager, a delegate of the CEO and a lawyer from the legal staff. From such a perspective, the task of the GISD (and of the decision aiding process) was to provide the IS manager with a “global” technical evaluation of the offers that could be used in the negotiations with the AQ manager (inside the committee) and the suppliers (outside the committee).

3. As already noted, the bid concerned software that was not ready-made, but a collection of existing modules of GIS software which was expected to be used in order to create ad-hoc software for the specific needs of the company. Two difficulties arose from this:

• the a priori evaluation of the software behaviour and its performance without being able to test it on specific company-related cases;

• the timing of the evaluation (including testing the offers) could be extremely long compared with the rapidity of the technological evolution of this type of software.


[Figure 9.1: the bid process — a flowchart of the principal activities and their actors (technical advisor, client, supplier, advisor + client): preparation of the call for tenders; study of the client’s desired environment and of the methodology; definition of requirements, points of view and decision problem; problem formulation; first selection and invitation letter; call for tenders and first set of answers from the suppliers; completion of the decision model for the second selection and for the ranking (definition of criteria and aggregation procedure); lab preparation for prototype evaluation; second selection and second set of answers; definition of prototype requirements and prototype development; prototype analysis, sorting and final ranking; final choice.]


Once the call for tenders had been prepared (including the software requirements sections, the tenderers requirements section, the timing and the evaluation procedure), a set of offers was presented to the company and the technical evaluation activity was settled. It is interesting to notice that the GISD staff charged with this evaluation was “supported” by external consultants, software engineering experts in the company’s sector who practically acted as the IS manager’s delegates in the group. It is this extended group that signed the final recommendation presented to the IS manager and that we will hereafter call “team of analysts” (for the IS manager) or client (for the MCDA/SE unit and for us).

A second step in the decision aiding process was the generation of a problem formulation and of an evaluation model. Although we formally consider these as two distinct products of the process, in reality, and in this case specifically, they were generated simultaneously. We will discuss the problem formulation and the evaluation model in detail in the next section, but we can anticipate that the final formulation consisted in an absolute evaluation of the offers under a set of points of view that could be divided into two parts: the “quality evaluation” and the “performance evaluation”. Although the set of alternatives was relatively small (only six alternatives were considered), the set of attributes was extremely complex (as often happens in software evaluation). Actually there were seven basic evaluation dimensions, expanded into a hierarchy with 134 leaves, resulting in 183 evaluation nodes (see Appendix B).

A third and final step in the decision aiding process was the elaboration of the final recommendation, after all the necessary information for the evaluation had been obtained and the evaluation performed. We will discuss these constructions in detail in the next sections, but we can anticipate that this elaboration highlighted some questions (substantial and methodological) that had not been considered before.

Some months after the end of the process and the delivery of the final report, we asked our client (the team of analysts) to discuss their experience with us and to answer some questions concerning the methodology used, how they perceived it, what they learned and what their appreciation was. The discussion was conducted in a very informal way, but the client provided us with some written remarks that were also reported during a conference presentation (see Fiammengo, Buosi, Iob, Maffioli, Panarotto and Turino 1997). Such remarks are introduced in the following section.

9.3 Decision Support

We present the three products of the decision aiding process here: the problem formulation, the evaluation model and the final recommendation. We should remember that the problem formulation and a first outline of the evaluation model were established while the call for tenders was being prepared, for two reasons:

• for legal reasons, an outline of the evaluation model has to be included in the call for tenders;

• the evaluation model implicitly contains the software requirements of the offers, which in turn define the information to be provided by the tenderers. For instance, the call for tenders specified that a prototype was requested in order to test some of the performances. The tenderers therefore knew that they had to produce a prototype within a certain time frame. The choice to introduce some tests was made during the definition of the evaluation model.

9.3.1 Problem Formulation

From the presentation of the process we can make the following observations:

1. It was extremely important for the client (the team of analysts) to understand his role in the process, what his client expected and what they were able to provide. In fact, at the beginning of the process, the problem situation was absolutely unclear. Moreover, the client came to understand that the expectations of the other actors involved in the process were extremely relevant, both for strategic reasons (having to do with organisational problems of the company) and for operational reasons (recommending something reliable in a clear and sound way for all the actors involved in the bid).

Reporting the client’s remarks: “....MCDA (Multi Criteria Decision Analysis) was very useful in organising the overall process and structure of the bid: what were the important steps to do, how to define the call for tenders,....”, “....MCDA was used as a background for the whole decision process. With such a perspective it turned out to be very useful because every activity had a justification....”, “....as a formal process MCDA guaranteed greater control and transparency to the process....”, “A complex process, such as a bid, could be greatly eased by the use of any process centred methodology.”

It is this last sentence which clearly highlights the necessity for the client to have support along the whole process and for all its aspects, support able to take into account what was happening in the decision process. We actually agree with their comment that “any process centred methodology” could be useful, and we consider that their positive perception of MCDA is based on the fact that it was the first such decision support approach they came to know.

2. We recall the client’s remarks: “....as a formal approach MCDA generated greater control and transparency....”. Complex decision processes are based on human interactions, and these rely on the intrinsic ambiguity of human communication (thanks to ambiguity, human communication is also very efficient). However, such ambiguity might make it impossible to understand each other and, ultimately, to propose viable solutions. Moreover, when significant stakes are involved (as in our case), decision-makers may consider it dangerous to make a decision without having a clear idea of the consequences of their acts. The use of a formal approach enables the reduction of ambiguity (without completely eliminating it) and thus appears to be an important support to the decision process.

It is clear that defining a precise problem formulation became a key issue for the client, because it clarified his role in the decision process (the bid management) and his relation with the IS manager (his client), and gave him a precise activity to perform.

We define (Morisio and Tsoukias 1997) a problem formulation as the collection of a set of actions, a set of points of view and a problem statement. The only point that caused a discussion in the team of analysts concerning the problem formulation was the problem statement. The set of alternatives was considered to be the set of offers submitted after the call for tenders. A first idea, to evaluate the tenderers as well as the offers, was eliminated due to the particular technology, for which no consolidated producers exist. The set of points of view was defined using the team of analysts’ technical knowledge and can be divided into two basic sets: one concerning “quality”, including specific technical features required for the software plus some dimensions based on ISO/IEC 9126 (1991), and a second concerning the performance of the offered software, to be tested on prototypes. Such points of view formed a huge hierarchy (see further on for details). No cost estimates were required by the client, so they were not considered in this set.

After some discussion, the problem statement adopted was that of an “absolute” evaluation of the offers, both at a disaggregated level and at a global one. Actually, the team of analysts interpreted the client’s demand as a question of whether the offers could be considered intrinsically “good”, “bad” etc., and not as a request to compare the bids amongst themselves. There were two reasons for this choice.

1. A simple ranking of the offers could conceal the fact that all of them could be of very poor quality or satisfy the software requirements to a very low level. In other words, it could happen that the best bid could be “bad”, and this was incompatible with the importance and cost of the acquisition.

2. The team of analysts felt uncomfortable with the idea of comparing the merits (or de-merits) of an offer with the merits (or de-merits) of another offer. A first informal discussion of the problem of compensation convinced them to overcome this question by comparing the offers to profiles about which they had sufficient knowledge.

If we interpret the concept of measurement in a wide sense (comparing the offers to pre-established profiles can be viewed as a measurement procedure), the result that the team of analysts was looking for appeared to be the conclusion of repeated aggregations of measures. Using the terminology introduced by Roy (1996), the problem statement appeared to be a hierarchically organised sorting of the offers, the sorting being repeated at all levels of the hierarchy.

As far as the problem formulation is concerned, an ex-post remark made by the team of analysts concerned the length of the evaluation process. They considered that this process was so long that the information available at the beginning, and the formulation itself, could no longer be valid at the end of the process. This was partly due to the very rapid evolution of GIS technology, which could completely renew the state of the art within six months. Another observation made by part of the team of analysts was that towards the end of the process, due to the knowledge acquired in this period (mainly through the process itself), they could have revised some of their judgements. Actually, the length of the evaluation was considered a negative critical issue in the client’s remarks.

The final report did not consider any revision of the formulation and the evaluations since, in the context of a call for tenders, it could be considered unfair to modify the evaluations just before the final recommendation.

We consider that this is a critical issue for decision support and decision aiding processes. Information is valid only for a limited period of time and, consequently, the same is true for all evaluations based on such information. Moreover, the client himself may revise the problem formulation or update his perception of the information and modify his judgements. This is rarely considered in decision aiding methodologies. While for relatively short decision aiding processes the problem may be irrelevant, it is certain that in long processes such a problem cannot be neglected and requires specific consideration.

9.3.2 The Evaluation Model

The different components of the evaluation model were specified in an iterative fashion. In the following we present their definitions as they occurred in the decision aiding process. We may notice that, despite the fact that we had a large amount of information to handle in our model, the case did not present any exogenous uncertainty, since the client considered the basic data and their judgements reliable and felt confident with them.

The set of alternatives was identified as the set of offers legally accepted by the company in reply to the call for tenders. No preliminary screening of the offers was expected to be made. Although each offer was composed of different modules and software components, the offers were considered as wholes.

The set of evaluation dimensions was a complex hierarchy with seven root nodes, 134 leaves and 183 nodes in total (the complete list is available in Appendix B). This is a typical situation in software evaluation (see Morisio and Tsoukias 1997, Blin and Tsoukias 1998, Stamelos and Tsoukias 1998). The key idea was that each node of the hierarchy was an evaluation model in itself, for which the evaluation dimensions to aggregate and the aggregation procedure had to be defined. Each node was subject to extensive discussion before arriving at a final version. Basically two issues were considered in such discussions:
- the choice of the attributes to use;
- the semantics of each attribute.

Regarding the first issue, a frequent attitude of technical committees charged with evaluating complex objects (as in our case) is to define an “excellence list” where every possible aspect of the object is considered. Such a list is generally provided by the literature, experience, international standards etc. The result is that such a list is an abstract collection of attributes, independent of the specific problem at hand, thus containing redundancies and conceptual dependencies which can invalidate the evaluation. Our client was aware of the problem, but had neither the knowledge nor the tools to simplify and reduce the first version of the list they had defined. The repeated use of a coherence test (in the sense of Roy and Bouyssou 1993) for each intermediate node of the hierarchy made it possible to eliminate a significant number of redundant and dependent attributes (more than 30%) and to better understand the semantics of each attribute used. Verifying the separability of each sub-dimension with respect to the parent node was very helpful, in the sense that each sub-node should be able, alone, to discriminate the offers with respect to the evaluation considered at the parent level.

Despite this work, the client wrote, in his ex-post considerations: “....it was not necessary to be so detailed in the evaluation; the whole process could be faster because we needed the software for a due date; it could be preferable to use a limited number of criteria....”. On the other hand, it is also true that it is only after the process that the client was able to determine which were the really significant criteria that discriminated among the alternatives.

With respect to the second issue, we pushed the client to provide us with a short description of each attribute and, when a preference model was associated with it, a short description of the model (why a certain value was considered better than another). Such an approach helped the client both to eliminate redundancies (before using the coherence test, which is time consuming) and to better understand the contents of the evaluation model.

For instance, at a certain point in the hierarchy definition process, there was a discussion about some attributes that could also be considered as leaves at the top level of the hierarchy. These were the so-called “process attributes”, i.e. they were intended to evaluate special functionality inside different processes (in this context “process” means a chunk of functionality aimed at supporting a stream of activities of a software). In fact, one can consider a process attribute (at the final level) and then subdivide it into quality aspects, or alternatively consider single independent quality aspects whose evaluation depends on how the process attribute is considered. The final choice was to put the process attributes at the top level because they directly emanate from the scope of the evaluation.

Such an activity also helped the client to realise that they needed an absolute evaluation of the alternatives for almost all the intermediate nodes of the hierarchy, thus implicitly defining the problem statement of the model.

The basic information available was of the “subjective ordinal measurement” type. By this term we mean that each alternative could be described by a vector of 134 elementary pieces of information which were, in the large majority, either subjective evaluations by experts (mostly part of the team of analysts, the client) of the “good”, “acceptable” etc. type, or descriptions of the “operating system X”, “compatible with graphic engine Y” etc. type. The latter were expressed on nominal scales, while the former were expressed on ordinal scales. It was almost impossible for the experts to give more information than such an order, and it was exactly this type of information that pushed the client to look for an evaluation model other than the usual weighted sum widely diffused in software evaluation manuals and standards (see ISO/IEC 9126 1991, IEEE 92 1992).

Obtaining the information was not a difficult task, but a time-consuming process that required the establishment of an ad-hoc procedure during the process (see Figure 9.1). We consider that this is also a critical issue in a decision aiding process. Gathering the relevant information for an evaluation model is often considered a second-level activity and therefore receives no further specific consideration. But such a problem can invalidate the problem formulation adopted. Moreover, the information used in an evaluation model results from the manipulation of the rough information available at the beginning of the process. We can consider that the information is constructed during the decision aiding process and cannot be viewed as a simple input.

Before continuing the definition of the model associated with each node, the problem of the aggregation procedure was faced, since it could influence the construction of such models. An important discussion with the client concerned the distinction between measures and preferences.

As already reported, the basic information consisted either in observations concerning the offers (expressed on nominal scales) or in expert judgements (expressed on ordinal value scales of the “good”, “acceptable” etc. type). All the intermediate nodes were expected to provide information of the second type. Clearly all nominal scales had to be transformed into ordinal ones, by associating a preference model with the elements of the nominal scale of the attribute. From such a perspective, it was important for the client to understand what they were expressing their preferences on.

Actually, the client did not compare the alternatives amongst themselves, but to standards of “good”, “acceptable” etc. defined a priori (by the client). When they were asked to formulate preferences, these concerned the elements of the nominal scales and not the alternatives themselves. The preference among the alternatives was expected to be induced once the alternatives could be “measured” on the attributes.

From a certain point of view we can claim that, except for the final aggregation level, the client needed to aggregate ordinal measures and not preferences (in the sense that they had to aggregate the ordinal measures obtained when comparing the alternatives to the standards, and not to compare the alternatives amongst themselves). Such an observation greatly helped the client to understand the nature and scope of the evaluation model and ultimately to define the problem statement of the model. Moreover, the discussion on the different typologies of measurement scales helped the client to understand the problem of choosing an appropriate aggregation procedure.

In our case, the presence of ordinal information for almost all leaves and the problem statement that required a “repeated sorting” of the offers oriented the team of analysts towards an aggregation procedure based on the ELECTRE-TRI method (see Yu 1992). See also Appendix A for a presentation of the procedure.

At this point the team was ready to define the specific evaluation models for all nodes. In particular we had the following cases.

1. For all leaf nodes an ordinal scale was established. The available technical knowledge consisted in the different possible “states” in which an offer could find itself. For instance, consider the leaf nodes 1.1.1 (type of presentation of the user interface in the land-base management), 1.1.2 (graphic engine of the user interface in the land-base management) and 1.1.3 (customisation of the user interface in the land-base management). The possible states on these characteristics were:
1.1.1: standard graphics (SG), non standard graphics (NSG);
1.1.2: station M (M; the graphic engine already adopted in other software used in the company), other acceptable graphic engine (OA), other non acceptable graphic engine (ON);
1.1.3: availability of a graphic tool (T), availability of an advanced graphic language (E), availability of a standard programming language (S), no customisation available (N). In this case different combinations of states were possible (for instance a software could provide both an advanced graphic language and a standard programming language: value E,S).
The three ordinal scales associated with the three nodes were (≻ representing the scale order):
1.1.1: SG ≻ NSG;
1.1.2: M ≻ OA ≻ ON;
1.1.3: T,E,S ≻ T,E ≻ T,S ≻ T ≻ E,S ≻ E ≻ S ≻ N.

2. For all parent nodes, a brief descriptive text of what the node was expected to evaluate was provided. All parent nodes were equipped with the same number of classes: unacceptable (U), acceptable (A), good (G), very good (VG), excellent (E). Then, two possibilities for defining the relationship between the values on the sub-nodes and the values on the parent nodes were established.

2.1 When possible, an exhaustive combination of the values of the sub-nodes was provided. For instance, consider node 1.1 (user interface of the land-base management), which has the three evaluation models introduced in the previous example as sub-nodes. In this case we have the following evaluation model (a lookup sketch of this rule appears after this list):
- E: T,E,S;M;SG or T,E;M;SG or T,S;M;SG;
- VG: T;M;SG or T,E,S;OA;SG or T,E;OA;SG or T,S;OA;SG;
- G: T;OA;SG or E,S;M;SG or E;M;SG;
- A: all remaining cases except the unacceptable ones;
- U: all cases where 1.1.1 is NSG or 1.1.2 is ON or 1.1.3 is N.

2.2 When an exhaustive combination of the values was impossible, an ELECTRE-TRI procedure was used. For this purpose, the following information was requested:
- the relative importance of the different sub-nodes;
- the concordance threshold for the establishment of the outranking relation among the offers and the profiles;
- a veto condition on a sub-node such that the value on the parent node could be limited (possibly to unacceptable).

The relative importance of the sub-nodes and the concordance threshold were established using a reasoning on coalitions (for details see Chapter 6). In other words, the team of analysts established the characteristics of the sub-nodes for which an offer could be considered very good (and therefore should outrank the very good profile) and consequently fixed the values of the relative importance parameters and of the concordance threshold. The veto condition was established as the presence of the value “unacceptable” at a sub-node. The presence of a veto also produced an “unacceptable” value at the level of the parent node. In other words, the team of analysts considered any “unacceptable” value to be a severe technical limitation of the offer. The reader may notice that this is a very strong interpretation of a veto condition among the ones used in outranking based sorting procedures, but it was the one with which the team of analysts felt comfortable at the time of the construction of the evaluation model. The team of analysts also established very high concordance thresholds (never less than 80%, very often around 90%) that result in very severe evaluations. Such a choice reflected the conviction, of at least a part of the team of analysts, that very strong reasons were required to qualify an offer as very good. Since the whole model was calibrated starting from the very good value, this conviction had wider effects than the team of analysts could imagine. As an example, take node 1 (land-base management), which has eight sub-nodes:
1.1: User interface;
1.2: Functionality;
1.3: Development environment;
1.4: Administration tools;
1.5: Work flow connection;
1.6: Interoperability;
1.7: Integration between land-base products and the Spatial Data Manager;
1.8: Integration among land-base products.
The relative importance parameters were established as follows: w(1.1) = 4, w(1.2) = 8, w(1.3) = 5, w(1.4) = 4, w(1.5) = 1, w(1.6) = 4, w(1.7) = 8, w(1.8) = 2, and the concordance threshold was fixed as 29/36 (around 0.8). Such choices imply that no coalition that excluded nodes 1.2 or 1.7 was acceptable, and that the smallest acceptable coalitions necessarily included the nodes 1.2, 1.7 and 1.3 plus any two of the nodes 1.1, 1.4 and 1.6 (see the sketches below). The analyst and the supervisor explained this aspect to the client who, on this basis, revised the importance parameters several times.
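Both kinds of node models are easy to make operational. First, a minimal sketch (ours, not part of the case) of the exhaustive combination rule of node 1.1 above, written as a simple lookup; values are triples (state of 1.1.3, state of 1.1.2, state of 1.1.1), following the order used in the rules:

    RULES = {
        "E":  [("T,E,S", "M", "SG"), ("T,E", "M", "SG"), ("T,S", "M", "SG")],
        "VG": [("T", "M", "SG"), ("T,E,S", "OA", "SG"),
               ("T,E", "OA", "SG"), ("T,S", "OA", "SG")],
        "G":  [("T", "OA", "SG"), ("E,S", "M", "SG"), ("E", "M", "SG")],
    }

    def node_1_1(c113, c112, c111):
        if c111 == "NSG" or c112 == "ON" or c113 == "N":
            return "U"                        # the unacceptable cases
        for value, combos in RULES.items():
            if (c113, c112, c111) in combos:
                return value
        return "A"                            # all remaining cases

Likewise, the coalition reasoning for node 1 can be checked by enumeration. The following sketch uses the importance parameters given above (summing to 36) and the concordance threshold 29/36:

    from itertools import combinations

    w = {"1.1": 4, "1.2": 8, "1.3": 5, "1.4": 4,
         "1.5": 1, "1.6": 4, "1.7": 8, "1.8": 2}

    winning = [set(c) for r in range(1, len(w) + 1)
               for c in combinations(w, r)
               if sum(w[n] for n in c) >= 29]       # threshold 29/36

    # no coalition excluding node 1.2 or node 1.7 reaches the threshold
    assert all({"1.2", "1.7"} <= c for c in winning)

    # the smallest winning coalitions contain 1.2, 1.7 and 1.3 plus any
    # two of the equally weighted nodes 1.1, 1.4 and 1.6
    size = min(len(c) for c in winning)
    print(sorted(sorted(c) for c in winning if len(c) == size))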

3. As already mentioned, the set of dimensions was built around two basic points of view: “quality” and “performance”. The first generated six evaluation dimensions, hereafter called the “quality attributes” (or “quality criteria”, or the “quality part of the hierarchy”), corresponding to six (among seven) of the root nodes of the model. The seventh root node (node 7, with sub-nodes 7.1, 7.2, 7.3 and 7.4) concerned the evaluation of the performances of the prototypes submitted to tests by the team of analysts. Such performances are basically measured by the time necessary to execute a set of specific tasks under certain conditions and with some external fixed parameters. For instance, consider node 7.3 (performance under load). This dimension is expected to evaluate the performance of the prototype while the quantity of data to be elaborated increases. The value v(x) (x being an offer) combines an observed measure Wx(t) and an interpolated one Tx(t) (t representing the data load; the interpolation is not necessarily linear). The combination is obtained, in this case, through the following formula:

$$v(x) = \int W_x(t)\, T_x(t)\, dt$$

In this case there are no external profiles with which to compare the performances, because the prototypes are created ad-hoc, the technology is quite new and there are no standards of what a “very good” performance could be. An ordinal scale was therefore created, considering the best performances as “first”, all performances presenting a difference of more than 5% and less than 20% “second”, all performances presenting a difference of more than 20% and less than 25% “third”, all performances presenting a difference of more than 25% and less than 50% “fourth” and all performances presenting a difference of more than 50% “fifth”. The same model was applied to all sub-nodes of node 7. A sorting procedure could then be established to obtain the final evaluation, as sketched below.
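To fix ideas, here is a minimal sketch of this performance model. It rests on assumptions the text leaves open: Wx and Tx are sampled on a common grid of data loads, execution times are costs (smaller is better), the “difference” is the relative gap to the best offer, and gaps of at most 5% count as “first”:

    def integral(y, t):
        # trapezoidal rule for the integral of y(t) over the sampling grid t
        return sum((t[i + 1] - t[i]) * (y[i + 1] + y[i]) / 2.0
                   for i in range(len(t) - 1))

    def v(W, T, t):
        # v(x) = integral of W_x(t) * T_x(t) dt
        return integral([w * tau for w, tau in zip(W, T)], t)

    def ordinal_classes(values):
        # bin each offer by its relative gap to the best (smallest) value
        best = min(values)
        bins = []
        for val in values:
            gap = (val - best) / best
            if gap <= 0.05:   bins.append("first")
            elif gap < 0.20:  bins.append("second")
            elif gap < 0.25:  bins.append("third")
            elif gap < 0.50:  bins.append("fourth")
            else:             bins.append("fifth")
        return bins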

This process was repeated for all the intermediate nodes up to the seven root nodes representing the seven basic evaluation dimensions. It took four to five months for all the nodes to be equipped with their evaluation model, and the process generated several discussions inside the team of analysts, mainly of a technical nature (concerning the specific contents of the values for each node). The most discussed concepts of the model were the concordance threshold and the veto condition, since part of the team considered that the required levels were extremely severe. However, since such an approach corresponded to a cautious attitude, it prevailed in the team and was finally accepted. The length of the process is justified not only by the quantity of nodes to define, but also by the fact that the team of analysts was obliged to define a new measurement scale and a precise measurement aggregation procedure for each node. Although this process can often be qualified as “subjective measurement”, it was the only way to obtain meaningful values for the offers. The set of criteria to be used, if a preference aggregation comparing the alternatives amongst themselves was requested, was defined as the seven root nodes equipped with a simple preference model: the weak order induced by the ordinal scale associated with each of these nodes.

No exogenous uncertainty was considered in the evaluation model. The information provided by the tenderers concerning their offers was considered to be reliable, and the use of ordinal scales made it possible to avoid the problems of imprecision or of measurement errors. This reasoning, however, is less true for node 7 and its sub-nodes, but the team of analysts felt sufficiently confident with the tests and did not analyse the problem further. Some endogenous uncertainty appeared as soon as the model was put into practice (the offers being available). We shall discuss this problem in more detail in the next section (concerning the elaboration of the final recommendation), but we can anticipate that the problem was created by the “double” evaluation provided by the chosen ELECTRE-TRI type aggregation, consisting in an “optimistic” and a “pessimistic” evaluation which do not necessarily coincide.

The evaluation model was coded in a formal document that was submitted (and explained) to the final client, who gave their consensus. It is worthwhile to note that the final client was not able to participate in the elaboration of the model (technical details, establishment of the parameters, etc.). Part of the team of analysts (some of the external consultants) were acting as their delegates. The establishment of the evaluation model and its acceptance by the client opened the way for its application to the set of offers received and for the elaboration of the final recommendation.

The client greatly appreciated their involvement in the establishment of the evaluation model, which turned out to be a product they considered their own (from their ex-post remarks: “....this (the involvement) turned out to be important....for the acceptability of the evaluation results”). The fact that each node of the hierarchy was discussed, analysed and finally defined by the team of analysts allowed them to understand the consequences at the global level, to be able to explain the contents of the model to their client and to justify the final result on the grounds of their own knowledge and experience, not of the procedure adopted.

In other words we can claim that the model was validated during its construction. Such an approach helped the acceptability of both the model and the final result, eased the discussion when the question of the final aggregation was settled and definitely legitimated the model in the eyes of the client.

9.3.3 The final recommendation

The evaluation of the six offers, which had effectively been submitted after the call for tenders was elaborated, was carried out in two main steps: the first consisted in evaluating the six “quality attributes” and the second in testing the prototypes provided by the tenderers.

The method adopted to aggregate the information and construct the final evaluations was a variant of the ELECTRE TRI procedure (see Yu 1992). The reader can also see Appendix A and refer to Chapter 6 for more details. We have the following remarks on the use of such a method.

1. The key parameters used in the method are the profiles (to which the alternatives are compared in order to be classified in a specific class), the importance of each criterion for each parent criterion classification, and the concepts of concordance threshold and veto condition.

For each intermediate node such parameters were extensively discussed before reaching a precise numerical representation. As already mentioned in section 3.2, the relative importance of each criterion and the concordance threshold were established using a reasoning based on the identification of the “winning coalitions” enabling the outranking relation to hold. The veto condition was initially perceived as a theoretical possibility of no practical use, then as an eliminatory threshold, but the client soon realised its importance, mainly when it was necessary to obtain an incomparability instead of an indifference, indifference being a counterintuitive conclusion when very different objects are compared. Further on, as soon as the veto conditions were understood by the client, they decided to introduce a similar concept each time they wanted to distinguish between positive reasons (for the establishment of the outranking relation) and negative reasons (against the establishment of the outranking relation), since the two are not necessarily complementary and must be evaluated in a separate and independent way.

The profiles were established using the knowledge of the team of analysts (experts in their domain), who were able to identify the minimal requirements to qualify an object in a certain class. It is interesting to notice that for the client, the intuitive idea of a profile was that of a typical object of a class and not of the lower bound of the class. The shift from the intuitive idea to the one used in the case study was immediate and presented no problems. The fact remains that the distinction between the two concepts of profile is crucial, and that the lower bound approach appears to be less intuitive than the typical element one.

2. The whole method (and the model) was implemented on a spreadsheet. This was of great importance because spreadsheets are a basic tool for communication and work in all companies and enable an immediate understanding of the results. Moreover, they enabled on-line what-if operations when specific problems concerning precise information and/or evaluations appeared during the discussions inside the team of analysts. The experimental validation of the model was greatly eased by the use of the spreadsheet.

Further on, it helped the acceptability and legitimation of the model through the idea that “if it can be implemented on a spreadsheet it is sufficiently simple and easy to be used by our company”. In fact some of the critiques by the client about the approach adopted in this case were that “....MCDA is not yet a universally known method....”, “....seems less intuitive than other well known techniques such as the weighted sum...”, “....it is time consuming to apply a new methodology....”, all these problems limiting the acceptability of the methodology towards the client’s client (the IS manager) and the company more generally. Being able to implement the method and the model on a spreadsheet was, for them, a proof that, although new, complex and apparently less intuitive, the method was simple and easy and could therefore legitimately be used in the decision process.

A specific problem which was raised in the first step was the generation of uncertainty due to the aggregation procedure. The ELECTRE-TRI type procedure adopted produces an interval evaluation consisting in a lower value (the pessimistic evaluation) and an upper value (the optimistic evaluation). When an alternative has a profile on the sub-nodes that is very different from the profiles of the classes on the parent node then, due to the incomparabilities that occur when comparing the alternative to the profiles, it may happen that the two values do not coincide (see more details in Appendix A). When the user of the model is not able to choose one of the two evaluations, this can be a problem in a hierarchical aggregation, since at the next aggregation step the sub-nodes may have evaluations expressed as an interval. This is a typical case of endogenous uncertainty, created by the method itself and not by the available information. The client was keen to consider the pessimistic and optimistic evaluations as bounds of the “real” value, but there was no uncertainty distribution on the interval. For this purpose, the following procedure was adopted. Two distinct aggregations were made, one where the lower values were used and the other where the upper values were used. Each of these, in turn, may produce a lower value and an upper value. At the next aggregation step, the lowest of the two lower values and the highest of the two upper values are used. This is a cautious attitude and has the drawback of widening the intervals as the aggregation goes up the hierarchy. However, this effect did not occur here and the final result for the six dimensions is represented in table 9.1 (from here on we will represent the criteria by Ci and the alternatives by Oi).

          O1     O2     O3     O4     O5     O6
    C1    A-A    G-G    A-VG   A-G    G-VG   A-A
    C2    A-A    G-VG   A-VG   A-VG   G-G    A-G
    C3    A-A    G-G    A-VG   G-G    A-A    A-A
    C4    A-G    G-VG   A-VG   G-VG   A-VG   A-G
    C5    U-U    G-VG   G-G    A-G    G-VG   U-U
    C6    A-A    VG-VG  E-E    VG-VG  G-G    VG-VG

Table 9.1: the values of the alternatives on the six quality criteria (U: unacceptable, A: acceptable, G: good, VG: very good, E: excellent)
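The cautious propagation procedure just described can be sketched generically as follows; here sort_node stands for any sorting procedure (in the case, the ELECTRE-TRI type one) mapping a vector of class values on the sub-nodes to a (pessimistic, optimistic) pair on the parent node, and classes are assumed to be encoded as comparable values:

    def propagate(children, sort_node):
        # children: one (lower, upper) evaluation per sub-node
        lo1, up1 = sort_node([lo for lo, up in children])  # aggregate the lower values
        lo2, up2 = sort_node([up for lo, up in children])  # aggregate the upper values
        # cautious combination: lowest lower value, highest upper value
        return min(lo1, lo2), max(up1, up2)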

We consider that the problem of interval evaluation on ordinal scales is an open theoretical problem that deserves future consideration; very little literature on the subject is available to our knowledge (see Roubens and Vincke 1985, Vincke 1988, Pirlot and Vincke 1997, Tsoukias and Vincke 1999).

Another modification introduced in the aggregation procedure concerned the use of the veto concept. As already mentioned, a strong veto concept was used in the evaluation model, such that the presence of an “unacceptable” value on any node (among the ones endowed with such veto power) could result in a global “unacceptable” value. However, during the evaluation of the offers, weaker concepts of veto appeared necessary. The idea was that certain values could have a “limitation” effect of the type: “if an offer has the value x on a sub-node then it cannot be more than y on the parent node”. A sketch of such a rule is given below.
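A minimal sketch of this limitation rule, with an illustrative class order and cap table (both ours, not taken from the case), could be:

    CLASSES = ["U", "A", "G", "VG", "E"]              # from worst to best

    def limit(parent_value, sub_value, caps):
        # caps maps a sub-node value to the best parent value it still allows
        ceiling = caps.get(sub_value, CLASSES[-1])
        return min(parent_value, ceiling, key=CLASSES.index)

    # e.g. limit("VG", "A", {"A": "G"}) == "G": the value "A" on the sub-node
    # prevents the parent node from being evaluated better than "G"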

The results on node 7 concerning the performances of the prototypes are presented in table 9.2.

          O1     O2     O3     O4     O5     O6
    C7    A-A    G-G    G-G    A-A    E-E    A-A

Table 9.2: the values of the alternatives on the performance criterion (U: unacceptable, A: acceptable, G: good, VG: very good, E: excellent)

Remember that such a result is an ordinal scale obtained by aggregating the four scales defined as explained in the previous section. Therefore, it should be considered more as a ranking than as an absolute evaluation. For this reason the team of analysts decided to use this attribute only to rank the different offers after the sorting obtained by using the six quality attributes. For this purpose the team of analysts tested three different aggregation scenarios, corresponding to three different hypotheses about the importance of the performance attribute.

1. The performance attribute is considered to have the same importance as the whole set of six quality attributes. This scenario represents the idea that the tests on the software performances correspond to the only “real” or “objective” measurement of the offers and should therefore be viewed as a validation of the result obtained through the subjective measurement carried out on the six quality attributes. The aggregation procedure consisted in using the six quality attributes as criteria, each equipped with a weak order, from which to obtain a final ranking. Since the evaluations for some of the six attributes were in the form of an interval, an extended ordinal scale was defined in order to induce the weak orders: E ≻ VG ≻ G-VG ≻ G ≻ A-VG ≻ A-G ≻ A ≻ U. The importance parameters are w(1.) = 2, w(2.) = 2, w(3.) = 4, w(4.) = 1, w(5.) = 4, w(6.) = 2 and the concordance threshold 12/15 (0.8). The six orders are the following (x,y standing for indifference between x and y):
- O5 ≻ O2 ≻ O3 ≻ O4 ≻ O1,O6;
- O2 ≻ O5 ≻ O3 ≻ O4 ≻ O6 ≻ O1;
- O2 ≻ O4 ≻ O3 ≻ O5,O1,O6;
- O2,O4 ≻ O3,O5 ≻ O1,O6;
- O2,O5 ≻ O3,O4 ≻ O1,O6;
- O3 ≻ O2 ≻ O6,O4 ≻ O5 ≻ O1.
The final result is presented in table 9.3.

          O1   O2   O3   O4   O5   O6
    O1    1    0    0    0    0    0
    O2    1    1    1    1    1    1
    O3    1    0    1    0    0    1
    O4    1    0    0    1    0    1
    O5    1    0    0    0    1    1
    O6    1    0    0    0    0    1

Table 9.3: the outranking relation aggregating the six quality criteria

In order to rank the alternatives, a “score” is computed for each of them: the number of alternatives to which this specific alternative is preferred, minus the number of alternatives preferred to it (a small computation of this kind is sketched after this list). The alternatives are then ranked by decreasing value of this score. The final ranking thus obtained is given in figure 9.2 2a (it is worthwhile noting that the indifferences obtained in the final ranking correspond to incomparabilities obtained in the aggregation step). An intersection was then operated with the ranking obtained on node 7, resulting in the final ranking reported in figure 9.2 2b.

Figure 9.2 (diagram not reproduced): 2a: the final ranking using the six quality criteria (O2, then O3, O4 and O5 tied, then O6, then O1). 2b: the final ranking as intersection of the six quality criteria and the performance criterion.

2. The performance attribute is considered to be of secondary importance, to be used in order to distinguish among the alternatives assigned to the same class using the six quality attributes. In other words, the principal evaluation was considered to be the one using the six quality attributes, and the performance evaluation was only a supplement enabling a possible further distinction. Such an approach resulted from the low confidence awarded to the performance evaluation and from the undesirability of assigning it a high importance. A lexicographic aggregation was therefore applied, using the six quality criteria as in the previous scenario and applying the performance criterion to the equivalence classes of the global ranking. The final ranking is O2 ≻ O5 ≻ O3 ≻ O4 ≻ O6 ≻ O1.

3. A third approach consisted in considering the seven attributes as seven criteria to be aggregated into a final ranking, assigning them reasoned importance parameters. The idea was that while the client could be interested in having the absolute evaluation of the offers (a result obtainable only by using the six quality attributes), he could also be interested in a ranking of the alternatives that could help him in the final choice. From this point of view the absolute evaluations of the six quality attributes were transformed into rankings as in the first scenario, adding the seventh attribute as a seventh criterion. The seven weak orders are the following:
- O5 ≻ O2 ≻ O3 ≻ O4 ≻ O1,O6;
- O2 ≻ O5 ≻ O3 ≻ O4 ≻ O6 ≻ O1;
- O2 ≻ O4 ≻ O3 ≻ O5,O1,O6;
- O2,O4 ≻ O3,O5 ≻ O1,O6;
- O2,O5 ≻ O3,O4 ≻ O1,O6;
- O3 ≻ O2 ≻ O6,O4 ≻ O5 ≻ O1;
- O5 ≻ O2,O3 ≻ O4,O6,O1.
The importance parameters are w(1.) = 2, w(2.) = 2, w(3.) = 4, w(4.) = 1, w(5.) = 4, w(6.) = 2, w(7.) = 4 and the concordance threshold 16/19 (more than 0.8). The final result is reported in table 9.4.

          O1   O2   O3   O4   O5   O6
    O1    1    0    0    0    0    0
    O2    1    1    1    1    0    1
    O3    1    0    1    0    0    1
    O4    1    0    0    1    0    1
    O5    1    0    0    0    1    1
    O6    1    0    0    0    0    1

Table 9.4: the outranking relation aggregating the seven criteria

Using the same ranking procedure, the final ranking is now: O2 ≻ O5 ≻ O3,O4 ≻ O6 ≻ O1.
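The score-based ranking is easy to reproduce. The following sketch (ours) recomputes it from the outranking relation of table 9.4, where S[i][j] = 1 when the row alternative outranks the column alternative:

    names = ["O1", "O2", "O3", "O4", "O5", "O6"]
    S = [[1, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 0, 1],
         [1, 0, 1, 0, 0, 1],
         [1, 0, 0, 1, 0, 1],
         [1, 0, 0, 0, 1, 1],
         [1, 0, 0, 0, 0, 1]]

    def score(i):
        outranks = sum(S[i][j] for j in range(len(S)) if j != i)
        outranked = sum(S[j][i] for j in range(len(S)) if j != i)
        return outranks - outranked

    for name, sc in sorted(zip(names, map(score, range(len(S)))),
                           key=lambda pair: -pair[1]):
        print(name, sc)
    # prints O2: 4, O5: 2, O3: 1, O4: 1, O6: -3, O1: -5, i.e. the final
    # ranking O2 > O5 > O3,O4 > O6 > O1 reported above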

Finally, after some discussions with the client, the third scenario was adopted and used as the final result. The two basic reasons were:
- while it was meaningful to interpret the ordinal measures of the six quality attributes as weak orders representing the client’s preferences, it was not meaningful to translate the weak order obtained for the performance attribute into an ordinal measurement of the offers;
- the first and second scenarios implicitly adopted two extreme positions concerning the importance of the performance attribute; these correspond to two different “philosophies” present in the team of analysts, but not to the client’s perception of the problem. The importance parameters and the concordance threshold adopted in the final version made it possible to define a compromise between these two extreme positions expressed during the decision aiding process.

In fact the performance criterion is associated with an importance parameter of 4 which, combined with the concordance threshold of 16/19, implies that it is impossible for an alternative to outrank another if its value on the performance criterion is worse (and this satisfied the part of the team of analysts that considered the performance criterion as a critical evaluation of the offers). Giving a regular importance parameter to the performance criterion avoided the extreme situation in which all other evaluations could become irrelevant. The final ranking obtained respects this idea, and the outranking table could be understood by all the members of the team of analysts. As already reported, the client considered the approach useful because “every activity was justified”. A major concern for people involved in complex decision processes is to be able to justify their behaviour, recommendations and decisions towards a director, a superior in the hierarchy of the company, an inspector, a committee, etc. Such a justification applies both to how a specific result was obtained and to how the whole evaluation was conducted.

In this case, for instance, the choice of the final aggregation was justified by a specific attitude towards the two basic evaluation “points of view”: the quality information and the performance of the prototypes. It was extremely important for the client to be able to summarise the correspondence between an aggregation procedure and an operational attitude, because it enabled them to better argue against possible objections of their client.

A final question that arose during the elaboration of the final recommendation was whether it would be possible to provide a numerical representation of the values obtained by the offers and of the final ranking. It soon became clear that the question originated in the final client’s wish to be able to negotiate with the AQ manager on a monetary basis, since it was expected that the latter would introduce the cost dimension into the final decision.

For this purpose an appendix was included in the final recommendation where the following was emphasised:
- it is possible to give a numerical representation both to the ordinal measurement obtained using the six quality attributes and to the final ranking obtained using the seven criteria, but it is meaningless to use such a numerical representation in order to establish implicit or explicit trade-offs with a cost criterion;
- it is possible to compare the result with a cost criterion following two approaches: 1) either induce an ordinal scale from the cost criterion and then, using an ordinal aggregation procedure, construct a final choice (the negotiation should then concentrate on defining the importance parameters, the thresholds, etc.); 2) or establish a value function of the client using one of the usual protocols available in the literature (see also Chapter 6) to obtain the trade-offs between the quality evaluations, the performance evaluations and the cost criterion (the negotiations should then concentrate on a value function);
- the team of analysts was also available to conduct this part of the decision aiding process if the client so desired.

The final client was very satisfied with the final recommendation and was also able to understand the reply about the numerical representation. He nevertheless decided to conduct the negotiations with the AQ manager personally, and so the team of analysts terminated its task with the delivery of the final recommendation.

A final consideration is that there was certainly room (but no time) to experiment with more variants and methods for the aggregation procedure and the construction of the final recommendation. Valued relations, valued similarity relations, interval comparisons using extended preference structures, dynamic assignment of alternatives to classes and other innovative techniques were considered too “new” by the client, who already considered the use of an approach different from the usual grid and weighted sum a revolution (compared with the company’s standards). In their view, the fact of being able to aggregate the ordinal information available in a correct and meaningful way was more than satisfactory, as they report in their ex-post remarks: “....pointed out that it was not necessary to always use ratio scales and weighted sums, as we thought before, but that it was possible to use judgements and aggregate them....”.

9.4 Conclusions

Concluding this chapter, we may try to summarise the lessons learned in this real experience of decision support.

The most important lesson perhaps concerns the process dimension of decision support. What the client needed was continuous assistance and support during the decision process (the management of the call for tenders), enabling them to understand their role, the expected results, and the way to provide a useful contribution. If the support had been limited to answering the client’s demand on how to define a global evaluation (based on the weighted sum of their notes on the products), we might have provided them with an excellent multi-attribute value model that would have been of no interest for their problem. This is not against multi-attribute value based methods, which in other decision aiding processes can be extremely useful, but an emphasis on a process based decision aiding activity. A careful analysis of the problem situation, a consensual problem formulation, a correct definition of the evaluation model and an understandable and legitimated final recommendation are the products that we have to provide in a decision aiding process.

A second lesson learned concerns the “ownership” of the final recommendation. By this we want to indicate the fact that the client will be much more confident in the result and much more ready to apply it if he feels that he owns the result, in the sense that it is a product of his own convictions, values, computations, experience, simulations and whatever else. Such ownership can be achieved if the client not only participates in elaborating the parameters of the evaluation model, but actually builds the model with the help of the analyst (which has been the case in our experience). Although the specific case may be considered exceptional (due to the specific dimension of the evaluation model and the double role of the client, being analyst for another client at the same time), we claim that it is always possible to include the client in the construction of the evaluation model in a way that allows him to feel responsible and to own the final recommendation. Such “ownership” greatly eases the legitimisation of the recommendation, since it is not just the “advice recommended by the experts who do not understand anything”. It might be interesting to notice that a customised implementation of the model on the tools to which the client is accustomed (in our case the company spreadsheet) greatly improves the acceptance and legitimisation of the evaluation model.

A third lesson concerns the key issue of meaningfulness. The construction of the evaluation model must obey two dimensions of meaningfulness. The first is a theoretical and conceptual one and refers to the necessity of manipulating the information in a sound and correct way. The second is a practical one and refers to the necessity of manipulating the information in a way that is understandable by the client and corresponds to his intuitions and concerns. It is possible that these two dimensions conflict. However, the evaluation model has to satisfy both requirements, thus implying a process of adaptation guided by reciprocal learning for the client and the analyst. The existence of clear and sound theoretical results for the use of specific preference modelling tools, preference and/or measure aggregation procedures and other modelling tools definitely helps such a process.

A fourth lesson concerns the importance of the distinction between measures and preferences. The former refer to observations made on the set of alternatives, through either “objective” or “subjective” measures. The latter refer to the client’s values, are always subjective and depend on the problem situation. Moving from one to the other may be possible, but this is not obvious and has to be carefully studied. Knowing that a software has n function points, while another has m function points, does not imply any particular preference between them. We hope that the case study offered an introduction to this problem.

A fifth lesson concerns the definition of the aggregation procedure in the evaluation model. The previous chapters of this book provide enough evidence that universal methods for aggregating preferences and/or measures do not exist. Therefore, the aggregation procedures included in an evaluation model are choices that have to be carefully studied and justified.

A sixth lesson is about uncertainty. Even when the available information is considered reliable, uncertainty may appear (as in our case). Moreover, uncertainty can appear in a very qualitative way and not necessarily in the form of an uncertainty distribution. It is necessary to have a large variety of uncertainty representation tools in order to include the relevant one in the evaluation model. Last, but not least, we emphasise the significant number of open theoretical problems the case study highlights (interval evaluation, ordinal measurement, hesitation modelling, hierarchical measurement, ordinal value theory, etc.).


Appendix A

The basic concepts adopted in the procedure used (based on ELECTRE TRI) are the following.

• A set $A$ of alternatives $a_i$, $i = 1, \dots, m$.

• A set $G$ of criteria $g_j$, $j = 1, \dots, n$. A relative importance $w_j$ (usually normalised in the interval $[0,1]$) is attributed to each criterion $g_j$.

• Each criterion $g_j$ is equipped with an ordinal scale $E_j$ with degrees $e_j^l$, $l = 1, \dots, k$.

• A set $P$ of profiles $p_h$, $h = 1, \dots, t$, $p_h$ being a collection of degrees, $p_h = \langle e_1^h, \dots, e_n^h \rangle$, such that if $e_j^h$ belongs to profile $p_h$, then $e_j^{h+1}$ cannot belong to profile $p_{h-1}$.

• A set $C$ of categories $c_\lambda$, $\lambda = 1, \dots, t+1$, such that the profile $p_h$ is the upper bound of category $c_h$ and the lower bound of category $c_{h+1}$.

• An outranking relation $S \subset (A \times P) \cup (P \times A)$, where $s(x,y)$ should be read as “$x$ is at least as good as $y$”.

• A set of preference relations $\langle P_j, I_j \rangle$ for each criterion $g_j$ such that:
  - $\forall x \in A$: $P_j(x, e_j^h) \Leftrightarrow g_j(x) \succ e_j^h$;
  - $\forall x \in A$: $P_j(e_j^h, x) \Leftrightarrow g_j(x) \prec e_j^h$;
  - $\forall x \in A$: $I_j(x, e_j^h) \Leftrightarrow g_j(x) \approx e_j^h$;
  where $\succ$, $\prec$, $\approx$ are induced by the ordinal scale associated with criterion $g_j$.

The procedure works in two basic steps.

1. Establish the outranking relation on the basis of the following rule:

   $$s(x,y) \Leftrightarrow C(x,y) \text{ and not } D(x,y)$$

   where

   $$\forall x \in A,\, y \in P:\quad C(x,y) \Leftrightarrow \sum_{j \in G^{\pm}} w_j \geq c \;\text{ and }\; \sum_{j \in G^{+}} w_j \geq \sum_{j \in G^{-}} w_j$$

   $$\forall y \in A,\, x \in P:\quad C(x,y) \Leftrightarrow \Big(\sum_{j \in G^{\pm}} w_j \geq c \;\text{ and }\; \sum_{j \in G^{+}} w_j \geq \sum_{j \in G^{-}} w_j\Big) \;\text{ or }\; \sum_{j \in G^{+}} w_j > \sum_{j \in G^{-}} w_j$$

   $$\forall (x,y) \in (A \times P) \cup (P \times A):\quad \text{not } D(x,y) \Leftrightarrow \sum_{j \in G^{-}} w_j \leq d \;\text{ and }\; \forall g_j \;\text{not } v_j(x,y)$$

   where
   - $G^{+} = \{g_j \in G : P_j(x,y)\}$;
   - $G^{-} = \{g_j \in G : P_j(y,x)\}$;
   - $G^{=} = \{g_j \in G : I_j(x,y)\}$;
   - $G^{\pm} = G^{+} \cup G^{=}$;
   - $c$: the concordance threshold, $c \in [0.5, 1]$;
   - $d$: the discordance threshold, $d \in [0, 1]$;
   - $v_j(x,y)$: veto, expressed on criterion $g_j$, of $y$ on $x$.

2. When the relation $S$ is established, assign each element $a_i$ on the basis of the following rules.

   2.1 Pessimistic assignment:
   - $a_i$ is iteratively compared with $p_t, \dots, p_1$;
   - as soon as $s(a_i, p_h)$ is established, assign $a_i$ to category $c_{h+1}$; if $a_i$ outranks no profile, it is assigned to the lowest category $c_1$.

   2.2 Optimistic assignment:
   - $a_i$ is iteratively compared with $p_1, \dots, p_t$;
   - as soon as $s(p_h, a_i) \wedge \neg s(a_i, p_h)$ is established, assign $a_i$ to category $c_h$; if no profile is strictly preferred to $a_i$, it is assigned to the highest category $c_{t+1}$.

   The pessimistic procedure finds the highest profile that the element outranks and places the element just above it. The optimistic procedure finds the lowest profile that is surely better than the element and places the element just below it. If the optimistic and pessimistic assignments coincide, then no uncertainty exists for the assignment. Otherwise, an uncertainty exists and should be considered by the user.

In order to better understand how the procedure works, consider the following example.

• Four criteria g1 · · · g4, of equal importance (for all j, wj = 1/4), each of them equipped with an ordinal scale A ≻ B ≻ C ≻ D.

• Two profiles p1 = 〈C,C,C,C〉 and p2 = 〈A,B,B,B〉 defining three categories: unacceptable (U), acceptable (A) and good (G) (p2 being the minimum profile for category G, p1 being the minimum profile for category A).

• Three alternatives: a1 = 〈D,B,B,B〉, a2 = 〈B,C,A,A〉, a3 = 〈A,B,B,C〉.

• Further on, fix c = 0.75, d = 0.40 and, for all j, vj(x, y) ⇔ gj(x) = D.

With such information it is possible to establish the outranking relation, which is S = {(p2, a1), (p2, a2), (p2, a3), (a2, p1), (a3, p1)}. The reader can easily check that the pessimistic assignment puts alternative a1 in category U and alternatives a2 and a3 in category A, while the optimistic assignment puts all three alternatives in category A.
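The example can be reproduced with a few lines of code. In the sketch below (ours), degrees are encoded as integers; the alternative-versus-profile test implements the first concordance rule together with the discordance rule, while the profile-versus-alternative pairs of S are taken directly from the text, the tie-handling of the second concordance rule being harder to pin down from the formulas alone:

    W = [0.25, 0.25, 0.25, 0.25]
    c, d = 0.75, 0.40
    A4, B3, C2, D1 = 4, 3, 2, 1          # degrees A > B > C > D

    alternatives = {"a1": [D1, B3, B3, B3], "a2": [B3, C2, A4, A4],
                    "a3": [A4, B3, B3, C2]}
    profiles = {"p1": [C2, C2, C2, C2], "p2": [A4, B3, B3, B3]}
    categories = ["U", "A", "G"]         # c1, c2, c3; p_h upper bound of c_h

    def alt_outranks_profile(x, y):
        g_plus  = sum(w for w, xi, yi in zip(W, x, y) if xi > yi)   # G+
        g_minus = sum(w for w, xi, yi in zip(W, x, y) if xi < yi)   # G-
        g_pm    = sum(w for w, xi, yi in zip(W, x, y) if xi >= yi)  # G+ u G=
        veto = any(xi == D1 for xi in x)     # v_j(x, y) <=> g_j(x) = D
        return g_pm >= c and g_plus >= g_minus and g_minus <= d and not veto

    S = {(a, p) for a, xa in alternatives.items()
         for p, xp in profiles.items() if alt_outranks_profile(xa, xp)}
    S |= {("p2", "a1"), ("p2", "a2"), ("p2", "a3")}   # from the text
    # S now equals {(p2,a1), (p2,a2), (p2,a3), (a2,p1), (a3,p1)}

    def pessimistic(a):
        # first profile, scanning downwards, that a outranks: category c_{h+1}
        for h in (2, 1):
            if (a, "p%d" % h) in S:
                return categories[h]
        return categories[0]

    def optimistic(a):
        # first profile, scanning upwards, strictly preferred to a: category c_h
        for h in (1, 2):
            if ("p%d" % h, a) in S and (a, "p%d" % h) not in S:
                return categories[h - 1]
        return categories[-1]

    for a in sorted(alternatives):
        print(a, pessimistic(a), optimistic(a))
    # a1: U / A, a2: A / A, a3: A / A, as stated in the text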


Appendix B

The complete list of the attributes used in the evaluation model

1 LAND-BASE MANAGEMENT

1.1 User interface

1.1.1 Graphics type

1.1.2 Graphics engine adequacy

1.1.3 Interface personalisation

1.2 Functionality

1.2.1 Availability

1.2.2 Adequacy

1.2.2.1 Planes analysis functions

1.2.2.2 Topological connectivity functions

1.2.2.3 Graphical rendering functions

1.3 Development environment

1.3.1 Libraries personalisation

1.3.2 Development support tools

1.3.3 Debugging support tools

1.3.4 Code documentation

1.3.4.1 Documentation support tools

1.3.4.2 Code browsing

1.3.5 Documentation Quality

1.3.5.1 Completeness

1.3.5.2 Documentation support type

1.3.5.3 Information retrieval ease

1.3.5.4 Contextual help

1.4 Administration tools

1.4.1 User administration functions

1.4.2 Software configuration management

1.4.3 Performance data collection

1.5 Work flow connection

1.6 Interoperability

1.7 Integration between Land-base products and the Spatial Data Manager

1.7.1 Vectorial data products integration

1.7.2 Descriptive data products integration

1.7.3 Raster data products integration

1.7.4 Digital Terrain Model products integration

1.8 Integration among Land-base products

1.8.1 Interfaces integration

1.8.2 Data sharing


2 GEOMARKETING

2.1 User interface

2.1.1 Graphics type

2.1.2 Graphics engine adequacy

2.1.3 Interface personalisation

2.2 Functionality

2.2.1 Availability

2.2.2 Adequacy

2.2.2.1 Planes analysis functions

2.2.2.2 Graphical rendering functions

2.3 Development environment

2.3.1 Libraries personalisation

2.3.2 Development support tools

2.3.3 Debugging support tools

2.3.4 Code documentation

2.3.4.1 Documentation support tools

2.3.4.2 Code browsing

2.3.5 Documentation Quality

2.3.5.1 Completeness

2.3.5.2 Documentation support type

2.3.5.3 Information retrieval ease

2.3.5.4 Contextual help

2.4 Administration tools

2.4.1 Software configuration management

2.5 Interoperability

2.6 Integration between Geomarketing products and the Spatial Data Manager

2.6.1 Vectorial data products integration

2.6.2 Descriptive data products integration

2.6.3 Raster data products integration

2.7 Integration among Geomarketing products

2.7.1 Interfaces integration

2.7.2 Data sharing

3 PLANNING, DESIGN, IMPLEMENTATION AND OPERATING SUPPORT

3.1 User interface

3.1.1 Graphics type

3.1.2 Graphics engine adequacy

3.1.3 Interface personalisation

3.2 Functionality


3.2.1 Availability

3.2.2 Adequacy

3.2.2.1 Planes analysis functions

3.2.2.2 Topological connectivity functions

3.2.2.3 Graphical rendering functions

3.2.2.4 Network schema creation

3.3 Development environment

3.3.1 Libraries personalisation

3.3.2 Development support tools

3.3.3 Debugging support

3.3.4 Code documentation

3.3.4.1 Documentation support tools

3.3.4.2 Code browsing

3.3.5 Documentation Quality

3.3.5.1 Completeness

3.3.5.2 Documentation support type

3.3.5.3 Information retrieval ease

3.3.5.4 Contextual help

3.4 Administration tools

3.4.1 User administration functions

3.4.2 Software configuration management

3.4.3 Performance data collection

3.5 Work flow connection

3.6 Interoperability

3.7 Integration between this process products and the Spatial Data Manager

3.7.1 Vectorial data products integration

3.7.2 Descriptive data products integration

3.7.3 Raster data products integration

3.7.4 Digital Terrain Model products integration

3.8 Integration among this process products

3.8.1 Interfaces integration

3.8.2 Data sharing

4 DIAGNOSIS SUPPORT AND CUSTOMER CARE

4.1 User interface

4.1.1 Graphics type

4.1.2 Graphics engine adequacy

4.1.3 Interface personalisation

4.2 Functionality

4.2.1 Availability


4.2.2 Adequacy

4.2.2.1 Planes analysis functions

4.2.2.2 Topological connectivity functions

4.2.2.3 Graphical rendering functions

4.2.2.4 Network schema creation

4.3 Development environment

4.3.1 Libraries personalisation

4.3.2 Development support tools

4.3.3 Debugging support

4.3.4 Code documentation

4.3.4.1 Documentation support tools

4.3.4.2 Code browsing

4.3.5 Documentation Quality

4.3.5.1 Completeness

4.3.5.2 Documentation support type

4.3.5.3 Information retrieval ease

4.3.5.4 Contextual help

4.4 Administration tools

4.4.1 Software configuration management

4.4.2 Performance data collection

4.5 Interoperability

4.6 Integration between this process products and the Spatial Data Manager

4.6.1 Vectorial data products integration

4.6.2 Descriptive data products integration

4.6.3 Raster data products integration

4.7 Integration among this process products

4.7.1 Interfaces integration

4.7.2 Data sharing

5 SPATIAL DATA MANAGER

5.1 Data base properties

5.1.1 Fundamental properties

5.1.2 Transaction typology support

5.1.3 Data / Function association

5.1.4 Client data access libraries

5.2 Basic properties of the Spatial Data Manager

5.2.1 Data model

5.2.2 Data management

5.2.3 Data integration

5.2.4 Spatial operators


5.2.5 Coordinate systems

5.2.6 Vectorial data continuous management

5.3 Special properties of the Spatial Data Manager

5.3.1 Data sharing constraints

5.3.2 Feature versioning

5.3.3 Feature life-cycle management

5.3.4 Data distribution

5.4 Integration between the Spatial Data Manager and the Data Layer

5.4.1 Server data access libraries

5.4.1.1 Public libraries for feature manipulation

5.4.1.2 Structured Query Language to access descriptive data

5.4.2 Independence from features structure

5.4.3 Integration with Oracle

5.4.4 Integration with Unix and MVS relational databases

5.4.5 Integration with Oracle Designer 2000

5.4.6 Logical scheme import capability

5.4.7 Spatial Data Manager platform

5.5 Data administration tools

5.5.1 Database distribution

5.5.2 Database access control

5.5.3 Backup

6 SOFTWARE QUALITY

6.1 Robustness

6.2 Maturity

6.3 Easiness of installation and maintenance

7 PERFORMANCES

7.1 Single transaction under different data volume

7.2 Data Manager under different operation typology

7.3 Data Manager under different concurrent transactions

7.4 Graphical interfaces performances


10 CONCLUSION

10.1 Formal methods are all around us

The aim of this book was to provide a critical introduction to a number of “formal decision and evaluation methods”. By this, we mean sets of explicit and well-defined rules to collect, assess and process information in order to make recommendations in decision and/or evaluation processes. Although these methods may not be entirely formalised, their underlying logic should be explicit, contrary to, say, astrology or graphology. Such methods emanate from many different disciplines (Political Science, Education Science, Statistics, Economics, Operational Research, Computer Science, Decision Theory, Engineering, etc.) and are used to support numerous kinds of decision or evaluation processes. It is not an overstatement to say that nowadays nearly everyone is, implicitly or explicitly, confronted with such methods.

We briefly summarise below the main methods presented in this book and the difficulties that have been encountered.

“Following a democratic election Mr. X has been elected”

As citizens, we hopefully have to cast several kinds of votes. As mentioned in chapter 2, elections are governed by “rules” that are very far from being innocuous. Similar votes may well lead to very different results depending on the rules used to process them. Such “electoral rules” contribute towards shaping the entire political debate in a country and, thus, influence the type of democracy we live in. Therefore, under a slightly different electoral system, Mr. X might not have been elected.

“Your child has a GPA of 9.54. Therefore we cannot allow him to continue with this programme”

Our early life at school was governed to a large extent by the grades we obtained and the exams we passed or not. It is likely that the present professional life of many readers is still governed by some type of formal evaluation method that somehow uses “grades” (this is clearly the case for most academics). In chapter 3 we saw that a “grade”, although being a very familiar concept, is in fact a complex evaluation model. Not surprisingly, the aggregation of such evaluations is not an obvious task. Therefore, the decision made concerning your child might well have been significantly different depending on the grading policy and/or correction habits of some teachers, the fact that his exams were corrected late at night, or the way his various grades were aggregated.

“Things are going well since the ‘well-being’ index in our country rose by more than 10% over the last three years”

Statisticians have elaborated an incredible number of indicators or indices aiming at capturing many aspects of reality (including the quality of the air we breathe, the richness of a country, its state of development, etc.) by using numbers. Not only are our newspapers full of these kinds of figures but they are also routinely used to make important political or economic decisions. In chapter 4, we saw that such “measures” should not be confounded with the familiar “measurement operations” in Physics. The resulting numbers do not appear to be measured on some well-defined type of scale. Their properties are sometimes intriguing and they surely should be manipulated with care. Therefore, claiming that the ‘well-being’ index has increased by 10% gives, at best, a very crude indication.

“Calculations show that it is not profitable to equip this hospital with a maternity department”

The quality of the roads on which we drive, the tariffing of public transportation, the way our electricity is produced, the safety regulations applied to factories near our homes, the quality of our social security system, etc., depend on particular ways of assessing and summarising the costs and the benefits of alternative projects. Cost-benefit analysis evaluates such projects using money as a yardstick. This raises many difficulties outside simple cases: how to convert the various consequences of a complex project into monetary units, how to cope with equity considerations in the distribution of costs and benefits, how to take the distribution in time of these consequences into account? In chapter 5 we saw that cost-benefit analysis can hardly claim to always solve all these difficulties in a satisfactory manner. Therefore, the apparently objective calculations invoked to refuse the creation of a maternity department in our hospital are highly dependent on numerous debatable hypotheses (e.g. the pricing of a number of statistical “delivery incidents” due to a longer transportation time for some mothers). It is not unlikely that other reasonable hypotheses may have led to an opposite decision.

“Based on numerous tests it appears that the ‘best buy’ is car Z”

How to take several, generally conflicting, criteria into account when making a decision? This area, known as Multiple Criteria Decision Making (MCDM), is the subject of chapter 6. We showed that, in most cases, the analyst has the choice between several “aggregation strategies” that could lead to different results. Furthermore, apparently familiar concepts, like the “importance” of criteria, are shown to have little (if any) clear meaning outside a well-defined aggregation strategy. Each of these strategies requires the assessment of more or less rich and precise “inter-criteria” information. Since such assessments shape preference information as much as they collect it, the comparison of these strategies raises many problems. Therefore, because each potential buyer has his own preferences and interests and there are many different and yet reasonable ways to aggregate them, the very notion of a ‘best buy’ is highly debatable.


“Relax, our new camera will choose the ‘optimal focus’ for you”

Our washing machines, our cameras, our TV sets often take decisions on their own, e.g. concerning the amount of water or energy to use, the right focus, the supposedly “optimal” tuning of channels, the clarity of an image. The “decision modules” underlying such automatic decisions were studied in chapter 7. We saw that they are based on concepts and techniques that are very similar to the ones examined in chapter 6 and, thus, raise similar problems and questions. Contrary to the situation in chapter 6 however, they are used in real time without human intervention after the implementation stage. This raises new difficulties and issues. Therefore, relying on the automatic decisions taken by the new camera might not always be your best option.

“Given what you told me about your preferences and beliefs, you should not invest in this project in view of its expected utility”

Standard decision analysis techniques (see e.g. Raiffa 1970) are often seen as synonymous with decision support methods in risky and/or uncertain situations. Using a real example in electricity production planning, in chapter 8, we showed why the implementation of these standard techniques may not be as straightforward as is often believed. Besides possible computational problems, the assessment and revision of (subjective) probability distributions in highly ambiguous environments and in situations involving a long period of time is an enormous task.

Alternative tools, such as possibilities, belief functions, fuzzy sets and other kinds of non-additive uncertainty measures, may appear as good contenders, although their theoretical basis may be seen as less firm than the one underlying standard Bayesian analysis. Furthermore, important considerations, like the dynamic consistency of choices and the aggregation of consequences over time, were shown to be largely open questions. Therefore, there might be more than one way to assess preferences and beliefs and to combine them in order to make a recommendation.

Whether we like it or not, it seems difficult nowadays to escape from formal decision and evaluation methods. We may ignore them. The authors of this book believe that it may be interesting and profitable to give them a closer look. The real case-study presented in chapter 9 has shown that their proper use can have a significant impact on real complex decision or evaluation processes.

10.2 What have we learned?

Although the methods examined in this book are apparently very different and emanate from various disciplines, they appear to have a lot in common. This should not be much of a surprise, since these methods have the common objective of providing recommendations in complex decision and evaluation processes. What might be slightly more surprising is that most of these methods and tools are plagued with many difficulties. Let us try to summarise here the main findings and problems encountered in the preceding chapters.

• Objective and scope of formal decision/evaluation models


– Formal decision and evaluation models are implemented in complex decision/evaluation processes. Using them rarely amounts to solving a well-defined formal problem. Their usefulness not only depends on their intrinsic formal qualities but also on the quality of their implementation (structuration of the problem, communication with the actors involved in the process, transparency of the model, etc.). Having a sound theoretical basis is therefore a necessary but not sufficient condition for their usefulness (see chapter 9).

– The objective of these models may be different from recommending the choice of a “best” course of action. More complex recommendations, e.g. ranking the possible courses of action or comparing them to standards, are also frequently needed (see chapters 3, 4, 6 and 7). Moreover, the usefulness of such models is not limited to the elaboration of several types of recommendations. When properly used, they may provide support at all steps of a decision process (see chapter 9).

• Collecting data

– All models imply collecting and assessing “data” of various types and qualities and manipulating these data in order to derive conclusions that will hopefully be useful in a decision or evaluation process. This more or less inevitably implies building “evaluation models” trying to capture aspects of “reality” that are difficult to define with great precision (see chapters 3, 4, 6 and 9).

– The numbers resulting from such “evaluation models” often appear as constructs that are the result of multiple options. The choice between these various possible options is only partly guided by “scientific considerations”. These numbers should not be confounded with numbers resulting from classical measurement operations in Physics. They are measured on scales that are difficult to characterise properly. Furthermore, they are often plagued with imprecision, ambiguity and/or uncertainty. Therefore, more often than not, these numbers seem, at best, to give an order of magnitude of what is intended to be captured (see chapters 3, 4, 6, 8).

– The properties of the numbers manipulated in such models should be examined with care; using “numbers” may only be a matter of convenience and does not imply that any operation can be meaningfully performed on them (see chapters 3, 4, 6 and 7).

– The use of evaluation models greatly contributes to shaping and transforming the “reality” that we would like to “measure”. Implementing a decision/evaluation model only rarely implies capturing aspects of reality that can be considered as independent of the model (see chapters 6 and 9).

• Aggregating evaluations


– Aggregating the results of complex “evaluation models” is far from being an easy task. Although many aggregation models amount to summarising these numbers into a single one, this is not the only possible aggregation strategy (see chapters 3, 4, 5 and 6).

– The pervasive use of simple tools such as weighted averages may lead to disappointing and/or unwanted results. The use of weighted averages should in fact be restricted to rather specific situations that are seldom met in practice; a small numerical sketch following this summary illustrates the point.

– Devising an aggregation technique is not an easy task. Apparently reasonable principles can lead to a model with poor properties. A formal analysis of such models may therefore prove of utmost importance (see chapters 2, 4 and 6).

– Aggregation techniques often call for the introduction of “preference information”. The type of aggregation model that is used greatly contributes to shaping this information. Assessment techniques, therefore, not only collect but shape and/or create preference information (see chapter 6).

– Many different tools can be envisaged to model the preferences of an actor in a decision/evaluation process (see chapters 2 and 6).

– Intuitive preference information, e.g. concerning the relative importance of several points of view, may be difficult to interpret within a well-defined aggregation model (see chapter 6).

• Dealing with imprecision, ambiguity and uncertainty

– In order to allow the analyst to derive convincing recommendations, the model should explicitly deal with imprecision, uncertainty and inaccurate determination. Modelling all these elements into the classical framework of Decision Theory using probabilities may not always lead to an adequate model. It is not easy to create an alternative framework in which problems such as dynamic consistency or respect of (first order) stochastic dominance are dealt with in a satisfactory manner (see chapters 6 and 8). A second sketch following this summary recalls what first order stochastic dominance requires.

– Deriving robust conclusions on the basis of such aggregation models requires a lot of work and care. The search for robust conclusions may imply analyses much more complex than simple sensitivity analyses varying one parameter at a time in order to test the stability of a solution (see chapters 6 and 8). A third sketch following this summary shows one such analysis.
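
To make the remark about weighted averages concrete, here is a minimal sketch in Python (ours, not taken from any chapter; the alternatives, criteria and numbers are invented for illustration). With fixed weights, a weighted average depends on the, often arbitrary, scales on which the criteria are expressed: merely restating one criterion on a 0–100 instead of a 0–20 scale reverses the ranking of the two alternatives.

# Minimal sketch: weighted averages are sensitive to the scales of the criteria.

def weighted_average(scores, weights):
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

weights = [0.5, 0.5]  # the two criteria are declared "equally important"

# Two alternatives evaluated on two criteria, both graded on 0-20 scales.
a = [18, 7]
b = [12, 14]
print(weighted_average(a, weights), weighted_average(b, weights))  # 12.5 vs 13.0: b preferred

# Re-express the first criterion on a 0-100 scale (multiply the grades by 5).
a2 = [18 * 5, 7]
b2 = [12 * 5, 14]
print(weighted_average(a2, weights), weighted_average(b2, weights))  # 48.5 vs 37.0: a preferred

Unless the weights are re-assessed whenever the scales change, reading them as degrees of “relative importance” is an illusion; this is one of the reasons why weighted averages should be handled with care.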
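
As a reminder of what the dominance requirement mentioned above involves, the following sketch (again ours, with invented lotteries) checks first order stochastic dominance between two discrete lotteries: A dominates B when, for every threshold, A gives at least as great a probability of exceeding it. A model that would rank B above A here would be hard to defend, which is why candidate models of choice under risk are usually tested against this requirement.

# Minimal sketch: testing (weak) first order stochastic dominance.
# A lottery is a list of (outcome, probability) pairs.

def cdf(lottery, x):
    # Probability of obtaining an outcome less than or equal to x.
    return sum(p for outcome, p in lottery if outcome <= x)

def dominates(a, b):
    # a dominates b if the cdf of a lies below (or on) that of b everywhere.
    support = sorted({o for o, _ in a} | {o for o, _ in b})
    return all(cdf(a, x) <= cdf(b, x) for x in support)

A = [(0, 0.1), (100, 0.9)]
B = [(0, 0.2), (100, 0.8)]
print(dominates(A, B), dominates(B, A))  # True False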
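
Finally, here is a minimal sketch (ours, with hypothetical data) of one simple alternative to varying a single parameter at a time: sample many admissible weight vectors and check whether the conclusion “a beats b” survives all of them. Only a conclusion that resists (almost) all admissible weightings deserves to be called robust.

# Minimal sketch: a crude robustness check over the space of weights.

import random

def weighted_sum(scores, weights):
    return sum(s * w for s, w in zip(scores, weights))

def random_weights(n):
    # Uniform sampling of normalised weight vectors (via exponential draws).
    raw = [random.expovariate(1.0) for _ in range(n)]
    total = sum(raw)
    return [x / total for x in raw]

random.seed(0)
a = [18, 7, 12]
b = [12, 14, 11]
trials = 10000
a_wins = sum(
    weighted_sum(a, w) > weighted_sum(b, w)
    for w in (random_weights(3) for _ in range(trials))
)
print(f"'a beats b' holds for {a_wins / trials:.1%} of the sampled weight vectors")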

We saw that the methods reviewed in chapters 2 to 8 are far from being without problems. Indeed these chapters can be seen as a collection of the defects of these methods. Some readers may think that, faced with such evidence, this type of method should be abandoned and that “intuition” or “expertise” are not likely to do much worse, at lower cost and with less effort. In our opinion, this would be a totally unwarranted conclusion. It is the firm belief and conviction of the authors that the use of formal decision and evaluation tools is both inevitable and useful. Three main arguments can be proposed to support this claim.

First, it should not be forgotten that formal tools lend themselves more easily to criticism and close examination than other kinds of tools. However, whenever “intuition” or “expertise” has been subjected to close scrutiny, it has more or less always been shown that such types of judgements are based on heuristics that are likely to neglect important aspects of the situation and/or are affected by many biases (see the syntheses of Kahneman, Slovic and Tversky 1981, Bazerman 1990, Russo and Schoemaker 1989, Hogarth 1987, Poulton 1994, Thaler 1991).

Second, formal methods have a number of advantages that often prove crucial in complex organisational and/or social processes:

• they promote communication between the actors of a decision or evaluation process by offering them a common language;

• they require building models of certain aspects of “reality”; this implies concentrating efforts on crucial matters. Thus, formal methods are often indispensable structuration instruments;

• they lend themselves easily to “what-if” types of questions. These exploration capabilities are crucial in order to devise robust recommendations.

Although these advantages may have little weight compared to the obvious drawbacks of formal methods in terms of effort involved, money and time consumed in some situations (e.g. a very simple decision/evaluation process involving a single actor), they appear fundamental to us in most social or organisational processes (see chapter 9).

Third, casual observation suggests that there is an increasing demand for such tools in various domains (going from executive information systems, decision support systems and expert systems to standardised evaluation tests and impact studies). It is our belief that the introduction of such tools may have quite a beneficial impact in many areas in which they are not commonly used. Although many companies use tools such as graphology and/or astrology in order to select between applicants for a given position, we are more than inclined to say that the use of more formal methods could improve such selection processes (let alone on issues such as fairness and equity) in a significant way. Similarly, the introduction of more formal evaluation tools in the evaluation of public policies, laws and regulations (e.g. policy against crime and drugs, policy towards the carrying of guns, fiscal policy, the establishment of environmental standards, etc.), an area in which they are strikingly absent in many countries, would surely contribute to a more transparent and effective government.

We would thus answer a clear and definite yes to the question of whether formal decision and evaluation tools are useful.


10.3 What can be expected?

Our plea for the introduction of more formal decision and evaluation tools may appear paradoxical in view of the content of this book. Have we been overly critical then? Certainly not. Our willingness to keep mathematics and formalism to the lowest possible level has not allowed us to explore many technical details and difficulties. Indeed, a thorough critical examination of each of the methods covered in chapters 2 to 8 could be the subject of an entire book.

The paradox between our conviction in the usefulness of formal methods and the content of this book is only apparent and results from a misunderstanding. The fact that many decision and evaluation tools are plagued with serious difficulties is troublesome. It should not be unexpected however, unless one believes that there is a single “best way” to provide support in each type of decision or evaluation process. We doubt that this is a reasonable belief. Indeed, the very way in which a “good” formal decision/evaluation method is defined is anything but clear. Two main, non-exclusive, paths have often been suggested for this purpose. Neither of them appears totally convincing to us.

• the engineering route, which amounts to saying that a method is good because “it works”, i.e. has been applied several times in real-world problems and has been well accepted by the actors in the process. Although we would definitely not favour a method that would be unable to pass such a test, we doubt that the “engineering” argument is sufficient to define what would distinguish “good” formal decision or evaluation methods. First, it is important to remember that the “quality” of the support provided by a formal tool is very difficult to separate from considerations linked to the implementation of the method. As should be apparent from chapter 9, the formal tools used by an analyst are implemented in decision or evaluation processes that may be highly complex (involving many different actors, lasting a long time and being governed by complex rules and/or regulations). The resulting decision/evaluation aid process is therefore conditioned by many factors outside the realm of a formal method: the quality of the structuration of the problem and of the communication with stakeholders, the availability of user-friendly software, the timing and costs of the study, etc. are elements of utmost importance in the quality of a decision/evaluation aid process. Supporting a decision or an evaluation process should not be confounded with solving a “well-defined formal problem”. Although it may make sense to associate a “good” solution method with such a problem, supporting real decision and evaluation processes should not be confounded with this formal exercise. Second, in practice, it is often difficult to know whether the proposed model “worked” or not. Even if the final decision is at variance with the recommendations derived from the model, the very presence of analysts, the questions they raised and the type of reasoning they have promoted could have had a significant impact on the decision process. Should we say then that the method has “worked” or not?

A close variant of the engineering route could be called the naive route.


It amounts to saying that a formal tool is adequate if it consistently leads to “good” decisions. The literature on “decision” (see Raiffa 1970, Russo and Schoemaker 1989, Keeney, Hammond and Raiffa 1999), however, has always insisted on the fact that “good decisions do not necessarily lead to good outcomes”. This literature shows that it is very difficult to define a priori what would constitute a “good decision” (good in which state of nature? good for whom? good according to what criteria? at what moment in time? etc.) and that the essential idea is to promote a good “decision process”.

• the rational route, which amounts to saying that a method is adequate if it is backed by a sound theory of “rational choice”. Although we find theories most useful, the criteria for separating sound from unsound theories of “rational choice” do not appear obvious to us. A striking example of this difficulty can be found in the area of decision under risk and uncertainty. While, until the beginning of the eighties, expected utility theory was considered almost unanimously as the “rational theory of choice under risk”, the proliferation of alternative theories since then (see e.g. Dubois, Fargier and Prade 1997, Fishburn 1988, Gilboa and Schmeidler 1989, Jaffray 1988, Jaffray 1989, Kahneman and Tversky 1979, Loomes and Sugden 1982, Machina 1982, Quiggin 1982, Schmeidler 1989, Wakker 1989, Yaari 1987), fostered by the results of numerous empirical experiments (see e.g. Allais 1953, Hershey, Kunreuther and Schoemaker 1982, Johnson and Schkade 1989, Kahneman and Tversky 1979, McCord and de Neufville 1982, McCrimmon and Larsson 1979), presently results in a very complex situation in which it is not easy to discriminate between theories, both from an empirical (see e.g. Abdellaoui and Munier 1994, Carbone and Hey 1995, Harless and Camerer 1994, Hey and Orme 1994, Sopher and Gigliotti 1993) and from a normative point of view (see e.g. Hammond 1988, Machina 1989, McClennen 1990, Nau 1995, Nau and McCardle 1991). This is true even though most, if not all, of these theories have been axiomatically characterised (i.e. a set of conditions is known that completely characterises the proposed choice or evaluation models). Having axioms is certainly useful in order to compare theories but the “rational” content of the axioms and their interpretation remain much debated. Furthermore, the relation between a formal axiomatic theory and the assessment technologies derived from it is far from being obvious (see e.g. Bouyssou 1984).

Analysts implementing formal decision and evaluation tools are in a position similar to that of an engineer. Contrary to most engineers, however, these “decision engineers” often lack clear criteria for appreciating the “success” or “failure” of their models.

At this point it should be apparent that research on formal decision and evaluation methods should not be guided by the hope of discovering models that would be ideal under certain types of circumstances. Can something be done then? In view of the many difficulties encountered with the models envisaged in this book and the many fields in which no formal decision and evaluation tools are used, we do think that this area will be rich and fertile for future research.


Freed from the idea that we will discover the method, we can, more modestly and more realistically, expect to move towards:

• structuring tools that will facilitate the implementation of formal decision and evaluation models in complex and conflictual decision processes;

• flexible preference models able to cope with data of poor or unknown quality and with conflicting or missing information;

• assessment protocols and technologies able to cope with complex and unstable preferences, uncertain trade-offs, hesitation and learning;

• tools for comparing aggregation models in order to know what they have in common and whether one is likely to be more appropriate in view of the quality of the data;

• tools for defining and deriving “robust” conclusions.

To summarise, the future as we see it: structuration methodologies allowing for an explicit involvement and participation of all stakeholders, flexible preference models tolerating hesitations and contradictions, flexible tools for modelling imprecision and uncertainty, evaluation models fully taking incommensurable dimensions into account in a meaningful way, assessment technologies incorporating framing effects and learning processes, exploration techniques allowing robust recommendations to be built (see Bouyssou et al. 1993). Thus, “thanks to rigorous concepts, well-formulated models, precise calculations and axiomatic considerations, we should be able to clarify decisions by separating what is objective from what is less objective, by separating strong conclusions from weaker ones, by dissipating certain forms of misunderstanding in communication, by avoiding the trap of illusory reasoning, by bringing out certain counter-intuitive results” (Roy and Bouyssou 1991).

This “utopia” calls for a vast research programme requiring many different types of research (axiomatic analyses of models, experimental studies of models, clinical analyses of decision/evaluation processes, conceptual reflections on the notions of “rationality” and “performance”, production of new pieces of software, etc.).

The authors are preparing another book that will hopefully contribute to this research programme. It will cover the main topics that we believe to be useful in order to successfully implement formal decision/evaluation models in real-world processes:

• structuration methods and concepts,

• preference modelling tools,

• uncertainty and imprecision modelling tools,

• aggregation models,

• tools for deriving robust recommendations.


If we managed to convince you that formal decision and evaluation models are an important topic and that the hope of discovering “ideal” methods is somewhat chimerical, it is not unlikely that you will find the next book valuable.


Bibliography

[1] Abbas, M., Pirlot, M. and Vincke, Ph. (1996). Preference structures and cocomparability graphs, Journal of Multicriteria Decision Analysis 5: 81–98.

[2] Abdellaoui, M. and Munier, B. (1994). The ‘closing in’ method: An experimental tool to investigate individual choice patterns under risk, in B. Munier and M.J. Machina (eds), Models and experiments in risk and rationality, Kluwer, Dordrecht, pp. 141–155.

[3] Adler, H.A. (1987). Economic appraisal of transport projects: A manual with case studies, Johns Hopkins University Press for the World Bank, Baltimore.

[4] Airasian, P.W. (1991). Classroom assessment, McGraw-Hill, New York.

[5] Allais, M. and Hagen, O. (eds) (1979). Expected utility hypotheses and the Allais paradox, D. Reidel, Dordrecht.

[6] Allais, M. (1953). Le comportement de l’homme rationnel devant le risque : Critique des postulats et axiomes de l’ecole americaine, Econometrica 21: 503–546.

[7] Armstrong, W.E. (1939). The determinateness of the utility function, The Economic Journal 49: 453–467.

[8] Arrow, K.J. and Raynaud, H. (1986). Social choice and multicriterion decision-making, MIT Press, Cambridge.

[9] Arrow, K.J. (1963). Social choice and individual values, 2nd edn, Wiley, New York.

[10] Atkinson, A.B. (1970). On the measurement of inequality, Journal of Economic Theory 2: 244–263.

[11] Baldwin, J.F. (1979). A new approach to approximate reasoning using a fuzzy logic, Fuzzy Sets and Systems 2: 309–325.

[12] Balinski, M.L. and Young, H.P. (1982). Fair representation, Yale University Press, New Haven.

[13] Bana e Costa, C.A., Ensslin, L., Correa, E.C. and Vansnick, J.-C. (1999). Decision support systems in action: Integrated application in a multicriteria decision aid process, European Journal of Operational Research 113: 315–335.

[14] Barbera, S., Hammond, P. and Seidl, C. (eds) (1998). Handbook of utility theory, Vol. 1: Principles, Kluwer, Dordrecht.


[15] Bartels, R.H., Beatty, J.C. and Barsky, B.A. (1987). An introduction to splines for use in computer graphics and geometric modeling, Morgan Kaufmann, Los Altos.

[16] Barzilai, J., Cook, W.D. and Golany, B. (1987). Consistent weights for judgments matrices of the relative importance of alternatives, Operations Research Letters 6: 131–134.

[17] Bazerman, M.H. (1990). Judgment in managerial decision making, Wiley, New York.

[18] Bell, D., Raiffa, H. and Tversky, A. (eds) (1988). Decision making: Descriptive, normative and prescriptive interactions, Cambridge University Press, Cambridge.

[19] Belton, V., Ackermann, F. and Shepherd, I. (1997). Integrated support from problem structuring through alternative evaluation using COPE and V•I•S•A, Journal of Multi-Criteria Decision Analysis 6: 115–130.

[20] Belton, V. and Gear, A.E. (1983). On a shortcoming of Saaty’s analytic hierarchies, Omega 11: 228–230.

[21] Belton, V. (1986). A comparison of the analytic hierarchy process and a simple multi-attribute value function, European Journal of Operational Research 26: 7–21.

[22] Bereau, M. and Dubuisson, B. (1991). A fuzzy extended k-nearest neighbor rule, Fuzzy Sets and Systems 44: 17–32.

[23] Bernoulli, D. (1954). Specimen theoriæ novæ de mensura sortis, Commentarii Academiæ Scientiarum Imperialis Petropolitanæ (5, 175–192, 1738), Econometrica 22: 23–36. Translated by L. Sommer.

[24] Bezdek, J., Chuah, S.K. and Leep, D. (1986). Generalised k-nearest neighbor rules, Fuzzy Sets and Systems 18: 237–256.

[25] Blin, M.-J. and Tsoukias, A. (1998). Multicriteria methodology contribution to the software quality evaluation, Technical report, Cahier du LAMSADE No 155, Universite Paris-Dauphine, Paris.

[26] Boardman, A. (1996). Cost benefit analysis: Concepts and practices, Prentice-Hall, New York.

[27] Boiteux, M. (1994). Transports : Pour un meilleur choix des investissements, La Documentation Francaise, Paris.

[28] Bonboir, A. (1972). La docimologie, PUF, Paris.

[29] Borda, J.-Ch. (1781). Memoire sur les elections au scrutin, Comptes Rendus de l’Academie des Sciences. Translated by Alfred de Grazia as “Mathematical derivation of an election system”, Isis, Vol. 44, pp. 42–51.

[30] Bouchon, B. (1995). La logique floue et ses applications, Addison Wesley, New York.


[31] Bouchon-Meunier, B. and Marsala, C. (1999). Learning fuzzy decision rules, in J. Bezdek, D. Dubois and H. Prade (eds), Fuzzy sets in approximate reasoning and information systems, Vol. 3 of Handbook of Fuzzy Sets, Kluwer, Dordrecht, chapter 4, pp. 279–304.

[32] Bouyssou, D., Perny, P., Pirlot, M., Tsoukias, A. and Vincke, Ph. (1993). A manifesto for the new MCDM era, Journal of Multi-Criteria Decision Analysis 2: 125–127.

[33] Bouyssou, D. and Perny, P. (1992). Ranking methods for valued preference relations: A characterization of a method based on entering and leaving flows, European Journal of Operational Research 61: 186–194.

[34] Bouyssou, D. and Pirlot, M. (1997). Choosing and ranking on the basis of fuzzy preference relations with the ‘Min in Favor’, in G. Fandel and T. Gal (eds), Multiple criteria decision making – Proceedings of the twelfth international conference, Hagen, Germany, Springer Verlag, Berlin, pp. 115–127.

[35] Bouyssou, D. and Vansnick, J.-C. (1986). Noncompensatory and generalized noncompensatory preference structures, Theory and Decision 21: 251–266.

[36] Bouyssou, D. (1984). Decision-aid and expected utility theory: A critical survey, in O. Hagen and F. Wenstøp (eds), Progress in utility and risk theory, Kluwer, Dordrecht, pp. 181–216.

[37] Bouyssou, D. (1986). Some remarks on the notion of compensation in MCDM, European Journal of Operational Research 26: 150–160.

[38] Bouyssou, D. (1990). Building criteria: A prerequisite for MCDA, in C.A. Bana e Costa (ed.), Readings in multiple criteria decision aid, Springer Verlag, Berlin, pp. 58–80.

[39] Bouyssou, D. (1992). On some properties of outranking relations based on a concordance-discordance principle, in A. Goicoechea, L. Duckstein and S. Zionts (eds), Multiple criteria decision making, Springer-Verlag, Berlin, pp. 93–106.

[40] Bouyssou, D. (1996). Outranking relations: Do they have special properties?, Journal of Multi-Criteria Decision Analysis 5: 99–111.

[41] Brams, S.J. and Fishburn, P.C. (1982). Approval voting, Birkhauser, Basel.

[42] Brans, J.-P. and Vincke, Ph. (1985). A preference ranking organization method, Management Science 31: 647–656.

[43] Brekke, K.A. (1997). The numeraire matters in cost-benefit analysis, Journal of Public Economics 64: 117–123.

[44] Brent, R.J. (1984). Use of distributional weights in cost-benefit analysis: A survey of schools, Public Finance Quarterly 12: 213–230.

[45] Brent, R.J. (1996). Applied cost-benefit analysis, Elgar, Aldershot, Hants.

[46] Broome, J. (1985). The economic value of life, Economica 52: 281–294.


[47] Carbone, E. and Hey, J.D. (1995). A comparison of the estimates of expected utility and non-expected utility preference functionals, Geneva Papers on Risk and Insurance Theory 20: 111–133.

[48] Cardinet, J. (1986). Evaluation scolaire et mesure, De Boeck, Brussels.

[49] Chatel, E. (1994). Qu’est-ce qu’une note : recherche sur la pluralite des modes d’education et d’evaluation, Les Dossiers d’Education et Formations 47: 183–203.

[50] Checkland, P. (1981). Systems thinking, systems practice, Wiley, New York.

[51] Condorcet, M.J.A.N.C., marquis de. (1785). Essai sur l’application de l’analyse a la probabilite des decisions rendues a la pluralite des voix, Imprimerie Royale, Paris.

[52] Cover, T.M. and Hart, P.E. (1967). Nearest neighbor pattern classification, IEEE Transactions on Information Theory IT-13: 21–27.

[53] Cross, L.H. (1995). Grading students, Technical Report Series EDO-TM-95-5, ERIC/AE Digest.

[54] Daellenbach, H.G. (1994). Systems and decision making. A management science approach, Wiley, New York.

[55] Dasgupta, P.S., Marglin, S. and Sen, A.K. (1972). Guidelines for project evaluation, UNIDO, New York.

[56] Dasgupta, P.S. and Pearce, D.W. (1972). Cost-benefit analysis: Theory and practice, Macmillan, Basingstoke.

[57] Davis, B.G. (1993). Tools for teaching, Jossey-Bass, San Francisco.

[58] de Jongh, A. (1992). Theorie du mesurage, agregation des criteres et application au decathlon, Master’s thesis, SMG, Universite Libre de Bruxelles, Brussels.

[59] Dekel, E. (1986). An axiomatic characterization of preference under uncertainty: Weakening the independence axiom, Journal of Economic Theory 40: 304–318.

[60] Desrosieres, A. (1995). Refleter ou instituer : L’invention des indicateurs statistiques, Technical Report 129/J310, INSEE, Paris.

[61] de Ketele, J.-M. (1982). La docimologie, Cabay, Louvain-La-Neuve.

[62] de Landsheere, G. (1980). Evaluation continue et examens. Precis de docimologie, Labor-Nathan, Paris.

[63] Dinwiddy, C. and Teal, F. (1996). Principles of cost-benefit analysis for developing countries, Cambridge University Press, Cambridge.

[64] Dorfman, R. (1996). Why benefit-cost analysis is widely disregarded and what to do about it?, Interfaces 26: 1–6.

[65] Dreze, J. and Stern, N. (1987). The theory of cost-benefit analysis, in A.J. Auerbach and M. Feldstein (eds), Handbook of public economics, Elsevier, Amsterdam, pp. 909–989.


[66] Dubois, D., Fargier, H. and Prade, H. (1997). Decision-making under ordinal preferences and uncertainty, in D. Geiger and P.P. Shenoy (eds), Proceedings of the 13th conference on uncertainty in artificial intelligence, Morgan Kaufmann, Los Altos, pp. 157–164.

[67] Dubois, D., Prade, H. and Sabbadin, R. (1998). Qualitative decision theory with Sugeno integrals, Proceedings of the 14th conference on uncertainty in artificial intelligence, Morgan Kaufmann, Los Altos, pp. 121–128.

[68] Dubois, D., Prade, H. and Ughetto, L. (1999). Fuzzy logic, control engineering and artificial intelligence, in H.B. Verbruggen, H.J. Zimmermann and R. Babuska (eds), Fuzzy algorithms for control, Kluwer, Dordrecht, pp. 17–58.

[69] Dubois, D. and Prade, H. (1987). The mean value of a fuzzy number, Fuzzy Sets and Systems 24: 279–300.

[70] Dubois, D. and Prade, H. (1988). Possibility theory, Plenum Press, New York.

[71] Dupuit, J. (1844). De la mesure de l’utilite des travaux publics, Annales des Ponts et Chaussees (8).

[72] Dyer, J.S. (1990). Remarks on the analytic hierarchy process, Management Science 36: 249–258.

[73] Ebel, R.L. and Frisbie, D.A. (1991). Essentials of educational measurement, Prentice-Hall, New York.

[74] Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms, Quarterly Journal of Economics 75: 643–669.

[75] Fargier, H. and Perny, P. (1999). Qualitative decision models under uncertainty without the commensurability hypothesis, in K.B. Laskey and H. Prade (eds), Proceedings of the 15th conference on uncertainty in artificial intelligence, Morgan Kaufmann, Los Altos, pp. 188–195.

[76] Farrell, D.M. (1997). Comparing electoral systems, Contemporary Political Studies, Prentice-Hall, New York.

[77] Fiammengo, A., Buosi, D., Iob, I., Maffioli, P., Panarotto, G. and Turino, M. (1997). Bid management of software acquisition for cartography applications. Presented at AIRO ’97 Conference, Aosta.

[78] Fishburn, P.C. and Sarin, R.K. (1991). Dispersive equity and social risk, Management Science 37: 751–769.

[79] Fishburn, P.C. and Sarin, R.K. (1994). Fairness and social risk I: Unaggregated analyses, Management Science 40: 1174–1188.

[80] Fishburn, P.C. and Straffin, P.D. (1989). Equity considerations in public risks evaluation, Operations Research 37: 229–239.

[81] Fishburn, P.C. (1970). Utility theory for decision-making, Wiley, New York.

[82] Fishburn, P.C. (1976). Noncompensatory preferences, Synthese 33: 393–403.

[83] Fishburn, P.C. (1977). Condorcet social choice functions, SIAM Journal on Applied Mathematics 33: 469–489.


[84] Fishburn, P.C. (1978). A survey of multiattribute/multicriteria evaluation theories, in S. Zionts (ed.), Multicriteria problem solving, Springer Verlag, Berlin, pp. 181–224.

[85] Fishburn, P.C. (1982). The foundations of expected utility, D. Reidel, Dordrecht.

[86] Fishburn, P.C. (1984). Equity axioms for public risks, Operations Research 32: 901–908.

[87] Fishburn, P.C. (1988a). Nonlinear preference and utility theory, Johns Hopkins University Press, Baltimore.

[88] Fishburn, P.C. (1988b). Normative theories of decision making under risk and under uncertainty, in M. Kacprzyk and M. Roubens (eds), Non-conventional preference relations in decision making, Springer Verlag, Berlin, pp. 469–489.

[89] Fishburn, P.C. (1991). Nontransitive preferences in decision theory, Journal of Risk and Uncertainty 4: 113–134.

[90] Fix, E. and Hodges, J.L. (1951). Discriminatory analysis, non-parametric discrimination: consistency properties, Technical Report 4, USAF School of Aviation Medicine, Randolph Field.

[91] Fodor, J.C. and Roubens, M. (1994). Fuzzy preference modelling and multicriteria decision support, Kluwer, Dordrecht.

[92] Folland, S., Goodman, A.C. and Stano, M. (1997). The economics of health and health care, Prentice-Hall, New York.

[93] French, S. (1981). Measurement theory and examinations, British Journal of Mathematical and Statistical Psychology 34: 38–49.

[94] French, S. (1993). Decision theory – An introduction to the mathematics of rationality, Ellis Horwood, London.

[95] Gacogne, L. (1997). Elements de logique floue, Hermes, Paris.

[96] Gafni, A. and Birch, S. (1997). Equity considerations in utility-based measures of health outcomes in economic appraisals: An adjustment algorithm, Journal of Health Economics 10: 329–342.

[97] Gehrlein, W.V. (1983). Condorcet’s paradox, Theory and Decision 15: 161–197.

[98] Gibbard, A. (1973). Manipulation of voting schemes: A general result, Econometrica 41: 587–601.

[99] Gilboa, I. and Schmeidler, D. (1989). Maxmin expected utility with a non-unique prior, Journal of Mathematical Economics 18: 141–153.

[100] Gilboa, I. and Schmeidler, D. (1993). Updating ambiguous beliefs, Journal of Economic Theory 59: 33–49.

[101] Grabisch, M., Guely, F. and Perny, P. (1997). Evaluation subjective, Les cahiers du Club CRIN - Association ECRIN, Paris.


[102] Grabisch, M. (1996). The application of fuzzy integrals to multicriteria decision making, European Journal of Operational Research 89: 445–456.

[103] Hammond, P.J. (1988). Consequentialist foundations for expected utility, Theory and Decision 25: 25–78.

[104] Hanley, N. and Spash, C.L. (1993). Cost-benefit analysis and the environment, Elgar, Aldershot, Hants.

[105] Harker, P.T. and Vargas, L.G. (1987). The theory of ratio scale estimation: Saaty’s analytic hierarchy process, Management Science 33: 1383–1403.

[106] Harless, D. and Camerer, C.F. (1994). The utility of generalized expected utility theories, Econometrica 62: 1251–1289.

[107] Harvey, C.M. (1992). A slow-discounting model for energy conservation, Interfaces 22: 47–60.

[108] Harvey, C.M. (1994). The reasonableness of non-constant discounting, Journal of Public Economics 53: 31–51.

[109] Harvey, C.M. (1995). Proportional discounting of future costs and benefits, Mathematics of Operations Research 20: 381–399.

[110] Henriet, L. and Perny, P. (1996). Methodes multicriteres non-compensatoires pour la classification floue d’objets, Proceedings of LFA’96, pp. 9–15.

[111] Henriet, L. (1995). Problemes d’affectation et methodes de classification, Memoire du DEA 103, Universite Paris-Dauphine.

[112] Hershey, J.C., Kunreuther, H.C. and Schoemaker, P.J.H. (1982). Sources of bias in assessment procedures for utility functions, Management Science 28: 936–953.

[113] Heurgon, E. (1982). Relationships between decision making process and study process in OR interventions, European Journal of Operational Research 10: 230–236.

[114] Hey, J.D. and Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data, Econometrica 62: 1291–1326.

[115] Hogarth, R. (1987). Judgement and choice: The psychology of decision, Wiley, New York.

[116] Holland, A. (1995). The assumptions of cost-benefit analysis: A philosopher’s view, in K.G. Willis and J.T. Corkindale (eds), Environmental valuation: New perspectives, CAB International, Oxford, pp. 21–38.

[117] Horn, R.V. (1993). Statistical indicators, Cambridge University Press, Cambridge.

[118] Humphreys, P.C., Svenson, O. and Vari, A. (1993). Analysis and aiding decision processes, North-Holland, Amsterdam.

[119] IEEE 92 (1992). Standard for a software quality metrics methodology, Technical report, The Institute of Electrical and Electronics Engineers.

[120] International Atomic Energy Agency (1993). Cost-benefit aspects of food irradiation processing, Bernan Associates, Washington D.C.


[121] ISO/IEC 9126 (1991). Information technology – Software product evaluation, quality characteristics and guidelines for their use, Technical report, ISO, Geneve.

[122] Jacquet-Lagreze, E., Moscarola, J., Roy, B. and Hirsch, G. (1978). Description d’un processus de decision, Technical report, Cahier du LAMSADE No 13, Universite Paris-Dauphine, Paris.

[123] Jacquet-Lagreze, E. and Siskos, J. (1982). Assessing a set of additive utility functions for multicriteria decision making: The UTA method, European Journal of Operational Research 10: 151–164.

[124] Jacquet-Lagreze, E. (1990). Interactive assessment of preferences using holistic judgments. The PREFCALC system, in C.A. Bana e Costa (ed.), Readings in multiple criteria decision aid, Springer Verlag, Berlin, pp. 335–350.

[125] Jaffray, J.-Y. (1988). Choice under risk and the security factor: An axiomatic model, Theory and Decision 24: 169–200.

[126] Jaffray, J.-Y. (1989a). Some experimental findings on decision making under risk and their implications, European Journal of Operational Research 38: 301–306.

[127] Jaffray, J.-Y. (1989b). Utility theory for belief functions, Operations Research Letters 8: 107–112.

[128] Johannesson, M. (1995a). A note on the depreciation of the societal perspective in economic evaluation in health care, Health Policy 33: 59–66.

[129] Johannesson, M. (1995b). The relationship between cost-effectiveness analysis and cost-benefit analysis, Social Science and Medicine 41: 483–489.

[130] Johannesson, M. (1996). Theory and methods of economic evaluation of health care, Kluwer, Dordrecht.

[131] Johansson, P.O. (1993). Cost-benefit analysis of environmental change, Cambridge University Press, Cambridge.

[132] Johnson, E.J. and Schkade, D.A. (1989). Bias in utility assessments: Further evidence and explanations, Management Science 35: 406–424.

[133] Kahneman, D., Slovic, P. and Tversky, A. (1981). Judgement under uncertainty – Heuristics and biases, Cambridge University Press, Cambridge.

[134] Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of decision under risk, Econometrica 47: 263–291.

[135] Keeler, E.B. and Cretin, S. (1983). Discounting of life-saving and other non-monetary effects, Management Science 29: 300–306.

[136] Keeney, R.L., Hammond, J.S. and Raiffa, H. (1999). Smart choices: A guide to making better decisions, Harvard University Press, Boston.

[137] Keeney, R.L. and Raiffa, H. (1976). Decisions with multiple objectives: Preferences and value tradeoffs, Wiley, New York.


[138] Keller, J., Gray, M. and Givens, J. (1985). A fuzzy k-nearest neighbor algorithm, IEEE Transactions on Systems, Man and Cybernetics 15: 580–585.

[139] Kelly, J.S. (1991). Social choice bibliography, Social Choice and Welfare 8: 97–169.

[140] Kerlinger, F.N. (1986). Foundations of behavioral research, 3rd edn, Holt, Rinehart and Winston, New York.

[141] Kirkpatrick, C. and Weiss, J. (1996). Cost-benefit analysis and project appraisal in developing countries, Elgar, Aldershot, Hants.

[142] Kohli, K.N. (1993). Economic analysis of investment projects: A practical approach, Oxford University Press for the Asian Development Bank, Oxford.

[143] Krantz, D.H., Luce, R.D., Suppes, P. and Tversky, A. (1971). Foundations of measurement, Vol. 1: Additive and polynomial representations, Academic Press, New York.

[144] Krutilla, J.V. and Eckstein, O. (1958). Multiple purpose river development, Johns Hopkins University Press, Baltimore.

[145] Laska, J.A. and Juarez, T. (1992). Grading and marking in American schools: Two centuries of debate, Charles C. Thomas, Springfield.

[146] Laslett, R. (1995). The assumptions of cost-benefit analysis, in K.G. Willis and J.T. Corkindale (eds), Environmental valuation: New perspectives, CAB International, Oxford, pp. 5–20.

[147] Lesourne, J. (1975). Cost-benefit analysis and economic theory, North-Holland, Amsterdam.

[148] Lindheim, E., Morris, L.L. and Fitz-Gibbon, C.T. (1987). How to measure performance and use tests, Sage Publications, Thousand Oaks.

[149] Little, I.M.D. and Mirrlees, J.A. (1968). Manual of industrial project analysis in developing countries, O.E.C.D, Paris.

[150] Little, I.M.D. and Mirrlees, J.A. (1974). Project appraisal and planning for developing countries, Basic Books, New York.

[151] Loomes, G. and Sugden, R. (1982). Regret theory: An alternative theory of rational choice under uncertainty, Economic Journal 92: 805–824.

[152] Loomes, G. (1988). Different experimental procedures for obtaining valuations of risky actions: Implications for utility theory, Theory and Decision 25: 1–23.

[153] Loomis, J., Peterson, G., Champ, P., Brown, T. and Lucero, B. (1998). Paired comparisons estimates of willingness to accept and contingent valuation estimates of willingness to pay, Journal of Economic Behavior and Organisation 35: 501–515.

[154] Luce, R.D., Krantz, D.H., Suppes, P. and Tversky, A. (1990). Foundations of measurement, Vol. 3: Representation, axiomatisation and invariance, Academic Press, New York.


[155] Luce, R.D. and Raiffa, H. (1957). Games and decisions, Wiley, New York.

[156] Luce, R.D. (1956). Semiorders and a theory of utility discrimination, Econometrica 24: 178–191.

[157] Lysne, A. (1984). Grading of student’s attainment: Purposes and functions, Scandinavian Journal of Educational Research 28: 149–165.

[158] Machina, M.J. (1982). Expected utility without the independence axiom, Econometrica 50: 277–323.

[159] Machina, M.J. (1989). Dynamic consistency and non-expected utility models of choice under uncertainty, Journal of Economic Literature 27: 1622–1688.

[160] Mamdani, E.H. and Gaines, B.R. (eds) (1981). Fuzzy reasoning and its applications, Academic Press, New York.

[161] Marchant, Th. (1996). Valued relations aggregation with the Borda method, Journal of Multi-Criteria Decision Analysis 5: 127–132.

[162] Masser, I. (1983). The representation of urban planning-processes: An exploratory review, Environment and Planning B 10: 47–62.

[163] May, K.O. (1952). A set of independent necessary and sufficient conditions for simple majority decisions, Econometrica 20: 680–684.

[164] McClennen, E.F. (1990). Rationality and dynamic choice: Foundational explorations, Cambridge University Press, Cambridge.

[165] McCord, M. and de Neufville, R. (1983). Fundamental deficiency of expected utility analysis, in S. French, R. Hartley, L.C. Thomas and D.J. White (eds), Multiobjective decision making, Academic Press, London, pp. 279–305.

[166] McCord, M. and de Neufville, R. (1982). Empirical demonstration that expected utility decision analysis is not operational, in B. Stigum and F. Wenstøp (eds), Foundations of utility and risk theory, D. Reidel, Dordrecht, pp. 181–199.

[167] McCrimmon, K.R. and Larsson, S. (1979). Utility theory: Axioms versus paradoxes, in M. Allais and O. Hagen (eds), Expected utility hypotheses and the Allais paradox, D. Reidel, Dordrecht, pp. 27–145.

[168] McLean, J.E. and Lockwood, R.E. (1996). Why and how should we assess students? The competing measures of student performance, Sage Publications, Thousand Oaks.

[169] Merle, P. (1996). L’evaluation des eleves. Enquete sur le jugement professoral, PUF, Paris.

[170] Mintzberg, H., Raisinghani, D. and Theoret, A. (1976). The structure of unstructured decision processes, Administrative Science Quarterly 21: 246–272.

[171] Mishan, E. (1982). Cost-benefit analysis, Allen and Unwin, London.


[172] Moom, T.M. (1997). How do you know they know what they know? A handbook of helps for grading and evaluating student progress, Grove Publishing, Westminster.

[173] Morisio, M. and Tsoukias, A. (1997). IUSWARE: A formal methodology for software evaluation and selection, IEE Proceedings on Software Engineering 144: 162–174.

[174] Moscarola, J. (1984). Organizational decision processes and ORASA intervention, in R. Tomlinson and I. Kiss (eds), Rethinking the process of operational research and systems analysis, Pergamon Press, Oxford, pp. 169–186.

[175] Mousseau, V. (1993). Problemes lies a l’evaluation de l’importance en aide multicritere a la decision : Reflexions theoriques et experimentations, PhD thesis, LAMSADE, Universite Paris-Dauphine, Paris.

[176] Munier, B. (1989). New models of decisions under uncertainty, European Journal of Operational Research 38: 307–317.

[177] Nas, T.F. (1996). Cost-benefit analysis: Theory and application, Sage Publications, Thousand Oaks.

[178] Nauck, D. and Kruse, R. (1999). Neuro-fuzzy methods in fuzzy rule generation, in J. Bezdek, D. Dubois and H. Prade (eds), Fuzzy sets in approximate reasoning and information systems, Vol. 3 of Handbook of Fuzzy Sets, Kluwer, Dordrecht, chapter 5, pp. 305–333.

[179] Nau, R.F. and McCardle, K.F. (1991). Arbitrage, rationality and equilibrium, Theory and Decision 31: 199–240.

[180] Nau, R.F. (1995). Coherent decision analysis with inseparable probabilities and utilities, Journal of Risk and Uncertainty 10: 71–91.

[181] Nguyen, H.T. and Sugeno, M. (1998). Modelling and control, Kluwer, Dordrecht.

[182] Nims, J.F. (1990). Poems in translation: Sappho to Valery, The University of Arkansas Press, Arkansas.

[183] Noizet, G. and Caverni, J.-P. (1978). La psychologie de l’evaluation scolaire, PUF, Paris.

[184] Nurmi, H. (1987). Comparing voting systems, D. Reidel, Dordrecht.

[185] Nutt, P.C. (1984). Types of organizational decision processes, Administrative Science Quarterly 19: 414–450.

[186] Nyborg, K. (1998). Some Norwegian politicians’ use of cost-benefit analysis, Public Choice 95: 381–401.

[187] Ostanello, A. and Tsoukias, A. (1993). An explicative model of ‘public’ interorganizational interactions, European Journal of Operational Research 70: 67–82.

[188] Ostanello, A. (1990). Action evaluation and action structuring – Different decision aid situations reviewed through two actual cases, in C.A. Bana e Costa (ed.), Readings in multiple criteria decision aid, Springer Verlag, Berlin, pp. 36–57.

[189] Ostanello, A. (1997). Validation aspects of a prototype solution implementation to solve a complex MC problem, in J. Climaco (ed.), Multi-criteria analysis, Springer Verlag, Berlin, pp. 61–74.

[190] Ott, W.R. (1978). Environmental indices: Theory and practice, Ann Arbor Science, Ann Arbor.

[191] Paschetta, E. and Tsoukias, A. (1999). A real world MCDA application: Evaluating software, Technical report, Document du LAMSADE No 113, Universite Paris-Dauphine, Paris.

[192] Perny, P. and Pomerol, J.-Ch. (1999). Use of artificial intelligence in multicriteria decision making, in T. Gal, Th.J. Stewart and Th. Hanne (eds), Advances in MCDM models, algorithms, theory, and applications, Kluwer, Dordrecht, pp. 15.1–15.43.

[193] Perny, P. and Roubens, M. (1998). Fuzzy preference modelling, in R. Slowinski (ed.), Fuzzy sets in decision analysis, operations research and statistics, Kluwer, Dordrecht, pp. 3–30.

[194] Perny, P. and Zucker, J.D. (1999). Collaborative filtering methods based on fuzzy preference relations, Proceedings of EUROFUSE-SIC’99, pp. 279–285.

[195] Perny, P. (1992). Sur le non-respect de l’axiome d’independance dans les methodes de type ELECTRE, Cahiers du CERO 34: 211–232.

[196] Perrot, N., Trystram, G., Le Guennec, D. and Guely, F. (1996). Sensor fusion for real time quality evaluation of biscuit during baking: Comparison between Bayesian and fuzzy approaches, Journal of Food Engineering 29: 301–315.

[197] Perrot, N. (1997). Maitrise des procedes alimentaires et theorie des ensembles flous, PhD thesis, Ecole Nationale Superieure des Industries Agricoles Alimentaires.

[198] Pieron, H. (1963). Examens et docimologie, PUF, Paris.

[199] Pirlot, M. and Vincke, Ph. (1997). Semiorders. Properties, representations, applications, Kluwer, Dordrecht.

[200] Pirlot, M. (1997). A common framework for describing some outranking procedures, Journal of Multi-Criteria Decision Analysis 6: 86–93.

[201] Popham, W.J. (1981). Modern educational measurement, Prentice-Hall, New York.

[202] Poulton, E.C. (1994). Behavioral decision theory: A new approach, Cambridge University Press, Cambridge.

[203] Quiggin, J. (1982). A theory of anticipated utility, Journal of Economic Behaviour and Organization 3: 323–343.


[204] Quiggin, J. (1993). Generalized expected utility theory – The rank-dependent model, Kluwer, Dordrecht.

[205] Raiffa, H. (1970). Decision analysis – Introductory lectures on choices under uncertainty, Addison-Wesley, New York.

[206] Riley, H.J., Checca, R.C., Singer, T.S. and Worthington, D.F. (1994). Grades and grading practices: The results of the 1992 AACRAO survey, American Association of Collegiate Registrars and Admissions Officers, Washington D.C.

[207] Rosenhead, J. (1989). Rational analysis of a problematic world, Wiley, New York.

[208] Roubens, M. and Vincke, Ph. (1985). Preference modelling, Springer Verlag, Berlin.

[209] Roy, B. and Bouyssou, D. (1991). Decision-aid: An elementary introduction with emphasis on multiple criteria, Investigacion Operativa 2: 95–110.

[210] Roy, B. and Bouyssou, D. (1993). Aide multicritere a la decision : Methodes et cas, Economica, Paris.

[211] Roy, B. and Skalka, J.-M. (1984). ELECTRE IS : Aspects methodologiques et guide d’utilisation, Technical report, Document du LAMSADE No 30, Universite Paris-Dauphine, Paris.

[212] Roy, B. (1974). Criteres multiples et modelisation des preferences : l’apport des relations de surclassement, Revue d’Economie Politique 1: 1–44.

[213] Roy, B. (1990). Science de la decision ou science de l’aide a la decision ?, Technical report, Cahier du LAMSADE No 97, Universite Paris-Dauphine, Paris.

[214] Roy, B. (1993). Decision science or decision-aid science?, European Journal of Operational Research 66: 184–204.

[215] Roy, B. (1996). Multicriteria methodology for decision aiding, Kluwer, Dordrecht. Original version in French “Methodologie multicritere d’aide a la decision”, Economica, Paris, 1985.

[216] Russo, J.E. and Schoemaker, P.J.H. (1989). Confident decision making, Piatkus, London.

[217] Saaty, T.L. (1980). The analytic hierarchy process, McGraw-Hill, New York.

[218] Sabot, R. and Wakeman-Linn, J. (1991). Grade inflation and course choice, Journal of Economic Perspectives 5: 159–170.

[219] Sager, C. (1994). Eliminating grades in schools: An allegory for change, ASQ Quality Press, Milwaukee.

[220] Salles, M., Barrett, C.R. and Pattanaik, P.K. (1992). Rationality and aggregation of preferences in an ordinally fuzzy framework, Fuzzy Sets and Systems 49: 9–13.


[221] Satterthwaite, M.A. (1975). Strategy proofness and Arrow’s conditions: Existence and correspondence theorems for voting procedures and social welfare functions, Journal of Economic Theory 10: 187–217.

[222] Savage, L. (1954). The foundations of statistics, 2nd revised edn 1972, Wiley, New York.

[223] Schmeidler, D. (1989). Subjective probability and expected utility without additivity, Econometrica 57: 571–587.

[224] Schneider, Th., Schieber, C., Eeckhoudt, L. and Gollier, C. (1997). Economics of radiation protection: Equity considerations, Theory and Decision 43: 241–251.

[225] Schofield, J. (1989). Cost-benefit analysis in urban and regional planning, Unwin and Hyman, London.

[226] Scotchmer, S. (1985). Hedonic prices and cost-benefit analysis, Journal of Economic Theory 37: 55–75.

[227] Sen, A.K. (1986). Social choice theory, in K.J. Arrow and M.D. Intriligator (eds), Handbook of mathematical economics, Vol. 3, North-Holland, Amsterdam, pp. 1073–1181.

[228] Sen, A.K. (1997). Maximization and the act of choice, Econometrica 65: 745–779.

[229] Simon, H.A. (1957). A behavioural model of rational choice, in Models of man, Wiley, New York, pp. 241–260.

[230] Sinn, H.W. (1983). Economic decisions under uncertainty, North-Holland, Amsterdam.

[231] Slowinski, R. (ed.) (1998). Fuzzy sets in decision analysis, operations research and statistics, Kluwer, Dordrecht.

[232] Sopher, B. and Gigliotti, G. (1993). A test of generalized expected utility theory, Theory and Decision 35: 75–106.

[233] Speck, B.W. (1998). Grading student writing: An annotated bibliography, Greenwood Publishing Group, Westport.

[234] Stamelos, I. and Tsoukias, A. (1998). Software evaluation problem situations, Technical report, Cahier du LAMSADE No 156, Universite Paris-Dauphine, Paris.

[235] Steuer, R.E. (1986). Multiple criteria optimisation: Theory, computation, and application, Wiley, New York.

[236] Stratton, R.W., Myers, S.C. and King, R.H. (1994). Faculty behavior, grades and student evaluations, Journal of Economic Education 25: 5–15.

[237] Sugden, R. and Williams, A. (1983). The principles of practical cost-benefit analysis, Oxford University Press, Oxford.

[238] Sugeno, M. (1977). Fuzzy measures and fuzzy integrals: A survey, in M.M. Gupta, G.N. Saridis and B.R. Gaines (eds), Fuzzy automata and decision processes, North-Holland, Amsterdam, pp. 89–102.


[239] Sugeno, M. (1985). An introductory survey on fuzzy control, Information Sciences 36: 59–83.

[240] Suzumura, K. (1999). Consequences, opportunities and procedures, Social Choice and Welfare 16: 17–40.

[241] Syndicat des Transports Parisiens (1998). Methodes d’evaluation des projets d’infrastructures de transports collectifs en region Ile-de-France, Technical report, Syndicat des Transports Parisiens, Paris.

[242] Tchudi, S. (1997). Alternatives to grading student writing, National Council of Teachers of English, Urbana.

[243] Teghem, J. (1996). Programmation lineaire, Editions de l’Universite de Bruxelles-Editions Ellipses, Brussels.

[244] Thaler, R.H. (1991). Quasi rational economics, Russell Sage Foundation, New York.

[245] Toth, F.L. (1997). Cost-benefit analysis of climate change: The broader perspectives, Birkhauser, Basel.

[246] Trystram, G., Perrot, N. and Guely, F. (1995). Application of fuzzy logic for the control of food processes, Processing Automation 4: 504–512.

[247] Tsoukias, A. and Vincke, Ph. (1995). A new axiomatic foundation of partial comparability, Theory and Decision 39: 79–114.

[248] Tsoukias, A. and Vincke, Ph. (1999). A characterization of PQI interval orders, Proceedings OSDA ’98, Electronic Notes in Discrete Mathematics (http://www.elsevier.nl/locate/endm); to appear also in Discrete Applied Mathematics.

[249] Tversky, A. (1969). Intransitivity of preferences, Psychological Review 76: 31–48.

[250] United Nations Development Programme (1997). Human Development Report 1997, Oxford University Press, Oxford.

[251] van Doren, M. (1928). An anthology of world poetry, Albert and Charles Boni, New York.

[252] Vansnick, J.-C. (1986). De Borda et Condorcet a l’agregation multicritere, Ricerca Operativa (40): 7–44.

[253] Vassiloglou, M. and French, S. (1982). Arrow’s theorem and examination assessment, British Journal of Mathematical and Statistical Psychology 35: 183–192.

[254] Vassiloglou, M. (1984). Some multi-attribute models in examination assessment, British Journal of Mathematical and Statistical Psychology 37: 216–233.

[255] Vincke, Ph. (1988). P, Q, I preference structures, in J. Kacprzyk and M. Roubens (eds), Non conventional preference relations in decision making, Springer Verlag, Berlin, pp. 72–81.


[256] Vincke, Ph. (1992a). Exploitation of a crisp binary relation in a ranking problem, Theory and Decision 32: 221–241.

[257] Vincke, Ph. (1992b). Multi-criteria decision aid, Wiley, New York. Original version in French "L'Aide Multicritère à la Décision", Éditions de l'Université de Bruxelles – Éditions Ellipses, Brussels, 1989.

[258] Viscusi, W.K. (1992). Fatal tradeoffs: Public and private responsibilities for risk, Oxford University Press, Oxford.

[259] von Neumann, J. and Morgenstern, O. (1944). Theory of games and economic behavior, Princeton University Press, Princeton.

[260] von Winterfeldt, D. and Edwards, W. (1986). Decision analysis and behavioral research, Cambridge University Press, Cambridge.

[261] Wakker, P.P. (1989). Additive representations of preferences – A new foundation of decision analysis, Kluwer, Dordrecht.

[262] Warusfel, A. (1961). Les nombres et leurs mystères, Points Sciences, Seuil, Paris.

[263] Watson, S.R. (1981). Decision analysis as a replacement for cost-benefit analysis, European Journal of Operational Research 7: 242–248.

[264] Weinstein, M.C. and Stason, W.B. (1977). Foundations of cost-effectiveness analysis for health and medical practices, New England Journal of Medicine 296: 716–721.

[265] Weitzman, M.L. (1994). On the "environmental" discount rate, Journal of Environmental Economics and Management 26: 200–209.

[266] Weymark, J.A. (1981). Generalized Gini inequality indices, Mathematical Social Sciences 1: 409–430.

[267] Willis, K.G., Garrod, G.D. and Harvey, D.R. (1998). A review of cost-benefit analysis as applied to the evaluation of new road proposals in the U.K., Transportation Research – D 3: 141–156.

[268] Yaari, M.E. (1987). The dual theory of choice under risk, Econometrica 55: 95–115.

[269] Yu, W. (1992). Aide multicritère à la décision dans le cadre de la problématique du tri : Méthodes et applications, PhD thesis, LAMSADE, Université Paris-Dauphine, Paris.

[270] Zadeh, L.A. (1979). A theory of approximate reasoning, in J.E. Hayes, D. Michie and L.I. Mikulich (eds), Machine intelligence, Elsevier, Amsterdam, pp. 149–194.

[271] Zadeh, L.A. (1999). From computing with numbers to computing with words – from manipulation of measurements to manipulation of perceptions, Proceedings of EUROFUSE-SIC'99, pp. 1–2.

[272] Zarnowsky, F. (1989). The decathlon – A colorful history of track and field's most challenging event, Leisure Press, Champaign.

[273] Zerbe, R.O. and Dively, D.D. (1994). Benefit-cost analysis in theory and practice, Harper Collins, New York.

Index

absolute scale, 115
action, 212
actor, 206
acyclic, 126
aggregation, 30, 41, 51, 148, 245
  additive, 44
  compensation, 46, 57, 212
  conjunctive rule, 41, 96
  constructive approach, 130
  disaggregation, 117
  dominance, 93
  linearity, 46, 80
  monotonicity, 10, 61
  multi-attribute value function, 105
  non-compensation, 141
  paired comparison, 193
  procedure, 215
  rank reversal, 117
  screening process, 96
  single-attribute value function, 106
  tournament, 125
  utility, 105
  utility function, 106
  value function, 106
  weight, 35
  weighted average, 42, 59, 85, 241
  weighted sum, 155, 159, 166, 172
AHP, 111
  rank reversal, 117
air quality, 61, 63
Allais' paradox, 191
ambiguity, 212
analyst, 206
anchoring effect, 34
Arrow, 16
aspiration level, 96
astrology, 237
attributes
  hierarchy, 214
attributes hierarchy, 213
automatic decision, 239
automatic decision systems, 148
axiomatic analysis, 244
bayesian decision theory, 239, 244
binary relation
  acyclic, 126
  fuzzy, 21
  incomparability, 19, 130, 220
  outranking, 105
  semiorder, 20
  transitivity, 18–20
Borda, 14
Borda's method, 124
call for tenders, 208
cardinal, 47
client, 206
coalition, 219
coherence test, 214
communication, 242
compensation, 46, 57, 212
computer science, 30, 237
concordance, 216, 219
concordance threshold, 134
Condorcet, 13
  paradox, 51
Condorcet's method, 125
conjunctive rule, 41, 96
consistency, 84
constructive approach, 130
corporate finance, 73
correlation, 102
cost-benefit analysis, 71, 238
  externalities, 78
  markets, 76
  net present social value, 76
  price, 75, 85
  price of human life, 81
  price of time, 80
  public goods, 78
  social benefits, 75
  social costs, 75
  social welfare, 77, 86
credibility index, 139
criteria
  coalition, 219
  coherence test, 214
  coherent family, 214
  hierarchy, 213, 214
  interaction, 103
  point of view, 212
  relative importance, 216, 219
cycle reduction, 135
decathlon, 63, 66
decision
  dynamic consistency, 202, 239
  legitimation, 220
  model, 1
decision aiding process, 206
decision model, formal, 84, 212, 237
decision process, 85, 206
  actor, 206
  analyst, 206
  client, 206
decision rule, 149, 151
decision support, 210
  evaluation model, 206, 213
  final recommendation, 219
  learning process, 119
  problem formulation, 206, 211, 212
  problem situation, 206
  problem statement, 212, 215
decision table, 152
decision theory, 237
  Allais' paradox, 191
  Ellsberg's paradox, 192
  expected utility, 189, 201
  expected value, 187
  St. Petersburg game, 187
disaggregation, 117
discordance, 138
discounting, 74, 86, 239
  net present value, 74
  social rate, 76, 82
dominance, 51, 93, 96
dynamic consistency, 202, 239
economics, 71, 237
education science, 237
elections, 237
ELECTRE-TRI, 215, 228
Ellsberg's paradox, 192
engineering, 30, 237
environment, 71, 82
equity, 79, 86
evaluation
  absolute, 212
  model, 1, 40, 51, 206, 213
  problem statement, 215
  software, 207, 213
evaluation model
  problem statement, 212
expected value, 187
externalities, 78
final recommendation, 219
forecasting, 80
fuzzy, 21
  control, 149, 165
  implication, 166
  interval, 161
  labels, 161
  rule, 169
  set, 161, 169
GPA, 48
grade, 29, 237
  anchoring effect, 34
  GPA, 48
  marking scale, 33
  minimal passing, 36, 42
  standardised score, 34
graphology, 237
health, 71
heuristics, 242
hierarchy, 213, 214
human development, 54, 61
ideal point, 117
implication, 166
imprecision, 51, 103
incomparability, 19, 130, 220
independence, 58, 102
  of irrelevant alternatives, 15
  separability, 214
indicator, 238
indices, 173
indifference threshold, 201
interaction, 103
interactive methods, 105
interpolation, 155
interval scale, 99
intuition, 242
kernel, 131
learning process, 119
legitimation, 220
linear scale, 104
majority rule, 125
manipulability, 17
markets, 76
marking scale, 33
mathematics, 30
MCDM, 238
  ideal point, 117
  interactive methods, 105
  profiles, 220
  sorting, 212
  substitution rate, 57
  swing-weight, 110
  trade-off, 101
meaningfulness, 62, 227
measurement, 38, 51, 67, 212
  absolute scale, 115
  cardinal, 47
  interval scale, 99
  linear scale, 104
  meaningfulness, 62, 227
  nominal scale, 214
  ordinal, 39, 47, 214
  ratio scale, 98
  reliability, 33
  scale, 32, 38, 79
  standard sequences, 107
  subjective, 214
  validity, 33
model, 40, 245
  structuration, 84, 242, 245
mono-criterion analysis, 76, 85
monotonicity, 10, 61
nearest neighbours, 172
net present social value, 76
net present value, 74
nominal scale, 214
non-compensation, 141
operational research, 30, 237
ordinal, 39, 47, 214
outranking, 105
outranking methods, 124, 129
  concordance, 216, 219
  concordance threshold, 134
  credibility index, 139
  cycle reduction, 135
  discordance, 138
  ELECTRE-TRI, 215, 228
  incomparability, 130
  indifference threshold, 201
  majority rule, 125
  PROMETHEE, 193
  veto, 216, 219
paired comparison, 193
point of view, 212
political science, 237
preference
  model, 214, 245
  nontransitive, 130
  relation, 125
  threshold, 50
price, 75, 85
price of human life, 81
price of time, 80
priority, 111
probability, 239
problem formulation, 206, 211, 212
problem situation, 206
problem statement, 212, 215
PROMETHEE, 193
public goods, 78
rank reversal, 117
ranking, 212
ratio scale, 98
relative importance, 216, 219
risk, 239
robustness, 86, 242, 245
rule
  aggregation, 148
scale, 32, 38, 79
screening process, 96
security, 81
semiorder, 20
sensitivity analysis, 83
  stability, 99, 101
separability, 214
similarity
  indices, 173
  relation, 173
social benefits, 75
social costs, 75
social rate, 76, 82
social welfare, 77, 86
software, 207, 213
sorting, 212
St. Petersburg game, 187
stability, 99, 101
statistics, 237
structuration, 84, 242, 245
subjective, 214
substitution rate, 57
t-norm, 164
threshold, 50, 173
tournament, 125
trade-off, 101
transitivity, 18–20
transportation, 71, 79
unanimity, 13
uncertainty, 51, 79, 103, 179, 182, 201, 239
  endogenous, 218, 221
  exogenous, 218
utility, 105, 106
  expected, 189, 201
value function, 106
  multi-attribute, 105
  single-attribute, 106
veto, 216, 219
voting procedure
  Borda's method, 124
  Condorcet paradox, 51
  Condorcet's method, 125
  manipulability, 17
  unanimity, 13
weight, 35
weighted average, 42, 59, 85, 241
weighted sum, 155, 159, 166, 172
