
Philosophies and types of evaluation research

    Elliot Stern

    In:

    Descy, P.; Tessaring, M. (eds)

The foundations of evaluation and impact research. Third report on vocational training research in Europe: background report.

Luxembourg: Office for Official Publications of the European Communities, 2004 (Cedefop Reference series, 58)

    Reproduction is authorised provided the source is acknowledged

Additional information on Cedefop's research reports can be found on: http://www.trainingvillage.gr/etv/Projects_Networks/ResearchLab/

    For your information:

The background report to the third report on vocational training research in Europe contains original contributions from researchers. They are grouped in three volumes published separately in English only. A list of contents is on the next page.

A synthesis report based on these contributions and with additional research findings is being published in English, French and German.

Bibliographical reference of the English version: Descy, P.; Tessaring, M. Evaluation and impact of education and training: the value of learning. Third report on vocational training research in Europe: synthesis report. Luxembourg: Office for Official Publications of the European Communities (Cedefop Reference series)

    In addition, an executive summary in all EU languages will be available.

    The background and synthesis reports will be available from national EU sales offices or from Cedefop.

    For further information contact:

Cedefop, PO Box 22427, GR-55102 Thessaloniki
Tel.: (30) 2310 490 111
Fax: (30) 2310 490 102
E-mail: [email protected]
Homepage: www.cedefop.eu.int
Interactive website: www.trainingvillage.gr

Contributions to the background report of the third research report

    Impact of education and training

    Preface

The impact of human capital on economic growth: a review (Rob A. Wilson, Geoff Briscoe)

Empirical analysis of human capital development and economic growth in European regions (Hiro Izushi, Robert Huggins)

Non-material benefits of education, training and skills at a macro level (Andy Green, John Preston, Lars-Erik Malmberg)

Macroeconometric evaluation of active labour-market policy: a case study for Germany (Reinhard Hujer, Marco Caliendo, Christopher Zeiss)

Active policies and measures: impact on integration and reintegration in the labour market and social life (Kenneth Walsh and David J. Parsons)

The impact of human capital and human capital investments on company performance: evidence from literature and European survey results (Bo Hansson, Ulf Johanson, Karl-Heinz Leitner)

The benefits of education, training and skills from an individual life-course perspective with a particular focus on life-course and biographical research (Maren Heise, Wolfgang Meyer)

The foundations of evaluation and impact research

    Preface

Philosophies and types of evaluation research (Elliot Stern)

Developing standards to evaluate vocational education and training programmes (Wolfgang Beywl, Sandra Speer)

Methods and limitations of evaluation and impact research (Reinhard Hujer, Marco Caliendo, Dubravko Radic)

From project to policy evaluation in vocational education and training: possible concepts and tools. Evidence from countries in transition (Evelyn Viertel, Søren P. Nielsen, David L. Parkes, Søren Poulsen)

Look, listen and learn: an international evaluation of adult learning (Beatriz Pont and Patrick Werquin)

Measurement and evaluation of competence (Gerald A. Straka)

An overarching conceptual framework for assessing key competences. Lessons from an interdisciplinary and policy-oriented approach (Dominique Simone Rychen)

Evaluation of systems and programmes

    Preface

Evaluating the impact of reforms of vocational education and training: examples of practice (Mike Coles)

Evaluating systems reform in vocational education and training. Learning from Danish and Dutch cases (Loek Nieuwenhuis, Hanne Shapiro)

Evaluation of EU and international programmes and initiatives promoting mobility: selected case studies (Wolfgang Hellwig, Uwe Lauterbach, Hermann-Günter Hesse, Sabine Fabriz)

Consultancy for free? Evaluation practice in the European Union and central and eastern Europe. Findings from selected EU programmes (Bernd Baumgartl, Olga Strietska-Ilina, Gerhard Schaumberger)

Quasi-market reforms in employment and training services: first experiences and evaluation results (Ludo Struyven, Geert Steurs)

Evaluation activities in the European Commission (Josep Molsosa)

Philosophies and types of evaluation research

    Elliot Stern

Abstract

This chapter considers different types of evaluation in vocational education and training (VET). It does so from two standpoints: debates among evaluation researchers, and the way contexts of use and evaluation capacity shape evaluation in practice. The nature of VET as an evaluation object is discussed and theories of evaluation are located in wider debates about the nature of knowledge and philosophies of science. The various roles of evaluation in steering and regulating decentralised policy systems are discussed, as is the way evaluation itself is regulated through the development of standards and professional codes of behaviour.

Table of contents

1. Introduction
1.1. Scope of this chapter
2. Can evaluation be defined?
2.1. Assessing or explaining outcomes
2.2. Evaluation, change and values
2.3. Quantitative and qualitative methods
2.4. Evaluation types
2.5. A focus on results and outcomes
2.6. Participatory methods and devolved evaluations
2.7. Theory and practice
3. The object of VET evaluation
3.1. The domain of VET
3.2. Overarching characteristics of evaluation objects
4. Philosophical foundations
4.1. Positivism, observation and theory
4.2. Scientific realism
4.3. Constructivists
5. Evaluation theory
5.1. Programme theory
5.2. Theory based evaluation and theories of change
5.3. A wider theoretical frame
6. Determinants of evaluation use
6.1. Instrumental versus cumulative use
6.2. Process use of evaluation
6.3. The importance of methodology
6.4. Evaluation and learning
6.5. The institutionalisation of evaluation
6.6. Organisation of evaluation in public agencies
6.7. Evaluation as dialogue
6.8. Strategies and types of evaluation use
7. Evaluation standards and regulation
7.1. Evaluation codes and standards
7.2. Evaluation as a method for regulating decentralised systems
8. Conclusions and future scenarios
List of abbreviations
References

List of tables

Table 1: Overlaps between methodology and purpose
Table 2: Evaluation types
Table 3: Rules guiding realistic enquiry and method
Table 4: Comparing constructivist and conventional evaluation
Table 5: Grid for a synthetic assessment of quality of evaluation work

1. Introduction

Until quite recently, evaluation thinking has been centred in North America. Only over the last ten years have we seen the growth and spread of evaluation in Europe. This has been associated with a significant expansion of evaluations supported by the European Union (EU), especially in relation to Structural Funds (European Commission, 1999), and the establishment of evaluation societies at Member State and European levels. There have also been the beginnings of a tradition of evaluation publishing in Europe, including the emergence of a major new evaluation journal edited for the first time from a European base and with a high proportion of European content. Many factors account for the growth in evaluation activities in Europe in recent years (Leeuw et al., 1999; Rist et al., 2001; Toulemonde, 2001). These include both structural and management considerations. Expenditure pressures, both at national and European level, have increased demands for improved performance and greater effectiveness within the public sector. Furthermore, public action is becoming increasingly complex, both in terms of the goals of programmes and policies and in terms of the organisational arrangements through which they are delivered.

The decentralisation of public agencies, together with the introduction of results-based management and other principles commonly described under the heading of new public management, have created new demands for accountability; these are often in multi-agency and partnership environments. Without the bottom line of financial measures to judge success, new ways of demonstrating impacts and results are being demanded; so are new ways of regulating and steering decentralised systems. We see evaluation nowadays not only applied to programmes or policy instruments but also built into the routines of administration. This is often associated with standards that are set by policy-makers in relation to the performance of those expected to deliver public services. Standards, a concept central to evaluation, have also come to be applied to evaluation itself. Even evaluation is not free from the demands to deliver reliable, high-quality output.

    1.1. Scope of this chapter

It is against this background that this chapter on types and philosophies of evaluation has been prepared. The main sections are as follows.

Chapter 2 begins by seeking to define evaluation or, more accurately, to review attempts to define it through the work of various scholars and experts. It allows us to clarify the main types of evaluation that are in use in different institutional and administrative settings, even though the writings of scholars and experts focus on pure types when compared with evaluation as practised. This section also begins to highlight issues related to standards and their role in evaluation more broadly.

Chapter 3 then considers the nature of evaluation in the context of vocational education and training (VET) and addresses the question: what characterises evaluation in this domain; is it in any way distinctive? The section includes both specific consideration of VET and more general characteristics of evaluation objects and configurations.

Chapter 4 seeks to locate evaluation theory within the broader setting of the nature of theory in the philosophy of science. Many of the debates in evaluation are shaped by, and reflect, these wider debates.

Evaluation theory narrowly conceived is then reviewed in Chapter 5. In many ways theory within evaluation (as will be discussed) is a very particular construction, though this does not invalidate wider understandings of the role of theory.

The reality and practice of evaluation are then considered in Chapter 6, bringing together a substantial body of research into evaluation use and institutionalisation in order to better understand different types of evaluation in situ.

Evaluation standards and codes of behaviour and ethics for evaluators are reviewed in Chapter 7, drawing on experience in North America, Australasia and, more recently, Europe.

Finally, in Chapter 8, the discussion returns to the question of evaluation standards and the role they play both in the evaluation process and in governance and regulatory processes to steer institutions and promote policies and reforms.

2. Can evaluation be defined?

There are numerous definitions and types of evaluation. There are, for example, many definitions of evaluation put forward in handbooks, evaluation guidelines and administrative procedures, by bodies that commission and use evaluation. All of these definitions draw selectively on a wider debate as to the scope and focus of evaluation. A recent book identifies 22 foundation models for 21st century programme evaluation (Stufflebeam, 2000a), although the authors suggest that a smaller subset of nine are the strongest. Rather than begin with types and models, this chapter begins with an attempt to review and bring together the main ideas and orientations that underpin evaluation thinking.

Indicating potential problems with definition by a question mark in the title of this section warns the reader not to expect straightforward or consistent statements. Evaluation has grown up through different historical periods in different policy environments, with inputs from many disciplines and methodologies, from diverse value positions, and rooted in hard-fought debates in the philosophy of science and theories of knowledge. While there is some agreement, there is also persistent difference: evaluation is contested terrain. Most of these sources are from North America, where evaluation has been established as a discipline and practice and debated for 30 or more years.

2.1. Assessing or explaining outcomes

Among the most frequently quoted definitions is that of Scriven, who has produced an evaluation Thesaurus, his own extensive handbook of evaluation terminology: evaluation 'refers to the process of determining the merit, worth or value of something, or the product of that process [...] The evaluation process normally involves some identification of relevant standards of merit, worth or value; some investigation of the performance of evaluands on these standards; and some integration or synthesis of the results to achieve an overall evaluation or set of associated evaluations.' (Scriven, 1991; p. 139).

This definition prepares the way for what has been called the 'logic of evaluation' (Scriven, 1991; Fournier, 1995). This logic is expressed in a sequence of four stages:
(a) establishing evaluation criteria and related dimensions;
(b) constructing standards of performance in relation to these criteria and dimensions;
(c) measuring performance in practice;
(d) reaching a conclusion about the worth of the object in question.
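Read as a procedure, this four-stage logic can be sketched in a few lines of code. The following is a purely illustrative sketch: the criteria, thresholds and scores are invented for the example and are not part of Scriven's or Fournier's account.

```python
# A minimal sketch of the four-stage 'logic of evaluation'.
# Criteria, standards and performance scores are hypothetical.

# (a) establish evaluation criteria and related dimensions
criteria = ["relevance", "completion_rate", "employment_outcomes"]

# (b) construct standards of performance for each criterion
# (here: the minimum acceptable score on a 0-10 scale)
standards = {"relevance": 6, "completion_rate": 7, "employment_outcomes": 5}

# (c) measure performance in practice (e.g. from monitoring data or surveys)
performance = {"relevance": 8, "completion_rate": 6, "employment_outcomes": 7}

# (d) synthesise the results into an overall judgement of worth
def evaluate(performance, standards):
    judgements = {c: performance[c] >= standards[c] for c in criteria}
    return judgements, all(judgements.values())

judgements, meets_all = evaluate(performance, standards)
print(judgements)
print("meets the agreed standards" if meets_all else "falls short on some criteria")
```

The sketch shows only that judgement follows from criteria and standards fixed in advance; the synthesis stage in particular involves qualitative weighing that a simple threshold rule cannot capture.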

This logic is not without its critics (e.g. Schwandt, 1997), especially among those of a naturalistic or constructivist turn who cast doubt on the claims of evaluators to know, to judge and, ultimately, to control. Other stakeholders, it is argued, have a role, and this changed relationship with stakeholders is discussed further below.

The most popular textbook definition of evaluation can be found in Rossi et al.'s book Evaluation: a systematic approach: 'Program evaluation is the use of social research procedures to systematically investigate the effectiveness of social intervention programs. More specifically, evaluation researchers (evaluators) use social research methods to study, appraise, and help improve social programmes in all their important aspects, including the diagnosis of the social problems they address, their conceptualization and design, their implementation and administration, their outcomes, and their efficiency.' (Rossi et al., 1999; p. 4).

Using words such as effectiveness, rather than Scriven's favoured merit, worth or value, begins to shift the perspective of this definition towards the explanation of outcomes and impacts. This is partly because Rossi and his colleagues identify helping improve social programmes as one of the purposes of evaluation. Once there is an intention to make programmes more effective, the need to explain how they work becomes more important.



Yet, explanation is an important and intentionally absent element in Scriven's definitions of evaluation: 'By contrast with evaluation, which identifies the value of something, explanation involves answering a Why or How question about it or a call for some other type of understanding. Often, explanation involves identifying the cause of a phenomenon, rather than its effects (which is a major part of evaluation). When it is possible, without jeopardizing the main goals of an evaluation, a good evaluation design tries to uncover microexplanations (e.g. by identifying those components of the curriculum package that are producing the major part of the good or bad effects, and/or those that are having little effect). The first priority, however, is to resolve the evaluation issues (is the package any good at all, the best available? etc.). Too often the research orientation and training of evaluators leads them to do a poor job on evaluation because they became interested in explanation.' (Scriven, 1991, p. 158).

Scriven himself recognises that one pressure moving evaluation to pay greater attention to explanation is the emergence of programme theory, with its concern about how programmes operate so that they can be improved or better implemented. A parallel pressure comes from the uptake of impact assessment associated with the growth of performance management and other managerial reforms within public sector administrations. The intellectual basis for this work was most consistently elaborated by Wholey and colleagues. They start from the position that evaluation should be concerned with the efficiency and effectiveness of the way governments deliver public services. A core concept within this approach is what is called evaluability assessment (Wholey, 1981). The starting point for this assessment is a critical review of the logic of programmes and the assumptions that underpin them. This work constitutes the foundation for most of the thinking about programme theory and logical frameworks. It also prefigures a later generation of evaluation thinking, rooted more in policy analysis, that is concerned with the institutionalisation of evaluation within public agencies (Boyle and Lemaire, 1999), as discussed further below.

These management reforms generally link interventions with outcomes. As Rossi et al. recognise, this takes us to the heart of broader debates in the social sciences about causality: 'The problem of establishing a program's impact is identical to the problem of establishing that the program is a cause of some specified effect. Hence, establishing impact essentially amounts to establishing causality.' (Rossi et al., 1999).

The difficulties of establishing perfect, rather than good enough, impact assessments are recognised by Rossi and colleagues. This takes us into the territory of experimentation and causal inference associated with some of the most influential founders of North American evaluation, such as Campbell, with his interest in experimental and quasi-experimental designs, but also his interest in later years in the explanatory potential of qualitative evaluation methods. The debate about experimentation and causality in evaluation continues to be vigorously pursued in various guises. For example, in a recent authoritative text on experimentation and causal inference (Shadish et al., 2002), the authors begin to take on board contemporary criticisms of experimental methods that have come from the philosophy of science and the social sciences more generally. In recent years, we have also seen a sustained realist critique of experimental methods, led in Europe by Pawson and Tilley (1997). But, whatever their orientations to experimentation and causal inference, explanations remain at the heart of the concerns of an important constituency within evaluation.

2.2. Evaluation, change and values

Another important strand in evaluation thinking concerns the relationship between evaluation and action or change. One comparison is between summative and formative evaluation methods, terms also coined by Scriven. The former assesses or judges results and the latter seeks to influence or promote change. Various authors have contributed to an understanding of the role of evaluation and change. For example, Cronbach (1982, 1989), rooted in policy analysis and education, sees an important if limited role for evaluation in shaping policy at the margins through piecemeal adaptations.


The role of evaluation in Cronbach's framework is to inform policies and programmes through the generation of knowledge that feeds into the policy-shaping community of experts, administrators and policy-makers. Stake (1996), on the other hand, with his notion of responsive evaluation, sees this as a service to programme stakeholders and to participants. By working with those who are directly involved in a programme, Stake sees the evaluator as supporting their participation and possibilities for initiating change. This contrasts with Cronbach's position, and even more strongly with that of Wholey (referred to earlier), given Stake's scepticism about the possibilities of change at the level of large-scale national (or, in the US context, Federal and State) programmes and their management. Similarly, Patton (1997 and earlier editions), who has tended to eschew work at programme and national level, shares with Stake a commitment to working with stakeholders and (local) users. His concern is for 'intended use by intended users'.

Virtually everyone in the field recognises the political and value basis of much evaluation activity, albeit in different ways. While Stake, Cronbach and Wholey may recognise the importance of values within evaluation, the values that they recognise are variously those of stakeholders, participants and programme managers. There is another strand within the general orientation towards evaluation and change which is decidedly normative. This category includes House, with his emphasis on evaluation for social justice, and the emancipatory logic of Fetterman et al. (1996) and empowerment evaluation. Within the view of Fetterman and his colleagues, evaluation itself is not undertaken by external experts but rather is a self-help activity in which, because people empower themselves, the role of any external input is to support self-help. So, one of the main differences among those evaluators who explicitly address issues of programme and societal change is in terms of the role of evaluators, be they experts who act, facilitators and advocates, or enablers of self-help.

2.3. Quantitative and qualitative methods

Much of the literature that forms the foundation of the explanatory strand in evaluation is quantitative, even though we have noted that the later Campbell began to emphasise more the importance of qualitative understanding. However, among those concerned with formative, responsive and other change-oriented evaluations, there is a predominance of qualitative methods. The scope of 'qualitative' includes processes as well as phenomena which by their nature require qualitative description. This would include, for example, the means through which a programme was being implemented, as well as the dynamics that occur during the course of evaluation (e.g. learning to do things better, improving procedures, overcoming resistance).

Stake emphasises qualitative methods. He is often associated with the introduction of case studies into evaluation practice, although he also advocates a full range of observational, interview and conversational techniques (Stake, 1995). Patton's commitment to qualitative methods is reinforced by his interest in the use by programme managers of the results of evaluations. They must be interested in the stories, experiences and perceptions of programme participants (Patton, 2002; p. 10).

    2.4. Evaluation types

After this tour around some of the main arguments and positions in evaluation, it becomes possible to return to the matter of definition and types of evaluation. There is no simple or single definition, but types of evaluation can be seen to cohere around two main axes. The first axis is methodological and the second concerns purposes.

In terms of methodologies, looking across the different approaches to evaluation discussed above, we can distinguish three methodological positions:
(a) the criteria or standards based position, which is concerned with judging success and performance by the application of standards;
(b) the causal inference position, which is concerned with explaining programme impacts and success;
(c) the formative or change oriented position, which seeks to bring about improvements both for programmes and for those who participate in them.

Alongside these methodological distinctions are a series of definitions that are concerned with evaluation purposes. Distinguishing evaluation in terms of purpose has been taken up by many authors including Vedung (1997), evaluators at the Tavistock Institute (Stern, 1992; Stern et al., 1992) and Chelimsky (1995, 1997). Most of these authors distinguish between different evaluation purposes that are clearly consistent with the overview presented above. Along this axis, we can distinguish between the following purposes:
(a) accountability, where the intention is to give an account to sponsors and policy-makers of the achievements of a programme or policy;
(b) development, where the intention is to improve the delivery or management of a programme during its term;
(c) knowledge production, where the intention is to develop new knowledge and understanding;
(d) social improvement, where the intention is to improve the situation of the presumed beneficiaries of public interventions.

There is a degree of correlation between these two axes, as Table 1 suggests.

Table 1: Overlaps between methodology and purpose
- Accountability (criteria and standards): outcome and impact evaluations; mainly summative
- Development (change orientation): formative evaluation of programmes
- Knowledge production (causal inference): what works; improving future policy/practice
- Social improvement (change orientation): empowerment and participative evaluations

Evaluation for the purpose of accountability tends to be concerned with criteria and standards (or indicator studies). Development evaluations use change-oriented methods to pursue the desired improvements in programme delivery. Evaluations for the purpose of knowledge production are often concerned with drawing causal inference from evaluation data. Finally, evaluations for the purposes of social improvement are also preoccupied with change-oriented methods, though to improve the circumstances of programme participants and citizens rather than programme management per se. However, this is not to suggest a one-to-one association between methodologies and purposes.

In the world of evaluation in practice, there are also incompatibilities and tensions. Thus, the accountability-driven goal of evaluation often sits alongside, and sometimes competes with, management and delivery logic. Evaluation is often seen by programme managers as a means of supporting improved effectiveness of implementation. In many institutional settings, funds are committed for evaluation to meet accountability purposes (Vedung, 1997), but are spent mainly for managerial, formative and developmental purposes. Nor is causal inference always absent from evaluation purposes concerned with social improvement. Nonetheless, the clusterings represented in Table 1 do summarise the main types of evaluation. These are:
(a) accountability for policy-making evaluations that rely on criteria, standards and indicators;
(b) development evaluations that adopt a change-orientated approach in order to improve programmes;

(c) knowledge production evaluations that are concerned to establish causal links, explanations and valid knowledge;
(d) social improvement evaluations that seek to improve the circumstances of beneficiaries by deploying change, advocacy and facilitation skills.

Already implicit in the above discussions and definitions is the dimension of time. Evaluations for the purpose of accountability tend to occur at the end of a programme cycle. Development-oriented evaluations tend to occur while the programme is ongoing, and knowledge production evaluations can continue long after the initial programme cycle has ended. Wholey's concept of evaluability focuses attention on the initial programme logic, while Cronbach's interest in the policy-shaping community carries over into the long term and the periods of transition between one programme and another. Notions of ex-ante evaluation (and appraisal or needs analysis), ongoing or mid-term evaluations, and ex-post evaluations, which have been adopted as a basic framework by the European Commission and other agencies, derive from these different understandings of when evaluation activity is most relevant.

The main types of evaluation identified above can be further elaborated in terms of the kinds of questions they ask, the stakeholders that are included and the focus of their activities.

Accountability for policy-making evaluation meets the needs of external stakeholders who require the delivery of programme or policy outputs. Management may also demand accountability, but here we mean external management rather than management internal to a programme or policy area. This has become a dominant form of evaluation in public administrations, consistent with the growth of performance management philosophies more generally. Evaluations of this type tend to occur at the end of a programme or policy cycle and focus on results.

Development evaluation follows the lifecycle of an initiative with the aim of improving how it is managed and delivered. These evaluations are more likely to meet the needs of internal managers and partners rather than external stakeholders. Formative evaluations and process evaluations tend to fall into this category.

Knowledge production evaluation is mainly concerned with understanding in the longer term. These evaluations often seek to synthesise understanding coming from a number of evaluations. While both of the previous evaluation types are expected to affect current programme learning and knowledge production, this type looks to apply lessons to future programmes and policies.

Social improvement evaluation can take many forms. Many social and economic programmes depend for their success on consensus among the intended beneficiaries. Participative evaluations that seek to involve target groups contribute to the development of consensus and consent. This type of evaluation may also take on an advocacy role: promoting certain interests or groups. It is within this evaluation purpose that programme beneficiaries are most likely to be directly involved, not merely consulted.

These different evaluation types can be further elaborated in terms of the following questions:
(a) who are the stakeholders?
(b) what is the focus of the evaluation?
(c) what are the main approaches and methods?
(d) what are the key questions that can be asked?
Table 2 presents the main elements of the four evaluation types in relation to these questions. However, we would not wish to suggest that these types do full justice to the diversity of evaluation models; rather, they summarise the main high-level differences. It is possible, for example, to see the emergence of sometimes contradictory evaluation subtypes in recent years. Two examples of these are outcome-focused evaluations and participation-focused evaluations.

2.5. A focus on results and outcomes

The concern that public interventions should lead to specific and measurable results is mirrored in the development of evaluation practice. In complex socioeconomic programmes in particular, the tendency is often to focus on intermediate outcomes and processes of implementation. Sometimes this is inevitable, when the final results of interventions will only be discernible in the long term. Contemporary models of public management create a demand for methods that focus on results, and there has been considerable investment in such methodologies in recent years. These methodologies tend to be in three areas.


The first deals with systematic reviews. Reaching policy conclusions and taking action on the basis of the evaluation of single projects, or even programmes, has long been criticised. The evidence-based policy movement works on the assumption that it is necessary to aggregate the results of different evaluations through systematic reviews in order to produce reliable evidence.

Next is results-based management. This is now a feature of most public management systems and can be variously expressed in terms of targets, league tables, payment-by-results and outcome funding. Within the Commission there has been a move in this direction, under the label of activity-based management. It is also the underlying principle of the performance reserve within the Structural Funds and relevant to current debates about impact assessment.

Finally there are macro and micro economic models. These seek to simulate the relationship between key variables and explain outputs through a mixture of available data and assumed causal relationships. Such models are especially useful where data sources are incomplete and results have to be estimated rather than precisely measured.

2.6. Participatory methods and devolved evaluations

There is a general tendency in programme and policy evaluation for multiple stakeholder and citizen involvement. These general developments have led to a spate of innovations among evaluators, who are now able to draw on an extensive repertoire of participative methods and techniques, many of them pioneered in international development contexts. They include: rapid appraisal methods, empowerment evaluation, methods for involving stakeholders, and user-focused evaluations.



Table 2: Evaluation types

Accountability for policy-makers
- Stakeholders: parliaments, ministers, funders/sponsors, management boards
- Focus: impacts, outcomes, achievement of targets, value for money
- Main evaluation approaches: indicators, performance measures, value-for-money studies, quantitative surveys
- Key questions: What have been the results? Are they intended or unintended? Are resources well used?

Development for programme improvement
- Stakeholders: project coordinators, partner organisations, programme managers
- Focus: identifying constraints and how they should be overcome; delivery and implementation strategies
- Main evaluation approaches: relating inputs to outputs, qualitative description, following processes over time
- Key questions: How well is the programme being managed? Can it be implemented better?

Knowledge production and explanation
- Stakeholders: programme planners, policy-makers, academics
- Focus: dissemination of good practice; what works; organisational change
- Main evaluation approaches: experimental and quasi-experimental studies, case studies, systematic reviews and syntheses
- Key questions: What is being learnt? Are there lessons that can be applied elsewhere? How would we do it next time?

Social improvement and social change
- Stakeholders: programme beneficiaries and civil society
- Focus: ensuring full involvement, influence and control by citizens and affected groups
- Main evaluation approaches: stakeholder involvement, participative reviews, advocacy
- Key questions: What is the best way to involve affected groups? How can equal opportunities and social inclusion be ensured?

Evaluation is often seen as an instrument for developing social consensus and strengthening social cohesion. The expectation is that ownership and commitment by citizens to public policy priorities will be maximised when they have also been involved in setting these priorities and evaluating the outcomes of interventions. There is a strong managerial logic within large-scale decentralised programmes to use evaluation as an instrument to strengthen programme management by diffusing the culture of evaluation among all programme participants.

This also focuses attention more generally on who undertakes evaluations and where evaluation is located. Among the types outlined above, the assumption is that some outside expert occupies the evaluator role. Already within the participative subtype just referred to, the role of the evaluator is far less prominent. The role of the evaluator as orchestrator, facilitator and enabler is further elaborated in the discussion of constructivist evaluation in a later section of this chapter. However, even within other types of evaluation there are different possible operationalisations and locations of the evaluation role. One important variant is devolved evaluation.

It is becoming increasingly common for evaluation to become a devolved obligation for programme beneficiaries. In the European Structural Funds, requirements for ex-ante and mid-term evaluations are now explicitly the responsibility of Member States and monitoring committees. The same is true of international development aid within the CEC, where project evaluation is consistently devolved to beneficiaries. Often, those who evaluate on such a self-help basis are required to undertake the evaluations and must demonstrate that they incorporate and use findings. In fact, the devolution chain is far more extended. In the European Structural Funds, monitoring committees will often require beneficiaries and programme managers to conduct their own evaluations. The same is true for national programmes. In the UK, for example, local evaluations conducted by projects within a programme are the norm. These are variously intended to focus on local concerns, inform local management and generate data that will be useful for accountability purposes.

These intentions are not conflict-free. For example, top-down demands by the EU or by central governments can easily undermine the local focus on local needs (Biott and Cook, 2000).

Nonetheless, the role of devolved evaluation in the management and steering of programmes has been a noticeable trend over the last ten years. By requiring programme participants to clarify their priorities, collect information, interpret findings and reflect on the implications, it is assumed that programme management at a systemic level will be improved.

    2.7. Theory and practice

It has not been the intention to focus on evaluation practice in this chapter. However, it is worth reflecting briefly on how evaluation practice relates to some of the main debates outlined above:
(a) evaluations in the public sector are firmly within accountability and programme management purposes (e.g. Nagarajan and Vanheukelen, 1997);
(b) the notion of goal-free evaluation, which does not start from the objectives of programmes, has never been favoured by public administrations in Europe or elsewhere. Although there is often scope to examine overall impacts and to consider unintentional consequences, the design of most evaluations is firmly anchored around goals and objectives;
(c) there is a trend to take on board evaluation criteria such as relevance, efficiency, effectiveness, impact and sustainability (the now standard World Bank and OECD criteria). In the EU guide referred to above, these are applied as evaluative judgements, in relation to programme objectives and in relation to socioeconomic problems as they affect target populations;
(d) there is sometimes confusion between economic appraisal and evaluation. In most public administrations, judgements have to be made, before new policy initiatives are launched, on whether to proceed or not (e.g. the UK Treasury's Green Book and recent EU guidance on impact assessment). For many economists, this pre-launch appraisal is seen as the same as evaluation. In general it is sensible to confine the term evaluation to what happens once a programme or policy has been decided on;
(e) macro-economic methods in particular are more difficult to apply when the resource input is relatively small. Where policy inputs are large scale and can be isolated from other inputs (e.g. in Objective 1, but not Objective 2 or 3, within EU Structural Funds), they can be more easily applied;
(f) the status of stakeholders has undoubtedly been enhanced in most evaluations in recent years. However, the role of stakeholders is generally as informants rather than as sources of evaluative criteria, let alone as judges of merit and worth. There is considerable scope within decentralised and devolved evaluation systems for participative and constructivist approaches. What is more common is close working with stakeholders to define criteria for evaluation and to contribute to consensus processes;
(g) the boundaries between research and evaluation remain clouded. Many studies commissioned as evaluation contribute to knowledge production and are indistinguishable from research. Various distinctions have been proposed, including the short- rather than long-term nature of evaluation and its mainly instrumental intent. However, few of these distinctions are watertight. For example, many of the elements within this overall study could be defined as research and, arguably, once a research-generated study is deployed for evaluative purposes, its character changes.


3. The object of VET evaluation

What is evaluated is a factor in how evaluation is practised. The object of evaluation is different in different domains (health, transport, education, vocational training, etc.) and this partly shapes what we call evaluation in these various domains. At the same time, there are overarching characteristics of evaluation objects that are similar across domains. In this section, we consider both approaches to the object of evaluation: those that follow from the nature of the domain and those that follow from the characteristics of what is being evaluated.

    3.1. The domain of VET

VET is a broad field that, at a minimum, includes initial vocational training, continuing vocational training, work-based learning and VET systems.

However, this selection understates the scope of VET as an object of evaluation. VET has become more complex and multi-faceted over the years, as can be seen in previous Cedefop reports on vocational training research in Europe. This is mainly because there has been a shift from decontextualised studies of impact to studies that increasingly incorporate context. So VET, even at the level of the firm, is seen as being embedded in other corporate policies and procedures such as marketing, the organisation of production, supervisory and managerial practice, and human resource management. In order to describe, let alone explain, the impact of VET, this broader set of factors needs to be considered. The same is true for policy-level interventions. For example, what are called active labour-market policies, especially for those who are marginalised in the labour market, usually include VET, but this is embedded in a raft of other policies including subsidies to employers, restructuring of benefits, and new screening and matching processes.

This recontextualisation of the objects of evaluation is happening across many fields of evaluative enquiry. Evaluations of health are no longer confined to studies of illness. The new public health incorporates environmental, lifestyle and policy elements alongside data on illness topics such as morbidity and mortality. Similarly, evaluation in education is now more likely to include learning processes, socioeconomic and cultural factors and broader pedagogic understandings alongside studies of classroom behaviour. It is probably more useful to think of evaluation configurations as composites of contingent evaluation objects rather than a single evaluation object.

As we shall see below, methodological developments within evaluation mirror this contextualisation. There is a move away from ceteris paribus assumptions, to focus increasingly on impacts in context. It is likely that the broadening conception of evaluation configurations such as VET is the result of new methods and theories helping redefine the core concept. As is often the case, methods and methodologies interact with core content, which they also help to shape.

Overall, most classes of evaluation object can be found under the umbrella of VET and it is not possible to associate the evaluation and impact of VET with a particular type of evaluation object or configuration. What is clear is that VET, as an object of evaluation, calls on a vast range of disciplinary understandings, levels of analysis and potential areas of impact. The importance of interdisciplinary evaluation efforts is highlighted by this discussion.

The scope of VET itself is further complicated by the different understandings of impact that characterise the field. We have particular studies of the impact of continuing vocational training (CVT) on company performance, on active labour markets, and on individual employment and pay prospects; of pedagogic methods as they influence learning outcomes and competences; of VET system reform affecting training outcomes; and of knowledge and qualifications as an influence on national economic performance.

It is these clusters of interest, the preoccupations of a domain at any given time, that circumscribe the object of evaluation.


It is the sets of objects and understandings, around what has been called configurations, that best describe what distinguishes evaluation of VET from other domains. The impacts of CVT on company performance, the way in which it is possible to improve initial vocational training through changing qualification systems, and how VET affects economic performance and social integration are all examples of what defines the object of evaluation within VET. Such preoccupations also change over time. It is worth adding that such preoccupations are also encapsulated in theoretical form. Topical theories such as social exclusion, human capital, cultural capital and corporate innovation will be widely accepted in the VET domain as in others. Today's theories also help define the evaluation object (see below for a more general discussion of theory in evaluation).

3.2. Overarching characteristics of evaluation objects

Although it is not possible within the scope of this chapter to offer a full typology of evaluation objects, it is worth highlighting the kinds of differences that occur not only in the evaluation of VET but also in many other evaluation domains. There are many ways in which evaluation configurations can be differentiated; for simplicity's sake the following examples concentrate on common dimensions such as similarity or difference, more or less, etc. Of course, there are also much more complex descriptions of evaluation configurations.

There are a number of important dimensions of evaluation configurations, including input characteristics. Most programmes are operationalised through inputs or policy instruments; in VET these include new curricula, new forms of funding for enterprise-based training or new training courses. Such inputs may be standardised across a programme or may be more or less diverse. It is, for example, common for inputs to be carefully tailored to individual, local or enterprise needs. This will have consequences for the sampling and scale of an evaluation. More seriously, it will have implications for the generalisations that can be made on the basis of evaluation findings.

Another dimension is the immediate context. The context or setting within which an input is located can also be relatively standardised or relatively diverse. This might apply at a spatial level (characteristics of the area), or in terms of the context of delivery, or the institutional setting within which programmes are located or policies are expected to have an impact. In VET, the relevant context may be a labour market, a training provider or an enterprise. A highly diversified initiative may be located across different kinds of contexts and, even within a single context, there may be considerable variety. The diversity or standardisation of the immediate context will have many implications, in particular for how policies and programmes are implemented and how much effort needs to be devoted to the evaluation of implementation.

Modes of delivery are also important, since the same input or instrument can be delivered in very different ways. For example, a needs analysis may be undertaken through a local survey as part of the recruitment process of potential trainees, or by a company reanalysing its personnel data. Nowadays it would be more common for programmes to be delivered through partnership arrangements rather than through a single administrative chain. This will often be the case, for example, in VET measures delivered through EU Structural Funds.

Settings need consideration as well, given the embedded and contextualised nature of many evaluation objects and the fact that isolated evaluation objects are increasingly rare. With conceptualisations that incorporate context, evaluation objects have a tendency to become configurations. A classic example of an evaluation object that is presumed to be isolated is classroom-based studies that ignore the overall school context or the socioeconomic characteristics of a catchment area. By contrast, a VET measure that is bundled together with a package of incentives, vocational guidance measures and qualifications will need to be evaluated in this wider context.

A further dimension is the number of stakeholders. In any evaluation, there will be those who have an interest in the evaluation and what is being evaluated. Within decentralised, multi-agency programmes there are often many stakeholders, each with their own evaluation questions and judgement criteria. These might, for example, include regional authorities, training providers, sectoral representatives, social partners and European institutions.


Finally, there is the degree of consensus. Policies and programmes may be contentious and will be supported by a greater or lesser degree of consensus among stakeholders. Numerous stakeholders are often associated with lower levels of consensus. Evaluations which draw a high level of consensus will be able easily to apply agreed criteria. Where there is lower consensus, quite different criteria may need to be applied to evaluation data and more work may need to be done to bring together different interests and perspectives. This shapes not only the methodology but also the work required of evaluators.

While each of these characteristics or dimensions has consequences for the design of an evaluation and how it is organised, they also interact. For example, we can envisage two different scenarios. In the first, a single subsidy is available to employers within firms in the retail sector to provide additional training to young apprentices following a recognised national qualification. In the second, a package of measures locally determined by partnerships of companies, training providers and regional authorities is available to firms, colleges and private training providers, to improve the vocational skills and work preparedness of the young unemployed.

Within the first scenario it would be possible and appropriate to assess success in terms of a limited range of output and outcome measures, and possibly to apply experimental and random assignment techniques as part of the evaluation procedure. Within this scenario there would be limited resources devoted to the evaluation of the processes of implementation. There is also likely to be a limited number of stakeholders involved in the evaluation.

Within the second scenario there will be a need for several different measures of output and outcomes. Comparisons across the programme will be difficult to standardise given the diversity of modes of delivery and types of input or policy instrument. There is also likely to be limited consensus among the many different stakeholders involved in the programme and its implementation. The use of experimental methods (e.g. control groups and before-and-after measures) may be possible in such a configuration. It is also likely that case studies that illustrate the way all the various dimensions come together will be appropriate.


4. Philosophical foundations

4.1. Positivism, observation and theory

Before addressing particular aspects of evaluation theory, it is important to locate the role of theory in evaluation within the broader set of debates within the philosophy of science. The dominant school, much criticised but of continuing influence in the way we understand the world, is logical positivism. Despite being largely discredited in academic circles for some 50 years, this school still holds sway in policy debates. It constitutes the base model around which variants are positioned. With a history that stretches back to Comte, Hume, Locke, Hobbes and Mill, positivism emerged partly as a reaction to metaphysical explanations: that there was an 'essence' of a phenomenon that could be distinguished from its appearance. At the heart of positivism, therefore, is a belief that it is possible to obtain objective knowledge through observation and that such knowledge is verified by statements about the circumstances in which such knowledge is true.

In the field of evaluation, House (1983) has discussed this tradition under the label of objectivism: 'Evaluation information is considered to be scientifically objective. This objectivity is achieved by using objective instruments like tests or questionnaires. Presumably, results produced with these instruments are reproducible. The data are analysed by quantitative techniques which are also objective in the sense that they can be verified by logical inspection regardless of who uses the techniques.' (House, 1983; p. 51).

House goes on to emphasise that part of the objectivist tradition that he calls methodological individualism, in Mill's work in particular. Thus, repeated observation of individual phenomena is the way to identify uniformity within a category of phenomena. This is one important strand in the mainstream of explanations within the social and economic sciences. It is the basis for reductionism: the belief that it is possible to understand the whole by investigating its constituent parts.

'By methodological individualism, I mean whatever methodologically useful doctrine is asserted in the vague claim that social explanations should be ultimately reducible to explanations in terms of people's beliefs, dispositions, and situations. [...] It is a working doctrine of most economists, political scientists, and political historians in North America and Britain.' (Miller, 1991; p. 749).

In this world-view, explanations rest on the aggregation of individual elements and their behaviours and interactions. It is worth noting that this has been described as a doctrine as well as a methodological statement. It underpins many of the survey-based and economic models that are used in evaluation.

There is now widespread agreement that empirical work cannot rely only on observations. There are difficulties in empirically observing the entirety of any phenomenon; all description is partial and incomplete, with important unobservable elements. 'Scientists must be understood as engaged in a metaphysical project whose very rules are irretrievably determined by theoretical conceptions regarding largely unobservable phenomena.' (Boyd, 1991; p. 12). This is even more true for mechanisms, which it is generally recognised can be imputed but not observed. As Boyd goes on to say, it is an important fact, now universally accepted, that many or all of the central methods of science are theory dependent.

This recognition of the theory dependence of all scientific inquiry underpins the now familiar critiques of logical positivism, even though there is considerable difference between the alternatives that the critics of positivism advocate.

The two most familiar critiques of positivism are scientific realism and constructivism.

    4.2. Scientific realism

Scientific realism, while acknowledging the limits of what we can know about phenomena, asserts that theory describes real features of a not fully observable world.


Not all realists are the same, and the European tradition, currently inspired mainly by the work of Pawson (Pawson and Tilley, 1997; Pawson, 2002a and b), can be distinguished in various ways from US realist thinking. For example, some prominent North American realists, commenting on Pawson and Tilley's work, have questioned the extent to which realists need completely to reject experimental and quasi-experimental designs, and suggest that more attention should be paid in the realist project to values. This is especially important if, in addition to explanation, realists are to influence decisions (Julnes et al., 1998). Nonetheless, this chapter draws mainly on the work of Pawson and Tilley to describe the realist position in evaluation.

In some ways realism continues the positivist project: it too seeks explanation and believes in the possibility of accumulating reliable knowledge about the real world, albeit through different methodological spectacles. According to Pawson and Tilley, it seeks to open the black box within programmes or policies to uncover the mechanisms that account for what brings about change. It does so by situating such mechanisms in contexts and attributing to contexts the key to what makes mechanisms work or not work. This is especially important in domains such as VET, where the evaluation objects are varied and drawn from different elements into different configurations in differentiated contexts.

    What we want to resist here is the notion that programs are targeted at subjects and that as a consequence program efficacy is simply a matter of changing the individual subject (Pawson and Tilley, 1997; p. 64).

    Rather than accepting a logic that sees programmes and policies as simple chains of cause and effect, realists regard them as embedded in multilayered (or stratified) social and organisational processes. Evaluators need to focus on underlying mechanisms: those decisions or actions that lead to change, embedded in a broader social reality. However, these mechanisms are not uniform or consistent even within a single programme. Different mechanisms come into play in different contexts, which is why some programmes or policy instruments work in some, but not all, situations.

    Like all those interested in causal inference, realists are interested in making sense of patterns or regularities. These are not seen at the level of some programme-level aggregation, but rather at the underlying level where mechanisms operate. As Pawson and Tilley (1997; p. 71) note: regularity = mechanism + context. Outcomes are the results of mechanisms unleashed by particular programmes. It is the mechanisms that bring about change, and any programme will probably rely on more than one mechanism, not all of which may be evident to programme architects or policy-makers.

    As Pawson and Tilley summarise the logic of realist explanation: The basic task of social inquiry is to explain interesting, puzzling, socially significant regularities (R). Explanation takes the form of positing some underlying mechanism (M) which generates the regularity and thus consists of propositions about how the interplay between structure and agency has constituted the regularity. Within realist investigation there is also investigation of how the workings of such mechanisms are contingent and conditional, and thus only fired in particular local, historical or institutional contexts (C) (Pawson and Tilley, 1997; p. 71).

    Applying this logic to VET, we may note, for example, that subsidies to increase work-based learning and CVT in firms only sometimes lead to greater uptake by the intended beneficiaries. This need not lead to the programme being assessed as ineffective simply because, for example, positive outcomes can be observed in only 30 % of cases. We try rather to understand the mechanisms and contexts which lead to success. Is the context one where the firms showing positive outcomes are in a particular sector, value chain or type of region? Or is it more to do with the skill composition of the firms concerned? Are the mechanisms that work in these contexts effective because a previous investment has been made in work-based learning at firm level, or is it because of the local or regional training infrastructure? Which mechanisms are at play, and in what context:
(a) the competitive instincts of managers (mechanism), who fear that their competitors will benefit (context) unless they also increase their CVT efforts?
(b) the demands of trade unions concerned about the professionalisation and labour-market strength of their members (mechanism), sparked off by their awareness of the availability of subsidies (context)?
(c) the increased effectiveness of the marketing efforts of training providers (mechanism) made possible by the subsidies they have received (context)?

    According to the realists, it is by examining and comparing the mechanisms and contexts in which they operate, in relation to observed outcomes, that it becomes possible to understand success and describe it. For Pawson and Tilley, everything revolves around these CMO (context, mechanism, outcome) configurations.
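    To make the CMO logic concrete, the following minimal sketch records hypothetical observations from the VET subsidy example above and groups outcomes by context-mechanism pair, which is essentially the comparison a realist evaluator would make. It is not taken from Pawson and Tilley: the firms, contexts, mechanisms and outcomes are invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical CMO (context-mechanism-outcome) records for the VET subsidy
# example above. Firms, contexts, mechanisms and outcomes are invented
# for illustration only.
observations = [
    {"firm": "A", "context": "prior investment in work-based learning",
     "mechanism": "competitive instincts of managers", "outcome": "increased CVT"},
    {"firm": "B", "context": "strong regional training infrastructure",
     "mechanism": "marketing by training providers", "outcome": "increased CVT"},
    {"firm": "C", "context": "no prior investment in work-based learning",
     "mechanism": "competitive instincts of managers", "outcome": "no change"},
    {"firm": "D", "context": "prior investment in work-based learning",
     "mechanism": "trade union demands", "outcome": "increased CVT"},
]

# Group outcomes by (context, mechanism) pair: the realist question is not
# "did the programme work on average?" but "which mechanisms fire in which
# contexts, and with what outcomes?".
configurations = defaultdict(list)
for obs in observations:
    configurations[(obs["context"], obs["mechanism"])].append(obs["outcome"])

for (context, mechanism), outcomes in configurations.items():
    print(f"Context: {context} | Mechanism: {mechanism} -> Outcomes: {outcomes}")
```

    The point of such a tabulation is that the unit of comparison is the configuration, not the programme as a whole: the same mechanism appears with different outcomes in different contexts.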

    Policy-makers are then in a position to consider options such as:
(a) focusing the programme more narrowly on beneficiaries that are likely to change because of the mechanisms that work in the contexts they inhabit;
(b) differentiating a programme and its instruments more clearly to ensure that different mechanisms that work in different contexts are adequately covered;
(c) seeking to influence the contexts within which the programme aims to be effective.

    The table below, taken from the concluding chapter of Pawson and Tilley's book, provides a brief summary of the realist position in terms of eight rules that are seen as encapsulating the key ideas of realistic enquiry and method.

    Table 3: Rules guiding realistic enquiry and method

    Rule 1: Generative causation. Evaluators need to attend to how and why social programmes have the potential to cause change.

    Rule 2: Ontological depth. Evaluators need to penetrate beneath the surface of observable inputs and outputs of a programme.

    Rule 3: Mechanisms. Evaluators need to focus on how the causal mechanisms which generate social and behavioural problems are removed or countered through the alternative causal mechanisms introduced in a social programme.

    Rule 4: Contexts. Evaluators need to understand the contexts within which problem mechanisms are activated and in which programme mechanisms can be successfully fired.

    Rule 5: Outcomes. Evaluators need to understand what the outcomes of an initiative are and how they are produced.

    Rule 6: CMO configurations. In order to develop transferable and cumulative lessons from research, evaluators need to orient their thinking to context-mechanism-outcome pattern configurations (CMO configurations).

    Rule 7: Teacher-learner processes. In order to construct and test context-mechanism-outcome pattern explanations, evaluators need to engage in a teacher-learner relationship with program policy-makers, practitioners and participants.

    Rule 8: Open systems. Evaluators need to acknowledge that programmes are implemented in a changing and permeable social world, and that programme effectiveness may thus be subverted or enhanced through the unanticipated intrusion of new contexts and new causal powers.

    Source: Adapted from Pawson and Tilley (1997)

    4.3. Constructivists

    Constructivists deny the possibility of objective knowledge about the world. They follow more in the tradition of Kant and other continental European philosophers than the mainly Anglo-Saxon school that underpins positivism and realism. It is only through the theorisations of the observer that the world can be understood.

    Socially constructed causal and metaphysical phenomena are, according to the constructivist, real. They are as real as anything scientists can study ever gets. The impression that there is some sort of socially unconstructed reality that is somehow deeper than the socially constructed variety rests, the constructivist maintains, on a failure to appreciate the theory-dependence of all our methods. The only sort of reality any of our methods are good for studying is a theory-dependent reality (Boyd, 1991; p. 13).

    The way we know, whatever the instruments and methods we use, is constructed by human actors or stakeholders. According to Stufflebeam, in his review of foundation models for 21st century program evaluation: Constructivism rejects the existence of any ultimate reality and employs a subjectivist epistemology. It sees knowledge gained as one or more human constructions, uncertifiable, and constantly problematic and changing. It places the evaluators and program stakeholders at the centre of the inquiry process, employing all of them as the evaluation's human instruments. The approach insists that evaluators be totally ethical in respecting and advocating for all the participants, especially the disenfranchised (Stufflebeam, 2000a; pp. 71-72).

    The most articulate advocates of constructivism in evaluation are Guba and Lincoln. They have mapped out the main differences between constructivists and the conventional position (as they label positivists) in their well-known text Fourth generation evaluation (Guba and Lincoln, 1989). The highlights of this comparison are summarised in the table below:


    Table 4: Comparing constructivist and conventional evaluation

    Nature of truth
    Conventional: The truth of any proposition (its factual quality) can be determined by testing it empirically in the natural world. Any proposition that has withstood such a test is true; such truth is absolute.
    Constructivist: The truth of any proposition (its credibility) can be determined by submitting it semiotically to the judgement of a group of informed and sophisticated holders of what may be different constructions. Any proposition that has achieved consensus through such a test is regarded as true until reconstructed in the light of more information or increased sophistication; any truth is relative.

    Limits of truth
    Conventional: A proposition that has not been tested empirically cannot be known to be true. Likewise, a proposition incapable of empirical test can never be confirmed to be true.
    Constructivist: A proposition is neither tested nor untested. It can only be known to be true (credible) in relation to and in terms of informed and sophisticated constructions.

    Measurability
    Conventional: Whatever exists, exists in some measurable amount. If it cannot be measured, it does not exist.
    Constructivist: Constructions exist only in the minds of constructors and typically cannot be divided into measurable entities. If something can be measured, the measurement may fit into some constructions but it is likely, at best, to play a supportive role.

    Independence of facts and theories
    Conventional: Facts are aspects of the natural world that do not depend on theories that happen to guide any given inquiry. Observational and theoretical languages are independent.
    Constructivist: Facts are always theory-laden, that is, they have no independent meaning except within some theoretical framework. There can be no separate observational and theoretical languages.

    Independence of facts and values
    Conventional: Facts and values are independent. Facts can be uncovered and arrayed independently of the values that may later be brought to bear to interpret or give meaning to them. There are separate factual and valuational languages, the former describing "isness" and the latter "oughtness".
    Constructivist: Facts and values are interdependent. Facts have no meaning except within some value framework; they are value-laden. There can be no separate observational and valuational languages.

    Source: adapted from Guba and Lincoln, 1989

    According to Guba and Lincoln, when considering the purpose of evaluations, one needs to distinguish both between merit and worth and between summative and formative intent:
(a) a formative merit evaluation is one concerned with assessing the intrinsic value of some evaluand with the intent of improving it; so, for example, a proposed new curriculum could be assessed for modernity, integrity, continuity, sequence, and so on, for the sake of discovering ways in which those characteristics might be improved;
(b) a formative worth evaluation is one concerned with assessing the extrinsic value of some evaluand with the intent of improving it; so, for example, a proposed new curriculum could be assessed for the extent to which desired outcomes are produced in some actual context of application, for the sake of discovering ways in which its performance might be improved;
(c) a summative merit evaluation is one concerned with assessing the intrinsic value of some evaluand with the intent of determining whether it meets some minimal (or normative or optimal) standard for modernity, integrity, and so on. A positive evaluation results in the evaluand being warranted as meeting its internal design specifications;
(d) a summative worth evaluation is one concerned with assessing the extrinsic value of some evaluand for use in some actual context of application. A positive evaluation results in the evaluand being warranted for use in that context (Guba and Lincoln, 1989; pp. 189-190).

    In practical terms, regarding what the evaluator should do, Guba and Lincoln start from the claims, concerns and issues that are identified by stakeholders: people who are put at some risk by the evaluation. It is therefore necessary for evaluators to be responsive. One of the major tasks for the evaluator is to conduct the evaluation in such a way that each group must confront and deal with the constructions of all others, a process we shall refer to as the hermeneutic dialectic. [...] Ideally responsive evaluation seeks to reach consensus on all claims, concerns and issues [...] (Guba and Lincoln, 1989; p. 41).

    A distinctive role of the evaluator, therefore, is to help put together hermeneutic circles. This is defined by Guba and Lincoln as a process that brings together divergent views and seeks to interpret and synthesise them, mainly to allow their mutual exploration by all parties (Guba and Lincoln, 1989; p. 149). As Schwandt has argued from a postmodernist standpoint, only through situated use in discursive practices or language games do human actions acquire meaning (Schwandt, 1997; p. 69). Applied to evaluation, this position argues for the importance of the dialogic encounter, in which evaluators are becoming partners in an ethically informed, reasoned conversation about essentially contested concepts [...] (Schwandt, 1997; p. 79).

    In more down-to-earth terms, Guba and Lincoln emphasise the role of the evaluator to:
(a) prioritise those unresolved claims, concerns and issues of stakeholders that have survived earlier rounds of dialogue orchestrated by the evaluator;
(b) collect information through a variety of means: collating the results of other evaluations, reanalysing the information previously generated in dialogue among stakeholders, and conducting further studies that may lead to the reconstruction of understandings among stakeholders;
(c) prepare and carry out negotiations that, as far as possible and within the resources available, resolve that which can be resolved and (possibly) identify new issues that the stakeholders wish to take further in another evaluation round.

    So how might this be exemplified in the VET domain? It should be noted that what follows does not fully conform to Guba and Lincoln's vision of constructivist evaluation, largely because it is situated in a larger-scale socioeconomic policy context than many of their own smaller-scale case examples. It should also be noted that constructivist thinking is, to some extent, relevant to many contemporary evaluation challenges, and the example below is intended to illustrate such potential relevance.

    So, to apply this logic to VET, constructivist thinking can be especially helpful where there is a problem area with many stakeholders and the entire system will only be able to progress if there is a broad consensus. For example, there may be a political desire to become more inclusive and involve previously marginalised groups in training opportunities. The problem is how to ensure that certain groups, such as women, young people and ethnic communities, are given a higher profile in VET. Here, the involvement of many stakeholders will be inevitable. Furthermore, the views of these stakeholders are more than data for the evaluator: they are the determinants and shapers of possible action and change. Unless the trainers, employers, advocacy groups, funding authorities, employment services responsible for job-matching and the groups being targeted cooperate, change will not occur. It is also likely that these stakeholders hold vital information and insights into the past experience of similar efforts: what went wrong and right, and what could be done to bring about improvements in the future.

    The evaluator might then follow much of the constructivist logic outlined above:
(a) identify the different stakeholders who potentially have a stake in these areas of concern;
(b) conduct a series of initial discussions to clarify what they know, what they want and what their interests are;
(c) feed back to all stakeholders their own and each other's interests, knowledge and concerns in a way that emphasises the similarities and differences;
(d) clarify areas of agreement and disagreement and initiate discussions among the stakeholders and their representatives to clarify areas of consensus and continuing dissent;
(e) agree what other sources of information could help move the stakeholders forward, perhaps by synthesising other available studies, perhaps by initiating new ones;
(f) reach the best possible consensus about what should be done to improve VET provision and participation for the groups concerned.

    It is worth highlighting that the balance of activities within constructivist evaluation is very different from that in both the positivist and realist variants. It emphasises the responsive, interactive, dialogic and orchestrating role of the evaluator, because the sources of data that are privileged are seen to reside with stakeholders as much as with new studies and externally generated data.


    5. Evaluation theory

    As evaluation has evolved, there has been a strong bias towards a focus on method, technique and, to a lesser extent, methodology. It is only in recent years that there has been an upsurge of interest in the role of theory in evaluation. To some extent, this reflects the wider debates within the philosophy of science that have been sketched out above. From the early 1990s onwards there has been a re-balancing of attention towards theory. Chen's book Theory-driven evaluations (1990) has become a landmark in this shift in focus towards theory. The now classic text Foundations of program evaluation (Shadish et al., 1991) is organised around five main bodies of theory: social programming, knowledge construction, valuing, knowledge use and evaluation practice.

    As has been widely recognised, Weiss was among the first to direct our attention to the importance of theory (Weiss, 1972) and has actively carried this debate forward under the umbrella of the Aspen Institute's New approaches to evaluating community initiatives (Connell et al., 1995; Fulbright-Anderson et al., 1998). While the starting point of the discussion that follows is these authors, it still needs to be situated in the broader philosophical debates outlined earlier.

    5.1. Programme theory

    The dominant school of theory in evaluation is programme theory. This is concerned with opening up the programme black box, going beyond input/output descriptions and seeking to understand how programmes do and do not work.

    Chen's conceptualisation distinguishes normative and causative components of programme theory, which he defines as a specification of what must be done to achieve the desired goals, what other important impacts may also be anticipated, and how these goals and impacts would be generated (Chen, 1990; p. 43).

    Chen's conceptualisation extends to what he identifies as six domains.

    The following three domain theories are part of the general normative theory: (1) treatment theory specifies what the nature of the program treatment should be; (2) implementation environment theory specifies the nature of the contextual environment within which the program should be implemented; (3) outcomes theory specifies what the nature of the program outcomes should be.

    The following three domain theories are related to the general causative theory: (1) impact theory specifies the causal effect between the treatment and the outcome; (2) intervening mechanism theory specifies how the underlying intervening processes operate; (3) generalization theory specifies the generalizability of evaluation results to the topics or circumstances of interest to stakeholders (Chen, 1990; pp. 49 and 51).

    Although avowedly seeking to escape from the limitations of input/output thinking, Chen's conceptualisation is still linear. His domains follow the treatment/implementation/outcome logic and incorporate concepts such as intervening mechanisms: the causal processes underlying a program, so that the reasons a programme does or does not work can be understood (Chen, 1990; p. 191).

    The underlying logic of causality favoured by Chen is essentially consistent with classic experimental thinking. For example, with regard to a programme that used comic books to influence adolescent smoking: The underlying causal mechanism of this program is the assumption that the comic book will attract adolescents' interest and attention and that they will read it closely and frequently, and thereby pick up the important anti-smoking message contained in it. The message will in turn change their attitudes, beliefs, and behaviour regarding smoking. The causal structure of this program is that the program treatment variable (exposure to the comic book) attempts to affect the intervening variable (the intensity of reading), which in turn will affect the outcome variables (attitudes, beliefs, and behaviour toward smoking) (Chen, 1990, p. 193).
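    The linearity of this causal structure can be made explicit in a minimal sketch. The functions and coefficients below are not taken from Chen or from any study; they are invented assumptions whose only purpose is to show a chain in which the treatment feeds an intervening variable, which feeds the outcome, with no place for context or for differential mechanisms.

```python
# A deliberately linear sketch of the causal structure described above:
# treatment (exposure to the comic book) -> intervening variable (intensity
# of reading) -> outcome (change in attitudes and behaviour).
# Functional forms and numbers are illustrative assumptions only.

def reading_intensity(exposure: float) -> float:
    """Intervening variable: how intensively the comic book is read."""
    return 0.8 * exposure

def attitude_change(intensity: float) -> float:
    """Outcome variable: shift in anti-smoking attitudes and behaviour."""
    return 0.5 * intensity

exposure = 1.0  # the adolescent receives and looks at the comic book
print(attitude_change(reading_intensity(exposure)))  # 0.4 in this toy chain
```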

    In comparison to realist evaluation approaches (see above), there continues to be an emphasis on the programme's interventions rather than on the mechanisms operating within their contexts. For example, if we try to answer the question "Why are some adolescents more likely to be influenced by such exposure?", answers do not fall out easily from traditional programme theory logic. It is these underlying mechanisms that are not explained in this framework. Nor are the contexts within which these interventions occur explored in detail.

    5.2. Theory based evaluation and theories of change

    The other main strand of theory in evaluation is labelled theory based evaluation, and latterly theory of change, and is associated with the Aspen round table on comprehensive community initiatives. In Volume 1 of the Aspen collection, Weiss identifies four main rationales for theory based evaluation: (1) it concentrates evaluation attention and resources on key aspects of the program; (2) it facilitates aggregation of evaluation results into a broader base of theoretical and program knowledge; (3) it asks program practitioners to make their assumptions explicit and to reach consensus with their colleagues about what they are trying to do and why; (4) evaluations that address the theoretical assumptions embedded in programs may have more influence on both policy and popular opinion (Weiss, 1995; p. 69).

    This is also, in essence, a programme theory approach. In a subsequent volume, Connell and Kubisch define this evaluation approach as a systematic and cumulative study of the links between activities, outcomes, and contexts of the initiative (Fulbright-Anderson et al., 1998; p. 16). They emphasise collaborative working with stakeholders to bring underlying mechanisms to the surface.

    Weiss herself takes these ideas of joint working further. Three big advantages of pursuing a theory of change evaluation are as follows:
(a) a theory of change evaluation allows evaluators to give early word of events without having to wait until the end of the whole program sequence;
(b) the evaluators can identify which assumptions are working out and which are not. They can pinpoint where in the theory the assumptions break down. This should enable the program to take corrective action before too much time goes by;
(c) the results of a theory of change evaluation can be more readily generalized across programs. Seeing the successes and the failures between closely linked assumptions, such as between greater parental attention to children and better child behaviour, is easier than between, say, parenting education programs and better child behaviour (Weiss, 2000).

    By focusing on the assumptions of programme practitioners and aspiring to encourage consensus among them, this vision of evaluation theory shares many features with the participative and even constructivist schools of evaluation. As described by some of its main proponents, the theory of change evaluation approach, as it has come to be called, is also a collaborative, dialogic process. It takes the themes of practitioners, makes them coherent, explicit and testable, and then seeks to measure and describe programme outcomes in these terms.

    Overall, over the last seven or eight years in particular, there has been a gradual blurring of what is meant by programme theory. The term now seems to encompass several approaches that unpick the logic of programmes, make explicit their assumptions, work with stakeholders, monitor progress and explain the outcomes that are observed (see Rogers, 2000, as an example of this tendency). As such, it has come to include what is now classic programme theory along with theory based evaluations and realistic evaluations such as those advocated by Pawson and Tilley (Rogers, 2000; p. 219).

    The programme theory approach has also been taken on board by those who advocate logic models or logical frameworks that link outcomes with programme activities and processes. Thus, a recent W.K. Kellogg Foundation guide (2000) also makes the link with the theoretical assumptions/principles of the programme and devotes an entire chapter to developing a theory of change logic model for your programme. This is an important development given the power of logic models in the world of evaluation. These were initiated by the World Bank and have also been taken up by the EU. One of the main criticisms of these models is their lack of explanatory power and their a-theoretical nature. Bringing theory-based approaches into the logic model framework begins to address some of these criticisms.
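    As a purely illustrative sketch (the stages, assumptions and the data structure below are our own invention, not the Kellogg Foundation's template), a theory of change logic model can be thought of as a chain of links in which each link carries the theoretical assumption that makes it plausible:

```python
from dataclasses import dataclass

@dataclass
class Link:
    """One step in a logic model, annotated with the assumption behind it."""
    from_stage: str
    to_stage: str
    assumption: str  # the theoretical claim that makes this link plausible

# A hypothetical VET training subsidy, written as a chain of annotated links.
logic_model = [
    Link("inputs: training subsidy", "activities: CVT courses delivered",
         "firms spend the subsidy on training rather than substituting it for existing budgets"),
    Link("activities: CVT courses delivered", "outputs: employees trained",
         "course content matches the skill gaps identified by employers"),
    Link("outputs: employees trained", "outcomes: improved firm performance",
         "new skills are actually used and retained in the workplace"),
]

for link in logic_model:
    print(f"{link.from_stage} -> {link.to_stage}")
    print(f"  assumption to test: {link.assumption}")
```

    An evaluation can then interrogate each assumption separately, which is precisely the explanatory element that critics find missing from bare logic models.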

    5.3. A wider theoretical frame

    However, we would not wish to confine descriptions of theory solely to programme theory and associated elaborations. The Shadish et al. (1991) framework includes a theory of social programming. Their focus, however, is more on a theory of what social programmes do and how effective they are. This is consistent with their overall approach, which is to elaborate the theoretical basis for evaluation practice. Other theoretical focuses for Shadish et al. concern:
(a) the theory of use, i.e. what is known about how to encourage use;
(b) the theory of valuing, i.e. about judging outcomes and the role of values and stakeholder interests in such judgements;
(c) the theory of knowledge in evaluation, i.e. the familiar questions of what constitutes knowledge, explanation and valid method;
(d) the theory of practice in evaluation, i.e. the main decisions about resource allocation, the choice of methods, questions to ask and evaluation purposes.

    Beyond the various focuses identified so far (that is, programme theory and its various elaborations, and theories of evaluation itself, as articulated by Shadish et al.), there are a number of other bodies of theory that are undoubtedly relevant and that come into the discourse of evaluators. In particular, there are domain theories and theories of implementation and change.

    In every policy domain or field where evaluation occurs, there are bodies of theory unrelated to the practice of evaluation and to the logic of programmes. Thus, in social welfare, there are theories related to the welfare state, the nature of social solidarity, the behavioural consequences of different benefit regimes and the interactions between social welfare and labour-market performance. Similar bodies of theory exist in relation to VET, as they do in other domains such as research and development, regional planning, education and criminal justice. In the European context, at least, there seems to be an expectation that evaluators will have some knowledge of the domain contexts within which they work, including relevant domain theories.

    There are also substantial bodies of relevant theory about policy change and implementation, deriving mainly from political science and policy studies. This extends beyond Chen's description of implementation environment evaluation (Chen, 1990), for example, and is well exemplified by the work of people such as Sabatier (1988) and Sabatier and Jenkins-Smith (1993). There is also a more generic literature on implementation and change, often encapsulated under the heading of the diffusion of innovation and following on from the work of Rogers (1995). This is particularly relevant to the issue of the generalisability of innovations that are first broached on a pilot basis.

    In summary, five bodies of theory appear to be relevant to evaluators:
(a) theories of evaluation. These would include programme theory, theories of change approaches and realist approaches which emphasise the identification of mechanisms underlying successful change, mechanisms which have to be understood in specific contexts and settings;
(b) theories about evaluation. Thus, there is a growing literature on evaluation practice, use, design and capacity. Included in this category would be particular aspects of practice identified by Shadish et al., such as theories of valuing;
(c) theories of knowledge, including the main debates about the nature of knowledge, epistemology, methodology, etc., and about the nature of causal inference;
(d) domain and thematic theories, which could be described as theory of the evaluation object. This would include bodies of theory about domains such as human resource development, skill acquisition, the development of human capital and equal opportunities that could inform evaluation design, programme/policy implementation and outcomes;
(e) theories of implementation and change, often seen as relevant by evaluators. We would include here understandings of policy change, the diffusion of innovation and administrative behaviour. Such bodies of theory are likely to condition the success of programme interventions and can be quite separate from the kind of programme theories referred to above.

    Finally, it is worth restating the main reasons that theory is seen as important in evaluation:
(a) theory can help support interpretations. This follows from the widespread recognition that all evaluations are based on data that can be interpreted in different ways. Theory provides an explicit framework for such interpretation;
(b) theory can help fill in the gaps in incomplete data. This follows from the recognition that, however thorough evaluators may be, they will never have the complete picture. Theory can provide a plausible way of filling gaps in available evaluation data;
(c) theory can provide a framework for working with stakeholders. This follows from the increasingly common practice of dialogue and collaboration between evaluators, presumed beneficiaries and others who are affected by programmes and policy instruments;
(d) theory can help prediction and explanation. This is the classic scientific role of theory: to suggest and explain causal links and likely outcomes;
(e) theory can make explicit the constructed objects of evaluation. Many contemporary objects of evaluation are constructed: they are abstracted ideas which do not have a direct empirical referent (efficiency, a learning organisation and an enterprise culture would all be examples of constructed objects). In order to describe and measure these objects, theory is needed.

    At a different level, we are also beginning to see theoretical development around the issue of complexity in socioeconomic programmes (Sanderson, 2000). Many interventions are not self-contained; they interact with other programmes and with other social and organisational processes. Thus, in VET, a new training system is embedded in an institutional and educational context which both supports and constrains it. Similarly, a training initiative at the level of a