
Evaluation Research in Education

Originally prepared by Professor Harold Silver.

Component now led by Dr. Nick Pratt.

© H Silver, Faculty of Education, University of Plymouth, 2004

(links reinstated August 2006)

CONTENTS

1 Questions

2 What is evaluation?

Definitions

Types of evaluation

o Process and impact

o Formative and summative

3 Research?

Different or the same?

4 Methods

5 Internal evaluation

6 The evaluator

Examples

7 Advice

8 References and further reading

9 Tasks

1 Questions

The questions to be addressed are interrelated and can be summarised as:

What is evaluation?

Is it research?

Why evaluate?

How and by whom?

Evaluation has become a widespread activity, internationally, under that name since the 1960s, in a variety of contexts. Although we are focusing on education here, it is important to remember that evaluation models have been developed elsewhere, notably in the social sciences. It has been used to test the effectiveness of, for example, national and international programmes in agriculture or crime prevention, health improvement or transport policy. A postgraduate course in applied social science at Manchester University was introduced as follows:

Increasingly, social service providers, programme administrators and legislators use evaluation research in order to consider the ‘effectiveness’ of new and existing programmes, procedures and/or interventions at producing some form of ‘outcome’ or ‘change’. The findings from evaluations focus on the strengths and weaknesses of various aspects of innovations as well as of their overall ‘outcome’. This information is, in turn, used to consider how such interventions might be modified, enhanced or even eliminated in the effort to provide a better service, fulfil a particular need or meet a specific challenge.

In education, evaluation has served a somewhat similar purpose, and has been applied to major programmes of ‘whole-school reform’, to specific curriculum changes, and to more limited projects that try out innovations. There is a vast literature on different types and purposes of evaluation, and we shall sample some of it here as we address the priority questions. Some of the discussion overlaps with issues discussed in other RESINED components – such as action research, qualitative research and interviewing – and the links will be highlighted as they occur.

2 What is evaluation?

Definitions

At the lowest level evaluation is a regular social activity, such as that conducted by Which? magazine and other publications, and by ourselves. It makes comparisons amongst products or services, with a view to making a selection – a kitchen utensil or an investment, a car or a Chardonnay. At this level evaluation is comparative on the basis of relatively straightforward criteria and available information, and is a preliminary to decision-making. The criteria, of course, are not the same for everyone evaluating – comfort may or may not override the cost or style of a car, and labelling may or may not influence choice of a wine. In education the purposes and the criteria are inevitably more complex and evaluation is a process of acquiring information. Evaluation of an innovation or an activity, a curriculum or organisational change, raises a series of sometimes difficult or contentious issues. Who is sponsoring the evaluation, what do they want to know, and why do they want to know it? What depends on the outcomes – more or less finance, promotion or redundancy? What is the salient issue for the evaluation – change in student learning, staff development, value for money, position in a league table…? Whose opinion counts most – students’ feedback in the university, the teachers’ perceptions in the school, project managers, administrators?

Evaluation in education therefore encompasses competing criteria and purposes, and is situated in potentially sensitive political and ethical contexts.

 

If you will be undertaking a 'task' at the end of this component you may find it helpful to make some notes as you go along. At this point you could make a preliminary list of problems you think might be encountered in evaluating a new initiative in your own institution.

It is important to note that ‘evaluation research’ (a concept discussed below) is basically what is commonly called programme or project evaluation. Such evaluation (in its various forms) has similar features at all levels of education, concerns innovations, initiatives and developments of many kinds, and is mostly conducted by individual evaluators or small teams. There are, however, other forms of evaluation that are not included in the discussion here. These include, commonly in higher education, the evaluation of teaching quality, of research, or of institutions, as part of a system approach to quality assurance conducted by national agencies. Teaching quality and institutional evaluation may also be conducted internally as a form of ‘self-evaluation’ (eg Ellington and Ross, ‘Evaluating teaching quality throughout a university’ [Robert Gordon University], and Adelman and Alexander, The Self-Evaluating Institution).

Definitions of ‘evaluation’ can indicate the intentions involved, but are elusive as complete explanations. The kind of definition that was often used in the 1950s and 1960s, notably in the United States, was:

Evaluation is the systematic assessment of the worth or merit of some object.

The judgmental tenor of that definition in fact reflects the evaluation of cars or Chardonnays – assessing their worth or merit in order to choose – though it does not reflect the casual nature of personal judgments, which are often unsystematic. Subsequent attempts to define evaluation have adapted this formulation. Trochim, in the United States, for example, suggests:

Evaluation is the systematic acquisition and assessment of information to provide useful feedback about some object.

He explains the older and the revised versions, which both agree that evaluation is ‘systematic’ and both use ‘object’ to refer to a programme, policy, technology, person, need, activity and so on. The revised definition, however, ‘emphasizes acquiring and assessing information rather than assessing worth or merit because all evaluation work involves collecting and sifting through data, making judgements about the validity of the information and of inferences we derive from it, whether or not an assessment of worth or merit results’ (Trochim, website). Whether evaluation makes judgments or is a preliminary to other people making judgments is a contentious issue in the field (and is discussed further below). The former definition, assessing worth or merit, inescapably involves acquiring and assessing information, but the revised version does focus on the information. It suggests that assessing worth depends on an analytical approach to information, that is, on an understanding of the ‘object’ about which feedback is required.

Another, this time British, attempt at revising the first definition was in connection with the evaluation of educational institutions. It defined such evaluation as involving:

the making of judgements about the worth and effectiveness of educational intentions, processes and outcomes; about the relationships between these; and about the resource, planning and implementation frameworks for such ventures. (Adelman and Alexander 1982, p. 5)

While retaining the notion of making judgments about worth, there are two important extensions in this version. First, the ‘object’ of study has acquired intentions, processes and outcomes; it is a complex sequence in which the parts have relationships, and it is therefore clear that evaluation is concerned in some way with that sequence. Second, this sequence is not isolated. It is in a framework which has to do with resources, planning and implementation. Evaluation therefore understands the sequence only by also taking account of the ‘framework’ in which the sequence takes place. The curriculum is in a classroom with its relationships, in a school, and in a complex and interactive context involving families and communities, authorities and the various levels of policy making - all of which affect what is taught and learned. Further education colleges and universities have their own departmental, disciplinary, institutional and other contexts that may have to be taken into account when a project or initiative is evaluated.

 

In considering evaluation in your institution, are there possible major issues concerning relationships in the context of management, the whole institution, outside constituencies and agencies…?

If you were to conduct an external evaluation in an institution other than your own, how different might the issues be from the ones you have considered above?

Types of evaluation

With these preliminary considerations in mind, it would be helpful to look carefully at the following and make some tentative choices regarding the role or roles that may seem most appropriate in your evaluation of the project, programme, innovation or other initiative (for simplicity’s sake we will encompass all of these from now on in the term ‘project’). The evaluator’s role is to be:

as objective as possible (interviewing, questioning, reporting on findings, not being too close to the participants) and to report to the person or body for whom the evaluation is conducted;

to collect data rigorously and scientifically;

to feed back impressions to participants (so that they can take note of your findings and improve their activities);

to understand and describe the project and make judgments;

to be involved with the project from the outset, working with the project participants to plan their programme and the evaluation together;

to define the nature and methodology of the evaluation professionally, to begin work when the project is operational and monitor it at agreed intervals and at the end;

to monitor the ‘process’, that is, the implementation of the initial terms of reference or objectives of the project;

to focus on the ‘life’ of the project in its relevant wider contexts;

to investigate the ‘outcomes’, successful or unsuccessful, of the project;

to judge whether the project has been (or is likely to be) value for money;

to conduct an external evaluation and nothing more;

to help participants to conduct an internal evaluation, in addition to the formal external one, or as a substitute for it;

Or…

It will be clear from the choices available that evaluation is far from being a simple or standard activity. The choices are neither right nor wrong, but may be more appropriate to particular programmes, conditions and requirements, and to the self-image of the evaluator. Evaluators and evaluation theorists have extensively explored the alternatives and these have been the focus of various kinds of controversy. To compare your own preferences or issues with some of those in the literature in terms of types of evaluation click here. We cannot here consider all of these alternative approaches, but it is important to emphasise two that are frequently met in the evaluation literature.

Process and impact

The purposes of evaluation can be encapsulated in these two terms, the former to highlight what is and has been happening, the latter to attempt to indicate what has happened as a result. Both encounter difficulties.

Process evaluation targets implementation: how the programme’s intentions are being interpreted, the experience of conducting the activity, and the continuing or changing perceptions of the various constituencies involved. The kinds of questions such evaluation raises may include conflicts in these perceptions for reasons not necessarily connected with the activity itself, confusion about the original terms of reference, or doubts about their wisdom. The larger the programme, the more difficult the questions of sampling (how many people to interview and how to select them, what activities to attend…) and of when it is reasonable to monitor what is taking place. For an external evaluator there may be problems of time allocation and frequency of involvement, depending on the nature and extent of the programme (multi-site, national…), though even with a small, single-institution activity, initial decisions about the extent of the external evaluator’s involvement may cause problems. Often called ‘implementation evaluation’, this approach commonly runs into difficulties in collecting reliable information on how successfully the implementation is taking place.

Impact (or ‘outcomes’, or sometimes ‘product’) evaluation raises some of these issues, but also different ones. Would the ‘outcomes’ of the programme have happened without the intervention, and is there a credible causal link between the activity and the impact? Answers to the question of what impact has taken place may be positive, negative or mixed: an evaluation may record non-success, find evidence of no impact, or uncover complexities arising from other factors – for example, other interventions, processes and contexts. Impact may cover time scales that vary considerably from programme to programme (eg a limited research/development programme in a school or university, or a World Bank project covering a nation or region). Impact may be studied not only at the conclusion of an activity (or its funding) or after an interval of time, but also during the activity – especially if it is designed to provide regular feedback or if it is a longitudinal study. Evaluations of the American Head Start and similar programmes, for example, involved the evaluation of learning gains and other measures in a variety of ways at intervals over very long periods. It is common for evaluators of limited-time projects to feel (and suggest) that the real impact evaluation could only take place several years after the end of the programme. Depending on the project, impact evaluation may have policy or decision-making implications:

An impact evaluation assesses the changes in individuals’ well-being that can be attributed to a particular program or policy. It is aimed at providing feedback and helping improve the effectiveness of programs and policies. Impact evaluations are decision-making tools for policymakers and make it possible for programs to be accountable to the public. (World Bank, website)

Such a role for the evaluator raises questions, discussed below, of the kind of contract agreed at the beginning of the evaluation, and the possible influence of the audiences for the reporting procedure at the end. There are issues about the tentative or reliable nature of impact data, which may differ considerably by type of project. Since a funding agency may require impact data and an evaluator may find such data unattainable, there is room for misunderstanding and conflict.
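The counterfactual question raised above – would the outcomes have happened without the intervention? – is often illustrated by comparing the change in a project group with the change in a comparison group over the same period. The short Python sketch below is a purely hypothetical illustration of that arithmetic (a simple difference of changes); the scores and group names are invented, and the calculation is not a method prescribed in this component.

```python
# Hypothetical illustration of the counterfactual logic behind impact
# evaluation: the comparison group's change stands in for "what would have
# happened anyway", and the difference between the two changes is the
# estimated impact of the intervention. All figures are invented.

def mean(values):
    return sum(values) / len(values)

project_pre,    project_post    = [52, 48, 61, 55], [63, 58, 70, 66]
comparison_pre, comparison_post = [50, 47, 59, 56], [54, 50, 63, 60]

project_change    = mean(project_post)    - mean(project_pre)
comparison_change = mean(comparison_post) - mean(comparison_pre)
estimated_impact  = project_change - comparison_change

print(f"Change in project group:    {project_change:.1f}")
print(f"Change in comparison group: {comparison_change:.1f}")
print(f"Estimated impact (difference of changes): {estimated_impact:.1f}")
```

Even this toy example shows why impact claims are tentative: the estimate is only as credible as the assumption that the comparison group really does represent what would have happened without the intervention.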

Formative and summative

These may be, but are not necessarily, related to the above.

Hopkins (as we saw above in terms of types of evaluation) made the simple suggestion that formative evaluation was when the cook tasted the soup, and summative when the guest tasted it. He also suggested that the difference was ‘not so much when as why. What is the information for, for further preparation and correction or for savouring and consumption? Both lead to decision making, but toward different decisions’ (Hopkins 1989, p. 16). This latter distinction establishes the difference between these concepts and those relating to process and impact. Formative evaluation is designed to help the project, to confirm its directions, to influence or help to change them. It is more than monitoring or scrutinising, it serves a positive feedback function (which process evaluation does not necessarily do). Summative evaluation is not just something that happens at the end of the project, it summarises the whole process, describes its destination, and though it may have insights into impact, it is not concerned solely with impact.

Summative evaluation has often been associated with the identification of the preset objectives and judgments as to their achievement (again, not necessarily in terms of impact). The assumption in this case is that, unlike in formative modes, evaluation is not (should not be) involved in changing the project in midstream – otherwise the relationship between objectives and their achievement cannot be evaluated:

…every new curriculum, research project, or evaluation program starts with the specifications to be met in terms of content and objectives and then develops instruments, sampling procedures, a research design, and data analysis in terms of these specifications. (Bloom 1978, p. 69)

Starting specifications that are expected or required to be met therefore dictate the nature of the summative evaluation. The instruments or sampling procedures cannot produce ‘pure’ data if the process is corrupted by the intervention of evaluator feedback or other alterations to the original specifications. It is possible to conceive of evaluation as both formative and summative, but in this case ‘summative’ comes closer to meaning ‘final’, and cannot present data and make judgments as purely as is suggested in Bloom’s definition.

Other approaches to evaluation emerged in the last quarter of the 20th century, and some will be mentioned further below in relation to methodology. These have included ‘illuminative’, ‘democratic’ (as opposed to ‘bureaucratic’ evaluation), ‘participative’ and ‘responsive’ evaluation. These all have implications for the role of the evaluator in relation to the project, for example, sharing with the project participants, responding to the activity not to specifications and intentions, identifying and reporting differences of perspective and values, emphasising the importance of understanding or recording competing perceptions. Much of this work relates to discussion in other RESINED components, notably action research and case studies.

You could at this point consult the paper by Parlett and Hamilton on ‘Evaluation as illumination’ in Hamilton et al., Beyond the Numbers Game (quoted in types of evaluation), and other contributions to this influential book.

See also the chapter on ‘Program evaluation: particularly responsive evaluation’ by Robert Stake, in Dockrell and Hamilton, Rethinking Educational Research, and Helen Simons, Getting to Know Schools in a Democracy: the politics and process of evaluation.

3 Research?

We have so far bypassed discussion of the terms ‘evaluation’ and ‘evaluation research’, the difficulties inherent in this vocabulary, and its relationship to other terms sometimes used in connection with evaluation – including ‘applied research’ and ‘academic research’. Jamieson suggests that there are basic differences between the last of these and evaluation research, in the degree of constraint on their purpose and operation, their funding and its implications, and their publishing and reporting:

Evaluation reports and research reports not only have different audiences but their main objectives are different. The goal of the research report is the enhancement of understanding and knowledge via publication to the scientific community. The main goal of the evaluation report is to inform and/or influence decision makers… the relative emphasis of the two activities must be different. (Jamieson 1984, pp. 72-3)

This seeks to establish one kind of distinction, but Jamieson also indiscriminately uses ‘evaluation’ and ‘evaluation research’ in the argument. So is evaluation a form of research? The question ultimately raises issues about the nature and definition of research as well as of evaluation, and to approach these issues let us take some examples of discussion of the relationship.

Different or the same?

For some commentators the distinction is between ‘evaluation’ and ‘research’, ignoring any such concept as ‘evaluation research’. The distinction drawn is generally between research methodology based in the social sciences – often directed towards answering questions relating to policy, even towards improving it – and evaluation, particularly the more recent approaches to evaluation, which is seen as serving a very different purpose. Parsons argues that if evaluation is seen as serving the interests of decision makers then it has no right to claim the title of ‘evaluation’ – it is then a form of research and should obey the rigorous rules of research, and it is then the decision makers who are the real ‘evaluators’. He particularly excludes formative evaluation from the definition of research:

Formative evaluators work alongside development or action research teams with the task of feeding such teams with information that might help them modify their work, counter weaknesses, anticipate problems and so on. The formative evaluator is an internal critic and provides an information feedback service…[Formative evaluation] serves a narrow audience, the developers, and to be effective needs to be closely allied to or an integral part of the team. The commitment thereby generated would make the formative evaluator suspect as the provider of objective summative information of significance to a wider audience. (Parsons 1981, pp. 40-2)

This is a critique of claims for evaluation as research. Others, however, see the distinction as a necessary and positive one. A crucial point in this argument is identified by MacDonald and Walker:

The methodological difficulties faced by curriculum evaluators who want to offer a comprehensive range of information about new programmes have drawn them to the case-study as a technique. Many of the quite legitimate questions that are put to evaluators, especially by teachers, cannot be answered by the experimental methods and numerical analyses that constitute the instrumental repertoire of conventional educational research. (MacDonald and Walker 1977, p. 181)

This argument refers to experimental methods and numerical analyses as ‘conventional research’, itself under attack from case study and other approaches to research (including action research). There is, of course, no one way to conduct case study research or action research, but broadly speaking, distinguishing evaluation from research also involves drawing a distinction between both of them and ‘conventional’ forms of research.

Evaluation organisations themselves sometimes distance their work from such social policy-based versions of research. In the American examples cited earlier it may be difficult to judge in what ways they constitute research. The Action Evaluation Research Institute defines its central evaluation activity as

…a new method of evaluation, one that focuses on defining, monitoring, and assessing success. Rather than waiting until a project concludes, Action Evaluation supports project leaders, funders, and participants as they collaboratively define and redefine success until it is achieved. Because it is integrated into each step of a program and becomes part of an organization, Action Evaluation can significantly enhance program design, effectiveness and outcome. (AERI [2000?], website)

Explicitly, the approach is differentiated from ‘traditional evaluation’, and implicitly its purposes and methodologies differentiate it from ‘traditional research’. The strategy may be based on extensive research, but the strategy itself is difficult to define as research.

Click back on types of evaluation and judge whether you think the examples can or cannot be described as research.

It can also be suggested that evaluation and research are the same, or come out of the same stable of activities, not least by using the concept and title of ‘evaluation research’. An early American attempt to consider the relationships between ‘research and evaluation studies’ thought it evident that many of the activities undertaken in evaluation and in research in education were the same. In research itself, it pointed out, a distinction is often drawn between ‘applied research’ and ‘basic research’ on the basis of utility or simply new knowledge. Since ‘evaluation studies are made to provide a basis for making decisions about alternatives’, questions of utility are also addressed. This account sets out the range of possible differences between ‘ideal’ research and evaluation studies, and though the differences exist it concludes that they share many characteristics of method and approach: they both add to new knowledge, stimulate and benefit from the development of theory, and contribute to ‘a science of education’. The essential differences are not those of the evaluator and the researcher, but those of different kinds of subsequent decision-makers:

The consequence of the differences between the proper function of evaluation studies and research studies is not to be found in differences in the subject interest or in the methods of inquiry of the researcher and of the evaluator. It is to be found in the manner in which the outcomes of the two types of studies are used and regarded. (Hemphill 1969, pp. 189-92)

‘Studies’ is here simply a substitute for research. The defence of evaluation as either a form of research or as part of the same family has continued to emphasise that the confusion has related to stereotypes of both activities. Both have encountered debates about methodology, including a case study approach; both have erected and torn down barriers round their respective professional communities; both have faced problems about their relationship to patrons, funders and audiences.

 

These debates about evaluation and/or research can be pursued in chapters by Parsons (‘A policy for educational evaluation’) and Simons (‘Process evaluation in schools’) in Lacey and Lawton, Issues in Evaluation and Accountability; in Stenhouse’s chapter on ‘The evaluation of curriculum’ in An Introduction to Curriculum Research and Development; or in other items in the bibliography below.

Is it simply a case of ‘it all depends on what you mean by…’?

4 Methods

Whatever the distinctions between ‘academic research’ and ‘evaluation research’, the research methods used are broadly similar – though any given activity in either may use only a segment of the methods available, and these will be drawn overwhelmingly from the methods of qualitative research. Across the range of evaluation approaches the methods will include interviews and questionnaires, focus groups and observation, case studies and diaries or logs. Some of these are discussed in the RESINED components on interviews, observation techniques and questionnaires in education research. Some methods are used for particular evaluation strategies and purposes.

In objectives-based and some other kinds of evaluation, for example, pre-test and post-test strategies are likely to be used, in order to provide a baseline on which to make judgments about how much has changed as a result of the project. An American approach to evaluating ‘whole school reform’ explains the strategy as follows:

This model makes the assumption that without the intervention, things will go on as they did before. Other things being equal, teachers will continue to teach as they did before, and students will continue to show the same pattern of achievement as they did before. With the intervention, things will change over time, it is hoped in a positive way… The model can include repeated measures… The pattern of change at different points in time can then be interpreted as a result of the intervention. (Northwest Regional Educational Laboratory, website)

A health-related example goes into greater detail:

In order to determine how well the program is working to change those factors that cause social problems, an evaluation needs to address these specifically. Often, this means focusing upon how much behaviors or behavioral determinants (knowledge, attitudes, beliefs, skills, or values) have changed from prior to the program intervention until sometime after it. The questions answered in this type of evaluation refer to the program’s goals and how well they are being reached. For example:

o How much did participants’ knowledge of tobacco as an addictive drug change due to the program?

o Have youth feelings of community empowerment increased between the start and finish of the program?

o Are High School students less likely to engage in alcohol use because of the program?

The most common way to answer these questions is through the use of pre and post surveys of participants. This is not the only way to gather data on changes in behavior or behavioral determinants… However, the pre and post survey method can provide a good way to compare participants before and after the program intervention. (Nebraska Council to Prevent Alcohol and Drug Abuse, website)

The model makes assumptions about the possibility of outcomes occurring directly or uniquely as a result of the intervention, and of their being susceptible to accurate measurement. The components of such an approach include objectives, data collected by test surveys, and rigorous adherence to an impact model – and though it is possible to collect change data during as well as at the end of the project, relaying these data back to the project formatively would distort the ability to measure pre- and post-intervention situations. As with Bloom’s description of summative evaluation, the problem is that of ensuring undistorted, uncontaminated data.
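To make the pre-test/post-test logic concrete, here is a minimal, hypothetical Python sketch of the kind of baseline comparison such an objectives-based design reports: mean scores before and after the intervention and the average gain. The scores and variable names are invented for illustration and do not come from any of the programmes cited above.

```python
from statistics import mean, stdev

# Hypothetical paired scores for the same eight participants.
pre_scores  = [12, 15, 9, 14, 11, 16, 10, 13]
post_scores = [16, 18, 12, 15, 14, 19, 13, 17]

# Gain for each participant: post-test score minus pre-test score.
gains = [post - pre for pre, post in zip(pre_scores, post_scores)]

mean_gain = mean(gains)
# A crude standardised effect: mean gain relative to the spread of gains.
effect = mean_gain / stdev(gains)

print(f"Mean pre-test score:  {mean(pre_scores):.1f}")
print(f"Mean post-test score: {mean(post_scores):.1f}")
print(f"Mean gain: {mean_gain:.1f} (standardised effect ~ {effect:.2f})")
```

The sketch also shows the limitation discussed above: a bare pre/post gain says nothing about whether the change would have occurred anyway, which is why comparison groups or repeated measures are often added.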

An evaluation method used primarily in higher education is that of student feedback, on a new course or form of delivery, or regularly on the student experience. Of the formal methods of obtaining feedback, questionnaires are the most common. Students may be asked to give their views on the curriculum and the teaching, the course management and assessment, and the analysis of the questionnaire may be used as part of a broader process, as a basis for interviews or group discussion. An alternative is some form of structured (or ‘pyramid’) group feedback, in which the group is split into small and then larger groups, with agreed points being reached at each stage, for presentation at the end in plenary discussion. The aim is to obtain feedback without any person or group dominating the response. Nominal Group Technique has the same purpose, but normally involves no discussion (except for ‘item clarification’), being based on participants’ own written recording of their views, including ‘nominating’ points for inclusion in the report, and then a presentation by the session leader of the ideas expressed, with no attempt to evaluate the suggestions. The NGT procedure aims at maximum objectivity of feedback, and shares with other structured feedback techniques an attempt to make feedback ‘representative’. The purpose of regular feedback of these kinds is to inform the teaching staff of the state of a course or the success or otherwise of an intervention – and is therefore a different kind of formative evaluation. The procedures can therefore also support the evaluation of an initiative, and stand alongside other forms of evaluation of teaching or curriculum change (for fuller details see O’Neil and Pennington, Evaluating Teaching and Courses from an Active Learning Perspective, pp. 21-34).
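As an illustration only (not part of the original component), the short Python sketch below shows the collation step of a Nominal Group Technique session as described above: the session leader tallies the points participants have nominated in writing and presents them by frequency, with no attempt to evaluate the suggestions. The nominated items are hypothetical.

```python
from collections import Counter

# Each inner list holds the points one participant nominated in writing.
nominations = [
    ["more worked examples", "clearer assessment criteria"],
    ["clearer assessment criteria", "faster feedback on assignments"],
    ["more worked examples", "faster feedback on assignments"],
    ["clearer assessment criteria"],
]

# Tally the nominations across all participants.
tally = Counter(item for participant in nominations for item in participant)

# Present the points by frequency, with no judgement attached -
# interpretation is left to the course team.
for item, count in tally.most_common():
    print(f"{count} nomination(s): {item}")
```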

Although structures of the kinds outlined above are not typical of illuminative or similar approaches it is important that some kind of structure is involved. Parlett and Hamilton, for example, introduce illuminative evaluation as being in three characteristic phases: ‘investigators observe; inquire further; and then seek to explain’. They give an example of how these three stages took place and overlapped, and with this three-stage framework ‘an information profile is assembled using data collected from four areas: observation, interviews, questionnaires and tests, documentary and background sources’ (Parlett and Hamilton 1976, pp. 14-15). For course development, training or other initiatives the range of methods available is wide. For evaluation generally, focus groups and questionnaires, interviews and implementation logs, feedback and testing methods are part of a menu of approaches. One American list of ‘evaluation tools’ for projects involving interactive media contains 39 items, without including anything relating to an ethnographic approach.

What all of these methods do is attempt to penetrate the complexities of social situations within which evaluation of an initiative takes place, whether or not the evaluation is descriptive or judgmental. Policy, project, documentation, process and outcome are not ‘givens’ and the evaluator relates to them in a variety of ways, using a variety of approaches. There is no ‘appropriate’ evaluation method, only the selection of an approach or approaches for a particular situation, depending on the predominant assumptions of the evaluator, a project team and the evaluation sponsors in some kind of understanding or negotiation. Determining the strengths and weaknesses of a particular method is therefore itself an elusive process, and a great deal of emphasis has to be laid on all the preliminary encounters and insights obtained either at the beginning of the evaluation or, preferably, before the project and the evaluation are launched. The evaluator’s statement of intent, terms of reference or contract therefore needs to be clear about a number of agreed principles, though not necessarily, of course, in any of the following vocabularies:

1. Prior clarification of the assumptions underpinning the project and the assumptions of the evaluator. This could lead to identifying the ‘style’ of evaluation that would be appropriate – whether based on objectives and measurement or on a kind of joint exploration between project and evaluation, whether ‘autocratic’ and ‘bureaucratic’ or ‘democratic’. An important element in this clarification is identifying the source of power – for whom is the evaluation primarily intended, and who can influence the operation of the evaluation?

2. What is most wanted from an evaluation? A final calculation of whether the project has been value-for-money? Ongoing feedback on implementation (is it working?), and feedback to whom? What are the proposed outcomes? Are there already (eg in a funding contract) defined objectives and expectations, possibly against existing baseline data? Answers to such questions may determine whether to define the evaluation as description and enlightenment, or as analysis and judgment.

3. The intended end product of all interventions is change. What kind of change is anticipated, and how can the ‘effectiveness’ of the intervention in producing it be either measured or portrayed? Discussions of evaluation often focus on what does not produce change. For example, there are views that ‘self-evaluation’ by an institution is unlikely to result in change; that testing instruments do not reveal what actually happens and what produces the change; and that observation may reveal little of changes in teaching. The strengths and weaknesses of a particular method may therefore be a function of its ability to respond to the underlying questions of any project – ‘Who wants to know what, and why?’ – and whether there is a method or methods that will offer something worthwhile.

5 Internal evaluation

The focus of the discussion here has been on external evaluation. The assumption has been that evaluation with a ‘research’ connotation is conducted by someone (or a team or group) external to the project evaluated or to the institution. In the absence of, or in collaboration with, an external evaluator, some of the approaches and methodologies discussed above may also be applicable to the internal (or 'self-') evaluation conducted by the project leader or team. One task of an external evaluator may be to advise on internal evaluation methodology (interviews, questionnaires…) on an initial or ongoing basis. It is common for small projects or initiatives, whether funded from within or outside the institution, to require an evaluation without making provision for an external one.

Some internal evaluation may consist solely of the collection of limited data (perhaps similar to that undertaken to obtain ‘student feedback’ in higher education, discussed above). Where a single person is responsible for the project and has an ‘audience’ or ‘partnership’ (for example, of students, colleagues, other professionals, patients…) there may be a tendency to rely on informal feedback or opinions from those with whom the project has occasional or regular contact. This in no way meets the requirements of any of the versions of evaluation research that we have considered. To answer the questions ‘What do you know?’ and ‘How do you know it?’ something more systematic has to take place.

As with all evaluation and research there is a strong temptation to include, and in internal evaluation to rely on, questionnaires as the source of data. Planning, conducting and analysing a questionnaire are subject to difficulties and pitfalls (see the RESINED component on questionnaires). Interviews, particularly within the limited community that may be covered by a small project, may be difficult to conduct. Structured discussion in small groups – the focus group or ‘pyramid’ approaches described above – may be a useful tool in these kinds of situations, combining structure and focus for small-group discussion. The use of logs or diaries by participants in the project may be a valuable alternative or supplement to any of these approaches.

For a small, internal initiative (though possibly one of a number of such initiatives), the institution is unlikely to enter into a contract or agreement on evaluation in the same way as for an external evaluator. What is needed, however, is agreement at the appropriate level for at least initial contact by the project leader(s) or team(s) with a consultant who can discuss and advise on the options available for internal project evaluation. The onus rests with the committee or senior staff to ensure not just that evaluation takes place but also that such advice is available to those conducting the project.

6 The evaluator

An evaluation may be conducted by a full-time professional evaluator, part-time by a member of the same or another institution, or by a team of two or three people or a much larger one. The evaluation may be for a short period or for a number of years. The evaluator or team may be appointed by the institution with a great deal of input, or very little, from those conducting the project. The evaluator may be required to report to a steering group or committee responsible for the project, to the institution, or to the funding body – or to some or all of these. The ground rules for the evaluation may be decided by the evaluator, or may pre-exist the appointment.

On the last of these points, for example, the government’s Department of Employment (as it was in 1994) issued a document entitled Evaluating Development in Higher Education: a guide for steering committees, contractors and project staff. The Employment Department, like all government departments and others, did not just fund projects, it ‘contracts with an institution or organisation for a piece of project work’, and the contract stipulated a series of requirements. On the question of evaluation the Department indicated:

All this work requires evaluation. All the partners (individual staff, departments, project steering groups, institutions and the Department among others) need to know whether it has been successful and is worth imitating, what lessons there are for the future, and what further development or research may be needed. Without this, resources, including scarce staff time and energy, will be wasted in repeating mistakes and rediscovering what is already known.

For guidance the document set out 11 key questions, on such matters as the customers for the work and the evaluation, the balance between formative and summative evaluation, the data and the outcomes, and baseline information. It suggested that steering groups might add their own questions, concerning how the planned work was carried out, whether each objective was achieved, future development, and value for money. Although this was a ‘guide’, there were issues that ‘should always be addressed’:

assessing contract compliance and value for money

contributing formatively to development

informing future agenda building and gathering intelligence

informing the review of development and evaluation methodology (Department of Employment, click here for greater detail)

In this kind of case with such requirements built in at the contractual stage before the appointment of an evaluator, the latter will enter a pre-determined situation, since the institution or steering group will have committed themselves to a project and an evaluation within this framework. The evaluator will have room for manoeuvre at the margins, mainly in the selection of a methodology that will provide answers to the questions that have already been formulated.

In other situations, of course, the steering group or project team will have only the broadest (if any) guidance, and the evaluator – possibly in order to secure the appointment – will be asked or volunteer to supply an evaluation brief setting out in appropriate detail what the evaluator intends to do (style of evaluation, time commitment, ownership of the data and the evaluator’s report(s), means of negotiation of any changes in evaluation procedure…). Some situations may require only an informal relationship – generally for modest interventions without external funding. In all cases, even in the most informal, a contract between the project management or the institution and the evaluator is essential – even if only to specify time, payment and any requirements – for example the date by which a final evaluation report has to be provided.

These preliminaries are necessary but only partially ‘protect’ the evaluator. As one commentator put it:

… people who accept positions as evaluators place themselves in a vulnerable position: to put it neatly the evaluator sets himself up for evaluation…In embarking on an evaluation the evaluator makes a commitment to deliver some goods… failure to deliver the goods, or to deliver superior goods, will be an embarrassment at least, if not a serious threat to his academic status or career prospects. (Gomm 1981, p. 127)

This ‘threat’ can be particularly acute if there are multiple audiences for the report, and it is not impossible for evaluators to be tempted to minimise it by muting critical content in the report. Stake, in the United States, describes the position to make it more than just a hypothetical one:

It is recognized, particularly by Mike Scriven and Ernie House, that cooption is a problem, that the rewards to an evaluator for producing a favorable evaluation report often greatly outweigh the rewards for producing an unfavourable report. I do not know of any evaluators who falsify their reports, but I do know many consciously or unconsciously choose to emphasize the objectives of the program staff and to concentrate on the issues and variables most likely to show where the program is successful. I often do this myself… (Stake 1980, p. 74)

A form of reporting that entails judgments and possibly recommendations (a common but not a universal element of reports) therefore raises particular issues of this kind. ‘Cooption’ is a danger of case study, illuminative or similar forms of evaluation, since the evaluator by definition in these cases works closely with the team and may feel tempted or even obliged, as Stake suggests, to highlight their view of the process and outcomes, and given the trust that has been involved, to highlight the positive ones. The danger is not inevitable, and can be overcome by adherence to the initial principles and strategies agreed for the project. This makes the initial agreement, and the forms of consent of the parties concerned, all the more important. Agreement at the outset needs to be clear about the process, the outcomes, the audiences and the nature and purpose of the report.

Examples

Given normal undertakings of confidentiality, the literature of evaluation contains few examples of actual reports. Those that are in the public domain are normally those submitted on major national or international initiatives, may be very substantial, and in some cases are on internet or intranet websites. A 100-200 page final report, probably highly statistical, on a multimillion £ or $ project in agriculture or literacy will not help to illustrate the issues discussed here at more modest levels. However:-

An anonymised report 'On-line learning (OLL): an evaluation' can be viewed by clicking here (to download a 55K Word document). This gives some idea of what a report on a substantial national project (covering seven universities) might contain, and indicates some of the evaluation methodology on which it was based.

Click here for an article by Ian Jamieson (1983) on ‘The role of evaluation in action-research projects: the case of the Schools Council Industry Project’, Cambridge Journal of Education, vol. 13, no. 2, pp 37-45. This describes the evaluation process in a large, four-year action-research project on teaching about industrial society in the school curriculum. This is not a report, but it gives a clear idea of the relationship between the evaluator’s purposes and the work of the project team.

Some good advice from Rob Phillips and Tony Gilding on 'Approaches to evaluating the effect of ICT on student learning' (a 333K Word document in rtf format) is available by clicking here.

7 Advice

It may help finally to summarise some advice to have in mind when undertaking an evaluation:

1. Ensure at the outset that you have a full discussion of what you are going to be doing, resulting in an agreed written statement. This may cover time scales, finance, and reporting (frequency, to whom, ‘ownership’ of reports…).

2. Be sure whether it is a ‘process’ or ‘impact’, formative or summative, evaluation – though this is not necessarily the language of what is agreed.

3. Be clear about the intended methodology (observation, interviews, questionnaires, focus groups, diaries…) and the relationship with the project team, other participants and project management (senior staff, steering committee…).

4. Be sure about confidentiality (eg if formatively reporting to the project team, what information it is legitimate, or not, to reveal; whether interviewees will be identified or identifiable in reports…). The project team and others involved need to understand the confidentiality position, and it may be advisable to explain this and other matters in writing for everyone concerned (commonly referred to as an ‘ethics protocol’).

5. If there is also to be an internal evaluation, consider what help you can give on its purpose and methodology.

6. When submitting reports (interim, final), will they go first as drafts (to whom?) to be checked for accuracy – not to challenge or confirm your judgments (it is your report)?

7. Consider, throughout the evaluation process, your own and shared purposes, the effectiveness of your methodology, and the appropriateness of your relationships.

8. Take account of what literature may be helpful.

Given the hypothetical evaluation or evaluations that you have considered, including some of the problems or difficulties, and given your own position, if invited to conduct such an evaluation – would you do it?

If so, why? If not, why not?

My own final reflection is that I hope you would!

8 References and further reading (including websites)

Some of the items below are accessible on the Internet as indicated. There are books that it would be worth reading, but where possible chapters or papers in books are suggested. Items in bold are the most recommended.

Action Evaluation Research Institute (2000?), ‘Helping groups define, promote and assess success’, http://www.aepro.org/ [including overview, methodology, recent essays and ‘conceptual frameworks’].

Adelman, Clem and Alexander, Robin J. (1982), The Self-Evaluating Institution: practice and principles in the management of educational change, Methuen, London.

Albee, Alana (1999) ‘Assessing impact: some current and key issues’, Caledonia Centre for Social Development, http://www.caledonia.org.uk/pia.htm [a very useful paper].

Bloom, Benjamin S. (1969) ‘Some theoretical issues relating to educational evaluation’, in Tyler, Ralph W. (ed.), Educational Evaluation: new roles, new means, National Society for the Study of Education, Chicago [perceptive study of objectives, specifications and outcomes].

Bloom, Benjamin S. (1978), ‘Changes in evaluation methods’, in Glaser, Robert (ed.), Research and Development and School Change, Lawrence Erlbaum, New York [useful insights into early assumptions about evaluation].

Burgess, Robert G. (ed.), Educational Research and Evaluation: for policy and practice?, Falmer Press, London [chs include local and national evaluation, and relationship (if any) of evaluation to policy].

Department of Employment, Further and Higher Education Branch (1994), Evaluating Development in Higher Education (duplicated).

Ellington, Henry and Ross, Gavin (1994), ‘Evaluating teaching quality throughout a university: a practical scheme based on self-assessment’, Quality Assurance in Education, vol. 2, no. 2, pp. 4-9 plus annexes.

Gomm, Roger (1981), ‘Salvage evaluation’, in Smetherham, David (ed.), Practising Evaluation, Nafferton Books, Driffield.

Hamilton, David et al. (1977), Beyond the Numbers Game: a reader in educational evaluation, Macmillan, Basingstoke [Invaluable, including key writers, MacDonald and Walker on case study, and the influential Parlett and Hamilton study of ‘Evaluation as illumination’. Can be read selectively.]

Hemphill, John K. (1969) ‘The relationship between research and evaluation studies’, in Tyler, Ralph W. (ed.) Educational Evaluation: new roles, new means, National Society for the Study of Education, Chicago [useful discussion of the relationship].

Hopkins, David (1989), Evaluation for School Development, Open University Press, Milton Keynes [First 2 chapters are a good introduction to types of evaluation and an argument for evaluation in the service of development].

Jamieson, Ian (1983), ‘The role of evaluation in action-research projects: the case of the Schools Council Industry Project’, Cambridge Journal of Education, vol. 13, no. 2, pp. 37-45 [brief account, raising many of the issues discussed here].

Jamieson, Ian (1984), ‘Evaluation: a case of research in chains?’, in Adelman, Clem (ed.), The Politics and Ethics of Evaluation, Croom Helm, London [the publishers failed to have this book proof read, so read this chapter with care!].

Kogan, Maurice (1986) Education Accountability: an analytic overview, Hutchinson, London [particularly ch. 6, ‘Epistemologies and evaluation’].

Lawton, Denis (1978) ‘Curriculum evaluation: new approaches’, in Denis Lawton et al., Theory and Practice of Curriculum Studies [short, but covers most of the issues raised here].

MacDonald, Barry and Walker, Rob: see Hamilton et al. above.

Manchester University Department of Applied Social Science (n.d.) ‘Evaluating policy and practice’, [brief account of postgraduate course approach, objectives, course content].

Nebraska Council to Prevent Alcohol and Drug Abuse (2000), The Least You Need to Know About…, http://www.nde.state.ne.us/SDFS/ATOD/evaluation.html [types of evaluation].

Northwest Regional Educational Laboratory (2000), Evaluating Whole-School Reform Efforts: a guide for district and school staff, http://www.nwrac.org/whole-school/index.html [including good sections on impact evaluation].

O’Neil, Mike and Pennington, Gus (1992), Evaluating Teaching and Courses from an Active Learning Perspective, CVCP Universities’ Staff Development and Training Unit, Sheffield [mainly on evaluating teaching in higher education, especially methods of collecting evidence].

Parlett, Malcolm and Hamilton, David: see Hamilton et al. above.

Parsons, Carl (1981) ‘A policy for educational evaluation’, in Lacey, Colin and Lawton, Denis, Issues in Evaluation and Accountability, Methuen, London.

Simons, Helen (1981), ‘Process evaluation in schools’, in Lacey, Colin and Lawton, Denis, Issues in Evaluation and Accountability, Methuen, London.

Simons, Helen (1987), Getting to Know Schools in a Democracy: the politics and process of evaluation, Lewes, Falmer Press.

Stake, Robert E. (1980), ‘Program evaluation, particularly responsive evaluation’, in Dockrell, W.B. and Hamilton, David (eds), Rethinking Educational Research, Hodder and Stoughton, London.

Stenhouse, Lawrence (1975) An Introduction to Curriculum Research and Development, Heinemann, London [Ch. 8 on ‘The evaluation of curriculum’ is a key text].

Trochim, William M.K. (2002), Introduction to Evaluation, http://www.socialresearchmethods.net/kb/intreval.htm [definitions, strategies, types, questions and methods; link to The Planning-Evaluation Cycle and An Evaluation Culture. Based on book, Research Methods Knowledge Base].

Wayne State University Center for Urban Studies (n.d.) account of approach to evaluation research, go to http://www.cus.wayne.edu/capabilities/intro.asp and click on 'evaluation' in bullet point list.

Weiss, C.H. (1998, 2nd edn), Evaluation: methods for studying programs and policies, Prentice Hall, New York [massive compendium, suitable for consulting; very expensive].

World Bank Group (2001) Poverty Net, http://worldbank.org/poverty/impact/ [substantial account of an approach to large-scale project evaluation, including understanding impact evaluation, methods and techniques, many examples and readings; valuable insights].