Recommended Citation: Evers, John W., "A Field Study of Goal-Based and Goal-Free Evaluation Techniques" (1980). Dissertations. 2645. https://scholarworks.wmich.edu/dissertations/2645
A FIELD STUDY OF GOAL-BASED AND GOAL-FREE EVALUATION TECHNIQUES
by
John W. Evers
A Dissertation Submitted to the
Faculty of the Graduate College in partial fulfillment
of the requirements for the Degree of Doctor of Education
Department of Educational Leadership
Western Michigan University
Kalamazoo, Michigan
April 1980
A FIELD STUDY OF GOAL-BASED AND GOAL-FREE EVALUATION TECHNIQUES
John W. Evers, Ed.D.
Western Michigan University, 1980
Educational evaluation theorists propose two methodologies that
could be used to assess a project or program’s achievements. One
method is well established and is called goal-based evaluation.
The other has been proposed recently and is called goal-free
evaluation. Little information exists about goal-free evaluation
in field settings. The problem to be addressed was:
What would be the results of a field study that
compared the relative utility of operationalized
versions of goal-free and goal-based evaluation
techniques?
The perspective selected to investigate this problem was
evaluator/evaluee interactions. This study had two
objectives:
1. to develop materials and procedures for using the
two techniques in an evaluation study, and
2. to investigate the relative utility of the two techniques
through a field exploration of the evaluator/evaluee
relationship.
The techniques were operationalized through handbooks that
incorporated checklists. Subjects were randomly
selected from recommendations by nationally recognized evaluators
and were randomly assigned to training in one of the two techniques.
Projects to be reviewed were randomly assigned to evaluators.
Three instruments were developed: two of a Likert type and
one as a semantic differential. The two Likert instruments were
used to assess the following elements of the on-site evaluation
process:
1. evaluator/project director rapport
2. evaluators' time utilization
3. evaluator/project director expectations of each other
4. evaluator/project director overall satisfaction
5. evaluator confidence with the methodology.
The third instrument went through a developmental process to
establish reliability, and was used by the evaluators who were
trained in one of the two techniques. Evaluator ratings of the
on-site process were analyzed with a repeated measures ANOVA.
Project director ratings of the on-site process and of the
utility of the evaluation reports were analyzed through use of a
completely randomized hierarchical design.
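A minimal sketch of the repeated measures analysis described above is given below in Python; the column names, ratings, and group sizes are invented for illustration, and the goal-free/goal-based group comparison would be run separately, as in the hierarchical design just noted.

```python
# A minimal sketch, assuming long-form data; the column names,
# ratings, and group sizes here are invented for illustration.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per evaluator per rated element of the on-site process.
ratings = pd.DataFrame({
    "evaluator": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "element":   ["rapport", "time_use", "satisfaction"] * 4,
    "group":     ["GFE"] * 6 + ["GBE"] * 6,  # training condition
    "rating":    [4, 3, 4, 3, 3, 3, 5, 4, 5, 4, 5, 4],
})

# Repeated measures ANOVA over the within-subject factor only;
# AnovaRM does not take between-subject factors, so the GFE/GBE
# comparison would be handled by a separate between-groups analysis.
result = AnovaRM(ratings, depvar="rating", subject="evaluator",
                 within=["element"]).fit()
print(result)
```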
Findings supported some of the proposed differences between
the goal-free and goal-based techniques:
1. the two groups reported different patterns of
activities while on-site,
2. the goal-free evaluators rated themselves lower than
the goal-based evaluators in elements of the on-site process,
3. evaluee ratings of the on-site process did not differ
significantly, and
4. evaluee ratings of the reports produced from the two
techniques did not differ significantly.
ACKNOWLEDGEMENTS
The development of this dissertation has benefitted from
both the advice and criticism of Professors Daniel Stufflebeam,
Jim Sanders, John Sandberg, and Mary Ann Bunda. I would also
like to recognize the challenges and opportunities provided by
the Evaluation Center while at Ohio State and at Western
Michigan University.
John W. Evers
TABLE OF CONTENTS
ACKNOWLEDGEMENTS

LIST OF TABLES

CHAPTER

I. THE PROBLEM AND ITS BACKGROUND
    Context of the Field Study
    Overview of Goals as Evaluation Criteria

II. GOAL-FREE EVALUATION AS DEVELOPED IN THE LITERATURE
    Background
    Philosophical Tenets
    Implied Procedure
    Application

III. METHODOLOGY
    Subject Selection and Assignment
    Training
    Instrument Development
    Data Collection Procedures
    Data Analysis Procedures

IV. FINDINGS AND DISCUSSION

APPENDICES

BIBLIOGRAPHY
LIST OF TABLES
Sources of Information and Their Use With Goal-Based and Goal-Free Evaluation

Composite Background of Subjects By Groups

Summary of Information Reported On Activity Logs

Repeated Measures ANOVA Analysis of Evaluator Process Ratings

Item Means and Standard Deviations From Evaluator Process Rating Instrument

Completely Randomized Hierarchical ANOVA Analysis of Project Director Process Ratings

Completely Randomized Hierarchical ANOVA Analysis of Project Director Ratings of Report Utility
CHAPTER 1
THE PROBLEM AND ITS BACKGROUND
"Educational evaluation is in the air. Indeed, demands for edu
cational evaluation are so prevalent that most educators must believe
they are living in an evaluation generation" (Popham, 1975, p. 1).
Most of those demands began with the passage of the Elementary and
Secondary Education Act (ESEA) of 1965. It mandated evaluations of
Federally-funded "Title" programs. The House and Senate revisions of a bill to reauthorize ESEA in 1978 retained provisions for evaluation activities (Report on Educational Research, June 14,
1978). Even though evaluations occurred (whether mandated or
volunteered) some developers of evaluation theory and practice
have reported methodological concerns.
For instance, Stufflebeam et al. (1971) reported in an overview of the field of educational evaluation that "descriptions of evaluation methodology are lacking" (p. 336). Along the same
line, Worthen and Sanders (1973) more recently reported that "there is
little or no data-based information about the relative efficacy of
alternative evaluation plans or techniques" (p. 334). In general,
evaluation has been viewed in the literature as an underdeveloped
process even though it had been an on-going, mandated activity through
Federal requirements since 1965.
Several evaluation techniques have been developed, tested, and reported. However, other evaluation techniques have been reported but not fully developed and field tested (e.g., the Goal-free Technique:
Scriven, 1972, and Modus Operandi Analysis: Scriven, 1974). There
has been a need for further development and testing of these evaluation
techniques. Therefore, studies to operationalize the recently proposed
evaluation techniques and to explore their efficacy or utility in
a field setting are worthwhile.
In this study, the goal-free evaluation technique (Scriven, 1972)
was operationalized and field-tested. It was compared to a more
classical technique called goal-based evaluation (Tyler, 1942).
The problem to be addressed was: what would be the results of a
field study that compared the relative utility of operationalized
versions of goal-free and goal-based evaluation techniques? The
particular perspective selected to investigate this problem was the
evaluator/evaluee interaction. This interaction was one important
perspective when considering the relative utility of evaluation
techniques. In other words, an attempt was made both to develop
evaluation materials and to study the effects of those materials on
the evaluator/evaluee interaction process. It was assumed that a
"naturalistic" approach (Guba, 1978) would provide a worthwhile
perspective for considering evaluator/evaluee interactions.
This study had two objectives:
(1) To develop materials and procedures for using the goal-free
and goal-based approaches in an evaluation, and
(2) To investigate the relative utility of these goal-based and
goal-free techniques of evaluation through a field exploration of the
evaluator/evaluee relationship.
The following investigatory questions were derived from the two
purposes of the study:
(1) What materials and procedures are needed to operationalize
the goal-free and goal-based evaluation approaches?
(2) When the materials and procedures of these two approaches
are field tested, will the evaluators rate the evaluation process
differently depending on which approach they are using?
(3) When the materials and procedures of these two approaches
are field tested, will the evaluees rate the evaluation process
differently depending on which approach is being applied?
(4) When the materials and procedures of these two approaches
are field tested, will the evaluees rate the evaluation reports
differently depending on which approach was used?
One limitation of this study concerned the fourth investigatory
question. Each report only received a rating by the evaluee whose
project was reviewed. It would have been desirable to obtain multiple
ratings of each report, and to compare reports based on the two
approaches in relation to their theoretical assumptions and content.
The next section presents an overview of the context of the study
that was designed to deal with this problem.
Context of the Field Study
The Evaluation Center at Western Michigan University contracted
with the Hill Family Foundation of St. Paul, Minnesota to review
several projects in the Midwest that were being sponsored by this
Foundation. Certain elements of this contracted evaluation lent themselves
to a field-based investigation of the two evaluation approaches.
To clarify these elements an overview is presented.
The overall contracted evaluation had two parts. One part was
more extensively developed and dealt with an external review of various projects at 16 four-year, independent colleges throughout the
Midwest and Northwest. The second part involved in-service training in
evaluation for the college project and Foundation staff. The part of
the evaluation project that dealt with an assessment of the projects
was expanded to include a field study that compared the goal-free and
goal-based evaluation techniques.
Early in the overall contracted study an orientation session was
held between representatives of the projects, the Foundation, and the
Center. The purposes of the session were to allow: the Foundation officials to explain the purpose of the evaluation to project representatives, the project representatives (the evaluees) to give a general progress report about the early stages of their projects, the representatives of the Center to meet all parties and explain the design of the evaluation study, and the Foundation staff and project representatives an opportunity to critique and influence later implementation of that design. About six weeks later, preliminary baseline
information about goal-setting procedures and the strategies chosen to
achieve goals was collected, by means of a mailed questionnaire, from
each project. These two steps preceded the investigation of the two
evaluation approaches and provided information that was later used with
the evaluators who participated in the field study.
In earlier evaluation work by the Center, it was found that one
evaluator preceding a panel of experts could greatly expedite their
work by gathering much information and reporting preliminary hypotheses
about project achievement for their further investigation. The evaluators who made these preliminary site visitations were called "traveling observers." In this study, evaluation strategies used by these traveling observers were structured as either goal-based or goal-free. This
structuring provided the basis for this study's field-based comparison
of goal-free and goal-based evaluation techniques.
Information from traveling observer visitations was developed into
two sets of reports. One report was developed first for the review
panelists who were to follow later. Each report was structured in accordance with the approach followed in its development, i.e., goal-free or goal-based. The report was then reviewed and edited by the
traveling observer into another report to be presented to the project
director whose project had been assessed. At a later point in the
study all reports were sent to the Foundation.
Only the traveling observer portion of the overall contracted
evaluation was used to investigate the effects of the goal-free and
goal-based strategies. No other elements of the contracted study were
involved in this field investigation of evaluation techniques.
Overview of Goals as Evaluation Criteria
The primary variable investigated in this study was the goals of
an enterprise being evaluated. That is, what difference does it make
if an evaluator does or does not consider a project's goals when eval
uating it?
Stake (1974) referred to the approach that concentrated on a project's goals as preordinate evaluation. Preordinate refers to the use of prespecified goal statements as a blueprint for reviewing achievements. Congruence refers to the use of prespecified goal statements and their match to terminal outcomes. Or, in other words, to what degree
do outcomes match intentions? Mismatches between outcomes and
intentions are reported as discrepancies, or non-achievement.
Stake (1974) and Clark (1975) reported that many evaluators presume that a comparison of achievement to some prespecified goal statements is an essential part of an evaluation plan. Evaluations that subscribe to this approach use an intrinsic set of criteria to assess
achievement. Intrinsic assessment criteria are found by examining an
educational object to discover original intentions or specifications.
Once criteria are established by looking "within" the object, they
are used as the basis for judging a present state of achievement.
In 1977, Scriven provided the following definition of this goal-
based or preordinate approach to educational evaluation:
This type of evaluation is based and focused on knowledge of the goals and objectives of the program, person or product. A goal-based evaluation does not question the merit of goals; often does not look at cost-effectiveness; often fails to search for or locate the appropriate critical competitors; often does not search for side-effects; in short, often does not include a number of important and necessary components of an evaluation. Even if it does include these components, they are referenced to the program (or personal) goals and hence run into serious problems such as identifying these goals, handling inconsistencies in them and changes in them over time, dealing with shortfall and overrun results and avoiding the perceptual bias of knowing about them. (p. 13)
Cronbach (1963), Scriven (1967), and Stake (1967) focused on the
limitations associated with using prespecified goals and objectives as
criteria to assess achievement. Cronbach's position was that evaluation
calls for a description of outcomes and side-effects on the broadest
possible scale even at the sacrifice of superficial fairness and precision. He added that observed outcomes should range far beyond the actual curriculum content. This early position was reflected in
Stake's and Scriven's later writings. For example, Stake reported that
assessing the congruence between outcome achievement and prestated objectives is only part of the evaluation process. He explained that an
evaluator must also search for side-effects and incidental gains rather
than narrowly report goal achievement.
Scriven (1967) stated this position more strongly. He said that
evaluation includes as an equal partner with a measure of performance
against goals, a procedure for the evaluation of the goals. That is,
if the goals are not worth achieving, then it is unimportant to see
how well they were achieved. Scriven explained that it is more important to ask How good is the course? rather than Did the course
achieve its goals? Finally, as a premonition of ideas yet to come,
Scriven said succinctly that an evaluation should see what the project
does, and not bother with whether it had good intentions (p. 60).
Goals are considered necessary for management and planning, but
not for evaluation, Scriven (1972) later reported. The evaluator
who is ignorant of the espoused intentions of the project (stated
goals) avoids a perceptual set that more often than not biases
judgments of real achievements. By ignoring espoused intentions
(stated goals), Scriven said that the evaluator would have a greater
chance of assessing the real effects of the project. This approach was
called goal-free evaluation (GFE) by Scriven because merit was determined independently (free) of the intended goals.
Scriven (1977) defined goal-free evaluation as follows:
In this type of evaluation, the evaluator(s) is not told about the purpose of the program but enters into the evaluation with the purpose of finding out what the program actually is doing without detailed cueing as to what it is trying to do. If the program is doing what its stated goals and objectives say, then these achievements should show up; if not, it is argued, they are irrelevant. Merit is determined by relating program achievements to the needs of the impacted population, rather than to the program (i.e., agency or citizenry or congressional or manager's) goals. It could thus be called "needs-based evaluation" or "consumer-oriented evaluation" by contrast with goal-based or manager-oriented evaluation. It does not substitute the evaluator's goals for the program's goals, nor the goals of the consumer. GFE is generally disliked by both managers/administrators and evaluators, for fairly obvious reasons. It is said to be less intrusive than GBE, better at finding side-effects and less prone to bias. (p. 13)
Scriven proposed that achievement can be assessed by a comparison
to an extrinsic set of criteria. Extrinsic, goal-free criteria
are found by looking "outside" the object to be evaluated and
by purposefully bypassing the potentially biasing effects of
intrinsic, or goal-based, criteria. Examples of extrinsic
achievement criteria are the demonstrated needs of a target population, educational project, program, or object.
As might be expected, the goal-free approach caused considerable
discussion. For instance, Stufflebeam (1972) commented on the
overall merit of the GFE approach by saying that the strategy was
potentially useful, but far from operational and replicable.
Because of its promise, Stufflebeam believed that Scriven and
others should further develop and test it, and report the effects of
GFE, whatever they turned out to be. Concurrently, Popham (1972) reviewed
the possibilities of GFE and reported that although the strategy was
alluringly portrayed by Scriven, he would have to wait until GFE was
tried in real evaluation settings to see its effects. Except for an
example by House and Hogben (1972), the question of the effects of
goal-related information on the evaluation process was the subject for
intellectual debate but not practical investigation.
In summary, this study had two objectives:
(1) To develop materials and procedures for using the goal-free
and goal-based approaches in an evaluation, and
(2) To investigate the relative efficacy of these goal-based and
goal-free techniques of evaluation through a field exploration of the
evaluator/evaluee relationship.
As an overview of the rest of the dissertation, Chapter 2 contains a
review of the development of the two evaluation techniques with an
emphasis on the goal-free approach since it is the least well known.
The techniques are compared through consideration of background,
philosophical tenets, implied procedure, and applications. The
third chapter describes subject selection, training, instrument
development, and data collection procedures. Chapter 4 presents
findings, a discussion of implications, and recommendations for
future research.
CHAPTER 2
GOAL-FREE EVALUATION AS DEVELOPED IN THE LITERATURE
In conducting this study it was important to consider some caveats
in generalizing about differences in evaluation techniques. Evaluation
theory and its operational techniques were not a settled issue. GFE
and GBE were evolving. This made it difficult to generalize about
specific differences at a point in time. The risk in generalizing
about underlying differences was that one might mistakenly assume that
an evaluator employing one approach or the other was acting in
accordance with a standard protocol. That, realistically, was not
the case in this study. There wasn't, and still isn't, any standard
protocol for doing goal-free evaluation. The early literature which
led up to GFE was mainly philosophical in nature and contained little
operational guidance.
For example, Scriven’s early (1967) writings presented general
thoughts about judging, goals, and assessing actual rather than intended
effects. His later (1971a) publications foreshadowed usage of the goal-free approach by discussing the concept of "effectiveness" within a
five step evaluation process. Effectiveness was presented as information
about treatment effects that were not restricted to the espoused
goals and included impact in unstated directions. This information
was then to be rated by the evaluator as good or bad. Scriven also
presented the position that the foundation stone of professional ethics
for evaluators was that they should see themselves as "enlightened
surrogate consumers" who are concerned for the welfare of society
as a whole, not just the target group of a producer.
More recent writings (1975) presented applications of the goal-free
technique. The following sections were organized through a discussion
of background, philosophical tenets, procedures, and applications of the
goal-free technique. Discussion of the goal-based technique was included where necessary.
Background
Scriven (1974, p. 34ff) reported that individuals were often put
in the position of external, summative, product evaluators. That is,
an evaluator who is not on the original staff is hired to make
terminal judgments about the achievements of some product. These
evaluators are too often confronted with rhetoric from the original
proposal as evidence to support the excellence of the product. This rhetoric is used as a substitute for lack of evidence about actual
product effects. Scriven advanced the position that this rhetoric of
intention most probably affects the way in which the final product
is evaluated even when the evaluator goes out and obtains data on
effectiveness.
According to Scriven, the rhetoric of the proposal is often
couched in cliches, faddism, and jargon and serves a primary purpose
of obtaining project funds. Scriven (1974) concluded that reading
through the goal rhetoric contributed nothing to the evaluation process.
Scriven went on to say that it in fact produces a negative effect, and
that reading the intentions of the producer tends to develop a perceptual set, or tunnel vision, for the evaluation. Following a blueprint
from intention to end product creates a situation where the evaluator
tends to look less hard for side-effects of the product. Looking in
the direction of announced effects develops a situation that limits potential assessment of a broad spectrum of actual effects, either negative or positive. Since evaluation for Scriven was primarily concerned
with assessing all actual effects, rather than only intended effects,
he proposed that evaluators neither read, nor be aware of the producer's
intentions or goals.
Philosophical Tenets
To Scriven, the producer's goals and intentions were potentially
contaminating. He emphasized that goals were necessary for project management and development, but not for project evaluation. As further
evidence for the argument against the evaluator needing to be aware of
goals, Scriven (1974, p. 37) described the problem of trying to sort
out alleged goals from actual goals. He pointed out that it was not
the evaluator's responsibility to become entangled in the problem of
sorting out goals. As mentioned previously, it was Scriven’s position
that the evaluator's responsibility was to serve as an enlightened
surrogate consumer concerned with the welfare of society as a whole.
Another problem is trying to understand goals that are so
vaguely stated that they could cover any positive or negative effect.
Scriven emphasized that it was not the evaluator's responsibility to
uncover what was really intended. Goals are usually stated in either grandiose or short-sighted terminology. Rather than report a project
fell short or overachieved, Scriven explained that it was more appropriate to assess what was actually achieved, and to give credit for performance rather than promise. It is assumed that this type of technique would help practitioners assess whether programs actually work.
The true strength of the GFE approach was claimed by Scriven to be
in reporting effects that had been previously overlooked. If an intended
effect is not strong enough to be detected by the goal-free evaluator,
then it is not important enough to be reported. According to him, the
goal-free approach lets the chips fall where they may, and actually
fosters discovery and invention by giving credit for good things that
were not stated as goals.
A major question, however, concerns what replaces goals as a
standard for assessing achievement. Stufflebeam (1972), Alkin (1972),
and Kneller (1972) pointed out that they doubted that the goal-free
approach was actually free of goals as the name suggested. Each discussed various ways that the criteria used in the goal-free technique could be considered as goals. Perhaps Scriven's approach would have
received less criticism on this point if he had labelled it goal-blind
rather than goal-free.
Responding to these critiques, Scriven (1974, p. 37) pointed out
that the evaluator was not free to substitute personal goals for those
intended by the project. He considered it an error to believe that
criteria have to be either the goals of the evaluator or the evaluee.
However, the criteria that were used may be (and Scriven pointed out
that they probably were) somebody's goals. Scriven suggested that the
GFE'r use criteria that were similar to those of the consumer, of the
funding agency, or interestingly enough, the goals of the producer if
they went through a validating and judging process by the evaluator
(more on this point follows).
Implied Procedure
Scriven (1973, 1974) mentioned that a primary design consideration
was that someone other than the goal-free evaluator screen initial information sent from the project or funding agency to delete goal-related
information. He also suggested that this intermediary accompany the
GFE'r on-site as a liaison or buffer between the project staff and the
evaluator. This individual is to act as a check against early exposure to prejudicial information while the evaluator is developing a "mind set" about effects. Again, what is essential is to work very hard in the initial stages of the evaluation process to keep prejudicial
information about what the treatment intends to do from the attention
of the evaluator.
The goal-free approach, according to Scriven (1974, p. 37) was
unaffected by a project changing goals during its developmental process.
It therefore did not present a rigid evaluation design that required
an unchanging treatment to assess actual project effects. Interestingly
enough, Scriven (1972, 1973, 1974, and personal correspondence) recommended that GFE can be used in conjunction with GBE, with different individuals each using a different technique (one goal-based and one goal-free evaluator). Another possibility is to employ one individual who starts goal-free. Then, as he is exposed to goal rhetoric, he is converted to a more goal-based operation. The techniques are not interchangeable in sequence, however. Once a goal-based approach has been implemented, it cannot be converted to a goal-free approach. On the other hand, once the initial stages of the evaluation process have operated with the goal-free approach to gain the benefits of establishing
an objective set of hypotheses about the object, the GFE approach can
be reversed to a GBE approach. Scriven (1974) reported that this combining of techniques develops an optimum situation where one can
"have his cake and eat it too." That is, both goal-based and goal-free
criteria can be used in the assessment of merit.
An additional difference that Scriven (1973, 1974) anticipated was
the degree of client, or project staff, anxiety produced by the two approaches. Scriven implied that the relationship between the goal-based
evaluator and the project staff is one of cozy cooperation. That is,
the evaluator and project management are relatively clear on the standards used and the variables to be considered. It can be anticipated that the project staff would be more than willing to discuss their intentions with the evaluator so that he would better "understand" their
products. However, as might be expected, the situation of an evaluator
who does not talk to the staff until some time late in the site visitation
and who may have even directed project materials, correspondence, and
the orientation session to a liaison person for review, is seen to
evoke anxiety on the part of the project management. This anxiety
supposedly results from not knowing what effects were being considered
and what standards were used to assess them.
Based on these published discussions of the GFE approach, it was
possible to summarize potential operational differences between it and
GBE.
The goal-based evaluator reviews any and all project documents to
establish the intended outcomes. This involves an orientation session
by project management when arriving on-site. Products and treatments
are reviewed for congruence with original specifications. The final
GBE evaluation report contains some side-effect information and may
even provide judgments of the producer's goals, but generally assesses
congruence between intent and achievement. Standards (intrinsic
criteria) are biased by exposure to the rhetoric of intention prior to
the review of treatment and products.
On the other hand, the goal-free evaluator reviews general materials that were screened for goal rhetoric prior to arrival on-site.
Information that is appropriate for the GFE'r to review includes
sample materials, observer descriptions of the process, nature of control groups, time constraints, results of quizzes, and so forth.
Scriven explained that the primary concern of this initial screening
strategy is to establish conditions where the evaluator initially
inferred a wide range of possible effects without being oriented to those
that were intended. This is a reflective stage of generating a set of
potential hypothetic-deductive or "if-then" statements about potential
elements of the treatment before observing the actual treatment.
At least two techniques could be employed either independently
or in combination to generate these hypotheses. In the
first the evaluator relies on professional judgment. This approach
presents problems of validity and objectivity, but it has the advantage
of exploiting professional expertise and using a large set of variables.
The second alternative — relying on an identification of the established
needs of the target population or consumer — has the advantage of assessing merit in terms of a project's service to people. But needs data often are not available and their collection requires substantial manpower,
time, and money. Assuming that the GFE'r can somehow generate
hypotheses, he next sets out to gather relevant data. After spending
a reasonable amount of time observing the object to be evaluated, the
GFE’r is supposed to present the initial hypotheses and observations to
the project staff. It is then appropriate to assess the importance of
each effect. According to Scriven this is done in a number of ways. Needs assessment data are called up or collected de novo. The project's
goals are referenced, or the evaluator gets a variety of experts and
lay persons to judge the effects. It may be the extreme case that the
producer is diligent in using the needs assessment when setting production goals, and all observed effects can be assessed by reference to the initial goals. It is possible in this case to use these validated and judged goals of the producer as criteria for the evaluation.
The point to be considered as crucial is not that some goals are
eventually accepted as criteria, but that the process used to arrive at
those criteria is as objective as possible.
Application
Three examples of actual goal-free evaluation had been reported at
the time of this writing. The first was accomplished in 1972 by Ernest
House and Donald Hogben at the Center for Instructional Research in
Curriculum Evaluation (CIRCE) at the University of Illinois. The
second was directed by Michael Scriven through a consulting firm
(Education and Development Group) at Berkeley in 1975. The third was
by Wayne Welch through the University of Minnesota in 1976. There were
similarities and differences between the three examples.
To report the similarities and differences, a logical framework
proposed by Stufflebeam (1974) for analyzing evaluation studies was
employed. This framework (An Administrative Checklist for Reviewing
Evaluation Plans) is extensive since it includes the following sections:
1. Conceptualization of Evaluation
2. Socio-Political Factors
3. Contractual/Legal Arrangements
4. The Technical Design
5. The Management Plan
6. Moral/Ethical/Utility Questions
Each of these six sections contains sub-questions that provide analytical detail of the studies.
Only the final report of each of the three studies was used as the
source to be analyzed with the checklist. Using the final report was a
limitation since each project was more involved than could be presented
in a final report to a client group. However, that document was the
easiest to obtain and it revealed much about the evaluation.
If there was little or no information to answer a sub-question
from the checklist, it was considered as not available. A comparison
of the three applications of goal-free evaluation follows. The narrative is divided by each of the six areas in the Stufflebeam Checklist. A summary table precedes each of the six narrative sections.
These tables follow the points within the Administrative Checklist.
Each table is presented as an advance organizer to similarities and
differences across the three applications.
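As an illustration of how such a checklist-based tabulation might be represented, the sketch below maps checklist sections to per-study entries; the section names and entries are abridged from the summary tables that follow, while the data structure and printing routine are assumptions made for the example, not part of Stufflebeam's checklist itself.

```python
# A sketch of the checklist-based tabulation; "N.A." marks
# sub-questions the final reports did not answer.
STUDIES = ["House/Hogben", "Scriven", "Welch"]

# Entries abridged from the summary tables in this chapter.
checklist = {
    "Conceptualization of Evaluation": {
        "Definition": ["N.A.", "implied", "N.A."],
        "Purpose": ["external review"] * 3,
        "Standards": ["N.A.", "meta-evaluator used", "N.A."],
    },
    "Socio-Political Factors": {
        "External Credibility": ["minimal bias checks",
                                 "extensive bias checks",
                                 "moderate bias checks"],
    },
}

def print_section(name: str) -> None:
    """Print one checklist section as an advance-organizer table."""
    print(name)
    print(" " * 22 + "".join(f"{s:24}" for s in STUDIES))
    for question, entries in checklist[name].items():
        print(f"{question:22}" + "".join(f"{e:24}" for e in entries))

print_section("Conceptualization of Evaluation")
```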
1. Conceptualization of Evaluation
              House/Hogben                Scriven                     Welch
Definition    N.A.                        implied                     N.A.
Purpose       external review             external review             external review
Questions     each different: see text    each different: see text    each different: see text
Audiences     varied audiences: see text  varied audiences: see text  varied audiences: see text
Process       implicitly GFE              implicitly GFE              implicitly GFE
Standards     N.A.                        meta-evaluator used         N.A.
Two of the three studies reported no definition of evaluation;
Scriven's report implies that evaluation means assessing the treatment
as it actually takes place. This lack of definition was a limitation
that made identifying the underlying similarities of intent difficult.
The three studies shared some similarities of purpose since each was an
external review of a project. Both Scriven's and Welch's work was planned
as a supplement to on-going internal evaluation activities.
Each study addressed different questions. House and Hogben were
concerned with What new perspectives about the curriculum can be seen?
and What would be a new emphasis for the internal evaluation efforts?
Scriven's efforts focused on the following questions:
Are the materials ready to be marketed?
What are suggestions for future developmental efforts?
What are the implications of this informal learning process?
How good are these materials? was the question that Welch’s study
addressed. It was possible to summarize the thrust of these
questions as Tell the client about their product from a perspective
that is fresh and outside the developmental process.
House and Hogben's main audience was the developers of a biology
curriculum (BSCS) for 13 to 15 year-old educable mentally retarded
persons. Scriven reported to three audiences. First, his main
audience was the Southwest Educational Development Laboratory. The
second audience was the National Institute of Education and the third
was prospective consumers of the materials. The internal evaluation
staff at St. Mary's Junior College was Welch's audience for his report.
The three agents doing each evaluation were referred to earlier: CIRCE, the Education and Development Group, and the University of Minnesota. The process that each used was referred to as goal-free in the documents. There was no explicit definition of the process each used; however, a description of the activities occurs at a later point in this discussion (under technical design). House and Hogben described no standards by which they judged their own work. Likewise, Welch provided
no description of external standards. However, Scriven reported that
Stufflebeam was judging his project in the role of a meta-evaluator.
2. Socio-political Factors
                        House/Hogben         Scriven                    Welch
Involvement             Public School Staff  Public School Staff        College Staff
Internal Communication  N.A.                 Used Liaison Staff Person  N.A.
Internal Credibility    N.A.                 Evaluee Reactions Prior    N.A.
                                             to Distribution
External Credibility    Minimal Bias Checks  Extensive Bias Checks      Moderate Bias Checks
Security                N.A.                 Anonymous Reporting        Anonymous Reporting
Protocol                N.A.                 Prearranged Visitations    N.A.
Public                  N.A.                 N.A.                       N.A.
House and Hogben's and Scriven's projects evaluated activities
that occurred in a public school classroom. Specifically, House and
Hogben involved the development staff, two classroom teachers, and
their students. No reference was made about how these two groups' support was to be obtained. Scriven's goal-free evaluation involved the
SEDL staff and the instructional/administrative staff at ten elementary
schools in California. There were several references to Scriven's
staff working with the school staffs to gain their involvement. Welch
reported limited contact and involvement with the St. Mary's staff
since his study reviewed materials that were isolated from a classroom.
Both Welch and House/Hogben reported no internal communication
process between the evaluators and the evaluees. Scriven related that the project manager of the evaluation study served as the key liaison across the participants. Also, House/Hogben and Welch reported no
steps to establish internal credibility for their evaluation work.
Scriven reported that the final report was sent to the evaluees for
their reactions prior to shipment to his main audience.
All three goal-free evaluations reported some means to establish
external credibility. House/Hogben's work was considered to be minimally bias free because two evaluators teamed the work, but only one
person synthesized the data for the final report. More extensive bias
checks were used by Scriven. He reported the use of a meta-evaluator,
an evaluator project manager to screen materials from the projects, a
replacement process for "contaminated" evaluators, a process for training and calibration of evaluators, and a standard checklist for use in
observation of the treatments. Scriven also described as additional
checks against bias, the use of multiple site visits across time and
the use of multiple site visitors (evaluators) per site. Welch reported
moderate bias checks. His evaluation project used a five member rating
panel with four of those members rating the materials independently.
The fifth member of the panel then aggregated the results of the other
four panelists. The five panelists also employed a standard rating instrument.
No procedure for security of the evaluation data was reported by
House/Hogben. Scriven and Welch reported some attempts at security.
Welch reported that all panelists' ratings of each product were reported anonymously to the client. On the other hand, Scriven claimed that the classroom teachers that were observed using the product were guaranteed absolute anonymity.
Neither House/Hogben nor Welch described attempts to maintain protocol while evaluating the products. However, Scriven reported that his staff checked in at the ten schools at prearranged times, rather than by surprise. No attempts at public relations with the media were
reported by the three evaluators.
3. Contractual/Legal Arrangements
                            House/Hogben              Scriven                   Welch
Client/Evaluator            Clearly Stated:           Clearly Stated:           Clearly Stated:
Relationship                See Text                  See Text                  See Text
Evaluation Products         Final Report              Final Reports             Final Report
Delivery Schedule           Varied                    Varied                    Varied
Editing                     Implied Evaluator's       Explicitly Evaluator's    Implied Evaluator's
                            Responsibility            Responsibility            Responsibility
Access to Data              See Text                  See Text                  See Text
Release Reports             N.A.                      N.A.                      N.A.
Responsibility & Authority  N.A.                      N.A.                      N.A.
Finances                    N.A.                      N.A.                      N.A.
The client/evaluator relationship was clearly defined in all three
studies. In the House/Hogben study, the BSCS staff were the developers;
House/Hogben were the external summative evaluators. Likewise, the
SEDL staff were the developers, and Scriven’s staff were the external
summative evaluators. The St. Mary's staff were the developers, and
Welch and staff were the external summative evaluators. All three
evaluations specified that their main evaluation product was the pro
duction of a final evaluation report. The House/Hogben report was due
after approximately ten days of consulting time. Scriven sent two
reports to SEDL after about five weeks' time. One was goal-free in
nature, and the second report was a revision after knowing the goals.
This second version was also sent directly to the teachers who were
evaluated with the products. Welch's report was due after approximately
two or three days consulting time. In the Welch and House/Hogben
studies the responsibilities for editing the final evaluation report
were assumed to be those of the evaluators. Scriven reported that
editing responsibilities were definitely those of the evaluator.
Each evaluation study described different means for access to project data. House and Hogben reviewed existing data from the developer and then collected new data at two field sites. Scriven reviewed
categories of developer's data (but not actual data) and then selected
some categories of data for further evaluation at the ten field sites.
Welch reviewed the materials independent of any developer's data.
Neither Welch, Scriven, nor House/Hogben specified any responsibilities for the release of the evaluation reports. All three were receiving funds from the developer; however, the amount and schedule were unreported in the final reports.
4. The Technical Design
                          House/Hogben           Scriven                Welch
Objectives & Variables    Varied: See Text       Varied: See Text       Varied: See Text
Investigatory Framework   Varied: See Text       Varied: See Text       Varied: See Text
Instrumentation           N.A.                   Checklist Approach     Checklist Approach
Sampling                  N.A.                   Not Used               Not Used
Data Gathering            See Text               See Text               See Text
Data Storage & Retrieval  N.A.                   See Text               See Text
Data Analysis             Professional Judgment  Independent Synthesis  Independent Synthesis
Reporting                 N.A.                   N.A.                   N.A.
Technical Adequacy        Fair                   Best                   Good
All three studies reported slightly different ways of considering
objectives and variables. In the House/Hogben evaluation study, the evaluators interviewed the project staff at a point late in the evaluation. The project staff were then asked the goal priorities of the
project: producing materials and getting them accepted by teachers.
In Scriven's goal-free study, the evaluators first wrote a goal-free
report. They then reviewed the developer's materials and found that
the developer's intentions were not what the evaluators had discovered
while reviewing the curriculum. Welch reported that no goals were
mentioned or reviewed by the evaluators. Both Scriven and Welch had
prespecified certain categories of variables by using a standard checklist.
In the Welch evaluation, the investigatory framework was described
as panel ratings of materials with a concurrent interview of the evaluees
by an intermediary. Scriven used a framework that included site reviews with observation and interviews. That framework can be summarized
in this way: observers used a standard instrument to describe exactly
what the treatment was at each site without any prior notions of what
the treatment was meant to be. The evaluators also used a standard
instrument to describe exactly what effects the treatment had with students, teachers, and the overall school without any prior notion of
goals. After completing the observations and submitting the goal-free
reports, Scriven's staff examined all goal-related materials for a
comparison to their previous observations about treatment effects.
Any revisions of the GFE report that were needed at this point were
appended, but the original content remained unchanged. In the House/
Hogben study, there was limited description of the investigatory framework: site reviews with observations and interviews.
Both Scriven and Welch used a specially developed checklist approach as instrumentation. House/Hogben reported no instrumentation
used. Again, the Welch and Scriven studies were similar because neither
used sampling. Both reviewed all sites and materials, respectively.
House/Hogben reported no information about sampling or about their
method of data storage and retrieval of evaluation information.
Scriven described his method of data storage as 150 to 200 reports in
a raw form with much narrative in an original state. Welch's data for the evaluation were individual packets of materials that received alphabetic ratings from A+ to D- from each of the four judges.
Each of the three studies presented different methods of data
analysis. House/Hogben used a professional judgment approach through
a narrative. In Scriven's study, raw reports were synthesized once by
Scriven and another time independently by the project manager. The
two independent syntheses were then merged by the evaluation project
staff. In Welch's study, the alphabetic ratings were converted to
numbers and averaged by the fifth judge.
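A minimal sketch of this conversion-and-averaging step follows; the numeric values assigned to the letter grades are an assumption, since the report did not specify the scale.

```python
# Convert alphabetic ratings (A+ down to D-) to numbers and average
# them, as in Welch's study; the 12-to-1 scale is an assumed mapping.
GRADES = ["A+", "A", "A-", "B+", "B", "B-",
          "C+", "C", "C-", "D+", "D", "D-"]
GRADE_POINTS = {g: len(GRADES) - i for i, g in enumerate(GRADES)}

def average_rating(judge_grades: list[str]) -> float:
    """Average the four judges' letter grades for one packet."""
    return sum(GRADE_POINTS[g] for g in judge_grades) / len(judge_grades)

# One packet of materials rated by the four judges:
print(average_rating(["A", "A-", "B+", "A"]))  # 10.25 on the assumed scale
```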
All three studies similarly reported no information about reporting techniques for their summative information. Each provided their
clients with a physical report. However, Scriven provided a copy of
his goal-free report to the evaluees, the classroom teachers.
Regarding the technical adequacy of the evaluation studies,
Scriven's was the soundest of the three. He provided good interjudge reliability across site reviewers since they had been trained to use a standard checklist. Good validity was provided by Scriven's use of several content specialists, of multiple site visits across time, and
of mixing the observers across sites. He presented a good chance to
maximize objectivity through use of a meta-evaluator to critique the
study as it evolved, and through a replacement procedure for evaluators
who became "contaminated" by learning too much about the goals.
Welch's study was the next best in terms of technical adequacy.
There was good interjudge agreement on ratings of each package although
the grades were consistently skewed towards high marks. There was uncertain validity in the study since the materials were reviewed in an
isolated situation without any users and no content specialist was
among the judges. There was moderately good objectivity since each
judge's rating was independent from the others.
House and Hogben's study was the least technically adequate of the
three. There was questionable reliability in this study since no interjudge calibration was pursued between the two evaluators. Validity
was open to question since the evaluation was a one-shot observation
and interview process. The study had questionable objectivity since no bias checks were employed other than the evaluators' externality to the developers.
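To make the interjudge reliability issue concrete, the sketch below computes simple percent agreement and Cohen's kappa for two raters; the rating categories and values are invented, since none of the three reports presented data in this form.

```python
# Percent agreement and chance-corrected agreement (Cohen's kappa)
# between two raters; the categories and ratings are invented.
from collections import Counter

def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["good", "good", "fair", "poor", "good", "fair"]
b = ["good", "fair", "fair", "poor", "good", "good"]
print(sum(x == y for x, y in zip(a, b)) / len(a))  # about 0.67 agreement
print(cohen_kappa(a, b))                           # about 0.45
```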
5. The Management Plan
                            House/Hogben         Scriven               Welch
Organizational Mechanism    External Contracts   External Contracts    External Contracts
Organizational Location     Champaign, Illinois  Berkeley, California  Minneapolis, Minnesota
Policies & Procedures       N.A.                 N.A.                  N.A.
Staff                       Two                  Seven                 Five
Facilities                  N.A.                 N.A.                  N.A.
Data Gathering Schedules    N.A.                 Five Weeks            N.A.
Reporting Schedule          N.A.                 N.A.                  N.A.
Training                    N.A.                 Implied               Implied
Installation of Evaluation  N.A.                 N.A.                  N.A.
Budget                      N.A.                 N.A.                  N.A.
In reference to the organizational mechanism, all three evaluations
were contracts with an external agency. As mentioned previously,
House/Hogben, Scriven, and Welch were respectively located in Champaign-Urbana,
Illinois; Berkeley, California; and Minneapolis, Minnesota. There was
no information available in the final evaluation reports about any management policies and procedures that influenced the three studies.
Concerning staffing of the three studies, House/Hogben worked as a
pair of evaluators but they had received counsel about the goal-free
process from the CIRCE staff and Scriven. The Scriven study was staffed
by Scriven, a project manager, a kindergarten to third grade educator,
an early childhood consultant, two general site visitors and a meta
evaluator. Welch and four evaluation students did the work in the Welch
study.
There was no information available about the facilities used in the three studies. House/Hogben and Welch provided no information about their data gathering schedules. On the other hand, Scriven reported that his study gathered data for approximately five weeks by classroom observations and interviews. House/Hogben and Welch provided little information in their final reports about reporting schedules during their studies. Therefore, it was assumed that the final report was the main report. Scriven similarly described the final report as the main scheduled report; however, he also presented a packet of correspondence with the meta-evaluator.
Neither Welch nor House and Hogben described any evaluation staff training even though Welch referred to use of a checklist. Scriven trained his evaluation staff to use both the goal-free approach and the checklist approach. There was no information across the three studies about the budgets used. Cost figures would have been an important comparison.
6. Moral/Ethical/Utility Questions

                          House/Hogben    Scriven            Welch
Philosophical Stance      Unclear         Unclear            Unclear
Service Orientation       N.A.            Implied Consumer   N.A.
Evaluator's Values        Unclear         Unclear            Unclear
Judgments                 See Text        See Text           See Text
Objectivity               Unclear         Unclear            Unclear
Prospects for Utility     N.A.            N.A.               N.A.
Cost/Effectiveness        N.A.            N.A.               N.A.
There were no specific descriptions of the philosophical stance of the three. Even though all three were in the position of judging the objects that they were reviewing, it was unclear whether the three operated from a value-based position. Based on Scriven's report and his previous work in the goal-free area, it was assumed that his study employed consumer-based values. Carrying the values issue further, it was uniformly unclear across the three reports whether or not there was potential conflict between the evaluator's values and the values of other parties involved in the studies. Judgments to be made in each study were as follows: House/Hogben - evaluators judge the program; Scriven - staff judges the program; Welch - panel judges the materials. Objectivity of the three studies, in terms of cooptation, was also considered. It was unclear whether, or when, the three were coopted by their clients. However, there has been some discussion about
checks against cooptation in previous sections (two and four): Scriven's study had the best provisions of the three. The three studies presented no information about the final report's utility to the client or the evaluees. This would have been an interesting comparison point since, as already stated, no previous research or evaluation study provided an evaluee's or client's rating of evaluation reports (which is one purpose of this study being reported). Similarly, there was no information about the cost/effectiveness of the three studies: another useful comparison point.
These three examples of applying the goal-free technique differed
in objects evaluated, length of time, size of staff, background of
staff, and reporting techniques, even though all were considered
to be goal-free. These differences in procedures were to be expected.
As pointed out earlier, the technique was evolving and procedural
differences were normal evolutionary phenomena.
Some highlights did occur as similarities across applications. For example, the cases involving Scriven and Welch used a type of checklist so that observers/judges reviewed effects in similar dimensions. Both examples included checks for objectivity and a key staff member as a screening agent to review potentially biasing materials and situations. Scriven's objectivity checks were more extensive, but both attempted to maximize and protect the objectivity that was assumed unique to the goal-free technique. Scriven's example provided other highlights to applying his approach: training of observers to use a standard protocol, releasing the final report to the evaluees before delivery to the main client, appending reactions to
goal-related materials after writing the goal-free reports, a contingency plan for replacing observers who became goal-oriented before ending observation of the treatment effects, a data collection schedule that repeated observations and rotated observers, and interviews with key consumers. Both House/Hogben and Scriven reached a point in their evaluation process where they reviewed the developer's goals and then cross-checked that information with their observed data.
Some of these points were incorporated into the investigation being reported here and will be further discussed in the next chapter about methodology.
CHAPTER 3
METHODOLOGY
The first of the two purposes of this study was developmental:
to develop materials and procedures to implement an evaluation using
either a goal-free or goal-based technique. The second purpose was
exploratory: to investigate the relative efficacy or utility of these
goal-free and goal-based techniques for evaluation through a field
exploration of the evaluator/evaluee relationship. The methods used
to accomplish each purpose are presented in this chapter, starting
with the developmental purpose.
A general review of evaluation was done to identify studies in
which GFE and GBE were implemented. As could be expected with a new
and evolving technique, few operational examples of the goal-free
technique were found. However, the illustrations by House & Hogben,
Scriven, and Welch gave suggestions for an activity sequence. Scriven
also gave a personal critique of the original plan for this study while
consulting at the Evaluation Center in 1974. His primary suggestion
was that if the study involved individuals with various backgrounds
and skills then a "checklist approach" should be considered so that
some calibration across evaluators would be possible. A decision was
made in this study to use a checklist approach to implement both GFE
and GBE.
A checklist approach was used in both the Welch and Scriven
studies that were reported in the previous chapter. However, at the
time that this dissertation study was being implemented there were few
instances in which a checklist was used in GBE studies. Therefore,
this investigator devised a checklist for both GFE and GBE.
The previous review about GFE reported that there were certain
things that a goal-free evaluator would do differently, and that
there were certain pieces of project data that the GFE'r needed to
avoid so that objectivity would be heightened while reviewing the
project. Scriven's writings alluded to several sources of project
information that were to be treated carefully, while doing a goal-
free evaluation.because of potential biases. Those pieces of infor
mation were broken down by pre-site visitation, and on-site sources.
A listing of these two types of information is included in Table 1 in the far left-hand column. This left-hand column provides a general framework of information sources that both the goal-free and goal-based evaluator would use. This list of information sources
enabled the investigator to analyze the two approaches to determine
which sources could and could not be used by GFE and GBE evaluators.
Within the two general evaluation approaches there were three
sub-categories. Each is presented in the following table as a
separate column. The columns labeled "Theory" reflect both the
GFE and GBE literature and its considerations of these information
sources. That is, did the writings on the two approaches allow use
of these sources? The remaining four columns present two existing
checklists and the two modified checklists that would be used in this
study.
TABLE 1

Sources of Information and Their Use with Goal-based and Goal-free Evaluation

Columns, left to right:
GOAL-BASED: (1) Theory, according to Tyler and Stufflebeam; (2) Stufflebeam Checklist; (3) Modified Stufflebeam Checklist.
GOAL-FREE: (4) Theory, according to Scriven; (5) Scriven Checklist; (6) Modified Scriven Checklist.

Key: Y = yes; N = no; pY = probably yes; pN = probably no; Y* = yes, critical; S = only if screened; R = restricted and screened.

Sources                                                (1)  (2)  (3)  (4)  (5)  (6)

I. PRE-SITE
A. Initial contacts (examples: telephone calls,
   letters, face-to-face conversation)                  Y    Y    Y    S    S    S
B. Parts of the project, or program, proposal
   1. overview of the problem                           Y    Y    Y    Y    Y    Y
   2. needs assessment data                             Y    Y    Y    Y*   Y*   Y*
   3. goals/objectives                                  Y    Y*   Y*   N    N    N
   4. proposed strategies                               Y    Y    Y    N    N    N
   5. proposed activity plan(s)                         Y    Y    Y    N    N    N
   6. proposed staffing plan                            Y    Y    Y    pN   pN   N
   7. proposed budget                                   pY   Y    Y    pY   pY   Y
C. Target group/evaluator interactions
   1. check target group needs                          pY   Y    Y    R    R    R
   2. check target group treatment effects              pY   Y    Y    R    R    R
D. Representative project materials
   1. curricular - study guides, text materials,
      tests                                             Y    Y    Y    R    R    R
   2. non-curricular - environmental or
      experiential or "gestalt"                         pY   pY   Y    R    R    R
E. Process observation of treatment                     Y    Y    Y    R    R    R
F. Internal evaluation data (examples: data about
   cognitive, affective, and psychomotor effects
   like test results, report cards, graded papers,
   and student self-assessment)                         Y    Y    Y    R    R    R
G. Historical or archival
   1. minutes of staff meetings                         pY   pY   Y    Y    Y    Y
   2. budget status reports                             pN   pY   Y    Y    Y    Y
   3. internal staff correspondence                     pN   pY   Y    Y    Y    Y
   4. correspondence between project and
      funding agent                                     pY   pY   Y    S    S    S
   5. miscellaneous progress reports                    pY   pY   Y    S    S    S
H. Overview of research/literature in area of
   investigation                                        N    pN   N    Y    pN   N

II. ON-SITE
A. Staff/evaluator interactions
   1. staff introductions to the project                Y    Y    Y    R    R    R
   2. staff "PR" tours                                  Y    pN   Y    N    N    N
   3. final debriefings                                 Y    Y    Y    Y    Y    Y
   4. data about long and short-term effects
      or benefits                                       Y    Y    Y    R    R    R

The Scriven Checklist was his 1974 version of a "Checklist for
Evaluation of Products, Producers, and Proposals." The Modified
Scriven Checklist represents a version of this 1974 checklist that
was adapted to clarify procedures. The Stufflebeam Checklist was developed by Stufflebeam and employed in earlier studies at the Evaluation Center. Again, the Modified Stufflebeam Checklist was this author's adaptation of the Stufflebeam version so that it more clearly emphasized the goal-based assumptions.
It should be pointed out that the Stufflebeam Checklist did not conform to all of Scriven's points found in his earlier definition of goal-based evaluation. That is, it was reported earlier that "a goal-based evaluation does not question the merit of goals; often does not look at cost effectiveness; often fails to search for or locate the appropriate critical competitors; often does not search for side-effects; in short does not include a number of important and necessary components of an evaluation." The Stufflebeam Checklist did include the above points.
However, Scriven's definition went on to say that "even if it does include these components, they are referenced to the program (or personal) goals and hence run into serious problems such as identifying these goals, handling inconsistencies in them and changes in them over time, dealing with shortfall and overrun results, and avoiding the perceptual bias of knowing about them." It was unclear how the Stufflebeam Checklist dealt with these points. Even though the Stufflebeam version did not conform to Scriven's conception of GBE, for the sake of comparison it was modified to reflect a goal-based approach.
Table 1 shows areas where the two techniques were different in
terms of using certain information sources. As one reads down the GFE
column, the terms "screened" and "restricted" appeared frequently.
Screened refers back to earlier discussions about an individual who assists the goal-free evaluator during early stages of the evaluation, both in terms of editing materials and serving as a liaison to the project staff. This person serves as a critical buffer between the evaluator and sources of bias while the GFE'r is trying to employ strategies of discovery and investigation to uncover actual effects.
Restricted has more than one meaning. One meaning is that the source is used only in isolation from the project staff. The GFE'r does not observe any materials or activities wherein the project staff might provide them with cues. Any critical explanation comes from the GFE censor who works with the evaluator as an editor. Another meaning is that the source simply must be "off-grounds" to the goal-free evaluator. For example, the staff introductions to the project that are often filled with public relations rhetoric are avoided at all costs by the GFE'r, but might be reviewed by the GFE censor. This strategy maintains critical independence for the evaluator but allows screened information to go from the censor to the evaluator. Using the censor as a shield also allows the public relations activities to occur, and lessens the negative reactions that might arise from simply not talking to the staff, as Scriven suggested.
Other points for discussion exist in this table. For example,
points II Al, 2, and 3 show that in the final stages of the evaluation
unrestricted evaluator and staff interactions are necessary so that
the hypotheses about the actual effects may be discussed with the staff. A similar situation of final debriefing occurred in the evaluation done by Scriven. That is, Scriven's staff reviewed all the developer's goal materials after deciding what actual effects existed at the sites. They then debriefed themselves and reported to the developer about the mismatches between intended and actual effects. Another point is that most historical or archival materials were good sources, whereas most parts of the project proposal were bad sources. Why the proposal was a bad source of information has been covered earlier. The assumption underlying the goodness of the archival sources is that as staff meetings occur and monies are disbursed, these phenomena reflect actualities rather than intentions. Hence, they become good sources for goal-free evaluations.
With these structural, theoretical, and procedural differences in
mind, two evaluation handbooks were developed. "The Handbook for
Evaluators, First Edition" was developed (see Appendix C) for the goal-
free approach. With its development the Product Evaluation Checklist
was revised to add clarification and illustration. This first edition
had narrative sections that explained the setting of the evaluation con
tract, and provided a conceptual overview of the evaluator's role.
Individuals interested in detailed content are referred to Appendix C.
Similarly, the goal-based approach was developed through a checklist so that parallel structures existed. This effort can be found in
Appendix D. Prior to use, both handbooks were reviewed by four staff
members at the Evaluation Center for logic of presentation and overall
utility. Revisions were made after this content review process.
Another factor in development of the handbooks and the overall
techniques was that evidence should exist about the degree of implementation of the two evaluation approaches. That is, there should be
reasonable evidence that the two approaches were operationalized and
executed according to plan. Lack of such reasonable evidence would
make any analysis of data or their interpretation meaningless. Two
indicators were developed. One indicator was an activity log that
described sources of information/interactions and the amount of time
spent with each. The other indicator was a process rating form to be
filled out by each evaluator after leaving each site. Review of these
indicators allowed the degree of implementation of the two approaches,
i.e., the independent variable, to be ascertained.
This ends the discussion of the first purpose of this study: to develop materials and procedures to do both goal-free and goal-based evaluation. The process of development has been described. Appendices C and D contain the actual products of development, along with specific on-site procedures. These materials and procedures responded to the first investigatory question: what would be the nature of materials and procedures that are developed to do goal-free and goal-based evaluations?
The second purpose of this study, to investigate the relative
efficacy or utility of the goal-based or goal-free techniques for
evaluation through a field exploration of the evaluator/evaluee
relationship, is discussed in the following section.
Subject Selection and Assignment
Evaluators for each technique were chosen so as to avoid a selection bias in the investigation. To insure a rigorous and equitable test of the two techniques, several evaluators were selected from a larger population. To develop a population of competent individuals, a nationally-recognized group of evaluators and contributors to evaluation theory were asked for their recommendations. Although not inclusive of all possible contributors to evaluation theory, the following persons were asked to recommend individuals:
(1) Marvin Alkin
(2) Benjamin Bloom
(3) Henry M. Brickell
(4) Mary Anne Bunda
(5) Lee Cronbach
(6) Robert Ebel
(7) Walter Foley
(8) Gene Glass
(9) Egon Guba
(10) Robert Hammond
(11) Thomas Hastings
(12) Ernest House
(13) Richard Jaeger
(14) David Krathwohl
(15) Leslie McLean
(16) Howard Merriman
(17) James Popham
(18) Malcolm Provus
(19) Michael Scriven
(20) Robert Stake
(21) Julian Stanley
(22) Daniel Stufflebeam
(23) Ralph Tyler
(24) Wayne Welch
(25) Blaine Worthen
As can be seen in Appendix A, they were sent a letter asking for
one or two recommendations given these criteria:
1. Currently practicing evaluation as either a graduate student or practitioner in the field.
2. Able to commit approximately six days to the task. This included a one-day orientation session at the Evaluation Center.
3. Has a proven ability to operate as an independent evaluator.
4. Writes with an insightful, unlabored style.
5. Located within a radius of approximately 600 miles from Kalamazoo, Michigan (to reduce travel costs).
Thirty-one individuals who were believed to meet these criteria were
recommended.
Individuals from the population of thirty-one were randomly selected by use of a random numbers table and concurrently assigned to either the goal-free or goal-based treatment group through a coin flip. That is, given a list of thirty-one evaluators, the individual whose number was chosen first from the random numbers table was assigned to a treatment group by a coin flip. This routine was repeated until all thirty-one were assigned to groups.
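The selection-and-assignment routine can be sketched in code as follows, assuming Python's standard library in place of the printed random numbers table and the physical coin flip; candidate names are illustrative placeholders, not the actual recommendations:

    import random

    # The thirty-one recommended individuals (placeholder names).
    recommended = ["candidate_%02d" % i for i in range(1, 32)]

    # A random selection order stands in for the random numbers table.
    order = random.sample(recommended, k=len(recommended))

    goal_free, goal_based = [], []
    for person in order:
        # A simulated coin flip assigns each person, as selected,
        # to one of the two treatment groups.
        if random.random() < 0.5:
            goal_free.append(person)
        else:
            goal_based.append(person)

    # Subjects were then contacted in the order selected until three
    # were available per group (simplified here as taking the first three).
    gfe_evaluators = goal_free[:3]
    gbe_evaluators = goal_based[:3]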
On completion of random selection and random assignment, there
were sixteen subjects in the goal-free group and fifteen subjects in
the goal-based group. For economical and logistical reasons, three evaluators were desired for each treatment group. However, not knowing the actual availability of individuals that had been recommended, oversampling was done by selecting and assigning all subjects to treatments. Subjects were then contacted in the order selected, until three were available for each group.
Evaluators were contacted and told that within a contract being
implemented by the Evaluation Center at Western Michigan University
there was a need to recruit evaluation specialists to do site-review
work. They were also told that a small group of individuals would be
involved, that some training and orientation would be provided, that
two sites were involved for each individual, that individuals would be expected to spend approximately two days at each site, and that travel expenses, per diem, and an honorarium would be provided.
To give an estimate of the recruitability of those who accepted
the assignment, four out of sixteen subjects in the goal-free group
were contacted to obtain three evaluators. Six out of fifteen were
contacted in the goal-based group to obtain three evaluators. One subject in the goal-based group resigned from the assignment after accepting it due to an overcommitment of time. This subject was replaced by going back to the original list of subjects for the group and recruiting the next individual listed. Sites to be reviewed were similarly assigned at random to the subjects to control for differences that existed across sites that could inadvertently bias the results. Site assignment utilized a lottery technique.
It is important to note that subjects were not told that there
were two groups involved or that a comparative study of evaluation
methodologies was occurring. Subject ignorance of the overall study
within a "routine consulting assignment" was maintained until a tele
phone debriefing session that occurred after all evaluation activities
were completed. No agreements were made with the evaluators regarding
confidentiality of data during the study, since they did not realize
the study was occurring. However, it was agreed during debriefing
that names of evaluators would not be linked to ratings of on-site
process or evaluation reports, and that confidentiality would be pro
vided through composite information.
Without specifying individuals by groups, here is a listing of evaluators who participated along with their organizational affiliation at the time of the study:

(1) Evelyn Brzezinski, Michigan State Department of Education
(2) Donald Coan, Indiana University
(3) Stephen Kemmis, University of Illinois
(4) Richard Smock, University of Illinois
(5) Jane Stoller, University of Minnesota
(6) Jerry Walker, Ohio State University
Training
The GBE group was sent all available materials (i.e., proposals, progress reports, survey results) for each project to be visited. Individuals were given background information about the two sites they were to visit, and plans for travel arrangements. The GFE group received an edited version of project-related materials. For example, some elements of the proposals were not sent because they included intentions and goals, whereas some material from project-written progress reports and a survey was sent as admissible material if it pertained to a historical description of actual project achievement. This author served as facilitator or editor for the two groups so that they received information that was either goal-based or goal-free.
A one-day orientation session for each group was developed and implemented. In general, it was reported during the orientation sessions
that the handbooks (Appendices C and D) were useful and understandable.
Subjects worked through possible uses of the checklist approach by a
group discussion of materials found in the handbook. Discussions
focused on operationalization of the checklist points in the handbook,
and potential strategies to be used on-site with various individuals.
Finally, subjects were asked to provide general background data on
themselves on the form provided in Appendix B. It was decided that
only a few unobtrusive questions would be posed to the subjects (like
"list prior evaluation experience") so that they would have less reason
to feel that the request was unusual.
Self-reported background data were categorized and quantified to
provide the following group summary.
Table 2

Composite Background of Subjects by Groups

                         Background Variables
Group        n    Sex         Previous Experience(a)   Highest Degree

Goal-free    3    1 Female    1 Extensive              1 Doctorate
                  2 Male      2 Moderate               2 Masters
                              0 Little

Goal-based   3    1 Female    1 Extensive              2 Doctorate
                  2 Male      1 Moderate               1 Masters
                              1 Little

(a) The background variable of previous experience in evaluation was quantified by examining each subject's self-reported evaluation experiences on the background data sheet and scored as follows: "Extensive experience" was equal to experiences in evaluation covering historically dated periods of time greater than three years; "Moderate experience" was equal to experiences covering periods of at least one to three years; "Little experience" was equal to experiences covering periods of less than one year.
It can be seen that the goal-free group had slightly more previous evaluation experience; however, the goal-based group had more academic experience at the doctoral level. It was assumed that the groups were not different to the degree that either was biased, or that one had a clear advantage over the other.
Instrument Development
Three instruments were developed to collect data for this study.
Two were relatively short and of a Likert form, whereas one was longer and of a semantic differential form.
Both shorter instruments were used to measure several potential
elements of the on-site evaluation process. Items were developed to
assess the following process dimensions as was discussed earlier in
the section about development of the evaluation materials:
(1) Evaluator/Project Director Rapport
(2) Evaluator's Time Utilization
(3) Evaluator/Project Director Expectations of Each Other
(4) Evaluator/Project Director Overall Satisfaction
(5) Evaluator Confidence of Ability with Methodology
Although other elements of the on-site evaluation process could be identified, it was assumed that ratings of these dimensions would present a general assessment of the evaluation process. It was also assumed that an average score across items would yield a meaningful general indicator of the quality of the on-site process. Evaluators responded to items for all five dimensions, whereas evaluees only responded to items for the first four elements. In either case, instrument length was considered so that obtrusiveness could be minimized.
The third instrument was a semantic differential and went through a more technical development process. The Phi Delta Kappa Committee (1971, pp. 27-30) and Stufflebeam (1974, pp. 5-11) had reported, at the time of this study, three general categories of criteria that prescribe necessary and sufficient attributes of evaluative information. (It should be mentioned that standards for evaluations have been developed and are being published during 1980.) The categories are technical adequacy, utility, and prudence. These three general categories of criteria and the eleven specific sub-criteria follow:
General Criteria:

  I. Technical Adequacy     II. Utility          III. Prudence

Specific Criteria:

  A. Reliability            A. Relevance         A. Cost-Effectiveness
  B. Internal Validity      B. Scope
  C. External Validity      C. Importance
  D. Objectivity            D. Timeliness
                            E. Credibility
                            F. Pervasiveness
Summarizing for each of the three categories of criteria, it could be said that they are focused on (1) the technical soundness of information, (2) the usefulness of information to some audience, and (3) the reasonableness of obtaining the information. These elements were transformed to bi-polar adjectives for evaluee use in rating the evaluation reports. Since there was no evidence that these elements of utility had been applied in this form as a semantic differential, a pilot test of a larger item pool was reviewed to detect any other potentially useable bi-polars. Before the pilot study is described, some aspects of the semantic differential mode of measurement are considered.
Osgood (1957) presented a major criterion for inclusion of bi-polars on a semantic differential instrument: relevance to the concept being judged. Written evaluation reports met the assumptions
presented by Osgood for selection of a concept to be rated with the
semantic differential. Osgood suggested that what may function as a
concept (or a stimulus to which the subject's checking operation is
a terminal response) in this broad sense is practically infinite ...
more often printed than spoken ... and the type selected depends
chiefly upon the interests of the investigator.
Since earlier studies (Tannenbaum, 1953, 1955) established the validity of the semantic differential for use in measuring communication effects, this type of instrument was assumed valid for measuring the utility of evaluation reports by the evaluees. A mean score on
the instrument represented a general measure of the evaluee's attitude
towards the overall utility of the report. The higher the mean score,
the higher the utility of the information.
Through a factor analysis study that Osgood undertook using Roget's Thesaurus, a comprehensive pool of potential bi-polar items was identified. Since some of these items appeared through logical analysis to be relevant to the rating of evaluation report utility, they were added to the pilot instrument item pool. This produced a pilot version of the instrument with 58 bi-polars on a seven-point scale. The sequential position of items and the direction of the negative/positive adjectives were randomly assigned. This was done to reduce the chances of a response set developing as evaluees did their ratings.
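The randomization of item order and adjective direction, with reverse-scoring of flipped items at analysis time, can be sketched in Python; the bi-polar pairs shown are hypothetical examples, not items from the actual pool:

    import random

    # Hypothetical bi-polar adjective pairs (positive pole listed first).
    items = [("useful", "useless"), ("clear", "vague"),
             ("timely", "late"), ("credible", "dubious")]

    random.shuffle(items)  # random sequential position of items

    # Randomly decide, per item, whether the negative pole prints first.
    flipped = [random.random() < 0.5 for _ in items]

    def score(raw_rating, is_flipped, points=7):
        # Express every rating on the positive pole; reverse-score
        # items whose printed direction was flipped.
        return (points + 1) - raw_rating if is_flipped else raw_rating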
An estimate of the reliability of the semantic differential was computed. If the instrument could not produce reasonably consistent ratings of the evaluation reports, there would be difficulty in interpreting the information due to errors of measurement. Reliability in this study was considered as stability over time. Therefore, a test-retest reliability study was implemented prior to using the instrument with the evaluees.
Since it would have been difficult to collect ratings twice from the evaluees, another group was used for the reliability study. Permission was granted to use students in three graduate level courses in the Western Michigan University College of Education. Since each class contained approximately thirty students, it was decided to use one of the three classes for ease of administering instruments. The class was selected randomly using a lottery method.
The chosen course was composed of both masters and doctoral level education majors, and presented introductory methods of data analysis. The instructor was given background information about the nature of the reliability study, and understood that students could not be informed that the retest was over duplicate materials. Individuals within the class were given a group introduction to the task. They were told that instrumentation was being developed for rating evaluation reports and that individuals were needed to try out the instrument so that strengths and weaknesses could be detected.
As can be seen in Appendix F, subjects of the test/retest study were provided with an actual, yet anonymous, evaluation report produced by one of the evaluators: all subjects rated the same report. Subjects had no knowledge of the true nature of the reported project. However, this was a condition for all the subjects. Therefore, this condition should have influenced them all equally. Directions also gave conditions to subjects that established a simulated situation as a project director. Further instructions to the subjects can be found in Appendix F. Individuals who participated were volunteers who were paid for their time after the data from the retest were collected. Twenty-two individuals participated in the reliability study.
A Pearson correlation coefficient was calculated by correlating all individual mean scores on the first administration with mean scores on the second administration. Test-retest reliability, based on one week's spacing, for the pilot instrument was r = .64.
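The computation amounts to a Pearson correlation between each rater's mean score on the two administrations; a minimal sketch in Python with NumPy follows, where the two vectors hold illustrative values rather than the study's data:

    import numpy as np

    # Mean semantic differential score per rater, one week apart.
    first  = np.array([5.2, 4.8, 6.1, 3.9, 5.5, 4.4])  # administration 1
    second = np.array([5.0, 5.1, 5.8, 4.4, 5.2, 4.1])  # administration 2

    r = np.corrcoef(first, second)[0, 1]  # Pearson r
    print("test-retest r = %.2f" % r)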
Osgood (1957) reported a test-retest reliability study with a coefficient of .85. This was the highest reliability coefficient reported. He reported that reliability of ratings declined over time, due to changes in the rater, changes in the concept being rated, or general errors of measurement, but did not specify the amount or degree. Therefore, it was assumed that some of the difference between r = 1 (a theoretical ideal) and r = .64 was due to change over time either in the subject, or error.
The pilot version of the semantic differential was refined through logical analysis of item distributions in the form of histograms with a percentage of subjects at each scale point. Since the midpoint on a seven-point semantic differential scale was considered to be neutral, use of this midpoint rating could be assumed to restrict the overall amount of total score variability. Reducing the amount of total score variability reduces the size of any reliability coefficient calculated on those total scores. Therefore, in order to increase the reliability of the pilot version of the instrument, items that had 50% or more of the ratings at the midpoint of the distribution were eliminated. This removed nine items from the pilot version to produce a final version of the instrument with 49 items. Although the final instrument was not retested after revision, it was assumed to have adequate reliability for the purposes of this investigation even though a reduction in the total number of items may have reduced the reliability.
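The screening rule is simple to express in code; a minimal sketch in Python with NumPy follows, using a randomly generated response matrix as a stand-in for the 22 raters' pilot data:

    import numpy as np

    rng = np.random.default_rng(0)
    # Placeholder pilot data: 22 raters x 58 items on a 1-7 scale.
    responses = rng.integers(1, 8, size=(22, 58))

    # Share of ratings at the neutral midpoint (4) for each item.
    midpoint_share = (responses == 4).mean(axis=0)

    # Drop items with 50% or more of their ratings at the midpoint.
    kept = responses[:, midpoint_share < 0.50]
    print("retained %d of %d items" % (kept.shape[1], responses.shape[1]))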
The final form of the instrument can be found in Appendix G. Instructions for the project directors were sent in a letter along with the evaluation report prepared for that particular site. It was assumed that project directors were neither aware of the background of the study, nor of any attempt to present differing evaluation methodologies at different sites. Project directors were informed that their ratings were providing feedback concerning their perceptions of the report's merit for reviewing our procedures for future studies.
Data Collection Procedures
Background data on the evaluators were collected during the orientation sessions. Evaluator and project director ratings of the on-site evaluation process were collected prior to evaluee ratings of the evaluation reports. Process rating forms can be found in Appendices D and E.
After evaluators had been to a site, they sent a tape-recorded rough draft of their report to the Center for transcription, accompanied by the appropriate version of the process rating form. All evaluators returned process ratings for each site. Immediately after a subject had left the project site, the project director was asked to rate the
overall process used on-site. All project directors returned the evaluation process rating forms.
As soon as the evaluation report was reviewed by the evaluator, it was retyped and sent to the evaluee along with the semantic differential report rating form. Project directors were told that their ratings of the evaluation report were desired for future planning purposes. All evaluees returned the rating form within two weeks of receiving it.
After all reports were sent to the project directors, and their ratings collected, the evaluators were debriefed through a telephone interview. No one reported prior knowledge of a comparative study between methodologies. Therefore, it was assumed that no interaction between groups had taken place that might have biased the study.
Data Analysis Procedures
Throughout this study, data were collected to achieve the purpose of investigating the relative utility of goal-based and goal-free techniques for project evaluation through an analysis of (1) evaluator and evaluee ratings of the on-site evaluation process and (2) evaluee ratings of evaluation reports generated by evaluators using either approach. Three investigatory questions were operationalized. Each question follows with the analysis procedure that was employed.
Question: When the materials and procedures of these two approaches are field tested, will the evaluators rate the evaluation process differently depending on which approach they are using?
Data from each item on the evaluator process rating form were totalled and averaged across items to yield a score that had a range of one to seven on a Likert-type scale. Since each evaluator rated two sites, it was assumed that scores for site one and site two were not orthogonal or independent. That is, data from each evaluator were correlated.
The selected statistical technique was a repeated measures or split-plot factorial (SPF 2.2) experimental design (Kirk, 1968). The mixed linear model for that approach is as follows:

X_{ijm} = \mu + \alpha_i + \pi_{m(i)} + \beta_j + \alpha\beta_{ij} + \beta\pi_{jm(i)} + \epsilon_{o(ijm)}

where X_{ijm} was a measure for a randomly selected subject m in treatment population i;

\mu was the grand mean of treatment populations;

\alpha_i was the effect of treatment i (evaluation type), which was a constant for all subjects within treatment population i;

\beta_j was the effect of treatment j (trials or time), which was a constant for all subjects within treatment population j;

\pi_{m(i)} was the constant associated with person m, who is nested under level \alpha_i;

\alpha\beta_{ij} was the effect that represented the nonadditivity of effects \alpha_i and \beta_j;

\beta\pi_{jm(i)} was the effect that represented the nonadditivity of effects \beta_j and \pi_{m(i)};

\epsilon_{o(ijm)} was the experimental error, which is independent of all other errors and was normally distributed with a mean of 0 and a variance of \sigma_\epsilon^2. In this design \epsilon_{o(ijm)} cannot be estimated separately from \beta\pi_{jm(i)}.
Presented graphically, the design was as follows:

                 b1      b2
        s1      X111    X112
  a1    s2      X121    X122
        s3      X131    X132

        s4      X241    X242
  a2    s5      X251    X252
        s6      X261    X262

where a = treatments (GFE and GBE), b = trials, or site 1 and site 2, and s = subjects, or evaluators.
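A split-plot analysis of this kind can be reproduced with modern tooling; a minimal sketch in Python follows, assuming the third-party pandas and pingouin packages and illustrative ratings (the design, not the data, matches the study):

    import pandas as pd
    import pingouin as pg

    # One row per evaluator per site visit; rating values are illustrative.
    data = pd.DataFrame({
        "evaluator": list(range(1, 7)) * 2,
        "technique": (["GBE"] * 3 + ["GFE"] * 3) * 2,
        "site":      ["site1"] * 6 + ["site2"] * 6,
        "rating":    [5.9, 5.7, 5.8, 4.5, 3.6, 5.1,
                      6.3, 6.0, 6.5, 4.1, 5.3, 4.6],
    })

    # Between-subjects factor A = technique; within-subjects factor
    # B = site (trials); subjects (evaluators) are nested within A.
    aov = pg.mixed_anova(data=data, dv="rating", within="site",
                         subject="evaluator", between="technique")
    print(aov)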
Question: When the materials and procedures of these two approaches are field tested, will the evaluees rate the evaluation process differently depending on which approach is being applied?
Data from each item on the project director process rating form were totalled and averaged across items to yield a score that had a range of one to seven on a Likert-type scale. Since each project was geographically separate, it was assumed that project director ratings were independent of each other. However, it was also assumed that an important source of variation in the project director ratings could be the evaluators themselves.
That is, evaluators may have exhibited some personal or professional traits that influenced project director ratings. In terms of experimental design, this sort of underlying source of variation is called a
"nuisance variable" (Kirk, 1968). The appropriate statistical technique
for dealing with nuisance variables is a hierarchical or nested ex
perimental design. In this study six evaluators were nested within two
evaluation techniques giving two main treatment levels and six levels
of nesting. The design is; called completely randomized; hierarchical,
with two levels of one treatment and six levels of another. It was
designated as CRH-2(6).
The linear model for such a design is as follows:

X_{ijm} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{m(ij)}

where X_{ijm} was a measure for a randomly selected subject m in treatment population ab_{ij};

\mu was the grand mean of the treatment populations;

\alpha_i was the effect of treatment i, which was a constant for all subjects within treatment population i;

\beta_{j(i)} was the nesting of treatment B within treatment A. This term was actually the pooled simple main effects of treatment B at each level of treatment A. No interaction term appeared in this model;

\epsilon_{m(ij)} was experimental error, which is normally and independently distributed with a mean of 0 and variance of \sigma_\epsilon^2.
Presented graphically, the design was as follows:

        b1     b2     b3     b4     b5     b6
  a1   ab11   ab12   ab13
  a2                        ab24   ab25   ab26

(n = 2 per cell)

where a = evaluation techniques (GFE and GBE) and b = evaluators.
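The CRH-2(6) analysis itself is short enough to compute by hand; a minimal sketch in Python with NumPy follows, where the ratings are illustrative placeholders and n = 2 project-director ratings per evaluator:

    import numpy as np

    # ratings[technique][evaluator] -> the two site ratings each.
    ratings = np.array([
        [[6.1, 5.8], [6.4, 5.5], [6.3, 6.2]],  # GBE evaluators
        [[5.0, 6.9], [6.2, 4.3], [6.5, 6.1]],  # GFE evaluators
    ])
    a, b, n = ratings.shape
    grand = ratings.mean()
    tech_means = ratings.mean(axis=(1, 2))
    eval_means = ratings.mean(axis=2)

    ss_a   = b * n * np.sum((tech_means - grand) ** 2)
    ss_bwa = n * np.sum((eval_means - tech_means[:, None]) ** 2)
    ss_w   = np.sum((ratings - eval_means[..., None]) ** 2)

    df_a, df_bwa, df_w = a - 1, a * (b - 1), a * b * (n - 1)
    ms_a, ms_bwa, ms_w = ss_a / df_a, ss_bwa / df_bwa, ss_w / df_w

    # Evaluators are a random factor nested in techniques, so the MS for
    # B within A serves as the error term for A, and the within-cell MS
    # serves as the error term for B within A.
    print("F(A)      =", ms_a / ms_bwa)
    print("F(B w. A) =", ms_bwa / ms_w)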
Question: When the materials and procedures of these two approaches are field tested, will the evaluees rate the evaluation reports differently depending on which approach was used?
Item responses from the semantic differential report rating form were
totalled and averaged for each project director, yielding a total report
rating score with a range of one to seven. As with the previous question, project director ratings were assumed independent of each other. An
important source of variation in those ratings could be the evaluators
themselves. As was mentioned previously, the appropriate statistical
technique to analyze data that included a nuisance variable was a
hierarchical or nested experimental design. Once again, six evaluators
were nested within two main treatment levels of evaluation techniques.
CHAPTER 4
FINDINGS AND DISCUSSION
This section presents the results and a discussion of implications.
As the reader will recall, this study had two purposes. Findings are reported for each purpose and its related investigatory questions.
The first purpose was to develop materials and procedures to implement an evaluation using either a goal-free or goal-based approach. The related investigatory question for this developmental purpose was: What would be the nature of materials and procedures that are developed to do goal-free and goal-based evaluations? In the previous chapter, procedures for developing the goal-free and goal-based evaluation materials were described, and these materials can be reviewed in the appendices.
It was also reported previously that instruments were developed
that would allow a general assessment of the implementation of the two
evaluation approaches. As discussed earlier, the treatments, i.e., the
two evaluation techniques, were applied in a field setting with little
control over the degree of implementation. If little evidence could
be found that the two techniques were used on-site at each project,
then discussion of utility or efficacy from ratings by the evaluators/
evaluees would at best be a difficult task.
The procedure to give a general assessment of independent
variable implementation was presented. This procedure involved
two pieces of information. One was a descriptive analysis of
the evaluator activity logs that were filled out while at each project. The second piece of information included responses to a general question on the evaluator process rating form that were obtained after on-site work. It was assumed that having an indication of the degree of implementation both during and after each site visit would give a general overall description of the treatments that would aid in discussing the findings and implications.
A review of activity logs revealed two major situations: interviewing people and a combined area of document review/planning. Within interviews there were three groupings of individuals: the project director, the project staff, and staff outside of the project. As can be seen in Table 3, the evaluators using goal-free techniques did report a different pattern of activities than the evaluators using goal-based techniques.
Table 3

Summary of Information Reported on Activity Logs by Evaluation Techniques

Reported Activity             % of Time-GBE    % of Time-GFE
1. Interviews
   a. Project Director              21               21
   b. Project Staff                 76               33
   c. Other Staff                    0                9
2. Document Review and
   Planning                          3               37
Total Percent                      100              100
Total Hours Logged                64.5             69.7
During their interviewing, the goal-based evaluators spent 97% of their time talking with the direct project staff. On the other hand, the evaluators who were directed to use the goal-free technique spent 68% of their time on interviews and made an attempt (9%) to interview people who were not directly involved with the project. Little direct observation was reported by either group; however, that may be a limitation of the activity log format.
Looking further, it can be seen that evaluators using the GFE approach reported spending 37% of their time reviewing documents they acquired while on-site and planning. They reported the following types of documents:

a. Student data from registrar
b. North Central Accreditation reports
c. Minutes of meetings
d. Project budget changes
e. Institutional budget
f. Institutional profile
g. Curriculum materials
The GBE group spent 3% of their time in this area and reported the following document types as being reviewed:

a. Project proposal changes
b. Project accountability file
c. Curriculum materials
d. Project budget revisions
It appeared that the two groups did report different patterns of activities while on-site. The GFE group spread themselves across several activities, whereas the GBE group tended to focus on internal project staff interviews. The pattern reported by the GFE group suggested that a goal-free approach spread evaluation activities across more potential types of information sources to detect project effects.
The fifth item on the evaluator process rating form asked for the degree of confidence in implementing the approach after leaving the project. The GBE group reported an average score of 5.83, whereas the GFE group reported 3.33. The difference between means was 2.5 points. This practical difference was large enough to assume that the GBE group felt more confident in their ability to implement the goal-based approach than the GFE group felt in implementing the goal-free approach.
More information will be presented in a later discussion of the
evaluator process ratings. However, it can be said that the groups did
differ in reported on-site activities even though the group using the
goal-free approach reported less confidence in their ability to
implement the goal-free approach. These two pieces of information
supported the position that attempts were made to implement two different
evaluation approaches. This assumption will be considered during a
later discussion of findings.
Data that focus on the second objective, to investigate the relative efficacy or utility of the two evaluation approaches, will now be presented. There were three investigatory questions that were related to this field exploration of the evaluator/evaluee interaction. Two questions referred to ratings of on-site procedures. The third question investigated project director (evaluee) ratings of reports. Each question and relevant data follow.
Question: When the materials and procedures of these two approaches are field tested, will the evaluators rate the evaluation process differently depending on which approach they are using?
Ratings were aggregated by sites. The following means and standard deviations by sites were obtained:

            Site 1          Site 2
          X      S.D.     X      S.D.
GBE      5.80    .20     6.27    .31
GFE      4.40    .72     4.67    .61
A split-plot or repeated measures analysis was applied. Results of that analysis are presented in Table 4.

Table 4

Repeated Measures ANOVA of Evaluator Process Ratings

Source                            df      MS        F
Between Subjects                   5
  A                                1     6.751    20.65*
  Subjects within Groups           4      .321
Within Subjects                    6
  B                                1      .404     3.18 (N.S.)
  AB                               1      .028      .05 (N.S.)
  B x Subjects within Groups       4      .127
TOTAL                             11

*p < .05
These results show that there were statistically significant differences between evaluator ratings of on-site techniques. It should be noted that the two non-significant F's established that there were no practice effects or interaction effects between sites and techniques.
It should be useful to look at the item responses for the two groups to see where process differences originated. This seemed appropriate especially in light of the significant F ratio that was obtained. It was not necessary to separate process means by sites since it was found that there was no significant practice, or interaction, effect.
Table 5

Item Means and Standard Deviations from the Evaluator Process Rating Instrument

                          GBE              GFE
Items                  X      S.D.      X      S.D.
1. Rapport            6.50    .61      5.67    .80
2. Time Use           6.00    .32      4.83    .75
3. Expectations       5.83    .68      5.16    .52
4. Satisfaction       6.60    .49      5.50    .42
5. Confidence         5.83    .37      3.33    .98
These item means could range from a low of one to a high of seven. The GBE group rated all items higher than did the GFE group. The largest item mean and standard deviation difference was 2.5 and .61 for item five. The smallest mean difference was .67 for item three. Items two and four were very close in size of difference between groups, with a 1.17 and a 1.10 difference on the respective items. Item one was close to item three in magnitude of mean difference with .83. Generalizing across items, it appeared that as a group the evaluators using goal-free techniques were most different from the GBE group in (a) confidence to implement the technique, and (b) time utilization and overall satisfaction with the site visit, and least different in (c) rapport with the project director and expectations of the project director as an administrator. These generalizations are based on observations, but were not statistically tested.
Open-ended comments from the evaluators on the process rating form provided results that were helpful in interpretation. On the whole, the GBE group had few comments about the methodology. There were some comments that more data should be collected to fully understand the projects and that some reorganization of the checklist may be helpful. The GFE group was more explicit. Across the three evaluators there was a common feeling that their checklist approach would have been more appropriate for fully mature projects with greater quantities of data. There was a feeling that the goal-free checklist was too stringent to judge developmental projects and that there was a need for more descriptive categories of information within the checklist. It was also noted that it was difficult not to gain knowledge of project goals, since at this stage of development little data existed that wasn't goal specific and goal oriented.
Question: When the materials and procedures of these two approaches are field tested, will the evaluees rate the evaluation process differently depending on which approach is being applied?
Scores averaged across items were analyzed with a completely randomized hierarchical design with evaluators nested within treatments. The mean and standard deviation for the GBE group were 6.04 and .69, respectively, whereas the mean and standard deviation for the GFE group were 5.83 and 1.03. Results from the analysis can be seen in Table 6.
Table 6

Completely Randomized Hierarchical ANOVA of Project Director Process Ratings

Source      df      MS       F
A            1     .130     .106 (N.S.)
B w. A       4    1.229    2.594 (N.S.)
W. cell      6     .473
TOTAL       11
A was a designation for the two techniques. B w. A was a designation for the evaluators that were nested within treatments. W. cell was the experimental error term. These results showed that there was no significant difference between on-site ratings for the two groups by project directors. Even though the GBE group was rated slightly higher than the GFE group, the size of the difference between ratings held no practical importance.
This question of the study investigated project director ratings of the on-site process associated with either the goal-free or goal-based evaluation approach. Both project director ratings of the on-site process and the influence of evaluators within each approach were found statistically non-significant. Project directors rated process items that were parallel to those rated by the evaluators. Even though evaluators differed in activity patterns and ratings, project directors who rated the two groups did not significantly differ in their scores.
Question: When the materials and procedures of these two approaches are field tested, will the evaluees rate the evaluation reports differently depending on which approach was used?
Bi-polar ratings were averaged to yield a total report utility score.
The overall mean utility score and standard deviation for the GBE group
were 5.78 and .30, respectively. The mean and standard deviation for
the GFE group were 5.29 and .57. Report utility scores were analyzed
with a completely randomized hierarchical ANOVA procedure. Results from
that analysis can be seen in Table 7.
Table 7

Completely Randomized Hierarchical ANOVA of Project Director Ratings of Report Utility

Source                               df      MS       F
(A) Evaluation Techniques             1     .741     1.55 (N.S.)
(B w. A) Evaluators
    Within Techniques                 4     .478    25.16*
(W. cell) Experimental Error          6     .019
TOTAL                                11

*p < .01
There was no significant difference between evaluation techniques. However, there was a highly significant difference between evaluators within techniques. The difference was one that would not be expected in 99 cases out of 100 by chance alone.
These findings did not support the idea that reports that focused on certain techniques were more highly rated. However, they do support the position that reports produced by certain evaluators were more highly rated than reports produced by other evaluators.
The developmental thrust of this study served to extend the materials and procedures available for doing project evaluation. Especially
helpful to future studies and practitioners was the operationalization
of the goal-free technique. It was found that evaluators can be
trained to use such a goal-free approach and that the training can carry
over to differences in the on-site evaluation process.
It should be kept in mind that any investigation has limitations that temper results. In this study there were limitations in the checklist approach used as a protocol, in the sensitivity of the instruments to some differences, in the small number of subjects and review sites, and in the overall duration of the exploration. As was discussed earlier, it was found that the goal-free checklist may not have been useful for projects that were immature, with no developed products. Similarly, it may have been too early to assess these particular projects, giving only a limited trial to potential differences between techniques.
The instruments used in this study provided general indications of
results. However, instruments with greater technical development may
provide more precise information. Another consideration would be to
better conceptualize the measures of utility and efficacy that were
employed in this study. For example, content analysis of the evaluation
reports could directly assess the amount of side effect data. This
study only applied the techniques once at each site during a short period
of time. It is assumed that repeated measures with a type of reversal
of techniques (i.e., goal-free converted to goal-based or two evaluators
with each using a different technique) would provide a more comprehensive
data base to assess the two evaluation techniques and their wide-ranging
differences.
The results do support the position that evaluators using the goal-
free technique would follow a different pattern of on-site activities
than would evaluators using goal-based techniques. Goal-free evaluators
reported a more comprehensive information base in both people interviewed and documents reviewed. Although this study did not provide a
measure of amount of side-effect data reported, it could be speculated
that the pattern reported by the GFE group would have a better chance
to provide side-effect information by exploring a more diverse non-project set of information. This speculation about possible scope of
actual effects reported seems plausible since the group using goal-
based techniques spent 97% of their time with the project staff directly, whereas the GFE group spent only 49% of their time with the project
management and staff. These differences in types of information sources
and amount of time spent with project staff partially support the goal-
free theoretical contentions for differences in the process of the goal-
free and goal-based techniques.
At the same time, there were doubts raised by the evaluators in
the goal-free group that the checklist approach to goal-free evaluation
was as useful as it could have been. It was reported that the checklist
criteria were too stringent and not descriptive enough for early stages
of projects. This would lead one to consider using the approach with
more developed projects to investigate whether this checklist would be
useful for projects at that later point. If so, this would lead to a
reconsideration of the point of entry of the evaluation using this checklist. If not, then further refinement of the checklist may be needed by adding more descriptive information.
Scriven offered the opinion that doing goal-free evaluation was threatening to the evaluator, since the technique puts one's professionalism directly on the line. The findings supported that position. As a group, the goal-free evaluators uniformly rated themselves and the on-site process lower than did the goal-based group. The GFE group rated itself lowest on confidence to implement the methodology and on ability to use time on-site to the best advantage. These two findings would lead one to suspect that the GFE evaluators were more unsure about what to do methodologically and how to fit the methods into a time sequence.
The assumption that evaluees would see the goal-free approach as
more threatening than the goal-based approach was not supported. Both
groups of project directors were similar in their positive ratings of
the on-site process. It should be considered that there were several
plausible reasons that the project director scores were not lower for
the GFE process ratings.
The process rating instrument gave a very general indication of
the on-site phenomena viewed by project management. It was possible
that the instrument was not sensitive enough to areas where differences
existed. It was also reasonable to speculate that a response set may
have developed so that project directors were publicly positive but
privately negative about the evaluation process. Another possibility
was that the project directors were not experienced and sophisticated
in their contacts with evaluators. They may not have had experiences
that would help them generate personal criteria for assessing the eval
uation activities. That is, they could not perceive good and bad points
in the on-site process so only rated it positively. A replication of
this study with a more sophisticated group of evaluees may give different
results in terms of process ratings.
Finally, in terms of the evaluee ratings of the evaluation reports
generated from these two techniques, the results did not support differ
ences in utility or efficacy based on technique used. However, there
were differences in utility ratings between evaluators. There were
several possibilities for these findings. The techniques, or treatments,
may wash out during the reporting phases. The two techniques may be
more visible during implementation in terms of patterns of activities,
but reporting may be structured through the evaluator's experience
rather than a goal-based or goal-free influence.
It would be reasonable to assume that good evaluators would not fit their reports into a prespecified structure for the sake of that structure alone, but would draw upon past experience and knowledge to structure their responses to audiences' information needs. It would
appear fruitful for further research on evaluation techniques to investigate and document evaluator variables as a potential source of information
about differences in utility ratings. A content analysis of evaluation
reports from the two evaluation techniques would seem to be a logical
next step in this line of research. Also, a main limitation to be mentioned again is that assessment of the two techniques' processes and products would have been enhanced by obtaining concurrent ratings from
other prime audiences: the panel of experts and the Hill Foundation. This type of meta-rating should be seriously considered in future studies of a similar nature.
In summary, a checklist approach to goal-free evaluation can be operationalized as an alternative to a goal-based technique. However, forms of the checklist used in this study may be too structured, or inappropriate, for a project early in its development. Evaluators using
the goal-free approach did show more anxiety during on-site activities
than evaluators using the goal-based approach. The evaluees did not
differ in their anxiety during implementation of the two techniques.
During the reporting phase of the evaluation activities, differences
between evaluators accounted for a large portion of report utility
ratings by project management. Differences between approaches did
not account for any significant portion of project management ratings
of report utility.
APPENDIX A
RECRUITMENT LETTER FOR EVALUATORS
I am currently directing a project within the Evaluation Center that has a need for short-term, qualified evaluators. In order to fill this personnel need I have contacted you, along with twenty-two other nationally recognized leaders in the evaluation field, to ask for your personal recommendations.
Specifically, I am looking for individuals who have a background in evaluation to do several site visitations during the last half of July, 1974. There would be a professional fee, per diem, and travel costs, provided by the Center. Basic qualifications for these individuals are as follows:
1. Currently practicing evaluation as either a graduate student, or practitioner in the field.
2. Able to commit approximately six days to the task. This includes a one-day orientation session at the Evaluation Center.
3. Has a proven ability to operate as an independent, solo evaluator.
4. Writes with an insightful, unlabored style.
5. Located within a radius of approximately 600 miles from Kalamazoo, Michigan, to reduce travel costs.
If you can recommend one or more persons who would meet these qualifications, please list them on the tear-off response form and return it in the enclosed, pre-paid envelope. I thank you for any help you can give me in this matter. If you would happen to have any questions, I plan to call you during the first week in June.
Sincerely,
John W. Evers
Staff Associate for Program Evaluation
JWE:lje
(Tear Here)
1. Name of individual giving recommendation:
2. Recommendation(s)
Name:
Address:
Phone & Area Code:
Phone & Area Code:
Thank you very much for assisting me with recommendations for qualified evaluators. I received a large response and had many qualified individuals to choose from.
Finding it difficult to discriminate among those nominated, I selected names randomly. Therefore, not all individuals recommended were contacted.
However, I will be using the list for future evaluation studies, and may contact them at that time.
Sincerely,
John W. Evers, Director Hill Productivity Project
JWE:lje
APPENDIX B
EVALUATOR BACKGROUND DATA FORM
TRAVELING OBSERVER DEMOGRAPHIC DATA SHEET
I. General
1. Name:__________________________________________
2. Business Address:____________ _________________
__________________________________________ (zip).
3. Business Phone: (area code)______ (number)_____
4. Home Address :__________________________________
___________________________________________(zip)
5. Home Phone: (area code)______ (number)__________
6. Social Security Number:_______________________
7. Present Job Title:_____________________________
8. Present Job Description______________________
II. Specific
1. Prior evaluation training - Academic
(approximate dates) (origin of the training)
2. Prior evaluation experiences - Vocational
(approximate dates) (categorical nature of the experience)
APPENDIX C
HANDBOOK FOR GOAL-FREE EVALUATION
HANDBOOK
FOR
TRAVELING OBSERVERS
The Evaluation Center Western Michigan University
July, 1974
First Edition
AN INTRODUCTION TO THE HANDBOOK*
In accepting this work assignment from the Evaluation Center,
you are expected to follow certain methodological procedures for
collecting information and reporting it back via cassette. This
Handbook provides the following sections to assist you with your work.
I. Setting of the Evaluation Study
II. A Conceptual Overview of the T.O.'s Role
III. The Product Evaluation Checklist
IV. Discussion of the Checkpoints
V. An Expanded Checklist to Use for Reporting Findings (Copy 1, Copy 2)
VI. A Log of Activities (Site 1, Site 2)
Attachment A: An Introduction to the Checklist Approach, by Michael Scriven
Attachment B: A List of Contacts for the Two Sites
*In case of emergency situations while on-site, call collect:
7:30 to 5:00 (616) 383-8166
SETTING OF THE EVALUATION STUDY
The Evaluation Center is currently contracted to the Hill
Family Foundation to assess the merit of a portion of the educa
tional projects they have funded this year. This foundation has
given various amounts of money to a group of independent, four-year
colleges to improve themselves, based on needs identified in a
study by the Hill Family Foundation. Three major educational needs
were identified generally. They are: 1) sharply rising instructional costs, threatening the colleges' financial stability, 2) per-student costs rising faster than per-student income, and 3) faculty salaries comprising two-thirds of instructional costs.
This problem/needs situation was identified as the outside
parameters of the area of funding. Each institution proposed an
alternative strategy to the foundation. Some were very comprehensive
in scope, and others limited. The Center began its contract to study the Hill project in February, 1974; the contract will run through January, 1975. We have previously held an orientation session with
the various project directors to hear their plans and allow them
to hear the Center's evaluation plans. A survey was sent to the
participating institutions in the beginning of June. In general,
there are several phases to the overall project; your work and role as a traveling observer (T.O.) is one of those phases.
One should note that only a limited amount of information will be provided as an introduction to the study. This is by design. The
rationale should become more apparent as the following sections
progress.
A CONCEPTUAL OVERVIEW OF THE T.O.'S ROLE
To gain a relational perspective on the work of the traveling observer, the evaluation study phase preceding the T.O., and the
phase following will be discussed. The phase preceding deals with
a survey, and the phase following involves visitations to the
project sites by a panel of experts.
Only a limited amount of data gathered in the survey will be
made available to the T.O. This information concerns mainly
identification of various individuals on site who would be possible
resources during visitations. These individuals will be identified
more specifically during the T.O. training session held at the
Center and referenced in the next section on procedures. Information
withheld will not be crucial to the function of the T.O. on site.
By comparing the survey information to the T.O. report, the Center
will get a more accurate picture of the individual college projects
for the panel's visitation. It is thought that two independent
perspectives (the T.O.'s report and the survey) will give a more
valid portrayal of the situation than would one combined perspective.
The phase following the T.O.'s work is that of a panel of
content experts revisiting the sites to make a synthesized report
based on results of the T.O.'s report, the survey, and on their
own independent observations. One might consider the T.O.'s work
as a preliminary, summative, site visitation by a "professional
detective" to uncover and describe as many actual project effects
as possible. Then, another group of experts will follow up on those hypothesized effects to synthesize as accurate a portrayal as possible of the merit of each particular project.
The panel consists of individuals who can apply expertise in
several content areas. The kinds of perspectives represented are
those of evaluation, higher education administration/finance/
economics/planning, staff utilization, and curriculum development.
As will be seen later, the T.O. will need to make recommendations from those perspectives for each panelist to follow up within his or her specific content area.
Specifically, the objectives of the T.O. are the following:
1. To collect both descriptive and judgmental information on
each specific project at two college sites based on the methodology
presented in the next section.
2. To summarize the raw information collected at each site on
cassette tapes to be mailed to the Center before proceeding to the
next site, responding to the format presented in a later section.
3. To edit the raw transcriptions into a report that will go
to each project director for the director's reactions. There will
be a time lag between this editing, use of the unedited transcrip
tions by the panelists, and subsequent reaction to the T.O. report
by the project director.
A SPECIFIC T.O. PROCEDURE ON SITE¹
As one should have noticed, specific references have been made
in the earlier sections to the fact that certain available informa
tion will not be given to the T.O. before his/her visitation. It
is also the case that the T.O. should guard against certain kinds of information while on site. Specifically, the methodology being
referenced, procedurally developed, and implemented has been called
goal-free evaluation by Michael Scriven. Although a relatively
small amount of evaluation has been done in this mode, hopefully
one spin-off of this project will be further development of the
goal-free approach for evaluation.
To do goal-free evaluation, one is not being asked to do fact-free or information-free evaluation. Information, both
descriptive and judgmental, is as necessary to operationalize
summative, goal-free evaluation as any other approach one might
take. What one is specifically to guard against is information
that each specific project poses as intended goals. Information
on intended goals is most frequently found in proposals, progress
reports, and orientation sessions with the project staff (if the
evaluator does not carefully consider the nature of the questions
asked, or the responses freely given).
¹Much of the narrative concerning goal-free evaluation and the product checklist relies heavily on documents authored by Michael Scriven; however, revision and editing have been included for clarification and extension of the original ideas.
Scriven (December, 1972) is clear on the issue behind this
goal-free approach. Basically, he lays out the argument that
knowledge of the project's proposed, alleged, or intended goals
more often than not produces a perceptual set for an external
evaluator that biases, or contaminates, his judgment of the project's
real achievements. Scriven mentions in that December article that
while he was reviewing disseminable products for the labs and
centers, he often found that the producers presented the intended
goals of the product as evidence of its actual achievement. The
differentiation to be made is between intended achievement and actual achievement. The case cannot be made that, because something was intended, those intentions transfer automatically to actual achievement.
Many things occur in the life of a project that can affect its actual achievement. However, when the staff are working within that
project too often they develop, as Scriven points out, "tunnel
vision." That is, too often the project staff will develop a
perceptual set from the proposed intentions that biases their
representation of the actual project to any external reviewing
agent. This is not raising an issue of honesty. It is more a
question of not being able "to see the forest for the trees." This
problem of perceptual set among the internal people can be reversed into a strength of the external evaluator, who provides an independent, unbiased opinion. Therefore, one should do goal-free evaluation.
Goals are unnecessary noise for an evaluator, according to
Scriven. Goals and objectives are a means to the planning and
production of achievement, but evaluation is assessing merit of what
has actually been achieved. One does not need to know the steps in
planning that lead up to actual achievement because having exposure
to those levels of intended achievement leads one farther and farther
from looking for evidence about side-effects. If the evaluator is
focused by the project’s goal statements to look mainly for verifi
cation of intentions, (s)he loses the potential value that an
external goal-free evaluation can contribute. That is, an external
goal-free evaluator is not looking for evidence only in areas
everyone knows about. (S)he looks in all possible areas for the
project’s actual achievements, and possibly picks up evidence about
achievement that has been previously overlooked, or can be interpreted
in a new perspective.
Goal-free evaluation does not mean that the evaluation is
comparison-, or standard-free. That is, dropping the perspective
of comparing the project's achievement to its intended goals does
not rule out a comparison against standards in writing the evaluation
report. Again, goal-free evaluation is not standard-free, and any
standard may be (and usually is) someone's goal. Goal-free evaluation
is free from the goals of the consumer (or at least some consumers),
or of the funding agency. The point is that the basic standards of
merit used by the goal-free evaluator are constructed without
reference to anybody's goals.
The best standards to compare against a project's actual achievement are the needs of the intended target population, or the needs of the consumer. The producer, or project staff, should have used consumer needs in establishing intended goals, so the goal-free evaluator and the project start at the same place.
However, the difference is that the goal-free evaluator does not
accept the project's judgment of the best way to combine those
consumer needs and the project's resources into a worthwhile
product. Important errors can be made by the project in its
judgment of worthwhile intended achievement. New evidence can
often be turned up in the goal-free review of needs assessment
that may put intended achievement in a new light, even though the
project's intentions were well-justified at the time. Once the
evaluator sees that the project's goals are not beyond criticism,
and that one would criticize them against the needs to which they
are supposed to be responsive, the external goal-free evaluator
sees that (s)he can bypass the project's formulation of goals
because the crucial question is not what the project intended to do,
but what was actually achieved. The goal-free evaluator uses
current needs data, casts no aspersions on the project with regard
to the original selection of goals (since (s)he knows neither the
goals nor the data on which they were based) and gets straight
into judgments of congruence between actual achievement and needs,
against costs.
The advantages of bypassing goals are numerous, but it is con
sidered to be harder to do evaluation against needs rather than
goals. Needs data is sometimes hard to get and needs analysis
involves some evaluation in itself. Needs, unlike wants, are
dimensions of mismatch between actual and ideal. But contrary to
past criticisms, it is false to say that the goal-free evaluator
simply substitutes his/her own value judgments for those of the project. The goal-free evaluator must be able to support any
claims about needs against which the evaluation is made. If there
is support for those claims, and for the logic of the evaluation,
then we have an evaluation which may have absolutely no reference
to the goals of the evaluator at all. The evaluator's goals may
be doing good evaluation, or filling the professional need for
further development of a goal-free methodology.
Finally, it is worth remembering that the external goal-free
evaluator is not going to miss the main aims of the project since
(s)he would want to look at representative materials aimed at the
target population, and to observe the process of the project. If
(s)he does not notice the project's intended main goals as actual achievement, it is a good bet they play a minor role. Sometimes
(s)he will miss them, but there will be some pretty interesting
compensating observations.
Although the preceding paragraphs are only a brief perspective
on goal-free evaluation, they can be considered as theoretical
highlights. Depending on an evaluator's past professional experiences, (s)he might think it is absurd not to directly collect evidence on a project's intended goals. However, the task here of the external goal-free evaluator is not to test intended achievement, but to test actual achievement, which may be different from, or similar to, that which was intended.
In order to provide a consistent reporting format for the
goal-free evaluator, the following section presents a checklist
approach to gathering and reporting project information back to
the Center.
THE PRODUCT EVALUATION CHECKLIST
Introductory comments on the checklist approach by Michael
Scriven can be found in the appendix; however, here is a preliminary
note on the status of the following checkpoints themselves, illus
trated with an example by Michael Scriven. Rod Stephens, one of
the most brilliant yacht designers of the twentieth century,
recently published a 100-item checklist to be used in the evaluation
of racing and cruising yachts. The status of every item in that
checklist can be expressed by saying that each is desired as
essential. The items in the following checklist are not, except in
the one case noted, desired as essential. They are essential.
Each of these conditions must be met in order that one should have
solid grounds for a conclusion of actual achievement for an
educational project.
There are often occasions on which an evaluative decision
must be made without meeting all these standards. For example, a
project may be planned and the arrangements such that it must be
implemented, and hence some strategy must be selected for it. The
T.O. may not be able to determine whether the strategy selected is
of overall merit at that point in time, but it would be desirable
to determine whether the one selected is the best available. If
the T.O. does not feel qualified to make a judgment of a project's
strategy, (s)he should point out the issue in a recommendation to
the panelists. In such a case, one can use the checklist for
comparative assessment.
Quite often, the T.O. may be able to make a very plausible
estimate about two or three of the items from the list on which
there is no direct evidence. This is an acceptable procedure,
especially since evaluation funds are minimal; however, this
estimation should be noted for the panel. There are special cases
where a project can be defensibly implemented, without all the
checkpoints being met. The evaluator should treat those situations
as unusual. That is, the evaluator should treat each item in this
list as a claimed necessity for a meritorious project. With that
perspective in mind, it is more likely that (s)he will uncover the
actual achievements of the project.
The general structure of the checklist is as follows: Items 1
and 2 (Need and Market) are the pre-conditions, without which no
project will have any actual value. If they are met, at least
tentatively, we can then proceed to look further at the proposed
strategy. Items 3-10 tell the evaluator the kind of information
that must be looked for. Checkpoints 3-10 only refer to categories
of information, not to quality of the project's performance in each
of these categories. To put it another way, in checking 3-10, the
evaluator is only finding out whether the car has wheels, not
whether they are round. If the project passes this preliminary
inspection, it may then be asked how well it did on dimensions
3-10, compared against the need and market considerations of 1 and
2, i.e., Are the wheels round? square? in-between? Synthesis of
1-10 gives the score for checkpoint 11, actual educational
significance, a payoff checkpoint. Then the evaluator looks at
the project's cost and combines it with checkpoint 11 to give
a measure of cost-effectiveness at checkpoint 12. And finally,
the evaluator looks ahead to 13, the desired-as-essential checkpoint of post-funding support.
It is suggested that the following points be rated on a five-
point scale, 4-0. "Meeting a checkpoint" is then defined as
scoring 2 or better. The numbers should be expanded verbally as illustrated for the first checkpoint here, and on the full form
that follows. It is suggested that the T.O. read through a
discussion of the checkpoints. Then, review the expanded checklist
against the discussion, looking for clarification of types of
information asked for on each point. Next, read the directions for how to use the checklist for reporting. Be prepared for a
discussion at the orientation session.
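For readers who prefer a concrete representation, here is a minimal sketch (added for illustration; it is not part of the original handbook) of the scoring convention just described: thirteen checkpoints, each rated on the 4-0 scale, with "meeting a checkpoint" defined as scoring 2 or better. The checkpoint names follow the titles discussed in the next section; the sample ratings are hypothetical.

```python
# Checkpoint names follow the handbook's titles; ratings are hypothetical.
CHECKPOINTS = [
    "Need", "Market", "True Field Trials", "True Consumer",
    "Crucial Comparisons", "Long-term", "Side Effects", "Process",
    "Causation", "Statistical Significance", "Educational Significance",
    "Costs and Cost-Effectiveness", "Extended Support",
]

def meets(rating: int) -> bool:
    """'Meeting a checkpoint' is defined as scoring 2 or better on the 4-0 scale."""
    return rating >= 2

ratings = dict.fromkeys(CHECKPOINTS, 2)   # hypothetical ratings
ratings["Side Effects"] = 1               # one hypothetical weak point
unmet = [name for name, r in ratings.items() if not meets(r)]
print("Checkpoints not met:", unmet)      # -> ['Side Effects']
```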
DISCUSSION OF THE CHECKPOINTS
1. Need (Justification)
The goal-free evaluator is concerned here to see whether there
is good evidence that the project fills a genuine need (perhaps
those identified by the Hill Foundation and enumerated in the
Introduction) or— a weaker alternative— a defensible want. True
needs assessments involve establishing that the project actually
facilitates a consumer group's survival, health, or some other
defensible end that is not now adequately serviced. It may involve
moral, social, and/or environmental impact considerations. This
point is listed first because most proposed projects fail to pass
even this requirement. The usual data under "needs assessment" refer to deficiencies on norm-referenced tests or annual budgets, which tell the evaluator nothing about need at all without further data.
Since this particular point of the checklist is the essential
starting place for a goal-free evaluator, more time will be spent developing this point than the others. For example, the goal-free evaluator arrives at the project site and may wonder about a reasonable starting point. Here are two to consider:
1. Approach the project director. Without allowing a lengthy
orientation session to develop, ask him to identify individuals
on campus who are, or will be, representative consumers, or users,
or the target audience, for the project's efforts. Then, leaving the director, spend some time with that target group synthesizing
their needs as independent data, or perhaps looking for a congruence
between what the Hill Foundation identified as educational needs and
this group's needs. It would be important to consider whether those
consumers identified by the project have a unified perspective to
identifying needs, a diverse set of needs, or a polarized, bi-modal
set of needs. The T.O. should also consider using institutional records on-site in assessing needs. Since they probably weren't compiled by the project staff, they might be relatively bias-free.
One should also consider the possibility of securing copies as
supporting evidence to the panel's later visits. (Since this
particular evaluation is limited to a few days on site, and there
are other checkpoints on which to gather data, the goal-free
evaluator should not spend all his/her time in this needs area.
Perhaps, one-third of the time on site might be an a priori rule-
of-thumb for this area, but each goal-free evaluator will have to
use his/her own professional judgment as to time spent in this needs
area based on the particular situation of each project and then,
make recommendations to the panel as necessary.) The T.O. should
consider that by accepting only the project's definition of target
group, a selection bias is being put on the needs assessment. If
other groups are possible targets the T.O. should consider their
possible answers.
The point is that in spending time identifying the needs of
the target audience (perhaps narrowed to those general educational
needs previously identified by the Hill Foundation), the goal-free
evaluator is developing a set of standards to compare against the
actual achievements of the project. If it is the case that the
actual achievements are responsive to the needs of the consumer, then the evaluator has identified one point of merit for that project.
If it is the case that the evaluator finds a discrepancy between
needs and actual achievement, there may be several possible causes.
For instance, the project never did a needs assessment; the needs
have changed since the original assessment was done; the project
selectively responded to particular needs (for political, financial,
etc., reasons), or any of many possible causes. The whys of the
project's congruence or discrepancy with needs will be further
elaborated in other points.
2. Another approach to begin a goal-free evaluation would be
to similarly approach the director, but ask if you can observe the
process of the project before going any further. It would probably be better to consider the project director's role as an interpreter if he were observing with the goal-free evaluator. Would his comments, helpful as they might be, be goal-laden and therefore biasing? The difference in this second starting point would be observing the actual process and then inferentially hypothesizing the needs that
would justify such a process. This approach leads a goal-free
evaluator to do needs assessment by inference, and may be the only
possible way if the target audience is judged to be inaccessible
for the evaluator's questioning. This second approach is the same
process that one would use under checkpoint 8 (Performance— Process),
but the end product of the process is to establish needs by
inference.
In scoring need, the following should be taken into account:
number of people involved, social significance of the need, absence
of substitutes, urgency of the matter, possible multiplicative
effects.
Cost level may or may not be part of the need specifications.
It should be, but if not, this checkpoint has to be restudied (as
does checkpoint two) after cost data on a particular product is in.
It is undesirable to use "selected expert" judgments to establish
need, if there is any chance another selection would deny it.
(But many important innovative projects can do no better.)
The five-point scale for need might look like this:
Maximum priority, desperately needed 4
Great importance 3
Probably significant need 2
Possibly significant need 1
No good evidence of need 0
Note: It is important to consider how the T.O.'s time would be spent on-site. Two possible starting points have been considered. It should be possible, after being on-site for two to three hours, for the T.O. to plan a schedule of remaining activities. Otherwise, observational time may expire without covering some potentially valuable data source. This planning phase is highly recommended, and one should record activities in the following log (see page 60).
2. Market (Dissemination)
Many needed educational projects are difficult to assess
because of limited linkages to the target audience. It is difficult
to argue for continued development unless there is a special,
preferably tested, plan for getting information used through subsidy,
legislation, or agents. For this reason, dissemination plans
should antedate detailed project development plans. Checkpoint 2
requires that there be dissemination plans that ensure a user
market. It is scored on the size and importance of the demonstrably
reachable audience. This is quite different from the size of the
group which needs the project. It is, if you like, the pragmatic
aspect of need. The dissemination plan or procedure, if already operative, has to be clear, feasible in terms of available resources, expert and ingenious in its use of those resources, and keyed to the need(s).
This point would be rated by the following:
Very large and/or important market will be reached 4
Large and/or important market will be reached 3
Significant market will probably be reached 2
Possible, but not probable, that a significant market will be reached 1
Inadequate evidence to suggest that a significant market will be reached 0
3. Performance— True Field Trials
The first of several "performance criteria" — actually criteria
for the kind of evidence about performance— stresses the necessity
for field trial data that refer a) to the final version; b) to
typical users who are c) operating without producer (or other
special) assistance, in d) a typical setting/time frame. It's
very tempting for a project to think that they can extrapolate from
field trials with volunteer schools who get materials and phone-
consulting free, or from the penultimate edition of the materials,
but this has frequently turned out to be unsound. In actual
practice, deadlines, overcommitment, and underfinancing combine to
render almost all projects deficient on this point of field trials.
Sometimes a project can make a reasonable guess, but project staff
too often tend to make optimistic guesses instead, which is an argu
ment for outside evaluation. It is much better for a project to
quote actual statistics on educational problems of the kind that
this type of project has satisfactorily handled in past field
trials, or is plausibly believed capable of handling at this
particular site. One should check the project's knowledge of
both past typical use of the strategy, and the project's proposed
plan for future field trials, and rate as follows:
Perfectly typical field trial data 4
Minor differences from typical field trial 3
Reasonable bet for generalization from trial data 2
Serious weaknesses exist in trial data 1
Relevance is unclear from data 0
4. Performance— True Consumer
The concept of "the consumer" tends to be interpreted differently
by different participants. In-service teacher training materials,
for example, will be consumed by: a) superintendents or assistant
superintendents in charge of staff development programs, b) teacher
trainers, c) students, d) taxpayers. To decide what data the project
needs with regard to which of its consumer groups requires a very
clear sense of the function of the evaluation itself: which audiences is it addressed to, commissioned by, and, regardless of those two considerations, responsible to.
Quite often there will be several groups of consumers (identi
fied by the needs, or market checkpoints) of a given product,
each interested in different aspects of it. Data should be
gathered on all and scored separately. Failure to provide data
on any of the important relevant groups may constitute a fatal
defect in the project’s internal evaluation data, or it may just
be a weakness. Failure to provide data on performance for some
significant consumer group is of course fatal to the project. One
would rate this point as:
Full data exists on all relevant consumers 4
Fair data exists on all relevant consumers 3
Good data exists on the most important consumers 2
Weak data exists on the most important consumers 1
Only speculative data exists about the most important consumers 0
5. Performance— Crucial Comparisons
There are few, if any, useful project evaluations which can
avoid the necessity to present data on the comparative performance
of the critically competitive products. All too often, project data
refers to some pre-established standards of merit (but see below)
and the evaluator has no idea whether one can do better for less, or
twice as well for 5% more, etc. — which is typically what an
evaluator wants to know. Where comparisons are done, the results
are sometimes useless because the competitor is so chosen as to
give a false impression. The worst example of this is the use of a single 'no-treatment' or 'last year's treatment' control group.
It's not too thrilling to discover that an injection of $100,000 worth of CAI can improve the math performance of a school by 15%, if
there's a possibility that $15,000 worth of programmed texts would
do as well, or better. There are few points where good projects
distinguish themselves more clearly than in their choice of critical
competitors. Sometimes they must be created by converting the
program from the CAI memory into a programmed text, which may yield
a competitor at 10% of the cost and with the same content plus the
advantages of portability and simultaneous useability.
Critical comparisons are rated as follows:
Good data on all important competitors 4
Good data on most important competitors 3
Fair data on the most important competitor(s) 2
Lacking data on some of the more important competitors 1
Little or no useful comparative data 0
6. Performance— Long-term
A follow-up is almost always desirable, often crucial, since certain undesirable side-effects often take quite a while to surface, and good results fade fast. It may be the case that the
goal-free evaluator can only check whether a follow-up is planned
or not, and rate as follows:
Good direct evidence about the effects exists at times needed 4
Some direct evidence about the effects exists at times needed 3
Follow-up gives reasonable support to suggest a conclusion about effects when needed 2
Follow-up or other data suggests a conclusion about effects when needed 1
Useless or no follow-up, no other grounds for inferring long-term effects 0
7. Performance— Side Effects
There must be a systematic, skilled, independent search for side effects during, at the end of, and after the project's actual effect. Project staff are peculiarly handicapped in such a search,
by goal-oriented tunnel vision; here the outside evaluator
operating in the goal-free mode is particularly helpful. This
is a checkpoint which one is tempted to regard as icing on the
cake, but the history of educational innovation makes it clear
that the risk of doing so is too high to be conscionable.
Since it's the case that the goal-free evaluator does not
know the intended effects of the project, all his/her comments
about actual achievements can be either main effects or side effects. There is no need to worry about main vs. side effects in a goal-free mode. All observed effects are actual effects, and should not be differentiated as other than actual effects unless some
evidence can be given that shows differentiation. However, it's
probably the case that if one can differentiate actual effects into main and side effects, the evaluator may have to consider whether or not the biases of the project staff have affected his independence.
8. Performance— Process
Process observation is necessary for three reasons. It may
substantiate or invalidate (a) certain descriptions of the product, (b) the causal claims involved in the project's internal evaluation (that the gains were due to this treatment), and (c) it may bear
on ethical questions that have pre-emptive force in any social
interaction such as education. Since (c) is always possible, this
checkpoint is always necessary. In many cases (a) and/or (b) also
make this checkpoint necessary— but not in all cases. For example,
a product called an Inquiry Skills kit may not deserve the title,
either because of its content or because of the way it is or is not
implemented in the classroom.
As was mentioned in the first checkpoint, the goal-free
evaluator may want to start by observing the process to infer
needs. If that is the case, the goal-free evaluator may not be
able to do (a), invalidate the description, because (s)he has
not read an intended description beforehand. However, it would
be essential that (s)he comprehensively describe those actual
effects that are observed.
9. Performance— Causation
One way or another, it must be shown that the actual final
results reported could not reasonably be attributed to something
other than the treatment of the project. No way of doing this
compares well with the fully controlled experiment, and ingenuity
can expand its use into most situations. There are sometimes
reasonably good alternatives to experimentation as well as bad
ones and the best possible must be used.
The goal-free evaluator needs to consider the adequacy of the project's internal evaluation design (or its procedures, if there is no documented design), so this checkpoint should cover that. It
may be the case that causation can never be implied. However, a
good project considers procedures to eliminate rival hypotheses
to a causal claim. The evaluator should consider a planned inter
vention of more merit than a post hoc analysis. Questions should
be raised about invalidating elements due to selection bias,
history effects, instrumentation, regression, and so forth.
Although a rigorous procedure may be inappropriate for assessing
the effects of the project, some attempt should be made for more
than testimonials as evidence.
10. Performance— Statistical Significance
This is frequently the only mark of sophistication in a
project’s evaluation design. However, it is worthless without
the next item, educational significance.
This point is considered concurrently with review of the previous point. Whether the project is attempting a correlative approach, use of non-parametrics, or fiscal analysis, an attempt should be made to see whether effects can be attributed to chance variation or sample fluctuations, and to establish the degree of confidence in any effects.
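By way of a minimal illustration (assumed for this discussion, not drawn from the handbook), the kind of chance-variation check this checkpoint asks about might, in its simplest form, look like a two-sample test on treated versus comparison outcomes:

```python
from scipy.stats import ttest_ind

treated    = [72, 75, 69, 80, 77, 74]   # hypothetical project-group scores
comparison = [70, 68, 71, 66, 73, 69]   # hypothetical comparison-group scores
t, p = ttest_ind(treated, comparison)
print(f"t = {t:.2f}, p = {p:.3f}")      # a small p suggests effects beyond chance
```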
11. Performance— Educational Significance
Statistical significance is a desired essential, but it's all
too easily obtained without the results having any educational
significance, especially a) by using the magnifying power of a
large n, and b) by using instruments that assess dubious concepts,
or c) non-generalizable gains (where they should be generalizable).
The evaluator needs to look at the project's actual achievement.
Then (s)he needs to apply the perspective of an evaluator's
independent, expert judgment that gains of that size on those actual
outcomes represent an educationally significant result. The raw
data need not be reported in the project's evaluation but the
grounds for thinking them important must be reported— usually this
involves a back-reference to the needs assessment, typically somewhat
amplified in details. An explicit congruence check to the needs
assessment would still have to be given, at some level, to ensure
that judgment of educational significance relates to an actual
need and not to an alleged want like staff support, or equipment acquisition with external funds. If either want is the main basis
for the judgment of educational significance, then we do not have
evidence that the need we have carefully validated at checkpoint 1
is being met. There should be no suggestion that late-discovered
dimensions of educational significance are illicit. Any development
process must search for them and hope for them— but then there must
be a recycling of the needs assessment by the project.
The external evaluator has to go beyond project-generated, subject-matter experts' reports, since (s)he has to combine this point with results from checkpoints 3-10 in order to achieve an
overall rating of educational significance. For example, if there
is some doubt whether the results were due to the treatment, or
whether the side-effects offset the main effect, etc., then the
merit of the project must be judged less positively, even if the
needs congruence leads to a very favorable rating of the project by
its internal staff.
In practice, the suggestion is that the first pass through 3-10
be done to identify the weak project situations. The project may
not be hopeless, but any attempt at evaluation of it will be, unless
these data are available. If we have data which can support some
kind of evaluation, we now look at the extent to which the needs and market are met by the actual results, rather than at the type of data on 3-10. Here is where, for example, we look below the surface
requirement of statistical significance to the deep requirement that
the actual achievement match significant needs; or again, here is
where we check not just that there was a side-effects search, but
the nature of any effects that were found.
It is obvious that for all the breaking-out of the components
in the evaluative judgment that has been done in the checklist,
checkpoint 11 will often involve a pretty substantial synthesizing
performance. Even though the goal-free evaluator may find it
premature to check actual results against needs, (s)he must assess the merit of whatever actual achievement is available and then assess the merit of future plans on some checkpoints, usually by making recommendations to the panelists in the taped report.
12. Costs and Cost-Effectiveness
Cost data must be:
(a) Comprehensive. That means covering maintenance as well as
capital costs, psychic as well as dollar costs, ’weaning' costs and
costs of in-service updating of needed helpers as well as direct
costs, etc. There should be some consideration of opportunity costs other than those covered previously under critical competitors, e.g., what else could the college have done with the funds? A pass should be made at qualitative cost-effectiveness analysis where possible.
(b) Verified. Cost estimates and real costs should be verified
independently. It is really not satisfactory to treat cost data as
if they are immune to bias. Performance data should also have some
independent certification, and the procedures outlined above involve
this at several points. The cost data requires this for reasons that
have not so far been so generally recognized. Costing is an extremely
difficult business, requiring technical skills that at the moment
are a limited part of the training of evaluators. Therefore, this
may need to be directed for later follow-up by the panel's financial
expert.
(c) For each product compared. Costs must be provided for the
critical competitors, something which would be covered by the
admonition to include opportunity costs, given under (a) above. It may perhaps be worth independently stressing the need to provide rather careful cost estimates for the artificial competitors that the ingenious project director should create as part of his analysis, since this has not been part of the tradition of opportunity cost analysis. The preceding considerations bear on the quality of the
cost data. But this checkpoint is not treated as merely a
methodological one. Since one already has the judgment of
educational significance, and since the cost data includes the
cost of comparable products, one can here score cost-effectiveness,
i.e., (roughly) the justifiability of the expenditure.
13. Extended Support
This is an item that could be regarded as desirable rather than
necessary, but it is to be hoped that this will change in the future.
In the educational field the responsibility for the project is all
too frequently supposed to terminate upon the commencement of funding.
If it should subsequently transpire that important improvements
should be made in the project, they may or may not get made, depend
ing upon financial considerations. This is scarcely a service to
the consumer’s needs. It should therefore be regarded as a strong
plus for a project to produce a product with a systematic procedure
for updating and upgrading the product in the light of post-funding
field experience. One should note that this implies the necessity
for a systematic continuing procedure for collecting field data.
One of the types of data that ought to be collected by a project
is data on new critical competitors. An important kind of "improvement" is covered by this checkpoint; it might also be described as
extended use of the product. For example, its use in new circum
stances or in conjunction with new auxiliary products will need
new evaluation and explanations in the handbooks, etc. The pro
vision of continued user-training, itself subject to progressive
cycles of improvement, should be assessed as a desired essential.
THE UPGRADING PHASE IN USING THIS APPROACH
Given that few projects meet all these requirements, and few meet even half, what can reasonably be done at the moment?
It should be stressed that the appropriate T.O. policy is not
to treat a project that meets a large number of these requirements as deserving of full support. No project that fails to meet
checkpoints 1-13 deserves full support, for the very good reason
that we do not have good grounds for supposing that such a project
has meritorious achievements. As was mentioned earlier, it must
be made clear that exploratory, research, or field trial projects are all defensible reasons for funding, even when the chances against meeting all of these standards are quite long. They should be conceived as no more than an exploratory, research, or field trial
project, funded as such, and only moved into a production phase when
these standards (or enough of them to make a convincing case) are
met. Nevertheless, we may well have grounds that justify further
investigation by the panelists in order to fill the gaps found by
the evaluation checklist. We may indeed have enough grounds to
support the tentative support of 3uch a project pending the further
investigations by the panelists. In the next section, the goal-free
T.O. will be asked to respond to the checklist format in reporting
findings to the Center.
AN EXPANDED VERSION OF THE CHECKLIST
The T.O. should notice that the scales are sometimes hybrid
crosses of methodological and substantive merit. The top scores
require good evidence of good performance; the bottom scores indicate
that good evidence or good performance is lacking. A bottom score
does not require, for example, that there be good evidence of bad
achievement, for otherwise projects which turned in no data would do
better than those that were known to fare badly. It will be helpful
to mention not only the ratings, but the relevant terms, e.g., "No
good evidence", since feedback and not just a Yes/No decision is
to result from the evaluation.
The list of considerations should affect the evaluator's
decision either as to evidential adequacy or merit. By check-marking
or adding salient factors affecting the rater's judgment, the form
can provide an explanation as well as an evaluation, be useful for
formative as well as summative purposes, and help improve the form
itself. A double check can be used to indicate considerations that
were felt to be more important than those receiving a single check.
Most of the scales are simply quality-of-data or "methodological"
scales (e.g., the side-effects check). They are asterisked. They still
represent necessities, but a high score on them does not show
intrinsic merit of the product, only of the data or design. When
evaluating projects, the situation is that any score less than 3
on one such point will weaken the rating on educational significance
(and under 2 will destroy it), however high the need/market scores.
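The capping rule just stated can be made concrete with a short sketch in Python. Only the under-2 and under-3 thresholds come from the handbook; the size of the "weakening" penalty is not specified there, so the one-point deduction per offending score used below is an assumption.

    def capped_educational_significance(raw_rating, methodological_scores):
        # Any asterisked (methodological) score under 2 destroys the
        # educational-significance rating outright.
        if any(score < 2 for score in methodological_scores):
            return 0
        # Each asterisked score under 3 weakens the rating; the size
        # of the penalty (one point here) is an assumption.
        penalty = sum(1 for score in methodological_scores if score < 3)
        return max(0, raw_rating - penalty)

    # However high the need/market scores, weak methodological scores
    # drag the rating down:
    print(capped_educational_significance(4, [4, 3, 2, 4]))  # 3
    print(capped_educational_significance(4, [4, 3, 1, 4]))  # 0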
The T.O. should note that each checkpoint is subdivided into
information categories of description and judgment. Then each point
is divided into panelist content areas to facilitate recommendations
by the T.O. to the panelists for follow-up. In making verbal reports
to the Center, the T.O. should refer to the checkpoints as (s)he
talks through the report. Then, the T.O. should mail a checked
version of the list to the Center as a supplement. This should
be done before proceeding to the next site.
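One way to picture the reporting structure just described is as a record per checkpoint, holding the description and judgment cells, the per-panelist recommendations, and the single- or double-checked factors. The Python sketch below is purely illustrative; the field names simply follow the column headings of the form that follows.

    from dataclasses import dataclass, field

    @dataclass
    class CheckpointReport:
        name: str                 # e.g., "Need" or "Market"
        rating: int               # checkpoint score, 4 - 0
        description: str = ""     # description cell, noting sources used
        evidence_used: str = ""
        adequacy_of_description: str = ""
        # panelist content area -> recommendation for follow-up
        recommendations: dict = field(default_factory=dict)
        # factor -> 1 (single check) or 2 (double check, more important)
        checked_factors: dict = field(default_factory=dict)

    report = CheckpointReport(name="Need", rating=3)
    report.recommendations["finance"] = "Verify per-student cost figures."
    report.checked_factors["Populations affected"] = 2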
AN EXPANDED CHECKLIST TO USE FOR REPORTING FINDINGS

Project Name: ________________________________

T.O.'s Name: ________________________________

Directions:
1. Give taped response first by referring directly to the checklist and talking through any information gathered for the description cell. The T.O. should remember that if the panelists were to arrive on-site and begin elaborating on your findings they would need as accurate a description as possible. Therefore, as one makes descriptions, note sources of information used, and other sources available.
2. X out any cells referenced on tape.
3. Judge the description cell's adequacy.
4. Make recommendations to the panel.
5. Score the checkpoints.
6. Send this checklist to the Center before going to the next site. Include the log of activities, and the tapes.
*1. Need: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Populations affected
_____ b. Social significance of population needs
_____ c. Absence of substitute strategies to meet needs
_____ d. Multiplicative effects of meeting needs
2. Market: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Dissemination plan:
          i - clarity
          ii - feasibility
          iii - ingenuity
          iv - economy
          v - probable effectiveness
_____ b. Size of group reached
_____ c. Importance of the market
*3. Performance—True Field Trials: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Final version
_____ b. Typical user
_____ c. Typical aid
_____ d. Typical setting
_____ e. Typical time frame
*4. Performance—True Consumer: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Entire college
_____ b. Board of Trustees
_____ c. Non-academic staff
_____ d. Institutional Research staff
_____ e. Department chairmen
_____ f. Faculty
_____ g. Student
_____ h. Taxpayer
*5. Performance—Critical Comparisons: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. No treatment group
_____ b. Existing competitors
_____ c. Projected competitors
_____ d. Created competitors
_____ e. Hypothesized competitors
*6. Performance—Long-Term: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Week to month later
_____ b. Month to year later
_____ c. Year to few years later
_____ d. Many years later
_____ e. On-job, or life-space sample
*7. Performance—Side-Effects: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Comprehensive search
_____ b. Skilled search
_____ c. Independent search
_____ d. Goal-free search
_____ e. Time of search:
          i - during project
          ii - end of project
          iii - later
*8. Performance—Process: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Descriptive congruence check
_____ b. Causal clues check
_____ c. Instrument validity
_____ d. Judge/Observer reliability
_____ e. Short-term effects
*9. Performance—Causation: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Randomized Experimental Design
_____ b. Quasi-Experimental Design
_____ c. Ex Post Facto Design
_____ d. A Priori Interpretation Design
_____ e. Sources of Invalidity:
          i - selection bias
          ii - regression artifacts
          iii - instrumentation
          iv - etc.
*10. Performance—Statistical Significance: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Appropriate Analysis
_____ b. Appropriate Significance Level
11. Performance—Educational Significance: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Independent judgments
_____ b. Expert judgments
_____ c. Basis of external judgments
_____ d. Needs congruence
_____ e. Side-effects
_____ f. Long-term effects
_____ g. Comparative gains
_____ h. Consumer data
_____ i. Process data
12. Cost-Effectiveness: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Comprehensive cost analysis
_____ b. Expert judgment of costs
_____ c. Independent judgment of costs
_____ d. Cost for all competitors
13. Extended Support: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Post-funding data collection
_____ b. Post-funding system for improvement
_____ c. Post-funding in-service training
_____ d. Up-dating of aids
_____ e. New uses and user data
T.O.'s Name: ________________________________
Site Name:____________________________________
A LOG OF ACTIVITIES
Directions: Since the Center is trying to get a better idea of the nature of the goal-free evaluation method, and since we may have to account for your activities, please fill out the following as accurately as possible.
ATTACHMENT A
An Introduction to the Checklist
AN INTRODUCTION TO A CHECKLIST APPROACH FOR GOAL-FREE EVALUATION
By Michael Scriven
The following checklist can be used as a key item in the
evaluation of products of almost any kind besides the educational
products or projects to which the language refers. In the
educational field, it can be used in each of the following ways:
a. As an instrument for evaluating products;
b. As an instrument for evaluating producers in the "pay-off"
dimension, i.e., without considering matters such as personnel
policy, community impacts, potentiality, etc.;
c. As an instrument for evaluating evaluation proposals
focused on products or producers;
d. As an instrument for evaluating production proposals, since
a competent producer should incorporate plans for achieving each of
these standards and establishing that these standards have been
achieved;
e. As an instrument for evaluating evaluators of products,
producers, etc., since it is argued that competent evaluation must
cover each of these points.
It will thus be seen that the checklist, if sound, provides
an extremely versatile instrument for assessing the quality of all
kinds of educational activities and products; the more so because
the concept of "product", as here used, is a very broad one,
covering processes and institutions as well as typical products
such as technical devices.
So much for applicability in theory. What about application
in practice? It is clear that few educational products have ever been
provided that met all these standards prior to production, although
that is exactly when they should be met in order to justify production.
Furthermore, not very many have been produced that meet enough of
them to justify even retrospective confidence in the merit of the
product, but enough have to make clear that the standards are not unrealistic.
The better correspondence courses in technical subjects would pass,
for example. To justify producing a new course is naturally harder,
since it should be required to outperform the existing ones by a
margin that justified its marginal cost.
Of course, it must be made clear that satisfactory achievement
of all of these standards is not the only criterion for funding a
project. Exploratory or research or realistic field trial projects
are all defensible, even when the chances against meeting all of
these standards are quite long. It is important to note that they
should be conceived as no more than exploratory (etc.) projects,
funded as such, and only moved into a production phase when these
standards (or enough of them to make a very convincing case) are
met. The application of this "hardline" would not only greatly
reduce the costs of educational R&D activity, which should only be
a short-term effect, but it would transform the conception of
satisfactory quality in education. And the long-term positive
results of that are far more significant and beneficial than those
of dropping a few sub-standard projects at the moment.
Given that the use of this checklist is potentially extremely
lethal, potential users deserve to know something about its validity
and utility. The following five comments are offered under that
heading.
First, every checkpoint on the checklist has a clear a priori
rationale, in almost all cases so obvious that elaboration would be
otiose. That is, a straightforward argument can be constructed
that the failure to meet any one of the checkpoints immediately leaves
open a serious doubt as to whether the product (for example) is of
good quality.
Second, medical and industrial products routinely pass (and
are often required to pass) every checkpoint. Your car, your
aspirin, and your food, despite the real problems of still-emerging
undesirable side-effects that afflict all of them, at least avoided
the far worse results that would be likely to arise if the checkpoints
on this list weren't met. I can see no way to argue that
the effects of bad education are less significant in the long run
than the effects of bad food, drugs, and cars.
Third, the checklist has been developed out of the most
intensive systematic and large-scale product evaluation activities
with which I am familiar— the Product Review Panels of 1971-72
and 72-73 done for the National Center for Educational Communication,
on sub-contract to the Educational Testing Service. The fifteen
experienced evaluators and educators who worked on these panels
provided the raw material in their detailed assessment procedures
from which I extracted the first eight versions of this checklist.
It has been further refined since as a result of interaction with
other groups, notably EPIE (Item 13 was suggested by Ken Komoski)
and my assistants Michael Barger and Howard Levine.
Fourth, the checklist has also been critiqued (at my request)
by some of the most experienced developers in the country. Their
reaction has not been impressive. The general theme has been to
claim excessive perfectionism, but the support for this claim has
been extremely weak, consisting either of saying nothing has ever
been produced anywhere that met these standards (counterexamples
have been given above), or that the cost of meeting them would be
prohibitive, or that they may be appropriate for summative but
certainly not for formative evaluation. I have looked at the cost
complaint very carefully and it is certainly not true in general.
For example, the CSE handbooks of all tests available for, e.g.,
secondary students, products which rated well with the Product
Review Panels, can rather easily be evaluated so as to meet these
standards, on a small budget. (As presented, there was a bit too
much guessing required.) Where the cost is going to be large is when
we start looking at huge curriculum projects. But this reflects
the combination of two features: a) the huge costs of development;
b) the great difficulty in justifying such projects. Where the raw
gains are likely to be marginal, as in most of those projects, one
has to develop very ingenious instruments and use very large groups
to pick up the benefits. I think the checklist correctly reflects
those facts of life, and of course it indirectly underscores how
dubious the justification of most of those projects is.
One reason I find the reactions of producers (to date)
unimpressive is that the checklist could easily have been extracted
from the writings or conversation of these same people when they
are extolling the merits of the R&D approach. That approach begins
with needs assessment, and goes on through a series of field trials
towards dissemination. The checklist refers to these areas and
establishes exactly those facts from them which provide the basis
of asserting the superiority of a properly developed product. None
of the checkpoints are alien intruders in the context of justification.
They begin to look threatening only in the context of
evaluation. But exactly the same factors must be present in each
of these contexts. The comment about inappropriateness for formative
evaluation is methodologically precarious since good formative
evaluation must involve giving the best possible simulation of a
summative evaluation. The latent point could, I think, be put
as follows.
The checklist refers to some data which cannot be gathered
the instant a project is conceived or even in its early days— for
example, checkpoint 6 refers to long-term or follow-up data. It
is a grievous error to conclude from this that the checklist— or
even that particular checkpoint— is not relevant for formative
evaluation since one of the tasks of formative evaluation is to
set up the process and instruments for collecting that data, and
to collect it, at least on early versions of the product. Formative
evaluation is what goes on during the pre-production, improvement-
oriented phase of the development, and anyone who wants to produce
a worthwhile product will want to get follow-up data from throughout
the period during which significant changes of effects can
reasonably be expected to occur. This may well mean extending the
developmental timeline somewhat; it certainly cannot legitimately
be taken to mean that follow-up data isn't relevant to formative
evaluation. The 'follow-up checkpoint' is the extreme case, the
item that might seem most remote from formative concerns. Even if
the argument of the preceding paragraph were unsound, one could
scarcely argue that the checklist as a whole was irrelevant to
formative evaluation, since almost every remaining item is
obviously relevant.
There is a related complaint that deserves attention, concerning
checkpoint 5, which requires comparative performance data
from competitive or possibly competitive products. Understanding
the issues in this case, however, is considerably facilitated by
reading the general rationale of that checkpoint in the ensuing
section, so discussion of the complaint will be postponed until
then.
The final item of evidence about utility concerns use of
the checklist by school administrators in the Nova Ed.D. program,
and by students in the evaluation training seminar at U.C. Berkeley.
They frequently volunteer the view that it is of more value to them in
doing actual evaluations than any other document in the literature.
They do not frequently express either a) the opposite view, or b) the
same view about other documents. Their response is undoubtedly
contaminated by my attitudes and personal elaborations, but
difficult to disregard entirely. I suspect that the search for
'models' of evaluation, although possibly less inappropriate in
this area than the search for theories of learning is for
experimental or educational psychology, does not pay off either
conceptually or pedagogically as well as the more mundane approaches
of the checklist and the trouble-shooting chart.
Therefore, the checklist's credentials are good. It has
been based on a proper, rather than a superficial, use of the R&D
iterative cycle, a consideration which alone would make it, as
a product, superior to most of those currently available. (Of
course, it's much easier to do a decent R&D job on a two-page
checklist than on a K-12 mathematics program.)
As an educational product itself, the checklist is of course
self-referent, and a study of this introduction with the checklist
in mind will show that there are still substantial gaps in the
direct empirical evidence that the present version of the checklist
is worthwhile, as is to be expected with any newly revised product.
Some of these gaps will be closed if every reader and particularly
every user of the checklist will accept part of the responsibility
for the improvement of educational quality that I believe we all
share, and provide whatever criticism and alternatives he or she
can. They will be acknowledged and incorporated as appropriate.
For my part, I believe that I have a responsibility to convince
evaluators, developers, and funding organizations, including
legislatures, of the crucial importance of using this checklist.
Since I have already had some success with this, it is particularly
important that errors or shortcomings in it be identified as soon
as possible. I have every confidence in the R&D process, and
consequently great confidence that such errors exist, even in this
thirteenth iteration.
ATTACHMENT B:
Lists of Contacts for the Two Sites
APPENDIX D
HANDBOOK FOR GOAL-BASED EVALUATION
HANDBOOK
FOR
TRAVELING OBSERVERS
The Evaluation Center Western Michigan University
July, 1974
Second Edition
AN INTRODUCTION TO THE HANDBOOK*
By accepting this work assignment from the Evaluation Center,
you should realize there are certain methodological questions and
procedures we ask you to incorporate into your work role. This
handbook contains the essential evaluation framework that you will
be implementing. Further elaboration will be given during the
orientation session. So, please become familiar with this handbook
so that you can participate in any discussions that develop.
Setting of the Evaluation Study
Conceptual Overview of the T.O.'s Role
A Specific T.O. Procedure On Site
Format for Reporting Findings (Copy 1, Copy 2)
An Activity Log (Site 1, Site 2)
Attachment A: Summary of the Foundation's Productivity Study
Attachment B: Proposals and Progress Reports (Site 1, Site 2)
Attachment C: List of Contacts for Each Site
*Note: In case of emergencies while on site, call collect:
8:00 to 5:00 (616) 383-8166
SETTING OF THE EVALUATION STUDY
The Evaluation Center is currently contracted to the Hill
Family Foundation to assess the merit of a portion of the educational
projects they have funded this year. This foundation has given
various amounts of money to a group of independent, four-year
colleges to improve their productivity, based on needs identified
in a study by the Hill Family Foundation. Three major educational
needs were identified generally. They are: 1) sharply rising
instructional costs constituting a major threat to the financial
stability of the college, 2) per student costs rising faster than
per student income, and 3) faculty salaries comprising two-thirds
of instructional costs.
This problem/needs situation was identified as the outside
parameters of the area of funding. Each institution proposed an
alternative strategy to the foundation. Some were very comprehensive
in scope, and others limited. The Center began its
contract to study the Hill project in February, 1974; the contract
will run through January, 1975. We have previously held an orientation
session with the various project directors to hear their plans and
allow them to hear the Center's evaluation plans. A survey was
sent to the participating institutions in the beginning of June.
In general, there are several phases to the overall project; your
work and role as a traveling observer (T.O.) is one of those phases.
A CONCEPTUAL OVERVIEW OF THE T.O.'S ROLE
To gain a relational perspective on the work of the traveling
observer, the evaluation study phase preceding the T.O., and the
phase following will be discussed. The phase preceding deals with
a survey, and the phase following involves visitations to the
project sites by a panel of experts.
Data gathered in the survey will be made available to the
T.O. This information concerns mainly identification of various
individuals on site who would be possible resources during
visitations. These individuals will be identified more specifically
during the T.O. training session held at the Center and referenced
in a later section on procedures. Asking the T.O. to validate
the survey information, the original proposal, and the current
progress report, the Center will get a more accurate picture of the
individual college projects for the panel's visitation. It is
thought that the independent perspective of the T.O. after being
on-site will give a more valid portrayal of the situation for the
panel than would several, separate perspectives gained by looking
at individual documents.
The phase following the T.O.'s work is that of a panel of
content experts revisiting the sites to make a synthesized report
based on results of the T.O.'s report, and on their own independent
observations. One might consider the T.O.'s work as a preliminary,
summative, site visitation by a "professional detective" to uncover
and describe as many potential project achievements as possible.
Then, another group of experts will follow up on those hypothesized
achievements to synthesize as accurate a portrayal as possible of
the merit of each particular project.
The panel consists of individuals who can apply expertise
in several content areas. The kinds of perspectives represented
are those of evaluation, higher education administration/finance/
economics/planning, staff utilization, and curriculum development.
As will be seen later, the T.O. will need to make recommendations
by those perspectives so that each panelist can follow up within
his or her specific content area.
Specifically, the objectives of the T.O. are the following:
1. To collect both descriptive and judgmental information
on each specific project at two college sites based on the
methodology presented in the next section.
2. To summarize the raw information collected at each site
on cassette tapes to be mailed to the Center before proceeding
to the next site, responding to the format presented in a later
section.
3. To edit the raw transcriptions into a report that will go
to each project director for the director's reactions. There will
be a time lag between this editing, use of the unedited transcriptions
by the panelists, and subsequent reaction to the T.O. report
by the project director.
A SPECIFIC T.O. PROCEDURE ON SITE
References have been made in the earlier sections to the
fact that all available information about each site will be
readily given to the T.O. before his/her visitation. This will
provide a data base to be used to assess the merit of that site.
Specifically, the methodology being referenced, as procedurally
implemented for this particular study, has been called goal-based
evaluation by many evaluators. A relatively large amount of
evaluation has been done in this mode; therefore, one might
consider it a well-established and, hopefully, familiar procedure.
To do goal-based evaluation, the T.O. is being asked to
collect information of both a descriptive and judgmental nature
as necessary to assess the intended effects of a project. What
one is specifically to use as a standard for assessment is
information that each specific project poses as intended goals or
objectives. Information on intended goals and objectives is most
frequently found in proposals, progress reports, and can be
supplemented during orientation sessions with the project staff.
Goals are an important starting point for an evaluator,
according to many sources. Goals and objectives are a means to
the planning and production of achievement, and evaluation is
assessing merit of what has been achieved based on those intentions.
One needs to know the steps in planning that lead up to actual
achievement because having exposure to those levels of intended
achievement helps one to understand the context in which it was
planned. When the evaluator is focused by the project's goal
statements to look for verification of those intentions, (s)he
gains the potential value that an external goal-based evaluator
can contribute. That is, an external goal-based evaluator has an
independent interpretation of results, but realizes that a project
must be judged against what was originally intended, or planned.
The producer, or project staff, should have used local needs,
opportunities, and problems in establishing intended goals; the
goal-based evaluator accepts that as having occurred, looks at
the adequacy of the strategy, and then compares the results with
what was intended.
One will note in the next section presenting a reporting
format that the T.O. is to respond to information about goals,
design, implementation, and results. This conceptual framework
is called the CIPP model of evaluation developed by Daniel L.
Stufflebeam. It has been argued in professional evaluation
circles whether the so-called CIPP model is goal-based or not.
There are several different opinions on this point; however,
the T.O. should note that in this particular instance (s)he is
being asked to assume that the CIPP model is goal-based.
Specifically, the T.O. is being asked to operate in a summative,
or retrospective evaluation mode. Those evaluation scholars
who know the difference between questions asked by CIPP in the
formative and summative mode, should be more at ease in equating
the goal-based orientation to summative evaluation.
Digressing slightly to a description of the four horizontal
categories of information one would find in the CIPP model, they
are often called Context, Input, Process, and Product. However,
slight semantic manipulation of the jargon brings us to the headings
used in the next section: goals, designs, implementation, and
results. To briefly touch on a description of those four, the
category of goals in the summative sense would raise questions
like what goals were chosen when the project was initiated? and
why the goals were chosen over other possibilities? In the design
section under the summative mode, the T.O. asks three questions:
(1) What design was chosen? (2) What alternative designs were
rejected? and (3) Why was the winning design chosen? In the
implementation category, the T.O. asks in general: (1) What
are the strengths and weaknesses of the actual design that was
implemented? (2) What effort is being made to implement the
design? and (3) What was the actual design implemented? Under
the results section, the T.O. generally is concerned with (1)
What results were achieved? (2) Were there any side-effects?
(3) Were the objectives achieved? (4) What was the relation
of costs to benefits? These are only general questions for each
category and will be further delineated in the next section.
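The mapping just described can be summarized in a simple lookup structure; the Python sketch below merely restates the questions listed above and adds nothing beyond them.

    SUMMATIVE_QUESTIONS = {
        "Goals (Context)": [
            "What goals were chosen when the project was initiated?",
            "Why were those goals chosen over other possibilities?",
        ],
        "Designs (Input)": [
            "What design was chosen?",
            "What alternative designs were rejected?",
            "Why was the winning design chosen?",
        ],
        "Implementation (Process)": [
            "What are the strengths and weaknesses of the design as implemented?",
            "What effort is being made to implement the design?",
            "What was the actual design implemented?",
        ],
        "Results (Product)": [
            "What results were achieved?",
            "Were there any side-effects?",
            "Were the objectives achieved?",
            "What was the relation of costs to benefits?",
        ],
    }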
Although the preceding paragraphs are only a brief perspective
on goal-based evaluation, they can be considered as theoretical
highlights. In order to provide a consistent reporting format
for the goal-based evaluator, the following section presents
a checklist approach to gathering and reporting project information
back to the Center.
A FORMAT FOR REPORTING FINDINGS

T.O.'s Name: ________________________________
Site Name: ________________________________

Directions:
(1) Use the following questions to structure your taped report.
(2) As you discuss a question, check it off. Use a double check if you feel it to be an overwhelmingly important point.
(3) Work horizontally across each question, making recommendations to the panel in the last cell.
(4) Send the used format along with the tapes.
(5) Enclose the activity log, also.
1. GOALS

Double check any point felt overwhelmingly important.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. What goals were chosen?
_____ b. What goals were considered, then rejected?
_____ c. What alternative goals might have been considered?
_____ d. What evidence exists to justify the goals chosen?
_____ e. How defensible is this evidence?
_____ f. How well have the goals been translated into objectives?
_____ g. Overall, what is the merit of the goals chosen?
2. DESIGNS

Double check any point felt overwhelmingly important.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. What strategy was chosen?
_____ b. What alternative strategies were considered?
_____ c. What other strategies might have been considered?
_____ d. What evidence exists to justify the strategy that was chosen?
_____ e. How defensible is this evidence?
_____ f. How well was the chosen strategy translated into an operational design?
_____ g. Overall, what is the merit of the chosen strategy?
3. IMPLEMENTATION

Double check any point felt overwhelmingly important.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. What was the operational design?
_____ b. To what extent was it implemented?
_____ c. What were the strengths and weaknesses of the design under operating conditions?
_____ d. What was the quality of the effort to implement it?
_____ e. What was the actual design that was implemented?
_____ f. Overall, what is the merit of the process that was actually carried out?
4. RESULTS

Double check any point felt overwhelmingly important.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. What results were achieved?
_____ b. Were the stated objectives achieved?
_____ c. What were the positive and negative side-effects?
_____ d. What impact was made on the target audience?
_____ e. What long-term effects may be predicted?
_____ f. What is the relation of costs to benefits?
_____ g. Overall, how valuable were the results and impacts of this effort?
T.O.'s Name:
Site Name:
A LOG OF ACTIVITIES
Directions: Since the Center is trying to get a better idea of the nature of the goal-based evaluation method, and since we may have to account for your activities, please fill out the following as accurately as possible.
[Log grid: columns for MODE OF ACTIVITY (Interview, Reading, Observation, etc.) and NATURE OF INFORMATION (Needs Assessment, Process, etc.)]
ATTACHMENT A:
Summary of the Foundation's Productivity Study
ATTACHMENT B:
Proposals and Progress Reports
ATTACHMENT C:
List of Contacts for Each Site
APPENDIX E
EVALUATORS' PROCESS RATING FORM
Traveling Observer Rating Sheet
Project Site:
Traveling Observer's Name:
Directions: Please check one of the seven responses after each question that most agrees with your opinion, referencing judgments to the specific project site mentioned above.
I. On-site Process
(A) What degree of rapport do you feel existed between yourself and the project director whose project you recently visited?
1. ___ extreme positive rapport
2. ___ moderate positive rapport
3. ___ somewhat positive rapport
4. ___ neutral rapport
5. ___ somewhat negative rapport
6. ___ moderate negative rapport
7. ___ extreme negative rapport
Comments on A?
(B) Was your time spent at this project allocated effectively?
1. ___ extremely not effective
2. ___ moderately not effective
3. ___ somewhat not effective
4. ___ neutral effectiveness
5. ___ somewhat effective
6. ___ moderately effective
7. ___ extremely effective
Comments on B?
(C) How well did the project director meet your expectations as an administrator?
1. ___ extremely congruent with my expectations
2. ___ moderately congruent with my expectations
3. ___ somewhat congruent with my expectations
4. ___ no prior expectations
5. ___ somewhat non-congruent with my expectations
6. ___ moderately non-congruent with my expectations
7. ___ extremely non-congruent with my expectations
Comments on C?
(D) Overall, how satisfied were you with the recent visitation to this project?
1. ___ extremely not satisfied
2. ___ moderately not satisfied
3. ___ somewhat not satisfied
4. ___ neutral satisfaction
5. ___ somewhat satisfied
6. ___ moderately satisfied
7. ___ extremely satisfied
Comments on D?
II. Methodological Assessment
(A) How much confidence do you place in your ability to fully implement at this particular site the methodology that you were previously trained in at the Center?
1. ___ extremely confident
2. ___ moderately confident
3. ___ somewhat confident
4. ___ neutral confidence
5. ___ somewhat not confident
6. ___ moderately not confident
7. ___ extremely not confident
Comments on A?
(B) What suggestions can you give to refine this methodology for future use? Use an extra sheet of paper, if necessary.
Thank you for your time.
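Note that the response options run in opposite directions across questions: 1 is the most favorable option for A, C, and II.A, but the least favorable for B and D. Before items can be compared or averaged, an analyst would likely recode them to a common direction. The following Python sketch shows one way; both the item keys and the choice of 7 as the favorable pole are assumptions rather than part of the form.

    # Items on which 1 is the favorable end are reflected (x -> 8 - x)
    # so that 7 means "most favorable" on every item. (Item C's
    # midpoint, "no prior expectations", is treated here as neutral.)
    REVERSED_ITEMS = {"I.A", "I.C", "II.A"}

    def recode(item, response):
        # Map a 1-7 response so that higher always means more favorable.
        return 8 - response if item in REVERSED_ITEMS else response

    raw = {"I.A": 2, "I.B": 6, "I.C": 3, "I.D": 5, "II.A": 1}
    aligned = {item: recode(item, r) for item, r in raw.items()}
    print(aligned)  # {'I.A': 6, 'I.B': 6, 'I.C': 5, 'I.D': 5, 'II.A': 7}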
APPENDIX F
PROJECT DIRECTORS' PROCESS RATING FORM
I hope that your productivity project is going well, and that things in general aren't too hectic with the fall opening of school being near.
I wanted to thank you for your cooperation with our recent staff visitation of your project. We will be sending you a report on your project generated from that visitation towards the end of September. I hope you will find it useful to have an independent, external opinion of the strengths and weaknesses of your project as it is being implemented.
A complete decision as to which six sites will be visited by the site review panels in September has not yet been finalized, pending completion of some of the more recent staff visitation reports. However, I will be contacting those sites chosen very close to August 30. Therefore, if you have not been contacted by September 9th, you can assume the panel will not be visiting your project.
During October and November, the panelists and the Center will be preparing the final report to the Hill Foundation. As mentioned earlier in the Spring Hill Meeting, the general purpose of the final report is to provide new information that can be used by the foundation for revision of funding policy guidelines of the productivity program in its second year of funding.
In the meantime, I would sincerely appreciate your feedback about the recent staff visitation at your project. Enclosed you will find a short, four-question rating sheet. This information will be used by the Center to locate any improvements we should consider in future evaluation studies. These ratings are for the Center's assessment of itself. I encourage you to be as candid as possible.
Since a postage-paid return envelope is enclosed, I would appreciate it if you would send this rating to me after you finish reading this letter. I'm sincerely committed to using this kind of information to provide as useful a process as possible to our present, and future, studies.
Sincerely,
John W. Evers, Director
Hill Productivity Project

JWE:lje
Enclosure
Project Director Rating Sheet
Project Site:
Director's Name:
Directions: Please check one of the seven responses after each question that most agrees with your opinion.
(A) What degree of rapport do you feel existed between yourself and the person who recently visited your project as a representative of the Evaluation Center?
1. ___ extreme positive rapport
2. ___ moderate positive rapport
3. ___ somewhat positive rapport
4. ___ neutral rapport
5. ___ somewhat negative rapport
6. ___ moderate negative rapport
7. ___ extreme negative rapport
Comments on A?
(B) Was time spent at your project by the staff person allocated effectively?
1. ___ extremely not effective
2. ___ moderately not effective
3. ___ somewhat not effective
4. ___ neutral effectiveness
5. ___ somewhat effective
6. ___ moderately effective
7. ___ extremely effective
Comments on B?
(C) How well did the staff person meet your expectations as an evaluator?
1. ___ extremely congruent with my expectations
2. ___ moderately congruent with my expectations
3. ___ somewhat congruent with my expectations
4. ___ no prior expectations
5. ___ somewhat non-congruent with my expectations
6. ___ moderately non-congruent with my expectations
7. ___ extremely non-congruent with my expectations
Comments on C?
(D) Overall, how satisfied were you with the recent staff visitation to your project?
1. ___ extremely not satisfied
2. ___ moderately not satisfied
3. ___ somewhat not satisfied
4. ___ neutral satisfaction
5. ___ somewhat satisfied
6. ___ moderately satisfied
7. ___ extremely satisfied
Comments on D?
Thank you for your time.
APPENDIX G
PILOT VERSION OF THE CRITERION INSTRUMENT
General introduction to the assignment
Immediately following this page you will find a set of instructions for using the attached rating sheet. After those directions, you will find a short report to read. Following the report are two pages of items to be filled out according to the previous directions. Put your name and social security number in the blanks at the bottom of the page where appropriate, or else you cannot be paid for your work.

The attached report should probably be read through once for a general overview, and then a second time for content. Then immediately go to the semantic differential items and respond to them.
If it would be helpful to you to make a more concrete situation out of the task, then consider this:
You are a university-based administrator of a small project that has been funded by a foundation. The foundation has sent a representative to spend two days visiting your project to provide you with an external, independent interpretation of the strengths and weaknesses of your project as it is being implemented. The following report is that external interpretation. Even though you know nothing about the true nature of this project, you should be able to rate it on the items following the report.
Thank you. Go on to the instructions for the semantic differential on the next page.
Instructions for Rating Sheet
In responding, please make your judgments on the basis of
what this report means to you. On the following pages, you will
find a restatement of the general concept and beneath it a set of
descriptive scales. You are to rate the report on each of these
scales in order.
Here is how you are to use these scales:
If you feel that one dimension of the enclosed report is very closely
related to one end of a descriptive scale, you should place your
mark as follows:
fair  X :___:___:___:___:___:___ unfair
                   or
fair ___:___:___:___:___:___: X  unfair
If the dimension is quite closely related, or usually related,
to one or the other end of the scale (but not extremely) you should
place your mark as follows
strong ___: X :___:___:___:___:___ weak
                   or
strong ___:___:___:___:___: X :___ weak
If the dimension seems only slightly related to one side as
opposed to the other side (but is not really neutral) then you
should mark as follows:
active ___:___: X :___:___:___:___ passive
                   or
active ___:___:___:___: X :___:___ passive
The direction toward which you check, of course, depends
upon which of the two ends of the scale seem most characteristic
of the dimension you are judging. If you consider a dimension
to be neutral on the scale, both sides of the scale equally
associated with the concept, or if the scale is completely
irrelevant, unrelated to the report, then you should place your
mark in the middle space:
safe ___:___:___: X :___:___:___ dangerous
IMPORTANT: (1) Place your marks in the middle of the spaces,
not on the boundaries:

THIS: ___: X :___        NOT THIS: ___ X:___
(2) Be sure you check every scale for the report—
do not omit any.
(3) Never put more than one mark on a single scale.
Sometimes you may feel as though you have seen the same item
before. This will not be the case, so do not look back and forth
through the items. Do not try to remember how you checked earlier,
similar items. Make each item a separate and independent judgment.
Work at a fairly high speed. Do not worry or puzzle over
individual items. It is your first impressions, the immediate
"feelings" about the items, that is wanted. On the other hand,
please do not be careless, because we want your true impressions.
Please continue, and begin marking.
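For analysis, each seven-space scale is conventionally scored 1 through 7 from the left-hand adjective, with items whose favorable pole is printed on the right reflected so that a high score always means a favorable judgment. The Python sketch below illustrates this convention; the set of reverse-keyed item numbers shown is illustrative only, not the study's actual scoring key.

    # Positions are counted 1-7 from the left-hand adjective; scores
    # are oriented so that 7 always falls at the favorable pole.
    FAVORABLE_ON_RIGHT = {2, 3, 7, 8, 10, 13, 14, 16}  # illustrative key only

    def item_score(item_number, mark_position):
        if item_number in FAVORABLE_ON_RIGHT:  # e.g., illogical/logical
            return mark_position
        return 8 - mark_position               # e.g., active/passive

    def total_score(marks):
        # marks maps item number -> marked position (1-7).
        return sum(item_score(n, p) for n, p in marks.items())

    print(item_score(1, 2))  # mark near "active"    -> scores 6
    print(item_score(2, 2))  # mark near "illogical" -> scores 2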
This report is . . .

1. active ___:___:___:___:___:___:___ passive
2. illogical ___:___:___:___:___:___:___ logical
3. rash ___:___:___:___:___:___:___ cautious
4. direct ___:___:___:___:___:___:___ circuitous
5. initial ___:___:___:___:___:___:___ final
6. interesting ___:___:___:___:___:___:___ boring
7. ungeneralizable ___:___:___:___:___:___:___ generalizable
8. false ___:___:___:___:___:___:___ true
9. overt ___:___:___:___:___:___:___ covert
10. inconsistent ___:___:___:___:___:___:___ consistent
11. good ___:___:___:___:___:___:___ bad
12. timely ___:___:___:___:___:___:___ untimely
13. unimportant ___:___:___:___:___:___:___ important
14. untrue ___:___:___:___:___:___:___ true
15. objective ___:___:___:___:___:___:___ biased
16. undescriptive ___:___:___:___:___:___:___ descriptive
17. positive ___:___:___:___:___:___:___ negative
18. narrow ___:___:___:___:___:___:___ wide
19. unemotional ___:___:___:___:___:___:___ emotional
This report is . . .

20. intuitive ___:___:___:___:___:___:___ rational
21. deliberate ___:___:___:___:___:___:___ impulsive
22. infrequent ___:___:___:___:___:___:___ frequent
23. oral ___:___:___:___:___:___:___ written
24. risky ___:___:___:___:___:___:___ certain
25. subjective ___:___:___:___:___:___:___ objective
26. anonymous ___:___:___:___:___:___:___ identified
27. organized ___:___:___:___:___:___:___ unorganized
28. relevant ___:___:___:___:___:___:___ irrelevant
29. limited ___:___:___:___:___:___:___ diffused
30. credible ___:___:___:___:___:___:___ unbelievable
31. unreliable ___:___:___:___:___:___:___ reliable
32. judgmental ___:___:___:___:___:___:___ unjudgmental
33. harmonious ___:___:___:___:___:___:___ dissonant
34. incomplete ___:___:___:___:___:___:___ complete
35. expensive ___:___:___:___:___:___:___ cheap
36. organized ___:___:___:___:___:___:___ unorganized
37. superior ___:___:___:___:___:___:___ inferior
38. friendly ___:___:___:___:___:___:___ unfriendly
39. light ___:___:___:___:___:___:___ dark
40. artful ___:___:___:___:___:___:___ artless
41. sweet ___:___:___:___:___:___:___ sour
This report is . . .

42. high ___:___:___:___:___:___:___ low
43. warranted ___:___:___:___:___:___:___ unwarranted
44. wise ___:___:___:___:___:___:___ foolish
45. therapeutic ___:___:___:___:___:___:___ toxic
46. hard ___:___:___:___:___:___:___ soft
47. strong ___:___:___:___:___:___:___ weak
48. severe ___:___:___:___:___:___:___ lenient
49. deep ___:___:___:___:___:___:___ shallow
50. scholarly ___:___:___:___:___:___:___ ignorant
51. motivated ___:___:___:___:___:___:___ aimless
52. hot ___:___:___:___:___:___:___ cold
53. temperate ___:___:___:___:___:___:___ intemperate
54. youthful ___:___:___:___:___:___:___ mature
55. sensitive ___:___:___:___:___:___:___ insensitive
56. sophisticated ___:___:___:___:___:___:___ naive
57. useful ___:___:___:___:___:___:___ useless
58. tense ___:___:___:___:___:___:___ relaxed

Thank you for your time.
APPENDIX H
FINAL VERSION OF THE CRITERION INSTRUMENT WITH DIRECTIONS TO THE PROJECT DIRECTORS
Enclosed is the report from a visit on July ____ and ____.
As the Evaluation Center outlined at the Spring Hill Conference, our study would provide some feedback to each college project director on the strengths and weaknesses of his Productivity Project as it was being implemented. It is hoped that the enclosed report will be useful feedback since it provides the views of someone who is both independent and external to each project.
The report is written in a manner specified by the Center so that information would be standardized. Each report author has personally reviewed, corrected, and edited the information now being presented to you. The Center has not exercised any editorship.
However, the Center would appreciate feedback concerning your perceptions of the report's merit. A brief checklist has been provided for your systematic assessment. The information sent on that checklist will be of great value for reviewing our procedures for future studies.
Please consider reading through the report once for an overview. Then, review it again for content. Enclosing your rating in the return envelope will provide an efficient and standardized form of report assessment. If after mailing the rating you feel it would be appropriate to respond in writing, elaborating on any strength or weakness of the report, please feel free to do so. However, please consider October 30 as a deadline for that supplemental written response.
Please return the enclosed rating sheet by October 22. If there are any questions, please feel free to call (616) 383-8166.
Thank you very much for your assistance in this matter.
Sincerely,
John W. Evers, Director Hill Productivity Project
JWE:lje
Instructions for Rating Sheet
In responding, please make your judgments on the basis of what
this report means to you. On the following pages, you will find a
restatement of the general concept and beneath it a set of descriptive
scales. You are to rate the report on each of these scales in
order.
Here is how you are to use these scales:
If you feel that one dimension of the enclosed report is very
closely related to one end of a descriptive scale, you should place
your mark as follows:
fair  X :___:___:___:___:___:___ unfair
                   or
fair ___:___:___:___:___:___: X  unfair
If the dimension is quite closely related, or usually related,
to one or the other end of the scale (but not extremely) you should
place your mark as follows:
strong ___: X :___:___:___:___:___ weak
                   or
strong ___:___:___:___:___: X :___ weak
If the dimension seems only slightly related to one side as
opposed to the other side (but is not really neutral) then you
should mark as follows:
active ___:___: X :___:___:___:___ passive
                   or
active ___:___:___:___: X :___:___ passive
The direction toward which you check, of course, depends upon
which of the two ends of the scale seem most characteristic of the
dimension you are judging. If you consider a dimension to be
neutral on the scale, both sides of the scale equally associated
with the concept, or if the scale is completely irrelevant,
unrelated to the report, then you should place your mark in the
middle space:
safe ___:___:___: X :___:___:___ dangerous
IMPORTANT: (1) Place your marks in the middle of the spaces, not
on the boundaries:
THIS: ___: X :___        NOT THIS: ___ X:___
(2) Be sure you check every scale for the report— do
not omit any.
(3) Never put more than one mark on a single scale.
Sometimes you may feel as though you have seen the same item
before. This will not be the case, so do not look back and forth
through the items. Do not try to remember how you checked earlier,
similar items. Make each item a separate and independent judgment.
Work at a fairly high speed. Do not worry or puzzle over individual
items. It is your first impressions, the immediate "feelings” about
the items, that are wanted. On the other hand, please do not be
careless, because we want your true impressions.
Please continue, and begin marking.
This report is . . .

1. active ___:___:___:___:___:___:___ passive
2. illogical ___:___:___:___:___:___:___ logical
3. rash ___:___:___:___:___:___:___ cautious
4. direct ___:___:___:___:___:___:___ circuitous
5. initial ___:___:___:___:___:___:___ final
6. interesting ___:___:___:___:___:___:___ boring
7. ungeneralizable ___:___:___:___:___:___:___ generalizable
8. false ___:___:___:___:___:___:___ true
9. overt ___:___:___:___:___:___:___ covert
10. inconsistent ___:___:___:___:___:___:___ consistent
11. good ___:___:___:___:___:___:___ bad
12. timely ___:___:___:___:___:___:___ untimely
13. unimportant ___:___:___:___:___:___:___ important
14. objective ___:___:___:___:___:___:___ biased
15. undescriptive ___:___:___:___:___:___:___ descriptive
16. positive ___:___:___:___:___:___:___ negative
17. narrow ___:___:___:___:___:___:___ wide
18. unemotional ___:___:___:___:___:___:___ emotional
19. intuitive ___:___:___:___:___:___:___ rational
This report is . . .

20. deliberate ___:___:___:___:___:___:___ impulsive
21. risky ___:___:___:___:___:___:___ certain
22. subjective ___:___:___:___:___:___:___ objective
23. anonymous ___:___:___:___:___:___:___ identified
24. organized ___:___:___:___:___:___:___ unorganized
25. relevant ___:___:___:___:___:___:___ irrelevant
26. limited ___:___:___:___:___:___:___ diffused
27. credible ___:___:___:___:___:___:___ unbelievable
28. unreliable ___:___:___:___:___:___:___ reliable
29. judgmental ___:___:___:___:___:___:___ unjudgmental
30. harmonious ___:___:___:___:___:___:___ dissonant
31. incomplete ___:___:___:___:___:___:___ complete
32. expensive ___:___:___:___:___:___:___ cheap
33. organized ___:___:___:___:___:___:___ unorganized
34. inferior ___:___:___:___:___:___:___ superior
35. friendly ___:___:___:___:___:___:___ unfriendly
36. unwarranted ___:___:___:___:___:___:___ warranted
37. wise ___:___:___:___:___:___:___ foolish
38. therapeutic ___:___:___:___:___:___:___ toxic
39. soft ___:___:___:___:___:___:___ hard
This report is . . .

40. strong ___:___:___:___:___:___:___ weak
41. severe ___:___:___:___:___:___:___ lenient
42. deep ___:___:___:___:___:___:___ shallow
43. ignorant ___:___:___:___:___:___:___ scholarly
44. motivated ___:___:___:___:___:___:___ aimless
45. temperate ___:___:___:___:___:___:___ intemperate
46. insensitive ___:___:___:___:___:___:___ sensitive
47. sophisticated ___:___:___:___:___:___:___ naive
48. useful ___:___:___:___:___:___:___ useless
49. tense ___:___:___:___:___:___:___ relaxed

Thank you for your time.
BIBLIOGRAPHY
Alkin, M.C, Evaluation theory development. Evaluation Comment, 1969, 2̂ 2-7.
Brinton, J.E. Deriving an attitude scale from semantic differential data. Public Opinion Quarterly, 1962, 25_, 499-501.
Bloom, B.S., Hasting, J.T., and Madaus, G.F. Handbook on formativeand summative evaluation of student learning. New York: McGraw-Hill, 1971.
Campbell, D.T. and Stanley, J.C. Experimental and quasi-experimental designs for research on teaching. In N.J. Gage (Ed.), Handbook for research on teaching. Chicago: Rand McNally, 1963.
Clark, D. Cecil. A prescriptive model of development or evaluation:Some needed maturity. Northwest Regional Laboratory Paper Series No. 8, 1975.
Cronbach, L.J. Course improvement through evaluation. Teachers College Record, 1963, 64, 672-683.
Glass, G.V. The growth of evaluation methodology (Paper No. 27). Boulder, Colorado: Laboratory of Educational Research, University of Colorado, 1969.
Glass, G.V. and Worthen, B.R. Educational evaluation and research: Similarities and differences. In J. Weiss (Ed.), Curriculum Theory Network monograph supplement: Curriculum evaluation: Potential and reality. Toronto: Ontario Institute for Studies in Education, 1972.
Guba, E.G. The failure of educational evaluation. In C.H. Weiss (Ed.), Evaluating action programs. Boston: Allyn and Bacon, 1973.
Hammond, R.L. Evaluation at the local level. Tucson: EPIC Evaluation Center, 1967.
Hays, W.L. Statistics. New York: Holt, Rinehart and Winston, 1963.
House, E.R. and Hogben, D.L. A goal-free evaluation for me and my environment. Champaign-Urbana: Center for Instructional Research and Curriculum Evaluation (undated mimeo).
House, E. R. Confessions of a responsive goal-free evaluator. Champaign-Urbana: CIRCE, 1974 (mimeo).
Jacobs, J.A. A model for program development and evaluation. Theory Into Practice, February 1974, 13, 360-364.
Kerlinger, F.N. Foundations of behavioral research (Second Edition). New York: Holt, Rinehart and Winston, 1973.
Kirk, R.E. Experimental design: Procedures for the behavioral sciences. Belmont: Brooks/Cole, 1968.
Leedy, P.D. Practical research: Planning and designing. New York: Macmillan, 1974.
Merriman, H.O. Evaluation in a school setting: Function, organization and operation, 1971 (in press).
Messick, S.J. Metric properties of the semantic differential. Educational and Psychological Measurement, 1957, 17, 251-256.
Metfessel, N.S. and Michael, W.B. A paradigm involving multiple criterion measures for the evaluation of the effectiveness of school programs. Educational and Psychological Measurement, 1967, 27, 931-943.
O'Keefe, K.G. Methodology for educational field studies. Ph.D. Dissertation, Ohio State University, 1968.
Osgood, C.E., Suci, G.J., and Tannenbaum, P.H. The measurement of meaning. Champaign-Urbana: University of Illinois Press, 1957.
Platt, J.R. Strong inference. In H.S. Broudy, R.H. Ennis, and L.I. Krimerman (Eds.), Philosophy of educational research.New York: John Wiley, 1973.
Popham, W.J. Results rather than rhetoric. Evaluation Comment, 1972, 3, 2-5.
Popham, W.J. Educational evaluation. Englewood Cliffs: Prentice-Hall, 1975.
Provus, M.S. Discrepancy evaluation. Berkeley: McCutchan, 1971.
Scriven, M.S. A possible distinction between scientific disciplines and the study of human behavior. In H. Feigl and M.S. Scriven (Eds.), Minnesota studies in the philosophy of science. Minneapolis: University of Minnesota Press, 1956.
Scriven, M.S. Causes, connections and conditions in history. In W.H. Dray (Ed.), Philosophical analysis and history. New York: Harper and Row, 1966a.
Scriven, M.S. Value claims in the social sciences. Publication #123 of the Social Science Education Consortium. Indiana University, 1966b.
Scriven, M.S. The methodology of evaluation. In R.E. Stake (Ed.), Curriculum evaluation (AERA monograph series no. 1). Chicago: Rand McNally, 1967.
Scriven, M.S. Evaluation skills (AERA tape series no. 6B). Washington: American Educational Research Association, 1971a.
Scriven, M.S. Roy G. Biv Papers. Unpublished Correspondence, 1971b.
Scriven, M.S. Prose and cons about goal-free evaluation. Evaluation Comment, 1972, 3, 2-5.
Scriven, M.S. Goal-free evaluation. In E.R. House (Ed.), School evaluation: The politics and process. Berkeley: McCutchan, 1973.
Scriven, M.S. Discussion with educational R and D evaluators. In H. Poyner (Ed.), Problems and potentials of educational R and D evaluation. Austin: Educational Systems Associates, 1974b.
Scriven, M.S. The concept of evaluation. In M.W. Apple, M.J. Subkoviak, and H.J. Lufler, Jr. (Eds.), Educational evaluation: Analysis and responsibility. Berkeley: McCutchan, 1974c.
Scriven, M.S. Exploring goal-free evaluation: An interview withMichael Scriven. Evaluation, 1974d, 2, 23-25.
Scriven, M.S. "Evaluation of Southwest Education Development Laboratory's Children's Folklore Project: Pass It On" (Unpublished Report)Berkeley: 1975a.
Scriven, M.S. Evaluation bias and its control. Kalamazoo, Michigan:The Evaluation Center Occasional Paper Series, 1975b.
Scriven, M.S. and Roth, J. Evaluation Thesaurus. Pt. Reyes, CA: Edgepress, 1977.
Snider, J.G. and Osgood, C.E. Semantic differential technique: Asourcebook. Chicago: Aldine, 1969.
Stake, R.E. The countenance of educational evaluation. Teachers College Record, 1967, 68, 523-540.
Stake, R.E. Responsive evaluation. Paper presented at the 1974 Annual Meeting of the American Educational Research Association.
Stake, R.E. and Denny, T. Needed concepts and techniques for utilizing more fully the potential of evaluation. Educational evaluation: New roles, new means (Part 2). Chicago: University of Chicago Press, 1969.
Stufflebeam, D.L. A depth study of the evaluation requirement. Theory Into Practice, June 1966, 5, 121-133.
Stufflebeam, D.L. Evaluation as enlightenment for decision-making. Columbus, Ohio: The Evaluation Center, 1968.
Stufflebeam, D.L.; Foley, W.J.; Gephart, W.J.; Guba, E.G.; Hammond, R.L.; Merriman, H.O.; and Provus, M.L. Educational evaluation and decision making. Itasca, Illinois: F.E. Peacock, 1971.
Stufflebeam, D.L. Should or can evaluation be goal-free? Evaluation Comment, 1972, 3, 2-5.
Stufflebeam, D.L. and Scriven, M.S. (transcribed tape). AERA traveling institute on evaluation. Tampa, Florida, 1973.
Stufflebeam, D.L. Toward a technology for evaluating evaluation. Paper presented at the 1974 Annual Meeting of the American Educational Research Association.
Stufflebeam, D.L. Training materials for SHASDA workshop. Kalamazoo, Michigan: The Evaluation Center, 1975.
Tyler, R.W. General statement on evaluation. Journal of Educational Research, 1942, 35, 492-501.
Tyler, R.W. Basic principles of curriculum and instruction. Chicago: University of Chicago Press, 1950.
Tyler, R.W. Changing concepts of educational evaluation. In R.E. Stake (Ed.), Curriculum evaluation (AERA monograph series no. 1). Chicago: Rand McNally, 1967.
Tyler, R.W. Ralph Tyler discusses behavioral objectives. Today's Education, 1973, 6, 309-311.
Walker, J.P. Installing an evaluation capability in an educational setting: Barriers and caveats. Paper presented at the 1972 Annual Meeting of the American Educational Research Association.
Webster, W.J. Some considerations in developing useful evaluation designs. Paper presented at the Minnesota Educational Program Audit Conference, 1971.
Welch, W.W. "Goal Free Evaluation Report for St. Mary's Junior College" (Unpublished report). Minneapolis: 1976.
Welch, W.W. and Hambleton, R.K. On the use of goals in evaluation: A study of selected issues. Paper presented at the 1975 Annual Meeting of the American Educational Research Association.
Winer, B.J. Statistical principles in experimental design. New York:Holt, Rinehart and Winston, 1973.
Wittrock, M.C. and Wiley, D.E. The evaluation of instruction: Issuesand problems. New York: Holt, Rinehart and Winston, 1970.