Recommended Citation: Evers, John W., "A Field Study of Goal-Based and Goal-Free Evaluation Techniques" (1980). Dissertations. 2645. https://scholarworks.wmich.edu/dissertations/2645
A FIELD STUDY OF GOAL-BASED AND GOAL-FREE EVALUATION TECHNIQUES
by
John W. Evers
A Dissertation Submitted to the
Faculty of the Graduate College in partial fulfillment
of the requirements for the Degree of Doctor of Education
Department of Educational Leadership
Western Michigan University
Kalamazoo, Michigan
April 1980
A FIELD STUDY OF GOAL-BASED AND GOAL-FREE EVALUATION TECHNIQUES
John W. Evers, Ed.D.
Western Michigan University, 1980
Educational evaluation theorists propose two methodologies that
could be used to assess a project or program’s achievements. One
method is well established and is called goal-based evaluation.
The other has been proposed recently and is called goal-free
evaluation. Little information exists about goal-free evaluation
in field settings. The problem to be addressed was:
What would be the results of a field study that
compared the relative utility of operationalized
versions of goal-free and goal-based evaluation
techniques?
The perspective selected to investigate this problem was
evaluator/evaluee interactions. This study had two
objectives:
1. to develop materials and procedures for using the
two techniques in an evaluation study, and
2. to investigate the relative utility of the two techniques
through a field exploration of the evaluator/evaluee
relationship.
The techniques were operationalized through handbooks that
incorporated checklists. Subjects were randomly
selected from recommendations by nationally recognized evaluators
and were randomly assigned to training in one of the two techniques.
Projects to be reviewed were randomly assigned to evaluators.
Three instruments were developed: two of a Likert type and
one as a semantic differential. The two Likert instruments were
used to assess the following elements of the on-site evaluation
process:
1. evaluator/project director rapport
2. evaluators' time utilization
3. evaluator/project director expectations of each other
4. evaluator/project director overall satisfaction
5. evaluator confidence with the methodology.
The third instrument went through a developmental process to
establish reliability, and was used by the evaluators who were
trained in one of the two techniques. Evaluator ratings of the
on-site process were analyzed with a repeated measures ANOVA.
Project director ratings of the on-site process and of the
utility of the evaluation reports were analyzed through use of a
completely randomized hierarchical design.
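A minimal sketch of the repeated measures analysis described above is given below in Python; the column names, ratings, and group sizes are invented for illustration, and the goal-free/goal-based group comparison would be run separately, as in the hierarchical design just noted.

```python
# A minimal sketch, assuming long-form data; the column names,
# ratings, and group sizes here are invented for illustration.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per evaluator per rated element of the on-site process.
ratings = pd.DataFrame({
    "evaluator": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "element":   ["rapport", "time_use", "satisfaction"] * 4,
    "group":     ["GFE"] * 6 + ["GBE"] * 6,  # training condition
    "rating":    [4, 3, 4, 3, 3, 3, 5, 4, 5, 4, 5, 4],
})

# Repeated measures ANOVA over the within-subject factor only;
# AnovaRM does not take between-subject factors, so the GFE/GBE
# comparison would be handled by a separate between-groups analysis.
result = AnovaRM(ratings, depvar="rating", subject="evaluator",
                 within=["element"]).fit()
print(result)
```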
Findings supported some of the proposed differences between
the goal-free and goal-based techniques:
1. the two groups reported different patterns of
activities while on-site,
2. the goal-free evaluators rated themselves lower than
the goal-based evaluators in elements of the on-site process,
3. evaluee ratings of the on-site process did not differ
significantly, and
4. evaluee ratings of the reports produced from the two
techniques did not differ significantly.
ACKNOWLEDGEMENTS
The development of this dissertation has benefitted from
both the advice and criticism of Professors Daniel Stufflebeam,
Jim Sanders, John Sandberg, and Mary Ann Bunda. I would also
like to recognize the challenges and opportunities provided by
the Evaluation Center while at Ohio State and at Western
Michigan University.
John W. Evers
TABLE OF CONTENTS
ACKNOWLEDGEMENTS

LIST OF TABLES

CHAPTER

I. THE PROBLEM AND ITS BACKGROUND
    Context of the Field Study
    Overview of Goals as Evaluation Criteria

II. GOAL-FREE EVALUATION AS DEVELOPED IN THE LITERATURE
    Background
    Philosophical Tenets
    Implied Procedure
    Application

III. METHODOLOGY
    Subject Selection and Assignment
    Training
    Instrument Development
    Data Collection Procedures
    Data Analysis Procedures

IV. FINDINGS AND DISCUSSION

APPENDICES

BIBLIOGRAPHY
LIST OF TABLES
Sources of Information and Their Use With Goal-Based and Goal-Free Evaluation

Composite Background of Subjects By Groups

Summary of Information Reported On Activity Logs

Repeated Measures ANOVA Analysis of Evaluator Process Ratings

Item Means and Standard Deviations From Evaluator Process Rating Instrument

Completely Randomized Hierarchical ANOVA Analysis of Project Director Process Ratings

Completely Randomized Hierarchical ANOVA Analysis of Project Director Ratings of Report Utility
CHAPTER 1
THE PROBLEM AND ITS BACKGROUND
"Educational evaluation is in the air. Indeed, demands for edu
cational evaluation are so prevalent that most educators must believe
they are living in an evaluation generation" (Popham, 1975, p. 1).
Most of those demands began with the passage of the Elementary and
Secondary Education Act (ESEA) of 1965. It mandated evaluations of
Federally-funded "Title" programs. The House and Senate revisions of a bill to reauthorize ESEA in 1978 retained provisions for evaluation activities (Report on Educational Research, June 14,
1978). Even though evaluations occurred (whether mandated or
volunteered) some developers of evaluation theory and practice
have reported methodological concerns.
For instance, Stufflebeam et al. (1971) reported in an overview of the field of educational evaluation that "descriptions of evaluation methodology are lacking" (p. 336). Along the same
line, Worthen and Sanders (1973) more recently reported that "there is
little or no data-based information about the relative efficacy of
alternative evaluation plans or techniques" (p. 334). In general,
evaluation has been viewed in the literature as an underdeveloped
process even though it had been an on-going, mandated activity through
Federal requirements since 1965.
Several evaluation techniques have been developed, tested, and reported. However, other evaluation techniques have been reported but not fully developed and field tested (e.g., the Goal-free Technique:
Scriven, 1972, and Modus Operandi Analysis: Scriven, 1974). There
has been a need for further development and testing of these evaluation
techniques. Therefore, studies to operationalize the recently proposed
evaluation techniques and to explore their efficacy or utility in
a field setting are worthwhile.
In this study, the goal-free evaluation technique (Scriven, 1972)
was operationalized and field-tested. It was compared to a more
classical technique called goal-based evaluation (Tyler, 1942).
The problem to be addressed was: what would be the results of a
field study that compared the relative utility of operationalized
versions of goal-free and goal-based evaluation techniques? The
particular perspective selected to investigate this problem was the
evaluator/evaluee interaction. This interaction was one important
perspective when considering the relative utility of evaluation
techniques. In other words, an attempt was made both to develop
evaluation materials and to study the effects of those materials on
the evaluator/evaluee interaction process. It was assumed that a
"naturalistic" approach (Guba, 1978) would provide a worthwhile
perspective for considering evaluator/evaluee interactions.
This study had two objectives:
(1) To develop materials and procedures for using the goal-free
and goal-based approaches in an evaluation, and
(2) To investigate the relative utility of these goal-based and
goal-free techniques of evaluation through a field exploration of the
evaluator/evaluee relationship.
The following investigatory questions were derived from the two
purposes of the study:
(1) What materials and procedures are needed to operationalize
the goal-free and goal-based evaluation approaches?
(2) When the materials and procedures of these two approaches
are field tested, will the evaluators rate the evaluation process
differently depending on which approach they are using?
(3) When the materials and procedures of these two approaches
are field tested, will the evaluees rate the evaluation process
differently depending on which approach is being applied?
(4) When the materials and procedures of these two approaches
are field tested, will the evaluees rate the evaluation reports
differently depending on which approach was used?
One limitation of this study concerned the fourth investigatory
question. Each report only received a rating by the evaluee whose
project was reviewed. It would have been desirable to obtain multiple
ratings of each report, and to compare reports based on the two
approaches in relation to their theoretical assumptions and content.
The next section presents an overview of the context of the study
that was designed to deal with this problem.
Context of the Field Study
The Evaluation Center at Western Michigan University contracted
with the Hill Family Foundation of St. Paul, Minnesota to review
several projects in the Midwest that were being sponsored by this
Foundation. Certain elements of this contracted evaluation lent themselves
to a field-based investigation of the two evaluation approaches.
To clarify these elements an overview is presented.
The overall contracted evaluation had two parts. One part was
more extensively developed and dealt with an external review of various projects at 16 four-year, independent colleges throughout the
Midwest and Northwest. The second part involved in-service training in
evaluation for the college project and Foundation staff. The part of
the evaluation project that dealt with an assessment of the projects
was expanded to include a field study that compared the goal-free and
goal-based evaluation techniques.
Early in the overall contracted study an orientation session was
held between representatives of the projects, the Foundation, and the
Center. The purposes of the session were to allow: the Foundation officials to explain the purpose of the evaluation to project representatives, the project representatives (the evaluees) to give a general progress report about the early stages of their projects, the representatives of the Center to meet all parties and explain the design of the evaluation study, and the Foundation staff and project representatives an opportunity to critique and influence later implementation of that design. About six weeks later, preliminary baseline
information about goal-setting procedures and the strategies chosen to
achieve goals was collected, by means of a mailed questionnaire, from
each project. These two steps preceded the investigation of the two
evaluation approaches and provided information that was later used with
the evaluators who participated in the field study.
In earlier evaluation work by the Center, it was found that one
evaluator preceding a panel of experts could greatly expedite their
work by gathering much information and reporting preliminary hypotheses
about project achievement for their further investigation. The evaluators who made these preliminary site visitations were called "traveling observers." In this study, evaluation strategies used by these traveling observers were structured as either goal-based or goal-free. This
structuring provided the basis for this study's field-based comparison
of goal-free and goal-based evaluation techniques.
Information from traveling observer visitations was developed into
two sets of reports. One report was developed first for the review
panelists who were to follow later. Each report was structured in accordance with the approach followed in its development, i.e., goal-free or goal-based. The report was then reviewed and edited by the
traveling observer into another report to be presented to the project
director whose project had been assessed. At a later point in the
study all reports were sent to the Foundation.
Only the traveling observer portion of the overall contracted
evaluation was used to investigate the effects of the goal-free and
goal-based strategies. No other elements of the contracted study were
involved in this field investigation of evaluation techniques.
Overview of Goals as Evaluation Criteria
The primary variable investigated in this study was the goals of
an enterprise being evaluated. That is, what difference does it make
if an evaluator does or does not consider a project's goals when eval
uating it?
Stake (1974) referred to the approach that concentrated on a project's goals as preordinate evaluation. Preordinate refers to the use of prespecified goal statements as a blueprint for reviewing achievements. Congruence refers to the use of prespecified goal statements and their match to terminal outcomes. Or, in other words, to what degree
do outcomes match intentions? Mismatches between outcomes and
intentions are reported as discrepancies, or non-achievement.
Stake (1974) and Clark (1975) reported that many evaluators presume that a comparison of achievement to some prespecified goal statements is an essential part of an evaluation plan. Evaluations that subscribe to this approach use an intrinsic set of criteria to assess
achievement. Intrinsic assessment criteria are found by examining an
educational object to discover original intentions or specifications.
Once criteria are established by looking "within" the object, they
are used as the basis for judging a present state of achievement.
In 1977, Scriven provided the following definition of this goal-
based or preordinate approach to educational evaluation:
This type of evaluation is based and focused on knowledge of the goals and objectives of the program, person or product. A goal-based evaluation does not question the merit of goals; often does not look at cost-effectiveness; often fails to search for or locate the appropriate critical competitors; often does not search for side-effects; in short, often does not include a number of important and necessary components of an evaluation. Even if it does include these components, they are referenced to the program (or personal) goals and hence run into serious problems such as identifying these goals, handling inconsistencies in them and changes in them over time, dealing with shortfall and overrun results and avoiding the perceptual bias of knowing about them. (p. 13)
Cronbach (1963), Scriven (1967), and Stake (1967) focused on the
limitations associated with using prespecified goals and objectives as
criteria to assess achievement. Cronbach's position was that evaluation
calls for a description of outcomes and side-effects on the broadest
possible scale even at the sacrifice of superficial fairness and precision. He added that observed outcomes should range far beyond the actual curriculum content. This early position was reflected in
Stake's and Scriven's later writings. For example, Stake reported that
assessing the congruence between outcome achievement and prestated objectives is only part of the evaluation process. He explained that an
evaluator must also search for side-effects and incidental gains rather
than narrowly report goal achievement.
Scriven (1967) stated this position more strongly. He said that
evaluation includes as an equal partner with a measure of performance
against goals, a procedure for the evaluation of the goals. That is,
if the goals are not worth achieving, then it is unimportant to see
how well they were achieved. Scriven explained that it is more important to ask How good is the course? rather than Did the course
achieve its goals? Finally, as a premonition of ideas yet to come,
Scriven said succinctly that an evaluation should see what the project
does, and not bother with whether it had good intentions (p. 60).
Goals are considered necessary for management and planning, but
not for evaluation, Scriven (1972) later reported. The evaluator
who is ignorant of the espoused intentions of the project (stated
goals) avoids a perceptual set that more often than not biases
judgments of real achievements. By ignoring espoused intentions
(stated goals), Scriven said that the evaluator would have a greater
chance of assessing the real effects of the project. This approach was
called goal-free evaluation (GFE) by Scriven because merit was determined independently (free) of the intended goals.
Scriven (1977) defined goal-free evaluation as follows:
In this type of evaluation, the evaluator(s) is not told about the purpose of the program but enters into the evaluation with the purpose of finding out what the program actually is doing without detailed cueing as to what it is trying to do. If the program is doing what its stated goals and objectives say, then these achievements should show up; if not, it is argued, they are irrelevant. Merit is determined by relating program achievements to the needs of the impacted population, rather than to the program (i.e., agency or citizenry or congressional or manager's) goals. It could thus be called "needs-based evaluation" or "consumer-oriented evaluation" by contrast with goal-based or manager-oriented evaluation. It does not substitute the evaluator's goals for the program's goals, nor the goals of the consumer. GFE is generally disliked by both managers/administrators and evaluators, for fairly obvious reasons. It is said to be less intrusive than GBE, better at finding side-effects and less prone to bias. (p. 13)
Scriven proposed that achievement can be assessed by a comparison
to an extrinsic set of criteria. Extrinsic, goal-free criteria
are found by looking "outside" the object to be evaluated and
by purposefully bypassing the potentially biasing effects of
intrinsic, or goal-based, criteria. Examples of extrinsic
achievement criteria are the demonstrated needs of a target population, educational project, program, or object.
As might be expected, the goal-free approach caused considerable
discussion. For instance, Stufflebeam (1972) commented on the
overall merit of the GFE approach by saying that the strategy was
potentially useful, but far from operational and replicable.
Because of its promise, Stufflebeam believed that Scriven and
others should further develop and test it, and report the effects of
GFE, whatever they turned out to be. Concurrently, Popham (1972) reviewed
the possibilities of GFE and reported that although the strategy was
alluringly portrayed by Scriven, he would have to wait until GFE was
tried in real evaluation settings to see its effects. Except for an
example by House and Hogben (1972), the question of the effects of
goal-related information on the evaluation process was the subject for
intellectual debate but not practical investigation.
In summary, this study had two objectives:
(1) To develop materials and procedures for using the goal-free
and goal-based approaches in an evaluation, and
(2) To investigate the relative efficacy of these goal-based and
goal-free techniques of evaluation through a field exploration of the
evaluator/evaluee relationship.
As an overview of the rest of the dissertation, Chapter 2 contains a
review of the development of the two evaluation techniques with an
emphasis on the goal-free approach since it is the least well known.
The techniques are compared through consideration of background,
philosophical tenets, implied procedure, and applications. The
third chapter describes subject selection, training, instrument
development, and data collection procedures. Chapter 4 presents
findings, a discussion of implications, and recommendations for
future research.
CHAPTER 2
GOAL-FREE EVALUATION AS DEVELOPED IN THE LITERATURE
In conducting this study it was important to consider some caveats
in generalizing about differences in evaluation techniques. Evaluation
theory and its operational techniques were not a settled issue. GFE
and GBE were evolving. This made it difficult to generalize about
specific differences at a point in time. The risk in generalizing
about underlying differences was that one might mistakenly assume that
an evaluator employing one approach or the other was acting in
accordance with a standard protocol. That, realistically, was not
the case in this study. There wasn't, and still isn't, any standard
protocol for doing goal-free evaluation. The early literature which
led up to GFE was mainly philosophical in nature and contained little
operational guidance.
For example, Scriven’s early (1967) writings presented general
thoughts about judging, goals, and assessing actual rather than intended
effects. His later (1971a) publications foreshadowed usage of the goal-free approach by discussing the concept of "effectiveness" within a
five step evaluation process. Effectiveness was presented as information
about treatment effects that were not restricted to the espoused
goals and included impact in unstated directions. This information
was then to be rated by the evaluator as good or bad. Scriven also
presented the position that the foundation stone of professional ethics
for evaluators was that they should see themselves as "enlightened
surrogate consumers" who are concerned for the welfare of society
as a whole, not just the target group of a producer.
More recent writings (1975) presented applications of the goal-free
technique. The following sections were organized through a discussion
of background, philosophical tenets, procedures, and applications of the
goal-free technique. Discussion of the goal-based technique was included where necessary.
Background
Scriven (1974, p. 34ff) reported that individuals were often put
in the position of external, summative, product evaluators. That is,
an evaluator who is not on the original staff is hired to make
terminal judgments about the achievements of some product. These
evaluators are too often confronted with rhetoric from the original
proposal as evidence to support the excellence of the product. This rhetoric is used as a substitute for lack of evidence about actual
product effects. Scriven advanced the position that this rhetoric of
intention most probably affects the way in which the final product
is evaluated even when the evaluator goes out and obtains data on
effectiveness.
According to Scriven, the rhetoric of the proposal is often
couched in cliches, faddism, and jargon and serves a primary purpose
of obtaining project funds. Scriven (1974) concluded that reading
through the goal rhetoric contributed nothing to the evaluation process.
Scriven went on to say that it in fact produces a negative effect, and
that reading the intentions of the producer tends to develop a perceptual set, or tunnel vision, for the evaluation. Following a blueprint
from intention to end product creates a situation where the evaluator
tends to look less hard for side-effects of the product. Looking in
the direction of announced effects develops a situation that limits potential assessment of a broad spectrum of actual effects, either negative or positive. Since evaluation for Scriven was primarily concerned
with assessing all actual effects, rather than only intended effects,
he proposed that evaluators neither read, nor be aware of the producer's
intentions or goals.
Philosophical Tenets
To Scriven, the producer's goals and intentions were potentially
contaminating. He emphasized that goals were necessary for project management and development, but not for project evaluation. As further
evidence for the argument against the evaluator needing to be aware of
goals, Scriven (1974, p. 37) described the problem of trying to sort
out alleged goals from actual goals. He pointed out that it was not
the evaluator's responsibility to become entangled in the problem of
sorting out goals. As mentioned previously, it was Scriven’s position
that the evaluator's responsibility was to serve as an enlightened
surrogate consumer concerned with the welfare of society as a whole.
Another problem is trying to understand goals that are so
vaguely stated that they could cover any positive or negative effect.
Scriven emphasized that it was not the evaluator's responsibility to
uncover what was really intended. Goals are usually stated in either grandiose or short-sighted terminology. Rather than report a project
fell short or overachieved, Scriven explained that it was more appropriate to assess what was actually achieved, and to give credit for performance rather than promise. It is assumed that this type of technique would help practitioners assess whether programs actually work.
The true strength of the GFE approach was claimed by Scriven to be
in reporting effects that had been previously overlooked. If an intended
effect is not strong enough to be detected by the goal-free evaluator,
then it is not important enough to be reported. According to him, the
goal-free approach lets the chips fall where they may, and actually
fosters discovery and invention by giving credit for good things that
were not stated as goals.
A major question, however, concerns what replaces goals as a
standard for assessing achievement. Stufflebeam (1972), Alkin (1972),
and Kneller (1972) pointed out that they doubted that the goal-free
approach was actually free of goals as the name suggested. Each discussed various ways that the criteria used in the goal-free technique could be considered as goals. Perhaps Scriven's approach would have
received less criticism on this point if he had labelled it goal-blind
rather than goal-free.
Responding to these critiques, Scriven (1974, p. 37) pointed out
that the evaluator was not free to substitute personal goals for those
intended by the project. He considered it an error to believe that
criteria have to be either the goals of the evaluator or the evaluee.
However, the criteria that were used may be (and Scriven pointed out
that they probably were) somebody's goals. Scriven suggested that the
GFE'r use criteria that were similar to those of the consumer, of the
funding agency, or interestingly enough, the goals of the producer if
they went through a validating and judging process by the evaluator
(more on this point follows).
Implied Procedure
Scriven (1973, 1974) mentioned that a primary design consideration
was that someone other than the goal-free evaluator screen initial information sent from the project or funding agency to delete goal-related
information. He also suggested that this intermediary accompany the
GFE'r on-site as a liaison or buffer between the project staff and the
evaluator. This individual is to act as a check against early exposure to prejudicial information while the evaluator is developing a "mind set" about effects. Again, what is essential is to work very hard in the initial stages of the evaluation process to keep prejudicial
information about what the treatment intends to do from the attention
of the evaluator.
The goal-free approach, according to Scriven (1974, p. 37) was
unaffected by a project changing goals during its developmental process.
It therefore did not present a rigid evaluation design that required
an unchanging treatment to assess actual project effects. Interestingly
enough, Scriven (1972, 1973, 1974, and personal correspondence) recommended that GFE can be used in conjunction with GBE, with different individuals each using a different technique (one goal-based and one goal-free evaluator). Another possibility is to employ one individual who starts goal-free. Then, as he is exposed to goal rhetoric, he is converted to a more goal-based operation. The techniques are not interchangeable in sequence, however. Once a goal-based approach has been implemented, it cannot be converted to a goal-free approach. On the other hand, once the initial stages of the evaluation process have operated with the goal-free approach to gain the benefits of establishing
an objective set of hypotheses about the object, the GFE approach can
be reversed to a GBE approach. Scriven (1974) reported that this combining of techniques develops an optimum situation where one can
"have his cake and eat it too." That is, both goal-based and goal-free
criteria can be used in the assessment of merit.
An additional difference that Scriven (1973, 1974) anticipated was
the degree of client, or project staff, anxiety produced by the two approaches. Scriven implied that the relationship between the goal-based
evaluator and the project staff is one of cozy cooperation. That is,
the evaluator and project management are relatively clear on the standards used and the variables to be considered. It can be anticipated that the project staff would be more than willing to discuss their intentions with the evaluator so that he would better "understand" their
products. However, as might be expected, the situation of an evaluator
who does not talk to the staff until some time late in the site visitation
and who may have even directed project materials, correspondence, and
the orientation session to a liaison person for review, is seen to
evoke anxiety on the part of the project management. This anxiety
supposedly results from not knowing what effects were being considered
and what standards were used to assess them.
Based on these published discussions of the GFE approach, it was
possible to summarize potential operational differences between it and
GBE.
The goal-based evaluator reviews any and all project documents to
establish the intended outcomes. This involves an orientation session
by project management when arriving on-site. Products and treatments
are reviewed for congruence with original specifications. The final
GBE evaluation report contains some side-effect information and may
even provide judgments of the producer's goals, but generally assesses
congruence between intent and achievement. Standards (intrinsic
criteria) are biased by exposure to the rhetoric of intention prior to
the review of treatment and products.
On the other hand, the goal-free evaluator reviews general materials that were screened for goal rhetoric prior to arrival on-site.
Information that is appropriate for the GFE'r to review includes
sample materials, observer descriptions of the process, nature of control groups, time constraints, results of quizzes, and so forth.
Scriven explained that the primary concern of this initial screening
strategy is to establish conditions where the evaluator initially
inferred a wide range of possible effects without being oriented to those
that were intended. This is a reflective stage of generating a set of
potential hypothetic-deductive or "if-then" statements about potential
elements of the treatment before observing the actual treatment.
At least two techniques could be employed either independently
or in combination to generate these hypotheses. In the
first the evaluator relies on professional judgment. This approach
presents problems of validity and objectivity, but it has the advantage
of exploiting professional expertise and using a large set of variables.
The second alternative — relying on an identification of the established
needs of the target population or consumer — has the advantage of assessing merit in terms of a project's service to people. But needs data often are not available and their collection requires substantial manpower,
time, and money. Assuming that the GFE'r can somehow generate
hypotheses, he next sets out to gather relevant data. After spending
a reasonable amount of time observing the object to be evaluated, the
GFE’r is supposed to present the initial hypotheses and observations to
the project staff. It is then appropriate to assess the importance of
each effect. According to Scriven this is done in a number of ways. Needs assessment data are called up or collected de novo. The project's
goals are referenced, or the evaluator gets a variety of experts and
lay persons to judge the effects. It may be the extreme case that the
producer is diligent in using the needs assessment when setting production goals, and all observed effects can be assessed by reference to the initial goals. It is possible in this case to use these validated and judged goals of the producer as criteria for the evaluation.
The point to be considered as crucial is not that some goals are
eventually accepted as criteria, but that the process used to arrive at
those criteria is as objective as possible.
Application
Three examples of actual goal-free evaluation had been reported at
the time of this writing. The first was accomplished in 1972 by Ernest
House and Donald Hogben at the Center for Instructional Research in
Curriculum Evaluation (CIRCE) at the University of Illinois. The
second was directed by Michael Scriven through a consulting firm
(Education and Development Group) at Berkeley in 1975. The third was
by Wayne Welch through the University of Minnesota in 1976. There were
similarities and differences between the three examples.
To report the similarities and differences, a logical framework
proposed by Stufflebeam (1974) for analyzing evaluation studies was
employed. This framework (An Administrative Checklist for Reviewing
Evaluation Plans) is extensive since it includes the following sections:
1. Conceptualization of Evaluation
2. Socio-Political Factors
3. Contractual/Legal Arrangements
4. The Technical Design
5. The Management Plan
6. Moral/Ethical/Utility Questions
Each of these six sections contains sub-questions that provide analytical detail of the studies.
Only the final report of each of the three studies was used as the
source to be analyzed with the checklist. Using the final report was a
limitation since each project was more involved than could be presented
in a final report to a client group. However, that document was the
easiest to obtain and it revealed much about the evaluation.
If there was little or no information to answer a sub-question
from the checklist, it was considered as not available. A comparison
of the three applications of goal-free evaluation follows. The narrative is divided by each of the six areas in the Stufflebeam Checklist. A summary table precedes each of the six narrative sections.
These tables follow the points within the Administrative Checklist.
Each table is presented as an advance organizer to similarities and
differences across the three applications.
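As an illustration of how such a checklist-based tabulation might be represented, the sketch below maps checklist sections to per-study entries; the section names and entries are abridged from the summary tables that follow, while the data structure and printing routine are assumptions made for the example, not part of Stufflebeam's checklist itself.

```python
# A sketch of the checklist-based tabulation; "N.A." marks
# sub-questions the final reports did not answer.
STUDIES = ["House/Hogben", "Scriven", "Welch"]

# Entries abridged from the summary tables in this chapter.
checklist = {
    "Conceptualization of Evaluation": {
        "Definition": ["N.A.", "implied", "N.A."],
        "Purpose": ["external review"] * 3,
        "Standards": ["N.A.", "meta-evaluator used", "N.A."],
    },
    "Socio-Political Factors": {
        "External Credibility": ["minimal bias checks",
                                 "extensive bias checks",
                                 "moderate bias checks"],
    },
}

def print_section(name: str) -> None:
    """Print one checklist section as an advance-organizer table."""
    print(name)
    print(" " * 22 + "".join(f"{s:24}" for s in STUDIES))
    for question, entries in checklist[name].items():
        print(f"{question:22}" + "".join(f"{e:24}" for e in entries))

print_section("Conceptualization of Evaluation")
```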
1. Conceptualization of Evaluation
              House/Hogben                Scriven                     Welch
Definition    N.A.                        implied                     N.A.
Purpose       external review             external review             external review
Questions     each different: see text    each different: see text    each different: see text
Audiences     varied audiences: see text  varied audiences: see text  varied audiences: see text
Process       implicitly GFE              implicitly GFE              implicitly GFE
Standards     N.A.                        meta-evaluator used         N.A.
Two of the three studies reported no definition of evaluation;
Scriven's report implies that evaluation means assessing the treatment
as it actually takes place. This lack of definition was a limitation
that made identifying the underlying similarities of intent difficult.
The three studies shared some similarities of purpose since each was an
external review of a project. Both Scriven's and Welch's work was planned
as a supplement to on-going internal evaluation activities.
Each study addressed different questions. House and Hogben were
concerned with What new perspectives about the curriculum can be seen?
and What would be a new emphasis for the internal evaluation efforts?
Scriven's efforts focused on the following questions:
Are the materials ready to be marketed?
What are suggestions for future developmental efforts?
What are the implications of this informal learning process?
How good are these materials? was the question that Welch’s study
addressed. It was possible to summarize the thrust of these
questions as Tell the client about their product from a perspective
that is fresh and outside the developmental process.
House and Hogben's main audience was the developers of a biology
curriculum (BSCS) for 13 to 15 year-old educable mentally retarded
persons. Scriven reported to three audiences. First, his main
audience was the Southwest Educational Development Laboratory. The
second audience was the National Institute of Education and the third
was prospective consumers of the materials. The internal evaluation
staff at St. Mary's Junior College was Welch's audience for his report.
The three agents doing each evaluation were referred to earlier: CIRCE, the Education and Development Group, and the University of Minnesota. The process that each used was referred to as goal-free in the documents. There was no explicit definition of the process each used; however, a description of the activities occurs at a later point in this discussion (under technical design). House and Hogben described no standards by which they judged their own work. Likewise, Welch provided
no description of external standards. However, Scriven reported that
Stufflebeam was judging his project in the role of a meta-evaluator.
2. Socio-political Factors
                        House/Hogben         Scriven                    Welch
Involvement             Public School Staff  Public School Staff        College Staff
Internal Communication  N.A.                 Used Liaison Staff Person  N.A.
Internal Credibility    N.A.                 Evaluee Reactions Prior    N.A.
                                             to Distribution
External Credibility    Minimal Bias Checks  Extensive Bias Checks      Moderate Bias Checks
Security                N.A.                 Anonymous Reporting        Anonymous Reporting
Protocol                N.A.                 Prearranged Visitations    N.A.
Public                  N.A.                 N.A.                       N.A.
House and Hogben's and Scriven's projects evaluated activities
that occurred in a public school classroom. Specifically, House and
Hogben involved the development staff, two classroom teachers, and
their students. No reference was made about how these two groups' support was to be obtained. Scriven's goal-free evaluation involved the
SEDL staff and the instructional/administrative staff at ten elementary
schools in California. There were several references to Scriven's
staff working with the school staffs to gain their involvement. Welch
reported limited contact and involvement with the St. Mary's staff
since his study reviewed materials that were isolated from a classroom.
Both Welch and House/Hogben reported no internal communication
process between the evaluators and the evaluees. Scriven related that the project manager of the evaluation study served as the key liaison across the participants. Also, House/Hogben and Welch reported no
steps to establish internal credibility for their evaluation work.
Scriven reported that the final report was sent to the evaluees for
their reactions prior to shipment to his main audience.
All three goal-free evaluations reported some means to establish
external credibility. House/Hogben's work was considered to be minimally bias free because two evaluators teamed the work, but only one
person synthesized the data for the final report. More extensive bias
checks were used by Scriven. He reported the use of a meta-evaluator,
an evaluator project manager to screen materials from the projects, a
replacement process for "contaminated" evaluators, a process for training and calibration of evaluators, and a standard checklist for use in
observation of the treatments. Scriven also described as additional
checks against bias, the use of multiple site visits across time and
the use of multiple site visitors (evaluators) per site. Welch reported
moderate bias checks. His evaluation project used a five member rating
panel with four of those members rating the materials independently.
The fifth member of the panel then aggregated the results of the other
four panelists. The five panelists also employed a standard rating instrument.
No procedure for security of the evaluation data was reported by
House/Hogben. Scriven and Welch reported some attempts at security.
Welch reported that all panelists' ratings of each product were reported anonymously to the client. On the other hand, Scriven claimed that the classroom teachers that were observed using the product were guaranteed absolute anonymity.
Neither House/Hogben nor Welch described attempts to maintain protocol while evaluating the products. However, Scriven reported that his staff checked in at the ten schools at prearranged times, rather than by surprise. No attempts at public relations with the media were
reported by the three evaluators.
3. Contractual/Legal Arrangements
                            House/Hogben              Scriven                   Welch
Client/Evaluator            Clearly Stated:           Clearly Stated:           Clearly Stated:
Relationship                See Text                  See Text                  See Text
Evaluation Products         Final Report              Final Reports             Final Report
Delivery Schedule           Varied                    Varied                    Varied
Editing                     Implied Evaluator's       Explicitly Evaluator's    Implied Evaluator's
                            Responsibility            Responsibility            Responsibility
Access to Data              See Text                  See Text                  See Text
Release Reports             N.A.                      N.A.                      N.A.
Responsibility & Authority  N.A.                      N.A.                      N.A.
Finances                    N.A.                      N.A.                      N.A.
The client/evaluator relationship was clearly defined in all three
studies. In the House/Hogben study, the BSCS staff were the developers;
House/Hogben were the external summative evaluators. Likewise, the
SEDL staff were the developers, and Scriven’s staff were the external
summative evaluators. The St. Mary's staff were the developers, and
Welch and staff were the external summative evaluators. All three
evaluations specified that their main evaluation product was the pro
duction of a final evaluation report. The House/Hogben report was due
after approximately ten days of consulting time. Scriven sent two
reports to SEDL after about five weeks' time. One was goal-free in
nature, and the second report was a revision after knowing the goals.
This second version was also sent directly to the teachers who were
evaluated with the products. Welch's report was due after approximately
two or three days consulting time. In the Welch and House/Hogben
studies the responsibilities for editing the final evaluation report
were assumed to be those of the evaluators. Scriven reported that
editing responsibilities were definitely those of the evaluator.
Each evaluation study described different means for access to project data. House and Hogben reviewed existing data from the developer and then collected new data at two field sites. Scriven reviewed
categories of developer's data (but not actual data) and then selected
some categories of data for further evaluation at the ten field sites.
Welch reviewed the materials independent of any developer's data.
Neither Welch, Scriven, nor House/Hogben specified any responsibilities for the release of the evaluation reports. All three were receiving funds from the developer; however, the amount and schedule were unreported in the final reports.
4. The Technical Design
                          House/Hogben           Scriven                Welch
Objectives & Variables    Varied: See Text       Varied: See Text       Varied: See Text
Investigatory Framework   Varied: See Text       Varied: See Text       Varied: See Text
Instrumentation           N.A.                   Checklist Approach     Checklist Approach
Sampling                  N.A.                   Not Used               Not Used
Data Gathering            See Text               See Text               See Text
Data Storage & Retrieval  N.A.                   See Text               See Text
Data Analysis             Professional Judgment  Independent Synthesis  Independent Synthesis
Reporting                 N.A.                   N.A.                   N.A.
Technical Adequacy        Fair                   Best                   Good
All three studies reported slightly different ways of considering
objectives and variables. In the House/Hogben evaluation study, the evaluators interviewed the project staff at a point late in the evaluation. The project staff were then asked the goal priorities of the
project: producing materials and getting them accepted by teachers.
In Scriven's goal-free study, the evaluators first wrote a goal-free
report. They then reviewed the developer's materials and found that
the developer's intentions were not what the evaluators had discovered
while reviewing the curriculum. Welch reported that no goals were
mentioned or reviewed by the evaluators. Both Scriven and Welch had
prespecified certain categories of variables by using a standard checklist.
In the Welch evaluation, the investigatory framework was described
as panel ratings of materials with a concurrent interview of the evaluees
by an intermediary. Scriven used a framework that included site reviews with observation and interviews. That framework can be summarized
in this way: observers used a standard instrument to describe exactly
what the treatment was at each site without any prior notions of what
the treatment was meant to be. The evaluators also used a standard
instrument to describe exactly what effects the treatment had with students, teachers, and the overall school without any prior notion of
goals. After completing the observations and submitting the goal-free
reports, Scriven's staff examined all goal-related materials for a
comparison to their previous observations about treatment effects.
Any revisions of the GFE report that were needed at this point were
appended, but the original content remained unchanged. In the House/
Hogben study, there was limited description of the investigatory framework: site reviews with observations and interviews.
Both Scriven and Welch used a specially developed checklist approach as instrumentation. House/Hogben reported no instrumentation
used. Again, the Welch and Scriven studies were similar because neither
used sampling. Both reviewed all sites and materials, respectively.
House/Hogben reported no information about sampling or about their
method of data storage and retrieval of evaluation information.
Scriven described his method of data storage as 150 to 200 reports in
a raw form with much narrative in an original state. Welch's data for the evaluation were individual packets of materials that received alphabetic ratings from A+ to D- from each of the four judges.
Each of the three studies presented different methods of data
analysis. House/Hogben used a professional judgment approach through
a narrative. In Scriven's study, raw reports were synthesized once by
Scriven and another time independently by the project manager. The
two independent syntheses were then merged by the evaluation project
staff. In Welch's study, the alphabetic ratings were converted to
numbers and averaged by the fifth judge.
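A minimal sketch of this conversion-and-averaging step follows; the numeric values assigned to the letter grades are an assumption, since the report did not specify the scale.

```python
# Convert alphabetic ratings (A+ down to D-) to numbers and average
# them, as in Welch's study; the 12-to-1 scale is an assumed mapping.
GRADES = ["A+", "A", "A-", "B+", "B", "B-",
          "C+", "C", "C-", "D+", "D", "D-"]
GRADE_POINTS = {g: len(GRADES) - i for i, g in enumerate(GRADES)}

def average_rating(judge_grades: list[str]) -> float:
    """Average the four judges' letter grades for one packet."""
    return sum(GRADE_POINTS[g] for g in judge_grades) / len(judge_grades)

# One packet of materials rated by the four judges:
print(average_rating(["A", "A-", "B+", "A"]))  # 10.25 on the assumed scale
```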
All three studies similarly reported no information about reporting techniques for their summative information. Each provided their
clients with a physical report. However, Scriven provided a copy of
his goal-free report to the evaluees, the classroom teachers.
Regarding the technical adequacy of the evaluation studies,
Scriven's was the soundest of the three. He provided good interjudge reliability across site reviewers since they had been trained to use a standard checklist. Good validity was provided by Scriven's use of several content specialists, of multiple site visits across time, and
of mixing the observers across sites. He presented a good chance to
maximize objectivity through use of a meta-evaluator to critique the
study as it evolved, and through a replacement procedure for evaluators
who became "contaminated" by learning too much about the goals.
Welch's study was the next best in terms of technical adequacy.
There was good interjudge agreement on ratings of each package although
the grades were consistently skewed towards high marks. There was uncertain validity in the study since the materials were reviewed in an
isolated situation without any users and no content specialist was
among the judges. There was moderately good objectivity since each
judge's rating was independent from the others.
House and Hogben's study was the least technically adequate of the
three. There was questionable reliability in this study since no interjudge calibration was pursued between the two evaluators. Validity
was open to question since the evaluation was a one-shot observation
and interview process. The study had questionable objectivity since no bias checks were employed other than the evaluators' externality to the developers.
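To make the interjudge reliability issue concrete, the sketch below computes simple percent agreement and Cohen's kappa for two raters; the rating categories and values are invented, since none of the three reports presented data in this form.

```python
# Percent agreement and chance-corrected agreement (Cohen's kappa)
# between two raters; the categories and ratings are invented.
from collections import Counter

def cohen_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Chance-corrected agreement between two raters on the same items."""
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["good", "good", "fair", "poor", "good", "fair"]
b = ["good", "fair", "fair", "poor", "good", "good"]
print(sum(x == y for x, y in zip(a, b)) / len(a))  # about 0.67 agreement
print(cohen_kappa(a, b))                           # about 0.45
```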
5. The Management Plan
                            House/Hogben         Scriven               Welch
Organizational Mechanism    External Contracts   External Contracts    External Contracts
Organizational Location     Champaign, Illinois  Berkeley, California  Minneapolis, Minnesota
Policies & Procedures       N.A.                 N.A.                  N.A.
Staff                       Two                  Seven                 Five
Facilities                  N.A.                 N.A.                  N.A.
Data Gathering Schedules    N.A.                 Five Weeks            N.A.
Reporting Schedule          N.A.                 N.A.                  N.A.
Training                    N.A.                 Implied               Implied
Installation of Evaluation  N.A.                 N.A.                  N.A.
Budget                      N.A.                 N.A.                  N.A.
In reference to the organizational mechanism, all three evaluations
were contracts with an external agency. As mentioned previously,
House/Hogben, Scriven, and Welch were respectively located in Champaign-Urbana,
Illinois; Berkeley, California; and Minneapolis, Minnesota. There was
no information available in the final evaluation reports about any management policies and procedures that influenced the three studies.
Concerning staffing of the three studies, House/Hogben worked as a
pair of evaluators but they had received counsel about the goal-free
process from the CIRCE staff and Scriven. The Scriven study was staffed
by Scriven, a project manager, a kindergarten to third grade educator,
an early childhood consultant, two general site visitors and a meta
evaluator. Welch and four evaluation students did the work in the Welch
study.
There was no information available about the facilities used in the three studies. House/Hogben and Welch provided no information about their data gathering schedules. On the other hand, Scriven reported that his study gathered data for approximately five weeks by classroom observations and interviews. House/Hogben and Welch provided little information in their final reports about reporting schedules during their studies. Therefore, it was assumed that the final report was the main report. Scriven similarly described the final report as the main scheduled report; however, he also presented a packet of correspondence with the meta-evaluator.
Neither Welch nor House and Hogben described any evaluation staff training even though Welch referred to use of a checklist. Scriven trained his evaluation staff to use both the goal-free approach and the checklist approach. There was no information across the three studies about the budgets used. Cost figures would have been an important comparison.
6. Moral/Ethical/Utility Questions

                          House/Hogben    Scriven            Welch
Philosophical Stance      Unclear         Unclear            Unclear
Service Orientation       N.A.            Implied Consumer   N.A.
Evaluator's Values        Unclear         Unclear            Unclear
Judgments                 See Text        See Text           See Text
Objectivity               Unclear         Unclear            Unclear
Prospects for Utility     N.A.            N.A.               N.A.
Cost/Effectiveness        N.A.            N.A.               N.A.
There were no specific descriptions of the philosophical stance of the three. Even though all three were in the position of judging the objects that they were reviewing, it was unclear whether the three operated from a value-based position. Based on Scriven's report and his previous work in the goal-free area, it was assumed that his study employed consumer-based values. Carrying the values issue further, it was uniformly unclear across the three reports whether or not there was potential conflict between the evaluator's values and the values of other parties involved in the studies. Judgments to be made in each study were as follows: House/Hogben - evaluators judge the program; Scriven - staff judges the program; Welch - panel judges the materials. Objectivity of the three studies, in terms of cooptation, was also considered. It was unclear whether, or when, the three were coopted by their clients. However, there has been some discussion about
checks against cooptation in previous sections (two and four): Scriven's study had the best provisions of the three. The three studies presented no information about the final report's utility to the client or the evaluees. This would have been an interesting comparison point since, as already stated, no previous research or evaluation study provided an evaluee's or client's rating of evaluation reports (which is one purpose of this study being reported). Similarly, there was no information about the cost/effectiveness of the three studies: another useful comparison point.
These three examples of applying the goal-free technique differed
in objects evaluated, length of time, size of staff, background of
staff, and reporting techniques, even though all were considered
to be goal-free. These differences in procedures were to be expected.
As pointed out earlier, the technique was evolving and procedural
differences were normal evolutionary phenomena.
Some highlights did occur as similarities across applications. For example, the cases involving Scriven and Welch used a type of checklist so that observers/judges reviewed effects in similar dimensions. Both examples included checks for objectivity and a key staff member as a screening agent to review potentially biasing materials and situations. Scriven's objectivity checks were more extensive, but both attempted to maximize and protect the objectivity that was assumed unique to the goal-free technique. Scriven's example provided other highlights to applying his approach: training of observers to use a standard protocol, releasing the final report to the evaluees before delivery to the main client, appending reactions to
goal-related materials after writing the goal-free reports, a contingency plan for replacing observers who became goal-oriented before ending observation of the treatment effects, a data collection schedule that repeated observations and rotated observers, and interviews with key consumers. Both House/Hogben and Scriven reached a point in their evaluation process where they reviewed the developer's goals and then cross-checked that information with their observed data.
Some of these points were incorporated into the investigation being reported here and will be further discussed in the next chapter about methodology.
CHAPTER 3
METHODOLOGY
The first of the two purposes of this study was developmental:
to develop materials and procedures to implement an evaluation using
either a goal-free or goal-based technique. The second purpose was
exploratory: to investigate the relative efficacy or utility of these
goal-free and goal-based techniques for evaluation through a field
exploration of the evaluator/evaluee relationship. The methods used
to accomplish each purpose are presented in this chapter, starting
with the developmental purpose.
A general review of evaluation was done to identify studies in
which GFE and GBE were implemented. As could be expected with a new
and evolving technique, few operational examples of the goal-free
technique were found. However, the illustrations by House & Hogben,
Scriven, and Welch gave suggestions for an activity sequence. Scriven
also gave a personal critique of the original plan for this study while
consulting at the Evaluation Center in 1974. His primary suggestion
was that if the study involved individuals with various backgrounds
and skills then a "checklist approach" should be considered so that
some calibration across evaluators would be possible. A decision was
made in this study to use a checklist approach to implement both GFE
and GBE.
A checklist approach was used in both the Welch and Scriven
studies that were reported in the previous chapter. However, at the
time that this dissertation study was being implemented there were few
instances in which a checklist was used in GBE studies. Therefore,
this investigator devised a checklist for both GFE and GBE.
The previous review about GFE reported that there were certain
things that a goal-free evaluator would do differently, and that
there were certain pieces of project data that the GFE'r needed to
avoid so that objectivity would be heightened while reviewing the
project. Scriven's writings alluded to several sources of project
information that were to be treated carefully, while doing a goal-
free evaluation.because of potential biases. Those pieces of infor
mation were broken down by pre-site visitation, and on-site sources.
A listing of these two types of information is included in Table 1 in the far left-hand column. This left-hand column provides a general framework of information sources that both the goal-free and goal-based evaluator would use. This list of information sources
enabled the investigator to analyze the two approaches to determine
which sources could and could not be used by GFE and GBE evaluators.
Within the two general evaluation approaches there were three
sub-categories. Each is presented in the following table as a
separate column. The columns labeled "Theory" reflect both the
GFE and GBE literature and its considerations of these information
sources. That is, did the writings on the two approaches allow use
of these sources? The remaining four columns present two existing
checklists and the two modified checklists that would be used in this
study.
TABLE 1

Sources of Information and Their Use with Goal-based and Goal-free Evaluation

Columns, left to right:
GOAL-BASED: (1) Theory, according to Tyler and Stufflebeam; (2) Stufflebeam Checklist; (3) Modified Stufflebeam Checklist.
GOAL-FREE: (4) Theory, according to Scriven; (5) Scriven Checklist; (6) Modified Scriven Checklist.

Key: Y = yes; N = no; pY = probably yes; pN = probably no; Y* = yes, critical; S = only if screened; R = restricted and screened.

Sources                                                (1)  (2)  (3)  (4)  (5)  (6)

I. PRE-SITE
A. Initial contacts (examples: telephone calls,
   letters, face-to-face conversation)                  Y    Y    Y    S    S    S
B. Parts of the project, or program, proposal
   1. overview of the problem                           Y    Y    Y    Y    Y    Y
   2. needs assessment data                             Y    Y    Y    Y*   Y*   Y*
   3. goals/objectives                                  Y    Y*   Y*   N    N    N
   4. proposed strategies                               Y    Y    Y    N    N    N
   5. proposed activity plan(s)                         Y    Y    Y    N    N    N
   6. proposed staffing plan                            Y    Y    Y    pN   pN   N
   7. proposed budget                                   pY   Y    Y    pY   pY   Y
C. Target group/evaluator interactions
   1. check target group needs                          pY   Y    Y    R    R    R
   2. check target group treatment effects              pY   Y    Y    R    R    R
D. Representative project materials
   1. curricular - study guides, text materials,
      tests                                             Y    Y    Y    R    R    R
   2. non-curricular - environmental or
      experiential or "gestalt"                         pY   pY   Y    R    R    R
E. Process observation of treatment                     Y    Y    Y    R    R    R
F. Internal evaluation data (examples: data about
   cognitive, affective, and psychomotor effects
   like test results, report cards, graded papers,
   and student self-assessment)                         Y    Y    Y    R    R    R
G. Historical or archival
   1. minutes of staff meetings                         pY   pY   Y    Y    Y    Y
   2. budget status reports                             pN   pY   Y    Y    Y    Y
   3. internal staff correspondence                     pN   pY   Y    Y    Y    Y
   4. correspondence between project and
      funding agent                                     pY   pY   Y    S    S    S
   5. miscellaneous progress reports                    pY   pY   Y    S    S    S
H. Overview of research/literature in area of
   investigation                                        N    pN   N    Y    pN   N

II. ON-SITE
A. Staff/evaluator interactions
   1. staff introductions to the project                Y    Y    Y    R    R    R
   2. staff "PR" tours                                  Y    pN   Y    N    N    N
   3. final debriefings                                 Y    Y    Y    Y    Y    Y
   4. data about long and short-term effects
      or benefits                                       Y    Y    Y    R    R    R

The Scriven Checklist was his 1974 version of a "Checklist for
Evaluation of Products, Producers, and Proposals." The Modified
Scriven Checklist represents a version of this 1974 checklist that
was adapted to clarify procedures. The Stufflebeam Checklist was developed by Stufflebeam and employed in earlier studies at the Evaluation Center. Again, the Modified Stufflebeam Checklist was this author's adaptation of the Stufflebeam version so that it more clearly emphasized the goal-based assumptions.
It should be pointed out that the Stufflebeam Checklist did not conform to all of Scriven's points found in his earlier definition of goal-based evaluation. That is, it was reported earlier that "a goal-based evaluation does not question the merit of goals; often does not look at cost effectiveness; often fails to search for or locate the appropriate critical competitors; often does not search for side-effects; in short does not include a number of important and necessary components of an evaluation." The Stufflebeam Checklist did include the above points.
However, Scriven's definition went on to say that "even if it does include these components, they are referenced to the program (or personal) goals and hence run into serious problems such as identifying these goals, handling inconsistencies in them and changes in them over time, dealing with shortfall and overrun results, and avoiding the perceptual bias of knowing about them." It was unclear how the Stufflebeam Checklist dealt with these points. Even though the Stufflebeam version did not conform to Scriven's conception of GBE, for the sake of comparison it was modified to reflect a goal-based approach.
Table 1 shows areas where the two techniques were different in
terms of using certain information sources. As one reads down the GFE
column, the terms "screened" and "restricted" appeared frequently.
Screened refers back to earlier discussions about an individual who assists the goal-free evaluator during early stages of the evaluation, both in terms of editing materials and serving as a liaison to the project staff. This person serves as a critical buffer between the evaluator and sources of bias while the GFE'r is trying to employ strategies of discovery and investigation to uncover actual effects.
Restricted has more than one meaning. One meaning is that the source is used only in isolation from the project staff. The GFE'r does not observe any materials or activities wherein the project staff might provide them with cues. Any critical explanation comes from the GFE censor who works with the evaluator as an editor. Another meaning is that the source simply must be "off-grounds" to the goal-free evaluator. For example, the staff introductions to the project that are often filled with public relations rhetoric are avoided at all costs by the GFE'r, but might be reviewed by the GFE censor. This strategy maintains critical independence for the evaluator but allows screened information to go from the censor to the evaluator. Using the censor as a shield also allows the public relations activities to occur, and lessens the negative reactions that might arise from simply not talking to the staff, as Scriven suggested.
Other points for discussion exist in this table. For example,
points II Al, 2, and 3 show that in the final stages of the evaluation
unrestricted evaluator and staff interactions are necessary so that
the hypotheses about the actual effects may be discussed with the staff. A similar situation of final debriefing occurred in the evaluation done by Scriven. That is, Scriven's staff reviewed all the developer's goal materials after deciding what actual effects existed at the sites. They then debriefed themselves and reported to the developer about the mismatches between intended and actual effects. Another point is that most historical or archival materials were good sources, whereas most parts of the project proposal were bad sources. Why the proposal was a bad source of information has been covered earlier. The assumption underlying the goodness of the archival sources is that as staff meetings occur and monies are disbursed, these phenomena reflect actualities rather than intentions. Hence, they become good sources for goal-free evaluations.
With these structural, theoretical, and procedural differences in
mind, two evaluation handbooks were developed. "The Handbook for
Evaluators, First Edition" was developed (see Appendix C) for the goal-
free approach. With its development the Product Evaluation Checklist
was revised to add clarification and illustration. This first edition
had narrative sections that explained the setting of the evaluation con
tract, and provided a conceptual overview of the evaluator's role.
Individuals interested in detailed content are referred to Appendix C.
Similarly, the goal-based approach was developed through a checklist so that parallel structures existed. This effort can be found in
Appendix D. Prior to use, both handbooks were reviewed by four staff
members at the Evaluation Center for logic of presentation and overall
utility. Revisions were made after this content review process.
Another factor in development of the handbooks and the overall
techniques was that evidence should exist about the degree of implementation of the two evaluation approaches. That is, there should be
reasonable evidence that the two approaches were operationalized and
executed according to plan. Lack of such reasonable evidence would
make any analysis of data or their interpretation meaningless. Two
indicators were developed. One indicator was an activity log that
described sources of information/interactions and the amount of time
spent with each. The other indicator was a process rating form to be
filled out by each evaluator after leaving each site. Review of these
indicators allowed the degree of implementation of the two approaches,
i.e., the independent variable, to be ascertained.
This ends the discussion of the first purpose of this study: to develop materials and procedures to do both goal-free and goal-based evaluation. The process of development has been described. Appendices C and D contain the actual products of development, along with specific on-site procedures. These materials and procedures responded to the first investigatory question: what would be the nature of materials and procedures that are developed to do goal-free and goal-based evaluations?
The second purpose of this study, to investigate the relative
efficacy or utility of the goal-based or goal-free techniques for
evaluation through a field exploration of the evaluator/evaluee
relationship, is discussed in the following section.
Subject Selection and Assignment
Evaluators for each technique were chosen so as to avoid a selection bias in the investigation. To insure a rigorous and equitable test of the two techniques, several evaluators were selected from a larger population. To develop a population of competent individuals, a nationally-recognized group of evaluators and contributors to evaluation theory were asked for their recommendations. Although not inclusive of all possible contributors to evaluation theory, the following persons were asked to recommend individuals:
(1) Marvin Alkin
(2) Benjamin Bloom
(3) Henry M. Brickell
(4) Mary Anne Bunda
(5) Lee Cronbach
(6) Robert Ebel
(7) Walter Foley
(8) Gene Glass
(9) Egon Guba
(10) Robert Hammond
(11) Thomas Hastings
(12) Ernest House
(13) Richard Jaeger
(14) David Krathwohl
(15) Leslie McLean
(16) Howard Merriman
(17) James Popham
(18) Malcolm Provus
(19) Michael Scriven
(20) Robert Stake
(21) Julian Stanley
(22) Daniel Stufflebeam
(23) Ralph Tyler
(24) Wayne Welch
(25) Blaine Worthen
As can be seen in Appendix A, they were sent a letter asking for
one or two recommendations given these criteria:
1. Currently practicing evaluation as either a graduate student or practitioner in the field.
2. Able to commit approximately six days to the task. This included a one-day orientation session at the Evaluation Center.
3. Has a proven ability to operate as an independent evaluator.
4. Writes with an insightful, unlabored style.
5. Located within a radius of approximately 600 miles from Kalamazoo, Michigan (to reduce travel costs).
Thirty-one individuals who were believed to meet these criteria were
recommended.
Individuals from the population of thirty-one were randomly selected by use of a random numbers table and concurrently assigned to either the goal-free or goal-based treatment group through a coin flip. That is, given a list of thirty-one evaluators, the individual whose number was chosen first from the random numbers table was assigned to a treatment group by a coin flip. This routine was repeated until all thirty-one were assigned to groups.
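The selection-and-assignment routine can be sketched in code as follows, assuming Python's standard library in place of the printed random numbers table and the physical coin flip; candidate names are illustrative placeholders, not the actual recommendations:

    import random

    # The thirty-one recommended individuals (placeholder names).
    recommended = ["candidate_%02d" % i for i in range(1, 32)]

    # A random selection order stands in for the random numbers table.
    order = random.sample(recommended, k=len(recommended))

    goal_free, goal_based = [], []
    for person in order:
        # A simulated coin flip assigns each person, as selected,
        # to one of the two treatment groups.
        if random.random() < 0.5:
            goal_free.append(person)
        else:
            goal_based.append(person)

    # Subjects were then contacted in the order selected until three
    # were available per group (simplified here as taking the first three).
    gfe_evaluators = goal_free[:3]
    gbe_evaluators = goal_based[:3]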
On completion of random selection and random assignment, there
were sixteen subjects in the goal-free group and fifteen subjects in
the goal-based group. For economical and logistical reasons, three evaluators were desired for each treatment group. However, not knowing the actual availability of individuals that had been recommended, oversampling was done by selecting and assigning all subjects to treatments. Subjects were then contacted in the order selected, until three were available for each group.
Evaluators were contacted and told that within a contract being
implemented by the Evaluation Center at Western Michigan University
there was a need to recruit evaluation specialists to do site-review
work. They were also told that a small group of individuals would be
involved, that some training and orientation would be provided, that
two sites were involved for each individual, that individuals would be expected to spend approximately two days at each site, and that travel expenses, per diem, and an honorarium would be provided.
To give an estimate of the recruitability of those who accepted
the assignment, four out of sixteen subjects in the goal-free group
were contacted to obtain three evaluators. Six out of fifteen were
contacted in the goal-based group to obtain three evaluators. One subject in the goal-based group resigned from the assignment after accepting it due to an overcommitment of time. This subject was replaced by going back to the original list of subjects for the group and recruiting the next individual listed. Sites to be reviewed were similarly assigned at random to the subjects to control for differences that existed across sites that could inadvertently bias the results. Site assignment utilized a lottery technique.
It is important to note that subjects were not told that there
were two groups involved or that a comparative study of evaluation
methodologies was occurring. Subject ignorance of the overall study
within a "routine consulting assignment" was maintained until a tele
phone debriefing session that occurred after all evaluation activities
were completed. No agreements were made with the evaluators regarding
confidentiality of data during the study, since they did not realize
the study was occurring. However, it was agreed during debriefing
that names of evaluators would not be linked to ratings of on-site
process or evaluation reports, and that confidentiality would be pro
vided through composite information.
Without specifying individuals by groups, here is a listing of evaluators who participated along with their organizational affiliation at the time of the study:

(1) Evelyn Brzezinski, Michigan State Department of Education
(2) Donald Coan, Indiana University
(3) Stephen Kemmis, University of Illinois
(4) Richard Smock, University of Illinois
(5) Jane Stoller, University of Minnesota
(6) Jerry Walker, Ohio State University
Training
The GBE group was sent all available materials (i.e., proposals, progress reports, survey results) for each project to be visited. Individuals were given background information about the two sites they were to visit, and plans for travel arrangements. The GFE group received an edited version of project-related materials. For example, some elements of the proposals were not sent because they included intentions and goals, whereas some material from project-written progress reports and a survey was sent as admissible material if it pertained to a historical description of actual project achievement. This author served as facilitator or editor for the two groups so that they received information that was either goal-based or goal-free.
A one-day orientation session for each group was developed and implemented. In general, it was reported during the orientation sessions
that the handbooks (Appendices C and D) were useful and understandable.
Subjects worked through possible uses of the checklist approach by a
group discussion of materials found in the handbook. Discussions
focused on operationalization of the checklist points in the handbook,
and potential strategies to be used on-site with various individuals.
Finally, subjects were asked to provide general background data on
themselves on the form provided in Appendix B. It was decided that
only a few unobtrusive questions would be posed to the subjects (like
"list prior evaluation experience") so that they would have less reason
to feel that the request was unusual.
Self-reported background data were categorized and quantified to
provide the following group summary.
Table 2

Composite Background of Subjects by Groups

                         Background Variables
Group        n    Sex         Previous Experience(a)   Highest Degree

Goal-free    3    1 Female    1 Extensive              1 Doctorate
                  2 Male      2 Moderate               2 Masters
                              0 Little

Goal-based   3    1 Female    1 Extensive              2 Doctorate
                  2 Male      1 Moderate               1 Masters
                              1 Little

(a) The background variable of previous experience in evaluation was quantified by examining each subject's self-reported evaluation experiences on the background data sheet and scored as follows: "Extensive experience" was equal to experiences in evaluation covering historically dated periods of time greater than three years; "Moderate experience" was equal to experiences covering periods of at least one to three years; "Little experience" was equal to experiences covering periods of less than one year.
It can be seen that the goal-free group had slightly more previous evaluation experience; however, the goal-based group had more academic experience at the doctoral level. It was assumed that the groups were not different to the degree that either was biased, or that one had a clear advantage over the other.
Instrument Development
Three instruments were developed to collect data for this study.
Two were relatively short and of a Likert form, whereas one was longer and of a semantic differential form.
Both shorter instruments were used to measure several potential
elements of the on-site evaluation process. Items were developed to
assess the following process dimensions as was discussed earlier in
the section about development of the evaluation materials:
(1) Evaluator/Project Director Rapport
(2) Evaluator's Time Utilization
(3) Evaluator/Project Director Expectations of Each Other
(4) Evaluator/Project Director Overall Satisfaction
(5) Evaluator Confidence of Ability with Methodology
Although other elements of the on-site evaluation process could be identified, it was assumed that ratings of these dimensions would present a general assessment of the evaluation process. It was also assumed that an average score across items would yield a meaningful general indicator of the quality of the on-site process. Evaluators responded to items for all five dimensions, whereas evaluees only responded to items for the first four elements. In either case, instrument length was considered so that obtrusiveness could be minimized.
The third instrument was a semantic differential and went through a more technical development process. The Phi Delta Kappa Committee (1971, pp. 27-30) and Stufflebeam (1974, pp. 5-11) had reported, at the time of this study, three general categories of criteria that prescribe necessary and sufficient attributes of evaluative information. (It should be mentioned that standards for evaluations have been developed and are being published during 1980.) The categories are technical adequacy, utility, and prudence. These three general categories of criteria and the eleven specific sub-criteria follow:
General Criteria:

  I. Technical Adequacy     II. Utility          III. Prudence

Specific Criteria:

  A. Reliability            A. Relevance         A. Cost-Effectiveness
  B. Internal Validity      B. Scope
  C. External Validity      C. Importance
  D. Objectivity            D. Timeliness
                            E. Credibility
                            F. Pervasiveness
Summarizing for each of the three categories of criteria, it could be said that they are focused on (1) the technical soundness of information, (2) the usefulness of information to some audience, and (3) the reasonableness of obtaining the information. These elements were transformed to bi-polar adjectives for evaluee use in rating the evaluation reports. Since there was no evidence that these elements of utility had been applied in this form as a semantic differential, a pilot test of a larger item pool was reviewed to detect any other potentially useable bi-polars. Before the pilot study is described, some aspects of the semantic differential mode of measurement are considered.
Osgood (1957) presented a major criterion for inclusion of bi-polars on a semantic differential instrument: relevance to the concept being judged. Written evaluation reports met the assumptions
presented by Osgood for selection of a concept to be rated with the
semantic differential. Osgood suggested that what may function as a
concept (or a stimulus to which the subject's checking operation is
a terminal response) in this broad sense is practically infinite ...
more often printed than spoken ... and the type selected depends
chiefly upon the interests of the investigator.
Since earlier studies (Tannenbaum, 1953, 1955) established the validity of the semantic differential for use in measuring communication effects, this type of instrument was assumed valid for measuring the utility of evaluation reports by the evaluees. A mean score on
the instrument represented a general measure of the evaluee's attitude
towards the overall utility of the report. The higher the mean score,
the higher the utility of the information.
Through a factor analysis study that Osgood undertook using Roget's Thesaurus, a comprehensive pool of potential bi-polar items was identified. Since some of these items appeared through logical analysis to be relevant to the rating of evaluation report utility, they were added to the pilot instrument item pool. This produced a pilot version of the instrument with 58 bi-polars on a seven-point scale. The sequential position of items and the direction of the negative/positive adjectives were randomly assigned. This was done to reduce the chances of a response set developing as evaluees did their ratings.
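The randomization of item order and adjective direction, with reverse-scoring of flipped items at analysis time, can be sketched in Python; the bi-polar pairs shown are hypothetical examples, not items from the actual pool:

    import random

    # Hypothetical bi-polar adjective pairs (positive pole listed first).
    items = [("useful", "useless"), ("clear", "vague"),
             ("timely", "late"), ("credible", "dubious")]

    random.shuffle(items)  # random sequential position of items

    # Randomly decide, per item, whether the negative pole prints first.
    flipped = [random.random() < 0.5 for _ in items]

    def score(raw_rating, is_flipped, points=7):
        # Express every rating on the positive pole; reverse-score
        # items whose printed direction was flipped.
        return (points + 1) - raw_rating if is_flipped else raw_rating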
An estimate of the reliability of the semantic differential was computed. If the instrument could not produce reasonably consistent ratings of the evaluation reports, there would be difficulty in interpreting the information due to errors of measurement. Reliability in this study was considered as stability over time. Therefore, a test-retest reliability study was implemented prior to using the instrument with the evaluees.
Since it would have been difficult to collect ratings twice from the evaluees, another group was used for the reliability study. Permission was granted to use students in three graduate level courses in the Western Michigan University College of Education. Since each class contained approximately thirty students, it was decided to use one of the three classes for ease of administering instruments. The class was selected randomly using a lottery method.
The chosen course was composed of both masters and doctoral level education majors, and presented introductory methods of data analysis. The instructor was given background information about the nature of the reliability study, and understood that students could not be informed that the retest was over duplicate materials. Individuals within the class were given a group introduction to the task. They were told that instrumentation was being developed for rating evaluation reports and that individuals were needed to try out the instrument so that strengths and weaknesses could be detected.
As can be seen in Appendix F, subjects of the test/retest study were provided with an actual, yet anonymous, evaluation report produced by one of the evaluators: all subjects rated the same report. Subjects had no knowledge of the true nature of the reported project. However, this was a condition for all the subjects. Therefore, this condition should have influenced them all equally. Directions also gave conditions to subjects that established a simulated situation as a project director. Further instructions to the subjects can be found in Appendix F. Individuals who participated were volunteers who were paid for their time after the data from the retest were collected. Twenty-two individuals participated in the reliability study.
A Pearson correlation coefficient was calculated by correlating all individual mean scores on the first administration with mean scores on the second administration. Test-retest reliability, based on one week's spacing, for the pilot instrument was r = .64.
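The computation amounts to a Pearson correlation between each rater's mean score on the two administrations; a minimal sketch in Python with NumPy follows, where the two vectors hold illustrative values rather than the study's data:

    import numpy as np

    # Mean semantic differential score per rater, one week apart.
    first  = np.array([5.2, 4.8, 6.1, 3.9, 5.5, 4.4])  # administration 1
    second = np.array([5.0, 5.1, 5.8, 4.4, 5.2, 4.1])  # administration 2

    r = np.corrcoef(first, second)[0, 1]  # Pearson r
    print("test-retest r = %.2f" % r)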
Osgood (1957) reported a test-retest reliability study with a coefficient of .85. This was the highest reliability coefficient reported. He reported that reliability of ratings declined over time, due to changes in the rater, changes in the concept being rated, or general errors of measurement, but did not specify the amount or degree. Therefore, it was assumed that some of the difference between r = 1 (a theoretical ideal) and r = .64 was due to change over time either in the subject, or error.
The pilot version of the semantic differential was refined through logical analysis of item distributions in the form of histograms with a percentage of subjects at each scale point. Since the midpoint on a seven-point semantic differential scale was considered to be neutral, use of this midpoint rating could be assumed to restrict the overall amount of total score variability. Reducing the amount of total score variability reduces the size of any reliability coefficient calculated on those total scores. Therefore, in order to increase the reliability of the pilot version of the instrument, items that had 50% or more of the ratings at the midpoint of the distribution were eliminated. This removed nine items from the pilot version to produce a final version of the instrument with 49 items. Although the final instrument was not retested after revision, it was assumed to have adequate reliability for the purposes of this investigation even though a reduction in the total number of items may have reduced the reliability.
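The screening rule is simple to express in code; a minimal sketch in Python with NumPy follows, using a randomly generated response matrix as a stand-in for the 22 raters' pilot data:

    import numpy as np

    rng = np.random.default_rng(0)
    # Placeholder pilot data: 22 raters x 58 items on a 1-7 scale.
    responses = rng.integers(1, 8, size=(22, 58))

    # Share of ratings at the neutral midpoint (4) for each item.
    midpoint_share = (responses == 4).mean(axis=0)

    # Drop items with 50% or more of their ratings at the midpoint.
    kept = responses[:, midpoint_share < 0.50]
    print("retained %d of %d items" % (kept.shape[1], responses.shape[1]))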
The final form of the instrument can be found in Appendix G. Instructions for the project directors were sent in a letter along with the evaluation report prepared for that particular site. It was assumed that project directors were neither aware of the background of the study, nor of any attempt to present differing evaluation methodologies at different sites. Project directors were informed that their ratings were providing feedback concerning their perceptions of the report's merit for reviewing our procedures for future studies.
Data Collection Procedures
Background data on the evaluators were collected during the orientation sessions. Evaluator and project director ratings of the on-site evaluation process were collected prior to evaluee ratings of the evaluation reports. Process rating forms can be found in Appendices D and E.
After evaluators had been to a site, they sent a tape-recorded rough draft of their report to the Center for transcription, accompanied by the appropriate version of the process rating form. All evaluators returned process ratings for each site. Immediately after a subject had left the project site, the project director was asked to rate the
overall process used on-site. All project directors returned the evaluation process rating forms.
As soon as the evaluation report was reviewed by the evaluator, it was retyped and sent to the evaluee along with the semantic differential report rating form. Project directors were told that their ratings of the evaluation report were desired for future planning purposes. All evaluees returned the rating form within two weeks of receiving it.
After all reports were sent to the project directors, and their ratings collected, the evaluators were debriefed through a telephone interview. No one reported prior knowledge of a comparative study between methodologies. Therefore, it was assumed that no interaction between groups had taken place that might have biased the study.
Data Analysis Procedures
Throughout this study, data were collected to achieve the purpose of investigating the relative utility of goal-based and goal-free techniques for project evaluation through an analysis of (1) evaluator and evaluee ratings of the on-site evaluation process and (2) evaluee ratings of evaluation reports generated by evaluators using either approach. Three investigatory questions were operationalized. Each question follows with the analysis procedure that was employed.
Question: When the materials and procedures of these two approaches are field tested, will the evaluators rate the evaluation process differently depending on which approach they are using?
Data from each item on the evaluator process rating form were totalled and averaged across items to yield a score that had a range of one to seven on a Likert-type scale. Since each evaluator rated two sites, it was assumed that scores for site one and site two were not orthogonal or independent. That is, data from each evaluator were correlated.
The selected statistical technique was a repeated measures or split-plot factorial (SPF 2.2) experimental design (Kirk, 1968). The mixed linear model for that approach is as follows:

X_{ijm} = \mu + \alpha_i + \pi_{m(i)} + \beta_j + \alpha\beta_{ij} + \beta\pi_{jm(i)} + \epsilon_{o(ijm)}

where X_{ijm} was a measure for a randomly selected subject m in treatment population i;

\mu was the grand mean of treatment populations;

\alpha_i was the effect of treatment i (evaluation type), which was a constant for all subjects within treatment population i;

\beta_j was the effect of treatment j (trials or time), which was a constant for all subjects within treatment population j;

\pi_{m(i)} was the constant associated with person m, who is nested under level \alpha_i;

\alpha\beta_{ij} was the effect that represented the nonadditivity of effects \alpha_i and \beta_j;

\beta\pi_{jm(i)} was the effect that represented the nonadditivity of effects \beta_j and \pi_{m(i)};

\epsilon_{o(ijm)} was the experimental error, which is independent of all other errors and was normally distributed with a mean of 0 and a variance of \sigma_\epsilon^2. In this design \epsilon_{o(ijm)} cannot be estimated separately from \beta\pi_{jm(i)}.
Presented graphically, the design was as follows:

                 b1      b2
        s1      X111    X112
  a1    s2      X121    X122
        s3      X131    X132

        s4      X241    X242
  a2    s5      X251    X252
        s6      X261    X262

where a = treatments (GFE and GBE), b = trials, or site 1 and site 2, and s = subjects, or evaluators.
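A split-plot analysis of this kind can be reproduced with modern tooling; a minimal sketch in Python follows, assuming the third-party pandas and pingouin packages and illustrative ratings (the design, not the data, matches the study):

    import pandas as pd
    import pingouin as pg

    # One row per evaluator per site visit; rating values are illustrative.
    data = pd.DataFrame({
        "evaluator": list(range(1, 7)) * 2,
        "technique": (["GBE"] * 3 + ["GFE"] * 3) * 2,
        "site":      ["site1"] * 6 + ["site2"] * 6,
        "rating":    [5.9, 5.7, 5.8, 4.5, 3.6, 5.1,
                      6.3, 6.0, 6.5, 4.1, 5.3, 4.6],
    })

    # Between-subjects factor A = technique; within-subjects factor
    # B = site (trials); subjects (evaluators) are nested within A.
    aov = pg.mixed_anova(data=data, dv="rating", within="site",
                         subject="evaluator", between="technique")
    print(aov)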
Question: When the materials and procedures of these two approaches are field tested, will the evaluees rate the evaluation process differently depending on which approach is being applied?
Data from each item on the project director process rating form were totalled and averaged across items to yield a score that had a range of one to seven on a Likert-type scale. Since each project was geographically separate, it was assumed that project director ratings were independent of each other. However, it was also assumed that an important source of variation in the project director ratings could be the evaluators themselves.
That is, evaluators may have exhibited some personal or professional traits that influenced project director ratings. In terms of experimental design, this sort of underlying source of variation is called a
"nuisance variable" (Kirk, 1968). The appropriate statistical technique
for dealing with nuisance variables is a hierarchical or nested ex
perimental design. In this study six evaluators were nested within two
evaluation techniques giving two main treatment levels and six levels
of nesting. The design is; called completely randomized; hierarchical,
with two levels of one treatment and six levels of another. It was
designated as CRH-2(6).
The linear model for such a design is as follows:

X_{ijm} = \mu + \alpha_i + \beta_{j(i)} + \epsilon_{m(ij)}

where X_{ijm} was a measure for a randomly selected subject m in treatment population ab_{ij};

\mu was the grand mean of the treatment populations;

\alpha_i was the effect of treatment i, which was a constant for all subjects within treatment population i;

\beta_{j(i)} was the nesting of treatment B within treatment A. This term was actually the pooled simple main effects of treatment B at each level of treatment A. No interaction term appeared in this model;

\epsilon_{m(ij)} was experimental error, which is normally and independently distributed with a mean of 0 and variance of \sigma_\epsilon^2.
Presented graphically, the design was as follows:

        b1     b2     b3     b4     b5     b6
  a1   ab11   ab12   ab13
  a2                        ab24   ab25   ab26

(n = 2 per cell)

where a = evaluation techniques (GFE and GBE) and b = evaluators.
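The CRH-2(6) analysis itself is short enough to compute by hand; a minimal sketch in Python with NumPy follows, where the ratings are illustrative placeholders and n = 2 project-director ratings per evaluator:

    import numpy as np

    # ratings[technique][evaluator] -> the two site ratings each.
    ratings = np.array([
        [[6.1, 5.8], [6.4, 5.5], [6.3, 6.2]],  # GBE evaluators
        [[5.0, 6.9], [6.2, 4.3], [6.5, 6.1]],  # GFE evaluators
    ])
    a, b, n = ratings.shape
    grand = ratings.mean()
    tech_means = ratings.mean(axis=(1, 2))
    eval_means = ratings.mean(axis=2)

    ss_a   = b * n * np.sum((tech_means - grand) ** 2)
    ss_bwa = n * np.sum((eval_means - tech_means[:, None]) ** 2)
    ss_w   = np.sum((ratings - eval_means[..., None]) ** 2)

    df_a, df_bwa, df_w = a - 1, a * (b - 1), a * b * (n - 1)
    ms_a, ms_bwa, ms_w = ss_a / df_a, ss_bwa / df_bwa, ss_w / df_w

    # Evaluators are a random factor nested in techniques, so the MS for
    # B within A serves as the error term for A, and the within-cell MS
    # serves as the error term for B within A.
    print("F(A)      =", ms_a / ms_bwa)
    print("F(B w. A) =", ms_bwa / ms_w)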
Question: When the materials and procedures of these two approaches are field tested, will the evaluees rate the evaluation reports differently depending on which approach was used?
Item responses from the semantic differential report rating form were
totalled and averaged for each project director, yielding a total report
rating score with a range of one to seven. As with the previous question, project director ratings were assumed independent of each other. An
important source of variation in those ratings could be the evaluators
themselves. As was mentioned previously, the appropriate statistical
technique to analyze data that included a nuisance variable was a
hierarchical or nested experimental design. Once again, six evaluators
were nested within two main treatment levels of evaluation techniques.
CHAPTER 4
FINDINGS AND DISCUSSION
This section presents the results and a discussion of implications.
As the reader will recall, this study had two purposes. Findings are reported for each purpose and its related investigatory questions.
The first purpose was to develop materials and procedures to implement an evaluation using either a goal-free or goal-based approach. The related investigatory question for this developmental purpose was: What would be the nature of materials and procedures that are developed to do goal-free and goal-based evaluations? In the previous chapter, procedures for developing the goal-free and goal-based evaluation materials were described, and these materials can be reviewed in the appendices.
It was also reported previously that instruments were developed
that would allow a general assessment of the implementation of the two
evaluation approaches. As discussed earlier, the treatments, i.e., the
two evaluation techniques, were applied in a field setting with little
control over the degree of implementation. If little evidence could
be found that the two techniques were used on-site at each project,
then discussion of utility or efficacy from ratings by the evaluators/
evaluees would at best be a difficult task.
The procedure to give a general assessment of independent
variable implementation was presented. This procedure involved
two pieces of information. One was a descriptive analysis of
the evaluator activity logs that were filled out while at each project. The second piece of information included responses to a general question on the evaluator process rating form that were obtained after on-site work. It was assumed that having an indication of the degree of implementation both during and after each site visit would give a general overall description of the treatments that would aid in discussing the findings and implications.
A review of activity logs revealed two major situations: interviewing people and a combined area of document review/planning. Within interviews there were three groupings of individuals: the project director, the project staff, and staff outside of the project. As can be seen in Table 3, the evaluators using goal-free techniques did report a different pattern of activities than the evaluators using goal-based techniques.
Table 3

Summary of Information Reported on Activity Logs by Evaluation Techniques

Reported Activity             % of Time-GBE    % of Time-GFE
1. Interviews
   a. Project Director              21               21
   b. Project Staff                 76               33
   c. Other Staff                    0                9
2. Document Review and
   Planning                          3               37
Total Percent                      100              100
Total Hours Logged                64.5             69.7
During their interviewing, the goal-based evaluators spent 97% of their time talking with the direct project staff. On the other hand, the evaluators who were directed to use the goal-free technique spent 68% of their time on interviews and made an attempt (9%) to interview people who were not directly involved with the project. Little direct observation was reported by either group; however, that may be a limitation of the activity log format.
Looking further, it can be seen that evaluators using the GFE approach reported spending 37% of their time reviewing documents they acquired while on-site and planning. They reported the following types of documents:

a. Student data from registrar
b. North Central Accreditation reports
c. Minutes of meetings
d. Project budget changes
e. Institutional budget
f. Institutional profile
g. Curriculum materials
The GBE group spent 3% of their time in this area and reported the following document types as being reviewed:

a. Project proposal changes
b. Project accountability file
c. Curriculum materials
d. Project budget revisions
It appeared that the two groups did report different patterns of activities while on-site. The GFE group spread themselves across several activities, whereas the GBE group tended to focus on internal project staff interviews. The pattern reported by the GFE group suggested that a goal-free approach spread evaluation activities across more potential types of information sources to detect project effects.
The fifth item on the evaluator process rating form asked for the degree of confidence in implementing the approach after leaving the project. The GBE group reported an average score of 5.83, whereas the GFE group reported 3.33. The difference between means was 2.5 points. This practical difference was large enough to assume that the GBE group felt more confident in their ability to implement the goal-based approach than the GFE group felt in implementing the goal-free approach.
More information will be presented in a later discussion of the
evaluator process ratings. However, it can be said that the groups did
differ in reported on-site activities even though the group using the
goal-free approach reported less confidence in their ability to
implement the goal-free approach. These two pieces of information
supported the position that attempts were made to implement two different
evaluation approaches. This assumption will be considered during a
later discussion of findings.
Data that focus on the second objective, to investigate the relative efficacy or utility of the two evaluation approaches, will now be presented. There were three investigatory questions that were related to this field exploration of the evaluator/evaluee interaction. Two questions referred to ratings of on-site procedures. The third question investigated project director (evaluee) ratings of reports. Each question and relevant data follow.
Question: When the materials and procedures of these two approaches are field tested, will the evaluators rate the evaluation process differently depending on which approach they are using?
Ratings were aggregated by sites. The following means and standard deviations by sites were obtained:

            Site 1          Site 2
          X      S.D.     X      S.D.
GBE      5.80    .20     6.27    .31
GFE      4.40    .72     4.67    .61
A split-plot or repeated measures analysis was applied. Results of that analysis are presented in Table 4.

Table 4

Repeated Measures ANOVA of Evaluator Process Ratings

Source                            df      MS        F
Between Subjects                   5
  A                                1     6.751    20.65*
  Subjects within Groups           4      .321
Within Subjects                    6
  B                                1      .404     3.18 (N.S.)
  AB                               1      .028      .05 (N.S.)
  B x Subjects within Groups       4      .127
TOTAL                             11

*p < .05
These results show that there were statistically significant differences between evaluator ratings of on-site techniques. It should be noted that the two non-significant F's established that there were no practice effects or interaction effects between sites and techniques.
It should be useful to look at the item responses for the two groups to see where process differences originated. This seemed appropriate especially in light of the significant F ratio that was obtained. It was not necessary to separate process means by sites since it was found that there was no significant practice, or interaction, effect.
Table 5

Item Means and Standard Deviations from the Evaluator Process Rating Instrument

                          GBE              GFE
Items                  X      S.D.      X      S.D.
1. Rapport            6.50    .61      5.67    .80
2. Time Use           6.00    .32      4.83    .75
3. Expectations       5.83    .68      5.16    .52
4. Satisfaction       6.60    .49      5.50    .42
5. Confidence         5.83    .37      3.33    .98
These item means could range from a low of one to a high of seven. The GBE group rated all items higher than did the GFE group. The largest item mean and standard deviation difference was 2.5 and .61 for item five. The smallest mean difference was .67 for item three. Items two and four were very close in size of difference between groups, with a 1.17 and a 1.10 difference on the respective items. Item one was close to item three in magnitude of mean difference with .83. Generalizing across items, it appeared that as a group the evaluators using goal-free techniques were most different from the GBE group in (a) confidence to implement the technique, and (b) time utilization and overall satisfaction with the site visit, and least different in (c) rapport with the project director and expectations of the project director as an administrator. These generalizations are based on observations, but were not statistically tested.
Open-ended comments from the evaluators on the process rating form provided results that were helpful in interpretation. On the whole, the GBE group had few comments about the methodology. There were some comments that more data should be collected to fully understand the projects and that some reorganization of the checklist may be helpful. The GFE group was more explicit. Across the three evaluators there was a common feeling that their checklist approach would have been more appropriate for fully mature projects with greater quantities of data. There was a feeling that the goal-free checklist was too stringent to judge developmental projects and that there was a need for more descriptive categories of information within the checklist. It was also noted that it was difficult not to gain knowledge of project goals, since at this stage of development little data existed that wasn't goal specific and goal oriented.
Question: When the materials and procedures of these two approaches are field tested, will the evaluees rate the evaluation process differently depending on which approach is being applied?
Scores averaged across items were analyzed with a completely randomized hierarchical design with evaluators nested within treatments. The mean and standard deviation for the GBE group were 6.04 and .69, respectively, whereas the mean and standard deviation for the GFE group were 5.83 and 1.03. Results from the analysis can be seen in Table 6.
Table 6

Completely Randomized Hierarchical ANOVA of Project Director Process Ratings

Source      df      MS       F
A            1     .130     .106 (N.S.)
B w. A       4    1.229    2.594 (N.S.)
W. cell      6     .473
TOTAL       11
A was a designation for the two techniques. B w. A was a designation for the evaluators that were nested within treatments. W. cell was the experimental error term. These results showed that there was no significant difference between on-site ratings for the two groups by project directors. Even though the GBE group was rated slightly higher than the GFE group, the size of the difference between ratings held no practical importance.
This question of the study investigated project director ratings of the on-site process associated with either the goal-free or goal-based evaluation approach. Both project director ratings of the on-site process and the influence of evaluators within each approach were found statistically non-significant. Project directors rated process items that were parallel to those rated by the evaluators. Even though evaluators differed in activity patterns and ratings, project directors who rated the two groups did not significantly differ in their scores.
Question: When the materials and procedures of these two approaches are field tested, will the evaluees rate the evaluation reports differently depending on which approach was used?
Bi-polar ratings were averaged to yield a total report utility score.
The overall mean utility score and standard deviation for the GBE group
were 5.78 and .30, respectively. The mean and standard deviation for
the GFE group were 5.29 and .57. Report utility scores were analyzed
with a completely randomized hierarchical ANOVA procedure. Results from
that analysis can be seen in Table 7.
Table 7

Completely Randomized Hierarchical ANOVA of Project Director Ratings of Report Utility

Source                               df      MS       F
(A) Evaluation Techniques             1     .741     1.55 (N.S.)
(B w. A) Evaluators
    Within Techniques                 4     .478    25.16*
(W. cell) Experimental Error          6     .019
TOTAL                                11

*p < .01
There was no significant difference between evaluation techniques. However, there was a highly significant difference between evaluators within techniques. The difference was one that would not be expected in 99 cases out of 100 by chance alone.
These findings did not support the idea that reports that focused on certain techniques were more highly rated. However, they do support the position that reports produced by certain evaluators were more highly rated than reports produced by other evaluators.
The developmental thrust of this study served to extend the materials and procedures available for doing project evaluation. Especially
helpful to future studies and practitioners was the operationalization
of the goal-free technique. It was found that evaluators can be
trained to use such a goal-free approach and that the training can carry
over to differences in the on-site evaluation process.
It should be kept in mind that any investigation has limitations that temper results. In this study there were limitations in the checklist approach used as a protocol, in the sensitivity of the instruments to some differences, in the small number of subjects and review sites, and in the overall duration of the exploration. As was discussed earlier, it was found that the goal-free checklist may not have been useful for projects that were immature, with no developed products. Similarly, it may have been too early to assess these particular projects, giving only a limited trial to potential differences between techniques.
The instruments used in this study provided general indications of
results. However, instruments with greater technical development may
provide more precise information. Another consideration would be to
better conceptualize the measures of utility and efficacy that were
employed in this study. For example, content analysis of the evaluation
reports could directly assess the amount of side effect data. This
study only applied the techniques once at each site during a short period
of time. It is assumed that repeated measures with a type of reversal
of techniques (i.e., goal-free converted to goal-based or two evaluators
with each using a different technique) would provide a more comprehensive
data base to assess the two evaluation techniques and their wide-ranging
differences.
The results do support the position that evaluators using the goal-
free technique would follow a different pattern of on-site activities
than would evaluators using goal-based techniques. Goal-free evaluators
reported a more comprehensive information base in both people interviewed and documents reviewed. Although this study did not provide a
measure of amount of side-effect data reported, it could be speculated
that the pattern reported by the GFE group would have a better chance
to provide side-effect information by exploring a more diverse non-project set of information. This speculation about possible scope of
actual effects reported seems plausible since the group using goal-
based techniques spent 97% of their time with the project staff directly, whereas the GFE group spent only 49% of their time with the project
management and staff. These differences in types of information sources
and amount of time spent with project staff partially support the goal-
free theoretical contentions for differences in the process of the goal-
free and goal-based techniques.
At the same time, there were doubts raised by the evaluators in
the goal-free group that the checklist approach to goal-free evaluation
was as useful as it could have been. It was reported that the checklist
criteria were too stringent and not descriptive enough for early stages
of projects. This would lead one to consider using the approach with
more developed projects to investigate whether this checklist would be
useful for projects at that later point. If so, this would lead to a
reconsideration of the point of entry of the evaluation using this checklist. If not, then further refinement of the checklist may be needed by adding more descriptive information.
Scriven offered the opinion that doing goal-free evaluation was threatening to the evaluator, since the technique puts one's professionalism directly on the line. The findings supported that position. As a group, the goal-free evaluators uniformly rated themselves and the on-site process lower than did the goal-based group. The GFE group rated itself lowest on confidence to implement the methodology and on ability to use time on-site to the best advantage. These two findings would lead one to suspect that the GFE evaluators were more unsure about what to do methodologically and how to fit the methods into a time sequence.
The assumption that evaluees would see the goal-free approach as
more threatening than the goal-based approach was not supported. Both
groups of project directors were similar in their positive ratings of
the on-site process. It should be considered that there were several
plausible reasons that the project director scores were not lower for
the GFE process ratings.
The process rating instrument gave a very general indication of
the on-site phenomena viewed by project management. It was possible
that the instrument was not sensitive enough to areas where differences
existed. It was also reasonable to speculate that a response set may
have developed so that project directors were publicly positive but
privately negative about the evaluation process. Another possibility
was that the project directors were not experienced and sophisticated
in their contacts with evaluators. They may not have had experiences
that would help them generate personal criteria for assessing the eval
uation activities. That is, they could not perceive good and bad points
in the on-site process so only rated it positively. A replication of
this study with a more sophisticated group of evaluees may give different
results in terms of process ratings.
Finally, in terms of the evaluee ratings of the evaluation reports
generated from these two techniques, the results did not support differ
ences in utility or efficacy based on technique used. However, there
were differences in utility ratings between evaluators. There were
several possibilities for these findings. The techniques, or treatments,
may wash out during the reporting phases. The two techniques may be
more visible during implementation in terms of patterns of activities,
but reporting may be structured through the evaluator's experience
rather than a goal-based or goal-free influence.
It would be reasonable to assume that good evaluators would not fit their reports into a prespecified structure for the sake of that structure alone, but would draw upon past experience and knowledge to structure their responses to audiences' information needs. It would
appear fruitful for further research on evaluation techniques to investigate and document evaluator variables as a potential source of information
about differences in utility ratings. A content analysis of evaluation
reports from the two evaluation techniques would seem to be a logical
next step in this line of research. Also, a main limitation to be mentioned again is that assessment of the two techniques' processes and products would have been enhanced by obtaining concurrent ratings from
other prime audiences: the panel of experts and the Hill Foundation. This type of meta-rating should be seriously considered in future studies of a similar nature.
In summary, a checklist approach to goal-free evaluation can be operationalized as an alternative to a goal-based technique. However, forms of the checklist used in this study may be too structured, or inappropriate, for a project early in its development. Evaluators using
the goal-free approach did show more anxiety during on-site activities
than evaluators using the goal-based approach. The evaluees did not
differ in their anxiety during implementation of the two techniques.
During the reporting phase of the evaluation activities, differences
between evaluators accounted for a large portion of report utility
ratings by project management. Differences between approaches did
not account for any significant portion of project management ratings
of report utility.
APPENDIX A
RECRUITMENT LETTER FOR EVALUATORS
I am currently directing a project within the Evaluation Center that has a need for short-term, qualified evaluators. In order to fill this personnel need I have contacted you, along with twenty-two other nationally recognized leaders in the evaluation field, to ask for your personal recommendations.
Specifically, I am looking for individuals who have a background in evaluation to do several site visitations during the last half of July, 1974. There would be a professional fee, per diem, and travel costs, provided by the Center. Basic qualifications for these individuals are as follows:
1. Currently practicing evaluation as either a graduate student, or practitioner in the field.
2. Able to commit approximately six days to the task. This includes a one-day orientation session at the Evaluation Center.
3. Has a proven ability to operate as an independent, solo evaluator.
4. Writes with an insightful, unlabored style.
5. Located within a radius of approximately 600 miles from Kalamazoo, Michigan, to reduce travel costs.
If you can recommend one or more persons who would meet these qualifications, please list them on the tear-off response form and return it in the enclosed, pre-paid envelope. I thank you for any help you can give me in this matter. If you would happen to have any questions, I plan to call you during the first week in June.
Sincerely,
John W. Evers
Staff Associate for Program Evaluation
JWE:lje
(Tear Here)
1. Name of individual giving recommendation:
2. Recommendation(s)
Name:
Address:
Phone & Area Code:
Phone & Area Code:
Thank you very much for assisting me with recommendations for qualified evaluators. I received a large response and had many qualified individuals to choose from.
Finding it difficult to discriminate among those nominated, I selected names randomly. Therefore, not all individuals recommended were contacted.
However, I will be using the list for future evaluation studies, and may contact them at that time.
Sincerely,
John W. Evers, Director Hill Productivity Project
JWE:lje
APPENDIX B
EVALUATOR BACKGROUND DATA FORM
TRAVELING OBSERVER DEMOGRAPHIC DATA SHEET
I. General
1. Name:__________________________________________
2. Business Address:____________ _________________
__________________________________________ (zip).
3. Business Phone: (area code)______ (number)_____
4. Home Address :__________________________________
___________________________________________(zip)
5. Home Phone: (area code)______ (number)__________
6. Social Security Number:_______________________
7. Present Job Title:_____________________________
8. Present Job Description______________________
II. Specific
1. Prior evaluation training - Academic
(approximate dates) (origin of the training)
2. Prior evaluation experiences - Vocational
(approximate dates) (categorical nature of the experience)
APPENDIX C
HANDBOOK FOR GOAL-FREE EVALUATION
HANDBOOK
FOR
TRAVELING OBSERVERS
The Evaluation Center Western Michigan University
July, 1974
First Edition
AN INTRODUCTION TO THE HANDBOOK*
In accepting this work assignment from the Evaluation Center,
you are expected to follow certain methodological procedures for
collecting information and reporting it back via cassette. This
Handbook provides the following sections to assist you with your work.
I. Setting of the Evaluation Study
II. A Conceptual Overview of the T.O.'s Role
III. The Product Evaluation Checklist
IV. Discussion of the Checkpoints
V. An Expanded Checklist to Use for Reporting Findings (Copy 1, Copy 2)
VI. A Log of Activities (Site 1, Site 2)
Attachment A: An Introduction to the Checklist Approach, by Michael Scriven
Attachment B: A List of Contacts for the Two Sites
*In case of emergency situations while on-site, call collect:
7:30 to 5:00 (616) 383-8166
SETTING OF THE EVALUATION STUDY
The Evaluation Center is currently contracted to the Hill
Family Foundation to assess the merit of a portion of the educa
tional projects they have funded this year. This foundation has
given various amounts of money to a group of independent, four-year
colleges to improve themselves, based on needs identified in a
study by the Hill Family Foundation. Three major educational needs
were identified generally. They are: 1) sharply rising instructional costs, threatening the colleges' financial stability, 2) per-student costs rising faster than per-student income, and 3) faculty salaries comprising two-thirds of instructional costs.
This problem/needs situation was identified as the outside
parameters of the area of funding. Each institution proposed an
alternative strategy to the foundation. Some were very comprehensive
in scope, and others limited. The Center began its contract to study the Hill project in February, 1974; the contract will run through January, 1975. We have previously held an orientation session with
the various project directors to hear their plans and allow them
to hear the Center's evaluation plans. A survey was sent to the
participating institutions in the beginning of June. In general,
there are several phases to the overall project; your work and role as a traveling observer (T.O.) is one of those phases.
One should note that only a limited amount of information will be provided as an introduction to the study. This is by design. The
rationale should become more apparent as the following sections
progress.
A CONCEPTUAL OVERVIEW OF THE T.O.'S ROLE
To gain a relational perspective on the work of the traveling observer, the evaluation study phase preceding the T.O., and the
phase following will be discussed. The phase preceding deals with
a survey, and the phase following involves visitations to the
project sites by a panel of experts.
Only a limited amount of data gathered in the survey will be
made available to the T.O. This information concerns mainly
identification of various individuals on site who would be possible
resources during visitations. These individuals will be identified
more specifically during the T.O. training session held at the
Center and referenced in the next section on procedures. Information
withheld will not be crucial to the function of the T.O. on site.
By comparing the survey information to the T.O. report, the Center
will get a more accurate picture of the individual college projects
for the panel's visitation. It is thought that two independent
perspectives (the T.O.'s report and the survey) will give a more
valid portrayal of the situation than would one combined perspective.
The phase following the T.O.'s work is that of a panel of
content experts revisiting the sites to make a synthesized report
based on results of the T.O.'s report, the survey, and on their
own independent observations. One might consider the T.O.'s work
as a preliminary, summative, site visitation by a "professional
detective" to uncover and describe as many actual project effects
as possible. Then, another group of experts will follow up on those hypothesized effects to synthesize as accurate a portrayal as possible of the merit of each particular project.
The panel consists of individuals who can apply expertise in
several content areas. The kinds of perspectives represented are
those of evaluation, higher education administration/finance/
economics/planning, staff utilization, and curriculum development.
As will be seen later, the T.O. will need to make recommendations from those perspectives for each panelist to follow up within his or her specific content area.
Specifically, the objectives of the T.O. are the following:
1. To collect both descriptive and judgmental information on
each specific project at two college sites based on the methodology
presented in the next section.
2. To summarize the raw information collected at each site on
cassette tapes to be mailed to the Center before proceeding to the
next site, responding to the format presented in a later section.
3. To edit the raw transcriptions into a report that will go
to each project director for the director's reactions. There will
be a time lag between this editing, use of the unedited transcrip
tions by the panelists, and subsequent reaction to the T.O. report
by the project director.
A SPECIFIC T.O. PROCEDURE ON SITE¹
As one should have noticed, specific references have been made
in the earlier sections to the fact that certain available informa
tion will not be given to the T.O. before his/her visitation. It
is also the case that the T.O. should guard against certain kinds of information while on site. Specifically, the methodology being
referenced, procedurally developed, and implemented has been called
goal-free evaluation by Michael Scriven. Although a relatively
small amount of evaluation has been done in this mode, hopefully
one spin-off of this project will be further development of the
goal-free approach for evaluation.
To do goal-free evaluation, one is not being asked to do fact-free or information-free evaluation. Information, both
descriptive and judgmental, is as necessary to operationalize
summative, goal-free evaluation as any other approach one might
take. What one is specifically to guard against is information
that each specific project poses as intended goals. Information
on intended goals is most frequently found in proposals, progress
reports, and orientation sessions with the project staff (if the
evaluator does not carefully consider the nature of the questions
asked, or the responses freely given).
¹Much of the narrative concerning goal-free evaluation and the product checklist relies heavily on documents authored by Michael Scriven; however, revision and editing have been included for clarification and extension of the original ideas.
Scriven (December, 1972) is clear on the issue behind this
goal-free approach. Basically, he lays out the argument that
knowledge of the project's proposed, alleged, or intended goals
more often than not produces a perceptual set for an external
evaluator that biases, or contaminates, his judgment of the project's
real achievements. Scriven mentions in that December article that
while he was reviewing disseminable products for the labs and
centers, he often found that the producers presented the intended
goals of the product as evidence of its actual achievement. The
differentiation to be made is between intended achievement and actual achievement. The case cannot be made that, because something was intended, those intentions transfer automatically to actual achievement.
Many things occur in the life of a project that can affect its actual achievement. However, when the staff are working within that
project too often they develop, as Scriven points out, "tunnel
vision." That is, too often the project staff will develop a
perceptual set from the proposed intentions that biases their
representation of the actual project to any external reviewing
agent. This is not raising an issue of honesty. It is more a
question of not being able "to see the forest for the trees." This
problem of perceptual set among the internal people can be reversed into a strength of the external evaluator, who provides an independent, unbiased opinion. Therefore, one should do goal-free evaluation.
Goals are unnecessary noise for an evaluator, according to
Scriven. Goals and objectives are a means to the planning and
production of achievement, but evaluation is assessing merit of what
has actually been achieved. One does not need to know the steps in
planning that lead up to actual achievement because having exposure
to those levels of intended achievement leads one farther and farther
from looking for evidence about side-effects. If the evaluator is
focused by the project’s goal statements to look mainly for verifi
cation of intentions, (s)he loses the potential value that an
external goal-free evaluation can contribute. That is, an external
goal-free evaluator is not looking for evidence only in areas
everyone knows about. (S)he looks in all possible areas for the
project’s actual achievements, and possibly picks up evidence about
achievement that has been previously overlooked, or can be interpreted
in a new perspective.
Goal-free evaluation does not mean that the evaluation is
comparison-, or standard-free. That is, dropping the perspective
of comparing the project's achievement to its intended goals does
not rule out a comparison against standards in writing the evaluation
report. Again, goal-free evaluation is not standard-free, and any
standard may be (and usually is) someone's goal. Goal-free evaluation
is free from the goals of the consumer (or at least some consumers),
or of the funding agency. The point is that the basic standards of
merit used by the goal-free evaluator are constructed without
reference to anybody's goals.
The best standards to compare against a project's actual achievement are the needs of the intended target population, or the needs of the consumer. The producer, or project staff, should have used consumer needs in establishing intended goals, so the goal-free evaluator and the project start at the same place.
However, the difference is that the goal-free evaluator does not
accept the project's judgment of the best way to combine those
consumer needs and the project's resources into a worthwhile
product. Important errors can be made by the project in its
judgment of worthwhile intended achievement. New evidence can
often be turned up in the goal-free review of needs assessment
that may put intended achievement in a new light, even though the
project's intentions were well-justified at the time. Once the
evaluator sees that the project's goals are not beyond criticism,
and that one would criticize them against the needs to which they
are supposed to be responsive, the external goal-free evaluator
sees that (s)he can bypass the project's formulation of goals
because the crucial question is not what the project intended to do,
but what was actually achieved. The goal-free evaluator uses
current needs data, casts no aspersions on the project with regard
to the original selection of goals (since (s)he knows neither the
goals nor the data on which they were based) and gets straight
into judgments of congruence between actual achievement and needs,
against costs.
The advantages of bypassing goals are numerous, but it is con
sidered to be harder to do evaluation against needs rather than
goals. Needs data is sometimes hard to get and needs analysis
involves some evaluation in itself. Needs, unlike wants, are
dimensions of mismatch between actual and ideal. But contrary to
past criticisms, it is false to say that the goal-free evaluator
simply substitutes his/her own value judgments for those of the project. The goal-free evaluator must be able to support any
claims about needs against which the evaluation is made. If there
is support for those claims, and for the logic of the evaluation,
then we have an evaluation which may have absolutely no reference
to the goals of the evaluator at all. The evaluator's goals may
be doing good evaluation, or filling the professional need for
further development of a goal-free methodology.
Finally, it is worth remembering that the external goal-free
evaluator is not going to miss the main aims of the project since
(s)he would want to look at representative materials aimed at the
target population, and to observe the process of the project. If
(s)he does not notice the project's intended main goals as actual achievement, it is a good bet they play a minor role. Sometimes
(s)he will miss them, but there will be some pretty interesting
compensating observations.
Although the preceding paragraphs are only a brief perspective
on goal-free evaluation, they can be considered as theoretical
highlights. Depending on an evaluator's past professional experiences, (s)he might think it is absurd not to directly collect evidence on a project's intended goals. However, the task here of the external goal-free evaluator is not to test intended achievement, but to test actual achievement, which may be different from, or similar to, that which was intended.
In order to provide a consistent reporting format for the
goal-free evaluator, the following section presents a checklist
approach to gathering and reporting project information back to
the Center.
THE PRODUCT EVALUATION CHECKLIST
Introductory comments on the checklist approach by Michael
Scriven can be found in the appendix; however, here is a preliminary
note on the status of the following checkpoints themselves, illus
trated with an example by Michael Scriven. Rod Stephens, one of
the most brilliant yacht designers of the twentieth century,
recently published a 100-item checklist to be used in the evaluation
of racing and cruising yachts. The status of every item in that
checklist can be expressed by saying that each is desired as
essential. The items in the following checklist are not, except in
the one case noted, desired as essential. They are essential.
Each of these conditions must be met in order that one should have
solid grounds for a conclusion of actual achievement for an
educational project.
There are often occasions on which an evaluative decision
must be made without meeting all these standards. For example, a
project may be planned and the arrangements such that it must be
implemented, and hence some strategy must be selected for it. The
T.O. may not be able to determine whether the strategy selected is
of overall merit at that point in time, but it would be desirable
to determine whether the one selected is the best available. If
the T.O. does not feel qualified to make a judgment of a project's
strategy, (s)he should point out the issue in a recommendation to
the panelists. In such a case, one can use the checklist for
comparative assessment.
Quite often, the T.O. may be able to make a very plausible
estimate about two or three of the items from the list on which
there is no direct evidence. This is an acceptable procedure,
especially since evaluation funds are minimal; however, this
estimation should be noted for the panel. There are special cases
where a project can be defensibly implemented, without all the
checkpoints being met. The evaluator should treat those situations
as unusual. That is, the evaluator should treat each item in this
list as a claimed necessity for a meritorious project. With that
perspective in mind, it is more likely that (s)he will uncover the
actual achievements of the project.
The general structure of the checklist is as follows: Items 1
and 2 (Need and Market) are the pre-conditions, without which no
project will have any actual value. If they are met, at least
tentatively, we can then proceed to look further at the proposed
strategy. Items 3-10 tell the evaluator the kind of information
that must be looked for. Checkpoints 3-10 only refer to categories
of information, not to quality of the project's performance in each
of these categories. To put it another way, in checking 3-10, the
evaluator is only finding out whether the car has wheels, not
whether they are round. If the project passes this preliminary
inspection, it may then be asked how well it did on dimensions
3-10, compared against the need and market considerations of 1 and
2, i.e., Are the wheels round? square? in-between? Synthesis of
1-10 gives the score for checkpoint 11, actual educational
significance, a payoff checkpoint. Then the evaluator looks at
the project's cost and combines it with checkpoint 11 to give
a measure of cost-effectiveness at checkpoint 12. And finally,
the evaluator looks ahead to 13, the desired-as-essential checkpoint of post-funding support.
It is suggested that the following points be rated on a five-
point scale, 4-0. "Meeting a checkpoint" is then defined as
scoring 2 or better. The numbers should be expanded verbally as illustrated for the first checkpoint here, and on the full form
that follows. It is suggested that the T.O. read through a
discussion of the checkpoints. Then, review the expanded checklist
against the discussion, looking for clarification of types of
information asked for on each point. Next, read the directions for how to use the checklist for reporting. Be prepared for a
discussion at the orientation session.
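For readers who prefer a concrete representation, here is a minimal sketch (added for illustration; it is not part of the original handbook) of the scoring convention just described: thirteen checkpoints, each rated on the 4-0 scale, with "meeting a checkpoint" defined as scoring 2 or better. The checkpoint names follow the titles discussed in the next section; the sample ratings are hypothetical.

```python
# Checkpoint names follow the handbook's titles; ratings are hypothetical.
CHECKPOINTS = [
    "Need", "Market", "True Field Trials", "True Consumer",
    "Crucial Comparisons", "Long-term", "Side Effects", "Process",
    "Causation", "Statistical Significance", "Educational Significance",
    "Costs and Cost-Effectiveness", "Extended Support",
]

def meets(rating: int) -> bool:
    """'Meeting a checkpoint' is defined as scoring 2 or better on the 4-0 scale."""
    return rating >= 2

ratings = dict.fromkeys(CHECKPOINTS, 2)   # hypothetical ratings
ratings["Side Effects"] = 1               # one hypothetical weak point
unmet = [name for name, r in ratings.items() if not meets(r)]
print("Checkpoints not met:", unmet)      # -> ['Side Effects']
```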
DISCUSSION OF THE CHECKPOINTS
1. Need (Justification)
The goal-free evaluator is concerned here to see whether there
is good evidence that the project fills a genuine need (perhaps
those identified by the Hill Foundation and enumerated in the
Introduction) or— a weaker alternative— a defensible want. True
needs assessments involve establishing that the project actually
facilitates a consumer group's survival, health, or some other
defensible end that is not now adequately serviced. It may involve
moral, social, and/or environmental impact considerations. This
point is listed first because most proposed projects fail to pass
even this requirement. The usual data under "needs assessment" refer to deficiencies on norm-referenced tests or annual budgets, which tell the evaluator nothing about need at all without further data.
Since this particular point of the checklist is the essential
starting place for a goal-free evaluator, more time will be spent developing this point than the others. For example, the goal-free evaluator arrives at the project site and may wonder about a reasonable starting point. Here are two to consider:
1. Approach the project director. Without allowing a lengthy
orientation session to develop, ask him to identify individuals
on campus who are, or will be, representative consumers, or users,
or the target audience, for the project's efforts. Then, leaving the director, spend some time with that target group synthesizing
their needs as independent data, or perhaps looking for a congruence
between what the Hill Foundation identified as educational needs and
this group's needs. It would be important to consider whether those
consumers identified by the project have a unified perspective to
identifying needs, a diverse set of needs, or a polarized, bi-modal
set of needs. The T.O. should also consider using institutional records on-site in assessing needs. Since they probably weren't compiled by the project staff, they might be relatively bias-free.
One should also consider the possibility of securing copies as
supporting evidence to the panel's later visits. (Since this
particular evaluation is limited to a few days on site, and there
are other checkpoints on which to gather data, the goal-free
evaluator should not spend all his/her time in this needs area.
Perhaps, one-third of the time on site might be an a priori rule-
of-thumb for this area, but each goal-free evaluator will have to
use his/her own professional judgment as to time spent in this needs
area based on the particular situation of each project and then,
make recommendations to the panel as necessary.) The T.O. should
consider that by accepting only the project's definition of target
group, a selection bias is being put on the needs assessment. If
other groups are possible targets the T.O. should consider their
possible answers.
The point is that in spending time identifying the needs of
the target audience (perhaps narrowed to those general educational
needs previously identified by the Hill Foundation), the goal-free
evaluator is developing a set of standards to compare against the
actual achievements of the project. If it is the case that the
actual achievements are responsive to the needs of the consumer, then the evaluator has identified one point of merit for that project.
If it is the case that the evaluator finds a discrepancy between
needs and actual achievement, there may be several possible causes.
For instance, the project never did a needs assessment; the needs
have changed since the original assessment was done; the project
selectively responded to particular needs (for political, financial,
etc., reasons), or any of many possible causes. The whys of the
project's congruence or discrepancy with needs will be further
elaborated in other points.
2. Another approach to begin a goal-free evaluation would be
to similarly approach the director, but ask if you can observe the
process of the project before going any further. It would probably be better to consider the project director's role as an interpreter if he were observing with the goal-free evaluator. Would his comments, helpful as they might be, be goal-laden and therefore biasing? The difference in this second starting point would be observing the actual process and then inferentially hypothesizing the needs that
would justify such a process. This approach leads a goal-free
evaluator to do needs assessment by inference, and may be the only
possible way if the target audience is judged to be inaccessible
for the evaluator's questioning. This second approach is the same
process that one would use under checkpoint 8 (Performance— Process),
but the end product of the process is to establish needs by
inference.
In scoring need, the following should be taken into account:
number of people involved, social significance of the need, absence
of substitutes, urgency of the matter, possible multiplicative
effects.
Cost level may or may not be part of the need specifications.
It should be, but if not, this checkpoint has to be restudied (as
does checkpoint two) after cost data on a particular product is in.
It is undesirable to use "selected expert" judgments to establish
need, if there is any chance another selection would deny it.
(But many important innovative projects can do no better.)
The five-point scale for need might look like this:
Maximum priority, desperately needed 4
Great importance 3
Probably significant need 2
Possibly significant need 1
No good evidence of need 0
Note: It is important to consider how the T.O.'s time would be spent on-site. Two possible starting points have been considered. It should be possible, after being on-site for two to three hours, for the T.O. to plan a schedule of remaining activities. Otherwise, observational time may expire without covering some potentially valuable data source. This planning phase is highly recommended, and one should record activities in the following log (see page 60).
2. Market (Dissemination)
Many needed educational projects are difficult to assess
because of limited linkages to the target audience. It is difficult
to argue for continued development unless there is a special,
preferably tested, plan for getting information used through subsidy,
legislation, or agents. For this reason, dissemination plans
should antedate detailed project development plans. Checkpoint 2
requires that there be dissemination plans that ensure a user
market. It is scored on the size and importance of the demonstrably
reachable audience. This is quite different from the size of the
group which needs the project. It is, if you like, the pragmatic
aspect of need. The dissemination plan or procedure, if already operative, has to be clear, feasible in terms of available resources, expert and ingenious in its use of those resources, and keyed to the need(s).
This point would be rated by the following:
Very large and/or important market will be reached 4
Large and/or important market will be reached 3
Significant market will probably be reached 2
Possible, but not probable, that a significant market will be reached 1
Inadequate evidence to suggest that a significant market will be reached 0
3. Performance— True Field Trials
The first of several "performance criteria" — actually criteria
for the kind of evidence about performance— stresses the necessity
for field trial data that refer a) to the final version; b) to
typical users who are c) operating without producer (or other
special) assistance, in d) a typical setting/time frame. It's
very tempting for a project to think that they can extrapolate from
field trials with volunteer schools who get materials and phone-
consulting free, or from the penultimate edition of the materials,
but this has frequently turned out to be unsound. In actual
practice, deadlines, overcommitment, and underfinancing combine to
render almost all projects deficient on this point of field trials.
Sometimes a project can make a reasonable guess, but project staff
too often tend to make optimistic guesses instead, which is an argu
ment for outside evaluation. It is much better for a project to
quote actual statistics on educational problems of the kind that
this type of project has satisfactorily handled in past field
trials, or is plausibly believed capable of handling at this
particular site. One should check the project's knowledge of
both past typical use of the strategy, and the project's proposed
plan for future field trials, and rate as follows:
Perfectly typical field trial data 4
Minor differences from typical field trial 3
Reasonable bet for generalization from trial data 2
Serious weaknesses exist in trial data 1
Relevance is unclear from data 0
4. Performance— True Consumer
The concept of "the consumer" tends to be interpreted differently
by different participants. In-service teacher training materials,
for example, will be consumed by: a) superintendents or assistant
superintendents in charge of staff development programs, b) teacher
trainers, c) students, d) taxpayers. To decide what data the project
needs with regard to which of its consumer groups requires a very
clear sense of the function of the evaluation itself: which audiences is it addressed to, commissioned by, and, regardless of those two considerations, responsible to.
Quite often there will be several groups of consumers (identi
fied by the needs, or market checkpoints) of a given product,
each interested in different aspects of it. Data should be
gathered on all and scored separately. Failure to provide data
on any of the important relevant groups may constitute a fatal
defect in the project’s internal evaluation data, or it may just
be a weakness. Failure to provide data on performance for some
significant consumer group is of course fatal to the project. One
would rate this point as:
Full data exists on all relevant consumers 4
Fair data exists on all relevant consumers 3
Good data exists on the most important consumers 2
Weak data exists on the most important consumers 1
Only speculative data exists about the most important consumers 0
5. Performance— Crucial Comparisons
There are few, if any, useful project evaluations which can
avoid the necessity to present data on the comparative performance
of the critically competitive products. All too often, project data
refers to some pre-established standards of merit (but see below)
and the evaluator has no idea whether one can do better for less, or
twice as well for 5% more, etc. — which is typically what an
evaluator wants to know. Where comparisons are done, the results
are sometimes useless because the competitor is so chosen as to
give a false impression. The worst example of this is the use of a single 'no-treatment' or 'last year's treatment' control group.
It's not too thrilling to discover that an injection of $100,000 worth of CAI can improve the math performance of a school by 15%, if
there's a possibility that $15,000 worth of programmed texts would
do as well, or better. There are few points where good projects
distinguish themselves more clearly than in their choice of critical
competitors. Sometimes they must be created by converting the
program from the CAI memory into a programmed text, which may yield
a competitor at 10% of the cost and with the same content plus the
advantages of portability and simultaneous useability.
Critical comparisons are rated as follows:
Good data on all important competitors 4
Good data on most important competitors 3
Fair data on the most important competitor(s) 2
Lacking data on some of the more important competitors 1
Little or no useful comparative data 0
6. Performance— Long-term
A follow-up is almost always desirable, often crucial, since certain undesirable side-effects often take quite a while to surface, and good results fade fast. It may be the case that the
goal-free evaluator can only check whether a follow-up is planned
or not, and rate as follows:
Good direct evidence about the effects exists at times needed 4
Some direct evidence about the effects exists at times needed 3
Follow-up gives reasonable support to suggest a conclusion about effects when needed 2
Follow-up or other data suggests a conclusion about effects when needed 1
Useless or no follow-up, no other grounds for inferring long-term effects 0
7. Performance— Side Effects
There must be a systematic, skilled, independent search for side effects during, at the end of, and after the project's actual effect. Project staff are peculiarly handicapped in such a search,
by goal-oriented tunnel vision; here the outside evaluator
operating in the goal-free mode is particularly helpful. This
is a checkpoint which one is tempted to regard as icing on the
cake, but the history of educational innovation makes it clear
that the risk of doing so is too high to be conscionable.
Since it's the case that the goal-free evaluator does not
know the intended effects of the project, all his/her comments
about actual achievements can be either main effects or side effects. There is no need to worry about main vs. side effects in a goal-free mode. All observed effects are actual effects, and should not be differentiated as other than actual effects unless some
evidence can be given that shows differentiation. However, it's
probably the case that if one can differentiate actual effects into main and side effects, the evaluator may have to consider whether or not the biases of the project staff have affected his independence.
8. Performance— Process
Process observation is necessary for three reasons. It may
substantiate or invalidate (a) certain descriptions of the product, (b) the causal claims involved in the project's internal evaluation (that the gains were due to this treatment), and (c) it may bear
on ethical questions that have pre-emptive force in any social
interaction such as education. Since (c) is always possible, this
checkpoint is always necessary. In many cases (a) and/or (b) also
make this checkpoint necessary— but not in all cases. For example,
a product called an Inquiry Skills kit may not deserve the title,
either because of its content or because of the way it is or is not
implemented in the classroom.
As was mentioned in the first checkpoint, the goal-free
evaluator may want to start by observing the process to infer
needs. If that is the case, the goal-free evaluator may not be
able to do (a), invalidate the description, because (s)he has
not read an intended description beforehand. However, it would
be essential that (s)he comprehensively describe those actual
effects that are observed.
9. Performance— Causation
One way or another, it must be shown that the actual final
results reported could not reasonably be attributed to something
other than the treatment of the project. No way of doing this
compares well with the fully controlled experiment, and ingenuity
can expand its use into most situations. There are sometimes
reasonably good alternatives to experimentation as well as bad
ones and the best possible must be used.
The goal-free evaluator needs to consider the adequacy of the project's internal evaluation design (or its procedures, if there is no documented design), so this checkpoint should cover that. It
may be the case that causation can never be implied. However, a
good project considers procedures to eliminate rival hypotheses
to a causal claim. The evaluator should consider a planned inter
vention of more merit than a post hoc analysis. Questions should
be raised about invalidating elements due to selection bias,
history effects, instrumentation, regression, and so forth.
Although a rigorous procedure may be inappropriate for assessing
the effects of the project, some attempt should be made for more
than testimonials as evidence.
10. Performance— Statistical Significance
This is frequently the only mark of sophistication in a
project’s evaluation design. However, it is worthless without
the next item, educational significance.
This point is considered concurrently with review of the previous point. Whether the project is attempting a correlative approach, use of non-parametrics, or fiscal analysis, an attempt should be made to see whether effects can be attributed to chance variation or sample fluctuations, and to establish the degree of confidence in any effects.
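By way of a minimal illustration (assumed for this discussion, not drawn from the handbook), the kind of chance-variation check this checkpoint asks about might, in its simplest form, look like a two-sample test on treated versus comparison outcomes:

```python
from scipy.stats import ttest_ind

treated    = [72, 75, 69, 80, 77, 74]   # hypothetical project-group scores
comparison = [70, 68, 71, 66, 73, 69]   # hypothetical comparison-group scores
t, p = ttest_ind(treated, comparison)
print(f"t = {t:.2f}, p = {p:.3f}")      # a small p suggests effects beyond chance
```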
11. Performance— Educational Significance
Statistical significance is a desired essential, but it's all
too easily obtained without the results having any educational
significance, especially a) by using the magnifying power of a
large n, and b) by using instruments that assess dubious concepts,
or c) non-generalizable gains (where they should be generalizable).
The evaluator needs to look at the project's actual achievement.
Then (s)he needs to apply the perspective of an evaluator's
independent, expert judgment that gains of that size on those actual
outcomes represent an educationally significant result. The raw
data need not be reported in the project's evaluation but the
grounds for thinking them important must be reported— usually this
involves a back-reference to the needs assessment, typically somewhat
amplified in details. An explicit congruence check to the needs
assessment would still have to be given, at some level, to ensure
that judgment of educational significance relates to an actual
need and not to an alleged want like staff support, or equipment acquisition with external funds. If either want is the main basis
for the judgment of educational significance, then we do not have
evidence that the need we have carefully validated at checkpoint 1
is being met. There should be no suggestion that late-discovered
dimensions of educational significance are illicit. Any development
process must search for them and hope for them— but then there must
be a recycling of the needs assessment by the project.
The external evaluator has to go beyond project-generated, subject-matter experts' reports, since (s)he has to combine this point with results from checkpoints 3-10 in order to achieve an
overall rating of educational significance. For example, if there
is some doubt whether the results were due to the treatment, or
whether the side-effects offset the main effect, etc., then the
merit of the project must be judged less positively, even if the
needs congruence leads to a very favorable rating of the project by
its internal staff.
In practice, the suggestion is that the first pass through 3-10
be done to identify the weak project situations. The project may
not be hopeless, but any attempt at evaluation of it will be, unless
these data are available. If we have data which can support some
kind of evaluation, we now look at the extent to which the needs and market are met by the actual results, rather than at the type of data on 3-10. Here is where, for example, we look below the surface
requirement of statistical significance to the deep requirement that
the actual achievement match significant needs; or again, here is
where we check not just that there was a side-effects search, but
the nature of any effects that were found.
It is obvious that for all the breaking-out of the components
in the evaluative judgment that has been done in the checklist,
checkpoint 11 will often involve a pretty substantial synthesizing
performance. Even though the goal-free evaluator may find it
premature to check actual results against needs, (s)he must assess the merit of whatever actual achievement is available and then assess the merit of future plans on some checkpoints, usually by making recommendations to the panelists in the taped report.
12. Costs and Cost-Effectiveness
Cost data must be:
(a) Comprehensive. That means covering maintenance as well as
capital costs, psychic as well as dollar costs, ’weaning' costs and
costs of in-service updating of needed helpers as well as direct
costs, etc. There should be some consideration of opportunity costs other than those covered previously under critical competitors, e.g., what else could the college have done with the funds? A pass should be made at qualitative cost-effectiveness analysis where possible.
(b) Verified. Cost estimates and real costs should be verified
independently. It is really not satisfactory to treat cost data as
if they are immune to bias. Performance data should also have some
independent certification, and the procedures outlined above involve
this at several points. The cost data requires this for reasons that
have not so far been so generally recognized. Costing is an extremely
difficult business, requiring technical skills that at the moment
are a limited part of the training of evaluators. Therefore, this
may need to be directed for later follow-up by the panel's financial
expert.
(c) For each product compared. Costs must be provided for the
critical competitors, something which would be covered by the
admonition to include opportunity costs, given under (a) above. It may perhaps be worth independently stressing the need to provide rather careful cost estimates for the artificial competitors that the ingenious project director should create as part of his analysis, since this has not been part of the tradition of opportunity cost analysis. The preceding considerations bear on the quality of the
cost data. But this checkpoint is not treated as merely a
methodological one. Since one already has the judgment of
educational significance, and since the cost data includes the
cost of comparable products, one can here score cost-effectiveness,
i.e., (roughly) the justifiability of the expenditure.
13. Extended Support
This is an item that could be regarded as desirable rather than
necessary, but it is to be hoped that this will change in the future.
In the educational field the responsibility for the project is all
too frequently supposed to terminate upon the commencement of funding.
If it should subsequently transpire that important improvements
should be made in the project, they may or may not get made, depend
ing upon financial considerations. This is scarcely a service to
the consumer’s needs. It should therefore be regarded as a strong
plus for a project to produce a product with a systematic procedure
for updating and upgrading the product in the light of post-funding
field experience. One should note that this implies the necessity
for a systematic continuing procedure for collecting field data.
One of the types of data that ought to be collected by a project
is data on new critical competitors. An important kind of "improvement" is covered by this checkpoint; it might also be described as
extended use of the product. For example, its use in new circum
stances or in conjunction with new auxiliary products will need
new evaluation and explanations in the handbooks, etc. The pro
vision of continued user-training, itself subject to progressive
cycles of improvement, should be assessed as a desired essential.
THE UPGRADING PHASE IN USING THIS APPROACH
Given that few projects meet all these requirements, and few meet even half, what can reasonably be done at the moment?
It should be stressed that the appropriate T.O. policy is not
to treat a project that meets a large number of these requirements as deserving of full support. No project that fails to meet
checkpoints 1-13 deserves full support, for the very good reason
that we do not have good grounds for supposing that such a project
has meritorious achievements. As was mentioned earlier, it must
be made clear that exploratory, research, or field trial projects are all defensible reasons for funding, even when the chances against meeting all of these standards are quite long. They should be conceived as no more than an exploratory, research, or field trial
project, funded as such, and only moved into a production phase when
these standards (or enough of them to make a convincing case) are
met. Nevertheless, we may well have grounds that justify further
investigation by the panelists in order to fill the gaps found by
the evaluation checklist. We may indeed have enough grounds to
support the tentative support of 3uch a project pending the further
investigations by the panelists. In the next section, the goal-free
T.O. will be asked to respond to the checklist format in reporting
findings to the Center.
AN EXPANDED VERSION OF THE CHECKLIST
The T.O. should notice that the scales are sometimes hybrid
crosses of methodological and substantive merit. The top scores
require good evidence of good performance; the bottom scores indicate
that good evidence or good performance is lacking. A bottom score
does not require, for example, that there be good evidence of bad
achievement, for otherwise projects which turned in no data would do
better than those that were known to fare badly. It will be helpful
to mention not only the ratings, but the relevant terms, e.g., "No
good evidence", since feedback and not just a Yes/No decision is
to result from the evaluation.
The list of considerations should affect the evaluator's
decision either as to evidential adequacy or merit. By check-marking
or adding salient factors affecting the rater's judgment, the form
can provide an explanation as well as an evaluation, be useful for
formative as well as summative purposes, and help improve the form
itself. A double check can be used to indicate considerations that
were felt to be more important than those receiving a single check.
Most of the scales are simply quality-of-data or "methodological"
scales (e.g., the side-effects check). They are asterisked. They still
represent necessities, but a high score on them does not show
intrinsic merit of the product, only of the data or design. When
evaluating projects, the situation is that any score less than 3
on one such point will weaken the rating on educational significance
(and under 2 will destroy it), however high the need/market scores.
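The capping rule just stated can be made concrete with a short sketch in Python. Only the under-2 and under-3 thresholds come from the handbook; the size of the "weakening" penalty is not specified there, so the one-point deduction per offending score used below is an assumption.

    def capped_educational_significance(raw_rating, methodological_scores):
        # Any asterisked (methodological) score under 2 destroys the
        # educational-significance rating outright.
        if any(score < 2 for score in methodological_scores):
            return 0
        # Each asterisked score under 3 weakens the rating; the size
        # of the penalty (one point here) is an assumption.
        penalty = sum(1 for score in methodological_scores if score < 3)
        return max(0, raw_rating - penalty)

    # However high the need/market scores, weak methodological scores
    # drag the rating down:
    print(capped_educational_significance(4, [4, 3, 2, 4]))  # 3
    print(capped_educational_significance(4, [4, 3, 1, 4]))  # 0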
The T.O. should note that each checkpoint is subdivided into
information categories of description and judgment. Then each point
is divided into panelist content areas to facilitate recommendations
by the T.O. to the panelists for follow-up. In making verbal reports
to the Center, the T.O. should refer to the checkpoints as (s)he
talks through the report. Then, the T.O. should mail a checked
version of the list to the Center as a supplement. This should
be done before proceeding to the next site.
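One way to picture the reporting structure just described is as a record per checkpoint, holding the description and judgment cells, the per-panelist recommendations, and the single- or double-checked factors. The Python sketch below is purely illustrative; the field names simply follow the column headings of the form that follows.

    from dataclasses import dataclass, field

    @dataclass
    class CheckpointReport:
        name: str                 # e.g., "Need" or "Market"
        rating: int               # checkpoint score, 4 - 0
        description: str = ""     # description cell, noting sources used
        evidence_used: str = ""
        adequacy_of_description: str = ""
        # panelist content area -> recommendation for follow-up
        recommendations: dict = field(default_factory=dict)
        # factor -> 1 (single check) or 2 (double check, more important)
        checked_factors: dict = field(default_factory=dict)

    report = CheckpointReport(name="Need", rating=3)
    report.recommendations["finance"] = "Verify per-student cost figures."
    report.checked_factors["Populations affected"] = 2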
AN EXPANDED CHECKLIST TO USE FOR REPORTING FINDINGS

Project Name: ________________________________

T.O.'s Name: ________________________________

Directions:
1. Give taped response first by referring directly to the checklist and talking through any information gathered for the description cell. The T.O. should remember that if the panelists were to arrive on-site and begin elaborating on your findings they would need as accurate a description as possible. Therefore, as one makes descriptions, note sources of information used, and other sources available.
2. X out any cells referenced on tape.
3. Judge the description cell's adequacy.
4. Make recommendations to the panel.
5. Score the checkpoints.
6. Send this checklist to the Center before going to the next site. Include the log of activities, and the tapes.
*1. Need: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Populations affected
_____ b. Social significance of population needs
_____ c. Absence of substitute strategies to meet needs
_____ d. Multiplicative effects of meeting needs
2. Market: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Dissemination plan:
          i - clarity
          ii - feasibility
          iii - ingenuity
          iv - economy
          v - probable effectiveness
_____ b. Size of group reached
_____ c. Importance of the market
*3. Performance—True Field Trials: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Final version
_____ b. Typical user
_____ c. Typical aid
_____ d. Typical setting
_____ e. Typical time frame
*4. Performance—True Consumer: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Entire college
_____ b. Board of Trustees
_____ c. Non-academic staff
_____ d. Institutional Research staff
_____ e. Department chairmen
_____ f. Faculty
_____ g. Student
_____ h. Taxpayer
*5. Performance—Critical Comparisons: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. No treatment group
_____ b. Existing competitors
_____ c. Projected competitors
_____ d. Created competitors
_____ e. Hypothesized competitors
*6. Performance—Long-Term: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Week to month later
_____ b. Month to year later
_____ c. Year to few years later
_____ d. Many years later
_____ e. On-job, or life-space sample
*7. Performance—Side-Effects: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Comprehensive search
_____ b. Skilled search
_____ c. Independent search
_____ d. Goal-free search
_____ e. Time of search:
          i - during project
          ii - end of project
          iii - later
*8. Performance—Process: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Descriptive congruence check
_____ b. Causal clues check
_____ c. Instrument validity
_____ d. Judge/Observer reliability
_____ e. Short-term effects
*9. Performance—Causation: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Randomized Experimental Design
_____ b. Quasi-Experimental Design
_____ c. Ex Post Facto Design
_____ d. A Priori Interpretation Design
_____ e. Sources of Invalidity:
          i - selection bias
          ii - regression artifacts
          iii - instrumentation
          iv - etc.
*10. Performance—Statistical Significance: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Appropriate Analysis
_____ b. Appropriate Significance Level
11. Performance—Educational Significance: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Independent judgments
_____ b. Expert judgments
_____ c. Basis of external judgments
_____ d. Needs congruence
_____ e. Side-effects
_____ f. Long-term effects
_____ g. Comparative gains
_____ h. Consumer data
_____ i. Process data
12. Cost-Effectiveness: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Comprehensive cost analysis
_____ b. Expert judgment of costs
_____ c. Independent judgment of costs
_____ d. Cost for all competitors
13. Extended Support: Rating Score (4 - 0)

Place single (or double) check next to factors that were particularly significant.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. Post-funding data collection
_____ b. Post-funding system for improvement
_____ c. Post-funding in-service training
_____ d. Up-dating of aids
_____ e. New uses and user data
T.O.'s Name: ________________________________
Site Name:____________________________________
A LOG OF ACTIVITIES
Directions: Since the Center is trying to get a better idea of the nature of the goal-free evaluation method, and since we may have to account for your activities, please fill out the following as accurately as possible.
ATTACHMENT A
An Introduction to the Checklist
AN INTRODUCTION TO A CHECKLIST APPROACH FOR GOAL-FREE EVALUATION
By Michael Scriven
The following checklist can be used as a key item in the
evaluation of products of almost any kind besides the educational
products or projects to which the language refers. In the
educational field, it can be used in each of the following ways:
a. As an instrument for evaluating products;
b. As an instrument for evaluating producers in the "pay-off"
dimension, i.e., without considering matters such as personnel
policy, community impacts, potentiality, etc.;
c. As an instrument for evaluating evaluation proposals
focused on products or producers;
d. As an instrument for evaluating production proposals, since
a competent producer should incorporate plans for achieving each of
these standards and establishing that these standards have been
achieved;
e. As an instrument for evaluating evaluators of products,
producers, etc., since it is argued that competent evaluation must
cover each of these points.
It will thus be seen that the checklist, if sound, provides
an extremely versatile instrument for assessing the quality of all
kinds of educational activities and products; the more so because
the concept of "product", as here used, is a very broad one,
covering processes and institutions as well as typical products
such as technical devices.
So much for applicability in theory. What about application
in practice? It is clear that few educational products have ever been
provided that met all these standards prior to production, although
that is exactly when they should be met in order to justify production.
Furthermore, not very many have been produced that meet enough of
them to justify even retrospective confidence in the merit of the
product, but enough have to make clear that the standards are not unrealistic.
The better correspondence courses in technical subjects would pass,
for example. To justify producing a new course is naturally harder,
since it should be required to outperform the existing ones by a
margin that justified its marginal cost.
Of course, it must be made clear that satisfactory achievement
of all of these standards is not the only criterion for funding a
project. Exploratory or research or realistic field trial projects
are all defensible, even when the chances against meeting all of
these standards are quite long. It is important to note that they
should be conceived as no more than exploratory (etc.) projects,
funded as such, and only moved into a production phase when these
standards (or enough of them to make a very convincing case) are
met. The application of this "hardline" would not only greatly
reduce the costs of educational R&D activity, which should only be
a short-term effect, but it would transform the conception of
satisfactory quality in education. And the long-term positive
results of that are far more significant and beneficial than those
of dropping a few sub-standard projects at the moment.
Given that the use of this checklist is potentially extremely
lethal, potential users deserve to know something about its validity
and utility. The following five comments are offered under that
heading.
First, every checkpoint on the checklist has a clear a priori
rationale, in almost all cases so obvious that elaboration would be
otiose. That is, a straightforward argument can be constructed
that the failure to meet any one of the checkpoints immediately leaves
open a serious doubt as to whether the product (for example) is of
good quality.
Second, medical and industrial products routinely pass (and
are often required to pass) every checkpoint. Your car, your
aspirin, and your food, despite the real problems of still-emerging
undesirable side-effects that afflict all of them, at least avoided
the far worse results that would be likely to arise if the checkpoints
on this list weren't met. I can see no way to argue that
the effects of bad education are less significant in the long run
than the effects of bad food, drugs, and cars.
Third, the checklist has been developed out of the most
intensive systematic and large-scale product evaluation activities
with which I am familiar— the Product Review Panels of 1971-72
and 72-73 done for the National Center for Educational Communication,
on sub-contract to the Educational Testing Service. The fifteen
experienced evaluators and educators who worked on these panels
provided the raw material in their detailed assessment procedures
from which I extracted the first eight versions of this checklist.
It has been further refined since as a result of interaction with
other groups, notably EPIE (Item 13 was suggested by Ken Komoski)
and my assistants Michael Barger and Howard Levine.
Fourth, the checklist has also been critiqued (at my request)
by some of the most experienced developers in the country. Their
reaction has not been impressive. The general theme has been to
claim excessive perfectionism, but the support for this claim has
been extremely weak, consisting either of saying nothing has ever
been produced anywhere that met these standards (counterexamples
have been given above), or that the cost of meeting them would be
prohibitive, or that they may be appropriate for summative but
certainly not for formative evaluation. I have looked at the cost
complaint very carefully and it is certainly not true in general.
For example, the CSE handbooks of all tests available for, e.g.,
secondary students, products which rated well with the Product
Review Panels, can rather easily be evaluated so as to meet these
standards, on a small budget. (As presented, there was a bit too
much guessing required.) Where the cost is going to be large is when
we start looking at huge curriculum projects. But this reflects
the combination of two features: a) the huge costs of development;
b) the great difficulty in justifying such projects. Where the raw
gains are likely to be marginal, as in most of those projects, one
has to develop very ingenious instruments and use very large groups
to pick up the benefits. I think the checklist correctly reflects
those facts of life, and of course it indirectly underscores how
dubious the justification of most of those projects is.
One reason I find the reactions of producers (to date)
unimpressive is that the checklist could easily have been extracted
from the writings or conversation of these same people when they
are extolling the merits of the R&D approach. That approach begins
with needs assessment, and goes on through a series of field trials
towards dissemination. The checklist refers to these areas and
establishes exactly those facts from them which provide the basis
of asserting the superiority of a properly developed product. None
of the checkpoints are alien intruders in the context of justification.
They begin to look threatening only in the context of
evaluation. But exactly the same factors must be present in each
of these contexts. The comment about inappropriateness for formative
evaluation is methodologically precarious since good formative
evaluation must involve giving the best possible simulation of a
summative evaluation. The latent point could, I think, be put
as follows.
The checklist refers to some data which cannot be gathered
the instant a project is conceived or even in its early days— for
example, checkpoint 6 refers to long-term or follow-up data. It
is a grievous error to conclude from this that the checklist— or
even that particular checkpoint— is not relevant for formative
evaluation since one of the tasks of formative evaluation is to
set up the process and instruments for collecting that data, and
to collect it, at least on early versions of the product. Formative
evaluation is what goes on during the pre-production, improvement-
oriented phase of the development, and anyone who wants to produce
a worthwhile product will want to get follow-up data from throughout
the period during which significant changes of effects can
reasonably be expected to occur. This may well mean extending the
developmental timeline somewhat; it certainly cannot legitimately
be taken to mean that follow-up data isn't relevant to formative
evaluation. The 'follow-up checkpoint' is the extreme case, the
item that might seem most remote from formative concerns. Even if
the argument of the preceding paragraph were unsound, one could
scarcely argue that the checklist as a whole was irrelevant to
formative evaluation, since almost every remaining item is
obviously relevant.
There is a related complaint that deserves attention, concerning
checkpoint 5, which requires comparative performance data
from competitive or possibly competitive products. Understanding
the issues in this case, however, is considerably facilitated by
reading the general rationale of that checkpoint in the ensuing
section, so discussion of the complaint will be postponed until
then.
The final item of evidence about utility concerns use of
the checklist by school administrators in the Nova Ed.D. program,
and by students in the evaluation training seminar at U.C. Berkeley.
They frequently volunteer the view that it is of more value to them in
doing actual evaluations than any other document in the literature.
They do not frequently express either a) the opposite view, or b) the
same view about other documents. Their response is undoubtedly
contaminated by my attitudes and personal elaborations, but
difficult to disregard entirely. I suspect that the search for
'models' of evaluation, although possibly less inappropriate in
this area than the search for theories of learning is for
experimental or educational psychology, does not pay off either
conceptually or pedagogically as well as the more mundane approaches
of the checklist and the trouble-shooting chart.
Therefore, the checklist's credentials are good. It has
been based on a proper, rather than a superficial, use of the R&D
iterative cycle, a consideration which alone would make it, as
a product, superior to most of those currently available. (Of
course, it's much easier to do a decent R&D job on a two-page
checklist than on a K-12 mathematics program.)
As an educational product itself, the checklist is of course
self-referent, and a study of this introduction with the checklist
in mind will show that there are still substantial gaps in the
direct empirical evidence that the present version of the checklist
is worthwhile, as is to be expected with any newly revised product.
Some of these gaps will be closed if every reader and particularly
every user of the checklist will accept part of the responsibility
for the improvement of educational quality that I believe we all
share, and provide whatever criticism and alternatives he or she
can. They will be acknowledged and incorporated as appropriate.
For my part, I believe that I have a responsibility to convince
evaluators, developers, and funding organizations, including
legislatures, of the crucial importance of using this checklist.
Since I have already had some success with this, it is particularly
important that errors or shortcomings in it be identified as soon
as possible. I have every confidence in the R&D process, and
consequently great confidence that such errors exist, even in this
thirteenth iteration.
ATTACHMENT B:
Lists of Contacts for the Two Sites
APPENDIX D
HANDBOOK FOR GOAL-BASED EVALUATION
HANDBOOK
FOR
TRAVELING OBSERVERS
The Evaluation Center Western Michigan University
July, 1974
Second Edition
AN INTRODUCTION TO THE HANDBOOK*
By accepting this work assignment from the Evaluation Center,
you should realize there are certain methodological questions and
procedures we ask you to incorporate into your work role. This
handbook contains the essential evaluation framework that you will
be implementing. Further elaboration will be given during the
orientation session. So, please become familiar with this handbook
so that you can participate in any discussions that develop.
Setting of the Evaluation Study
Conceptual Overview of the T.O.'s Role
A Specific T.O. Procedure On Site
Format for Reporting Findings (Copy 1, Copy 2)
An Activity Log (Site 1, Site 2)
Attachment A: Summary of the Foundation's Productivity Study
Attachment B: Proposals and Progress Reports (Site 1, Site 2)
Attachment C: List of Contacts for Each Site
*Note: In case of emergencies while on site, call collect:
8:00 to 5:00 (616) 383-8166
SETTING OF THE EVALUATION STUDY
The Evaluation Center is currently contracted to the Hill
Family Foundation to assess the merit of a portion of the educational
projects they have funded this year. This foundation has given
various amounts of money to a group of independent, four-year
colleges to improve their productivity, based on needs identified
in a study by the Hill Family Foundation. Three major educational
needs were identified generally. They are: 1) sharply rising
instructional costs constituting a major threat to the financial
stability of the college, 2) per student costs rising faster than
per student income, and 3) faculty salaries comprising two-thirds
of instructional costs.
This problem/needs situation was identified as the outside
parameters of the area of funding. Each institution proposed an
alternative strategy to the foundation. Some were very comprehensive
in scope, and others limited. The Center began its
contract to study the Hill project in February, 1974; the contract
will run through January, 1975. We have previously held an orientation
session with the various project directors to hear their plans and
allow them to hear the Center's evaluation plans. A survey was
sent to the participating institutions in the beginning of June.
In general, there are several phases to the overall project; your
work and role as a traveling observer (T.O.) is one of those phases.
A CONCEPTUAL OVERVIEW OF THE T.O.'S ROLE
To gain a relational perspective on the work of the traveling
observer, the evaluation study phase preceding the T.O., and the
phase following will be discussed. The phase preceding deals with
a survey, and the phase following involves visitations to the
project sites by a panel of experts.
Data gathered in the survey will be made available to the
T.O. This information concerns mainly identification of various
individuals on site who would be possible resources during
visitations. These individuals will be identified more specifically
during the T.O. training session held at the Center and referenced
in a later section on procedures. Asking the T.O. to validate
the survey information, the original proposal, and the current
progress report, the Center will get a more accurate picture of the
individual college projects for the panel's visitation. It is
thought that the independent perspective of the T.O. after being
on-site will give a more valid portrayal of the situation for the
panel than would several, separate perspectives gained by looking
at individual documents.
The phase following the T.O.'s work is that of a panel of
content experts revisiting the sites to make a synthesized report
based on results of the T.O.'s report, and on their own independent
observations. One might consider the T.O.'s work as a preliminary,
summative, site visitation by a "professional detective" to uncover
and describe as many potential project achievements as possible.
Then, another group of experts will follow up on those hypothesized
achievements to synthesize as accurate a portrayal as possible of
the merit of each particular project.
The panel consists of individuals who can apply expertise
in several content areas. The kinds of perspectives represented
are those of evaluation, higher education administration/finance/
economics/planning, staff utilization, and curriculum development.
As will be seen later, the T.O. will need to make recommendations
by those perspectives so that each panelist can follow up within
his or her specific content area.
Specifically, the objectives of the T.O. are the following:
1. To collect both descriptive and judgmental information
on each specific project at two college sites based on the
methodology presented in the next section.
2. To summarize the raw information collected at each site
on cassette tapes to be mailed to the Center before proceeding
to the next site, responding to the format presented in a later
section.
3. To edit the raw transcriptions into a report that will go
to each project director for the director's reactions. There will
be a time lag between this editing, use of the unedited transcriptions
by the panelists, and subsequent reaction to the T.O. report
by the project director.
A SPECIFIC T.O. PROCEDURE ON SITE
References have been made in the earlier sections to the
fact that all available information about each site will be
readily given to the T.O. before his/her visitation. This will
provide a data base to be used to assess the merit of that site.
Specifically, the methodology being referenced, as procedurally
implemented for this particular study, has been called goal-based
evaluation by many evaluators. A relatively large amount of
evaluation has been done in this mode; therefore, one might
consider it a well-established and, hopefully, familiar procedure.
To do goal-based evaluation, the T.O. is being asked to
collect information of both a descriptive and judgmental nature
as necessary to assess the intended effects of a project. What
one is specifically to use as a standard for assessment is
information that each specific project poses as intended goals or
objectives. Information on intended goals and objectives is most
frequently found in proposals, progress reports, and can be
supplemented during orientation sessions with the project staff.
Goals are an important starting point for an evaluator,
according to many sources. Goals and objectives are a means to
the planning and production of achievement, and evaluation is
assessing merit of what has been achieved based on those intentions.
One needs to know the steps in planning that lead up to actual
achievement because having exposure to those levels of intended
achievement helps one to understand the context in which it was
planned. When the evaluator is focused by the project's goal
statements to look for verification of those intentions, (s)he
gains the potential value that an external goal-based evaluator
can contribute. That is, an external goal-based evaluator has an
independent interpretation of results, but realizes that a project
must be judged against what was originally intended, or planned.
The producer, or project staff, should have used local needs,
opportunities, and problems in establishing intended goals; the
goal-based evaluator accepts that as having occurred, looks at
the adequacy of the strategy, and then compares the results with
what was intended.
One will note in the next section presenting a reporting
format that the T.O. is to respond to information about goals,
design, implementation, and results. This conceptual framework
is called the CIPP model of evaluation developed by Daniel L.
Stufflebeam. It has been argued in professional evaluation
circles whether the so-called CIPP model is goal-based or not.
There are several different opinions on this point; however,
the T.O. should note that in this particular instance (s)he is
being asked to assume that the CIPP model is goal-based.
Specifically, the T.O. is being asked to operate in a summative,
or retrospective evaluation mode. Those evaluation scholars
who know the difference between questions asked by CIPP in the
formative and summative mode, should be more at ease in equating
the goal-based orientation to summative evaluation.
Digressing slightly to a description of the four horizontal
categories of information one would find in the CIPP model, they
are often called Context, Input, Process, and Product. However,
slight semantic manipulation of the jargon brings us to the headings
used in the next section: goals, designs, implementation, and
results. To briefly touch on a description of those four, the
category of goals in the summative sense would raise questions
like what goals were chosen when the project was initiated? and
why the goals were chosen over other possibilities? In the design
section under the summative mode, the T.O. asks three questions:
(1) What design was chosen? (2) What alternative designs were
rejected? and (3) Why was the winning design chosen? In the
implementation category, the T.O. asks in general: (1) What
are the strengths and weaknesses of the actual design that was
implemented? (2) What effort is being made to implement the
design? and (3) What was the actual design implemented? Under
the results section, the T.O. generally is concerned with (1)
What results were achieved? (2) Were there any side-effects?
(3) Were the objectives achieved? (4) What was the relation
of costs to benefits? These are only general questions for each
category and will be further delineated in the next section.
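The mapping just described can be summarized in a simple lookup structure; the Python sketch below merely restates the questions listed above and adds nothing beyond them.

    SUMMATIVE_QUESTIONS = {
        "Goals (Context)": [
            "What goals were chosen when the project was initiated?",
            "Why were those goals chosen over other possibilities?",
        ],
        "Designs (Input)": [
            "What design was chosen?",
            "What alternative designs were rejected?",
            "Why was the winning design chosen?",
        ],
        "Implementation (Process)": [
            "What are the strengths and weaknesses of the design as implemented?",
            "What effort is being made to implement the design?",
            "What was the actual design implemented?",
        ],
        "Results (Product)": [
            "What results were achieved?",
            "Were there any side-effects?",
            "Were the objectives achieved?",
            "What was the relation of costs to benefits?",
        ],
    }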
Although the preceding paragraphs are only a brief perspective
on goal-based evaluation, they can be considered as theoretical
highlights. In order to provide a consistent reporting format
for the goal-based evaluator, the following section presents
a checklist approach to gathering and reporting project information
back to the Center.
A FORMAT FOR REPORTING FINDINGS

T.O.'s Name: ________________________________
Site Name: ________________________________

Directions:
(1) Use the following questions to structure your taped report.
(2) As you discuss a question, check it off. Use a double check if you feel it to be an overwhelmingly important point.
(3) Work horizontally across each question, making recommendations to the panel in the last cell.
(4) Send the used format along with the tapes.
(5) Enclose the activity log, also.
1. GOALS

Double check any point felt overwhelmingly important.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. What goals were chosen?
_____ b. What goals were considered, then rejected?
_____ c. What alternative goals might have been considered?
_____ d. What evidence exists to justify the goals chosen?
_____ e. How defensible is this evidence?
_____ f. How well have the goals been translated into objectives?
_____ g. Overall, what is the merit of the goals chosen?
2. DESIGNS

Double check any point felt overwhelmingly important.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. What strategy was chosen?
_____ b. What alternative strategies were considered?
_____ c. What other strategies might have been considered?
_____ d. What evidence exists to justify the strategy that was chosen?
_____ e. How defensible is this evidence?
_____ f. How well was the chosen strategy translated into an operational design?
_____ g. Overall, what is the merit of the chosen strategy?
3. IMPLEMENTATION

Double check any point felt overwhelmingly important.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. What was the operational design?
_____ b. To what extent was it implemented?
_____ c. What were the strengths and weaknesses of the design under operating conditions?
_____ d. What was the quality of the effort to implement it?
_____ e. What was the actual design that was implemented?
_____ f. Overall, what is the merit of the process that was actually carried out?
4. RESULTS

Double check any point felt overwhelmingly important.

Columns: DESCRIPTION | EVIDENCE USED | ADEQUACY OF DESCRIPTION | RECOMMENDATIONS

_____ a. What results were achieved?
_____ b. Were the stated objectives achieved?
_____ c. What were the positive and negative side-effects?
_____ d. What impact was made on the target audience?
_____ e. What long-term effects may be predicted?
_____ f. What is the relation of costs to benefits?
_____ g. Overall, how valuable were the results and impacts of this effort?
T.O.'s Name:
Site Name:
A LOG OF ACTIVITIES
Directions: Since the Center is trying to get a better idea of the nature of the goal-based evaluation method, and since we may have to account for your activities, please fill out the following as accurately as possible.
[Log grid: columns for MODE OF ACTIVITY (Interview, Reading, Observation, etc.) and NATURE OF INFORMATION (Needs Assessment, Process, etc.)]
ATTACHMENT A:
Summary of the Foundation's Productivity Study
ATTACHMENT B:
Proposals and Progress Reports
ATTACHMENT C:
List of Contacts for Each Site
APPENDIX E
EVALUATORS' PROCESS RATING FORM
Traveling Observer Rating Sheet
Project Site:
Traveling Observer's Name:
Directions: Please check one of the seven responses after each question that most agrees with your opinion, referencing judgments to the specific project site mentioned above.
I. On-site Process
(A) What degree of rapport do you feel existed between yourself and the project director whose project you recently visited?
1. ___ extreme positive rapport
2. ___ moderate positive rapport
3. ___ somewhat positive rapport
4. ___ neutral rapport
5. ___ somewhat negative rapport
6. ___ moderate negative rapport
7. ___ extreme negative rapport
Comments on A?
(B) Was your time spent at this project allocated effectively?
1. ___ extremely not effective
2. ___ moderately not effective
3. ___ somewhat not effective
4. ___ neutral effectiveness
5. ___ somewhat effective
6. ___ moderately effective
7. ___ extremely effective
Comments on B?
(C) How well did the project director meet your expectations as an administrator?
1. ___ extremely congruent with my expectations
2. ___ moderately congruent with my expectations
3. ___ somewhat congruent with my expectations
4. ___ no prior expectations
5. ___ somewhat non-congruent with my expectations
6. ___ moderately non-congruent with my expectations
7. ___ extremely non-congruent with my expectations
Comments on C?
(D) Overall, how satisfied were you with the recent visitation to this project?
1. ___ extremely not satisfied
2. ___ moderately not satisfied
3. ___ somewhat not satisfied
4. ___ neutral satisfaction
5. ___ somewhat satisfied
6. ___ moderately satisfied
7. ___ extremely satisfied
Comments on D?
II. Methodological Assessment
(A) How much confidence do you place in your ability to fully implement at this particular site the methodology that you were previously trained in at the Center?
1. ___ extremely confident
2. ___ moderately confident
3. ___ somewhat confident
4. ___ neutral confidence
5. ___ somewhat not confident
6. ___ moderately not confident
7. ___ extremely not confident
Comments on A?
(B) What suggestions can you give to refine this methodology for future use? Use an extra sheet of paper, if necessary.
Thank you for your time.
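Note that the response options run in opposite directions across questions: 1 is the most favorable option for A, C, and II.A, but the least favorable for B and D. Before items can be compared or averaged, an analyst would likely recode them to a common direction. The following Python sketch shows one way; both the item keys and the choice of 7 as the favorable pole are assumptions rather than part of the form.

    # Items on which 1 is the favorable end are reflected (x -> 8 - x)
    # so that 7 means "most favorable" on every item. (Item C's
    # midpoint, "no prior expectations", is treated here as neutral.)
    REVERSED_ITEMS = {"I.A", "I.C", "II.A"}

    def recode(item, response):
        # Map a 1-7 response so that higher always means more favorable.
        return 8 - response if item in REVERSED_ITEMS else response

    raw = {"I.A": 2, "I.B": 6, "I.C": 3, "I.D": 5, "II.A": 1}
    aligned = {item: recode(item, r) for item, r in raw.items()}
    print(aligned)  # {'I.A': 6, 'I.B': 6, 'I.C': 5, 'I.D': 5, 'II.A': 7}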
APPENDIX F
PROJECT DIRECTORS' PROCESS RATING FORM
I hope that your productivity project is going well, and that things in general aren't too hectic with the fall opening of school being near.
I wanted to thank you for your cooperation with our recent staff visitation of your project. We will be sending you a report on your project generated from that visitation towards the end of September. I hope you will find it useful to have an independent, external opinion of the strengths and weaknesses of your project as it is being implemented.
A complete decision as to which six sites will be visited by the site review panels in September has not yet been finalized, pending completion of some of the more recent staff visitation reports. However, I will be contacting those sites chosen very close to August 30. Therefore, if you have not been contacted by September 9th, you can assume the panel will not be visiting your project.
During October and November, the panelists and the Center will be preparing the final report to the Hill Foundation. As mentioned earlier in the Spring Hill Meeting, the general purpose of the final report is to provide new information that can be used by the foundation for revision of funding policy guidelines of the productivity program in its second year of funding.
In the meantime, I would sincerely appreciate your feedback about the recent staff visitation at your project. Enclosed you will find a short, four-question rating sheet. This information will be used by the Center to locate any improvements we should consider in future evaluation studies. These ratings are for the Center's assessment of itself. I encourage you to be as candid as possible.
Since a postage-paid return envelope is enclosed, I would appreciate it if you would send this rating to me after you finish reading this letter. I'm sincerely committed to using this kind of information to provide as useful a process as possible to our present, and future, studies.
Sincerely,
John W. Evers, Director
Hill Productivity Project

JWE:lje
Enclosure
Project Director Rating Sheet
Project Site:
Director's Name:
Directions: Please check one of the seven responses after each question that most agrees with your opinion.
(A) What degree of rapport do you feel existed between yourself and the person who recently visited your project as a representative of the Evaluation Center?
1. ___ extreme positive rapport
2. ___ moderate positive rapport
3. ___ somewhat positive rapport
4. ___ neutral rapport
5. ___ somewhat negative rapport
6. ___ moderate negative rapport
7. ___ extreme negative rapport
Comments on A?
(B) Was time spent at your project by the staff person allocated effectively?
1. ___ extremely not effective
2. ___ moderately not effective
3. ___ somewhat not effective
4. ___ neutral effectiveness
5. ___ somewhat effective
6. ___ moderately effective
7. ___ extremely effective
Comments on B?
(C) How well did the staff person meet your expectations as an evaluator?
1. ___ extremely congruent with my expectations
2. ___ moderately congruent with my expectations
3. ___ somewhat congruent with my expectations
4. ___ no prior expectations
5. ___ somewhat non-congruent with my expectations
6. ___ moderately non-congruent with my expectations
7. ___ extremely non-congruent with my expectations
Comments on C?
(D) Overall, how satisfied were you with the recent staff visitation to your project?
1. ___ extremely not satisfied
2. ___ moderately not satisfied
3. ___ somewhat not satisfied
4. ___ neutral satisfaction
5. ___ somewhat satisfied
6. ___ moderately satisfied
7. ___ extremely satisfied
Comments on D?
Thank you for your time.
APPENDIX G
PILOT VERSION OF THE CRITERION INSTRUMENT
General introduction to the assignment
Immediately following this page you will find a set of instructions for using the attached rating sheet. After those directions, you will find a short report to read. Following the report are two pages of items to be filled out according to the previous directions. Put your name and social security number in the blanks at the bottom of the page where appropriate, or else you cannot be paid for your work.

The attached report should probably be read through once for a general overview, and then a second time for content. Then immediately go to the semantic differential items and respond to them.
If it would be helpful to you to make a more concrete situation out of the task, then consider this:
You are a university-based administrator of a small project that has been funded by a foundation. The foundation has sent a representative to spend two days visiting your project to provide you with an external, independent interpretation of the strengths and weaknesses of your project as it is being implemented. The following report is that external interpretation. Even though you know nothing about the true nature of this project, you should be able to rate it on the items following the report.
Thank you. Go on to the instructions for the semantic differential on the next page.
Instructions for Rating Sheet
In responding, please make your judgments on the basis of
what this report means to you. On the following pages, you will
find a restatement of the general concept and beneath it a set of
descriptive scales. You are to rate the report on each of these
scales in order.
Here is how you are to use these scales:
If you feel that one dimension of the enclosed report is very closely
related to one end of a descriptive scale, you should place your
mark as follows:
fair  X :___:___:___:___:___:___ unfair
                   or
fair ___:___:___:___:___:___: X  unfair
If the dimension is quite closely related, or usually related,
to one or the other end of the scale (but not extremely) you should
place your mark as follows
strong ___: X :___:___:___:___:___ weak
                   or
strong ___:___:___:___:___: X :___ weak
If the dimension seems only slightly related to one side as
opposed to the other side (but is not really neutral) then you
should mark as follows:
active ___:___: X :___:___:___:___ passive
                   or
active ___:___:___:___: X :___:___ passive
The direction toward which you check, of course, depends
upon which of the two ends of the scale seem most characteristic
of the dimension you are judging. If you consider a dimension
to be neutral on the scale, both sides of the scale equally
associated with the concept, or if the scale is completely
irrelevant, unrelated to the report, then you should place your
mark in the middle space:
safe ___:___:___: X :___:___:___ dangerous
IMPORTANT: (1) Place your marks in the middle of the spaces,
not on the boundaries:

THIS: ___: X :___        NOT THIS: ___ X:___
(2) Be sure you check every scale for the report—
do not omit any.
(3) Never put more than one mark on a single scale.
Sometimes you may feel as though you have seen the same item
before. This will not be the case, so do not look back and forth
through the items. Do not try to remember how you checked earlier,
similar items. Make each item a separate and independent judgment.
Work at a fairly high speed. Do not worry or puzzle over
individual items. It is your first impressions, the immediate
"feelings" about the items, that is wanted. On the other hand,
please do not be careless, because we want your true impressions.
Please continue, and begin marking.
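For analysis, each seven-space scale is conventionally scored 1 through 7 from the left-hand adjective, with items whose favorable pole is printed on the right reflected so that a high score always means a favorable judgment. The Python sketch below illustrates this convention; the set of reverse-keyed item numbers shown is illustrative only, not the study's actual scoring key.

    # Positions are counted 1-7 from the left-hand adjective; scores
    # are oriented so that 7 always falls at the favorable pole.
    FAVORABLE_ON_RIGHT = {2, 3, 7, 8, 10, 13, 14, 16}  # illustrative key only

    def item_score(item_number, mark_position):
        if item_number in FAVORABLE_ON_RIGHT:  # e.g., illogical/logical
            return mark_position
        return 8 - mark_position               # e.g., active/passive

    def total_score(marks):
        # marks maps item number -> marked position (1-7).
        return sum(item_score(n, p) for n, p in marks.items())

    print(item_score(1, 2))  # mark near "active"    -> scores 6
    print(item_score(2, 2))  # mark near "illogical" -> scores 2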
This report is . . .

1. active ___:___:___:___:___:___:___ passive
2. illogical ___:___:___:___:___:___:___ logical
3. rash ___:___:___:___:___:___:___ cautious
4. direct ___:___:___:___:___:___:___ circuitous
5. initial ___:___:___:___:___:___:___ final
6. interesting ___:___:___:___:___:___:___ boring
7. ungeneralizable ___:___:___:___:___:___:___ generalizable
8. false ___:___:___:___:___:___:___ true
9. overt ___:___:___:___:___:___:___ covert
10. inconsistent ___:___:___:___:___:___:___ consistent
11. good ___:___:___:___:___:___:___ bad
12. timely ___:___:___:___:___:___:___ untimely
13. unimportant ___:___:___:___:___:___:___ important
14. untrue ___:___:___:___:___:___:___ true
15. objective ___:___:___:___:___:___:___ biased
16. undescriptive ___:___:___:___:___:___:___ descriptive
17. positive ___:___:___:___:___:___:___ negative
18. narrow ___:___:___:___:___:___:___ wide
19. unemotional ___:___:___:___:___:___:___ emotional
This report is . . .

20. intuitive ___:___:___:___:___:___:___ rational
21. deliberate ___:___:___:___:___:___:___ impulsive
22. infrequent ___:___:___:___:___:___:___ frequent
23. oral ___:___:___:___:___:___:___ written
24. risky ___:___:___:___:___:___:___ certain
25. subjective ___:___:___:___:___:___:___ objective
26. anonymous ___:___:___:___:___:___:___ identified
27. organized ___:___:___:___:___:___:___ unorganized
28. relevant ___:___:___:___:___:___:___ irrelevant
29. limited ___:___:___:___:___:___:___ diffused
30. credible ___:___:___:___:___:___:___ unbelievable
31. unreliable ___:___:___:___:___:___:___ reliable
32. judgmental ___:___:___:___:___:___:___ unjudgmental
33. harmonious ___:___:___:___:___:___:___ dissonant
34. incomplete ___:___:___:___:___:___:___ complete
35. expensive ___:___:___:___:___:___:___ cheap
36. organized ___:___:___:___:___:___:___ unorganized
37. superior ___:___:___:___:___:___:___ inferior
38. friendly ___:___:___:___:___:___:___ unfriendly
39. light ___:___:___:___:___:___:___ dark
40. artful ___:___:___:___:___:___:___ artless
41. sweet ___:___:___:___:___:___:___ sour
This report is . . .

42. high ___:___:___:___:___:___:___ low
43. warranted ___:___:___:___:___:___:___ unwarranted
44. wise ___:___:___:___:___:___:___ foolish
45. therapeutic ___:___:___:___:___:___:___ toxic
46. hard ___:___:___:___:___:___:___ soft
47. strong ___:___:___:___:___:___:___ weak
48. severe ___:___:___:___:___:___:___ lenient
49. deep ___:___:___:___:___:___:___ shallow
50. scholarly ___:___:___:___:___:___:___ ignorant
51. motivated ___:___:___:___:___:___:___ aimless
52. hot ___:___:___:___:___:___:___ cold
53. temperate ___:___:___:___:___:___:___ intemperate
54. youthful ___:___:___:___:___:___:___ mature
55. sensitive ___:___:___:___:___:___:___ insensitive
56. sophisticated ___:___:___:___:___:___:___ naive
57. useful ___:___:___:___:___:___:___ useless
58. tense ___:___:___:___:___:___:___ relaxed

Thank you for your time.
APPENDIX H
FINAL VERSION OF THE CRITERION INSTRUMENT WITH DIRECTIONS TO THE PROJECT DIRECTORS
Enclosed is the report from a visit on July ____ and ____.
As the Evaluation Center outlined at the Spring Hill Conference, our study would provide some feedback to each college project director on the strengths and weaknesses of his Productivity Project as it was being implemented. It is hoped that the enclosed report will be useful feedback since it provides the views of someone who is both independent and external to each project.
The report is written in a manner specified by the Center so that information would be standardized. Each report author has personally reviewed, corrected, and edited the information now being presented to you. The Center has not exercised any editorship.
However, the Center would appreciate feedback concerning your perceptions of the report's merit. A brief checklist has been provided for your systematic assessment. The information sent on that checklist will be of great value for reviewing our procedures for future studies.
Please consider reading through the report once for an overview. Then, review it again for content. Enclosing your rating in the return envelope will provide an efficient and standardized form of report assessment. If after mailing the rating you feel it would be appropriate to respond in writing, elaborating on any strength or weakness of the report, please feel free to do so. However, please consider October 30 as a deadline for that supplemental written response.
Please return the enclosed rating sheet by October 22. If there are any questions, please feel free to call (616) 383-8166.
Thank you very much for your assistance in this matter.
Sincerely,
John W. Evers, Director Hill Productivity Project
JWE:lje
Instructions for Rating Sheet
In responding, please make your judgments on the basis of what
this report means to you. On the following pages, you will find a
restatement of the general concept and beneath it a set of descriptive
scales. You are to rate the report on each of these scales in
order.
Here is how you are to use these scales:
If you feel that one dimension of the enclosed report is very
closely related to one end of a descriptive scale, you should place
your mark as follows:
fair  X :___:___:___:___:___:___ unfair
                   or
fair ___:___:___:___:___:___: X  unfair
If the dimension is quite closely related, or usually related,
to one or the other end of the scale (but not extremely) you should
place your mark as follows:
strong ___: X :___:___:___:___:___ weak
                   or
strong ___:___:___:___:___: X :___ weak
If the dimension seems only slightly related to one side as
opposed to the other side (but is not really neutral) then you
should mark as follows:
active ___:___: X :___:___:___:___ passive
                   or
active ___:___:___:___: X :___:___ passive
The direction toward which you check, of course, depends upon
which of the two ends of the scale seem most characteristic of the
dimension you are judging. If you consider a dimension to be
neutral on the scale, both sides of the scale equally associated
with the concept, or if the scale is completely irrelevant,
unrelated to the report, then you should place your mark in the
middle space:
safe ___:___:___: X :___:___:___ dangerous
IMPORTANT: (1) Place your marks in the middle of the spaces, not
on the boundaries:
THIS: ___: X :___        NOT THIS: ___ X:___
(2) Be sure you check every scale for the report— do
not omit any.
(3) Never put more than one mark on a single scale.
Sometimes you may feel as though you have seen the same item
before. This will not be the case, so do not look back and forth
through the items. Do not try to remember how you checked earlier,
similar items. Make each item a separate and independent judgment.
Work at a fairly high speed. Do not worry or puzzle over individual
items. It is your first impressions, the immediate "feelings” about
the items, that are wanted. On the other hand, please do not be
careless, because we want your true impressions.
Please continue, and begin marking.
This report is . . .

1. active ___:___:___:___:___:___:___ passive
2. illogical ___:___:___:___:___:___:___ logical
3. rash ___:___:___:___:___:___:___ cautious
4. direct ___:___:___:___:___:___:___ circuitous
5. initial ___:___:___:___:___:___:___ final
6. interesting ___:___:___:___:___:___:___ boring
7. ungeneralizable ___:___:___:___:___:___:___ generalizable
8. false ___:___:___:___:___:___:___ true
9. overt ___:___:___:___:___:___:___ covert
10. inconsistent ___:___:___:___:___:___:___ consistent
11. good ___:___:___:___:___:___:___ bad
12. timely ___:___:___:___:___:___:___ untimely
13. unimportant ___:___:___:___:___:___:___ important
14. objective ___:___:___:___:___:___:___ biased
15. undescriptive ___:___:___:___:___:___:___ descriptive
16. positive ___:___:___:___:___:___:___ negative
17. narrow ___:___:___:___:___:___:___ wide
18. unemotional ___:___:___:___:___:___:___ emotional
19. intuitive ___:___:___:___:___:___:___ rational
This report is . . .

20. deliberate ___:___:___:___:___:___:___ impulsive
21. risky ___:___:___:___:___:___:___ certain
22. subjective ___:___:___:___:___:___:___ objective
23. anonymous ___:___:___:___:___:___:___ identified
24. organized ___:___:___:___:___:___:___ unorganized
25. relevant ___:___:___:___:___:___:___ irrelevant
26. limited ___:___:___:___:___:___:___ diffused
27. credible ___:___:___:___:___:___:___ unbelievable
28. unreliable ___:___:___:___:___:___:___ reliable
29. judgmental ___:___:___:___:___:___:___ unjudgmental
30. harmonious ___:___:___:___:___:___:___ dissonant
31. incomplete ___:___:___:___:___:___:___ complete
32. expensive ___:___:___:___:___:___:___ cheap
33. organized ___:___:___:___:___:___:___ unorganized
34. inferior ___:___:___:___:___:___:___ superior
35. friendly ___:___:___:___:___:___:___ unfriendly
36. unwarranted ___:___:___:___:___:___:___ warranted
37. wise ___:___:___:___:___:___:___ foolish
38. therapeutic ___:___:___:___:___:___:___ toxic
39. soft ___:___:___:___:___:___:___ hard
This report is . . .

40. strong ___:___:___:___:___:___:___ weak
41. severe ___:___:___:___:___:___:___ lenient
42. deep ___:___:___:___:___:___:___ shallow
43. ignorant ___:___:___:___:___:___:___ scholarly
44. motivated ___:___:___:___:___:___:___ aimless
45. temperate ___:___:___:___:___:___:___ intemperate
46. insensitive ___:___:___:___:___:___:___ sensitive
47. sophisticated ___:___:___:___:___:___:___ naive
48. useful ___:___:___:___:___:___:___ useless
49. tense ___:___:___:___:___:___:___ relaxed

Thank you for your time.
BIBLIOGRAPHY
Alkin, M.C, Evaluation theory development. Evaluation Comment, 1969, 2̂ 2-7.
Brinton, J.E. Deriving an attitude scale from semantic differential data. Public Opinion Quarterly, 1962, 25_, 499-501.
Bloom, B.S., Hasting, J.T., and Madaus, G.F. Handbook on formativeand summative evaluation of student learning. New York: McGraw-Hill, 1971.
Campbell, D.T. and Stanley, J.C. Experimental and quasi-experimental designs for research on teaching. In N.J. Gage (Ed.), Handbook for research on teaching. Chicago: Rand McNally, 1963.
Clark, D. Cecil. A prescriptive model of development or evaluation:Some needed maturity. Northwest Regional Laboratory Paper Series No. 8, 1975.
Cronbach, L.J. Course improvement through evaluation. Teachers College Record, 1963, 64, 672-683.
Glass, G.V. The growth of evaluation methodology (Paper No. 27). Boulder, Colorado: Laboratory of Educational Research, University of Colorado, 1969.
Glass, G.V. and Worthen, B.R. Educational evaluation and research: Similarities and differences. In J. Weiss (Ed.), Curriculum Theory Network monograph supplement: Curriculum evaluation: Potential and reality. Toronto: Ontario Institute for Studies in Education, 1972.
Guba, E.G. The failure of educational evaluation. In C.H. Weiss (Ed.), Evaluating action programs. Boston: Allyn and Bacon, 1973.
Hammond, R.L. Evaluation at the local level. Tucson: EPIC Evaluation Center, 1967.
Hays, W.L. Statistics. New York: Holt, Rinehart and Winston, 1963.
House, E.R. and Hogben, D.L. A goal-free evaluation for me and my environment. Champaign-Urbana: Center for Instructional Research and Curriculum Evaluation (undated mimeo).
House, E. R. Confessions of a responsive goal-free evaluator. Champaign-Urbana: CIRCE, 1974 (mimeo).
Jacobs, J.A. A model for program development and evaluation. Theory Into Practice, February 1974, 13, 360-364.
Kerlinger, F.N. Foundations of behavioral research (Second Edition). New York: Holt, Rinehart and Winston, 1973.
Kirk, R.E. Experimental design: Procedures for the behavioral sciences. Belmont: Brooks/Cole, 1968.
Leedy, P.D. Practical research: Planning and designing. New York: Macmillan, 1974.
Merriman, H.O. Evaluation in a school setting: Function, organization and operation, 1971 (in press).
Messick, S.J. Metric properties of the semantic differential. Educational and Psychological Measurement, 1957, 17, 251-256.
Metfessel, N.S. and Michael, W.B. A paradigm involving multiple criterion measures for the evaluation of the effectiveness of school programs. Educational and Psychological Measurement, 1967, 27, 931-943.
O'Keefe, K.G. Methodology for educational field studies. Ph.D. Dissertation, Ohio State University, 1968.
Osgood, C.E., Suci, G.J., and Tannenbaum, P.H. The measurement of meaning. Champaign-Urbana: University of Illinois Press, 1957.
Platt, J.R. Strong inference. In H.S. Broudy, R.H. Ennis, and L.I. Krimerman (Eds.), Philosophy of educational research.New York: John Wiley, 1973.
Popham, W.J. Results rather than rhetoric. Evaluation Comment, 1972, 3, 2-5.
Popham, W.J. Educational evaluation. Englewood Cliffs: Prentice-Hall, 1975.
Provus, M.S. Discrepancy evaluation. Berkeley: McCutchan, 1971.
Scriven, M.S. A possible distinction between scientific disciplines and the study of human behavior. In H. Feigl and M.S. Scriven (Eds.), Minnesota studies in the philosophy of science. Minneapolis: University of Minnesota Press, 1956.
Scriven, M.S. Causes, connections and conditions in history. In W.H. Dray (Ed.), Philosophical analysis and history. New York: Harper and Row, 1966a.
Scriven, M.S. Value claims in the social sciences. Publication #123 of the Social Science Education Consortium. Indiana University, 1966b.
Scriven, M.S. The methodology of evaluation. In R.E. Stake (Ed.), Curriculum evaluation (AERA monograph series no. 1). Chicago: Rand McNally, 1967.
Scriven, M.S. Evaluation skills (AERA tape series no. 6B). Washington: American Educational Research Association, 1971a.
Scriven, M.S. Roy G. Biv Papers. Unpublished Correspondence, 1971b.
Scriven, M.S. Prose and cons about goal-free evaluation. Evaluation Comment, 1972, 3, 2-5.
Scriven, M.S. Goal-free evaluation. In E.R. House (Ed.), School evaluation: The politics and process. Berkeley: McCutchan, 1973.
Scriven, M.S. Discussion with educational R and D evaluators. In H. Poyner (Ed.), Problems and potentials of educational R and D evaluation. Austin: Educational Systems Associates, 1974b.
Scriven, M.S. The concept of evaluation. In M.W. Apple, M.J. Subkoviak, and H.J. Lufler, Jr. (Eds.), Educational evaluation: Analysis and responsibility. Berkeley: McCutchan, 1974c.
Scriven, M.S. Exploring goal-free evaluation: An interview withMichael Scriven. Evaluation, 1974d, 2, 23-25.
Scriven, M.S. "Evaluation of Southwest Education Development Laboratory's Children's Folklore Project: Pass It On" (Unpublished Report)Berkeley: 1975a.
Scriven, M.S. Evaluation bias and its control. Kalamazoo, Michigan:The Evaluation Center Occasional Paper Series, 1975b.
Scriven, M.S. and Roth, J. Evaluation Thesaurus. Pt. Reyes, CA: Edgepress, 1977.
Snider, J.G. and Osgood, C.E. Semantic differential technique: Asourcebook. Chicago: Aldine, 1969.
Stake, R.E. The countenance of educational evaluation. Teachers College Record, 1967, 68, 523-540.
Stake, R.E. Responsive evaluation. Paper presented at the 1974 Annual Meeting of the American Educational Research Association.
Stake, R.E. and Denny, T. Needed concepts and techniques for utilizing more fully the potential of evaluation. Educational evaluation: New roles, new means (Part 2). Chicago: University of Chicago Press, 1969.
Stufflebeam, D.L. A depth study of the evaluation requirement. Theory Into Practice, June 1966, 5, 121-133.
Stufflebeam, D.L. Evaluation as enlightenment for decision-making. Columbus, Ohio: The Evaluation Center, 1968.
Stufflebeam, D.L.; Foley, W.J.; Gephart, W.J.; Guba, E.G.; Hammond, R.L.; Merriman, H.O.; and Provus, M.L. Educational evaluation and decision making. Itasca, Illinois: F.E. Peacock, 1971.
Stufflebeam, D.L. Should or can evaluation be goal-free? Evaluation Comment, 1972, 3, 2-5.
Stufflebeam, D.L. and Scriven, M.S. (transcribed tape). AERA traveling institute on evaluation. Tampa, Florida, 1973.
Stufflebeam, D.L. Toward a technology for evaluating evaluation. Paper presented at the 1974 Annual Meeting of the American Educational Research Association.
Stufflebeam, D.L. Training materials for SHASDA workshop. Kalamazoo, Michigan: The Evaluation Center, 1975.
Tyler, R.W. General statement on evaluation. Journal of Educational Research, 1942, 35, 492-501.
Tyler, R.W. Basic principles of curriculum and instruction. Chicago: University of Chicago Press, 1950.
Tyler, R.W. Changing concepts of educational evaluation. In R.E. Stake (Ed.), Curriculum evaluation (AERA monograph series no. 1). Chicago: Rand McNally, 1967.
Tyler, R.W. Ralph Tyler discusses behavioral objectives. Today's Education, 1973, 6, 309-311.
Walker, J.P. Installing an evaluation capability in an educational setting: Barriers and caveats. Paper presented at the 1972 Annual Meeting of the American Educational Research Association.
Webster, W.J. Some considerations in developing useful evaluation designs. Paper presented at the Minnesota Educational Program Audit Conference, 1971.
Welch, W.W. "Goal Free Evaluation Report for St. Mary's Junior College" (Unpublished report). Minneapolis: 1976.
Welch, W.W. and Hambleton, R.K. On the use of goals in evaluation: A study of selected issues. Paper presented at the 1975 Annual Meeting of the American Educational Research Association.
Winer, B.J. Statistical principles in experimental design. New York:Holt, Rinehart and Winston, 1973.
Wittrock, M.C. and Wiley, D.E. The evaluation of instruction: Issuesand problems. New York: Holt, Rinehart and Winston, 1970.