case study evaluations gao/pemd-91-10.1.9 - us government

United States General Accounting Office

GAO Program Evaluation and Methodology Division

November 1990

Case Study Evaluations

GAO/PEMD-91-10.1.9

Preface

GAO assists congressional decisionmakers in their deliberative process by furnishing analytical information on issues and options under consideration. Many diverse methodologies are needed to develop sound and timely answers to the questions that are posed by the Congress. To provide GAO evaluators with basic information about the more commonly used methodologies, GAO’s policy guidance includes documents such as methodology transfer papers and technical guidelines.

This methodology transfer paper on case study evaluations describes how GAO evaluators could use case study methods in performing our work. It describes six applications of case study methods, including the purposes and pitfalls of each, and explains similarities and differences among the six. This paper presents an evaluation perspective on case studies, defines them, and determines their appropriateness in terms of the type of evaluation question posed. The original report was authored by Lois-ellin Datta in April 1987. This reissued (1990) version supersedes the earlier edition.

Case Study Evaluations is one of a series of papers issued by the Program Evaluation and Methodology Division (PEMD). The purpose of the series is to provide GAO evaluators with guides to various aspects of audit and evaluation methodology, to illustrate applications, and to indicate where more detailed information is available.

We look forward to receiving comments from the readers of this paper. They should be addressed to Eleanor Chelimsky at 202-275-1854.

Werner Grosshans
Assistant Comptroller General
Office of Policy

Eleanor Chelimsky
Assistant Comptroller General
for Program Evaluation and Methodology


Contents

Preface

Chapter 1 Introduction

Chapter 2 What Are Case Studies?
  What Is Meant by “a Case Study”?
  Some Common Benefits Expected From Case Study Evaluations
  Instance Selection in Case Studies

Chapter 3 Case Study Applications
  Illustrative
  Exploratory
  Critical Instance
  Program Implementation
  Program Effects
  Cumulative
  Design Decisions and Case Study Applications

Chapter 4 Data Collection and Analysis
  Data Collection
  Data Analysis
  Handling Multisite Data Sets
  Basic Models for Data Analysis
  Pitfalls and Booby Traps
  Where to Go for More Information

Chapter 5 Summary
  What Are Case Studies?
  When Are Case Studies Appropriately Used in Evaluation?
  What Distinguishes a Good From a Not-Good Case Study?
  Impartiality and Generalizability

Appendixes
  Appendix I: Theory and History
  Appendix II: Site Selection Example
  Appendix III: Guidelines for Reviewing Case Study Reports

Bibliography
Glossary
Papers in This Series

Tables
  Table 2.1: What Is a Case Study? Exercise
  Table 2.2: Complexity of Questions
  Table 2.3: Methods of Obtaining Description and Analysis in Case Studies
  Table 2.4: Some Common Benefits Expected From Case Study Evaluations
  Table 2.5: Instance Selection in Case Studies
  Table 2.6: Hypothetical Data on Instance Selection
  Table 3.1: Illustrative Case Studies
  Table 3.2: Exploratory Case Studies
  Table 3.3: Critical Instance Case Studies
  Table 3.4: Program Implementation Case Studies
  Table 3.5: Illustration of Differences in Note-Taking
  Table 3.6: Program Effects Case Studies
  Table 3.7: Cumulative Case Studies
  Table 3.8: Some Design Decisions in Case Study Methods
  Table 4.1: Ways of Analyzing Case Study Data
  Table 5.1: Some Common Pitfalls in Case Study Evaluation
  Table I.1: Criteria of Good Research
  Table I.2: Evaluation Adaptations of the Research Case Study
  Table II.1: Hypothetical Data on Unfiled Corporate Income Tax Returns for 1986 State Income Tax Returns
  Table III.1: Checklist for Reviewing Case Study Reports

Abbreviations

GAO: General Accounting Office
OTTR: Observe, think, test, and revise
PEMD: Program Evaluation and Methodology Division
SSA: Social Security Administration


Chapter 1

Introduction

At his government-required anti-terrorist training session recently, a captain for a major airline said,

“The bits of information were so few and far between that people weren’t even paying attention. My instructor for the eight-hour course entered the room only to change videotapes. People were talking; they were doing other things, including reading the paper.” (Philadelphia Inquirer, 1986)

This is a case instance. It is an effective way of drawing attention to a problem such as training quality. Such anecdotes are remembered and they are convincing. What they are not, however, is generalizable: that is, an anecdote doesn’t tell whether it is the only such instance or whether the problem is widespread. And anecdotes usually don’t show the reasons for a situation, and thus are of limited value in suggesting solutions.

The challenge for evaluators is how to use those aspects of an anecdote that are effective for our work (the immediacy, the convincingness, the attention-getting quality) and, at the same time, fulfill other informational requirements for our jobs, such as generalizability and reliability. Case study methods, while not without their limitations in this regard, can help us answer this challenge.

GAO already does a lot of case studies, or at least what we ourselves call case studies in describing our methods. There are GAO case studies in many areas, including urban housing, weapon systems testing, community development, military procurement contracts, influences on the Brazilian export-import balances, how programs aimed at improving water quality are working, and the implementation of block grants, to name only a few.

Most of these case studies are either “illustrative” or “critical instance” applications. The first type of application illustrates findings established by other techniques, supplementing, for example, national findings on clean air from administrative records and other sources, with in-depth description on how funds have been used and with what results in selected cities. The second type of application is in-depth analysis of a case of unique interest, such as whether funds have been awarded and managed properly in a specific community health center or if a certain former government official had done anything improper before or after leaving the government. There are, however, four other applications of case studies that are less often used at present but that could be appropriate for our jobs. In brief, the six types of case study, which we examine in chapter 3, are as follows:

1. Illustrative. This case study is descriptive in character and intended to add realism and in-depth examples to other information about a program or policy.

2. Exploratory. This is also a descriptive case study but is aimed at generating hypotheses for later investigation rather than illustrating.

3. Critical instance. This examines a single instance of unique interest or serves as a critical test of an assertion about a program, problem, or strategy.

4. Program implementation. This case study investigates operations, often at several sites, and often normatively.

5. Program effects. This application uses the case study to examine causality and usually involves multisite, multimethod assessments.

6. Cumulative. This brings together findings from many case studies to answer an evaluation question, whether descriptive, normative, or cause-and-effect.

Case Study Evaluations is a review of methodological issues involved in using case study evaluations. It is not a detailed guide to case study design. It does, however, explain the similarities and differences among the six kinds of case study and discusses ideas for successfully designing them. It also gives guidance to the manager who, in reviewing completed case studies, wants to assess their strengths. Finally, it presents an evaluation perspective on case studies, defining them and determining their appropriateness in terms of the type of evaluation question posed.

The methods and types of case studies outlined here are not definitive. The case study as a research method has evolved over many years of experience but evaluative use of the method has been more limited. Indeed, the history of the case study as an evaluation method is little older than a decade. Therefore, discussion of some of the applications described here is based on relatively extensive field experience (with questions in such domains as justice, education, welfare, environment, housing, and foreign aid), while the discussion of some of the other applications is based on more constrained experience.

We have paid particular attention to the conventional wisdom that case studies are always subjective and nongeneralizable. In many uses of case studies, there is no need to generalize. Nonetheless, we find that there are steps that can be taken to generalize from case studies when this is desired. We did not, however, devote any particular emphasis to the popular idea that case studies are inexpensive to conduct (issues of research management common to all designs were outside the scope of our work). Even so, one thing that should emerge quite clearly from the discussion of design features intrinsic to the case study is that it can be a rather costly endeavor, given the time required, the rich in-depth nature of the information sought, and the need to achieve credibility. This reinforces the importance of weighing carefully the decisions to employ the case study method in program evaluation.

In this paper, we have taken positions on many issues, expecting to revise these as experience accumulates and as we receive reactions from evaluators and researchers. This paper is intended to transfer what we believe to be good practice in case studies and to help establish the principles of applying case studies to evaluation. Thus, while the document offers preliminary guidance, it is also a point of departure. For example, we are developing the variation that we call the “cumulative” case study. It can entail prospective and retrospective designs and it permits synthesis of many individual case studies undertaken at different times and in different sites.

The quality of case studies can be variable. Some score high on reasonable tests of quality; others have lower scores. Three problems often encountered have to do with matching the question the evaluator set out to answer and the method for selecting the instances examined, reporting the basis for selecting the instances, and integrating findings across several instances when the findings in one were inconsistent with those in another.

The next sections of this paper will first present some new ways of thinking about a familiar method, the case study, and then introduce the six applications, describing what is required, in terms of methodology, to get the benefits case studies can offer. In the last chapter, we turn to two basic questions: What do we need to take into account with regard to the objectivity of case studies and their generalizability?


Chapter 2

What Are Case Studies?

Almost everyone in GAO probably has worked on a case study at one time or another yet may be unfamiliar with what is meant, methodologically, by a case study. The methodological meaning is important in understanding what differentiates a case study from a noncase study and a good case study from a not-so-good case study.

What is a case study? The exercise in table 2.1 describes a job we might be asked to do and a design for it and asks you to decide whether or not this is a case study. Take about 10 minutes to think through this example and write out your answer. It is important that you try this out yourself, so please do it before continuing.

Table 2.1: What Is a Case Study? Exercise

Item: Writing assignment

Exercise: Suppose GAO has been asked whether the informed consent requirements for experimentation with human subjects are being properly implemented. Suppose further that we visit three sites where humans are used as subjects for research (a hospital, a university, and a clinic) and that we review the informed consent procedures at each site.

Question 1: Is this an application of the case study method? Why?

Question 2: If not, would case studies be appropriate for answering the question we were asked? Why?

Question 3: What is your definition of “case study”?

The answers some GAO evaluators gave may illustrate the range of definitions surrounding case study methods.

To some GAO evaluators, the instance was an application of the case study method, because we were looking at only a few sites or because we could not generalize or because “actual subjects are being used for analysis of a specific question.” To some, the instance was clearly not an application of the case study method, because “we do not know if the instances are representative of the universe,” and “there doesn’t appear to be enough done at each site.” To still others, it was not possible to tell whether this was a case study because looking at instances was what we do in all our methods, and there was no differentiation between this job and a compliance audit.

The definitions given also varied greatly. To one person, a case study involves looking at individual people. To another, a case study examines a clearly defined site and reports on that one site, so that multiple site studies would not be case studies. To another, case studies involve getting a great deal of information about a single site or circumstance, when generalizability isn’t important. To others, “a random sample is necessary for a case study,” “case studies are nonnormative research that investigate a situation without prejudice,” “where we could look at a limited number of cases that would represent the universe overall,” and “a review of relevant conditions in a specific environment with no attempt to project to a larger universe.” There were almost as many definitions as people, and few of them had elements in common. While exact uniformity isn’t expected or perhaps even possible when people are asked to recall a definition, the extreme variability illustrates that we could be talking about very different things in a proposal or report when we discuss case study methods. Thus a decision to “do case studies” could lead to the collection of irreconcilably dissimilar information from groups working on the same job.

What Is Meant by “a Case Study”?

We have developed a definition of case studies that leads to appropriate uses and says something about how a good case study is conducted. It is somewhat technical, so we turn next to giving this definition and to discussing each of its elements.

“A case study is a method for learning about a complex instance, based on a comprehensive understanding of that instance obtained by extensive description and analysis of that instance taken as a whole and in its context.”

For example, if we were asked to study what caused the Three Mile Island disaster and scoped the job to describe whether required safeguards were complied with, this would not be a case study. If, however, we scoped the job to examine in depth events leading up to the disaster, what went wrong, and why it went wrong, this would be a case study. For a second example, if we were asked to study the safety of nuclear plants in general, we might select as our method a survey of self-reported compliance with safeguards in all existing plants. This would not be a case study. If, however, we scoped the job to examine in depth recent problems in appropriately selected nuclear plants including among others Three Mile Island, seeking to understand why the safeguards either were not complied with or were not sufficient, then we would have selected the case study method to answer the question.

As we will discuss later, several methods can be used in one job; these examples are only intended to highlight what is not, and what is, a case study. Examining the elements of the definition also may help make this distinction clear.

“A complex instance” means that input and output cannot be readily or very accurately related. There are several reasons why such a relationship might be difficult to establish. There could be many influences on what is happening and these influences could interact in nonlinear ways such that a unit of change in the input can be associated with quite different changes in the output, sometimes increasing it, sometimes decreasing it, and sometimes having no discernible effect.

Table 2.2 gives an example of a less and a more complex instance. “Are U.S. airports following required U.S. and international security procedures for passengers?” is a less complex question because the criterion is fairly clear, the focus is narrow, the influences on compliance are likely to be relatively few, and the relation of input and output is likely to be fairly direct. Staff knowledge of procedures ought to play some role in following these procedures, for instance.

Some questions are more complex, however, such as the question: “Are security procedures in U.S. airports sufficient to protect the safety of passengers and equipment?” This is more complex because the criterion of “sufficient protection” is much less certain; the focus is broader; the influences on actual achievement of sufficient procedures are likely to be many; and the relation of input and output is not only likely to be both direct and indirect but also difficult to measure.

The second key element in our definition is “a comprehensive understanding.” Here the situation is more straightforward. This means that the goal of a case study is to obtain as complete a picture as possible of what is going on in an instance, and why.

The third key element, “obtained by extensive description and analysis,” has three components. These are summarized in table 2.3. Case studies involve what methodologists call “thick” descriptions: rich, full information that should come from multiple data sources, particularly from firsthand observations. The analysis also is extensive, and the method compares information from different types of data sources through a technique called “triangulation.” That is, reliability of the findings is developed through the multiple data sources within each type. This is akin to corroboration as discussed in the General Policy Manual, chapter 8.0. The validity of the findings, particularly validity with regard to cause and effect, is derived from agreement among the types of data sources, together with the systematic ruling out of alternative explanations and the explanation of “outlier” results. Examining consistency of evidence across different types of data sources is akin to verification. There are specialized strategies for making these comparisons, namely pattern matching, explanation building, and thematic review. The technical how-tos for these three strategies will be summarized later in this paper. They involve techniques such as graphic data displays, tabulations of event frequencies, and chronological or time series orderings. Generally, data collection and analysis are concurrent and interactive (that is, “yoked”) in case study methods.

Table 2.2: Complexity of Questions

A less complex question

Example: Are U.S. airports following required U.S. and international security procedures for passengers?

Characteristics:
• Criterion is fairly clear: “required U.S. and international security procedures”
• Focus is narrow: “passengers”
• Influences on compliance are likely to be relatively few: staff knowledge of procedures, staff training in their implementation, functioning equipment, number of staff compared to workflow, degree of supervision, staff screening and selection
• Relation of input (influences on compliance) to output (that required security procedures are followed) is fairly direct

A more complex question

Example: Are security procedures in U.S. airports sufficient to protect the safety of passengers and equipment?

Characteristics:
• Criterion is less clear: what would be sufficient under present conditions and with existing and possible technologies?
• Focus is broader: passengers and equipment (although still fairly well specified)
• Influences on achievement of sufficient procedures likely to be many, including the state of the art of detection technologies, number and militancy of potential threats to security, and the willingness of passengers, airline personnel, and airport personnel to accept different costs and forms of protection
• Relation of input (influences on security) and output (safety) likely to be difficult to measure and to be both indirect and direct

Table 2.3: Methods of Obtaining Description and Analysis in Case Studies [a]

Technique: Extensive or “thick” analysis
Methodology: Analysis of multiple types of data sources such as
• Interviews with all relevant persons
• Observations over time
• Participant observation
• Documents
• Archives
• Physical information

Technique: Analysis via triangulation of data
Methodology: Analysis through
• Pattern matching
• Explanation building
• Thematic review

Technique: Comparison of evidence for consistency
Methodology: Analysis through techniques such as
• Matrix of categories
• Graphic data displays
• Tabulation of event frequencies
• Chronological or time series ordering

[a] Different types of evidence and standards for them are discussed in General Policy Manual, chapter 8.0.
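The comparison of evidence across types of data sources described above can be made concrete with a small tabulation. The following Python sketch is only an illustration under stated assumptions: the claims, the source types, and the helper names `category_matrix` and `triangulate` are all hypothetical, and the rule used (a claim is "consistent" when every source type gives the same verdict) is one simple way to operationalize a matrix of categories, not a GAO procedure.

```python
from collections import defaultdict

# Hypothetical evidence records: (claim, type of data source, supports the claim?)
EVIDENCE = [
    ("staff trained on procedures", "interviews", True),
    ("staff trained on procedures", "documents", True),
    ("staff trained on procedures", "observations", False),
    ("equipment functioning", "interviews", True),
    ("equipment functioning", "observations", True),
    ("equipment functioning", "documents", True),
]


def category_matrix(evidence):
    """Build {claim: {source_type: [True/False, ...]}} from the records."""
    matrix = defaultdict(lambda: defaultdict(list))
    for claim, source_type, supports in evidence:
        matrix[claim][source_type].append(supports)
    return matrix


def triangulate(evidence):
    """Label each claim 'consistent' or 'inconsistent' across source types."""
    result = {}
    for claim, by_source in category_matrix(evidence).items():
        # One verdict per source type: True only if every record of that type supports the claim.
        verdicts = {all(records) for records in by_source.values()}
        result[claim] = "consistent" if len(verdicts) == 1 else "inconsistent"
    return result


if __name__ == "__main__":
    for claim, status in triangulate(EVIDENCE).items():
        print(f"{claim}: {status}")
```

Claims labeled "inconsistent" are the ones that would call for further data collection or an explanation of the outlier, in keeping with the concurrent collection and analysis the text describes.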

The next element of the definition is “taken as a whole.” As the following list indicates, the size of the instance can be as small as one individual or as large as a nation. The instance as a whole can be

• An individual (Ferdinand Marcos).
• A site (Three Mile Island).
• A function (joint test and evaluation program [1]).
• An office (program evaluation groups in departments).
• A department or agency (IRS, Census).
• An event (Cuban missile crisis [1]; Challenger tragedy).
• A region, nation, or organization (Chesapeake Bay water cleanup efforts, democracy in Philippines, UNESCO).
• “Nested” units in a large or complex case study (note that the instance or unit must be specified and data appropriate to it collected).

[1] These instances have been the subject of case studies. (See U.S. General Accounting Office, February 22, 1984, and Allison, 1971.) Others are general illustrations.

One example of a GAO case study that examines an individual is our examination of whether or not a senior official behaved improperly with regard to influence and accepting money before and since leaving the White House (U.S. General Accounting Office, July 11, 1986). Another example would be a request to examine in detail ex-President Marcos’ use of funds intended by the United States for military or civilian purposes for his personal benefit. At the other extreme, an instance may be as large as an event, such as the Cuban missile crisis (Allison, 1971) and the swine flu vaccine (Neustadt and Fineberg, 1978), which have been the subjects of two well-known case studies, or the Challenger tragedy. It can be a region (Chesapeake Bay water cleanup programs), a nation (democracy in the Philippines), or an organization (UNESCO). Moreover, it is possible to have questions that require nested case studies. For example, to answer a question about how programs to serve handicapped children are working, we might select the cases of preschool and elementary programs; we might further select, within preschool programs, those for the hearing impaired and those for the orthopedically impaired. Each of these nested studies is treated, in terms of specification of the unit of study and collection of data appropriate to it, as any other case study would be.

The last key element of the definition is “and in its context.” Context means all factors that could affect what is happening in an instance. As an example, in the Challenger tragedy, inquiry began with trying to locate the technology that failed as the reason for the explosion. The right-hand booster rocket was identified as the source of the explosion and, within the rocket, technological attention focused on the O-rings. The inquiry expanded very quickly, however, from asking what technology failed to an examination of contextual influences, such as

• decisionmaking on whether or not to go, in relation to the O-rings;

• decisionmaking on whether or not to go, in relation to other components, such as tiles;

• decisionmaking more generally in NASA with regard to NASA-contractor-astronaut relations and responsibilities;

• influences on NASA, such as alleged pressures not to cancel flights;

• quality control tradeoffs in NASA generally and NASA management.

That is, the Challenger inquiry could be seen as similar to a case study in some ways. The rapid spread of inquiry from an examination of the technology to an investigation of decisionmaking on that flight, to inquiry about NASA management as it affected the Challenger disaster generally, is what “taking the context into account” means. In case study methods, to understand what happened and why, context always is considered, and it is this consideration that gives the case study its strength as a way of understanding cause and effect.

Some Common Benefits Expected From Case Study Evaluations

Doing a good case study is more than just looking at what is happening in a few instances. It is a special systematic way of looking at what is happening, of selecting the instances, collecting the data, analyzing the information, and reporting the results.

There are nine features of case study evaluations that merit special discussion. Each of these features, if carried out, confers certain benefits in terms of the product. Two of the features relate to design, three to data collection, three to analysis, and one to reporting. These features and their benefits are shown in table 2.4. For example, with regard to design, information over time (the longitudinal feature of the design) provides assurance that the final product represents what is happening and is not registering an atypical situation.

Table 2.4: Some Common Benefits Expected From Case Study Evaluations

Study feature: Benefits expected

Design
• Longitudinal: Assurance that a short-term situation that may be unrepresentative of what is happening isn’t inflated in importance
• Triangulation: Assurance that reasons given for events properly reflect influences from many different sources
• Purposive instance: Ability to match questions asked and later generalization of findings at level appropriate to the questions

Data collection
• Comprehensive: Assurance that important conditions, consequences, and reasons for these have not been overlooked
• Flexible: Broader perspectives, increased assurance that what is important on the scene rather than centrally will be examined
• Multiple data sources: Assurance that a full picture will be obtained and that bias associated with self-protection or self-interests will be reduced

Analysis
• “Yoked” or concurrent with data collections: Assurance of the ability to collect data needed to test alternative interpretations and to make rapid adjustments in design
• Search for disproving-proving evidence: Assurance that alternative interpretations have been thoroughly searched for and checked; thorough identification of instances that don’t fit the general pattern; and, often, understanding of the reasons for the outliers
• Chain-of-evidence and pattern matching techniques: Permit fairly direct assessment of how convincingly the evidence and conclusions are related

Reporting
• Actual instances: Assurance of authenticity through persuasiveness and ease of recall; use of the tendency to generalize from personal experience but via the substitution of more objective experience for anecdotes of unknown credibility

These features are the price of admission to the expected benefits. One frequent question about case study methods is how rigorously these features have to be followed. Obviously, the more closely the requirements are followed, the more benefits can be expected. It is a judgment call as to how much the features can be compromised before the “case study” becomes a site visit or turns into a survey. Probably the most critical features are appropriate instance selection, triangulation, and the search for disproving evidence. And of these three, probably the most critical is appropriate instance selection.

Instance Selection in Case Studies

There are three general bases for selecting instances: convenience, purpose, and probability. Each has its function and can be used to answer certain questions. A good case study will use a basis for instance selection that is appropriate for the question to be answered. Using the wrong basis for selecting an instance is a fatal error in case study designs, as in all designs. Such a case study is a not-good case study, and it is irredeemably flawed despite any methodological virtues it may have in terms of data collection, analysis, and reporting.

Table 2.5 summarizes the three general bases for selecting instances and the questions each basis can answer. Of particular interest may be the seven varieties of purposive site selection: bracketing, best cases, worst cases, cluster, representative, typical, and special interest.

Instance selection is crucial to generalizability and to answering the evaluation questions appropriately. Only rarely will convenience be a sound basis for instance selection; only rarely will probability sampling be feasible. Thus, instance selection on the basis of the purpose of the study is the most appropriate method in many designs.


Table 2.5: Instance Selection in Case Studies

Selection basis: When to use and what questions it can answer

Convenience: “In this site, selected because it was expedient for data collection purposes, what is happening, and why?”

Purpose
• Bracketing: “What is happening at extremes? What explains such differences?”
• Best cases: “What accounts for an effective program?”
• Worst cases: “Why isn’t the program working?”
• Cluster: “How do different types of programs compare with each other?”
• Representative: “In instances chosen to represent important variations, what is the program like and why?”
• Typical: “In a typical site, what is happening and why?”
• Special interest: “In this particular circumstance, what is happening and why?”

Probability: “What is happening in the program as a whole, and why?”

The match between the question asked and the method of purposive sampling chosen can be tricky. For example, studies that attain “representativeness” by conducting a few case studies in a rural setting, a few in a suburban setting, and a few in an urban setting will produce a report in which the three settings receive more or less equal weight. If, however, 90 percent of the clients or sites for the program are rural, such “representativeness” may appropriately capture the range of site experiences but be rather unrepresentative of the program as a whole, and care will be needed to generalize only to the range of settings and not to the program as a whole.
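The weighting problem in this example can be seen with a little arithmetic. The Python sketch below is purely illustrative: the 90 percent rural share comes from the text, but the 5/5 split between suburban and urban sites and the per-setting findings are hypothetical numbers invented here. It is not a suggestion that program-wide estimates should be computed from purposively selected case studies; it only shows how far an equal-weight summary can drift from the program's actual mix of sites.

```python
# Share of program sites in each setting. Only the 90 percent rural figure is
# from the text; the suburban/urban split is a hypothetical assumption.
SITE_SHARE = {"rural": 0.90, "suburban": 0.05, "urban": 0.05}

# Hypothetical finding per setting, e.g. percent of clients judged well served.
FINDING = {"rural": 40.0, "suburban": 80.0, "urban": 90.0}


def equal_weight_mean(finding):
    """Summary in which each setting counts the same, as in the report described."""
    return sum(finding.values()) / len(finding)


def program_weighted_mean(finding, share):
    """Summary in which each setting counts in proportion to its share of sites."""
    return sum(finding[setting] * share[setting] for setting in finding)


if __name__ == "__main__":
    print("equal weight:   ", round(equal_weight_mean(FINDING), 1))
    print("program weighted:", round(program_weighted_mean(FINDING, SITE_SHARE), 1))
```

With these assumed numbers the equal-weight summary is 70.0 while the site-weighted figure is 44.5, which is why the text cautions against generalizing beyond the range of settings.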




Table 2.6: Hypothetical Data on Instance Selection

Location | Operated by | Number of beds | Clientele served | Years in operation | Funded by | Costs [a] | Problems [b]
1. San Diego, Calif. | CAIM, Inc. | 800 | Men and boys | 2 | INS | 25 | 4%
2. Amarillo, Tex. | CAIM, Inc. | 130 | Men and boys | 1 | INS | 30 | 4
3. El Paso, Tex. | PIC | 75 | Families | 3 | INS | 15 | 7
4. El Paso, Tex. | CAIM, Inc. | 350 | Men and boys | 1 | BOP/INS | 60 | 7
5. Miami, Fla. | Security | 100 | Men and boys | 1 | BOP/INS | 150 | 15
6. Clearwater, Fla. | CAIM, Inc. | 300 | Men and boys | 5 | BOP/INS | 100 | 10
7. Pensacola, Fla. | Security | 100 | Families | 5 | INS/State | 70 | 6
8. Denver, Colo. | PIC | 100 | Families | 3 | INS/State | 20 | 3
9. Salida, Colo. | Security | 200 | Men and boys | 4 | INS | 70 | 9
10. Salinas, Calif. | CAIM, Inc. | 100 | Men and boys | 2 | INS | 30 | 3
11. Los Angeles, Calif. | Security | 300 | Men and boys | 3 | INS | 75 | 5
12. San Francisco, Calif. | Security | 250 | Men and boys | 3 | INS/State | 70 | 7
13. San Francisco, Calif. | PIC | 100 | Men and boys | 3 | INS | 25 | 4
14. New York, N.Y. | ARIVA, Inc. | 100 | Men and boys | 2 | INS | 55 | 6
15. Washington, D.C. | ARIVA, Inc. | 300 | Families | 2 | INS | 85 | 5
16. Seattle, Wash. | Security | 100 | Men and boys | 3 | INS/State | 60 | 7

[a] Costs per person per day, charged by contractor to funder (hypothetical data).

[b] Problem rates include all problems considered under contract as serious, such as escape, acts of violence by or toward individuals, vandalism requiring more than $1,000 to repair, and suicides. Rates are number of such instances per 100 days per year (hypothetical data).

To illustrate what each variety means, and how it might be operationalized, consider the information in table 2.6. This gives hypothetical data about a real situation in designing a study: selecting instances (in this study, sites or locations) for an assessment of the costs and operations of federal detention facilities managed by private contractors under OMB Circular A-76. There are not many such facilities, so the 16 hypothetical facilities represent what we might actually find in such a study. The following paragraphs describe what a sample would look like if it were chosen on each selection basis, using the data in table 2.6.

Convenience Samples

If our location were the Denver Regional Office, a convenience sample would be sites 8 (Denver) and 9 (Salida). That is, ease of collecting data and minimizing the resources required would have driven our choice.

Purposive Sample

Bracketing

If our interests were extreme costs, numbers 3 (El Paso, at $15 per person per day) and 5 (Miami, at $150 per person per day) would bracket the cost extremes. If we wanted the three least expensive and the three most expensive, we could select 3 (El Paso), 8 (Denver, at $20), and 13 (San Francisco, at $25) in comparison to 5 (Miami, at $150), 6 (Clearwater, at $100), and 15 (Washington, D.C., at $85). Such an addition would also give us a better basis for analysis because it includes not only high-cost and low-cost sites but also services to men and boys and to families, a difference that in itself might be expected to lead to cost variations.
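The bracketing selection can be reproduced mechanically by ranking the cost column of table 2.6. The following is a minimal sketch, not part of the original paper; the function name `bracket` and the tie-breaking rule (lower site number first) are assumptions made for illustration.

```python
# Cost per person per day (hypothetical data) for sites 1-16 of table 2.6.
costs = {1: 25, 2: 30, 3: 15, 4: 60, 5: 150, 6: 100, 7: 70, 8: 20,
         9: 70, 10: 30, 11: 75, 12: 70, 13: 25, 14: 55, 15: 85, 16: 60}


def bracket(values, k=1):
    """Return the k lowest-valued and k highest-valued site numbers.

    Ties are broken by site number, an arbitrary rule chosen for this sketch.
    """
    ranked = sorted(values, key=lambda site: (values[site], site))
    return ranked[:k], ranked[-k:]


print(bracket(costs, k=1))  # ([3], [5])
print(bracket(costs, k=3))  # ([3, 8, 1], [15, 6, 5])
```

Sites 1 and 13 are tied at $25, so the sketch returns site 1 where the text picks site 13. Choosing among tied sites is a judgment call that the evaluator would need to document.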

Best Cases

If our interests were in operating centers with the fewest problems, we might examine numbers 8 (Denver, 3 percent) and 10 (Salinas, 3 percent). Since both are in Colorado (although operated by different firms and serving different groups), we might want to add sites. Such an addition could show whether we were looking at something about Colorado rather than about low-problem centers. We could do this by selecting 1 (San Diego, 4 percent), 2 (Amarillo, 4 percent), and 13 (San Francisco, 4 percent).

Worst Cases

Sites 5 (Miami, 15 percent problems) and 6 (Clearwater, 10 percent) stand out as worst cases. Selecting an out-of-state comparison, if we wanted it, is harder here. The next highest problem rate (9, Salida, at 9 percent) is run by a different company and costs much less. Security has a site in San Francisco, for men and boys, which costs $70 daily with a 7-percent problem rate. The costs of site 15 (Washington, D.C.) are higher, but this site serves families and has a low problem rate. The best choice probably is 12 (San Francisco): it serves the same group (men and boys) and is run by the same company (Security).

Cluster

We might be interested in administrative arrangements: for example, in how administration works out when INS alone is the contractor, when responsibility is shared with another federal agency (Bureau of Prisons), and when responsibility is shared with the state. One cluster of sites (1, 2, 3, 9, 10, 11, 13, 14, and 15) is administered by INS alone. Another cluster (4, 5, and 6) is shared between BOP and INS, and the last cluster (7, 8, 12, and 16) is run by INS and the state. We could pick one or two sites from each cluster to get a sense of how agency auspices may affect program operations.
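The clusters can be derived directly from the "Funded by" column of table 2.6. This is an illustrative sketch only; the function name `clusters` is hypothetical. Note that table 2.6 lists site 8 (Denver) under INS/State, so the sketch places it only in that cluster.

```python
from collections import defaultdict

# "Funded by" column (hypothetical data) for sites 1-16 of table 2.6.
funded_by = {1: "INS", 2: "INS", 3: "INS", 4: "BOP/INS", 5: "BOP/INS",
             6: "BOP/INS", 7: "INS/State", 8: "INS/State", 9: "INS",
             10: "INS", 11: "INS", 12: "INS/State", 13: "INS", 14: "INS",
             15: "INS", 16: "INS/State"}


def clusters(labels):
    """Group site numbers by their label, keeping sites in ascending order."""
    groups = defaultdict(list)
    for site, label in sorted(labels.items()):
        groups[label].append(site)
    return dict(groups)


for funder, sites in clusters(funded_by).items():
    print(funder, sites)
```

One or two sites could then be chosen from each list, on whatever purposive basis the evaluation question calls for.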

Representative

One issue we might need to examine could be efficiencies of operation, particularly in terms of facility size. Here we might select numbers 1 (San Diego, 800 beds), 6 (Clearwater, 300 beds), and 10 (Salinas, 100 beds). All are run by CAIM, and all serve men and boys. We would have to limit our generalizations to facilities for men and boys, but these three sites should give a good sense of the size and operations issue.

Typical

This would be a challenge. In terms of size, there is a “typical” bed size (100 beds); in terms of people served, there is a “typical” population (men and boys); and in terms of years of operation, 3 years is “typical,” with 2 years a close runner-up. In terms of costs, however, the distribution is trimodal (that is, three values appear about equally often), and for percent of problems, it is almost flat with two outliers. Also, there is not a single site that matches all three “typical” characteristics well. Miami, for example, has 100 beds and serves men and boys, but it has been in operation only 1 year, costs $150 per person per day, and has a 15-percent problem rate. The best approach would be to indicate that it is not possible to pick one site that is “typical” of such distributions.
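The modal values mentioned above can be checked by tallying the columns of table 2.6. The sketch below is illustrative only; the helper name `modes` is an assumption, and the lists simply restate the hypothetical table data in site order.

```python
from collections import Counter

# Columns of table 2.6 (hypothetical data), in site order 1-16.
beds = [800, 130, 75, 350, 100, 300, 100, 100, 200, 100, 300, 250, 100, 100, 300, 100]
years = [2, 1, 3, 1, 1, 5, 5, 3, 4, 2, 3, 3, 3, 2, 2, 3]
costs = [25, 30, 15, 60, 150, 100, 70, 20, 70, 30, 75, 70, 25, 55, 85, 60]


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes(beds))    # [100]
print(modes(years))   # [3]
print(Counter(costs)) # tallies for costs, to inspect how spread out they are
```

For bed size and years in operation a single value dominates, whereas the cost tallies are spread across several values with no clear peak, which is one reason a single "typical" site is hard to defend here.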

Special Interest

Any one of the 16 sites might be examined as a result of special congressional interest. Such interest usually would be based on information extraneous to the data in the table: a complaint might be received, for example, about conditions in the San Diego site, or allegations might be made that the high costs of the Miami site were due to mismanagement.

Probability Samples

Probabilistic sampling is the method of choice for answering questions about “how much,” or how extensive a problem is in a population. Properly carried out, it provides strong generalizability and assurance of representativeness. A probability sample is one in which all members of the population have a known and equal chance of being selected. If we used a table of random numbers, and selected as the first two sites those corresponding to the first two numbers between 1 and 16 in the table, we would have selected a probability sample. Each site would have a 1-in-16 chance of selection, and that chance would be equal among sites. A fair objection to this statement is that the laws of probability operate on large numbers, and selecting fewer than 30 instances does not always provide the generalizability to the population as a whole that probability samples promise. However, in terms of actual operations, which we want to illustrate here, the method just sketched is a probabilistic one, and some case studies have involved 30 or more sites selected on a probabilistic basis. (See PEMD’s transfer paper entitled Using Statistical Sampling (U.S. General Accounting Office, May 15, 1986) for more information.)
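A pseudorandom number generator can stand in for the table of random numbers. The sketch below is an illustration under stated assumptions, not the paper's procedure: the function names and the seed value are invented here. On any single draw each site has a 1-in-16 chance; across two draws without replacement, each site has a 2-in-16 chance of ending up in the sample, and that chance is the same for every site.

```python
import random
from fractions import Fraction


def draw_sites(population_size=16, sample_size=2, seed=None):
    """Draw site numbers at random, without replacement, with equal chances."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


def inclusion_probability(population_size, sample_size):
    """Chance that any given site ends up in the sample."""
    return Fraction(sample_size, population_size)


print(draw_sites(seed=1990))         # two distinct site numbers between 1 and 16
print(inclusion_probability(16, 2))  # 1/8
```

Fixing the seed makes the draw repeatable, which can be useful for documenting how the sites were chosen.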

For readers who want to test their skills in applying different types of purposive selection, appendix II gives information for a job involving the 50 states (a fairly common situation for GAO), a form for indicating which you would select for each of the seven kinds of purposive selection, and our answers, for comparison against yours.

In many jobs, what is a “case” and what dimensions are important to consider in selection will be clear. For example, the population of detention facilities supported by INS contracts can be defined legally (by the contract awarded), and the relevant dimensions (length of time in operation, facility size, detainee mix) are straightforward. There are, however, more problematic circumstances. An example would be a study of the extent to which voluntary organizations have taken up any slack in welfare supports. What is a voluntary organization can be defined broadly, as “any nonprofit organization,” or narrowly, as “a service-oriented group whose members do not receive payment for their work.”




Dimensions of potential relevance for the outcome of interest are many, and the empirical bases for selecting any one dimension over others are few. In such situations, the evaluator can turn to past experience, a search of the appropriate theoretical as well as empirical literature, the advice of knowledgeable persons, an examination of key issues in proposed or pending legislation, customer guidance, and similar techniques. That is, while it is important to recognize the difficulties, there are ways of dealing with them in case definition.


Chapter 3

Case Study Applications

As noted earlier, there are six types of applications for case study methods: illustrative, exploratory, critical instance, implementation, program effects, and cumulative. But case study reports commonly use only two of the six applications: illustrative and critical instance. Greater use could be made of the four others in selecting alternative ways of answering questions, because these may be able to give information that is more valuable to customers than other techniques. Also, improvements can always be made in how even the two approaches already used frequently are carried out, especially in the area of selecting instances for study. The next sections summarize, for each of the six types, the evaluation questions they can answer, the functions they perform, their design features, and their pitfalls. The last section shows what basis for selecting sites is appropriate for each of the six applications.

Illustrative

As table 3.1 indicates, illustrative case studies primarily describe what is happening and why, in one or two instances, to show what a situation is like. This can help in the interpretation of other data, particularly if we have reason to believe most readers know too little about a program or situation to understand fully the information from surveys or other methods.




Table 3.1: Illustrative Case Studies

Aspect examined       Characteristics

Evaluation questions  Help interpret other data when there is reason to believe that readers know too little about a program; descriptive, often used in conjunction with other methods

Functions             Make the unfamiliar familiar; provide surrogate experience; avoid over-simplification of reality; and give reader a common language about the topic

Design features       Site selected as typical or representative of important variations; small number of cases to keep reader’s interest; data often include visual evidence; analysis concerned with data quality and meaning; and reports use self-contained, separate narratives or descriptions

Pitfalls              May be difficult to hold reader’s interest while presenting in-depth information on each illustration; may not adequately represent situations where considerable diversity exists (in such situations, it may be impossible to represent variety well enough to use illustrative case studies); and may not have time on-site for in-depth examination

GAO has many examples of such illustrative use. In 1982, for instance, CED examined housing block grants through a survey supplemented by case studies. The results of the survey were published in the main report (U.S. General Accounting Office, December 13, 1982). For three of the sites (Pittsburgh, Seattle, and Dallas), individual reports described what each city was like with regard to housing and housing-related activities and how the money was used in that city and included before-and-after pictures of what rehabilitation meant for individual neighborhoods and houses (U.S. General Accounting Office, March 24, 1982; March 30, 1982; April 30, 1982). In a similar application, HRD described the projects funded under the Emergency Job Appropriations Act of 1983 in communities in Texas, Alabama, California, Georgia, and Massachusetts (U.S. General Accounting Office, March 26, 1985; August 27, 1985; September 25, 1985; December 6, 1985).

Illustrative case studies are used by evaluators in other agencies. When the Department of Health and Human Services was trying out delivery of Head Start services to parents and children in their own homes, called Home Start, the Department supplemented a formal assessment of the development of the children before and after the program with case studies (High/Scope Educational Research Foundation, 1972). These case studies described what services were delivered, the conditions in rural as well as urban areas, and what the Home Start teachers did during the home visits, and they generally provided a surrogate or vicarious experience for readers who might never have visited a Head Start or a Home Start center. The case studies told, too, of the development of the program over time and helped give a realistic sense of problems in start-up and implementation, how changes in staffing were accommodated, and the impact of shifting federal guidance on efforts to carry out the program in the field.

Case studies such as these are well accepted as a valid way of amplifying a more systematic presentation via the realism and vividness of anecdotal information. There are, however, pitfalls in presenting illustrative case studies. The most serious is selecting the instances. The case or cases must adequately represent the situation or program. This is relatively easy if the program is small and homogeneous. Where considerable diversity exists, it may not be possible to select a “typical” site, and the diversity may be so great that to represent it adequately would require more case studies than most people would want to read for illustrative purposes. In the example of privately operated detention facilities, an illustrative case study might run the risk of oversimplifying a more complex situation. The example was contrived to illustrate exactly this point: that sometimes we cannot select a site that fits our needs and thus the method is not appropriate.

However, in many real-world situations, it is possible to represent diversity adequately for illustrative purposes and to obtain the benefits of this application: helping readers feel, hear, see, “be there” when this kind of surrogate site experience is necessary to undo stereotypes or explain a situation otherwise inaccessible for most people.

Such a situation might be a bilingual education class, about which stereotypes can abound, or life aboard a nuclear-weapon-equipped submarine, a situation few readers will ever experience themselves but may need to get a feel for in order to understand staff selection, training, and management on modern submarines.

Exploratory

The exploratory case study is a shortened case study, undertaken before launching into a large-scale investigation. Its function is to develop the evaluation questions, measures, designs, and analytic strategy for the bigger study. As table 3.2 indicates, it is most helpful where considerable uncertainty exists about program operations, goals, and results. Also, when we do not have an adequate on-the-shelf set of designs and measures, an exploratory case study undertaken before initiating a job requiring 1,000 staff days or more can save time and money in implementation as well as improve the confidence we have in our results. We can aim more precisely and hit the target more often.




Table 3.2: Exploratory Case Studies

Aspect examined       Characteristic

Evaluation questions  Usually cause and effect

Functions             Where considerable uncertainty exists about program operations, goals, and results, exploratory case studies help identify questions, select important measurement constructs, develop actual measures for these, which can be used later in larger-scale tests; formulate expectations; safeguard investment in larger studies (for problems or programs that are not well-developed)

Design features       Site selected: needs at least one site that represents each important variation to make a convenience sample acceptable; number of cases sufficient to cover diversity; data focus on program operations and on-site observation, are not longitudinal but need enough time to find out what is going on; analysis is closely concurrent with field work but does not require strong chain of evidence or audit trail; reports are usually internal or parts of larger, longer reports

Pitfalls              Temptation to prolong the exploratory phase; site selection only for convenience, inadequate coverage of diversity; prematurity (exploratory findings released as conclusions); over-involvement in evaluator’s own hunches so that initial findings are confirmed rather than tested

Some of our scoping work already may involve exploratory case studies. For example, in GGD, a design study was done as a separate job, culminating in a briefing, prior to an in-depth study of the implementation of the Bail Reform Act of 1984. The methodology included 90 interviews, observations, and data analysis from the population of 94 court districts selected purposively for their characteristics on significant variables. Researchers and experts in the field were also interviewed. An expert panel was used to give feedback at various points to make sure we had a comprehensive picture of the situation. The product of this exploratory case study was a briefing, with the study design choices described, including detailed research questions, outlines of data sources, significant variables, extant data bases, and site selection criteria. From this, a larger study was designed to meet the needs of the requester. Other jobs may involve similar efforts that are not, however, reported as separate jobs and thus are less visible as exploratory case studies.

Also, reports that include some features of exploratory case studies have been issued by GAO. In 1985, for example, NSIAD examined emerging issues in export competition through a case study of the Brazilian market (U.S. General Accounting Office, September 26, 1985). Combining site visits to Brazil, Japan, West Germany, and France, interviews with many officials of appropriate agencies and from the private sector, examination of official government files, and a questionnaire survey of high technology firms active in the Brazilian market, the evaluators amassed a rich array of contextual and focal information and identified four trade practices considered to be key factors in export competitiveness in Brazilian markets. These were bilateral trade accords, countertrade, export financing, and compliance with trade-related industrial policy. Although NSIAD did not need to test these factors for generalizability to other countries through a later study in order to meet the requirements of the job, the product would permit such testing. NSIAD is using the findings in this way, as part of its ongoing work on bilateral initiatives. Of particular methodological note in this report is the detailed explanation of why export competitiveness in Brazilian markets (the instance) was selected for the case study.

The exploratory case study has been used by agencies outside GAO. The Department of Justice, for example, supported an exploratory case study of the career criminal program (Chelimsky and Dahmann, 1980). The career criminal program aimed at “swift and certain” justice by trying to expedite and strengthen processing of individuals who had long criminal histories at the time of apprehension. The exploratory study looked in depth at four of the nine demonstration sites prior to conducting a program effects evaluation. The evaluators identified the key elements of the programs as implemented and what measurable changes were likely to occur and developed measures of the outcomes, as well as designs for testing cause and effect in the subsequent larger study (Chelimsky and Sasfy, 1976).

The greatest pitfall in the exploratory study is prematurity: that is, the findings may seem so convincing that it can be difficult to resist pressures to report on these as if they had the strength of the larger study. Also, care must be taken to scope and sequence the exploratory study so that it yields enough information to be worthwhile and in time for use in the larger study but does not unduly delay answering the questions through the larger study. In addition, it is inappropriate to use the scoping phase as an ad hoc exploratory case study accompanied by an urge to issue the product at the end of scoping, when the necessary procedures for an exploratory case study with regard to such issues as instance selection have not been followed.

Critical Instance

The critical instance is the most frequent application of the case study method in GAO, so much so that it may be seen as a “usual GAO review” rather than recognized as what it can be: a case study (U.S. General Accounting Office, January 22, 1981; April 23, 1982; October 30, 1985). The advantage of recognizing the approach as an application of case study methods is that some aspects of the method that may not be widely used now, such as the close yoking of data collection and analysis, could be applied in a way that increases timeliness without reducing quality. (This technique, discussed in more detail in the section on analysis, can increase efficiency by reducing collection of data and large-scale analyses of these data that subsequently do not prove useful.)

The critical instance case study examines one, or very few, sites for one of two purposes. First, a very frequent application is the examination of a situation of unique interest, such as Three Mile Island, the Challenger disaster, or allegations concerning funding for a specific presidential campaign. There is little or no interest in generalizability. The instance is not “selected” by us; rather, we are called to it.

GAO conducts many critical instance studies. One example, already mentioned, was our review of the representation of foreign interests by former very high-ranking government officials (U.S. General Accounting Office, July 11, 1986). Another is PEMD’s review of the readiness of the Big Eye Bomb for production (U.S. General Accounting Office, May 23, 1986). Yet another is RCED’s review of a construction contract award at Jean Lafitte National Historical Park (U.S. General Accounting Office, September 26, 1987) and their examination in a separate report of the Park Service actions at Delaware Water Gap National Recreation Area in awarding a lease, closing a campground, and raising a house rent (U.S. General Accounting Office, October 28, 1987).

A second, rare, application is where a highly generalized or universal assertion is being called into question, and we are able to test it through examining one instance.

In one such study, GGD examined whether national policies, procedures, and practices with regard to cargo imports were causing problems in port operations (U.S. General Accounting Office, December 1986). The Port of New York offered a critical test because, given the diversity of imports and the volume of work, if problems were occurring, they would be likely to show up clearly in this site. If no problems were observed, problems in other sites were unlikely. GGD used observations, interviews, and document analysis at three sites in the Port of New York and supplemented these with a small number of less intensive observations at other sites. The method, in this instance, was sufficient to permit recommendations that were systemwide and generalizable from the single case.

Table 3.3 summarizes the features of the critical instance case study. As noted, the method is particularly suited for answering cause-and-effect questions about the instance of concern. It provides assurance that we have not prematurely overlooked important factors, that we have not been swayed by information from limited or perhaps biased sources, and that we have taken context into account, thus giving a fair and balanced picture of the situation.




Perhaps the biggest pitfall in this application is insufficient specification of the customer’s question. That is, the job may be presented to us as if only that situation is of concern, but the underlying question may call for a broader look at the issue. A request to investigate the reasons for the bank failures in Ohio, for example, may reflect an interest only in Ohio, but it could be a “tip of the iceberg” question. What the customer may really want to know is whether other states are likely to have similar problems. In such a situation, Ohio might be selected as a site to examine, but we would also need to look at other states or use other approaches to achieve the generalizability needed. This then rules out the critical instance method as appropriate for this job. The importance of probing the underlying questions in a request to achieve good specification of the evaluation question is not unique, of course, to the critical instance case study, but it is crucial in its appropriate application.




Table 3.3: Critical Instance Case Studies

Aspect examined       Characteristic

Evaluation questions  Cause and effect, usually stand alone

Functions             Investigation of specific problem (frequently encountered at GAO), decisive testing of universal assertion; cause-and-effect questions

Design features       Site selects itself in specific problem (for decisive testing, have to assume uniform system with regard to issue, and so convenience sample acceptable); number of cases is usually one instance; comprehensive data for specific problem (for decisive testing, need more modeling, hypotheses, and targeting to know what to study); data analysis and collection concurrent and interactive: data feed new collection, and emphasis on ruling out alternative causes; report describes instances, presents conclusions about cause, gives evidence

Pitfalls              Inappropriate selection of this technique, as real issue may not be specific problem (e.g., Ohio bank failure) but more general questions; premature closure may narrow causal search too early; overgeneralization from evidence

Program Implementation

We frequently are asked whether a program has been implemented and, often, whether implementation is in compliance with congressional intent. The program implementation case study is helpful where enabling legislation offers considerable flexibility. In such cases, a wide variety of expenditures or actions could be consistent with legislation, and compliance with intent may be a matter of understanding the process by which decisions were made, who was involved, and whether the actions are meeting local needs. One example is the 1981 legislation consolidating many small categorical grants into larger block grants, the funds for which could be spent very flexibly.

Another situation where program implementation case studies may be called for is when concern exists about implementation problems. In-depth, longitudinal reports of what has happened over time and why can set a context for interpreting a finding of implementation variability: that is, whether there seem to be basic structural problems or whether the program understandably requires time for installment, adaptations, and building an infrastructure.

In some instances, GAO has been able to follow fairly intensively the implementation of programs or activities. One example is GGD’s series of reports on how the 1980 census was conducted. GAO evaluators, in addition to being “on the scene” due to their location at the major audit site, accompanied enumerators into the field and examined, in depth, Census procedures at field offices. In other instances, we have spent somewhat less elapsed time in the field, with less direct observation, and with greater reliance on interview and documentary evidence. In 1985, for example, RCED was asked how the Department of the Interior was implementing the Office of Management and Budget’s Circular A-76, dealing with privatization of all appropriate services. The request overlapped with another similar request. This request reflected a senator’s special interest in the Glacier National Park in Montana. The evaluators were able to combine the jobs in a review that eventually involved information from 8 of 17 National Park Service regional offices and 19 of 402 field offices. The report aggregates findings across these sites and concludes that agencies have been slow to implement the circular, although progress has been made since 1982 (U.S. General Accounting Office, March 15, 1985).

Another example is GAO’s review of 23 federal agencies’ efforts to implement the Federal Managers’ Financial Integrity Act of 1982. A series of case studies, together with an overview report, was produced. Among these, RCED’s review of the Department of Commerce implementation, to take one report, examined the actions Commerce took that were intended to improve internal controls, such as training senior financial analysts in evaluating applicants and borrowers in the troubled EDA business loan program and overhauling the way in which computer resources were used for the National Weather Service. RCED also examined the results of these efforts and highlighted priority areas for further improvement, such as better information on results for internal management purposes.

Table 3.4 summarizes the design, data collection, analysis, and reporting features of program implementation case studies. Usually, in such studies, generalization is wanted and care is required to negotiate the question with the customer (best situations? worst? typical?) and to match instance selection carefully with the questions. Unless the program is small and homogeneous, the evaluator faces two possibilities. The first possibility is that the number of instances will need to be fairly large in order to achieve the generalizability wanted, and, as a consequence, skill will be needed to manage data collection with sufficient flexibility to obtain the insights case studies offer and sufficient structure to permit cross-site aggregation of findings. The second possibility is that the diversity will be so great that it would be impossible to have enough instances to meet needs for generalizability and still manage the data collection and analysis.

Table 3.4: Program Implementation Case Studies

Aspect examined       Characteristic

Evaluation questions  Descriptive, normative

Functions             Learn what implementation has been achieved, understand unexpected aspects; understand reasons why implementation looks the way it does; useful when enabling legislation has given flexibility

Design features       Site selection cannot be convenience because usually generalization wanted, and purposive sample can be typical and representative of diversity and best and worst cases; number of cases depends on program diversity since generalization usually wanted; data rely on common instruments, published documents, and observation; reports are varied in theme, site, chronology, and narration

Pitfalls              Bias detection methods may be inadequate; may fail to take into account diverse views about program goals and purposes; competence of all on-site observers may not be sufficiently high; can be costly due to study size; the demands of data management, data quality control, validation procedures, and analytic model (within site, cross site, etc.) may lead to cutting too many corners to maintain quality

An important requirement for good program implementation case studies is investment of enough time on site to get longitudinal data and to obtain breadth of information. If the purpose is to report what is happening in a descriptive sense only, short site visits together with administrative records may provide adequate bases for findings. If, however, the evaluation question requires GAO to report on how satisfactory progress is or the reasons for problems in implementation, the more staff who can be on site over time, with the richest or “thickest” base for examining the situation as the many people involved see it, the sounder our causal conclusions and subsequent recommendations will be.

The multiple sites usually required for program implementation questions impose demands on training and supervision needed for quality control. Because of tight resources, lack of travel funds, and the need to use staff with uneven experience and skills, this becomes critical in situations involving many evaluators working in different regions. That is, time is needed to train staff adequately in such case study techniques as the note-taking required for thick descriptions, which is in turn required for the content analysis of themes in the instance. It is possible, for example, for two persons to interview the same informant and find that one has used a one-sentence summary for a detailed, rich, 5-minute discourse while the other captured much more of the complexity and essence of what was said and what was happening. Table 3.5 illustrates such a difference.




Table 3.5: Illustration of Differences in Note-Taking

Situation In an interview with the Director of the National Science Foundation program for grants to small colleges, the following question is asked: “How does your program inform the eligible colleges of the opportunity to apply for grants?”

Technique Characteristic

Rich notes “The Director indicated that procedures had changed three times since the inception of the program. In the first 4 years, announcements were mailed to the individual named as president in the listing, for the same year, of the American Association of Small Colleges. Because applications were very sparse, with about 30% of eligible colleges applying, the procedure was changed to a two-stage mailing, first to the president to find out the name of the official in charge of federal programs and then to the official. This worked well for a 5-year period, in terms of receipt of applications from over 80% of the eligible colleges, but when overall federal funding for research was reduced, the positions of federal program coordinators were abolished and applications fell to about 40% of eligible institutions responding. Two years ago, the decision was made to mail copies to the persons listed as chairs of the relevant science departments in appropriate professional association listings. This has increased the cost of outreach by about $15,000 or about 25% more than the prior system. To date, returns are at the 80% rate again.”

Thin notes “The current system is to mail copies of the announcements to the chairs of relevant science departments, such as chemistry, biology, physics, and computer science.”




Program Effects

Case studies can determine the effects of programs and the reasons for success (or failure). In 1982, for example, RCED examined the progress made since the 1970’s in cleaning up the nation’s air, water, and land, finding that while strides had been made toward meeting the established goals (cleaner air, properly treated wastewater, more drinkable water), deadlines had been extended and unresolved issues made meeting even these deadlines difficult (U.S. General Accounting Office, July 21, 1982). We pointed to lack of flexibility as a source of cascading problems and delays. The bases for these conclusions were in-depth case studies of three sites (Cleveland, Dallas, and New York City) together with information from reports prepared by six federal agencies and by environmental organizations and public interest groups and interviews with Environmental Protection Agency officials. Particularly notable methodologically in this report is the integration of case study findings with other sources of information throughout the first volume.

A PEMD report has focused on water quality: the effectiveness of efforts to improve water quality and the reasons for successes and failures. In-depth, very extensive case studies of several water catchment areas were conducted, and the final report is based on a synthesis of the findings from the case studies, another example of integration of findings across diverse sites (U.S. General Accounting Office, December 17, 1986a, b; September 19, 1986). This series of reports also is useful for illustrating the way in which causality is established in case studies: through development of internally consistent explanations of what led to what and the conscientious use of information from within the site and from contrasting sites to rule out alternative explanations.




For another example, to determine whether actions taken by the states since the mid-1970’s to address medical malpractice insurance reduced insurance costs, the number of claims filed, and the average amount paid per claim, HRD conducted case studies in six selected states (Arkansas, California, Florida, Indiana, New York, and North Carolina). Work included obtaining views of organizations representing physicians, hospitals, insurers, and lawyers on perceived problems, actions taken to deal with them, results of these actions, and the need for federal involvement. Other information came from surveys of nonfederal hospitals about the sources, coverage limits, and costs and claims from leading insurers in each state and, for comparison, the same type of information from a nationwide company. The results are presented separately in six case study reports and aggregated in the overall report (U.S. General Accounting Office, December 31, 1986).

Other federal agencies have used the case study method successfully in answering program effects questions. The National Science Foundation, for example, assessed the effectiveness of a cooperative science program aimed at increasing innovation and knowledge transfer between university and industry researchers. Ten case studies were undertaken of a carefully selected group of projects that ranged from computer language systems through nuclear science to fisheries biology and chemical engineering. Of note is the methodological detail given on project selection, data collection, analysis, and case format. In a companion report, results from a survey of grant recipients are analyzed, giving both a quantitative and a qualitative sense of how the program was working. Results from the two methods were not integrated; both suggested, however, that the program was generally working well (National Science Foundation, 1984).




Table 3.6 summarizes key features of program effects case studies. Like the program implementation case study, the evaluative question often requires generalizability and, for a highly diverse program, it may not be possible to answer the questions adequately and still have a manageable number of sites.

Table 3.6: Program Effects Case Studies

Aspect examined Characteristic

Evaluation questions Cause and effect, can be stand alone or multimethods and can be conducted before, during, or after other methods

Functions Determine impact and give strong inference about reasons for effects

Design features Site selection depends on program diversity, cannot be used with highly diverse programs; best, worst, representative, typical, or cluster bases appropriate; must keep number of cases manageable or risk becoming minisurvey, can use survey before or after to check generalizability or mix survey with concurrent case studies selected for special purposes; data rely on observation and structured materials, often combine qualitative and quantitative data; analysis uses varying degrees of formalization around emergent or predetermined themes; reports are usually thematic and describe site differences and explain these; variation in degree of integration of data across sites and of findings from different methods

Pitfalls Not collecting the right amount of data; not examining the right number of sites; insufficient supply of well-trained evaluators; difficulties in giving evaluators enough data collection latitude to obtain insight without risking bias

There are some methodological solutions to this problem. One solution would be to conduct the case studies first in a set of sites chosen for representativeness and to verify the findings from the case study through targeted examination of administrative data, prior reports, or a survey. A second solution would be to use these other methods first. After identifying the findings of particular interest, case studies would be conducted in sites selected to maximize the ability to get the specific understanding required. Both of these approaches have been used with good effect in program evaluation.

Cumulative

This relatively new and not as yet widely used application of case study methods brings together the findings from case studies done at different times. The applications previously discussed that involved multisite case studies are cross-sectional: that is, information from several sites is collected at the same time. In contrast, the cumulative case study aggregates information from several sites collected at different and even quite extended times.

The cumulative case study can be retrospective, aggregating information across studies done in the past, or prospective, structuring a series of investigations for different times in the future. The techniques for ensuring sufficient comparability and quality and for aggregating the information are what constitute the “cumulative” part of the methodology.

That is, the cumulative case study is similar to an evaluation synthesis, in that it is a method for aggregating the findings of several studies. It differs from an evaluation synthesis in that special techniques are required to aggregate the qualitative information that often is a feature of case studies and to maintain the sense of the “instance as a whole” in its complexity that distinguishes case studies from surveys of several sites. For some jobs, both case study and noncase study reports can be aggregated, each using the appropriate techniques, in order to produce capping reports or similar products.

GAO does not appear to have done a cumulative case study using our own case study reports or other case studies. GAO reports have been used with good results, however, in cumulative case studies published by others outside GAO. One example is a book on bureaucratic failures, which is based entirely on GAO reports of management problems in different agencies over a considerable period of time (Pierce, 1981). The author began with a set of hunches or hypotheses about what can go wrong in agency management, and what would be evidence supporting (or contradicting) these hypotheses. He reviewed the GAO reports in detail, analyzed the data from each one in terms of his framework, and aggregated the results in his final chapter.

Other examples of cumulative case studies come from two international agencies. A retrospective cumulative case study was conducted by the World Bank in its examination of four in-depth case studies of the effectiveness of educational programs. These case studies were intended initially as stand-alone assessments of the programs but were brought together to learn about the effectiveness of the evaluations themselves in the context of educational programs (Searle, 1985). A prospective cumulative case study was commissioned by the U.S. Agency for International Development. The purpose was to identify input and process components of economic assistance that could be quantitatively associated with differences in outcome measures. The method was the specification of a common set of data (both qualitative and quantitative) to be collected over a 5-year period as projects were initiated, together with a means of coding the data across the 47 studies eventually completed. The coded results were analyzed quantitatively in the final report (Finsterbush, 1984).

Table 3.7: Cumulative Case Studies

Aspect examined Characteristic

Evaluation questions Cause and effect

Functions Retrospective cumulation allows generalization without cost and time of conducting numerous new case studies; prospective cumulation also allows generalization without unmanageably large numbers of cases in process at any one time; strengthens inference from new studies by combining with results from older studies

Design features Uses site selection and usually a large number of cases; data as reported (retrospective); usually on-site observation (prospective); backfill techniques; analysis uses case survey method to cumulate findings; possible to examine interactions directly since number of instances is large; reports may resemble evaluation syntheses

Pitfalls Publication bias may severely limit generalization; inadequate or uncertain quality of original data; quality of data-reduction procedures may be very difficult to determine; the effects of changes in many contextual factors over time may be difficult to separate from effects of the programs

Two features of the cumulative case study, shown in table 3.7, are the case survey method just described as a means of aggregating findings (Lucas, 1974; Yin and Heald, 1975; Yin et al., 1976) and backfill techniques (Berger, 1983). The latter are helpful in retrospective cumulation as a means of obtaining information from the authors that permits an otherwise unusable case study to be included in the aggregation. Knowing the basis on which the case instances were selected, for example, is crucial in cumulation; otherwise it is not possible to know whether best case, worst case, typical, or the like instances are being aggregated. Some published case studies do not provide sufficient detail on this. In backfilling, the evaluator might call the author, visit the author to review the original data, or contact others who were knowledgeable about the design decisions in order to get adequate information on instance selection.

Opinion varies as to the credibility of cumulative case studies for answering program implementation and effects questions. One authority notes that publication biases may favor programs that seem to work, which could lead to a misleadingly positive view (Berger, 1983). Other experts are concerned about the quality of the original data and analyses and problems in verifying their quality (Hoaglin et al., 1982; Yin, 1989). For the cumulative use of GAO reports, these concerns are less important, since we already use the “audit trail” procedures recommended in the policy and other manuals for verification of data collection and analysis quality. We do, however, have the opposite concern: that is, we would need to be sure there was not “bad news” selectivity in a particular area, associated with killing jobs that did not identify problems during scoping.

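Table 3.8: Some Design Decisions in Case Study Methods

Each design decision is listed with its entry for the three types of question.

Basis for site selection
Illustrative, exploratory: Typical, representative, cluster
Critical instance: Convenience, unique interest, unique
Implementation, program effects, cumulative: Best-worst case, bracketing, typical, representative, cluster, probability

If multimethod
Illustrative, exploratory: Concurrent
Critical instance: Concurrent
Implementation, program effects, cumulative: Before, concurrent, after

Prestructuring
Illustrative, exploratory: Low, moderate
Critical instance: Low, moderate
Implementation, program effects, cumulative: Moderate, high

Type of data
Illustrative, exploratory: Qualitative only, qualitative-quantitative
Critical instance: Qualitative only, qualitative-quantitative
Implementation, program effects, cumulative: Qualitative only, qualitative-quantitative, quantitative only

Sequence of analysis
Illustrative, exploratory: Within sites, then across
Critical instance: Within sites, then across
Implementation, program effects, cumulative: Within sites, then across; across sites, then within; concurrent

Reporting
Illustrative, exploratory: Narrative, thematic
Critical instance: Narrative, thematic
Implementation, program effects, cumulative: Thematic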




Design Decisions and Case Study Applications

In earlier sections, we discussed seven bases for purposive selection of instances and six applications of the case study method, each of which was associated with a different evaluation purpose or question. Bringing this information together, table 3.8 shows the relations among case study applications and design decisions. For example, if the purpose of the study is illustrative, an appropriate basis for site selection could be typical, representative, or cluster; the case studies would be conducted concurrently with other methods used in the main study; prestructuring or guidance to the evaluators in the field would be low to moderate to permit the thickness and richness of insights needed; data could be qualitative only or both qualitative and quantitative; the case studies probably would be analyzed within sites only; and the reporting would probably be narrative.


Chapter 4

Data Collection and Analysis

We have said that the features distinguishing case studies from other methods are how sites are selected, how the data are collected, and how they are analyzed. In the last chapter, we covered instance selection. We turn now to other elements that distinguish a case study from a not-case study and a good case study from a not-good case study. The discussion is an introduction to the approaches.

Data Collection

In other transfer papers on program evaluation, we have emphasized the importance of validity. Validity involves measurement and also design. A valid measure (that is, one with construct validity) reflects what it claims to reflect and not something else. For example, whether or not there are active opposition parties may be a more valid measure of whether a country is a democracy than how many people vote in an election. A valid cause-and-effect design (that is, one with internal validity) rules out alternative explanations of results by comparing what happened with an intervention to what happened in the absence of the intervention. For example, in a study of the effects of an employment training program, greater employment of participants after the training than before must be shown to be due to the training and not simply to better economic conditions, which also could increase employment.

Measurement Validity

Case study methods can use two tactics for achieving measurement validity: multiple sources of evidence and using the chain-of-evidence technique in data reduction.

Multiple Sources of Evidence

Turning first to multiple data sources: case studies require “thick” description in order to get enough information to check for trends, to rule out competing explanations, and to corroborate findings. Eight techniques are used (sometimes all of them in the same study) to collect information (Neustadt and Fineberg, 1978; Yin, 1989).

1. Collect physical articles.

2. Collect documents such as contracts, memos, and reports.

3. Examine archives such as lists of persons served and computerized order records.

4. Conduct open-ended interviews.

5. Conduct focused interviews.

6. Conduct structured interviews and surveys.

7. Undertake direct observations.

8. Carry out participant observations.

Many of the eight techniques are discussed in the General Policy Manual, chapter 8.0. Of the eight, the approaches that most differentiate case studies from other techniques are direct observation and participant observation.

GAO has used both approaches in its jobs. For example, in NSIAD’s study of conditions on submarines, auditors spent time aboard submarines in a variety of situations, getting firsthand knowledge of life in these vessels. Their direct observations form the primary data source for our report. We went to sea in this instance, however, in our GAO role, as auditors and evaluators and so, it could be argued, might have seen what special guests see and not what life would be like for the average sailor.




To get more authentic information, evaluators have sometimes become participants in situations, not identified to the other persons involved as GAO staff. One example of how we have adapted this participant-observer approach was in GGD’s study of the services available to taxpayers from IRS after IRS reduced the number of public information agents (U.S. General Accounting Office, April 5, 1984). We developed a set of standard income tax questions about which citizens typically would call IRS, obtained IRS agreement on the correct answers to these questions, and then, on a probabilistic sampling basis, called IRS offices around the country to seek help. We used names such as Gerald A. Office in these conversations but did not say we were from GAO. We were able to report how long it took to get the phone answered, how long it took to get information, the consistency of information, and general helpfulness of the responding agent. Such an approach gave more authentic information than relying only on IRS records of calls received, or a survey of taxpayers. In the first instance, IRS would have no record of time before the person could get through to an agent and of “discouraged callers.” In the second, a survey of taxpayers would have to be very large to get a good “hit” rate of individuals who sought assistance, and the diversity of individual questions would have blurred our ability to interpret variation in IRS responsiveness. HRD used a similar approach in reviewing the Social Security Administration’s telephone inquiry program; over 4,000 calls were made, with GAO personnel taking the role of ordinary citizens in asking the randomly selected, prepared questions (U.S. General Accounting Office, August 29, 1986).

One element of data collection that distinguishes case studies from other techniques is that comprehensiveness of interviewing is very important. In order to learn the meaning of events to those involved in them, a key element of case studies, the views of more senior officials are not given greater weight than views of less highly placed persons. In fact, a case study where the only people interviewed were senior officials would be seen as a not-good case study, in contrast to one where the views of individuals at all levels affected were obtained.

For example, if we wanted to learn about how noncompetitive awards were reviewed in an agency, a good case study would obtain information from the agency head, the head of the procurement division, the inspector general’s office, the contracts officer responsible for selected awards, staff involved in the reviews for these awards, counterpart persons from the contractors’ procurement and program operations staff, and the legal divisions within the agency and the contractors. We might shadow several noncompetitive procurements, following their life history from initiation through actual awards, sitting in on meetings, and studying, over time, how the awards were handled.

Chain of Evidence

A chain of evidence is the sequence from observation to conclusions. In a strong chain of evidence, an independent second evaluator could follow the first evaluator from original observations, the “raw” or unreduced data, through all the steps of data aggregation and analysis, and conclude that the first evaluator’s findings were justified by the evidence and fairly represented it. This requires careful organization of the files of original observations, complete documentation of the conditions of data collection that are relevant to the trustworthiness and credibility of the information, and making transparent and reproducible the manner in which the evaluator moved from phase to phase of the analysis. Some evaluators call such a procedure “building an audit trail” and use procedures similar to indexing and referencing to establish both the construct validity of the measures reported and the convincingness of the causal explanations developed in the case study (Halpern, 1983). That is, they have an independent evaluator review the equivalent of their workpapers rather than providing so much detail in the report itself that a reader can come to the same conclusion.

Some information in a case study is likely to be judgmental, particularly when observer and participant-observer modes of data collection are used. The collection process also involves judgment calls about promising leads and the meaning of initial information. While documenting the basis for judgments can be more difficult than documenting nonjudgmental information, overall the chain of evidence or audit trail techniques should not pose any greater difficulty for GAO evaluators than our documentation procedures for other evaluation methods.

Data Analysis

Case studies, obviously, can generate a great deal of data, data that need to be analyzed sufficiently and with appropriate techniques in order to be useful. Much is qualitative. As table 4.1 indicates, there are six general features of data analysis. Four are essential to case study methods: iteration, OTTR, triangulation, and ruling out rival explanations.

A unique feature of case studies is that data collection and analysis are concurrent. In most methods, we plan for data collection, then we collect the information, then we analyze it, and then we write the report. In case studies, the data coming in are analyzed as they become available, and the emerging results are used to shape the next set of observations.


Table 4.1: Ways of Analyzing Case Study Data

Feature Methodology

Iterative Data collection and concurrent analysis

OTTR Observe, think, test, and revise

Triangulation Comparison of multiple, independent sources of evidence before deciding there is a finding

Rival explanations Developing alternative interpretations of findings and testing through search for confirming and disconfirming evidence until one hypothesis is confirmed and others ruled out

Reproducibility of findings Establish through analysis of multiple sites and data over time

Plausible and complete Data analysis ends when a plausible explanation has been developed, considering completely all the evidence

Specific techniques for handling multisite data sets Matrix of categories, graphic data displays, tabulating frequency of different events, developing complex tabulations to check for relationships, and ordering information chronologically for time series analysis

The sequence in which concurrent collection and analysis take place is the OTTR, which stands for “observe, think, test, revise.” After observations have been made in the first phase (and during the observations, because that is a natural way for our minds to work), the evaluators think about the meaning of the information: what does it suggest about what is happening and why? What else could explain what is going on? The second, or “think,” phase ends with specification of what new information would be needed to rule out alternative explanations or confirm interpretations. This triggers the third phase: test. In this phase, the evaluator collects more information, as required by the specifications from the “think” cycle. The data collected in the third phase are not specified before the first phase: they emerge, often with surprises, from the initial observations. The fourth phase is examination of the second round of data collection and a revision of initial interpretations and expectations (the “revise” phase). The revise phase may lead to another test phase, if information from the second round of data collection was insufficient to rule out alternatives, or if, during revision, new interpretations emerged. This iterative process ends when a plausible explanation has been developed and, at the end of a “revise” phase, there are no outlier or unexplained data, no further interpretations possible, or it is clear that despite the most diligent search for information, more is not available to further refine description and explanation.

In case study methods, causality is established through the internal consistency and plausibility of explanation, derived additively through the OTTR sequence. This is in considerable contrast to other evaluation methods, where control and comparison groups are used subtractively to rule out other reasons for a finding and establish firm attribution.

Handling Multisite Data Sets

Several techniques have been developed recently for handling multisite case study data sets. These include setting up a matrix of categories, graphic data displays, tabulating frequencies, developing cross-tabulations, and time series analysis.

Matrix of Categories

In this technique, a coding scheme is developed prior to data collection. It is modified during data collection and the OTTR process and finalized after the evaluation team has read through all the case materials. The categories are related to the evaluation subquestions; for example, if a subquestion was “How does the Immigration and Naturalization Service monitor the conditions of confinement in privately contracted detention facilities?” coding categories might include who is responsible, how these persons get information, what they do with information received, evidence that minimum standards are met, evidence of shortfalls, changes over time in monitoring, and conflicting guidance or responsibilities. These categories might be put into a matrix by facility size or groups served. The approach is similar to content analysis, and the PEMD transfer paper on content analysis gives further how-to information (U.S. General Accounting Office, June 1982).
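As a rough illustration of how coded material might be arranged into such a matrix, the sketch below groups notes by coding category and facility size. The notes, the category labels as spelled here, and the function name `build_matrix` are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical coded excerpts: (facility size, coding category, note).
coded_notes = [
    ("small", "who is responsible", "District officer visits quarterly"),
    ("small", "evidence of shortfalls", "No written inspection record"),
    ("large", "who is responsible", "Contract monitor on site"),
    ("large", "evidence standards met", "Monthly checklist on file"),
    ("large", "evidence of shortfalls", "Medical log incomplete"),
]

def build_matrix(notes):
    """Group coded notes into matrix[category][facility_size] -> list of notes."""
    matrix = defaultdict(lambda: defaultdict(list))
    for size, category, note in notes:
        matrix[category][size].append(note)
    return matrix

matrix = build_matrix(coded_notes)

# Print one row per category with the number of notes in each column.
for category in sorted(matrix):
    counts = {size: len(matrix[category][size]) for size in ("small", "large")}
    print(f"{category:24} {counts}")
```

Empty cells are as informative as full ones here: a category with no notes for one facility size may point to a gap in data collection rather than an absence of the condition.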

Graphic Data Displays

This is a family of techniques, some of which have been adapted for computers and some of which use wall-space. The evaluators immerse themselves in information on a site, following OTTR. Their initial story of what is happening and why is displayed as a flowchart with a series of critical paths for action. Evidence supporting the story is arrayed in the display. The materials then are searched for counterevidence and subsidiary or branching paths are laid out. As a satisfactory graphic is developed for one site, the evaluators turn to the next site. The evaluators could at this point either modify the first graphic, based on information from the second site, or prepare an independent flowchart. In the second approach, aggregation would come after all the sites had been charted, and the charts would be used as the data base for aggregation.

The graphic techniques can be applied to an instance as a whole or to subcomponents. For example, if an analysis of life-threatening or fatal incidents at national parks were needed, the evaluators might develop separate graphics for events leading up to the incidents, the incidents themselves, and postincident actions. More complex case studies might need several “layers” or graphics; less complex, few.

Tabulating Event Frequencies

Another technique for analyzing multisite case data is identifying events within each case study (“meeting between Jones and Smith”; “Smith staff prepares recommendations”) and tabulating their frequency of occurrence. Such a simple tabulation can draw the evaluator’s attention to events that may be significant or to informal networks and give a sense of actual (as contrasted to on-paper) organizational relationships. Divergences between observed and expected patterns can be examined further to see what happens as a result of these meetings and identify potential problem nodes: for example, when an expected high-communication node turns out to be, relatively speaking, a low-communication spot.
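A minimal sketch of such a tabulation follows. The event log, the expected counts, the `threshold` parameter, and the function name `divergences` are all invented for illustration; in practice the expected pattern would come from the on-paper organization.

```python
from collections import Counter

# Hypothetical events identified in the field notes for one site.
events = [
    "meeting: Jones-Smith", "meeting: Jones-Smith", "meeting: Jones-Lee",
    "Smith staff prepares recommendations", "meeting: Jones-Smith",
]

# Hypothetical expected counts, based on the on-paper organization.
expected = {"meeting: Jones-Smith": 1, "meeting: Jones-Lee": 4}

observed = Counter(events)

def divergences(observed, expected, threshold=2):
    """Return {event: (observed, expected)} where the counts differ by >= threshold."""
    keys = set(observed) | set(expected)
    return {
        k: (observed.get(k, 0), expected.get(k, 0))
        for k in sorted(keys)
        if abs(observed.get(k, 0) - expected.get(k, 0)) >= threshold
    }

flagged = divergences(observed, expected)
print(flagged)
```

In this toy data the Jones-Lee link, expected to be busy, shows few meetings, which is the kind of low-communication spot the tabulation is meant to surface for follow-up.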

Complex Tabulations

Cross-tabulations of events can identify interactions and check the developing story more formally. For example, service coordination is a popular remedy for limited funds. An evaluator in the field may observe that coordination among local agencies funded through the same federal agency is more frequent than coordination among local agencies funded by different federal departments. Tabulations of actual meetings and of consequent actions for same-agency funded and different-agency funded services can help check out whether this impression is reliable.
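The coordination example could be checked with a small cross-tabulation along the following lines. The records and the helper names `crosstab` and `row_share` are hypothetical; the sketch only shows the mechanics of counting meetings by funding relationship and outcome.

```python
from collections import Counter

# Hypothetical meeting records: (funding relationship, whether action followed).
records = [
    ("same agency", "action"), ("same agency", "action"), ("same agency", "no action"),
    ("different agency", "action"), ("different agency", "no action"),
    ("different agency", "no action"), ("different agency", "no action"),
]

def crosstab(rows):
    """Count records in each (row label, column label) cell."""
    return Counter(rows)

def row_share(table, row, col):
    """Share of a row's records that fall in the given column."""
    total = sum(n for (r, _), n in table.items() if r == row)
    return table[(row, col)] / total if total else 0.0

table = crosstab(records)
same = row_share(table, "same agency", "action")
diff = row_share(table, "different agency", "action")
print(f"same agency: {same:.2f}, different agency: {diff:.2f}")
```

With so few records the shares are only suggestive; the point of the tabulation is to test a field impression against all the recorded meetings rather than the memorable ones.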

Time Series Analysis

Organization of information within each site by time of occurrence, coupled with a systematic analysis of contextual influences on events, permits a nonquantitative time series analysis for case study data. The flow of events over time for each significant actor and for significant points in the series of events forms the organizing framework for data analysis within each site. Such comparisons of when key actions occurred, how well (or poorly) they were carried out, and what influenced both timing and quality of performance can be particularly helpful in case studies of program implementation.

In some instances, only one component of a case study may be analyzed in this way. For example, a case study of the effectiveness of a job training program might need to take into account general economic trends, such as unemployment rates in the community. A time series comparing local unemployment rates with placement rates for job training program participants could be computed quantitatively and changes interpreted through the more qualitative time series data about the program.
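One simple way to make the quantitative comparison is a correlation between the two series, sketched below. The quarterly figures are invented, and the use of a Pearson correlation is an assumption made for the example; other comparisons (for instance, plotting the two series together) would serve equally well.

```python
from math import sqrt

# Hypothetical quarterly figures for one community (percent).
unemployment = [9.0, 8.5, 8.0, 7.0, 6.5, 6.0]
placement = [40.0, 42.0, 45.0, 50.0, 52.0, 55.0]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(unemployment, placement)
print(f"correlation between unemployment and placement: {r:.2f}")
```

A strongly negative value, as in this toy data, would suggest that rising placements track the improving local economy, so the qualitative record would be needed to judge how much, if any, of the gain is due to the program itself.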

Basic Models for Data Analysis

Two basic models of data analysis are pattern matching and explanation building. Pattern matching requires using past experience, logic, or theory before the job begins to specify what we expect to find. The analysis then compares actual findings to expectations. When the findings fit, the pattern is confirmed. When the findings don’t fit, the evaluator adjusts the expectations or elaborates them, building a subroutine that can explain the unexpected findings. Explanation building is the inverse procedure: starting with the observations, the evaluator develops a picture of what is happening and why. Data are used to fill in the initial hunches, to change them, to elaborate on them. The first strategy matches findings to hypotheses or assumptions. The second uses the data to structure the hypotheses or assumptions.
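The comparison step in pattern matching can be pictured with a short sketch. The themes, the expected and observed values, and the function name `match_pattern` are hypothetical; real expectations would be far richer than single labels, so this shows only the bookkeeping of what fits and what needs further explanation.

```python
def match_pattern(expected, observed):
    """Compare expectations set before fieldwork with what was actually found.

    Returns the set of confirmed themes and a dict of themes needing
    explanation, mapped to (expected value, observed value).
    """
    confirmed = {k for k, v in expected.items() if observed.get(k) == v}
    unexplained = {
        k: (expected.get(k), observed.get(k))
        for k in sorted(set(expected) | set(observed))
        if expected.get(k) != observed.get(k)
    }
    return confirmed, unexplained

# Hypothetical expectations and findings for an employee training job.
expected = {"needs assessed": "annually", "course quality monitored": "yes"}
observed = {"needs assessed": "annually", "course quality monitored": "no",
            "selection by seniority": "yes"}

confirmed, unexplained = match_pattern(expected, observed)
print(confirmed)
print(unexplained)
```

Themes that appear only in the observations, such as the third one here, are the starting points for elaborating the expectations.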


In either strategy, the evaluator needs to search the full data base thoroughly for disconfirming evidence, in order to avoid the pitfall of premature conclusions. Data analysis ends when the best fit possible has been reached between the observations and a statement about what they mean.

In either strategy, expectations and explanations can be expressed as themes: a job dealing with bank failures, for example, might have as themes decisions about credit risks, procedures for reviewing decisions, or controls over the accuracy and recency of information on bank solvency. A job dealing with employee training might have as themes decisions about training needs, how employees are selected for training, how course quality is monitored, or how employees and supervisors view the purpose of training.

Themes, in turn, can be analyzed within individual sites first, then findings on each theme aggregated across sites. Alternatively, all themes within one site can be analyzed first; then data from the second (and subsequent) sites can be examined. Theme analysis also can proceed in matrix fashion. On the PEMD AFDC study, for example, evaluators were assigned as site managers, responsible for understanding across themes all there was to know about the issues for their site. They also were assigned to individual themes, such as health and employment, responsible concurrently for looking across all sites for information on their topic. This organization proved helpful in ensuring that reasons why a site showed up as an outlier for a given theme could be discussed by someone who knew the site as a whole.
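The matrix organization can be pictured as a site-by-theme table that is read by row (the site manager's view) or by column (the theme lead's view). The Python sketch below is only an illustration: the site names, coded notes, and the simple "differs from the most common note" rule for spotting an outlier are all assumptions, not the procedure used on the AFDC study.

```python
from collections import Counter

# Hypothetical coded findings: findings[site][theme] = short analyst note.
findings = {
    "Site A": {"health": "coverage lost", "employment": "hours reduced"},
    "Site B": {"health": "coverage kept", "employment": "hours reduced"},
    "Site C": {"health": "coverage lost", "employment": "hours reduced"},
}


def by_site(matrix, site):
    """Site manager's view: every theme for one site."""
    return dict(matrix[site])


def by_theme(matrix, theme):
    """Theme lead's view: one theme across all sites."""
    return {site: notes[theme] for site, notes in matrix.items() if theme in notes}


def outliers(matrix, theme):
    """Sites whose note on a theme differs from the most common note."""
    column = by_theme(matrix, theme)
    most_common_note, _ = Counter(column.values()).most_common(1)[0]
    return [site for site, note in column.items() if note != most_common_note]


print(outliers(findings, "health"))
print(by_site(findings, "Site B"))
```

Once a theme lead identifies an outlier site, the row view for that site is what the site manager would bring to the discussion.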

Pitfalls and Booby Traps

Case study methods, like any other method, offer plenty of opportunity to go awry. Two frequent concerns are the risks in using other people’s studies and in generalizability.

Impartiality

The biggest risk when we use other people’s case studies is that GAO standards of impartiality may not have been met. There are three meanings of impartiality, one of which does not create problems. Case studies use as data the impressions and judgments of the evaluator, which are inherently subjective. For a case study methodologist and for GAO, if proper care is taken, this should not be a problem. If we want to illustrate, for example, working conditions for immigrant laborers, we can report what the thermometers registered and we can also report, firsthand, how people were sweating and what it felt like to be out in the fields. Such observation is part of the richness, immediacy, and “thick” description of a case study. However, case studies, like any other method GAO uses, have to meet two other criteria of impartiality: accuracy and lack of bias, in the sense that the evaluator’s personal, preconceived opinions about a situation do not distort reporting and that the evaluator is scrupulously evenhanded in examining all sides of a situation.

Some authorities on evaluation methods believe that case studies reflect the author’s values in ways that can be difficult to detect. Other experts conclude that three actions, taken together, are sufficient safeguards for lack of bias and adequate accuracy. These are (1) submitting reports to people from whom data were collected and printing their critiques with the report, (2) use of multiple data collection methods within case studies, and (3) adoption of the audit trail or chain-of-evidence technique. Adequate supervisory controls also are recommended. Complying with these safeguards should give us no major problems in our own jobs. The guidance would mainly expand the range of reviewers. We already conduct exit conferences and, following the “Yellow Book” and Communications Manual, submit draft reports for agency comments. We often use multiple methods, and the audit trail technique now recommended for case study use was itself adopted from such auditing procedures as workpapers and referencing, which are standard practice with GAO. We also require adequate supervisory control through such means as prompt review of workpapers. We would need to assure ourselves, however, that case studies whose results we are going to use have adopted the same procedures for ensuring impartiality. (Appendix III gives a checklist for reviewing proposed or completed case studies for quality.)

Generalizability

We often are asked questions where the customer wants in-depth information that is nationally generalizable, but frequently the issue may not yet be ripe for a national study or we do not have the resources to collect in-depth data from nationally representative samples. Using 4, 10, or 15 sites as case studies might be feasible, but we would still need to be concerned about the risks in generalizability. A main point of this paper is that generalizability depends less on the number of sites and more on the right match between the purpose of the study and how the instances were selected, taking into account the diversity of the programs.

An example of an efficient combination of careful specification of the purpose of the study matched with appropriate site selection is the GGD study of the productivity of the Social Security Administration’s (SSA’s) regional operations. This review examined in depth only one SSA region (U.S. General Accounting Office, September 11, 1985). Atlanta was selected because it had the best productivity among the 10 regions; if GAO could demonstrate opportunities for improvement in the most productive SSA region, then similar improvements might be possible in the less productive regions. Following the case study, an inexpensive (25 staff day) check was made on productivity data and trends from other SSA regions, and similarities were noted. While other problems might be affecting these less productive regions, the findings from the single site plus the trends were so convincing that SSA concluded the single instance examination had national implications. Subsequent analysis of regional office plans for productivity improvement led to the conclusion that their implementation could save about $60 million annually.

Sometimes, however, it is not possible to answer the evaluation question using case studies, if the program is diverse and the user needs national generalizability. The user may prefer to sacrifice in-depth information for generalizability and we will have to use other methods, such as surveys or secondary analysis of existing data. However, it often is possible, with appropriate instance selection, to obtain adequate generalizability with a manageable number of instances. In addition, the evaluator can apply the case survey method to increase the generalizability of findings and can combine case studies with other methods. Taken together, these strategies can permit the use of the case study technique with enough generalizability for many users’ purposes. That is, for the first three types of case studies (illustrative, exploratory, and critical instances) generalizability, if needed, cannot be achieved unless they are combined with other methods. Generalizability can be achieved for the three other types, however, even when they are used independently, as long as they are carefully designed in terms of case selection and analytic strategies.

Where to Go for More Information

More detail on data collection and analysis can be found in two books on case study methods: Case Study Research by Yin (1989) and Analyzing Qualitative Data by Miles and Huberman (1984). More detail on applicable GAO guidance can be found in the Communications Manual.


Chapter 5

Summary

We can summarize this paper in the answers to three questions: What are case studies? When are they appropriately used in evaluation? What distinguishes a good case study from a not-good case study?

What Are Case Studies?

The case study is a method of learning about a complex instance, based on a comprehensive understanding of that instance obtained by extensive description and analysis of the instance taken as a whole and in its context. Applying this definition means learning virtually everything about the instance being studied, including how it operates and what it does, in relation to the extrinsic or contextual events it is part of.

Case studies often use one or only a few instances, because collecting and analyzing comprehensive data are prohibitively difficult for large numbers of sites. However, not all studies of a small number of instances are case studies. Some studies collect data from a small number of sites but have no other features in common with case studies and offer none of their advantages. Thus, the fact that a study involves only one or a few sites does not automatically make it a case study. For example, the evaluators may not have selected the sites appropriately for the generalizability needed or they may have collected minimal information with little depth of inquiry.

When Are Case Studies Appropriately Used in Evaluation?

We discussed six types of case study that differ considerably in their requirements for site selection, data collection, and analysis, among other things. The six types are illustrative, critical instance, exploratory, program implementation, program effects, and cumulative. Together, they cover a wide range of evaluation questions, although clearly not all evaluation questions. For example, case studies are not well suited for answering the question, How often does something happen?

Some applications of the case study to evaluation purposes have been tried fairly extensively (for example, program implementation case studies). Others are relatively untried (for example, cumulative case studies). The latter is a particularly promising method for GAO, because it can capitalize on the large number of case-study-like reports that are available, on the quality of the documentation that supports their findings, and on the general methodological framework that GAO’s standards provide. We have not stressed, in our analysis, the costs, feasibility, and timeliness of case studies, since these are management criteria that are considered in all designs rather than issues of particular concern to case studies. However, the implications of the design features discussed here are that, contrary to what many people think, the case study is not necessarily inexpensive, easy to conduct, or quick. It may require in-depth data collection dependent on sensitivity to the setting that takes time to acquire and involve extended periods for data analysis, interpretation, and reporting.

What Distinguishes a Good From a Not-Good Case Study?

We have addressed quality in two ways. One is prospective and intended to help those who plan evaluation to know the minimum features of the various case study applications. The other is retrospective and intended to help those who review case study reports to assess the quality of completed case studies. Table 5.1 summarizes common pitfalls that we have mentioned throughout this paper.



Table 5.1: Some Common Pitfalls in Case Study Evaluation

Study stage       Common pitfall

Design            Mismatch between criteria for the specific job and what the
                  case study application can do; insufficient attention to
                  contrasts and comparisons needed for purposes of the study

Site selection    More sites selected than needed; fewer sites selected than
                  needed; inappropriate basis for site selection, for the
                  particular job and evaluation question

Data collection   Reliability jeopardized by lack of common guidance in data
                  collection; findings noncomparable; lack of quality control
                  in data collector roles and responsibilities: impartiality
                  threatened; overly loose relationship between data collected
                  and the evaluation question; inadequacy of information

Data analysis     Insufficient attention to requirements of analytic plan
                  chosen: low plausibility of results; insufficient attention
                  to management and data reduction: inefficiency, lateness,
                  incomplete use of data; inadequate methods of relating
                  findings across sites; inadequate methods for relating
                  qualitative and quantitative data within sites

Reporting         Overgeneralization, compared to actual basis for site
                  selection, number of sites studied, and requirements for
                  inference in the design; inadequate interpretation,
                  unintegrated narrative, results not adequately related to
                  user questions; inadequate attention to threats to
                  impartiality and the extent to which these have been avoided

Quality and Evaluation Design: Planning

We have presented six types of case study evaluations and for each one described features such as number of sites, site selection, data collection, data analysis, and reporting. Our descriptions represent a “floor” of quality for each evaluation application. The features of the six types of case study are not interchangeable. That is, the features of a case study that are appropriate for answering one kind of evaluation question are not necessarily appropriate for answering another kind of evaluation question. Evaluators considering the case study as a design for evaluation must first decide what type of evaluation question their specific question is and then examine the strengths and limitations of each type of case study for answering it. The crucial next step is to look at the features of each type and decide whether it will be possible to meet these methodological requirements in the specific situation.

For example, the basis on which instances can be selected differs for the different case study applications. Usually, an illustrative case study site should be typical of the program being examined while exploratory case study sites should bracket the diversity that is likely to be encountered in the program, population, and setting of a larger study. Usually, sites for program effects case studies should be selected with great care for criteria such as whether there is evidence that the program has been implemented at the site, whether the site has been subjected to changes that could have the same effects as the program or that could mask its effects, and how the addition of this site to the group of sites being studied supports the generalizability of the findings.

Quality of Evaluation Design: Reviewing

Turning to ways of assessing the quality of completed case studies, we have provided guidelines for reviewing case study reports in appendix III. These guidelines are intended to apply to all types of case study applications.

On matters of design, the guidelines discuss the clarity of issues, the relationship of the evaluation question to the case study application selected, the basis for case study selection, and the time span of the study. The data-collection guidelines emphasize appropriateness of data-collection methods, evaluator training, and information sources. Guidelines for data-base formation and analysis deal with explicitness of procedures and techniques, interpretation differences, and the relationship of the findings to those of similar studies. With regard to reporting, the guidelines emphasize constraints on the study, arguments for and against various resolutions of the issues, and the role of judgment in reaching conclusions. With regard to impartiality and generalizability, the guidelines emphasize that a good case study report (or, for GAO purposes, job documentation) describes both the evaluators’ training and work on related studies, presents comments on the draft report, and supplies adequate information for judging generalizability. Reviewers will need to refer in addition to the features of each special type of case study application for supplementary guidance on what to look for in individual case studies.

Impartiality and Generalizability

Partiality and, in some instances, the inability to generalize from the findings can limit the utility of case study methods for evaluation. There are three main threats to impartiality: subjectivity, inaccuracy, and bias. The case study method inherently requires subjective and judgmental elements. When proper procedural safeguards are used, these elements alone do not diminish the value of case study methods. However, inaccuracy and bias are unacceptable in any case study. Some ways of detecting and preventing bias, such as the audit trail, have been well developed. Their applicability to case study evaluations outside of settings such as GAO is being explored.

Many evaluation questions do not require a high degree of generalizability. Certain case study applications provide high degrees of generalizability with small numbers of instances. When both broad generalizations and in-depth understanding are required, designs that cumulate case studies over a wide number of sites and settings, or that combine case study methods and other methods in one concurrent effort, may meet this dual need (U.S. General Accounting Office, April 2, 1984). However, the diversity of the population to which generalization is required is a limiting factor in case study applications. It is also true that without careful attention to standards, case studies are prone to the kind of overgeneralization that comes from selecting a few instances, assuming without evidence that they are typical or representative of the population, and then offering national projections. The GAO Project Manual cautions against overgeneralization from any method. For case studies in particular, there must be an empirical basis for instance selection and assurance of adequate population homogeneity.


Appendix I

Theory and History

As a research method, the case study originates in the social sciences, particularly in the fieldwork of anthropology and sociology. Within these disciplines, researchers have defined the case study and discussed its critical elements in a variety of ways. In general, six elements appear frequently: purpose, type of data collected, method of data collection, design, method of data analysis, and reporting.

Purpose

The research case study has been defined as a method for learning the “right” questions to ask (Hoaglin et al., 1982). That is, the purpose of case studies is said by some researchers to be to generate hypotheses rather than to test or confirm them. The method involves an in-depth, longitudinal examination of a single instance. The product is a sharpened understanding of what might be important to look at further in similar situations and what explains why the instance happened as it did. Because such inquiry explores only one situation, it is argued that it cannot contribute directly to the testing of general propositions, although it can contribute powerfully to the invention of hypotheses.

Some other methods have a similar purpose. Exploratory analyses of large data bases are often used to select a smaller number of variables for additional testing, on the basis of interesting patterns that emerged from various combinations of the elements of the large data base. Promising relationships are singled out and those that seem uninteresting are set aside. Like findings from case studies, the result is considered as contributing not answers but a better understanding of what questions to ask and how to ask them.

An analogy might be drawn also to “evaluability assessment.” Such assessment may provide information valuable in itself about how completely a program has been implemented. It is undertaken prior to testing the effects of the program, chiefly, however, as an indicator of appropriate evaluation design.

Other researchers regard case studies as not only a different way of knowing but as a better way (Rist, 1981a; 1982a). More specifically, they emphasize the method’s unique value in many complex situations of importance, such as studies of school desegregation, or economically distressed communities, or the Challenger tragedy. One reason they give is that skilled observers and interviewers can make judgments and valuations about factors that are otherwise very difficult to assess, such as how much effort a manager made to get information before a key decision was made or how much that person knew about what was going on. Also, these researchers believe, in complex situations the many persons who are significantly involved have different “realities” in their explanation of events and even in their perceptions of what happened, and this is best matched with a method that gradually represents and reconstructs these multiple realities, rather than a method that assumes a single “truth” exists.

As seen by such scholars,

“there is no single reality on which inquiry may converge, but rather there are multiple realities that are socially constructed, and that, when known more fully, tend to produce diverging reality. These multiple and constructed realities can not be studied in pieces (as variables, for example), but only holistically, since the pieces are interrelated in such a way as to influence all other pieces. Moreover, the pieces themselves are sharply influenced by the nature of the immediate context.”

Methodologists who focus on case studies express their criteria of good research in different language, although they may deal with underlying concerns similar to those of researchers from more experiment-oriented traditions. Some criteria, however, are seen as unique to case studies and qualitative approaches. One example is fairness or assurance that a study has presented a balanced view of the many constructions of reality and the values underlying these. Another example is authenticity; another, realism. Each of these criteria is associated, in the literature on case study methods, with performance standards such as triangulation believed useful in ensuring, if they are carried out, that the study will be a good one.

Table I.1: Criteria of Good Research

Case study                                   Other approaches

Truth value; trustworthiness; credibility    Internal validity

Applicability; transferability               External validity; generalizability

Confirmability of data                       Objectivity of observer

Consistency, dependability of data;          Replicability; stability, reliability of data
explainable instabilities

Type of Data

In some textbooks on evaluation, case studies are synonymous with qualitative data, that is, data that are subjective or judgmental. Such data include narratives of events written by participant observers, accounts of what the participants understood about an event, reports of what was said at a meeting or an interview, observational records of how an event took place, and statements of impressions about what was going on, why it was happening, and how people felt about it, themselves, or each other.


To illustrate differences among types of information, we might base the conclusion that “the day was hot” on data from an instrument that records the room temperature (numerical and objective), a record of the atmospheric temperature as written down by an observer checking a thermometer (numerical and relatively nonsubjective), a survey asking people how hot they felt (nonnumerical and subjective), and a “thick” description of what clothes people were wearing, how much they perspired or shivered, whether they turned up the furnace or the air conditioner, and how much energy they seemed to have for work (nonnumerical and judgmental). When researchers describe case studies as using qualitative data, they usually mean the “thick” description. If the evaluation question involved an understanding of working conditions for migrant laborers or workers in heavy industry, a thick description, even including information on how exhausted the evaluator felt in the heat, would be more appropriate, according to some case study methodologists, than only recording that the thermometer registered 95 degrees.

Suppose we needed to know about the availability of housing for low-income people. If official records were adequate, good quantitative measures of availability might be the number of low-income persons applying for housing relative to the number of units that met minimum standards and cost within 30 percent of household income or the number of persons on waiting lists for such housing and how long they had to wait. We might also be able to report the number of applications for housing construction permits and how many units suitable for low-income housing were coming on the market within 12 months. Often, however, the records are not adequate. Here we might rely on qualitative information, such as the estimates of knowledgeable officials of demand and supply (judgmental, numerical) or of severity of the problem (judgmental, nonnumerical). We might also interview selected low-income families with regard to their experience in seeking housing or we might, as participant-observers, pose as low-income applicants and report our own experiences in finding housing for families of different sizes and within different payment ranges (judgmental, numerical, and nonnumerical).
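Where official records are adequate, the first quantitative measure mentioned above (applicants relative to units that meet minimum standards and cost within 30 percent of household income) might be computed along the following lines. This is a minimal sketch: the records, field names, and the use of a single reference income are assumptions made for illustration, and a real job would have to decide how to handle households with different incomes.

```python
# Hypothetical records; field names are assumptions for illustration only.
units = [
    {"monthly_rent": 300, "meets_standards": True},
    {"monthly_rent": 450, "meets_standards": True},
    {"monthly_rent": 280, "meets_standards": False},
    {"monthly_rent": 600, "meets_standards": True},
]
applicant_count = 3  # hypothetical number of low-income applicants

AFFORDABLE_PERCENT = 30  # rent within 30 percent of household income


def qualifying_units(units, monthly_income):
    """Units that meet minimum standards and cost within the income share."""
    return [
        unit
        for unit in units
        if unit["meets_standards"]
        and unit["monthly_rent"] * 100 <= AFFORDABLE_PERCENT * monthly_income
    ]


def applicants_per_unit(units, applicant_count, monthly_income):
    """Applicants relative to qualifying units; None if no unit qualifies."""
    available = len(qualifying_units(units, monthly_income))
    if available == 0:
        return None
    return applicant_count / available


print(applicants_per_unit(units, applicant_count, monthly_income=1000))
```

The comparison uses integer arithmetic (rent times 100 against 30 times income) so that a rent exactly at the 30 percent line is counted as affordable.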

Many researchers who write case studies use qualitative data because they believe them to be richer, more insightful, and more flexible than quantitative data. They believe that the meaning of an event is more likely to be caught in the qualitative net than on the quantitative hook. For example, qualitative data permit dealing fairly directly with values, politics, and factors that may be an important part of many situations. A frequency distribution of events (such as a table showing the number of decision points in a community economic development program and a decrease in the probability of action as the number of decision points increases) is about as numerical as qualitative data are likely to be in a research case study, according to some experts.

Method of Data Collection

To some researchers, case studies are synonymous with methods of data collection deriving from anthropology, psychology, and sociology. The techniques include fieldwork, ethnography, observation, and participant observation and have in common that an observer is physically present at a site, stays at the site for a fairly long time, has flexibility in deciding what data to collect from whom and under what circumstances, and can organize the inquiry according to the meaning of events to the participants rather than having to decide beforehand on a closed set of constructs or data elements. In most instances, the observer is the senior investigator and the only researcher: Margaret Mead in Samoa and Oscar Lewis in Puerto Rico are famous examples.

The greatest difference, to some experts, between other methods and case studies is the distinction between the researcher’s (1) beginning by presuming, a priori, to know the relevant constructs and variables, measuring their incidence, and finding out how changes in them may be influenced by other events and (2) entering into an event to learn what is significant in it to the participants. As this implies, to researchers, the case study is an intensely personal method, dependent on the investigator’s sensitivity, insights, and skill in noticing many things, recording them, and producing a narrative that suggests a pattern of the elements, or that recognizes the pattern that is there in the culture in its own terms. It is a demanding method, requiring specific skills (such as fluency in the language of the participants) and general self-awareness to maintain the fine balance between seeing things as others see them and identifying their perspective wholly with one’s own. The researcher must weigh the value of experiencing what it is like to be part of the culture against the hazard of internalizing the experience too fully, which can jeopardize the capacity to see the culture from many perspectives. Nonetheless, some of the best reports have come from observers who entered as fully as possible as participants in the event being investigated.

The case study method is further distinguished by the researcher’s self-conscious effort to understand what the observed events mean to the participants. No observer can enter a scene without preconceived ideas, but they can be set aside. Thus, a study of how a group is organized economically might begin with finding out what is valued in that group and how items of value are exchanged. They might not be goods or services, and exchanges might not be equal.

For example, in a basic research study of 40 low-income women, Belle and her colleagues lived for many months among them as observers, confidantes, and friends, listening to what they said and noting what they did. The researchers found that turning to someone for even modest help (like minding a child for an hour) had the cost of later demands for a return of the favor and that this cost was nearly intolerable. The researchers found expected stresses like the loss of a check in the mail and the illness of loved ones. They found also, unexpectedly, that any change at all was stressful: being promoted to a higher-paying job, the graduation of a child from school, falling in love, even the restoration to health of a loved one who had been ill. In terms of the purposes of the study (finding out what was stressful to the women and why the incidence of mental health problems among them was so high), the case study method disclosed the importance of any change in life circumstances as a source of stress rather than merely confirming change that the observers might have thought stressful a priori.

Design

Case study methods have been defined by some researchers as designs that focus on a single instance or a few instances. They also are identified with designs that are nonexperimental in the sense that the investigator is not deliberately manipulating some variable to see its possible effects on the system being studied. Two classic aims of inquiry are to understand the nature of events and to understand their causes. Since case study designs center on one or a few cases and lack the controls usually thought necessary to an understanding of causal relationships, knowledge that results from case studies is controversial with regard to generalizability and causality.

With regard to generalizability, some methodologists see case studies as above all particular, seeking to describe and understand the aspects of an instance without much concern for knowing whether they arise in or are characteristic of a larger population. The focus is on this school, this emergency room, this military base, or this nuclear power plant. Researchers can choose relatively freely which instance to study on any one of several bases, depending on the questions to be examined.

Thus, in a case study design, an instance may be selected because it is a unique event of national interest, such as the Cuban missile crisis or the distribution of the swine flu vaccine, both subjects of research case studies. The instance may be selected because it has been affected by events of interest, such as the desegregation of schools. It may be selected as an unusually effective or unusually ineffective instance. However, whenever the purpose is an understanding of the particular, the relationship of the instance to the various populations that it is part of is less important than the assurance that the selected instance can be fully examined.

With regard to causality, researchers using case study methods cannot rely on familiar ways of ruling out alternative explanations. Case studies do not compare individuals or groups to others randomly assigned to different treatments. Case studies do not use statistical adjustments to facilitate comparison. Case studies do not estimate statistically the influence of the many variables on the instance being examined. To understand a single case, the researcher must develop hunches about what is happening in the instance under study and systematically seek within it evidence consistent or inconsistent with the hunches. As evidence accumulates, a second tier of evidence is looked for that would be consistent or inconsistent with alternative explanations for why the hunches did or did not take the shape of a coherent pattern. That is, a very high standard of inferential logic is needed.

When this method produces a coherent, plausible story, the researcher can assert a relationship between cause and effect. When conflicting evidence cannot be resolved, the careful investigator indicates that causality cannot be established. The standard for making this judgment requires the diligence of the investigator in formulating alternative explanations of what is happening, in specifying the kind of evidence that would be supportive or nonsupportive, in searching intensively for evidence that would rule out initial hypotheses, and in thoroughly considering the reasons for inconsistent patterns of evidence in the second tier. These techniques, of course, have parallels in other research traditions.

The ability of the case study to capitalize on insight, to shift focus as the data demand, and to let disparate pieces of evidence fall into place in ways that are not always easy to describe or command is believed to yield a richer, fuller, and truer explanation of why things look the way they do than the more limited number of tests of a priori hypotheses that other methods use. In case studies, the criterion for deciding whether causality has been established is the coherence of the evidence, its consistency with the patterns ascribed to it, and its inconsistency with other explanations. In research designs based on statistical inference, the criterion for establishing causality is whether the findings are likely to have occurred by chance following appropriate comparisons to eliminate alternative interpretations. In both instances, comparisons must be appropriate if alternative explanations are to be ruled out.

Method of Data Analysis

Still another distinguishing feature of case studies, according to some researchers, is a nonstatistical approach to data analysis. The elements of analysis are familiar: the identification of regularities, patterns, and relationships and the assessment of their importance or meaning. In quantitative methods, the regularities are identified by manipulating numbers to produce indicators agreed on as sensible descriptions of the patterns. For example, an average is a convention that creates a single number to represent the collection of all the numbers in a set. Importance or meaning is assessed in part by estimating the variability within the set of numbers to obtain a probability that the regularity represents the characteristics of the population of instances.
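As a small illustration of the quantitative conventions just described, the following Python sketch computes an average and one common estimate of variability, the standard error of the mean. The eight values are hypothetical and were chosen only to keep the arithmetic easy to follow; they do not come from any GAO study, and the standard error is offered as one example of a variability estimate, not as the only one an analyst might use.

```python
import math
import statistics

# Hypothetical observations (illustrative only, not data from this paper).
values = [2, 3, 3, 4, 5, 3, 2, 6]

# The average: a single number standing for the whole set.
mean = statistics.mean(values)

# Variability within the set: the sample standard deviation.
std_dev = statistics.stdev(values)

# Standard error of the mean: how much the sample average would be
# expected to vary from one sample of this size to another.
std_error = std_dev / math.sqrt(len(values))

print(f"mean = {mean}")
print(f"standard deviation = {std_dev:.3f}")
print(f"standard error = {std_error:.3f}")
```

A small standard error relative to the mean suggests that the average is a stable description of the set; a large one suggests that the single number hides a good deal of variation.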

The logic of analysis in case studies is the same; the techniques for identifying regularities and assessing their meaning are different. Consider first the regularities. The case study analyst is trying to build an accurate description and explanation of events as both the observer and participants frame them. There would be little point in trying to identify a single number as an accurate representation of something this complex.

The analyst searches for clusters or paths in the data, using verbal notes and graphic aids, reviewing field data and other records of observations, until a pattern is evident. Then the data base is searched for further evidence that confirms or conflicts with the pattern. When the evidence is more inconsistent than consistent, the pattern is rejected. When the evidence is more consistent than not, the analyst confirms the pattern and looks for others related to it. When all the evidence is consistent, no further examination is needed. An array of techniques such as graphic displays has been developed to help standardize case study analysis.

A key element of case study analysis is the selection and organization of material to account for the complexities and interactions of the events. The rules are judgmental, not probabilistic. Have all the pieces of pertinent information been considered? Has adequate attention been given to the outliers? Does the pattern seem plausible or forced? Have inconsistencies been sensibly resolved?

Using an analogy, we can say that the case study analyst seeks to explain 100 percent of the variance by relying on a data base that includes more variables than most quantitative studies can accommodate, over more points in time, and on a method that draws on the integrative powers of the mind, which computers do not have. The method inherited from sociology and anthropology entails early immersion in the setting, recorded observations, reflections on the spot, and analysis that occur throughout fieldwork, as data are being collected. Analysis is infused throughout the research process in case studies; it is not a step after the completion of data collection.

Reporting

Case studies are usually reported as narratives that read like chronologies of what led up to an event and what happened during and after it. They have been called “then-they-did-this” studies. In order to be comprehensive and cohesive, the researchers provide a great deal of detail and description and quote directly from the participants’ own words and vignettes in the observers’ field records.



To some proponents of case studies, the credibility of the method depends on what they call “naturalistic generalizability.” By this they mean that readers compare their own observations, experience, and belief to the narrative and regard the parts of the investigation that are consistent with these as confirmed. What is inconsistent tends to be examined more closely and may be rejected as less credible, unless, so the argument runs, there is enough detail that readers can “see it with their own eyes.” For readers who know a handicapped child or have an aging parent, for example, a case study report of conditions in residential care for the severely handicapped or the aged may compare dramatically with vivid personal experiences. The case study report can provide an organizing framework for thinking about these and other experiences.

The usefulness of case study reports, therefore, depends to some degree on how well the investigator has portrayed the participants’ ways of thinking about what happened and on how divergent the investigator’s analysis is from the reader’s ways of thinking about the subject. The credibility and authenticity of the case study report may depend on the writer’s having provided extensive detail and description, making unexpected conclusions as difficult to deny as if the reader had been part of the event. In this respect, the narrative mode is not a stylistic choice; it is inherent in the purpose of case studies and the nature of their inquiry. It becomes an obstacle only when authors seek publication through outlets whose customers generally ask for brief details.

The Case Study Adapted for Evaluation

The traditional case study belongs to research, not evaluation. To apply case study methods to evaluation, evaluators have had to adapt what derived largely from sociological and anthropological fieldwork (Patton, 1980). Before 1970, however, evaluation case studies were similar to research case studies. They were longitudinal, were made by on-site observers who sought participant-observer roles, and constituted an inquiry structured from an evolving understanding of events and their meaning to the persons involved in them. There was usually only one research investigator, and the data consisted of descriptions, observations, impressions, unstructured interviews, and existing materials gathered at the site that were organized intuitively and informally. The case study report was a narrative whose purpose was to illustrate or portray what a program was like, how it was being implemented, and how those who were part of it both affected it and were affected by it.

In these early uses of the case study method, evaluators wrote their reports to stand alone. Little effort was made to integrate two or more sources of data, even when the evaluation design included them, although simple references might be made to the number of times a feature of other sites was also characteristic of the site reported in the study. The case study was useful for readers interested in what a particular program was like or what happened to a typical beneficiary.

Early in the 1970’s, when evaluators wanted to design studies that would capture some implementation or outcome features that were difficult or expensive to measure reliably on a large scale (for example, the way a large organization handled a complex innovation or the effect of an education program on motivation to learn or growth in self-confidence), they felt caught between risking considerable effort in trying to quantify qualitative variables and risking the criticism that they were ignoring really important things because they could not be measured. The case study seemed a way out. It offered a relatively inexpensive, low-risk supplement to the large-scale collection of other information that could be measured more cheaply and reliably than with large-scale studies. By and large, investigators commissioned to carry out the early case study evaluations had been trained in the academic disciplines with the strongest fieldwork traditions, and they had to struggle with the extent to which their research method could be adapted to evaluation, retain its integrity, and yield positive benefits. The struggle is not yet fully over.

The Study Questions

The first adaptation was a shift of the specification of the study questions from the principal investigator during the period of study performance to the persons who commissioned the study in advance of data collection. For example, those who supported an evaluation of a training program might want the researchers to find out whether the development of the participants’ self-concepts, self-esteem, task orientation, work habits, and personal and social traits seemed associated with the program or with something else.

Researchers from fieldwork traditions would have argued that they could provide the most useful information by spending some time at the site of the training program, trying to understand what the program meant to those who were involved in it, and reporting on what was happening from the perspective of those who were making it happen. Since this might have everything to do with the participants’ chance to socialize with friends in a nonthreatening environment and nothing to do with self-esteem or work habits, it seemed to these researchers that it was therefore logical not to decide on the evaluation questions until their appropriateness could be determined.



The compromise that has developed is to include in the evaluation the questions of interest to the customer and to permit the researchers to determine what data elements are relevant to these questions and from whom and how they should be collected. It allows the evaluator to remain alert to other questions that might prove more salient if allowed to emerge.

How satisfactory is the compromise? The final reports of some non-GAO case studies show little or no resemblance between the final questions and those in the approved study proposal, and a number of issues about this have not been resolved. We do not know whether the discrepancy is more frequent in case studies than in other methods. It may be that the final questions are the ones the investigators wanted to look at all along, so that the methodology is vulnerable to subterfuge. Possibly the emergent questions were those that should reasonably have been expected to come into focus, and whose emergence may be why case studies rather than surveys are used. From the perspective of the authenticity and integrity of results, the larger public interest may have been served. If the method is highly susceptible to this kind of internal change, the appropriate scope for case studies should be examined. When the case study involves one site and modest expense, the price for identifying better questions early may seem affordable.

This is not to say that all case study evaluations show divergence between the questions that were asked and those that were answered or that an appropriate balance between the evaluator’s and the customer’s needs is never reached. However, applying the case study methods of research to evaluation requires dealing with matters of control, power, and responsibility that were less visible in the work of academic researchers before their methods were adapted to evaluation.

The Number of Sites

The demands of evaluation led to other adjustments in ethnographic methods. One such demand was that a method developed for understanding the particular had to be modified for learning about the general. Another was the need for something more adequate than “naturalistic generalization” for evaluation purposes. A third was the problem of site variation, which in the mid-1970’s was identified in quantitative studies as an ill-understood source of greater differences in a program’s outcomes than the program itself.

The case study method seemed born to help, but the forces of time and cost associated with making multisite evaluations led to considerable adaptation. First, since evaluators often needed simultaneous study at several sites, they needed several observers, which created issues of coordination and interpretation. Second, the cost of maintaining a trained full-time field worker at a site runs high, so that evaluators had to settle for shorter observations or untrained field workers or both.

All these changes (to multiple observers, professionally supervised but not professionally trained observers, and shorter observation times) led to others. The across-sites data base got much larger as the number of sites in a study rose. The within-sites data bases became less extensive as observation times were shortened. It became a challenge to integrate the work of different observers if they focused their attention on different topics from site to site. And this much larger, much less extensive, probably less reliable data base had to be analyzed and reported in a much shorter time than that of fairly leisurely academic research. Not surprisingly, analysis has become a major methodological concern, and more structured and perhaps more efficient approaches to analysis have been developed.

Quantitative Methods

To these adaptations, another was added. The case study was given a purpose, program evaluation, beyond that of illustration, exploration, or generation of hypotheses. As the examination of program implementation and program effectiveness became more central to the case study, so did the ability to generalize findings. In turn, quantitative methods in case studies expanded.

Quantitative methods were incorporated in the case study in two ways. The first was in triangulation: the use of several forms of data within a single case study in order to give many reference points for verifying patterns and ruling out alternative explanations in order to achieve what evaluators call “internal validity.” The second was in the combination of case study methods with other methods, particularly surveys, in order to achieve the generalizability that evaluators called “external validity.” These adaptations created the need for a better understanding of the relationship between case study techniques and other techniques and between quantitative and qualitative approaches within case studies.

Summary

Table I.2 summarizes the changes that have been made to adapt the research case study to evaluators’ needs. Less time is spent at sites. Information is collected by junior staff working under the supervision of an investigator trained in case study methods. More time is allowed for training and monitoring quality. Data are combined from several sites to allow generalization, and data collection has been given greater structure. Methods of assessing the reliability of observations, techniques for transforming very large amounts of qualitative data, and methods for aggregating qualitative data or findings from several sites have been developed. The ability to generalize has become a matter of design and analysis. Reporting methods have changed.

Table I.2: Evaluation Adaptations of the Research Case Study

Case study element | Research | Evaluation

Design specifications
Study questions | Researcher asks | Sponsor asks
Variables | Emerge from observation | Sponsor specifies
Site selection | Of specific interest | Representative
Instances | One | Many
Data | Researcher specifies | Sponsor or sponsor and researcher specify
Design | Trends at one site | Comparison of many sites
Methods | One | Several
Costs | Usually inexpensive, time-consuming | May be very expensive and time-consuming, particularly in studies with many sites

Data collection
Type of data | Qualitative | Quantitative and qualitative
Time span studied | Long | Short; may be cross-sectional
Time at sites | Long | Short
Sources | Informants, observation | Informants, documents, administrative data
Collection method | Researcher specifies | Sponsor or sponsor and researcher specify
Role of insight | Central | Supplementary
Collector | Researcher | Staff

Analysis
Analyst | Researcher | Staff
Researcher’s role | Comprehensive | Supervisory
Data reduction | Minimal, original data | Considerable; codification, content analysis
Multiple data | Triangulate within site | Triangulate across and within sites
Analysis techniques | Nonformalistic, pattern recognition | Formalistic; graphic and content analysis techniques
Procedure | Intuitive, thematic | Formal, comparative, thematic
Establishing causality | Coherent, plausible story | Greater emphasis on design elements in addition to internal coherence
Reporting | Narrative, descriptive, detailed building of coherent story | Conclusion-oriented, use of vignettes for examples

Case studies in evaluation today have made these adaptations in different degrees. Some studies have not only generalized but also tested hypotheses. Some case studies rely wholly on quantitative data. Some rely wholly on information collected by others, not trained as sociologists or anthropologists, rather than on firsthand observation. Some aim for uniformity or comparability of data both within a site with multiple observers and across several sites. Some use inferential statistics as well as descriptive statistics. Some present findings and conclusions in forms closely resembling those of other methods.

These adaptations are not uniformly valued. Some case study methodologists work with structured evaluation questions, structured data collection, and observers untrained as anthropologists or sociologists, but they believe that case studies offer a qualitative way of knowing that should not be merged with quantitative results. Others believe that case studies cannot be used for making the kind of generalizations that probabilistic models are used for, so that little is to be gained and so much is to be lost from increasing the number of sites. Still others believe in using many sites in case studies for evaluations and see the next step as establishing more explicit procedures for analyzing data and reviewing quality.

“Case study” means different things to different methodologists, who reach different conclusions about how to do case studies, how to report them, and their overall appropriateness for answering a specific question. If case studies can vary so greatly, how can we assess their usefulness for evaluation? One way is to develop a working definition of the case study that embodies its essential methodological features and then to examine the strengths and limitations of case studies for different evaluation questions. This is the approach taken in this paper in developing our initial definition.


Appendix II

Site Selection Example

Imagine that in 1987, within an effort to estimate the extent of tax revenues lost or delayed from the failure of businesses to file returns, the General Accounting Office examined revenue shortfalls to individual states. Imagine we found 170,076 such instances (a national projection based on a sample) and estimated that, cumulatively, over $500 million was lost to the states. Our report attracted much congressional interest. Variation among states in the rate of such “missing returns” was of particular concern. Imagine we now have been asked to examine in more detail what explains differences among states in “missing returns,” since cumulatively the effect is to make states look poorer than they actually would be if they collected revenues authorized by their own legislatures. (Hypothetical data for this example are given in table II.1.)

Question 1: Instance Selection

Using the hypothetical data in table II.1, identify states for each type of purposive selection that we might consider.

Bracketing

Best case

Worst case

Cluster

Representative

Typical

Special interest


Table II.1: Hypothetical Data on Unfiled Corporate Income Tax Returns for 1986 State Income Tax Returns

State | Number unfiled | Rate unfiled (percent)
Alabama | 6,100 | 5
Alaska | 610 | 2
Arizona | 3,475 | 9
Arkansas | 4,391 | 2
California | 28,841 | 3
Colorado | 3,012 | 2
Connecticut | 2,738 | 3
Delaware | 995 | 5
District of Columbia | 1,562 | 3
Florida | 13,372 | 4
Georgia | 8,887 | 5
Hawaii | 1,197 | 1
Idaho | 732 | 2
Illinois | 16,103 | 3
Indiana | 6,077 | 3
Iowa | 2,096 | 1
Kansas | 2,125 | 1
Kentucky | 3,724 | 3
Louisiana | 8,462 | 4
Maine | 1,032 | 1
Maryland | 6,292 | 3
Massachusetts | 4,427 | 2
Michigan | 8,849 | 3
Minnesota | 3,074 | 2
Mississippi | 6,002 | 5
Missouri | 5,886 | 3
Montana | 770 | 1
Nebraska | 1,324 | 2
Nevada | 781 | 5
New Jersey | 7,985 | 3
New Mexico | 2,394 | 3
New York[a] | 19,349 | 1
North Carolina | 7,460 | 10
North Dakota | 539 | 1
Ohio | 12,088 | 6
Oklahoma | 3,593 | 6
Oregon | 2,246 | 3
Pennsylvania | 11,774 | 2
Rhode Island | 856 | 3
South Carolina | 5,529 | 4
South Dakota | 736 | 1
Tennessee | 5,734 | 15
Texas | 18,061 | 2
Utah | 1,152 | 2
Vermont | 463 | 2
Virginia | 8,032 | 4
Washington | 3,806 | 2
West Virginia | 1,760 | 3
Wisconsin | 4,559 | 2
Wyoming | 442 | 3

[a] In 1984, New York implemented a “corporate responsibility” law that made CEOs personally liable for timely filing of corporate tax returns.

Answer to Question 1

Bracketing

Given the size differences between states, a double bracket might be considered. New York and Texas might form one pair; Kansas and Arizona a second pair.

Best Case

Three states have missing returns (unfiled) rates of less than 1 percent. These are Hawaii, Kansas, and New York. Hawaii and Kansas are relatively small states and New York has implemented a special initiative. Adding states with 1 percent unfiled rates to the pool would not add larger states, however, since these are Iowa, Maine, Montana, North Dakota, and South Dakota. It may be that the correlation between “smaller” states and very low rates of unfiled returns is a “real” phenomenon that should be examined and the initial cut of less than 1 percent should stand.

Worst Case

Texas is an outlier, with a 15-percent unfiled rate. North Carolina has a 10-percent rate, Arizona 9 percent. The next closest states are Ohio and Oklahoma, with 6 percent each. Selecting Texas, North Carolina, and Arizona would be a reasonable worst-case choice.

Cluster

Except for New York, no information is given about programs or state initiatives. Using only the data in the table, several bases for clustering could be considered. One frequently used basis is “size of the problem”: that is, 7 states account for about 45 percent of all unfiled returns (California, Florida, Illinois, New York, Oklahoma, Pennsylvania, and Texas). This basis for selection should be ruled out for this job, however, because there is no meaningful cluster from the group, except that the states are all among the larger states. The rationale for the job is bolstering each individual state’s revenue, not the national pooled aggregate. Since there are more smaller, semirural states than big states, the well-being of individual states would not necessarily be best served by examining what happens in the few larger states. Another basis might be a crosstabulation of state size and rate of unfiled returns; here selection of six states could give a reasonable fix on reasons for the problem but would essentially reproduce the strategy used in the representative sample. We would conclude that the data in the table are not sufficient for drawing a cluster sample.

Representative

The distribution of unfiled rates is positively skewed, which means that instances are piled up at the low end and scattered out over the high end. With such a distribution, “representative” in terms of unfiled rates would sensibly mean at the low (1 and less than 1 percent), lower middle (2 and 3 percent), upper middle (4, 5, and 6 percent), and high (9, 10, and 15 percent) points. Assuming state size would be a “second cut” variable, New York (1 percent), California (3 percent), Ohio (6 percent), and Texas (15 percent) could be one group to study, while Kansas (less than 1 percent), Massachusetts (2 percent), Oklahoma (6 percent), and Arizona (9 percent) could form a second group of smaller states. Together, the eight states also would provide reasonable geographic representativeness, as well as industrialized versus more rural spreads.
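The skew and the four rate bands described above can be checked with a short Python sketch. The tally of rates below was counted by hand from table II.1 as printed (50 rows, including the District of Columbia), so it is an assumption of this sketch and not a figure given in the text. The table as printed shows whole percentages only, so rates described in the text as “less than 1 percent” appear here as 1. Comparing the mean with the median is only a rough indicator of skew, and the function name `band` is illustrative.

```python
import statistics
from collections import Counter

# Number of jurisdictions at each unfiled rate (percent), tallied by hand
# from table II.1 as printed (an assumption of this sketch).
rate_counts = {1: 8, 2: 13, 3: 15, 4: 4, 5: 5, 6: 2, 9: 1, 10: 1, 15: 1}

# Expand the tally back into one rate per jurisdiction.
rates = [rate for rate, n in rate_counts.items() for _ in range(n)]

mean_rate = statistics.mean(rates)
median_rate = statistics.median(rates)

# A mean above the median is a common rough sign of positive skew:
# a few high values pull the average up.
positively_skewed = mean_rate > median_rate


def band(rate):
    """Assign a rate to one of the four bands named in the text."""
    if rate <= 1:
        return "low"
    if rate <= 3:
        return "lower middle"
    if rate <= 6:
        return "upper middle"
    return "high"


band_counts = Counter(band(r) for r in rates)

print(f"mean = {mean_rate}, median = {median_rate}, skewed = {positively_skewed}")
print(dict(band_counts))
```

Drawing at least one instance from each band, and then splitting by state size, follows the two-step selection described above.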

Typical

A frequency distribution of unfiled rates shows that 14 states had rates of 3 percent, which turns out to be both the mode and the median for this distribution. States in this category include California, Connecticut, Illinois, Indiana, Kentucky, Maryland, Michigan, Missouri, New Jersey, New Mexico, Oregon, Rhode Island, West Virginia, and Wyoming. With no other information (for the purpose of this exercise), if fewer than 14 case studies were to be made, selecting states typical in size such as Maryland, Michigan, New Jersey, and Indiana would make sense.
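The frequency distribution behind the “typical” selection can be reproduced with a brief Python sketch. The rates are transcribed from table II.1 as printed; setting the District of Columbia aside so that only states are counted is an assumption made here to match the count of 14 given in the text. Variable names are illustrative.

```python
import statistics
from collections import Counter

# Unfiled rates (percent) transcribed from table II.1 as printed.
rates = {
    "Alabama": 5, "Alaska": 2, "Arizona": 9, "Arkansas": 2, "California": 3,
    "Colorado": 2, "Connecticut": 3, "Delaware": 5, "District of Columbia": 3,
    "Florida": 4, "Georgia": 5, "Hawaii": 1, "Idaho": 2, "Illinois": 3,
    "Indiana": 3, "Iowa": 1, "Kansas": 1, "Kentucky": 3, "Louisiana": 4,
    "Maine": 1, "Maryland": 3, "Massachusetts": 2, "Michigan": 3,
    "Minnesota": 2, "Mississippi": 5, "Missouri": 3, "Montana": 1,
    "Nebraska": 2, "Nevada": 5, "New Jersey": 3, "New Mexico": 3,
    "New York": 1, "North Carolina": 10, "North Dakota": 1, "Ohio": 6,
    "Oklahoma": 6, "Oregon": 3, "Pennsylvania": 2, "Rhode Island": 3,
    "South Carolina": 4, "South Dakota": 1, "Tennessee": 15, "Texas": 2,
    "Utah": 2, "Vermont": 2, "Virginia": 4, "Washington": 2,
    "West Virginia": 3, "Wisconsin": 2, "Wyoming": 3,
}

# Assumption: count states only, leaving out the District of Columbia.
state_rates = {name: r for name, r in rates.items()
               if name != "District of Columbia"}

# Frequency distribution: how many states fall at each rate.
frequency = Counter(state_rates.values())

# The mode is the most frequent rate; the median is the middle value.
mode_rate = frequency.most_common(1)[0][0]
median_rate = statistics.median(state_rates.values())

# States at the modal rate are the candidates for "typical" instances.
typical_states = sorted(name for name, r in state_rates.items()
                        if r == mode_rate)

print(sorted(frequency.items()))
print(f"mode = {mode_rate}, median = {median_rate}")
print(typical_states)
```

The list of typical states is only a pool of candidates; narrowing it to a few states of typical size, as suggested above, remains a judgment call.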




Special Interest

New York would be of special interest as a large state with a very low rate of unfiled returns. New York also is unique in implementing relevant legislation that might have some national potential.

Question 2

While it might be possible, given the data in table II.1, to select states on six of the seven purposive bases, would the evaluation question itself present a situation in which we would want to consider case studies at all?

Answer

Yes, but not as a stand-alone method. We have been asked to examine the reasons for state variation in unfiled returns. One plausible reason is that the differences are the result of how states solicit returns, monitor compliance, and penalize failure to file. We could obtain tax codes and procedures for each state, examine these, interview selected officials, and generate some plausible patterns. However, understanding reasons for behavior as complex as not filing is well suited for case studies. Explanations could range from (for example) failures of managing returns actually filed, which are quite susceptible to improvement, to economic cycles that affect business circumstances and that may be less susceptible to change. Since the underlying concern is that many states may be asking for federal assistance when they would have resources to handle more of their own needs if they collected revenues owing to them, case studies of a representative sample of states coupled with examination of the special interest state could be an efficient strategy for ensuring that we had a comprehensive understanding of what was happening and why. To provide the generalization desirable, the case studies could be followed by a national survey of state officials, checking out the findings from the in-depth studies. Such a sequence could be quite efficient, since the national survey would not be a fishing expedition but targeted to verify initial findings. It also would offer considerable assurance that we had accurately determined reasons affecting most states.


Appendix III

Guidelines for Reviewing Case StudyReports

There are at least six different types of case study application in evaluation, and their strengths and limitations are different. Choosing an appropriate method depends on understanding the evaluation question. What is technically right for one question is not necessarily right for another. However, there are standards that can be applied to all case studies in evaluation. Studies that fail to meet them have questionable merit. These guidelines present the minimum standard of quality in case study evaluation, taken in conjunction with the guidance in the “Yellow Book,” Policy Manual, and Communications Manual.[1]

[1] These guidelines have been adapted from “Guidelines for Reporting Large Case Studies” by John R. Gilbert in David C. Hoaglin et al., Data for Decisions: Information Strategies for Decisionmakers (Cambridge, Mass.: Abt Books, 1982), pp. 138-39, and Robert K. Yin, Case Study Research: Design and Methods (Beverly Hills, Calif.: Sage, 1984), pp. 140-45.

Design

1. Are the evaluation questions stated clearly and explicitly? A good study informs the reader early in the report about the questions that were answered and the issues that were investigated.

2. Is the case study application clearly described? Is it appropriate? A good case study describes the case study application that was used. It explains why this application is appropriate for the kind of evaluation questions that were answered (descriptive, normative, cause-and-effect). Where several methods were used, the relationship of the case study to the other methods is clear and appropriate.

3. Was the time span of the study long enough to address the core issues fairly? A good case study reports how much time the investigation covered in relation to the history of the instance or program. Case studies aiming at a comprehensive analysis of an event as a whole begin as early as possible in its history and continue through its completion or stabilization. Evaluation case studies have covered shorter periods and involved less on-site investigation than research case studies characteristically do. Readers should recognize, however, that as time shortens, so may the value of the method as a way of presenting a comprehensive understanding of the event as a whole.

4. Is the basis for case selection presented? Is it appropriate for the purpose of the case study? A good case study presents the reasons for selecting the instances that were examined. The reasons are appropriate for the case study application, an issue of particular concern if a generalization of the findings is intended. For assessing the study’s adequacy, the kind of site selected is as important as the number of sites selected. Attention should be paid to the physical setting, to the people who are served by the program, and to variations in treatment.

Data Collection

1. Are the methods of data collection presented? Are they appropriate for the purpose of the case study? Unstructured methods may be appropriate for illustrative and exploratory applications. Semistructured approaches may be appropriate for critical instance case studies involving multiple sites, particularly if more than one investigator was responsible for collecting data for several sites.

2. If more than one investigator collected the data, how were the other evaluators selected, trained, and supervised? There is considerable agreement that the consequence of the many variants in data collection for multiple sites is uncertain, but providing detailed information on the procedures that are used and an explanation of the reasons for the approach are essential to a good case study.



3. Are information sources described clearly and fully? Are they appropriate? A good case study presents in detail the sources of evidence. The detail is greater than that required in other methods. A good case study report gives the numbers and positions of the persons interviewed and the evidence that they were appropriate for the evaluation. The reader should be able to judge from the information that is given in the case study report how credible the conclusions are in terms of the appropriateness and completeness of information sources.

Data Base Formation and Data Analysis Techniques

1. Are the procedures for the formation of the data base described? A good case study describes how the data bases were formed and presents a justification for decisions that were made about the qualification, precision, and detail of information in the data base at each site.

2. Are the techniques of data-gathering and data-processing explicitly described? Readers of a good case study should know how the data were collected and, step by step, how they were analyzed. If semistructured packets of directions were used to guide field workers through the issues, a good case study describes them or includes them in technical appendixes. All the steps of data reduction and coding are described, along with the basis for transformations in these steps. The analytic techniques are explicitly described. What data sources were used in triangulation? In what order? How were discrepant findings resolved? The validity of case study methods partly depends on the resolution process. At each step, safeguards should have been taken for completeness and the reduction of the threat of bias.
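As one illustration of what an explicit record of triangulation might look like, the sketch below groups coded findings by site and issue, keeps the order in which sources were consulted, and flags cases where the sources disagree. It is a minimal sketch only: the sites, issues, sources, codes, and the function name triangulate are all hypothetical and are not drawn from any GAO study or prescribed procedure.

```python
from collections import defaultdict

# Hypothetical coded findings: (site, issue, source, coded_value).
# All sites, issues, sources, and codes are invented for illustration.
findings = [
    ("Site A", "staff turnover", "interview", "high"),
    ("Site A", "staff turnover", "personnel records", "high"),
    ("Site A", "staff turnover", "observation", "low"),
    ("Site B", "staff turnover", "interview", "low"),
    ("Site B", "staff turnover", "personnel records", "low"),
]


def triangulate(records):
    """Group coded values by (site, issue) and flag disagreement among sources."""
    by_key = defaultdict(list)
    for site, issue, source, value in records:
        # Appending preserves the order in which sources were consulted.
        by_key[(site, issue)].append((source, value))

    report = {}
    for key, entries in by_key.items():
        distinct_values = {value for _, value in entries}
        report[key] = {
            "sources_in_order": [source for source, _ in entries],
            "discrepant": len(distinct_values) > 1,
        }
    return report


report = triangulate(findings)
for (site, issue), summary in report.items():
    status = "needs documented resolution" if summary["discrepant"] else "sources agree"
    print(f"{site} / {issue}: {status} ({', '.join(summary['sources_in_order'])})")
```

A flag of this kind only identifies where a discrepancy exists. How it was resolved, and on what evidence, still has to be written down by the investigators.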


3. Were there interpretation differences, and if so how were they resolved? A good case study is explicit about differences in the interpretation of evidence and events between members of the investigative team and the reviewers of the draft report. The case study method often uses data that are more judgmental, interpretive, and subjective than other methods. The data are often less accessible to secondary analysis. Thus, a good case study states the argument and evidence more plainly than most reports have to.

4. If other studies, investigations, or experiments relevant to the issue are available, have their results been presented and reconciled with the case study findings? A good case study presents the findings and conclusions for other studies on the same issue. When the findings do not converge, the case study reconciles or explains the differences as far as possible. Completeness of information requires this step.

Reporting

1. Are methodological strengths and limitations identified clearly? A good case study reports methodological strengths and limitations for answering the evaluation questions and explains the tradeoffs that were considered and who influenced the decisions. When several decisionmakers were involved, a good case study describes the types of decisions each one made and the constraints on those decisions.

2. Are the arguments for various resolutions of the evaluation question presented? Most case studies are on topics about which some kind of opinion has been formed. In a good case study, the conceptual framework for organizing the inquiry is quite explicit about expectations. A good case study identifies the elements of the issue that was examined and presents the initial arguments in favor of the various resolutions and the findings of the study that support these resolutions.

3. Are the arguments against various resolutions of the issue presented? A good case study presents the initial arguments against the various resolutions of the issue that was considered. Case study investigators are supposed to seek evidence that confirms and evidence that contradicts the observations and conclusions. Explicitly stating the initial arguments for and against various resolutions helps readers know how thoroughly the investigators considered the issues and how thoroughly they sought evidence on both sides.

4. Does the case study identify the factors explaining the phenomena that were observed and state clearly whether the identification of these factors was based on insight and recognition or on quantitative techniques? Case studies are undertaken for their explanatory power and their superior ability to identify the reasons for problems and the nature of events. A good case study explicitly identifies alternative explanations, lays out the chain of reasoning, and makes clear which conclusions rest primarily on the investigators’ insightful recognition of patterns of evidence and which have been recognized in other ways.

Impartiality and Generalizability

1. What is known about the competence and impartiality of the investigators? A good case study provides information about the experience of the investigators with case study methods and what they have written previously about the questions that were answered. The more evidence there is that the investigators have had appropriate training in case study methods, and that they have addressed related issues in ways that seem impartial and are intended to reduce bias, the greater confidence the reader can have in the quality of the work. For GAO reports, the job documentation should contain evidence that the evaluation team as a group possessed the skills required and assurance that there were no impediments to impartiality among individual team members. For others’ reports we plan to use in our studies, we should seek similar assurance in the report itself or from knowledgeable persons.

2. Are comments on the draft report available? Perhaps because case studies require more detail than other methods, case study reports are sometimes criticized for failing to be convincing about their impartiality. One way that a good case study counters this criticism is by the inclusion of a technical appendix that gives the full comments of the informants who reviewed the draft.

3. Is there adequate information for judging generalizability? The basis for claiming generalizability is explicit in a good case study. It provides the evidence, of whatever type and detail, that is needed for assessing this claim. In a good case study, generalizations do not exceed the basis for them, considering program diversity and how the case studies were selected.

We provide a checklist of the guidelines discussed in this appendix in table III.1.


Table III.1: Checklist for Reviewing Case Study Reports

                                                              Yes  No

Design

1. Are the evaluation questions stated clearly and            ___  ___
   explicitly?
2. Is the case study application clearly described?           ___  ___
3. Was the time span of the study long enough to address the  ___  ___
   core issues fairly?
4a. Is the basis for case selection presented?                ___  ___
4b. Is it appropriate for the purpose of the case study?      ___  ___

Data collection

1a. Are the methods of data collection presented?             ___  ___
1b. Are they appropriate for the purpose of the case study?   ___  ___
2. If more than one investigator collected the data, were     ___  ___
   the other evaluators properly selected, trained, and
   supervised?
3a. Are information sources described clearly and fully?      ___  ___
3b. Are they appropriate?                                     ___  ___

Data base formation and data analysis techniques

1. Are the procedures for the formation of the data base      ___  ___
   described?
2. Are the techniques of data gathering and data processing   ___  ___
   explicitly described?
3a. Were there interpretation differences?                    ___  ___
3b. If so, how were they resolved?                            ___  ___
4. If other studies relevant to the issue are available,      ___  ___
   have their results been presented and reconciled with
   the case study findings?

Reporting

1. Are methodological strengths and limitations identified    ___  ___
   clearly?
2. Are the arguments for various resolutions of the           ___  ___
   evaluation question presented?
3. Are the arguments against various resolutions of the       ___  ___
   issue presented?
4a. Does the case study identify the factors explaining the   ___  ___
    phenomena that were observed?
4b. Does the study state clearly whether identification of    ___  ___
    these factors was based on insight and recognition or on
    quantitative techniques?

Impartiality and generalizability

1. Have proper safeguards to ensure the competence and        ___  ___
   impartiality of the investigators been taken?
2. Are comments on the draft report available?                ___  ___
3a. Is there adequate information for judging                 ___  ___
    generalizability?
3b. Have appropriate limitations to generalizations been      ___  ___
    observed?
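A reviewer who records checklist answers electronically might tally them by section along the following lines. This is a minimal sketch, not a GAO procedure: the answers shown are invented, only a few items are included, and the function names summarize and items_needing_follow_up are hypothetical. A count of Yes and No answers is only a starting point for judgment about a report's quality, not a score.

```python
# Hypothetical reviewer answers for a few checklist items, keyed by
# (section, item). True = Yes, False = No, None = not yet answered.
# The answers are invented for illustration.
answers = {
    ("Design", "1"): True,
    ("Design", "2"): True,
    ("Design", "3"): False,
    ("Data collection", "1a"): True,
    ("Data collection", "1b"): None,
    ("Reporting", "1"): False,
}


def summarize(answers):
    """Count Yes, No, and unanswered items in each checklist section."""
    summary = {}
    for (section, _item), value in answers.items():
        counts = summary.setdefault(section, {"yes": 0, "no": 0, "unanswered": 0})
        if value is None:
            counts["unanswered"] += 1
        elif value:
            counts["yes"] += 1
        else:
            counts["no"] += 1
    return summary


def items_needing_follow_up(answers):
    """List items answered No or left blank, in the order they were recorded."""
    return [key for key, value in answers.items() if not value]


summary = summarize(answers)
follow_up = items_needing_follow_up(answers)
for section, counts in summary.items():
    print(section, counts)
print("Follow up:", follow_up)
```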


Bibliography

For readers with an interest in further information,but limited time, a few key references are starred (*).

Abert, James G., ed. Program Evaluation at HEW:Research vs. Reality, parts 1-3. New York: MarcelDekker, 1979.

Abt, Wendy P., T. Cerva, and T. J. Marx. Why So LittleChange? The Effects on Pupils of the ExperimentalSchools Program. Cambridge, Mass.: 1978.

Abt Associates. First Annual Substantive Report for aStudy of Experimental Schools Projects in SmallSchools Serving Rural Areas. Cambridge, Mass.: 1975.

Acland, Henry. “Are Randomized Experiments theCadillacs of Design?” Policy Analysis, 5 (Spring 1979),pp. 223-41.

Allison, Graham T. The Essence of Decision:Explaining the Cuban Missile Crisis. Boston: Little,Brown, 1971.

Anderson, Scarvia B., et al. “Case Study Method.”Encyclopedia of Educational Evaluation, pp. 46-47.San Francisco: Jossey-Bass, 1976.

Arrow, Kenneth J. Social Choice and IndividualValues, 2nd ed. New York: John Wiley and Sons, 1963.

Barzun, Jacques, and Henry F. Graff. The ModernResearcher, 3rd ed. New York: Harcourt BraceJovanovich, 1977.

Becker, Howard S. “Problems of Inference and Proofin Participant Observation.” American SociologicalReview, 23 (1958), 652-59.


Bibliography

Belle, Deborah. Lives in Stress: Women andDepression. Beverly Hills, Calif.: Sage, 1982.

Berger, Michael A. “Studying Enrollment Decline (andOther Timely Issues) via the Case Survey.”Educational Evaluation and Policy Analysis, 5:3(1983), 307-17.

Berman, P., et al. How Schools View and Use theSchool

International, 1981.

Blalock, Hubert M., Jr. Causal Inferences inNonexperimental Research. Chapel Hill, N.C.:University of North Carolina Press, 1964.

Bloor, M. “On the Analysis of Observational Data: ADiscussion of the Worth and Use of InductiveTechniques and Respondent Validation.” Sociology:The Journal of the British Sociological Association, 12(1978), 545-52.

Bock, Edwin A., ed. Essays on the Case Study Methodin Public

of Administrative Sciences, 1962.

Bock, Edwin A., ed. Essays on the Case StudyMethod. Syracuse, N.Y.: International Institute ofAdministration Sciences, The Inter-University CaseProgram, November 1971.

Bogdan, Robert. Participant Observation inOrganizational Settings. Syracuse, N.Y.: SyracuseUniversity Press, 1972.

Brandt, R. Studying Behavior in Natural Settings. NewYork: Holt, Rinehart and Winston, 1972.


Bibliography

Broadhead, R., and Ray C. Rist. “Gatekeepers and theSocial Control of Social Research.” Social Problems,23 (1976), 325-26.

Bulmer, M. “Concepts in the Analysis of QualitativeData.” Sociological Review, 27 (1979), 651-77.

Burger, R., and M. Massaglia. RANN UtilizationExperience: Case Studies 22 Through 31, vol. 2.Research Triangle Park, N.C.: Research TriangleInstitute, August 1976.

Campbell, Donald T. “Degrees of Freedom and theCase Study.” Comparative Political Studies, 8 (1975),178-93.*

Campbell, Donald T., and Julian C. Stanley.Experimental and Quasi-experimental Designs forResearch. Chicago: Rand McNally, 1963.

Chelimsky, Eleanor. “GAO’s Institute for ProgramEvaluation.” State Evaluation Network Newsletter, 1(1981), 2-5.

Chelimsky, Eleanor, and J. Dahmann. Final Report ofthe Career Criminal Program National Evaluation:Case Studies of Four Jurisdictions, 1976-79. McLean,Va.: MITRE Corp., June 1980.

Chelimsky, Eleanor, and J. Sasfy. The National-LevelEvaluation of the Career Criminal Program: Conceptand Plan. McLean, Va.: MITRE Corp., May 1976.

Christoph, James B., ed. Cases in ComparativePolitics. Boston: Little, Brown, 1965.

Connelly, W. L. Continuity and Change in RuralSchooling. Cambridge, Mass.: Abt Associates, 1979.


Bibliography

Cook, Thomas D., and Charles S. Reichardt, eds.Qualitative and Quantitative Methods in EvaluationResearch. Beverly Hills, Calif.: Sage, 1979.

Cook, Thomas D., and Donald T. Campbell.Quasi-experimental Design and Analysis Issues forField Settings. Chicago: Rand McNally, 1979.*

Cronbach, Lee. “Remarks to the New Society.”Evaluation Research Society Newsletter, 1 (1977), 4.

Cronbach, Lee, et al. Toward Reform of ProgramEvaluation. San Francisco: Jossey-Bass, 1980.

Datta, Lois-ellin. “Strange Bedfellows.” AmericanBehavioral Scientist, 26:1 (1982), 133-44.

David, J. L., and Greene, D. A. Research Design forGeneralizing from Multiple Case Studies. Palo Alto,Calif.: Bay Area Research Group, 1981.

Dawson, Judith. “The Validity of QualitativeResearch.” Paper presented at the AmericanEducational Research Association meeting, SanFrancisco, Calif., April 1979.

Dawson, Judith A. “Qualitative Research Findings:What Do We Do to Improve and Estimate TheirValidity?” Paper presented at the Annual AmericanEducational Research Association meeting, NewYork, March 1982.

Denzin, Norman K., ed. Sociological Methods: ASourcebook. New York: McGraw-Hill, 1978a.

Denzin, Norman K. “The Logic of NaturalisticInquiry.” Social Forces, 50 (1971), 166-82.


Bibliography

Denzin, Norman K. The Research Act, 2nd ed. NewYork: McGraw-Hill, 1978b.

Derthick, Martha A. New Towns In-Town: Why aFederal Program Failed. Washington, D.C.: The UrbanInstitute, 1972.

Dittman, Laura, et al. “Study of Selected Children inHead Start Planned Variation, 1969-70. First YearReport. Case Studies of Children.” University ofMaryland, College of Education, College Park, Md.,1971.

Dobbert, Marion Lundy. Ethnographic Research:Theory and Applications for Modern Schools andSocieties. New York: Praeger, 1982.

Downey, H. Kirk, and Duane R. Ireland. “QuantitativeVersus Qualitative: The Case of EnvironmentalAssessment in Organizational Studies.” AdministrativeScience Quarterly, 24 (1979), 630-37.

Farley, Joanne. “Combining Quantitative andQualitative Methods in Evaluation Research.” StateEvaluation Network Newsletter, 1 (1981), 3-5.

Farrar, Eleanor, John DeSanctis, and David Cohen.“Views from Below: Implementation Research inEducation.” Teachers College Record, 82:1 (1980),77-100.

Fiedler, Judith. Field Research: A Manual for Logisticsand Management of Scientific Studies in NaturalSettings. San Francisco: Jossey-Bass, 1978.

Fienberg, S. E. “The Collection and Analysis ofEthnographic Data in Educational Research.”Anthropology and Education Quarterly, 8 (1977),50-57.


Bibliography

Filstead, William J., ed. Qualitative Methodology.Chicago: Markham, 1970.

Finsterbush, Kurt. “Statistical Summary of 52 AIDProjects: Lessons on Project Effectiveness.”University of Maryland, College Park, Md., 1984.

Glaser, Barney G. Theoretical Sensitivity: Advances inthe Methodology of Grounded Theory. Mill Valley,Calif.: Sociology Press, 1978.

Glaser, Barney G., and Anselm L. Strauss. TheDiscovery of Grounded Theory: Strategies forQualitative Research. Chicago: Aldine, 1967.*

Glazer, Myron. The Research Adventure: Promise andProblems of Fieldwork. New York: Random House,1972.

Goetz, J. P., and M. D. LeCompte. “EthnographicResearch and the Problem of Data Reduction.”Anthropology and Education Quarterly, 12 (1981),51-70.

Guba, Egon G. Toward a Methodology of NaturalisticInquiry in Educational Evaluation. Los Angeles:University of California, Center for the Study ofEvaluation, 1978.

Guba, Egon G. “Criteria for Assessing theTrustworthiness of Naturalistic Inquiries.”Educational Communications and TechnologyJournal, 8 (1981), 42-54.*

Guba, Egon G., and Yvonna S. Lincoln. EffectiveEvaluation: Improving the Usefulness of ResultsThrough Responsive and Naturalistic Approaches.San Francisco: Jossey-Bass, 1981.


Bibliography

Halpern, Edward S. “Auditing Naturalistic Inquiries:Some Preliminary Applications.” Paper presented atthe American Educational Research Associationmeeting, Toronto, Canada, April 1983.

Hamilton, D., et al. Beyond the Numbers Game.Berkeley, Calif: McCutchan, 1978.

Hargrove, Erwin C. “The Bureaucratic Politics ofEvaluation: A Case Study of the Department ofLabor.” Evaluation Studies Review Annual, vol. 6,Howard E. Freeman and Marian A. Solomon, eds., pp.179-288. Beverly Hills, Calif.: Sage, 1981.

Hedrick, Terry E., Robert F. Boruch, and K. J. Ross.“On Ensuring the Availability of Evaluation Data forSecondary Analysis.” Evaluation Studies Annual, vol.4, L. Sechrest et al., eds. Beverly Hills, Calif.: Sage,1979.

Herriott, Robert E. “Ethnographic Case Studies inFederally Funded Multi-disciplinary Policy Research:Some Design and Implementation Issues.”Anthropology and Education Quarterly, 9 (1977),106-15.

Herriott, Robert E. Federal Initiatives and RuralSchool Improvement. Cambridge, Mass.: AbtAssociates, 1980.

Herriott, Robert E. Case Study Methods in SchoolEvaluation and Research: A Synthesis of Experience.Part I. Final Report. Washington, D.C.: NationalInstitute of Education, June 15, 1982.

Hersen, Michel, and David H. Barlow. Single-CaseExperimental Designs: Strategies for StudyingBehavior Change. New York: Pergamon Press, 1976.


Bibliography

High/Scope Educational Research Foundation.National Home Start Evaluation Study, InterimReports IA and IB, Case Studies. Ypsilanti, Mich.:1972.

Hoaglin, David C., et al. Data for Decisions:Information Strategies for Policy Makers. Cambridge,Mass.: Abt Books, 1982.

Holt, Robert T., and John E. Turner, eds. “TheMethodology of Comparative Research.” TheMethodology of Comparative Research, pp. 1-20. NewYork: Free Press, 1970.

House, Ernest R. The Logic of Evaluative Argument.Los Angeles: University of California, Center for theStudy of Evaluation, 1977.

Huberman, A. M., and M. B. Miles. “Drawing ValidMeaning from Qualitative Data: Some Techniques ofData Reduction and Display.” Quality and Quantity, 17(1983), 283-339.

Jauch, L., R. Osborn, and T. Martin. “StructuredContent Analysis of Cases.” Academy of ManagementReview, 5 (1980), 517-25.

Jerome, Chris H. National Home Start EvaluationStudy, Interim Report III, Case Study Summaries.Ypsilanti, Mich.: High/Scope Educational ResearchFoundation, 1973.

Jerome, Chris H. National Home Start EvaluationStudy, Interim Report V, Case Studies. Ypsilanti,Mich.: High/Scope Educational Research Foundation,1974.


Bibliography

Jick, Todd D. “Mixing Qualitative and QuantitativeMethods: Triangulation in Action.” AdministrativeScience Quarterly, 24 (1979), 602-11.*

Johnson, John M. Doing Field Research. New York:Free Press, 1975.

Johnson, Steven D. “On the Use of QualitativeMethods in Policy Research: A Review of ThreeMulti-site Studies.” Mimeograph, Cornell University,Ithaca, N.Y., February 1980.

Kendall, Patricia L., and Katherine M. Wolf. “TheAnalysis of Deviant Cases in CommunicationsResearch.” Communications Research: 1948-1949,eds. Paul Lazarsfeld and Frank Stanton, pp. 152-57.New York: Harper & Row, 1949.

Kennedy, Mary M. “Generalizing from Single CaseStudies.” Evaluation Quarterly, 3 (1979), 661-78.*

Khadduri, Jill, and Raymond J. Struyk. “ImprovingSection 8 Rental Assistance: Translating Evaluationinto Policy.” Evaluation Review, 5 (1981), 189-206.

Kidder, Louise H. Research Methods in SocialRelations, 4th ed. New York: Holt, Rinehart andWinston, 1981.

Kirschen, Etienne S., and Lucien Morissens. “TheObjectives and Instruments of Economic Policy.”Qualitative Planning of Economic Policy, ed. Bert G.Hickman, pp. 111-13. Washington, D.C.: TheBrookings Institution, 1965.

Kirschen, Etienne S., et al. Economic Policy in OurTime. Amsterdam, Netherlands: North Holland Pub.Co., 1964.


Bibliography

Kraft, Richard H. P., et al. Four Evaluation Examples:Anthropological, Economic, Narrative and Portrayal.Chicago: Rand McNally, 1974.

Kratochwill, Thomas R. Single Subject Research. NewYork: Academic Press, 1978.

Kyle, Diane Wells, and Dorene D. Ross. “EvaluatingQualitative Research: Criteria and Their Application.”Paper presented at the Evaluation Network annualmeeting, Chicago, Ill., October 1983.

Lazarsfeld, Paul F., and Allen H. Barton. “QualitativeMeasurement in the Social Sciences: Classification,Typologies and Indices.” The Policy Sciences: RecentDevelopments in Scope and Methods, eds. DanielLerner and Harold D. Lasswell, pp. 155-92. Stanford,Calif.: Stanford University Press, 1951.

Lecompte, Margaret D., and Judith P. Goetz.“Problems of Reliability and Validity in EthnographicResearch.” Review of Educational Research, 152(Spring 1982), 31-60.

Levine, Harold G. “Principles of Data Storage andRetrieval for Use in Qualitative Evaluations.” Paperpresented at the American Educational ResearchAssociation meeting, Montreal, Canada, April, 1983.

Lewy, Arieh, and Marvin, Alkin, The Impact of a MajorNational Evaluation Study: Israel’s Van Leer Report.Los Angeles: University of California,

Center for the Study of Evaluation, InternationalMonograph Series in Evaluation, April 1983.

Light, Richard J. “Six Evaluation Issues ThatSynthesis Can Resolve Better Than Single Studies.”Issues in Data Synthesis, eds. William H. Yeaton and


Bibliography

Paul M. Wortman, pp. 57-74. San Francisco:Jossey-Bass, 1984.

Light, Richard J., and David B. Pillemer. Summing Up:The Science of Reviewing Research. Cambridge,Mass.: Harvard University Press, 1984.*

Lijphart, Arend. “Comparative Politics and theComparative Method.” American Political ScienceReview, 65 (1971), 682-93.

Lincoln, Yvonna S. “Strategies for Inquiring About theDependability (Reliability) of Naturalistic Studies.”Paper presented at the Evaluation Research Societymeeting, Austin, Texas, September 30-October 3,1981.

Lipset, Seymour Martin, Martin A. Trow, and James S.Coleman. Union Democracy. Glencoe, Ill.: Free Press,1956.

Lofland, J. Analyzing Social Settings. Belmont, Calif.:Wadsworth, 1971.

Lucas, W. The Case Survey Method of AggregatingCase Experience. Santa Monica, Calif.: Rand, 1974.

McCall, George J., and J. L. Simmons, eds. Issues inParticipant Observation: A Text and Reader. Reading,Mass.: Addison-Wesley, 1969.

McClintock, Charles C., Diane Brannon, and StevenMaynard-Moody. “Applying the Logic of SampleSurveys to Qualitative Case Studies: The Case ClusterMethod.” Administrative Science Quarterly, 24 (1979),612-29.

McDaniels, Garry, et al. “Case Studies of Children inHead Start Planned Variation, 1970-71.” University of


Bibliography

Maryland, College of Education, College Park, Md.,1972.

McGowan, Eleanor, and David Cohen. “RationalFantasies.” Policy Sciences Journal, 1 (1979), 439-54.

Marsh, Robert M. “The Bearing of ComparativeAnalysis on Sociological Theory.” Social Forces, 43(1964), 191-96.

Miles, Matthew B. “Qualitative Data as an AttractiveNuisance.” Administrative Science Quarterly, 24(1979), 590-601.

Miles, Matthew B. “A Mini-Cross-Site Analysis.”American Behavioral Scientist, 26:1 (1982), 121-32.

Miles, Matthew B., and A. M. Huberman. QualitativeData Analysis: A Sourcebook of New Methods.Beverly Hills, Calif.: Sage, 1984.*

Mill, John Stuart. A System of Logic, 8th ed. London,England: Longmans Green, 1972.

Mintzberg, Henry. “The Emerging Strategy of ’Direct’Research.” Administrative Science Quarterly, 24(1979), 582-89.

Moberg, P. “The Collection and Analysis of QualitativeData in Evaluation Research.” Paper presented at theNational Conference on Evaluation in Alcohol, DrugAbuse, and Mental Health Programs, Washington,D.C., 1974.

Mosteller, F., and D. L. Wallace. Inference andDisputed Authorship: The Federalist. Reading, Mass.:Addison-Wesley, 1964.


Bibliography

Mosteller, F., et al. “The Pre-Election Polls of 1948.”Social Science Research Council Bulletin, 60 (1949).

Mulhauser, Frederick, “Ethnography and PolicyMaking: The Case of Education.” HumanOrganization, 3 (1975), 311-15.

National Science Foundation. Case Studies in ScienceEducation, Vols. 1 and 2. Washington, D.C.: U.S.Government Printing Office, 1978.

National Science Foundation. Cooperative Science: ANational Study of University and IndustryResearchers. Assessment of the Industry/UniversityCooperative Research Projects Program, vols. 1 and 2.Washington, D.C.: November 1984.

Neustadt, R. E., and H. V. Fineberg. The Swine FluAffair: Decision Making on a Slippery Disease.Washington, D.C.: U.S. Government Printing Office,1978.

Office of Technology Assessment. Assessing theEfficacy and Safety of Medical Technologies.Washington, D.C.: U.S. Government Printing Office,1978.

Paige, Glenn. “Problems and Use of the Single Case inPolitical Research.” Ph.D. diss., NorthwesternUniversity, Evanston, Ill., 1959.

Patton, Michael Quinn. Qualitative EvaluationMethods. Beverly Hills, Calif.: Sage, 1980.*

Pelto, Pertti J., and Gretel H. Pelto. AnthropologicalResearch: The Structure of Inquiry, 2nd ed.Cambridge, England: Cambridge University Press,1978.


Bibliography

Philadelphia Inquirer. “Anti-Terrorism Videos: AirlinePersonnel Say FAA-Ordered Training Program IsBoring, Ineffective.” April 29, 1986.

Pierce, William Spangar. Bureaucratic Failure andPublic Expenditure. New York: Academic Press, 1981.

Piore, Michael J. “Qualitative Research Techniques inEconomics.” Administrative Science Quarterly, 24(1979), 560-69.

Platt, J. R. “Strong Inference.” Science, 146 (1964),347-53.

Plog, Michael. “The Use of Case Study Methodology.”State Evaluation Network Newsletter, 1:2 (1980), 5-6.

Popkewitz, Thomas S., and Robert B. Tabachnick.The Study of Schooling: Field-Based Methodologies inEducational Research and Evaluation. New York:Praeger, 1981.

Pressman, Jeffrey L., and Aaron Wildavsky.Implementation. Berkeley, Calif.: University ofCalifornia Press, 1973.*

Przeworski, Adam, and Henry Teune. The Logic ofComparative Social Inquiry. New York: John Wileyand Sons, 1970.

Rist, Ray C. “On the Relations Between EducationalResearch Paradigms: From Disdain to Detente.”Anthropology and Education Quarterly, 8 (1977),42-49.

Rist, Ray C. Earning and Learning: Youth EmploymentPolicies and Programs. Beverly Hills, Calif.: Sage,1981a.


Bibliography

Rist, Ray C. “On the Utility of Ethnographic Researchfor the Policy Process.” Urban Education, 15 (1981b),485-94.

Rist, Ray C. “Mandating Collaboration ThroughFederal Legislation: YEDPA and the CETA-SchoolLinkage.” Research in Sociology of Education andSocialization, vol. 3, ed. R. Corwin, pp. 187-205. NewYork: JAI Press, 1982a.

Rist, Ray C. “On the Application of EthnographicInquiry to Education: Procedures and Possibilities.”Journal of Research in Science Teaching, 19:6(1982b), 439-50.

Rist, Ray C. “Beyond the Quantitative Cul-de-Sac: AQualitative Perspective on Youth EmploymentPrograms.” Applied Poverty Research, eds. RichardGoldstein and Stephen M. Sachs, pp. 123-38. Totawa,N.J.: Rowman and Allanheld, 1984.

Roncek, Dennis W., and Gail Weinberger.“Neighborhoods of Leased Public Housing.”Evaluation Review, 5 (1981), 231-44.

Sanday, Peggy R. “The Ethnographic Paradigm(s).”Administrative Science Quarterly, 24 (1979), 527-38.

Scheirer, Mary Ann, and Eva L. Resmovic. “Measuringthe Degree of Program Implementation.” EvaluationReview, 7 (1983), 599-633.

Scriven, Michael. “Objectivity and Subjectivity inEducational Research.” Philosophical Redirection inEducational Research, ed. L. G. Thomas. Chicago:University of Chicago Press, 1972.

Searle, Barbara (ed.). Evaluation in World BankEducation Projects: Lessons from Three Case Studies,


Bibliography

Report EDT5. Washington, D.C.: World Bank,July 1985.

Sechrest, Lee, ed. Unobtrusive Measurement Today.San Francisco, Calif.: Jossey-Bass, 1979.

Shapiro, E. “Educational Evaluation: Rethinking theCriteria of Competence.” School Review, 81 (1973),523-49.

Sieber, Sam D. “The Integration of Fieldwork andSurvey Methods.” American Journal of Sociology, 78:6(1973), 1335-59.

Sjoberg, Gideon. “The Comparative Method in theSocial Sciences.” Philosophy of Science, 22 (1955),106-17.

Smith, Allen G., and Karen S. Louis, eds.“Multimethod Policy Research: Issues andApplications.” American Behavioral Scientist, 26:1(1982), 1-144.

Smith, Louis M. “An Aesthetic Education Workshopfor Administrators: Some Implications for a Theory ofCase Studies.” Paper presented at the AmericanEducational Research Association meeting, Chicago,Ill., 1974.

Smith, Louis M. “An Evolving Logic of ParticipantObservation, Educational Ethnography and OtherCase Studies.” Review of Research in Education, 6(1979), 316-77.

Smith, Louis M., and S. Schumacher. Extended PilotTrials of the Aesthetic Education Program: AQualitative Description. St. Louis: CEMREL, 1972.


Bibliography

Snow, Richard E. “Representative andQuasi-representative Designs for Research onTeaching.” Review of Educational Research, 44(1974), 265-91.

Spirer, Janet E. The Case Study Method: Guidelines,Practices and Applications for Vocational Education.Columbus, Ohio: National Center for Research inVocational Education, 1980.

Stake, Robert E. “The Case Study Method in SocialInquiry.” Educational Researcher, 7 (1978), 5-8.

Stake, Robert E., and J. Easley, eds. Case Studies inScience Education. Urbana, Ill.: Center forInstructional Research and Curriculum Evaluation,1978.

Stein, Harold. Public Administration and PolicyDevelopment: A Case Book. New York: HarcourtBrace Jovanovich, 1952.

Stenhouse, Lawrence. “Case Study in ComparativeEducation: Particularity and Generalizability.”Comparative Education, 15:1 (1979), 5-10.

Trend, M. G. “On the Reconciliation of Qualitative andQuantitative Analyses: A Case Study.” HumanOrganization, 37 (1978), 345-54.

Trow, Martin. “Comment on Participant Observationand Interviewing: A Comparison.” HumanOrganization, 16 (1957), 33-35.

Turner, B. A. “Some Practical Aspects of QualitativeData Analysis: One Way of Organizing the CognitiveProcesses Associated with the Generation ofGrounded Theory.” Quality and Quantity, 15 (1981),225-47.


Bibliography

University of Sussex, Social Science Research PolicyUnit. Success and Failure in Industrial Innovation.London, England: Center for the Study of IndustrialInnovation, 1972.

U.S. General Accounting Office. Lands in the LakeChelan National Recreation Area Should Be Returnedto Private Ownership, GAO/CED-81-10. Washington,D.C.: January 22, 1981.

U.S. General Accounting Office. Housing Block GrantActivity in Pittsburgh: A Case Study, CED-82-52.Washington, D.C.: March 24, 1982.

U.S. General Accounting Office. Housing Block GrantActivity in Seattle: A Case Study, CED-82-60.Washington, D.C.: March 30, 1982.

U.S. General Accounting Office. Review of theOperations of the Sea Island Comprehensive HealthCare Corporation and the Franklin C. Fetter FamilyHealth Center, HRD-82-69. Washington, D.C.: April 23,1982.

U.S. General Accounting Office. Housing Block GrantActivity in Dallas: A Case Study, CED-82-75.Washington, D.C.: April 30, 1982.

U.S. General Accounting Office. Content Analysis: AMethodology for Structuring and Analyzing WrittenMaterial. Methodology transfer paper 3. Washington,D.C.: June 1982.

U.S. General Accounting Office. Cleaning Up theEnvironment: Progress Achieved but MajorUnresolved Issues Remain, vols. 1 and 2, CED-82-72.Washington, D.C.: July 21, 1982.


Bibliography

U.S. General Accounting Office. Block Grants forHousing: A Study of Local Experiences and Attitudes,GAO/RCED-83-21. Washington, D.C.: December 13,1982.

U.S. General Accounting Office. How Well Do theMilitary Services Perform Jointly in Combat? DOD’sJoint Test-and- Evaluation Program Provides FewCredible Answers, GAO/PEMD-84-3. Washington,D.C.: February 22, 1984.

U.S. General Accounting Office. An Evaluation of the1981 AFDC Changes: Initial Analyses,GAO/PEMD-84-6. Washington, D.C.: April 2, 1984.

U.S. General Accounting Office. Need to BetterAssess Consequences Before Reducing TaxpayerAssistance, GAO/GGD-84-13. Washington, D.C.:April 5, 1984.

U.S. General Accounting Office. ImplementationStatus of the Office of Management and BudgetCircular A-76 Program at the Department of theInterior’s National Park Service and Bureau ofReclamation, GAO/RCED-85-56. Washington, D.C.:March 15, 1985.

U.S. General Accounting Office. Projects Funded inNortheast Texas by the Emergency JobsAppropriations Act of 1983, GAO/HRD-85-42.Washington, D.C.: March 26, 1985.

U.S. General Accounting Office. Projects Funded inthe Montgomery, Alabama, Metropolitan Area by theEmergency Jobs Appropriations Act of 1983,GAO/HRD-85-59. Washington, D.C.: May 7, 1985.

U.S. General Accounting Office. Projects Funded in Fresno County, California by the Emergency Jobs Appropriations Act of 1983, GAO/HRD-85-90. Washington, D.C.: August 27, 1985.

U.S. General Accounting Office. Projects Funded in South Central Georgia by the Emergency Jobs Appropriations Act of 1983, GAO/HRD-85-98. Washington, D.C.: September 25, 1985.

U.S. General Accounting Office. Improving Operating and Staffing Practice Can Increase Productivity and Reduce Costs in SSA’s Atlanta Region, GAO/GGD-85-85. Washington, D.C.: September 11, 1985.

U.S. General Accounting Office. Emerging Issues in Export Competition: A Case Study of the Brazilian Market, GAO/NSIAD-85-121. Washington, D.C.: September 26, 1985.

U.S. General Accounting Office. Information on the Forest Service’s Efforts to Control the Spread of the Western Spruce Budworm in the Carson National Forest, GAO/RCED-86-6. Washington, D.C.: October 30, 1985.

U.S. General Accounting Office. Department of Commerce’s Second-Year Efforts to Implement the Federal Manager’s Financial Integrity Act, GAO/RCED-86-21. Washington, D.C.: November 5, 1985.

U.S. General Accounting Office. Emergency Jobs Act of 1983: Projects Funded in the Lawrence-Haverhill, Massachusetts, Area, GAO/HRD-86-30. Washington, D.C.: December 6, 1985.

U.S. General Accounting Office. Using Statistical Sampling. Methodology transfer paper 6. Washington, D.C.: May 15, 1986.


U.S. General Accounting Office. Bigeye Bomb: An Evaluation of DOD Chemical and Development Tests, GAO/PEMD-86-12BR. Washington, D.C.: May 23, 1986.

U.S. General Accounting Office. Foreign Representation: Former High-Level Federal Officials Representing Foreign Interests, GAO/NSIAD-86-175BR. Washington, D.C.: July 11, 1986.

U.S. General Accounting Office. Social Security: Improved Telephone Accessibility Would Better Serve the Public, GAO/HRD-85-86. Washington, D.C.: August 29, 1986.

U.S. General Accounting Office. The Nation’s Water Quality: Key Unanswered Questions About the Quality of Rivers and Streams, GAO/PEMD-86-6. Washington, D.C.: September 19, 1986.

U.S. General Accounting Office. Cargo Imports: Customs Need to Better Assure Compliance with Trade Laws and Regulations, GAO/GGD-86-136. Washington, D.C.: December 1986.

U.S. General Accounting Office. Water Quality: An Evaluation Method for the Construction Grants Program—Methodology, GAO/PEMD-87-4A. Washington, D.C.: December 17, 1986a.

U.S. General Accounting Office. Water Quality: An Evaluation Method for the Construction Grants Program—Case Studies, GAO/PEMD-87-4B, vol. 2. Washington, D.C.: December 17, 1986b.

U.S. General Accounting Office. Medical Malpractice: Six Case Studies Show Claims and Insurance Costs Still Rise Despite Reforms, GAO/HRD-87-21. Washington, D.C.: December 31, 1986.


U.S. General Accounting Office. Parks and Recreation: Construction Contract at Jean LaFitte National Historical Park, GAO/RCED-86-232FS. Washington, D.C.: September 26, 1987.

U.S. General Accounting Office. Parks and Recreation: Concerns Raised About National Park Service Actions at Delaware Water Gap, GAO/RCED-87-24BR. Washington, D.C.: October 28, 1987.

VanderPutten, Elizabeth. “Toward a Theory of Loosely Coupled Systems.” Ph.D. diss., George Washington University, Washington, D.C., 1983.

Van Maanen, John. “The Fact of Fiction in Organizational Ethnography.” Administrative Science Quarterly, 24 (1979), 539-50.

Van Maanen, John. Qualitative Methodology. Beverly Hills, Calif.: Sage, 1983.

Vidich, Arthur J., and Gilbert Shapiro. “A Comparison of Participant Observation and Survey Data.” American Sociological Review, 20 (1955), 28-33.

Voss, Harwin L. “Pitfalls in Social Research: A Case Study.” American Sociologist, 1 (1966), 136-40.

Wax, Rosalie. Doing Field Work: Warnings and Advice. Chicago: University of Chicago Press, 1971.

Weatherly, R., and M. Lipsky. “Street Level Bureaucrats and Institutional Innovation: Implementing Special-Education Reforms.” Harvard Educational Review, 47 (May 1977), 171-97.


Webb, Eugene, and Karl E. Weick. “Unobtrusive Measures in Organizational Theory: A Reminder.” Administrative Science Quarterly, 24 (1979), 650-59.

Weiss, Robert S., and M. Rein. “The Evaluation of Broad Aim Programs: Experimental Design, Its Difficulties and an Alternative.” Administrative Science Quarterly, 15 (1970), 97-109.

Wilson, Steve. “The Use of Ethnographic Methods in Educational Evaluation.” Human Organization, 36 (1977), 2.

Wilson, Steve. “Explorations of the Usefulness of Case Study Evaluations.” Evaluation Quarterly, 3 (1979), 446-59.

Yeaton, William H. “The Case Study Crisis: Some Answers.” Administrative Science Quarterly, 26 (1981b), 58-66.

Yeaton, William H., Eveleen Bingham, and Karen A. Heald. “The Difference That Quality Makes: The Case of Literature Reviews.” Sociological Methods and Research, 5 (1976), 139-56.

Yeaton, William H., and Paul M. Wortman. Issues in Data Synthesis. San Francisco: Jossey-Bass, 1984.

Yin, Robert K. Case Study Research: Design and Methods, rev. ed. Beverly Hills, Calif.: Sage, 1989.*

Yin, Robert K., et al. A Review of Case Studies of Technological Innovation in State and Local Services. Santa Monica, Calif.: Rand, 1976.

Yin, Robert K., and Karen A. Heald. “Using the Case Survey Method to Analyze Policy Studies.” Administrative Science Quarterly, 20 (1975), 371-81.


Yin, Robert K., and Ingrid Heinsohn. Using the Research Sponsored by the AOA. Case Study No. 1. Transportation Service for the Elderly. Washington, D.C.: American Institute for Research, 1980.

Zelditch, Morris J. “Some Methodological Problems of Field Studies.” American Journal of Sociology, 67 (1962), 566-76.


Glossary

Backfill Techniques

Techniques used in cumulative case studies to collect information needed if the study is to be usable for aggregation; these techniques include, for example, obtaining missing information from the authors on how instances studied were identified and on the bases for instance selection.

Bias

The extent to which a measurement, sampling, or analytic method systematically underestimates or overestimates the true value of an attribute.
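
As an illustration only, the definition of bias can be sketched as the average signed error of repeated measurements against a known true value. The function name, the readings, and the true value of 100 are hypothetical and are not drawn from any GAO study.

```python
from statistics import mean


def estimate_bias(estimates, true_value):
    """Return the average signed error of the estimates.

    A positive result means the method tends to overestimate the
    attribute; a negative result means it tends to underestimate it.
    """
    return mean(estimates) - true_value


# Hypothetical data: five repeated measurements of an attribute
# whose true value is assumed to be 100.
readings = [103, 104, 102, 105, 101]
bias = estimate_bias(readings, 100)
print(bias)
```

Random error would scatter the readings on both sides of the true value; bias is the part of the error that does not average out.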

Case Study

A method for learning about a complex instance, based on a comprehensive understanding of that instance, obtained by extensive description and analysis of the instance, taken as a whole and in its context.

Convenience Sample

Instances selected solely on the basis of feasibility or ease of data collection. Such a sample is rarely useful in evaluation and is usually hazardous.

Construct

An attribute, usually unobservable, such as educational attainment or socioeconomic status, that is represented by an observable measure.

Construct Validity

The extent to which a measurement method accurately represents a construct and produces an observation distinct from that produced by a measure of another construct.

External Validity

The extent to which a finding applies (or can be generalized) to persons, objects, settings, or times other than those that were the subject of study.


Focused Interview

An interview organized around several predetermined questions or topics but providing some flexibility in the sequencing of the questions and without a predetermined set of response categories or specific data elements to be obtained.

Generalizability

Used interchangeably with “external validity.”

Internal Validity

The extent to which the causes of an effect are established by an inquiry.

Longitudinal Data

Sometimes called “time series data,” observations collected over a period of time; the sample (instances or cases) may or may not be the same each time.

Matrix of Categories

A method of displaying relationships among themes in analyzing case study data that shows whether changes in categories or degrees along one dimension are associated with changes in the categories of another dimension.
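
A minimal sketch of such a matrix, assuming each case has already been coded on two dimensions. The site labels, the dimension names ("training" and "implementation"), and the codes are hypothetical.

```python
from collections import Counter


def category_matrix(cases, row_key, col_key):
    """Cross-tabulate coded cases: rows are categories of one dimension,
    columns are categories of another, cells are counts of cases."""
    counts = Counter((case[row_key], case[col_key]) for case in cases)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    return {row: {col: counts.get((row, col), 0) for col in cols} for row in rows}


# Hypothetical coded case study sites.
sites = [
    {"site": "A", "training": "high", "implementation": "full"},
    {"site": "B", "training": "high", "implementation": "full"},
    {"site": "C", "training": "low", "implementation": "partial"},
    {"site": "D", "training": "low", "implementation": "full"},
    {"site": "E", "training": "low", "implementation": "partial"},
]

matrix = category_matrix(sites, "training", "implementation")
for row, cells in matrix.items():
    print(row, cells)
```

Reading across the rows shows whether a change in one dimension goes with a change in the other. With so few instances, a pattern of this kind suggests an association to examine further rather than establishing one.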

Normative Question

A type of evaluation question requiring comparison of what is happening (the condition) with norms, expectations, or standards for what should be happening (the criterion).

Open-Ended Interview

An interview in which, after an initial or lead question, subsequent questions are determined by topics brought up by the person being interviewed; the concerns discussed, their sequence, and specific information obtained are not predetermined and the discussion is unconstrained, able to move in unexpected directions.


Outliers

Instances that are aberrant or do not fit with other instances; instances that, compared to other members of a population, are at the extremes on relevant dimensions.

Program Effectiveness Evaluation

The application of scientific research methods to estimate how much observed results, intended or not, are caused by program activities. Effect is linked to cause by design and analyses that compare observed results with estimates of what might have been observed in the absence of the program.

Program Evaluation

The application of scientific research methods to assess program concepts, implementation, and effectiveness.

Purposive Sample

Instances appropriately selected to answer different evaluation questions, on various systematic bases, such as best or worst practices; a judgmental sample. If conducted systematically, purposive sampling can be widely useful in evaluation.

Qualitative Data

Information based on judgments (such as the estimated speed of a UFO), which may be expressed in numerical or nonnumerical ways, and data that may not be based on judgments (such as state of birth) but are not meaningfully expressed numerically. The data sources are often textual and observational and are expressed in words.

Quantitative Data

Information based on measures that do not rely on judgments and that are meaningfully measured. These are usually expressed numerically and often use continuous rather than discrete or categorical levels of measurement and scales with interval or ratio properties.

Reliability

The extent to which a measurement process produces similar results on repeated observations of the same condition or event.
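
One simple way to illustrate this definition, offered as an assumption rather than a method prescribed by this paper, is the share of observations on which two passes over the same material give the same result. The codes and the two passes below are hypothetical.

```python
def percent_agreement(first_pass, second_pass):
    """Return the share of observations coded identically in two passes."""
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must cover the same observations")
    if not first_pass:
        raise ValueError("no observations to compare")
    matches = sum(a == b for a, b in zip(first_pass, second_pass))
    return matches / len(first_pass)


# Hypothetical example: the same five records coded on two occasions.
first = ["met", "met", "not met", "met", "not met"]
second = ["met", "met", "not met", "not met", "not met"]

agreement = percent_agreement(first, second)
print(agreement)
```

Simple agreement does not adjust for matches that would occur by chance, so it is best read as a rough first check.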

Representative Sample

A sample that has approximately the same distribution of characteristics as the population from which it was drawn.

Simple Random Sample

A method for drawing a sample from a population such that all samples of a given size have equal probability of being drawn.
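
A minimal sketch of drawing such a sample with Python's standard library, whose `random.sample` selects without replacement. The population of ten numbered case files and the seed value are hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every possible sample of size n
    is equally likely. A seed makes the draw reproducible for review."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


# Hypothetical population: ten numbered case files.
case_files = list(range(1, 11))
sample = simple_random_sample(case_files, 3, seed=1990)
print(sample)
```

Recording the seed alongside the sample lets a reviewer reproduce exactly which instances were selected.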

Structured Interview

An interview in which questions to be asked, their sequence, and the detailed information to be gathered are all predetermined; used where maximum consistency across interviews and interviewees is needed.

Triangulation

The combination of methodologies in the study of the same phenomenon or construct; a method of establishing the accuracy of information by comparing three or more types of independent points of view on data sources (for example, interviews, observation, and documentation; different investigations; different times) bearing on the same findings. Akin to corroboration and an essential methodological feature of case studies.
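
The "three or more types" test in this definition can be sketched as a check on how many distinct kinds of source bear on each finding. The findings, the source labels, and the function name are hypothetical, and a count of this kind only flags candidates; it does not replace judging whether the sources are independent and actually agree.

```python
def corroborated(findings, min_source_types=3):
    """Return the findings supported by at least `min_source_types`
    distinct types of source (repeated types count once)."""
    return sorted(
        finding
        for finding, sources in findings.items()
        if len(set(sources)) >= min_source_types
    )


# Hypothetical evidence log: each finding lists the types of source behind it.
evidence = {
    "Intake backlog grew after the reorganization": [
        "interview", "observation", "documents",
    ],
    "Staff turnover caused the backlog": ["interview", "interview"],
}

print(corroborated(evidence))
```

In this hypothetical log, the second finding rests on interviews alone and so would need further evidence before being treated as triangulated.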


Yoked

Concurrent with. For example, data collection and analyses in case studies are iterative and concurrent; that is, they are yoked.


Papers in This Series

This is a flexible series continually being added to and updated. The interested reader should inquire about the possibility of additional papers in the series.

The Evaluation Synthesis. Transfer paper 10.1.2, formerly methods paper I.

Content Analysis: A Methodology for Structuring and Analyzing Written Material. Transfer paper 10.1.3, formerly methodology transfer paper 3.

Designing Evaluations. Transfer paper 10.1.4, formerly methodology transfer paper 4.

Using Structured Interviewing Techniques. Transfer paper 10.1.5, formerly methodology transfer paper 5.

Using Statistical Sampling. Transfer paper 10.1.6, formerly methodology transfer paper 6.

Developing and Using Questionnaires. Transfer paper 10.1.7, formerly methodology transfer paper 7.

Case Study Evaluations. Transfer paper 10.1.9, formerly methodology transfer paper 9.

Prospective Evaluation Methods: The Prospective Evaluation Synthesis. Transfer paper 10.1.10, formerly methodology transfer paper 10.


