
Evaluating Evidence about Educational Programs (EEEP): The Handbook

June 1, 2012

A Knowledge Network for Applied Education Research-funded Project

of the Association of Educational Researchers of Ontario and the Ontario Institute for Studies in Education

Visit http://www.aero-aoce.org/eeep.html for more information


Table of Contents

Preface: About the Evaluating Evidence about Educational Programs (EEEP) Project
Introduction
    Effectiveness, Equity and Efficiency
DECISION TREE
PHASE 1: FINDING THE EVIDENCE
PHASE 2: SORTING AND ASSESSING THE EVIDENCE
    1. Research Studies & Reviews
        Section A: Description of Research Design Building Blocks
            Quantitative Designs
            Qualitative Designs
            Mixed Methods Designs
            Research Reviews
        Section B: Quality Indicators for Research Studies & Reviews
            Applicable to All Studies
            Applicable to all Quantitative Designs
            Applicable to Group Experiment Designs
            Applicable to Studies with Multiple Data Collections over Time
            Applicable to Quantitative Graphical Analysis
            Applicable to Correlational/Descriptive Designs
            Applicable to Qualitative Studies
            Applicable to Mixed Methods Studies
            Applicable to Research Reviews
    2. Professional Judgement
        Quality Indicators for Action Research
        Quality Indicators for Professional Opinion
    3. Media, Journalism & Anecdote
        Quality Indicators for Media, Journalism & Anecdote
PHASE 3: SUMMARIZE & SYNTHESIZE THE EVIDENCE


Preface: About the Evaluating Evidence about Educational Programs (EEEP) Project

The EEEP project is a collaboration between researchers from the Ontario Institute for Studies in Education (OISE) and the Association of Educational Researchers of Ontario (AERO). With funding from the Knowledge Network for Applied Education Research (KNAER), we have worked together since mid-2011 to develop a framework to help educators evaluate evidence about educational programs.

The purpose of this project is to support Ontario school boards in using evidence to inform their decision-making, especially about educational programs for students with special education needs. The EEEP process is intended to support preliminary investigations of educational programs that school boards may be thinking of adopting in the future. It focuses on gathering and evaluating existing evidence. This is, of course, only one part of a larger process, which may include community consultations and piloting of the program within the school board.

The target audience for the EEEP project is school board researchers and those who work with them to make evidence-based decisions. The project team has collaborated, through discussions and pilot testing, to refine the framework. One notable development has been a broadening from a focus on formal educational research to include other types of evidence. Another is the presentation of the framework as a ‘decision tree’ that focuses on three principles – effectiveness, efficiency, and equity – and three kinds of evidence – research; professional judgement; and media, journalism and testimonials.

The project team includes, from OISE, Christie Fraser and Jayme Herman, both doctoral students, Professor Ruth Childs, and Susan Elgie, and, from AERO, Kim Bennett, Stephanie Pagan, Susan Palijan, and Greg Rousell. Other AERO members have also contributed, most notably through a SIG meeting in late October 2011.

With questions or suggestions, please email Ruth Childs at [email protected].


Introduction

In the current policy context, educators are expected to base decisions on evidence. This approach requires that information about educational programs, innovations or practices be evaluated and described, sometimes a difficult undertaking, before it is considered by decision-makers. Evaluating Evidence about Educational Programs for Children with Special Needs (EEEP) is a process developed especially for use in Ontario and is intended as a support to educators. The EEEP process consists of three steps for handling evidence:

1) gathering and filtering, 2) sorting and assessing, and 3) summarizing and synthesizing.

There are three fundamental criteria that must be assessed at all steps: effectiveness, equity and efficiency (the 3 E’s). There are three types of evidence for which evaluative criteria are described: formal research; professional judgement; and journalism, media and anecdotes.

This document is intended to provide supporting material for users of the EEEP process. Immediately following are brief definitions of the 3 E’s. The document also contains sections on types and qualities of evidence from research, from professional judgement and from journalism, media and anecdotes.

Effectiveness, Equity and Efficiency

The EEEP process was developed within the context of educational policy and practice in Ontario. According to Ontario’s Education Act, “the purpose of education is to provide students with the opportunity to realize their potential and develop into highly skilled, knowledgeable, caring citizens who contribute to their society.”1 The Act provides further specification with its requirement that school boards “promote student achievement and well-being.”2 These goals are rather abstract; as educators, we interpret the Act to mean that elementary and secondary education must support:

• student cognitive, social, and psychological development;
• student mastery of knowledge and skills defined in the curriculum (supplemented for some students in an Individual Education Plan or IEP); and
• student well-being.

Each of the “3 E’s” can be defined as a reflection of these goals.

What does it mean for an education program to be effective? The Oxford Dictionary of English defines effective as “successful in producing a desired or intended result.”3 To understand educational effectiveness in the Ontario context, we need to consider a program’s objectives within the framework provided by the Education Act; that is, not only must the program meet its stated objectives, but those objectives must also advance one or more of the Ontario goals without detracting from others. For example, if a program raised achievement but students were miserable in class, it would not be seen as effective, since student well-being was reduced.

1 From Education Act, section 0.1 (2).
2 From Education Act, section 169.1 (1).
3 From Oxford Dictionary of English, Oxford University Press (Edited by Angus Stevenson), 2010, Oxford, UK: Author.


The second E is equity. Equitable programming provides students with equal opportunities to reach their potential, to achieve, and to experience well-being, regardless of their attributes, background or location. No single education program will be effective for every student. School boards therefore have the responsibility to provide multiple education programs, matched to the needs of their students. As the Ontario Human Rights Commission’s Guidelines on Accessible Education make clear, the Ontario Human Rights Code “guarantees the right to equal treatment in services, without discrimination on the ground of disability [and] education, in its broadest sense, is a ‘service’ within the meaning of the Code.”4 The requirement of equity implies fairness, not necessarily equality, in allocating educational resources and in defining and judging student outcomes. For example, some programs might allocate more resources to a low-achieving subgroup of students than to other students. If the program increased successful outcomes among the more highly resourced students and did not decrease successful outcomes among other students, it would (other aspects being equal) be equitable.

In addition to effectiveness and equity, school boards must consider a program’s efficiency – that is, the resource requirements of a program relative to other programs that might have similar levels of effectiveness and equity.5 That is not to say that a program should be rejected simply because it is costly. However, the Education Act also reminds us that school boards are responsible for “stewardship of the board’s resources.”6 Unless the Ministry of Education provides additional resources for a particular program, the resources allocated to one program leave fewer resources available for other programs. Thus, efficiency can have implications for equity. Especially where the effectiveness of two or more programs is similar, the resource requirements of each program should be considered.

4 From Guidelines on Accessible Education (p. 5), Ontario Human Rights Commission, 2004, Toronto: Author.
5 For a detailed discussion of the evaluation of efficiency, see pages 333-337 of Evaluation: A Systematic Approach (7th ed.), by P. Rossi, M. W. Lipsey, & H. E. Freeman, 2004, Thousand Oaks, CA: Sage.
6 From Education Act, section 169.1 (2).


DECISION TREE


PHASE 1: FINDING THE EVIDENCE

The first phase involves defining the scope of the evaluation, gathering the evidence and finally, filtering the evidence collected.

A clear and focussed definition of the scope of the evaluation should be developed at the outset, and may need adjustment as the project progresses. It is important to develop clarity about:

• the topic or program of investigation,
• special student attributes, if applicable,
• grade level,
• context (for example, community type, public/private, SES mix), and
• type of anticipated effect.

Information of this nature can usually be gleaned from the abstract or introduction of a paper or report. You might develop a template or spreadsheet page to record the scope information for each source for subsequent use.
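For illustration, here is a minimal sketch of such a template in Python, assuming hypothetical column names drawn from the scope elements listed above; adapt the fields to your own evaluation.

```python
import csv

# Hypothetical scope-record columns; adapt to your own evaluation.
SCOPE_FIELDS = [
    "source",              # citation, URL or document title
    "topic_or_program",
    "student_attributes",  # special student attributes, if applicable
    "grade_level",
    "context",             # community type, public/private, SES mix
    "anticipated_effect",
    "within_scope",        # Yes / No / Not sure
]

# Create an empty spreadsheet (CSV) with a header row; one row per source
# can then be added as evidence is gathered and filtered.
with open("eeep_scope_log.csv", "w", newline="") as f:
    csv.DictWriter(f, fieldnames=SCOPE_FIELDS).writeheader()
```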

Gathering evidence may involve a variety of activities: library and internet searches, consulting board colleagues and administrators, getting in touch with colleagues at other boards, attending conferences, collecting information from parents or other stakeholders, and checking newspaper and other archives. Search as broadly as you can! If there are time or other constraints, try nonetheless to search a wide variety of sources and media. At this step, you need not read deeply but simply note information relevant to the scope.

Filtering the evidence to retain only evidence relevant to the identified scope will result in considerable time savings and will also help to keep the project progressing in a consistent and focussed direction. You will in the end have a considerable information base for reference, but will study in detail and assess only the information relevant to the scope of your evaluation.


PHASE 2: SORTING AND ASSESSING THE EVIDENCE

1. Research Studies & Reviews

Research evidence is available in published or unpublished reports and also in other media such as conference presentations and websites. Published reports may appear in professional or research journals. Unpublished reports are often posted or circulated in other ways by school boards and professional organizations. Research reviews are summaries of research studies.

There are many approaches to carrying out educational research. Before you proceed to evaluating the research evidence, you will need to read through at least the abstract, so that you can decide which design building blocks were used. Then you will be able to choose the sets of quality indicators that apply. The building blocks are described in Section A.

In Section B there are several lists of quality indicators. The first list should be used with all studies; the others as they fit the research in question. Each point should be marked Yes, No, Not Sure or Not Applicable. At the end of the process, tally the number of Yes, No and Not Sure ratings separately, and then take their total. Yes ratings should account for at least 80% of that total for the research to be considered a ‘good’ piece of evidence and at least 90% for it to be considered ‘excellent’.
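As an illustration of the tally, here is a minimal sketch in Python; the 80% and 90% thresholds come from the paragraph above, while the exclusion of Not Applicable items from the total is an assumption based on the phrase ‘total potential ratings’.

```python
def rate_evidence(yes: int, no: int, not_sure: int) -> str:
    """Tally quality-indicator ratings and label the study.

    Not Applicable items are assumed to be excluded from the total.
    """
    total = yes + no + not_sure
    if total == 0:
        return "no applicable indicators"
    pct_yes = 100 * yes / total
    if pct_yes >= 90:
        return "excellent"
    if pct_yes >= 80:
        return "good"
    return "below the 'good' threshold"

# Example: 17 Yes, 2 No, 1 Not Sure out of 20 applicable indicators -> 85% -> "good"
print(rate_evidence(yes=17, no=2, not_sure=1))
```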

In addition, consider whether there is any rating or other aspect of the study that is a ‘deal breaker’ for you personally or for your organization. This kind of reaction tends to be quite specific to particular situations but is important for decision making. For example, a high rate of attrition in a group design may be considered a deal breaker for some.


Section A: Description of Research Design Building Blocks

The major designs are distinguished by the kind of data collected. Quantitative designs analyze numeric data, while qualitative research may involve a variety of non-numeric data formats, including text, video, audio and pictures. Mixed methods research comprises the collection and analysis of both qualitative and quantitative data. Action research may involve any kind of data and is distinguished from the other types because it is based in, and informs, a particular instructor’s practice.

Quantitative Designs

There are many possible quantitative designs, which have different assumptions, methods and quality indicators.

Group Comparison Designs

There are three basic kinds of group comparison designs (combinations are possible); they are distinguished by how the participants are chosen. Randomized Controlled Trials and Quasi-Experimental Designs are kinds of experiments, while designs that investigate ‘groups’ characterized by demographic or other characteristics are judged differently.

Randomized Controlled Trial

These are classic ‘experiments’, usually carried out in a laboratory setting. Often an experimental treatment group (the program group) is compared to a comparison or control group. The defining characteristic of this design is that participants must be randomly assigned to groups. The groups in this design consist of people who are treated in a similar way—the people may never actually meet.

Use the group experiment quality indicators.

Quasi-Experimental Design

These studies are based in the ‘real world,’ often in schools. The defining characteristic of this design is that the groups are intact groups, such as school classes; classes that receive a treatment are compared to classes that do not. The groups should be randomly assigned to treatments.

Use the group experiment quality indicators.

Comparison of demographic or diagnostic categories

These studies compare people who differ on a demographic or other characteristic, for example, studies that compare boys and girls, or students with learning disabilities and their typically achieving peers. Participants should be selected randomly from the appropriate sub-populations. The researcher does not have as active a role in forming or assigning groups, but does have control over sampling.

Use the correlational design quality indicators.


Designs Involving Multiple Data Collections over Time

Many designs involve multiple measures on individuals over time (waves of measurement).

Repeated-measures design

Studies employing repeated measures often involve an intervention, with measures taken at least pre- and post-intervention, and preferably also at a follow-up after a few months have passed. Such designs are also referred to as pre-post-follow-up designs. These studies are usually not longer than a year, often considerably shorter, and involve no more than five measures.

Longitudinal Studies

Longitudinal studies repeat tests or surveys over a period of years, with administrations taking place annually or more often. Examples are tracing ELL children’s vocabulary development from Grade 1 to high school graduation, the Youth in Transition Survey, which has surveyed a sample who were 15 in 1999 every two years since then, and studies of Response to Intervention (RTI) undertaken by teachers with children who have special needs.

Single-Case Design

Single-case designs involve intensive study of a small number of respondents or, very occasionally, groups of respondents. Thus each respondent constitutes a separate unit of study. A large number of observations are taken, often using a behaviour checklist, before, during and after treatment. Typically, treatment is withdrawn and reintroduced. The design requires three demonstrations of an experimental effect with a single participant or across different participants. The results are often displayed graphically and document a pattern of experimental control.

Correlational/Descriptive Designs

Correlational/descriptive designs involve quantitative data that may come from administrative or other records, tests or surveys. Designs may involve comparison of groups or may be a whole-sample analysis, and may also involve multiple measures over time. The focus is often more on discerning relationships among variables than on differences between groups, but may be simply descriptive.

If a correlational design involves repeated or group comparison elements, the quality indicators for those designs are also relevant.

Qualitative Designs

There are many qualitative research models; common to them is the search for meaning in the data, in what is often termed an interpretive stance. Researchers strive to understand the meanings the participants have constructed -- how they make sense of their experiences within the context.

Although there are a number of different qualitative models or schools, the major quality indicators apply to most—only one set of quality indicators is presented here.

Policy statements, student work, open-ended questionnaire responses and other text data may be analyzed using qualitative techniques such as coding or content analysis, even when researchers do not adhere to the full qualitative model.


Mixed Methods Designs

Mixed methods studies have both qualitative and quantitative components. Designs may be quite complicated, as connections between qualitative and quantitative methods may be made at any or all of the design, implementation, analysis and interpretation stages.

Research Reviews

Research reviews are summaries of the research evidence from a set of related individual papers or studies. A narrative review contains a critical appraisal of the state of research on a topic. Typically the literature is reviewed according to method, and usually also by sub-topic or theme.

A meta-analysis combines the results of several studies that address a set of related research questions. The term was originally defined for use in quantitative research; one common approach is to calculate a common measure of effect size over the studies. Some authorities also refer to qualitative meta-analyses.
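As a minimal sketch of that common approach, the following Python snippet pools study effect sizes using inverse-variance (fixed-effect) weighting; the numbers shown are illustrative only, not results from real studies.

```python
# Fixed-effect (inverse-variance) pooling of effect sizes: each study's
# weight is 1 / variance, so more precise studies count for more.
def pooled_effect(effects_and_variances):
    weights = [1.0 / var for _, var in effects_and_variances]
    weighted_sum = sum(w * es for (es, _), w in zip(effects_and_variances, weights))
    return weighted_sum / sum(weights)

# Illustrative values only: (effect size, variance) for each study.
studies = [(0.30, 0.02), (0.45, 0.05), (0.20, 0.01)]
print(round(pooled_effect(studies), 3))
```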


Section B: Quality Indicators for Research Studies & Reviews

In this section there are lists of quality indicators for:

• all studies,
• all quantitative designs,
    o group experiment designs,
    o studies with multiple data collections over time,
    o quantitative graphical analysis,
    o correlational/descriptive designs,
• qualitative studies,
• mixed methods studies, and
• research reviews.

Applicable to All Studies

All Studies: Background & Design

1. There is a review of the literature that places the research topic in a framework.

2. The research questions or problems are listed.

3. The research design is identified.

4. Exact criteria for inclusion of participants are provided and appropriate to the design.

5. The sampling procedure for the study is described.

6. Ethical considerations are discussed.

7. There is convincing information that the data collection method and/or instruments are valid; that is, the construct being measured is what is intended to be measured.

8. There is convincing information that the data collection method and/or instruments are reliable; that is, that they produce consistent and stable results.


All Studies: Description of Context, Participants and Procedures

9. Demographic and personal information about participants is provided. Description of participants is of a type and level of detail appropriate to the design.

10. Information is provided about the context of the study (e.g. community type, type of school or institution if applicable, another context if not).

11. If the study involves an intervention,

• The intervention must be described in sufficient detail as to be replicable.
• The treatment given to the comparison group (or the comparison treatment in a treatment withdrawal design) should be described in equal detail.
• Information is provided to enable the assessment of treatment integrity.

12. The data collection method is described in detail including information about the schedule of testing or observation.

13. Information is provided about attrition during the study and the rate is not high for the situation under investigation.

All Studies: Analysis & Interpretation

14. Details about the mechanics of data storage and analysis are provided (e.g., method of data input and analysis software).

15. The data analysis method is clearly described and appropriate.

16. Results are presented clearly.

17. Results are interpreted, with suggestions for application.

18. The results support the conclusion and the conclusions answer the research questions.

19. Limitations of the study and suggestions for further research are presented.

20. There is evidence of freedom from bias.


Applicable to all Quantitative Designs

1. Measures including tests, items and questions, if applicable, were appropriate and sufficient measures of the construct(s).

2. The mean and standard deviation are provided for each measure at each measurement occasion.

3. For test or other continuous data, information is provided about the inter-correlation of variables.

4. There is reference to the completeness of the data set and treatment of missing data.

5. The analysis is appropriate to the level of measurement.

6. The analysis is done at the appropriate level of analysis, usually the level at which the random sampling took place.

7. The analysis is described in such a way that the comparisons made and relationships explored are clear.

8. The summary of findings accurately reflects the results of the statistical tests.

9. Effect sizes are presented.


Applicable to Group Experiment Designs

1. Participants are appropriately selected and assigned to treatment.

• For Randomized Controlled Trial studies, participants are randomly assigned to treatment groups;
• For Quasi-Experimental Designs, pre-existing groups are randomly assigned to treatments.

2. If applicable, the groups are demographically comparable (similar in age, gender, SES).

3. If applicable, the groups are comparable on pre-treatment and inclusion measures (that is, group pre-test means are not separated by more than .5 of the pooled standard deviation); a sketch of this check follows this list.

4. The sample size is justified (e.g. power analysis or another convincing argument).

5. The intensity of the treatments each group receives is equal.

6. The attrition rates are not large, and do not differ between groups.
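A minimal sketch of the pre-test comparability check from indicator 3, assuming the usual pooled standard deviation formula; the group statistics shown are illustrative only.

```python
import math

def groups_comparable(mean1, sd1, n1, mean2, sd2, n2, max_diff=0.5):
    """Check whether two groups' pre-test means differ by no more than
    max_diff pooled standard deviations (0.5 by default, as in indicator 3)."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    standardized_diff = abs(mean1 - mean2) / pooled_sd
    return standardized_diff <= max_diff

# Illustrative pre-test statistics for a program group and a comparison group.
print(groups_comparable(mean1=52.0, sd1=10.0, n1=30, mean2=49.0, sd2=11.0, n2=32))
```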


Applicable to Studies with Multiple Data Collections over Time

1. Steps were taken to account for pre-test or repeated test priming, for example by using alternate test forms.

2. Steps were taken to account for participant maturation, for example, by using age-graded test forms.

3. Measures were made of outcomes directly and less directly related to the intervention (i.e. proximal and distal outcomes).

4. Measures were taken at appropriate times (e.g., baseline/pre-test and follow-up assessments are taken at appropriate times for the program under investigation).

5. Measures were taken on at least 3 occasions or situations.

6. If observers or raters are used, the rating scheme has been validated and the authors present evidence that raters used the rating scheme consistently over time.
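As one illustration of what such evidence might look like, the sketch below computes simple percent agreement between two raters; this particular statistic is an assumption, since authors may report other consistency measures such as Cohen's kappa.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Raters must rate the same items")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# Illustrative ratings for ten observation occasions.
rater_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
rater_b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]
print(percent_agreement(rater_a, rater_b))  # 0.8
```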

Applicable to Quantitative Graphical Analysis

1. The graphs are clearly labelled.

2. The axes of the graph display the entire range of values reasonable to expect.

3. Graphs are not cluttered.

4. For single case designs, graphs reflect and display the components of the study comprehensively.


Applicable to Qualitative Studies

1. The particular qualitative research design is identified and it is clear how the study fits the requirements of the tradition.

2. There is a description of the perspectives, values and philosophies of the researchers and their impact on the methodology.

3. Sufficient detail is provided about both participants and setting so that subsequent data can be understood in context.

4. There is a description of how descriptive and analytic categories have been derived and refined.

5. There is a discussion of how judgments and conclusions have been reached.

6. The richness and nature of the data are presented, with examples.

7. There is triangulation among data sources and/or analysis methods.

Applicable to Correlational/Descriptive Designs

1. The sample was chosen randomly OR includes all possible participants.

2. If types or categories of participants are compared, the groups are comparable on other variables (e.g. age, gender, SES, context).

Applicable to Mixed Methods Studies

1. There are convincing arguments for all components of the research design.

2. The study meets appropriate requirements of qualitative and quantitative designs for the respective study components.

3. There is a discussion of how the methods were integrated to meet the purpose of the study, preferably including a visual depiction.

4. The evidence from all data sources has been described and there is an integrated interpretation.


Applicable to Research Reviews

1. The question or aim of the review is clearly stated.

2. There is a detailed description of the methods used for searching for studies (listed search terms, databases searched, years selected).

3. The search was exhaustive, including both published and unpublished studies.

4. There is a clear rationale for including and excluding studies.

5. There are clear criteria and a rationale for assessing the quality of studies.

6. The data extraction methods are clearly delineated.

7. The primary studies are summarized appropriately using a narrative or quantitative summary.

8. There is a clear description of how the quality of the body of evidence was evaluated. The authors’ conclusions are supported by the evidence they presented.


2. Professional Judgement

Action research projects are undertaken by a practitioner in direct response to problems of practice. The research is carried out by the practitioner, and the end result is the incorporation of findings into practice. Results of action research are typically published in practice rather than research journals.

Quality Indicators for Action Research

1. The problem investigated stems from the problems and needs of those involved.

2. The action researcher has undertaken a self-reflective inquiry approach.

3. There is a description of the method used, and a description of the iterative process of action and reflection.

4. Evidence is provided about the data, the process and the reflection.

5. There is information about how practice was changed and suggestions for continued action.


The opinion of professionals or experts may be sought or expressed on an issue or program. Experts might include teachers, psychologists, psychiatrists, administrators, researchers, lawyers and others, depending on the issue.

Quality Indicators for Professional Opinion

1. There is evidence of expertise from the individual’s degrees or other formal qualifications.

2. The expert’s record shows skill, experience and/or publication in the field.

3. The expert’s knowledge is up-to-date.

4. The scope of the question at hand is within or mostly within the expert’s field.

5. The expert’s opinion was clear.

6. The opinion was given directly rather than quoted.

7. The opinion was given in the framework of a professional relationship.

8. The opinion is consistent with knowledge in the field, or if not, the expert explains why it is not.

9. The expert is personally reliable: honest, conscientious, and not biased.


3. Media, Journalism & Anecdote

Anecdotes are directly expressed by persons who have experienced a program or service, or by persons closely connected to them (parents might be in the latter category). Anecdotes often convey an opinion, either positive or negative.

Media and journalism often concern content about phenomena that the author has observed, and thus convey indirect experience or reported information. Some media and journalism may be opinion, but are written from a journalistic stance.

Some articles or statements may be considered both as journalism/media and as anecdote; an example would be a review of a program or course from someone who has experienced it.

Quality Indicators for Media, Journalism & Anecdote

1. The author is credible, either by direct experience with the program or issue or by background.

2. The content is current, explicit, and well-written.

3. The opinions expressed are objective and moderate.

4. The source includes references to supporting evidence.


PHASE 3: SUMMARIZE & SYNTHESIZE THE EVIDENCE

The purpose of gathering and evaluating various types of evidence is to determine whether a program is appropriate for an educational context. An appropriate program is defined as a program that is effective, equitable, and efficient (3 E’s). Any type of evidence may provide information as to whether a program is effective, equitable and efficient. The nature of the evidence will determine which of the 3 E’s the evidence supports or does not support. For example, quantitative research that compares an intervention group to a control group typically gives us information regarding the effectiveness of a program. Professional judgement, on the other hand, may speak to whether a program is effective and equitable because a teacher has had firsthand experience determining whether each of her students has equal access to the benefits of the program.

There is no hard and fast rule on the quantity of evidence needed to make an informed decision about a program. Any one piece of evidence cannot prove that an educational program is appropriate, but it can support or weaken a hypothesis about the program’s appropriateness. The quantity of evidence needed will depend on the quality and type of evidence. For example, it is recommended that you have more pieces of evidence when the evidence is of lower quality; if the evidence is of high quality, fewer pieces may suffice. Consistency of the evidence, whether all the evidence points in the same direction, is an additional consideration. When considering effectiveness, do all pieces of evidence indicate that the intervention is effective? Another consideration is how relevant the evidence is to your particular context: you may have efficiency information for a grade 3 classroom but be working in a grade 6 classroom. Finally, you need to consider the authorship of the evidence. Evidence produced solely by the creator of an educational program or by a single researcher may be biased; obtaining evidence from multiple sources is therefore preferable.
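As a closing illustration, the sketch below shows one possible way to organize a synthesis: each piece of evidence is recorded with its type, its quality rating from Phase 2 and what it suggests about each of the 3 E's, and the tallies can then inform, though not replace, the judgements described above. All entries and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical evidence log: each entry records the type of evidence, its
# quality rating from Phase 2, and what it suggests about each of the 3 E's
# ("supports", "does not support", or None if it provides no information).
evidence = [
    {"type": "research study", "quality": "good",
     "effectiveness": "supports", "equity": None, "efficiency": None},
    {"type": "professional judgement", "quality": "good",
     "effectiveness": "supports", "equity": "supports", "efficiency": None},
    {"type": "media/anecdote", "quality": "fair",
     "effectiveness": "does not support", "equity": None, "efficiency": None},
]

# Tally, for each E, how many pieces of evidence point in each direction.
summary = defaultdict(lambda: defaultdict(int))
for piece in evidence:
    for criterion in ("effectiveness", "equity", "efficiency"):
        finding = piece[criterion]
        if finding is not None:
            summary[criterion][finding] += 1

for criterion, counts in summary.items():
    print(criterion, dict(counts))
```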