Page 1

OVERVIEW OF CRITERIA FOR EVALUATING EDUCATIONAL ASSESSMENT

Reliability (Chapter 3)

Validity (Chapter 4)

Absence-of-Bias (Chapter 5)

Page 2

Reliability of Assessment

Chapter 3 (pp. 61-82), W. James Popham

Page 3

RELIABILITY of assessment:

Standardized tests are evaluated in part by their reliability. However, an individual’s performances and responses vary from one occasion to another, even under controlled conditions.

“An individual’s score and the average score of a group will always reflect at least a small amount of measurement error” (qtd. in American Educational Research Association, 1999).

Page 4

RELIABILITY = CONSISTENCY

1. STABILITY RELIABILITY: the consistency of test results over time.

Test students on one occasion, wait a week or two, and test them again using the same assessment.

Compare the scores from the two testing occasions to determine the test’s stability (see Table 3.2, p. 65; a sketch of the computation follows below). (However, teachers don’t normally determine the reliability of their own classroom tests unless they’re doing research or evaluating an end-of-term test, etc.)
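The comparison between the two testing occasions is usually summarized as a correlation coefficient between the two sets of scores. Here is a minimal sketch in Python; the helper name, the scores, and the two-week interval are invented for illustration, not taken from the slides.

```python
def pearson_r(xs, ys):
    """Pearson correlation between two paired lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: the same 8 students take the same test twice,
# two weeks apart.
week_1 = [72, 85, 90, 64, 78, 88, 55, 70]
week_3 = [70, 88, 87, 66, 75, 90, 58, 73]

print(f"stability (test-retest) r = {pearson_r(week_1, week_3):.2f}")
```

A coefficient near 1.0 indicates stable scores; how high is high enough depends on the stakes of the decision riding on the test.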

Page 5

RELIABILITY = CONSISTENCY

“In general, if you construct your own classroom tests WITH CARE, those tests will be sufficiently reliable for the decisions you will base on the tests’ results” (Popham 75).

“In short, you need to be at least knowledgeable about the fundamental meaning of reliability, but I do not suggest you make your own classroom tests pass any sort of reliability muster” (i.e., inspection) (Popham 75).

Page 6

RELIABILITY = CONSISTENCY

2. ALTERNATE-FORM RELIABILITY: using two different test forms (Form A, Form B) that are allegedly equivalent.

To determine alternate-form consistency, give both test forms to the same individuals.

Compare the students’ scores from the two test forms, and decide on a level of performance you would consider passing.

Commercially published or state-developed tests should provide evidence that their alternate test forms are indeed equivalent (67); a sketch of a simple equivalence check follows.
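One common way to document equivalence (an assumed approach here, not spelled out on the slide) is to correlate the two forms’ scores and compare their means. A minimal sketch, reusing the pearson_r helper from the stability example, with invented scores:

```python
# Hypothetical data: the same 8 students take Form A and Form B.
form_a = [75, 82, 91, 60, 78, 85, 57, 69]
form_b = [73, 84, 89, 63, 80, 83, 59, 71]

r = pearson_r(form_a, form_b)        # alternate-form reliability
mean_a = sum(form_a) / len(form_a)
mean_b = sum(form_b) / len(form_b)

print(f"alternate-form r = {r:.2f}")
print(f"difference in form means = {mean_a - mean_b:+.1f} points")
```

A high correlation plus a near-zero mean difference is the kind of evidence a test manual could offer that the two forms are interchangeable.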

Page 7

RELIABILITY = CONSISTENCY

3. INTERNAL CONSISTENCY RELIABILITY: Are the items in a test doing their measurement job in a consistent manner?

The items on a test should be homogeneous, that is, designed to measure a single variable such as “reading achievement” or “mathematical problem solving” (69).

“The more items on an educational assessment, the more reliable it will tend to be” (69). (Example: a 100-item mathematics test will give you a more reliable fix on a student’s ability in math than a 20-item test; see the sketch below.)
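The slides don’t name a statistic, but Cronbach’s coefficient alpha is the standard internal-consistency estimate, and the Spearman-Brown formula is the standard way to quantify the more-items-more-reliability claim; neither name appears above, and the item data here are invented.

```python
def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(item_scores):
    """Coefficient alpha. item_scores holds one list per item,
    each containing one score per student."""
    k = len(item_scores)
    totals = [sum(student) for student in zip(*item_scores)]
    item_var = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_var / variance(totals))

def spearman_brown(r, length_factor):
    """Predicted reliability if a test is lengthened by length_factor."""
    return length_factor * r / (1 + (length_factor - 1) * r)

# Hypothetical 4-item quiz (1 = right, 0 = wrong) taken by 6 students.
items = [
    [1, 1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 1],
]
print(f"alpha = {cronbach_alpha(items):.2f}")

# The 100-vs-20-item claim: a 20-item test with r = 0.60, lengthened
# fivefold with comparable items, is predicted to reach:
print(f"predicted r for 100 items = {spearman_brown(0.60, 5):.2f}")  # 0.88
```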

Page 8

STANDARD ERROR OF MEASUREMENT (SEM)

An index of how consistent an individual’s scores would be if the student were given the same assessment again and again and again.

The higher the reliability of the test, the smaller the SEM will be (72).

(See the formula for SEM on p. 73; a sketch of the standard formula appears below.)
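The standard formula relates the SEM to the score spread and the reliability coefficient: SEM = SD × √(1 − reliability). This is presumably the formula on p. 73; a minimal sketch with assumed values:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Assumed values: score SD of 10 points, reliability coefficient 0.91.
print(f"SEM = {standard_error_of_measurement(10, 0.91):.1f}")  # -> 3.0
# Raising reliability shrinks the SEM, as the slide notes:
print(f"SEM = {standard_error_of_measurement(10, 0.99):.1f}")  # -> 1.0
```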

Page 9

STANDARD ERROR OF MEASUREMENT (SEM)

The SEM is linked to the way students’ performances on state accountability tests are reported in performance levels, for example:

(Exemplary, Exceeds Standards, Meets Standards, Approaching Standards, Academic Warning)

(Advanced, Proficient, Basic, Below Basic) (74).
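One way to make the link concrete (my gloss; the slide doesn’t spell it out) is that an observed score stands for a band of plausible scores roughly one SEM on either side, so a student whose band straddles a cut score could plausibly fall in either performance level. A sketch with assumed numbers:

```python
# Assumed: the "Meets Standards" cut score is 200; the SEM is 3 points.
CUT_SCORE = 200
SEM = 3.0

observed = 198
band = (observed - SEM, observed + SEM)   # roughly a 68% band

if band[0] < CUT_SCORE <= band[1]:
    print(f"score {observed}: band {band} straddles the cut score,")
    print("so the reported performance level is uncertain")
```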

Page 10

What do teachers really need to know about RELIABILITY?

Reliability, the consistency of measurement, is a central concept in assessment (75).

• Teachers may be called on to explain to parents the meaning of a student’s important test scores, and the reliability of the test is at stake.

• Teachers should be knowledgeable about the three types of reliability evidence. Test manuals should document that evidence.

• Popham doesn’t think teachers need to devote time to calculating the reliability of their own tests; however, teachers DO need to have a general knowledge of reliability and why it’s important (77).

Page 11

Validity

Chapter 4 (pp. 83-110), W. James Popham

Page 12

“VALIDITY is the most significant concept in assessment”

However, according to Popham, “There is no such thing as a ‘valid test’” (85). Rather, validity centers on the accuracy of the inferences that teachers make about their students through evidence gathered formally or informally (87).

Popham suggests that teachers focus on score-based (test-based) inferences and make sure those inferences are accurate, that is, valid (87).

Page 13

VALIDITY EVIDENCE

1. CONTENT RELATED: The content standards, or the skills and knowledge that need to be mastered, are represented on the assessment.

Teachers should focus their instructional efforts on content standards or curriculum aims.

Page 14

VALIDITY EVIDENCE

2. CRITERION RELATED: An assessment is used to predict how well a student will perform at some later point. An example is the relationship between (1) a student’s scores on an aptitude test taken in high school, such as the SAT or the ACT, and (2) the criterion those scores are meant to predict: how well the student is apt to perform in college (97).

However, Popham says that “these tests are far from perfect.” In fact, he suggests that “25% of students’ college grades can be explained by their scores on aptitude tests. Other factors such as motivation and study habits account for the other 75%” (98). The arithmetic behind that 25% figure is sketched below.
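The 25% figure is a statement about variance explained: if the correlation between aptitude scores and college grades is about r = 0.5 (an implied value, not given on the slide), then r² = 0.25, so 25% of the variation in grades is accounted for by the test. A minimal check:

```python
r = 0.5                       # implied aptitude-grades correlation
variance_explained = r ** 2   # coefficient of determination
print(f"{variance_explained:.0%} of grade variance explained")  # -> 25%
```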

Page 15

VALIDITY EVIDENCE

3. CONSTRUCT RELATED: The test items are accurately measuring the construct they are supposed to be measuring. A construct, such as a student’s ability to write an essay, is accurately assessed.

Page 16

What do teachers really need to know about VALIDITY?

Remember that it is not the ‘test’ that is valid, but the score-based inferences drawn from it. “And because score-based inferences reflect judgments made by people, some of those judgments will be in error” (Popham 106).

• Again, Popham doesn’t think teachers should worry about gathering validity evidence themselves; however, they DO need to have a reasonable understanding of the three types of validity evidence. They must be especially concerned about content-related validity.

Page 17

Absence-of-Bias

Chapter 5 (pp. 111-137), W. James Popham

Page 18

“Assessment bias” is:

Any element in an assessment procedure that offends or unfairly penalizes students because of personal characteristics; these characteristics include students’ gender, race, ethnicity, socioeconomic status, religion, etc. (111).

Page 19

Examples of “offensive” content on test items:

Only males are shown in high-paying, prestigious positions (attorneys, doctors), while women are portrayed in low-paying, less prestigious positions (housewives, clerks).

Word problems are based on competitive sports, using mostly boys’ names, suggesting that girls are less skilled.

Page 20

Examples of “unfair penalization” on test items:

Problem-solving items that deal with experiences such as attending operas and symphonies may advantage children from affluent families over lower-socioeconomic students who may lack these experiences (113).

Page 21

CHILDREN WITH DISABILITIES AND FEDERAL LAW

1975: The Education for All Handicapped Children Act (Public Law 94-142) introduced the “Individualized Education Program” (IEP).

1997: The Individuals with Disabilities Education Act (IDEA).

2002: Special education was significantly altered by No Child Left Behind (NCLB), which required adequate yearly progress (AYP) not only for students overall, but for subgroups including children with disabilities (123).

Page 22

ENGLISH LANGUAGE LEARNERS (ELL)

Students whose first language is not English and who know little English, if any;

Students who are beginning to learn English;

Students who are proficient in English but need additional assistance.

Popham states, “To provide equivalent tests in English and all other first languages spoken by students is an essentially impossible task” (130).

Page 23

ASSESSMENT MODIFICATIONS?

Today’s federal laws oblige teachers to test ELL students and students with disabilities on the same curricular goals as all other children.

Assessment accommodations or modifications are neither completely satisfactory nor completely accurate.

“Absence-of-bias is especially difficult to attain in cases of ELL students and students with disabilities” (Popham 133).

Page 24

OVERVIEW OF CRITERIA FOR EVALUATING EDUCATIONAL ASSESSMENT

Reliability (Chapter 3)

Validity (Chapter 4)

Absence-of-Bias (Chapter 5)