Assessment Literacy and Performance-Based Assessments

Jennifer Borgioli, Learner-Centered Initiatives, Ltd.

Organizational Focus

Assessment to produce learning, and not just measure learning.

“Less than 20% of teacher preparation programs contain higher level or advanced courses in psychometrics (assessment design) or instructional data analysis.”

Inside Higher Education, April 2009

Do you honestly want to know what X exactly is? Is your life going to be improved by momentarily knowing what X is? No. Absolutely not. This whole problem is a conspiracy against hardworking American students. Let me tell you, solving for X right now is not going to stop the recession. In fact, it’s not going to do anything. And another thing. When have you ever had to know what X is in your long esteemed professional career? Exactly. This is a futile attempt for “educators” in this district to boast of their student’s success rate. I am going to go the rest of my life not knowing what X is. Because what is X when you really think about it? A letter, the spot, two lines crossing each other. I don’t think anyone will ever really know what X truly is because the essence of X is beyond our brain potential. In conclusion, Harry S. Truman’s middle name was just the letter S, not an actual name. Now that is a letter that’s actually being utilized. See, you learned something, and it was not because of this logarithm. The End.

Implications

Minimize interruptions.

Make them worthy.

To be assessment savvy….

1999 APA Testing Standards

“The higher the stakes of an assessment’s results, the higher the expectation for the documentation supporting the assessment design and the decisions made based on the assessment results.”

Assessment

• Definition: The strategic collection of evidence of student learning. (Martin-Kniep, 2005)

• Analogy: assessment : test :: dogs : pit bull

• A thing and a process

Traditional Assessment

Performance-Based Assessment

Performance-Based Assessments (PBAs)

A performance task is an assessment that requires students to demonstrate achievement by producing an extended written or spoken answer, by engaging in group or individual activities, or by creating a specific product. (Nitko, 2001)

Performance vis-à-vis traditional (Liskin-Gasparro, 1997; Mueller, 2008)

Attribute | Traditional | Performance
Assessment activity | Selecting a response | Performing a task
Nature of activity | Contrived | Emulates real life
Cognitive level | Knowledge/comprehension | Application/analysis/synthesis
Development of solution | Teacher-structured | Student-structured
Objectivity of scoring | Easily achieved | Difficult to achieve
Evidence of mastery | Indirect | Direct

Assessment considerations

Why? Purpose

Assessment of Learning

Assessment for Learning

For Whom? Audience

Student

Teacher

Parent

Administration (NYSED)

What? Learning Targets

Knowledge

Skills and Abilities

Reasoning

Dispositions

When? Timing

Periodic

Diagnostic

Formative

Summative

How? Types

Recall

Product

Performance

Process

Validity = Accuracy

How do we ensure alignment and validity in assessment?

Degrees of Alignment

S

1. The assessment clearly aligns to the target; the assessment and the target are almost the same.

2. The language of the standard is explicit.

3. You can confidently conclude the level of student learning/understanding of the target.

M

1. The assessment addresses the target; the target is included in the assessment but is not the primary focus.

2. The language of the standard is only partially used.

3. You need more data points to confidently infer the level of student learning/understanding of the target.

W

1. The assessment misses the target; it might prepare kids for the target, but doesn’t address it.

2. The language of the standards is missing or barely referenced.

3. You cannot assess level of student learning/understanding of the target.

If you want to assess your students’ ability to perform, design, apply, interpret. . .

. . . then assess them with a performance or product task that requires them to perform, design, apply, or interpret.

I cannot claim my assessment is valid if I do not have some type of articulated test map

Minimum

Articulated

New York State Learning Standard: Read to collect and interpret data, facts, and ideas from unfamiliar texts (4 items, 15% of test)

Item 23

• The student chose a response that completes the sentence with an inference that is related to another element in the passage but not to the specified detail

• The student chose a response that completes the sentence with an inference that is related to the main idea of the passage but not to the specified detail

• Correct Response: The student chose the correct response, demonstrating that the student can infer a detail from passage text

• The student chose a response that completes the sentence with an inference that may be based on prior knowledge and not supported by the passage

Item 24

• The student chose a response that describes a point of view that is mentioned in the passage, but that is not the author or narrator's point of view

• The student chose a response that describes a point of view that is related to passage content, but that is not stated or implied in the passage

• Correct Response: The student chose the correct response, demonstrating that the student can infer an author or narrator's point of view

• The student chose a response that describes a point of view that is contradicted by details in the passage

How many? 3–5

3 – 5 standards in a PBA (reflected in rows in the rubric)

3 – 5 items per standard on a traditional test
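The 3–5 guideline can be checked mechanically once a test map exists. The following is a minimal sketch, not a prescribed tool: the `test_map` dictionary, the standard codes (`R.1`, `R.2`, `R.3`), and the `check_item_counts` helper are all hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical test map: item number -> code of the standard the item targets.
test_map = {
    1: "R.1", 2: "R.1", 3: "R.1", 4: "R.1",
    5: "R.2", 6: "R.2",
    7: "R.3", 8: "R.3", 9: "R.3",
}

def check_item_counts(test_map, low=3, high=5):
    """Return {standard: item_count} for standards outside the low..high range."""
    counts = Counter(test_map.values())
    return {std: n for std, n in counts.items() if not low <= n <= high}

# Standards listed here have too few or too many items on this test.
flagged = check_item_counts(test_map)
print(flagged)
```

In this made-up map, the standard with only two items is the one that gets flagged; a real map would use the actual standards and item numbers from the test.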

Reliability = Consistency

I cannot claim my assessment is reliable if I do not have statistics to support my claim

Reliability

An indication of how consistently an assessment measures its intended target and the extent to which scores are relatively free of error. Low reliability means that scores cannot be trusted for decision making. Reliability is a necessary but not sufficient condition for validity.

Three general ways to collect evidence of reliability

• Stability: How consistent are the results of an assessment when given at two time-separated occasions?

• Alternate Form: How consistent are the results of an assessment when given in two different forms?

• Internal Consistency: How consistently do the test’s items function?
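Stability and alternate-form evidence are commonly summarized as a correlation between two sets of scores from the same students. The sketch below assumes that approach and uses made-up scores; `pearson_r` is an illustrative helper, not a reference to any particular gradebook or statistics package.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of the same length (2+ students)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary on both occasions")
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for the same five students on two occasions.
first_occasion = [72, 85, 90, 64, 78]
second_occasion = [70, 88, 91, 60, 80]

print(round(pearson_r(first_occasion, second_occasion), 2))
```

Values near 1 suggest students were ranked similarly both times; what counts as "high enough" depends on the stakes of the decision being made.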

Three Types of Measurement Error

• Subject effects

• Test effects

• Environmental effects

Subject Effects

Others…

• Fatigue

• Sleep deprivation

• Illness

• Disability

Testing Fatigue

Test Familiarity

Bias

Test Effects

Examples

• Not enough space for a response

• Confusing items

• Typos

• Misleading (or lacking) directions

• Scorer inconsistencies

10. Format the item vertically instead of horizontally.

From A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment by Haladyna, Downing, and Rodriguez

21. Place choices in logical or numerical order. Students should not have to hunt to find an answer. Answers should be provided in a logical, predictable pattern.

Compare with . . .

Final Eyes isn’t about editing; rather, it asks, “Is this what you want the students to see/read?”

From Haladyna:

26. Avoid All-of-the-above.

28. Avoid giving clues to the right answer, such as specific determiners including always, never, completely, and absolutely.

Develop Test Maps and Item Analysis Procedures

• The higher the stakes of an assessment, the more we need to play by the rules

• If it’s a mid-term or final exam, there should be a test map.

• Consider also:

– Item analysis

– Using choice E (primarily for pre-assessments)

Engage in peer review “Final Eyes”

– Is each item aligned to a standard?*

– Is each item rigorous?

– Is each item fair?

– Does each item have one, unambiguous correct key?

*– Are all plausible/text based?

– Are all tasks meaningful and build upon student comprehension?

*Very hard to answer without a test map

3. Develop Context-Dependent Item Sets for Content Areas

Test from Period 1

Test from Period 2

Environmental Effects

Cronbach’s Alpha

• “In statistics, Cronbach's α (alpha) is a coefficient of reliability. It is commonly used as a measure of the internal consistency or reliability of a psychometric test score for a sample of examinees. Alpha is not robust against missing data.”
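For readers who want to see the arithmetic, here is a minimal sketch of the standard alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The `cronbach_alpha` helper and the score matrix are hypothetical; because alpha is not robust against missing data, this sketch simply refuses incomplete rows rather than guessing.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per student, each row a list of item scores (no gaps)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != k or any(v is None for v in row) for row in scores):
        raise ValueError("missing data: every student needs a score on every item")
    # Variance of each item across students, and of each student's total score.
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical right/wrong (1/0) scores: four students, three items.
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
]
print(cronbach_alpha(scores))  # 0.75
```

A real classroom data set would have far more students and items; four rows are used here only so the calculation can be followed by hand.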

Item Analysis

“This isn’t familiar to me”

[Chart: Percent of Students Selecting Choice “E”, by item (items 1–30; vertical axis 0%–90%)]
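A tally like the one charted here can be produced from raw answer sheets. The sketch below assumes choice E is the “This isn’t familiar to me” option and uses invented responses; `percent_choosing` is an illustrative helper name, not part of any scoring system.

```python
def percent_choosing(responses, choice="E"):
    """Percent of students selecting `choice` on each item.

    responses: one list per student, holding that student's answer to each item.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        100 * sum(1 for row in responses if row[i] == choice) / n_students
        for i in range(n_items)
    ]

# Hypothetical pre-assessment: four students, three items.
responses = [
    ["A", "E", "C"],
    ["B", "E", "C"],
    ["A", "E", "E"],
    ["A", "D", "C"],
]
print(percent_choosing(responses))  # [0.0, 75.0, 25.0]
```

Items where a large share of students pick E point to content that has not been taught yet, which is why the option is mainly useful on pre-assessments.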

One assessment does not an assessment system make.

Fairness and Bias

Fair tests are accessible and enable all students to show what they know. Bias emerges when features of the assessment itself impede students’ ability to demonstrate their knowledge or skills.

In 1876, General George Custer and his troops fought Lakota and Cheyenne warriors at the Battle of the Little Big Horn. If there had been a scoreboard on hand, at the end of that battle which of the following score-board representations would have been most accurate?

A. Soldiers > Indians

B. Soldiers = Indians

C. Soldiers < Indians

D. All of the above scoreboards are equally accurate

What are other attributes of quality assessments?

WHEN DESIGNING A PRE/POST PERFORMANCE TASK

• the standards and thinking demands must stay the same.

• the modality that students express their thinking through must also stay the same.

• the content of the baseline and post must be different.

• the rubrics for the pre/post will be the same in terms of thinking and modality, but the content dimension will be different.

Jennifer Borgioli

[email protected]

@datadiva
