Assessment Literacy and Performance-Based Assessments
Jennifer Borgioli
Learner-Centered Initiatives, Ltd.
Organizational Focus
Assessment to produce learning…
and not just measure learning.
“Less than 20% of teacher preparation programs contain higher level or advanced courses in psychometrics (assessment design) or instructional data analysis.”
Inside Higher Education, April 2009
Do you honestly want to know what X exactly is? Is your life going to be improved by momentarily knowing what X is? No. Absolutely not. This whole problem is a conspiracy against hardworking American students. Let me tell you, solving for X right now is not going to stop the recession. In fact, it’s not going to do anything. And another thing. When have you ever had to know what X is in your long esteemed professional career? Exactly. This is a futile attempt for “educators” in this district to boast of their student’s success rate. I am going to go the rest of my life not knowing what X is. Because what is X when you really think about it? A letter, the spot, two lines crossing each other. I don’t think anyone will ever really know what X truly is because the essence of X is beyond our brain potential. In conclusion, Harry S. Truman’s middle name was just the letter S, not an actual name. Now that is a letter that’s actually being utilized. See, you learned something, and it was not because of this logarithm. The End.
Implications
Minimize interruptions.
Make them worthy.
To be assessment savvy….
1999 APA Testing Standards
“The higher the stakes of an assessment’s results, the higher the expectation for the documentation supporting the assessment design and the decisions made based on the assessment results.”
Assessment
• Definition: The strategic collection of evidence of student learning. (Martin-Kniep, 2005)
• Analogy: assessment is to test as dog is to pit bull
• A thing and a process
Traditional Assessment
Performance-Based Assessment
Performance-Based Assessments (PBAs)
A performance task is an assessment that requires students to demonstrate achievement by producing an extended written or spoken answer, by engaging in group or individual activities, or by creating a specific product. (Nitko, 2001)
Performance vis-à-vis traditional: Liskin-Gasparro (1997) and Mueller (2008)
Attribute | Traditional | Performance
Assessment activity | Selecting a response | Performing a task
Nature of activity | Contrived | Emulates real life
Cognitive level | Knowledge/comprehension | Application/analysis/synthesis
Development of solution | Teacher-structured | Student-structured
Objectivity of scoring | Easily achieved | Difficult to achieve
Evidence of mastery | Indirect | Direct
Assessment considerations
Why? Purpose
Assessment of Learning
Assessment for Learning
For Whom? Audience
Student
Teacher
Parent
Administration (NYSED)
What? Learning Targets
Knowledge
Skills and Abilities
Reasoning
Dispositions
When? Timing
Periodic
Diagnostic
Formative
Summative
How? Types
Recall
Product
Performance
Process
Validity = Accuracy
How do we ensure alignment and validity in assessment?
Degrees of Alignment
S
1. The assessment clearly aligns to the target; the assessment and the target are almost the same.
2. The language of the standard is explicit.
3. You can confidently conclude the level of student learning/understanding of the target.
M
1. The assessment addresses the target; the target is included in the assessment but is not the primary focus.
2. The language of the standard is only partially used.
3. You need more data points to confidently infer the level of student learning/understanding of the target.
W
1. The assessment misses the target; it might prepare kids for the target, but doesn’t address it.
2. The language of the standard is missing or barely referenced.
3. You cannot assess the level of student learning/understanding of the target.
If you want to assess your students’ ability to perform, design, apply, interpret. . .
. . . then assess them with a performance or product task that requires them to perform, design, apply, or interpret.
I cannot claim my assessment is valid if I do not have some type of articulated test map
Minimum
Basic
Articulated
New York State Learning Standard: Read to collect and interpret data, facts, and ideas from unfamiliar texts (4 items, 15% of test)
23
The student chose a response that completes the sentence with an inference that is related to another element in the passage but not to the specified detail
The student chose a response that completes the sentence with an inference that is related to the main idea of the passage but not to the specified detail
Correct Response: The student chose the correct response, demonstrating that the student can infer a detail from passage text
The student chose a response that completes the sentence with an inference that may be based on prior knowledge and not supported by the passage
24
The student chose a response that describes a point of view that is mentioned in the passage, but that is not the author or narrator's point of view
The student chose a response that describes a point of view that is related to passage content, but that is not stated or implied in the passage
Correct Response: The student chose the correct response, demonstrating that the student can infer an author or narrator's point of view
The student chose a response that describes a point of view that is contradicted by details in the passage
How many? 3 – 5
3 – 5 standards in a PBA (reflected in rows in the rubric)
3 – 5 items per standard on a traditional test
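The test-map idea and the 3 – 5 guideline can be checked mechanically. The following is a minimal sketch, not the presenter's procedure: the standard labels, the item assignments, and the function name summarize_test_map are all hypothetical and invented for illustration.

```python
from collections import Counter

# Hypothetical test map: item number -> the standard the item is meant to measure.
# Labels and assignments are invented for illustration only.
test_map = {
    1: "Standard A", 2: "Standard A", 3: "Standard A", 4: "Standard A",
    5: "Standard B", 6: "Standard B", 7: "Standard B",
    8: "Standard C", 9: "Standard C",
}

def summarize_test_map(item_to_standard, low=3, high=5):
    """Return {standard: (item_count, share_of_test, within_guideline)}."""
    counts = Counter(item_to_standard.values())
    total = len(item_to_standard)
    return {
        standard: (n, n / total, low <= n <= high)
        for standard, n in counts.items()
    }

summary = summarize_test_map(test_map)
for standard, (n, share, ok) in sorted(summary.items()):
    flag = "ok" if ok else "outside the 3-5 guideline"
    print(f"{standard}: {n} items, {share:.0%} of test ({flag})")
```

In this made-up map, the standard with only two items is flagged, which is the kind of gap a test map is meant to surface before the test is given.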
Reliability = Consistency
I cannot claim my assessment is reliable if I do not have statistics to support my claim
Reliability
Indication of how consistently an assessment measures its intended target and the extent to which scores are relatively free of error. Low reliability means that scores cannot be trusted for decision making. Necessary but not sufficient condition to ensure validity.
Three general ways to collect evidence of reliability
• Stability: How consistent are the results of an assessment when given at two time-separated occasions?
• Alternate Form: How consistent are the results of an assessment when given in two different forms?
• Internal Consistency: How consistently do the test’s items function?
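One common way to put a number on stability or alternate-form consistency is to correlate the two sets of scores from the same students. The sketch below computes a Pearson correlation by hand; the scores are hypothetical, and the slides do not prescribe this particular statistic.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary on both occasions")
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five students on two time-separated occasions.
first_occasion = [12, 15, 9, 18, 14]
second_occasion = [13, 14, 10, 17, 15]

stability = pearson_r(first_occasion, second_occasion)
print(f"Stability (test-retest correlation): {stability:.2f}")
```

The same function applies to alternate forms: pass the scores from Form A and Form B instead of the two occasions.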
Three Types of Measurement Error
• Subject effects
• Test effects
• Environmental effects
Subject Effects
Others…
• Fatigue
• Sleep deprivation
• Illness
• Disability
Testing Fatigue
Test Familiarity
Bias
Test Effects
Examples
• Not enough space for a response
• Confusing items
• Typos
• Misleading (or lacking) directions
• Scorer inconsistencies
10. Format the item vertically instead of horizontally.
From A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment by Haladyna, Downing, and Rodriguez
21. Place choices in logical or numerical order. Students should not have to hunt to find an answer. Answers should be provided in a logical, predictable pattern.
Compare with . . .
Final Eyes isn’t about editing; rather, it asks, “Is this what you want the students to see/read?”
From Haladyna:
26. Avoid All-of-the-above.
28. Avoid giving clues to the right answer, such as specific determiners including always, never, completely, and absolutely.
Develop Test Maps and Item Analysis Procedures
• The higher the stakes of an assessment, the more we need to play by the rules
• If it’s a mid-term or final exam, there should be a test map.
• Consider also:
– Item analysis
– Using choice E (primarily for pre-assessments)
Engage in peer review “Final Eyes”
– Is each item aligned to a standard?*
– Is each item rigorous?
– Is each item fair?
– Does each item have one, unambiguous correct key?
*– Are all plausible/text based?
– Are all tasks meaningful and build upon student comprehension?
*Very hard to answer without a test map
3. Develop Context-Dependent Item Sets for Content Areas
Test from Period 1
Test from Period 2
Environmental Effects
Cronbach’s Alpha
• “In statistics, Cronbach's α (alpha) is a coefficient of reliability. It is commonly used as a measure of the internal consistency or reliability of a psychometric test score for a sample of examinees. Alpha is not robust against missing data.”
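For readers who want to see how the coefficient is computed, here is a minimal sketch using the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name cronbach_alpha and the scored responses are hypothetical. Because alpha is not robust against missing data, the sketch refuses incomplete rows rather than guessing.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per examinee, each row a list of item scores."""
    if len(scores) < 2:
        raise ValueError("need at least two examinees")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    for row in scores:
        if len(row) != k or any(value is None for value in row):
            raise ValueError("missing data: alpha requires complete rows")
    # Sample variance of each item (column) and of each examinee's total score.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical right/wrong (1/0) scores: five examinees, four items.
scored = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(scored)
print(f"Cronbach's alpha: {alpha:.2f}")
```

For this made-up data the item variances sum to 1.0 and the total-score variance is 2.5, so alpha works out to (4/3) × (1 - 0.4) = 0.80.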
Item Analysis
“This isn’t familiar to me”
[Chart: Percent of Students Selecting Choice “E”, by item (items 1 to 30; vertical axis 0% to 90%)]
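The percentage of students selecting choice “E” for each item can be tallied directly from raw responses. This is a minimal sketch under stated assumptions: the response data and the function name percent_selecting are hypothetical, and choice “E” stands for “This isn’t familiar to me” as on the slide.

```python
def percent_selecting(responses, choice="E"):
    """responses: one list of selected choices per student.

    Returns, for each item, the percent of students who selected `choice`.
    """
    if not responses:
        raise ValueError("no responses")
    n_items = len(responses[0])
    if any(len(student) != n_items for student in responses):
        raise ValueError("every student needs a response for every item")
    n_students = len(responses)
    return [
        100 * sum(1 for student in responses if student[i] == choice) / n_students
        for i in range(n_items)
    ]

# Hypothetical pre-assessment responses: four students, three items.
responses = [
    ["A", "E", "C"],
    ["A", "E", "E"],
    ["B", "E", "C"],
    ["A", "D", "E"],
]

percent_e = percent_selecting(responses)
for item_number, pct in enumerate(percent_e, start=1):
    print(f"Item {item_number}: {pct:.0f}% selected E")
```

Items where a large share of students choose “E” on a pre-assessment point to content that is unfamiliar to the class rather than to guessing patterns.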
One assessment does not an assessment system make.
Fairness and Bias
Fair tests are accessible and enable all students to show what they know. Bias emerges when features of the assessment itself impede students’ ability to demonstrate their knowledge or skills.
In 1876, General George Custer and his troops fought Lakota and Cheyenne warriors at the Battle of the Little Big Horn. If there had been a scoreboard on hand at the end of that battle, which of the following scoreboard representations would have been most accurate?
A. Soldiers > Indians
B. Soldiers = Indians
C. Soldiers < Indians
D. All of the above scoreboards are equally accurate
What are other attributes of quality assessments?
WHEN DESIGNING A PRE/POST PERFORMANCE TASK
• the standards and thinking demands must stay the same.
• the modality that students express their thinking through must also stay the same.
• the content of the baseline and post must be different.
• the rubrics for the pre/post will be the same in terms of thinking and modality, but the content dimension will be different.
Jennifer Borgioli
@datadiva