Grading Your Assessments: How to Evaluate the Quality of Your Exams

An ExamSoft Client Webinar


TRANSCRIPT

Grading your Assessments: How to Evaluate the Quality of your Exams

An ExamSoft Client Webinar


AINSLIE T. NIBERT, PHD, RN, FAAN

MARCH 12, 2015

3

Sound Instruction

The Educator's Golden Triangle: Objectives/Outcomes, Instruction, and Evaluation.

4

Five Guidelines to Developing Effective Critical Thinking Exams

• Assemble the "basics."
• Write critical thinking test items.
• Pay attention to housekeeping duties.
• Develop a test blueprint.
• Scientifically analyze all exams.

5

Definition

Critical Thinking

The process of analyzing and understanding how and why we reached a certain conclusion.

6

Bloom's Taxonomy: Benjamin Bloom, 1956 (revised)

Terminology changes: "The graphic is a representation of the NEW verbiage associated with the long familiar Bloom's Taxonomy. Note the change from Nouns to Verbs [e.g., Application to Applying] to describe the different levels of the taxonomy. Note that the top two levels are essentially exchanged from the Old to the New version." (Schultz, 2005) (Evaluation moved from the top level to the second level, renamed Evaluating; Synthesis moved from the second level to the top, renamed Creating.)

Source: http://www.odu.edu/educ/llschult/blooms_taxonomy.htm

7

Post-Exam Item Analysis: An important aspect of item writing

Helps to determine the quality of a test

8


Consistency of Scores

Reliability Tools

10

• Kuder-Richardson Formula 20 (KR-20): applies to the EXAM as a whole
  - Ranges from -1 to +1
• Point Biserial Correlation Coefficient (PBCC): applies to individual TEST ITEMS
  - Ranges from -1 to +1

(A computation sketch follows this list.)
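Both coefficients can be computed directly from a scored 0/1 response matrix. The sketch below is my own illustration, not ExamSoft's scoring engine; the function names and the tiny score matrix are invented for the example. It shows one common way to calculate a KR-20 for an exam and a point biserial for a single item with NumPy.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20 for an exam; responses is a students x items matrix of 0/1 scores."""
    k = responses.shape[1]                     # number of items on the exam
    totals = responses.sum(axis=1)             # each student's total score
    p = responses.mean(axis=0)                 # proportion answering each item correctly
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / totals.var())

def point_biserial(responses: np.ndarray, item: int) -> float:
    """PBCC for one item: correlation of the item's 0/1 scores with total scores."""
    totals = responses.sum(axis=1)             # a "corrected" PBCC would exclude this item from the total
    return float(np.corrcoef(responses[:, item], totals)[0, 1])

# Invented example: 5 students, 4 items
scores = np.array([[1, 1, 0, 1],
                   [1, 0, 0, 1],
                   [1, 1, 1, 1],
                   [0, 0, 0, 1],
                   [1, 1, 1, 0]])
print(f"KR-20 = {kr20(scores):.2f}, PBCC(item 3) = {point_biserial(scores, 2):.2f}")
```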

11

Standards of Acceptance

• Item difficulty: 30%-90%
• Item Discrimination Ratio: 25% and above
• PBCC: 0.20 and above
• KR-20: 0.70 and above

(A simple item-flagging sketch follows this list.)
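To make these standards actionable, they can be encoded as simple threshold checks applied to each item's statistics. A minimal sketch: the thresholds come from the list above, but the helper and its name are hypothetical.

```python
def flag_item(difficulty: float, idr: float, pbcc: float) -> list[str]:
    """Return the standards this item fails to meet (an empty list means acceptable)."""
    flags = []
    if not 0.30 <= difficulty <= 0.90:
        flags.append(f"difficulty {difficulty:.0%} outside 30%-90%")
    if idr < 0.25:
        flags.append(f"IDR {idr:.2f} below 0.25")
    if pbcc < 0.20:
        flags.append(f"PBCC {pbcc:.2f} below 0.20")
    return flags

print(flag_item(difficulty=0.95, idr=0.10, pbcc=0.05))   # too easy and poorly discriminating
```

The KR-20 standard (0.70 and above) is an exam-level check, so it would be applied once per exam rather than per item.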

Thinking more about mean item difficulty on teacher-made tests…

The mean difficulty level for a teacher-made nursing exam should be 80-85%.

So why might low NCLEX-RN® pass rates persist when mean difficulty levels on teacher-made exams remain consistently within this desired range?

12

…and one "absolute" rule about item difficulty

Since the mean difficulty level for a teacher-made nursing exam is 80-85%, what should the lowest acceptable value be for each test item on the exam? TEST ITEMS ANSWERED CORRECTLY BY 30% OR LESS of the examinees should always be considered too difficult, and the instructor must take action.

Why?

13

…but what about high difficulty levels?

• Test items with high difficulty levels (>90%) often yield poor discrimination values.
• Is there a situation where faculty can legitimately expect that 100% of the class will answer a test item correctly, and be pleased when this happens?
• RULE OF THUMB ABOUT MASTERY ITEMS: Due to their negative impact on test discrimination and reliability, mastery items should comprise no more than 10% of the test.

14


Standards of Acceptance

• Item difficulty: 30%-90%
• Item Discrimination Ratio: 25% and above
• PBCC: 0.20 and above
• KR-20: 0.70 and above

Thinking more about item discrimination on teacher-made tests…

• The IDR can be calculated quickly, but it doesn't consider the variance of the entire group. Use it to quickly identify items with zero or negative discrimination values, since these need to be edited before being used again. (A sketch of the upper/lower-group calculation follows this list.)
• The PBCC is a more powerful measure of discrimination.
  - It correlates the correct answer to a single test item with the student's total test score.
  - It considers the variance of the entire student group, not just the lower and upper 27% groups.
  - For a small 'n,' consider the cumulative value.
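For reference, the IDR contrasted with the PBCC above is typically computed from just the upper and lower scoring groups, commonly 27% of examinees each, as mentioned in the slide. A rough sketch, again with an invented function name and a 0/1 scoring matrix as input:

```python
import numpy as np

def discrimination_ratio(responses: np.ndarray, item: int, fraction: float = 0.27) -> float:
    """IDR: proportion correct in the upper group minus proportion correct in the lower group."""
    totals = responses.sum(axis=1)
    order = np.argsort(totals)                        # students ranked from lowest to highest total score
    n = max(1, int(round(fraction * len(totals))))    # size of each comparison group
    lower, upper = order[:n], order[-n:]
    return float(responses[upper, item].mean() - responses[lower, item].mean())
```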

16

…what decisions need to be made about items?

• When a test item has poor difficulty and/or discrimination values, action is needed. All of these actions require that the exam be rescored:
  - Credit can be given for more than one choice.
  - The test item can be nullified.
  - The test item can be deleted.
• Each of these actions has a consequence, so faculty need to consider the consequences carefully when choosing an action. Faculty judgment is crucial when determining actions that affect test scores. (A simplified rescoring sketch follows.)
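Each of the three actions amounts to a change in the answer key followed by rescoring. The sketch below is a simplified illustration only; the key format and helper are invented, not a feature of any particular testing software. Crediting multiple options widens the set of accepted answers, nullifying an item credits everyone, and deleting an item removes it from the denominator.

```python
def rescore(answers: dict[int, str], key: dict[int, set | None]) -> float:
    """Percentage score after key adjustments; None means the item is nullified (everyone credited).
    A deleted item is simply dropped from the key, which shrinks the denominator."""
    earned = sum(1 for item, credited in key.items()
                 if credited is None or answers.get(item) in credited)
    return earned / len(key)

key = {1: {"B"}, 2: {"A", "C"}, 3: None}       # item 2 credits two choices, item 3 is nullified
print(rescore({1: "B", 2: "C", 3: "D"}, key))  # 1.0
```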

17

Standards of Acceptance: Nursing

• Nursing PBCC: 0.15 and above
• Nursing KR-20: 0.60-0.65 and above

18

Thinking more about adjusting standards of acceptance for nursing tests…

• Remember that the key statistical concept inherent in calculating these coefficients is VARIANCE.
• When there is less variance in test scores, the reliability of the test will decrease, i.e., the KR-20 value will drop. (A worked example follows this list.)
• What contributes to the lack of variance in nursing students' test scores?
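A quick worked example of that variance effect, with numbers invented for illustration. KR-20 is computed as

$$\text{KR-20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right),$$

where \(k\) is the number of items, \(p_i q_i\) is the variance of item \(i\) (proportion correct times proportion incorrect), and \(\sigma_X^2\) is the variance of the total scores. For a 50-item exam with \(\sum p_i q_i = 10\), a total-score variance of 40 gives roughly \(1.02 \times (1 - 0.25) \approx 0.77\); if the scores bunch together so that the variance drops to 25, the same items yield roughly \(1.02 \times (1 - 0.40) \approx 0.61\), below the 0.70 standard even though no individual item changed.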

19


…and a word about using Response Frequencies

• Sometimes LESS is MORE when it comes to editing a test item.
• A review of the response frequency data can focus your editing.
• For items where 100% of students answer correctly and no other options were chosen, make sure that this is indeed intentional (a MASTERY ITEM) and not just reflective of an item that is too easy (>90% difficulty).
• Target rewriting the "zero" distracters, those options that are ignored by students. Replacing "zeros" with plausible options will immediately improve item DISCRIMINATION. (A small tallying sketch follows.)
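Response-frequency review comes down to tallying how many students chose each option. A small sketch that surfaces the "zero" distracters mentioned above; the option labels, data, and function name are invented for illustration.

```python
from collections import Counter

def zero_distracters(choices: list[str], key: str, options: str = "ABCD") -> list[str]:
    """Return the incorrect options that no student selected."""
    counts = Counter(choices)
    return [opt for opt in options if opt != key and counts[opt] == 0]

# Invented data: 10 students, everyone chose A or B, so C and D drew nobody and need rewriting
print(zero_distracters(["A"] * 8 + ["B"] * 2, key="A"))   # ['C', 'D']
```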

21


3-Step Method for Item Analysis

1. Review Difficulty Level

2. Review Discrimination Data

• Item Discrimination Ratio (IDR)
• Point Biserial Correlation Coefficient (PBCC)

3. Review Effectiveness of Alternatives

• Response Frequencies
• Non-distracters

Source: Morrison, S., Nibert, A., & Flick, J. (2006). Critical thinking and test item writing (2nd ed.). Houston, TX: Health Education Systems, Inc.

23


Content Validity

Does the test measure what it claims to measure?

29

Use a Blueprint to Assess a Test's Validity

• Test Blueprint
  - Reflects course objectives
  - Rational/logical tool
• Testing Software Program
  - Storage of item analysis data (last and cumulative)
  - Storage of test item categories

30

Test Blueprints

• Faculty generated
• Electronically generated

An electronic blueprint for each exam in each course. (A minimal coverage-check sketch follows.)
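Whichever way it is generated, a blueprint boils down to a mapping from category (course objective or NCLEX-RN® Client Needs category) to a planned share of the exam, which can then be compared with the items actually written. A minimal sketch: the category names below are real NCLEX-RN® Client Needs categories, but the target percentages and the helper are invented for illustration and are not the NCSBN test plan.

```python
from collections import Counter

# Hypothetical planned shares; a real blueprint would cover every category on the exam.
blueprint = {"Management of Care": 0.20,
             "Pharmacological and Parenteral Therapies": 0.15,
             "Physiological Adaptation": 0.15}

def blueprint_gaps(item_categories: list[str], blueprint: dict[str, float]) -> dict[str, float]:
    """Actual minus planned proportion of items for each blueprint category."""
    counts = Counter(item_categories)
    total = len(item_categories)
    return {cat: counts[cat] / total - share for cat, share in blueprint.items()}

# Invented 50-item exam: heavy on Physiological Adaptation, light on Pharmacology
items = (["Management of Care"] * 10 +
         ["Pharmacological and Parenteral Therapies"] * 5 +
         ["Physiological Adaptation"] * 35)
print(blueprint_gaps(items, blueprint))
```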

31


NCLEX-RN® Client Needs Percentages of Items, 2011 vs. 2014

34

Source: https://www.ncsbn.org/4701.htm

NCLEX-RN® Client Needs Percentages of Items, 2011 vs. 2014: Increases vs. Decreases

35

Item Writing Tools for Success…

• Knowledge
• Test Blueprint
• Testing Software

References

Morrison, S., Nibert, A., & Flick, J. (2006). Critical thinking and test item writing (2nd ed.). Houston, TX: Health Education Systems, Inc.

Morrison, S. (2004). Improving NCLEX-RN pass rates through internal and external curriculum evaluation. In M. Oermann & K. Heinrich (Eds.), Annual review of nursing education (Vol. 3). New York: Springer.

National Council of State Boards of Nursing. (2013). 2013 NCLEX-RN test plan. Chicago, IL: National Council of State Boards of Nursing. https://www.ncsbn.org/3795.htm

Nibert, A. (2010). Benchmarking for student progression throughout a nursing program: Implications for students, faculty, and administrators. In L. Caputi (Ed.), Teaching nursing: The art and science (2nd ed., Vol. 3, pp. 45-64). Chicago: College of DuPage Press.

 37

Have Questions? Need More Info?

Thanks for your time & attention today!

38

866-429-8889