Grading Your Assessments: How to Evaluate the Quality of Your Exams

An ExamSoft Client Webinar


TRANSCRIPT

Grading your Assessments: How to Evaluate the Quality of your Exams

An ExamSoft Client Webinar


AINSLIE T. NIBERT, PHD, RN, FAAN

MARCH 12, 2015

3

Sound Instruction

The Educator's Golden Triangle: Objectives/Outcomes, Instruction, and Evaluation.

4

Five Guidelines to Developing Effective Critical Thinking Exams

• Assemble the "basics."
• Write critical thinking test items.
• Pay attention to housekeeping duties.
• Develop a test blueprint.
• Scientifically analyze all exams.

5

Definition

Critical Thinking

The process of analyzing and understanding how and why we reached a certain conclusion.

6

Bloom's Taxonomy: Benjamin Bloom, 1956 (revised)

Terminology changes: "The graphic is a representation of the NEW verbiage associated with the long familiar Bloom's Taxonomy. Note the change from Nouns to Verbs [e.g., Application to Applying] to describe the different levels of the taxonomy. Note that the top two levels are essentially exchanged from the Old to the New version." (Schultz, 2005) (Evaluation moved from the top level to the second level, renamed Evaluating; Synthesis moved from the second level to the top, renamed Creating.)

Source: http://www.odu.edu/educ/llschult/blooms_taxonomy.htm

7

Post-Exam Item Analysis: An important aspect of item writing

Helps to determine the quality of a test

8


Consistency of Scores

Reliability Tools

10

• Kuder-Richardson Formula 20 (KR-20): applies to the EXAM as a whole
  - Ranges from -1 to +1
• Point Biserial Correlation Coefficient (PBCC): applies to individual TEST ITEMS
  - Ranges from -1 to +1

(A computation sketch follows this list.)
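Both coefficients can be computed directly from a scored 0/1 response matrix. The sketch below is my own illustration, not ExamSoft's scoring engine; the function names and the tiny score matrix are invented for the example. It shows one common way to calculate a KR-20 for an exam and a point biserial for a single item with NumPy.

```python
import numpy as np

def kr20(responses: np.ndarray) -> float:
    """KR-20 for an exam; responses is a students x items matrix of 0/1 scores."""
    k = responses.shape[1]                     # number of items on the exam
    totals = responses.sum(axis=1)             # each student's total score
    p = responses.mean(axis=0)                 # proportion answering each item correctly
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / totals.var())

def point_biserial(responses: np.ndarray, item: int) -> float:
    """PBCC for one item: correlation of the item's 0/1 scores with total scores."""
    totals = responses.sum(axis=1)             # a "corrected" PBCC would exclude this item from the total
    return float(np.corrcoef(responses[:, item], totals)[0, 1])

# Invented example: 5 students, 4 items
scores = np.array([[1, 1, 0, 1],
                   [1, 0, 0, 1],
                   [1, 1, 1, 1],
                   [0, 0, 0, 1],
                   [1, 1, 1, 0]])
print(f"KR-20 = {kr20(scores):.2f}, PBCC(item 3) = {point_biserial(scores, 2):.2f}")
```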

11

Standards of Acceptance

• Item difficulty: 30%-90%
• Item Discrimination Ratio: 25% and above
• PBCC: 0.20 and above
• KR-20: 0.70 and above

(A simple item-flagging sketch follows this list.)
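To make these standards actionable, they can be encoded as simple threshold checks applied to each item's statistics. A minimal sketch: the thresholds come from the list above, but the helper and its name are hypothetical.

```python
def flag_item(difficulty: float, idr: float, pbcc: float) -> list[str]:
    """Return the standards this item fails to meet (an empty list means acceptable)."""
    flags = []
    if not 0.30 <= difficulty <= 0.90:
        flags.append(f"difficulty {difficulty:.0%} outside 30%-90%")
    if idr < 0.25:
        flags.append(f"IDR {idr:.2f} below 0.25")
    if pbcc < 0.20:
        flags.append(f"PBCC {pbcc:.2f} below 0.20")
    return flags

print(flag_item(difficulty=0.95, idr=0.10, pbcc=0.05))   # too easy and poorly discriminating
```

The KR-20 standard (0.70 and above) is an exam-level check, so it would be applied once per exam rather than per item.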

Thinking more about mean item difficulty on teacher-made tests…

The mean difficulty level for a teacher-made nursing exam should be 80-85%.

So why might low NCLEX-RN® pass rates persist when mean difficulty levels on teacher-made exams remain consistently within this desired range?

12

…and one "absolute" rule about item difficulty

Since the mean difficulty level for a teacher-made nursing exam is 80-85%, what should the lowest acceptable value be for each test item on the exam? TEST ITEMS ANSWERED CORRECTLY BY 30% OR LESS of the examinees should always be considered too difficult, and the instructor must take action.

Why?

13

…but what about high difficulty levels?

• Test items with high difficulty levels (>90%) often yield poor discrimination values.
• Is there a situation where faculty can legitimately expect that 100% of the class will answer a test item correctly, and be pleased when this happens?
• RULE OF THUMB ABOUT MASTERY ITEMS: Due to their negative impact on test discrimination and reliability, mastery items should comprise no more than 10% of the test.

14


Standards of Acceptance

• Item difficulty: 30%-90%
• Item Discrimination Ratio: 25% and above
• PBCC: 0.20 and above
• KR-20: 0.70 and above

Thinking more about item discrimination on teacher-made tests…

• The IDR can be calculated quickly, but it doesn't consider the variance of the entire group. Use it to quickly identify items with zero or negative discrimination values, since these need to be edited before being used again. (A sketch of the upper/lower-group calculation follows this list.)
• The PBCC is a more powerful measure of discrimination.
  - It correlates the correct answer to a single test item with the student's total test score.
  - It considers the variance of the entire student group, not just the lower and upper 27% groups.
  - For a small 'n,' consider the cumulative value.
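For reference, the IDR contrasted with the PBCC above is typically computed from just the upper and lower scoring groups, commonly 27% of examinees each, as mentioned in the slide. A rough sketch, again with an invented function name and a 0/1 scoring matrix as input:

```python
import numpy as np

def discrimination_ratio(responses: np.ndarray, item: int, fraction: float = 0.27) -> float:
    """IDR: proportion correct in the upper group minus proportion correct in the lower group."""
    totals = responses.sum(axis=1)
    order = np.argsort(totals)                        # students ranked from lowest to highest total score
    n = max(1, int(round(fraction * len(totals))))    # size of each comparison group
    lower, upper = order[:n], order[-n:]
    return float(responses[upper, item].mean() - responses[lower, item].mean())
```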

16

…what decisions need to be made about items?

• When a test item has poor difficulty and/or discrimination values, action is needed. All of these actions require that the exam be rescored:
  - Credit can be given for more than one choice.
  - The test item can be nullified.
  - The test item can be deleted.
• Each of these actions has a consequence, so faculty need to consider the consequences carefully when choosing an action. Faculty judgment is crucial when determining actions that affect test scores. (A simplified rescoring sketch follows.)
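Each of the three actions amounts to a change in the answer key followed by rescoring. The sketch below is a simplified illustration only; the key format and helper are invented, not a feature of any particular testing software. Crediting multiple options widens the set of accepted answers, nullifying an item credits everyone, and deleting an item removes it from the denominator.

```python
def rescore(answers: dict[int, str], key: dict[int, set | None]) -> float:
    """Percentage score after key adjustments; None means the item is nullified (everyone credited).
    A deleted item is simply dropped from the key, which shrinks the denominator."""
    earned = sum(1 for item, credited in key.items()
                 if credited is None or answers.get(item) in credited)
    return earned / len(key)

key = {1: {"B"}, 2: {"A", "C"}, 3: None}       # item 2 credits two choices, item 3 is nullified
print(rescore({1: "B", 2: "C", 3: "D"}, key))  # 1.0
```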

17

Standards of Acceptance: Nursing

• Nursing PBCC: 0.15 and above
• Nursing KR-20: 0.60-0.65 and above

18

Thinking more about adjusting standards of acceptance for nursing tests…

• Remember that the key statistical concept inherent in calculating these coefficients is VARIANCE.
• When there is less variance in test scores, the reliability of the test will decrease, i.e., the KR-20 value will drop. (A worked example follows this list.)
• What contributes to the lack of variance in nursing students' test scores?
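A quick worked example of that variance effect, with numbers invented for illustration. KR-20 is computed as

$$\text{KR-20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right),$$

where \(k\) is the number of items, \(p_i q_i\) is the variance of item \(i\) (proportion correct times proportion incorrect), and \(\sigma_X^2\) is the variance of the total scores. For a 50-item exam with \(\sum p_i q_i = 10\), a total-score variance of 40 gives roughly \(1.02 \times (1 - 0.25) \approx 0.77\); if the scores bunch together so that the variance drops to 25, the same items yield roughly \(1.02 \times (1 - 0.40) \approx 0.61\), below the 0.70 standard even though no individual item changed.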

19


…and a word about using Response Frequencies

• Sometimes LESS is MORE when it comes to editing a test item.
• A review of the response frequency data can focus your editing.
• For items where 100% of students answer correctly and no other options were chosen, make sure that this is indeed intentional (a MASTERY ITEM) and not just reflective of an item that is too easy (>90% difficulty).
• Target rewriting the "zero" distracters, those options that are ignored by students. Replacing "zeros" with plausible options will immediately improve item DISCRIMINATION. (A small tallying sketch follows.)
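Response-frequency review comes down to tallying how many students chose each option. A small sketch that surfaces the "zero" distracters mentioned above; the option labels, data, and function name are invented for illustration.

```python
from collections import Counter

def zero_distracters(choices: list[str], key: str, options: str = "ABCD") -> list[str]:
    """Return the incorrect options that no student selected."""
    counts = Counter(choices)
    return [opt for opt in options if opt != key and counts[opt] == 0]

# Invented data: 10 students, everyone chose A or B, so C and D drew nobody and need rewriting
print(zero_distracters(["A"] * 8 + ["B"] * 2, key="A"))   # ['C', 'D']
```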

21


3-Step Method for Item Analysis

1. Review Difficulty Level

2. Review Discrimination Data

• Item Discrimination Ratio (IDR)
• Point Biserial Correlation Coefficient (PBCC)

3. Review Effectiveness of Alternatives

• Response Frequencies
• Non-distracters

Source: Morrison, S., Nibert, A., & Flick, J. (2006). Critical thinking and test item writing (2nd ed.). Houston, TX: Health Education Systems, Inc.

23


Content Validity

Does the test measure what it claims to measure?

29

Use a Blueprint to Assess a Test's Validity

• Test Blueprint
  - Reflects course objectives
  - Rational/logical tool
• Testing Software Program
  - Storage of item analysis data (last and cumulative)
  - Storage of test item categories

30

Test Blueprints

• Faculty generated
• Electronically generated

An electronic blueprint for each exam in each course. (A minimal coverage-check sketch follows.)
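Whichever way it is generated, a blueprint boils down to a mapping from category (course objective or NCLEX-RN® Client Needs category) to a planned share of the exam, which can then be compared with the items actually written. A minimal sketch: the category names below are real NCLEX-RN® Client Needs categories, but the target percentages and the helper are invented for illustration and are not the NCSBN test plan.

```python
from collections import Counter

# Hypothetical planned shares; a real blueprint would cover every category on the exam.
blueprint = {"Management of Care": 0.20,
             "Pharmacological and Parenteral Therapies": 0.15,
             "Physiological Adaptation": 0.15}

def blueprint_gaps(item_categories: list[str], blueprint: dict[str, float]) -> dict[str, float]:
    """Actual minus planned proportion of items for each blueprint category."""
    counts = Counter(item_categories)
    total = len(item_categories)
    return {cat: counts[cat] / total - share for cat, share in blueprint.items()}

# Invented 50-item exam: heavy on Physiological Adaptation, light on Pharmacology
items = (["Management of Care"] * 10 +
         ["Pharmacological and Parenteral Therapies"] * 5 +
         ["Physiological Adaptation"] * 35)
print(blueprint_gaps(items, blueprint))
```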

31


NCLEX-RN® Client Needs Percentages of Items, 2011 vs. 2014

34

Source: https://www.ncsbn.org/4701.htm

NCLEX-RN® Client Needs Percentages of Items, 2011 vs. 2014: Increases vs. Decreases

35

Item Writing Tools for Success…

• Knowledge
• Test Blueprint
• Testing Software

References

Morrison, S., Nibert, A., & Flick, J. (2006). Critical thinking and test item writing (2nd ed.). Houston, TX: Health Education Systems, Inc.

Morrison, S. (2004). Improving NCLEX-RN pass rates through internal and external curriculum evaluation. In M. Oermann & K. Heinrich (Eds.), Annual review of nursing education (Vol. 3). New York: Springer.

National Council of State Boards of Nursing. (2013). 2013 NCLEX-RN test plan. Chicago, IL: National Council of State Boards of Nursing. https://www.ncsbn.org/3795.htm

Nibert, A. (2010). Benchmarking for student progression throughout a nursing program: Implications for students, faculty, and administrators. In L. Caputi (Ed.), Teaching nursing: The art and science (2nd ed., Vol. 3, pp. 45-64). Chicago: College of DuPage Press.

 37

Have Questions? Need More Info?

Thanks for your time & attention today!

38

866-429-8889