
Learn to love the argument: Using Kane's Framework in Simulation-based Assessment

Ryan Brydges, PhD


Conceptualizing Validity & Validation

Simulation Educator Training - Principles of Assessment in Simulation Supplement

Disclosure

Special thanks to David Cook & Rose Hatala. Otherwise none.

[Slide diagram: the observer's frame and the learners' frames shape expected actions, which inform a judgment/rating and feedback]

Why should I care about this?

Objectives

• Re-conceptualize how you define and think about validity and validation

• Define differences between two contemporary validity frameworks

• List types of evidence associated with the four inferences in Kane’s validity framework

• Define the interpretation-use argument

• Evaluate common mistakes to avoid in validation

Objectives

• Re-conceptualize how you define and think about validity and validation

Assessment = Decision

Who makes the team?


What is validity?

“…systematic argument in support of score interpretation.”

Clauser, 2008

What is validity?

“Validity refers to the evidence presented to support or refute the meaning or interpretation assigned to assessment results.”

S. Downing, 2003

Inference, not instrument

Phrases like these locate validity in the instrument or curriculum, rather than in the score interpretation:

“We used a validated checklist”

“Training using our validated simulation curriculum”

Collecting evidence to evaluate the appropriateness of inferences:

Assessment → Score → Inference/Decision → Action/Use

Activity: Small Group Discussion (7-10 mins)

What is tripping you up about this new conceptualization of validity?

What is hard to let go from your previous way of thinking about validity and validation?

Objectives

• Re-conceptualize how you define and think about validity and validation

• Define differences between two contemporary validity frameworks

Validity Frameworks

What if there is no gold standard? Risk of confirmation bias.

Too many types. Everything relates to the construct. Where does reliability fit?

How do we prioritize evidence?

Validity Frameworks

Validity of test/instrument scores:

Messick Model (per the Standards for Educational and Psychological Testing): Content, Response Process, Internal Structure, Relations to Other Variables, Consequences

Kane Model: Scoring, Generalization, Extrapolation, Decision


Benefits of the Argument-based Framework

1. Focuses attention on the broad array of issues associated with interpreting and using assessment scores.

2. Emphasizes that we make assumptions when we interpret scores, and that we need to check those assumptions.

3. Allows for alternative interpretations and uses of assessment scores.

Adaptability of Framework

A benefit of Kane's framework is that it is well suited to:

• Quantitative data

• Qualitative data

• Programmatic assessment

Objectives

• Re-conceptualize how you define and think about validity and validation

• Define differences between two contemporary validity frameworks

• List types of evidence associated with the four inferences in Kane’s validity framework

Kane's Validity Framework

Validation = propose an interpretation/use + evaluate the claims

Kane’s Validity Inferences

• Scoring

• Generalization

• Extrapolation

• Decision

Kane's chain of inferences links an observation to a decision:

Observation → (Scoring) → Single Score → (Generalization) → Performance in the test setting → (Extrapolation) → Performance in the real world → (Implication) → Decision

Evidence associated with each inference:

Scoring

• Scoring rubric/criteria (e.g., empiric comparison of different procedures; think-aloud study)

• Observation format (e.g., empiric comparison of different formats, such as live vs. video-based)

• QUAL: The richness, accuracy, authenticity, and fairness of qualitative data

Generalization

• Reliability or generalisability (items, raters, tasks, occasions)

• QUAL: Consistency and reflexivity of interpretations formed by different interpreters

• Distinction for workplace-based vs. simulation-based assessment: sampling and the ability to have multiple ‘stations’

Extrapolation

• Correlation with another measure having an expected relationship (convergent; concurrent or predictive)

• Discrimination (known-groups comparison)

• QUAL: Agreement of stakeholders that interpretations will apply to new contexts in training or practice (transferability)

• Distinction for workplace-based vs. simulation-based assessment: ensure the simulation “matches” the clinical context / construct of interest

Implication

• Pass/fail standard (e.g., ROC curve)

• QUAL/QUAN: Effectiveness of actions based on assessment results

• Intended or unintended consequences of testing

A brief code sketch illustrating three of these evidence types follows the citation below.
Cook DA, Zendejas B, Hamstra SJ, Hatala R, Brydges R. What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Adv Health Sci Educ Theory Pract. 2014 May;19(2):233-50.
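The Generalization, Extrapolation, and Implication rows above each name a quantitative evidence type. As a concrete illustration only (not from the talk), the following Python sketch uses simulated rating data to compute one example of each: Cronbach's alpha across raters, a convergent correlation with a hypothetical clinical global rating, and an ROC-derived pass/fail cut score. All data, names, and thresholds are hypothetical.

```python
# A minimal, illustrative sketch (not from the talk): one piece of
# quantitative evidence for each of three Kane inferences, on simulated data.
import numpy as np
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(seed=0)

# Hypothetical data: 3 raters score 20 residents on a 10-point checklist,
# modeled as latent resident skill plus independent rater noise.
true_skill = rng.normal(loc=7.0, scale=1.5, size=20)
scores = (true_skill + rng.normal(scale=0.8, size=(3, 20))).clip(0, 10)

# Generalization: consistency across raters (Cronbach's alpha, treating each
# rater as an "item"): alpha = k/(k-1) * (1 - sum(var_i) / var_total).
k = scores.shape[0]
rater_vars = scores.var(axis=1, ddof=1)        # each rater's score variance
total_var = scores.sum(axis=0).var(ddof=1)     # variance of summed scores
alpha = (k / (k - 1)) * (1 - rater_vars.sum() / total_var)
print(f"Cronbach's alpha across raters: {alpha:.2f}")

# Extrapolation: convergent correlation with a measure expected to relate,
# here a hypothetical supervisor global rating from the clinical setting.
mean_scores = scores.mean(axis=0)
global_rating = mean_scores + rng.normal(scale=1.0, size=20)  # noisy criterion
r = np.corrcoef(mean_scores, global_rating)[0, 1]
print(f"Convergent correlation with global rating: r = {r:.2f}")

# Implication: a pass/fail cut score chosen from an ROC curve against a
# (hypothetical) reference judgment of competence.
competent = (global_rating > np.median(global_rating)).astype(int)
fpr, tpr, thresholds = roc_curve(competent, mean_scores)
best = np.argmax(tpr - fpr)                    # Youden's J statistic
print(f"AUC = {auc(fpr, tpr):.2f}; candidate cut score = {thresholds[best]:.1f}")
```

In a real validation, each of these analyses would be tied to a claim in the interpretation/use argument, rather than run simply because the data are available.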

Large Group Discussion (10 mins + short break)

How did all of that land for you?

Questions / Comments?

Objectives

• Re-conceptualize how you define and think about validity and validation

• Define differences between two contemporary validity frameworks

• List types of evidence associated with the four inferences in Kane’s validity framework

• Define the interpretation-use argument

Interpretation/Use Argument (IUA)

• Making an interpretation/use argument means specifying:

• A specific purpose, meaning, or interpretation

• A specific point in time

• A well-defined population and use

• The IUA organizes your inferences and assumptions and lets you test whether the evidence supports or refutes them

Example: Interpretation/Use Argument

I plan to measure lumbar puncture skills using a simulator to assess residents’ performance during acquisition and retention; all for the purposes of discerning the effects of two learning interventions in a research program.

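One way to keep an IUA organized is to write its components down as structured data, pairing each inference with the evidence you plan to collect. The sketch below mirrors the lumbar-puncture example above; the fields and entries are illustrative, not the speaker's template (email for that, per the closing slide).

```python
# A minimal, illustrative way to structure an interpretation/use argument
# (IUA) as data; all fields and entries are hypothetical.
iua = {
    "interpretation": "Checklist scores reflect residents' lumbar puncture skill",
    "population": "residents in a research program",
    "timing": "at skill acquisition and again at retention testing",
    "use": "discern the effects of two learning interventions",
    "claims_and_planned_evidence": {
        "Scoring": "think-aloud study of raters applying the rubric",
        "Generalization": "inter-rater reliability across raters and occasions",
        "Extrapolation": "correlation with supervisor ratings in the clinic",
        "Implication": "review of the decisions made from the comparison",
    },
}

# Listing claims next to planned evidence makes gaps easy to spot.
for inference, evidence in iua["claims_and_planned_evidence"].items():
    print(f"{inference}: {evidence}")
```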

Activity: Small Group Discussion (20 mins)

Consider an assessment tool you plan to use in your simulation program

Develop your own Interpretation-Use Argument that fits with how you are using that tool to achieve your program purpose

What evidence would you still need to collect for this IUA?

Objectives

• Re-conceptualize how you define and think about validity and validation

• Define differences between two contemporary validity frameworks

• List types of evidence associated with the four inferences in Kane’s validity framework

• Define the interpretation-use argument

• Evaluate common mistakes to avoid in validation


Common Mistakes – Cook & Hatala, 2016

1. Don’t use a validity framework

2. Reinvent the wheel by creating a new instrument each time a need arises

3. Make expert-novice comparisons the crux of the validity argument

4. Focus on the easily-accessible validity evidence rather than the most important

5. Focus on the instrument rather than score interpretations and uses

6. Don't synthesize or critique the validity evidence

7. Ignore best practices for assessment development

8. Omit details about the instrument

9. Let the availability of the simulator/assessment instrument drive the assessment

Activity: Small Group Discussion

Each table is assigned 2-3 of the common mistakes.

Why are these mistakes? What is the issue?

How would you talk to or teach a colleague (or your boss) about why this mistake matters?

Pearls

• Test scores can have multiple possible interpretations/uses, and it is these interpretations/uses that validity evidence is collected for.

• Validity frameworks exist, and we can use them to organize our work.

• The validity of a proposed interpretation/use depends on how well the evidence supports the IUA.

• More ambitious proposed interpretations/uses require more evidence.

Learn to love the argument: Using Kane's Framework in Simulation-based Assessment

Email me for a template:

[email protected]

References

Clauser, B. E., Margolis, M. J., & Swanson, D. B. (2008). Issues of validity and reliability for assessments in medical education. Practical guide to the evaluation of clinical competence, 10-23.

Downing, S. M. (2003). Validity: on the meaningful interpretation of assessment data. Medical education, 37(9), 830-837.

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73.

Cook, D. A., Zendejas, B., Hamstra, S. J., Hatala, R., & Brydges, R. (2014). What counts as validity evidence? Examples and prevalence in a systematic review of simulation-based assessment. Advances in Health Sciences Education, 19(2), 233-250.

Cook, D. A., Brydges, R., Ginsburg, S., & Hatala, R. (2015). A contemporary approach to validity arguments: a practical guide to Kane's framework. Medical Education, 49(6), 560-575.

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. American Educational Research Association.

Cook, D. A., & Hatala, R. (2016). Validation of educational assessments: a primer for simulation and beyond. Advances in Simulation, 1, 31.