Validation Studies in Simulation-based Education
April 15, 2015
Deb Rooney, PhD, Professor of Learning Health Sciences
All Rights Reserved.
Objectives
• Validity in the current framework used to evaluate evidence
• How we gather and evaluate validity evidence from a simulator and its associated measures
• Context of the academic product (manuscripts)
• Final considerations
Validity: What is it?
A few definitions to consider:
1. The degree to which the tool measures what it claims to measure
2. The degree to which evidence and theory support the interpretations of test scores as entailed by proposed uses of tests (Standards, 1999)
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Simulator Validation: The Framework
• Evidence relevant to relationships to other variables (e.g., novice-versus-expert discrimination) is over-represented
• Evidence relevant to response processes and consequences of testing is infrequently reported
• Apply the current Standards to ensure rigorous research and reporting
Cook, D. A., Brydges, R., Zendejas, B., Hamstra, S. J., & Hatala, R. (2013). Technology-enhanced simulation to assess health professionals: A systematic review of validity evidence, research methods, and reporting quality. Academic Medicine, 88(6), 872-883.
Simulator Validation: The Evidence
Current AERA Standards*
• Not new/novel
• Unitary construct: all evidence falls under "construct" validity
Five sources of validity evidence:
• Test content (face validity, subjective measures, construct alignment)
• Internal structure (reliability, dimensionality, function across groups)
• Response processes (psychometric properties of measures)
• Relationships to other variables (comparison with previously-validated measures)
• Consequences of testing (standard setting, rater/ratings quality, fidelity vs. stakes)
*Standards for Educational and Psychological Testing (AERA, APA, & NCME, 2014)
Validity: What is it NOT?
Validity evidence:
• Does not allow us to make inferences about a curriculum
• Does not allow us to make inferences about different applications, settings, or learners
• Is not a terminal quality determination (about the quality of your measures or application)
• Not "the scale was valid" but rather "evidence supports the use of a scale to measure X in a particular setting/application"
Simulator Validation: How does this evidence apply to us?
We have a much more complex environment to evaluate!
• Simulator: test content, relationships to other variables, consequences of testing
• Instrument (measures): test content, internal structure, response processes, relationships to other variables, consequences of testing
Validity Evidence in Simulation: How/when do we gather evidence?
Creation, design & planning:
• Test content (measures/simulator)
Implementation & evaluation:
• Internal structure (measures)
• Response processes (measures)
• Relationship to other variables (measures/simulator)
• Consequences of testing (measures/simulator)
Validity Evidence in Simulation: How do we disseminate findings?
• Paper 1: Test content (measures & simulator), before implementation
• Paper 2: Quality of performance measures, before or after implementation
• Paper 3: Impact on performance and/or patient outcomes, after full implementation
Most Recent Example: Neurosurgery Sim
• Paper 1a: Preliminary evaluation of quality of simulator/measures
• Paper 2b: Evaluation of performance measures from the simulator
• Paper 3: Evaluation of impact on performance measures/patient outcomes
a Tai B, Rooney D, Stephenson F, Liao P, Sagher O, Shih A, Savastano LE. Development of 3D-printing built ventriculostomy placement simulator. Journal of Neurosurgery (in press).
b Rooney DM, Tai BL, Sagher O, Shih AJ, Wilkinson DA, Savastano LE. A simulator and two tools: Validation of performance measures from a novel neurosurgery simulator using the current Standards framework. Surgery (submitted 3/15).
Paper 1: Simulator validation process
[Flow diagram: three expert review samples (n=7, n=5, n=5) over a 4-month process]
The Content Validity Form: 5 domains
• Physical Attributes
• Realism-experience
• Value
• Relevance
• Overall (global)
Paper 1: The preliminary validation process (Sim)*
• Using a Rasch model, analyzed the data for:
  • Domain rating differences across the 3 sites
  • Mean ratings by item
  • Rasch variability indices, to identify possible inconsistency in ratings
• Ensured psychometric quality of the survey:
  • Using traditional methods, estimated inter-item consistency (Cronbach's alpha) and inter-rater agreement (ICC(2,k))
  • Using a Rasch model, confirmed that the rating scales function as intended
*The performance checklist is a separate/different process.
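For orientation, here is a minimal sketch (not the authors' analysis code) of the two "traditional" indices named above, computed with plain numpy. The ratings matrix is hypothetical: 5 expert raters scoring 7 survey items on a 1-5 scale.

```python
import numpy as np

def cronbach_alpha(x):
    """Inter-item consistency. x: (n_respondents, n_items)."""
    x = np.asarray(x, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = x.sum(axis=1).var(ddof=1)     # variance of summed scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

def icc_2k(x):
    """Inter-rater agreement, ICC(2,k) of Shrout & Fleiss (1979):
    two-way random effects, average of k raters. x: (n_targets, k_raters)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_targets = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_raters = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_targets - ss_raters
    bms = ss_targets / (n - 1)                # between-targets mean square
    jms = ss_raters / (k - 1)                 # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))      # residual mean square
    return (bms - ems) / (bms + (jms - ems) / n)

# Hypothetical data: rows = 5 raters, columns = 7 survey items (1-5 scale)
ratings = np.array([[4, 4, 5, 3, 4, 4, 5],
                    [4, 3, 5, 3, 4, 3, 4],
                    [5, 4, 5, 4, 4, 4, 5],
                    [3, 3, 4, 3, 3, 3, 4],
                    [4, 4, 5, 3, 4, 4, 4]])

print(f"Cronbach's alpha: {cronbach_alpha(ratings):.2f}")  # raters as respondents
print(f"ICC(2,k):         {icc_2k(ratings.T):.2f}")        # items as targets
```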
Results: Domain mean ratings by site
[Bar chart: domain mean ratings (0-4.5 scale) at the three sites: UM, HF, WS]
Combined mean ratings across the five domains: 3.4, 3.3, 3.9, 3.3, 2.4
"This simulator requires minor adjustments before it can be considered for use in ventriculostomy placement training."
Results: Mean ratings by item
[Chart of mean ratings by item not reproduced]
Paper 1*: Test Content-Checklist
Expert instructors rated each proposed checklist item on a 4-point scale:
1 = Definitely do not include this task
2 = Not sure if this task should be included
3 = Pretty sure this task should be included
4 = Definitely include this task
Proposed items:
• Position head and mark midline
• Locate Kocher's point (10.5 cm posterior to the nasion and 3 cm lateral to midline)
• Mark incision (approximately 2 cm long in a parasagittal location)
• Incise, clear tissue off cranium, retract scalp
• …
• Suture wound (staples or a 3-0 running nylon or prolene suture)
*shoulda, coulda, woulda
Ask expert instructors about the value of the included steps (items) for measuring X while doing Y.
• A reasonable number of experts is ~3
What else do you ask about?
• Clarity of each item
• Appropriateness of qualifiers (use X instrument, at X location)
• The rating scale
• Missing steps
• Objective measures to include (e.g., "time to" measures)
One way to summarize the resulting expert ratings is sketched below.
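The deck does not prescribe an analysis for the 4-point expert form; a minimal sketch, assuming invented ratings, of one common summary: each step's mean rating plus the share of experts endorsing it (rating 3 or 4). Step names are from the form above; the values are hypothetical.

```python
import numpy as np

steps = ["Position head and mark midline",
         "Locate Kocher's point",
         "Mark incision"]
# rows = proposed steps, columns = the ~3 expert instructors (values invented)
ratings = np.array([[4, 4, 3],
                    [4, 4, 4],
                    [3, 2, 4]])

for step, row in zip(steps, ratings):
    endorsed = (row >= 3).mean()   # proportion rating the step 3 or 4
    print(f"{step}: mean = {row.mean():.2f}, endorsed by {endorsed:.0%}")
```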
Next Step: Deeper Evaluation (Paper 2)
• Paper 1: Test content (measures & simulator), before implementation
• Paper 2: Quality of performance measures, before or after implementation
• Paper 3: Impact on performance and/or patient outcomes, after full implementation
• Evaluation of all validity evidence of the performance measures [à la the Standards]
• Capture a broader (regional/national) sample of performance data via videotaped performances
  • Ideal N (+++) and range of experience
• Compare measures from the novel performance checklist and a gold standard (e.g., OSATS) (relationship with other variables)
• Set/test performance standards (if appropriate)
Rooney DM, Tai BL, Sagher O, Shih A, Wilkinson DA, Savastano L. A simulator and two tools: Validation of performance measures from a novel neurosurgery simulator using the current Standards framework. Surgery (submitted 3/15).
Paper 2: Study design
• Nationally-recognized training program sponsored by the Society of Neurological Surgeons
• A total of n=14 (11 trainees*, 3 attendings) performed ventriculostomy on the simulator
• All performances were video-captured and scored by 3 raters using the novel checklist and a modified version of the OSATS
*First-year neurosurgery fellows
Checklist: Ventriculostomy Procedural Assessment Tool (V-PAT)
Modified Objective Structured Assessment of Technical Skills: m-OSATS
Martin JA, Regehr G, Reznick R, MacRae H, Murnaghan J, Hutchison C, et al. (1997). Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg, 84, 273-278.
Paper 2: Evidence examined
Examined evidence from 5 sources, but packaged a bit differently:
Measures adequately reflect ventriculostomy performance "quality" (relationships to other variables)
• Trainee vs. expert ratings
• Correlation of summed V-PAT scores with summed OSATS scores
Measures are psychometrically sound, i.e., adequate "quality control" (psychometric function of V-PAT & OSATS measures)
• Response processes: Rasch indices → rating scale function
• Test content: Rasch item point-measure correlations, item fit (variability)
• Internal structure: inter-item consistency (Cronbach's α), inter-rater agreement (ICC(2,k))
Measures are free from rater bias (consequences of testing)
• Evaluated Rasch bias indices to identify potential rating differences at the rater level
Results: "Quality control"
• Response processes: Rasch indices (average measures, fit statistics, and Rasch-Andrich thresholds) indicated all rating scales for both V-PAT & OSATS were well-functioning
• Test content: point-measure correlations all positive, [.39, .81]; Rasch item Outfit MS all < 2.0
• Internal structure: Cronbach's α = 0.95 for both instruments; intraclass correlations: V-PAT [-0.33, 0.93], OSATS [0.80, 0.93]
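A minimal sketch of the point-measure correlation idea, with invented data: each item's scores are correlated with examinees' total scores. (Rasch software such as Winsteps correlates item observations against person measures; totals are used here only as a rough stand-in.)

```python
import numpy as np

def point_measure_corr(scores):
    """scores: (n_persons, n_items). Returns one correlation per item."""
    s = np.asarray(scores, dtype=float)
    totals = s.sum(axis=1)                    # crude proxy for person measures
    return np.array([np.corrcoef(s[:, j], totals)[0, 1]
                     for j in range(s.shape[1])])

# Hypothetical data: 6 examinees x 4 items on a 1-5 rating scale
scores = np.array([[5, 4, 5, 4],
                   [3, 3, 2, 3],
                   [4, 4, 4, 5],
                   [2, 1, 2, 2],
                   [4, 3, 4, 4],
                   [3, 2, 3, 3]])

print(point_measure_corr(scores).round(2))   # negative values flag misfit items
```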
Results: Relationship to Other Variables (V-PAT)
Do V-PAT measures adequately differentiate trainee and expert performances?

Instrument/item | Resident observed average (SE) | Attending observed average (SE) | P-value | ICC(2,k)
V-PAT
1. Position head and mark midline | 3.75 (.20) | 4.00 (.47) | 0.54 | *
2. Locate Kocher's point | 3.84 (.18) | 4.11 (.34) | 0.52 | .86
3. Mark an incision approximately 2 cm long in a parasagittal location | 3.61 (.16) | 4.00 (.32) | 0.42 | .10
4. Select drain exit site from the scalp | 2.94 (.17) | 3.71 (.50) | 0.22 | *
5. Incise, clear tissue off cranium, retract scalp | 3.56 (.14) | 4.11 (.34) | 0.20 | .83
6. Set drill stop and drill trephine | 2.80 (.16) | 3.78 (.37) | 0.08 | .70
7. Confirm dura and pierce with 18g spinal needle or 11 blade scalpel | 2.98 (.17) | 3.67 (.35) | 0.25 | .76
8. Confirm landmarks and place catheter to 6-7 cm from outer table of skull | 2.88 (.17) | 3.33 (.35) | 0.24 | .91
9. Confirm CSF flow | 3.41 (.13) | 3.67 (.30) | 0.39 | .73
10. Remove trocar cover, tunnel trocar to exit site and recap trocar | 2.78 (.14) | 3.00 (.47) | 0.62 | -.33
11. Place purse string suture at the scalp exit site to anchor the catheter | 3.00 (.36) | 4.25 (.69) | 0.26 | *
Overall average | 3.30 (.06) | 3.80 (.11) | 0.01 | –
OSATS
1. Respect for Tissue | 2.66 (.21) | 4.11 (.34) | 0.004 | .93
2. Time and Motion | 2.42 (.22) | 4.00 (.43) | 0.005 | .85
3. Instrument Handling | 2.51 (.22) | 4.00 (.46) | 0.007 | .86
4. Knowledge of Instruments | 2.36 (.21) | 4.33 (.43) | 0.002 | .84
5. Flow of Operation | 2.36 (.21) | 4.22 (.45) | 0.001 | .85
6. Knowledge of specific procedure | 2.33 (.23) | 4.33 (.46) | 0.001 | .80
Overall average | 2.32 (.08) | 3.73 (.15) | 0.001 | –
*Too few cases to estimate
Results: Relationship to Other Variables (m-OSATS)
Do m-OSATS measures adequately differentiate trainee and expert performances? (See the table above.)
• Correlation of summed V-PAT scores with summed m-OSATS scores: Pearson's r = 0.72, p = 0.001
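A minimal sketch of the summed-score correlation, with invented scores; the reported r = 0.72 came from the study's actual data (n = 14), not from these values.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical per-participant summed scores (n = 14; values invented)
vpat_sums  = np.array([33, 36, 30, 38, 35, 29, 41, 37, 32, 34, 31, 42, 40, 39])
osats_sums = np.array([14, 17, 13, 20, 16, 12, 25, 18, 15, 16, 13, 26, 24, 22])

r, p = pearsonr(vpat_sums, osats_sums)       # linear association of the two tools
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```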
Results: Consequences of Testing
Are measures free from rater bias?

Instrument/rater | Observed average (SE)
V-PAT
1. Rater 1 (LS) | 3.60 (.11)
2. Rater 2 (OS) | 3.60 (.10)
3. Rater 3 (DW) | 3.60 (.11)
4. Participants (variable)† | 2.90 (.11)
Overall average | 3.60 (–)
OSATS
1. Rater 1 (LS)* | 2.10 (.17)
2. Rater 2 (OS) | 3.30 (.15)
3. Rater 3 (DW) | 3.00 (.15)
Overall average | 2.80 (–)
†Comparison with 3 expert raters, p = 0.01
*Comparison with 2 expert raters, p = 0.01
Are OSATS measures free from rater bias?
[Chart: Bias interaction (rater × item): observed averages (0-5) by OSATS item (Q1-Q6) for Rater 1 (LS), Rater 2 (OS), and Rater 3 (DW)]
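One simple way to screen for the rater effect shown above is to compare each rater's mean across the same set of performances. This is a minimal sketch with invented ratings, not the authors' Rasch bias analysis, which models rater-by-item interactions; a one-way ANOVA is only a crude first check (a repeated-measures model would respect that all raters scored the same videos).

```python
import numpy as np
from scipy.stats import f_oneway

# Hypothetical OSATS item ratings (1-5) from three raters scoring the
# same videotaped performances; rater 1 is deliberately "hawkish" here.
rater1 = np.array([2, 2, 3, 1, 2, 3, 2, 2])
rater2 = np.array([3, 4, 3, 3, 4, 3, 3, 4])
rater3 = np.array([3, 3, 3, 2, 3, 4, 3, 3])

F, p = f_oneway(rater1, rater2, rater3)
print(f"Rater means: {rater1.mean():.2f}, {rater2.mean():.2f}, {rater3.mean():.2f}")
print(f"One-way ANOVA across raters: F = {F:.2f}, p = {p:.3f}")
```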
Summary of Results

Group | Evidence | Inference | V-PAT | OSATS
Quality control | Response processes | Adequate rating scale function | √ | √
Quality control | Test content | Items align with construct | √ | √
Quality control | Internal structure | Inter-item consistency, inter-rater agreement | X* | √
Test of assumptions | Rel. to other variables | Measures differentiate novice/expert performances | X | √
Test of assumptions | Rel. to other variables | V-PAT/OSATS summed scores correlate | √ | √
Test of assumptions | Consequences of testing | V-PAT/OSATS measures are bias-free | √ | X
X = challenges that require resolution
Validity: à la 2015
• Evidence is important, but the interpretive argument is critical
• The content of the interpretive argument determines the kinds of evidence that are most relevant (most important) in validation
• Strategy: develop the interpretive argument based on
  • Validity evidence relevant to inferences
  • Assumptions
  • Challenges (alternative interpretations)
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association. Available for purchase via http://teststandards.org/
Challenges: Potential threats to validity (V-PAT)
Problematic inter-rater agreement (ICC) for 5 items should be resolved:
• Item 3 (Mark an incision approximately 2 cm long in a parasagittal location), ICC = .10
• Item 10 (Remove trocar cover, tunnel trocar to exit site and recap trocar), ICC = -.33
• Item 1 (Position head and mark midline)*
• Item 4 (Select drain exit site from the scalp)*
• Item 11 (Place purse string suture at the scalp exit site to anchor the catheter)*
*ICC incalculable
Remedy: examine and refine items to ensure they align with simulator capabilities and are mutually exclusive.
Challenges: Potential threats to validity (m-OSATS)
• "Hawkish" OSATS ratings by one expert rater require follow-up
Remedy: refine items; add rater training on the scoring rubric and administration standards.
Next Step: Evaluation of Impact (Paper 3)
• Paper 1: Test content (measures & simulator), before implementation
• Paper 2: Quality of performance measures, before or after implementation
• Paper 3: Impact on performance and/or patient outcomes, after full implementation
• Evaluation of impact on trainees' clinical performance or patient outcomes [relationship with other variables]
• Examine:
  • Change in trainees' clinical performance (checklist ratings; objective measures such as "time to," length of stay, adverse events)
  • Impact on hospital costs
Barsuk JH, Cohen ER, Feinglass J, Kozmic SE, McGaghie WC, Ganger D, Wayne DB. Cost savings of performing paracentesis procedures at the bedside after simulation-based education. Simul Healthc. 2014 Oct;9(5):312-8.
Considerations
• The validation process is fluid, iterative, and ongoing
• It takes a team:
  • Development (clinicians, instructors, engineers, research assistants)
  • Outcomes (+QI, hospital info)
• There is funding:
  • AHRQ
  • PCORI
  • Blue Cross Blue Shield of Michigan