TRANSCRIPT
Moving beyond the psychometric discourse: A model for programmatic assessment
Researching Medical Education
Association for the Study of Medical Education
London, 23 November 2010
Cees van der Vleuten
Maastricht University
The Netherlands
PowerPoint: www.fdg.unimaas.nl/educ/cees/asme
The first step is to measure whatever can be easily measured. This is OK as far as it goes.
The second step is to disregard that which can't be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading.
The third step is to presume that what can't be measured easily really isn't important. This is blindness.
The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.
—Charles Handy, The Empty Raincoat, page 219.
The McNamara fallacy refers to Robert McNamara, US Secretary of Defense from 1961 to 1968, and his belief that the body count was a good way of measuring how the Vietnam War was going: as long as more Viet Cong were being killed than US forces, the war was being won.
Programmatic assessment: a planned arrangement of individual assessments in a learning program.
Quality compromises are made for individual methods/data points, but not for the program as a whole.
Programmatic assessment is fit for purpose:
◦ Assessment for learning
◦ Robust decision making over learners' performance
◦ Improving the learned curriculum
Proposed in 2005 (Van der Vleuten & Schuwirth, 2005)
A multitude of quality criteria and a self-assessment instrument for program quality (Baartman et al., 2006, 2007)
A design framework (Dijkstra et al., 2009) and design guidelines (Dijkstra et al., in preparation)
Still needed: a theoretical model of programmatic assessment for a program in action
Any point measurement or single data point of assessment is flawed
Standardized methods of assessment can have 'built-in' validity
◦ Quality control around test construction and administration
◦ Assessment 'technology' is available
In unstandardized methods, validity lies more in the users of the instruments than in the instruments themselves
¹ Theoretical/empirical account: Van der Vleuten, C. P., Schuwirth, L. W., Scheele, F., Driessen, E. W., & Hodges, B. (2010). The assessment of professional competence: building blocks for theory development. Best Pract Res Clin Obstet Gynaecol, 24, 703-719.
Assessment drives learning
◦ Requires richness or meaningfulness of data
◦ Qualitative, narrative information carries a lot of weight
◦ Theoretical understanding is emerging (Cilliers et al., 2010, under editorial review)
Stakes (from formative to summative assessment) form a continuum
The number of data points needs to be proportional to the stakes
Expert judgment is imperative for assessing complex competencies and when diverse information is to be combined
◦ Sampling strategies can reduce random error (see the sketch below)
◦ Procedural strategies can reduce bias
¹ Van der Vleuten et al., 2010
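A minimal sketch of the sampling point above, assuming a simple true-score-plus-noise model (the learner's true score, error spread, and sample sizes are all hypothetical, not from the talk): aggregating more flawed data points shrinks the random error of the overall judgment by roughly the square root of the number of points.

```python
import random
import statistics

# True-score-plus-noise sketch: every data point is flawed, but averaging
# n of them shrinks the standard error by roughly sqrt(n), which is why
# higher-stakes decisions should rest on more data points.

TRUE_SCORE = 70.0  # hypothetical learner's "true" performance
ERROR_SD = 10.0    # hypothetical measurement error of one data point

def observe() -> float:
    """One flawed data point: true score plus random measurement error."""
    return random.gauss(TRUE_SCORE, ERROR_SD)

for n in (1, 5, 20, 80):
    # Spread of the aggregated score across 2000 simulated portfolios
    means = [statistics.mean(observe() for _ in range(n)) for _ in range(2000)]
    print(f"n={n:3d} data points -> standard error ~ {statistics.stdev(means):.2f}")
```

With these numbers the standard error falls from about 10 for a single data point to about 1 for eighty of them, which is the quantitative rationale for tying the number of data points to the stakes.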
[Diagram: a training program timeline with three parallel strands of activities: assessment activities, training activities, and supporting activities]
Artifacts of learning
- Outcome artifacts: products of learning tasks
- Process artifacts: learning or working activities
Learning task
- PBL case
- Patient encounter
- Operation
- Project
- Lecture
- Self-study
Individual data points of assessment
- Fit for purpose
- Multiple/all levels of Miller's pyramid
- Learning oriented: information-rich, meaningful documentation (quantitative and qualitative)
- Low stake
Certification of mastery-oriented learning tasks
- Resuscitation
- Normal delivery of an infant
Supportive social interaction
- Coaching/mentoring/supervision
- Peer interaction (intervision)
(P)Reflective activity by the learner
- Interpretation of feedback
- Planning new learning objectives and tasks
Intermediate evaluation
- Aggregate information held against a performance standard (see the sketch below)
- Committee of examiners
- Decision making: diagnostic, therapeutic, prognostic
- Remediation oriented, not repetition oriented
- Informative
- Longitudinal
- Intermediate stake
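A minimal sketch of the aggregation step, assuming hypothetical competency domains, scores, and a performance standard (none of these details are from the talk): quantitative scores are pooled per domain and held against the standard, while the narrative notes stay attached for the committee of examiners.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical low-stakes data points: (competency domain, score 0-100, narrative)
data_points = [
    ("communication", 78, "clear explanation of the treatment plan"),
    ("communication", 64, "rushed closing of the consultation"),
    ("clinical reasoning", 81, "thorough differential diagnosis"),
    ("clinical reasoning", 85, "good use of test results"),
]

STANDARD = 70  # hypothetical performance standard

by_domain = defaultdict(list)
for domain, score, narrative in data_points:
    by_domain[domain].append((score, narrative))

for domain, points in by_domain.items():
    avg = mean(score for score, _ in points)
    verdict = "on track" if avg >= STANDARD else "plan remediation"
    print(f"{domain}: mean {avg:.0f} over {len(points)} points -> {verdict}")
    for _, narrative in points:
        print(f"  note: {narrative}")  # narrative information kept for the committee
```

The sketch only shows the shape of the decision: aggregate first, decide against a standard, and keep the rich qualitative information available rather than reducing everything to one number.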
Firewall dilemma
- The dilemma between giving evaluators access to rich information and compromising the relationship between the supporting person(s) and the learner
Final evaluation
- Aggregate information held against a performance standard
- Committee of examiners
- High-stakes pass/fail(/distinction) decision
- Based on many data points and rich information
- Decision trustworthiness optimized through procedural measures, inspired by strategies from qualitative methodology
Criterion        Quantitative approach   Qualitative approach
Truth value      Internal validity       Credibility
Applicability    External validity       Transferability
Consistency      Reliability             Dependability
Neutrality       Objectivity             Confirmability
Criteria          Strategy to establish trustworthiness   Potential assessment strategy (sample)
Credibility       Prolonged engagement                    Training of examiners
                  Triangulation                           Tailored volume of expert judgment based on certainty of information
                  Peer examination                        Benchmarking examiners
                  Member checking                         Incorporate learner view
                  Structural coherence                    Scrutiny of committee inconsistencies
Transferability   Time sampling                           Judgment based on a broad sample of data points
                  Thick description                       Justify decisions
Dependability     Stepwise replication                    Use multiple assessors who have credibility
Confirmability    Audit                                   Give learners the possibility to appeal the assessment decision
Resources/cost (do fewer things well rather than doing more but poorly)
Bureaucracy, trivialization, reductionism
Legal restrictions
Novelty/the unknown
Assessment for learning combined with rigorous decision making
Post-psychometric era of individual instruments
Theory-driven assessment design
Infinite research opportunities:
◦ Multiple formalized models of assessment (e.g. psychometrics, Bayesian approaches to information gathering, new conceptions of validity, ...); a toy sketch follows after this list
◦ Judgment (bias, expertise, learning, ...)
◦ How and why learning is facilitated (a theory of assessment driving learning)
◦ ...
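As a toy illustration of the 'Bayesian approaches to information gathering' listed above, assuming a Beta-Bernoulli model with hypothetical observations (this is my sketch of the general idea, not a method proposed in the talk):

```python
# Beta-Bernoulli sketch (hypothetical numbers): updating the belief that a
# learner performs adequately as low-stakes data points accumulate.

alpha, beta = 1.0, 1.0          # uniform prior over "proportion adequate"
observations = [1, 1, 0, 1, 1]  # hypothetical data points (1 = adequate, 0 = not)

for i, ok in enumerate(observations, start=1):
    alpha += ok                  # conjugate update: count adequate outcomes...
    beta += 1 - ok               # ...and inadequate ones
    posterior_mean = alpha / (alpha + beta)
    print(f"after data point {i}: estimated P(adequate) = {posterior_mean:.2f}")
```

Each new data point shifts the estimate, and the certainty of the accumulated information could in principle steer how much further expert judgment is gathered, which connects to the 'tailored volume of expert judgment' strategy in the trustworthiness table above.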