TRANSCRIPT
Moving beyond the psychometric discourse: A model for programmatic assessment
Researching Medical Education
Association for the Study of Medical Education
London, 23 November 2010
Cees van der Vleuten
Maastricht University
The Netherlands
PowerPoint: www.fdg.unimaas.nl/educ/cees/asme
The first step is to measure whatever can be easily measured. This is OK as far as it goes.
The second step is to disregard that which can't be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading.
The third step is to presume that what can't be measured easily really isn't important. This is blindness.
The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.
—Charles Handy, The Empty Raincoat, page 219.
The McNamara fallacy refers to Robert McNamara, US Secretary of Defense from 1961 to 1968, and his belief that the body count was a good way of measuring how the Vietnam War was going: as long as more Viet Cong were being killed than US forces, the war was being won.
Programmatic assessment: a planned arrangement of individual assessments in a learning program.
Quality compromises are made for individual methods/data points, but not for the program as a whole.
Programmatic assessment is fit for purpose:
◦ Assessment for learning
◦ Robust decision making over learners' performance
◦ Improving the learned curriculum
Proposed in 2005 (Van der Vleuten & Schuwirth, 2005)
A multitude of quality criteria and a self-assessment instrument for program quality (Baartman et al., 2006, 2007)
A design framework (Dijkstra et al., 2009) and design guidelines (Dijkstra et al., in preparation)
Still needed: a theoretical model of programmatic assessment for a program in action
Any point measurement or single data point of assessment is flawed
Standardized methods of assessment can have 'built-in' validity
◦ Quality control around test construction and administration
◦ Assessment 'technology' is available
In unstandardized methods, validity lies more in the users of the instruments than in the instruments themselves
¹ Theoretical/empirical account: Van der Vleuten, C. P., Schuwirth, L. W., Scheele, F., Driessen, E. W., & Hodges, B. (2010). The assessment of professional competence: building blocks for theory development. Best Pract Res Clin Obstet Gynaecol, 24, 703-719.
Assessment drives learning
◦ Requires richness or meaningfulness of data
◦ Qualitative, narrative information carries a lot of weight
◦ Theoretical understanding is emerging (Cilliers et al., 2010, under editorial review)
Stakes (from formative to summative assessment) form a continuum
The number of data points needs to be proportional to the stakes
Expert judgment is imperative for assessing complex competencies and when diverse information is to be combined
◦ Sampling strategies can reduce random error (see the sketch below)
◦ Procedural strategies can reduce bias
¹ Van der Vleuten et al., 2010
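A minimal sketch of the sampling point above, assuming a simple true-score-plus-noise model (the learner's true score, error spread, and sample sizes are all hypothetical, not from the talk): aggregating more flawed data points shrinks the random error of the overall judgment by roughly the square root of the number of points.

```python
import random
import statistics

# True-score-plus-noise sketch: every data point is flawed, but averaging
# n of them shrinks the standard error by roughly sqrt(n), which is why
# higher-stakes decisions should rest on more data points.

TRUE_SCORE = 70.0  # hypothetical learner's "true" performance
ERROR_SD = 10.0    # hypothetical measurement error of one data point

def observe() -> float:
    """One flawed data point: true score plus random measurement error."""
    return random.gauss(TRUE_SCORE, ERROR_SD)

for n in (1, 5, 20, 80):
    # Spread of the aggregated score across 2000 simulated portfolios
    means = [statistics.mean(observe() for _ in range(n)) for _ in range(2000)]
    print(f"n={n:3d} data points -> standard error ~ {statistics.stdev(means):.2f}")
```

With these numbers the standard error falls from about 10 for a single data point to about 1 for eighty of them, which is the quantitative rationale for tying the number of data points to the stakes.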
[Diagram: a training program timeline with three parallel strands of activities: assessment activities, training activities, and supporting activities]
Artifacts of learning
- Outcome artifacts: products of learning tasks
- Process artifacts: learning or working activities
Learning task
- PBL case
- Patient encounter
- Operation
- Project
- Lecture
- Self-study
Individual data points of assessment
- Fit for purpose
- Multiple/all levels of Miller's pyramid
- Learning oriented: information-rich, meaningful documentation (quantitative and qualitative)
- Low stake
Certification of mastery-oriented learning tasks
- Resuscitation
- Normal delivery of an infant
Supportive social interaction
- Coaching/mentoring/supervision
- Peer interaction (intervision)
(P)Reflective activity by the learner
- Interpretation of feedback
- Planning new learning objectives and tasks
Intermediate evaluation
- Aggregate information held against a performance standard (see the sketch below)
- Committee of examiners
- Decision making: diagnostic, therapeutic, prognostic
- Remediation oriented, not repetition oriented
- Informative
- Longitudinal
- Intermediate stake
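A minimal sketch of the aggregation step, assuming hypothetical competency domains, scores, and a performance standard (none of these details are from the talk): quantitative scores are pooled per domain and held against the standard, while the narrative notes stay attached for the committee of examiners.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical low-stakes data points: (competency domain, score 0-100, narrative)
data_points = [
    ("communication", 78, "clear explanation of the treatment plan"),
    ("communication", 64, "rushed closing of the consultation"),
    ("clinical reasoning", 81, "thorough differential diagnosis"),
    ("clinical reasoning", 85, "good use of test results"),
]

STANDARD = 70  # hypothetical performance standard

by_domain = defaultdict(list)
for domain, score, narrative in data_points:
    by_domain[domain].append((score, narrative))

for domain, points in by_domain.items():
    avg = mean(score for score, _ in points)
    verdict = "on track" if avg >= STANDARD else "plan remediation"
    print(f"{domain}: mean {avg:.0f} over {len(points)} points -> {verdict}")
    for _, narrative in points:
        print(f"  note: {narrative}")  # narrative information kept for the committee
```

The sketch only shows the shape of the decision: aggregate first, decide against a standard, and keep the rich qualitative information available rather than reducing everything to one number.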
Firewall dilemma
- The dilemma between giving evaluators access to rich information and compromising the relationship between the supporting person(s) and the learner
Final evaluation
- Aggregate information held against a performance standard
- Committee of examiners
- High-stakes pass/fail(/distinction) decision
- Based on many data points and rich information
- Decision trustworthiness optimized through procedural measures, inspired by strategies from qualitative methodology
Criterion        Quantitative approach   Qualitative approach
Truth value      Internal validity       Credibility
Applicability    External validity       Transferability
Consistency      Reliability             Dependability
Neutrality       Objectivity             Confirmability
Criteria          Strategy to establish trustworthiness   Potential assessment strategy (sample)
Credibility       Prolonged engagement                    Training of examiners
                  Triangulation                           Tailored volume of expert judgment based on certainty of information
                  Peer examination                        Benchmarking examiners
                  Member checking                         Incorporate learner view
                  Structural coherence                    Scrutiny of committee inconsistencies
Transferability   Time sampling                           Judgment based on a broad sample of data points
                  Thick description                       Justify decisions
Dependability     Stepwise replication                    Use multiple assessors who have credibility
Confirmability    Audit                                   Give learners the possibility to appeal the assessment decision
Resources/cost (do fewer things well rather than doing more but poorly)
Bureaucracy, trivialization, reductionism
Legal restrictions
Novelty/the unknown
Assessment for learning combined with rigorous decision making
Post-psychometric era of individual instruments
Theory-driven assessment design
Infinite research opportunities:
◦ Multiple formalized models of assessment (e.g. psychometrics, Bayesian approaches to information gathering, new conceptions of validity, ...); a toy sketch follows after this list
◦ Judgment (bias, expertise, learning, ...)
◦ How and why learning is facilitated (a theory of assessment driving learning)
◦ ...
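As a toy illustration of the 'Bayesian approaches to information gathering' listed above, assuming a Beta-Bernoulli model with hypothetical observations (this is my sketch of the general idea, not a method proposed in the talk):

```python
# Beta-Bernoulli sketch (hypothetical numbers): updating the belief that a
# learner performs adequately as low-stakes data points accumulate.

alpha, beta = 1.0, 1.0          # uniform prior over "proportion adequate"
observations = [1, 1, 0, 1, 1]  # hypothetical data points (1 = adequate, 0 = not)

for i, ok in enumerate(observations, start=1):
    alpha += ok                  # conjugate update: count adequate outcomes...
    beta += 1 - ok               # ...and inadequate ones
    posterior_mean = alpha / (alpha + beta)
    print(f"after data point {i}: estimated P(adequate) = {posterior_mean:.2f}")
```

Each new data point shifts the estimate, and the certainty of the accumulated information could in principle steer how much further expert judgment is gathered, which connects to the 'tailored volume of expert judgment' strategy in the trustworthiness table above.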