
Page 1: Debbie Jaarsma

European College of Veterinary Internal Medicine Congress 2012

Debbie Jaarsma ([email protected]), Academic Medical Centre, University of Amsterdam, The Netherlands

With many thanks to prof. dr. Cees van der Vleuten, Maastricht University

General principles in assessment of professional competence

Page 2: Debbie Jaarsma

SHORT BIOGRAPHY

• VETERINARIAN BY TRAINING
• SPECIALIST TRAINING PATHOLOGY AT FACULTY OF VETERINARY MEDICINE, UTRECHT UNIVERSITY (FVMU)
• TEACHER, UNIVERSITY OF APPLIED ANIMAL SCIENCES AT DEN BOSCH
• SALES MANAGER, PHARMACEUTICAL INDUSTRY (JOHNSON & JOHNSON)
• PhD IN VETERINARY EDUCATION AT FVMU*
• ASSISTANT PROFESSOR ‘QUALITY IMPROVEMENT VETERINARY EDUCATION’ AT FVMU
• FULL PROFESSOR ‘EVIDENCE BASED EDUCATION’ AT ACADEMIC MEDICAL CENTRE, UNIVERSITY OF AMSTERDAM

* Dissertation title: Developments in Veterinary Medical Education – Intentions, Perceptions, Learning Processes and Outcomes

(http://igitur-archive.library.uu.nl/dissertations/2008-1014-200404/UUindex.html)

Page 3: Debbie Jaarsma

Welcome everyone!

• Are you familiar with assessment literature?

A. Not at all

B. To some extent

C. Pretty familiar

D. As my backyard

Page 4: Debbie Jaarsma

Overview of presentation

• General principles of assessment
• Implications for practice
• Criteria for good assessment
• Final note

Page 5: Debbie Jaarsma

Assessment:

“involves testing, measuring, collecting and combining information, and providing feedback”

• Drives and stimulates learning
• Provides information on the educational efficacy of institutions/teachers
• Protects patients and society

Page 6: Debbie Jaarsma

Simple competence model

Miller GE. The assessment of clinical skills/competence/performance. Academic Medicine (Supplement) 1990; 65: S63-S7.

[Figure: Miller's pyramid – Knows, Knows how, Shows how, Does (bottom to top); professional authenticity increases toward 'Does'; the lower levels concern cognition, the upper levels behaviour]

Page 7: Debbie Jaarsma

Simple competence model

Miller GE. The assessment of clinical skills/competence/performance. Academic Medicine (Supplement) 1990; 65: S63-S7.

[Figure: the same pyramid (Knows, Knows how, Shows how, Does, ordered by increasing professional authenticity), annotated 'established technology' for the lower levels and 'under construction' for assessment of 'Does']

Page 8: Debbie Jaarsma

Assessment formats used

• Knows – Stimulus format: fact oriented. Response format: written, open, computer-based, oral
• Knows how – Stimulus format: (patient) scenario, simulation. Response format: written, open, oral, computer-based
• Shows how – Stimulus format: hands-on (patient) standardized scenario or simulation. Response format: direct observation, checklists, rating scales
• Does – Stimulus format: habitual practice performance. Response format: direct observation, checklists, rating scales, narratives

Page 9: Debbie Jaarsma

What do you think?

• Clinical reasoning, particularly with experts, is:

A. More generic than context specific

B. More context specific than generic

C. Equally specific and generic

Page 10: Debbie Jaarsma

Assessment principle 1

• Competence is specific, not generic

Page 11: Debbie Jaarsma

Reliability as a function of testing time

Method                          1 h     2 h     4 h     8 h
MCQ (1)                         0.62    0.76    0.93    0.93
Case-Based Short Essay (2)      0.68    0.73    0.84    0.82
PMP (1)                         0.36    0.53    0.69    0.82
Oral Exam (3)                   0.50    0.69    0.82    0.90
Long Case (4)                   0.60    0.75    0.86    0.90
OSCE (5)                        0.47    0.64    0.78    0.88
Mini-CEX (6)                    0.73    0.84    0.92    0.96
Practice Video Assessment (7)   0.62    0.76    0.93    0.93

(1) Norcini et al., 1985; (2) Stalenhoef-Halling et al., 1990; (3) Swanson, 1987; (4) Wass et al., 2001; (5) Petrusa, 2002; (6) Norcini et al., 1999; (7) Ram et al., 1999; (8) Gorter, 2002

Competence is not generic
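Why testing time matters this much is essentially a sampling story: each extra hour samples more cases and items, so case-specific noise averages out. A compact way to see this is the Spearman-Brown formula, ρ_k = k·ρ_1 / (1 + (k − 1)·ρ_1). The Python sketch below is purely illustrative and not part of the original slides: it takes the one-hour coefficients from the table as a starting point and projects longer tests under the simplifying assumption that extra testing time behaves like parallel extra sampling.

```python
def spearman_brown(rho_1: float, k: float) -> float:
    """Project reliability when sampling (testing time) is increased k-fold."""
    return k * rho_1 / (1 + (k - 1) * rho_1)


if __name__ == "__main__":
    # One-hour reliabilities taken from the table above.
    one_hour = {"MCQ": 0.62, "Oral Exam": 0.50, "OSCE": 0.47, "Mini-CEX": 0.73}

    for method, rho in one_hour.items():
        projected = [round(spearman_brown(rho, k), 2) for k in (1, 2, 4, 8)]
        print(f"{method:>10}: projected reliability at 1/2/4/8 h = {projected}")
```

For most rows these projections land close to the tabulated values, which is the point of the slide: sampling, not format, drives reliability.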

Page 12: Debbie Jaarsma

Practical implications

• Competence is specific, not generic
• One measure is no measure
• Increase sampling (across content, examiners, patients, …) within measures
• Combine information across measures and across time

Page 13: Debbie Jaarsma

What do you think?

• Multiple choice questions are objective and therefore more reliable.

A. True

B. False

Page 14: Debbie Jaarsma

Assessment principle 2

• Objectivity is not the same as reliability

Page 15: Debbie Jaarsma

Reliability as a function of testing time

Method                          1 h     2 h     4 h     8 h
MCQ (1)                         0.62    0.76    0.93    0.93
Case-Based Short Essay (2)      0.68    0.73    0.84    0.82
PMP (1)                         0.36    0.53    0.69    0.82
Oral Exam (3)                   0.50    0.69    0.82    0.90
Long Case (4)                   0.60    0.75    0.86    0.90
OSCE (5)                        0.47    0.64    0.78    0.88
Mini-CEX (6)                    0.73    0.84    0.92    0.96
Practice Video Assessment (7)   0.62    0.76    0.93    0.93

(1) Norcini et al., 1985; (2) Stalenhoef-Halling et al., 1990; (3) Swanson, 1987; (4) Wass et al., 2001; (5) Petrusa, 2002; (6) Norcini et al., 1999; (7) Ram et al., 1999; (8) Gorter, 2002

Objectivity is not the same as reliability

Page 16: Debbie Jaarsma

Reliability oral examination

Number of cases                     2       4       8       12
Testing time (hours)                1       2       4       8

Two new examiners for each case    0.61    0.76    0.86    0.93
New examiner for each case         0.50    0.69    0.82    0.90
Same examiner for all cases        0.31    0.47    0.47    0.48

Swanson, 1987
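The pattern in this table can be reasoned about with a simple generalizability-style projection: case-to-case variation shrinks as more cases are sampled, while rater idiosyncrasy only shrinks if independent examiners are sampled as well; with one examiner throughout, that component never averages out, which is why the bottom row plateaus. The sketch below is a hypothetical illustration of that logic; the function and the variance components are invented for illustration and are not Swanson's model or data.

```python
def g_coefficient(var_person: float, var_p_case: float, var_p_examiner: float,
                  n_cases: int, n_independent_examiners: int) -> float:
    """Illustrative generalizability ('D-study') projection.

    var_person              : variance between candidates (the signal)
    var_p_case              : person-by-case interaction (case specificity)
    var_p_examiner          : person-by-examiner interaction (rater idiosyncrasy)
    n_cases                 : number of cases sampled
    n_independent_examiners : number of independent examiner judgements sampled
                              (1 if the same examiner rates every case)
    """
    error = var_p_case / n_cases + var_p_examiner / n_independent_examiners
    return var_person / (var_person + error)


if __name__ == "__main__":
    # Invented variance components: case specificity dominates rater effects.
    vp, vpc, vpe = 1.0, 1.6, 0.4

    for n_cases in (2, 4, 8, 12):
        two_new = g_coefficient(vp, vpc, vpe, n_cases, 2 * n_cases)
        one_new = g_coefficient(vp, vpc, vpe, n_cases, n_cases)
        same = g_coefficient(vp, vpc, vpe, n_cases, 1)
        print(f"{n_cases:2d} cases: two new examiners/case {two_new:.2f}, "
              f"new examiner/case {one_new:.2f}, same examiner {same:.2f}")
```

The qualitative ordering matches the table: sampling more cases helps most, sampling more examiners helps a little, and re-using one examiner caps how far reliability can rise.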

Page 17: Debbie Jaarsma

Practical implications

• Objectivity is not the same as reliability
• Don’t trivialize assessment (and compromise on validity) with unnecessary objectification and standardization
• Don’t be afraid of holistic judgement
• Sample widely across sources of subjective influence (raters, examiners, patients)

Page 18: Debbie Jaarsma

What do you think?

• Which format measures ‘understanding’ best?

A. MCQs

B. Essay questions

C. Orals

D. All of the above

Page 19: Debbie Jaarsma

Assessment principle 3

• What is being measured is determined more by the stimulus format than by the response format.

Page 20: Debbie Jaarsma

Assessment formats used

• Knows – Stimulus format: fact oriented. Response format: written, open, computer-based, oral
• Knows how – Stimulus format: (patient) scenario, simulation. Response format: written, open, oral, computer-based
• Shows how – Stimulus format: hands-on (patient) standardized scenario or simulation. Response format: direct observation, checklists, rating scales
• Does – Stimulus format: habitual practice performance. Response format: direct observation, checklists, rating scales, narratives

Page 21: Debbie Jaarsma

Empirical findings

• Once reliable (= sufficient sampling), correlations across formats are very high
• Cognitive activities follow the task you pose in the stimulus format

Page 22: Debbie Jaarsma

Moving from assessing ‘knows’ (1)

Knows: What is arterial blood gas analysis most likely to show in dogs with cardiogenic shock?

A. Hypoxemia with normal pH
B. Metabolic acidosis
C. Metabolic alkalosis
D. Respiratory acidosis
E. Respiratory alkalosis

(1) Case, S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and clinical sciences. Philadelphia: National Board of Medical Examiners.

Page 23: Debbie Jaarsma

To assessing ‘knowing how’ (1)

Knowing how: A 7-year-old bitch is brought to the emergency department. She is restless and panting. On admission, her temperature is 37.7 °C, pulse is 120/min, and respirations are 40/min. During the next hour, she becomes increasingly stuporous, pulse increases to 140/min, and respirations increase to 60/min. Blood gas analysis is most likely to show:

A. Hypoxemia with normal pH
B. Metabolic acidosis
C. Metabolic alkalosis
D. Respiratory acidosis
E. Respiratory alkalosis

(1) Case, S. M., & Swanson, D. B. (2002). Constructing written test questions for the basic and clinical sciences. Philadelphia: National Board of Medical Examiners.

Page 24: Debbie Jaarsma

Practical implications

• What is being measured is determined more by the stimulus format than by the response format
• Don’t be married to a format (e.g. essays)
• Worry about improving the stimulus format
• Make the stimulus as (clinically) authentic as possible (e.g. in MCQs, OSCEs)

Page 25: Debbie Jaarsma

What do you think?

• The best strategy for constructing good test material is:

A. Training staff to write test material

B. Peer review of test material

Page 26: Debbie Jaarsma

Assessment principle 4

• Validity can be ‘built-in’

Page 27: Debbie Jaarsma

Empirical findings

• Validity is a matter of good quality assurance around item construction (Verhoeven et al., 1999)
• Generally, medical (and veterinary) schools can do a much better job (Jozefowicz et al., 2002)

Page 28: Debbie Jaarsma

Item review process

[Figure: item review cycle. Pre-test review: items from the disciplines (anatomy, physiology, internal medicine, surgery, psychology) enter an item pool and are screened by a review committee before test administration. Post-test review: item analyses and student comments feed back to the committee, with information going to users and accepted items stored in an item bank.]
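The 'item analyses' step in the post-test review usually means computing, for every item, a difficulty index (proportion of candidates answering correctly) and a discrimination index (how well the item separates high- from low-scoring candidates). The sketch below is a generic, minimal version of such an analysis and is not taken from the slides; the score matrix, the upper/lower group split and the function name are hypothetical.

```python
from statistics import mean


def item_analysis(responses: list[list[int]]) -> list[dict]:
    """Per-item difficulty (p-value) and discrimination (upper-lower group
    difference) for a 0/1 score matrix: one row per candidate, one column
    per item."""
    n_candidates = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    order = sorted(range(n_candidates), key=lambda i: totals[i])
    k = max(1, n_candidates // 3)            # size of the upper and lower groups
    lower, upper = order[:k], order[-k:]

    results = []
    for j in range(n_items):
        scores = [responses[i][j] for i in range(n_candidates)]
        difficulty = mean(scores)            # proportion answering correctly
        discrimination = (mean(responses[i][j] for i in upper)
                          - mean(responses[i][j] for i in lower))
        results.append({"item": j + 1,
                        "difficulty": round(difficulty, 2),
                        "discrimination": round(discrimination, 2)})
    return results


if __name__ == "__main__":
    # Hypothetical 0/1 scores: 6 candidates x 4 items.
    matrix = [[1, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1],
              [0, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]]
    for row in item_analysis(matrix):
        print(row)
```

Flagging items that nearly everyone (or no one) answers correctly, or that discriminate poorly or negatively, is the kind of evidence a review committee would feed back into the item bank.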

Page 29: Debbie Jaarsma

Practical implications

• Validity can be ‘built-in’
• Outcomes and content need to be clear and known to the item constructors AND the learner
• Assessment is only as good as the effort you are prepared to put into it
• Develop quality assurance cycles around test development
• Share (good) test material across institutions

Page 30: Debbie Jaarsma

What do you think?

• What drives students’ learning most?

A. The teacher

B. The curriculum

C. The assessment

Page 31: Debbie Jaarsma

Assessment principle 5

• Assessment drives learning

Page 32: Debbie Jaarsma

An alternative view

[Figure: the traditional view links curriculum, teacher, student and assessment; in the alternative view, assessment itself drives the learner.]

Assessment may drive learning through:
• Content
• Format
• Programming/scheduling
• Regulations
• …

Page 33: Debbie Jaarsma

Empirical findings

• The relationship between assessment and learning is complex
• Learning strategy is mediated by students’ perception of the assessment
• Summative assessment systems often drive learning in a negative way
• Formative feedback has a dramatic impact on learning
• Learners want feedback (more than grades)

Page 34: Debbie Jaarsma

Practical implications

• Assessment drives learning
• For every evaluative action there is an educational reaction
• Verify and monitor the impact of assessment: many intended effects do not materialise in practice
• No assessment without feedback!
• Embed the assessment within the learning programme
• Use the assessment strategically to reinforce desirable learning behaviours

Page 35: Debbie Jaarsma

What do you think?

• The best method of assessment is:

A. Vignette based MCQs

B. Orals

C. Portfolios

D. None of these

Page 36: Debbie Jaarsma

Assessment principle 6

• No single method can do it all

Page 37: Debbie Jaarsma

Empirical findings

• One measure is no measure
• All methods have limitations (no single superior method exists)
• Different methods may serve different functions
• In combination, information from various methods provides a richer picture and combines formative and summative functions

Page 38: Debbie Jaarsma

Practical implications

• No single method can do it all
• Use a cocktail of methods across the competency pyramid
• Arrange methods in a programme of assessment
• Any method may have utility
• Compare assessment design with curriculum design:
  – Responsible people/committees
  – Use an overarching structure
  – Involve your stakeholders
  – Implement, monitor and change (assessment programmes ‘wear out’)

Page 39: Debbie Jaarsma

Assessment principles

1. Competence is specific, not generic
2. Objectivity is not the same as reliability
3. What is being measured is determined more by the stimulus format than by the response format
4. Validity can be ‘built-in’
5. Assessment drives learning
6. No single method can do it all

Page 40: Debbie Jaarsma

Criteria for good assessment

• Validity or Coherence
• Reproducibility or Consistency
• Feasibility
• Educational effect
• Catalytic effect
• Acceptability

Theme group “Criteria for Good Assessment”, Norcini et al., Medical Teacher 2011

Page 41: Debbie Jaarsma

Finally

• Assessment in medical education has a rich history of research and development with clear practical implications
• Veterinary education is catching up …
• Assessment is much more than psychometrics; it involves educational design
• Lots of exciting developments still lie ahead of us!

Page 42: Debbie Jaarsma
Page 43: Debbie Jaarsma
Page 44: Debbie Jaarsma

Literature

• Cilliers, F. (In preparation). Assessment impacts on learning, you say? Please explain how. The impact of summative assessment on how medical students learn.
• Driessen, E., van Tartwijk, J., van der Vleuten, C., & Wass, V. (2007). Portfolios in medical education: why do they meet with mixed success? A systematic review. Med Educ, 41(12), 1224-1233.
• Driessen, E. W., Van der Vleuten, C. P. M., Schuwirth, L. W. T., Van Tartwijk, J., & Vermunt, J. D. (2005). The use of qualitative research criteria for portfolio assessment as an alternative to reliability evaluation: a case study. Medical Education, 39(2), 214-220.
• Dijkstra, J., Schuwirth, L., & Van der Vleuten, C. (In preparation). A model for designing assessment programmes.
• Eva, K. W., & Regehr, G. (2005). Self-assessment in the health professions: a reformulation and research agenda. Acad Med, 80(10 Suppl), S46-54.
• Gorter, S., Rethans, J. J., Van der Heijde, D., Scherpbier, A., Houben, H., Van der Vleuten, C., et al. (2002). Reproducibility of clinical performance assessment in practice using incognito standardized patients. Medical Education, 36(9), 827-832.
• Govaerts, M. J., Van der Vleuten, C. P., Schuwirth, L. W., & Muijtjens, A. M. (2007). Broadening perspectives on clinical performance assessment: rethinking the nature of in-training assessment. Adv Health Sci Educ Theory Pract, 12, 239-260.
• Hodges, B. (2006). Medical education and the maintenance of incompetence. Med Teach, 28(8), 690-696.
• Jozefowicz, R. F., Koeppen, B. M., Case, S. M., Galbraith, R., Swanson, D. B., & Glew, R. H. (2002). The quality of in-house medical school examinations. Academic Medicine, 77(2), 156-161.
• Meng, C. (2006). Discipline-specific or academic? Acquisition, role and value of higher education competencies. PhD Dissertation, Universiteit Maastricht, Maastricht.
• Norcini, J. J., Swanson, D. B., Grosso, L. J., & Webster, G. D. (1985). Reliability, validity and efficiency of multiple choice question and patient management problem item formats in assessment of clinical competence. Medical Education, 19(3), 238-247.
• Papadakis, M. A., Hodgson, C. S., Teherani, A., & Kohatsu, N. D. (2004). Unprofessional behavior in medical school is associated with subsequent disciplinary action by a state medical board. Acad Med, 79(3), 244-249.
• Papadakis, M. A., Teherani, A., et al. (2005). Disciplinary action by medical boards and prior behavior in medical school. N Engl J Med, 353(25), 2673-2682.
• Papadakis, M. A., Arnold, G. K., et al. (2008). Performance during internal medicine residency training and subsequent disciplinary action by state licensing boards. Annals of Internal Medicine, 148, 869-876.

Page 45: Debbie Jaarsma

Literature

• Petrusa, E. R. (2002). Clinical performance assessments. In G. R. Norman, C. P. M. Van der Vleuten & D. I. Newble (Eds.), International Handbook for Research in Medical Education (pp. 673-709). Dordrecht: Kluwer Academic Publisher.
• Ram, P., Grol, R., Rethans, J. J., Schouten, B., Van der Vleuten, C. P. M., & Kester, A. (1999). Assessment of general practitioners by video observation of communicative and medical performance in daily practice: issues of validity, reliability and feasibility. Medical Education, 33(6), 447-454.
• Stalenhoef-Halling, B. F., Van der Vleuten, C. P. M., Jaspers, T. A. M., & Fiolet, J. B. F. M. (1990). A new approach to assessing clinical problem-solving skills by written examination: conceptual basis and initial pilot test results. Paper presented at the Teaching and Assessing Clinical Competence conference, Groningen.
• Swanson, D. B. (1987). A measurement framework for performance-based tests. In I. Hart & R. Harden (Eds.), Further Developments in Assessing Clinical Competence (pp. 13-45). Montreal: Can-Heal Publications.
• van der Vleuten, C. P., Schuwirth, L. W., Muijtjens, A. M., Thoben, A. J., Cohen-Schotanus, J., & van Boven, C. P. (2004). Cross institutional collaboration in assessment: a case on progress testing. Med Teach, 26(8), 719-725.
• Van der Vleuten, C. P. M., & Swanson, D. B. (1990). Assessment of clinical skills with standardized patients: state of the art. Teaching and Learning in Medicine, 2(2), 58-76.
• Van der Vleuten, C. P. M., & Newble, D. I. (1995). How can we test clinical reasoning? The Lancet, 345, 1032-1034.
• Van der Vleuten, C. P. M., Norman, G. R., & De Graaff, E. (1991). Pitfalls in the pursuit of objectivity: issues of reliability. Medical Education, 25, 110-118.
• Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (2005). Assessment of professional competence: from methods to programmes. Medical Education, 39, 309-317.
• Van der Vleuten, C. P. M., & Schuwirth, L. W. T. (Under editorial review). On the value of (aggregate) human judgment. Med Educ.
• Van Luijk, S. J., Van der Vleuten, C. P. M., & Schelven, R. M. (1990). The relation between content and psychometric characteristics in performance-based testing. In W. Bender, R. J. Hiemstra, A. J. J. A. Scherpbier & R. P. Zwierstra (Eds.), Teaching and Assessing Clinical Competence (pp. 202-207). Groningen: Boekwerk Publications.
• Wass, V., Jones, R., & Van der Vleuten, C. (2001). Standardized or real patients to test clinical competence? The long case revisited. Medical Education, 35, 321-325.
• Williams, R. G., Klamen, D. A., & McGaghie, W. C. (2003). Cognitive, social and environmental sources of bias in clinical performance ratings. Teaching and Learning in Medicine, 15(4), 270-292.