Diagnostic Measurement and Reporting on Concept Inventories

Lou DiBello and Jim Pellegrino, DRK-12 PI Meeting, Washington, DC, December 3, 2010


Page 1: Diagnostic Measurement and Reporting on Concept Inventories

Diagnostic Measurement and Reporting on Concept Inventories

Lou DiBello and Jim Pellegrino

DRK-12 PI Meeting

Washington, DC

December 3, 2010

Page 2: Diagnostic Measurement and Reporting on Concept Inventories

Acknowledge NSF Support

For substantial portions of the work presented here we acknowledge NSF support under projects:

REESE-TTCI Project (NSF #0918552; Collaborative Research: Integrating Cognition and Measurement with Conceptual Knowledge: Establishing the Validity and Diagnostic Capacity)

DRK-12 Project (NSF #DRL-0732090; Evaluation of the Cognitive, Psychometric, and Instructional Affordances of Curriculum-Embedded Assessments: A Comprehensive Validity-Based Approach)

CCLI Project (NSF #0920242; Collaborative Research: ciHUB, a Virtual Community to Support Research, Development, and Dissemination of Concept Inventories)

REESE Synthesis (NSF #0815065; Practical and Theoretical Foundations for Informative Classroom Assessment: A Synthesis of Cognitive Science, Curriculum, Instruction, and Measurement)

Page 3: Diagnostic Measurement and Reporting on Concept Inventories

General Features of CIs

CIs typically assess a relatively narrow domain: “the concept of force” in physics (FCI, Hestenes); the area of “statics” (CATS, Steif & Dantzler); or “heat transfer, thermodynamics, and fluid mechanics” (TTCI, Streveler, Olds, Miller, Nelson)

CIs attempt to measure deeper conceptual understanding, not just rote facts or procedures

CIs typically are used in courses in high school, college, community college, & technical schools

Page 4: Diagnostic Measurement and Reporting on Concept Inventories

Unresolved Issues Related to CIs & Their Applications

Rigorous empirical support for the diagnostic and formative instructional usefulness of CIs has yet to be shown

General need to validate CIs’ conceptual underpinnings and to find ways to reliably extract useful diagnostic information for instructional application

Page 5: Diagnostic Measurement and Reporting on Concept Inventories

Diagnostic Modeling for CIs

CI development framework claims that each item taps particular conceptual knowledge

We attempt to identify a set of concepts & skills for diagnostic reporting that both represents the CI's conceptual framework and is tapped by the full set of items: finding the “sweet spot”

Develop a hypothesized matrix of items x diagnostic skills; we assume a multivariate skill-item mapping

Apply multivariate methods to test and refine the theory, extract item- and person-level diagnostic information, and validate the skills framework & inventory

Page 6: Diagnostic Measurement and Reporting on Concept Inventories

Diagnostic Goals

Derive person and population information:

For each student a “skills profile” telling mastery or non-mastery for each skill: (0,0,1,*,1,1,0,1,1,0) {* means not sure about skill 4}

Derive item and test information:

Estimate item parameters that represent measurement features of items and skills

Critique and evaluate the model-based analysis:

Reasonability, reliability, model-data fit

Examine the classroom usefulness of the model-based information:

Are student skills profiles useful for students and instructors?

Can information about skills and items improve CI use and impact?
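The skills profile notation above can be illustrated with a small sketch. It assumes that each skill comes with an estimated probability of mastery and that a probability falling between two cutoffs is reported as "not sure"; the slides do not say how uncertainty is decided, so the cutoffs (0.6 and 0.4), the function names, and the probabilities below are all hypothetical.

```python
# Hypothetical cutoffs: the slides do not state how "not sure" is decided.
MASTER_CUTOFF = 0.6
NONMASTER_CUTOFF = 0.4


def skills_profile(posteriors, hi=MASTER_CUTOFF, lo=NONMASTER_CUTOFF):
    """Map per-skill mastery probabilities to 1 (master), 0 (non-master), or '*' (not sure)."""
    profile = []
    for p in posteriors:
        if p >= hi:
            profile.append(1)
        elif p <= lo:
            profile.append(0)
        else:
            profile.append("*")
    return tuple(profile)


def format_profile(profile):
    """Render a profile in the (0,0,1,*,...) notation used on the slide."""
    return "(" + ",".join(str(x) for x in profile) + ")"


# Made-up mastery probabilities for one student on 10 skills.
example = [0.05, 0.20, 0.91, 0.52, 0.88, 0.75, 0.10, 0.97, 0.66, 0.31]
print(format_profile(skills_profile(example)))
```

With these made-up probabilities, only skill 4 (0.52) falls between the two cutoffs, so it is the one reported as uncertain.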

Page 7: Diagnostic Measurement and Reporting on Concept Inventories

Example of Applying Diagnostic Analysis to a CI

CATS is a multiple-choice test with 27 questions; its distractors were developed by first asking open-ended questions and taking account of common student errors

Santiago-Román built a “skills” framework consisting of 10 skills for diagnostic reporting

Used Fusion Model/Arpeggio to analyze CATS data (DiBello & Stout, 2007)

Page 8: Diagnostic Measurement and Reporting on Concept Inventories

A General Diagnostic Modeling Procedure for CIs

Begin with the conceptual framework of CATS (or any specific CI)

Develop from that framework a set of skills or conceptual understandings for diagnostic measurement and reporting

Map skills to items (Q matrix)

Construct diagnostic model using skills & Q matrix

Perform the model-based statistical analysis and evaluate and critique the results

Modify skills, items or aspects of the model

Iterate the analysis process

Page 9: Diagnostic Measurement and Reporting on Concept Inventories

Q matrix—Strong Cognitive & Conceptual Assumptions

          Skill 1   Skill 2   Skill 3   …
Item 1       1         0         1
Item 2       0         1         1
Item 3       1         1         0

1 = indicated skill is required for that item
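As a rough illustration of how a Q matrix is read, the sketch below encodes the three example rows from the slide and checks whether a student's mastery profile covers every skill an item requires. The helper names and the student profile are hypothetical, and the all-required-skills check is only a way of reading the matrix, not the diagnostic model itself.

```python
# Q matrix from the slide: rows are items, columns are skills (1 = skill required).
SKILLS = ["Skill 1", "Skill 2", "Skill 3"]
Q = {
    "Item 1": [1, 0, 1],
    "Item 2": [0, 1, 1],
    "Item 3": [1, 1, 0],
}


def required_skills(item):
    """List the skills that the Q matrix marks as required for an item."""
    return [s for s, q in zip(SKILLS, Q[item]) if q == 1]


def has_all_required(item, profile):
    """True when a 0/1 mastery profile covers every skill the item requires."""
    return all(a == 1 for a, q in zip(profile, Q[item]) if q == 1)


# Hypothetical student: masters Skills 1 and 3 but not Skill 2.
student = [1, 0, 1]
for item in Q:
    print(item, required_skills(item), has_all_required(item, student))
```

Only Item 1 is fully covered by this student's profile, because Items 2 and 3 both require Skill 2.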

Page 10: Diagnostic Measurement and Reporting on Concept Inventories

Clusters of Concepts for CATS (Steif & Dantzler, 2005, p. 363)

Page 11: Diagnostic Measurement and Reporting on Concept Inventories

Sample Item #1 from CATS (Steif)

Page 12: Diagnostic Measurement and Reporting on Concept Inventories

Four-Phase Procedure to investigate diagnostics in CATS (Santiago-Román, PhD thesis, Purdue, 2009)

Phase 1 – Identify “skills” for diagnostic reports

Build upon the conceptual foundation for CATS

Which cognitive attributes are required for each question in CATS?

Phase 2 – Estimate Fusion Model parameters

Estimate model parameters

Infer skills profiles for students

Evaluate reasonability, model estimation and model-data fit

Phase 3 – Evaluate model fit and reliability

Model-data fit

Reliability of skill profile reports

Phase 4 – Consider model implications

Compute expected student skill patterns

Which cognitive attributes are estimated as more/less difficult?

Are any modifications indicated to skills or items?
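The slides do not show the Fusion Model's item response function. As a hedged sketch of the kind of parameters Phase 2 estimates, the code below uses a commonly cited simplified form, often called the reduced reparameterized unified model: the probability of a correct answer is a baseline for students who have mastered every required skill, multiplied by a penalty for each required skill that is not mastered. Whether the CATS analysis used this reduced form or a fuller version with a residual ability term is not stated here, and every parameter value and name below is hypothetical.

```python
def reduced_rum_prob(pi_star, r_star, q_row, alpha):
    """Probability of a correct response under a reduced-RUM-style model.

    pi_star: probability of success for a student mastering all required skills
    r_star:  per-skill penalties in (0, 1]; smaller means the skill matters more
    q_row:   Q-matrix row for the item (1 = skill required)
    alpha:   student's 0/1 mastery profile
    """
    p = pi_star
    for r, q, a in zip(r_star, q_row, alpha):
        if q == 1 and a == 0:
            # Required skill is not mastered: apply that skill's penalty.
            p *= r
    return p


# Hypothetical item requiring skills 1 and 3 (skill 2's penalty is never used).
pi_star = 0.9
r_star = [0.5, 1.0, 0.4]
q_row = [1, 0, 1]

for alpha in ([1, 1, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]):
    print(alpha, round(reduced_rum_prob(pi_star, r_star, q_row, alpha), 3))
```

In this form a small penalty value means that lacking the skill sharply lowers the chance of a correct answer, which is one way an item's diagnostic strength for a skill can be expressed.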

Page 13: Diagnostic Measurement and Reporting on Concept Inventories

Second representation (after conversations with Dr. Steif) (Santiago-Román, 2009)

Page 14: Diagnostic Measurement and Reporting on Concept Inventories

Initial cognitive attributes for each item (Santiago-Román, 2009)

Page 15: Diagnostic Measurement and Reporting on Concept Inventories

Final “Skills” and their relation to CATS Framework

Page 16: Diagnostic Measurement and Reporting on Concept Inventories

Student Scores & Skill Profiles

1 = master; 0 = NON-master; 9 = uncertain

Skill                              Skill #  Alex  Amy  Andy  Nathan  Nina  Noah  Bernie  Beavis  Beryl
Score                                       21    21   21    17      17    17    12      12      12
Equivalence                        S01      0     1    1     1       0     1     1       0       0
Newton's 3rd Law                   S02      0     0    1     1       0     0     9       0       1
Contact Forces                     S03      1     1    0     1       1     1     1       0       0
Representation & Tension in Ropes  S04      1     1    1     0       1     1     0       0       1
Friction Force                     S05      1     1    1     0       0     1     0       0       0
Couples and Equilibrium            S06      1     1    1     0       1     1     0       1       0
Representation of Forces           S07      1     1    0     1       1     1     0       0       1
Pin on Slot                        S08      1     1    1     1       1     0     1       1       1
Roller Support                     S09      1     0    1     1       1     1     0       9       0
Fixed Support                      S10      1     1    1     1       1     1     0       1       1
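The profile table above shows that students with the same total score can differ in which skills they have not mastered. The sketch below transcribes the table and lists, for a given student, the skills coded 0 (non-master) and 9 (uncertain). The data are taken from the slide; the variable and function names are hypothetical.

```python
STUDENTS = ["Alex", "Amy", "Andy", "Nathan", "Nina", "Noah", "Bernie", "Beavis", "Beryl"]
SCORES = [21, 21, 21, 17, 17, 17, 12, 12, 12]

# One row per skill, one column per student: 1 = master, 0 = non-master, 9 = uncertain.
PROFILES = {
    "S01": [0, 1, 1, 1, 0, 1, 1, 0, 0],
    "S02": [0, 0, 1, 1, 0, 0, 9, 0, 1],
    "S03": [1, 1, 0, 1, 1, 1, 1, 0, 0],
    "S04": [1, 1, 1, 0, 1, 1, 0, 0, 1],
    "S05": [1, 1, 1, 0, 0, 1, 0, 0, 0],
    "S06": [1, 1, 1, 0, 1, 1, 0, 1, 0],
    "S07": [1, 1, 0, 1, 1, 1, 0, 0, 1],
    "S08": [1, 1, 1, 1, 1, 0, 1, 1, 1],
    "S09": [1, 0, 1, 1, 1, 1, 0, 9, 0],
    "S10": [1, 1, 1, 1, 1, 1, 0, 1, 1],
}


def summarize(student):
    """Return the student's score plus the skills coded non-master and uncertain."""
    j = STUDENTS.index(student)
    return {
        "score": SCORES[j],
        "not_mastered": [s for s, row in PROFILES.items() if row[j] == 0],
        "uncertain": [s for s, row in PROFILES.items() if row[j] == 9],
    }


# Alex, Amy, and Andy share a score of 21 but have different gaps.
for name in ("Alex", "Amy", "Andy"):
    print(name, summarize(name))
```

For the three students scoring 21, each is coded non-master on two skills, but on a different pair in each case, which is the kind of information a total score alone does not convey.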

Page 17: Diagnostic Measurement and Reporting on Concept Inventories

Population Proportion of Masters for each skill

Skill                              Skill #  pk
Newton's 3rd Law                   S02      22%
Friction Force                     S04      24%
Contact Forces                     S03      28%
Couples and Equilibrium            S10      28%
Equivalence                        S01      31%
Representation & Tension in Ropes  S08      52%
Fixed Support                      S07      57%
Representation of Forces           S09      58%
Roller Support                     S06      60%
Pin on Slot                        S05      61%

pk = estimated population proportion of masters of skill k

Page 18: Diagnostic Measurement and Reporting on Concept Inventories

Model-data fit outcomes: Item Masters vs Item Non-masters

pdiff = 0.6517

Mp+ = 0.8899; NMp+ = 0.2332

+ = proportion correct by item masters

- = proportion correct by item NON-masters
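A minimal sketch of how these fit quantities could be computed for a single item follows. It assumes that an "item master" is a student whose estimated profile shows mastery of every skill the Q matrix requires for that item, and that pdiff is the masters' proportion correct minus the non-masters' proportion correct; the slides do not spell out either definition. The responses, profiles, and function name are made up for illustration.

```python
def item_fit(responses, profiles, q_row):
    """Return (Mp+, NMp+, pdiff) for one item.

    responses: 0/1 scores on the item, one per student
    profiles:  estimated 0/1 mastery profiles, one per student
    q_row:     Q-matrix row for the item (1 = skill required)
    Assumption: anything other than 1 on a required skill counts as not an item master.
    """
    masters, nonmasters = [], []
    for x, alpha in zip(responses, profiles):
        is_master = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
        (masters if is_master else nonmasters).append(x)
    if not masters or not nonmasters:
        raise ValueError("need at least one item master and one item non-master")
    mp = sum(masters) / len(masters)
    nmp = sum(nonmasters) / len(nonmasters)
    return mp, nmp, mp - nmp


# Hypothetical item requiring skills 1 and 3, answered by eight students.
q_row = [1, 0, 1]
profiles = [
    [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 1],  # item masters
    [0, 1, 1], [1, 1, 0], [0, 0, 0], [0, 1, 0],  # item non-masters
]
responses = [1, 1, 1, 0, 0, 1, 0, 0]

mp, nmp, pdiff = item_fit(responses, profiles, q_row)
print(mp, nmp, pdiff)
```

A large gap between the two proportions suggests the item separates students who have the required skills from those who do not.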

Page 19: Diagnostic Measurement and Reporting on Concept Inventories

Diagnostic Results

Results (Santiago-Román, 2009):

Skills were identified consonant with foundations

Student profiles were generated

Estimated parameter values were reasonable; for example they identified easier and harder skills

Successfully fit diagnostic model to data. Fit indicators nearly twice as good as similar indices for retro-fit assessments: Average across all items: Pdiff = 0.6517

Page 20: Diagnostic Measurement and Reporting on Concept Inventories

Next Steps for CATS

Consider implications of current results:

Item quality and diagnostic utility

Diagnosticity of overall instrument

Conceptual quality and coherence of the model

Engage in external validation studies:

Student protocols

Validation with other student samples

Add information from the distractors to the modeling effort & diagnostic output

Page 21: Diagnostic Measurement and Reporting on Concept Inventories

Next Steps for other CIs and other STEM Assessments

Applicability of these procedures to other CIs and other STEM assessments:

Develop a conceptual model

Collect adequate data to perform model analyses

Iteratively refine & interpret

Interpretive use of student & class information for instructional improvement

Future development of “item pools” and “testlets” for web-based administration