diagnostic measurement and reporting on concept inventories

1

Diagnostic Measurement and Reporting on Concept Inventories

Lou DiBello and Jim Pellegrino

DRK-12 PI Meeting

Washington, DC

December 3, 2010


2

Acknowledge NSF Support

For substantial portions of the work presented here we acknowledge NSF support under projects:

REESE-TTCI Project (NSF #0918552; Collaborative Research: Integrating Cognition and Measurement with Conceptual Knowledge: Establishing the Validity and Diagnostic Capacity)

DRK-12 Project (NSF #DRL-0732090; Evaluation of the Cognitive, Psychometric, and Instructional Affordances of Curriculum-Embedded Assessments: A Comprehensive Validity-Based Approach)

CCLI Project (NSF #0920242; Collaborative Research: ciHUB, a Virtual Community to Support Research, Development, and Dissemination of Concept Inventories)

REESE Synthesis (NSF #0815065; Practical and Theoretical Foundations for Informative Classroom Assessment: A Synthesis of Cognitive Science, Curriculum, Instruction, and Measurement)



3

General Features of Concept Inventories (CIs)

CIs typically assess a relatively narrow domain: “the concept of force” in physics (FCI, Hestenes); the area of “statics” (CATS, Steif & Dantzler); or “heat transfer, thermodynamics, and fluid mechanics” (TTCI, Streveler, Olds, Miller, Nelson)

CIs attempt to measure deeper conceptual understanding, not just rote facts or procedures

CIs are typically used in high school, college, community college, and technical school courses



4

Unresolved Issues Related to CIs & Their Applications

Rigorous empirical support for the diagnostic and formative instructional usefulness of CIs has yet to be shown

There is a general need to validate CIs’ conceptual underpinnings and to find reliable ways to extract diagnostic information that is useful for instruction



5

Diagnostic Modeling for CIs

The CI development framework claims that each item taps particular conceptual knowledge

We attempt to identify a set of concepts & skills for diagnostic reporting that simultaneously represents the CI’s conceptual framework as tapped by the full set of items, i.e., finding the “sweet spot”

Develop a hypothesized items × diagnostic skills matrix; we assume a multivariate skill-item mapping, in which an item may require more than one skill

Apply multivariate methods to test and refine the theory, extract item- and person-level diagnostic information, and validate the skills framework & the inventory



6

Diagnostic Goals

Derive person and population information:

For each student, a “skills profile” indicating mastery or non-mastery of each skill, e.g., (0,0,1,*,1,1,0,1,1,0), where * means the classification of skill 4 is uncertain

Derive item and test information: estimate item parameters that represent measurement features of items and skills

Critique and evaluate the model-based analysis: reasonability, reliability, model-data fit

Examine the classroom usefulness of the model-based information

Are student skills profiles useful for students and instructors? Can information about skills and items improve CI use and impact?
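A skills profile such as (0,0,1,*,1,1,0,1,1,0) is typically obtained by classifying each skill from a student's posterior probability of mastery. The sketch below is a minimal illustration, assuming a hypothetical "uncertain" band between 0.4 and 0.6; those cutoffs and the probabilities are assumptions for illustration, not values from the CATS analysis.

```python
from typing import List


def classify_skills(posteriors: List[float], low: float = 0.4, high: float = 0.6) -> List[str]:
    """Map posterior probabilities of mastery to '1' (master), '0' (non-master), or '*' (uncertain).

    The low/high cutoffs are illustrative assumptions, not values taken from the CATS study.
    """
    profile = []
    for p in posteriors:
        if p >= high:
            profile.append("1")   # confidently classified as a master
        elif p <= low:
            profile.append("0")   # confidently classified as a non-master
        else:
            profile.append("*")   # inside the uncertainty band: do not report a classification
    return profile


# Hypothetical posterior probabilities of mastery for one student on 10 skills.
posteriors = [0.12, 0.30, 0.85, 0.52, 0.91, 0.77, 0.08, 0.66, 0.95, 0.21]
print("(" + ",".join(classify_skills(posteriors)) + ")")
```

Widening the band reports fewer skills but makes each reported classification more trustworthy; narrowing it does the opposite.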



7

Example of Applying Diagnostic Analysis to a CI

CATS is a multiple-choice test with 27 questions; its distractors were developed by first asking open-ended questions and taking account of common student errors

Santiago-Román built a “skills” framework consisting of 10 skills for diagnostic reporting

Used Fusion Model/Arpeggio to analyze CATS data (DiBello & Stout, 2007)
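The Fusion Model is built on the reparameterized unified model. As it is commonly written, the probability that student j answers item i correctly is π*_i × ∏_k r*_ik^(q_ik(1 − α_jk)) × P_c_i(θ_j), where α_jk = 1 if student j has mastered skill k, q_ik is the Q-matrix entry, π*_i is the probability that a student who has mastered all required skills applies them correctly, r*_ik (between 0 and 1) is the penalty for lacking required skill k, and P_c_i(θ_j) is a Rasch-type term for abilities not captured by the skills. The sketch below encodes that formula with hypothetical parameter values; it is not the Arpeggio code, and the 1.7 scaling constant follows common usage and should be checked against DiBello & Stout (2007) before relying on it.

```python
import math
from typing import Sequence


def rasch_term(theta: float, c: float) -> float:
    """P_c(theta): Rasch-type term for ability not covered by the Q-matrix skills.

    The 1.7 scaling constant follows common usage and is an assumption here.
    """
    return 1.0 / (1.0 + math.exp(-1.7 * (theta + c)))


def fusion_prob_correct(alpha: Sequence[int], q_row: Sequence[int], pi_star: float,
                        r_star: Sequence[float], c: float, theta: float) -> float:
    """Probability of a correct response under the reparameterized unified (Fusion) model."""
    p = pi_star
    for a_k, q_k, r_k in zip(alpha, q_row, r_star):
        # r*_ik applies only when skill k is required (q = 1) and not mastered (alpha = 0)
        if q_k == 1 and a_k == 0:
            p *= r_k
    return p * rasch_term(theta, c)


# Hypothetical item requiring skills 1 and 3; all parameter values are made up.
q_row = [1, 0, 1]
r_star = [0.5, 0.7, 0.4]  # r* for skill 2 is never used because q = 0
for alpha in ([1, 1, 1], [1, 1, 0], [0, 0, 0]):
    print(alpha, round(fusion_prob_correct(alpha, q_row, 0.9, r_star, c=2.0, theta=0.0), 3))
```

Each required but non-mastered skill multiplies the success probability by its r* value, which is what makes the model conjunctive.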



8

A General Diagnostic Modeling Procedure for CIs

Begin with the conceptual framework of CATS (or any specific CI)

Develop from that framework a set of skills or conceptual understandings for diagnostic measurement and reporting

Map skills to items (the Q matrix)

Construct a diagnostic model using the skills & Q matrix

Perform the model-based statistical analysis and evaluate and critique the results

Modify skills, items, or aspects of the model

Iterate the analysis process



9

Q matrix: Strong Cognitive & Conceptual Assumptions

         Skill 1  Skill 2  Skill 3  …
Item 1   1        0        1
Item 2   0        1        1
Item 3   1        1        0
…

1 = the indicated skill is required for that item; 0 = not required
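Because the Q matrix fixes which skills each item requires, it also determines which students count as "item masters" under a conjunctive model such as the Fusion Model: a student must have mastered every required skill. The sketch below uses only the three rows shown above (the "…" rows and columns are omitted) and a hypothetical student profile.

```python
from typing import List

# Q matrix from the slide (rows = items, columns = skills); 1 = skill required.
Q: List[List[int]] = [
    [1, 0, 1],  # Item 1
    [0, 1, 1],  # Item 2
    [1, 1, 0],  # Item 3
]


def required_skills(q_row: List[int]) -> List[int]:
    """Return the 1-based indices of the skills an item requires."""
    return [k + 1 for k, q in enumerate(q_row) if q == 1]


def is_item_master(profile: List[int], q_row: List[int]) -> bool:
    """Conjunctive rule: the student must have mastered every skill the item requires."""
    return all(a == 1 for a, q in zip(profile, q_row) if q == 1)


profile = [1, 0, 1]  # hypothetical student: masters skills 1 and 3, not skill 2
for i, q_row in enumerate(Q, start=1):
    print(f"Item {i}: requires skills {required_skills(q_row)}; item master: {is_item_master(profile, q_row)}")
```

Lacking a single skill (here skill 2) makes the student a non-master of every item that requires it, which is why an error in the Q matrix propagates directly into the diagnostic reports.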



10

Clusters of Concepts for CATS (Steif & Dantzler, 2005, p. 363)



11

Sample Item #1 from CATS (Steif)



12

Four-Phase Procedure to investigate diagnostics in CATS (Santiago-Román, PhD thesis, Purdue, 2009)

Phase 1 – Identify “skills” for diagnostic reports: build upon the conceptual foundation for CATS; which cognitive attributes are required for each question in CATS?

Phase 2 – Estimate Fusion Model parameters: estimate model parameters; infer skill profiles for students; evaluate reasonability, model estimation, and model-data fit

Phase 3 – Evaluate model fit and reliability: model-data fit; reliability of skill profile reports

Phase 4 – Consider model implications: compute expected student skill patterns; which cognitive attributes are estimated as more or less difficult? Are any modifications to skills or items indicated?



13

Second representation (after conversations with Dr. Steif) (Santiago-Román, 2009)



14

Initial cognitive attributes for each item (Santiago-Román, 2009)



15

Final “Skills” and Their Relation to the CATS Framework



16

Student Scores & Skill Profiles

1 = master; 0 = non-master; 9 = uncertain

Skill                              Skill #  Alex  Amy  Andy  Nathan  Nina  Noah  Bernie  Beavis  Beryl
Total score                                 21    21   21    17      17    17    12      12      12
Equivalence                        S01      0     1    1     1       0     1     1       0       0
Newton's 3rd Law                   S02      0     0    1     1       0     0     9       0       1
Contact Forces                     S03      1     1    0     1       1     1     1       0       0
Representation & Tension in Ropes  S04      1     1    1     0       1     1     0       0       1
Friction Force                     S05      1     1    1     0       0     1     0       0       0
Couples and Equilibrium            S06      1     1    1     0       1     1     0       1       0
Representation of Forces           S07      1     1    0     1       1     1     0       0       1
Pin on Slot                        S08      1     1    1     1       1     0     1       1       1
Roller Support                     S09      1     0    1     1       1     1     0       9       0
Fixed Support                      S10      1     1    1     1       1     1     0       1       1
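The student table shows that equal total scores can hide different skill profiles. The sketch below re-enters that table (skill codes as labeled on this slide) and lists, for each score group, the skills each student is classified as not having mastered; entries coded 9 (uncertain) are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, List

students = ["Alex", "Amy", "Andy", "Nathan", "Nina", "Noah", "Bernie", "Beavis", "Beryl"]
scores = [21, 21, 21, 17, 17, 17, 12, 12, 12]

# One row per skill, one column per student, copied from the table (9 = uncertain).
skill_rows: Dict[str, List[int]] = {
    "S01": [0, 1, 1, 1, 0, 1, 1, 0, 0],
    "S02": [0, 0, 1, 1, 0, 0, 9, 0, 1],
    "S03": [1, 1, 0, 1, 1, 1, 1, 0, 0],
    "S04": [1, 1, 1, 0, 1, 1, 0, 0, 1],
    "S05": [1, 1, 1, 0, 0, 1, 0, 0, 0],
    "S06": [1, 1, 1, 0, 1, 1, 0, 1, 0],
    "S07": [1, 1, 0, 1, 1, 1, 0, 0, 1],
    "S08": [1, 1, 1, 1, 1, 0, 1, 1, 1],
    "S09": [1, 0, 1, 1, 1, 1, 0, 9, 0],
    "S10": [1, 1, 1, 1, 1, 1, 0, 1, 1],
}


def gaps(student_idx: int) -> List[str]:
    """Skills the student is classified as NOT having mastered (uncertain entries excluded)."""
    return [skill for skill, row in skill_rows.items() if row[student_idx] == 0]


# Group students by total score to compare profiles within the same score.
by_score = defaultdict(list)
for i, (name, score) in enumerate(zip(students, scores)):
    by_score[score].append((name, gaps(i)))

for score in sorted(by_score, reverse=True):
    print(f"Score {score}:")
    for name, missing in by_score[score]:
        print(f"  {name:<7} non-mastered: {', '.join(missing)}")
```

Alex, Amy, and Andy all score 21, yet each lacks a different pair of skills, which is exactly the information a total score cannot convey to an instructor.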



17

Population Proportion of Masters for Each Skill

Skill                              Skill #  p_k
Newton's 3rd Law                   S02      22%
Friction Force                     S04      24%
Contact Forces                     S03      28%
Couples and Equilibrium            S10      28%
Equivalence                        S01      31%
Representation & Tension in Ropes  S08      52%
Fixed Support                      S07      57%
Representation of Forces           S09      58%
Roller Support                     S06      60%
Pin on Slot                        S05      61%

p_k = estimated population proportion of masters of skill k



18

Model-data fit outcomes: Item Masters vs Item Non-masters

pdiff = 0.6517

Mp+ = 0.8899; NMp+ = 0.2332

Mp+ = proportion correct by item masters (students estimated to have mastered every skill the item requires)

NMp+ = proportion correct by item non-masters
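These fit statistics compare, item by item, the proportion correct among item masters with the proportion correct among item non-masters, and pdiff summarizes the gap averaged across items. The sketch below computes the three quantities on a small hypothetical data set, reusing the three-item Q matrix from the earlier slide. It illustrates the idea only; it is not the Arpeggio implementation, and skipping items that have no masters or no non-masters is an assumption.

```python
from statistics import mean
from typing import List, Optional, Tuple

Q = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]                    # items x skills (from the Q matrix slide)
profiles = [[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0]]   # hypothetical students x skills
responses = [[1, 1, 1], [1, 0, 1], [0, 1, 0], [0, 0, 0]]  # hypothetical students x items (1 = correct)


def is_item_master(profile: List[int], q_row: List[int]) -> bool:
    """Conjunctive rule: mastered every skill the item requires."""
    return all(a == 1 for a, q in zip(profile, q_row) if q == 1)


def item_stats(item: int) -> Optional[Tuple[float, float]]:
    """(proportion correct among item masters, among item non-masters), or None if a group is empty."""
    masters, non_masters = [], []
    for profile, resp in zip(profiles, responses):
        group = masters if is_item_master(profile, Q[item]) else non_masters
        group.append(resp[item])
    if not masters or not non_masters:
        return None  # assumption: items with an empty group are left out of the averages
    return mean(masters), mean(non_masters)


stats = [s for s in (item_stats(i) for i in range(len(Q))) if s is not None]
m_p_plus = mean(pm for pm, _ in stats)             # Mp+, averaged across items
nm_p_plus = mean(pnm for _, pnm in stats)          # NMp+, averaged across items
p_diff = mean(pm - pnm for pm, pnm in stats)       # pdiff, averaged across items
print(f"Mp+ = {m_p_plus:.4f}; NMp+ = {nm_p_plus:.4f}; pdiff = {p_diff:.4f}")
```

A large pdiff means the estimated skill classifications separate students who answer an item correctly from those who do not, which is the sense in which the diagnostic model fits the data.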



19

Diagnostic Results

Results (Santiago-Román, 2009):

Skills were identified consonant with the CATS conceptual foundations

Student profiles were generated

Estimated parameter values were reasonable; for example, they identified easier and harder skills

The diagnostic model was successfully fit to the data: fit indicators were nearly twice as good as similar indices for retrofitted assessments (average across all items: pdiff = 0.6517)



20

Next Steps for CATS

Consider implications of current results: item quality and diagnostic utility; diagnosticity of the overall instrument; conceptual quality and coherence of the model

Engage in external validation studies: student protocols; validation with other student samples

Add information from the distractors to the modeling effort & diagnostic output



21

Next Steps for Other CIs and Other STEM Assessments

Applicability of these procedures to other CIs and other STEM assessments: develop a conceptual model; collect adequate data to perform model analyses; iteratively refine & interpret

Interpretive use of student & class information for instructional improvement

Future development of “item pools” and “testlets” for web-based administration
