1
Diagnostic Measurement and Reporting on Concept Inventories
Lou DiBello and Jim Pellegrino
DRK-12 PI Meeting
Washington, DC
December 3, 2010
2
Acknowledge NSF Support
For substantial portions of the work presented here we acknowledge NSF support under projects:
REESE-TTCI Project (NSF #0918552; Collaborative Research: Integrating Cognition and Measurement with Conceptual Knowledge: Establishing the Validity and Diagnostic Capacity)
DRK-12 Project (NSF #DRL-0732090; Evaluation of the Cognitive, Psychometric, and Instructional Affordances of Curriculum-Embedded Assessments: A Comprehensive Validity-Based Approach)
CCLI Project (NSF #0920242; Collaborative Research: ciHUB, a Virtual Community to Support Research, Development, and Dissemination of Concept Inventories)
REESE Synthesis (NSF #0815065; Practical and Theoretical Foundations for Informative Classroom Assessment: A Synthesis of Cognitive Science, Curriculum, Instruction, and Measurement)
3
General Features of CIs
CIs typically assess a relatively narrow domain: “the concept of force” in physics (FCI, Hestenes); the area of “statics” (CATS, Steif & Dantzler); or “heat transfer, thermodynamics, and fluid mechanics” (TTCI, Streveler, Olds, Miller, Nelson)
CIs attempt to measure deeper conceptual understanding, not just rote facts or procedures
CIs are typically used in high school, college, community college, and technical school courses
4
Unresolved Issues Related to CIs & Their Applications
Rigorous empirical support for the diagnostic and formative instructional usefulness of CIs has yet to be shown
General need to validate CIs’ conceptual underpinnings and to find ways to reliably extract useful diagnostic information for instructional application
5
Diagnostic Modeling for CIs
The CI development framework claims that each item taps particular conceptual knowledge
We attempt to identify a set of concepts & skills for diagnostic reporting that simultaneously represents the CI’s conceptual framework as tapped by the full set of items: finding the “sweet spot”
Develop a hypothesized matrix of items × diagnostic skills; we assume a multivariate skill-item mapping, in which a single item may require several skills
Apply multivariate methods to test and refine the theory, extract item- and person-level diagnostic information, and validate the skills framework and the inventory
6
Diagnostic Goals
Derive person and population information: for each student, a “skills profile” giving mastery or non-mastery of each skill, e.g. (0,0,1,*,1,1,0,1,1,0), where * means not sure about skill 4
Derive item and test information: estimate item parameters that represent measurement features of items and skills
Critique and evaluate the model-based analysis: reasonability, reliability, model-data fit
Examine the classroom usefulness of the model-based information: Are student skills profiles useful for students and instructors? Can information about skills and items improve CI use and impact?
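A skills profile of this kind can be produced by thresholding each student's estimated posterior probability of mastery for each skill. The sketch below is illustrative only: the cutoffs of 0.6 and 0.4 and the function name skills_profile are assumptions, not taken from these slides, and skills that fall between the cutoffs are coded * (not sure).

```python
def skills_profile(ppm, master_cut=0.6, nonmaster_cut=0.4):
    """Classify each skill from its posterior probability of mastery (ppm).

    The cutoffs are assumed for illustration; the slides do not state them.
    Returns '1' (master), '0' (non-master) or '*' (not sure) per skill.
    """
    profile = []
    for p in ppm:
        if p >= master_cut:
            profile.append("1")
        elif p <= nonmaster_cut:
            profile.append("0")
        else:
            profile.append("*")  # too close to call, like skill 4 on the slide
    return profile


# Hypothetical posterior probabilities of mastery for one student on 10 skills.
ppm = [0.10, 0.20, 0.90, 0.50, 0.80, 0.95, 0.30, 0.70, 0.85, 0.05]
print("(" + ",".join(skills_profile(ppm)) + ")")  # prints (0,0,1,*,1,1,0,1,1,0)
```

Widening the gap between the two cutoffs trades fewer misclassifications for more "not sure" reports.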
7
Example of ApplyingDiagnostic Analysis to a CI
CATS is a multiple-choice test with 27 questions whose distractors were developed by first asking open-ended questions and taking account of common student errors
Santiago-Román built a “skills” framework consisting of 10 skills for diagnostic reporting
Used Fusion Model/Arpeggio to analyze CATS data (DiBello & Stout, 2007)
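For readers unfamiliar with the Fusion Model: it is built on the reparameterized unified model (RUM), a conjunctive model in which lacking any required skill lowers the chance of a correct answer. The sketch below shows only the reduced form, P(correct) = π* × ∏ r*_k over required skills k that the student has not mastered, assuming the common π*/r* parameterization and omitting the Fusion Model's residual-ability term. The function name and parameter values are hypothetical; this is not Arpeggio's code.

```python
from math import prod


def prob_correct(pi_star, r_star, q_row, alpha):
    """Reduced-RUM probability that one student answers one item correctly.

    pi_star : chance of a correct answer when every required skill is mastered
    r_star  : per-skill penalty factors in (0, 1], one per skill
    q_row   : the item's Q-matrix row (1 = skill required)
    alpha   : the student's skill profile (1 = master, 0 = non-master)
    """
    # Each required skill the student has NOT mastered multiplies in its penalty.
    # prod() of an empty sequence is 1, so full mastery gives exactly pi_star.
    penalty = prod(r for r, q, a in zip(r_star, q_row, alpha) if q == 1 and a == 0)
    return pi_star * penalty


# Hypothetical item requiring Skills 1 and 3 (Q row [1, 0, 1]).
r_star = [0.5, 0.7, 0.4]
print(prob_correct(0.9, r_star, [1, 0, 1], [1, 0, 1]))            # 0.9 (Skill 2 not required)
print(round(prob_correct(0.9, r_star, [1, 0, 1], [1, 1, 0]), 2))  # 0.36
print(round(prob_correct(0.9, r_star, [1, 0, 1], [0, 0, 0]), 2))  # 0.18
```

Because penalties multiply, the model is non-compensatory: mastering extra skills cannot make up for a missing required one.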
8
A General Diagnostic Modeling Procedure for CIs
Begin with the conceptual framework of CATS (or any specific CI)
Develop from that framework a set of skills or conceptual understandings for diagnostic measurement and reporting
Map skills to items (the Q matrix)
Construct a diagnostic model using the skills and the Q matrix
Perform the model-based statistical analysis, and evaluate and critique the results
Modify skills, items, or aspects of the model
Iterate the analysis process
9
Q Matrix: Strong Cognitive & Conceptual Assumptions
          Skill 1  Skill 2  Skill 3  …
Item 1          1        0        1
Item 2          0        1        1
Item 3          1        1        0
…
1 = indicated skill is required for that item; 0 = not required
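The Q matrix is where the strong assumptions enter. One way to read it, assuming a conjunctive rule of the kind the Fusion Model uses, is that a student who has every required skill is an "item master" and, apart from slips and guesses, should answer correctly, while a student missing any required skill should not. The sketch encodes the three-item example above; the helper names are hypothetical.

```python
# Q matrix from the slide: rows are items, columns are skills (1 = required).
Q = [
    [1, 0, 1],  # Item 1 requires Skills 1 and 3
    [0, 1, 1],  # Item 2 requires Skills 2 and 3
    [1, 1, 0],  # Item 3 requires Skills 1 and 2
]


def required_skills(q_row):
    """Return the 1-based indices of the skills an item requires."""
    return [k + 1 for k, q in enumerate(q_row) if q == 1]


def is_item_master(q_row, alpha):
    """Conjunctive rule: master every skill the Q matrix marks as required."""
    return all(a == 1 for q, a in zip(q_row, alpha) if q == 1)


def ideal_responses(Q, alpha):
    """Error-free response pattern implied by the Q matrix and a skill profile."""
    return [int(is_item_master(row, alpha)) for row in Q]


print([required_skills(row) for row in Q])  # [[1, 3], [2, 3], [1, 2]]
print(ideal_responses(Q, [1, 0, 1]))        # [1, 0, 0]
```

A student who has Skills 1 and 3 but not Skill 2 is expected to succeed only on Item 1; systematic departures from such patterns are one signal that the Q matrix needs revision.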
10
Clusters of Concepts for CATS (Steif & Dantzler, 2005, p. 363)
11
Sample Item #1 from CATS (Steif)
12
Four-Phase Procedure to Investigate Diagnostics in CATS (Santiago-Román, PhD thesis, Purdue, 2009)
Phase 1 – Identify “skills” for diagnostic reports: build upon the conceptual foundation for CATS; determine which cognitive attributes are required for each question in CATS
Phase 2 – Estimate Fusion Model parameters: estimate model parameters; infer skills profiles for students; evaluate reasonability, model estimation, and model-data fit
Phase 3 – Evaluate model fit and reliability: model-data fit; reliability of skill profile reports
Phase 4 – Consider model implications: compute expected student skill patterns; identify which cognitive attributes are estimated as more or less difficult; determine whether any modifications to skills or items are indicated
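Phase 2 infers a skills profile for each student from the estimated parameters. Full Fusion Model estimation in Arpeggio is considerably more involved; the sketch below is a much simpler stand-in, assuming known reduced-RUM item parameters, skills that are independent a priori, and a test short enough to enumerate every possible skill profile with Bayes' rule. All names and numbers are hypothetical.

```python
from itertools import product
from math import prod


def item_prob(pi_star, r_star, q_row, alpha):
    """Reduced-RUM success probability: pi* times r*_k for each
    required-but-unmastered skill k."""
    return pi_star * prod(r for r, q, a in zip(r_star, q_row, alpha) if q == 1 and a == 0)


def posterior_mastery(responses, Q, pi_star, r_star, p_master):
    """Posterior probability of mastery for each skill, for one student.

    Enumerates all 2**K skill profiles, so it is only practical for small K.
    Assumes known item parameters and independent skills with prior
    mastery rates p_master (simplifications for illustration).
    """
    K = len(p_master)
    weights = {}
    for alpha in product((0, 1), repeat=K):
        # Prior weight of this profile under independent skills.
        prior = prod(p if a == 1 else 1 - p for p, a in zip(p_master, alpha))
        # Likelihood of the observed 0/1 responses given this profile.
        like = 1.0
        for x, q_row, pi, r in zip(responses, Q, pi_star, r_star):
            p = item_prob(pi, r, q_row, alpha)
            like *= p if x == 1 else 1 - p
        weights[alpha] = prior * like
    total = sum(weights.values())
    # Marginal posterior for skill k: share of weight on profiles with alpha[k] == 1.
    return [sum(w for alpha, w in weights.items() if alpha[k] == 1) / total
            for k in range(K)]


# Hypothetical 3-item, 3-skill test using the example Q matrix.
Q = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
pi_star = [0.9, 0.9, 0.9]
r_star = [[0.3, 0.3, 0.3]] * 3  # penalties only matter where Q has a 1
ppm = posterior_mastery([1, 0, 0], Q, pi_star, r_star, [0.5, 0.5, 0.5])
print([round(p, 2) for p in ppm])  # [0.69, 0.09, 0.69]
```

Answering only Item 1 correctly points toward mastery of Skills 1 and 3 and non-mastery of Skill 2; these probabilities are what a thresholding rule would turn into a reported profile.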
13
Second representation (after conversations with Dr. Steif) (Santiago-Román, 2009)
14
Initial cognitive attributes for each item (Santiago-Román, 2009)
15
Final “Skills” and Their Relation to the CATS Framework
16
Student Scores & Skill Profiles
1 = master; 0 = NON-master; 9 = uncertain
Student                                  Alex    Amy   Andy Nathan   Nina   Noah Bernie Beavis  Beryl
Score                                      21     21     21     17     17     17     12     12     12
S01 Equivalence                             0      1      1      1      0      1      1      0      0
S02 Newton's 3rd Law                        0      0      1      1      0      0      9      0      1
S03 Contact Forces                          1      1      0      1      1      1      1      0      0
S04 Representation & Tension in Ropes       1      1      1      0      1      1      0      0      1
S05 Friction Force                          1      1      1      0      0      1      0      0      0
S06 Couples and Equilibrium                 1      1      1      0      1      1      0      1      0
S07 Representation of Forces                1      1      0      1      1      1      0      0      1
S08 Pin on Slot                             1      1      1      1      1      0      1      1      1
S09 Roller Support                          1      0      1      1      1      1      0      9      0
S10 Fixed Support                           1      1      1      1      1      1      0      1      1
17
Population Proportion of Masters for Each Skill
Skill                              Skill #   p_k
Newton's 3rd Law                   S02       22%
Friction Force                     S04       24%
Contact Forces                     S03       28%
Couples and Equilibrium            S10       28%
Equivalence                        S01       31%
Representation & Tension in Ropes  S08       52%
Fixed Support                      S07       57%
Representation of Forces           S09       58%
Roller Support                     S06       60%
Pin on Slot                        S05       61%
p_k = estimated population proportion of masters of skill k
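In the Fusion Model the proportion of masters is an estimated parameter. As a rough stand-in, one can average each skill's posterior probability of mastery over students and sort the skills from fewest to most masters, as in the table above. The estimator, function names, and data below are assumptions for illustration only.

```python
def proportion_of_masters(ppm_by_student):
    """Estimate p_k for each skill k as the mean posterior probability
    of mastery across students (a simple, assumed estimator)."""
    n = len(ppm_by_student)
    K = len(ppm_by_student[0])
    return [sum(student[k] for student in ppm_by_student) / n for k in range(K)]


def hardest_first(names, p):
    """List (skill, p_k) pairs from lowest to highest proportion of masters."""
    return sorted(zip(names, p), key=lambda pair: pair[1])


# Hypothetical posterior probabilities for four students on three skills.
ppm = [
    [0.9, 0.1, 0.6],
    [0.8, 0.2, 0.4],
    [0.7, 0.3, 0.5],
    [0.6, 0.2, 0.5],
]
p = proportion_of_masters(ppm)
for name, pk in hardest_first(["Skill A", "Skill B", "Skill C"], p):
    print(f"{name}: {pk:.0%}")
# Skill B: 20%
# Skill C: 50%
# Skill A: 75%
```

Skills with a low p_k are the "harder" skills referred to in the results; this ordering is one input to the Phase 4 question of whether skills or items need modification.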
18
Model-Data Fit Outcomes: Item Masters vs. Item Non-masters
pdiff = 0.6517
Mp+ = 0.8899; NMp+ = 0.2332
Mp+ = proportion correct by item masters
NMp+ = proportion correct by item NON-masters
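The item-master/non-master comparison can be computed once each student has an estimated skill profile. The sketch assumes that an item master is a student classified as mastering every skill the item requires, and that pdiff is the average over items of the per-item difference in proportion correct; the function names and data are hypothetical. A large positive pdiff means the skills assigned to an item do separate students who tend to answer it correctly from those who do not.

```python
from math import isnan


def item_master_fit(responses, Q, profiles):
    """Per-item proportion correct for item masters vs. item non-masters.

    responses : one 0/1 response vector per student
    Q         : Q matrix, one row per item (1 = skill required)
    profiles  : one estimated 0/1 skill profile per student
    """
    p_m, p_nm = [], []
    for i, q_row in enumerate(Q):
        masters, nonmasters = [], []
        for x, alpha in zip(responses, profiles):
            # Item master: classified as mastering every required skill.
            is_master = all(a == 1 for q, a in zip(q_row, alpha) if q == 1)
            (masters if is_master else nonmasters).append(x[i])
        # NaN marks an item with no students in one of the two groups.
        p_m.append(sum(masters) / len(masters) if masters else float("nan"))
        p_nm.append(sum(nonmasters) / len(nonmasters) if nonmasters else float("nan"))
    return p_m, p_nm


def mean_pdiff(p_m, p_nm):
    """Average masters-minus-non-masters difference over items where both exist."""
    diffs = [m - nm for m, nm in zip(p_m, p_nm) if not (isnan(m) or isnan(nm))]
    return sum(diffs) / len(diffs)


Q = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
profiles = [[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 0]]
responses = [[1, 1, 1], [1, 0, 0], [0, 1, 1], [0, 0, 0]]
p_m, p_nm = item_master_fit(responses, Q, profiles)
print(round(mean_pdiff(p_m, p_nm), 4))  # 0.8889
```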
19
Diagnostic Results
Results (Santiago-Román, 2009):
Skills were identified consistent with the CATS conceptual foundations
Student profiles were generated
Estimated parameter values were reasonable; for example, they identified easier and harder skills
The diagnostic model was successfully fit to the data: fit indicators were nearly twice as good as similar indices for retrofitted assessments (average across all items: pdiff = 0.6517)
20
Next Steps for CATS
Consider implications of current results: item quality and diagnostic utility; diagnosticity of the overall instrument; conceptual quality and coherence of the model
Engage in external validation studies: student protocols; validation with other student samples
Add information from the distractors to the modeling effort and diagnostic output
21
Next Steps for Other CIs and Other STEM Assessments
Applicability of these procedures to other CIs and other STEM assessments: develop a conceptual model; collect adequate data to perform model analyses; iteratively refine and interpret
Interpretive use of student and class information for instructional improvement
Future development of “item pools” and “testlets” for web-based administration