DataShop: An Educational Data Mining Platform for the Learning Science Community
John Stamper
Pittsburgh Science of Learning Center
Human-Computer Interaction Institute
Carnegie Mellon University
About me.
EDM Data
• What kinds of data can we collect?
• What levels?
• What is the right size for EDM discovery?
Data Granularity
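Granularity, from finest to coarsest: Transactions, Steps, Problems, Units, Tests, Class Grades, Class Averages, Schools
We are mostly here: the fine-grained end (transactions and steps)
Policy is being made here: the coarse-grained end (class averages and schools)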
LearnLab
Pittsburgh Science of Learning Center (PSLC)
• Created to bridge the chasm between science & practice
– Low success rate (<10%) of randomized field trials
• LearnLab = a socio-technical bridge between lab psychology & schools
– E-science of learning & education
– Social processes for research-practice engagement
• Purpose: Leverage cognitive theory and computational modeling to identify the conditions that cause robust student learning
DataShop
• Central Repository
– Secure place to store & access research data
– Supports various kinds of research
  • Primary analysis of study data
  • Exploratory analysis of course data
  • Secondary analysis of any data set
• Analysis & Reporting Tools
– Focus on student-tutor interaction data
– Data Export
  • Tab-delimited tables you can open with your favorite spreadsheet program or statistical package
  • Web services for direct access
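The export bullets describe tab-delimited tables. As a minimal sketch of reading one with Python's standard `csv` module: the column names (`Anon Student Id`, `Problem Name`, `Step Name`, `Outcome`), the outcome values, and the sample rows are assumptions made up for illustration, so check the header row of a real export before relying on them.

```python
import csv
import io
from collections import Counter

# Made-up excerpt of a tab-delimited transaction export.
# Column names and outcome labels are assumptions for illustration only.
SAMPLE_EXPORT = (
    "Anon Student Id\tProblem Name\tStep Name\tOutcome\n"
    "s01\tAREA-1\tcircle-area\tINCORRECT\n"
    "s01\tAREA-1\tcircle-area\tCORRECT\n"
    "s02\tAREA-1\tcircle-area\tHINT\n"
    "s02\tAREA-1\tcircle-area\tCORRECT\n"
)

def read_export(lines):
    """Parse tab-delimited lines (a file handle or StringIO) into one dict per row."""
    reader = csv.DictReader(lines, delimiter="\t")
    return list(reader)

rows = read_export(io.StringIO(SAMPLE_EXPORT))
# Tally how often each outcome label appears across all transactions.
outcomes = Counter(row["Outcome"] for row in rows)
print(outcomes)
```

For a real export file, pass an open file handle instead, e.g. `open(path, newline="", encoding="utf-8")` (UTF-8 is an assumption); `csv.DictReader` accepts any iterable of lines.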
Repository
• Allows for full data management
• Controlled access for collaboration
• File attachments
• Paper attachments
• Great for secondary analyses
How much data does DataShop have?
How big is DataShop?

| Domain | Files | Papers | Datasets | Student Actions | Students | Student Hours |
|---|---|---|---|---|---|---|
| Language | 64 | 11 | 76 | 6,185,358 | 6,401 | 6,820 |
| Math | 217 | 50 | 174 | 65,566,816 | 31,774 | 144,152 |
| Science | 91 | 19 | 90 | 12,776,806 | 15,793 | 42,813 |
| Other | 18 | 12 | 43 | 6,826,989 | 11,691 | 24,676 |
| Total | 390 | 92 | 383 | 91,355,969 | 65,659 | 218,463 |

As of January 2013
What kinds of data?
• By domain, based on studies from the LearnLabs
• Data from intelligent tutors
• Data from online instruction
• Data from games
The data is fine-grained, down to the transaction level!
Web Application
Getting to DataShop
• Explore data through the DataShop tools
• Where is DataShop?
– http://pslcdatashop.org
– Linked from DataShop homepage and learnlab.org
  • http://pslcdatashop.web.cmu.edu/about/
  • http://learnlab.org/technologies/datashop/index.php
Creating an account
• On DataShop's home page, click "Sign up now". Complete the form to create your DataShop account.
• If you’re a CMU student/staff/faculty, click “Log in with WebISO” to create your account.
Getting access to datasets
• By default, you will have access to the public datasets.
• For access to other datasets, you can request access from dataset
DataShop Terminology
• Problem: a task for a student to perform that typically involves multiple steps
• Step: an observable part of the solution to a problem
• Transaction: an interaction between the student and the tutoring system.
DataShop Terminology
• Observation: a group of transactions for a particular student working on a particular step.
• Attempt: transaction; an attempt toward a step
• Opportunity: a chance for a student to demonstrate whether he or she has learned a given knowledge component. An opportunity exists each time a step is present with the associated knowledge component.
DataShop Terminology
• KC: Knowledge component
– also known as a skill/concept/fact
– a piece of information that can be used to accomplish tasks
– tagged at the step level
• KC Model:
– also known as a cognitive model or skill model
– a mapping between correct steps and knowledge components
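To make the terminology concrete, here is a small Python sketch that groups transactions into observations (one per student and step) and counts each student's opportunities per KC under a given KC model. The `Transaction` fields, the outcome labels, and the step-to-KC mapping are hypothetical simplifications; a real dataset also distinguishes repeated views of the same problem.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Transaction:
    """One student-tutor interaction (an attempt or a hint request)."""
    student: str
    problem: str
    step: str
    outcome: str  # hypothetical labels: "CORRECT", "INCORRECT", "HINT"

def group_observations(transactions):
    """Group transactions by (student, problem, step); each group is one observation."""
    observations = defaultdict(list)
    for t in transactions:
        observations[(t.student, t.problem, t.step)].append(t)
    return dict(observations)

def count_opportunities(observations, kc_model):
    """Count, per (student, KC), the observations whose step is tagged with that KC.

    kc_model is a hypothetical mapping from step name to a list of KC names.
    """
    counts = defaultdict(int)
    for (student, _problem, step) in observations:
        for kc in kc_model.get(step, []):
            counts[(student, kc)] += 1
    return dict(counts)

txs = [
    Transaction("s01", "P1", "find-area", "INCORRECT"),
    Transaction("s01", "P1", "find-area", "CORRECT"),
    Transaction("s01", "P2", "find-area", "CORRECT"),
    Transaction("s02", "P1", "find-area", "HINT"),
    Transaction("s02", "P1", "find-area", "CORRECT"),
]
obs = group_observations(txs)
opps = count_opportunities(obs, {"find-area": ["Circle-Area"]})
print(len(obs), "observations;", opps)
```

Five transactions collapse into three observations, because each (student, problem, step) group counts once; each observation of a tagged step is one opportunity for that KC.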
Example
Learning Curves
Visualizes changes in student performance over time
Time is represented on the x-axis as ‘opportunity’, or the number of times a student (or students) had an opportunity to demonstrate a KC
Hover over the y-axis to change the type of learning curve.
Types include:
• Error Rate
• Assistance Score
• Number of Incorrects
• Number of Hints
• Step Duration
• Correct Step Duration
• Error Step Duration
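A minimal sketch of how an error-rate curve can be computed from first attempts. It assumes an observation counts as an error when its first attempt is anything other than correct (an incorrect answer or a hint request) and that observations are listed in chronological order for each student. This is an illustration, not DataShop's own code.

```python
from collections import defaultdict

def error_rate_curve(observations):
    """Error rate at each opportunity number.

    observations: iterable of (student, kc, first_attempt_outcome) tuples,
    in chronological order per student. Returns {opportunity: error_rate}.
    """
    seen = defaultdict(int)    # (student, kc) -> opportunities so far
    totals = defaultdict(int)  # opportunity number -> observations at that point
    errors = defaultdict(int)  # opportunity number -> first attempts that were not correct
    for student, kc, first_outcome in observations:
        seen[(student, kc)] += 1
        opp = seen[(student, kc)]
        totals[opp] += 1
        if first_outcome != "CORRECT":
            errors[opp] += 1
    return {opp: errors[opp] / totals[opp] for opp in sorted(totals)}

# Made-up first attempts for two students on one KC.
obs = [
    ("s01", "Circle-Area", "INCORRECT"),
    ("s02", "Circle-Area", "HINT"),
    ("s01", "Circle-Area", "CORRECT"),
    ("s02", "Circle-Area", "INCORRECT"),
    ("s01", "Circle-Area", "CORRECT"),
    ("s02", "Circle-Area", "CORRECT"),
]
curve = error_rate_curve(obs)
print(curve)  # {1: 1.0, 2: 0.5, 3: 0.0}
```

The declining values are the shape a good KC model is expected to produce.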
Learning Curves: Drill Down
Click on a data point to view point information
Four types of information for a data point:
• KCs
• Problems
• Steps
• Students
Click on the number link to view the drill-down details for that type.
Details include:
• Name
• Value
• Number of Observations
Learning Curve: Latency Curves
For latency curves, a standard deviation cutoff of 2.5 is applied by default.
The number of included and dropped observations due to the cutoff is shown in the observation table.
Step Duration = the total length of time spent on a step. It is calculated by adding all of the durations for transactions that were attributed to a given step.
Error Step Duration = step duration when the first attempt is an error
Correct Step Duration = step duration when the first attempt is correct
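Step duration and the 2.5 standard deviation cutoff can be sketched as follows. Here the cutoff is applied as a symmetric mean ± 2.5 × SD band over one list of durations, using the sample standard deviation; whether DataShop computes it per data point, per KC, or over all observations is an open assumption.

```python
import statistics

def step_duration(transaction_durations):
    """Step duration: the sum of the durations (seconds) of the step's transactions."""
    return sum(transaction_durations)

def apply_sd_cutoff(durations, cutoff=2.5):
    """Split durations into (kept, dropped) using a mean +/- cutoff * SD band."""
    if len(durations) < 2:
        return list(durations), []  # stdev needs at least two values
    mean = statistics.mean(durations)
    sd = statistics.stdev(durations)  # sample SD (an assumption)
    low, high = mean - cutoff * sd, mean + cutoff * sd
    kept = [d for d in durations if low <= d <= high]
    dropped = [d for d in durations if not low <= d <= high]
    return kept, dropped

# Ten 10-second steps (two 5-second transactions each) and one 100-second outlier.
durations = [step_duration([5.0, 5.0]) for _ in range(10)] + [step_duration([60.0, 40.0])]
kept, dropped = apply_sd_cutoff(durations)
print(len(kept), "kept,", len(dropped), "dropped")
```

The counts of kept and dropped values correspond to the included and dropped observations reported in the observation table.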
Dataset Info: KC Models
Handy information displayed for each KC model:
• Name
• Number of KCs in the model
• Created By
• Mapping Type
• AIC & BIC values
The toolbox allows you to export one or more KC models, work with them, then reimport them into the dataset.
DataShop generates two KC models for free:
• Single-KC
• Unique-step
These provide upper and lower bounds for AIC/BIC.
Click to view the list of KCs for this model.
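The two automatically generated models sit at the extremes of granularity: one KC shared by every step, or one KC per step. A sketch of both as step-to-KC mappings, using hypothetical step identifiers and a hypothetical helper `kc_count`:

```python
def single_kc_model(steps, kc_name="Single-KC"):
    """Coarsest model: every step is tagged with the same KC."""
    return {step: [kc_name] for step in steps}

def unique_step_model(steps):
    """Finest model: every step gets its own KC, named after the step."""
    return {step: [step] for step in steps}

def kc_count(model):
    """Number of distinct KCs used in a step -> [KC, ...] mapping."""
    return len({kc for kcs in model.values() for kc in kcs})

# Hypothetical step identifiers ("problem/step").
steps = ["AREA-1/circle-area", "AREA-2/square-area", "AREA-3/compose-by-addition"]
print(kc_count(single_kc_model(steps)), kc_count(unique_step_model(steps)))
```

Any hand-built or discovered model has a KC count somewhere between these two.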
Dataset Info: Export a KC Model
Export multiple models at once.
Select the models you wish to export and click the “Export” button.
Model information, as well as other useful information, is provided in a tab-delimited text file.
Selecting the “export” option next to a KC model will auto-select the model for you in the export toolbox.
Dataset Info: Import a KC Model
When you are ready to import, upload your file to DataShop for verification.
Once verification is successful, click the “Import” button.
Your new or updated model will be available shortly (depending on the size of the dataset).
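The "work with them" step of the export and reimport cycle often amounts to adding a new KC column to the exported tab-delimited file. This is a hedged sketch. The column names (`Problem Name`, `Step Name`, `KC (Original)`, `KC (Split)`) and the labeling rule are hypothetical, so match them to the header of an actual export and to DataShop's import requirements before uploading.

```python
import csv
import io

def add_kc_column(tsv_text, new_column, labeler):
    """Return tab-delimited text with an extra KC column filled by labeler(row)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[new_column] = labeler(row)  # the new KC label for this step
        writer.writerow(row)
    return out.getvalue()

# Hypothetical exported model: one row per step, one KC column.
exported = (
    "Problem Name\tStep Name\tKC (Original)\n"
    "AREA-1\tcircle-area\tGeometry\n"
    "AREA-2\tsquare-area\tGeometry\n"
)
updated = add_kc_column(
    exported,
    "KC (Split)",
    lambda row: "Circle-Area" if "circle" in row["Step Name"] else "Square-Area",
)
print(updated)
```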
Getting the KC Model Right!
The KC model drives instruction in adaptive learning:
– Problem and topic sequence
– Instructional messages
– Tracking student knowledge
What makes a good Cognitive Model?
• A correct expert model is one that is consistent with student behavior.
• Predicts task difficulty
• Predicts transfer between instruction and test
The model should fit the data!
Good Cognitive Model => Good Learning Curve
• An empirical basis for determining when a cognitive model is good
• Accurate predictions of student task performance & learning transfer
– Repeated practice on tasks involving the same skill should reduce the error rate on those tasks
=> A declining learning curve should emerge
How do we make KC Models?
Traditionally, Cognitive Task Analysis (CTA) has been used
But CTA has some issues…
– Extremely human-driven
– Highly subjective
– Leading to differing results from different analysts
And these human-discovered models are usually wrong!
If human-centered CTA is not the answer,
How should student models be designed?
They shouldn’t!
Student models should be discovered not designed!
Solution: use computers
– Today we have lots of log data from tutors
– We can harness this data to validate and improve existing student models
Human-Machine Student Model Discovery
DataShop provides an easy interface to add and modify student models, and it ranks the models using AFM
Human-Machine Student Model Discovery
3 strategies for discovering improvements to the student model
– Smooth learning curves
– No apparent learning
– Problems with unexpected error rates
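The second strategy above, "no apparent learning", can be partly automated by fitting a straight line to each KC's error-rate curve and flagging KCs whose slope does not decline. This is a rough sketch: the least-squares slope test and the `min_drop` threshold are heuristics chosen for illustration, and the curve values in the example are made up.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def flag_no_learning(curves, min_drop=0.0):
    """Return KCs whose error-rate curve does not decline.

    curves: {kc: {opportunity: error_rate}}. A KC is flagged when its fitted
    slope is >= -min_drop (a hypothetical threshold).
    """
    flagged = []
    for kc, curve in curves.items():
        opps = sorted(curve)
        if len(opps) < 2:
            continue  # a single point has no slope
        if slope(opps, [curve[o] for o in opps]) >= -min_drop:
            flagged.append(kc)
    return flagged

curves = {
    "Circle-Area": {1: 0.6, 2: 0.4, 3: 0.25, 4: 0.15},
    "Compose-By-Addition": {1: 0.5, 2: 0.55, 3: 0.5, 4: 0.52},
}
print(flag_no_learning(curves))
```

A flagged KC is only a candidate for closer inspection; a flat curve can also come from too few observations.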
A good cognitive model produces a learning curve
Is this the correct or “best” cognitive model?
Without decomposition, using just a single “Geometry” skill, there is no smooth learning curve.
But with decomposition into 12 skills for area, a smooth learning curve emerges.
(Rise in error rate because poorer students get assigned more problems)
Inspect curves for individual knowledge components (KCs)
Many curves show a reasonable decline
Some do not => Opportunity to improve the model!
No Apparent Learning
Problems with Unexpected Error Rates
These strategies suggest an improvement
– We hypothesized that additional skills were involved in some of the compose-by-addition problems
– A new student model (better BIC value) suggests splitting the skill
Redesign based on Discovered Model
Our discovery suggested changes that needed to be made to the tutor:
– Resequencing: put problems requiring fewer skills first
– Knowledge Tracing: adding new skills
– Creating new tasks: new problems
– Changing instructional messages, feedback or hints
Discovering cognitive models from data
• Abstract from a computational symbolic cognitive model to a statistical cognitive model
• For each task label the knowledge components that are required:
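Original “Q matrix”:

| Item | Add | Sub | Mul |
|---|---|---|---|
| 2*8 | 0 | 0 | 1 |
| 2*8 - 3 | 0 | 1 | 1 |
| 2*8 - 30 | 0 | 1 | 1 |
| 3+2*8 | 1 | 0 | 1 |

Other possible “learning factors”:

| Item | Deal with negative | Order of Ops | … |
|---|---|---|---|
| 2*8 | 0 | 0 | |
| 2*8 - 3 | 0 | 0 | |
| 2*8 - 30 | 1 | 0 | |
| 3+2*8 | 0 | 1 | |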
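The Q matrix and learning factors from the slide can be written as plain dictionaries. The `split_skill` function below is a hypothetical sketch of one kind of model change: it splits a skill into two KCs depending on whether a learning factor is present. The naming scheme for the new KCs is an assumption.

```python
# Q matrix from the slide: which skills each item requires (1 = required).
Q = {
    "2*8":      {"Add": 0, "Sub": 0, "Mul": 1},
    "2*8 - 3":  {"Add": 0, "Sub": 1, "Mul": 1},
    "2*8 - 30": {"Add": 0, "Sub": 1, "Mul": 1},
    "3+2*8":    {"Add": 1, "Sub": 0, "Mul": 1},
}
# Learning factors from the slide (1 = factor present).
FACTORS = {
    "2*8":      {"Deal with negative": 0, "Order of Ops": 0},
    "2*8 - 3":  {"Deal with negative": 0, "Order of Ops": 0},
    "2*8 - 30": {"Deal with negative": 1, "Order of Ops": 0},
    "3+2*8":    {"Deal with negative": 0, "Order of Ops": 1},
}

def split_skill(q, factors, skill, factor):
    """Replace `skill` with two KCs: one for items with `factor`, one for items without."""
    with_name = f"{skill}*{factor}"
    without_name = f"{skill}*not-{factor}"
    new_q = {}
    for item, row in q.items():
        new_row = {k: v for k, v in row.items() if k != skill}
        has_skill = row.get(skill, 0) == 1
        has_factor = factors[item].get(factor, 0) == 1
        new_row[with_name] = int(has_skill and has_factor)
        new_row[without_name] = int(has_skill and not has_factor)
        new_q[item] = new_row
    return new_q

new_q = split_skill(Q, FACTORS, "Sub", "Deal with negative")
for item, row in new_q.items():
    print(item, row)
```

After the split, "2*8 - 30" (whose answer is negative) no longer shares its subtraction KC with "2*8 - 3", so the two can have different difficulty and learning-rate estimates.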
Can a data-driven process be automated & brought to scale?
Yes!
• Combine Cognitive Science, Psychometrics, Machine Learning …
• Collect a rich body of data
• Develop new model discovery algorithms, visualizations, & online collaboration support
Discovery of new cognitive models: Strategy & Results
• “Mixed initiative” human & machine discovery
– Visualizations to aid human discovery
– AI search for statistically better models
• Better models discovered in Geometry, Statistics, English, Physics
Stamper, J., Koedinger, K.R. (2011) Human-machine Student Model Discovery and Improvement Using DataShop.
Logistic Regression Model of Student Performance & Learning
“Additive Factor Model” (AFM) (cf., Draney, Pirolli, Wilson, 1995)
• Evaluate with BIC, AIC, cross validation to reduce over-fit
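AFM is commonly written as logit(p_ij) = θ_i + Σ_k q_jk (β_k + γ_k T_ik), where θ_i is student i's proficiency, q_jk is the Q-matrix entry for step j and KC k, β_k is the KC's easiness, γ_k is its learning rate, and T_ik is the number of prior opportunities student i has had on KC k. The sketch below computes that prediction, plus AIC (2k − 2 ln L) and BIC (k ln n − 2 ln L) from a log-likelihood. Fitting the parameters is out of scope here, and how DataShop counts parameters may differ.

```python
import math

def afm_probability(theta, betas, gammas, kcs, opportunities):
    """P(correct) for one observation under the Additive Factor Model.

    theta: student proficiency; betas, gammas: dicts of KC easiness and learning rate;
    kcs: KCs tagged on the step (its Q-matrix row); opportunities: KC -> prior practice count.
    """
    logit = theta + sum(betas[k] + gammas[k] * opportunities[k] for k in kcs)
    return 1.0 / (1.0 + math.exp(-logit))

def log_likelihood(probs, outcomes):
    """Bernoulli log-likelihood; outcomes are 1 (correct) or 0 (error)."""
    return sum(math.log(p) if y else math.log(1.0 - p) for p, y in zip(probs, outcomes))

def aic(ll, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * ll

def bic(ll, n_params, n_obs):
    """Bayesian information criterion: penalizes parameters more as data grows."""
    return n_params * math.log(n_obs) - 2 * ll

betas, gammas = {"Mul": 0.0}, {"Mul": 0.5}
p_first = afm_probability(0.0, betas, gammas, ["Mul"], {"Mul": 0})
p_third = afm_probability(0.0, betas, gammas, ["Mul"], {"Mul": 2})
ll = log_likelihood([0.5, 0.5], [1, 0])
print(p_first, p_third, aic(ll, 3), bic(ll, 3, 2))
```

With a positive learning rate, predicted success rises with each opportunity, which is what produces a declining error-rate curve.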
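LFA: Model Search Process
[Figure: search tree rooted at the original model (BIC = 4328). Candidate models are produced by splitting KCs on learning factors (Split by Embed, Split by Backward, Split by Initial), with BIC values such as 4301, 4312, 4313, 4320, 4322, 4324 and 4325. The figure also shows a model with BIC = 4248 and the annotations "50+" and "15 expansions later".]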
Automates the process of hypothesizing alternative cognitive models & testing them against data
• Search algorithm guided by a heuristic: BIC
• Start from an existing cog model (Q matrix)
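The search loop can be sketched generically: expand a model by applying split operators, score each child (for example by BIC), and keep the best model found. A plain best-first search is used here as a stand-in; the actual LFA search strategy and its operators may differ, and `expand` and `score` are hypothetical caller-supplied hooks.

```python
import heapq
import itertools

def best_first_search(start_model, expand, score, max_expansions=15):
    """Best-first search over candidate models; lower score (e.g. BIC) is better.

    expand(model) -> iterable of child models; score(model) -> float.
    Returns (best_score, best_model).
    """
    counter = itertools.count()  # tie-breaker so models themselves are never compared
    start_score = score(start_model)
    frontier = [(start_score, next(counter), start_model)]
    best = (start_score, start_model)
    for _ in range(max_expansions):
        if not frontier:
            break
        _, _, model = heapq.heappop(frontier)  # expand the most promising model
        for child in expand(model):
            child_score = score(child)
            if child_score < best[0]:
                best = (child_score, child)
            heapq.heappush(frontier, (child_score, next(counter), child))
    return best

# Toy stand-in: models are integers, each "split" adds 1 or 2, and the score is lowest at 7.
result = best_first_search(0, lambda n: [n + 1, n + 2] if n < 10 else [], lambda n: (n - 7) ** 2)
print(result)
```

With real data, `expand` would return the Q matrices produced by each possible split and `score` would fit AFM and return its BIC.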
Cognitive Model Leaderboard for Geometry Area Data Set
Some models are machine generated (based on human-generated learning factors)
Some models are human generated
Crowdsourcing EDM
• Allowing human-generated models to work with machine-generated models is a form of crowdsourcing.
• Another way is through competitions.
2010 KDD Cup Competition
9/12/2012 PSLC Corporate Partner Meeting 2012
Knowledge Discovery and Data Mining (KDD) is the most prestigious conference in the data mining and machine learning fields
KDD Cup is the premier data mining challenge
The 2010 KDD Cup was called the “Educational Data Mining Challenge”
It ran from April 2010 through June 2010
KDD Cup Competition
Competition goal is to predict student responses given tutor data provided by Carnegie Learning
| Dataset | Students | Steps | File size |
|---|---|---|---|
| Algebra I 2008-2009 | 3,310 | 9,426,966 | 3 GB |
| Bridge to Algebra 2008-2009 | 6,043 | 20,768,884 | 5.43 GB |
KDD Cup Competition
655 registered participants
130 participants who submitted predictions
3,400 submissions
KDD Cup Competition
Advances in prediction and cognitive modeling
Excitement in the KDD community
The datasets are now “in the wild” and showing up in non-KDD conferences
New competitions have been run, and more are in the works