You’ve collected it … now what? Exploratory data analysis for not-very-big data
Post on 29-Dec-2015
Telling Stories with your data
Graphs, Tables and Basic, Basic Statistics with SAS Enterprise Guide®
You’ve collected it … NOW WHAT?
Exploratory data analysis for not-very-big data
Our sample
Data from the pilot study for Spirit Lake: The Game, an educational game for students in grades four through six
Our Enterprise Guide Project
A first look at the data: filter data sets, tables of descriptive statistics
Our Enterprise Guide Project
A second look
Cross-tabulations, graphics, summary tables, t-test
Let’s replace math class with a game … Incredibly, the schools went along with this!
Our sites
Two intervention schools, six classrooms
Three fourth-grade classes
Fifteen students (five each) from three fifth-grade classes
One control group school with one fourth-grade class and one fifth-grade class
All on the same American Indian reservation
Exercise 1: ready
Figure 1.1
FILE> OPEN> DATA
Figure 1.2
Tools> options
Figure 1.3
Results General> RTF under Result Formats
Figure 1.4
Great. You have data and you are set to have pretty results.
What now?
Tasks> describe> characterize data
Figure 1.5
ALWAYS DO THIS !!
Just click through the windows and accept all of the defaults.
Especially this one (you’ll find out why shortly)
Real grown-up statisticians know ….
Look for errors in data
Look for missing data
There is no teacher named “test”
This outlier with the perfect score wasn’t a student, it was the answer key!
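The same sanity checks can be scripted once you know what to look for. Here is a minimal Python sketch, not the Enterprise Guide task itself. The records, the field names (`username`, `teacher`, `score`), the teacher roster and the maximum score of 24 are all hypothetical, made up for illustration.

```python
# Hypothetical records; field names and values are illustrative, not study data.
records = [
    {"username": "u01", "teacher": "Smith", "score": 14},
    {"username": "u02", "teacher": "test", "score": 3},
    {"username": "u03", "teacher": "Jones", "score": None},
    {"username": "key", "teacher": "Smith", "score": 24},
]
KNOWN_TEACHERS = {"Smith", "Jones"}  # assumed roster of real teachers
MAX_SCORE = 24                       # assumed number of test items


def flag_record(rec):
    """Return a list of reasons this record deserves a second look."""
    reasons = []
    if rec["teacher"] not in KNOWN_TEACHERS:
        reasons.append("unknown teacher")
    if rec["score"] is None:
        reasons.append("missing score")
    elif rec["score"] == MAX_SCORE:
        reasons.append("perfect score")
    return reasons


# Keep only the records that raised at least one flag.
flagged = {r["username"]: flag_record(r) for r in records if flag_record(r)}
for user, reasons in flagged.items():
    print(user, "->", ", ".join(reasons))
```

A flag is only a prompt to go look at the record. A perfect score could be a real student, or it could be the answer key.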
Filtering data
Exercise 2: set
Figure 2.1
OR……
Tasks > data > filter and sort
Figure 2.2
Double arrows select all of the variables.
Figure 2.3
Filter out records
Click on the …
Exercise 3: go!
Is 64% missing okay?
Variable  Label   N   NMiss  Total  Min  Mean    Median  Max  StdMean
Gender    Gender  67  21     99     1    1.4776  1.0     2    0.06148
Grade     grade   83  5      362    4    4.314   4.0     5    0.05305
Table 3.1
Table 3.2
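It helps to know where the numbers in a summary row come from. The sketch below rebuilds the Gender row in Python. It assumes Gender only takes the values 1 and 2 (the Min and Max in the table), which with N = 67 and Total = 99 means 35 ones and 32 twos. The 21 missing values are written as `None`. The data list is a reconstruction from the summary, not the raw study file.

```python
import math
import statistics

# Reconstructed from the summary row (N = 67, Total = 99, values 1 or 2):
# 35 ones and 32 twos, plus 21 missing values represented as None.
gender = [1] * 35 + [2] * 32 + [None] * 21

present = [g for g in gender if g is not None]
n = len(present)                      # N: non-missing values
n_miss = len(gender) - n              # NMiss: missing values
total = sum(present)                  # Total: sum of non-missing values
mean = total / n
median = statistics.median(present)
std_mean = statistics.stdev(present) / math.sqrt(n)  # standard error of the mean
pct_missing = 100 * n_miss / len(gender)

print(n, n_miss, total, round(mean, 4), median, round(std_mean, 5))
print(f"{pct_missing:.1f}% of records are missing Gender")
```

The mean of a 1/2 coded variable is not very meaningful in itself, but N and NMiss are exactly what you need to see how much data is missing.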
DATA
We naively believed that there would be little attrition over a 10-week period.
"He's just gone".
Lesson learned
Real data analysis is more than pointing and clicking. It’s learning to ask questions of the results you get and to learn from the answers to those questions.
Data visualization at its most basic
Figure 3.1 Figure 3.2
Take Note!
File > new note
Figure 3.3
Figure 3.4
Right-click on note to link to procedure that inspired your note
Figure 3.4
Exercise 4: table analysis- looking a little closer
TASKS> DESCRIBE> TABLE ANALYSIS
Figure 4.1
CHANGING THE DATA SET
PULL DOWN TO SELECT DATA SET
Drag the variables, School and Grade, under the Table variables heading.
Drag “School” to top of the table as your column variable and “grade” to the side as your row variable.
Table of grade by School (cell contents: Frequency, Col Pct)
grade     CONTROL      EXPERIME     Total
4         15 (55.56)   38 (67.86)   53
5         12 (44.44)   18 (32.14)   30
Total     27           56           83
Frequency Missing = 5
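Column percentages are just each cell divided by its column total. The Python sketch below recomputes them from the cell counts in the table above, as a check on how to read the Table Analysis output. It is not what Enterprise Guide runs internally.

```python
# Cell counts read from the grade-by-school table above.
counts = {
    4: {"CONTROL": 15, "EXPERIME": 38},
    5: {"CONTROL": 12, "EXPERIME": 18},
}
schools = ["CONTROL", "EXPERIME"]

# Column totals: number of students per school.
col_totals = {s: sum(row[s] for row in counts.values()) for s in schools}

# Column percentage: share of each school's students in each grade.
col_pct = {
    grade: {s: round(100 * row[s] / col_totals[s], 2) for s in schools}
    for grade, row in counts.items()
}

for grade, row in col_pct.items():
    print(grade, row)
```

Reading down a column answers "what share of this school's students are in each grade?", which is the comparison that shows the disproportion between schools.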
What of the students with complete data?
Is there still a disproportion by grade?
Right-click on the Table Analysis icon
Drag the variable “missdata” under Group Analysis By
Table of grade by School (cell contents: Frequency, Col Pct)
grade     CONTROL      EXPERIME     Total
4         4 (80.00)    10 (62.50)   14
5         1 (20.00)    6 (37.50)    7
Total     5            16           21
Frequency Missing = 5
TOO MUCH MISSING DATA MAKES PEOPLE GRUMPY
WHO ARE THESE PEOPLE MISSING DATA?
Figure 4.9
What Happened?
To ensure anonymity and protect student data, we never had the students' names; the teachers had a roster of students matched with usernames. The experimental school was able to fill in the blanks for all students missing grade data. The control group school was not.
What We Did About It
This isn't something we just waved our hands about before moving on. It seriously concerned us. The degree of missing data overall disturbed us, as did the fact that we didn't seem to be able to get follow-up data. It concerned us enough that we hired a data coordinator on each reservation where we are testing in the upcoming year. We will be analyzing the pretest data as it comes in and trying to fill in any missing data we can in the same week it is collected. This is the purpose of pilot studies: to find problems and fix them.
What is an item analysis and how is it helpful?
Two types of item analysis.
Examination of the distribution of responses: which choice (“a”, “b”, “c” or “d”) the student selected as the correct answer. (Look at CHARACTERIZE DATA plots)
Does one of the distractors get selected more often than the correct answer?
Item difficulty analysis
Examine what percentage of students answered each item correctly
A basic means of establishing test validity. One would expect items at the second-grade level to be the least difficult and items at the fifth-grade level to be the most difficult, that is, answered correctly by the fewest students.
If items are scored 0 = wrong, 1 = right, you can use the item means to see what proportion of students answered each item correctly.
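Both types of item analysis fit in a few lines of code. The Python sketch below uses made-up answers to a single multiple-choice item, with "c" assumed to be the correct choice. Nothing here comes from the study data.

```python
from collections import Counter

# Hypothetical answers to one multiple-choice item; "c" is assumed correct.
responses = ["c", "b", "c", "c", "a", "c", "b", "c", None, "d"]
correct = "c"

# 1. Distribution of responses (missing answers are skipped).
answered = [r for r in responses if r is not None]
distribution = Counter(answered)
top_choice, _ = distribution.most_common(1)[0]
distractor_wins = top_choice != correct  # True would be a red flag

# 2. Item difficulty: score each answer 0/1, then take the mean.
scored = [1 if r == correct else 0 for r in answered]
proportion_correct = sum(scored) / len(scored)

print(dict(distribution), distractor_wins, round(proportion_correct, 2))
```

Because the scores are 0 and 1, the mean is the proportion of students who got the item right, so a higher mean means an easier item.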
EXERCISE 5: ITEM ANALYSIS TWO WAYS WITH THE CHARACTERIZE DATA TASK
Item Difficulty Analysis in Six (or fewer) Easy Steps
1. Click on the univariate statistics data set produced by the CHARACTERIZE DATA TASK to select it
2. From the top menu, select TASKS > DESCRIBE > LIST DATA
3. From the Variables to assign pane, select the ones you want in your report, in this case Variable, N, NMISS, Mean, Min and Max.
EDIT
4. Select the records you want in your report.
5. Format the columns in the report
Figure 5.4
I click the CHANGE button next to format.
Then I click on Numeric for the format category.
6. Next, click Options and un-check the box next to Row Numbers
Figure 5.7
Item N NMiss Mean Min Max
postsc2 68 20 0.88 0 1
postsc3 68 20 0.87 0 1
postsc4 68 20 0.81 0 1
sc3 82 6 0.8 0 1
sc2 82 6 0.78 0 1
sc4 82 6 0.78 0 1
postsc18 68 20 0.74 0 1
postsc8 68 20 0.68 0 1
postsc1 68 20 0.65 0 1
COPIED AND PASTED INTO EXCEL & SORTED
What does this table tell you?
What the data tell us
Since items are in order of grade level, the first few items should be answered correctly by the most people.
Expected pattern holds for post-test and pre-test, although it's not perfect
A higher percentage of students answered the post-test questions correctly than the pretest, as we would hope
Unexpectedly, items 5 and 6 have some of the lowest percentage correct of any item
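The copy, paste and sort step can also be done in code. The Python sketch below takes the nine rows shown in the sorted table above, ranks the items from easiest to hardest, and subtracts the pre-test mean from the post-test mean for items that appear in both forms. The `post` prefix convention is read off the item names in that table.

```python
# (item, N, NMiss, Mean) rows copied from the sorted table above.
rows = [
    ("postsc2", 68, 20, 0.88), ("postsc3", 68, 20, 0.87),
    ("postsc4", 68, 20, 0.81), ("sc3", 82, 6, 0.80),
    ("sc2", 82, 6, 0.78), ("sc4", 82, 6, 0.78),
    ("postsc18", 68, 20, 0.74), ("postsc8", 68, 20, 0.68),
    ("postsc1", 68, 20, 0.65),
]
means = {item: mean for item, _n, _miss, mean in rows}

# Rank from easiest (highest proportion correct) to hardest.
ranked = sorted(means, key=means.get, reverse=True)

# Post-test minus pre-test for items present in both forms.
gains = {
    item: round(means["post" + item] - means[item], 2)
    for item in means
    if not item.startswith("post") and "post" + item in means
}

print(ranked[:3])
print(gains)
```

Keep in mind that the pre-test means rest on 82 students and the post-test means on 68, so these differences compare two different groups of students.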
EXERCISE 6: GRAPHING ITEM DIFFICULTY
(If you forget the Sum variable, you'll just get a chart that shows each item occurred in the data set once. Not very helpful.)
SELECT ONLY THE ITEMS OF INTEREST
Click Layout tab under Appearance and select Descending Bar Height
Figure 6.5
Item Difficulty Analysis: Post-Test
Compare with pre-test data
Right-click on the bar chart icon in process flow and select Modify.
Three modifications are needed:
Click on the EDIT button and change the filter. First click the X at the end of the row to delete the current filter. Then select “item” and “in a list” and the variables sc1 – sc24 for your items to chart.
Change the chart title.
The X axis must be the same on both charts, from 0 to 1, so you can compare them.
How to set the X axis
Click on Major Ticks and then, under Major Horizontal Ticks, click Specify. In the input box on the top right, enter each of the major ticks you want (from 0 to 1 in steps of .2) and click ADD.
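Why insist on the same axis? The Python sketch below draws two text bar charts on one fixed 0-to-1 scale, using three pre-test and three post-test item means from the item difficulty table. The functions `major_ticks` and `bar_chart` are hypothetical helpers written for this illustration, not part of any SAS or Python charting library.

```python
def major_ticks(start=0.0, stop=1.0, step=0.2):
    """Return evenly spaced tick values, rounded to avoid float drift."""
    count = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(count + 1)]


def bar_chart(means, width=40):
    """Render bars in descending order on a fixed 0-to-1 scale."""
    lines = []
    for item, mean in sorted(means.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(mean * width)
        lines.append(f"{item:<10}|{bar:<{width}}| {mean:.2f}")
    return "\n".join(lines)


# Item means taken from the item difficulty table.
pre = {"sc2": 0.78, "sc3": 0.80, "sc4": 0.78}
post = {"postsc2": 0.88, "postsc3": 0.87, "postsc4": 0.81}

print("Major ticks:", major_ticks())
print("Pre-test")
print(bar_chart(pre))
print("Post-test")
print(bar_chart(post))
```

Because both charts use the same width for the 0-to-1 range, a longer bar always means a higher proportion correct, no matter which chart it is on.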
Pre-test
Post-test
Our story so far …
We've found one error, that the answer key was left in as a record.
We've seen that we have an issue with missing data that needs to be fixed
It appears that the test is reasonably reliable, although, of course, more sophisticated statistics are needed to examine that issue further.
We've also realized that we can't really compare the pretest and post-test since we have a large proportion of missing subjects. We need to match pre- and post-test scores.
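Matching means keeping only the students who have both a pre-test and a post-test score. Here is a minimal Python sketch, assuming scores are keyed by username as on the teachers' rosters. The usernames and scores are hypothetical.

```python
# Hypothetical pre- and post-test totals keyed by username.
pretest = {"u01": 12, "u02": 9, "u03": 15, "u04": 7}
posttest = {"u01": 16, "u03": 15, "u05": 11}

# Keep only students who took both tests.
matched = {
    user: (pretest[user], posttest[user])
    for user in pretest
    if user in posttest
}
gains = {user: post - pre for user, (pre, post) in matched.items()}

# Students who appear in only one file are worth listing, not just dropping.
pre_only = sorted(set(pretest) - set(posttest))
post_only = sorted(set(posttest) - set(pretest))

print(matched)
print(gains)
print("pre-test only:", pre_only, "post-test only:", post_only)
```

The two "only" lists are the attrition story in miniature: they tell you who to go ask about.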
EXERCISE 7: GETTING DOWN TO BUSINESS WITH T-TESTS
TASKS > ANOVA > T-test
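The slides do not say which form of t-test was run. One plausible reading, given the matched pre- and post-test scores, is a paired t-test within each group. The Python sketch below computes the paired t statistic by hand on hypothetical scores for five students. It is an illustration of the arithmetic, not a reproduction of the Enterprise Guide output.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for matched pre/post scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of mean difference
    return statistics.mean(diffs) / se, n - 1


# Hypothetical matched scores for five students.
pre = [10, 12, 9, 14, 11]
post = [11, 14, 12, 16, 13]

t, df = paired_t(pre, post)
print(round(t, 3), df)
```

Turning t into a p-value needs the t distribution, which the Python standard library does not provide. The T-test task reports the p-value for you.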
Control Group Results
Table 7.1
Experimental Group Results
The moral of the story
The experimental group improved more
Students who played the game less improved less
Differences were not explained by outliers in either group
There was a definite shift in the distribution of the experimental group only
CONCLUSION
Exploratory data analysis is a key first step
A few simple tasks in SAS Enterprise Guide can go a surprisingly long way
While exploring your data, it’s crucial to note the concerns raised, follow-up questions and policy recommendations that come out of your analysis
Thank You
U.S. Department of Agriculture
Small Business Innovation Research
Rural Economic Development
Teachers and students of the Spirit Lake Dakota Nation
Contact
The Julia Group / 7 Generation Games
2111 7th St. #8
Santa Monica, CA 90405
(310) 717-9089
annmaria@thejuliagroup.com
http://www.thejuliagroup.com
www.7generationgames.com