You’ve collected it … now what? Exploratory data analysis for not-very-big data

Post on 29-Dec-2015


Telling Stories with your data

Graphs, Tables and Basic, Basic Statistics with SAS Enterprise Guide®

You’ve collected it … NOW WHAT?

Exploratory data analysis for not-very-big data

Our sample

Data from the pilot study for Spirit Lake: The Game, an educational game for students in grades four through six

Our Enterprise Guide Project

A first look at the data: filter data sets, tables of descriptive statistics

Our Enterprise Guide Project

A second look

Cross-tabulations, graphics, summary tables, t-test

Let’s replace math class with a game … Incredibly, the schools went along with this!

Our sites

Two intervention schools, six classrooms

Three fourth-grade classes

Fifteen students (five each) from three fifth-grade classes

One control group school with one fourth-grade and one fifth-grade class

All on the same American Indian reservation

Exercise 1: ready

Figure 1.1

FILE > OPEN > DATA

Figure 1.2

Tools > Options

Figure 1.3

Results General > check RTF under Result Formats

Figure 1.4

Great. You have data and you are set to have pretty results.

What now?

Tasks > Describe > Characterize Data

Figure 1.5

ALWAYS DO THIS!!

Just click through the windows and accept all of the defaults.

Especially this one (you’ll find out why shortly)

Real grown-up statisticians know …

Look for errors in data

Look for missing data

There is no teacher named “test”

This outlier with the perfect score wasn’t a student; it was the answer key!
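Outside Enterprise Guide, the same first pass can be sketched in a few lines of Python. This is a minimal illustration, not the project's workflow: the field names (`teacher`, `score`, `gender`) and the sample records are hypothetical. It counts missing values per variable and flags records that look like non-students so a person can review them.

```python
from typing import Any, Optional

# Hypothetical field names ("teacher", "score", "gender"); adjust to your data.
Record = dict[str, Any]


def count_missing(records: list[Record], variables: list[str]) -> dict[str, int]:
    """Count records where each variable is missing (None or absent)."""
    return {v: sum(1 for r in records if r.get(v) is None) for v in variables}


def flag_for_review(records: list[Record], max_score: Optional[int] = None) -> list[Record]:
    """Return records that may not be real students: a teacher named "test",
    or (optionally) a perfect score, which could be an answer key.
    These are flagged for a human to check, not deleted automatically."""
    flagged = []
    for r in records:
        teacher = str(r.get("teacher") or "").strip().lower()
        perfect = max_score is not None and r.get("score") == max_score
        if teacher == "test" or perfect:
            flagged.append(r)
    return flagged


records = [
    {"id": 1, "teacher": "Smith", "gender": 1, "score": 14},
    {"id": 2, "teacher": "test", "gender": None, "score": 9},
    {"id": 3, "teacher": "Jones", "gender": None, "score": 24},
]
print(count_missing(records, ["gender", "score"]))                # {'gender': 2, 'score': 0}
print([r["id"] for r in flag_for_review(records, max_score=24)])  # [2, 3]
```

A perfect score is only a reason to look, since a real student can also get every item right.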

Filtering data

Exercise 2: set

Figure 2.1

OR …

Tasks > Data > Filter and Sort

Figure 2.2

Double arrows select all of the variables.

Figure 2.3

Filter out records

Click on the …
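A Python sketch of what the Filter and Sort task does with these settings: keep all variables, drop the records that are not students, and sort the rest. The `username` field and the `"key"` username for the answer-key record are hypothetical placeholders.

```python
from typing import Any, Callable

Record = dict[str, Any]

# Hypothetical: the username under which the answer key was entered.
NON_STUDENT_USERNAMES = {"key"}


def is_student(r: Record) -> bool:
    """Keep a record unless it belongs to the 'test' teacher or a known non-student ID."""
    teacher = str(r.get("teacher") or "").strip().lower()
    return teacher != "test" and r.get("username") not in NON_STUDENT_USERNAMES


def filter_and_sort(records: list[Record], keep: Callable[[Record], bool],
                    sort_key: Callable[[Record], Any]) -> list[Record]:
    """Keep every variable (like selecting all columns with the double arrows),
    drop records that fail `keep`, and sort what remains. Returns copies."""
    return sorted((dict(r) for r in records if keep(r)), key=sort_key)


records = [
    {"username": "s102", "teacher": "Smith", "school": "EXPERIME"},
    {"username": "key", "teacher": "Smith", "school": "EXPERIME"},
    {"username": "s001", "teacher": "test", "school": "CONTROL"},
    {"username": "s050", "teacher": "Jones", "school": "CONTROL"},
]
clean = filter_and_sort(records, is_student, sort_key=lambda r: r["username"])
print([r["username"] for r in clean])  # ['s050', 's102']
```

As with the task, the original data set is left untouched and the filtered result is a new data set.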

Exercise 3: go!

Is 64% missing okay?


Variable  Label   N   NMiss  Total  Min  Mean    Median  Max  StdMean
Gender    Gender  67  21     99     1    1.4776  1.0     2    0.06148

Table 3.1

Variable  Label   N   NMiss  Total  Min  Mean    Median  Max  StdMean
Grade     grade   83  5      362    4    4.3614  4.0     5    0.05305

Table 3.2
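To see where columns such as StdMean come from, here is a Python sketch of the same statistics. StdMean is the standard error of the mean: the sample standard deviation divided by the square root of N. The input lists are reconstructions, not the raw data: for Gender, 67 values summing to 99 means 35 coded 1 and 32 coded 2; the grade counts come from the grade-by-School cross-tabulation in Exercise 4 (53 fourth graders, 30 fifth graders).

```python
import math
import statistics
from typing import Optional


def summarize(values: list[Optional[float]]) -> dict[str, float]:
    """N, NMiss, Total, Min, Mean, Median, Max and StdMean (standard error
    of the mean) for one variable, where None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    if n < 2:
        raise ValueError("need at least two non-missing values")
    return {
        "N": n,
        "NMiss": len(values) - n,
        "Total": sum(present),
        "Min": min(present),
        "Mean": statistics.mean(present),
        "Median": statistics.median(present),
        "Max": max(present),
        "StdMean": statistics.stdev(present) / math.sqrt(n),
    }


# Reconstructed from the tables' counts (see lead-in), not the raw data.
gender = [1] * 35 + [2] * 32 + [None] * 21
grade = [4] * 53 + [5] * 30 + [None] * 5

for name, vals in [("Gender", gender), ("Grade", grade)]:
    s = summarize(vals)
    print(name, s["N"], s["NMiss"], s["Total"], round(s["Mean"], 4), round(s["StdMean"], 5))
# Gender 67 21 99 1.4776 0.06148
# Grade 83 5 362 4.3614 0.05305
```

The grade mean is 362 / 83 = 4.3614. With a 1/2 gender code, the mean minus 1 is simply the proportion coded 2.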

DATA

We naively believed that there would be little attrition over a 10-week period.

"He's just gone".

Lesson learned

Real data analysis is more than pointing and clicking; it’s learning to ask questions of the results you get and to learn from the answers to those questions.

Data visualization at its most basic

Figure 3.1 Figure 3.2

Take Note!

File > new note

Figure 3.3

Figure 3.4

Right-click on the note to link it to the procedure that inspired it

Figure 3.4

Exercise 4: table analysis (looking a little closer)

TASKS > DESCRIBE > TABLE ANALYSIS

Figure 4.1

CHANGING THE DATA SET

PULL DOWN TO SELECT DATA SET

Drag the variables, School and Grade, under the Table variables heading.

Drag “School” to the top of the table as your column variable and “grade” to the side as your row variable.

Table of grade by School
Cell contents: Frequency (Col Pct)

grade    CONTROL        EXPERIME       Total
4        15 (55.56%)    38 (67.86%)    53
5        12 (44.44%)    18 (32.14%)    30
Total    27             56             83

Frequency Missing = 5
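The column percentages in this table are each cell's frequency divided by its column total. A Python sketch, rebuilding records from the counts in the table; placing the five missing-grade records in the control school is a hypothetical choice for illustration only.

```python
from collections import Counter
from typing import Any


def crosstab_col_pct(records: list[dict[str, Any]], row_var: str, col_var: str):
    """Return ({(row, col): (frequency, column %)}, column totals, number missing).
    Records missing either variable are counted as missing, as in the
    'Frequency Missing' line, and left out of the percentages."""
    complete = [(r.get(row_var), r.get(col_var)) for r in records
                if r.get(row_var) is not None and r.get(col_var) is not None]
    cells = Counter(complete)
    col_totals = Counter(col for _, col in complete)
    table = {cell: (freq, 100.0 * freq / col_totals[cell[1]]) for cell, freq in cells.items()}
    return table, dict(col_totals), len(records) - len(complete)


counts = {(4, "CONTROL"): 15, (4, "EXPERIME"): 38, (5, "CONTROL"): 12, (5, "EXPERIME"): 18}
records = [{"grade": g, "School": s} for (g, s), k in counts.items() for _ in range(k)]
# Hypothetical: the 5 records with missing grade are assigned to the control school.
records += [{"grade": None, "School": "CONTROL"} for _ in range(5)]

table, col_totals, n_missing = crosstab_col_pct(records, "grade", "School")
for (grade, school), (freq, pct) in sorted(table.items()):
    print(grade, school, freq, f"{pct:.2f}")
print("Column totals:", col_totals, "Frequency Missing =", n_missing)
```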

What about the students with complete data?

Is there still a disproportion by grade?

Right-click on the Table Analysis icon

Drag the variable “missdata” under Group Analysis By

Table of grade by School
Cell contents: Frequency (Col Pct)

grade    CONTROL        EXPERIME       Total
4        4 (80.00%)     10 (62.50%)    14
5        1 (20.00%)     6 (37.50%)     7
Total    5              16             21

Frequency Missing = 5
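Putting `missdata` under Group Analysis By amounts to computing the same table separately within each value of a flag. A Python sketch follows; how `missdata` was defined in the project is not stated here, so flagging a record when any of a hypothetical list of required variables (`pretest`, `posttest`) is missing is an assumption, as are the sample records.

```python
from collections import Counter, defaultdict
from typing import Any

Record = dict[str, Any]


def add_missdata_flag(records: list[Record], required: list[str]) -> list[Record]:
    """Return copies of the records with missdata = 1 if any required
    variable is missing, else 0. The choice of required variables is an
    analysis decision; the list used below is hypothetical."""
    return [{**r, "missdata": int(any(r.get(v) is None for v in required))} for r in records]


def grade_by_school_by_group(records: list[Record], group_var: str = "missdata") -> dict[int, Counter]:
    """Grade-by-School cell counts computed separately within each group,
    analogous to putting a variable under Group Analysis By."""
    groups: dict[int, Counter] = defaultdict(Counter)
    for r in records:
        if r.get("grade") is not None and r.get("School") is not None:
            groups[r[group_var]][(r["grade"], r["School"])] += 1
    return dict(groups)


records = [
    {"grade": 4, "School": "CONTROL", "pretest": 10, "posttest": 12},
    {"grade": 4, "School": "EXPERIME", "pretest": 9, "posttest": None},
    {"grade": 5, "School": "EXPERIME", "pretest": 11, "posttest": 15},
    {"grade": None, "School": "CONTROL", "pretest": 8, "posttest": None},
]
flagged = add_missdata_flag(records, required=["pretest", "posttest"])
print([r["missdata"] for r in flagged])  # [0, 1, 0, 1]
print(grade_by_school_by_group(flagged))
```

Records with a missing grade drop out of the cell counts in every group, which is why a "Frequency Missing" line can still appear in the complete-data table.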

TOO MUCH MISSING DATA MAKES PEOPLE GRUMPY

WHO ARE THESE PEOPLE MISSING DATA?

Figure 4.9

What Happened?

To ensure anonymity and protect student data, we never had the students' names; the teachers had a roster matching students with usernames. The experimental school was able to fill in the blanks for all students missing grade data; the control group school did not.

What We Did About It

This isn't something we just waved our hands about and moved on from. It seriously concerned us. The degree of missing data overall disturbed us, as did the fact that we didn't seem to be able to get follow-up data. It concerned us enough that we hired a data coordinator on each reservation where we are testing in the upcoming year. We will analyze the pretest data as it comes in and try to fill in any missing data we can in the same week it is collected. This is the purpose of pilot studies: to find problems and fix them.

What is an item analysis and how is it helpful?

Two types of item analysis.

Distribution of responses: examine which choice (“a”, “b”, “c” or “d”) each student selected as the correct answer. (Look at the CHARACTERIZE DATA plots.)

Does one of the distractors get selected more often than the correct answer?

Item difficulty analysis

Examine what percentage of students answered each item correctly.

This is a basic means of establishing test validity. One would expect items at the second-grade level to have the lowest difficulty and items at the fifth-grade level to have the highest difficulty, that is, to be answered correctly by the fewest students.

If items are scored 0 = wrong, 1 = right, the mean of each item is the proportion of students who answered it correctly.
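The first type of item analysis, checking whether a distractor draws more students than the keyed answer, can be sketched in Python as follows. The item names, answer key and responses are made up for illustration.

```python
from collections import Counter
from typing import Optional


def distractor_report(responses: list[dict[str, Optional[str]]],
                      key: dict[str, str]) -> dict[str, tuple[Counter, list[str]]]:
    """For each item: how often each choice was picked, and which distractors
    (wrong choices) were picked more often than the keyed answer."""
    report = {}
    for item, correct in key.items():
        counts = Counter(r[item] for r in responses if r.get(item) is not None)
        stronger = sorted(c for c, n in counts.items() if c != correct and n > counts[correct])
        report[item] = (counts, stronger)
    return report


# Hypothetical item names, answer key and responses.
key = {"q5": "b", "q6": "c"}
responses = [
    {"q5": "a", "q6": "c"},
    {"q5": "a", "q6": "c"},
    {"q5": "b", "q6": "d"},
    {"q5": "a", "q6": None},
]
for item, (counts, stronger) in distractor_report(responses, key).items():
    print(item, dict(counts), "distractors beating the key:", stronger)
```

An item flagged this way is worth a second look: the distractor may be a plausible misconception, or the key itself may be wrong.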

EXERCISE 5: ITEM ANALYSIS TWO WAYS WITH THE CHARACTERIZE DATA TASK

Item Difficulty Analysis in Six (or fewer) Easy Steps

 

1. Click on the univariate statistics data set produced by the CHARACTERIZE DATA task to select it.

2. From the top menu, select TASKS > DESCRIBE > LIST DATA

3. From the Variables to assign pane, select the ones you want in your report, in this case Variable, N, NMISS, Mean, Min and Max.

EDIT

4. Select the records you want in your report.

5. Format the columns in the report

Figure 5.4

I click the CHANGE button next to format.

Then I click on Numeric for the format category.

6. Next, click Options and uncheck the box next to Row Numbers.

Figure 5.7

Item N NMiss Mean Min Max

postsc2 68 20 0.88 0 1

postsc3 68 20 0.87 0 1

postsc4 68 20 0.81 0 1

sc3 82 6 0.8 0 1

sc2 82 6 0.78 0 1

sc4 82 6 0.78 0 1

postsc18 68 20 0.74 0 1

postsc8 68 20 0.68 0 1

postsc1 68 20 0.65 0 1

COPIED AND PASTED INTO EXCEL & SORTED
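The sorted table above is one mean per item, ordered from highest to lowest. A Python sketch of the same calculation, assuming items scored 0 = wrong, 1 = right with `None` for missing; the scores are hypothetical.

```python
from typing import Optional


def item_difficulty(scored: list[dict[str, Optional[int]]], items: list[str]):
    """For items scored 0 = wrong, 1 = right, the mean is the proportion
    correct. Returns (item, N, NMiss, mean) rows sorted from easiest
    (highest mean) to hardest; items nobody answered go last."""
    rows = []
    for item in items:
        vals = [r[item] for r in scored if r.get(item) is not None]
        n = len(vals)
        mean = sum(vals) / n if n else None
        rows.append((item, n, len(scored) - n, mean))
    # Sort by mean, descending; None (no responses) sorts after every real mean.
    return sorted(rows, key=lambda row: (row[3] is not None, row[3] or 0.0), reverse=True)


# Hypothetical 0/1 scores for three items and four students.
scored = [
    {"sc1": 1, "sc2": 1, "sc3": 0},
    {"sc1": 0, "sc2": 1, "sc3": 0},
    {"sc1": 1, "sc2": None, "sc3": 1},
    {"sc1": 0, "sc2": 0, "sc3": None},
]
for item, n, nmiss, mean in item_difficulty(scored, ["sc1", "sc2", "sc3"]):
    print(f"{item:<6}{n:>3}{nmiss:>3}{mean:>6.2f}")
```

Each item's N counts only the students who answered it, which is why N and NMiss differ between the pretest and post-test items in the table.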

What does this table tell you?

What the data tell us

Since items are in order of grade level, the first few items should be answered correctly by the most students.

The expected pattern holds for both the post-test and the pretest, although it's not perfect.

A higher percentage of students answered the post-test questions correctly than the pretest questions, as we would hope.

Unexpectedly, items 5 and 6 have some of the lowest percentage correct of any item

EXERCISE 6: GRAPHING ITEM DIFFICULTY

(If you forget the Sum variable, you'll just get a chart showing that each item occurred in the data set once. Not very helpful.)

SELECT ONLY THE ITEMS OF INTEREST

Click the Layout tab under Appearance and select Descending Bar Height.

Figure 6.5

Item Difficulty Analysis: Post-Test

Compare with pre-test data

Right-click on the bar chart icon in process flow and select Modify.

Three modifications are needed:

Click on the EDIT button and change the filter. First click the X at the end of the row to delete the current filter. Then select “item” and “in a list”, and choose the variables sc1 – sc24 as the items to chart.

Change the chart title.

The X axis must be the same on both charts, from 0 to 1, so you can compare them.

How to set the X axis: click on Major Ticks, then under Major Horizontal Ticks click Specify. In the input box on the top right, enter each of the major ticks you want (0, .2, .4, .6, .8 and 1) and click ADD.

Pre-test

Post-test

Our story so far …

We've found one error: the answer key was left in as a record.

We've seen that we have an issue with missing data that needs to be fixed

It appears that the test is reasonably reliable, although, of course, more sophisticated statistics are needed to examine that issue further.

We've also realized that we can't really compare the pretest and post-test since we have a large proportion of missing subjects. We need to match pre- and post-test scores.
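Matching means keeping only students who have both scores, linked by an ID. A Python sketch of that inner join follows; the `username` and `score` field names are hypothetical, and if a username appears more than once in the post-test list, the last record wins, so duplicates should be checked first.

```python
from typing import Any

Record = dict[str, Any]


def match_pre_post(pre: list[Record], post: list[Record],
                   id_field: str = "username") -> tuple[list[tuple[Any, Any, Any]], int]:
    """Inner join on student ID: keep only students with a non-missing score
    on both tests. Returns (id, pre_score, post_score) tuples and the number
    of pretest records that could not be matched."""
    post_by_id = {r[id_field]: r["score"] for r in post if r.get("score") is not None}
    matched = [(r[id_field], r["score"], post_by_id[r[id_field]])
               for r in pre
               if r.get("score") is not None and r[id_field] in post_by_id]
    return matched, len(pre) - len(matched)


pre = [
    {"username": "s1", "score": 10},
    {"username": "s2", "score": 12},
    {"username": "s3", "score": None},  # pretest score missing
]
post = [
    {"username": "s2", "score": 15},
    {"username": "s1", "score": 14},
    {"username": "s3", "score": 11},
    {"username": "s4", "score": 9},  # no pretest record
]
matched, n_unmatched = match_pre_post(pre, post)
print(matched)      # [('s1', 10, 14), ('s2', 12, 15)]
print(n_unmatched)  # 1
```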

EXERCISE 7: GETTING DOWN TO BUSINESS WITH T-TESTS

TASKS > ANOVA > T-test

Control Group Results

Table 7.1

Experimental Group Results
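The slides do not show whether the t-test was paired or independent-samples. Once pretest and post-test scores are matched, one common choice is a paired t-test on each student's change, run separately for each group. This Python sketch computes only the t statistic and degrees of freedom from hypothetical scores; a p-value needs the t distribution, which `scipy.stats.ttest_rel` provides.

```python
import math
import statistics


def paired_t(pre: list[float], post: list[float]) -> tuple[float, float, int]:
    """Paired t-test statistic on matched scores: returns (mean change, t, df),
    where change = post - pre and t = mean change / (sd of changes / sqrt(n))."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need at least two matched pairs of equal length")
    diffs = [b - a for a, b in zip(pre, post)]
    mean_d = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("all changes are identical; t is undefined")
    return mean_d, mean_d / se, len(diffs) - 1


# Hypothetical matched scores for one group; run once per group
# (control, experimental) to mirror the separate results tables.
pre = [10, 12, 9, 14]
post = [13, 14, 10, 18]
mean_change, t, df = paired_t(pre, post)
print(f"mean change = {mean_change}, t = {t:.3f}, df = {df}")  # mean change = 2.5, t = 3.873, df = 3
```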

The moral of the story

Experimental group improved more

Students who played the game less improved less

Differences were not explained by outliers in either group

There was a definite shift in the distribution of the experimental group only

CONCLUSION

Exploratory data analysis is a key first step

A few simple tasks in SAS Enterprise Guide can go a surprisingly long way

While exploring your data, it’s crucial to note the concerns raised, follow-up questions and policy recommendations that come out of your analysis

Thank You

U.S. Department of Agriculture

Small Business Innovation Research

Rural Economic Development

Teachers and students of the Spirit Lake Dakota Nation

Contact

The Julia Group / 7 Generation Games

2111 7th St. #8

Santa Monica, CA 90405

(310) 717-9089

annmaria@thejuliagroup.com

http://www.thejuliagroup.com

www.7generationgames.com
