Australian Council for Educational Research
Analysing and Understanding Learning Assessment for Evidence-based Policy Making
Introduction to statistics
Dr Alvin Vista, ACER
Bangkok, 14-18 Sept. 2015
Structure of workshop
• Lecture/presentation: focus on concepts, brief review
• Practical exercises: using the most common and accessible software (Excel); hands-on, to maximise transfer of knowledge and develop skills
• Discussion and interpretation of sample studies
• Collaborative setting
• If you don’t know how to do something, seek help.
• If you know how to do something, provide help.
• If you’re not sure, interact.
Drawing knowledge from reality
[Diagram: Measurement theory → Data → Statistical theory]
Statistics allows us to draw knowledge or conclusions from the data.
Measurement theory allows us to draw meaningful data from reality.
What is Measurement?
A formal definition:
‘Measurement may be regarded as the construction of homomorphisms (scales) from empirical relational structures of interest into numerical relational structures that are useful.’ (Krantz et al., 1971, p. 9)
In other words:
Measurement is a process in which a variable (or construct) is converted into a number in a consistent manner.
Data and Models
DATA: observations, measurements, sensory perceptions
MODELS: theories, interpretations, generalisations
Data and Models
DATA: what you see (observed)
MODELS: what Google Maps says
Data and Models
If there is a mismatch between data and model, which is more likely to be wrong?
Data cannot be changed, but the methods used to collect data can be improved to increase the quality of the data.
Drawing knowledge from reality
We can do statistical analysis ONLY AFTER we’re confident that our data is reliable
Better data = better fitting models
Better fitting models = better understanding of reality
Fundamentals: Statistics
• Statistics is the study of data.
• It is concerned with the:
– collection
– analysis
– presentation
– interpretation
of data.
Data in education research
• In the educational context:
– records are usually students, schools or parents
– variables are usually
• responses of the students to the test items, or
• responses of students, school principals or parents to the questionnaire items
Data: values for variables & records
• In educational data:
– responses to a particular item from all respondents form the values for the corresponding variable
• a column of values in our imaginary table
– responses from a particular respondent to all items form the values for the corresponding record
• a row of values in our imaginary table
Levels of measurement
• Nominal: values denote categories; statistics include counts such as the mode and frequency distributions.
• Ordinal: rank order is described, but successive categories do not denote equal differences in the measured attribute; statistics include the median.
• Interval: the measurement is presumed to denote equal intervals between scores. Both the base point and the unit of measurement are arbitrary.
• Ratio: ratio scales have a natural base value that cannot be changed (i.e., a zero in one unit means the same in all other units). Only the unit of measurement is arbitrary.
Non-metric: categorical measures that describe differences in type or kind; arithmetical operations are not applicable.
Metric: continuous measures that reflect differences in amount or degree.
In a nutshell
Level of measurement has direct implications for how relationships within and between variables can be examined and identified.
Levels of measurement and measures of distribution characteristics
Level | Central tendency | Spread
Nominal | Mode | Percent distribution
Ordinal | Median, Mode | Minimum/Maximum, Range, Percentiles, Percent distribution
Interval/Ratio | Mean, Median, Mode | Variance, Standard deviation, Minimum/Maximum, Range, Percentiles, Percent distribution
Measures of central tendency
Measure | Definition | Example
Mode | The attribute of a variable that occurs most often in the data set | Variable = Nationality; Mode = Indonesian
Median | The value of the middle case when the cases have been placed in order from low to high | Variable = Rank (1st, 2nd, 3rd, … 7th); Median = 5th
Mean | The arithmetic mean or average, computed by summing the values of all valid cases and dividing by the number of valid cases | Variable = Age; Mean = 24.35
Levels of measurement determine the possible statistical analyses
Nominal
– Cross-tabulations
– Chi-square
– Frequencies
INCOME by MARITAL STATUS cross-tabulation: Count (Expected Count)
INCOME | SINGLE | MARRIED | Total
1.00 | 7 (16) | 64 (55) | 71
2.00 | 3 (9) | 36 (30) | 39
3.00 | 5 (18) | 74 (61) | 79
4.00 | 13 (19) | 71 (65) | 84
5.00 | 7 (10) | 39 (36) | 46
6.00 | 11 (8) | 25 (28) | 36
7.00 | 17 (8) | 19 (28) | 36
8.00 | 12 (4) | 7 (15) | 19
9.00 | 12 (3) | 2 (11) | 14
10.00 | 12 (3) | 3 (12) | 15
Total | 99 (99) | 340 (340) | 439
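The expected counts in a cross-tabulation come from a simple rule: row total × column total ÷ grand total. The sketch below reproduces that rule on the observed counts from the INCOME by MARITAL STATUS table and computes the Pearson chi-square statistic with its degrees of freedom. It is an illustration only; turning the statistic into a p-value needs the chi-square distribution (for example, Excel's CHISQ.TEST(actual_range, expected_range), or SPSS).

```python
# Observed counts from the INCOME x MARITAL STATUS cross-tabulation
# (rows: income 1-10; columns: SINGLE, MARRIED).
observed = [
    [7, 64], [3, 36], [5, 74], [13, 71], [7, 39],
    [11, 25], [17, 19], [12, 7], [12, 2], [12, 3],
]

def expected_counts(table):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1) * (columns - 1)
    return stat, dof

expected = expected_counts(observed)
print(round(expected[0][0]), round(expected[0][1]))  # 16 55, as in the table
stat, dof = chi_square(observed)
print(f"chi-square = {stat:.2f}, df = {dof}")
```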
Levels of measurement determine the possible statistical analyses
Ordinal
– Spearman correlations
– Non-parametric analyses
Interval and ratio
– Pearson correlations
– Parametric analyses
Measures of central tendency
• Mode [=MODE(target range)]
• Median [=MEDIAN(target range)]
• Mean [=AVERAGE(target range)]
• Let's try that using the sample data!
– TIMSS Country_X Grade 8 data.xlsx
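For readers working outside Excel, Python's standard `statistics` module offers direct counterparts to MODE, MEDIAN and AVERAGE. The scores below are hypothetical values chosen only to make the three measures differ; they are not taken from the TIMSS file.

```python
import statistics

# Hypothetical maths scores for ten students (illustration only, not TIMSS data).
scores = [480, 512, 512, 530, 455, 512, 498, 530, 470, 465]

mode = statistics.mode(scores)      # most frequent value, like =MODE(...)
median = statistics.median(scores)  # middle of the sorted values, like =MEDIAN(...)
mean = statistics.mean(scores)      # sum / count, like =AVERAGE(...)

print(mode, median, mean)  # 512 505.0 496.4
```

With an even number of cases, the median is the average of the two middle values (498 and 512 here).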
Measures of spread
• Standard deviation [=STDEV(target range)]
• Min [=MIN(target range)]
• Max [=MAX(target range)]
• Percentiles [=PERCENTILE(target range, k)], where k ranges from 0.00 to 1.00 (e.g., 0.25 for the 25th percentile)
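Excel's PERCENTILE interpolates linearly between sorted values, placing the k-th percentile at position k × (n − 1). The sketch below follows that rule so that results can be cross-checked against the spreadsheet; `percentile_inc` is a hypothetical helper name, and `statistics.stdev` is the sample standard deviation, as Excel's STDEV is. The scores are the same hypothetical values used for illustration, not TIMSS data.

```python
import statistics

def percentile_inc(values, k):
    """k-th percentile (0 <= k <= 1) by linear interpolation at rank k * (n - 1),
    the rule used by Excel's PERCENTILE / PERCENTILE.INC."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must be between 0 and 1")
    data = sorted(values)
    rank = k * (len(data) - 1)
    lower = int(rank)
    frac = rank - lower
    if lower + 1 < len(data):
        return data[lower] + frac * (data[lower + 1] - data[lower])
    return data[lower]  # k == 1: the maximum

scores = [480, 512, 512, 530, 455, 512, 498, 530, 470, 465]  # hypothetical
sd = statistics.stdev(scores)  # sample SD, like =STDEV(...)

print(round(sd, 2))
print(min(scores), max(scores))      # 455 530
print(percentile_inc(scores, 0.25))  # 472.5
```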
Characteristics of a distribution
Skewness: a measure of the asymmetry of a distribution. [=SKEW(target range)]
The normal distribution is symmetric and has a skewness value of zero.
– Positive skewness: a long right tail.
– Negative skewness: a long left tail.
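To see what SKEW computes, the sketch below implements the adjusted sample skewness formula that Excel documents for SKEW: n / ((n − 1)(n − 2)) × Σ((x − mean) / s)³, with s the sample standard deviation. The three small data sets are made up to show a symmetric case, a long right tail and a long left tail.

```python
import statistics

def skew_excel(values):
    """Adjusted sample skewness, following the formula documented for Excel's SKEW."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

print(skew_excel([1, 2, 3, 4, 5]))        # symmetric: (close to) zero
print(skew_excel([1, 1, 1, 2, 10]))       # long right tail: positive
print(skew_excel([-10, -2, -1, -1, -1]))  # long left tail: negative
```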
Characteristics of a distribution
Kurtosis: a measure of the extent to which observations cluster around a central point.
• For a normal distribution, the value of the kurtosis statistic is zero (Excel's KURT reports excess kurtosis, i.e., relative to the normal distribution).
• [=KURT(target range)]
– Leptokurtic: data values are more peaked (positive kurtosis)
– Platykurtic: data values are flatter and more dispersed along the X axis (negative kurtosis)
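The sketch below implements the sample excess kurtosis formula documented for Excel's KURT, so the sign conventions above can be checked on made-up data: evenly spread values (1 to 10) come out platykurtic (about −1.2), while values piled up at the centre with a few extremes come out leptokurtic.

```python
import statistics

def kurt_excel(values):
    """Sample excess kurtosis, following the formula documented for Excel's KURT."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation
    total = sum(((x - mean) / s) ** 4 for x in values)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

flat = list(range(1, 11))                 # evenly spread values
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 5, 10]  # clustered at the centre

print(round(kurt_excel(flat), 3))    # -1.2 (platykurtic)
print(round(kurt_excel(peaked), 3))  # 4.5 (leptokurtic)
```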
Measures of spread
• Frequencies [=FREQUENCY(target range, bins)], where each group includes values greater than the previous bin value, up to and including the current bin value
– Bins = 10, 20, 30 will result in 4 groups (number of bins + 1)
– Group 1 = less than or equal to 10
– Group 2 = greater than 10, up to 20 (11 to 20 for whole numbers)
– Group 3 = greater than 20, up to 30
– Group 4 = more than 30
– Enter as an array formula: write the formula in the first cell of the output range, select an output range equal to the number of groups, press F2, then CTRL+SHIFT+ENTER
• Percent distribution can be computed by dividing each frequency by the total number of cases; the count of cases with a particular value can be obtained with [=COUNTIF(target range, value)]
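The grouping rule for FREQUENCY (each group runs from just above the previous bin value up to and including the current one, plus one extra group above the last bin) can be reproduced with a binary search. This is a sketch with a hypothetical `frequency` helper and made-up data; the helper sorts the bins into ascending order first.

```python
from bisect import bisect_left

def frequency(values, bins):
    """Counts per group: group i holds values > bins[i-1] and <= bins[i];
    the last group holds values > the largest bin (len(bins) + 1 groups)."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for x in values:
        # bisect_left finds the first bin >= x, i.e. the group x belongs to.
        counts[bisect_left(bins, x)] += 1
    return counts

data = [5, 10, 12, 18, 20, 25, 31, 45, 50, 8]  # hypothetical values
counts = frequency(data, [10, 20, 30])
percents = [100 * c / len(data) for c in counts]  # percent distribution

print(counts)    # [3, 3, 1, 3]
print(percents)  # [30.0, 30.0, 10.0, 30.0]
```

Note that 10 and 20 fall into groups 1 and 2 respectively, because each bin value is the inclusive upper limit of its group.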
• Let's try that using the sample data!
Practical exercise!
• TIMSS Country_X grade 8.xlsx
• Complete the descriptive statistics for boys, girls, and the whole sample
• Save your results as we will use them in later sessions.
Inferential statistics and Hypothesis testing
Confidence intervals
Standard error of the mean
The standard error is an indicator of how precise the statistic is, and how close it is ‘probabilistically’ to the parameter (e.g., the true mean). Confidence intervals are based on the SE:
$$SE_{\bar{X}} = \frac{s}{\sqrt{n}}$$
$$CI_{\text{lower}} = \bar{X} - Z\,(SE_{\bar{X}}), \qquad CI_{\text{upper}} = \bar{X} + Z\,(SE_{\bar{X}})$$
Z = 1.96 corresponds to a 95% CI.
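The two formulas translate directly into code: divide the sample standard deviation by √n to get the SE, then step Z standard errors either side of the mean. This sketch uses Z = 1.96 as on the slide and hypothetical scores; for small samples, a critical value from the t distribution would give a slightly wider interval.

```python
import math
import statistics

def mean_ci(values, z=1.96):
    """Mean, standard error of the mean, and z-based confidence interval."""
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # SE = s / sqrt(n)
    return mean, se, (mean - z * se, mean + z * se)

scores = [480, 512, 512, 530, 455, 512, 498, 530, 470, 465]  # hypothetical
mean, se, (lower, upper) = mean_ci(scores)
print(f"mean = {mean:.1f}, SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```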
Confidence intervals
• Let's try computing SEs and CIs with data!
• Means and SDs for boys and girls on Math achievement
• Standard errors of the means
• Confidence intervals
Inferential Statistics
Estimating population parameters
Inferential statistics can show how closely the sample statistics approximate the parameters of the overall population.
• The sample is randomly chosen and representative of the total population.
• The means we might obtain from an infinite number of samples form a normal distribution.
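A quick simulation makes the second assumption concrete. Instead of an infinite number of samples, the sketch below draws 2,000 random samples of 25 from a hypothetical normal population (mean 500, SD 100): the sample means centre on 500, their spread is close to 100 / √25 = 20 (the standard error), and about 95% of them fall within 1.96 SE of the population mean.

```python
import random
import statistics

random.seed(2015)  # fixed seed so the run is reproducible

# Hypothetical population: normally distributed scores, mean 500, SD 100.
POP_MEAN, POP_SD = 500, 100
SAMPLE_SIZE = 25
N_SAMPLES = 2000

sample_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

expected_se = POP_SD / SAMPLE_SIZE ** 0.5  # 20
share_within = sum(
    abs(m - POP_MEAN) <= 1.96 * expected_se for m in sample_means
) / N_SAMPLES

print(round(statistics.mean(sample_means), 1))   # close to 500
print(round(statistics.stdev(sample_means), 1))  # close to 20
print(round(share_within, 3))                    # close to 0.95
```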
Source: Johnson, B. & Christensen, L. (2012). Educational Research:Quantitative, Qualitative, and Mixed Approaches. Thousand Oaks, CA: Sage.
Inferential Statistics
What can we say if we have a sample, and its confidence interval does not overlap with the confidence interval of another sample?
Testing Hypotheses (1)
Research hypothesis vs. statistical hypothesis
Statistical hypothesis testing: comparing the distribution of data collected by a researcher with an ideal, or hypothetical, distribution
- Significance level/alpha (α), e.g., .05 or .01: must be set before testing!
- “Statistically significant” means there is sufficient evidence to reject the null hypothesis.
- It does NOT mean that the alternative hypothesis is true.
Statistical hypothesis testing: Testing Hypotheses (2)
Making errors in hypothesis testing
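- Type I error: alpha error, or false positive (rejecting a null hypothesis that is actually true)
- Type II error: beta error, or false negative (failing to reject a null hypothesis that is actually false)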
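The meaning of alpha as the Type I error rate can be checked by simulation: if the null hypothesis is true and we test at α = .05, about 5% of studies should still come out "significant". The sketch below assumes a simple one-sample z-test with a known SD of 1 (chosen so that only the standard library is needed); all data are simulated.

```python
import math
import random
from statistics import NormalDist

random.seed(42)  # fixed seed so the run is reproducible

ALPHA = 0.05
N = 30            # observations per simulated study
N_STUDIES = 5000  # number of simulated studies

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-tailed p-value of a one-sample z-test with known sigma."""
    z = (sum(sample) / len(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

false_positives = 0
for _ in range(N_STUDIES):
    # The null hypothesis (population mean = 0) is TRUE for every study.
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    if z_test_p_value(sample) < ALPHA:
        false_positives += 1  # a Type I error

type_i_rate = false_positives / N_STUDIES
print(f"Observed Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```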
Relationship between α and β
Statistical hypothesis testing
Reality | Our conclusion: Not guilty | Our conclusion: Guilty
Not guilty | Correct conclusion | Type I error (false positive)
Guilty | Type II error (false negative) | Correct conclusion
Sometimes, a false positive is worse
“It’s better for ten guilty men to go free than for one innocent man to be executed” -- anonymous wise lawyer
Reality | Our conclusion: Not guilty | Our conclusion: Guilty
Not guilty | Correct conclusion | Condemned innocent
Guilty | Freed guilty | Correct conclusion
In fire alarms, a false positive is better
“A false negative would REALLY suck.” -- anonymous wise homeowner
Reality | Our conclusion: No fire | Our conclusion: Fire
No fire | Correct conclusion | Unnecessary panic
Fire | Burned to death | Correct conclusion
Statistical hypothesis testing: Testing for significance
• Remember: statistical significance is NOT the same as practical significance!
• In inferential statistics, we are only concerned with statistical significance. Practical significance is a judgment call to be made by the researcher and audience.
Statistical hypothesis testing: Making errors in hypothesis testing
Increase the power of a statistical test
• Use as large a sample size as is reasonably possible.
• Maximise the validity and reliability of your measures.
• Use parametric rather than non-parametric statistics whenever possible.
• Whenever we test more than one statistical hypothesis, we increase the probability of making at least one Type I error.
• For multiple hypotheses, a correction (e.g., Bonferroni) needs to be applied to our statistical tests.
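The arithmetic behind these two points is short. Assuming the tests are independent and every null hypothesis is true, the chance of at least one Type I error across m tests is 1 − (1 − α)^m; the Bonferroni correction keeps the overall rate at or below α by testing each hypothesis at α / m. The function names below are illustrative.

```python
def familywise_error_rate(alpha, m):
    """P(at least one Type I error) across m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test significance level under a Bonferroni correction."""
    return alpha / m

for m in (1, 5, 10):
    print(m, round(familywise_error_rate(0.05, m), 3), bonferroni_alpha(0.05, m))
# 1 0.05 0.05
# 5 0.226 0.01
# 10 0.401 0.005
```

With five uncorrected tests at .05, the chance of at least one false positive is already about 23%; under Bonferroni, each of the five p-values would instead be compared with .01.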
Statistical hypothesis testing: Steps in Hypothesis Testing
1. State the null and alternative hypotheses.
2. Set the significance level before the research study. (Most educational researchers use .05 as the significance level. Note that the significance level is also called the alpha level or, more simply, alpha.)
3. Obtain the probability value using a computer program such as SPSS.
4. Compare the probability value to the significance level and make the statistical decision.
Rule 1: If the probability value is less than alpha, reject the null hypothesis. Conclude that the finding is significant.
Rule 2: If the probability value is greater than alpha, fail to reject the null hypothesis. Conclude that the finding is not significant.
5. Interpret the results. That is, make a substantive real-world decision and determine practical significance.
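Step 4 is mechanical once alpha is fixed, which the short sketch below shows (the function name is illustrative). Step 5 cannot be automated in the same way: whether a significant difference matters in practice is still a judgement for the researcher.

```python
def statistical_decision(p_value, alpha=0.05):
    """Apply Rule 1 / Rule 2: compare the p-value with the preset alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "Reject the null hypothesis: the finding is statistically significant."
    return "Fail to reject the null hypothesis: the finding is not statistically significant."

print(statistical_decision(0.012))  # Reject ...
print(statistical_decision(0.20))   # Fail to reject ...
```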
Source: Johnson, B. & Christensen, L. (2012). Educational Research: Quantitative, Qualitative, and Mixed Approaches. Thousand Oaks, CA: Sage.
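Statistical hypothesis testing: Exercise 1
Let's compare two sets of data and test the null hypothesis that their means are equal.
Simple t-test for independent means [=T.TEST(data range 1, data range 2, tails, type)]
T.TEST returns the p-value; tails is 1 or 2, and type is 1 (paired), 2 (two-sample, equal variances) or 3 (two-sample, unequal variances).
Data: data check.xlsx
Sets to compare: Science scores for sets A vs B, and B vs D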
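As a cross-check on the spreadsheet, the sketch below computes an independent-samples t-test assuming equal variances (pooled SD) with a two-tailed p-value, which should correspond to Excel's T.TEST with tails = 2 and type = 2. To stay within the standard library, the p-value is obtained from the regularized incomplete beta function, evaluated with a standard continued fraction; this is an illustrative sketch, not a replacement for SPSS or Excel, and the two score lists are hypothetical rather than the data check.xlsx sets.

```python
import math
import statistics

def _nonzero(v, tiny=1e-300):
    """Guard against division by zero inside the continued fraction."""
    return v if abs(v) > tiny else tiny

def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 / _nonzero(1.0 + aa * d)
        c = _nonzero(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def t_two_tailed_p(t, df):
    """Two-tailed p-value: P(|T| >= |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betainc(df / 2.0, 0.5, df / (df + t * t))

def student_t_test(sample1, sample2):
    """Independent-samples t-test assuming equal variances (pooled variance)."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se_diff
    return t, df, t_two_tailed_p(t, df)

# Hypothetical science scores for two small groups (illustration only).
set_a = [512, 498, 530, 475, 505, 520, 490]
set_b = [480, 470, 495, 460, 488, 476, 501]
t, df, p = student_t_test(set_a, set_b)
print(f"t = {t:.3f}, df = {df}, two-tailed p = {p:.4f}")
```

If the two groups have clearly different variances, type 3 (Welch's test) is the safer choice in Excel; this sketch covers only the equal-variance case.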
Statistical hypothesis testing: Exercise 2
Let's compare the science achievement of students whose parents completed only lower secondary (PHEL=4) with that of students whose parents have a university degree (PHEL=1), and test the null hypothesis that their means are equal.
Simple t-test for independent means [=T.TEST(data range 1, data range 2, tails, type)]
Data: data check.xlsx
Sets to compare: Science scores for PHEL=4 vs PHEL=1