descriptive statistics becoming familiar with the data
TRANSCRIPT
Descriptive Statistics: becoming familiar with the data
The Strategies
Initial Screening
Levels of Measurement
Five Descriptive Questions
Graphical Presentations
Search for Outliers
Initial Screening
Missing values
Defining labels
Key punch errors
Valid values
Understanding what you have
Understanding the population, sampling frame, and sample
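The screening checks above can be sketched in a few lines of Python. This is only an illustration: the records, the gender codes, and the valid score range are all hypothetical values invented for the example, not taken from the data set behind these slides.

```python
# Hypothetical raw records; None marks a missing value.
records = [
    {"id": 1, "gender": 1, "score": 38},
    {"id": 2, "gender": 2, "score": None},  # missing value
    {"id": 3, "gender": 7, "score": 41},    # 7 is not a defined code
    {"id": 4, "gender": 2, "score": 410},   # outside the valid range
]

# Value labels and the valid range are assumptions made for this sketch.
GENDER_LABELS = {1: "female", 2: "male"}
SCORE_RANGE = (0, 100)


def screen(rows):
    """Return (ids with missing values, ids with invalid values)."""
    missing, invalid = [], []
    low, high = SCORE_RANGE
    for row in rows:
        if row["gender"] is None or row["score"] is None:
            missing.append(row["id"])
            continue
        bad_code = row["gender"] not in GENDER_LABELS
        bad_range = not (low <= row["score"] <= high)
        if bad_code or bad_range:
            invalid.append(row["id"])
    return missing, invalid


missing_ids, invalid_ids = screen(records)
print("missing:", missing_ids)
print("invalid:", invalid_ids)
```

Records flagged as invalid are candidates for key punch errors and would be checked against the original source before any statistics are computed.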
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
Determining what statistics are appropriate
Nominal
Naming things.
Creating groups that are qualitatively different or unique…
But not necessarily quantitatively different.
Nominal
Placing individuals or objects into categories.
Making mutually exclusive categories.
Numbers assigned to categories are arbitrary.
Nominal
Sample variables:
– Gender
– Race
– Ethnicity
– Geographic location
– Hair or eye color
Ordinal
Rank ordering things.
Creating groups or categories when only rank order is known.
Numbers imply order but not exact quantity of anything.
Ordinal
The difference between individuals with adjacent ranks, on relevant quantitative variables, is not necessarily the same across the distribution.
Ordinal
Sample variables:
– Class rank
– Place of finish in a race (1st, 2nd, etc.)
– Judges' ratings
– Responses to Likert scale items (for example: SD, D, N, A, SA)
Interval
Orders observations according to the quantity of some attribute.
Arbitrary origin.
Equal intervals.
Equal differences expressed as equal distances.
Interval
Sample variables:
– Test scores
  • SAT
  • GRE
  • IQ tests
– Temperature
  • Celsius
  • Fahrenheit
Ratio
Quantitative measurement.
Equal intervals.
True zero point.
Ratios between values are useful.
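A small sketch of why the true zero point matters. The two temperatures are hypothetical; the only fixed fact used is the Celsius-to-Kelvin offset of 273.15. Differences agree on both scales, but the ratio is only interpretable on the scale with a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (arbitrary origin) to Kelvin (true zero point)."""
    return celsius + 273.15


# Hypothetical daily highs, in Celsius.
warm_c, cool_c = 20.0, 10.0

# Differences are meaningful on an interval scale and match on both scales.
diff_celsius = warm_c - cool_c
diff_kelvin = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios depend on where zero sits, so they only make sense in Kelvin.
ratio_celsius = warm_c / cool_c
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print("difference:", diff_celsius, "C,", round(diff_kelvin, 2), "K")
print("ratio in Celsius:", ratio_celsius)
print("ratio in Kelvin:", round(ratio_kelvin, 4))
```

The Celsius ratio suggests the warm day is "twice as hot", while the Kelvin ratio shows the two temperatures differ by only a few percent.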
Ratio
Sample variables:
– Financial variables
– Finish times in a race
– Number of units sold
– Test scores scaled as percent correct or number correct
Levels of Measurement Review
What level of measurement?
– Today is a fall day.
– Today is the third hottest day of the month.
– The high today was 70° Fahrenheit.
– The high today was 20° Celsius.
– The high today was 294 Kelvin.
Levels of Measurement Review
What level of measurement?
– Student #1256 is:
– a male
– from Lawrenceville, GA.
– He came in third place in the race today.
– He scored 550 on the SAT verbal section.
– He has turned in 8 out of the 10 homework assignments.
Levels of Measurement Review
What level of measurement?
– Student #3654 is:
– in the third reading group.
– Nominal?
– Ordinal?
– Interval?
– Ratio?
Five Descriptive Questions
What is the middle of the set of scores?
How spread out are the scores?
Where do specific scores fall in the distribution of scores?
What is the shape of the distribution?
How do different variables relate to each other?
Five Descriptive Questions
Middle
Spread
Rank or Relative Position
Shape
Correlation
Descriptive Statistics Answer Sheet
Descriptive Questions in Excel, SPSS, and TI-83
Middle
Mean
Median
Mode
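The three measures of the middle can be computed with Python's standard `statistics` module. The scores below are hypothetical and were chosen to include one large value, so the mean and median visibly disagree.

```python
import statistics

# Hypothetical scores with one large value at the top end.
scores = [2, 3, 3, 5, 7, 10, 40]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted scores
mode = statistics.mode(scores)      # most frequent value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

The single large score pulls the mean well above the median, which is one reason to report more than one measure of the middle.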
Spread
Standard Deviation
Variance
Range
IQR
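A minimal sketch of the four spread measures, using hypothetical scores and the standard `statistics` module. Two assumptions are worth flagging: `stdev` and `variance` use the sample (n - 1) formulas, and `statistics.quantiles` uses its default "exclusive" method, so quartiles may differ slightly from those reported by Excel, SPSS, or a TI-83.

```python
import statistics

# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance
data_range = max(scores) - min(scores)  # largest minus smallest score

# Quartiles via the default "exclusive" method; other tools may differ.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print("variance:", round(variance, 3))
print("standard deviation:", round(std_dev, 3))
print("range:", data_range)
print("IQR:", iqr)
```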
Rank or Relative Position
Five number summary
Min, 25th, 50th, 75th, Max
Identifying specific values that have interpretive meaning
Identifying where they fall in the set of scores
Box plots
Outliers
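The five number summary, and the position of a specific score, can be sketched as follows. The scores are hypothetical, `five_number_summary` and `percentile_rank` are illustrative helper names, and the quartiles again rely on the default method of `statistics.quantiles`, which is an assumption rather than the method used by any particular package.

```python
import statistics


def five_number_summary(values):
    """Return (min, 25th, 50th, 75th, max) for a list of scores."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)


def percentile_rank(values, score):
    """Percent of scores at or below the given score."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)


# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

summary = five_number_summary(scores)
rank_of_5 = percentile_rank(scores, 5)

print("five number summary:", summary)
print("percentile rank of a score of 5:", rank_of_5)
```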
Shape
Positive Skewness
Negative Skewness
Normality
Histograms
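One way to put a number on skewness is the moment-based coefficient (third central moment divided by the 1.5 power of the second). This is a sketch with hypothetical data. Statistical packages often report a sample-adjusted version, so their values can differ somewhat, but the sign is read the same way.

```python
def skewness(values):
    """Moment-based skewness: positive for a long right tail,
    negative for a long left tail, near zero for a symmetric shape."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical distributions.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
left_tailed = [-v for v in right_tailed]  # mirror image of the above
symmetric = [1, 2, 3, 4, 5]

print("right-tailed:", round(skewness(right_tailed), 3))
print("left-tailed:", round(skewness(left_tailed), 3))
print("symmetric:", round(skewness(symmetric), 3))
```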
Shape - Normality
[Figure: histogram of Scanning (Mean = 38.0, Std. Dev = 4.84, N = 344) with a box plot of Scanning (N = 344)]
Shape - Positive Skewness
[Figure: histogram of Total for IIP (Mean = 2.10, Std. Dev = .56, N = 344) with a box plot of Total for IIP (N = 344)]
Shape - Negative Skewness
[Figure: histogram of PREACT (Mean = 3.32, Std. Dev = .42, N = 154) with a box plot of PREACT (N = 154)]
Correlation
Direction of Relationships
Positive or Negative
Magnitude of Relationships
Weak, Moderate, Strong
Scatterplots
Outliers
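Direction and magnitude can both be read from the Pearson correlation coefficient. The sketch below computes it by hand for hypothetical paired data. The cutoffs of 0.3 and 0.7 used for weak, moderate, and strong are a common rule of thumb assumed for this example, not values taken from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Label direction and magnitude; the cutoffs are assumed conventions."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)
    if size < 0.3:
        magnitude = "weak"
    elif size < 0.7:
        magnitude = "moderate"
    else:
        magnitude = "strong"
    return direction, magnitude


# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 57, 70, 71]

r = pearson_r(hours, scores)
print("r =", round(r, 3), describe(r))
```

A scatterplot should still be examined alongside the coefficient, since a single outlying point can raise or lower r considerably.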
Outliers
Boxplot shows the middle 50% of scores as the box.
Q3 (75th) – Q1 (25th) = IQR
Data outside the 1.5 IQR rule are outliers:
– below Q1 – (1.5*IQR)
– above Q3 + (1.5*IQR)
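The 1.5 IQR rule translates directly into code. This is a sketch with hypothetical scores. `iqr_fences` and `iqr_outliers` are illustrative names, and the quartiles come from the default method of `statistics.quantiles`, which may not match the quartile definition used by other software.

```python
import statistics


def iqr_fences(values):
    """Return the (lower, upper) fences of the 1.5 IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def iqr_outliers(values):
    """Return the values that fall outside the fences."""
    low, high = iqr_fences(values)
    return [v for v in values if v < low or v > high]


# Hypothetical scores with one unusually large value.
scores = [2, 4, 4, 4, 5, 5, 7, 9, 30]

fences = iqr_fences(scores)
outliers = iqr_outliers(scores)

print("fences:", fences)
print("outliers:", outliers)
```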
Outliers
[Figure: box plot of BDI Total (N = 344)]
Outliers
[Figure: histogram of BDI Total, with Frequency on the vertical axis (Mean = 7.1, Std. Dev = 7.10, N = 344)]
Outliers
Statistics: BDI Total
N Valid           344
N Missing         0
Mean              7.12
Median            5.00
Mode              0
Std. Deviation    7.101
Variance          50.426
Minimum           0
Maximum           40
Percentiles 25    2.00
Percentiles 50    5.00
Percentiles 75    10.00
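As a worked check, assuming the 25th and 75th percentiles reported for BDI Total above (2.00 and 10.00) serve as Q1 and Q3, the 1.5 IQR rule gives:

```latex
\begin{align*}
\mathrm{IQR} &= Q_3 - Q_1 = 10.00 - 2.00 = 8.00 \\
Q_1 - 1.5 \times \mathrm{IQR} &= 2.00 - 12.00 = -10.00 \\
Q_3 + 1.5 \times \mathrm{IQR} &= 10.00 + 12.00 = 22.00
\end{align*}
```

Because the minimum score is 0, no value can fall below the lower fence. Scores above 22 (the maximum is 40) would be flagged as outliers under this rule.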
Outliers
If normality of the population can be assumed, other rules can be used.
Mean +/- 2 SDs or Mean +/- 3 SDs
Empirical Rule
– Approximately 68% within +/- 1 SD
– Approximately 95% within +/- 2 SD
– Approximately 99.7% within +/- 3 SD
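A short sketch of the SD-based rule, together with a check of the Empirical Rule percentages against the standard normal distribution. The scores are hypothetical, `sd_outliers` is an illustrative name, and the sample standard deviation is assumed.

```python
import statistics
from statistics import NormalDist


def sd_outliers(values, k=2):
    """Return values more than k sample SDs away from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]


def share_within(k):
    """Proportion of a normal distribution within +/- k SDs of the mean."""
    standard_normal = NormalDist()
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


# Hypothetical scores with one unusually large value.
scores = [2, 4, 4, 4, 5, 5, 7, 9, 30]

print("beyond 2 SDs:", sd_outliers(scores, k=2))
print("beyond 3 SDs:", sd_outliers(scores, k=3))
for k in (1, 2, 3):
    print(f"within +/- {k} SD: {share_within(k):.4f}")
```

In this small example the large score is flagged at 2 SDs but not at 3, partly because the outlier itself inflates the standard deviation. That sensitivity is a reason to apply these rules only when normality is a reasonable assumption.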
Outliers
You can also look at outliers in the bivariate case.
Examine the scatterplots for values out of the pattern.