![Page 1: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/1.jpg)
FPP 3-6
Exploratory Data Analysis: One Variable
![Page 2: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/2.jpg)
Plan of attack
Distinguish different types of variables
Summarize data numerically
Summarize data graphically
Use theoretical distributions to potentially learn more about a variable
![Page 3: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/3.jpg)
The five steps of statistical analyses
1. Form the question
2. Collect data
3. Model the observed data (we start with exploratory techniques)
4. Check the model for reasonableness
5. Make and present conclusions
![Page 4: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/4.jpg)
Just to make sure we are on the same page
More (or repeated) vocabulary
Individuals are the objects described by a set of data
Examples: employees, lab mice, states…
A variable is any characteristic of an individual that is of interest to the researcher. It takes on different values for different individuals
Examples: age, salary, weight, location…
How is this different from a mathematical variable?
![Page 5: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/5.jpg)
Just to make sure we are on the same page #2
Measurement: the value of a variable obtained and recorded on an individual
Example: 145 recorded as a person’s weight, 65 recorded as the height of a tree, etc.
Data is a set of measurements made on a group of individuals
The distribution of a variable tells us what values it takes and how often it takes these values

Chest Sizes of 5,738 Militiamen

| Chest size (possible values) | 33-34 | 35-36 | 37-38 | 39-40 | 41-42 | 43-44 | 45-46 | 47-48 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Count (how often each occurs) | 21 | 266 | 1169 | 2152 | 1592 | 462 | 71 | 5 |
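As a small illustration of what "distribution" means here, the sketch below turns the chest-size counts into proportions. The counts are copied from the table; the variable names are made up for this example.

```python
# Counts copied from the chest-size table above.
chest_counts = {
    "33-34": 21, "35-36": 266, "37-38": 1169, "39-40": 2152,
    "41-42": 1592, "43-44": 462, "45-46": 71, "47-48": 5,
}

total = sum(chest_counts.values())  # number of individuals measured

# Relative frequency = count / total, for each possible value.
proportions = {size: count / total for size, count in chest_counts.items()}

for size, share in proportions.items():
    print(f"{size}: {share:.1%}")
```

The counts add up to the 5,738 in the table title, and the proportions add up to 1, which is a quick check that no category was dropped.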
![Page 6: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/6.jpg)
Two Types of Variables
A categorical/qualitative variable places an individual into one of several groups or categories
Examples: Gender, Race, Job Type, Geographic location…
JMP calls these variables nominal
A quantitative variable takes numerical values for which arithmetic operations such as adding and averaging make sense
Examples: Height, Age, Salary, Price, Cost…
Can be further divided into discrete (which JMP calls ordinal) and continuous
Why two types?
Each requires its own summaries (graphical and numerical) and analysis.
I can’t emphasize enough the importance of identifying the type of variable being considered before proceeding with any type of statistical analysis
![Page 7: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/7.jpg)
Example
Age: quantitative
Gender: categorical
Race: categorical
Salary: quantitative
Job type: categorical

| Name | Age | Gender | Race | Salary | Job Type |
| --- | --- | --- | --- | --- | --- |
| Fleetwood, Delores | 39 | Female | White | 62,100 | Management |
| Perez, Juan | 27 | Male | White | 47,350 | Technical |
| Wang, Lin | 20 | Female | Asian | 18,250 | Clerical |
| Johnson, LaVerne | 48 | Male | Black | 77,600 | Management |
![Page 8: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/8.jpg)
Variable types in JMP
Qualitative/categorical: JMP uses Nominal
Quantitative, discrete: JMP uses Ordinal
Quantitative, continuous: JMP uses Continuous
![Page 9: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/9.jpg)
Exploratory data analysis
Statistical tools that help examine data in order to describe their main features
Basic strategy
Examine variables one by one, then look at the relationships among the different variables
Start with graphs, then add numerical summaries of specific aspects of the data
![Page 10: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/10.jpg)
Exploratory data analysis: One variable
Graphical displays
Qualitative/categorical data: bar chart, pie chart, etc.
Quantitative data: histogram, stem-leaf, boxplot, timeplot, etc.
Summary statistics
Qualitative/categorical: contingency tables
Quantitative: mean, median, standard deviation, range, etc.
Probability models
Qualitative: Binomial distribution (others we won’t cover in this class)
Quantitative: Normal curve (others we won’t cover in this class)
![Page 11: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/11.jpg)
Example categorical/qualitative data
![Page 12: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/12.jpg)
Summary table
We summarize categorical data using a table. Note that percentages are often called Relative Frequencies.

| Highest Degree Obtained (class) | Number of CEOs (frequency) | Proportion (relative frequency) |
| --- | --- | --- |
| None | 1 | 0.04 |
| Bachelors | 7 | 0.28 |
| Masters | 11 | 0.44 |
| Doctorate / Law | 6 | 0.24 |
| Totals | 25 | 1.00 |
![Page 13: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/13.jpg)
Bar graph
The bar graph quickly compares the degrees of the four groups
The heights of the four bars show the counts for the four degree categories
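To make the link between the summary table and the bar graph concrete, here is a rough text-only sketch that draws one bar per degree category. The counts come from the CEO table; the `bar_line` helper is a made-up name for this illustration, not part of JMP.

```python
# Counts copied from the CEO summary table.
degree_counts = {"None": 1, "Bachelors": 7, "Masters": 11, "Doctorate / Law": 6}
total = sum(degree_counts.values())

def bar_line(label, count, total, width=16):
    """Return one text bar: label, a run of '#' marks, the count and the proportion."""
    return f"{label:<{width}}{'#' * count} {count} ({count / total:.2f})"

chart = [bar_line(label, count, total) for label, count in degree_counts.items()]
print("\n".join(chart))
```

Each bar's length is the count, and the number in parentheses is the relative frequency from the table.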
![Page 14: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/14.jpg)
Pie chart
A pie chart helps us see what part of the whole each group forms
To make a pie chart, you must include all the categories that make up a whole
![Page 15: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/15.jpg)
Summary of categorical variables
Graphically: bar graphs, pie charts
A bar graph is nearly always preferable to a pie chart. It is easier to compare bar heights than slices of a pie
Numerically: tables with total counts or percents
![Page 16: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/16.jpg)
Quantitative variables
Graphical summary: histogram, stemplots, time plots, and more
Numerical summary: mean, median, quartiles, range, standard deviation, and more
![Page 17: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/17.jpg)
Histograms
The bins are:
3.0 ≤ rate < 4.0
4.0 ≤ rate < 5.0
5.0 ≤ rate < 6.0
6.0 ≤ rate < 7.0
7.0 ≤ rate < 8.0
8.0 ≤ rate < 9.0
9.0 ≤ rate < 10.0
10.0 ≤ rate < 11.0
11.0 ≤ rate < 12.0
12.0 ≤ rate < 13.0
13.0 ≤ rate < 14.0
14.0 ≤ rate < 15.0
![Page 18: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/18.jpg)
Histograms
The bins are:
3.0 ≤ rate < 4.0
4.0 ≤ rate < 5.0
5.0 ≤ rate < 6.0
6.0 ≤ rate < 7.0
7.0 ≤ rate < 8.0
8.0 ≤ rate < 9.0
9.0 ≤ rate < 10.0
10.0 ≤ rate < 11.0
11.0 ≤ rate < 12.0
12.0 ≤ rate < 13.0
13.0 ≤ rate < 14.0
14.0 ≤ rate < 15.0
![Page 19: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/19.jpg)
Histograms
The bins are:
2.0 ≤ rate < 4.0
4.0 ≤ rate < 6.0
6.0 ≤ rate < 8.0
8.0 ≤ rate < 10.0
10.0 ≤ rate < 12.0
12.0 ≤ rate < 14.0
14.0 ≤ rate < 16.0
16.0 ≤ rate < 18.0
![Page 20: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/20.jpg)
Histograms
Where did the bins come from?
They were chosen rather arbitrarily
Does choosing other bins change the picture?
Yes!! And sometimes dramatically
What do we do about this?
Some pretty smart people have come up with some “optimal” bin widths and we will rely on their suggestions
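A rough sketch of how bin choice changes the picture: the same numbers are counted once with the 1-unit bins (3.0 to 15.0) and once with the 2-unit bins (2.0 to 18.0) listed on the earlier slides. The `rates` list is made up for illustration, since the slides' data are not in the text. The slide does not say which "optimal" rules it has in mind; Sturges' rule is shown only as one well-known example.

```python
import math

# Made-up rates for illustration only; the slides' data are not in the text.
rates = [3.2, 4.1, 4.8, 5.5, 5.9, 6.3, 6.4, 7.7, 8.0, 9.6, 11.2, 14.5]

def bin_counts(data, start, width, n_bins):
    """Count the values falling in each bin [lo, lo + width)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

narrow = bin_counts(rates, start=3.0, width=1.0, n_bins=12)  # 3.0 up to 15.0
wide = bin_counts(rates, start=2.0, width=2.0, n_bins=8)     # 2.0 up to 18.0

# Sturges' rule: one common suggestion for the number of bins.
sturges_bins = math.ceil(math.log2(len(rates))) + 1

print(narrow)
print(wide)
print(sturges_bins)
```

The two count lists describe the same twelve numbers, yet the bar heights and the gaps between bars differ, which is the point the slide makes.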
![Page 21: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/21.jpg)
Histogram
The purpose of a graph is to help us understand the data
After you make a graph, always ask, “What do I see?”
Once you have displayed a distribution you can see the important features
![Page 22: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/22.jpg)
Histograms
We will describe the features of the distribution that the histogram is displaying with three characteristics
1. Shape: symmetric, skewed right, skewed left, uni-modal, multi-modal, bell shaped
2. Center: mean, median
3. Spread (outliers or not): standard deviation, inter-quartile range
![Page 23: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/23.jpg)
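Body temperatures of 30 people
[Figure: JMP Distributions output for Body Temp (F); histogram with horizontal axis from 96.5 to 100]

Quantiles

| Percentile | Label | Value |
| --- | --- | --- |
| 100.0% | maximum | 99.800 |
| 99.5% | | 99.800 |
| 97.5% | | 99.800 |
| 90.0% | | 99.500 |
| 75.0% | quartile | 99.125 |
| 50.0% | median | 98.600 |
| 25.0% | quartile | 98.125 |
| 10.0% | | 97.330 |
| 2.5% | | 97.000 |
| 0.5% | | 97.000 |
| 0.0% | minimum | 97.000 |

Moments

| Statistic | Value |
| --- | --- |
| Mean | 98.563333 |
| Std Dev | 0.7508539 |
| Std Err Mean | 0.1370865 |
| upper 95% Mean | 98.843707 |
| lower 95% Mean | 98.28296 |
| N | 30 |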
![Page 24: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/24.jpg)
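Incomes from 500 households in the 2000 Current Population Survey
[Figure: JMP Distributions output for household income; histogram with count axis from 50 to 200 and income axis from 0 to 250000]

Quantiles

| Percentile | Label | Value |
| --- | --- | --- |
| 100.0% | maximum | 282577 |
| 99.5% | | 255901 |
| 97.5% | | 168707 |
| 90.0% | | 101999 |
| 75.0% | quartile | 63135 |
| 50.0% | median | 33722 |
| 25.0% | quartile | 17292 |
| 10.0% | | 7871 |
| 2.5% | | 3773 |
| 0.5% | | 0 |
| 0.0% | minimum | 0 |

Moments

| Statistic | Value |
| --- | --- |
| Mean | 46854.196 |
| Std Dev | 43094.6 |
| Std Err Mean | 1929.1792 |
| upper 95% Mean | 50644.53 |
| lower 95% Mean | 43063.863 |
| N | 499 |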
![Page 25: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/25.jpg)
Histogram vs. Bar graph
Spaces mean something in histograms but not in bar graphs
Shape means nothing with bar graphs
The biggest difference is that they are displaying fundamentally different types of variables
![Page 26: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/26.jpg)
Time Plots
Many variables are measured at intervals over time
Examples: closing stock prices, number of hurricanes, unemployment rates
If the interest in a variable is to see how it changes over time, use a time plot
![Page 27: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/27.jpg)
Time Plots
Patterns to look for
Patterns that repeat themselves at known regular intervals of time are called seasonal variation
A trend is a persistent, long-term rise or fall
![Page 28: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/28.jpg)
Time plots
[Figure: time plot of the number of hurricanes each year from 1970 to 1990; horizontal axis: Year (1965 to 1995), vertical axis: Hurricanes (0 to 10)]
![Page 29: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/29.jpg)
Numerical summaries of quantitative variables
Want a numerical summary for center and spread
Center: mean, median, mode
Spread: range, inter-quartile range, standard deviation
The 5 number summary is a popular collection of the following: min, 1st quartile, median, 3rd quartile, max
![Page 30: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/30.jpg)
Mean
To find the mean of a set of observations, add their values and divide by the number of observations

Equation 1:

$$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N}$$

Equation 2:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$
![Page 31: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/31.jpg)
Mean example
The average age of 20 people in a room is 25. A 28 year old leaves while a 30 year old enters the room. Does the average age change? If so, what is the new average age?
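One way to reason about this question is to work with the total rather than the individual ages, since the mean times the number of people gives back the sum. A short sketch, with variable names chosen only for this example:

```python
n_people = 20
old_mean = 25

total_age = n_people * old_mean      # the mean times N recovers the sum of ages
new_total = total_age - 28 + 30      # a 28 year old leaves, a 30 year old enters
new_mean = new_total / n_people      # the room still holds 20 people

print(new_mean)
```

The total rises by 2, so the mean rises by 2/20 = 0.1 even though we never learn anyone else's age.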
![Page 32: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/32.jpg)
Median
The median is the midpoint of a distribution
The number such that half the observations are smaller and the other half are larger
Also called the 50th percentile or 2nd quartile
To compute a median:
Order the observations
If the number of observations is odd, the median is the center observation
If the number of observations is even, the median is the average of the two center observations
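The steps above can be sketched as a small function. This is an illustrative version written for this example; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(observations):
    """Median by the steps above: sort, then take the middle value(s)."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # odd: center observation
    return (ordered[middle - 1] + ordered[middle]) / 2  # even: average the two centers

print(median([3, 1, 2]))
print(median([4, 1, 3, 2]))
```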
![Page 33: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/33.jpg)
Median example
The median age of 20 people in a room is 25. A 28 year old leaves while a 30 year old enters the room. Does the median age change? If so, what is the new median age?
The median age of 21 people in a room is 25. A 28 year old leaves while a 30 year old enters the room. Does the median age change? If so, what is the new median age?
![Page 34: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/34.jpg)
Mean vs Median
When the histogram is symmetric, the mean and median are similar
The mean and median are different when the histogram is skewed
Skewed to the right: the mean is larger than the median
Skewed to the left: the mean is smaller than the median
The business magazine Forbes estimates that the “average” household wealth of its readers is either about $800,000 or about $2.2 million, depending on which “average” it reports. Which of these numbers is the mean wealth and which is the median wealth? Why?
![Page 35: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/35.jpg)
Mean vs Median
Symmetric distribution
![Page 36: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/36.jpg)
Mean vs Median
Right skewed distribution
![Page 37: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/37.jpg)
Mean vs Median
Left skewed distribution
![Page 38: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/38.jpg)
Extreme example
Income in small town of 6 people
$25,000 $27,000 $29,000 $35,000 $37,000 $38,000
Mean is about $31,833 and median is $32,000
Bill Gates moves to town
$25,000 $27,000 $29,000 $35,000 $37,000 $38,000 $40,000,000
Mean is $5,741,571 and median is $35,000
The mean is pulled by the outlier while the median is not. The median is a better measure of center for these data
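The numbers in this example can be checked with Python's standard `statistics` module. The list names are chosen for this sketch only.

```python
from statistics import mean, median

incomes = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
with_outlier = incomes + [40_000_000]   # one very large income joins the town

print(round(mean(incomes)), median(incomes))
print(round(mean(with_outlier)), median(with_outlier))
```

Running it reproduces the slide's figures up to rounding: one new resident moves the mean by millions of dollars but shifts the median by only $3,000.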
![Page 39: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/39.jpg)
Is a central measure enough?
A warm, stable climate greatly affects some individuals’ health. Atlanta and San Diego have about equal average temperatures (62° vs. 64°). If a person’s health requires a stable climate, in which city would you recommend they live?
![Page 40: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/40.jpg)
Measures of spread
Range: subtract the smallest value from the largest
Inter-quartile range: subtract the 1st quartile from the 3rd quartile
Standard Deviation (SD): “average” distance from the mean
Which one should we use?
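A minimal sketch of the first two measures, reusing the small-town incomes (with the outlier) from the earlier example. Quartile conventions differ between software packages, so the values from `statistics.quantiles` with its default settings may not match JMP exactly; treat them as one reasonable convention.

```python
from statistics import quantiles

# Incomes from the small-town example, with the outlier included.
data = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000, 40_000_000]

value_range = max(data) - min(data)     # largest minus smallest

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                           # 3rd quartile minus 1st quartile

print(value_range, iqr)
```

The range is dominated by the single outlier, while the inter-quartile range only reflects the middle half of the data. That contrast is one consideration when deciding which measure to use.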
![Page 41: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/41.jpg)
Standard Deviation
The standard deviation looks at how far observations are from their mean
It is the square root of the average squared deviation from the mean
Compute the distance of each value from the mean
Square each of these distances
Take the average of these squares and then the square root
Often we will use SD to denote standard deviation

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
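The three steps can be written out directly. This sketch divides by N, as the slide's formula does; Python's `statistics.pstdev` uses the same divisor, whereas `statistics.stdev` divides by N − 1. The `sample` list is made up for illustration.

```python
import math
from statistics import pstdev

def standard_deviation(values):
    """Population SD: square root of the average squared deviation from the mean."""
    n = len(values)
    mu = sum(values) / n
    squared = [(x - mu) ** 2 for x in values]   # square each distance from the mean
    return math.sqrt(sum(squared) / n)          # average the squares, then square root

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up numbers with mean 5
print(standard_deviation(sample), pstdev(sample))
```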
![Page 42: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/42.jpg)
Example
![Page 43: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/43.jpg)
Standard deviation
Order these histograms by the SD of the numbers they portray. Go from smallest to largest
What is a reasonable guess of the SD for each?
[Figure: three histograms, with horizontal axes running from -15 to 20, from -1 to 2.5, and from -30 to 30]
![Page 44: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/44.jpg)
Histograms on same scale
[Figure: three histograms, each drawn on a horizontal axis from -30 to 30]
![Page 45: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/45.jpg)
Problem from text (p. 74, #2)
Which of the following sets of numbers has the smaller SD?
a) 50, 40, 60, 30, 70, 25, 75
b) 50, 40, 60, 30, 70, 25, 75, 50, 50, 50
Repeat for these two sets
c) 50, 40, 60, 30, 70, 25, 75
d) 50, 40, 60, 30, 70, 25, 75, 99, 1
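After reasoning about the answer, it can be checked numerically. A short sketch using `statistics.pstdev` (which divides by N); set c) is the same list as set a), so it is not repeated.

```python
from statistics import pstdev

a = [50, 40, 60, 30, 70, 25, 75]
b = a + [50, 50, 50]    # extra values sitting exactly at the mean
d = a + [99, 1]         # extra values far from the mean

sd_a, sd_b, sd_d = pstdev(a), pstdev(b), pstdev(d)
print(round(sd_a, 2), round(sd_b, 2), round(sd_d, 2))
```

All three lists have mean 50. Adding values at the mean shrinks the SD, while adding values far from the mean inflates it.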
![Page 46: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/46.jpg)
More intuition behind the SD
This is a variance contest. You must give a list of six numbers chosen from the whole numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with repeats allowed.
Give a list of six numbers with the largest standard deviation such a list described above can possibly have.
Give a list of six numbers with the smallest standard deviation such a list can possibly have.
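The contest is small enough to settle by brute force. This sketch tries every possible list (order does not matter for the SD) and keeps the extremes; it is meant as a check on your own answer, not a substitute for the intuition.

```python
from itertools import combinations_with_replacement
from statistics import pstdev

# Every list of six digits 0-9, repeats allowed; order does not affect the SD.
candidates = list(combinations_with_replacement(range(10), 6))

largest = max(candidates, key=pstdev)
smallest = min(candidates, key=pstdev)

print(largest, pstdev(largest))
print(smallest, pstdev(smallest))
```

The largest SD comes from pushing half the values to each end of the allowed range. Any list of six identical numbers ties for the smallest SD, and the code simply reports the first one it finds.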
![Page 47: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/47.jpg)
Properties of SD
SD ≥ 0. (When is SD = 0?)
Has the same unit of measurement as the original observations
Inflated by outliers
![Page 48: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/48.jpg)
Mean and SD
What happens to the mean if you add 5 to every number in a list?
What happens to the SD?

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$
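A quick numerical experiment can confirm what the formulas suggest. The list below is made up; any list would do.

```python
from statistics import mean, pstdev

values = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up list
shifted = [x + 5 for x in values]      # add 5 to every number

mean_change = mean(shifted) - mean(values)
sd_change = pstdev(shifted) - pstdev(values)

print(mean_change, sd_change)
```

Adding a constant moves every value and the mean by the same amount, so each deviation from the mean is unchanged, and the SD stays the same.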
![Page 49: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/49.jpg)
Standard deviation
SDs are like measurement units on a ruler
Any quantitative variable can be converted into “standardized” units
These are often called z-scores and are denoted by the letter z
Important formula

$$z = \frac{\text{value} - \text{mean}}{\text{SD}} = \frac{\text{value} - \mu}{\sigma}$$

Example: ACT versus SAT scores
Which is more impressive: a 1340 on the SAT, or a 32 on the ACT?
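The z-score formula is a one-line function. The slide does not give the means and SDs of the two tests, so the numbers below are HYPOTHETICAL placeholders chosen only to show the mechanics; substitute the real values before drawing any conclusion.

```python
def z_score(value, mean, sd):
    """Number of SDs that value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (value - mean) / sd

# HYPOTHETICAL means and SDs, for illustration only; not given on the slide.
sat_z = z_score(1340, mean=1000, sd=200)
act_z = z_score(32, mean=21, sd=5)

print(sat_z, act_z)
```

Whichever score has the larger z-score sits further above its own test's average, measured in that test's SD units, which is what makes scores on different scales comparable.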
![Page 50: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/50.jpg)
The normal curve
When a histogram looks like a bell-shaped curve, z-scores are associated with percentages
The percentage of the data in between two different z-score values equals the area under the normal curve in between the two z-score values
A bit of notation here. N(μ, σ) is shorthand for writing normal curve with mean μ and standard deviation σ (get used to this notation as it will be used fairly regularly throughout the course)
![Page 51: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/51.jpg)
Normal curves
![Page 52: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/52.jpg)
Normal curves
![Page 53: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/53.jpg)
Properties of normal curve
In the Normal distribution with mean μ and standard deviation σ:
68% of the observations fall within 1 σ of μ
95% of the observations fall within 2 σ of μ
99.7% of the observations fall within 3 σ of μ
By remembering these numbers, you can think about Normal curves without constantly making detailed calculations
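These three percentages are rounded areas under the standard normal curve. A short sketch using `statistics.NormalDist` from Python's standard library shows where they come from; `share_within` is a helper name made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

Because z-scores put every normal curve on the same scale, the same three areas apply to any N(μ, σ).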
![Page 54: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/54.jpg)
Properties of normal curves
For a N(0,1) curve the following holds
![Page 55: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/55.jpg)
IQ
A person is considered to have mental retardation when
1. IQ is below 70
2. Significant limitations exist in two or more adaptive skill areas
3. Condition is present from childhood
What percentage of people have an IQ that meets the first criterion of mental retardation?
![Page 56: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/56.jpg)
IQ
A histogram of all people’s IQ scores has a μ=100 and a σ=16
How to get % of people with IQ < 70
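Assuming the histogram of IQ scores is well described by a normal curve with the mean and SD above, the percentage can be sketched in two steps: convert 70 to standard units, then take the area under the curve to the left. `statistics.NormalDist` does the area calculation.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

z = (70 - iq.mean) / iq.stdev      # 70 expressed in standard units
share_below_70 = iq.cdf(70)        # area under the curve to the left of 70

print(z, round(share_below_70, 3))
```

A score of 70 is a little under 2 SDs below the mean, so the answer should be slightly more than the 2.5% that lies below 2 SDs, which gives a sanity check against the 68-95-99.7 rule.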
![Page 57: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/57.jpg)
More IQ
Reggie Jackson, one of the greatest baseball players ever, has an IQ of 140. What percentage of people have bigger IQs than Reggie?
Marilyn vos Savant, self-proclaimed smartest person in the world, has a reported IQ of 205. What percentage of people have IQ scores smaller than Marilyn’s score?
Mensa is a society for “intelligent people.” To qualify for Mensa, one needs to be in at least the upper 2% of the population in IQ score. What is the score needed to qualify for Mensa?
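All three questions use the same normal curve, read in different directions. The sketch below assumes IQ follows N(100, 16), the mean and SD used on the previous IQ slide. The first two questions go from a score to an area, and the third goes from an area back to a score.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=16)

above_140 = 1 - iq.cdf(140)        # share of people with IQ above 140
below_205 = iq.cdf(205)            # share of people with IQ below 205
mensa_cutoff = iq.inv_cdf(0.98)    # score with 98% of people below it

print(round(above_140, 4), below_205, round(mensa_cutoff, 1))
```

A score of 205 is more than 6 SDs above the mean, so under this model essentially everyone falls below it; that may say more about the limits of the normal model in the far tail than about the population.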
![Page 58: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/58.jpg)
Checking if data follow normal curve
Look for symmetric histogram
A different method is a normal probability plot. When normal curve is a good fit, points fall on a nearly straight line
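A rough sketch of how the points of a normal probability plot can be built: sort the data, then pair each value with the z-score a normal curve would predict for its rank. The plotting position (i + 0.5)/n is one common convention and is an assumption here; JMP may use a different one. The `temps` list is made up for illustration.

```python
from statistics import NormalDist

def normal_plot_points(data):
    """Pair each sorted value with the z-score expected at its rank."""
    ordered = sorted(data)
    n = len(ordered)
    # (i + 0.5) / n is one common plotting position; an assumption here.
    expected_z = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(expected_z, ordered))

temps = [97.0, 97.8, 98.1, 98.4, 98.6, 98.7, 99.0, 99.3, 99.8]  # made-up values
points = normal_plot_points(temps)

for z, value in points:
    print(f"{z:6.2f}  {value}")
```

Plotting the second column against the first gives the normal probability plot; points that bend away from a straight line suggest skewness or heavy tails.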
![Page 59: FPP 3-6 Exploratory Data Analysis: One Variable. Plan of attack Distinguish different types of variables Summarize data numerically Summarize data graphically](https://reader036.vdocuments.mx/reader036/viewer/2022062515/56649c575503460f948feb76/html5/thumbnails/59.jpg)
Measurement error
Measurement error model
Measurement = truth + chance error
Outliers
Bias affects all measurements in the same way
Measurement = truth + bias + chance error
Often we assume that the chance error follows a normal curve that is centered at 0
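The model can be simulated to see how the two kinds of error behave. All numbers below (the true value, the bias, the SD of the chance error) are hypothetical, and the chance error is drawn from a normal curve centered at 0 as the slide assumes.

```python
import random
from statistics import mean

def simulate_measurements(truth, bias, chance_sd, n, seed=0):
    """Draw n values of truth + bias + chance error, with normal chance error."""
    rng = random.Random(seed)
    return [truth + bias + rng.gauss(0, chance_sd) for _ in range(n)]

# Hypothetical numbers: an object of true size 10.0, an instrument reading 0.2 too high.
readings = simulate_measurements(truth=10.0, bias=0.2, chance_sd=0.05, n=10_000)
average = mean(readings)

print(round(average, 3))
```

Averaging many readings cancels most of the chance error, but the average settles near truth + bias rather than the truth, so repetition alone does not remove bias.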