Warm-Up: Believe It or Not?
A student claims to have flipped a fair coin 200 times and to have gotten heads only 84 times.
Do you believe this student or not? Discuss with your neighbor why or why not.
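One way to ground the discussion is to ask how often a fair coin would give 84 or fewer heads in 200 flips. The sketch below sums the exact binomial probabilities; the function name `binomial_cdf` is made up for this illustration, and treating "84 or fewer" as the event of interest is an assumption about what counts as "at least this unusual".

```python
from math import comb

def binomial_cdf(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

flips = 200          # number of flips the student claims
heads_claimed = 84   # number of heads the student claims

# Chance of seeing 84 or fewer heads if the coin really is fair
p_at_most = binomial_cdf(heads_claimed, flips)
print(f"P(at most {heads_claimed} heads in {flips} fair flips) = {p_at_most:.4f}")
```

The result is on the order of one or two percent: unusual, but not impossible. Whether that is rare enough to doubt the student is exactly the judgment the warm-up asks for.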
Chapters 2 - 4
The Role of Statistics & Graphical Methods for Describing Data
In order to learn Statistics, we need to learn the language of statistics first.
We’ll be learning a lot of new vocabulary today – through examples and activities
Statistics: the science of collecting, analyzing, and drawing conclusions from data
Suppose we wanted to know something about the GPAs of high school graduates in the nation this year.
We could collect data from all high schools in the nation.
What term would be used to describe “all high school graduates”?
Population: the entire group of individuals or objects we want information about
What do you call it when you collect data about the entire population?
Census: an attempt to contact every individual in the entire population
If we didn’t perform a census, what would we do?
Why might we not want to use a census here?
Sample: a part of the population that we actually examine in order to gather information
What would a sample of all high school graduates across the nation look like?
A list created by randomly selecting the GPAs of some high school graduates from each state.
Suppose we wanted to know something about the GPAs of high school graduates in the nation this year.
We could collect data from a sample of high schools in the nation.
Once we have collected the data, what would we do with it?
Descriptive statistics: the methods of organizing & summarizing data
If the sample of high school GPAs contained 10,000 numbers, how could the data be described or summarized?
• Create a graph
• State the range of GPAs
• Calculate the average GPA
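The three summaries listed above can be sketched in a few lines. The GPA values here are made up for illustration (ten numbers instead of 10,000), and grouping into half-point bins is just one possible choice for a quick text graph.

```python
import statistics
from collections import Counter

# Hypothetical sample of GPAs (made-up values, not real data)
gpas = [3.2, 2.8, 3.9, 3.0, 2.5, 3.6, 3.1, 2.9, 3.4, 3.0]

# State the range of GPAs
gpa_range = max(gpas) - min(gpas)

# Calculate the average GPA
gpa_mean = statistics.mean(gpas)

# Create a (very simple) graph: count GPAs in half-point bins
bins = Counter(int(g * 2) / 2 for g in gpas)

print(f"range = {gpa_range:.1f}, mean = {gpa_mean:.2f}")
for lower_edge in sorted(bins):
    print(f"{lower_edge:.1f} to {lower_edge + 0.5:.1f}: {'X' * bins[lower_edge]}")
```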
Could we use the data from this sample to answer our question?
Inferential statistics: making generalizations from a sample to a population
Be sure to sample from the population of interest!!
Based on the sample, if the average GPA for high school graduates was 3.0, what generalization could be made?
The average national GPA for this year’s high school graduates is approximately 3.0.
Could someone claim that the average GPA for FISD graduates is 3.0?
No. Generalizations based on the results of a sample can only be made back to the population from which the sample came.
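A small simulation can illustrate why a random sample supports a generalization about its own population. Everything here is hypothetical: the population is 50,000 made-up GPAs, and the sample size of 500 is an arbitrary choice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 made-up GPAs, clipped to the 0.0 to 4.0 scale
population = [min(4.0, max(0.0, random.gauss(3.0, 0.5))) for _ in range(50_000)]

# A simple random sample drawn from that same population
sample = random.sample(population, 500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

The sample mean lands close to the population mean because the sample was drawn from that population. It says nothing reliable about a different group, such as one district's graduates, unless the sample was drawn from that group.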
Variable: any characteristic whose value may change from one individual or object to another
Is this a variable . . . the number of wrecks per week at the intersection outside?
Data: observations on a single variable or simultaneously on two or more variables
For this variable . . . the number of wrecks per week at the intersection outside . . . what could the observations be?
Variability: the range of possible data values
The goal of statistics is to understand the nature of variability in a population
Can you think of a population that has no variability?
Populations with no variability are rare and boring (of little statistical interest).
The two histograms below display the distribution of heights of gymnasts and the distribution of heights of female basketball players. Which is which? Why?
Heights – Figure A
Heights – Figure B
Variability
Suppose you found a pair of size 6 shoes left outside the locker room. Which team would you go to first to find the owner of the shoes? Why?
Suppose a tall woman (5 ft 11 in) you see is looking for her sister who is practicing in the gym. To which team would you send her? Why?
What aspects of the graphs helped you answer these questions?
Types of variables
Categorical (qualitative) variables: variables where the possible values are a set of categories
Numerical (quantitative) variables: variables where the values are numbers (it makes sense to average these values); two types: discrete & continuous
Numerical: Discrete
• Values are isolated points on a number line
• Usually counts of items
Numerical: Continuous
• Set of possible values forms an entire interval on the number line
• Usually measurements of something
Classifying variables by the number of variables in a data set
Suppose that the PE coach records the height of each student in his class.
Univariate: data that describe a single characteristic of the population
This is an example of univariate data.
Suppose that the PE coach records the height and weight of each student in his class.
Bivariate: data that describe two characteristics of the population
This is an example of bivariate data.
Suppose that the PE coach records the height, weight, number of sit-ups, and number of push-ups for each student in his class.
Multivariate: data that describe more than two characteristics (beyond the scope of this course)
This is an example of multivariate data.
Identify the following variables:
1. the appraised value of homes in Faraway
2. the color of cars in the teacher’s lot
3. the number of calculators owned by students at your school
4. the zip code of an individual
5. the amount of time it takes students to drive to school
Answers:
1. Continuous numerical
2. Categorical
3. Discrete numerical
4. Categorical
5. Continuous numerical
Warm-Up: Classifying variables
Write an example of a variable on the index card provided (try to come up with something we have not discussed in class already). Please include your name.
When done, fold your index card in half and place in the bowl in the back of the room.
We will classify these before completing notes on display types.
Graphs for categorical data
Bar Graph
• Used for categorical data
• Bars do not touch
• Categorical variable is typically on the horizontal axis
• Best used to describe or comment on which occurred the most often or least often
• May make a double bar graph or segmented bar graph for bivariate categorical data sets
Comparative Bar Charts
• Use relative frequency
• If the number of observations is the same for all groups (50 boys and 50 girls), you could use the frequency
• Keep the vertical scale the same
Pie Chart (circle graph)
• Used for categorical data
• To make: Proportion × 360°; using a protractor, mark off each part
• Best used to describe or comment on which occurred the most often or least often
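The "Proportion × 360°" step for a pie chart can be sketched as follows. The car-color counts are made up for illustration; only the arithmetic matters.

```python
# Hypothetical counts of car colors in a parking lot (made-up data)
counts = {"white": 10, "black": 6, "red": 4}

total = sum(counts.values())

# Angle for each slice = (count / total) * 360 degrees,
# written as count * 360 / total to keep the arithmetic exact here
angles = {category: count * 360 / total for category, count in counts.items()}

for category, angle in angles.items():
    print(f"{category}: {counts[category]}/{total} of the data -> {angle:.0f} degrees")
```

The angles always add up to 360°, which is a quick check before picking up the protractor.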
Using class survey data, make bar graphs for:
birth month
gender & handedness
Graphs for numerical data
Dotplot
• Used with numerical data (either discrete or continuous)
• Made by putting dots (or X’s) on a number line
• Can make comparative dotplots by using the same axis for multiple groups
Dotplot
To compare the weights of the males and females we put the dotplots on top of each other, using the same scales.
Using class survey data make dot plots of:
# AP classes
# siblings
Types (shapes) of Distributions
1) Symmetrical: refers to data in which both sides are (more or less) the same when the graph is folded vertically down the middle; bell-shaped is a special type that has a center mound with two sloping tails
2) Uniform: refers to data in which every class has equal or approximately equal frequency
3) Skewed (left or right): refers to data in which one side (tail) is longer than the other side; the direction of skewness is on the side of the longer tail
4) Bimodal (multi-modal): refers to data in which two (or more) classes have the largest frequency & are separated by at least one other class
Warm-Up: Example 1 (From Your Notes)
Looking at Example 1 (about sports-related injuries), complete the columns titled “Tally” and “Frequency”.
Graphical Methods for Describing Data
Frequency Distributions
The relative frequency for a particular category is the fraction or proportion of the time that the category appears in the data set. It is calculated as
relative frequency = frequency / (number of observations in the data set)
When the table includes relative frequencies, it is sometimes referred to as a relative frequency distribution.
Example 1
Category      Frequency   Relative freq.   Cum. rel. freq.
Sprain        13          13/56 = .2321    .2321
Contusion     12          12/56 = .2143    .4464
Fracture      14          .2500            .6964
Strain         6          .1071            .8036
Laceration     3          .0536            .8571
Chronic        2          .0357            .8929
Dislocation    3          .0536            .9464
Concussion     2          .0357            .9821
Dental         1          .0179            1.0000
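The relative and cumulative relative frequencies for Example 1 can be recomputed from the frequency column alone. This is a minimal sketch; the variable names are chosen for illustration, and each cumulative value is computed from the running count rather than by adding rounded decimals, so the last digit may differ slightly from hand-added figures.

```python
# Frequencies from Example 1 (sports-related injuries)
injury_counts = {
    "Sprain": 13, "Contusion": 12, "Fracture": 14, "Strain": 6,
    "Laceration": 3, "Chronic": 2, "Dislocation": 3, "Concussion": 2,
    "Dental": 1,
}

total = sum(injury_counts.values())  # number of observations in the data set

rows = []
running_count = 0
for category, freq in injury_counts.items():
    running_count += freq
    relative = freq / total                # relative frequency
    cumulative = running_count / total     # cumulative relative frequency
    rows.append((category, freq, relative, cumulative))

for category, freq, relative, cumulative in rows:
    print(f"{category:<12}{freq:>3}  {relative:.4f}  {cumulative:.4f}")
```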
Bar Chart – Example (frequency)
[Figure: bar chart of Frequency (vertical axis, 0 to 16) by Injury (horizontal axis)]
Bar Chart – (Relative Frequency)
[Figure: bar chart of Relative Frequency (vertical axis, 0 to 0.3) by Injury (horizontal axis)]
Pie Chart (Relative freq.)
Sprain 23%
Contusion 21%
Fracture 25%
Strain 11%
Laceration 5%
Chronic 4%
Dislocation 5%
Concussion 4%
Dental 2%
Class data: fastest speed you have ever driven.
Take that speed and your gender and put it on a sticky note.
In a moment, you will place your sticky note next to the digit(s) that represent the tens on the whiteboard. For example, if you drove 102, put it next to the 10; if you drove 87, put it by the 8.
Fastest speed driven
Stems: 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
Key: 8│7 is 87 mph
Stemplots (stem & leaf plots)
• Used with univariate, numerical data
• Must have a key so that we know how to read the numbers
• Can split stems when you have a long list of leaves
• Can have a comparative stemplot with two groups
Would a stemplot be a good graph for the number of pieces of gum chewed per day by Stat students? Why or why not?
Would a stemplot be a good graph for the number of pairs of shoes owned by AP Stat students? Why or why not?
Example 2:
The following data are price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Can you make a stemplot with this data?
Do examples on separate note paper
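For checking the Example 2 work afterward, here is a minimal sketch of a stemplot builder. It assumes the tenths digit is the stem and the hundredths digit is the leaf, which is one reasonable choice for this data; the function name `stemplot` is made up for this illustration.

```python
from collections import defaultdict

# Price per ounce for the dandruff shampoos in Example 2
prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]

def stemplot(values):
    """Return stemplot lines: tenths digit as stem, hundredths digit as leaf."""
    stems = defaultdict(list)
    for value in sorted(values):
        hundredths = round(value * 100)   # work in whole hundredths to avoid float digit errors
        stem, leaf = divmod(hundredths, 10)
        stems[stem].append(leaf)
    lines = []
    # Include empty stems so that gaps in the data stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

lines = stemplot(prices)
print("\n".join(lines))
print("Key: 1 | 7 is 0.17 per ounce")
```

Keeping the empty stem 4 in the display makes the gap between the cluster of lower prices and the 0.54 value easy to see.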
Example 3: Tobacco use in G-rated Movies
Total tobacco exposure time (in seconds) for Disney movies:
223 176 548 37 158 51 299 37 11 165 74 9 2 6 23 206 9
Total tobacco exposure time (in seconds) for other studios’ movies:
205 162 6 1 117 5 91 155 24 55 17
Make a comparative stemplot.
Greed
How to describe a graph
Dotplots, stem & leaf plots, histograms, boxplots
1. Center: discuss where the middle of the data falls; three types of central tendency: mean, median, & mode
2. Spread: discuss how spread out the data is; refers to the variability of the data: range, standard deviation, IQR
3. Type of distribution: refers to the overall shape of the distribution: symmetrical, uniform, skewed, or bimodal
4. Unusual occurrences: outliers (values that lie away from the rest of the data), gaps, clusters, anything else unusual
5. In context: You must write your answer in reference to the specifics in the problem, using correct statistical vocabulary and using complete sentences!
Histograms
• Used with numerical data
• Bars touch on histograms
• Two types
– Discrete: bars are centered over discrete values
– Continuous: bars cover a class (interval) of values
• For comparative histograms, use two separate graphs with the same scale on the horizontal axis
Example 4
Height (inches)   Tally
62                1
63                3
64                6
65                7
66                9
67                12
68                13
69                10
70                10
71                9
72                6
73                3
74                1
75                1
Cumulative Relative Frequency Plot (Ogive)
• . . . is used to answer questions about percentiles.
• A percentile is the value that has a given percent of individuals at or below it.
• Quartiles are located every 25% of the data. The first quartile (Q1) is the 25th percentile, while the third quartile (Q3) is the 75th percentile. What is the special name for Q2?
• Interquartile Range (IQR) is the range of the middle half (50%) of the data.
IQR = Q3 – Q1
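As an illustration of quartiles and the IQR, the sketch below uses the Example 4 heights, reading that table as height and count pairs (62 in. once, 63 in. three times, and so on up to 75 in.); that reading is an assumption about the flattened slide. Textbooks and software use slightly different quartile rules, so this uses Python's default rule in `statistics.quantiles` and should be treated as one convention among several.

```python
import statistics

# Example 4 heights, read as {height in inches: count}
height_counts = {62: 1, 63: 3, 64: 6, 65: 7, 66: 9, 67: 12, 68: 13,
                 69: 10, 70: 10, 71: 9, 72: 6, 73: 3, 74: 1, 75: 1}

# Expand the frequency table into one value per student
heights = [h for h, count in height_counts.items() for _ in range(count)]

# Three cut points that split the data into four equal parts: Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(heights, n=4)
iqr = q3 - q1

# Cumulative relative frequency at 66 inches (the kind of point an ogive plots)
cum_rel_freq_66 = sum(c for h, c in height_counts.items() if h <= 66) / len(heights)

print(f"n = {len(heights)}, Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"proportion at or below 66 in. = {cum_rel_freq_66:.3f}")
```

With these counts the quartiles come out as 66, 68, and 70 inches, so the middle half of the heights spans 4 inches.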
Ex. 4
Example 5 Notes
Relative Frequencies, Cumulative Frequencies, & Ogives
Review: Histograms Patterns
Center: for now, the value that divides the observations roughly in half
Spread (variability): the extent of the data from smallest to largest value
Shape: overall appearance of distribution
Outlier (Unusual): an individual observation that falls outside the overall pattern
Let’s practice….
Can you describe the shape of the following distributions?
[Figures: seven histograms with Frequency on the vertical axis, labeled in order]
symmetrical
slightly skewed left (negative skew)
strongly skewed right (positive skew)
bimodal
skewed left with outlier
bimodal
strongly skewed left
Guess Age
Guess the age of the person in the picture (do not discuss this with your partner, just write down your guess.)
After everyone has guessed, I will give you the actual ages.
Graph a scatter plot of Guess Age vs. Actual Age (Be sure to label the axes.)
[Pictures: A, B, C, D, E, F and G]
COMPARE
Actual ages: A 27, B 38, C 8, D 115, E 22, F she’s 80, G he’s 90, H 19
Generate a scatterplot
What would you expect from these scatterplots?
– If your guesses were accurate?
– If you underestimated?
– If you overestimated?
Scatterplots
A scatterplot shows the relationship between two quantitative variables measured on the same individuals.
The explanatory variable, if there is one, is graphed on the x-axis.
Scatterplots reveal the direction, form, and strength of the relationship.
Patterns
Direction: variables are either positively associated or negatively associated
Form: linear is preferred, but curves and clusters are significant
Strength: determined by how close the points in the scatterplot are to a line
Note: A strong association does NOT indicate cause and effect!
Scatterplot
Plot the data.
Describe the relationship between the SAT Math and SAT Verbal scores
Math   Verbal
690    510
720    610
590    550
760    690
660    700
660    630
650    700
710    730
710    540
800    720
620    780
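After plotting by hand, a numerical check on direction and strength is the correlation coefficient. The sketch below assumes the data read as (Math, Verbal) pairs in the order listed, and it uses the Pearson correlation, which the slides do not name; the function `pearson_r` is written out here for illustration.

```python
from math import sqrt

# SAT scores read as (Math, Verbal) pairs; the pairing is an assumption
pairs = [(690, 510), (720, 610), (590, 550), (760, 690), (660, 700), (660, 630),
         (650, 700), (710, 730), (710, 540), (800, 720), (620, 780)]

math_scores = [m for m, _ in pairs]
verbal_scores = [v for _, v in pairs]

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of xs and ys scaled to lie in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(math_scores, verbal_scores)
print(f"r = {r:.3f}")  # sign gives direction; closeness to -1 or 1 gives linear strength
```

A value near 0 would suggest a weak linear relationship, while a value near 1 or -1 would suggest a strong one. Either way, the scatterplot is still needed to judge form.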
Time Plot (aka time-series plot)
This is a plot of each observation against the time at which it was measured
– Stock prices, sales figures, other socio-economic data
– Invaluable for identifying trends
– Variable on the y-axis, time of observation on the x-axis
– Used to plot trends or cycles
Life Expectancy
[Figure: time plot, 1940 to 1990 on the horizontal axis, vertical axis from 60 to 80]
Life Expectancy
[Figure: time plot, 1940 to 1990 on the horizontal axis, vertical axis from 0 to 100]