Algebra 2: Statistics
Random Raw Data:
Since the dawn of civilization, people have had the urge to count things. In fact, the earliest forms of
writing were invented for counting. As civilizations grew, so did the number of things that needed to be
counted. This created a new challenge. Sometimes it’s impossible to count all the things you want to
know about, which is why, long ago, someone dreamed up the strategy of studying a SAMPLE to learn
something about an entire POPULATION.
There are a few facts to keep in mind before we actually start taking samples. First, it is
impossible to use a sample to achieve absolute certainty about a population. That is why statistics is
about making our best possible guess, and never about being certain. Second, if we’re stuck with a
single sample, we’d better be sure we collected it CAREFULLY, because any mistakes we make when we
collect our sample can totally screw up what we conclude about the larger population.
Perhaps the biggest challenge in collecting a sample is figuring out exactly what to include in it.
The goal is to avoid any BIAS in our sample that might lead us to MISCHARACTERIZE THE POPULATION.
Ideally, we’d like to gather a sample that accurately mirrors the population. To avoid bias, we ALWAYS
collect samples RANDOMLY.
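As a rough illustration of random sampling, the Python sketch below draws a simple random sample from a made-up population. The population of 500 student ID numbers, the sample size of 10, and the fixed seed are all assumptions chosen for the example, not part of the lesson.

```python
import random

# Hypothetical population: ID numbers for 500 students (an assumption for illustration).
population = list(range(1, 501))

# Fixing the seed only makes the example repeatable; real sampling would not fix it.
rng = random.Random(42)

# sample() picks without replacement, so every student is equally likely
# to be chosen and nobody is chosen twice.
sample = rng.sample(population, k=10)

print(sorted(sample))
```

Because every member of the population has the same chance of being picked, no group is favored on purpose, which is the point of sampling randomly.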
Types of Statistical Studies:
When collecting data, the way you collect the data falls into one of 4 different types of
studies: Observational, Experimental, Simulation and Survey.
Observational:
A researcher observes and measures characteristics of interest in part of a population, but does
not interfere with or change existing conditions.
Example: A person sits on the side of a road counting vehicles running a red light at a busy
intersection.
Experimental:
A treatment is applied to part of a population and responses are observed. Another part of
the population may be used as a control group, in which no treatment is applied.
Example: A study was performed where diabetics took cinnamon extract daily while a control
group took none. After 4 days, the diabetics who took the cinnamon reduced their risk of heart disease,
while the control group experienced no change.
Simulation:
The use of a mathematical or physical model to reproduce the conditions of a situation or process.
Collecting data often involves the use of computers. Simulations allow you to study situations that are
impractical or even dangerous to create in real life, and often they save time and money.
Example: The insurance institute uses crash test dummies to determine the physical damage
done to the human body due to side impact crashes in a Smart Car.
Survey:
An investigation of one or more characteristics of a population. Most often, surveys are carried
out on people by asking them questions. The most common types of surveys are done by interview, mail,
computer or telephone. In designing a survey, it is important to word the questions so that they do not
lead to biased results, which would not be representative of the population.
Example: A question flier was sent out to new physicians to determine whether the primary
reason for their career choice is financial stability.
Sorting the data:
Categorical Data:
When we’re studying features that we can describe only with words or a yes/no answer, this is
considered categorical data. After we gather categorical data, we can easily pile it or slice it to give us a
sense of proportions in our sample.
Numerical data:
When we’re studying features that we can compare using numbers, this is considered numerical
data. As we’ll see in part two, all these numbers make numerical data much more useful overall.
The crucial difference between the two types of data is that we can’t do math on categorical data, but we
can do math on numerical data!
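To make the difference concrete, here is a small Python sketch. The survey answers are invented for illustration. With the categorical answers we can only count and compare proportions; with the numerical answers an average makes sense.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey answers (made up for illustration).
favorite_color = ["red", "blue", "blue", "green", "blue", "red"]  # categorical
hours_of_sleep = [7, 8, 6, 7, 9, 5]                                # numerical

# Categorical data: count each category and turn the counts into proportions.
counts = Counter(favorite_color)
proportions = {color: n / len(favorite_color) for color, n in counts.items()}

# Numerical data: arithmetic such as an average is meaningful.
average_sleep = mean(hours_of_sleep)

print(proportions)
print(average_sleep)
```

Asking for the "average favorite color" has no meaning, which is exactly the limitation described above.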
For better or worse, most of our brains aren’t great at processing large piles of raw numbers. So the first
thing we do after we’ve collected a big mess of numerical data is draw pictures with it.
The Histogram:
To draw a histogram of our sample, we start with a number line
Then we pile our data on top of it, piece by piece.
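The piling idea can be sketched in a few lines of Python. The quiz scores are invented for the example, and the picture is turned sideways so that each row is one spot on the number line.

```python
from collections import Counter

# Hypothetical sample of quiz scores (made up for illustration).
scores = [3, 5, 4, 5, 6, 4, 5, 7, 5, 4, 6, 5]

# Count how many data values land on each spot of the number line.
piles = Counter(scores)

# Print one row per value on the number line; each * is one data value.
for value in range(min(scores), max(scores) + 1):
    print(f"{value:>2} | {'*' * piles[value]}")
```

The tallest pile shows where the data clumps, and the short piles at the ends show how far it trails off.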
The Box Plot:
Another useful way to visualize numerical data is with a BOX PLOT. To draw a box plot of our sample, we
start with the same number line.
But in this case we cram the middle 50% of our sample values into one big box.
In general, we draw histograms when we want a complete portrait of our entire pile of data that
includes precise details. On the other hand, boxplots can be especially useful when we want an overview
of our data, or want to compare different samples or groups. Boxplots can give us a quick sense of how
data clumps together and whether it trails off in one direction or another.
[Figure: box plot. The box holds the middle 50% of the sample values; the bars indicate the minimum and the maximum individual values.]
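The numbers behind a box plot can be computed directly. The sketch below uses an invented sample and Python's `statistics.quantiles`; textbooks and calculators use slightly different quartile rules, so treat the edges of the box as one reasonable convention rather than the only answer.

```python
from statistics import quantiles

# Hypothetical sample (made up for illustration).
data = [2, 4, 4, 5, 6, 7, 8, 9, 12]

# quantiles(..., n=4) returns the three cut points Q1, median, Q3.
# The "inclusive" method is one of several quartile conventions.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

print("minimum:", min(data))
print("box runs from", q1, "to", q3)  # holds roughly the middle 50% of the values
print("median:", median)
print("maximum:", max(data))
```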
Analyzing Data:
Analyzing data is like solving a mystery. Our ultimate goal is to gather evidence from one random
sample, and use it to piece together a story about a population. When we start to investigate any pile of
data, we always look at four primary characteristics.
a. Sample size: How much data is in there?
   a. In general, a larger sample size is better.
   b. The size of the sample is directly related to the level of confidence we can have about a
      population.
   c. The size of a sample is always limited by something.
b. Shape: What does the pile look like?
   a. Flat graph: all possible outcomes are equally likely.
   b. Normal distribution (bell shape): something is causing the data to clump around one
      particular value.
      i. Z-SCORE
         1. $Z = \frac{x - \mu}{\sigma}$, where x is a single data point, μ is the population mean, and σ
            is the standard deviation.
      ii. Here is how to interpret z-scores.
          A z-score less than 0 represents a data point less than the mean.
          A z-score greater than 0 represents a data point greater than the mean.
          A z-score equal to 0 represents a data point equal to the mean.
          A z-score equal to 1 represents a data point that is 1 standard deviation
          greater than the mean; a z-score equal to 2, 2 standard deviations
          greater than the mean; etc.
          A z-score equal to -1 represents a data point that is 1 standard deviation
          less than the mean; a z-score equal to -2, 2 standard deviations less
          than the mean; etc.
          If the number of data points in the set is large and the data are roughly normally
          distributed, about 68% of the elements have a z-score between -1 and 1; about 95% have
          a z-score between -2 and 2; and about 99.7% have a z-score between -3 and 3.
   c. Skewed: something is causing the data to trail off more in one direction than the other.
c. Location: Where is it exactly on the number line?
   a. The measure of where the bulk of the data sits on the number line.
   b. Defining location with words can be tricky, so often we describe it with a single number:
      the AVERAGE (AKA MEAN).
      i. To calculate the average, we simply add up all the data values, then divide by the
         number of data values.
Example 1:
You take the SAT and score 1100. The mean score for the SAT is 1026 and the standard
deviation is 209. How well did you score on the test compared to the average test taker?
Step 1: Write your X-value into the z-score equation. For this sample question the X-value is your
SAT score, 1100.
Step 2: Write the mean, μ, into the z-score equation.
Step 3: Write the standard deviation, σ, into the z-score equation.
Step 4: Calculate the answer using a calculator:
(1100 – 1026) / 209 = .354. This means that your score was .354 standard deviations above the
mean.
Step 5: (Optional) Look up your z-value in the z-table to see what percentage of test-takers scored
below you. For a z-score of .354 (about .35), the table gives .1368; adding the .5000 that lies below
the mean gives .1368 + .5000 = .6368, or 63.68%.
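The steps of Example 1 can be checked with a short Python sketch. The function name `z_score` is just a label chosen here, and `NormalDist().cdf` stands in for the z-table lookup; because it does not round z to two decimals first, its result can differ slightly from the table value.

```python
from statistics import NormalDist

def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Values from Example 1.
z = z_score(1100, 1026, 209)

# Fraction of a standard normal distribution that falls below z.
fraction_below = NormalDist().cdf(z)

print(round(z, 3))
print(round(fraction_below, 4))
```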
      ii. $\bar{x} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$
   c. Although the average is useful and precise as a measure of location, it is not
      perfect. Averages can be deceptive: if a pile of data is skewed, the average value can be
      seriously misleading. With skewed data, the MEDIAN is often more revealing as a measure of
      location because it can give a better sense of a “typical” value.
      i. To find the median, place the numbers in value order and find the middle.
      ii. BUT, with an even amount of numbers, things are slightly different. In that case
          we find the middle pair of numbers, and then find the value that is halfway
          between them. This is easily done by adding them together and dividing by two.
Example 2:
The frequency table shows the number of job offers received by each student within two months of
graduating with a mathematics degree from a small college. What is the mean number of job offers per
student?
Job offers   0  1  2  3  4
Students     2  2  4  5  2
Mean: $\bar{x} = \frac{2(0) + 2(1) + 4(2) + 5(3) + 2(4)}{15} = \frac{33}{15} = 2.2$
The mean of this data set is 2.2.
** Multiply each value by the number of times it occurs. For example, 4 students got 2 job offers each, so
that was a total of 8 job offers given.
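The same calculation can be sketched in Python straight from the frequency table. The dictionary name `offers_to_students` is simply a label chosen for this example.

```python
# Frequency table from Example 2: number of job offers -> number of students.
offers_to_students = {0: 2, 1: 2, 2: 4, 3: 5, 4: 2}

# Total offers: each offer count multiplied by how many students received it.
total_offers = sum(offers * students for offers, students in offers_to_students.items())

# Total number of students in the table.
total_students = sum(offers_to_students.values())

mean_offers = total_offers / total_students
print(total_offers, total_students, mean_offers)
```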
d. Spread: Where does the data start and end?
   a. The measure of the width of a pile of data, but also a measure of variation.
   b. The most common measure of spread is the STANDARD DEVIATION (s for a sample; σ, sigma, for a
      population).
      i. Our goal when we calculate the standard deviation is to get a sense of the typical distance
         from the average value. Here’s how to do it (mostly in plain English :)
         1. Calculate the distance between each measurement x and the sample
            average $\bar{x}$. We call this distance the DEVIATION.
         2. Square each deviation.
         3. Add up all the squared deviations.
         4. Divide the sum by n-1 (if we stop here we get what’s called the
            VARIANCE).
         5. Take the square root of the whole shebang.
      ii. $s = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n - 1}}$
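The five steps can be followed line by line in a short Python sketch. The function name and the sample values are invented for illustration.

```python
import math

def sample_standard_deviation(values):
    """Follow the five steps for a list of sample measurements."""
    n = len(values)
    average = sum(values) / n
    deviations = [x - average for x in values]  # step 1: distance from the average
    squared = [d ** 2 for d in deviations]      # step 2: square each deviation
    total = sum(squared)                        # step 3: add up the squares
    variance = total / (n - 1)                  # step 4: divide by n - 1 (the variance)
    return math.sqrt(variance)                  # step 5: take the square root

# Hypothetical sample (made up for illustration).
print(sample_standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))
```

Squaring in step 2 keeps the negative and positive deviations from canceling each other out when they are added in step 3.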
Example 3:
The frequency table shows the number of job offers received by each student within two months
of graduating with a mathematics degree from a small college. What is the median number of job
offers per student?
Job offers   0  1  2  3  4
Students     2  2  4  5  2
Median: 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4
n = 15, so the middle term is term (15 + 1) / 2 = 8. The 8th value in the ordered list will be our median.
The median of the data will be 2.
** List each value the number of times it occurs. For example, there are 5 students with 3 job
offers, so you would have five 3s in the list because 3 job offers occurs 5 times.
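As a quick check, the sketch below writes out the list from the frequency table and lets Python's `statistics.median` find the middle value. The variable names are labels chosen for this example.

```python
from statistics import median

# Frequency table from Example 3: number of job offers -> number of students.
offers_to_students = {0: 2, 1: 2, 2: 4, 3: 5, 4: 2}

# Write each value out once per student, in increasing order.
values = []
for offers, students in sorted(offers_to_students.items()):
    values.extend([offers] * students)

# With an odd count the median is the single middle value;
# statistics.median also handles the even case by averaging the middle pair.
print(len(values), median(values))
```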
e. The last thing to consider when looking at data is the MODE.
   a. The mode is the number that occurs most often in the data set.
   b. There can be 2 modes (BIMODAL) or 3 modes (TRIMODAL).
Example 4:
The frequency table shows the number of job offers received by each student within two months
of graduating with a mathematics degree from a small college. What is the standard deviation
of the job offers per student?
Job offers   0  1  2  3  4
Students     2  2  4  5  2
Standard deviation: $s = \sqrt{\frac{(0 - 2.2)^2 + (0 - 2.2)^2 + (1 - 2.2)^2 + (1 - 2.2)^2 + (2 - 2.2)^2 + \dots + (4 - 2.2)^2 + (4 - 2.2)^2}{15 - 1}} = \sqrt{\frac{22.4}{14}} \approx 1.265$
n = 15
The standard deviation of the data will be about 1.265.
** List each value the number of times it occurs. For example, there are 5 students with 3 job
offers, so you should have $(3 - 2.2)^2$ five times.
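The result can be checked with Python's `statistics.stdev`, which divides by n - 1 just like the formula for s. The expanded list is built from the frequency table; the names are labels chosen for this sketch.

```python
from statistics import stdev

# Frequency table from Example 4: number of job offers -> number of students.
offers_to_students = {0: 2, 1: 2, 2: 4, 3: 5, 4: 2}

# List each value once for every student who received that many offers.
values = [offers for offers, students in offers_to_students.items() for _ in range(students)]

# stdev computes the sample standard deviation (it divides by n - 1).
s = stdev(values)
print(len(values), round(s, 3))
```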
Example 5:
The frequency table shows the number of job offers received by each student within two months
of graduating with a mathematics degree from a small college. What is the mode for the job
offers per student?
Job offers   0  1  2  3  4
Students     2  2  4  5  2
Mode: 0, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4
The mode of the data will be 3, since it occurs the most often.
** List each value the number of times it occurs. For example, there are 5 students with 3 job
offers, so you would have five 3s in the list because 3 job offers occurs 5 times.
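With a frequency table, the mode can also be read off directly: it is the value with the largest count. The sketch below does it both ways; `multimode` returns every value that ties for most frequent, which covers the bimodal and trimodal cases. The variable names are labels chosen for this example.

```python
from statistics import multimode

# Frequency table from Example 5: number of job offers -> number of students.
offers_to_students = {0: 2, 1: 2, 2: 4, 3: 5, 4: 2}

# Straight from the table: keep the value(s) with the largest student count.
largest_count = max(offers_to_students.values())
modes = [offers for offers, students in offers_to_students.items() if students == largest_count]

# From the written-out list of values.
values = [offers for offers, students in offers_to_students.items() for _ in range(students)]
print(modes, multimode(values))
```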
Using the TI calculator to find mean, median and mode:
Step 1: Use STAT EDIT to enter the data in L1 (dependent values).
Step 2: In STAT CALC, select 1-Var Stats.
Example 6:
The table displays the number of US hurricane strikes by decade from 1851 to 2000.
What are the mean and standard deviation for the data set?
Decade    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Strikes  19  15  20  22  21  18  21  13  19  24  17  14  12  15  14
Step 1: Use STAT EDIT to enter the data in L1
The dependent data for this data set are the strikes; enter them into L1.
Step 2: Use STAT CALC and select 1-Var Stats.
Press ENTER until your screen looks like the one below.
ANS: The mean is 17.6 and the standard deviation is about 3.5 (this is the population standard
deviation; the sample standard deviation is about 3.6).
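[Calculator screen: 1-Var Stats output, with labels marking the sample mean, the sample standard deviation, the population standard deviation, and the population size. Scroll down on the calculator while still looking at 1-Var Stats to see the median.]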
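For readers without a TI calculator, the same summary can be sketched with Python's `statistics` module. The labels Sx and sigmax in the printout mirror the calculator's sample and population standard deviations; the variable name `strikes` is chosen for this example.

```python
from statistics import mean, median, pstdev, stdev

# Hurricane strikes per decade from Example 6.
strikes = [19, 15, 20, 22, 21, 18, 21, 13, 19, 24, 17, 14, 12, 15, 14]

print("n      =", len(strikes))
print("mean   =", round(mean(strikes), 1))
print("Sx     =", round(stdev(strikes), 2))   # sample standard deviation (divides by n - 1)
print("sigmax =", round(pstdev(strikes), 2))  # population standard deviation (divides by n)
print("median =", median(strikes))
```

Which standard deviation to report depends on whether the 15 decades are treated as the whole population or as a sample of a longer record.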