Probability
Definition: randomness, chance, likelihood, proportion, percentage, odds.
Probability is the mathematical ideal.
Not sure what will happen in a single event but, in the long run, certain patterns emerge.
We use letters like X and Y to represent quantities. These are called random variables.
Probability Model
List the outcomes for a given event (experiment or question) and the associated probabilities.
Example: pick a card out of a standard deck
S: sample space (contains all possible outcomes)
Event: single outcome or collection of outcomes
S: sample space contains 52 outcomes (52 cards)
Event: pick a
pick a
Basic Rules
1. Event A has probability P(A), which is between 0 and 1 (inclusive).
2. Probability of the entire sample space, P(S), is 1.
3. Addition: If two events are disjoint (nothing in common), then P(A or B) = P(A) + P(B).
4. Complement: P(not A) = 1 − P(A).
Probability Model for standard deck of cards
52 cards, 4 suits (Diamonds, Hearts, Clubs, Spades)
Each suit has 13 cards: 2, 3, 4, 5, …, 9, 10, J, Q, K, A
P(picking any single card)=
A = event that a red 5 is picked
B = event that a club is picked
C = event that a face card (J, Q, K) is picked
P(A or B)=
P(not C) =
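The card questions above can be checked by counting outcomes. The following is a minimal sketch, assuming a standard 52-card deck with equally likely cards; the names `DECK` and `prob` are illustrative and not part of the slides.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["Diamonds", "Hearts", "Clubs", "Spades"]
DECK = set(product(RANKS, SUITS))  # sample space S: 52 (rank, suit) pairs


def prob(event):
    """Probability of an event (a set of cards) when every card is equally likely."""
    return Fraction(len(event), len(DECK))


A = {c for c in DECK if c[0] == "5" and c[1] in ("Diamonds", "Hearts")}  # red 5
B = {c for c in DECK if c[1] == "Clubs"}                                 # club
C = {c for c in DECK if c[0] in ("J", "Q", "K")}                         # face card

p_single = prob({("A", "Spades")})  # any one specific card
p_a_or_b = prob(A) + prob(B)        # addition rule; valid because A and B are disjoint
p_not_c = 1 - prob(C)               # complement rule

print(p_single, p_a_or_b, p_not_c)
```

Using `Fraction` keeps the answers exact, so they can be compared directly with hand-computed fractions.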
Discrete Model
If the sample space is finite, the probability model is called discrete.
Roll 2 six-sided dice and record the sum.
List all outcomes and associated probabilities in a table.
Sum    2     3     4     5    6     7    8     9    10    11    12
Prob.  1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36
P(sum __ 6) =
P(sum __ 9) =
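The probability table for the sum of two dice can be rebuilt by listing all 36 equally likely rolls. This is a small sketch, assuming fair six-sided dice; the two events evaluated at the end (a sum equal to 6, a sum of at least 9) are illustrative choices, not necessarily the ones asked on the slide.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
rolls = list(product(range(1, 7), repeat=2))

# Count how many rolls give each sum, then convert counts to probabilities.
counts = Counter(a + b for a, b in rolls)
model = {s: Fraction(c, len(rolls)) for s, c in sorted(counts.items())}

for s, p in model.items():
    print(s, p)

p_sum_is_6 = model[6]
p_sum_at_least_9 = sum(p for s, p in model.items() if s >= 9)
```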
Continuous Model
If sample space contains a range of values, the probability model is called continuous.
Density curves record probability as the area under the
curve for a given range of outcomes.
So, total area under the curve will always equal 1.
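Continuous Model – Example 1
The uniform distribution for any real number, X, from 3 to 7 looks like:
[Figure: uniform density curve of height 1/4 over the interval from 3 to 7, axis marked at 3, 5, 7; total area = 1. Further panels mark the value 6 and the values 3.2 and 5.1.]
P(X __ 6) =
P(X __ 3.2 or X __ 5.1) =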
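For a uniform density, every probability is the area of a rectangle: width of the interval times the height 1/4. The sketch below assumes that reading; the helper `uniform_prob` is a hypothetical name, and the directions of the inequalities (below 6, below 3.2 or above 5.1) are assumptions made for illustration.

```python
LOW, HIGH = 3.0, 7.0
HEIGHT = 1 / (HIGH - LOW)  # 1/4, so that the total area is 1


def uniform_prob(a, b):
    """P(a < X < b): area of the rectangle over the part of (a, b) inside [LOW, HIGH]."""
    left, right = max(a, LOW), min(b, HIGH)
    return max(0.0, right - left) * HEIGHT


total_area = uniform_prob(3, 7)
p_below_6 = uniform_prob(3, 6)                           # assumed direction: X < 6
p_tails = uniform_prob(3, 3.2) + uniform_prob(5.1, 7)    # assumed: X < 3.2 or X > 5.1

print(total_area, p_below_6, round(p_tails, 3))
```

The two tail regions are disjoint, so their areas are simply added.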
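Continuous Model – Example 2
The symmetric triangular distribution for any real number, X, from 0 to 8 looks like:
[Figure: triangular density curve rising from 0 to a peak at 4 and falling back to 0 at 8; total area = 1.]
P(X __ 4) =
P(X __ 6) =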
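Areas under the triangular density can be found with the triangle area formula (half of base times height). The sketch below assumes the peak sits at 4 with height 1/4, which is what a total area of 1 over a base of 8 requires; the inequality directions used at the end are illustrative assumptions.

```python
LOW, PEAK, HIGH = 0.0, 4.0, 8.0
PEAK_HEIGHT = 2 / (HIGH - LOW)  # 1/4: half of base (8) times height must equal 1


def density(x):
    """Height of the triangular density curve at x."""
    if x < LOW or x > HIGH:
        return 0.0
    if x <= PEAK:
        return PEAK_HEIGHT * (x - LOW) / (PEAK - LOW)
    return PEAK_HEIGHT * (HIGH - x) / (HIGH - PEAK)


def area_below(x):
    """P(X < x), computed from triangle areas."""
    if x <= LOW:
        return 0.0
    if x >= HIGH:
        return 1.0
    if x <= PEAK:
        return 0.5 * (x - LOW) * density(x)       # small triangle on the left
    return 1.0 - 0.5 * (HIGH - x) * density(x)    # 1 minus small triangle on the right


p_below_4 = area_below(4)       # assumed direction: X < 4
p_above_6 = 1 - area_below(6)   # assumed direction: X > 6

print(p_below_4, p_above_6)
```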
More Probability
Use Venn diagrams to visualize probability rules.
If events are disjoint, don’t overlap circles.
Sample space, S: rectangle
Events (A, B, C, …): circles inside
Keep track of # of outcomes in each region of the rectangle.
Venn diagram - example
Example: pick a card out of a standard deck
S: sample space contains 52 outcomes (52 cards)
A = event that a red 5 is picked
B = event that a club is picked
C = event that a face card (J, Q, K) is picked
[Venn diagram: rectangle S containing circles A, B, and C.]
General Addition Rule
A = event that a red card is picked
B = event that a number card is picked
[Venn diagram: rectangle S containing overlapping circles A and B.]
P(A or B) =
General Addition: P(A or B) = P(A) + P(B) − P(A and B)
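The general addition rule can be checked against a direct count of the union. This sketch assumes "number card" means ranks 2 through 10 (aces excluded); that definition is an assumption, and the answer changes if aces are counted.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["Diamonds", "Hearts", "Clubs", "Spades"]
deck = set(product(RANKS, SUITS))

NUMBER_RANKS = {str(n) for n in range(2, 11)}  # assumption: ace is not a number card

red = {c for c in deck if c[1] in ("Diamonds", "Hearts")}   # event A
number = {c for c in deck if c[0] in NUMBER_RANKS}          # event B


def prob(event):
    return Fraction(len(event), len(deck))


# General addition rule: subtract the overlap so it is not counted twice.
p_a_or_b = prob(red) + prob(number) - prob(red & number)

# Direct count of the union, for comparison.
p_union_direct = prob(red | number)

print(p_a_or_b, p_union_direct)
```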
Conditional Probability Rule
Given a condition (you know something happened), how does that change the chances of something else happening?
P(B|A) = probability of B given A
P(B|A) = P(A and B) / P(A)
[Venn diagram: rectangle S containing overlapping circles A and B.]
General Multiplication Rule
Rewrite P(B|A) = P(A and B) / P(A) to get:
P(A and B) = P(A) × P(B|A)
Experiment: pick two cards out of a standard deck
P(1st K and 2nd K) =
P(1st K and 2nd A) =
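The two-card questions apply the general multiplication rule with the second probability conditioned on the first draw. The sketch below assumes the cards are drawn without replacement, and it cross-checks the rule by counting all ordered pairs of distinct cards; `brute_force` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import permutations, product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["Diamonds", "Hearts", "Clubs", "Spades"]
deck = list(product(RANKS, SUITS))

# Multiplication rule: P(A and B) = P(A) * P(B|A).
p_king_then_king = Fraction(4, 52) * Fraction(3, 51)  # one king is gone on the 2nd draw
p_king_then_ace = Fraction(4, 52) * Fraction(4, 51)   # all four aces remain

# Cross-check by listing every ordered pair of two different cards (52 * 51 pairs).
ordered_pairs = list(permutations(deck, 2))


def brute_force(first_rank, second_rank):
    hits = sum(1 for a, b in ordered_pairs
               if a[0] == first_rank and b[0] == second_rank)
    return Fraction(hits, len(ordered_pairs))


print(p_king_then_king, brute_force("K", "K"))
print(p_king_then_ace, brute_force("K", "A"))
```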
Independent Events
Two (or more) events are independent if knowledge of one event does not change the chances of the other.
Multiplication Rule for Independent Events: P(A and B) = P(A) × P(B)
P(rolling three 7's in a row) =
For a cholesterol-lowering drug, there is a 5% chance that a loss-of-sleep side effect will occur.
What are the chances that two people picked at random take the drug and experience sleep loss?
P(1st loses and 2nd loses) =
What are the chances that at least 1 out of 3 loses sleep?
P(at least one) =
P(three people lose sleep) =
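For independent events the probabilities multiply, and "at least one" is easiest through the complement "none". The sketch below assumes each person's side effect is independent of the others, and that the 7's refer to the sum of two fair dice as in the earlier dice table.

```python
from fractions import Fraction

p_loss = Fraction(5, 100)  # 5% chance of the sleep-loss side effect

p_both_lose = p_loss * p_loss            # 1st loses and 2nd loses
p_none_of_three = (1 - p_loss) ** 3      # nobody out of three loses sleep
p_at_least_one = 1 - p_none_of_three     # complement rule
p_all_three = p_loss ** 3                # all three lose sleep

p_seven = Fraction(6, 36)                # assumption: sum of two fair dice equals 7
p_three_sevens = p_seven ** 3            # three 7's in a row

print(float(p_both_lose), float(p_at_least_one), float(p_all_three), p_three_sevens)
```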
The Normal Distribution
Curve will capture 100% of all observations. Hence, there will be a total area of 1 below it.
Then the area under the curve for a given range of values will represent the proportion (percent, fraction) of observations that fall in that range.
Curves and proportions
Use curves to describe the overall pattern seen in a histogram.
The proportion of scores above 80 is roughly 26.8%.
The area under the density curve for scores above 80 is roughly 0.261 = 26.1%.
[Figure: histogram and density curve of scores, horizontal axis from 20 to 100.]
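Means and Medians
The median of a density curve is the point where the area under the curve is cut in half.
The mean of a density curve is the balance point, where the curve would balance if it were made of solid material.
On symmetric curves: the mean and the median are the same.
On skewed curves: the mean is pulled toward the long tail, away from the median.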
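Normal curves are special kinds of density curves
• Symmetric, single peaked, bell-shaped
Use μ (mu) and σ (sigma) to talk about the mean and standard deviation.
μ – mean (center of the curve)
σ – standard deviation (spread of the curve)
[Figure: Normal curve with axis marked at μ − σ, μ, μ + σ.]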
68-95-99.7 Rule
• About 68% of data fall within 1σ of μ
• About 95% of data fall within 2σ of μ
• About 99.7% of data fall within 3σ of μ
[Figure: Normal curve with axis marked at μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ.]
Example 1
Grasshopper jumps can be described by a Normal distribution with μ = 12 inches and σ = 2 inches.
About 68% of all jumps are between ____ and ____ inches.
[Figure: Normal curve with axis marked at 6, 8, 10, 12, 14, 16, 18 and the central 68%, 95%, and 99.7% regions labeled.]
About where would you find the top 2.5%?
Example 1 – continued
What % falls below 14 inches?
What % of jumps are more than 14 inches?
What % of jumps are between 14 and 16 inches?
[Figure: Normal curves with axis marked at 6, 8, 10, 12, 14, 16, 18, one per question.]
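The grasshopper questions need only the 68-95-99.7 rule and the symmetry of the Normal curve. The sketch below works in approximate percentages taken from the rule; the helper names are illustrative.

```python
MU, SIGMA = 12, 2  # grasshopper jumps, in inches

# Approximate percent of observations within k standard deviations of the mean.
WITHIN = {1: 68.0, 2: 95.0, 3: 99.7}


def central_interval(k):
    """Interval mu - k*sigma to mu + k*sigma."""
    return (MU - k * SIGMA, MU + k * SIGMA)


def tail_above(k):
    """Percent above mu + k*sigma: half of what lies outside the central interval."""
    return (100 - WITHIN[k]) / 2


interval_68 = central_interval(1)                 # where about 68% of jumps fall
below_14 = 100 - tail_above(1)                    # 14 is mu + 1 sigma
above_14 = tail_above(1)
between_14_and_16 = tail_above(1) - tail_above(2) # 16 is mu + 2 sigma
top_2_5_cutoff = MU + 2 * SIGMA                   # top 2.5% starts near mu + 2 sigma

print(interval_68, below_14, above_14, between_14_and_16, top_2_5_cutoff)
```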
Finding values without 68-95-99.7
We use tables or calculators to find harder values, like where is the top 10% or what percent falls below a given observation.
N(μ, σ) means observations come from a Normal distribution with a mean of μ and a standard deviation of σ.
Standardize observation x from N(μ, σ) by:
z = (x − μ) / σ
The standardized value is called a z-score.
Two functions on the calculator
(found under 2nd VARS => DISTR)
• normalcdf( : will give area between two bounds for a given μ, σ.
• invNorm( : will give the observation that has a particular area to its left for a given μ, σ.
• normalcdf(lower bound, upper bound, μ, σ)
• invNorm(area, μ, σ)
[Figure: Normal curve with axis marked at μ − σ, μ, μ + σ.]
Using the calculator with grasshopper N(12, 2)
What % of jumps fall below 17 inches?
No lower bound, so:
normalcdf(lower bound, upper bound, μ, σ) = normalcdf( ____ ) = area below 17 = ____
[Figure: N(12, 2) curve with axis marked at 6, 8, 10, 12, 14, 16, 18.]
What % of jumps fall above 11.5 inches?
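[Figure: N(12, 2) curve with axis marked at 6, 8, 10, 12, 14, 16, 18.]
First, find the area below 11.5:
normalcdf(lower bound, upper bound, μ, σ) = normalcdf( ____ ) = area = ____
Since total area is 1 and we have the area below 11.5, the area we want is 1 − ____ = ____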
Using the table with grasshopper N(12, 2)
What % of jumps fall between 10 and 16.36 inches?
Calculator does this all at once with the normalcdf( function.
Area between = area below 16.36 – area below 10.
[Figure: three N(12, 2) curves, axis marked at 6, 8, 10, 12, 14, 16, 18, showing area between = area below 16.36 − area below 10.]
normalcdf(lower bound, upper bound, μ, σ) = normalcdf( ____ ) = area between = ____
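The three area questions for N(12, 2) can be reproduced without a calculator. This is a sketch using `NormalDist` from Python's standard `statistics` module as a stand-in for the calculator; the `normalcdf` helper is a hypothetical name that mimics the calculator function and is not a library call.

```python
from statistics import NormalDist

jumps = NormalDist(mu=12, sigma=2)  # grasshopper jumps, in inches


def normalcdf(lower, upper, dist):
    """Area between two bounds: area below upper minus area below lower."""
    return dist.cdf(upper) - dist.cdf(lower)


below_17 = jumps.cdf(17)                        # no lower bound needed
above_11_5 = 1 - jumps.cdf(11.5)                # total area is 1
between_10_and_16_36 = normalcdf(10, 16.36, jumps)

print(round(below_17, 4), round(above_11_5, 4), round(between_10_and_16_36, 4))
```

`NormalDist.cdf(x)` returns the area to the left of x, which is why the "above" question subtracts from 1 and the "between" question subtracts two left-hand areas.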
Using the table with grasshopper N(12, 2)
What jumps fell in the top 10%?
Use invNorm function to find that observation.
[Figure: N(12, 2) curve with axis marked at 6, 8, 10, 12, 14, 16, 18 and the top 10% marked.]
What observation has an area of .10 above it?
What observation has an area of .90 below it?
invNorm(area, μ, σ) = invNorm( ____ ) = value with .9 area below = ____
Using the table with grasshopper N(12, 2)
Where do the middle 50% fall?
Use invNorm function to find those observations.
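[Figure: N(12, 2) curve with axis marked at 6, 8, 10, 12, 14, 16, 18 and the middle 50% marked.]
What observation has an area of ____ below it?
What observation has an area of ____ below it?
invNorm(area, μ, σ) = invNorm( ____ ) = value with ____ area below = ____
invNorm(area, μ, σ) = invNorm( ____ ) = value with ____ area below = ____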
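Both invNorm questions (the top 10% and the middle 50%) ask for the observation with a given area to its left. The sketch below uses `NormalDist.inv_cdf` from the standard library as a stand-in for the calculator's invNorm; treating the middle 50% as the range between the 25% and 75% points is the usual reading and is stated here as an assumption.

```python
from statistics import NormalDist

jumps = NormalDist(mu=12, sigma=2)  # grasshopper jumps, in inches

# Top 10%: the cutoff has 0.10 of the area above it, so 0.90 below it.
top_10_cutoff = jumps.inv_cdf(0.90)

# Middle 50%: 25% of the area lies below the lower end and 75% below the upper end.
middle_low = jumps.inv_cdf(0.25)
middle_high = jumps.inv_cdf(0.75)

print(round(top_10_cutoff, 2), round(middle_low, 2), round(middle_high, 2))
```

Because the curve is symmetric, the two ends of the middle 50% sit the same distance on either side of the mean of 12.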
Sampling Distributions
Know the entire population:
(parameter)
Know only a sample (SRS):
(statistic)
Law of Large Numbers: as you increase the sample size, the sample mean tends to get closer to the population mean.
Population = 3, 3, 8, 15, 20, 21, 22, 31, 39
Sample of size 1= 8
Sample of size 2= 8, 22
Sample of size 3= 8, 22, 31
Sample of size 4= 8, 22, 31, 3
Sample of size 5= 8, 22, 31, 3, 20
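The growing samples listed above can be turned into running sample means and compared with the population mean. This is a minimal sketch using the values exactly as given; the variable names are illustrative.

```python
from statistics import mean

population = [3, 3, 8, 15, 20, 21, 22, 31, 39]
draws = [8, 22, 31, 3, 20]  # the sample grows by one value at a time

population_mean = mean(population)

# Sample mean after 1, 2, 3, 4, and 5 observations.
running_means = [mean(draws[:n]) for n in range(1, len(draws) + 1)]

for n, m in enumerate(running_means, start=1):
    print(n, round(m, 2), "distance from population mean:", round(abs(m - population_mean), 2))
```

With samples this small the distance does not shrink at every step; the law describes the long-run tendency as the sample size keeps growing.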
Population of 7 people and their weights (in pounds)
122, 140, 150, 155, 160, 170, 195; μ = 156
Samples of size 1: {122}, {140}, {150}, {155}, {160}, {170}, {195}
Mark off the sample mean for each sample with an “x”
[Dot plot: the 7 sample means marked with x on an axis from 120 to 200.]
Samples of size 2: {122, 140}, {122, 150}, {122, 155}, {122, 160}, {122, 170}, {122, 195}, {140, 150}, …, {170, 195}. There are 21 possible samples.
Mark off the sample mean for each sample with an “x”
[Dot plot: the 21 sample means marked with x on an axis from 120 to 200.]
Population of 7 people (continued)
140, 122, 160, 195, 150, 155, 170; μ = 156
Samples of size 1:
[Dot plot: the 7 sample means marked with x on an axis from 120 to 200.]
Samples of size 2:
[Dot plot: the 21 sample means marked with x on the same axis.]
Samples of size 6: 7 possible samples of this size. {122, 140, 160, 150, 155, 170}, …
[Dot plot: the 7 sample means marked with x on an axis from 140 to 180, with the mean 149.5 of the listed sample labeled.]
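Every possible sample of a given size can be listed, which shows how the sample means bunch up around μ = 156 as the sample size grows. This is a sketch using `itertools.combinations`, assuming samples are drawn without replacement and order does not matter.

```python
from itertools import combinations
from statistics import mean

weights = [122, 140, 150, 155, 160, 170, 195]  # pounds
mu = mean(weights)


def sample_means(n):
    """Sorted sample means of every possible sample of size n."""
    return sorted(mean(s) for s in combinations(weights, n))


for n in (1, 2, 6):
    means = sample_means(n)
    print(f"n={n}: {len(means)} samples, "
          f"means from {min(means)} to {round(max(means), 2)}, "
          f"average of means = {mean(means)}")
```

The average of the sample means equals the population mean for every sample size, while the range of the sample means narrows as n increases.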
Sampling distribution of x̄
Sampling from a large population with mean μ and standard deviation σ: samples of size n will have their sample means distributed with a mean μ and standard deviation σ over root n.
μ_x̄ = μ
σ_x̄ = σ/√n
If population is N(μ, σ), then x̄ is N(μ, σ/√n).
If population is not Normal but n is large, then x̄ is approximately N(μ, σ/√n).
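A simulation can illustrate the claim for a population that is not Normal. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a strongly skewed shape chosen only for illustration), samples of size 40, and a fixed random seed so the run is repeatable.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(1)

POP_MEAN, POP_SD = 1.0, 1.0   # exponential population with rate 1 (assumed example)
N = 40                        # sample size
NUM_SAMPLES = 5000            # number of simulated samples

# Draw many samples of size N and record each sample mean.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(N)) / N
    for _ in range(NUM_SAMPLES)
]

observed_mean = mean(sample_means)
observed_sd = stdev(sample_means)
predicted_sd = POP_SD / sqrt(N)   # sigma over root n

print(round(observed_mean, 3), round(observed_sd, 3), round(predicted_sd, 3))
```

The observed center and spread of the simulated sample means should land close to μ and σ/√n, with small differences from run to run if the seed is changed.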
Ex. 1 - Weight of eggs is N(65, 3)
Your egg carton holds 9 eggs, so consider each carton as a random sample of 9 eggs. Let X be the weight of a single egg in grams and x̄ be the average weight of your carton.
What is the sampling distribution for your carton’s average weight?
μ_x̄ = ____    σ_x̄ = ____
P(X __ 67) =
P(x̄ __ 67) =
Weight of eggs is N(65, 3) – continued
Mean weight of carton is N(65, 1)
[Figure: Normal curves with axis marked at 56, 59, 62, 65, 68, 71, 74 and the value 67 marked.]
Convert 67 to a z-score for a single egg:
Convert 67 to a z-score for the carton:
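The same observation, 67 grams, gives very different z-scores for a single egg and for a carton average, because the carton average has the smaller standard deviation σ/√n. The sketch below computes both; the "above 67" direction for the probabilities is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 65, 3, 9  # egg weight in grams, 9 eggs per carton

single_egg = NormalDist(MU, SIGMA)
carton_mean = NormalDist(MU, SIGMA / sqrt(N))  # N(65, 1)

# z = (x - mu) / (standard deviation of the quantity being standardized)
z_single = (67 - MU) / single_egg.stdev
z_carton = (67 - MU) / carton_mean.stdev

# Assumed direction: probability of exceeding 67 grams.
p_single_above = 1 - single_egg.cdf(67)
p_carton_above = 1 - carton_mean.cdf(67)

print(round(z_single, 2), round(z_carton, 2))
print(round(p_single_above, 4), round(p_carton_above, 4))
```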
Ex. 2 - Length of trout is N(17.5, 2.5)
Your local waters contain a multitude of trout. Let X be the length of a single fish in inches and x̄ be the average length of your daily catch of five fish.
What is the sampling distribution for your daily catch?
μ_x̄ = ____    σ_x̄ = ____
P(16 __ X __ 19) =
P(16 __ x̄ __ 19) =
Trout length is N(17.5, 2.5) – continued
Mean length of daily catch is N(17.5, 1.118)
[Figure: Normal curves with axis marked at 10, 12.5, 15, 17.5, 20, 22.5, 25.]
Convert 16 to a z-score for a single fish:
Convert 16 to a z-score for the daily catch:
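For the trout, the interval from 16 to 19 inches captures far more of the distribution of the daily-catch average than of single fish. The sketch below assumes the question asks for the probability of falling between 16 and 19 inches.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 17.5, 2.5, 5  # trout length in inches, five fish per daily catch

single_fish = NormalDist(MU, SIGMA)
daily_catch_mean = NormalDist(MU, SIGMA / sqrt(N))  # about N(17.5, 1.118)

z_single = (16 - MU) / single_fish.stdev
z_catch = (16 - MU) / daily_catch_mean.stdev

# Area between 16 and 19 = area below 19 minus area below 16.
p_single_between = single_fish.cdf(19) - single_fish.cdf(16)
p_catch_between = daily_catch_mean.cdf(19) - daily_catch_mean.cdf(16)

print(round(daily_catch_mean.stdev, 3), round(z_single, 2), round(z_catch, 2))
print(round(p_single_between, 4), round(p_catch_between, 4))
```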
Ex 3 - Length of trout is N(10, 2)
Your fishing pond has another type of trout. Let X be the length of a single fish in inches taken at random and x̄ be the average length of a sample of 16 fish.
What is the sampling distribution for a sample of 16 fish?
μ_x̄ = ____    σ_x̄ = ____
P(X __ 10.72) =
P(x̄ __ 10.72) =
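With 16 fish the standard deviation of the sample mean is 2/√16 = 0.5, so 10.72 inches is a much more unusual value for an average than for a single fish. The sketch below reports the area on both sides of 10.72, since which side the example asks for is not assumed here.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 10, 2, 16  # trout length in inches, sample of 16 fish

single_fish = NormalDist(MU, SIGMA)
sample_mean = NormalDist(MU, SIGMA / sqrt(N))  # N(10, 0.5)

z_single = (10.72 - MU) / single_fish.stdev
z_mean = (10.72 - MU) / sample_mean.stdev

# Area below and above 10.72 for each distribution.
single_below = single_fish.cdf(10.72)
single_above = 1 - single_below
mean_below = sample_mean.cdf(10.72)
mean_above = 1 - mean_below

print(round(z_single, 2), round(single_below, 4), round(single_above, 4))
print(round(z_mean, 2), round(mean_below, 4), round(mean_above, 4))
```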