
Upload: edmund-cain

Post on 31-Dec-2015


Probability

Definition: randomness, chance, likelihood, proportion, percentage, odds.

Probability is the mathematical ideal.

Not sure what will happen in a single event but, in the long run, certain patterns emerge.

We use letters like X and Y to represent quantities. These will be called random variables.

Probability Model

List the outcomes for a given event (experiment or question) and associated probabilities.

Example: pick a card out of a standard deck

S: sample space (contains all possible outcomes)

Event: single outcome or collection of outcomes

S: sample space contains 52 outcomes (52 cards)

Event: pick a [card image]

pick a [card image]

Basic Rules

1. Event A has probability P(A), which is between 0 and 1 (inclusive).

2. Probability of entire sample space, P(S), is 1.

3. Addition: If two events are disjoint (nothing in common), then P(A or B) = P(A) + P(B).

4. Complement: P(not A) = 1 − P(A).

Probability Model for standard deck of cards

52 cards, 4 suits (Diamonds, Hearts, Clubs, Spades)

Each suit has 13 cards: 2, 3, 4, 5, … , 9, 10, J, Q, K, A

P(picking any single card) = 1/52

A = event that a red 5 is picked

B = event that a club is picked

C = event that a face card (J, Q, K) is picked

P(A or B) = 2/52 + 13/52 = 15/52 (A and B are disjoint)

P(not C) = 1 − 12/52 = 40/52

Discrete Model

If sample space is finite, the probability model is called discrete.

Roll 2 six-sided dice and record the sum.

List all outcomes and associated probabilities in a table.

Sum    2     3     4     5    6     7    8     9    10    11    12
Prob.  1/36  1/18  1/12  1/9  5/36  1/6  5/36  1/9  1/12  1/18  1/36

P(sum __ 6)

P(sum __ 9)
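The table can be rebuilt by listing all 36 equally likely rolls. The sketch below does that in Python; the two events at the end are illustrative choices for this sketch, since the comparison symbols in the slide's questions were not recovered.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die 1, die 2) outcomes and count each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each sum = (number of ways to roll it) / 36.
dist = {total: Fraction(ways, 36) for total, ways in sorted(counts.items())}

for total, prob in dist.items():
    print(total, prob)

# Illustrative events (assumed, not taken from the slide):
p_sum_is_6 = dist[6]
p_sum_at_least_9 = sum(p for total, p in dist.items() if total >= 9)
```

Because the model is discrete, the probability of any event is just the sum of the table entries for the outcomes it contains.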

Continuous Model

If sample space contains a range of values, the probability model is called continuous.

Density curves record probability as the area under the

curve for a given range of outcomes.

So, total area under the curve will always equal 1.

Continuous Model – Example 1

The uniform distribution for any real number, X, from 3 to 7 looks like:

[Figure: flat density curve of height 1/4 over the interval 3 to 7; axis marked 3, 5, 7; total area = 1]

P(X __ 6)

P(X __ 3.2 or X __ 5.1)
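For a uniform density, every probability is a rectangle area: width times height. A minimal sketch follows; the function name `uniform_prob` is hypothetical, and the two example events assume particular inequality directions because the symbols on the slide were lost.

```python
from fractions import Fraction

LOW, HIGH = 3, 7
HEIGHT = Fraction(1, HIGH - LOW)  # 1/4, so the total area is 1

def uniform_prob(a, b):
    """P(a < X < b) for X uniform on [LOW, HIGH]: area = width * height."""
    a = max(Fraction(a), LOW)   # clip the interval to the sample space
    b = min(Fraction(b), HIGH)
    return max(b - a, Fraction(0)) * HEIGHT

# Assumed readings of the slide's questions (directions are a guess):
p_above_6 = uniform_prob(6, 7)                            # P(X > 6)
p_tails = uniform_prob(3, "3.2") + uniform_prob("5.1", 7)  # P(X < 3.2 or X > 5.1)
```

Decimal bounds are passed as strings so that `Fraction` keeps them exact.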

Continuous Model – Example 2

The symmetric triangular distribution for any real number, X, from 0 to 8 looks like:

[Figure: triangular density curve over 0 to 8, peaking at 4; axis marked 4 and 8; total area = 1]

P(X __ 4)

P(X __ 6)
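Areas under the triangular curve come from triangle geometry. The sketch below computes the area to the left of a value; `area_below` is a hypothetical helper name, and the peak height 1/4 follows from requiring a total area of 1.

```python
from fractions import Fraction

LOW, PEAK, HIGH = 0, 4, 8
# Triangle area = (1/2) * base * peak height = 1, so peak height = 2 / base.
PEAK_HEIGHT = Fraction(2, HIGH - LOW)  # 1/4

def area_below(x):
    """P(X < x): area under the triangular density to the left of x."""
    x = Fraction(x)
    if x <= LOW:
        return Fraction(0)
    if x >= HIGH:
        return Fraction(1)
    if x <= PEAK:
        # Left of the peak: a small triangle whose height grows linearly.
        height = PEAK_HEIGHT * (x - LOW) / (PEAK - LOW)
        return (x - LOW) * height / 2
    # Right of the peak: 1 minus the small triangle to the right of x.
    height = PEAK_HEIGHT * (HIGH - x) / (HIGH - PEAK)
    return 1 - (HIGH - x) * height / 2

p_below_4 = area_below(4)
p_below_6 = area_below(6)
p_above_6 = 1 - p_below_6
```

For a continuous model P(X < x) and P(X ≤ x) are equal, so the same areas answer either version of the question.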

More Probability

Use Venn diagrams to visualize probability rules.

If events are disjoint, don’t overlap circles.

Sample space, S: rectangle

Events (A, B, C, …): circles inside

Keep track of # of outcomes in each region of the rectangle.

Venn diagram - example

Example: pick a card out of a standard deck

S: sample space contains 52 outcomes (52 cards)

A = event that a red 5 is picked

B = event that a club is picked

C = event that a face card (J, Q, K) is picked

[Venn diagram: rectangle S containing circle A, separate from overlapping circles B and C. Outcomes per region: A = 2, B only = 10, B and C = 3, C only = 9, outside all circles = 28]

P(A or B)

P(B or C)
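The region counts can be checked by building the deck and counting cards. This is a sketch only; the card encoding as (rank, suit) pairs is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["Diamonds", "Hearts", "Clubs", "Spades"]
deck = set(product(RANKS, SUITS))  # 52 (rank, suit) pairs

A = {c for c in deck if c[0] == "5" and c[1] in ("Diamonds", "Hearts")}
B = {c for c in deck if c[1] == "Clubs"}
C = {c for c in deck if c[0] in ("J", "Q", "K")}

# Number of outcomes in each region of the Venn diagram.
regions = {
    "A": len(A),
    "B only": len(B - C),
    "B and C": len(B & C),
    "C only": len(C - B),
    "outside": len(deck - A - B - C),
}

def prob(event):
    """Probability of an event when all 52 cards are equally likely."""
    return Fraction(len(event), len(deck))

p_a_or_b = prob(A | B)  # disjoint events: 2 + 13 cards
p_b_or_c = prob(B | C)  # overlapping events: 10 + 3 + 9 cards
```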

General Addition Rule

A = event that a red card is picked

B = event that a number card is picked

P(A or B)

[Venn diagram: rectangle S with overlapping circles A and B]

General Addition: P(A or B) = P(A) + P(B) − P(A and B)
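A quick check of the general addition rule on this example, as a sketch. It assumes "number card" means ranks 2 through 10 (the ace is not counted), which the slide does not state.

```python
from fractions import Fraction
from itertools import product

RANKS = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
SUITS = ["Diamonds", "Hearts", "Clubs", "Spades"]
NUMBER_RANKS = {"2", "3", "4", "5", "6", "7", "8", "9", "10"}  # assumption: no ace
deck = set(product(RANKS, SUITS))

red = {c for c in deck if c[1] in ("Diamonds", "Hearts")}
number = {c for c in deck if c[0] in NUMBER_RANKS}

def prob(event):
    return Fraction(len(event), len(deck))

# Left side: count the union directly.
direct = prob(red | number)
# Right side: add the two probabilities, then subtract the overlap once.
by_rule = prob(red) + prob(number) - prob(red & number)
```

Subtracting P(A and B) removes the red number cards that would otherwise be counted twice.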

Conditional Probability Rule

Given a condition (you know something happened), how does that change the chances of something else happening?

P(B|A) = probability of B given A

P(B|A) = P(A and B) / P(A)

[Venn diagram: rectangle S with overlapping circles A and B]
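As an illustration of the rule, the sketch below reuses the card events from the deck example: given that a club was picked, what is the chance it is a face card? The helper name `conditional` and the choice of events are assumptions for this sketch.

```python
from fractions import Fraction

def conditional(p_a_and_b, p_a):
    """P(B|A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return p_a_and_b / p_a

p_club = Fraction(13, 52)
p_club_and_face = Fraction(3, 52)  # J, Q, K of clubs
p_face = Fraction(12, 52)

p_face_given_club = conditional(p_club_and_face, p_club)
```

Here P(face | club) equals P(face), so knowing the suit does not change the chance of a face card.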

Venn Diagram of 70 students

C: owns a cat

D: owns a dog

[Venn diagram: rectangle S with overlapping circles C and D; region counts 30, 20, 10, 10]

P(C)

P(C|D)

General Multiplication Rule

Rewrite P(B|A) = P(A and B) / P(A) to get:

P(A and B) = P(A) × P(B|A)

Experiment: pick two cards out of a standard deck

P(1st K and 2nd K)

P(1st K and 2nd A)
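The two card probabilities can be worked out with the multiplication rule. This sketch assumes the cards are drawn without replacement, which the slide implies but does not state.

```python
from fractions import Fraction

# P(A and B) = P(A) * P(B|A), drawing two cards without replacement (assumed).
p_first_king = Fraction(4, 52)

# After one king is removed, 3 kings remain among 51 cards.
p_second_king_given_first_king = Fraction(3, 51)
p_two_kings = p_first_king * p_second_king_given_first_king

# After one king is removed, all 4 aces remain among 51 cards.
p_second_ace_given_first_king = Fraction(4, 51)
p_king_then_ace = p_first_king * p_second_ace_given_first_king
```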

Independent Events

Two (or more) events are independent if knowledge of one event does not change the chances of the other.

Multiplication Rule for Independent Events: P(A and B) = P(A) × P(B)

P(rolling three 7's in a row)

For a cholesterol-lowering drug, there is a 5% chance that a loss-of-sleep side effect will occur.

What are the chances that two people picked at random take the drug and experience sleep loss?

P(1st loses and 2nd loses)

What are the chances that at least 1 out of 3 loses sleep?

P(at least one)

P(three people lose sleep)
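A sketch of these calculations, assuming the people respond independently and that "rolling a 7" means a sum of 7 on two six-sided dice (probability 1/6). Both assumptions are mine, not stated on the slide.

```python
from fractions import Fraction

# Sum of 7 on two dice has probability 1/6; rolls are independent.
p_seven = Fraction(1, 6)
p_three_sevens = p_seven ** 3

# Sleep-loss side effect: 5% chance per person, people assumed independent.
p_loss = Fraction(5, 100)
p_both_lose = p_loss * p_loss

# "At least one" is the complement of "none of the three".
p_none_of_three = (1 - p_loss) ** 3
p_at_least_one = 1 - p_none_of_three

p_all_three = p_loss ** 3

print(float(p_both_lose), float(p_at_least_one), float(p_all_three))
```

Working through the complement avoids adding up the separate cases of exactly one, exactly two, and exactly three people losing sleep.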

The Normal Distribution

Curve will capture 100% of all observations. Hence, there will be a total area of 1 below it.

Then the area under the curve for a given range of values will represent the proportion (percent, fraction) of observations that fall in that range.

Use curves to describe overall pattern seen in a histogram.

The proportion of scores above 80 is roughly 26.8%.

The area under the density curve for scores above 80 is roughly 0.261 =26.1%.

Curves and proportions

[Figures: two panels with a vertical axis in % and a horizontal axis marked 20, 40, 60, 80, 100]

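Mean and Median

Location of the median on a density curve is where the area under the curve is cut in half.

Location of the mean on a density curve is the balance point, where the curve would balance if it were made of solid material.

On symmetric curves: the mean and the median are the same.

On skewed curves: the mean is pulled away from the median toward the long tail.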

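Normal curves are special kinds of density curves

• Symmetric, single peaked, bell-shaped

Use μ (mu) and σ (sigma) to talk about the mean and standard deviation.

μ – locates the center of the curve

σ – controls the spread of the curve

[Figure: Normal curve with axis marked μ − σ, μ, μ + σ]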

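68-95-99.7 Rule

• About 68% of data fall within 1 standard deviation of the mean (μ − σ to μ + σ)

• About 95% of data fall within 2 standard deviations of the mean (μ − 2σ to μ + 2σ)

• About 99.7% of data fall within 3 standard deviations of the mean (μ − 3σ to μ + 3σ)

[Figure: Normal curve with axis marked μ − 3σ, μ − 2σ, μ − σ, μ, μ + σ, μ + 2σ, μ + 3σ]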
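The three percentages in the rule are rounded areas under the Normal curve. A minimal sketch using the error function from Python's standard library (the helper name `area_within` is hypothetical):

```python
import math

def area_within(k):
    """Area under any Normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * area_within(k), 2))
```

The result does not depend on the particular mean or standard deviation, which is why one rule covers every Normal distribution.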

Example 1

Grasshopper jumps can be described by a Normal distribution with μ = 12 inches and σ = 2 inches.

About 68% of all jumps are between 10 and 14 inches.

[Figure: Normal curve with axis marked 6, 8, 10, 12, 14, 16, 18 and the 68%, 95%, and 99.7% ranges marked]

About where would you find the top 2.5%?

Example 1 – continued

What % falls below 14 inches?

What % of jumps are more than 14 inches?

What % of jumps are between 14 and 16 inches?

[Figures: one Normal curve per question, each with axis marked 6, 8, 10, 12, 14, 16, 18]
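These questions can be answered from the 68-95-99.7 rule and the symmetry of the curve. The arithmetic is sketched below with the rule's rounded percentages, so the results are approximations.

```python
# Approximate areas from the 68-95-99.7 rule for N(12, 2), in percent.
within_1_sd = 68.0  # between 10 and 14 inches
within_2_sd = 95.0  # between 8 and 16 inches

# Below 14: the lower half of the curve plus the 12-to-14 slice.
below_14 = 50 + within_1_sd / 2
above_14 = 100 - below_14

# Between 14 and 16: half of what the 2-SD band adds to the 1-SD band.
between_14_and_16 = (within_2_sd - within_1_sd) / 2

# Top tail beyond 2 standard deviations (above 16 inches).
above_16 = (100 - within_2_sd) / 2

print(below_14, above_14, between_14_and_16, above_16)
```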

Finding values without 68-95-99.7

We use tables or calculators to find harder values, like where is the top 10% or what percent falls below a given observation.

N(μ, σ) means observations come from a Normal distribution with a mean of μ and a standard deviation of σ.

Standardize observation x from N(μ, σ) by:

z = (x − μ) / σ

The standardized value is called a z-score.

z-scores from example 1, N(12, 2)

z = (x − μ) / σ

x = 14

x = 16

x = 10

x = 17
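The four z-scores follow from the standardizing formula with mean 12 and standard deviation 2. A short sketch (the function name `z_score` is hypothetical):

```python
MU, SIGMA = 12, 2  # grasshopper jumps, N(12, 2)

def z_score(x, mu=MU, sigma=SIGMA):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

z_scores = {x: z_score(x) for x in (14, 16, 10, 17)}
print(z_scores)
```

A negative z-score, as for x = 10, means the observation is below the mean.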

Two functions on the calculator

(found under 2nd VARS => DISTR)

• normalcdf( : will give area between two bounds for a given μ, σ.

• invNorm( : will give the observation that has a particular area to its left for a given μ, σ.

• normalcdf(lower bound, upper bound, μ, σ)

• invNorm(area, μ, σ)

[Figure: Normal curve with axis marked μ − σ, μ, μ + σ]
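For readers without the calculator, a rough Python stand-in for the area function can be built on `statistics.NormalDist` from the standard library. The function below only mimics the calculator's argument order; it is a sketch for illustration, not the calculator's implementation.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu, sigma):
    """Area under N(mu, sigma) between lower and upper (Python stand-in)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Grasshopper jumps, N(12, 2). A huge bound stands in for "no bound".
area_below_17 = normalcdf(-1e99, 17, 12, 2)
area_above_11_5 = normalcdf(11.5, 1e99, 12, 2)

print(round(area_below_17, 4), round(area_above_11_5, 4))
```

An area between two finite bounds works the same way, for example `normalcdf(10, 14, 12, 2)` for the share of jumps within one standard deviation of the mean.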

Using the calculator with grasshopper N(12, 2)

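What % of jumps fall below 17 inches?

No lower bound, so use a very large negative number such as −1E99:

normalcdf(lower bound, upper bound, μ, σ) = normalcdf(−1E99, 17, 12, 2) = area below 17 ≈ 0.9938

[Figure: Normal curve with axis marked 6, 8, 10, 12, 14, 16, 18]

What % of jumps fall above 11.5 inches?

First, find the area below 11.5. Since the total area is 1 and we have the area below, the area we want is 1 minus the area below.

normalcdf(lower bound, upper bound, μ, σ) = normalcdf(−1E99, 11.5, 12, 2) = area below 11.5 ≈ 0.4013

Area above 11.5 = 1 − 0.4013 = 0.5987

[Figure: Normal curve with axis marked 6, 8, 10, 12, 14, 16, 18]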

Using the table with grasshopper N(12, 2)

What % of jumps fall between 10 and 16.36 inches?

Calculator does this all at once with the normalcdf( function.

Area between = area below 16.36 – area below 10.

[Figures: area between 10 and 16.36 = area below 16.36 − area below 10, each on an axis marked 6, 8, 10, 12, 14, 16, 18]

normalcdf(lower bound, upper bound, μ, σ) = normalcdf(10, 16.36, 12, 2) = area between ≈ 0.8267

Using the table with grasshopper N(12, 2)

What jumps fell in the top 10%?

Use invNorm function to find that observation.

[Figure: Normal curve with axis marked 6, 8, 10, 12, 14, 16, 18 and the top 10% marked]

What observation has an area of .10 above it?

What observation has an area of .90 below it?

invNorm(area, μ, σ) = invNorm(.90, 12, 2) = value with .9 area below ≈ 14.56

Using the table with grasshopper N(12, 2)

Where do the middle 50% fall?

Use invNorm function to find those observations.

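[Figure: Normal curve with axis marked 6, 8, 10, 12, 14, 16, 18 and the middle 50% marked]

What observation has an area of .25 below it?

What observation has an area of .75 below it?

invNorm(area, μ, σ) = invNorm(.25, 12, 2) = value with .25 area below ≈ 10.65

invNorm(area, μ, σ) = invNorm(.75, 12, 2) = value with .75 area below ≈ 13.35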
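The inverse lookups can also be sketched in Python with `NormalDist.inv_cdf`, which returns the observation that has a given area to its left. This is a stand-in for the calculator function, not a description of how the calculator works.

```python
from statistics import NormalDist

jumps = NormalDist(mu=12, sigma=2)  # grasshopper jumps, N(12, 2)

# Top 10%: the cutoff has area 0.90 below it.
top_10_cutoff = jumps.inv_cdf(0.90)

# Middle 50%: leaves 25% in each tail, so use areas 0.25 and 0.75.
middle_50 = (jumps.inv_cdf(0.25), jumps.inv_cdf(0.75))

print(round(top_10_cutoff, 2), [round(v, 2) for v in middle_50])
```

By symmetry the two middle-50% cutoffs sit the same distance on either side of the mean of 12.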

68-95-99.7 Rule

1, 2, 3 standard deviations away, accurate to two decimal places: 68.27%, 95.45%, 99.73%

Sampling Distributions

Know the entire population: μ (parameter)

Know only a sample (SRS): x̄ (statistic)

Law of Large Numbers

As you increase the sample size, the sample mean tends to get closer to the population mean.

Population = 3, 3, 8, 15, 20, 21, 22, 31, 39

Sample of size 1= 8

Sample of size 2= 8, 22

Sample of size 3= 8, 22, 31

Sample of size 4= 8, 22, 31, 3

Sample of size 5= 8, 22, 31, 3, 20
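The running sample means for the growing sample above can be compared with the population mean. A small sketch follows; with only five draws the approach is rough, and the law describes the long-run tendency rather than every step.

```python
from statistics import mean

population = [3, 3, 8, 15, 20, 21, 22, 31, 39]
draws = [8, 22, 31, 3, 20]  # the sample as it grows from size 1 to size 5

population_mean = mean(population)

# Sample mean after each additional observation.
running_means = [mean(draws[:n]) for n in range(1, len(draws) + 1)]

for n, m in enumerate(running_means, start=1):
    print(n, round(m, 2), "distance from population mean:", round(abs(m - population_mean), 2))
```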

Population of 7 people and their weights (in pounds)

122, 140, 150, 155, 160, 170, 195, with μ = 156

Mark off the sample mean for each sample with an “x”

Samples of size 1: {122}, {140}, {150}, {155}, {160}, {170}, {195}

[Dot plot: the 7 sample means marked with x on an axis from 120 to 200]

Samples of size 2: {122, 140}, {122, 150}, {122, 155}, {122, 160}, {122, 170}, {122, 195}, {140, 150}, …, {170, 195}. There are 21 possible samples.

[Dot plot: the 21 sample means marked with x on an axis from 120 to 200]

Population of 7 people (continued)

140, 122, 160, 195, 150, 155, 170, with μ = 156

Samples of size 1:

[Dot plot: 7 sample means marked with x on an axis from 120 to 200]

Samples of size 2:

[Dot plot: 21 sample means marked with x on an axis from 130 to 190]

Samples of size 6: 7 possible samples of this size. {122, 140, 160, 150, 155, 170}, …

[Dot plot: 7 sample means marked with x on an axis from 140 to 180; the sample listed above has x̄ = 149.5]
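The dot plots can be reproduced by listing every possible sample and its mean. The sketch below does this for sizes 1, 2, and 6 and prints how the spread of the sample means shrinks as the sample size grows.

```python
from itertools import combinations
from statistics import mean

weights = [122, 140, 150, 155, 160, 170, 195]  # population, mean 156

def sample_means(size):
    """Sorted means of every possible sample of the given size."""
    return sorted(mean(s) for s in combinations(weights, size))

for size in (1, 2, 6):
    means = sample_means(size)
    print(size, len(means), min(means), max(means), mean(means))
```

For every sample size the sample means average out to the population mean of 156, while their range narrows.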

Sampling distribution of x̄

Sampling from a large population with mean μ and standard deviation σ:

samples of size n will have their sample means distributed with a mean μ and standard deviation σ over root n.

μ_x̄ = μ

σ_x̄ = σ / √n

If population is N(μ, σ), then x̄ is N(μ, σ/√n).

If population is not Normal but n is large, then x̄ is approximately N(μ, σ/√n).
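A simulation can make the σ over root n result concrete. The sketch below draws many samples from a Normal population and compares the spread of their means with the formula. The population values, sample size, number of repetitions, and seed are all illustrative assumptions.

```python
import math
import random
from statistics import mean, stdev

MU, SIGMA, N = 65, 3, 9   # illustrative population and sample size (assumed)
REPEATS = 20_000          # number of simulated samples
rng = random.Random(0)    # fixed seed so the run is repeatable

# Mean of each simulated sample of size N.
sample_means = [
    sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
    for _ in range(REPEATS)
]

simulated_center = mean(sample_means)
simulated_spread = stdev(sample_means)
theoretical_spread = SIGMA / math.sqrt(N)

print(round(simulated_center, 2), round(simulated_spread, 3), theoretical_spread)
```

The simulated values will not match the formula exactly, but with this many repetitions they should land close to μ and σ/√n.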

Ex. 1 - Weight of eggs is N(65, 3)

Your egg carton holds 9 eggs, so consider each carton as a random sample of 9 eggs. Let X be the weight of a single egg in grams and x̄ be the average weight of your carton.

P(X __ 67)

What is the sampling distribution for your carton’s average weight?

μ_x̄ =

σ_x̄ =

P(x̄ __ 67)

Weight of eggs is N(65, 3) – continued

Mean weight of carton is N(65, 1)

[Figure: Normal curves on an axis marked 56, 59, 62, 65, 68, 71, 74, with 67 marked]

Convert 67 to a z-score for a single egg:

Convert 67 to a z-score for the carton:
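The two z-scores differ only in which standard deviation is used: σ for a single egg, σ/√n for the carton average. A minimal sketch of the arithmetic:

```python
import math

MU, SIGMA, N = 65, 3, 9  # egg weights N(65, 3), cartons of 9 eggs

sigma_xbar = SIGMA / math.sqrt(N)  # standard deviation of the carton average

z_single_egg = (67 - MU) / SIGMA   # 67 g is under one SD above the mean
z_carton = (67 - MU) / sigma_xbar  # the same 67 g is further out for an average

print(sigma_xbar, round(z_single_egg, 2), z_carton)
```

The same value of 67 is far more unusual for a carton average than for one egg, because averages vary less.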

Ex. 2 - Length of trout is N(17.5, 2.5)

Your local waters contain a multitude of trout. Let X be the length of a single fish in inches and x̄ be the average length of your daily catch of five fish.

P(16 < X < 19)

What is the sampling distribution for your daily catch?

μ_x̄ =

σ_x̄ =

P(16 < x̄ < 19)

Trout length is N(17.5, 2.5) – continued

Mean length of daily catch is N(17.5, 1.118)

[Figure: Normal curves on an axis marked 10, 12.5, 15, 17.5, 20, 22.5, 25]

Convert 16 to a z-score for a single fish:

Convert 16 to a z-score for the daily catch:
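The trout probabilities can be sketched with `statistics.NormalDist`, once for a single fish and once for the average of five. This treats the question as the area between 16 and 19 inches.

```python
import math
from statistics import NormalDist

MU, SIGMA, N = 17.5, 2.5, 5  # trout lengths N(17.5, 2.5), daily catch of 5

single_fish = NormalDist(MU, SIGMA)
daily_catch = NormalDist(MU, SIGMA / math.sqrt(N))  # sampling distribution of the mean

# z-scores for 16 inches under each distribution.
z_single = (16 - MU) / single_fish.stdev
z_catch = (16 - MU) / daily_catch.stdev

# Area between 16 and 19 inches = area below 19 minus area below 16.
p_single = single_fish.cdf(19) - single_fish.cdf(16)
p_catch = daily_catch.cdf(19) - daily_catch.cdf(16)

print(round(daily_catch.stdev, 3), round(z_single, 2), round(z_catch, 2))
print(round(p_single, 4), round(p_catch, 4))
```

The average of five fish is much more likely to fall between 16 and 19 inches than a single fish is, since the sampling distribution is narrower.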


Ex 3 - Length of trout is N(10, 2)

Your fishing pond has another type of trout. Let X be the length of a single fish in inches taken at random and x̄ be the average length of a sample of 16 fish.

P(X __ 10.72)

What is the sampling distribution for a sample of 16 fish?

μ_x̄ =

σ_x̄ =

P(x̄ __ 10.72)

Trout length is N(10, 2) – continued

Mean length of 16 fish is N(10, 0.5)

[Figure: Normal curves on an axis marked 4, 6, 8, 10, 12, 14, 16]