
Upload: theodore-roberts

Post on 22-Dec-2015


Page 1: Random Variables and Probability Distributions Random Variables - Random outcomes corresponding to subjects randomly selected from a population. Probability

Random Variables and Probability Distributions

• Random Variables - Random outcomes corresponding to subjects randomly selected from a population.

• Probability Distributions - A listing of the possible outcomes and their probabilities (discrete r.v.s) or their densities (continuous r.v.s)

• Normal Distribution - Bell-shaped continuous distribution widely used in statistical inference

• Sampling Distributions - Distributions corresponding to sample statistics (such as mean and proportion) computed from random samples

Page 2

Discrete Probability Distributions

• Discrete RV - Random variable that can take on a finite (or countably infinite) set of discontinuous possible outcomes (Y)

• Discrete Probability Distribution - Listing of outcomes and their corresponding probabilities (y , P(y))

$0 \le P(y) \le 1 \qquad \sum_{\text{all } y} P(y) = 1$

Page 3

Example - Supreme Court Vacancies

• Supreme Court Vacancies by Year 1837-1975

• Y = number of vacancies in a randomly selected year

# Vacancies (y)   Frequency (# of Years)   Proportion (P(y))
0                 81                       81/139 = .5827
1                 43                       43/139 = .3094
2                 14                       14/139 = .1007
3                 1                        1/139 = .0072
>3                0                        0/139 = .0000
Total             139                      1.0000

Source: R.J. Morrison (1977), “FDR and the Supreme Court: An Example of the Use of Probability Theory in Political History”, History and Theory, Vol. 16, pp. 137-146

Page 4

Parameters of a P.D.

• Mean (aka Expected Value) - Long run average outcome

$\mu = E(Y) = \sum y P(y)$

• Standard Deviation - Measure of the “typical” distance of an outcome from the mean

$\sigma = \sqrt{E\left[(Y-\mu)^2\right]} = \sqrt{\sum (y-\mu)^2 P(y)} = \sqrt{\sum y^2 P(y) - \mu^2}$

Page 5

Example - Supreme Court Vacancies

y       P(y)     yP(y)    y^2 P(y)
0       .5827    .0000    .0000
1       .3094    .3094    .3094
2       .1007    .2014    .4028
3       .0072    .0216    .0648
Total   1.0000   .5324    .7770

$\mu = \sum y P(y) = .5324$

$\sigma = \sqrt{\sum y^2 P(y) - \mu^2} = \sqrt{.7770 - (.5324)^2} = \sqrt{.4936} = .7025$
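The mean and standard deviation above can be reproduced from the raw frequencies. The following is a minimal Python sketch; the variable names (`freq`, `p`, `mu`, `sigma`) are illustrative and not part of the original slides.

```python
from math import sqrt

# Frequencies from the slide: number of years (1837-1975) with y vacancies.
freq = {0: 81, 1: 43, 2: 14, 3: 1}
n_years = sum(freq.values())  # 139 years in total

# Discrete probability distribution P(y) as relative frequencies.
p = {y: count / n_years for y, count in freq.items()}

# Mean: sum of y * P(y).
mu = sum(y * py for y, py in p.items())

# Standard deviation via the shortcut formula sqrt(sum(y^2 * P(y)) - mu^2).
ey2 = sum(y ** 2 * py for y, py in p.items())
sigma = sqrt(ey2 - mu ** 2)

print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
```

Working from unrounded proportions gives the same four-decimal results as the slide's table.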

Page 6

Normal Distribution

• Bell-shaped, symmetric family of distributions

• Classified by 2 parameters: Mean ($\mu$) and standard deviation ($\sigma$). These represent location and spread

• Random variables that are approximately normal have the following properties with respect to individual measurements:

– Approximately half (50%) fall above (and below) mean

– Approximately 68% fall within 1 standard deviation of mean

– Approximately 95% fall within 2 standard deviations of mean

– Virtually all fall within 3 standard deviations of mean

• Notation when Y is normally distributed with mean $\mu$ and standard deviation $\sigma$:

$Y \sim N(\mu, \sigma)$

Page 7

Normal Distribution

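$P(Y \ge \mu) = 0.50 \qquad P(\mu - \sigma \le Y \le \mu + \sigma) \approx 0.68 \qquad P(\mu - 2\sigma \le Y \le \mu + 2\sigma) \approx 0.95$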
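These coverage probabilities can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `prob_within` is an assumption introduced for illustration.

```python
from statistics import NormalDist

def prob_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(mu - k*sigma <= Y <= mu + k*sigma) for Y ~ N(mu, sigma)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Probability of falling above the mean (symmetry of the bell curve).
p_above_mean = 1.0 - NormalDist().cdf(0.0)

# Coverage within 1, 2, and 3 standard deviations of the mean.
coverage = {k: prob_within(k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} SD: {prob:.4f}")
```

The result does not depend on the particular mean and standard deviation, which is why the rule applies to every member of the normal family.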

Page 8

Example - Heights of U.S. Adults

• Female and male adult heights are well approximated by normal distributions: $Y_F \sim N(63.7, 2.5)$, $Y_M \sim N(69.1, 2.6)$

[Figure: histograms of adult heights in inches. Males (INCHESM, cases weighted by PCTM): Mean = 69.1, Std. Dev = 2.61, N = 99.23. Females (INCHESF, cases weighted by PCTF): Mean = 63.7, Std. Dev = 2.48, N = 99.68.]

Source: Statistical Abstract of the U.S. (1992)

Page 9

Standard Normal (Z) Distribution

• Problem: Unlimited number of possible normal distributions ($-\infty < \mu < \infty$, $\sigma > 0$)

• Solution: Standardize the random variable to have mean 0 and standard deviation 1

$Y \sim N(\mu, \sigma) \;\Rightarrow\; Z = \frac{Y - \mu}{\sigma} \sim N(0, 1)$

• Probabilities of certain ranges of values and specific percentiles of interest can be obtained through the standard normal (Z) distribution

Page 10

Standard Normal (Z) Distribution

• Standard Normal Distribution Characteristics:

– $P(Z \ge 0) = P(Y \ge \mu) = 0.5000$

– $P(-1 \le Z \le 1) = P(\mu - \sigma \le Y \le \mu + \sigma) = 0.6826$

– $P(-2 \le Z \le 2) = P(\mu - 2\sigma \le Y \le \mu + 2\sigma) = 0.9544$

– $P(Z \ge z_a) = P(Z \le -z_a) = a$ (using Z-table)

a       0.500   0.100   0.050   0.025   0.010   0.005
$z_a$   0.000   1.282   1.645   1.960   2.326   2.576
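The critical values in this table can be recomputed instead of read from a printed Z-table. This is a small sketch using the standard library's `NormalDist.inv_cdf`; the function name `z_upper` is hypothetical.

```python
from statistics import NormalDist

def z_upper(a: float) -> float:
    """Return z_a such that P(Z >= z_a) = a for the standard normal Z."""
    # The upper-tail area a corresponds to the (1 - a) quantile.
    return NormalDist().inv_cdf(1.0 - a)

tail_areas = [0.500, 0.100, 0.050, 0.025, 0.010, 0.005]
table = {a: round(z_upper(a), 3) for a in tail_areas}

for a, z in table.items():
    print(f"a = {a:.3f}  z_a = {z:.3f}")
```

By symmetry, the lower-tail cut-off for the same area is simply the negative of each value.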

Page 11

Finding Probabilities of Specific Ranges

• Step 1 - Identify the normal distribution of interest (i.e. its mean ($\mu$) and standard deviation ($\sigma$))

• Step 2 - Identify the range of values that you wish to determine the probability of observing ($Y_L$, $Y_U$), where often the upper or lower bound is $\infty$ or $-\infty$

• Step 3 - Transform $Y_L$ and $Y_U$ into Z-values:

$Z_L = \frac{Y_L - \mu}{\sigma} \qquad Z_U = \frac{Y_U - \mu}{\sigma}$

• Step 4 - Obtain $P(Z_L \le Z \le Z_U)$ from Z-table

Page 12

Example - Adult Female Heights

• What is the probability a randomly selected female is 5’10” or taller (70 inches)?

• Step 1 - Y ~ N(63.7 , 2.5)

• Step 2 - $Y_L = 70.0$, $Y_U = \infty$

• Step 3 - $Z_L = \frac{70.0 - 63.7}{2.5} = 2.52 \qquad Z_U = \infty$

• Step 4 - $P(Y \ge 70) = P(Z \ge 2.52) = .0059$ ($\approx 1/170$)

z     .00     .01     .02     .03
2.4   .0082   .0080   .0078   .0075
2.5   .0062   .0060   .0059   .0057
2.6   .0047   .0045   .0044   .0043
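The four-step procedure can also be carried out in code rather than with a printed Z-table. The following is a minimal sketch, assuming Python's standard-library `NormalDist` in place of the table; the function `range_probability` and its parameter names are illustrative only.

```python
from math import inf
from statistics import NormalDist

def range_probability(mu: float, sigma: float,
                      y_low: float = -inf, y_high: float = inf) -> float:
    """P(y_low <= Y <= y_high) for Y ~ N(mu, sigma)."""
    z = NormalDist()  # standard normal, mean 0 and SD 1
    # Step 3: transform the bounds into Z-values.
    z_low = (y_low - mu) / sigma
    z_high = (y_high - mu) / sigma
    # Step 4: area between the Z-values (infinite bounds handled explicitly).
    upper = 1.0 if z_high == inf else z.cdf(z_high)
    lower = 0.0 if z_low == -inf else z.cdf(z_low)
    return upper - lower

# Female heights: probability of being 70 inches or taller.
p_tall = range_probability(63.7, 2.5, y_low=70.0)
print(f"P(Y >= 70) = {p_tall:.4f}")
```

Leaving a bound at its default of plus or minus infinity gives a one-sided tail probability, as in the height example.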

Page 13

Finding Percentiles of a Distribution

• Step 1 - Identify the normal distribution of interest (i.e. its mean ($\mu$) and standard deviation ($\sigma$))

• Step 2 - Determine the percentile of interest 100p% (e.g. the 90th percentile is the cut-off where 90% of scores are below and 10% are above)

• Step 3 - Turn the percentile of interest into a tail probability a and corresponding z-value ($z_p$):

– If $100p \ge 50$ then $a = 1 - p$ and $z_p = z_a$

– If $100p < 50$ then $a = p$ and $z_p = -z_a$

• Step 4 - Transform $z_p$ back to original units:

$Y_p = \mu + z_p \sigma$

Page 14

Example - Adult Male Heights

• Above what height do the tallest 5% of males lie?

• Step 1 - Y ~ N(69.1 , 2.6)

• Step 2 - Want to determine 95th percentile (p = .95)

• Step 3 - Since $100p > 50$, $a = 1 - p = 0.05$, so $z_p = z_a = z_{.05} = 1.645$

• Step 4 - $Y_{.95} = 69.1 + (1.645)(2.6) = 73.4$

z     .03     .04     .05     .06
1.5   .0630   .0618   .0606   .0594
1.6   .0516   .0505   .0495   .0485
1.7   .0418   .0409   .0401   .0392
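The percentile steps translate directly into a short function. This is a sketch under the assumption that the tail-area lookup is done with `NormalDist.inv_cdf` rather than a printed table; `normal_percentile` is a hypothetical name.

```python
from statistics import NormalDist

def normal_percentile(mu: float, sigma: float, p: float) -> float:
    """Return the 100p-th percentile of Y ~ N(mu, sigma)."""
    # Step 3: convert the percentile into a tail probability a and z-value.
    a = 1.0 - p if p >= 0.5 else p
    z_a = NormalDist().inv_cdf(1.0 - a)   # upper-tail cut-off for area a
    z_p = z_a if p >= 0.5 else -z_a
    # Step 4: transform back to the original units.
    return mu + z_p * sigma

# Male heights: cut-off above which the tallest 5% lie (95th percentile).
y_95 = normal_percentile(69.1, 2.6, 0.95)
print(f"95th percentile = {y_95:.1f} inches")
```

In practice `NormalDist(mu, sigma).inv_cdf(p)` returns the same value in one call; the longer form is kept here only to mirror the steps on the slide.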

Page 15

Statistical Models

• When making statistical inference it is useful to write random variables in terms of model parameters and random errors

$Y = \mu + (Y - \mu) = \mu + \varepsilon$

• Here $\mu$ is a fixed constant and $\varepsilon$ is a random variable

• In practice $\mu$ will be unknown, and we will use sample data to estimate or make statements regarding its value
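The decomposition of an outcome into a fixed mean plus a random error can be illustrated with a small simulation. The sketch below makes assumptions that are not in the slide: the errors are drawn from a normal distribution, and the values of `mu`, `sigma`, and the sample size are borrowed from the female-height example purely for illustration.

```python
import random
from statistics import fmean

random.seed(1)  # fixed seed so the sketch is reproducible

mu, sigma = 63.7, 2.5   # treated as known here; unknown in practice
n = 10_000

# Each outcome is the fixed constant mu plus a random error with mean 0.
errors = [random.gauss(0.0, sigma) for _ in range(n)]
ys = [mu + eps for eps in errors]

mean_error = fmean(errors)  # should be close to 0
estimate = fmean(ys)        # sample-based estimate of mu

print(f"mean error = {mean_error:.3f}, estimate of mu = {estimate:.3f}")
```

Because the errors average out to roughly zero, the sample mean of the outcomes lands close to the underlying constant.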

Page 16

Sampling Distributions and the Central Limit Theorem

• Sample statistics based on random samples are also random variables and have sampling distributions that are probability distributions for the statistic (outcomes that would vary across samples)

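• When samples are large and measurements are independent, many estimators have approximately normal sampling distributions (CLT):

– Sample Mean: $\bar{Y} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

– Sample Proportion: $\hat{\pi} \sim N\left(\pi, \sqrt{\frac{\pi(1-\pi)}{n}}\right)$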

Page 17

Example - Adult Female Heights

• Random samples of n = 100 females to be selected

• For each sample, the sample mean is computed

• Sampling distribution:

$\bar{Y} \sim N\left(63.5, \frac{2.5}{\sqrt{100}}\right) = N(63.5, 0.25)$

• Note that approximately 95% of all possible random samples of 100 females will have sample means between 63.0 and 64.0 inches
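The 95% statement can be checked both from the theoretical sampling distribution and by simulation. The sketch below uses the values on this slide (mean 63.5, standard deviation 2.5, n = 100); the number of simulated samples, the seed, and all variable names are assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

mu, sigma, n = 63.5, 2.5, 100

# Theoretical sampling distribution of the sample mean: N(mu, sigma / sqrt(n)).
std_error = sigma / sqrt(n)
sampling_dist = NormalDist(mu, std_error)
p_theory = sampling_dist.cdf(64.0) - sampling_dist.cdf(63.0)

# Simulation: draw many random samples of size n and record each sample mean.
random.seed(2)  # fixed seed so the sketch is reproducible
n_samples = 2_000
means = [fmean(random.gauss(mu, sigma) for _ in range(n))
         for _ in range(n_samples)]
p_sim = sum(63.0 <= m <= 64.0 for m in means) / n_samples

print(f"standard error = {std_error:.2f}")
print(f"theoretical P(63.0 <= mean <= 64.0) = {p_theory:.4f}")
print(f"simulated proportion = {p_sim:.4f}")
```

The interval from 63.0 to 64.0 is the mean plus or minus two standard errors, so both figures should sit near 95%.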