
Means and Variances of Random Variables

Probability

Hosung Sohn

Department of Public Administration and International Affairs

Maxwell School of Citizenship and Public Affairs

Syracuse University

Lecture Slide 4-3 (October 8, 2015)

Hosung Sohn (Lecture Slide 4-3) Introduction to Statistics: PAI 721 (Fall, 2015)


Table of Contents

1 Means and Variances of Random Variables


Announcement

Revised Lecture Note 4.

Lecture Note 5 will be posted by weekend.

Please submit midterm evaluation forms you received via email.

Problem Set 2 will be posted on Friday (October 9, 2015):

=⇒ Due on October 20, 2015!

A mistake in the syllabus:

=⇒ The deadline for Problem Set 4 is December 8, 2015, not December 1, 2015.


Review of Previous Lecture

Random variable:

=⇒ A variable X is a random variable if the value that X takes at the conclusion of an experiment is a chance or random occurrence that cannot be predicted with certainty in advance.

Two types of random variables:

1. Discrete random variables.

2. Continuous random variables.


Discrete random variables:

=⇒ A random variable X is discrete if X can take only a finite number of different values.

Discrete probability distributions:

=⇒ A discrete probability distribution is a table, graph, or rule that associates a probability P(X = x_i) with each possible value x_i that the discrete random variable X can take.


Discrete probability distribution in tables:

Value of X    x_1          x_2          x_3          x_4          ...
Probability   P(X = x_1)   P(X = x_2)   P(X = x_3)   P(X = x_4)   ...

Discrete probability distributions in figures:


Continuous random variables:

=⇒ A random variable X is continuous if X can take all the values in some interval.

Continuous probability distributions:

=⇒ A continuous probability distribution of a continuous random variable X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event.


If X is a continuous random variable, P(X ≥ x_i) = P(X > x_i):

=⇒ This holds because P(X = x_i) = 0.

One example of continuous probability distributions:

=⇒ Uniform distributions.

Another example of continuous probability distributions:

=⇒ Normal distributions.

Review Exercise 1

Suppose the population proportion of Internet users who say they use Twitter to post updates about themselves is 19%. Think about selecting random samples from a population in which 19% are Twitter users.

1. What is the sample space for selecting a single person?

=⇒ S = {Y, N}.

2. If you select three people, what is the sample space?

=⇒ S = {YYY, NYY, YNY, YYN, NNY, YNN, NYN, NNN}.

3. Define the sample space for the random variable that expresses the number of Twitter users in the sample of size 3.

=⇒ X = {0, 1, 2, 3}.

4. What information is contained in the sample space for Question 2 that is not contained in the sample space for Question 3?

=⇒ The sample space in Question 2 reveals which of the three people use Twitter.


Review Exercise 2

The Twitter example continued:

1. Assign probabilities for S = {Y, N}.

=⇒ P(Y) = 0.19 and P(N) = 0.81.

2. Assign probabilities for S = {YYY, NYY, YNY, YYN, NNY, YNN, NYN, NNN}.

=⇒ P(YYY) = P(Y) × P(Y) × P(Y) = 0.19^3 = 0.0069 (why?).

=⇒ P(NYY) = P(N) × P(Y) × P(Y) = 0.81 × 0.19^2 = 0.0292.

=⇒ P(NNY) = P(N) × P(N) × P(Y) = 0.81^2 × 0.19 = 0.1247.

=⇒ P(NNN) = P(N) × P(N) × P(N) = 0.81^3 = 0.5314.

3. Probability distribution for the random variable X:

Outcome       YYY      NYY, YNY, YYN         NNY, YNN, NYN         NNN
Value of X    3        2                     1                     0
Probability   0.0069   0.0292 × 3 = 0.0876   0.1247 × 3 = 0.3741   0.5314
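The table above can be reproduced by enumerating the eight outcomes directly. The sketch below is only an illustration: the function names are made up for this example, and it assumes the three people are selected independently, which is what multiplying the individual probabilities relies on.

```python
from itertools import product

P_TWITTER = 0.19  # P(Y) from the exercise; P(N) = 1 - 0.19 = 0.81


def outcome_probability(outcome, p=P_TWITTER):
    """Probability of one sequence such as 'NYY', assuming independent draws."""
    prob = 1.0
    for person in outcome:
        prob *= p if person == "Y" else 1 - p
    return prob


def distribution_of_x(n=3, p=P_TWITTER):
    """Map each value of X (the number of Y's) to its total probability."""
    dist = {k: 0.0 for k in range(n + 1)}
    for letters in product("YN", repeat=n):
        outcome = "".join(letters)
        dist[outcome.count("Y")] += outcome_probability(outcome, p)
    return dist


dist = distribution_of_x()
for value in sorted(dist, reverse=True):
    print(value, round(dist[value], 4))
```

Because the slide rounds each outcome probability to four decimals before multiplying by 3, the unrounded values for X = 2 and X = 1 (about 0.0877 and 0.3740) differ from the table in the last digit.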


Introduction

When describing “data,” we used tables and graphs (e.g., histograms, scatterplots, etc.).

Similarly, when describing “random variables,” we used tables and graphs.

=⇒ We used tables and graphs to describe probability distributions of discrete or continuous random variables.

On the other hand, we also used numerical measures to describe data (e.g., means, variances, etc.).

We can also use numerical measures to describe random variables.

=⇒ We can estimate the mean or the standard deviation of random variables.


The Expected Value (or Mean) of a Random Variable

When talking about the mean of a random variable, we use the term “expected value” of a random variable.

The expected value of a random variable X is used as a measure of the “center” of the probability distribution of X.

It is denoted as E(X) or µ_X.

Recall that a statistic such as x̄ can be considered as a random variable.

=⇒ So we can define the expected value of x̄; i.e., E(x̄).

Difference between E(x̄) and x̄?


E(x̄) vs. x̄


The expected value of a discrete random variable X is

\[
E(X) = \mu_X = \sum_{i=1}^{k} x_i p_i = x_1 p_1 + x_2 p_2 + \cdots + x_k p_k,
\]

where x_i is a value of X and p_i is the corresponding probability:

=⇒ i.e., p_i = P(X = x_i).

The mean is called the expected value because it denotes the average value that we would expect to occur if the experiment were repeated a large number of times.

Another way to think about the expected value:

=⇒ A weighted average in which each outcome (i.e., x_i) is weighted by its probability.


Example: Suppose a random variable X denotes the years of education before entering the MPA program, for our class. And assume that we have the following probability distribution for X:

Value of X (i.e., Years of Education)   12     13     14
Probability                             0.68   0.12   0.20

What is E(X)? Using the formula for the expected value,

\[
\begin{aligned}
E(X) &= \sum_{i=1}^{3} x_i p_i \\
     &= x_1 p_1 + x_2 p_2 + x_3 p_3 \\
     &= 12 \times 0.68 + 13 \times 0.12 + 14 \times 0.20 \\
     &= 12.52.
\end{aligned}
\]
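The same weighted average can be written as a short function. This is a minimal sketch rather than part of the lecture; the name `expected_value` and the check that the probabilities sum to 1 are choices made for this illustration.

```python
def expected_value(values, probs):
    """Weighted average of the values, each weighted by its probability."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))


years = [12, 13, 14]        # values of X from the table
probs = [0.68, 0.12, 0.20]  # corresponding probabilities

mean_years = expected_value(years, probs)
print(round(mean_years, 2))
```

Rounding before printing hides the small floating-point error that can appear when decimal probabilities are multiplied and summed.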


The above example illustrates the calculation of the expected value of a discrete random variable.

How do we calculate E(X) if X is a continuous random variable?

The formula for the expected value of a continuous random variable is

\[
E(X) = \int_a^b x f(x)\,dx,
\]

where f(x) is the probability density function of the continuous random variable X, and a and b are the endpoints of the interval of values that X can take.

Intuitively, the expected value of a continuous random variable is the point at which the area under the density curve would balance.
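To make the integral concrete, it can be approximated numerically. The sketch below uses a uniform distribution on the interval from 2 to 6, which is a hypothetical example chosen here and not taken from the lecture; the midpoint rule and the function names are likewise just one simple way to do it.

```python
def expected_value_continuous(density, a, b, steps=100_000):
    """Approximate the integral of x * f(x) over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += x * density(x) * width
    return total


def uniform_density(x, low=2.0, high=6.0):
    """Density of a uniform distribution on [low, high] (assumed example)."""
    return 1.0 / (high - low) if low <= x <= high else 0.0


mean_uniform = expected_value_continuous(uniform_density, 2.0, 6.0)
print(round(mean_uniform, 6))
```

For a uniform density the balance point is the midpoint of the interval, (2 + 6) / 2 = 4, which matches the intuition stated above.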


Statistical Estimation and the Law of Large Numbers

Our goal in using statistical science:

=⇒ Estimate the population parameter using a statistic!

Suppose we want to estimate the mean height µ of the population of all American women between the ages of 18 and 24 years.

To estimate µ:

1. We draw an SRS of young women.

2. Use the sample mean x̄ to estimate µ.

To reiterate, µ is a parameter and x̄ is a statistic.


Statistics such as x̄ obtained from probability sampling designs are “random variables.” Why?

=⇒ We don’t know their values until we draw an SRS, and their values vary in repeated sampling.

Thus, we can think of the sampling distributions of these statistics as the probability distributions of these random variables.


We also learned that it is “reasonable” to use x̄ to estimate µ.

=⇒ An SRS should fairly represent the population, so x̄ should be somewhat near µ.

But we don’t expect x̄ to be “exactly” equal to µ.

And we know that if we draw another SRS, then it would give us a different x̄.


If x̄ is rarely exactly right and varies from sample to sample, why do we use it to estimate µ?

One answer we learned is that x̄ is an unbiased estimator of µ.

Another reason: if we keep increasing the sample size when we draw an SRS, the statistic x̄ is guaranteed to get as close as we wish to the parameter µ.

=⇒ This fact is called the law of large numbers (LLN).

The LLN is a very useful law because it holds for any population, regardless of the shape or spread of the distribution of the population data.


Law of Large Numbers (LLN)

Definition

The law of large numbers (LLN) states that as the number of observations drawn in a single SRS increases, the sample mean x̄ eventually approaches the mean µ of the population.


Suppose that the mean height of all women is 64.5 inches; i.e., µ = 64.5.

The figure below shows the behavior of the mean height x̄ of n women chosen at random from a population.


At first, the graph shows that the mean of the sample changes as we take more observations.

Eventually, however, the mean gets close to the population mean µ = 64.5 and settles down at that value.

=⇒ LLN says that this always happens.
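The settling-down behavior can be imitated with a small simulation. This is a sketch under stated assumptions: heights are drawn from a normal distribution with mean 64.5, and the standard deviation of 2.7 inches is an assumed value that the lecture does not give. The seed is fixed only so that the run is repeatable.

```python
import random


def running_means(draws):
    """Return the sample mean after each additional observation."""
    total = 0.0
    means = []
    for n, x in enumerate(draws, start=1):
        total += x
        means.append(total / n)
    return means


MU = 64.5    # population mean height in inches, from the slide
SIGMA = 2.7  # assumed standard deviation; not stated in the lecture

rng = random.Random(721)  # fixed seed for a repeatable illustration
heights = [rng.gauss(MU, SIGMA) for _ in range(100_000)]
means = running_means(heights)

for n in (1, 10, 100, 1_000, 10_000, 100_000):
    print(n, round(means[n - 1], 3))
```

The early running means typically wander, while the later ones stay close to 64.5, which is the pattern the LLN describes.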


LLN is intuitively clear.

Suppose our population size is 10,000.

1. If you take an SRS of size 100, then x̄ based on these 100 observations is not exactly equal to µ, which is based on all 10,000.

2. What if you take an SRS of size 9,500? Then x̄ based on these 9,500 observations would be almost equal to µ.

3. What if you take an SRS of size 9,999? Then it is almost certain that x̄ based on these 9,999 observations is practically equal to µ.

This is what LLN is telling us about.


So LLN tells us that if we draw a large number of observations in a single SRS, then x̄ is almost equal to µ.

But we can ask a question: how large is a large number?

=⇒ The answer depends on the variability of the population.

If our outcome of interest is highly variable in the population, then we need more observations.

If our outcome of interest is not very variable in the population, then x̄ gets close to µ even if we don’t have that many observations.


Suppose we would like to estimate the mean salary of all the people in the US.

=⇒ Salary levels are highly variable, so we need quite a large number of observations to estimate the population mean salary precisely.

Suppose we would like to estimate the mean number of cars that households in the US possess.

=⇒ In general, the number of cars possessed by households does not vary to a great extent (maybe around one to three).

=⇒ So we don’t need a large number of observations to estimate the mean number of cars.
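The link between population variability and the number of observations needed can be illustrated by simulation. Everything in the sketch below is an assumption made for illustration: the populations are normal, the standard deviations 1 and 10 stand in for a "not so variable" and a "highly variable" outcome, and the function name is invented here.

```python
import random
from statistics import pstdev


def spread_of_sample_means(sigma, n, reps=500, mu=0.0, seed=721):
    """Standard deviation of x-bar across many simulated samples of size n."""
    rng = random.Random(seed)
    sample_means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_means.append(sum(sample) / n)
    return pstdev(sample_means)


low_variability = spread_of_sample_means(sigma=1.0, n=50)
high_variability = spread_of_sample_means(sigma=10.0, n=50)
high_variability_more_data = spread_of_sample_means(sigma=10.0, n=2000)

print(round(low_variability, 3))
print(round(high_variability, 3))
print(round(high_variability_more_data, 3))
```

With the same sample size, the sample mean from the more variable population scatters much more widely around µ; drawing many more observations brings that scatter back down.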


Rules for Means

Sometimes we want to find the expected value of a combination of two or more random variables.

There are some rules that come in handy when we calculate such expected values.

We will study four rules.


Rule 1: If X is a random variable and a and b are fixed numbers (i.e., constants), then

E(a + bX) = a + bE(X).

=⇒ We say that a + bX is a linear transformation of the random variable X.

Rule 2: If X and Y are random variables, then

E(X + Y) = E(X) + E(Y).


Rule 3: If X and Y are random variables, then

E(X − Y) = E(X) − E(Y).

Rule 4: If we combine Rule 1 and Rule 2 or 3, then we have the following rule:

E(a + bX + cY) = a + bE(X) + cE(Y)

or

E(a − bX + cY) = a − bE(X) + cE(Y).


In general, however, the rule does not hold for multiplication and division.

That is,

E(XY) ≠ E(X)E(Y) and E(X/Y) ≠ E(X)/E(Y).
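A small counterexample shows why the product rule fails in general while the sum rule still holds. The joint distribution below is an illustrative assumption: Y is taken to be an exact copy of X, so the two variables are as dependent as they can be. The helper name `expect` is invented for this sketch.

```python
# Joint distribution {(x, y): probability} in which Y always equals X.
# X takes 0 with probability 0.4 and 3 with probability 0.6 (illustrative numbers).
joint = {(0, 0): 0.4, (3, 3): 0.6}


def expect(func, joint_dist):
    """E[func(X, Y)] for a discrete joint distribution."""
    return sum(func(x, y) * p for (x, y), p in joint_dist.items())


e_x = expect(lambda x, y: x, joint)
e_y = expect(lambda x, y: y, joint)
e_sum = expect(lambda x, y: x + y, joint)
e_product = expect(lambda x, y: x * y, joint)

print("E(X + Y) vs E(X) + E(Y):", round(e_sum, 2), round(e_x + e_y, 2))
print("E(XY) vs E(X)E(Y):", round(e_product, 2), round(e_x * e_y, 2))
```

Here E(X + Y) and E(X) + E(Y) agree, while E(XY) = 5.4 differs from E(X)E(Y) = 3.24.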


Example: Let X and Y be random variables that denote the number of courses taken in the fall and spring semesters, respectively, by students at the Maxwell School. And we have the following probability distributions.

Value of X    1      2      3      4      5      6
Probability   0.05   0.05   0.13   0.26   0.36   0.15

Value of Y    1      2      3      4      5      6
Probability   0.06   0.08   0.15   0.25   0.34   0.12

If we pick a student randomly, what is the expected value of the total number of courses taken across both semesters?


Solution:

The question asks for E(X + Y). Using Rule 2 above, we know that E(X + Y) = E(X) + E(Y). So the expected value is

\[
\begin{aligned}
E(X + Y) &= E(X) + E(Y) \\
&= (1 \times 0.05 + 2 \times 0.05 + 3 \times 0.13 + 4 \times 0.26 + 5 \times 0.36 + 6 \times 0.15) \\
&\quad + (1 \times 0.06 + 2 \times 0.08 + 3 \times 0.15 + 4 \times 0.25 + 5 \times 0.34 + 6 \times 0.12) \\
&= 4.28 + 4.09 \\
&= 8.37.
\end{aligned}
\]
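The arithmetic can be checked in a few lines. The sketch below is only an illustration with invented variable names; note that Rule 2 needs only the two separate distributions, not any information about how X and Y move together.

```python
courses = [1, 2, 3, 4, 5, 6]
p_fall = [0.05, 0.05, 0.13, 0.26, 0.36, 0.15]    # distribution of X
p_spring = [0.06, 0.08, 0.15, 0.25, 0.34, 0.12]  # distribution of Y


def expected_value(values, probs):
    """Weighted average of the values, each weighted by its probability."""
    return sum(x * p for x, p in zip(values, probs))


e_fall = expected_value(courses, p_fall)
e_spring = expected_value(courses, p_spring)
e_total = e_fall + e_spring  # Rule 2: E(X + Y) = E(X) + E(Y)

print(round(e_fall, 2), round(e_spring, 2), round(e_total, 2))
```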


The Variance of a Random Variable

The expected value is a numerical measure of the “center” of a probability distribution.

We also need a measure of the spread or variability of the probability distribution.

=⇒ We use the variance and standard deviation of a random variable.

We write the variance of a random variable X as Var(X) or σ_X^2.


The variance of a discrete random variable X is the expected value of the squared deviations from the mean and is given by the formula

\[
\begin{aligned}
\mathrm{Var}(X) = \sigma_X^2 &= E\big[(X - E(X))^2\big] \\
&= [x_1 - E(X)]^2 p_1 + [x_2 - E(X)]^2 p_2 + \cdots + [x_k - E(X)]^2 p_k \\
&= \sum_{i=1}^{k} (x_i - E(X))^2 p_i.
\end{aligned}
\]

Note that the variance can also be calculated by the following formula:

\[
\mathrm{Var}(X) = E(X^2) - [E(X)]^2.
\]


Let’s prove the alternative formula:

\[
\mathrm{Var}(X) = E(X^2) - [E(X)]^2.
\]

Proof

\[
\begin{aligned}
\mathrm{Var}(X) &= E\big[(X - E(X))^2\big] \\
&= E\big[X^2 - 2XE(X) + [E(X)]^2\big] \\
&= E(X^2) - 2E(X)E(X) + [E(X)]^2 \quad \text{(by the rules of the mean)} \\
&= E(X^2) - [E(X)]^2.
\end{aligned}
\]

In some cases, especially when the mean is not an integer, it may be easier to calculate the variance by using the alternative formula rather than the original formula.


The Standard Deviation of a Random Variable

The standard deviation of a discrete random variable X is given by the formula

\[
\mathrm{SD}(X) = \sigma_X = \sqrt{\mathrm{Var}(X)}.
\]

Question: can you tell the difference between Var(X) and s^2?

Answer:

1. Var(X) indicates the variability that arises from repeated sampling.

2. s^2 indicates the variability in the values among observations in a single sample.


Example: Find the variance and the standard deviation of the random variable X that has the following probability distribution:

Value of X    0     3
Probability   0.4   0.6

Solutions:

1. First, E(X) = 0 × 0.4 + 3 × 0.6 = 1.8.

2. To calculate the variance, I will use the alternative formula:

=⇒ Var(X) = E(X^2) − [E(X)]^2.

3. E(X^2) = 0^2 × 0.4 + 3^2 × 0.6 = 5.4.

4. Var(X) = E(X^2) − [E(X)]^2 = 5.4 − 1.8^2 = 2.16.

5. SD(X) = √Var(X) = √2.16 ≈ 1.47.
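Both variance formulas can be checked against this example. The sketch below is illustrative only; the function names are invented, and it computes the variance once from the definition and once from the alternative formula to confirm that they agree.

```python
from math import sqrt


def mean(values, probs):
    """E(X) for a discrete distribution."""
    return sum(x * p for x, p in zip(values, probs))


def variance_definition(values, probs):
    """Expected squared deviation from the mean."""
    mu = mean(values, probs)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probs))


def variance_shortcut(values, probs):
    """Alternative formula: E(X^2) - [E(X)]^2."""
    mean_of_squares = sum(x ** 2 * p for x, p in zip(values, probs))
    return mean_of_squares - mean(values, probs) ** 2


values = [0, 3]
probs = [0.4, 0.6]

var_def = variance_definition(values, probs)
var_alt = variance_shortcut(values, probs)
sd = sqrt(var_alt)

print(round(var_def, 2), round(var_alt, 2), round(sd, 2))
```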


The Rules of the Variance

Some rules for the variance.

Rule 1: If X is a random variable, and a and b are fixed numbers, then

Var(a + bX) = b^2 Var(X).

=⇒ Notice that the constant a disappears and b comes out as a squared term.

Rule 2: If X and Y are random variables, then

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

and

Var(X − Y) = Var(X) + Var(Y) − 2Cov(X, Y).

=⇒ Notice the sign right before the covariance term.
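Rules 1 and 2 can be verified numerically on a small joint distribution. The probabilities below are made up for illustration, and the covariance is computed from its standard definition, Cov(X, Y) = E[(X − E(X))(Y − E(Y))], which this lecture uses but does not spell out. The helper names are invented for the sketch.

```python
# Hypothetical joint distribution {(x, y): probability}; the numbers are made up.
joint = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}


def expect(func):
    """E[func(X, Y)] under the joint distribution above."""
    return sum(func(x, y) * p for (x, y), p in joint.items())


def variance(func):
    """Var[func(X, Y)], computed as the expected squared deviation."""
    center = expect(func)
    return expect(lambda x, y: (func(x, y) - center) ** 2)


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = variance(lambda x, y: x)
var_y = variance(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Rule 1 with a = 5 and b = -2: Var(5 - 2X) should equal (-2)^2 * Var(X).
var_linear = variance(lambda x, y: 5 - 2 * x)

# Rule 2: variances of the sum and the difference, computed directly.
var_sum = variance(lambda x, y: x + y)
var_diff = variance(lambda x, y: x - y)

print(round(var_linear, 4), round(4 * var_x, 4))
print(round(var_sum, 4), round(var_x + var_y + 2 * cov_xy, 4))
print(round(var_diff, 4), round(var_x + var_y - 2 * cov_xy, 4))
```

Each printed pair should match: the left value is computed directly from the distribution and the right value from the rule.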


Rule 3: If X and Y are “independent” random variables, then

Var(X + Y) = Var(X) + Var(Y)

and

Var(X − Y) = Var(X) + Var(Y).

=⇒ Notice that the covariance term disappears and that the two variances are added in both cases, even for the difference X − Y.

=⇒ Comparing Rule 3 with Rule 2 shows that if X and Y are independent, then Cov(X, Y) = 0, and therefore Corr(X, Y) = 0.
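Rule 3 can be checked by building a joint distribution in which X and Y are independent, meaning each joint probability is the product of the two individual probabilities. The two distributions below are illustrative assumptions, and the helper names are invented for this sketch.

```python
from itertools import product

# Individual distributions {value: probability}; the numbers are illustrative.
dist_x = {0: 0.4, 3: 0.6}
dist_y = {1: 0.5, 2: 0.5}

# Under independence, P(X = x, Y = y) = P(X = x) * P(Y = y).
joint = {(x, y): dist_x[x] * dist_y[y] for x, y in product(dist_x, dist_y)}


def expect(func):
    """E[func(X, Y)] under the joint distribution above."""
    return sum(func(x, y) * p for (x, y), p in joint.items())


def variance(func):
    """Var[func(X, Y)], computed as the expected squared deviation."""
    center = expect(func)
    return expect(lambda x, y: (func(x, y) - center) ** 2)


mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = variance(lambda x, y: x)
var_y = variance(lambda x, y: y)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))
var_sum = variance(lambda x, y: x + y)
var_diff = variance(lambda x, y: x - y)

print(round(cov_xy, 10))
print(round(var_sum, 4), round(var_diff, 4), round(var_x + var_y, 4))
```

The covariance comes out as zero up to floating-point error, and the variance of the sum and of the difference both equal Var(X) + Var(Y).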


Independence Between Two Random Variables

The notion of independence between two random variables is very important in statistics, especially when you learn econometrics.

Suppose X is a random variable that indicates whether a student in the Maxwell School is taking PAI 721 (Introduction to Statistics).

And suppose that Y is a random variable that denotes whether the student is in the MPA program or not.

Are the two random variables X and Y independent?


Students in the MPA program are required to take PAI 721.

So if you know the value of Y for Student A, then you are more likely to know the value of X for this student.

That is, the information you have about the random variable Y is helpful for determining the value of the random variable X.

Hence, in this case, we say that the two random variables X and Y are not independent, or we say that the two random variables are dependent.


On the other hand, suppose X denotes the toe size of students in the Maxwell School.

Knowing whether a student is in the MPA program will not help us determine the toe size of that student.

So in this case, we can reasonably assume that X and Y are independent.
