Random Variables and Probability Distributions
David Tenenbaum – GEOG 090 – UNC-CH Spring 2005

• The concept of probability is the key to making statistical inferences by sampling a population

• What we are doing is trying to ascertain the probability of an event having a given outcome, e.g. we summarize a sample statistically and want to make some inferences about the population, such as what proportion of the population has values within a given range; we could do this by finding the area under the curve in a frequency distribution

• This requires us to be able to specify the distribution of a variable before we can make inferences


Random Variables and Probability Distributions

• Previously, we looked at some proportions of area under the normal curve:

[Figure: proportions of area under the normal curve]

Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 100.


Random Variables and Probability Distributions

• BUT before we can use the normal curve to draw inferences about some sample, we have to find out if this is the right distribution for our variable …

• While many natural phenomena are normally distributed, there are other phenomena that are best described using other distributions

• This section of the course will begin with some background on probabilities (terminology & rules), and then we will examine a few useful distributions:
  • Discrete distributions: Binomial and Poisson
  • Continuous distributions: Normal and its relatives

Probability – Some Definitions

• Probability – Refers to the likelihood that something (an event) will have a certain outcome

• An Event – Any phenomenon you can observe that can have more than one outcome (e.g. flipping a coin)

• An Outcome – Any unique condition that can be the result of an event (e.g. the available outcomes when flipping a coin are heads and tails), a.k.a. simple events or sample points

• Sample Space – The set of all possible outcomes associated with an event (e.g. the sample space for flipping a coin includes heads and tails)

Probability – An Example

• For example, suppose we have a data set where, in six cities, we count the number of malls present in each city:

City   # of Malls
1      1
2      4
3      4
4      4
5      2
6      3

• We might wonder: if we randomly pick one of these six cities, what is the chance that it will have n malls?

• Each count of the # of malls in a city is an event; the distinct counts observed (1, 2, 3, and 4 malls) are the four outcomes, which together make up the sample space

Random Variables and Probability Functions

• What we have here is a random variable – defined as a variable X whose values xi are sampled randomly from a population

• To put this another way, a random variable is a function defined on the sample space; this means that we are interested in all the possible outcomes

• The question was: if we randomly pick one of the six cities, what is the chance that it will have n malls?

Random Variables and Probability Functions

• To answer this question, we need to form a probability function (a.k.a. probability distribution) from the sample space that gives all values of a random variable and their probabilities

• A probability distribution expresses the relative number of times we expect a random variable to assume each and every possible value

• We either base a probability function on a very large, empirically gathered set of outcomes, or else we determine the shape of a probability function mathematically

Probability – An Example, Part II

• Here, the values of xi are drawn from the four outcomes, and their probabilities are the number of events with each outcome divided by the total number of events:

City   # of Malls
1      1
2      4
3      4
4      4
5      2
6      3

xi    P(xi)
1     1/6 = 0.167
2     1/6 = 0.167
3     1/6 = 0.167
4     3/6 = 0.5

• The probability of an outcome: P(xi) = (# of times the outcome occurred) / (total number of events)
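As a quick illustration, here is a minimal Python sketch (Python, and the variable names below, are illustrative additions rather than part of the original slides) that builds this empirical probability function from the mall counts:

from collections import Counter

mall_counts = [1, 4, 4, 4, 2, 3]      # number of malls in each of the six cities
n_events = len(mall_counts)           # total number of events (cities)

counts = Counter(mall_counts)         # how many events had each outcome
pmf = {x: counts[x] / n_events for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p:.3f}")
# P(X = 1) = 0.167, P(X = 2) = 0.167, P(X = 3) = 0.167, P(X = 4) = 0.500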

Probability – An Example, Part III

• We can plot this probability distribution as a probability mass function:

xi    P(xi)
1     1/6 = 0.167
2     1/6 = 0.167
3     1/6 = 0.167
4     3/6 = 0.5

[Plot: P(xi) against xi for xi = 1, 2, 3, 4, with the vertical axis running from 0 to 0.50]

• This plot uses thin lines to denote that the probabilities are massed at discrete values of this random variable
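One way such a plot might be drawn is sketched below (matplotlib is an assumption; the slides do not name a plotting tool); the stems emphasize that the probability is massed at discrete values:

import matplotlib.pyplot as plt

x_vals = [1, 2, 3, 4]                 # possible values of the random variable
probs = [1/6, 1/6, 1/6, 3/6]          # P(xi) for each value

plt.stem(x_vals, probs)               # thin vertical lines at each discrete value
plt.xlabel("xi")
plt.ylabel("P(xi)")
plt.ylim(0, 0.6)
plt.show()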

Discrete Random Variables

• The random variable from our example is a discrete random variable, because it has a finite number of values (i.e. a city can have 1 or 2 malls, but it cannot have 1.5 malls)

• Any variable that is generated by counting a whole number of things is likely to be a discrete variable (e.g. # of coin tosses in a row with heads, questionnaire responses where one of a set of ordinal categories must be chosen, etc.)

• A discrete random variable can be described by a probability mass function

Probability Mass Functions

• Probability mass functions have the following rules that dictate their possible values:

1. The probability of any outcome must be greater than or equal to zero and must also be less than or equal to one, i.e.

   0 ≤ P(xi) ≤ 1   for i = {1, 2, 3, …, k-1, k}

2. The sum of all probabilities in the sample space must total one, i.e.

   Σ (i = 1 to k) P(xi) = 1
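A minimal sketch that checks both rules for the mall-count distribution (the dictionary representation is an illustrative assumption):

import math

pmf = {1: 1/6, 2: 1/6, 3: 1/6, 4: 3/6}            # outcome -> probability

rule_1 = all(0 <= p <= 1 for p in pmf.values())   # every P(xi) lies in [0, 1]
rule_2 = math.isclose(sum(pmf.values()), 1.0)     # probabilities sum to 1
print(rule_1, rule_2)                             # True True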

Discrete Random Variables

• We can calculate the mean of a discrete probability distribution by taking all possible values of the variable, multiplying them by their probabilities, and summing over the values:

   µ = Σ (i = 1 to k) xi * P(xi)

• The symbol µ is used here rather than x̄ because the basic idea of a probability distribution is to use a large number of values to approach a stable estimate of the parameter
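A sketch of this calculation for the mall example (same illustrative dictionary as before):

pmf = {1: 1/6, 2: 1/6, 3: 1/6, 4: 3/6}

mu = sum(x * p for x, p in pmf.items())   # µ = Σ xi * P(xi)
print(mu)                                 # 3.0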

Discrete Random Variables

• We can also calculate the variance of a discrete probability distribution by taking the squared deviation of each possible value from the mean, multiplying it by that value’s probability, and summing over the values:

   σ² = Σ (i = 1 to k) (xi – µ)² * P(xi)

• These formulae are only useful for discrete probability distributions; for continuous probability distributions a different method is required
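And the corresponding sketch for the variance of the same distribution:

pmf = {1: 1/6, 2: 1/6, 3: 1/6, 4: 3/6}

mu = sum(x * p for x, p in pmf.items())
sigma_sq = sum((x - mu) ** 2 * p for x, p in pmf.items())   # σ² = Σ (xi – µ)² * P(xi)
print(sigma_sq)                                             # 1.333…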

Continuous Random Variables

• A continuous random variable can assume all real number values within an interval, for example: measurements of precipitation, pH, etc.

• Some random variables that are technically discrete exhibit such a tremendous range of values that it is desirable to treat them as if they were continuous variables, e.g. population

• Discrete random variables are described by probability mass functions, and continuous random variables are described by probability density functions

Probability Density Functions

• Probability density functions are defined using the same rules required of probability mass functions, with some additional requirements:

1. The function must have a non-negative value throughout the interval a to b, i.e.

   f(x) ≥ 0 for a ≤ x ≤ b

2. The area under the curve defined by f(x), within the interval a to b, must equal 1, i.e.

   ∫ (a to b) f(x) dx = 1

[Figure: a density curve f(x) over the interval from a to b, with area = 1 under the curve]

Probability Density Functions

• Theoretically, a continuous variable’s range can extend from negative infinity to infinity, e.g. the normal distribution:

[Figure: the normal curve f(x), with total area = 1 under the curve]

• The tails of the normal distribution’s curve extend infinitely in each direction, but the value of f(x) approaches zero asymptotically, getting closer and closer, but never reaching zero

Probability Density Functions

• Suppose we are interested in computing the probability of a continuous random variable falling within a range of values bounded by lower limit c and upper limit d, within the interval a to b

[Figure: a density curve f(x) over the interval a to b, with the area between c and d shaded – how can we find the probability of a value occurring between c and d?]

• We need to calculate the shaded area; if we know the density function, we could use calculus:

   P(c ≤ x ≤ d) = ∫ (c to d) f(x) dx

Probability Density Functions

• Fortunately, we do not need to solve the integral ourselves to practice statistics … instead, if we can match the f(x) up to some known distribution, we can use a table of probabilities that someone else has developed

• Tables A.2 through A.6 in the epilogue of the Rogerson text (pp. 214-221) give probability values for several distributions, including the normal distribution and some related distributions used by various inferential statistics (you can find tables like these at the end of most statistics texts)
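In practice, statistical software can play the same role as the printed tables. A sketch using the standard normal distribution (scipy is an assumption here, and the limits c and d are arbitrary illustrative values):

from scipy.stats import norm

c, d = -1.0, 1.0                        # illustrative limits in standard (z) units
area = norm.cdf(d) - norm.cdf(c)        # P(c ≤ x ≤ d) = area under f(x) from c to d
print(round(area, 4))                   # about 0.6827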

Probability Density Functions

• Suppose we are interested in computing the probability of a continuous random variable at a certain value of x (e.g. at d):

[Figure: a density curve f(x) over the interval a to b, with the single value d marked]

• Can we find the probability of a value occurring exactly at d? P(d) = ?
• No, P(d) = 0 … why?
• The reason is:

   P(c ≤ x ≤ d) → 0 as c → d

• To put this another way, as the interval from c to d becomes vanishingly narrow, the area below the curve within it becomes vanishingly small
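A small numeric illustration of this limit, again using the standard normal density via scipy (an assumption):

from scipy.stats import norm

d = 1.0
for width in (0.5, 0.05, 0.005, 0.0005):
    c = d - width
    print(width, norm.cdf(d) - norm.cdf(c))   # the area shrinks toward 0 as c approaches d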

Probability Rules

• Now that we have described how to apply individual unconditional probabilities, we can move on to looking at the rules for combining multiple probabilities

• A useful aid when discussing probabilities is the Venn diagram, which depicts multiple probabilities and their relationships using a graphical depiction of sets:

[Venn diagram: a rectangle containing two circles labelled A and B]

• The rectangle that forms the area of the Venn diagram represents the sample (or probability) space, which we have defined above
• Figures that appear within the sample space are sets that represent events in the probability context, and their area is proportional to their probability (full sample space = 1)

Probability Rules

• We can use a Venn diagram to describe the relationships between two sets or events, and the corresponding probabilities:

[Venn diagram: two overlapping circles A and B; the union is the full region covered by both circles, the intersection is the overlap]

• The union of sets A and B (written symbolically as A ∪ B) is represented by the areas enclosed by sets A and B together, and can be expressed by OR (i.e. the union of the two sets includes any location in A or B)
• The intersection of sets A and B (written symbolically as A ∩ B) is the area that is overlapped by both the A and B sets, and can be expressed by AND (i.e. the intersection of the two sets includes only locations in both A AND B)

Probability Rules

• If the sets A and B do not overlap in the Venn diagram, the sets are disjoint, and this represents a case of two mutually exclusive events

[Venn diagram: two non-overlapping circles A and B]

• The union of sets A and B here uses the addition rule, where

   P(A ∪ B) = P(A) + P(B)

• You can think of this in terms of the areas of the events, where the union in this case is simply the sum of the areas
• The intersection of sets A and B here is the empty set (symbolized by ∅), because at no point do the circles overlap (no overlap area as there was in the previous Venn diagram); thus

   A ∩ B = ∅, so P(A ∩ B) = 0

• Unconditional probabilities: the outcome of one event does not affect the other

Probability Rules

• For example, suppose set A represents a roll of 1 or 2 on a 6-sided die, so P(A) = 2/6, and set B represents a roll of 3 or 4, so P(B) = 2/6:

• The union of sets A and B here uses the addition rule:

   P(A ∪ B) = P(A) + P(B)
   P(A ∪ B) = 2/6 + 2/6
   P(A ∪ B) = 4/6 = 2/3 ≈ 0.667

• The outcomes represented here are mutually exclusive, thus there is no intersection between sets A and B: A ∩ B = ∅, so P(A ∩ B) = 0
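A minimal sketch that verifies this by enumerating the sample space of a fair six-sided die (the set representation is an illustrative choice):

from fractions import Fraction

sample_space = set(range(1, 7))
A = {1, 2}                      # roll of 1 or 2
B = {3, 4}                      # roll of 3 or 4

def prob(event):
    return Fraction(len(event & sample_space), len(sample_space))

print(prob(A | B))              # 2/3, matching P(A) + P(B) = 2/6 + 2/6
print(prob(A & B))              # 0: the events are mutually exclusive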

Probability Rules

• If the sets A and B do overlap in the Venn diagram, the sets are not mutually exclusive, and this represents a case of independent, but not exclusive, events

[Venn diagram: two overlapping circles A and B]

• The union of sets A and B here is

   P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

because we do not wish to count the intersection area twice; thus we need to subtract it from the sum of the areas of A and B when taking the union of a pair of overlapping sets
• The intersection of sets A and B here is calculated by taking the product of the two probabilities (for independent events), a.k.a. the multiplication rule:

   P(A ∩ B) = P(A) * P(B)

Probability Rules

• Consider set A to give the chance of precipitation at P(A) = 0.4 and set B to give the chance of below-freezing temperatures at P(B) = 0.7

• The intersection of sets A and B here is

   P(A ∩ B) = P(A) * P(B)
   P(A ∩ B) = 0.4 * 0.7 = 0.28

   This expresses the chance of snow at P(A ∩ B) = 0.28

• The union of sets A and B here is

   P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
   P(A ∪ B) = 0.4 + 0.7 – 0.28 = 0.82

   This expresses the chance of below-freezing temperatures or precipitation occurring at P(A ∪ B) = 0.82
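The same arithmetic as a short sketch (assuming, as the slides do, that the two events are independent):

p_precip = 0.4                 # P(A): chance of precipitation
p_freezing = 0.7               # P(B): chance of below-freezing temperatures

p_snow = p_precip * p_freezing                # P(A ∩ B), the multiplication rule
p_either = p_precip + p_freezing - p_snow     # P(A ∪ B), the addition rule
print(round(p_snow, 2), round(p_either, 2))   # 0.28 0.82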

Probability Rules

• Consider set A to give the chance of precipitation at P(A) = 0.4 and set B to give the chance of below-freezing temperatures at P(B) = 0.7

[Venn diagrams: set A with its complement A’ shaded; sets A and B with the region outside A ∪ B shaded]

• The complement of set A is

   P(A’) = 1 - P(A)
   P(A’) = 1 – 0.4 = 0.6

   This expresses the chance of it not raining or snowing at P(A’) = 0.6

• The complement of the union of sets A and B is

   P(A ∪ B)’ = 1 – [P(A) + P(B) - P(A ∩ B)]
   P(A ∪ B)’ = 1 – [0.4 + 0.7 – 0.28] = 0.18

   This expresses the chance of it neither raining nor being below freezing at P(A ∪ B)’ = 0.18
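And the complements, continuing the same illustrative sketch:

p_precip = 0.4
p_freezing = 0.7
p_union = p_precip + p_freezing - p_precip * p_freezing   # P(A ∪ B)

p_no_precip = 1 - p_precip          # P(A’) = 0.6
p_neither = 1 - p_union             # P((A ∪ B)’) = 0.18
print(round(p_no_precip, 2), round(p_neither, 2))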

Probability Rules

• We can also encounter the situation where set A is fully contained within set B, which is equivalent to saying that set A is a subset of set B:

[Venn diagram: circle A nested entirely inside circle B]

• In probability terms, this situation occurs when outcome B is a necessary precondition for outcome A to occur, although not vice-versa (in which case set B would be contained in set A instead)

• For example, set A might represent rolling a 5 using a 6-sided die, while set B denotes any roll greater than 3; A is contained within B because any time A occurs, B occurs as well
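A final sketch of the subset case, reusing the die representation from earlier (the sets are illustrative):

from fractions import Fraction

sample_space = set(range(1, 7))
A = {5}                          # rolling a 5
B = {4, 5, 6}                    # any roll greater than 3

print(A <= B)                                        # True: A is a subset of B
print(A & B == A)                                    # True: whenever A occurs, B occurs
print(Fraction(len(A), 6), Fraction(len(B), 6))      # P(A) = 1/6, P(B) = 1/2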