SADC Course in Statistics Probability Distributions (Session 04)

Upload: sydney-kirk

Post on 28-Mar-2015

Page 1: SADC Course in Statistics Probability Distributions (Session 04)

SADC Course in Statistics

Probability Distributions

(Session 04)

Page 2: SADC Course in Statistics Probability Distributions (Session 04)

Learning Objectives

At the end of this session you will be able to:

• solve basic problems concerning real-valued probability distributions.

• distinguish between discrete and continuous random variables (r.v.’s).

• explain what is meant by a probability distribution.

• calculate the population mean and variance of a given distribution.

Page 3: SADC Course in Statistics Probability Distributions (Session 04)

Session Contents

In this session you will

• be introduced to the theory of probability distributions.

• be shown how to build a firm foundation in the theory of probability distributions, in preparation for applications in statistical inference (Module H2).

• strengthen the mathematical skills that are required to deal correctly with probability ideas.

Page 4: SADC Course in Statistics Probability Distributions (Session 04)

Random variables

• In the previous two sessions we dealt with probabilities of events.

• In practice events of interest are those generated by random variables.

• A random variable is a variable that associates outcomes in the sample space with numerical values.

Page 5: SADC Course in Statistics Probability Distributions (Session 04)

An example – birth of a baby

[Figure: the sample space {Boy, Girl} mapped by X onto a line showing a numerical scale, with Boy at 0 and Girl at 1.]

The figure above depicts a random variable X defined as

X = 0 if outcome is a boy
X = 1 if outcome is a girl

Page 6: SADC Course in Statistics Probability Distributions (Session 04)

A second example:

Often the outcomes are actual measurements.

Thus, we could have a random variable Y which records the weights of, say, maize cobs as numbers, with kilograms as units.

Outcomes of any experiment can be recorded as real numbers by defining an appropriate random variable.

We do this because it is easier to work with numbers.

Page 7: SADC Course in Statistics Probability Distributions (Session 04)

Types of random variables

• A random variable is said to be discrete if the set of possible values is countable. Examples of discrete random variables are those that record events such as gender, family size, number of traffic offenses, etc.

• A random variable is said to be continuous if the set of possible values is not countable. Examples of continuous random variables are those that record events such as weight, height, time, etc.

Page 8: SADC Course in Statistics Probability Distributions (Session 04)

Continuous to discrete?

• Continuous random variables can be mapped into discrete random variables by grouping.

• For example, age X is a continuous random variable since it is a measure of time since birth.

• We can define a discrete random variable Y as

Y = 1 if 0 ≤ X < 5
  = 2 if 5 ≤ X < 10
  = 3 if 10 ≤ X < 15
  = etc.

• You cannot convert a discrete random variable into a continuous one.
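The grouping of age X into the discrete variable Y can be sketched in Python. This is only an illustration: it assumes the five-year bands continue in the same pattern, and the function name `age_group` and its `width` parameter are hypothetical, not part of the course material.

```python
import math


def age_group(age_years: float, width: float = 5.0) -> int:
    """Map a continuous age X (in years) to the discrete band number Y."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    # 0 <= X < 5 gives 1, 5 <= X < 10 gives 2, 10 <= X < 15 gives 3, ...
    return math.floor(age_years / width) + 1


for x in (0.0, 4.99, 5.0, 12.3):
    print(x, "->", age_group(x))
```

Many different ages map to the same band, which is why the grouping cannot be reversed to recover X from Y.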

Page 9: SADC Course in Statistics Probability Distributions (Session 04)

Probability distributions

• A probability distribution is a table, a function or a graph that presents possible outcomes of a trial, say E (e.g. throw of a die), together with their corresponding probabilities.

• Note that the outcome probabilities must sum to 1 since occurrence of E results in exactly one outcome.

Page 10: SADC Course in Statistics Probability Distributions (Session 04)

An example

• The following is an example of a probability distribution for the gender of a newborn child:

Outcome   Value x of the random variable X   P(x)
Male      0                                  0.5
Female    1                                  0.5
Total                                        1

Page 11: SADC Course in Statistics Probability Distributions (Session 04)

Probability mass/density function

A probability distribution can sometimes be specified using a function f called a probability (mass/density) function.

The function f must satisfy the following conditions:

1. $f(x) \ge 0$ for all $x$.

2. $\sum_{\text{all } x} f(x) = 1$ or $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Page 12: SADC Course in Statistics Probability Distributions (Session 04)

Points to note:

The function P(x) on slide 10 is a probability mass function since it satisfies the two conditions on slide 11.

Point 1 of slide 11 reflects the first law of probability, as it must, since P(x) represents a probability.

Point 2 of slide 11 indicates that the sum is used if the set of values x is countable; otherwise the integral applies.
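For a discrete distribution, the two conditions (f(x) ≥ 0 for all x, and the values of f(x) summing to 1) can be checked mechanically. The sketch below is a minimal illustration; the helper name `is_valid_pmf`, the tolerance, and the fair-die example are assumptions made here, not part of the slides.

```python
import math


def is_valid_pmf(pmf: dict, tol: float = 1e-9) -> bool:
    """Check the two conditions for a probability mass function."""
    # Condition 1: every probability is non-negative.
    non_negative = all(p >= 0 for p in pmf.values())
    # Condition 2: the probabilities sum to 1 (up to rounding error).
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one


gender_pmf = {0: 0.5, 1: 0.5}                    # slide 10: Male -> 0, Female -> 1
die_pmf = {face: 1 / 6 for face in range(1, 7)}  # a fair die (assumed example)
not_a_pmf = {0: 0.7, 1: 0.5}                     # sums to 1.2, so it fails

print(is_valid_pmf(gender_pmf), is_valid_pmf(die_pmf), is_valid_pmf(not_a_pmf))
```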

Page 13: SADC Course in Statistics Probability Distributions (Session 04)

Expected values

The weighted “centre” of a probability distribution is called the expected value, written E(X). More formally, the expected value of a random variable X is defined as:

$$E(X) = \sum_{\text{all } x} x f(x),$$

in the discrete case, and

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

in the continuous case.

E(X) is also called the population mean and is usually denoted by $\mu$.

Page 14: SADC Course in Statistics Probability Distributions (Session 04)

Example (i)

• If f(x) is given by

x       f(x) = Prob(x)
0       0.5
1       0.5
Total   1

then E(X) = 0(0.5) + 1(0.5) = 0.5
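The same calculation can be written as a short Python sketch of the discrete definition of E(X), the sum of x·f(x) over all x. The function name `expected_value` and the dictionary representation of f are illustrative choices only.

```python
def expected_value(pmf: dict) -> float:
    """E(X): the sum of x * f(x) over all values x."""
    return sum(x * p for x, p in pmf.items())


f = {0: 0.5, 1: 0.5}      # the distribution in Example (i)
print(expected_value(f))  # 0.5
```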

Page 15: SADC Course in Statistics Probability Distributions (Session 04)

Example (ii)

Let f(x) = 2x, for 0 ≤ x ≤ 1. Then

$$E(X) = \int_0^1 x f(x)\,dx = \int_0^1 2x^2\,dx = \left[\frac{2x^3}{3}\right]_0^1 = \frac{2}{3}.$$
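The integral for E(X) with f(x) = 2x on 0 ≤ x ≤ 1 can be checked numerically. The sketch below uses a simple midpoint rule, which is a technique chosen here for illustration and not something the slides prescribe; the helper name `midpoint_integral` is hypothetical.

```python
def midpoint_integral(g, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def f(x: float) -> float:
    """Density from Example (ii): f(x) = 2x on [0, 1]."""
    return 2 * x


total_probability = midpoint_integral(f, 0.0, 1.0)         # should be close to 1
mean = midpoint_integral(lambda x: x * f(x), 0.0, 1.0)     # should be close to 2/3

print(total_probability, mean)
```

The first value confirms that f integrates to 1, so it is a valid density; the second approximates E(X) = 2/3.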

Page 16: SADC Course in Statistics Probability Distributions (Session 04)

Moments

• The k-th moment of a random variable X is defined as:

$$E(X^k) = \sum_{x=0}^{\infty} x^k f(x),$$

in the discrete case, and

$$E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx,$$

in the continuous case.

The moments of a distribution characterize the shape of a distribution. The notation $\mu_k$ is often used to denote the k-th moment.
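As a rough illustration of the discrete case, the sketch below computes the first few moments E(X^k) of a fair six-sided die. The die example and the function name `moment` are assumptions made here; exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def moment(pmf: dict, k: int) -> Fraction:
    """k-th moment E(X^k): the sum of x**k * f(x) over all values x."""
    return sum((x ** k) * p for x, p in pmf.items())


# A fair die: each face 1..6 has probability 1/6 (assumed example).
die = {face: Fraction(1, 6) for face in range(1, 7)}

for k in (1, 2, 3):
    print(k, moment(die, k))
```

The first moment is the mean E(X); the second moment E(X^2) is the quantity used when computing a variance.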

Page 17: SADC Course in Statistics Probability Distributions (Session 04)

Class exercise

Suppose a coin is tossed twice.

(a) Write down the possible values for the random variable X defined as:

X = number of heads that occur

(b) Prepare a table showing the probability distribution function of X

(c) Use this table to determine the expected value of X

Page 18: SADC Course in Statistics Probability Distributions (Session 04)

Measures of spread

• The variance of a random variable X is defined as

$$\mathrm{Var}(X) = E[(X - \mu)^2] = E(X^2) - \mu^2.$$

• Notice that $E(X^2)$ is the second moment of X.

• The variance of X is also called the population variance and is denoted by $\sigma^2$.

• The square root of the variance is called the standard deviation of X. It is denoted by $\sigma$.
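The two equivalent forms of the variance, the mean squared deviation from the mean and the second moment minus the squared mean, can be compared in a short sketch. It uses the gender distribution from slide 10 (values 0 and 1, each with probability 0.5); the function names are illustrative only.

```python
import math


def mean(pmf: dict) -> float:
    """Population mean: the sum of x * f(x)."""
    return sum(x * p for x, p in pmf.items())


def variance_by_definition(pmf: dict) -> float:
    """Mean squared deviation from the mean."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())


def variance_by_moments(pmf: dict) -> float:
    """Second moment minus the squared mean."""
    second_moment = sum(x ** 2 * p for x, p in pmf.items())
    return second_moment - mean(pmf) ** 2


f = {0: 0.5, 1: 0.5}
var = variance_by_definition(f)
print(var, variance_by_moments(f), math.sqrt(var))  # variance twice, then the standard deviation
```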

Page 19: SADC Course in Statistics Probability Distributions (Session 04)

Patterns for differing variances

[Figure: distributions plotted for $\sigma^2 = 0.25$ and $\sigma^2 = 1$.]

Note that the bigger the variance, the larger the spread.

Page 20: SADC Course in Statistics Probability Distributions (Session 04)

Skewness and kurtosis

• If the probability distribution is not symmetrical about the mean it is said to be skew. The distribution has a positive skewness if the tail of high values is longer than the tail of low values, and negative skewness if the reverse is true.

• Kurtosis is a measure of the peakedness of a probability distribution. It is usually used as a comparison with the normal distribution (see later sessions), which has a kurtosis of 3: a kurtosis of more than 3 indicates that the distribution has a higher peak than the normal distribution.
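The slide does not give formulas for these two measures. One common convention, stated here as an assumption about which definition the course intends, expresses them as standardized third and fourth central moments, where the mean and standard deviation of X are written as mu and sigma:

```latex
\text{skewness} = \frac{E\left[(X - \mu)^3\right]}{\sigma^3},
\qquad
\text{kurtosis} = \frac{E\left[(X - \mu)^4\right]}{\sigma^4}.
```

Under this convention a symmetric distribution has skewness 0 and the normal distribution has kurtosis 3, which is why 3 is the reference value for comparison.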

Page 21: SADC Course in Statistics Probability Distributions (Session 04)

Cumulative probability distribution

• In many applications we want to calculate probabilities of the type P(X≤k) or P(X>k) instead of P(X=k).

• The probabilities P(X≤k) for k = 0, 1, 2, .. provide an example of what is called the cumulative distribution of a random variable X.

• Here, the random variable X is discrete.

Page 22: SADC Course in Statistics Probability Distributions (Session 04)

Some results

• P(X > k) = 1 − P(X ≤ k). This follows directly from the probability result that $P(A^c) = 1 - P(A)$.

• If a < b then the event {X ≤ a} is a sub-event of the event {X ≤ b}. Hence P(X ≤ a) ≤ P(X ≤ b).

• Similar results can be obtained for continuous random variables.

Page 23: SADC Course in Statistics Probability Distributions (Session 04)

Definition of F(x)

• The cumulative distribution at x, denoted F(x), is formally defined as:

$$F(x) = \sum_{y=0}^{x} f(y),$$

in the discrete case for a non-negative random variable, and

$$F(x) = \int_{-\infty}^{x} f(y)\,dy,$$

in the continuous case.

By definition, the cumulative distribution is a non-decreasing function having certain properties. These are shown below.

Page 24: SADC Course in Statistics Probability Distributions (Session 04)

Results concerning F(x)

• F(−∞) = 0.

• F(+∞) = 1. This says that the total area under the probability density function is 1.

• F(a) ≤ F(b) for a < b. Thus F is a non-decreasing function.

• P(a < X ≤ b) = F(b) − F(a).

• P(X = x) = 0 for every point x if X is a continuous random variable.

Page 25: SADC Course in Statistics Probability Distributions (Session 04)

An example using F(x) - discrete

A discrete r.v. X, representing the number of girls in families with 5 children, has the following distribution:

X = No. of girls   P(X = x)   F(x)
0                  0.03125
1                  0.15625
2                  0.31250
3                  0.31250
4                  0.15625
5                  0.03125

What is the probability of 4 girls or fewer?

Complete the table with values of F(x)
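One way to fill in the F(x) column is to take running totals of P(X = x), since F(x) = P(X ≤ x). The sketch below does this with `itertools.accumulate` from the Python standard library; the variable names are illustrative, and it is meant as a way to check hand calculations, not as the course's prescribed method.

```python
from itertools import accumulate

# P(X = x) for x = 0, 1, ..., 5 girls in a family of 5 children.
pmf = [0.03125, 0.15625, 0.31250, 0.31250, 0.15625, 0.03125]

# F(x) = P(X <= x) is the running total of the probabilities.
cdf = list(accumulate(pmf))

print("x  P(X=x)   F(x)")
for x, (p, big_f) in enumerate(zip(pmf, cdf)):
    print(f"{x}  {p:.5f}  {big_f:.5f}")

# Probability of 4 girls or fewer, read directly from F(4).
p_at_most_4 = cdf[4]
```

The value F(4) can also be obtained as 1 − P(X = 5), which is a quick cross-check on the table.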

Page 26: SADC Course in Statistics Probability Distributions (Session 04)

An example using F(x) - continuous

A continuous r.v. X has probability density function given by

$$f(x) = e^{-x}, \quad x > 0.$$

What is its cumulative distribution function?

Answer:

$$F(x) = \int_0^x e^{-y}\,dy = 1 - e^{-x}.$$
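For the density f(x) = e^(−x), x > 0, the cumulative distribution function is F(x) = 1 − e^(−x). The sketch below checks numerically that the area under f between two points a and b agrees with F(b) − F(a). The midpoint rule, the helper names, and the values a = 0.5 and b = 2.0 are all choices made here for illustration.

```python
import math


def pdf(x: float) -> float:
    """Density f(x) = exp(-x) for x > 0, and 0 otherwise."""
    return math.exp(-x) if x > 0 else 0.0


def cdf(x: float) -> float:
    """Cumulative distribution F(x) = 1 - exp(-x) for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


def midpoint_integral(g, a: float, b: float, n: int = 10_000) -> float:
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


a, b = 0.5, 2.0
area = midpoint_integral(pdf, a, b)  # numerical P(a < X <= b)
via_cdf = cdf(b) - cdf(a)            # the same probability from F

print(area, via_cdf)
```

The two printed values should agree to several decimal places.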

Page 27: SADC Course in Statistics Probability Distributions (Session 04)

Practical work follows to ensure learning objectives are achieved…