3 discrete random variables and probability distributions

Upload: renukadevi-rpt

Post on 11-Oct-2015

759 views

Category:

Documents


4 download

DESCRIPTION

applied statistics

TRANSCRIPT

  • 1

    3

    Contents 3.1 Discrete Random Variables ..................................................................................... 1

    3.2 Probability Distributions and Probability Mass Functions ......................................... 2

    3.3 Cumulative Distribution Functions ........................................................................... 4

    3.4 Mean and Variance of a Discrete Random Variable ................................................ 6

    3.6 Binomial Distribution ............................................................................................. 11

    3.7 Geometric Distributions ......................................................................................... 17

    3.8 Hyper Geometric Distribution ................................................................................ 21

    3.9 Poisson Distribution .............................................................................................. 23

    3.1 Discrete Random Variables Many physical systems can be modeled by the same or similar random experiments and

    random variables. The distribution of the random variables involved in each of these

    common systems can be analyzed, and the results of that analysis can be used in different

    applications and examples. In this chapter, we present the analysis of several random

    experiments and discrete random variables that frequently arise in applications. We often

    omit a discussion of the underlying sample space of the random experiment and directly

    describe the distribution of a particular random variable.

    Example 3.1.1

    For each of the following exercises, determine the range (possible values) of the random

    variable.

    The random variable is the number of nonconforming solder connections on a printed circuit

    board with 1000 connections.

    Solution: The range of X is {{{{0,1,2,...,1000}}}} A batch of 500 machined parts contains 10 that do not conform to customer requirements.

    Parts are selected successively, without replacement, until a nonconforming part is obtained.

    The random variable is the number of parts selected.

    Solution: The range of X is {{{{1,2,...,491}. Because 490 parts are conforming, a nonconforming part must be selected in 491 selections.

  • 2

    3.2 Probability Distributions and Probability Mass Functions Random variables are so important in random experiments that sometimes we essentially

    ignore the original sample space of the experiment and focus on the probability distribution

    of the random variable.

    The probability distribution of a random variable X is a description of the probabilities

    associated with the possible values of X. For a discrete random variable, the distribution is

    often specified by just a list of the possible values along with the probability of each.

    Example 3.2.1

    The possible values for X are {0, 1, 2 } and the probabilities are

    Solution: P(X = 0) = 0.35 P(X = 1) = 0.24 P(X = 2) = 0.41

    The probability distribution of X is specified by the possible values along with the probability

    of each. A graphical description of the probability distribution of X is shown in Fig. 3-1.

    Fig. 3-1 Probability distribution

    Definition

    For a discrete random variable X with possible values x1, x2, . . . , xn, a probability mass

    function is a function such that

    ( ) 0if x

    1

    ( ) 1n

    i

    i

    f x=

    =

    ( ) ( )i if x P X x= =

    0

    0.05

    0.1

    0.15

    0.2

    0.25

    0.3

    0.35

    0.4

    0.45

    0 1 2 3

  • 3

    Example 3.2.2

    Determine the determine the following probabilities:

    x 2 1 0 1 2

    f (x) 1

    8

    2

    8

    2

    8

    2

    8

    1

    8

    (a) P(X 1.25) (b) P(X 2.2) (c) P ( 1.1 X 1) (d) P(X > 0)

    Solution

    (a) P(X 1.25) =

    (b) P(X 2.2) =

    (c) P ( 1.1 X 1) =

    (d) P(X > 0) =

    Example 3.2.3

    Given

    8 1( ) , 1, 2, 3

    7 2

    x

    f x x = = .

    Find

    (a) P (X 1)

    (b) P (X > 1)

    (c) P (2 1) or

    Solution

    1

    ( a ) P ( 1 ) = P ( = 1 )

    8 1 = 0 . 5 7 1 4

    7 2

    X X

    =

  • 4

    2 3

    ( b ) P ( > 1 ) = P ( = 2 ) + P ( = 3 )

    8 1 8 1 =

    7 2 7 2

    = 0 . 2 8 5 7 + 0 . 1 4 2 9 = 0 . 4 2 8 6

    X X X

    +

    OR: P ( > 1 ) = 1 P ( = 1 ) = 1 0 . 5 7 1 4 0 . 4 2 8 6X X =

    38 1

    ( c ) P ( 2 < < 6 ) = P ( X = 3 ) = 0 . 1 4 2 97 2

    X =

    1 2 3

    ( d ) P ( 1 X > 1 ) = P ( = 1 ) + P ( = 2 ) + P ( = 3 )

    8 1 8 1 8 1 =

    7 2 7 2 7 2

    = 1

    X o r X X X

    + +

    3.3 Cumulative Distribution Functions

    Definition

    The cumulative distribution function of a discrete random variable X, denoted as F(x), is

    ( ) ( ) = ( )i

    i

    x x

    F x P X x f x

    =

    For a discrete random variable X, F(x) satisfies the following properties.

    ( ) ( ) ( ) ( )

    ( ) 0 ( ) 1

    ( ) , ( ) ( )

    i

    i

    x x

    i F x P X x f x

    i i F x

    i i i I f x y th e n F x F y

    = =

  • 5

    Example 3.3.1

    Determine the probability mass function of X from the following cumulative distribution

    function:

    f (2) = P(X = 2)

    = P(X 2) P( X < 2) = 0.2 0 = 0.2

    f (2) = P(X = 2)

    = 1 P(X< 2) = 1 0.7 = 0.3

    Plot F(x) Cumulative distribution function for Example 3.3.1

    Example 3.3.2

    The probability distribution of X, the number of imperfections per 10 meters of a synthetic

    fabric in continuous rolls of uniform width, is given by

    x 0 1 2 3 4

    f (x) 0.41 0.37 0.16 0.05 0.01

    (a) Construct the cumulative distribution function of X.

    Solution

    Cumulative distribution function:

    f (0) = P(X = 0)

    = P(X 0) P(X 2) = 0.7 0.2 = 0.5

  • 6

    (a) ( )

    0 0

    0.41 0 1

    0.78 1 2

    0.94 2 3

    0.99 3 4

    1 4

    x

    x

    xF x

    x

    x

    x

  • 7

    A probability distribution can be viewed as a loading with the mean equal to the balance

    point. Parts (a) and (b) illustrate equal means, but Part (a) illustrates a larger variance.

    Fig 3-4(b): Illustration of probability distributions with equal means variances.

    The probability distributions illustrated in Parts (a) and (b) differ even though they have equal

    means and equal variances.

    Example 3.3.2 (continue)

    The probability distribution of X, the number of imperfections per 10 meters of a synthetic

    fabric in continuous rolls of uniform width, is given by

    x 0 1 2 3 4

    f (x) 0.41 0.37 0.16 0.05 0.01

    (a) Find the average number of imperfections per 10 meters of this fabric.

    (b) Find the variance and standard deviation of the number of imperfections.

    (b) ( ) ( ) ( ) ( ) ( ) ( ) ( )4

    0

    0 0.41 1 0.37 2 0.16 3 0.05 4 0.01i ii

    E X x f x=

    = = = + + + +

    0.88=

  • 8

    (c) ( ) ( )4

    2 2 2

    0

    i i

    i

    V X x f x =

    = =

    ( ) ( ) ( ) ( ) ( )2 2 2 2 2 20 0.41 1 0.37 2 0.16 3 0.05 4 0.01 0.88

    0.8456

    = + + + + =

    0.8456 0.9196 = =

    Example 3.4.1

    There is a chance that a bit transmitted through a digital transmission channel is received in

    error. Let X equal the number of bits in error in the next four bits transmitted. The possible

    values for X are {0, 1, 2, 3, 4}. Based on a model for the errors that is presented in the

    following section, probabilities for these values will be determined. Suppose that the

    probabilities are

    x 0 1 2 3 4

    ( )f x

    0.6561 0.2916 0.0486 0.0036 0.0001

    Now

    ( ) = ( )x

    E X x f x =

    = 0 ( 0 ) 1 (1 ) 2 ( 2 ) 3 ( 3 ) 4 ( 4 )

    = 0 ( 0 . 6 5 6 1 ) 1 ( 0 . 2 9 1 6 ) 2 ( 0 . 0 4 8 6 ) 3 ( 0 . 0 0 3 6 ) 4 ( 0 . 0 0 0 1 )

    0 . 4

    f f f f f+ + + +

    + + + +

    =

    To calculate V (X), a table is convenient

    2

    2

    ( )

    ( ) ( )x

    V X

    x f x

    =

    =

    52

    1

    ( 0 .4 ) ( )i ii

    x f x=

    =

    = 0.36

  • 9

    Exercise 3.4.2

    The number of messages sent per hour over a computer network has the following

    distribution:

    x = number of

    messages 10 11 12 13 14 14

    ( )f x

    0.08 0.15 0.30 0.20 0.20 0.07

    Determine the mean and standard debiation of the number of messages sent per hour

    Solution

    The variance of a random variable X can be considered to be the expected value of a

    specific function of X, namely, h(X) = (X )2 . In general, the expected value of any function h(X) of a discrete random variable is defined in a similar manner.

    Expected Value of a Function of a Discrete Random Variable

    If X is a discrete random variable with probability mass function

    [ ]( ) = ( ) ( )x

    E h X h x f x (3-4)

  • 10

    Using Example 3.4.1

    x 0 1 2 3 4

    ( )f x

    0.6561 0.2916 0.0486 0.0036 0.0001

    X is the number of bits in error in the next four bits transmitted. What is the expected value

    of the square of the number of bits in error? Now, 2( )h X X= .Therefore,

    [ ] 2 2 22 2

    ( ) = 0 ( 0 . 6 5 6 1 ) 1 ( 0 . 2 9 1 6 ) 2 ( 0 . 0 4 8 6 )

    + 3 ( 0 . 0 0 3 6 ) 4 ( 0 . 0 0 0 1 )

    0 . 5 2

    E h X + +

    +

    =

    Note: E( X 2 ) [ E ( X ) ] 2

    E.g.: E[ h ( X ) ] = a E( X) + b

    Example 3.4.3

    The distributor of a machine for cytogenics has developed a new model. The company

    estimates that when it is introduced into the market, it will be very successful with a

    probability 0.6, moderately successful with a probability 0.3, and not successful with

    probability 0.1. The estimated yearly profit associated with the model being very successful

    is $15 million and being moderately successful is $5 million; not successful would result in a

    loss of $500,000. Let X be the yearly profit of the new model.

    (a) Determine the probability mass function of X.

    (b) Determine the mean and variance of the random variable.

    Solution

  • 11

    3.6 Binomial Distribution

    Examples of Binomial distribution

    Consider the following random experiments and random variables:

    1. Flip a coin 10 times. Let X = number of heads obtained.

    2. Aworn machine tool produces 1% defective parts. Let X = number of defective

    parts in the next 25 parts produced.

    3. Each sample of air has a 10% chance of containing a particular rare molecule. Let X

    = the number of air samples that contain the rare molecule in the next 18 samples

    analyzed.

    4. Of all bits transmitted through a digital transmission channel, 10% are received in

    error. Let X = the number of bits in error in the next five bits transmitted.

    5. A multiple choice test contains 10 questions, each with four choices, and you guess

    at each question. Let X = the number of questions answered correctly.

    6. In the next 20 births at a hospital, let X = the number of female births.

    7. Of all patients suffering a particular illness, 35% experience improvement from a

    particular medication. In the next 100 patients administered the medication, let X =

    the number of patients who experience improvement.

    These examples illustrate that a general probability model that includes these experiments

    as particular cases would be very useful.

    Each of these random experiments can be thought of as consisting of a series of repeated,

    random trials: such as 10 flips of the coin and etc.

    The random variable in each case is a count of the number of trials that meet a specified

    criterion. The outcome from each trial either meets the criterion that X counts or it does not;

    consequently, each trial can be summarized as resulting in either a success or a failure.

    For example:

    In experiment 2, because X counts defective parts, the production of a defective part is

    called a success.

    A trial with only two possible outcomes is used so frequently as a building block of a

    random experiment that it is called a Bernoulli trial. It is usually assumed that the trials that

    constitute the random experiment are independent. This implies that the outcome from one

    trial has no effect on the outcome to be obtained from any other trial.

  • 12

    Furthermore, it is often reasonable to assume that the probability of a success in each trial

    is constant. In the multiple choice experiment (4 choice), if the test taker has no knowledge

    of the material and just guesses at each question, we might assume that the probability of a

    correct answer is for each question.

    Factorial notation (review)

    Factorial ( n!) denotes the product of the integers less than or equal to n:

    Definition

    A random experiment consists of n Bernoulli trials such that

    (1) The trials are independent

    (2) Each trial results in only two possible outcomes, labeled as success and failure

    (3) The probability of a success in each trial, denoted as p, remains constant.

    The random variable X that equals the number of trials that result in a success has a

    binomial random variable with parameters 0 < p < 1 and n = 1, 2, 3, ... . The probability

    mass function of X is

    f (x) = p x (1- p) n x x = 0, 1, 2, . . . , n

    For example:

  • 13

    Example 3.6.1

    The random variable X has a binomial distribution with n = 10 and p = 0.3. Sketch the

    probability mass function of X.

    (a) What value of X is most likely?

    (b) What value(s) of X is(are) least likely?

    Solution

    P(X = x) = p x (1- p) n x = ( n Cx ) p

    x (1- p) n x

    Example 3.6.2

    The random variable X has a binomial distribution with n = 10 and p = 0.5. Determine the

    following probabilities.

    P(X = 5) (c) P(X 2) P(X 9) (d) P(3 X < 5)

    Solution

  • 14

    Example 3.6.3

    Determine the cumulative distribution function of a binomial random variable with n = 3 and

    p = 0.01

    Solution

    The probability expression above is a very useful formula that can be applied in a number of

    examples. The name of the distribution is obtained from the binomial expansion. For

    constants a and b, the binomial expansion is

    0

    ( )n

    n k n k

    k

    na b a b

    k

    =

    + =

    Let p denote the probability of success on a single trial. Then, by using the binomial

    expansion with a = p and b = 1 p, we see that the sum of the probabilities for a binomial random variable is 1. Furthermore, because each trial in the experiment is classified into two

    outcomes, {success, failure}, the distribution is called a bi-nomial.

    For a fixed n, the distribution becomes more symmetric as p increases from 0 to 0.5 or

    decreases from 1 to 0.5. For a fixed p, the distribution becomes more symmetric as n

    increases.

    Binomial distributions for selected values of n and p

  • 15

    The mean and variance of a binomial random variable depend only on the parameters p

    and n.

    Definition

    If X is a binomial random variable with parameters p and n,

    ( )E x n p = =and

    2 ( ) (1 )V X n p p = =

    (3-8)

    Example 3.6.4

    Batches that consist of 50 coil springs from a production process are checked for

    conformance to customer requirements. The mean number of nonconforming coil springs in

    a batch is 5. Assume that the number of nonconforming springs in a batch, denoted as X, is

    a binomial random variable.

    (a) What are n and p?

    (b) What is ( 2)P X ?

    (c) What is ( 49)P X ?

    (d) What is ( 48)P X ?

    Solution

    .

    .

    .

  • 16

    Example 3.6.5

    Determine the cumulative distribution function of a binomial random variable with n = 5

    and p = 0.3. Find mean and variance of a binomial random variable.

    Solution

    Example 3.6.6

    A statistical process control chart example. Samples of 20 parts from a metal punching

    process are selected every hour. Typically, 1% of the parts require rework. Let X denotes the

    number of parts in the sample of 20 that require rework. A process problem is suspected if X

    exceeds its mean by more than three standard deviations.

    (a) If the percentage of parts that require rework remains at 1%, what is the probability that

    X exceeds its mean by more than three standard deviations?

    (b) If the rework percentage increases to 4%, what is the probability that X exceeds 1?

    (c) If the rework percentage increases to 4%, what is the probability that X exceeds 1 in at

    least one of the next five hours of samples?

    Solution

  • 17

    3.7 Geometric Distributions

    3-7.1 Geometric Distribution

    Assume a series of Bernoulli trials (independent trials with constant probability p of a

    success on each trial). However, instead of a fixed number of trials, trials are conducted

    until a success is obtained.

    Let the random variable X denote the number of trials until the first success.

    A geometric random variable has been defined as the number of trials until the first

    success.

    Definition

    In a series of Bernoulli trials (independent trials with constant probability p of a success), let

    the random variable X denote the number of trials until the first success.

    Then X is a geometric random variable with parameter 0 < p < 1 and

    f ( x) = (1 p) x 1 p x = 1, 2, 3. . .

    (3-9)

    Below is an examples of the probability mass functions for geometric random variables. The

    probabilities decrease in a geometric progression

    If X is a geometric random variable with parameter p, then

    2

    2

    1 (1 ) ( ) ( )

    pE X and V X

    p p

    = = = =

    Geometric

    distributions for

    selected values of

    the parameter p.

  • 18

    Example 3.7.1

    The probability that a bit transmitted through a digital transmission channel is received in

    error is 0.1. Assume the transmissions are independent events, and let the random variable

    X denote the number of bits transmitted until the first error. Find P(X = 5), mean and

    variance.

    Solution

    For P(X = 5) means the probability that the first four bits are transmitted correctly and the

    fifth bit is in error. Means:

    Correct

    (0.9)

    Correct

    (0.9)

    Correct

    (0.9)

    Correct

    (0.9)

    Error

    (0.1)

    Because the trials are independent and the probability of a correct transmission is 0.9,

    P(X=5) = (0.9)4 (0.1)1 = 0.066

    The mean number of bits until the next error is

    1 1 ( ) = =10

    0.1E X

    p = = (the same result as the mean number of bits until the first

    error)

    2

    2

    (1 0.1) ( ) 90

    0.1 and V X

    = = =

    Example 3.7.2

    The probability of a successful optical alignment in the assembly of an optical data storage

    product is 0.8. Assume the trials are independent.

    (a) What is the probability that the first successful alignment requires exactly four trials?

    (b) What is the probability that the first successful alignment requires at most four trials?

    (c) What is the probability that the first successful alignment requires at least four trials?

    Solution

    Let X denote the number of trials to obtain the first successful alignment. Then X is a

    geometric random variable with p = 0.8

    (a) P (X = 4) =

    (b) P (X 4) = P (X = 1) + P (X = 2) + P (X = 3) + P (X = 4)

    =

  • 19

    (c) P (X 4) =

    Example 3.7.3

    An old gas water heater has a pilot light which much be lit manually, using a match. The

    probability of successfully lighting the pilot light on any given attempt is 82%.

    a) Compute the probability that it takes more than four tries to light the pilot light.

    b) Compute the probability that the pilot light is lit on the 5th try.

    c) Compute ( )4 8 .P X

    Solution

  • 20

    Example 3.7.4

    The probability of a successful optical alignment in the assembly of an optical data storage

    product is 0.8. Assume the trials are independent.

    (a) What is the probability that the first successful alignment requires exactly four trials?

    (b) What is the probability that the first successful alignment requires at most four trials?

    (c) What is the probability that the first successful alignment requires at least four trials?

    Solution

    Let X denote the number of trials to obtain the first successful alignment. Then X is a

    geometric random variable with p = 0.8

    (a) P (X = 4) = (1 0.8)3 (0.8) = (0.2)3 (0.8) = 0.0064

    (b) P (X 4) = P (X = 1) + P (X = 2) + P (X = 3) + P (X = 4)

    = (1 0.8)0 (0.8) + (1 0.8)1 (0.8) + (1 0.8)2 (0.8)

    + (1 0.8)3 (0.8)

    = (0.8) + (0.2)1 (0.8) + (0.2)2 (0.8) + (0.2)3 (0.8)

    = 0.9984

    (c) P (X 4) = 1 P (X 3) = 1 [P (X = 1) + P (X = 2) + P (X = 3)] = 1 [(1 0.8)0(0.8)+(1 0.8)1 (0.8) + (1 0.8)2(0.8)] = 1 [(0.8) + (0.2)1 (0.8) + (0.2)2 (0.8)] = 1 0.992 = 0.008

  • 21

    3.8 Hyper Geometric Distribution

    A set of N objects contains K objects classified as successes and N K objects classified as failures. A sample of size n objects is selected randomly (without replacement) from the

    N objects, where K N and n N .

    Let the random variable X denotes the number of successes in the sample. Then X is a

    hypergeometric random variable and

    = = =

    (3-13)

    max{0, } min{ , }where x n K N to K n= +

    The expression min{K , n} is used in the definition of the range of X because the maximum

    number of successes that can occur in the sample is the smaller of the sample size, n, and

    the number of successes available, K. Also, if n + K > N at n + K N least successes must occur in the sample.

    If X is a hypergeometric random variable with parameters N , K and n, then

    = E(X) = np and 2 = V(X) = np(1 p)

    where p = and

    is finite population correction factor.

    Example 3.8.1

    Spp X h hpg wh N = 100, n = 4, K =20.

    h fwg:

    () (X = 1) () (X = 6)

    () (X = 4) () h v f X.

    Solution

    Using: = = =

  • 22

    Example 3.8.2

    p f f g pp wh

    hp. A 140 , 20 wh p

    f f g.

    () If 20 fv, wh h p h 1 fv

    h p?

    () If 5 fv, wh h p h 1 fv

    pp h p?

    Solution

  • 23

    K objects can be categorised into more than one set. Example:

    K1

    K3

    K2

    K4

    N

    Example 3.8.3

    A foreign student club lists as its members 2 Canadians, 3 Japanese, 5 Italians, and 2 Germans. If a committee of 4 is selected at random, find the probability that (a) all nationalities are represented; (b) all nationalities except the Italians are represented.

    Solution:

    3.9 Poisson Distribution Definition

    Given an interval of real numbers, assume counts occur at random throughout the interval. If

    the interval can be partitioned into subintervals of small enough length such that

    (1) the probability of more than one count in a subinterval is zero,

    (2) the probability of one count in a subinterval is the same for all subintervals and

    proportional to the length of the subinterval, and

    (3) the count in each subinterval is independent of other subintervals, the random

    experiment is called a Poisson process.

  • 24

    The mean and variance of a Poisson random variable are equal. For example, if particle

    counts follow a Poisson distribution with a mean of 25 particles per square centimeter, the

    variance is also 25 and the standard deviation of the counts is 5 per square centimeter.

    Consequently, information on the variability is very easily obtained. Conversely, if the

    variance of count data is much greater than the mean of the same data, the Poisson

    distribution is not a good model for the distribution of the random variable.

  • 25

    Example 3.9.1

    Suppose X has a Poisson distribution with a mean of 4. Determine the following probabilities:

    Solution

    Example 3.9.2

    The number of failures of a testing instrument from contamination particles on the product

    is a Poisson random variable with a mean of 0.02 failure per hour.

    (a) What is the probability that the instrument does not fail in an 8-hour shift?

    (b) What is the probability of at least one failure in a 24-hour day?

    Solution

  • 26

    Example 3.9.3

    Suppose that the number of customers that enter a bank in an hour is a Poisson

    random variable, and suppose that P(X=0) = 0.05. Determine the mean and variance of X.

    Solution

    P(X=0) = 0.05 P(X=0) =e = 0.05

    = ln (0.05) = 2.996

    E(X) = = 2.996

    V(X) = = 2.996

    Suggestion Exercises Chapter 3:

    Topic Exercises

    3.1 3-3, 3-4, 3-8, 3-9

    3.2 3-15, 3-16, 3-17, 3-24, 3-25

    3.3 3-28, 3-29, 3-31, 3-32, 3-33, 3-36

    3.4 3-38, 3-39, 3-40, 3-41, 3-45

    3.6 3-58, 3-59, 3-62, 3-63, 3-68, 3-70,

    3.7 3-73, 3-75, 3-77, 3-81, 3-82

    3.8 3-86, 3-90, 3-92, 3-94

    3.9 3-100, 3-101, 3-104, 3-105

    S.E 3-108, 3-111, 3-113, 3-119, 3-123