2006 geog090 week05 lecture01 discrete distribution

Upload: jamoris

Post on 05-Apr-2018


TRANSCRIPT

  • 7/31/2019 2006 Geog090 Week05 Lecture01 Discrete Distribution



    Topics Covered

    Probability-Related Concepts

    How to Assign Probabilities to Experimental Outcomes

    Probability Rules

    Discrete Random Variables

    Continuous Random Variables

    Probability Distribution & Functions


    Concepts

    An event: any phenomenon you can observe that can have more than one outcome (e.g., flipping a coin)

    An outcome: any unique condition that can be the result of an event (e.g., flipping a coin: heads or tails), a.k.a. a simple event or sample point

    Sample space: the set of all possible outcomes associated with an event

    Probability: a measure of the likelihood of each possible outcome


    Probability Distributions

    The usual application of probability distributions is to find a theoretical distribution that:

    Reflects a process that explains what we see in some observed sample of a geographic phenomenon

    Can be compared in form with the sampled information through a test of significance

    In geography: discrete random events in space and time (e.g., how often will a tornado occur?)


    Discrete Probability Distributions

    Discrete probability distributions

    The Uniform Distribution

    The Binomial Distribution

    The Poisson Distribution

    Each is appropriately applied in certain situations and to particular phenomena


    The Uniform Distribution

    Source: http://davidmlane.com/hyperstat/A12237.html


    The Uniform Distribution

    Describes the situation where the probability of all outcomes is the same

    For n outcomes: $P(x_i) = 1/n$

    e.g., flipping a coin: $P(x_{heads}) = 1/2 = P(x_{tails})$

    [Figure: a uniform probability mass function, with P(x_i) on the vertical axis (0 to 0.50) and x_i (heads, tails) on the horizontal axis]


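    For a discrete uniform variable on the integers a, a + 1, ..., b (so n = b - a + 1 outcomes) and integer x:

    $$f(x) = \begin{cases} 1/(b - a + 1) = 1/n, & a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$

    $$F(x) = P(X \le x) = \begin{cases} 0, & x < a \\ (x - a + 1)/(b - a + 1), & a \le x \le b \\ 1, & x > b \end{cases}$$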
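
    The mass function and cumulative distribution function of a discrete uniform variable on the integers a through b can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lecture: the function names and the six-sided die are assumptions chosen for the example, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def uniform_pmf(x: int, a: int, b: int) -> Fraction:
    """P(X = x) for a discrete uniform variable on the integers a..b."""
    n = b - a + 1  # number of equally likely outcomes
    return Fraction(1, n) if a <= x <= b else Fraction(0)

def uniform_cdf(x: int, a: int, b: int) -> Fraction:
    """P(X <= x) for the same variable, for integer x."""
    if x < a:
        return Fraction(0)
    if x > b:
        return Fraction(1)
    return Fraction(x - a + 1, b - a + 1)

# Illustrative case (an assumption, not from the slides): a fair die, a = 1, b = 6
print(uniform_pmf(3, 1, 6))   # 1/6
print(uniform_cdf(3, 1, 6))   # 1/2
```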


    The Uniform Distribution

    May seem a little simplistic and perhaps useless

    But it is actually well applied in two situations:

    1. The probability of each outcome is truly equal (e.g., the coin toss)

    2. There is no prior knowledge of how a variable is distributed (i.e., complete uncertainty); the first distribution we should use is then the uniform, because it makes no assumptions about the distribution


    The Uniform Distribution

    However, truly uniformly distributed geographic phenomena are somewhat rare

    We often encounter the situation of not knowing how something is distributed until we sample it

    When we are resisting making assumptions, we usually apply the uniform distribution as a sort of null hypothesis of distribution


    The Uniform Distribution

    Example: predict the direction of the prevailing wind with no prior knowledge of the weather system's tendencies in the area

    We would have to begin with the idea that

    $P(x_{North}) = 1/4$

    $P(x_{East}) = 1/4$

    $P(x_{South}) = 1/4$

    $P(x_{West}) = 1/4$

    until we had an opportunity to sample and find out some tendency in the wind pattern based on those observations

    [Figure: uniform probability mass function, with P(x_i) = 0.25 for each of the directions N, E, S, W]

    Remote Sensing

    Supervised classification

    (Lillesand et al. 2004)

    (Binford 2005)


    The Binomial Distribution

    Provides information about the probability of the repetition of events when there are only two possible outcomes, e.g., heads or tails, left or right, success or failure, rain or no rain

    Events with multiple outcomes may be simplified as events with two outcomes (e.g., forest or non-forest)

    Characterizes the probability of a proportion of the events having a certain outcome over a specified number of events


    The Binomial Distribution

    A binomial distribution is produced by a set of Bernoulli trials (Jacques Bernoulli)

    The law of large numbers for independent trials is at the heart of probability theory

    Given enough observed events, the observed probability should approach the theoretical values drawn from probability distributions

    e.g., with enough coin tosses, the observed proportion of each outcome (heads or tails) should approach P = 0.5
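
    The law of large numbers can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not the lecture's method: it uses Python's pseudo-random generator with a fixed seed (chosen only for reproducibility) to toss a fair coin and reports the observed proportion of heads for increasing numbers of tosses.

```python
import random

def heads_proportion(n_tosses: int, seed: int = 0) -> float:
    """Observed proportion of heads in n_tosses simulated fair coin tosses."""
    rng = random.Random(seed)  # fixed seed: an assumption, for repeatable runs
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# The proportion should drift toward 0.5 as the number of tosses grows
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, heads_proportion(n))
```

    With few tosses the proportion can sit well away from 0.5; with many it should settle close to it, which is the behavior the law describes.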


    How to test the law?

    A set of Bernoulli trials is the way to operationally test the law of large numbers using an event with two possible outcomes:

    (1) N independent trials of an experiment (i.e., an event like a coin toss) are performed

    (2) Every trial must have the same set of possible outcomes (e.g., heads and tails)


    Bernoulli Trials

    (3) The probability of each outcome must be the same for all trials, i.e., $P(x_i)$ must be the same each time for both $x_i$ values

    (4) The resulting random variable is determined by the number of successes in the trials (a success being one of the two outcomes)

    p = the probability of success in a trial

    q = (1 - p) = the probability of failure in a trial

    p + q = 1


    Bernoulli Trials

    Suppose that on a series of successive days, we record whether or not it rains in Chapel Hill

    We denote the 2 outcomes using R when it rains and N when it does not rain

    n   Possible Outcomes   # of Rain Days   P(# of Rain Days)
    1   R                   1                p               = p
        N                   0                (1 - p)         = q
    2   RR                  2                p^2             = p^2
        RN NR               1                2[p(1 - p)]     = 2pq
        NN                  0                (1 - p)^2       = q^2
    3   RRR                 3                p^3             = p^3
        RRN RNR NRR         2                3[p^2(1 - p)]   = 3p^2 q
        NNR NRN RNN         1                3[p(1 - p)^2]   = 3pq^2
        NNN                 0                (1 - p)^3       = q^3


    Bernoulli Trials

    If we have a value for P(R) = p, we can substitute it into the above equations to get the probability of each outcome from a series of successive samples (e.g., p = 0.2)

    n   Possible Outcomes   # of Rain Days   P(# of Rain Days)
    1   R                   1                p      = 0.2
        N                   0                q      = 0.8
    2   RR                  2                p^2    = 0.04
        RN NR               1                2pq    = 0.32
        NN                  0                q^2    = 0.64
    3   RRR                 3                p^3    = 0.008
        RRN RNR NRR         2                3p^2 q = 0.096
        NNR NRN RNN         1                3pq^2  = 0.384
        NNN                 0                q^3    = 0.512
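
    The rain-day probabilities for p = 0.2 can be checked by brute force: list every possible sequence of R and N days, multiply the per-day probabilities, and add up the sequences that share the same number of rain days. The following is a minimal sketch under that reading; the function name is an assumption made for this example.

```python
from collections import defaultdict
from itertools import product

def rain_day_distribution(n_days: int, p: float) -> dict[int, float]:
    """Enumerate every R/N sequence and total the probability by # of rain days."""
    q = 1 - p
    totals: dict[int, float] = defaultdict(float)
    for sequence in product("RN", repeat=n_days):
        rain_days = sequence.count("R")
        # Independent days: multiply p for each R and q for each N
        totals[rain_days] += p**rain_days * q**(n_days - rain_days)
    return dict(totals)

for n in (1, 2, 3):
    dist = rain_day_distribution(n, p=0.2)
    print(n, {k: round(v, 3) for k, v in sorted(dist.items(), reverse=True)})
```

    For n = 3 the enumeration visits all 8 sequences (RRR, RRN, ..., NNN), and the three sequences with exactly two rain days account for the factor of 3 in 3p^2 q.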


    Bernoulli Trials

    The sum of the probabilities can be expressed using the binomial expansion of $(p + q)^n$, where n = # of events

    1 event: $S = (p + q)^1 = p + q$

    2 events: $S = (p + q)^2 = p^2 + 2pq + q^2$

    3 events: $S = (p + q)^3 = p^3 + 3p^2q + 3pq^2 + q^3$

    4 events: $S = (p + q)^4 = p^4 + 4p^3q + 6p^2q^2 + 4pq^3 + q^4$

    [Figure: a graphical representation, probability against # of successes]

    Source: Earickson, RJ, and Harlin, JM. 1994. Geographic Measurement and Quantitative Analysis. USA: Macmillan College Publishing Co., p. 132.
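
    Because p + q = 1, every expansion of (p + q)^n sums to 1, and its coefficients are the binomial coefficients. A short sketch of that check for n = 4 follows; the helper name and the choice p = 0.2 are assumptions made for illustration.

```python
from math import comb

def expansion_terms(n: int, p: float) -> list[float]:
    """Terms C(n, k) * p^k * q^(n - k) of (p + q)^n, from k = n down to k = 0."""
    q = 1 - p
    return [comb(n, k) * p**k * q**(n - k) for k in range(n, -1, -1)]

n = 4
print([comb(n, k) for k in range(n, -1, -1)])   # [1, 4, 6, 4, 1]
print(sum(expansion_terms(n, p=0.2)))           # should be 1 up to floating-point rounding
```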


    The Binomial Distribution

    A general formula for calculating the probability of x successes, given n trials and a probability p of success:

    $$P(x) = C(n, x) \, p^x (1 - p)^{n - x}$$

    where C(n, x) is the number of possible combinations of x successes and (n - x) failures:

    $$C(n, x) = \frac{n!}{x! \, (n - x)!}$$


    The Binomial Distribution Example

    e.g., the probability of 2 successes in 4 trials, given p = 0.2, is:

    $$P(2) = \frac{4!}{2! \, (4 - 2)!} (0.2)^2 (1 - 0.2)^{4 - 2}$$

    $$P(2) = \frac{24}{2 \cdot 2} (0.2)^2 (0.8)^2$$

    $$P(2) = 6 \times 0.04 \times 0.64 = 0.1536$$
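
    The same calculation can be written as a small function. This is a minimal sketch that uses the standard library's `math.comb` for C(n, x); the function name `binomial_pmf` is an assumption for this example.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# 2 successes in 4 trials with p = 0.2, as in the worked example
print(round(binomial_pmf(2, 4, 0.2), 4))   # 0.1536
```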


    The Binomial Distribution Example

    Calculating the probabilities of all possible numbers of rain days out of four days (p = 0.2):

    x   P(x)   C(n,x)   p^x       (1 - p)^(n - x)   Result
    0   P(0)   1        (0.2)^0   (0.8)^4           0.4096
    1   P(1)   4        (0.2)^1   (0.8)^3           0.4096
    2   P(2)   6        (0.2)^2   (0.8)^2           0.1536
    3   P(3)   4        (0.2)^3   (0.8)^1           0.0256
    4   P(4)   1        (0.2)^4   (0.8)^0           0.0016

    The chance of having one or more days of rain out of four: P(1) + P(2) + P(3) + P(4) = 0.5904
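
    The probability of at least one rain day can also be reached through the complement, 1 - P(0), which avoids adding four terms. A minimal sketch for n = 4 and p = 0.2 (variable names are assumptions for this example):

```python
from math import comb

n, p = 4, 0.2
probabilities = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

for x, prob in enumerate(probabilities):
    print(x, round(prob, 4))

# P(at least one rain day) is the complement of P(no rain days)
at_least_one = 1 - probabilities[0]
print(round(at_least_one, 4))   # 0.5904
```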


    The Binomial Distribution Example

    Naturally, we can plot the probability mass function produced by this binomial distribution:

    x_i   P(x_i)
    0     0.4096
    1     0.4096
    2     0.1536
    3     0.0256
    4     0.0016

    [Figure: bar plot of P(x_i) against x_i = 0, 1, 2, 3, 4]


    Source: http://www.itl.nist.gov/div898/handbook/eda/section3/eda366i.htm

    The following is the plot of the binomial probability density function for four values of p and n = 100


    Source: http://home.xnet.com/~fidler/triton/math/review/mat170/probty/p-dist/discrete/Binom/binom1.htm


    Source: http://www.mpimet.mpg.de/~vonstorch.jinsong/stat_vls/s3.pdf


    Rare Discrete Random Events

    Some discrete random events in question happen rarely (if at all), and the time and place of these events are independent and random (e.g., tornados)

    The greatest probability is zero occurrences at a certain time or place, with a small chance of one occurrence, an even smaller chance of two occurrences, etc.

    The resulting distribution is heavily peaked and skewed:

    [Figure: probability mass function, P(x_i) against x_i = 0, 1, 2, 3, 4, peaked at x_i = 0]


    The Poisson Distribution

    In the 1830s, S.D. Poisson described a distribution with these characteristics

    It describes the number of events that will occur within a certain area or duration (e.g., # of meteorite impacts per state, # of tornados per year, # of hurricanes in NC)

    Characteristics of the Poisson distribution:

    1. It is used to count the number of occurrences of an event within a given unit of time, area, volume, etc. (therefore a discrete distribution)


    The Poisson Distribution

    Poisson formulated his distribution as follows:

    $$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$$

    To calculate a Poisson distribution, you must know $\lambda$, where

    e = 2.71828 (base of the natural logarithm)

    $\lambda$ = the mean or expected value

    x = 0, 1, 2, ..., n - 1, n = # of occurrences

    x! = x * (x - 1) * (x - 2) * ... * 2 * 1
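
    The Poisson formula translates directly into code. The sketch below is illustrative only: the function name `poisson_pmf` and the mean of 2.0 are assumptions, not values from the lecture.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.0  # illustrative mean (an assumption for this example)
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))
```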


    The Poisson Distribution

    Poisson distribution

    $$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$$

    The shape of the distribution depends strongly upon the value of $\lambda$: as $\lambda$ increases, the distribution becomes less skewed, eventually approaching a normal-shaped distribution as $\lambda$ gets quite large

    We can evaluate P(x) for any value of x, but large values of x will have very small values of P(x)

    http://www.capdm.com/demos/software/html/capdm/qm/poissondist/usage.html
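
    One way to see the skew shrinking as the mean grows is to compute the skewness of the Poisson distribution numerically from its probabilities. The sketch below is a rough illustration under stated assumptions: the mean values are arbitrary, the sum is truncated at `x_max` = 150 (adequate for these means, and small enough to avoid floating-point overflow), and the function names are invented for this example.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(x) = e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

def poisson_skewness(lam: float, x_max: int = 150) -> float:
    """Skewness computed from the probabilities, truncated at x_max."""
    xs = range(x_max + 1)
    probs = [poisson_pmf(x, lam) for x in xs]
    mean = sum(x * p for x, p in zip(xs, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))
    third_moment = sum((x - mean) ** 3 * p for x, p in zip(xs, probs))
    return third_moment / variance**1.5

# Skewness should fall toward 0 (a symmetric, normal-like shape) as the mean grows
for lam in (0.5, 2.0, 10.0, 50.0):
    print(lam, round(poisson_skewness(lam), 3))
```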

    The Poisson Distribution

    Poisson distribution

    $$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$$

    The Poisson distribution can be defined as the limiting case of the binomial distribution as $n \to \infty$ while $\lambda = np$ is held constant
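
    The limiting relationship can be checked numerically: hold the product of n and p fixed, let n grow, and compare the binomial probability with the Poisson one. The sketch below is a minimal illustration; the mean of 2.0 and x = 3 are arbitrary values chosen for this example, not taken from the lecture.

```python
from math import comb, exp, factorial

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """P(x) = e^(-lam) * lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

lam, x = 2.0, 3   # illustrative values (assumptions for this example)
for n in (10, 100, 1_000, 10_000):
    # p shrinks as n grows so that n * p stays equal to lam
    print(n, round(binomial_pmf(x, n, lam / n), 6))
print("Poisson limit:", round(poisson_pmf(x, lam), 6))
```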


    The Poisson Distribution

    The Poisson distribution is sometimes known as the Law of Small Numbers, because it describes the behavior of events that are rare

    We can observe the frequency of some rare phenomenon, find its mean occurrence, and then construct a Poisson distribution and compare our observed values to those from the distribution (effectively expected values) to see the degree to which our observed phenomenon is obeying the Law of Small Numbers


    Murder rates in Fayetteville

    Fitting a Poisson distribution to the 24-hour murder rates in Fayetteville in a 31-day month (to ask the question "Do murders randomly occur in time?")

    # of Murders   Days (Frequency)
    0              17
    1              9
    2              3
    3              1
    4              1
    Total          31 days


    Murder rates in Fayetteville

    $\lambda$ = mean murders per day = 22 / 31 = 0.71

    $$P(x) = \frac{e^{-\lambda} \lambda^x}{x!}$$

    # of Murders   Days      # of Murders * # of Days
    0              17        0
    1              9         9
    2              3         6
    3              1         3
    4              1         4
    Total          31 days   22


    Murder rates in Fayetteville

    $\lambda$ = mean murders per day = 22 / 31 = 0.71

    $$P(x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-0.71} (0.71)^x}{x!}$$

    $$F_{exp} = P(x) \times 31$$


    Murder rates in Fayetteville

    x (# of Murders)   Obs. Frequency (Fobs)   x * Fobs   Fexp
    0                  17                      0          15.2
    1                  9                       9          10.9
    2                  3                       6          3.7
    3                  1                       3          0.9
    4                  1                       4          0.2
    Total              31                      22         30.9

    $\lambda$ = mean murders per day = 22 / 31 = 0.71

    We can compare Fobs to Fexp using a $\chi^2$ test to see if the observations match a Poisson distribution
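
    The expected frequencies and a chi-square statistic for the Fayetteville counts can be computed as follows. This is a rough sketch, not the lecture's worked solution: it uses the unrounded mean 22 / 31 rather than 0.71, so results may differ from a hand-computed table in the last digit, and it applies the usual Pearson statistic, the sum of (Fobs - Fexp)^2 / Fexp, without pooling the classes that have small expected counts (a step that is commonly recommended before testing). Degrees of freedom and critical values are not covered here.

```python
from math import exp, factorial

# Murders per day -> number of days with that count (31-day month)
observed = {0: 17, 1: 9, 2: 3, 3: 1, 4: 1}

total_days = sum(observed.values())                          # 31
lam = sum(x * f for x, f in observed.items()) / total_days   # 22 / 31

# Expected frequency for each count: P(x) * total number of days
expected = {x: exp(-lam) * lam**x / factorial(x) * total_days for x in observed}

# Pearson statistic: sum of (observed - expected)^2 / expected over the classes
chi_square = sum((observed[x] - expected[x]) ** 2 / expected[x] for x in observed)

for x in observed:
    print(x, observed[x], round(expected[x], 1))
print("chi-square statistic:", round(chi_square, 2))
```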


    Murder rates in Fayetteville


    The Poisson Distribution

    Procedure for finding Poisson probabilities and expected frequencies:

    (1) Set up a table with five columns as on the previous slide

    (2) Multiply the values of x by their observed frequencies (x * Fobs)

    (3) Sum the columns of Fobs (observed frequency) and x * Fobs

    (4) Compute $\lambda = \sum (x \cdot F_{obs}) / \sum F_{obs}$

    (5) Compute P(x) values using the equation or a table

    (6) Compute the values of $F_{exp} = P(x) \cdot \sum F_{obs}$
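
    The six steps can be gathered into one reusable function that builds the five-column table (x, Fobs, x * Fobs, P(x), Fexp) from any set of observed counts. This is a sketch under assumptions: the function name is invented for this example, and the counts passed to it are hypothetical, not data from the lecture.

```python
from math import exp, factorial

Row = tuple[int, int, int, float, float]  # (x, F_obs, x * F_obs, P(x), F_exp)

def poisson_fit_table(observed: dict[int, int]) -> tuple[float, list[Row]]:
    """Return (lam, rows) following steps (1) to (6) of the procedure."""
    total_obs = sum(observed.values())                      # step 3: sum of F_obs
    total_x_obs = sum(x * f for x, f in observed.items())   # steps 2-3: sum of x * F_obs
    lam = total_x_obs / total_obs                           # step 4
    rows: list[Row] = []
    for x, f_obs in sorted(observed.items()):
        p_x = exp(-lam) * lam**x / factorial(x)             # step 5
        f_exp = p_x * total_obs                             # step 6
        rows.append((x, f_obs, x * f_obs, p_x, f_exp))
    return lam, rows

# Hypothetical counts for illustration only (not from the lecture)
lam, rows = poisson_fit_table({0: 20, 1: 12, 2: 6, 3: 2})
print("lambda =", lam)
for x, f_obs, x_f_obs, p_x, f_exp in rows:
    print(x, f_obs, x_f_obs, round(p_x, 4), round(f_exp, 1))
```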
