Stochastic Methods A Review

Upload: kimberly-small

Post on 12-Jan-2016


Page 1: Stochastic Methods A Review. Some Terms  Random Experiment: An experiment for which the outcome cannot be predicted with certainty  Each experiment

Stochastic Methods A Review

Some Terms

• Random experiment: an experiment whose outcome cannot be predicted with certainty.
• Each experiment ends in an outcome.
• The collection of all outcomes is called the sample space, S.
• An event is a subset of the sample space.
• Given a random experiment with sample space S, a function X that assigns to each element s in S a real number, X(s) = x, is called a random variable.

A boolean random variable is a function from the sample space to the set {false, true} (or {0.0, 1.0}).
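To make the terms concrete, here is a minimal Python sketch using a two-dice experiment of our own choosing (not from the slides). It assumes all 36 outcomes are equally likely; the names `S`, `X`, `is_double`, and `p` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space S for the random experiment "roll two fair dice":
# each outcome s is an ordered pair (first die, second die).
S = list(product(range(1, 7), repeat=2))


def X(s):
    """Random variable: assigns outcome s the real number X(s) = sum of the two dice."""
    return s[0] + s[1]


def is_double(s):
    """Boolean random variable: maps each outcome to True or False."""
    return s[0] == s[1]


def p(event):
    """Probability of an event (a subset of S), assuming equally likely outcomes."""
    return Fraction(len(event), len(S))


# Events are subsets of S.
sum_is_7 = [s for s in S if X(s) == 7]
doubles = [s for s in S if is_double(s)]

print(len(S), p(sum_is_7), p(doubles))  # 36 1/6 1/6
```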

Bernoulli/Binomial Experiments

A Bernoulli experiment is a random experiment whose outcome can be classified in one of two mutually exclusive and exhaustive ways: {failure, success}, {false, true}, {0, 1}, etc.

A binomial experiment is a Bernoulli experiment that:
• is performed n times;
• has independent trials;
• has a constant probability of success, p, on each trial;
• has a constant probability of failure, 1 - p, on each trial.

A random variable counts the number of successes in the n trials.

Example A: A fair die is cast four times.
– Success: a six is rolled
– Failure: all other outcomes

A possible observed sequence is (0,0,1,0): a six was rolled on the third trial. Call this sequence A.

Since the trials are independent, $p(A) = (5/6)(5/6)(1/6)(5/6) = (1/6)(5/6)^3$.

Example A’

Now suppose we want the probability of exactly one six in any four-roll sequence: (0001), (0010), (0100), (1000). This is 4 × p(A), since there are four ways of selecting 1 position for the 1 success.
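A quick brute-force check of Examples A and A’, as a Python sketch of ours using exact fractions: it computes p(A), then enumerates every four-roll success/failure sequence with exactly one six and sums their probabilities.

```python
from fractions import Fraction
from itertools import product

P_SUCCESS = Fraction(1, 6)  # probability of rolling a six on one trial
P_FAILURE = 1 - P_SUCCESS   # probability of any other face


def sequence_probability(seq):
    """Probability of one particular success(1)/failure(0) sequence of independent trials."""
    prob = Fraction(1)
    for outcome in seq:
        prob *= P_SUCCESS if outcome == 1 else P_FAILURE
    return prob


p_A = sequence_probability((0, 0, 1, 0))  # (1/6)(5/6)^3

# All four-roll sequences containing exactly one six.
one_six = [seq for seq in product((0, 1), repeat=4) if sum(seq) == 1]
total = sum(sequence_probability(seq) for seq in one_six)

print(len(one_six), p_A, total)  # 4 125/1296 125/324
```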

In General

The number of ways of selecting y positions for y successes in n trials is:

$$ {}_nC_y = \frac{n!}{(n-y)!\,y!} $$

Each of these ways has y successes and n - y failures, so its probability is the product of the individual trial probabilities:

$$ p^y (1-p)^{n-y} $$

So, if Y is the event of y successes in n trials,

$$ p(Y) = {}_nC_y \, p^y (1-p)^{n-y} $$
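The general formula as a small Python function (the name `binomial_probability` is ours; `math.comb` requires Python 3.8 or later). Exact fractions avoid rounding, so the result can be compared directly with Example A’:

```python
from fractions import Fraction
from math import comb


def binomial_probability(n, y, p):
    """p(Y) = nCy * p^y * (1 - p)^(n - y): probability of exactly y successes in n trials."""
    return comb(n, y) * p**y * (1 - p)**(n - y)


# Example A': exactly one six in four rolls of a fair die.
print(binomial_probability(4, 1, Fraction(1, 6)))  # 125/324

# The probabilities of 0, 1, ..., n successes cover every outcome, so they sum to 1.
assert sum(binomial_probability(4, y, Fraction(1, 6)) for y in range(5)) == 1
```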

This Is Exactly the Example

$$ p(Y) = {}_nC_y \, p^y (1-p)^{n-y} $$

A fair die is cast four times
– Success: a six is rolled
– Failure: all other outcomes

n = 4, y = 1

$$ {}_4C_1 = \frac{4!}{(4-1)!\,1!} = 4, \qquad p^y = (1/6)^1, \qquad (1-p)^{4-1} = (5/6)^3 $$

So p(Y) = 4 (1/6)(5/6)^3 ≈ .386.

What is the probability of obtaining 5 heads in 7 flips of a fair coin?

The probability of the event X, p(X), is the sum of the probabilities of the individual sequences that make up X, which gives $ {}_nC_x \, p^x (1-p)^{n-x} $.

The event X is 5 successes out of seven tries:
• n = 7, x = 5
• p(single success) = 1/2
• p(single failure) = 1/2
• $p(X) = ({}_7C_5)(1/2)^5(1/2)^2 = 21/128 \approx .164$

The tries can be represented like this: {0011111}, {0101111}, … There are 21 of them, each with a probability of $(1/2)^5(1/2)^2$.
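A Python sketch (ours) that confirms the count of 21 by enumerating all 2^7 head/tail sequences instead of relying on the formula:

```python
from fractions import Fraction
from itertools import product
from math import comb

n, x = 7, 5
p = Fraction(1, 2)  # fair coin

# Enumerate all 2^7 head/tail sequences (1 = head) and keep those with exactly five heads.
five_heads = [seq for seq in product((0, 1), repeat=n) if sum(seq) == x]
print(len(five_heads), comb(n, x))  # 21 21

# Each such sequence has probability (1/2)^5 (1/2)^2; summing them gives p(X).
p_X = len(five_heads) * p**x * (1 - p)**(n - x)
print(p_X, float(p_X))  # 21/128 0.1640625
```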

Expectation

If the reward for the occurrence of an event E, with probability p(E), is r, and the cost of the event not occurring, with probability 1 - p(E), is c (expressed as a negative amount), then the expectation for the event, ex(E), is

$$ ex(E) = r \cdot p(E) + c\,(1 - p(E)) $$

Expectation Example

A fair roulette wheel has the integers 0 to 36. Each player places 5 dollars on any slot. If the wheel stops on that slot, the player wins 35 dollars; otherwise she loses her 5 dollars. So,

p(winning) = 1/37
p(losing) = 36/37
ex(E) = 35(1/37) + (-5)(36/37) = -145/37 ≈ -3.92 dollars
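The expectation formula applied to the roulette example, as a minimal Python sketch. The helper `expectation` is a hypothetical name; it follows the convention above that a cost is entered as a negative number:

```python
from fractions import Fraction


def expectation(reward, cost, p_event):
    """ex(E) = r * p(E) + c * (1 - p(E)); the cost c is passed as a negative number."""
    return reward * p_event + cost * (1 - p_event)


p_win = Fraction(1, 37)          # 37 equally likely slots, 0 through 36
ex = expectation(35, -5, p_win)  # win 35 dollars, or lose the 5-dollar stake
print(ex, round(float(ex), 2))   # -145/37 -3.92
```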

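Bayes Theorem For Two Events

Recall that we defined conditional probability like this:

$$ p(d|s) = \frac{p(d \cap s)}{p(s)} \qquad (1) $$

We can also express s in terms of d:

$$ p(s|d) = \frac{p(s \cap d)}{p(d)} \qquad (2) $$

Multiplying (2) by p(d) we get:

$$ p(s \cap d) = p(s|d)\,p(d) \qquad (3) $$

Since set intersection is commutative, p(d ∩ s) = p(s ∩ d), so substituting (3) into (1) gives Bayes’ theorem for two events:

$$ p(d|s) = \frac{p(s|d)\,p(d)}{p(s)} $$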

If d is a disease and s is a symptom, the theorem tells us that the probability of the disease given the symptom is the probability of the symptom given the disease times the probability of the disease divided by the probability of the symptom
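A worked numeric illustration in Python. The three input probabilities are hypothetical, chosen only to show how a symptom that is very likely given the disease can still leave the disease fairly unlikely given the symptom:

```python
from fractions import Fraction


def bayes(p_s_given_d, p_d, p_s):
    """Bayes' theorem for two events: p(d|s) = p(s|d) * p(d) / p(s)."""
    return p_s_given_d * p_d / p_s


# Hypothetical numbers, for illustration only:
p_s_given_d = Fraction(9, 10)  # 90% of patients with the disease show the symptom
p_d = Fraction(1, 100)         # 1% of patients have the disease
p_s = Fraction(1, 10)          # 10% of all patients show the symptom

print(bayes(p_s_given_d, p_d, p_s))  # 9/100
```

With these made-up numbers, only 9% of patients with the symptom have the disease, because the disease itself is rare.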

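The Chain Rule

From the definition of conditional probability, $p(A_1 \cap A_2) = p(A_1|A_2)\,p(A_2)$. Since set intersection is commutative, $p(A_1 \cap A_2) = p(A_2 \cap A_1) = p(A_1)\,p(A_2|A_1)$.

Then, treating $A_1 \cap A_2$ as a single event, $p(A_1 \cap A_2 \cap A_3) = p(A_1)\,p(A_2|A_1)\,p(A_3|A_1 \cap A_2)$. This can be generalized to any N sets and proved by induction:

$$ p(A_1 \cap A_2 \cap \cdots \cap A_N) = p(A_1)\,p(A_2|A_1)\,p(A_3|A_1 \cap A_2) \cdots p(A_N|A_1 \cap \cdots \cap A_{N-1}) $$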

Example

Three cards are to be dealt one after another, at random and without replacement, from a fair deck. What is the probability of receiving a spade, a heart, and a diamond, in that order?

• A1 = event of being dealt a spade
• A2 = event of being dealt a heart
• A3 = event of being dealt a diamond

$$ p(A_1 \cap A_2 \cap A_3) = p(A_1)\,p(A_2|A_1)\,p(A_3|A_1 \cap A_2) = \frac{13}{52} \cdot \frac{13}{51} \cdot \frac{13}{50} \approx 0.0166 $$
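The chain-rule answer can be checked by brute force. This Python sketch (ours; representing cards as (rank, suit) pairs is an assumption) enumerates all 52 × 51 × 50 ordered three-card deals:

```python
from fractions import Fraction
from itertools import permutations

# A deck as (rank, suit) pairs: 13 ranks in each of 4 suits.
SUITS = ("spades", "hearts", "diamonds", "clubs")
deck = [(rank, suit) for suit in SUITS for rank in range(13)]

# Chain rule: p(A1 ∩ A2 ∩ A3) = p(A1) p(A2|A1) p(A3|A1 ∩ A2)
chain = Fraction(13, 52) * Fraction(13, 51) * Fraction(13, 50)

# Brute force: count ordered deals without replacement that are spade, heart, diamond.
hits = total = 0
for first, second, third in permutations(deck, 3):
    total += 1
    if (first[1], second[1], third[1]) == ("spades", "hearts", "diamonds"):
        hits += 1

print(chain, Fraction(hits, total) == chain)  # 169/10200 True
```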

An Application

Def: A probabilistic finite state machine (PFSM) is a finite state machine in which each arc is associated with a probability indicating how likely that path is to be taken. The probabilities of all arcs leaving a node must sum to 1.0.

A PFSM is an acceptor when one or more states are designated as start states and one or more states are designated as accept states.
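A minimal Python sketch of a PFSM acceptor. The machine is hypothetical (its states, symbols, and probabilities are invented; a model of "tomato" would use phones as symbols), and it assumes a single start state. `string_probability` sums the probabilities of all paths that read the input and end in an accept state:

```python
from math import isclose

# Hypothetical PFSM: each state maps to its outgoing arcs (symbol, next_state, probability).
PFSM = {
    "start": [("a", "s1", 0.6), ("b", "s2", 0.4)],
    "s1": [("c", "end", 1.0)],
    "s2": [("c", "end", 0.7), ("d", "end", 0.3)],
    "end": [],  # accept state with no outgoing arcs
}
START = "start"   # a single start state, for simplicity
ACCEPT = {"end"}  # set of accept states


def is_well_formed(machine):
    """Check that the probabilities of the arcs leaving each non-final node sum to 1.0."""
    return all(isclose(sum(p for _, _, p in arcs), 1.0)
               for arcs in machine.values() if arcs)


def string_probability(machine, start, accept, symbols):
    """Total probability of all paths from `start` that read `symbols` and end in `accept`."""
    current = {start: 1.0}  # probability of being in each state after the symbols read so far
    for sym in symbols:
        nxt = {}
        for state, prob in current.items():
            for arc_sym, target, arc_p in machine[state]:
                if arc_sym == sym:
                    nxt[target] = nxt.get(target, 0.0) + prob * arc_p
        current = nxt
    return sum(prob for state, prob in current.items() if state in accept)


print(is_well_formed(PFSM))                                     # True
print(round(string_probability(PFSM, START, ACCEPT, "bc"), 2))  # 0.28
```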

Phones/Phonemes

Def: A phone is a speech sound.

Def: A phoneme is a collection of related phones (allophones) that are pronounced differently in different contexts.

So [t] is a phoneme. The [t] sound in tunafish differs from the [t] sound in starfish. The first [t] is aspirated, meaning the vocal cords briefly don’t vibrate, producing a sound like a puff of air. A [t] preceded by an [s] is unaspirated.

FSA showing the probabilities of allophones in the word “tomato”

More Phonemes

The same thing happens with [k] and [g]: after [s], a [k] is unaspirated, so it sounds much like a [g]. This leads to the famous mishearing of the Jimi Hendrix lyric:
– ‘Scuse me, while I kiss the sky
– ‘Scuse me, while I kiss this guy

PFSA for the pronunciation of tomato

Phoneme Recognition Problem

Computational linguists have collections of spoken and written language called corpora.

The Brown Corpus and the Switchboard Corpus are two examples. Together, they contain 2.5 million written and spoken words that we can use as a base for estimating probabilities.

Now Suppose

Our machine has identified the phone I. Next, the machine identifies the phone [ni] (as in “knee”). It turns out that an investigation of the Switchboard corpus shows 7 words that can be pronounced [ni] after I:
– the, neat, need, new, knee, to, you

How can this be?

• The phoneme [t] is often deleted at the end of a word: say “neat little” quickly.
• [the] can be pronounced like [ni] after a word ending in [n], such as in. Talk like a Jersey gangster here, or like Bob Marley.

Strategy

• Compile the probabilities of each of the candidate words from the corpora.
• Apply Bayes’ theorem for two events:

$$ p(d|s) = \frac{p(s|d)\,p(d)}{p(s)} $$

Word    Frequency   Probability
knee           61   .000024
the        114834   .046
neat          338   .00013
need         1417   .00056
new          2625   .001

Apply Simplified Bayes

$$ p(\text{word}|[ni]) = \frac{p([ni]|\text{word})\,p(\text{word})}{p([ni])} $$

Since every candidate is divided by the same p([ni]), we can drop it, giving:

$$ p(\text{word}|[ni]) \propto p([ni]|\text{word})\,p(\text{word}) $$

But where does p([ni]|word) come from?

Rules of pronunciation variation in English are well known.
• Run them through the corpora and generate probabilities for each.
• For example, the probability that word-initial [th] becomes [n] when the preceding word ended in [n] is .15.
• This can be done for the other pronunciation rules.

Result

Word   p([ni]|word)   p(word)    p([ni]|word)p(word)
new        .36        .001       .00036
neat       .52        .00013     .000068
need       .11        .00056     .000062
knee      1.0         .000024    .000024
the        0.0        .046       0.0

The has a probability of 0.0 because [the] is pronounced [ni] only after a word ending in [n], and the preceding word, I, does not end in [n].

Notice that new comes out as the most likely candidate, even though “I new” is not grammatical. This might be resolved at the syntactic level.
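The scoring step as a Python sketch (ours), using the numbers from the Result table and dropping the shared denominator p([ni]):

```python
# (p([ni] | word), p(word)) for each candidate, copied from the Result table.
candidates = {
    "new": (0.36, 0.001),
    "neat": (0.52, 0.00013),
    "need": (0.11, 0.00056),
    "knee": (1.0, 0.000024),
    "the": (0.0, 0.046),
}

# p(word | [ni]) ∝ p([ni] | word) * p(word); the common factor 1 / p([ni]) is dropped.
scores = {word: likelihood * prior for word, (likelihood, prior) in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['new', 'neat', 'need', 'knee', 'the']
```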

Another possibility is to look at the probability of two-word combinations in the corpora: “I new” is less probable than “I need”. This is referred to as N-gram analysis.
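A sketch of the bigram (N = 2) case in Python, using the maximum-likelihood estimate p(w2|w1) = count(w1 w2) / count(w1). The tiny corpus is invented for illustration; real estimates would come from corpora such as Brown or Switchboard:

```python
from collections import Counter

# A tiny invented corpus standing in for a real one; its counts are illustrative only.
corpus = "i need a break . i need coffee . i knew it . the new car is neat .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))


def bigram_probability(w1, w2):
    """Maximum-likelihood estimate p(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0


print(bigram_probability("i", "need"), bigram_probability("i", "new"))
```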

General Bayes Theorem

Recall Bayes’ theorem for two events:

$$ p(A|B) = \frac{p(B|A)\,p(A)}{p(B)} $$

We would like to generalize this to multiple events.

Example

Suppose:
• Bowl A contains 2 red and 4 white chips
• Bowl B contains 1 red and 2 white chips
• Bowl C contains 5 red and 4 white chips

We want to select a bowl, draw a chip from it, and compute the probability of drawing a red chip. Suppose further that
• p(A) = 1/3
• p(B) = 1/6
• p(C) = 1/2
where A, B, C are the events that bowl A, B, or C is chosen.

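p(R) depends on two probabilities: the probability of choosing a particular bowl, and then the probability of drawing a red chip from it.

So R is the union of the mutually exclusive events A ∩ R, B ∩ R, and C ∩ R, and p(R) is the sum of their probabilities:

$$ \begin{aligned} p(R) &= p(A \cap R) + p(B \cap R) + p(C \cap R) \\ &= p(A)\,p(R|A) + p(B)\,p(R|B) + p(C)\,p(R|C) \\ &= (1/3)(2/6) + (1/6)(1/3) + (1/2)(5/9) = 4/9 \end{aligned} $$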
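The same computation in Python with exact fractions (a sketch of ours; the dictionary names are illustrative):

```python
from fractions import Fraction

# Priors p(bowl) and chip counts from the bowl example.
prior = {"A": Fraction(1, 3), "B": Fraction(1, 6), "C": Fraction(1, 2)}
red = {"A": 2, "B": 1, "C": 5}
white = {"A": 4, "B": 2, "C": 4}
p_red_given = {b: Fraction(red[b], red[b] + white[b]) for b in prior}  # p(R | bowl)

# p(R) = sum over the mutually exclusive events (bowl ∩ R) of p(bowl) p(R | bowl)
p_R = sum(prior[b] * p_red_given[b] for b in prior)
print(p_R)  # 4/9
```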

Now suppose that the outcome of the experiment is a red chip, but we don’t know which bowl it was drawn from.

So we can compute the conditional probability for each of the bowls.

From the definition of conditional probability and the result above, we know:

$$ \begin{aligned} p(A|R) &= \frac{p(A \cap R)}{p(R)} \\ &= \frac{p(A)\,p(R|A)}{p(A)\,p(R|A) + p(B)\,p(R|B) + p(C)\,p(R|C)} \\ &= \frac{(1/3)(2/6)}{(1/3)(2/6) + (1/6)(1/3) + (1/2)(5/9)} = 2/8 \end{aligned} $$

We can do the same thing for the other bowls:
• p(B|R) = 1/8
• p(C|R) = 5/8

This accords with intuition. The probability that bowl C was chosen increases over its original probability (from 1/2 to 5/8) because C has the highest proportion of red chips, which makes it the most likely candidate.

The original probabilities are called prior probabilities.

The conditional probabilities (e.g., p(A|R)) are called posterior probabilities.
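A Python sketch (ours) that computes all three posteriors at once. The helper `posteriors` is a hypothetical name; it divides each joint probability p(bowl) p(R|bowl) by their sum, which is p(R):

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """p(B_k|R) = p(B_k) p(R|B_k) / sum_i p(B_i) p(R|B_i), for every B_k in the partition."""
    joint = {k: priors[k] * likelihoods[k] for k in priors}
    evidence = sum(joint.values())  # p(R), by total probability
    return {k: joint[k] / evidence for k in joint}


priors = {"A": Fraction(1, 3), "B": Fraction(1, 6), "C": Fraction(1, 2)}
likelihoods = {"A": Fraction(2, 6), "B": Fraction(1, 3), "C": Fraction(5, 9)}  # p(R | bowl)

post = posteriors(priors, likelihoods)
print(post)  # {'A': Fraction(1, 4), 'B': Fraction(1, 8), 'C': Fraction(5, 8)}
```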

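To Generalize

Let events B1, B2, …, Bm constitute a partition of the sample space S. That is:

$$ S = B_1 \cup B_2 \cup \cdots \cup B_m \quad \text{and} \quad B_i \cap B_j = \emptyset \text{ for } i \neq j $$

Suppose R is an event and the prior probabilities p(B1), …, p(Bm) are all > 0. Then R is the union of m mutually exclusive events, namely,

$$ R = (B_1 \cap R) \cup (B_2 \cap R) \cup \cdots \cup (B_m \cap R) $$

So

$$ p(R) = \sum_{i=1}^{m} p(B_i \cap R) = \sum_{i=1}^{m} p(B_i)\,p(R|B_i) \qquad (1) $$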

Now, if p(R) > 0, we have from the definition of conditional probability that

$$ p(B_k|R) = \frac{p(B_k \cap R)}{p(R)}, \quad k = 1, 2, \ldots, m \qquad (2) $$

Using equation (1) and replacing p(R) in (2), we have Bayes’ theorem:

$$ p(B_k|R) = \frac{p(B_k)\,p(R|B_k)}{\sum_{i=1}^{m} p(B_i)\,p(R|B_i)}, \quad k = 1, 2, \ldots, m $$

p(Bk|R) is the posterior probability.

Example

Machines A, B, and C produce bolts of the same size. Each machine produces as follows:
• Machine A: 35% of the bolts, with 2% defective
• Machine B: 25% of the bolts, with 1% defective
• Machine C: 40% of the bolts, with 3% defective

Suppose we select one bolt at the end of the day. The probability that it is defective is:

$$ \begin{aligned} p(D) &= p(A \cap D) + p(B \cap D) + p(C \cap D) \\ &= p(A)\,p(D|A) + p(B)\,p(D|B) + p(C)\,p(D|C) \\ &= (35/100)(2/100) + (25/100)(1/100) + (40/100)(3/100) = 215/10{,}000 \end{aligned} $$

Now suppose the selected bolt is defective. The probability that it was produced by machine C is:

$$ \begin{aligned} p(C|D) &= \frac{p(C)\,p(D|C)}{p(A)\,p(D|A) + p(B)\,p(D|B) + p(C)\,p(D|C)} \\ &= \frac{p(C)\,p(D|C)}{p(D)} = \frac{(40/100)(3/100)}{215/10{,}000} = 120/215 \end{aligned} $$

Notice how the posterior probability rose from the prior of 0.40 to 120/215 ≈ 0.56 once we concentrated on C, since C produces both the most bolts and the highest proportion of defective bolts.
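Both bolt calculations in one Python sketch (ours, exact fractions): p(D) by total probability, then the posterior for every machine:

```python
from fractions import Fraction

# Share of daily production p(machine) and defect rate p(D | machine), from the example.
production = {"A": Fraction(35, 100), "B": Fraction(25, 100), "C": Fraction(40, 100)}
defect_rate = {"A": Fraction(2, 100), "B": Fraction(1, 100), "C": Fraction(3, 100)}

# Total probability of a defective bolt.
p_D = sum(production[m] * defect_rate[m] for m in production)

# Posterior p(machine | D) for every machine.
posterior = {m: production[m] * defect_rate[m] / p_D for m in production}

print(p_D == Fraction(215, 10_000), posterior["C"] == Fraction(120, 215))  # True True
print(round(float(posterior["C"]), 3))  # 0.558
```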

Evidence and Hypotheses

We can think of these various events as evidence (E) and hypotheses (H):

$$ p(H_k|E) = \frac{p(E|H_k)\,p(H_k)}{\sum_{i=1}^{m} p(E|H_i)\,p(H_i)} $$

where
• p(Hk|E) is the probability that hypothesis k is true given the evidence, E
• p(Hk) is the probability that hypothesis k is true overall
• p(E|Hk) is the probability of observing evidence E when Hk is true
• m is the number of hypotheses

Why Bayes Works

The probability of evidence given hypotheses is often easier to determine than the probability of hypotheses given the evidence.

Suppose the evidence is a headache and the hypothesis is meningitis. It is easier to determine the number of patients who have headaches given that they have meningitis than to determine the number of patients who have meningitis given that they have headaches, because the population of headache sufferers is far larger.

But There Are Issues

When we thought about bowls (hypotheses) and chips (evidence), finding the probability of each bowl given a red chip required computing 3 posterior probabilities, one for each of the three bowls. If we also worked it out for white chips, we would have to compute 3 × 2 = 6 posterior probabilities.

Now suppose our hypotheses are drawn from a set of m diseases and our evidence from a set of n symptoms. Then we have to compute m × n posterior probabilities.

But There’s More

Bayes’ theorem, in the form above, assumes that the hypotheses partition the set of evidence into disjoint sets.

This is fine with bolts and machines or red chips and bowls, but much less fine with natural phenomena. Pneumonia and strep probably don’t partition the set of fever sufferers, since they could overlap (a patient could have both).

That is

We have to use a form of Bayes’ theorem that considers any single hypothesis, hi, in the context of multiple pieces of evidence (symptoms) e1, e2, …, en occurring together:

$$ p(h_i|e_1 \cap e_2 \cap \cdots \cap e_n) = \frac{p(h_i)\,p(e_1 \cap e_2 \cap \cdots \cap e_n|h_i)}{p(e_1 \cap e_2 \cap \cdots \cap e_n)} $$

If n is the number of symptoms and m the number of diseases, this works out to m × n² + n² + m pieces of information to collect. In an expert system that is to classify 200 diseases using 2,000 symptoms, this is roughly 800,000,000 (exactly 804,000,200) pieces of information to collect.
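The count as arithmetic, in a one-function Python sketch (the name `pieces_of_information` is ours) that reproduces the figure for 200 diseases and 2,000 symptoms:

```python
def pieces_of_information(m, n):
    """Count from the slide: m * n^2 + n^2 + m, for m diseases and n symptoms."""
    return m * n**2 + n**2 + m


print(pieces_of_information(200, 2000))  # 804000200
```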

Naïve Bayes to the Rescue

Naive Bayes classification assumes that the pieces of evidence are independent of one another given the hypothesis.

The probability that a fruit is an apple, given that it is red, round, and firm, can be calculated from the independent probabilities that an apple is red, that it is round, and that it is firm (together with the overall probability of apples).

The probability that a person has strep, given that he has a fever and a sore throat, can be calculated from the independent probabilities that a person with strep has a fever and that he has a sore throat.

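In effect, we want to calculate this:

$$ p(h|e_1 \cap e_2 \cap \cdots \cap e_n) $$

Since the intersection of sets is a set, Bayes lets us write:

$$ p(h|e_1 \cap e_2 \cap \cdots \cap e_n) = \frac{p(h)\,p(e_1 \cap e_2 \cap \cdots \cap e_n|h)}{p(e_1 \cap e_2 \cap \cdots \cap e_n)} $$

Since we only want to classify, and the denominator is the same for every hypothesis, we can ignore it, giving:

$$ p(h|e_1 \cap e_2 \cap \cdots \cap e_n) \propto p(h)\,p(e_1 \cap e_2 \cap \cdots \cap e_n|h) $$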

Independent Events to the Rescue

Assume that all pieces of evidence are independent given a particular hypothesis.

Recall the chain rule:

$$ p(A \cap B \cap C \cap \cdots \cap N) = p(A)\,p(B|A)\,p(C|A \cap B)\,p(D|A \cap B \cap C) \cdots p(N|A \cap B \cap C \cap \cdots) $$

If p(B|A) = p(B), p(C|A ∩ B) = p(C), and so on (that is, the events are independent), then

$$ p(A \cap B \cap \cdots \cap N) = p(A)\,p(B)\,p(C) \cdots p(N) $$
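Independence (p(B|A) = p(B)) and mutual exclusivity (p(A ∩ B) = 0) are different properties, and the product rule needs the first. A small Python check on two fair dice, with events of our own choosing:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))  # two fair dice: 36 equally likely outcomes


def p(event):
    """Probability of the event described by predicate `event` (a function of an outcome)."""
    return Fraction(sum(1 for s in S if event(s)), len(S))


def both(e1, e2):
    """Predicate for the intersection of two events."""
    return lambda s: e1(s) and e2(s)


def first_is_six(s):
    return s[0] == 6


def second_is_even(s):
    return s[1] % 2 == 0


def first_is_one(s):
    return s[0] == 1


# Independent events: p(A ∩ B) = p(A) p(B), equivalently p(B|A) = p(B).
print(p(both(first_is_six, second_is_even)) == p(first_is_six) * p(second_is_even))  # True

# Mutually exclusive events (they cannot both occur) are not independent here:
# p(A ∩ C) = 0, but p(A) p(C) = 1/36.
print(p(both(first_is_six, first_is_one)), p(first_is_six) * p(first_is_one))  # 0 1/36
```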

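$$ p(h|e_1 \cap e_2 \cap \cdots \cap e_n) \propto p(h)\,p(e_1 \cap e_2 \cap \cdots \cap e_n|h) $$

Becomes (with a little hand-waving: apply the chain rule to the evidence given h, then the independence assumption)

$$ p(h_i|E) \propto p(h_i)\,p(e_1|h_i)\,p(e_2|h_i) \cdots p(e_n|h_i) $$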

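Leading to the Naïve Bayes Classifier

$$ p(E|H_j) = \prod_{i=1}^{n} p(E_i|H_j) $$

The classifier chooses the hypothesis H_j that maximizes $p(H_j) \prod_{i=1}^{n} p(E_i|H_j)$.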
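A minimal naive Bayes classifier in Python. Everything in the model is hypothetical: the three hypotheses and all the probabilities are invented for illustration, and `posterior` assumes the hypotheses are exhaustive and mutually exclusive, so that normalizing the scores recovers p(H_j|E). `math.prod` needs Python 3.8 or later.

```python
from math import prod

# Hypothetical model (not from the slides): priors p(H_j) and p(e_i | H_j).
PRIORS = {"strep": 0.01, "cold": 0.20, "healthy": 0.79}
LIKELIHOODS = {
    "strep": {"fever": 0.70, "sore_throat": 0.90},
    "cold": {"fever": 0.20, "sore_throat": 0.50},
    "healthy": {"fever": 0.01, "sore_throat": 0.05},
}


def naive_bayes_scores(evidence):
    """p(H_j) * prod_i p(e_i | H_j), proportional to p(H_j | E) when the
    pieces of evidence are independent given each hypothesis."""
    return {h: PRIORS[h] * prod(LIKELIHOODS[h][e] for e in evidence) for h in PRIORS}


def posterior(evidence):
    """Normalize the scores to sum to 1 (assumes exhaustive, mutually exclusive hypotheses)."""
    scores = naive_bayes_scores(evidence)
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}


def classify(evidence):
    """Naive Bayes classifier: pick the hypothesis with the largest score."""
    scores = naive_bayes_scores(evidence)
    return max(scores, key=scores.get)


print(classify(["fever", "sore_throat"]))  # cold
```

With these made-up numbers, "cold" wins even though fever and sore throat are each more likely under "strep", because the prior p(strep) is small.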