
Page 1: Intro to Probability

Intro to Probability

Slides from Professor Pan, Yan, SYSU

Page 2: Intro to Probability

Probability Theory: Example of a random experiment

– We poll 60 users who are using one of two search engines and record the following:

[Scatter plot: each point corresponds to one of the 60 users. Horizontal axis: X = number of "good hits" returned by the search engine (0–8). Vertical axis: Y = which of the two search engines was used.]

Page 3: Intro to Probability

[Same scatter plot as on the previous slide.]

Probability Theory: Random variables

– X and Y are called random variables
– Each has its own sample space:

• S_X = {0, 1, 2, 3, 4, 5, 6, 7, 8}

• S_Y = {1, 2}

Page 4: Intro to Probability

Probability Theory: Probability

– P(X=i, Y=j) is the probability (relative frequency) of observing X = i and Y = j
– P(X,Y) refers to the whole table of probabilities
– Properties: 0 ≤ P(X=i,Y=j) ≤ 1, Σ_{i,j} P(X=i,Y=j) = 1

P(X=i,Y=j):

        X=0   X=1   X=2   X=3   X=4   X=5   X=6   X=7   X=8
Y=1    3/60  6/60  8/60  8/60  5/60  3/60  1/60  0/60  0/60
Y=2    0/60  0/60  0/60  1/60  4/60  5/60  8/60  6/60  2/60

Page 5: Intro to Probability

Probability Theory: Marginal probability

– P(X=i) is the marginal probability that X = i, i.e., the probability that X = i, ignoring Y

[Bar charts of the marginal distributions P(X) and P(Y).]

Page 6: Intro to Probability

Probability Theory: Marginal probability

– P(X=i) is the marginal probability that X = i, i.e., the probability that X = i, ignoring Y
– From the table: P(X=i) = Σ_j P(X=i,Y=j)

Note that Σ_i P(X=i) = 1 and Σ_j P(Y=j) = 1

            X=0   X=1   X=2   X=3   X=4   X=5   X=6   X=7   X=8 | P(Y=j)
Y=1        3/60  6/60  8/60  8/60  5/60  3/60  1/60  0/60  0/60 | 34/60
Y=2        0/60  0/60  0/60  1/60  4/60  5/60  8/60  6/60  2/60 | 26/60
P(X=i)     3/60  6/60  8/60  9/60  9/60  8/60  9/60  6/60  2/60 |

SUM RULE
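As a concrete check, here is a minimal numpy sketch (mine, not from the slides) of the sum rule applied to this exact table:

```python
import numpy as np

# Joint probability table P(X=i, Y=j) from the poll of 60 users.
# Rows: Y = 1, 2; columns: X = 0..8.
joint = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                  [0, 0, 0, 1, 4, 5, 8, 6, 2]]) / 60.0

assert np.isclose(joint.sum(), 1.0)   # probabilities sum to 1

# SUM RULE: marginalize out the other variable.
p_x = joint.sum(axis=0)   # P(X=i) = sum_j P(X=i, Y=j)
p_y = joint.sum(axis=1)   # P(Y=j) = sum_i P(X=i, Y=j)

print(p_x * 60)   # [3. 6. 8. 9. 9. 8. 9. 6. 2.]
print(p_y * 60)   # [34. 26.]
```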

Page 7: Intro to Probability

Probability Theory: Conditional probability

– P(X=i|Y=j) is the probability that X = i, given that Y = j
– From the table: P(X=i|Y=j) = P(X=i,Y=j) / P(Y=j)

[Bar chart of the conditional distribution P(X|Y=1): the Y=1 row of the table divided by P(Y=1).]

Page 8: Intro to Probability

Probability Theory: Conditional probability

– How about the opposite conditional probability, P(Y=j|X=i)?
– P(Y=j|X=i) = P(X=i,Y=j) / P(X=i). Note that Σ_j P(Y=j|X=i) = 1

P(X=i,Y=j):

        X=0   X=1   X=2   X=3   X=4   X=5   X=6   X=7   X=8
Y=1    3/60  6/60  8/60  8/60  5/60  3/60  1/60  0/60  0/60
Y=2    0/60  0/60  0/60  1/60  4/60  5/60  8/60  6/60  2/60

P(X=i):

       3/60  6/60  8/60  9/60  9/60  8/60  9/60  6/60  2/60

P(Y=j|X=i):

        X=0   X=1   X=2   X=3   X=4   X=5   X=6   X=7   X=8
Y=1     3/3   6/6   8/8   8/9   5/9   3/8   1/9   0/6   0/2
Y=2     0/3   0/6   0/8   1/9   4/9   5/8   8/9   6/6   2/2
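A companion numpy sketch for both conditionals (mine; the table is redefined so the snippet stands alone):

```python
import numpy as np

joint = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                  [0, 0, 0, 1, 4, 5, 8, 6, 2]]) / 60.0

p_x = joint.sum(axis=0)   # P(X=i)
p_y = joint.sum(axis=1)   # P(Y=j)

# P(Y=j | X=i) = P(X=i, Y=j) / P(X=i); each column now sums to 1.
p_y_given_x = joint / p_x                    # broadcasts across rows
assert np.allclose(p_y_given_x.sum(axis=0), 1.0)

# P(X=i | Y=j) = P(X=i, Y=j) / P(Y=j); each row now sums to 1.
p_x_given_y = joint / p_y[:, None]
assert np.allclose(p_x_given_y.sum(axis=1), 1.0)

print(p_y_given_x[:, 3])   # [8/9, 1/9], the X=3 column of the table
```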

Page 9: Intro to Probability

Summary of types of probability

• Joint probability: P(X,Y)

• Marginal probability (ignore other variable): P(X) and P(Y)

• Conditional probability (condition on the other variable having a certain value): P(X|Y) and P(Y|X)

Page 10: Intro to Probability

Probability Theory: Constructing joint probability

– Suppose we know:

• The probability that the user will pick each search engine, P(Y=j), and

• For each search engine, the probability of each number of good hits, P(X=i|Y=j)

– Can we construct the joint probability, P(X=i,Y=j)?

– Yes. Rearranging P(X=i|Y=j) = P(X=i,Y=j) / P(Y=j), we get P(X=i,Y=j) = P(X=i|Y=j) P(Y=j)

PRODUCT RULE
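A quick numpy sketch of the product rule (mine): start from P(Y=j) and P(X=i|Y=j), as the slide suggests, and recover the joint table.

```python
import numpy as np

# Known pieces: P(Y=j) and P(X=i | Y=j), with numbers from the earlier table.
p_y = np.array([34, 26]) / 60.0
p_x_given_y = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                        [0, 0, 0, 1, 4, 5, 8, 6, 2]], dtype=float)
p_x_given_y[0] /= 34.0   # row Y=1, normalized by its total count
p_x_given_y[1] /= 26.0   # row Y=2, normalized by its total count

# PRODUCT RULE: P(X=i, Y=j) = P(X=i | Y=j) P(Y=j)
joint = p_x_given_y * p_y[:, None]

# Recovers the original table of counts over 60 users.
print(np.round(joint * 60))
```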

Page 11: Intro to Probability

Summary of computational rules

• SUM RULE: P(X) = Y P(X,Y)

P(Y) = X P(X,Y)

– Notation: We simplify P(X=i,Y=j) for clarity

• PRODUCT RULE: P(X,Y) = P(X|Y) P(Y)

P(X,Y) = P(Y|X) P(X)

Page 12: Intro to Probability

Ordinal variables

• In our example, X has a natural order 0…8
– X is a number of hits, and
– For the ordering of the columns in the table below, nearby X's have similar probabilities

• Y does not have a natural order

        X=0   X=1   X=2   X=3   X=4   X=5   X=6   X=7   X=8
Y=1    3/60  6/60  8/60  8/60  5/60  3/60  1/60  0/60  0/60
Y=2    0/60  0/60  0/60  1/60  4/60  5/60  8/60  6/60  2/60

Page 13: Intro to Probability

Probabilities for real numbers

• Can’t we treat real numbers as IEEE doubles with 2^64 possible values?

• Hah, hah. No!

• How about quantizing real variables to a reasonable number of values?

• Sometimes works, but…
– We need to carefully account for ordinality
– Doing so can lead to cumbersome mathematics

Page 14: Intro to Probability

Probability theory for real numbers

• Quantize X using bins of width Δ
• Then, X ∈ {…, -2Δ, -Δ, 0, Δ, 2Δ, …}

• Define P_Q(X=x) = Probability that x < X ≤ x+Δ

• Problem: P_Q(X=x) depends on the choice of Δ
• Solution: Let Δ → 0
• Problem: In that case, P_Q(X=x) → 0

• Solution: Define a probability density

P(x) = lim_{Δ→0} P_Q(X=x)/Δ
     = lim_{Δ→0} (Probability that x < X ≤ x+Δ)/Δ
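A numerical illustration of this limit (my own sketch, using a standard Gaussian as the true distribution): as Δ shrinks, the estimate P_Q(X=x)/Δ approaches the density at x.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(1_000_000)   # X ~ N(0, 1)

x = 0.5                                    # point at which to estimate P(x)
for delta in [1.0, 0.1, 0.01]:
    # P_Q(X=x) = Probability that x < X <= x + delta, estimated by counting.
    p_q = np.mean((samples > x) & (samples <= x + delta))
    print(delta, p_q / delta)              # approaches the true density

# True density of N(0,1) at x = 0.5, for comparison (~0.3521)
print(np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))
```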

Page 15: Intro to Probability

Probability theory for real numbers

Probability density

– Suppose P(x) is a probability density

– Properties

• P(x) ≥ 0
• It is NOT necessary that P(x) ≤ 1

• ∫_x P(x) dx = 1

– Probabilities of intervals:

P(a < X ≤ b) = ∫_{x=a}^{b} P(x) dx
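A quick numeric check of these properties (a sketch of mine, using a narrow Gaussian): the density can exceed 1 pointwise while still integrating to 1.

```python
import numpy as np

sigma = 0.1
x = np.linspace(-1.0, 1.0, 20001)     # fine, uniform grid over the support
dx = x[1] - x[0]
p = np.exp(-x**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

print(p.max())            # ~3.989: a density may exceed 1 ...
print((p * dx).sum())     # ~1.0: ... but it still integrates to 1

# Probability of an interval: P(a < X <= b) = integral of p from a to b.
a, b = -0.1, 0.1
mask = (x > a) & (x <= b)
print((p[mask] * dx).sum())   # ~0.683, the familiar one-sigma rule
```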

Page 16: Intro to Probability

Probability theory for real numbers

Joint, marginal and conditional densities

• Suppose P(x,y) is a joint probability density

– ∫_x ∫_y P(x,y) dx dy = 1

– P((X,Y) ∈ R) = ∫_R P(x,y) dx dy

• Marginal density: P(x) = ∫_y P(x,y) dy

• Conditional density: P(x|y) = P(x,y) / P(y)

[Diagram: a region R in the (x, y) plane.]
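A grid-based sketch of the continuous rules (mine; the joint density here is a product of independent standard Gaussians, chosen only for concreteness):

```python
import numpy as np

# Grid over a box that carries essentially all of the density's mass.
xs = np.linspace(-5, 5, 501)
ys = np.linspace(-5, 5, 501)
dx, dy = xs[1] - xs[0], ys[1] - ys[0]
X, Y = np.meshgrid(xs, ys, indexing="ij")

# Joint density: independent standard Gaussians in x and y.
p_xy = np.exp(-(X**2 + Y**2) / 2) / (2 * np.pi)

print((p_xy * dx * dy).sum())     # ~1: the double integral of the joint

# Marginal density: P(x) = integral over y of P(x, y) dy.
p_x = (p_xy * dy).sum(axis=1)
print((p_x * dx).sum())           # ~1: the marginal is itself a density

# P((X, Y) in R) for the region R = [0, 1] x [0, 1].
R = (X >= 0) & (X <= 1) & (Y >= 0) & (Y <= 1)
print((p_xy[R] * dx * dy).sum())  # ~0.1165 = (Phi(1) - Phi(0))^2
```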

Page 17: Intro to Probability

The Gaussian distribution

N(x | μ, σ²) = (1/√(2πσ²)) exp( −(x−μ)² / (2σ²) )

σ is the standard deviation, μ is the mean

Page 18: Intro to Probability

Mean and variance

• The mean of X is E[X] = Σ_X X P(X)

or E[X] = ∫_x x P(x) dx

• The variance of X is VAR(X) = Σ_X (X − E[X])² P(X)

or VAR(X) = ∫_x (x − E[X])² P(x) dx

• The std dev of X is STD(X) = SQRT(VAR(X))

• The covariance of X and Y is

COV(X,Y) = Σ_X Σ_Y (X − E[X]) (Y − E[Y]) P(X,Y)

or COV(X,Y) = ∫_x ∫_y (x − E[X]) (y − E[Y]) P(x,y) dx dy
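These definitions applied to the search-engine table from earlier (a numpy sketch, not from the slides):

```python
import numpy as np

joint = np.array([[3, 6, 8, 8, 5, 3, 1, 0, 0],
                  [0, 0, 0, 1, 4, 5, 8, 6, 2]]) / 60.0
x_vals = np.arange(9)          # X = 0..8
y_vals = np.array([1, 2])      # Y = 1, 2

p_x = joint.sum(axis=0)
p_y = joint.sum(axis=1)

mean_x = (x_vals * p_x).sum()                 # E[X] = sum_X X P(X)
var_x = ((x_vals - mean_x)**2 * p_x).sum()    # VAR(X)
std_x = np.sqrt(var_x)                        # STD(X)

mean_y = (y_vals * p_y).sum()
# COV(X,Y) = sum_{X,Y} (X - E[X]) (Y - E[Y]) P(X,Y)
cov_xy = ((x_vals[None, :] - mean_x)
          * (y_vals[:, None] - mean_y)
          * joint).sum()

print(mean_x, var_x, std_x, cov_xy)
```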

Page 19: Intro to Probability

Mean and variance of the Gaussian

E[X] = μ, VAR(X) = σ²

STD(X) = σ

Page 20: Intro to Probability

How can we use probability as a framework for machine learning?

Page 21: Intro to Probability

Maximum likelihood estimation

• Say we have a density P(x|θ) with parameter θ

• The likelihood of a set of independent and identically distributed (i.i.d.) data x = (x_1, …, x_N) is

P(x|θ) = Π_{n=1}^{N} P(x_n|θ)

• The log-likelihood is L = ln P(x|θ) = Σ_{n=1}^{N} ln P(x_n|θ)

• The maximum likelihood (ML) estimate of θ is

θ_ML = argmax_θ L = argmax_θ Σ_{n=1}^{N} ln P(x_n|θ)

• Example: For Gaussian likelihood P(x|θ) = N(x|μ,σ²),

L = −(1/(2σ²)) Σ_{n=1}^{N} (x_n − μ)² − (N/2) ln σ² − (N/2) ln 2π
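For the Gaussian example the maximum can be found in closed form: setting the derivatives of L with respect to μ and σ² to zero gives the sample mean and the (biased) sample variance. A small numerical sketch of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=3.0, size=10_000)   # i.i.d., true mu=2, sigma=3

# ML estimates for a Gaussian: sample mean and (biased) sample variance.
mu_ml = data.mean()
var_ml = ((data - mu_ml)**2).mean()

def log_likelihood(mu, var, x):
    """L = sum_n ln N(x_n | mu, var)."""
    return (-0.5 * np.log(2 * np.pi * var) - (x - mu)**2 / (2 * var)).sum()

print(mu_ml, var_ml)                               # close to 2.0 and 9.0
# The ML estimate attains a higher log-likelihood than nearby parameters.
print(log_likelihood(mu_ml, var_ml, data))
print(log_likelihood(mu_ml + 0.5, var_ml, data))   # strictly lower
```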

Page 22: Intro to Probability

Comments on notation from now on

• Instead of Σ_j P(X=i,Y=j), we write Σ_Y P(X,Y)

• P(·) and p(·) are used interchangeably

• Discrete and continuous variables are treated the same, so Σ_X, Σ_x, ∫_X and ∫_x are interchangeable

• θ_ML and θ̂_ML are interchangeable

• argmax_θ f(θ) is the value of θ that maximizes f(θ)

• In the context of data x_1,…,x_N, the symbols x and X (in any typeface) refer to the entire set of data

• N(x|μ,σ²) = (1/√(2πσ²)) exp( −(x−μ)² / (2σ²) )

• log() = ln() and exp(x) = e^x

• p_context(x) and p(x|context) are interchangeable


Page 24: Intro to Probability

Questions?


Page 26: Intro to Probability

Maximum likelihood estimation

• Example: For Gaussian likelihood P(x|θ) = N(x|μ,σ²),

L = −(1/(2σ²)) Σ_{n=1}^{N} (x_n − μ)² − (N/2) ln σ² − (N/2) ln 2π

• Objective of regression: Minimize the error

E(w) = ½ Σ_n ( t_n − y(x_n, w) )²
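A sketch connecting the two (mine; the linear form of y(x, w) below is an assumption, since the slide leaves the model unspecified): minimizing E(w) is ML estimation under Gaussian noise on the targets.

```python
import numpy as np

def y(x, w):
    """A simple linear model y(x, w) = w0 + w1*x (chosen for illustration)."""
    return w[0] + w[1] * x

def error(w, x, t):
    """E(w) = 1/2 * sum_n (t_n - y(x_n, w))^2, the sum-of-squares error."""
    return 0.5 * ((t - y(x, w))**2).sum()

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
t = 1.0 + 2.0 * x + rng.normal(scale=0.1, size=x.shape)   # noisy targets

# The minimizer of E(w) for a linear model is the least-squares solution.
A = np.stack([np.ones_like(x), x], axis=1)
w_best, *_ = np.linalg.lstsq(A, t, rcond=None)

print(w_best)                 # close to the true (1.0, 2.0)
print(error(w_best, x, t))    # the minimized sum-of-squares error
```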