Probability and Statistics
Part 2. More Probability, Statistics and their Application

Chang-han Rhee

Stanford University

Sep 20, 2011 / CME001


Outline

- Statistics
  - Estimation Concepts
  - Estimation Strategies
- More Probability
  - Expectation and Conditional Expectation
  - Interchange of Limit
  - Transforms
- Simulation
  - Monte Carlo Method
  - Rare Event Simulation
- Further Reference
  - Classes at Stanford
  - Books


Probability and Statistics

Diagram: Probability reasons from Model to Data; Statistics reasons from Data back to Model.


Estimation

Making the best guess of an unknown parameter from sample data.

e.g., the average height of the West African giraffe.


Estimator

An estimator (a statistic) is a rule for forming an estimate from the sample:

$$\hat\theta_n = g(X_1, \ldots, X_n)$$


Quality of an Estimator

- Bias: $E\hat\theta_n - \theta$
- Variance: $\mathrm{var}(\hat\theta_n)$
- Mean Square Error (MSE): $E(\hat\theta_n - \theta)^2 = (\text{bias})^2 + \text{variance}$
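These quantities are easy to check by simulation. Below is a minimal Python sketch (added here, not from the slides) that estimates the bias, variance, and MSE of the sample-mean estimator by repeated sampling; the exponential model, the sample size, and the number of replications are arbitrary choices for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 2.0          # true mean of the Exponential(scale=theta) model (assumed example)
n, reps = 50, 100_000

# Draw `reps` independent samples of size n and apply the estimator to each.
samples = rng.exponential(scale=theta, size=(reps, n))
theta_hat = samples.mean(axis=1)        # sample-mean estimator

bias = theta_hat.mean() - theta
variance = theta_hat.var()
mse = ((theta_hat - theta) ** 2).mean()

print(f"bias ≈ {bias:.4f}, variance ≈ {variance:.4f}, mse ≈ {mse:.4f}")
print(f"bias^2 + variance ≈ {bias**2 + variance:.4f}")   # matches mse
```

The printout should show bias ≈ 0 (the sample mean is unbiased here), so the MSE is essentially the variance.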


Confidence Interval

Consider the sample mean estimator $\hat\theta_n = \frac{1}{n} S_n$, where $S_n = X_1 + \cdots + X_n$. From the CLT,

$$\frac{S_n - n\, EX_1}{\sqrt{n}} \xrightarrow{\;D\;} \sigma\, N(0, 1)$$

Rearranging terms (note: this is not a rigorous argument),

$$\frac{1}{n} S_n \overset{D}{\approx} EX_1 + \frac{\sigma}{\sqrt{n}}\, N(0, 1)$$
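Plugging the sample standard deviation in for $\sigma$ gives the usual approximate 95% confidence interval $\frac{1}{n}S_n \pm 1.96\,\hat\sigma/\sqrt{n}$. A minimal Python sketch (added; the simulated data and sample size are assumptions of the example):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.exponential(scale=2.0, size=n)   # sample data (assumed model for the example)

mean = x.mean()
se = x.std(ddof=1) / np.sqrt(n)          # estimated sigma / sqrt(n)

# Approximate 95% confidence interval for EX_1 based on the CLT:
lo, hi = mean - 1.96 * se, mean + 1.96 * se
print(f"95% CI for the mean: ({lo:.3f}, {hi:.3f})")
```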


Maximum Likelihood Estimation

Finding the most likely explanation of the data:

$$\hat\theta_n = \arg\max_\theta f(x_1, x_2, \ldots, x_n \mid \theta) = \arg\max_\theta\, f(x_1 \mid \theta)\, f(x_2 \mid \theta) \cdots f(x_n \mid \theta)$$

(the factorization uses independence of the observations)

- Gold standard: guaranteed to be (asymptotically) efficient
- Often computationally challenging
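As a concrete illustration (added here, not from the slides), the sketch below computes the MLE of the rate of an exponential distribution by numerically maximizing the log-likelihood and compares it with the closed-form answer $\hat\lambda = 1/\bar{x}$; the use of scipy and the simulated data are assumptions of the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=1 / 3.0, size=500)   # data from Exp(rate=3), assumed for the example

def neg_log_likelihood(lam):
    # -log f(x_1,...,x_n | lam) for i.i.d. Exponential(lam): f(x|lam) = lam * exp(-lam * x)
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 100.0), method="bounded")
print("numerical MLE:", res.x)
print("closed form 1/mean(x):", 1 / x.mean())
```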


Method of Moments

Matching the sample moments with the parametric moments. If $\theta = (\theta_1, \ldots, \theta_k)$, solve

$$\int x^j f_{\hat\theta_n}(x)\, dx = \frac{1}{n} \sum_{i=1}^{n} X_i^j \quad \text{for } j = 1, \ldots, k$$

or

$$\sum_x x^j p_{\hat\theta_n}(x) = \frac{1}{n} \sum_{i=1}^{n} X_i^j \quad \text{for } j = 1, \ldots, k$$

- Statistically less efficient than MLE
- Often computationally efficient
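For instance (an added example, not from the slides), a method-of-moments fit of a Gamma(shape, scale) distribution needs only the first two sample moments, here written in the equivalent mean/variance parametrization:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1.5, size=2000)   # data with known parameters, for checking

# Gamma moments: E X = shape * scale,  var X = shape * scale^2.
m1 = x.mean()
v = x.var()

scale_hat = v / m1
shape_hat = m1 / scale_hat      # equivalently m1**2 / v
print(f"shape ≈ {shape_hat:.3f}, scale ≈ {scale_hat:.3f}")   # close to (2.5, 1.5)
```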


Properties of Expectation

- Jensen's inequality: $g(EX) \le E\,g(X)$ for convex $g(\cdot)$
- Markov's inequality: $P(|X| > x) \le \dfrac{E|X|}{x}$ for $x > 0$
- Minkowski's inequality: $\big(E|X+Y|^p\big)^{1/p} \le \big(E|X|^p\big)^{1/p} + \big(E|Y|^p\big)^{1/p}$
- Hölder's inequality: $E|XY| \le \big(E|X|^p\big)^{1/p}\big(E|Y|^q\big)^{1/q}$ for $1/p + 1/q = 1$
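A quick numeric sanity check of the first two inequalities by simulation (added here, not in the slides); the choice $g(x) = x^2$, the exponential distribution, and the threshold are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.exponential(scale=1.0, size=1_000_000)

# Jensen: g(EX) <= E g(X) for convex g, here g(x) = x^2.
print("g(EX) =", X.mean() ** 2, "<=", "E g(X) =", (X ** 2).mean())

# Markov: P(|X| > x) <= E|X| / x, here x = 3.
x0 = 3.0
print("P(X > 3) =", (X > x0).mean(), "<=", "E X / 3 =", X.mean() / x0)
```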


- If $X$ and $Y$ are independent, then $E[g(X)\,h(Y)] = E\,g(X)\; E\,h(Y)$.


Properties of Conditional Expectation

- Jensen's inequality: $g(E[X \mid Y]) \le E[g(X) \mid Y]$ for convex $g(\cdot)$
- Markov's inequality: $P(|X| > x \mid Y) \le \dfrac{E[|X| \mid Y]}{x}$ for $x > 0$
- Minkowski's inequality: $\big(E[|X+Y|^p \mid Z]\big)^{1/p} \le \big(E[|X|^p \mid Z]\big)^{1/p} + \big(E[|Y|^p \mid Z]\big)^{1/p}$
- Hölder's inequality: $E[|XY| \mid Z] \le \big(E[|X|^p \mid Z]\big)^{1/p}\big(E[|Y|^q \mid Z]\big)^{1/q}$ for $1/p + 1/q = 1$


Tower Property

Tower Property (Law of Iterated Expectation, Law of Total Expectation):

$$E[X] = E\big[E[X \mid Y]\big]$$

i.e., for discrete $Y$,

$$E[X] = \sum_{y} E[X \mid Y = y]\, P(Y = y)$$

e.g.
- $Y \sim \mathrm{Unif}(0,1)$ and $X \sim \mathrm{Unif}(Y,1)$. What is $EX$?
- Mouse escape
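A worked answer for the first example (added here; the slide only poses the question): conditional on $Y$, $X$ is uniform on $(Y, 1)$, so $E[X \mid Y] = (Y+1)/2$, and by the tower property $EX = E[(Y+1)/2] = (EY + 1)/2 = 3/4$. A one-line simulation check in Python:

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.uniform(0.0, 1.0, size=1_000_000)
X = rng.uniform(Y, 1.0)       # X | Y ~ Unif(Y, 1)
print(X.mean())               # ≈ 0.75 = E[(Y + 1) / 2]
```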


Bayes Rule

- The law of total probability:

  $$P(A) = \sum_i P(A \mid B_i)\, P(B_i)$$

- Bayes rule:

  $$P(A_i \mid B) = \frac{P(B \mid A_i)\, P(A_i)}{\sum_j P(B \mid A_j)\, P(A_j)}$$

  where $A_1, A_2, \ldots, A_k$ is a disjoint partition of $\Omega$.
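A standard numeric illustration (added; the numbers are made up for the example): a disease with 1% prevalence, a test with 95% sensitivity and 90% specificity.

```python
# Bayes rule: P(disease | positive test)
p_d = 0.01                 # P(disease)
p_pos_given_d = 0.95       # sensitivity, P(+ | disease)
p_pos_given_not_d = 0.10   # 1 - specificity, P(+ | no disease)

p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)   # law of total probability
p_d_given_pos = p_pos_given_d * p_d / p_pos                    # Bayes rule
print(p_d_given_pos)   # ≈ 0.088
```

Even after a positive test, the posterior probability of disease is below 9%, because the event being tested for is rare.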


More Properties of Conditional Expectation

$E[X g(Y) \mid Y] = g(Y)\, E[X \mid Y]$

$E\big[E[X \mid Y, Z] \mid Y\big] = E[X \mid Y]$

$E[X \mid Y] = X$, if $X = g(Y)$ for some $g$

If $X$ and $Y$ are independent: $E[h(X, Y) \mid Y = y] = E\, h(X, y)$ and $E[X \mid Y] = EX$


Monotone Convergence

Theorem (Monotone Convergence)
If $0 \le X_n \le X_{n+1}$ almost surely and $X_n \uparrow X_\infty$ almost surely, then $EX_n \to EX_\infty$.


Dominated Convergence
(and Bounded Convergence as a corollary)

Theorem (Dominated Convergence)
If $X_n \to X_\infty$ almost surely and $|X_n| \le Y$ for all $n$ and some $Y$ such that $EY < \infty$, then $X_n \xrightarrow{L^1} X_\infty$.

Corollary (Bounded Convergence)
If $X_n \to X_\infty$ almost surely and $|X_n| \le K$ for all $n$ and some $K \in \mathbb{R}$, then $X_n \xrightarrow{L^1} X_\infty$.


and more

- Scheffé's Lemma
- Fatou's Lemma
- Uniform Integrability
- Fubini's Theorem


Moment Generating Function and Characteristic Function

The moment generating function and the characteristic function characterize the distribution of a random variable.

- Moment Generating Function: $M_X(\theta) = E[\exp(\theta X)]$
- Characteristic Function: $\Phi_X(\theta) = E[\exp(i\theta X)]$
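A standard concrete case (added here; the normal distribution is just an example): for $X \sim N(\mu, \sigma^2)$ the MGF is finite everywhere, and its derivatives at $0$ generate the moments,

$$M_X(\theta) = \exp\!\left(\mu\theta + \tfrac{1}{2}\sigma^2\theta^2\right), \qquad E[X^k] = \frac{d^k}{d\theta^k} M_X(\theta)\Big|_{\theta = 0},$$

so, for instance, $M_X'(0) = \mu = EX$ and $M_X''(0) = \mu^2 + \sigma^2 = EX^2$.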


Monte Carlo Method

Computational algorithms that rely on repeated random sampling to compute their results.

Theoretical Bases

- The Law of Large Numbers guarantees the convergence

  $$\frac{1}{n}\,\#\{i \le n : X_i \in A\} \to P(X_1 \in A)$$

- The Central Limit Theorem quantifies the error

  $$\frac{1}{n}\,\#\{i \le n : X_i \in A\} - P(X_1 \in A) \overset{D}{\approx} \frac{\sigma}{\sqrt{n}}\, N(0, 1)$$
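A minimal Python sketch of the idea (added here, not from the slides): estimate $\pi/4 = P\big((U, V) \in \text{unit quarter disk}\big)$ by sampling, with a CLT-based error estimate; the sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
u, v = rng.uniform(size=n), rng.uniform(size=n)

hits = (u**2 + v**2 <= 1.0)              # indicator of the event {X_i in A}
p_hat = hits.mean()                       # (1/n) #{X_i in A}
se = hits.std(ddof=1) / np.sqrt(n)        # CLT: error is roughly sigma / sqrt(n)

print(f"pi ≈ {4 * p_hat:.4f} ± {4 * 1.96 * se:.4f}")
```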


Challenges of Rare Event

Probability that a coin lands on its edge.

How many flips do we need to see at least one occurrence?
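To make the difficulty concrete (an added remark): if the probability of landing on the edge is $p$, the number of flips until the first occurrence is geometric with mean $1/p$, and the relative error of the crude Monte Carlo estimate after $n$ flips is roughly

$$\frac{\sqrt{p(1-p)/n}}{p} \approx \frac{1}{\sqrt{np}},$$

so even a modest 10% relative error needs on the order of $100/p$ flips. For very small $p$ this is infeasible, which motivates the change of measure on the next slide.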


Importance Sampling (Change of Measure)

We can express the expectation of a random variable as an expectation involving another random variable.

e.g., let two continuous random variables $X$ and $Y$ have densities $f_X$ and $f_Y$ such that $f_Y(s) = 0$ implies $f_X(s) = 0$. Then

$$E\,g(X) = \int g(s)\, f_X(s)\, ds = \int g(s)\, \frac{f_X(s)}{f_Y(s)}\, f_Y(s)\, ds = E\big[g(Y)\, L(Y)\big]$$

where $L(x) = \dfrac{f_X(x)}{f_Y(x)}$. $L$ is called a likelihood ratio.
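An added Python sketch of the change of measure: estimating the rare probability $P(X > 4)$ for $X \sim N(0,1)$ by sampling from $Y \sim N(4,1)$ instead and reweighting by the likelihood ratio $L(y) = f_X(y)/f_Y(y) = e^{-4y+8}$. The shift of the sampling distribution to mean 4 is a standard choice, assumed here for the example.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

# Crude Monte Carlo: almost no samples ever land in {X > 4}.
x = rng.normal(0.0, 1.0, size=n)
print("crude estimate:", (x > 4).mean())

# Importance sampling: draw Y ~ N(4, 1) and reweight.
y = rng.normal(4.0, 1.0, size=n)
weights = np.exp(-4.0 * y + 8.0)          # likelihood ratio f_X(y) / f_Y(y)
estimate = ((y > 4) * weights).mean()      # E[ 1{Y > 4} L(Y) ] = P(X > 4)
print("importance sampling estimate:", estimate)   # ≈ 3.17e-5
```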


Probability

- Basic Probability: STATS 116
- Stochastic Processes: STATS 215, 217, 218, 219
- Theory of Probability: STATS 310ABC


Statistics

- Intro to Statistics: STATS 200
- Theory of Statistics: STATS 300ABC


Application

- Applied Statistics: STATS 191, 203, 208, 305, 315AB
- Stochastic Systems: MS&E 121, 321
- Stochastic Control: MS&E 322
- Stochastic Simulation: MS&E 223, 323, STATS 362
- A little bit of everything: CME 308
- Econometrics, Finance, Bio and more: http://explorecourses.stanford.edu


Books

- Sheldon Ross (2009). Introduction to Probability Models. Academic Press; 10th edition.
- John A. Rice (2006). Mathematical Statistics and Data Analysis. Duxbury Press; 3rd edition.
- Larry Wasserman (2004). All of Statistics: A Concise Course in Statistical Inference. Springer, New York.
