chapter 3 · e g x gxpx gx px e gx gx f xdx ... if x is the length of a side of a square (chosen at...

RS - Chapter 3 - Moments

1

Chapter 3 Moments of a Distribution

We develop the expectation operator in terms of the Lebesgue integral.

• Recall that the Lebesgue measure λ(A) for some set A gives the length/area/volume of the set A. If A = (3; 7), then λ(A) =|3 -7|= 4.

• The Lebesgue integral of f on [a,b] is defined in terms of Σi yi λ(Ai),

where 0 = y1 ≤ y2 ≤ ... ≤ yn, Ai = x : yi ≤ f (x) < yi+1, and λ(Ai) is the Lebesgue measure of the set Ai.

• The value of the Lebesgue integral is the limit as the yi's are pushed closer together. That is, we break the y-axis into a grid using yn and break the x-axis into the corresponding grid An where

Ai = x : f (x) є [yi; yi+1).

Expectation


2

Taking expectations: Riemann vs Lebesgue

• Riemann’s approachPartition the base. Measure the height of the function at the center of each interval. Calculate the area of each interval. Add all intervals.• Lebesgue approachDivide the range of the function. Measure the length of each horizontal interval. Calculate the area of each interval. Add all intervals.

Taking expectations

• A Borel function (RV) f is integrable if and only if |f| is integrable.

• For convenience, we define the integral of a measurable function ffrom (Ω, Σ , μ) to ( ¯R, ¯ B), where ¯R = R U −∞, ∞, ¯ B = σ(B U∞, −∞).

Example: If Ω = R and μ is the Lebesgue measure, then the Lebesgue integral of f over an interval [a, b] is written as ∫[a,b] f(x) dx = ∫ab f(x) dx, which agrees with the Riemann integral when the latter is well defined.

However, there are functions for which the Lebesgue integrals are defined but not the Riemann integrals.

• If μ=P, in statistics, ∫ X dP = EX = E[X] is called the expectation or expected value of X.


3

Consider our probability space (Ω, Σ, P) . Take an event (a set A of ω єΩ) and X, a RV, that assigns real numbers to each ω є A.

• If we take an observation from A without knowing which ω є A will be drawn, we may want to know what value of X(ω) we should expect to see.

• Each of the ω є A has been assigned a probability measure P[ω], which induces P[x]. Then, we use this to weight the values X(ω).

• P is a probability measure: The weights sum to 1. The weighted sum provides us with a weighted average of X (ω). If P gives the "correct" likelihood of ω being chosen, the weighted average of X(ω) --E[X]–tells us what values of X(ω) are expected.

Expected Value

• Now with the concept of the Lebesgue integral, we take the possible values xi and construct a grid on the y-axis, which gives a corresponding grid on the x-axis in A, where

Ai = ω є A : X(ω) є [xi; xi+1).

Let the elements in the x-axis grid be Ai. The weighted average is

)(][][111

iX

n

iiiX

n

iii

n

ii xfxxXPxAPx

• As we shrink the grid towards 0, A, becomes infinitesimal. Let dω be the infinitesimal set A. The Lebesgue integral becomes:

dxxfxxXPxdPxAPx iXiXi

n

ii

n)(][][][lim

1

Expected Value


4

DefinitionLet X denote a discrete RV with probability function p(x) (probability density function f(x) if X is continuous) then the expected value of X, E(X) is defined to be:

i ix i

E X xp x x p x

E X xf x d x

and if X is continuous with probability density function f(x)

The Expectation of X: E(X)

The expectation operator defines the mean (or population average) of a random variable or expression.

Sometimes we use E[.] as EX[.] to indicate that the expectation is being taken over f X(x) dx.

0

0.1

0.2

0.3

0.4

0 1 2 3 4 5 6 7

Interpretation of E(X)

1. The expected value of X, E(X), is the center of gravity of the probability distribution of X.

2. The expected value of X, E(X), is the long-run average value of X. (To be discussed later: Law of Large Numbers)

E(X)


5

Example: The Binomal distribution Let X be a discrete random variable having the Binomial distribution--i.e., X = the number of successes in n independent repetitions of a Bernoulli trial. Find the expected value of X, E(X).

1 0,1, 2,3, ,n xxn

p x p p x nx

0 0

1n n

n xx

x x

nE X xp x x p p

x

1

1

1

!1

! !

nn xx

x

nn xx

x

nx p px

nx p px n x

Example: Solution

0 0

1n n

n xx

x x

nE X xp x x p p

x

1

1

1

!1

! !

nn xx

x

nn xx

x

nx p px

nx p px n x

1

!1

1 ! !

nn xx

x

np p

x n x

1 21 2! !1 1

0! 1 ! 1! 2 !

n nn np p p p

n n

1! !

12 !1! 1 !0!

n nn np p p

n n


6

1 20 11 ! 1 !1 1

0 ! 1 ! 1! 2 !

n nn nnp p p p p

n n

2 11 ! 1 !

12 !1! 1 !0 !

n nn np p p

n n

1 20 11 11 1

0 1n nn n

np p p p p

2 11 11

2 1n nn np p p

n n

1 11 1

n nn p p p np np

Example: Solution

Example: Exponential Distribution

Let X have an exponential distribution with parameter . The probability density function of X is:

0

0 0

xe xf x

x

The expected value of X is:

0

xE X xf x dx x e dx

We will determine

udv uv vdu

xx e dx


7

We will determine xx e dx using integration by parts.

In this case and xu x dv e dx

Hence and xdu dx v e

Thus x x xx e dx xe e dx 1

x xxe e

0

00

1 x x xE X x e dx xe e

1 10 0 0

Summary: If X has an exponential distribution with parameter then:

1E X

Example: Exponential Distribution

Example: The Uniform distribution

Suppose X has a uniform distribution from a to b.

Then:

1

0 ,b a a x b

f xx a x b


1b

b aa

E X xf x dx x dx

2 2 2

1

2 2 2

b

b a

a

x b a a b

b a


8

Example: The Normal distribution

Suppose X has a Normal distribution with parameters and .

Then:

2

221

2

x

f x e


2

221

2

x

E X xf x dx x e dx

xz

Make the substitution:

1 and dz dx x z

Hence

Now

2

21

2

z

E X z e dz

2 2

2 21

2 2

z z

e dz ze dz

2 2

2 21

1 and 02

z z

e dz ze dz

T hus E X


9

Example: The Gamma distribution

Suppose X has a Gamma distribution with parameters and .

Then:

1 0

0 0

xx e xf x

x

Note:

This is a very useful formula when working with the Gamma distribution.

1

0

1 if 0,xf x dx x e dx


1

0

xE X xf x dx x x e dx

This is now equal to 1. 0

xx e dx

1

10

1

1xx e dx

1



10

Thus, if X has a Gamma ( ,) distribution, the expected value of X is:

E(X) = /

Special Cases: ( ,) distribution then the expected value of X is:

1. Exponential () distribution: = 1, arbitrary

1E X

2. Chi-square () distribution: = /2, = ½.

21

2

E X


0

0.05

0.1

0.15

0.2

0.25

0.3

0 2 4 6 8 10

E X



11

0

0.05

0.1

0.15

0.2

0.25

0 5 10 15 20 25

The Exponential distribution

1E X

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0 5 10 15 20 25

The Chi-square (2) distribution

E X


12

• Let X denote a discrete RV with probability function p(x) (or pdf f(x) if X is continuous) then the expected value of g(X), E[g(X)], is defined to be:

i ix i

E g X g x p x g x p x

E g X g x f x dx

and if X is continuous with probability density function f(x)

Expectation of a function of a RV

• Examples: g(x) = (x-μ)2 => E[g(x)] = E[(x-μ)2] g(x) = (x-μ)k => E[g(x)] = E[(x-μ)k]

Example: Function of a Uniformly distributed RVSuppose X has a uniform distribution from 0 to b.

Then:

1 0

0 0,b x b

f xx x b

Find the expected value of A = X2 .

If X is the length of a side of a square (chosen at random form 0 to b) then A is the area of the square

2 2 2 1b

b aa

E X x f x dx x dx

3 3 3 2

1

0

0

3 3 3

b

b

x b b

b

= 1/3 the maximum area of the square


13

• A median is described as the numeric value separating the higher half of a sample, a population, or a probability distribution, from the lower half.

Definition: Median

The median of a random variable X is the unique number m that satisfies

the following inequalities:

P(X ≤ m) ≥ ½ and P(X≥ m) ≥ ½.

For a continuous distribution, we have that m solves:

2/1)()(

m

X

m

X dxxfdxxf

Median: An alternative central measure

• Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.

An optimality property

A median is also a central point which minimizes the average of the absolute deviations. That is, a value of c that minimizes

E(|X – c|)

is the median of the probability distribution of the random variable X.

Median: An alternative central measure


14

Example I: Median of the Exponential Distribution

Let X have an exponential distribution with parameter . The probability density function of X is:

0

0 0

xe xf x

x

The median m solves the following integral of X:

2/1)(

m

X dxxf

2/1|

mm

x

m

x

m

x eedxedxe

That is, m = ln(2)/λ.

Example II: Median of the Pareto Distribution

Let X follow a Pareto distribution with parameters α (scale) and xs

(shape, usually notated xm). The pdf of X is:

The median m solves the following integral of X: 2/1)(

m

X dxxf

/1

1)1()1(

1

22/1|

1)1(

ssms

s

m

s

m

s

xmmxCxx

Cx

xdxxxdxx

x

Note: The Pareto distribution is used to describe the distribution of wealth.

00

0)( 1

xif

xxifx

xxf s

s


15

The moments of a random variable X are used to describe the behavior of the RV (discrete or continuous).

Definition: Kth Moment

Let X be a RV (discrete or continuous), then the kth moment of X is:

kk E X

-

if is discrete

if is continuous

k

x

k

x p x X

x f x dx X

• The first moment of X, = 1 = E(X) is the center of gravity of the distribution of X.

• The higher moments give different information regarding the shape of the distribution of X.

Moments of a Random Variable

Definition: Central Moments

Let X be a RV (discrete or continuous). Then, the kth central moment of X is defined to be:

0 k

k E X

-

if is discrete

if is continuous

k

x

k

x p x X

x f x dx X

where = 1 = E(X) = the first moment of X .

• The central moments describe how the probability distribution is distributed about the centre of gravity, .



16

The second central moment depends on the spread of the probability distribution of X about It is called the variance of X and is denoted by the symbol var(X).

= 2nd central moment. 202 E X

202 E X

is called the standard deviation of X and is denoted by the symbol .

20 22var X E X

The first central moments is given by:

01 E X

Moments of a Random Variable – 1st and 2nd

• The third central moment

contains information about the skewness of a distribution.

303 E X

• A popular measure of skewness:

0 03 3

1 3 23 02

Moments of a Random Variable - Skewness

0

0 . 0 1

0 . 0 2

0 . 0 3

0 . 0 4

0 . 0 5

0 . 0 6

0 . 0 7

0 . 0 8

0 . 0 9

0 5 1 0 1 5 2 0 2 5 3 0 3 5

03 10, 0

• Distribution according to skewness:1) Symmetric distribution


17

0

0 . 0 1

0 . 0 2

0 . 0 3

0 . 0 4

0 . 0 5

0 . 0 6

0 . 0 7

0 . 0 8

0 5 1 0 1 5 2 0 2 5 3 0 3 5

03 0, 0

2) Positively skewed distribution


3) Negatively skewed distribution

0

0 . 0 1

0 . 0 2

0 . 0 3

0 . 0 4

0 . 0 5

0 . 0 6

0 . 0 7

0 . 0 8

0 5 1 0 1 5 2 0 2 5 3 0 3 5

03 10, 0


• Skewness and Economics- Zero skew means symmetrical gains and losses. - Positive skew suggests many small losses and few rich returns. - Negative skew indicates lots of minor wins offset by rare major losses.

• In financial markets, stock returns at the firm level show positive skewness, but at stock returns at the aggregate (index) level show negative skewness.

• From horse race betting and from U.S. state lotteries there is evidence supporting the contention that gamblers are not necessarily risk-lovers but skewness-lovers: Long shots are overbet (positve skewness loved!).


18

• The fourth central moment

It contains information about the shape of a distribution. The property of shape that is measured by this moment is called kurtosis.

• The measure of (excess) kurtosis:

• Distributions:

1) Mesokurtic distribution

404 E X

0 04 4

2 24 02

3 3

Moments of a Random Variable - Kurtosis

0

0 . 0 1

0 . 0 2

0 . 0 3

0 . 0 4

0 . 0 5

0 . 0 6

0 . 0 7

0 . 0 8

0 . 0 9

0 5 1 0 1 5 2 0 2 5 3 0 3 5

040, moderate in size

0

0 2 0 4 0 6 0 8 0

040, small in size

2) Platykurtic distribution

Moments of a Random Variable - Kurtosis

0

0 2 0 4 0 6 0 8 0

040, large in size

3) Leptokurtic distribution


19

Example: The uniform distribution from 0 to 1

1 0 1

0 0, 1

xf x

x x

11 1

0 0

11

1 1

kk k

k

xx f x dx x dx

k k

Finding the moments

Finding the central moments:

1

0 12

0

1kk

k x f x dx x dx


Finding the central moments (continuation):

1

0 12

0

1kk

k x f x dx x dx

12m ak in g th e su b s titu tio n w x

1122

1 12 2

1 11 1 12 20

1 1

k kkk

k

ww dw

k k

1

1

1if even1 1

2 12 1

0 if odd

kk

k

kk

kk



20

Thus,

0 0 02 3 42 4

1 1 1 1H en ce , 0 ,

2 3 1 2 2 5 8 0

02

1v a r

1 2X

02

1v ar

1 2X

03

1 30

The standard deviation

The measure of skewness

04

1 24

1 803 3 1.2

1 12

The measure of kurtosis


Alternative measures of dispersionWhen the median is used as a central measure for a distribution, there are several choices for a measure of variability:

- The range –-the length of the smallest interval containing the data

- The interquartile range -the difference between the 3rd and 1st quartiles.

- The mean absolute deviation – (1/n)Σi |xi – central measure(X)|

- The median absolute deviation (MAD). – MAD= mi(|xi - m(X)|)

These measures are more robust (to outliers) estimators of scale than the sample variance or standard deviation.

They especially behave better with distributions without a mean or variance, such as the Cauchy distribution.


21

• Rules:

if is discrete

if is continuous

x

g x p x X

E g Xg x f x dx X

1. where is a constantE c c c

if then g X c E g X E c cf x dx

c f x dx c

Proof:

The proof for discrete random variables is similar.

Rules for Expectations

2. where , are constantsE aX b aE X b a b

if then g X aX b E aX b ax b f x dx

Proof


a xf x dx b f x dx

aE X b



22

2023. var X E X

2 22x x f x dx

Proof:


22 22 1E X E X

2 2var X E X x f x dx

2 22x f x dx xf x dx f x dx

2 22 2 12E X E X


24. var varaX b a X

Proof:

2var aX baX b E aX b

2E aX b a b

22E a X

22 2 vara E X a X

aX b E aX b aE X b a b



23

Definition: Moment Generating Function (MGF)

Let X denote a random variable. Then, the moment generating function of X, mX(t), is defined by:

-

if is discrete

if is continuous

x

g x p x X

E g Xg x f x dx X

The expectation of a function g(X) is given by:

if is discrete

if is continuous

tx

xtX

Xtx

e p x X

m t E ee f x dx X

Moment generating functions


24

MGF: Examples

1 0,1, 2, ,n xxn

p x p p x nx

The MGF of X , mX(t) is:

1. The Binomial distribution (parameters p, n)

tX txX

x

m t E e e p x

0

1n

n xtx x

x

ne p p

x

0 0

1n nx n xt x n x

x x

n ne p p a b

x x

1nn ta b e p p

0,1, 2,!

x

p x e xx


2. The Poisson distribution (parameter )

tX txX

x

m t E e e p x 0 !

xntx

x

e ex

0 0

using ! !

t

xt xe u

x x

e ue e e e

x x

1te

e

MGF: Examples


25

0

0 0

xe xf x

x


3. The Exponential distribution (parameter )

0

tX tx tx xXm t E e e f x dx e e dx

0 0

t xt x ee dx

t

undefined

tt

t

MGF: Examples

2

21

2

x

f x e


4. The Standard Normal distribution ( = 0, = 1)

tX txXm t E e e f x dx

2 22

1

2

x tx

e dx

2

21

2

xtxe e dx

MGF: Examples


26

2 2 2 22 2

2 2 21 1

2 2

x tx t x tx t

Xm t e dx e e dx

We will now use the fact that

222

11 for all 0,

2

x b

ae dx a ba

We have completed the square

22 2

2 2 21

2

x tt t

e e dx e

This is 1

MGF: Examples

1 0

0 0

xx e xf x

x


4. The Gamma distribution (parameters , )

tX txXm t E e e f x dx

1

0

tx xe x e dx

1

0

t xx e dx

MGF: Examples


27

We use the fact

1

0

1 for all 0, 0a

a bxbx e dx a b

a

1

0

t xXm t x e dx

1

0

t xtx e dx

tt

Equal to 1

MGF: Examples

21 2Xm t t

The Chi-square distribution with degrees of freedom =/2, =½):

1. mX(0) = 1

0, hence 0 1 1tX XX Xm t E e m E e E

v) G am m a D ist'n Xm tt

2

2iv) S td N orm al D ist'n t

Xm t e

iii) E xponential D ist'n Xm tt

1ii) P oisson D ist'n

te

Xm t e

i) B inom ial D ist'n 1nt

Xm t e p p

Note: The MGFs of the following distributions satisfy the property mX(0) = 1

MGF: Properties


28

2 33212. 1

2! 3! !kk

Xm t t t t tk

We use the expansion of the exponential function:

2 3

12! 3! !

ku u u ue u

k

tXXm t E e

2 32 31

2! 3! !

kkt t t

E tX X X Xk

2 3

2 312! 3! !

kkt t t

tE X E X E X E Xk

2 3

1 2 312! 3! !

k

k

t t tt

k

MGF: Properties

0

3 . 0k

kX X kk

t

dm m t

d t

Now

2 33211

2 ! 3! !kk

Xm t t t t tk

2 1321 2 3

2! 3! !kk

Xm t t t ktk

2 13

1 2 2! 1 !kkt t t

k

1and 0Xm

24

2 3 2 ! 2 !kk

Xm t t t tk

2and 0Xm continuing we find 0kX km

MGF: Properties


29

i) Binomial Dist'n 1nt

Xm t e p p

Property 3 is very useful in determining the moments of a RV X.

Examples:

11

nt tXm t n e p p pe

10 010 1

n

Xm n e p p pe np

2 11 1 1

n nt t t t tXm t np n e p p e p e e p p e

2

2

1 1 1

1 1

nt t t t

nt t t

npe e p p n e p e p p

npe e p p ne p p

MGF: Applying Property 3 - Binomial

222'' ][]1[)0( npqpnqnpnppnpnpmX

1ii) Poisson Dist'n

te

Xm t e

1 1t te e ttXm t e e e

1 1 2 121t t te t e t e tt

Xm t e e e e

1 2 12 2 1t te t e tt t

Xm t e e e e

1 2 12 3t te t e tte e e

1 3 1 2 13 23t t te t e t e t

e e e

MGF: Applying Property 3 - Poisson


30

0 1 0

1 0e

Xm e

0 01 0 1 02 22 0

e e

Xm e e

3 0 2 0 0 3 23 0 3 3t

Xm e e e

To find the moments we set t = 0.

MGF: Applying Property 3 - Poisson

iii) Exponential Dist'n Xm tt

1

X

d tdm t

dt t dt

2 21 1t t

3 32 1 2Xm t t t

4 42 3 1 2 3Xm t t t

5 54 2 3 4 1 4!Xm t t t

1

!kk

Xm t k t

MGF: Applying Property 3 - Exponential


31

2

1

10Xm

3

2 2

20 2Xm

1 !

0 !kk

k X k

km k


Thus,

We can calculate the following popular descriptive statistics:

- σ2 = μ02 = μ2 - μ2 = (2/λ2) - (1/λ)2 = (1/λ)2

- γ1= μ03 /σ3 = (2/λ3) / [(1/λ)2]3/2 = 2

- γ2= μ04/σ4 - 3 = (9/λ4) / [(1/λ)4] - 3 = 6

2 33211

2 ! 3 ! !kk

Xm t t t t tk

Note: the moments for the exponential distribution can be calculated in an alternative way. This is done by expanding mX(t) in powers of t and equating the coefficients of tk to the coefficients in:

2 31 11

11Xm t u u u

tt u

2 3

2 31

t t t

Equating the coefficients of tk we get:

1 ! or

!k

kk k

k

k



32

iv) Standard normal distribution mX(t)=exp(t2/2)

We use the expansion of eu.2 3

0

1! 2! 3! !

k ku

k

u u u ue u

k k

2 2 2

22

2

2 3

2 2 2

212! 3! !

t

kt t t

tXm t e

k

2 4 6 212 2 3

1 1 11

2 2! 2 3! 2 !k

kt t t t

k

We now equate the coefficients tk in:

2 222

112! ! 2 !

k kk kXm t t t t t

k k

MGF: Applying Property 3 - Normal

If k is odd: k = 0.

2 1

2 ! 2 !k

kk k

For even 2k:

2

2 !or

2 !k k

k

k

1 2 3 4 2

2! 4!Thus 0, 1, 0, 3

2 2 2!

MGF: Applying Property 3 - Normal


33

Let lX (t) = ln mX(t) = the log of the MGF.

T hen 0 ln 0 ln 1 0X Xl m

The log of Moment Generating Functions

1 XX X

X X

m tl t m t

m t m t

2

2 X X X

X

X

m t m t m tl t

m t

1

0 0

0X

XX

ml

m

2

2 22 12

0 0 0 0

0

X X X

X

X

m m ml

m

Thus lX (t) = ln mX(t) is very useful for calculating the mean and variance of a random variable

1. 0Xl

2 2. 0Xl

The Log of Moment Generating Functions


34

Log of MGF: Examples - Binomial

1. The Binomial distribution (parameters p, n)

1n nt t

Xm t e p p e p q

ln ln tX Xl t m t n e p q

1 tX tl t n e p

e p q

10Xl n p np

p q

2

t t t t

Xt

e p e p q e p e pl t n

e p q

220X

p p q p pl n npq

p q

2. The Poisson distribution (parameter )

1te

Xm t e

ln 1tX Xl t m t e

tXl t e

tXl t e

0Xl

2 0Xl

Log of MGF: Examples - Poisson


35

3. The Exponential distribution (parameter )

undefined

X

tm t t

t

ln ln ln if X Xl t m t t t

11Xl t t

t

2

2

11 1Xl t t

t

22

1 1Thus 0 and 0X Xl l

Log of MGF: Examples - Exponential

4. The Standard Normal distribution ( = 0, = 1)

2

2t

Xm t e

2

2ln tX Xl t m t

, 1X Xl t t l t

2Thus 0 0 and 0 1X Xl l

Log of MGF: Examples - Normal


36

5. The Gamma distribution (parameters , )

Xm tt

ln ln lnX Xl t m t t

1Xl t

t t

2

21 1Xl t tt

22

Hence 0 and 0X Xl l

Log of MGF: Examples - Gamma

6. The Chi-square distribution (degrees of freedom )

21 2Xm t t

ln ln 1 22X Xl t m t t

12

2 1 2 1 2Xl tt t

2

2

21 1 2 2

1 2Xl t t

t

2Hence 0 and 0 2X Xl l

Log of MGF: Examples – Chi-squared


37

Since eitx = cos(xt) + i sin(xt) and eitx≤ 1, then φX(t) is defined for all t. Thus, the characteristic function always exists, but the MGF need not exist.

Relation to the MGF: φX(t) = miX(t) = mX(it)

Calculation of moments:

Characteristic functions

Definition: Characteristic Function

Let X denote a random variable. Then, the characteristic function of X, φX(t) is defined by:

- )()( itxX eEt

kk

tX

k

it

t

0|)(

chapter 3 · e g x gxpx gx px e gx gx f xdx ... if x is the length of a side of a square (chosen at...

Documents