
Random Variables and Mean-Variance Portfolios

Dan Saunders

Introduction

Suppose $X$ is a random variable. What does that mean? The simplest notion is that we do not know the value that $X$ will take on with certainty. However, this does not imply that we are clueless about $X$. The short answer is that uncertainty, as modeled by probability theory, is an exhaustive characterization of all of the possible outcomes, and that random variables make that uncertainty amenable to mathematical analysis.

Probability Theory

Probability theory requires three main ingredients. The first is the sample space, $\Omega$, which is the set of possible outcomes. Second is the event space, $\mathcal{F}$, which (for a finite sample space such as the one below) is the set of all subsets of $\Omega$. Essentially, $\mathcal{F}$ represents the set of all possible events to which a probability can be assigned. Finally, we need the probability measure, $P$, which assigns those probabilities to every event. There is a lot more that could be said about these three mathematical objects, but it will be easier to demonstrate with an example. Consider the familiar case of a fair coin toss

$$\Omega = \{H, T\}$$

$$\mathcal{F} = \{\emptyset, \{H\}, \{T\}, \{H, T\}\}$$

$$P(\emptyset) = 0, \quad P(\{H\}) = P(\{T\}) = \frac{1}{2}, \quad P(\{H, T\}) = 1$$
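As a minimal sketch, the coin-toss probability space can be written out in Python. The names `omega`, `events`, `outcome_prob`, and `P` are illustrative choices, not notation from these notes, and the event space is built as the full power set, which is only appropriate for a finite sample space.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for one fair coin toss.
omega = ("H", "T")

# Event space: every subset of omega (the power set), stored as frozensets.
events = [
    frozenset(subset)
    for subset in chain.from_iterable(
        combinations(omega, k) for k in range(len(omega) + 1)
    )
]

# Probability of each single outcome of the fair coin.
outcome_prob = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def P(event):
    """Probability of an event: the sum over the outcomes it contains."""
    return sum((outcome_prob[w] for w in event), Fraction(0))


for event in events:
    print(sorted(event), P(event))
```

Exact fractions are used so that the probabilities of the empty event and the sure event come out as exactly 0 and 1.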

As can be seen from the example, these three objects $(\Omega, \mathcal{F}, P)$ are a complete description of the uncertain nature of the coin toss. This logical construction occurs before any mention of random variables. So what is a random variable? Well, it's actually quite a misnomer, because a random variable is more than just a variable.

Random Variables

A random variable is actually a function. Specifically, it's a function which assigns a real number to every element in the sample space, in accordance with its respective probability measure

$$X : \Omega \to \mathbb{R}$$


Again, the fair coin toss can help the discussion. Consider the random variable $X$, which maps the outcome "heads" to the number 1 and the outcome "tails" to the number 0

$$X = \begin{cases} 1 & \text{with probability } 1/2 \\ 0 & \text{with probability } 1/2 \end{cases}$$

In fact, this type of random variable should seem familiar. Recall the definition of a Bernoulli random variable, which takes the form

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p \end{cases}$$

A Bernoulli random variable is a mathematical function which can be used to represent any uncertainty with two outcomes. In fact, the use of 0 and 1 is not that special. We could use any two real numbers to achieve the same goal.

Why do we need random variables? Simply put, we cannot mathematically analyze the uncertainty described by $(\Omega, \mathcal{F}, P)$ directly. We must first map the set of possible outcomes to numbers, before we can define the mean or variance. After all, what's the average of heads and tails? On the other hand, we can say what the average is of 0 and 1. If they are equally likely, then the average is $1 \cdot 1/2 + 0 \cdot 1/2 = 1/2$.

Expected Value

More generally, we define the expected value of any discrete random variable X as

$$E(X) = \sum_{\omega \in \Omega} X(\omega) P(\omega)$$

It is important to keep in mind that this is probability theory, and $E(X)$ represents the theoretically true mean; sometimes called the population mean and denoted by $\mu$. This is entirely separate from statistics, where we do not know the mean or the distribution. Notice that, for any Bernoulli random variable we have

$$E(X) = 1 \cdot p + 0 \cdot (1 - p) = p$$
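The definition of the expected value translates almost directly into code. The following is a small sketch in which the random variable and the probability measure are both stored as dictionaries keyed by outcome; the function name `expected_value` and the value $p = 3/10$ are illustrative assumptions.

```python
from fractions import Fraction


def expected_value(X, prob):
    """E(X) = sum of X(w) * P(w) over all outcomes w in the sample space."""
    return sum(X[w] * prob[w] for w in prob)


# The random variable: heads -> 1, tails -> 0.
X = {"H": 1, "T": 0}

# Fair coin.
prob_fair = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
mean_fair = expected_value(X, prob_fair)

# Bernoulli random variable with an illustrative p = 3/10.
p = Fraction(3, 10)
prob_biased = {"H": p, "T": 1 - p}
mean_biased = expected_value(X, prob_biased)

print(mean_fair, mean_biased)
```

In both cases the result equals the probability of the outcome mapped to 1, in line with $E(X) = p$.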

In the case where the distribution is unknown, we collect $n$ independent and identically distributed data points, in order to estimate the expected value using the sample mean estimator

$$\bar{x}_A = \frac{1}{n} \sum_{i=1}^{n} x_i$$

This is what we sometimes refer to as the arithmetic mean. For example, when we take the arithmetic mean over historical returns, we are estimating the mean, not using probability to determine the expected value. Likewise for the geometric mean $\bar{x}_G = \left[\prod_{i=1}^{N} x_i\right]^{1/N}$.

It is much more difficult to calculate probability and expected value of continuous random variables. For such cases, students should re-familiarize themselves with the tables of critical values found in most undergraduate books on statistics. An example of a continuous random variable is one that follows the normal distribution, denoted $N(\mu, \sigma^2)$. The notation is no mistake here, because any normal distribution is uniquely defined in terms of mean and variance.
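Returning to the arithmetic and geometric means, a short sketch may help make the two sample averages concrete. The return series below is hypothetical, and applying the geometric mean to gross returns (one plus the net return) is an assumption made here so that every factor in the product is positive.

```python
import math

# Hypothetical net returns over three periods (not data from the notes).
net_returns = [0.10, -0.05, 0.20]
n = len(net_returns)

# Arithmetic mean of the net returns.
arithmetic_mean = sum(net_returns) / n

# Geometric mean, applied to gross returns so every factor is positive.
gross_returns = [1.0 + r for r in net_returns]
geometric_mean_gross = math.prod(gross_returns) ** (1.0 / n)
geometric_mean_net = geometric_mean_gross - 1.0

print(arithmetic_mean, geometric_mean_net)
```

Both numbers are estimates computed from data; neither is the expected value itself.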


Variance and Standard Deviation

While the mean is a measure of the central tendency, the variance is a measure of the dispersion. Intuitively, we are measuring the average squared distance from the mean.

$$Var(X) = E\left[(X - E(X))^2\right]$$

Why squared? Minimizing the variance of an estimator increases precision. However, in order to use calculus, we must have a smooth function without kinks. Therefore, minimization will be easier with squared terms than with absolute values, although the absolute value function would have a more intuitive interpretation.

The variance is often denoted $\sigma^2$. An alternative definition exists, which may be more useful in solving problems. First, note that the mean could be written as $E(X^1)$. The exponent of 1 emphasizes why the mean is sometimes called "the first moment". In general, the $n$th moment of a random variable is $E(X^n)$. As it turns out, the variance is closely related to the second moment by the following relation

$$Var(X) = E(X^2) - \left[E(X)\right]^2$$

It is quite common for people to refer to the second moment when talking about the variance. In particular, if $E(X) = 0$, then the second moment is exactly equal to the variance.
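The two expressions for the variance can be checked against each other on a Bernoulli random variable. This is a small sketch with an illustrative $p = 1/3$; the helper name `expectation` is an assumption, not notation from the notes.

```python
from fractions import Fraction


def expectation(values, probs):
    """Probability-weighted sum of the values."""
    return sum(v * q for v, q in zip(values, probs))


# Bernoulli random variable with an illustrative p = 1/3.
p = Fraction(1, 3)
values = [1, 0]
probs = [p, 1 - p]

mu = expectation(values, probs)

# Definition: Var(X) = E[(X - E(X))^2]
var_definition = expectation([(v - mu) ** 2 for v in values], probs)

# Moment form: Var(X) = E(X^2) - [E(X)]^2
var_moments = expectation([v ** 2 for v in values], probs) - mu ** 2

print(var_definition, var_moments)
```

Both routes give $p(1 - p)$, the variance of a Bernoulli random variable.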

The standard deviation, defined as the square root of the variance and denoted $\sigma$, has a simple, important interpretation. Recall the distance formula from algebra

$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

So distance is naturally measured as the square root of the sum of squared differences. The variance, by definition, is a (probability-weighted) sum of squared differences from the mean, and the standard deviation is the square root of that sum. Therefore, the standard deviation can arguably be said to measure the average distance from the mean.

$$\sigma = \left[\sum_{\omega \in \Omega} \left(X(\omega) - \mu\right)^2 \cdot P(\omega)\right]^{1/2}$$

The standard deviation also shares the same units as the underlying random variable, unlike the variance. For example, if $X$ is measured in meters, then so are $\mu$ and $\sigma$, whereas $\sigma^2$ is measured in meters-squared.

In the instance where we don't know the underlying distribution, as with the mean, we must use a sample analog by collecting $n$ independent observations. Specifically, for the variance we have the following estimator

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

We divide by $n - 1$ because we lose one degree of freedom by using the estimate of the mean, $\bar{x}$, in the estimation of the variance.
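The sample variance estimator can be sketched as follows. The data are hypothetical; the comparison against Python's standard-library `statistics.variance` (which divides by $n - 1$) and `statistics.pvariance` (which divides by $n$) is included only as a cross-check.

```python
import statistics

# Hypothetical sample of returns (not data from the notes).
returns = [0.12, -0.03, 0.08, 0.20, -0.07]
n = len(returns)

# Sample mean.
x_bar = sum(returns) / n

# Sample variance with the n - 1 divisor.
s2 = sum((x - x_bar) ** 2 for x in returns) / (n - 1)

# Sample standard deviation.
s = s2 ** 0.5

print(x_bar, s2, s)
print(statistics.variance(returns), statistics.pvariance(returns))
```

Dividing by $n - 1$ rather than $n$ always gives the slightly larger of the two numbers, which matters most when the sample is small.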


Covariance and Correlation

The covariance is a generalization of the variance. This is obvious from the definition

$$Cov(X, Y) = E\left[(X - E(X)) \cdot (Y - E(Y))\right]$$

Notice that $Cov(X, X) = Var(X)$. All we can hope to interpret about the covariance is its sign (positive or negative). If, for example, the covariance is positive, then we can say the following: "On average, if $X$ is above its mean, then $Y$ is also above its mean, and vice versa." As with the simplification for the variance, we have the following formula

$$Cov(X, Y) = E[XY] - E[X] \cdot E[Y]$$

Unfortunately, the covariance is quite sensitive to the units of measurement. For example, suppose $X$ and $Y$ were both measured in meters. Now suppose we measure $X$ in centimeters, i.e., we create a new random variable $Z = 100X$. Then $Cov(Z, Y) = 100 \cdot Cov(X, Y)$. So the covariance can be scaled up by any arbitrarily large number, without any change in the underlying relationship. To solve this problem, we calculate the correlation, sometimes called the coefficient of correlation and denoted $\rho$. The definition is as follows

$$\rho_{X,Y} = \frac{Cov(X, Y)}{\sigma_X \cdot \sigma_Y}$$

As it turns out, $-1 \le \rho \le 1$ for any two random variables, and it is invariant to scale. Values of $|\rho|$ close to 1 imply a strong linear relationship, while values of $|\rho|$ close to 0 indicate a weak relationship.
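The units problem, and the way correlation removes it, can be illustrated numerically. The three-state joint distribution below is hypothetical, and the helper names `mean`, `cov`, and `corr` are assumptions introduced for this sketch.

```python
import math

# Hypothetical joint distribution: three states with these probabilities.
probs = [0.2, 0.5, 0.3]
X = [1.0, 2.0, 4.0]  # value of X in each state (say, in meters)
Y = [3.0, 1.0, 5.0]  # value of Y in each state


def mean(values):
    return sum(p * v for p, v in zip(probs, values))


def cov(u, v):
    mu_u, mu_v = mean(u), mean(v)
    return sum(p * (a - mu_u) * (b - mu_v) for p, a, b in zip(probs, u, v))


def corr(u, v):
    return cov(u, v) / math.sqrt(cov(u, u) * cov(v, v))


# Re-measure X in centimeters.
Z = [100 * x for x in X]

print(cov(X, Y), cov(Z, Y))    # covariance scales by 100
print(corr(X, Y), corr(Z, Y))  # correlation is unchanged
```

The variance is obtained here as `cov(u, u)`, using the fact that $Cov(X, X) = Var(X)$.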

Properties

For any two random variables X, Y and any two constants a, b we have

1. The expectations operator is linear

$$E[aX + bY] = aE[X] + bE[Y]$$

2. The covariance, defined as an expected value, has the following property

$$Cov(aX, bY) = ab \, Cov(X, Y)$$

This further implies that the variance is a non-linear operator

$$Var(aX) = Cov(aX, aX) = a^2 Var(X)$$

3. Finally, we use all of these properties together to find the variance of a sum of random variables as

$$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab \, Cov(X, Y)$$

The properties listed above generalize for any number of random variables, and they give us the tools to calculate the mean and variance of a collection of random variables such as a portfolio of securities.
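As a numerical sanity check of the three properties, the sketch below evaluates both sides of each identity on a hypothetical three-state distribution with illustrative constants $a = 2$ and $b = -3$. The distribution, the constants, and the helper names are assumptions made for the example.

```python
import math

# Hypothetical three-state distribution and constants.
probs = [0.2, 0.5, 0.3]
X = [1.0, 2.0, 4.0]
Y = [3.0, 1.0, 5.0]
a, b = 2.0, -3.0


def mean(values):
    return sum(p * v for p, v in zip(probs, values))


def cov(u, v):
    mu_u, mu_v = mean(u), mean(v)
    return sum(p * (s - mu_u) * (t - mu_v) for p, s, t in zip(probs, u, v))


def var(u):
    return cov(u, u)


# The combined random variable aX + bY, state by state.
combo = [a * x + b * y for x, y in zip(X, Y)]

# Property 1: linearity of the expectation.
lhs_mean, rhs_mean = mean(combo), a * mean(X) + b * mean(Y)

# Property 2: Cov(aX, bY) = ab Cov(X, Y).
lhs_cov = cov([a * x for x in X], [b * y for y in Y])
rhs_cov = a * b * cov(X, Y)

# Property 3: variance of a weighted sum.
lhs_var = var(combo)
rhs_var = a**2 * var(X) + b**2 * var(Y) + 2 * a * b * cov(X, Y)

print(lhs_mean, rhs_mean)
print(lhs_cov, rhs_cov)
print(lhs_var, rhs_var)
```

Replacing $a$ and $b$ with portfolio weights and $X$, $Y$ with asset returns gives the portfolio mean and variance.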


Example 1

Suppose there are two states of the world: $s_1$ is a bull market and $s_2$ is a bear market. The probability of a bull market is $\pi_1 = 2/3$ and the probability of a bear market is $\pi_2 = 1/3$. There are two securities, $R_1$ and $R_2$, with net returns in each state as follows

        s1      s2
R1      0.2    −0.1
R2      0.4    −0.2

Find the expected returns and standard deviation of returns of the two assets, and plot the mean-variance frontier. In the case of two assets, the m-v frontier is composed of any portfolio of the two assets.

Answer: First we find the expected returns

$$E(R_1) = 0.2 \cdot (2/3) - 0.1 \cdot (1/3) = 0.1$$

$$E(R_2) = 0.4 \cdot (2/3) - 0.2 \cdot (1/3) = 0.2$$

Next, we calculate the variances

$$\sigma_1^2 = (0.2 - 0.1)^2 \cdot (2/3) + (-0.1 - 0.1)^2 \cdot (1/3) = 0.02$$

$$\sigma_2^2 = (0.4 - 0.2)^2 \cdot (2/3) + (-0.2 - 0.2)^2 \cdot (1/3) = 0.08$$

Finally, we must calculate the correlation between these two assets, by way of the covariance.

$$Cov(R_1, R_2) = E\left[(R_1 - E(R_1)) \cdot (R_2 - E(R_2))\right] = \left[(0.2 - 0.1)(0.4 - 0.2)\right] \cdot (2/3) + \left[(-0.1 - 0.1)(-0.2 - 0.2)\right] \cdot (1/3) = 0.04$$

$$\rho_{1,2} = \frac{0.04}{\sqrt{0.02} \cdot \sqrt{0.08}} = 1$$

This makes a lot of sense, since $R_2 = 2R_1$. Unfortunately, if returns are perfectly, positively correlated, then there is no benefit to diversification. This can be seen clearly by examining the mean-variance frontier. Because $\rho = 1$, the variance of any portfolio is a perfect square, so the standard deviation will be linear. With only two assets, the mv-frontier will be any portfolio $(\alpha, 1 - \alpha)$ such that $\alpha \in [0, 1]$ and

$$E(R_p) = 0.1\alpha + 0.2(1 - \alpha)$$

$$\sigma_p = \sqrt{0.02}\,\alpha + \sqrt{0.08}\,(1 - \alpha)$$
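The calculations in Example 1 can be reproduced with a short script. This is a sketch using the state probabilities and returns given in the example; the helper names and the grid of eleven portfolio weights are assumptions made for illustration.

```python
import math

# State probabilities (bull, bear) and net returns from Example 1.
pi = [2 / 3, 1 / 3]
R1 = [0.2, -0.1]
R2 = [0.4, -0.2]


def mean(r):
    return sum(p * x for p, x in zip(pi, r))


def cov(u, v):
    mu_u, mu_v = mean(u), mean(v)
    return sum(p * (a - mu_u) * (b - mu_v) for p, a, b in zip(pi, u, v))


mu1, mu2 = mean(R1), mean(R2)
var1, var2 = cov(R1, R1), cov(R2, R2)
cov12 = cov(R1, R2)
rho = cov12 / math.sqrt(var1 * var2)


def portfolio(alpha):
    """Mean and standard deviation of the portfolio (alpha, 1 - alpha)."""
    mu_p = alpha * mu1 + (1 - alpha) * mu2
    var_p = (alpha**2 * var1 + (1 - alpha) ** 2 * var2
             + 2 * alpha * (1 - alpha) * cov12)
    return mu_p, math.sqrt(var_p)


# Points on the frontier for alpha = 0.0, 0.1, ..., 1.0.
frontier = [portfolio(k / 10) for k in range(11)]
for mu_p, sigma_p in frontier:
    print(round(mu_p, 4), round(sigma_p, 4))
```

Plotting `sigma_p` against `mu_p` for these points traces a straight line between the two assets, which is the graphical counterpart of the statement that there is no diversification benefit when $\rho = 1$.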


Example 2

Let's consider two assets that are normally distributed, with $R_1 \sim N(0.06, 0.04)$ and $R_2 \sim N(0.15, 0.09)$. The correlation between these two assets is $\rho_{1,2} = -0.5$. Plot the mv-frontier.

Answer: The normal distribution has some nice properties. First, it is uniquely defined by the first two moments, so it is natural to think of $\sigma$ as risk in this context. Second, the sum of two (jointly) normal random variables is, itself, normal. So, any portfolio will also be a normal random variable, and it will have the following mean and standard deviation

$$E(R_p) = 0.06\alpha + 0.15(1 - \alpha)$$

$$\sigma_p = \left[0.04\alpha^2 + 0.09(1 - \alpha)^2 - 0.06\alpha(1 - \alpha)\right]^{1/2}$$

Here the cross term comes from $2\alpha(1 - \alpha) Cov(R_1, R_2)$ with $Cov(R_1, R_2) = \rho_{1,2} \sigma_1 \sigma_2 = -0.5 \cdot 0.2 \cdot 0.3 = -0.03$.
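The frontier in Example 2 can be traced numerically. The sketch below uses the means, variances, and correlation stated in the example; the grid of 101 weights and the search for the lowest-risk point on that grid are additions made here for illustration.

```python
import math

# Means and variances from Example 2, and the stated correlation.
mu1, var1 = 0.06, 0.04
mu2, var2 = 0.15, 0.09
rho = -0.5

# Covariance implied by the correlation: rho * sigma_1 * sigma_2.
cov12 = rho * math.sqrt(var1) * math.sqrt(var2)


def portfolio(alpha):
    """Mean and standard deviation of the portfolio (alpha, 1 - alpha)."""
    mu_p = alpha * mu1 + (1 - alpha) * mu2
    var_p = (alpha**2 * var1 + (1 - alpha) ** 2 * var2
             + 2 * alpha * (1 - alpha) * cov12)
    return mu_p, math.sqrt(var_p)


# Evaluate the frontier on a grid alpha = 0.00, 0.01, ..., 1.00.
frontier = [(k / 100, *portfolio(k / 100)) for k in range(101)]

# Grid point with the smallest standard deviation.
alpha_min, mu_min, sigma_min = min(frontier, key=lambda point: point[2])

print(alpha_min, mu_min, sigma_min)
```

Unlike Example 1, the standard deviation is no longer linear in $\alpha$: with negative correlation, some mixed portfolios have a lower standard deviation than either asset held alone.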


It will become second nature to assume normality in most circumstances, so it is important to remember the cost of this assumption. To illustrate, we will look at some problems from RWJ chapter 10.

Example: RWJ Ch.10 #25

Suppose you bought small-cap stock, and suppose its returns are normally distributed. What would be the probability of doubling your money in one year? How about triple?

Answer: First, we need to find the mean and standard deviation of returns for small-cap stocks. Looking in RWJ on page 317, we find a table of arithmetic and geometric means of returns and standard deviations of returns. These are estimates based on historical data; not the expected value. However, this is the best we can do in practice. We shall use $\bar{r} = 0.164$ and $s_r = 0.33$. Since all of the estimates are in terms of net return, we shall answer the question that way. In order to double your money, you need a net return of $r = 1$. Therefore, we want to know $P(r \ge 1)$.

It will be easier if we can transform any normal random variable into a standard-normal random variable so that we can use tables. Recall that for any $X \sim N(\mu, \sigma^2)$ we have

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

Thus, we transform our question into $P(Z \ge 2.533) = 0.00565$, where $2.533 = (1 - 0.164)/0.33$. To see how this is found, note that a typical table for the normal distribution gives the values for the cumulative distribution function $\Phi(z) = P(Z \le z)$. Since $Z$ is centered about zero and is symmetric,

$$P(Z \ge 2.533) = P(Z \le -2.533) = \Phi(-2.533) = 0.00565$$

Likewise, we can find the probability of tripling our investment as

$$P(r \ge 2) = P(Z \ge 5.564) = \Phi(-5.564) = 0.00000001321$$
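Instead of a printed table, the same tail probabilities can be computed with the complementary error function from Python's standard library, using the identity $P(Z \ge z) = \tfrac{1}{2}\,\mathrm{erfc}(z/\sqrt{2})$. This is a sketch; the function name `normal_upper_tail` is an assumption, and the inputs are the estimates quoted above.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Estimates quoted in the text for small-cap stocks.
r_bar, s_r = 0.164, 0.33

# Standardize the net returns needed to double (r = 1) and triple (r = 2).
z_double = (1.0 - r_bar) / s_r
z_triple = (2.0 - r_bar) / s_r

p_double = normal_upper_tail(z_double)
p_triple = normal_upper_tail(z_triple)

print(z_double, p_double)
print(z_triple, p_triple)

# Reciprocals: the average number of years between such events,
# if each year is treated as an independent draw.
print(1 / p_double, 1 / p_triple)
```

The reciprocal of each probability gives a rough sense of how rare the event is under the normality assumption.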

So, what does this tell us? Well, with a fairly representative mean and standard deviation of returns, we can only expect to double our money roughly once every 180 years ($1/0.00565 \approx 177$). Tripling our money happens roughly once every 76 million years ($1/0.00000001321$)! This is a byproduct of the normal distribution, which assigns very small probability to events that are more than one or two standard deviations away from the mean. Empirically, many assets seem to exhibit behavior similar to normal distributions, but with "fat tails". That is to say, normality is a good assumption except for the fact that unlikely outliers tend to occur more often in reality than as modeled by theory.

Example: RWJ Ch.10 #26

This problem demonstrates a completely different issue with normality. On a mathematical note, every normal distribution is defined over all real numbers. So $P(a < X < b) > 0$ for any $a < b$, including negative numbers. However, most securities are "limited liabilities", which means that you can only lose what you put in. For example, if you buy stock, and it crashes, you only lose your investment. You are not liable for the debts of the issuer. This implies that net returns are bounded below by $r \ge -1$. Likewise, gross returns are bounded below by $R \ge 0$. So, if gross returns cannot be negative, why would we use a distribution that assigns positive probability to negative returns?
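To get a feel for the size of this problem, one can compute the probability that a normal model places below the limited-liability floor $r = -1$. The sketch below reuses the small-cap estimates from the previous problem ($\bar{r} = 0.164$, $s_r = 0.33$); that reuse is an assumption made here for illustration and is not part of the problem statement.

```python
import math


def normal_cdf(z):
    """Phi(z) = P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Assumption: reuse the small-cap estimates from the previous example.
r_bar, s_r = 0.164, 0.33

# Standardize the limited-liability floor r = -1.
z_floor = (-1.0 - r_bar) / s_r

# Probability the normal model assigns to net returns below -100%.
p_below_floor = normal_cdf(z_floor)

print(z_floor, p_below_floor)
```

Under these inputs the probability is small but strictly positive, which is exactly the logical inconsistency described above.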

Admittedly, these two problems are in slight opposition to one another. We are trying to have it both ways when we criticize the distribution for not giving sufficient probability to outliers and for giving positive probability to negative numbers. After all, many of the outliers will be negative numbers. Therefore, these two examples may slightly overstate the issue. Nonetheless, it is important to keep these lessons in mind as we move towards everyday use of the normality assumption.

Why NOT use the Central Limit Theorem?

Some people may be tempted to deploy the Central Limit Theorem as a justification for the assumption of normally distributed returns. This is incorrect. To see this, let's recall what the CLT tells us. Given $n$ independent observations of a random variable with any distribution (so long as each observation comes from the same distribution, and that distribution has finite variance), the CLT states that

$$\frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} N(0, 1)$$

So the CLT tells us that the distribution of our estimator for the mean will converge to a normal distribution as the sample size increases. We use this result (not in this class) for hypothesis testing with $\bar{x}$. Thus the transformed arithmetic mean of historical returns $\bar{r} = n^{-1/2} \sum_t r_t$ may converge to a normal distribution, but that tells us nothing about the underlying distribution of each $r_t$.
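A small simulation can illustrate the distinction. In the sketch below, individual observations are drawn from an exponential distribution, which is strongly skewed; this choice is purely an illustrative stand-in and not a model of returns. The sample means of groups of 50 observations are much closer to symmetric, yet the underlying draws remain just as skewed as before.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible


def skewness(xs):
    """Sample skewness: the average cubed standardized deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


n = 50               # observations per sample
replications = 2000  # number of samples

# Individual observations from a skewed (exponential) distribution.
raw = [random.expovariate(1.0) for _ in range(n * replications)]

# One sample mean per block of n observations.
means = [
    statistics.fmean(raw[i * n:(i + 1) * n]) for i in range(replications)
]

skew_raw = skewness(raw)
skew_means = skewness(means)

print(skew_raw, skew_means)
```

A normal distribution has zero skewness, so the sample means look far more normal than the individual draws do; averaging says something about the estimator, not about each observation.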
