i - ucsb's department of economicsecon.ucsb.edu/~llad/econ240af04/lecture_12.doc · web...

Nov. 4, 2004 LEC #12 ECON 240A-1 L. Phillips

Bivariate Normal Distribution: Isodensity Curves

I. Introduction

Economists rely heavily on regression to investigate the relationship between a

dependent variable, y, and one or more independent variables, x, w, etc. As we have seen,

graphical analysis often provides insight into these bivariate relationships and can reveal

non-linear dependence, outliers, and other features that may complicate the analysis.

There are other methodologies for examining bivariate relations. We have

examined some of them. For example, correlation analysis, using the correlation

coefficient, $\rho$, is one method, as discussed in Lecture Eight. Another method is

contingency table analysis. We will discuss the latter shortly. First we turn to the

bivariate normal distribution, which provides a useful visual model for bivariate

relationships just as the univariate normal distribution provides a useful probability

model for a single variable.

It is useful to have a mental model in mind for bivariate relationships, and the isodensity lines, or contour lines, of the bivariate normal provide a visual representation. The bivariate normal distribution of two variables, y and x, is a joint density function, f(x, y). If the variables are jointly normal, then the marginal densities, f(x) and f(y), are each normal. In addition, the conditional densities, e.g. that of y given x, f(y|x), are normal as well.

The isodensity lines, i.e. the loci where f(x, y) is constant, are circles around the origin for the bivariate normal if both x and y have mean zero and variance one, i.e. are standardized normal variates, and are not correlated. If x and y have nonzero means, $\mu_x$ and $\mu_y$, respectively, then these contour lines are circles around the point ($\mu_x$, $\mu_y$).

If x has a larger variance than y, then the contour lines are ellipses with the long

axis in the x direction. If x and y are correlated, then these ellipses are slanted.

II. Bivariate Normal Density

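The density function, f(x, y), for two jointly normal variables, x and y, where, for example, x has mean $\mu_x$ and variance $\sigma_x^2$, and the correlation coefficient between x and y is $\rho$, is:

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 - 2\rho\left(\frac{x-\mu_x}{\sigma_x}\right)\left(\frac{y-\mu_y}{\sigma_y}\right) + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\}. \tag{1}$$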
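As a rough illustration, Eq. (1) can be evaluated numerically. The sketch below is a minimal one using only the Python standard library; the function name `bivariate_normal_pdf` and its default arguments are illustrative choices, not part of the lecture.

```python
import math

def bivariate_normal_pdf(x, y, mu_x=0.0, mu_y=0.0, sigma_x=1.0, sigma_y=1.0, rho=0.0):
    """Evaluate the bivariate normal density of Eq. (1) at the point (x, y)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    zx = (x - mu_x) / sigma_x  # standardized deviation of x
    zy = (y - mu_y) / sigma_y  # standardized deviation of y
    one_minus_rho2 = 1.0 - rho ** 2
    norm_const = 1.0 / (2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_rho2))
    quad_form = zx ** 2 - 2.0 * rho * zx * zy + zy ** 2
    return norm_const * math.exp(-quad_form / (2.0 * one_minus_rho2))

# Height of the density at the origin in the standardized, uncorrelated case
peak = bivariate_normal_pdf(0.0, 0.0)
```

With the defaults (zero means, unit variances, zero correlation) the peak height equals 1/(2π), the leading constant of the density in that special case.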

A. Case 1: correlation is zero, means are zero, and variances are one

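$$f(x, y) = \frac{1}{2\pi} \exp\left\{-\frac{1}{2}\left(x^2 + y^2\right)\right\} \tag{2}$$

and for an isodensity, where f(x, y) is a constant, k, taking logarithms,

$$\ln[2\pi f(x, y)] = -\frac{1}{2}\left(x^2 + y^2\right),$$

or

$$x^2 + y^2 = -2\ln[2\pi f(x, y)] = -2\ln[2\pi k]. \tag{3}$$

Recall that $x^2 + y^2 = r^2$ is the equation of a circle around the origin, (0, 0), with radius r, as illustrated in Figure 1. Here $r^2 = -2\ln[2\pi k]$, which is positive whenever k lies below the peak density $1/(2\pi)$.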
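To make Eq. (3) concrete, the radius of the isodensity circle can be computed for a chosen density level k. This is a small sketch under the Case 1 assumptions (zero means, unit variances, zero correlation); the function name `isodensity_radius` is hypothetical.

```python
import math

def isodensity_radius(k):
    """Radius of the circle on which the standard bivariate normal density equals k (Eq. 3)."""
    peak = 1.0 / (2.0 * math.pi)  # maximum of the density, attained at the origin
    if not 0.0 < k <= peak:
        raise ValueError("k must satisfy 0 < k <= 1/(2*pi)")
    # max() guards against a tiny negative value from rounding when k equals the peak
    r_squared = max(0.0, -2.0 * math.log(2.0 * math.pi * k))
    return math.sqrt(r_squared)

# Contours at one half and one tenth of the peak height
r_half = isodensity_radius(0.5 / (2.0 * math.pi))   # equals sqrt(2 ln 2)
r_tenth = isodensity_radius(0.1 / (2.0 * math.pi))  # equals sqrt(2 ln 10)
```

Lower density levels give larger circles, which is why the contours in Figure 1 are nested.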

--------------------------------------------------------------------------------

Figure 1: Isodensity Circles About the Origin

Note that if x and y are independent, then the correlation coefficient, $\rho$, is zero and the joint density function, f(x, y), is the product of the marginal density functions for x and y, i.e.

$$f(x, y) = f(x) f(y) = \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}x^2\right] \cdot \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{1}{2}y^2\right] \tag{4}$$

where x and y have mean zero and variance one.

B. Case 2: correlation is zero, variances are one, means $\mu_x$ and $\mu_y$

In this case, the origin is translated to the point of the means, ($\mu_x$, $\mu_y$). The bivariate density function is:

$$f(x, y) = \frac{1}{2\pi} \exp\left\{-\frac{1}{2}\left[(x-\mu_x)^2 + (y-\mu_y)^2\right]\right\}. \tag{5}$$

For a density equal to k:

$$(x-\mu_x)^2 + (y-\mu_y)^2 = -2\ln[2\pi f(x, y)] = -2\ln[2\pi k]. \tag{6}$$

This is illustrated in Figure 2.

-------------------------------------------------------------------------------------

Figure 2: Isodensity Lines About the Point of Means, Bivariate Normal

C. Case 3: correlation is zero, variance of x > variance of y

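If the variance of x exceeds the variance of y, then the isodensity lines are ellipses about the point of the means with the semi-major axis in the x direction:

$$f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y} \exp\left\{-\frac{1}{2}\left[\left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]\right\} \tag{7}$$

Note that if x and y are independent, then the correlation coefficient is zero and the joint density is the product of the marginal densities:

$$f(x, y) = f(x) f(y) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2\right] \cdot \frac{1}{\sigma_y\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{y-\mu_y}{\sigma_y}\right)^2\right]$$

For a constant isodensity, f(x, y) = k, from Eq. (7) we have,

$$\left(\frac{x-\mu_x}{\sigma_x}\right)^2 + \left(\frac{y-\mu_y}{\sigma_y}\right)^2 = -2\ln(2\pi\sigma_x\sigma_y f(x, y)) = -2\ln(2\pi\sigma_x\sigma_y k) \tag{8}$$

Recall the equation of an ellipse about the origin with semi-major axis a and semi-minor axis b is:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \tag{9}$$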

Elliptical isodensity lines around the point of the means are illustrated for Eq. (7)

in Figure 3.
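The link between Eq. (8) and Eq. (9) can be spelled out in one more step. In the sketch below, c is shorthand introduced here (not in the lecture) for the right-hand side of Eq. (8); dividing Eq. (8) through by c puts it in the form of Eq. (9), centered at the point of the means.

```latex
\begin{align*}
c &= -2\ln(2\pi\sigma_x\sigma_y k) > 0
  \quad \text{for } 0 < k < \frac{1}{2\pi\sigma_x\sigma_y}, \\
\frac{(x-\mu_x)^2}{c\,\sigma_x^2} + \frac{(y-\mu_y)^2}{c\,\sigma_y^2} &= 1, \\
a &= \sigma_x\sqrt{c}, \qquad b = \sigma_y\sqrt{c}, \qquad \frac{a}{b} = \frac{\sigma_x}{\sigma_y}.
\end{align*}
```

So every contour has the same axis ratio, equal to the ratio of the standard deviations, and the axis in the x direction is the longer one exactly when the standard deviation of x exceeds that of y.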

D. Case 4: correlation is nonzero

The joint density function is given by Eq. (1) above, and the isodensity lines are tilted ellipses around the point of the means, as illustrated in Figure 4 for positive correlation.

Figure 3: Isodensity Lines About the Point of the Means, Var x > Var y

-----------------------------------------------------------------------------------

Figure 4: isodensity lines, x and y correlated

----------------------------------------------------------------------------------------------
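For Case 4, the amount of tilt can be computed from the variances and the correlation. The sketch below uses the standard result that the major axis of the contour ellipse points along the principal eigenvector of the covariance matrix; this formula is not stated in the lecture, and the function name `ellipse_tilt_degrees` is hypothetical.

```python
import math

def ellipse_tilt_degrees(sigma_x, sigma_y, rho):
    """Angle in degrees, counterclockwise from the x axis, of the major axis
    of the isodensity ellipses of a bivariate normal distribution."""
    cov_xy = rho * sigma_x * sigma_y  # covariance of x and y
    # Orientation of the principal axis of [[sigma_x^2, cov_xy], [cov_xy, sigma_y^2]]
    theta = 0.5 * math.atan2(2.0 * cov_xy, sigma_x ** 2 - sigma_y ** 2)
    return math.degrees(theta)

# Equal variances with positive correlation tilt the ellipse by 45 degrees
tilt_equal_var = ellipse_tilt_degrees(1.0, 1.0, 0.5)
# No correlation and a larger variance of x leave the major axis along x
tilt_uncorrelated = ellipse_tilt_degrees(2.0, 1.0, 0.0)
```

Negative correlation gives a negative angle, i.e. ellipses that slope downward from left to right.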

III. Marginal Density Functions

If x and y are jointly normal, then x and y each have a normal marginal density function. For example, the marginal density of x, f(x), is:

$$f(x) = \frac{1}{\sigma_x\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_x}{\sigma_x}\right)^2\right] \tag{10}$$

and similarly for y.

IV. Conditional Density Function

The density of y conditional on a particular value of x, x = x*, is just a vertical slice of the isodensity curve plot at that value of x and, if x and y are jointly normal, is also normal. It can be obtained by dividing the joint density function by the marginal density and simplifying:

$$f(y|x) = \frac{f(x, y)}{f(x)} = \frac{1}{\sigma_y\sqrt{2\pi}\sqrt{1-\rho^2}} \exp\left\{-\frac{\left[y - \mu_y - \rho(x-\mu_x)(\sigma_y/\sigma_x)\right]^2}{2(1-\rho^2)\sigma_y^2}\right\} \tag{11}$$

where the mean of the conditional distribution is $\mu_y + \rho(x-\mu_x)(\sigma_y/\sigma_x)$, i.e. this is the expected value of y for a given value of x, such as x*:

$$E[y|x=x^*] = \mu_y + \rho(x^*-\mu_x)(\sigma_y/\sigma_x) \tag{12}$$

So, if x is at its mean, $\mu_x$, then the expected value of y is its mean, $\mu_y$. If x is above its mean and the correlation is positive, then the expected value of y conditional on x is greater than $\mu_y$. This is called the regression of y on x, with intercept $\mu_y - \rho\mu_x(\sigma_y/\sigma_x)$ and slope $\rho(\sigma_y/\sigma_x)$. Of course, if x and y are not correlated, then the slope is zero and the intercept is $\mu_y$. The variance of the conditional distribution is:

$$\mathrm{Var}[y|x=x^*] = \sigma_y^2(1-\rho^2) \tag{13}$$
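The conditional mean of Eq. (12) and the conditional variance of Eq. (13) are simple enough to compute directly. The following is a minimal sketch; the function name `conditional_moments` and the numbers in the example are hypothetical.

```python
def conditional_moments(x_star, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Mean (Eq. 12) and variance (Eq. 13) of y given x = x_star
    when x and y are jointly normal."""
    slope = rho * sigma_y / sigma_x               # regression slope of y on x
    mean = mu_y + slope * (x_star - mu_x)         # Eq. (12)
    variance = sigma_y ** 2 * (1.0 - rho ** 2)    # Eq. (13), does not depend on x_star
    return mean, variance

# Hypothetical standardized variables with correlation 0.5, evaluated at x* = 2
cond_mean, cond_var = conditional_moments(2.0, 0.0, 0.0, 1.0, 1.0, 0.5)
```

In this example the conditional mean is 1.0 and the conditional variance is 0.75: knowing x reduces the uncertainty about y by the fraction $\rho^2$.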

The isodensity lines and the regression line, i.e. the mean of y conditional on x, are illustrated in Figure 5 for the case where x and y are positively correlated and the variance of x is greater than the variance of y.

Figure 5: The Expected Value of y Conditional on x

V. Example: Rates of Return for a Stock and the Market

In Lab Six we look at the data file XR17-34 for 48 monthly rates of return to General Electric (GE) stock and the Standard and Poor’s Composite Index. Neither variable is significantly different from normal in its marginal distribution. An example is the histogram and statistics for the rate of return for GE, shown in Figure 6.

Figure 6: Histogram and statistics for the rate of return for GE (horizontal axis from -0.05 to 0.10)

Series: GE
Sample: 1993:01 1996:12
Observations: 48
Mean           0.022218
Median         0.019524
Maximum        0.117833
Minimum       -0.058824
Std. Dev.      0.043669
Skewness       0.064629
Kurtosis       2.231861
Jarque-Bera    1.213490
Probability    0.545122

The coefficient of skewness, S, is a measure of non-symmetry:

$$S = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3 \tag{14}$$

where $\bar{x}$ is the sample mean and s is the sample standard deviation. For the normal distribution, the coefficient of skewness is zero, since the cubes of the deviations from the mean sum to zero, with the negative values offset by the positive ones because of symmetry.

The coefficient of kurtosis, K, is a measure of how peaked or how flat the density is, capturing the weight in the tails:

$$K = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 \tag{15}$$

For the normal distribution, the coefficient of kurtosis is three.

The Jarque-Bera statistic, JB, combines these two coefficients:

$$JB = \frac{n-k}{6}\left[S^2 + \frac{1}{4}(K-3)^2\right] \tag{16}$$

where k is the number of estimated parameters, such as the sample mean and sample standard deviation, needed to calculate the statistics. If S is zero and K is 3, then the JB statistic will be zero. Large values of JB indicate a deviation from normality, and can be tested using the Chi-Square distribution with two degrees of freedom.
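The skewness, kurtosis, and Jarque-Bera calculations can be sketched in a few lines. Two assumptions are made here that the lecture does not state: the standard deviation inside S and K uses divisor n, and k is set to 0 by default. The p-value uses the fact that the upper-tail probability of a Chi-Square variable with two degrees of freedom is exp(-JB/2). Function names are hypothetical.

```python
import math

def skewness_kurtosis(data):
    """Sample skewness S (Eq. 14) and kurtosis K (Eq. 15).
    Assumption: s is computed with divisor n rather than n - 1."""
    n = len(data)
    mean = sum(data) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in data) / n)
    S = sum(((v - mean) / s) ** 3 for v in data) / n
    K = sum(((v - mean) / s) ** 4 for v in data) / n
    return S, K

def jarque_bera(n, S, K, k=0):
    """JB statistic (Eq. 16) and its Chi-Square(2) upper-tail probability."""
    jb = (n - k) / 6.0 * (S ** 2 + 0.25 * (K - 3.0) ** 2)
    p_value = math.exp(-jb / 2.0)  # survival function of Chi-Square with 2 d.f.
    return jb, p_value

# GE series, using the skewness and kurtosis reported in Figure 6
jb_ge, p_ge = jarque_bera(48, 0.064629, 2.231861)
```

With k = 0 this reproduces the values reported for GE in Figure 6 (Jarque-Bera of about 1.2135, probability of about 0.5451), which suggests the reported statistics were computed without a degrees-of-freedom adjustment.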

The descriptive statistics for GE and the Index are given in Table 1. The estimated correlation coefficient is 0.636. These estimates can be used to implement Eq. (12):

$$E[y|x=x^*] = [\mu_y - \rho\mu_x(\sigma_y/\sigma_x)] + \rho x^*(\sigma_y/\sigma_x)$$

E[GE|Index] = [0.0222 - 0.636*0.0144*(0.0437/0.0254)] + 0.636*1.720*Index

E[GE|Index] = 0.0064 + 1.094*Index (13)
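The arithmetic above can be checked with a short script that builds the intercept and slope from the means, standard deviations, and correlation. The inputs are the Table 1 values and the correlation of 0.636 quoted in the text; the function name `regression_from_moments` is hypothetical.

```python
def regression_from_moments(mean_x, mean_y, sd_x, sd_y, rho):
    """Intercept and slope of E[y | x] implied by Eq. (12)."""
    slope = rho * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# x = Index, y = GE, using the unrounded Table 1 statistics
intercept, slope = regression_from_moments(
    mean_x=0.014361, mean_y=0.022218,
    sd_x=0.025430, sd_y=0.043669,
    rho=0.636,
)
```

Using the unrounded statistics gives a slope of roughly 1.092 and an intercept of roughly 0.0065; the slightly larger slope of 1.094 in the text comes from rounding the standard deviations before dividing.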

For comparison, the estimated regression is reported in Table 2. The coefficients are nearly identical, so the regression can be interpreted as the expected value of y for a given value of x. A plot of the rates of return for GE and the stock Index is shown in Figure 6.

Table 1
Sample: 1993:01 1996:12

                      GE        INDEX
Mean            0.022218     0.014361
Median          0.019524     0.017553
Maximum         0.117833     0.076412
Minimum        -0.058824    -0.044581
Std. Dev.       0.043669     0.025430
Skewness        0.064629    -0.453474
Kurtosis        2.231861     3.222043
Jarque-Bera     1.213490     1.743715
Probability     0.545122     0.418174
Observations          48           48

Table 2
Dependent variable: GE
Method: Least Squares

Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                  0.006526     0.005659      1.153229   0.2548
INDEX              1.092674     0.195328      5.594046   0.0000

R-squared             0.404865    Mean dependent var       0.022218
Adjusted R-squared    0.391927    S.D. dependent var       0.043669
S.E. of regression    0.034053    Akaike info criterion   -3.881039
Sum squared resid     0.053341    Schwarz criterion       -3.803072
Log likelihood        95.14493    F-statistic              31.29335
Durbin-Watson stat    2.442439    Prob(F-statistic)        0.000001

Figure 6: Rates of Return for GE Stock and S&P Composite Index

VI. Discriminating Between Two Populations

As an example, we will use the data file XR18-58 on lottery expenditure as a

percent of income, introduced in Lab Six. Twenty-three individuals did not gamble. The

means for their age, number of children, years of education, and income are shown in

Table 3. For comparison, the means of the 77 individuals who did gamble are shown in

Table 4. The question is whether these explanatory variables can predict who will and who will

not buy lottery tickets.

The means for number of children and age are fairly similar for the two groups. Those who do not

buy lottery tickets are better educated with higher incomes than those who participate in the lottery. The

correlation between education and income is 0.65 for ticket buyers, and 0.74 for the entire sample.

Table 3
Sample: 1 23

                     AGE     CHILDREN    EDUCATION       INCOME      LOTTERY
Mean            40.43478     1.782609     15.56522     47.56522     0.000000
Median          41.00000     2.000000     16.00000     42.00000     0.000000
Maximum         54.00000     4.000000     20.00000     95.00000     0.000000
Minimum         23.00000     0.000000     7.000000     18.00000     0.000000
Std. Dev.       8.805092     1.277658     3.368653     22.51631     0.000000
Skewness       -0.446250     0.014659    -0.919721     0.518080           NA
Kurtosis        2.308389     1.985475     3.156800     2.097295           NA
Jarque-Bera     1.221762     0.987199     3.266130     1.809815           NA
Probability     0.542872     0.610425     0.195330     0.404579           NA
Observations          23           23           23           23           23

----------------------------------------------------------------------------------------

Table 4
Sample: 24 100

                     AGE     CHILDREN    EDUCATION       INCOME      LOTTERY
Mean            44.19481     1.779221     11.94805     28.54545     7.000000
Median          43.00000     2.000000     11.00000     27.00000     7.000000
Maximum         82.00000     6.000000     17.00000     64.00000     13.00000
Minimum         21.00000     0.000000     7.000000     11.00000     1.000000
Std. Dev.       12.70727     1.343830     2.887797     9.423578     2.695025
Skewness        0.466514     0.506085     0.293006     1.304264    -0.308533
Kurtosis        3.189937     3.149919     1.918891     5.036654     2.741336
Jarque-Bera     2.908734     3.359008     4.851659     35.13888     1.436299
Probability     0.233548     0.186466     0.088405     0.000000     0.487654
Observations          77           77           77           77           77

---------------------------------------------------------------------------

The conceptual framework is provided in Figure 7, which shows isodensity curves

for the two populations for the explanatory variables income and education.

Figure 7: Discriminating Between Those Who Play the Lottery and Those Who Don’t (X = income, Y = education; isodensity curves for Lottery Players and Lottery Avoiders, separated by the Decision Rule Line)

---------------------------------------------------------------------------------------

Using a single variable, we could test for a difference in sample means for education or for a difference in sample means for income. But why not use both variables and, instead of a decision rule classifying individuals as gamblers if x < x* or y < y*, use a decision rule line that separates the two populations? This is called discriminant function analysis.

Another approach is to use a probability model. A linear probability model can be estimated with regression using a dependent variable coded one for those who buy tickets and zero for those who do not (designated Bern, for Bernoulli), and regressing it against education and income. The results are shown in Table 7, with a plot of the actual values, fitted values, and residuals following. Since income is very skewed, it is better to use the natural logarithm of income, which is more bell shaped.

Using the same coding for the dependent variable, non-linear estimation of the logit probability model is possible using EViews, which avoids some problems that occur with the linear probability model.

Table 7
Dependent Variable: BERN
Method: Least Squares
Sample: 1 100
Included observations: 100

Variable        Coefficient   Std. Error   t-Statistic   Prob.
EDUCATION         -0.021597     0.016017     -1.348392   0.1807
INCOME            -0.010462     0.003430     -3.049569   0.0030
C                  1.390402     0.148465      9.365178   0.0000

R-squared             0.277095    Mean dependent var       0.770000
Adjusted R-squared    0.262190    S.D. dependent var       0.422953
S.E. of regression    0.363299    Akaike info criterion    0.842358
Sum squared resid     12.80264    Schwarz criterion        0.920513
Log likelihood       -39.11792    F-statistic              18.59045
Durbin-Watson stat    0.651758    Prob(F-statistic)        0.000000

--------------------------------------------------------------------------
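One commonly cited problem with the linear probability model is that its fitted values are not confined to the interval from zero to one. The sketch below illustrates this with the Table 7 coefficients and a hypothetical individual; it also shows the logistic function on which the logit model is based, applied to arbitrary index values because no logit estimates are reported here. The function names are illustrative.

```python
import math

# Coefficients as reported in Table 7 (dependent variable: BERN)
CONST, B_EDUCATION, B_INCOME = 1.390402, -0.021597, -0.010462

def lpm_fitted(education, income):
    """Fitted value of the linear probability model in Table 7."""
    return CONST + B_EDUCATION * education + B_INCOME * income

def logistic(z):
    """Logistic function: maps any real index z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical individual: 7 years of education and an income of 11 ($000)
p_lpm = lpm_fitted(7, 11)

# The logistic transform stays inside (0, 1) for any index value
bounded = [logistic(z) for z in (-5.0, 0.0, 5.0)]
```

For this hypothetical individual the linear model returns a fitted "probability" above one, which cannot be a probability, whereas the logistic transform is bounded by construction.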

Figure 8: Actual, Fitted and Residuals from Linear Probability Model (horizontal axis: observations 1 to 100)

The linear probability model can be interpreted from the perspective of decision theory and used to come up with a decision rule or discriminant function. The expected cost of misclassification is the sum of the expected costs of two kinds of misclassification: (1) labeling a non-player a player, and (2) labeling a player a non-player. For example, if we take the cost of labeling a non-player a player, C(P|N), multiply it by the conditional probability, P(P|N), of incorrectly classifying this non-player as a player, given this individual’s values for income and education, and multiply by the probability of observing non-players in the population, P(N), we have the first component of the expected cost of misclassification: C(P|N)*P(P|N)*P(N). Adding the other expected cost of misclassification, we have the total expected cost, E(C), of misclassification:

E(C) = C(P|N)*P(P|N)*P(N) + C(N|P)*P(N|P)*P(P). (14)

If the two costs of misclassification are equal, i.e. C(P|N) = C(N|P), then, noting that there are 23 non-players, or about one in four, in the population, the expected costs are

E(C) = C(P|N)*P(P|N)*(1/4) + C(N|P)*P(N|P)*(3/4). (15)

We could weight the expected costs of misclassification equally by setting the probability of classifying a non-player (coded zero in the linear probability model) as a player to ¾ and the probability of classifying a player as a non-player to ¼, i.e. setting P(P|N) = ¾ and P(N|P) = ¼:

E(C) = C(P|N)*(3/4)*(1/4) + C(N|P)*(1/4)*(3/4). (16)
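The expected cost calculation is easy to script. The sketch below evaluates the two components for the probabilities just described, with both misclassification costs set to one as an illustrative assumption; the function name and argument names are hypothetical.

```python
def expected_cost_components(cost_pn, prob_pn, prior_n, cost_np, prob_np, prior_p):
    """Two components of the expected cost of misclassification:
    non-players labeled players, and players labeled non-players."""
    non_player_as_player = cost_pn * prob_pn * prior_n
    player_as_non_player = cost_np * prob_np * prior_p
    return non_player_as_player, player_as_non_player

# Equal unit costs, P(P|N) = 3/4, P(N|P) = 1/4, priors of 1/4 and 3/4
first, second = expected_cost_components(1.0, 0.75, 0.25, 1.0, 0.25, 0.75)
total_cost = first + second
```

Both components equal 3/16, so the two kinds of error carry equal weight and the total expected cost is 3/8 in units of the common misclassification cost.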

This is equivalent to setting the fitted value of Bern to ¾, and classifying an individual as a player if the individual’s fitted probability is greater than ¾, i.e. if the fitted value of Bern > ¾, where

Bern = ¾ = 1.390 - 0.0216*education - 0.0105*income, (17)

drawing on Table 7. Thus the discriminant function or decision rule line in education-income space is, rearranging Eq. (17):

Education = 29.63 - 0.486*income, (18)

which is illustrated in Figure 9.
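The rearrangement from Eq. (17) to Eq. (18), and the resulting classification rule, can be sketched as follows. The coefficients are the rounded values used in Eq. (17); the function names are hypothetical, and the two test points are the group means from Tables 3 and 4.

```python
# Rounded coefficients from Eq. (17)
CONST, B_EDUCATION, B_INCOME = 1.390, -0.0216, -0.0105

def fitted_bern(education, income):
    """Fitted value of Bern from the linear probability model."""
    return CONST + B_EDUCATION * education + B_INCOME * income

def boundary_education(income, threshold=0.75):
    """Education level on the decision rule line for a given income (Eq. 18)."""
    return (threshold - CONST - B_INCOME * income) / B_EDUCATION

def classify(education, income, threshold=0.75):
    """Label an individual a player if the fitted value exceeds the threshold."""
    return "player" if fitted_bern(education, income) > threshold else "non-player"

line_intercept = boundary_education(0.0)                    # about 29.63
line_slope = boundary_education(1.0) - line_intercept       # about -0.486

label_at_non_player_means = classify(15.56522, 47.56522)    # Table 3 means
label_at_player_means = classify(11.94805, 28.54545)        # Table 4 means
```

Points below the line in education-income space have fitted values above ¾ and are labeled players; the two group means fall on opposite sides of the line, as the rule intends.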

Note that five non-players are misclassified as well as fourteen players, for a total

of nineteen. You could shift the line to the right, misclassifying fewer players but more

non-players. If Bern were set to 0.5, shifting the line to the right, one player would be

misclassified, but thirteen non-players would be misclassified, for a total of fourteen.

--------------------------------------------------------------------------------

Figure 9: Lottery: Players and Non-Players vs. Education and Income (horizontal axis: Income ($000), 0 to 100; legend: Non-Players, Players, Mean: Non-Players, Mean: Players)

Discriminant Function or Decision Rule: Bern = ¾ = 1.39 - 0.0216*education - 0.0105*income
