PREDICTORS OF STOCK MARKET VALUES

QMB 6305

UNIVERSITY OF WEST FLORIDA

Submitted by

Aaron Hall

April 13, 2010

Instructor

DR. GAYLE BAUGH


TABLE OF CONTENTS

Introduction
    The Data
    Prediction And Moving Averages
    Data Manipulation
    Data Exploration
    Linear Prediction
    Checking The Model
Conclusions
    The Model
    Confidence Intervals
    Error Comparison
    Summary
References
Appendix A: Acknowledgements
    GNU/Linux/Ubuntu
    R
    OpenOffice.org
    Other Tools
Appendix B: R Code


INTRODUCTION

The goal of this project is to analyze available econometric data and find

predictors of the valuation of the United States stock market. Availability of

data is an important constraint for this analysis. As the stock market has its

value calculated every second, using predictors with monthly frequency

would have been preferable; however, macroeconomic data is usually given

in quarterly and annual forms. Thus, all data used here has been annualized.

The proxy for stock values, the model's dependent variable, is the

Ibbotson Large Company total return values. Since these values are given in

percentage returns from year 1925, the figures used for the model are

transformed into the value of $1,000 invested in 1925 (Harrington, 2008).

Independent variables include projected GDP, interest rates, inflation, and

the money supply.

Projected GDP is measured by the predictions the Fed reports in the

Greenbook, but since the Greenbook is only released after a five year lag, the

data can only be analyzed up to the last released year. Further, since the Fed's methodology is a

secret, the results for this figure, if they are significant, cannot be accurately

reproduced. GDP is an estimate even after the period for which it is

measured, and is usually revised several times (St. Louis Fed, 2010).

Projected GDP is important because it represents the productivity of the

United States economy. It would make sense that the more productive the US

economy, the greater chance for profitability for any US company, though

certainly not a guarantee. If an investor believes the economy will be more


productive in the future, perhaps this will increase the price the investor will

pay for the investment.

Since stock prices are based on the present values of future cash flows,

changes in interest rates are likely to affect the valuation of stocks. Lower

rates would increase the present value of the future cash flows. Interest rates

also affect the cost of capital for the firm. Lower interest rates would

decrease the borrowing costs of firms and increase their profitability.
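
The present value argument can be illustrated with a minimal sketch. The cash flow stream and the two discount rates below are hypothetical and are not taken from the data used in this paper; the function name present_value is likewise only an illustration.

```python
def present_value(cash_flows, rate):
    """Discount a list of end-of-year cash flows at a constant annual rate."""
    return sum(cf / (1.0 + rate) ** t
               for t, cf in enumerate(cash_flows, start=1))

# Hypothetical stream: $100 per year for ten years (illustrative only).
flows = [100.0] * 10

pv_high = present_value(flows, 0.08)  # higher discount rate
pv_low = present_value(flows, 0.04)   # lower discount rate

print(f"Present value at 8%: {pv_high:.2f}")
print(f"Present value at 4%: {pv_low:.2f}")
```

The same stream is worth more when discounted at the lower rate, which is the mechanism by which falling interest rates may raise stock valuations.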

The original proposal sought to examine bond prices as a predictor

variable. Since bond prices move inversely with interest rates, including an

interest rate variable precludes the use of bond prices as a separate predictor.

Inflation may also be a predictor of stock valuations. Higher inflation

means that investors in bonds, if they are spending the bond income, are

losing capital to inflation. As a result, they may seek higher returns in the

stock market. Further, the values of hard assets owned by the companies

may also be increasing in value relative to the weakening dollar. As with

the Large Company stock values, the figures used for inflation are indexed to

$1,000 in 1925.

The money supply is the final predictor to examine. The money supply

represents the number of dollars in circulation. It is related to inflation as the

more dollars in circulation, the less each individual dollar may be worth.

Since it may be highly correlated to inflation, it is unlikely both will remain in

the final model. This idea is not unrelated to Keynes' idea that the

components of money demand include a speculative demand in addition to


classical notions of precautionary demand and transactional demand.

Since some predictors are only available on a quarterly or annual basis,

the analysis is limited to yearly data.

A concern for this analysis is whether the stock market of today is

affected by the same causes that affected it even a decade ago.

The Data

The data is expected upon graphical examination to reveal trends and

cyclicality. It is expected that since the stock market values are

representative of growth, perhaps a log transformation of the data is the best

approach. However, the other values also follow a growth form, and

transforming both the predictor and dependent variables in an identical

fashion will not yield any additional predictive power. The Box-Cox operation

may reveal the optimal transformation.

Stock market values will be represented by Ibbotson Large Company

Returns. Since returns are given in terms of percentage gains or losses, the

data is transformed into values based on $1000 invested in 1925, and

represents the value gained or lost by the end of the year (Harrington, 2008).
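
The conversion from annual percentage returns to the value of an initial investment can be sketched as follows. The three returns are hypothetical, not Ibbotson figures, and the function name growth_of_investment is an assumption made for illustration.

```python
def growth_of_investment(pct_returns, start_value=1000.0):
    """Compound annual percentage returns into end-of-year dollar values."""
    values = []
    value = start_value
    for pct in pct_returns:
        value *= 1.0 + pct / 100.0  # apply this year's gain or loss
        values.append(value)
    return values

# Hypothetical annual returns in percent (illustrative only).
returns = [10.0, -5.0, 20.0]
values = growth_of_investment(returns)

for year_index, value in enumerate(values, start=1):
    print(f"End of year {year_index}: {value:.2f}")
```

Each entry is the value at the end of the period, which is why the predictors are later lagged relative to the stock series.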

Projected economic production is measured by Projected Gross National

Product (in billions) for up to year 1992, and Projected Gross Domestic

Product (in billions) for year 1992 on, with one year of overlap in projections

for year 1992. These projections are given by the Greenbook, which is

released along with the Federal Open Market Committee meeting transcripts

after a five year lag (St. Louis Fed, 2010).


Inflation is given by Ibbotson Inflation Return data. The Fed Funds Rate

will proxy for interest rates. Money supply may be represented variously by

Institutional Money Funds (series IMFNS from the St. Louis Fed) and M2, and

the two series added together. (M3 was to be our time series for money

supply as it is the most encompassing definition of money, but it has been

discontinued by the Fed on the grounds that the costs of gathering the data

are not overcome by the value of the series.)

It should be noted that both the Large Company and Inflation return data

are given in terms of annual percentage growth, and have been transformed

to indicate the growth in the value of $1000 in 1925; these figures therefore

indicate the value by the end of the period. For prediction, the predictor

variables (other than projected GDP and GNP) will be lagged.

Occam's razor states that if two possible explanations are equally likely,

one should accept the least complicated explanation. When making

predictions, one should accept more complicated explanations only if the

more complicated explanation provides significantly greater prediction value.

Thus, this paper will seek to find the simplest model with the best

prediction value.


Prediction And Moving Averages

There are various ways of attempting to predict a variable based on its

past values. The simplest method is to use the last measured value. This

method may put too much emphasis on a single period's value. Various forms

of moving averages can provide a more nuanced approach to prediction of

the next period's value.

Table 1: Raw Data

Year     LCStock      GNP      GDP  Inflation  FEDFUNDS   IMFNS    M2NS  M2IMFNS
1978    89597.47   8445.3       NA    3777.37        NA      NA      NA       NA
1979   106119.24   9385.1       NA    4280.14     13.78     9.5  1479.0   1488.5
1980   140523.09  10219.1       NA    4810.87     18.90    15.2  1604.8     1620
1981   133623.41  11342.9       NA    5240.97     12.37    38.0  1760.3   1798.3
1982   162232.18  12479.4       NA    5443.79      8.95    50.0  1917.2   1967.2
1983   198750.65  12979.4       NA    5650.66      9.47    42.5  2136.2   2178.7
1984   211212.31  14604.2       NA    5873.86      8.38    65.9  2320.9   2386.8
1985   279138.19  15628.9       NA    6095.30      8.27    68.2  2506.6   2574.8
1986   330695.02  16478.2       NA    6164.18      6.91    88.5  2744.1   2832.6
1987   347990.37  17707.1       NA    6436.02      6.77    95.0  2842.7   2937.7
1988   406487.55  18951.2       NA    6720.49      8.76    94.9  3006.3   3101.2
1989   534490.48  20890.4       NA    7032.99      8.45   112.5  3171.4   3283.9
1990   517547.13  22115.7       NA    7462.71      7.31   141.5  3290.2   3431.7
1991   675657.77  22918.9       NA    7691.07      4.43   191.2  3391.1   3582.3
1992   727480.73  23731.8  23646.8    7914.11      2.92   216.0  3446.7   3662.7
1993   800156.05       NA  25106.9    8131.75      2.96   221.3  3501.2   3722.5
1994   810638.09       NA  26916.3    8348.86      5.45   216.1  3517.7   3733.8
1995  1114059.93       NA  28464.6    8560.92      5.60   270.3  3663.9   3934.2
1996  1371073.56       NA  29644.5    8845.15      5.29   332.4  3839.7   4172.1
1997  1828463.70       NA  31702.1    8995.51      5.50   409.7  4053.6   4463.3
1998  2351038.63       NA  33760.6    9140.34      4.68   565.6  4398.2   4963.8
1999  2845697.15       NA  35286.5    9385.30      5.30   674.2  4661.9   5336.1
2000  2586454.14       NA  39068.3    9703.47      6.40   833.4  4948.7   5782.1
2001  2279183.39       NA  41928.5    9853.87      1.82  1248.0  5469.0     6717
2002  1775483.86       NA  41733.8   10088.39      1.24  1300.8  5816.2     7117
2003  2285047.73       NA  43518.7   10278.05      0.98  1154.7  6101.4   7256.1
2004  2533432.42       NA  46639.7   10613.12      2.16  1103.0  6443.7   7546.7
2005  2657823.95       NA       NA   10976.09      4.16  1172.1  6703.1   7875.2
2006  3077760.13       NA       NA   11254.88      5.24  1378.4  7102.3   8480.7
2007  3246729.16       NA       NA   11714.08      4.24  1934.8  7530.2     9465
2008  2045439.37       NA       NA   11724.62      0.16  2430.9  8251.3  10682.2

Simple Moving Averages (SMAs) weight all periods evenly, and the only

parameter is the number of periods used for prediction.

Exponential Moving Averages (EMAs) reduce the parameters involved in

weighted moving averages to the number of periods used and a parameter,

alpha, which defines the amount of weighting each period receives. If alpha is

restricted to $2/(n+1)$, where $n$ is the number of periods, one may reduce the number of parameters to 1

(Colby, 2003).

Weighted Moving Averages (WMA) may also be considered. There are

innumerable variations in choice for weightings. Restricting the weighting to

$n - t + x$, where $n$ is the number of periods, $t$ is the most recent period, and

$x$ is the period number, can provide a general rule structure with equally

declining rates of weighting. For $n = 3$, the most recent period receives a weight

of 3, the period before it 2, and the oldest 1, each divided by their sum of 6.

SMA and the Last method are actually restricted

cases of WMA: the SMA with equal weighting, and the Last method with

$n = 1$.

These are simple methods for forecasting time series data. They can

provide a baseline for deciding if other more complicated methods are

worthwhile. Measurement of the degree to which they fail to predict can

provide a way of eliminating the less effective methods of prediction. The

methods are demonstrated and compared on the following pages.
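
The four forecasting rules and the three error measures can be sketched in a few lines. This is a minimal illustration using only the first seven LCStock values from Table 1, not the spreadsheet calculations behind Tables 2 through 5; the function names are assumptions, and the EMA is seeded with the first observation, which is the seeding that reproduces the early rows of Table 4.

```python
# First seven LCStock values from Table 1 (1978-1984).
lcstock = [89597.47, 106119.24, 140523.09, 133623.41,
           162232.18, 198750.65, 211212.31]

def last_forecast(history):
    """Forecast the next period with the most recent observation."""
    return history[-1]

def sma_forecast(history, n=3):
    """Equal-weighted average of the last n observations."""
    window = history[-n:]
    return sum(window) / len(window)

def wma_forecast(history, n=3):
    """Linearly weighted average: oldest weight 1, newest weight n."""
    window = history[-n:]
    weights = range(1, len(window) + 1)
    return sum(w * v for w, v in zip(weights, window)) / sum(weights)

def ema_forecast(history, n=3):
    """Exponential average with alpha = 2 / (n + 1), seeded with the first value."""
    alpha = 2.0 / (n + 1)
    ema = history[0]
    for value in history[1:]:
        ema = alpha * value + (1.0 - alpha) * ema
    return ema

def error_summary(actuals, forecasts):
    """Return (MAD, MSE, MAPE in percent) for paired actuals and forecasts."""
    errors = [a - f for a, f in zip(actuals, forecasts)]
    n = len(errors)
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 * sum(abs(e) / a for e, a in zip(errors, actuals)) / n
    return mad, mse, mape

methods = {"Last": last_forecast, "SMA": sma_forecast,
           "EMA": ema_forecast, "WMA": wma_forecast}
start = 3  # first index with three prior observations (1981)
for name, method in methods.items():
    forecasts = [method(lcstock[:i]) for i in range(start, len(lcstock))]
    mad, mse, mape = error_summary(lcstock[start:], forecasts)
    print(f"{name}: MAD={mad:.2f} MSE={mse:.2f} MAPE={mape:.2f}%")
```

Because only four forecast years are scored here, the printed error measures will differ from the full-sample figures in the tables that follow.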


Table 2: Prediction Method of using the Last Period's Value

Year     LCStock        Last  Absolute Error     Squared Error  Abs. % Error
1978    89597.47
1979   106119.24    89597.47        16521.77      272968969.30        15.57%
1980   140523.09   106119.24        34403.86     1183625368.92        24.48%
1981   133623.41   140523.09         6899.68       47605638.59         5.16%
1982   162232.18   133623.41        28608.77      818461848.89        17.63%
1983   198750.65   162232.18        36518.46     1333598241.05        18.37%
1984   211212.31   198750.65        12461.67      155293109.25         5.90%
1985   279138.19   211212.31        67925.88     4613925152.16        24.33%
1986   330695.02   279138.19        51556.82     2658106122.24        15.59%
1987   347990.37   330695.02        17295.35      299129110.46         4.97%
1988   406487.55   347990.37        58497.18     3421920136.68        14.39%
1989   534490.48   406487.55       128002.93    16384749714.32        23.95%
1990   517547.13   534490.48        16943.35      287077043.93         3.27%
1991   675657.77   517547.13       158110.65    24998976830.29        23.40%
1992   727480.73   675657.77        51822.95     2685618284.69         7.12%
1993   800156.05   727480.73        72675.32     5281702797.86         9.08%
1994   810638.09   800156.05        10482.04      109873251.96         1.29%
1995  1114059.93   810638.09       303421.84    92064812356.11        27.24%
1996  1371073.56  1114059.93       257013.63    66056004341.90        18.75%
1997  1828463.70  1371073.56       457390.14   209205740036.57        25.01%
1998  2351038.63  1828463.70       522574.93   273084552890.20        22.23%
1999  2845697.15  2351038.63       494658.53   244687058285.70        17.38%
2000  2586454.14  2845697.15       259243.01    67206938571.71        10.02%
2001  2279183.39  2586454.14       307270.75    94415315113.52        13.48%
2002  1775483.86  2279183.39       503699.53   253713215787.74        28.37%
2003  2285047.73  1775483.86       509563.87   259655335708.01        22.30%
2004  2533432.42  2285047.73       248384.69    61694953315.94         9.80%
2005  2657823.95  2533432.42       124391.53    15473253157.22         4.68%
2006  3077760.13  2657823.95       419936.18   176346398595.85        13.64%
2007  3246729.16  3077760.13       168969.03    28550533539.91         5.20%
2008  2045439.37  3246729.16      1201289.79   1443097161504.6        58.73%

MAD: 225742.56    MSE: 115510479476.74    MAPE: 16.94%


Table 3: Simple Moving Average, n=3

Year     LCStock        3SMA  Absolute Error     Squared Error  Abs. % Error
1978    89597.47
1979   106119.24
1980   140523.09
1981   133623.41   112079.93        21543.48      464121451.77        16.12%
1982   162232.18   126755.25        35476.94     1258612933.63        21.87%
1983   198750.65   145459.56        53291.08     2839939693.60        26.81%
1984   211212.31   164868.75        46343.57     2147726102.60        21.94%
1985   279138.19   190731.71        88406.48     7815705416.34        31.67%
1986   330695.02   229700.38       100994.63    10199915820.03        30.54%
1987   347990.37   273681.84        74308.53     5521756957.95        21.35%
1988   406487.55   319274.53        87213.02     7606111133.42        21.46%
1989   534490.48   361724.31       172766.17    29848147904.41        32.32%
1990   517547.13   429656.13        87891.00     7724827496.83        16.98%
1991   675657.77   486175.05       189482.72    35903703032.64        28.04%
1992   727480.73   575898.46       151582.27    22977183646.41        20.84%
1993   800156.05   640228.54       159927.51    25576807786.21        19.99%
1994   810638.09   734431.52        76206.58     5807442490.69         9.40%
1995  1114059.93   779424.96       334634.98   111980567596.75        30.04%
1996  1371073.56   908284.69       462788.87   214173535872.04        33.75%
1997  1828463.70  1098590.53       729873.17   532714845282.45        39.92%
1998  2351038.63  1437865.73       913172.89   833884735153.89        38.84%
1999  2845697.15  1850191.96       995505.19   991030584614.88        34.98%
2000  2586454.14  2341733.16       244720.98    59888359287.38         9.46%
2001  2279183.39  2594396.64       315213.25    99359393130.41        13.83%
2002  1775483.86  2570444.90       794961.03   631963045960.47        44.77%
2003  2285047.73  2213707.13        71340.60     5089480910.29         3.12%
2004  2533432.42  2113238.33       420194.09   176563073690.98        16.59%
2005  2657823.95  2197988.00       459835.95   211449097709.30        17.30%
2006  3077760.13  2492101.37       585658.77   342996192310.68        19.03%
2007  3246729.16  2756338.83       490390.33   240482676908.24        15.10%
2008  2045439.37  2994104.42       948665.04   899965361827.70        46.38%

MAD: 337495.89    MSE: 204341961189.70    MAPE: 25.28%


Table 4: Exponential Moving Average, n=3, alpha=0.5

Year     LCStock      3EMA.5  Absolute Error     Squared Error  Abs. % Error
1978    89597.47
1979   106119.24    89597.47
1980   140523.09    97858.35
1981   133623.41   119190.72        14432.69      208302472.58        10.80%
1982   162232.18   126407.07        35825.12     1283438940.56        22.08%
1983   198750.65   144319.62        54431.02     2962736201.05        27.39%
1984   211212.31   171535.14        39677.18     1574278358.49        18.79%
1985   279138.19   191373.72        87764.47     7702601885.25        31.44%
1986   330695.02   235255.96        95439.06     9108613854.10        28.86%
1987   347990.37   282975.49        65014.88     4226934433.03        18.68%
1988   406487.55   315482.93        91004.62     8281840836.41        22.39%
1989   534490.48   360985.24       173505.24    30104067776.38        32.46%
1990   517547.13   447737.86        69809.27     4873334340.09        13.49%
1991   675657.77   482642.49       193015.28    37254899475.16        28.57%
1992   727480.73   579150.13       148330.59    22001964771.08        20.39%
1993   800156.05   653315.43       146840.62    21562167965.08        18.35%
1994   810638.09   726735.74        83902.35     7039605132.02        10.35%
1995  1114059.93   768686.92       345373.02   119282520409.14        31.00%
1996  1371073.56   941373.43       429700.13   184642205957.35        31.34%
1997  1828463.70  1156223.49       672240.21   451906896336.44        36.77%
1998  2351038.63  1492343.60       858695.03   737357153315.08        36.52%
1999  2845697.15  1921691.11       924006.04   853787164899.99        32.47%
2000  2586454.14  2383694.13       202760.01    41111621713.91         7.84%
2001  2279183.39  2485074.14       205890.75    42390999723.26         9.03%
2002  1775483.86  2382128.76       606644.90   368018038091.87        34.17%
2003  2285047.73  2078806.31       206241.42    42535521976.81         9.03%
2004  2533432.42  2181927.02       351505.40   123556043793.01        13.87%
2005  2657823.95  2357679.72       300144.23    90086558779.20        11.29%
2006  3077760.13  2507751.83       570008.30   324909460857.22        18.52%
2007  3246729.16  2792755.98       453973.18   206091648861.04        13.98%
2008  2045439.37  3019742.57       974303.20   949266726355.81        47.63%

MAD: 311128.82    MSE: 173819531389.31    MAPE: 23.61%


Table 5: Weighted Moving Average, 3 periods

Year     LCStock        3WMA  Absolute Error     Squared Error  Abs. % Error
1978    89597.47
1979   106119.24
1980   140523.09
1981   133623.41      120568        13055.87      170455826.59         9.77%
1982   162232.18      131339        30892.91      954371666.51        19.04%
1983   198750.65      149078        49672.90     2467397310.21        24.99%
1984   211212.31      175723        35489.03     1259471001.03        16.80%
1985   279138.19      198895        80243.12     6438958847.56        28.75%
1986   330695.02      243098        87596.71     7673183321.03        26.49%
1987   347990.37      293596        54394.74     2958787899.04        15.63%
1988   406487.55      330750        75737.66     5736193038.66        18.63%
1989   534490.48      374356       160134.08    25642922636.86        29.96%
1990   517547.13      460739        56807.65     3227108677.42        10.98%
1991   675657.77      504685       170972.79    29231696566.83        25.30%
1992   727480.73      599426       128054.38    16397925184.80        17.60%
1993   800156.05      675217       124938.57    15609647468.82        15.61%
1994   810638.09      755181        55456.87     3075463885.92         6.84%
1995  1114059.93      793285       320775.42   102896866984.15        28.79%
1996  1371073.56      960602       410471.55   168486896330.42        29.94%
1997  1828463.70     1191996       636467.26   405090572707.41        34.81%
1998  2351038.63     1556933       794105.60   630603703969.31        33.78%
1999  2845697.15     2013519       832177.68   692519690655.54        29.24%
2000  2586454.14     2511272        75182.07     5652344215.05         2.91%
2001  2279183.39     2633633       354449.17   125634213850.63        15.55%
2002  1775483.86     2476026       700542.07    490759197131.8        39.46%
2003  2285047.73     2078545       206502.31    42643204645.54         9.04%
2004  2533432.42     2114216       419216.70   175742642136.79        16.55%
2005  2657823.95     2324313       333511.19   111229711943.21        12.55%
2006  3077760.13     2554231       523529.40   274083030393.65        17.01%
2007  3246729.16     2847060       399669.05   159735345716.28        12.31%
2008  2045439.37     3092255      1046815.91  1095823551868.69        51.18%

MAD: 302846.77    MSE: 170434983551.10    MAPE: 22.20%


Table 6: Moving Average Summary

Method        MAD              MSE    MAPE
Last    225742.56  115510479476.74  16.94%
SMA     337495.89  204341961189.70  25.28%
EMA     311128.82  173819531389.31  23.61%
WMA     302846.77  170434983551.10  22.20%

It is clear that the method of choosing the last value has the best

performance of the methods compared, since it has the lowest Mean Absolute

Deviation (MAD), the lowest Mean Squared Error (MSE), and the lowest Mean

Absolute Percentage Error (MAPE). Since the data follows a trend with occasional

retracement, it makes sense that the simplest method, where the last period is

used to predict the next period, would provide the best performance.

This is not to say that selecting the Last Period is a satisfactory approach

to prediction. The variable is generally in an upward trend. If the troughs could

be predicted, an investor could gain returns while avoiding or even

profiting from market losses.

Data Manipulation

Since the data are to be used to predict the next period's Large Company

Stock values from current data, every predictor aside from projected GDP must

be lagged. Therefore the Inflation, Interest Rate, and Money Supply figures will be lagged.

Also, Projected GNP ends where Projected GDP begins, with one year of

overlap. These series are spliced together with the average taken for the

year of overlap (since there is minimal difference in the figures, less than half

a percent). This series will be referred to as SPLICEGROSS in the data, and as

GDP in the text from this point on.

The lagged money supply data will consist of the combined M2 and

Institutional Money Funds, since M3 (the most expansive definition of the

money supply) was discontinued. In the data, it is referred to under the

MRM2IMFNS label. In the text, it will be referred to as simply the money

supply. The most recent (thus lagged) Fed Funds and Inflation data are also

used.
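
The splicing and lagging steps can be sketched as below. The values are taken from Table 1, but the helper names splice_with_average and lag_by_one_year are assumptions made for illustration and do not correspond to the R code in Appendix B.

```python
def splice_with_average(first, second):
    """Merge two {year: value} series, averaging any overlapping years."""
    merged = dict(first)
    for year, value in second.items():
        if year in merged:
            merged[year] = (merged[year] + value) / 2.0
        else:
            merged[year] = value
    return dict(sorted(merged.items()))

def lag_by_one_year(series):
    """Shift each value forward so year t carries the value observed in t-1."""
    return {year + 1: value for year, value in series.items()}

# Projected GNP and GDP around the 1992 overlap (Table 1).
gnp = {1991: 22918.9, 1992: 23731.8}
gdp = {1992: 23646.8, 1993: 25106.9}
splice = splice_with_average(gnp, gdp)

# Combined M2 and Institutional Money Funds for 2003 and 2004 (Table 1).
m2imfns = {2003: 7256.1, 2004: 7546.7}
lagged = lag_by_one_year(m2imfns)

print(splice)
print(lagged)
```

After lagging, the 2004 money supply figure is paired with the 2005 stock value, so that only information available at the end of one year is used to predict the next.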

Data Exploration

The data are highly correlated. Pairwise correlation is measured where

data is available for both terms. The term of primary interest is the

dependent variable, Large Cap Stock. Correlations with the dependent

variable are: Year, 0.926; GDP, 0.934; lagged Inflation, 0.912; lagged

FEDFUNDS, -0.64; lagged M2 and Institutional Money Funds, 0.878.

There are other pairwise correlations of note. FEDFUNDS is negatively

correlated with all other variables. Its strongest relationships are with the GDP

(-0.798, -0.781 lagged) and Inflation (-0.808) variables, both near -0.8.

The measure of money supply is highly correlated with Inflation (0.962)

as well as GDP (0.981). This relationship may indicate multicollinearity in the

data, and it makes the money supply variable an early potential candidate for

removal. The removal of the money supply (or any other variable, for that

matter) from the model does not mean that it is unimportant; it merely

means that it is not needed to predict the response variable.
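
Pairwise correlation over the years where both series are available, as used in this section, can be sketched as follows. The function name is an assumption, and only the first six years of two Table 1 series are used, so the result will not match the full-sample correlations quoted in the text.

```python
from math import sqrt

def pairwise_correlation(x, y):
    """Pearson correlation using only positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / sqrt(var_x * var_y)

# FEDFUNDS and M2IMFNS for 1978-1983 from Table 1; None marks the NA entries.
fedfunds = [None, 13.78, 18.90, 12.37, 8.95, 9.47]
m2imfns = [None, 1488.5, 1620.0, 1798.3, 1967.2, 2178.7]

r = pairwise_correlation(fedfunds, m2imfns)
print(f"Correlation over the five complete pairs: {r:.3f}")
```

Dropping incomplete pairs rather than whole years lets each correlation use as many observations as its two series allow.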


Multicollinearity is the problem of having one predictive variable being a

near linear transformation of another predictive variable. When the variables

are highly correlated, the model may still be reliable so long as the

relationship between the independent variables is stable. If the relationship

between variables changes, the model may cease to be reliable (Faraway,

2005).

Illustration 1: Plots of the Key Variables

Multicollinearity also means that it is difficult to explain the individual

importance of each variable. Small changes in the predicted variable can

create large changes in the beta coefficients.

The indication of multicollinearity is not only shown in the correlation

matrix. It may also show itself in the model. (Variance inflation factors are

another approach to examining collinearity, but are beyond the scope of this

paper.)

There are other potential problems as well, including heteroskedasticity,

non-constant variance of the errors.

Linear Prediction

The first model,

$\text{LCS} = \text{Year} + \text{Projected GDP} + \text{Inflation} + \text{Fed Funds Rate} + \text{Money Supply}$,

is fit. GDP and the money supply are shown to be significant at the 5% level.

Adjusted R-squared is 0.9014, and the p-value for the model is highly significant.

(Excluded observations are those before 1980 and after 2004; all

observations between and including 1980 and 2004 were included in the

regression.)


Using the original model yields a very high F statistic and R-squared.

However, only two of the independent variables are significant at the 5% level.

On visual inspection, the errors appear to fall within a narrow band to the

left and a much wider band to the right.

Variance appears to be non-constant, a condition called

heteroskedasticity. A Q-Q Plot of the errors also indicates non-normality, but

is similar to log-normal residuals (which is more evidence for a log

transformation of the dependent variable). The Shapiro-Wilk normality test

gives a p-value of 0.03778. Since the Shapiro-Wilk null hypothesis is that the

residuals are normal, this test provides formal evidence for the rejection of an

assumption of normality. (R documentation indicates a rejection threshold of

less than .1 is adequate, citing a remark in Applied Statistics by Patrick

Royston in 1995 (R Development Core Team, 2009).)

Illustration 2: Residual Variance is Non-Constant

A further problem is autocorrelation. Visual inspection of the trending

data is indicative of autocorrelation. The Durbin-Watson test is conclusive

(Zeileis & Hothorn, 2002). It reports a p-value of 0.0002858, rejecting the

hypothesis of non-correlated errors. An approach to deal with autocorrelation

is to add the lagged response variable to the predictor variables. This

approach is akin to using the Last method for prediction.
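
The statistic behind the Durbin-Watson test can be sketched directly. The residual series below are hypothetical and are not the residuals of the regression in this paper, and the sketch computes only the statistic, not the p-value reported by the R test.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals (illustrative only).
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # long runs of the same sign
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # sign flips every period

print(f"Persistent residuals:  {durbin_watson(persistent):.3f}")
print(f"Alternating residuals: {durbin_watson(alternating):.3f}")
```

A value near 2 is consistent with uncorrelated errors, while values well below 2 point to positive autocorrelation of the kind expected in trending annual data.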

The proper transformation of the data can be estimated with the

Box-Cox method (Venables & Ripley, 2002). The Box-Cox method transforms

the response variable by raising it to the power of lambda, subtracting one, and

dividing by lambda, except when lambda equals zero, in which case it takes the

natural log of the response variable. The 95% confidence interval for lambda falls

between approximately -0.28 and 0.05, confirming earlier suspicions of the

appropriateness of taking the natural log of the response.

Illustration 3: Box-Cox Operation Indicates Natural Log-Transformation
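
The Box-Cox family of transformations can be sketched as a single function. This shows only the transformation itself, under the standard definition, and not the likelihood search over lambda that produces the confidence interval; the function name is an assumption.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if lam == 0:
        return math.log(y)           # limiting case as lambda approaches zero
    return (y ** lam - 1.0) / lam    # power transform, shifted and scaled

value = 140523.09  # LCStock for 1980 from Table 1

for lam in (1.0, 0.5, 0.05, 0.0, -0.28):
    print(f"lambda = {lam:5.2f}: {box_cox(value, lam):.4f}")
```

Because the power form approaches the natural log as lambda approaches zero, a confidence interval for lambda that contains zero supports a log transformation of the response.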

Based on the suggestions of the analysis thus far, the model will be

changed and transformed before dropping insignificant variables in an

attempt to improve the model. The new model is

$\ln(\text{LCS}) = \text{Year} + \text{Proj. GDP} + \text{Inflation} + \text{Fed Funds Rate} + \text{Money Supply} + \ln(\text{Lagged LCS})$.

The transformation of the dependent variable appears to be a successful

improvement on the model. The R-squared has increased, and the p-value is

still significant. The money supply is still significant, but the untransformed

projected GDP is no longer significant.

The Shapiro-Wilk normality test now indicates normally distributed errors.

The Durbin-Watson test still suggests autocorrelation, with a p-value of .0847.

The Box-Cox test now indicates a wide range of possible transformations,

including -2 and 1, within the 95% confidence interval. Perhaps the Box-Cox

result is a problem with the predictors not having the correct transformation.

The Money Supply, Inflation, and Gross Domestic Product are all functions

of growth over time. The next model will take their logs as well.

This iteration does not improve the significance of any variable except the lagged

predictor, which corrects for autocorrelation. Up to this point, inflation appears

to be an unimportant variable, so it is removed for the next regression.

Removing inflation improves this model. Removing variables usually

decreases R-squared, but in this case, there was no change. Adjusted

R-squared actually improved.
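
The reason adjusted R-squared can improve while R-squared is unchanged follows from its standard formula, sketched below. The sample size of 25 corresponds to the 1980 through 2004 observations described earlier; the R-squared of 0.98 and the predictor counts in the first comparison are hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjust R-squared for the number of predictors (intercept excluded)."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical: identical R-squared before and after dropping one predictor.
before = adjusted_r_squared(0.98, 25, 6)
after = adjusted_r_squared(0.98, 25, 5)
print(f"Six predictors:  {before:.4f}")
print(f"Five predictors: {after:.4f}")

# R-squared of 0.9842 with three predictors, as reported for the final model.
final = adjusted_r_squared(0.9842, 25, 3)
print(f"Final model:     {final:.4f}")
```

With the fit unchanged, each predictor removed returns a degree of freedom to the residuals, so the penalty shrinks and the adjusted figure rises.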

Since the lagged dependent variable is in the model as a predictor, it

does a better job of predicting the next year's performance than the Year variable.

Since the year is the least significant predictor now, it is the next variable to

remove from the model.

Removing the Year variable lowers R-squared an insignificant amount,

while still improving adjusted R-squared. In addition, the GDP variable

becomes significant again. The variable for the Federal Funds Rate is the lone

remaining insignificant term. Next, remove the Federal Funds Rate from the

model.

Removing the Fed Funds Rate creates very small decreases to R-squared

(at 0.9842) and adjusted R-squared (at 0.982). The model is now

$\ln(\text{LCS}) = \ln(\text{Proj. GDP}) + \ln(\text{Money Supply}) + \ln(\text{Lagged LCS})$, which is the best

iteration so far. Each term is significant at the 5% level, and both GDP and the

money supply were significant in the first iteration's model (before any transformations). (See

Appendix B, page 33 for the regression ANOVA with the object code "malt5".)

The least significant term is the money supply term. Even though it is

significant at the 5% level, given the high R-squared, the model may be over-

fit. The problem with over-fitting a model is that it is overly sensitive to newly

sampled data. Training the model on a subset of the data and testing its

ability to predict based on data outside the subset is one way of testing for

fit, and this method shall be demonstrated at the end of this paper.


Removing the money supply data reduces both R-squared and adjusted

R-squared slightly. It also reduces the significance of projected gross

economic production as a predictor. It may be that the money

supply adds meaning that is required for projected GDP to mean anything. An

important thing to note is that this regression includes the observations from

year 1979. A look at the data indicates nothing strange that should arise from

that year being included. (Also, there is no indication of multiplicative effects;

see Appendix B.)

Checking The Model

Since the optimal model has been found, repeating the previous diagnostics

on the model, $\ln(\text{LCS}) = \ln(\text{Projected GDP}) + \ln(\text{Money Supply}) + \ln(\text{Lagged LCS})$, will

test the strength of the model. The Shapiro-Wilk test gives a p-value of .2885,

which is little evidence to reject the assumption of normality in the errors.

The Durbin-Watson test gives a p-value of .1872, which gives no grounds for

rejecting the assumption of independent errors (other forms of data

exploration confirm this conclusion). And the Box-Cox transformation test

indicates that the data are correctly transformed with a maximal value for

lambda of close to one, although the 95% confidence interval ranges from

approximately -0.75 to 2.66. Thus we can assume the model is linear in the

parameters.


The one remaining problem with the data is multicollinearity. Projected

GDP has a correlation of 0.98 with the lagged money supply. Although this

correlation suggests removing one of the predictors from the model, removal

at this point is not workable. Upon removal of projected GDP, the money

supply variable's p-value increases to 0.3530. Removal of the money supply

variable causes projected GDP's p-value to go to 0.267.

CONCLUSIONS

The Model

This study indicates that together, the money supply and projected GDP

provide information that indicates the direction of the stock market. To

implement this model in making predictions about the stock market, first

predict the next year's GDP to the same level of accuracy as the Fed (no

small feat). Then, using the current year's end of year money supply and

stock market values, combined with the GDP projection, predict the stock

market's valuations with the following formula.

$\text{LCS} = e^{-4.9809 + 1.9336 \ln(\text{ProjGDP}) - 1.0676 \ln(\text{MoneySupply}) + 0.5758 \ln(\text{LaggedLCS})}$

Illustration 4: Box-Cox Operation Maximum Likelihood: Linear Model

The transformation is justified by both the Box-Cox procedure and

the improvement in R-Squared of over 0.05.
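
Applying the fitted model to one year of inputs can be sketched as follows. The coefficient values and signs are those listed in the Beta row of Table 7, and the inputs are the 2005 row of that table; the function and constant names are assumptions made for illustration.

```python
import math

# Coefficients as listed in the Beta row of Table 7.
INTERCEPT = -4.9809
B_GDP = 1.9336
B_MONEY = -1.0676
B_LAGGED = 0.5758

def predict_lcs(proj_gdp, money_supply, lagged_lcs):
    """Predict Large Company Stock value from the log-log regression."""
    log_prediction = (INTERCEPT
                      + B_GDP * math.log(proj_gdp)
                      + B_MONEY * math.log(money_supply)
                      + B_LAGGED * math.log(lagged_lcs))
    return math.exp(log_prediction)

# Inputs for 2005 from Table 7: GDP, lagged money supply, lagged stock value.
prediction_2005 = predict_lcs(50553.5, 7547.0, 2533432.42)
print(f"Predicted 2005 value: {prediction_2005:.2f}")
```

Because the coefficients are rounded to four decimal places, the result agrees with the Predicted column of Table 7 only to within a small rounding difference.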

Confidence Intervals

The minimum and maximum residuals are -0.22263 and 0.23279

respectively. To put these in perspective, an exponent of e that is

greater than expected by 0.25 yields a result 28.4% greater

than expected. Similarly, an exponent of e that is less than expected by

0.25 yields a result 22.1% less than expected.

The residual standard error is 0.1368 with 21 degrees of freedom. Based

on the two tailed t-distribution at the 95% confidence level, the critical range is plus

or minus 2.08 standard errors. Thus the 95% confidence interval for the

regression is from 24.8% less than expected to 33.0% greater than expected.

This calculation indicates that this regression is no gold mine, and that even

with some expectation of a future value, there can be very large variance.
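The interval arithmetic can be reproduced with base R's qt(). This is a sketch that, like the text, uses only the residual standard error and ignores uncertainty in the coefficient estimates; the variable names are illustrative.

```r
# Approximate 95% interval on the original scale, from the residual standard
# error alone (coefficient uncertainty is ignored, as in the text).
rse <- 0.1368   # residual standard error reported for the final model
df  <- 21       # residual degrees of freedom

t_crit     <- qt(0.975, df)   # two-tailed critical value, alpha = 0.05
half_width <- t_crit * rse    # half-width on the log scale

lower_pct <- (exp(-half_width) - 1) * 100
upper_pct <- (exp(half_width) - 1) * 100

round(c(t_crit = t_crit, lower_pct = lower_pct, upper_pct = upper_pct), 2)
```

Small differences from the figures in the text can arise because the standard error and critical value are rounded here.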

Error Comparison

Based on the sample the regression was calculated from, and assuming accurate projection of GDP, the next four years would have produced the results in Table 7. This prediction uses actual cumulative quarterly GDP instead of projected GDP (which, as noted earlier, is only released by the Fed after a five-year lag). The Mean Absolute Percent Error (MAPE) is the best measure of error here: this out-of-sample prediction comes after significant growth in the stock market and the other variables, so the absolute errors and squared errors should be much larger simply because of scale.

Looking at the individual percentage errors, the model over-predicts in each of the first three years, by between 7.75% and 13.46%. The MAPE of 26.73% is skewed high by the 2008 observation, which alone has an error of 72.82%.

For the entire sample plus the next four years, the MAPE is 13.12%.

Relative to even the best of the moving average prediction methods, by this

measure, the regression is far superior.
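The three error measures can be recomputed from the actual and predicted values in Table 7. The sketch below uses base R only; the vector names are illustrative.

```r
# Out-of-sample error measures for 2005-2008, using the values in Table 7.
actual    <- c(2657824.0, 3077760.1, 3246729.2, 2045439.4)
predicted <- c(3015493.21, 3316359.46, 3665965.51, 3534858.84)

err <- predicted - actual

mae  <- mean(abs(err))                 # Mean Absolute Error
mse  <- mean(err^2)                    # Mean Squared Error
mape <- mean(abs(err) / actual) * 100  # Mean Absolute Percent Error

# Per-year percentage errors show how much 2008 dominates the average
pct_err <- round(abs(err) / actual * 100, 2)
```

Only MAPE is scale-free, which is why it is the measure that can be compared across periods with very different market levels.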

Table 7: Four Year Forecast and Error

Year    LCSTOCK      GDP       M2IMF    AUTOCOR      Predicted    Abs Error      Squared Error    Abs % Error
2005    2657824.0    50553.5   7547     2533432.42   3015493.21   357669.25767   127927297880     13.46%
2006    3077760.1    53595.7   7875     2657823.95   3316359.46   238599.33319   56929641800.5    7.75%
2007    3246729.2    56290     8481     3077760.13   3665965.51   419236.34554   175759113421     12.91%
2008    2045439.4    57765.7   9465     3246729.16   3534858.84   1489419.4698   2218370357114    72.82%

Beta:   -4.9809      1.9336    -1.068   0.5758
Sums:                                                             2504924.4062   2578986410215
Means:                                                            626231.10156   644746602554     26.73%
                                                                  (MAE)          (MSE)            (MAPE)




It may be considered unfair to compare MAPE over the whole set of years when the regression was fitted to those same years and designed to minimize squared errors. However, there is little else to compare. Indeed, this regression may be the best available approximation for predicting the next year's stock market levels. Optimization of prediction notwithstanding, the variance may be far too high to create profitable trading rules based on the data.

Table 8: Final Model's Error

Year    LCSTOCK      GDP       MRM2IMFNS  AUTOCOR      Predicted    Abs Error   Squared Error   Abs % Error
1980    140523.09    10219.1   1488.5     106119.24    124752.29    15770.80    248718251.35    11.22%
1981    133623.41    11342.9   1620       140523.09    163920.03    30296.62    917885351.98    22.67%
1982    162232.18    12479.4   1798.3     133623.41    171322.78    9090.599    82638994.598    5.60%
1983    198750.65    12979.4   1967.2     162232.18    187800.16    10950.49    119913272.36    5.51%
1984    211212.31    14604.2   2178.7     198750.65    237773.43    26561.12    705493083.59    12.58%
1985    279138.19    15628.9   2386.8     211212.31    254694.48    24443.71    597495000.97    8.76%
1986    330695.02    16478.2   2574.8     279138.19    305514.84    25180.18    634041355.26    7.61%
1987    347990.37    17707.1   2832.6     330695.02    349602.05    1611.683    2597523.5158    0.46%
1988    406487.55    18951.2   2937.7     347990.37    394866.81    11620.74    135041639.52    2.86%
1989    534490.48    20890.4   3101.2     406487.55    492043.93    42446.55    1801709516      7.94%
1990    517547.13    22115.7   3283.9     534490.48    605042.42    87495.29    7655425684      16.91%
1991    675657.77    22918.9   3431.7     517547.13    607121.90    68535.87    4697165498      10.14%
1992    727480.73    23689.3   3582.3     675657.77    720759.17    6721.556    45179310.683    0.92%
1993    800156.05    25106.9   3662.7     727480.73    821835.76    21679.71    470009941.49    2.71%
1994    810638.09    26916.3   3722.5     800156.05    976169.36    165531.3    27400600909     20.42%
1995    1114059.93   28464.6   3733.8     810638.09    1092297.79   21762.14    473590650.94    1.95%
1996    1371073.56   29644.5   3934.2     1114059.93   1341884.08   29189.48    852025708.45    2.13%
1997    1828463.7    31702.1   4172.1     1371073.56   1617168.71   211295.0    44645574834     11.56%
1998    2351038.63   33760.6   4463.3     1828463.7    2005824.72   345213.9    119172641764    14.68%
1999    2845697.15   35286.5   4963.8     2351038.63   2254232.08   591465.1    349830931632    20.78%
2000    2586454.14   39068.3   5336.1     2845697.15   2836037.94   249583.8    62292075342     9.65%
2001    2279183.39   41928.5   5782.1     2586454.14   2824486.32   545302.9    297355288996    23.93%
2002    1775483.86   41733.8   6717       2279183.39   2217762.22   442278.4    195610143382    24.91%
2003    2285047.73   43518.7   7117       1775483.86   1957991.01   327056.7    106966098805    14.31%
2004    2533432.42   46639.7   7256.1     2285047.73   2535676.41   2243.989    5035487.8639    0.09%
2005    2657823.95   50553.5   7546.7     2533432.42   3015493.21   357669.3    127927297880    13.46%
2006    3077760.13   53595.7   7875.2     2657823.95   3316359.46   238599.3    56929641800     7.75%
2007    3246729.16   56290     8480.7     3077760.13   3665965.51   419236.3    175759113421    12.91%
2008    2045439.37   57765.7   9465       3246729.16   3534858.84   1489419     2.21837E+012    72.82%

Beta:   -4.9809      1.9336    -1.0676    0.5758
Sums:                                                               5818252     3.80170E+012
Means:                                                              207794.7    135775133291    13.12%
                                                                    (MAE)       (MSE)           (MAPE)




Summary

Because this model was arrived at through a series of iterative steps that eliminated one variable at a time, it may be argued that the findings are spurious and the result of random chance. It is unlikely, however, that the model is the result of pure chance.

In general, this model states that the stock market goes up when the

economy is expected to grow, and when the money supply is decreasing. The

effect for the economy is about twice as much as the effect for the money

supply.
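Because the final model is log-log, its coefficients can be read as elasticities. The sketch below, which assumes the rounded coefficients reported in this paper and uses illustrative names, shows the approximate effect of a 1% change in each predictor with the others held fixed.

```r
# Elasticity reading of the log-log coefficients (rounded values from the paper).
beta_gdp <- 1.9336    # coefficient on log projected GDP
beta_m2  <- -1.0676   # coefficient on log lagged money supply

# Percentage change in the prediction for a given percentage change in one predictor
pct_effect <- function(beta, pct) ((1 + pct / 100)^beta - 1) * 100

gdp_effect <- pct_effect(beta_gdp, 1)   # effect of 1% higher projected GDP
m2_effect  <- pct_effect(beta_m2, 1)    # effect of 1% higher money supply
ratio      <- abs(beta_gdp / beta_m2)   # relative size of the two effects

round(c(gdp_effect = gdp_effect, m2_effect = m2_effect, ratio = ratio), 2)
```

The ratio of about 1.8 is what the text summarizes as the economy's effect being about twice that of the money supply.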

Expectations of economic growth fuel speculation in stocks. When people

expect the economy to grow more, stock prices increase. When there is less

of an expectation for economic growth, stock prices do not increase as much.

The Fed acts to contract the money supply when the economy is growing

too fast. The stock market is known to be a leading indicator of economic

growth. It would make sense that the Fed would be tightening the money

supply as the stock market is increasing.

Sometimes time series data runs the risk of reaching a change point

where the effects being used for prediction cease to work (Chatfield, 2000). It

is unlikely that the effects found here will cease to predict, however. These

effects are the result of actions of or predictions by a United States

government chartered organization that has powerful control over

fundamental aspects of the economy.

The high correlation between the two factors is an element of concern. It would make sense that if the Fed expects the economy to grow faster than average the next year, it would act today to reduce the money supply. This reasoning would explain the high level of correlation. The interaction is troubling, but each variable needs the other to reach significance in the model. Without the two variables, the model is left with nothing but an autocorrelation correction variable based on the previous year's market, and a residual standard error about a third higher.

Low standard errors with many variables relative to the number of

observations may indicate a model that is over-fit, but the two variables (plus

the autocorrelation variable) do not seem to be too much relative to the size

of the data available. In retrospect, this model also has the lowest standard

errors, and since all of these models have a very high R-squared, optimizing

for standard errors while keeping the number of predictors small would seem

to be the best remaining approach.




REFERENCES

Chatfield, C. (2000). Time-Series Forecasting. Boca Raton: Chapman &

Hall/CRC.

Colby, R. W. (2003). The Encyclopedia of Technical Market Indicators. New

York: McGraw-Hill.

Faraway, J. J. (2005). Linear Models with R. Boca Raton: Chapman & Hall/CRC.

Harrington, J. P. (Ed.). (2008). Ibbotson SBBI 2009 Classic Yearbook: Market

Results for Stocks, Bonds, Bills, and Inflation 1926-2008. Chicago:

Morningstar.

R Development Core Team. (2009). R: A Language and Environment for

Statistical Computing. Vienna, Austria: R Foundation for Statistical

Computing. Retrieved from http://www.R-project.org

St. Louis Fed. (2010). St. Louis Fed: Download Data for Series: M2NS, M2

Money Stock. St. Louis Fed. Retrieved April 6, 2010, from

http://research.stlouisfed.org/fred2/series/M2NS/downloaddata?cid=48

Venables, W. N., & Ripley, B. D. (2002). Modern Applied Statistics with S (4th

ed.). New York: Springer. Retrieved from

http://www.stats.ox.ac.uk/pub/MASS4

Zeileis, A., & Hothorn, T. (2002). Diagnostic Checking in Regression

Relationships. R News, 2(3), 7-10.




APPENDIX A: ACKNOWLEDGEMENTS

GNU/Linux/Ubuntu

The GNU Community has developed or enabled the functioning of all of

the tools used to create this document. All of these tools are open-source

software packages that are free to use and free to modify. The Linux kernel

powered the computing. Ubuntu is a popular distribution of Linux, and the

source for software repositories that provided the operating system,

supporting software, and core tools (except Zotero).

R

This study was done in R, a powerful command-line statistical programming package (R Development Core Team, 2009). The advantages of a command-line interface are that one may maintain an exactly reproducible copy of one's work (e.g., see Appendix B) while having complete access to many powerful functions. The disadvantage is that the learning curve is steeper than that of graphical user interfaces.

OpenOffice.org

This paper was written in OpenOffice.org, an open-source version of

Sun's StarOffice. Writer was used for word processing and document

assembly. Calc was used for data manipulation, spreadsheet functions, and

table creation.

Other Tools

SciTE with R syntax highlighting was also used to manipulate the code.

Zotero Firefox and Writer plug-ins were used to manage citations.




APPENDIX B: R CODE

This is the console input/output. It requires the files to be in the location

provided, and the lmtest and MASS libraries. The command prompt is the

">" symbol, and the "#" symbol indicates a non-executing comment.

R version 2.9.2 (2009-08-24)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

REvolution R enhancements not installed. For improved
performance and other extensions: apt-get install revolution-r

> comb <- [...]
> cor(comb, use= "pairwise.complete.obs")
                  Year    LCSTOCK        GNP        GDP SPLICEGROSS  Inflation
Year         1.0000000  0.9261381  0.9971895  0.9939544   0.9887894  0.9975192
LCSTOCK      0.9261381  1.0000000  0.9715892  0.8080071   0.9340474  0.9154274
GNP          0.9971895  0.9715892  1.0000000         NA   0.9999981  0.9855178
GDP          0.9939544  0.8080071         NA  1.0000000   0.9999990  0.9943664
SPLICEGROSS  0.9887894  0.9340474  0.9999981  0.9999990   1.0000000  0.9782972
Inflation    0.9975192  0.9154274  0.9855178  0.9943664   0.9782972  1.0000000
FEDFUNDS    -0.8233347 -0.6218049 -0.8141758 -0.4857107  -0.7987852 -0.8081528
IMFNS        0.8859370  0.8180842  0.9549985  0.9533118   0.9287711  0.8753418
M2NS         0.9683431  0.8936886  0.9902620  0.9829273   0.9898840  0.9679235
M2IMFNS      0.9542132  0.8822151  0.9931752  0.9830307   0.9871683  0.9527529
AUTOCOR      0.9261381  0.9508532  0.9668297  0.8545207   0.9352194  0.9307222
AUTOCOR2     0.9261381  0.9060283  0.9720780  0.8874491   0.9320751  0.9238932
AUTOCOR3     0.9369234  0.8664926  0.9575144  0.9434643   0.9391325  0.9144456
MRINFL       0.9975192  0.9119503  0.9808353  0.9940487   0.9774272  0.9986158
MRFUNDS     -0.8080723 -0.6402895 -0.7588302 -0.3646230  -0.7814267 -0.7788310
MRIMFNS      0.8782463  0.8099785  0.9478959  0.9329031   0.9090652  0.8900329
MRM2NS       0.9696137  0.8894880  0.9940004  0.9675501   0.9867425  0.9741104
MRM2IMFNS    0.9547696  0.8778663  0.9947253  0.9609944   0.9814615  0.9621024
              FEDFUNDS      IMFNS       M2NS    M2IMFNS     AUTOCOR   AUTOCOR2
Year        -0.8233347  0.8859370  0.9683431  0.9542132   0.9261381  0.9261381
LCSTOCK     -0.6218049  0.8180842  0.8936886  0.8822151   0.9508532  0.9060283
GNP         -0.8141758  0.9549985  0.9902620  0.9931752   0.9668297  0.9720780
GDP         -0.4857107  0.9533118  0.9829273  0.9830307   0.8545207  0.8874491
SPLICEGROSS -0.7987852  0.9287711  0.9898840  0.9871683   0.9352194  0.9320751
Inflation   -0.8081528  0.8753418  0.9679235  0.9527529   0.9307222  0.9238932
FEDFUNDS     1.0000000 -0.6737792 -0.7703007 -0.7510230  -0.6632477 -0.7031356
IMFNS       -0.6737792  1.0000000  0.9612459  0.9785086   0.8846286  0.9476079
M2NS        -0.7703007  0.9612459  1.0000000  0.9974368   0.9093549  0.9487988
M2IMFNS     -0.7510230  0.9785086  0.9974368  1.0000000   0.9097530  0.9551136
AUTOCOR     -0.6632477  0.8846286  0.9093549  0.9097530   1.0000000  0.9508532
AUTOCOR2    -0.7031356  0.9476079  0.9487988  0.9551136   0.9508532  1.0000000
AUTOCOR3    -0.7960527  0.9413438  0.9476510  0.9520965   0.9060283  0.9508532
MRINFL      -0.8420432  0.8805821  0.9640316  0.9495986   0.9154274  0.9307222
MRFUNDS      0.8351315 -0.5872710 -0.7318301 -0.6988825  -0.6218049 -0.6558817
MRIMFNS     -0.6824676  0.9750173  0.9567361  0.9682330   0.8180842  0.9239063
MRM2NS      -0.7584503  0.9539833  0.9985131  0.9937647   0.8936886  0.9404726
MRM2IMFNS   -0.7456812  0.9678341  0.9966462  0.9960230   0.8822151  0.9445584
              AUTOCOR3     MRINFL    MRFUNDS    MRIMFNS      MRM2NS  MRM2IMFNS
Year         0.9369234  0.9975192 -0.8080723  0.8782463   0.9696137  0.9547696
LCSTOCK      0.8664926  0.9119503 -0.6402895  0.8099785   0.8894880  0.8778663
GNP          0.9575144  0.9808353 -0.7588302  0.9478959   0.9940004  0.9947253
GDP          0.9434643  0.9940487 -0.3646230  0.9329031   0.9675501  0.9609944
SPLICEGROSS  0.9391325  0.9774272 -0.7814267  0.9090652   0.9867425  0.9814615
Inflation    0.9144456  0.9986158 -0.7788310  0.8900329   0.9741104  0.9621024
FEDFUNDS    -0.7960527 -0.8420432  0.8351315 -0.6824676  -0.7584503 -0.7456812
IMFNS        0.9413438  0.8805821 -0.5872710  0.9750173   0.9539833  0.9678341
M2NS         0.9476510  0.9640316 -0.7318301  0.9567361   0.9985131  0.9966462
M2IMFNS      0.9520965  0.9495986 -0.6988825  0.9682330   0.9937647  0.9960230
AUTOCOR      0.9060283  0.9154274 -0.6218049  0.8180842   0.8936886  0.8822151
AUTOCOR2     0.9508532  0.9307222 -0.6558817  0.9239063   0.9404726  0.9445584
AUTOCOR3     1.0000000  0.9238932 -0.6733649  0.9417894   0.9414619  0.9493800
MRINFL       0.9238932  1.0000000 -0.8081528  0.8753418   0.9679235  0.9527529
MRFUNDS     -0.6733649 -0.8081528  1.0000000 -0.6418000  -0.7506362 -0.7293701
MRIMFNS      0.9417894  0.8753418 -0.6418000  1.0000000   0.9538724  0.9741593
MRM2NS       0.9414619  0.9679235 -0.7506362  0.9538724   1.0000000  0.9970302
MRM2IMFNS    0.9493800  0.9527529 -0.7293701  0.9741593   0.9970302  1.0000000
>
> m <- lm(LCSTOCK ~ Year + SPLICEGROSS + MRINFL + MRFUNDS + MRM2IMFNS, data = comb, na.action = na.exclude)
> summary(m)

Call:
lm(formula = LCSTOCK ~ Year + SPLICEGROSS + MRINFL + MRFUNDS +
    MRM2IMFNS, data = comb, na.action = na.exclude)

Residuals:
    Min      1Q  Median      3Q     Max
-364914 -203016  -41612  106983  820959

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.897e+08  3.763e+08  -1.036   0.3134
Year         1.982e+05  1.913e+05   1.036   0.3134
SPLICEGROSS  1.757e+02  8.159e+01   2.153   0.0444 *
MRINFL      -8.347e+02  5.898e+02  -1.415   0.1732
MRFUNDS      1.792e+04  3.871e+04   0.463   0.6486
MRM2IMFNS   -6.165e+02  2.555e+02  -2.413   0.0261 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 290400 on 19 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared: 0.9219, Adjusted R-squared: 0.9014
F-statistic: 44.87 on 5 and 19 DF, p-value: 7.147e-10

> plot(fitted(m), residuals(m), xlab="Fitted",ylab="Residuals")
> qqnorm(resid(m))
> shapiro.test(residuals(m))

        Shapiro-Wilk normality test

data:  residuals(m)
W = 0.9142, p-value = 0.03778

> library(lmtest)
Loading required package: zoo

Attaching package: 'zoo'

        The following object(s) are masked from package:base :

         as.Date.numeric

> dwtest(m)

        Durbin-Watson test

data:  m
DW = 1.0697, p-value = 0.0002858
alternative hypothesis: true autocorrelation is greater than 0

>
> library(MASS)
> boxcox(m,plotit=T)

> boxcox(m,plotit=T,lambda=seq(-0.5,0.5,by=0.1))
>
>
> malt <- lm(log(LCSTOCK) ~ Year + SPLICEGROSS + MRINFL + MRFUNDS + MRM2IMFNS + log(AUTOCOR), data = comb, na.action = na.exclude)
> summary(malt)

Call:
lm(formula = log(LCSTOCK) ~ Year + SPLICEGROSS + MRINFL + MRFUNDS +
    MRM2IMFNS + log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
      Min        1Q    Median        3Q       Max
-0.224728 -0.071272 -0.002158  0.078216  0.189551

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)  -4.631e+02  1.879e+02  -2.464   0.0240 *
Year          2.387e-01  9.603e-02   2.485   0.0230 *
SPLICEGROSS  -2.748e-06  3.466e-05  -0.079   0.9377
MRINFL       -3.679e-04  2.572e-04  -1.430   0.1697
MRFUNDS      -9.905e-04  1.666e-02  -0.059   0.9533
MRM2IMFNS    -2.957e-04  1.246e-04  -2.373   0.0290 *
log(AUTOCOR)  3.814e-01  1.829e-01   2.085   0.0516 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.123 on 18 degrees of freedom
  (8 observations deleted due to missingness)
Multiple R-squared: 0.9891, Adjusted R-squared: 0.9854
F-statistic: 271.3 on 6 and 18 DF, p-value: < 2.2e-16

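> shapiro.test(residuals(malt))

        Shapiro-Wilk normality test

data:  residuals(malt)
W = 0.9747, p-value = 0.7653

> dwtest(malt)

        Durbin-Watson test

data:  malt
DW = 1.8845, p-value = 0.0847
alternative hypothesis: true autocorrelation is greater than 0

> boxcox(malt,plotit=T)
>
> malt2 <- lm(log(LCSTOCK) ~ Year + log(SPLICEGROSS) + log(MRINFL) + MRFUNDS + log(MRM2IMFNS) + log(AUTOCOR), data = comb, na.action = na.exclude)
> summary(malt2)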

Call:
lm(formula = log(LCSTOCK) ~ Year + log(SPLICEGROSS) + log(MRINFL) +
    MRFUNDS + log(MRM2IMFNS) + log(AUTOCOR), data = comb, na.action =
    na.exclude)

Residuals:
      Min        1Q    Median        3Q       Max
-0.242438 -0.088883 -0.004927  0.094161  0.211644

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      -72.479133  89.941534  -0.806  0.43085
Year               0.037791   0.048280   0.783  0.44395
log(SPLICEGROSS)   1.322121   1.589509   0.832  0.41643
log(MRINFL)       -0.004836   1.307447  -0.004  0.99709
MRFUNDS           -0.019492   0.018379  -1.061  0.30293
log(MRM2IMFNS)    -1.318564   0.621873  -2.120  0.04813 *
log(AUTOCOR)       0.620016   0.196808   3.150  0.00553 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.9859, Adjusted R-squared: 0.9812
F-statistic: 209.8 on 6 and 18 DF, p-value: 1.178e-15




>
> malt3 <- lm(log(LCSTOCK) ~ Year + log(SPLICEGROSS) + MRFUNDS + log(MRM2IMFNS) + log(AUTOCOR), data = comb, na.action = na.exclude)
> summary(malt3)

Call:
lm(formula = log(LCSTOCK) ~ Year + log(SPLICEGROSS) + MRFUNDS +
    log(MRM2IMFNS) + log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
      Min        1Q    Median        3Q       Max
-0.242398 -0.088944 -0.005036  0.094195  0.211602

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      -72.58974   82.56247  -0.879  0.39027
Year               0.03784    0.04502   0.841  0.41102
log(SPLICEGROSS)   1.31767    1.00937   1.305  0.20733
MRFUNDS           -0.01945    0.01461  -1.331  0.19881
log(MRM2IMFNS)    -1.31744    0.52746  -2.498  0.02185 *
log(AUTOCOR)       0.62009    0.19068   3.252  0.00419 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + MRFUNDS + log(MRM2IMFNS) +
    log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max
-0.24297 -0.08841  0.01259  0.11157  0.22009

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)      -3.22647    2.76583  -1.167   0.2571
log(SPLICEGROSS)  1.83905    0.79046   2.327   0.0306 *
MRFUNDS          -0.01805    0.01441  -1.253   0.2246
log(MRM2IMFNS)   -1.24958    0.51741  -2.415   0.0254 *
log(AUTOCOR)      0.63590    0.18835   3.376   0.0030 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Multiple R-squared: 0.9854, Adjusted R-squared: 0.9825
F-statistic: 337 on 4 and 20 DF, p-value: < 2.2e-16




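>
> #Final model, malt5
> malt5 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) + log(AUTOCOR), data = comb, na.action = na.exclude)
> summary(malt5)

Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
    log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max
-0.22263 -0.09233  0.01955  0.09150  0.23279

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -4.9809     2.4174  -2.060  0.05196 .
log(SPLICEGROSS)   1.9336     0.7975   2.425  0.02443 *
log(MRM2IMFNS)    -1.0676     0.5033  -2.121  0.04597 *
log(AUTOCOR)       0.5758     0.1846   3.119  0.00519 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(AUTOCOR),
    data = comb, na.action = na.exclude)

Residuals:
       Min         1Q     Median         3Q        Max
-3.358e-01 -1.018e-01  4.623e-05  1.066e-01  2.075e-01

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -1.2297     1.6680  -0.737    0.468
log(SPLICEGROSS)   0.4299     0.3777   1.138    0.267
log(AUTOCOR)       0.7774     0.1647   4.721 9.34e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

>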




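> malt7 <- [...]
> summary(malt5)

Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
    log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max
-0.22263 -0.09233  0.01955  0.09150  0.23279

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -4.9809     2.4174  -2.060  0.05196 .
log(SPLICEGROSS)   1.9336     0.7975   2.425  0.02443 *
log(MRM2IMFNS)    -1.0676     0.5033  -2.121  0.04597 *
log(AUTOCOR)       0.5758     0.1846   3.119  0.00519 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Call:
lm(formula = log(LCSTOCK) ~ log(AUTOCOR), data = comb, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max
-0.48053 -0.09394  0.01333  0.12556  0.22081

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.86816    0.35789   2.426   0.0220 *
log(AUTOCOR)  0.94333    0.02646  35.657   <2e-16 ***

> comb2 <- [...]
> cor(comb2, use= "pairwise.complete.obs")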

                 Year   LCSTOCK SPLICEGROSS MRM2IMFNS   AUTOCOR  AUTOCOR2
Year        1.0000000 0.9261381   0.9887894 0.9547696 0.9261381 0.9261381
LCSTOCK     0.9261381 1.0000000   0.9340474 0.8778663 0.9508532 0.9060283
SPLICEGROSS 0.9887894 0.9340474   1.0000000 0.9814615 0.9352194 0.9320751
MRM2IMFNS   0.9547696 0.8778663   0.9814615 1.0000000 0.8822151 0.9445584
AUTOCOR     0.9261381 0.9508532   0.9352194 0.8822151 1.0000000 0.9508532
AUTOCOR2    0.9261381 0.9060283   0.9320751 0.9445584 0.9508532 1.0000000
AUTOCOR3    0.9261381 0.8664926   0.9391325 0.9493800 0.9060283 0.9508532
             AUTOCOR3
Year        0.9261381
LCSTOCK     0.8664926
SPLICEGROSS 0.9391325
MRM2IMFNS   0.9493800
AUTOCOR     0.9060283
AUTOCOR2    0.9508532
AUTOCOR3    1.0000000
> m2 <- lm(log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) + log(AUTOCOR), data = comb2, na.action = na.exclude)
> summary(m2)

Call:
lm(formula = log(LCSTOCK) ~ log(SPLICEGROSS) + log(MRM2IMFNS) +
    log(AUTOCOR), data = comb2, na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max
-0.22263 -0.09233  0.01955  0.09150  0.23279

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       -4.9809     2.4174  -2.060  0.05196 .
log(SPLICEGROSS)   1.9336     0.7975   2.425  0.02443 *
log(MRM2IMFNS)    -1.0676     0.5033  -2.121  0.04597 *
log(AUTOCOR)       0.5758     0.1846   3.119  0.00519 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



>
> plot(fitted(m2), residuals(m2), xlab="Fitted",ylab="Residuals")
> qqnorm(resid(m2))
> shapiro.test(residuals(m2))

        Shapiro-Wilk normality test

data:  residuals(m2)
W = 0.9527, p-value = 0.2885

> library(lmtest)
> dwtest(m2)

        Durbin-Watson test

data:  m2
DW = 1.8683, p-value = 0.1872
alternative hypothesis: true autocorrelation is greater than 0

> library(MASS)
> boxcox(m2,plotit=T)
> boxcox(m2,plotit=T,lambda=seq(-1,3,by=0.1))
> m3 <- [...]
> summary(m3)


    log(SPLICEGROSS) * log(MRM2IMFNS) + log(AUTOCOR), data = comb2,
    na.action = na.exclude)

Residuals:
     Min       1Q   Median       3Q      Max
-0.20975 -0.08643  0.01519  0.08435  0.22880

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                     -11.86766   11.49803  -1.032  0.31432
log(SPLICEGROSS)                  2.53960    1.27772   1.988  0.06072 .
log(MRM2IMFNS)                   -0.10262    1.65485  -0.062  0.95117
log(AUTOCOR)                      0.58928    0.18869   3.123  0.00536 **
log(SPLICEGROSS):log(MRM2IMFNS)  -0.08825    0.14395  -0.613  0.54673
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1



> # This creates the variables plot
> comb3 <- [...]
> plot(comb3)
