case study: a linked linear regression - data envelopment...

40
Case Study: A Linked Linear Regression - Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance Cindy Chao, Greg Higgins Michael McGee Department of Computer Science and Engineering SOUTHERN METHODIST UNIVERSITY Dallas, Texas May 14, 1994

Upload: others

Post on 05-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

Case Study: A Linked Linear Regression -

Data Envelopment Analysis Model for the Long-Term Prediction of

Mutual Fund Performance

Cindy Chao, Greg Higgins Michael McGee

Department of Computer Science and Engineering SOUTHERN METHODIST UNIVERSITY

Dallas, Texas

May 14, 1994

Page 2: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR RECRESSION - DEA MODEL

Contents

Abstract.............................................................................................................................3

Introduction......................................................................................................................3

Whatis a Mutual Fund? .................................................................................................4

DEAFactors......................................................................................................................4

Model Development and Validation.............................................................................6

Examinationof Resiilts ...................................................................................................9

InvestmentPerformance............................................................................................... 10

Conclusion...................................................................................................................... 11

References....................................................................................................................... 12

ProfessionalReferences................................................................................................. 13

Appendix........................................................................................................................ 14

DEAFormulation .............................................................................................. 15

DEACode ........................................................................................................... 16

ForcastingCode ................................................................................................. 19

2

Page 3: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

Case Study: A Linked Linear Regression - Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance

Michael McGee Greg Higgins Cindy Chao Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275

Data Envelopment Analysis (DEA) can be used to evaluate managerial efficiency of mutual funds. By extending the DEA methodology with linear regression, a new prediction tool for future efficiency is developed. The linked model was tested on a sample of fifty growth mutual funds over a three year time horizon - resulting in accurate predictions of future efficiency with 99% confidence.

The current mutual fund market contains nearly $1.4 trillion

dollars; this rapid growth of mutual

funds as a long—term investment instrument has led to an increased

current assessment of a fund's management. DEA is a linear programming methodology that

computes relative efficiency scores among a group of homogeneous

demand for evaluation and forecasting decision—making units (DMUs). techniques. The performance of a Combined with linear regression, DEA mutual fund is directly linked to the performance of the management staff. Data Envelopment Analysis (DEA) can

be used to evaluate managerial

efficiency; however, this is only, a

becomes a prediction tool for long-term forecasting. The linked model not only evaluates return on investment

but also the quality of the fund's managerial team on a long—term basis.

3

Page 4: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

What is a Mutual Fund?

A mutual fund is an investment company that pools the money of many institutional and individual investors into one professionally

managed investment account. The

fund manager creates a portfolio for the shareholders by buying a variety of "true" financial instruments, such as corporate stocks and bonds, and

government and municipal securities, in accordance with the fund's particular investment objectives. Fund managers are professional investors who know the financial market and

follow it daily with extensive research capabilities to effectively investigate all investment opportunities. The fund is always monitored, with individual securities constantly being bought, sold, and traded in order to optimize

the investment objectives. The beauty of investing in sometimes twenty different securities lies in the protection

of the investment through portfolio diversification.

Mutual funds are divided into six major categories according to their investment objectives, which are outlined in the fund's prospectus, such as goals, level of risk, and the type and frequency of return desired. Since the

goal of the linked model is forecasting of long-term performance, growth mutual funds were selected because their objectives emphasize the accumulation of capital gains, and they have the greatest long-term growth potential.

DEA Factors

The inputs and outputs of the envelopment analysis were made up of the main characteristics of a mutual fund. They were selected because they are important factors to investors and are directly linked to the performance of the fund, manager. The inputs

selected were beta - a measure of risk, expense ratio, and loads & other fees. The outputs used were net asset value (NAV) per share, dividends, and capital gains. Inputs

Beta is a measure of the fund

portfolio's average volatility,or risk, relative to the S&P 500. The beta value of the market is a constant equal to 1.00. A fund with a beta value greater than 1.00 is more volatile than the market. It is expected to outperform the "average fund" in a bull (rising) market and bring in greater returns, while it is expected to fall more than

4

Page 5: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION DEA MODEL

the average fund in times of a bear (declining) market. A fund with a beta value less than 1.00 Possesses the characteristics of the converse situation. For example, a portfolio with a beta value of 0.75 is expected to be 25% less responsive to the general economic trend than the average fund and return 25% more conservative than the average fund.

Expense ratio is a measure which relates expenses to the average net assets for a period. It is calculated by taking the annual expenses paid by the fund (including management and investment advisory fees, custodial

and transfer agency charges, legal fees, and distribution costs) divided by the

average shares outstanding for the period. The greater the expense ratio, the larger the percentage of the shareholder's investment gets paid out directly to the management company. i The expense ratio does not include

loads or commissions paid for sales or f transaction fees. f

Loads and other fees represent all C other charges made to the shareholder which are not accounted for in the h expense ratio. A load is a portion of the c offering price of the fund that pays for v costs such as sales commissions,

brokerage costs, and transaction fees. This value is a Percentage of the

Purchase price taken off either before

the money is invested (front-end load) or after investment when the

shareholder redeems the shares (back- end load). The amount of the load is established by the managerial company, but is paid to the sales and transaction team. Some funds have a load of 0.00%, meaning that they have the same

benefits as load funds, except they are sold directly by the fund company

rather than through a sales broker so that no loads or other fees are incurred. With no-load funds, the investor must

seek out and contact the investment firm directly. There is a distinct advantage to investing in no-load funds. For example, if we had invested 1000 in a particular portfolio with a

7.50% front-end load, $75 would be :aken off the top, and we would be left vith only $925 worth of shares in the

und. Had this been a no-load fund, the till $1000 would have been utilized. )utputs

Net asset value per share represents DW much each share is worth. It is LlCulated by taking the total market

due of a fund's shares (securities plus sh plus accrued earnings minus

5

Page 6: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

liabilities) divided by the number of

shares outstanding. No profits or losses are realized through NAV until the

fund is sold. Interest lies in the amount

of increase or decrease in NAV from time of purchase to time of sale.

Dividends are payments declared by a corporation's board of directors and made to shareholders, usually on a quarterly basis. A fund's dividends come directly from the dividends paid

out by the individual securities within the portfolio. They are paid out equally among the shareholders on a scheduled time frame.

Capital gains on a mutual fund are the profits made from the sale of a

capital asset (such as a security, stock, or bond) within the portfolio. The

number used for capital gains is adjusted for losses and stock splits. The sale of an asset occurs when the

manager decides that a security needs to be sold for the benefit of the

portfolio. Capital gains are distributed to shareholders when they are realized

(when a security is sold and a profit is made) and are paid out as capital distributions.

Dividends and capital gains are t treated as separate outputs because

they are incurred, paid out, and taxed f

in different ways. Shareholders can reinvest dividends and capital

distributions if they elect to do so.

Model Development and Validation The decision—making units (DMUs)

in this analysis were the 50 growth

mutual funds. A prerequisite of DEA is homogeneous DMUs. By selecting the

funds from the same sector (growth funds), this requirement is met.

The DEA model uses the dual formulation of the linear program as described by Charnes, Cooper and

Rhodes (1978). The objective function of this form is to minimize the

objective value of the current DM10. If a DM10 is efficient the optimal value is 1, while an inefficient DM10 is driven

towards zero. The first step in the

development of the linked model is the generation of DEA scores for the

mutual funds being studied. The data collected for the mutual funds is used

as the inputs and outputs for the DEA code which is documented in the appendix. The DEA code is written

using the SAS/OR package allowing :he DEA scores to be fed directly into he regression phase of the model.

DEA scores are generated for the ifty mutual funds in the study using

Page 7: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

the 1990 data collected from Morningstar Mutual Funds. At this point

DEA scores are also generated for 1993 and stored for use in validating the model's forecasts. The 1990 DEA scores are then regressed against all of the six

DEA factors previously described. This regression model is unique

from previous research in that the goal of the regression is not to create an efficiency score, but to forecast an efficiency score for the future of a particular DIv[LT. Previous work by Thanassoulis (1993) compared the DEA

methodology with linear regression as a means of creating efficiency scores. Our model links the two methods into one prediction tool. The full factor model that resulted appeared on the surface to be a good predictor of efficiency (R2 =

0.8239). The full factor model included three non-significant factors at the 5%

level of significance. Beta (p-value = 0.68), load (p-value = 0.65), and dividend (p-value = 0.50) were all eliminated from the model leaving only

expense ratio, net asset value, and capital gains as significant factors. The revised model remained significant

with an R2 = 0.8213.

As stated before, this model appears to be a good predictor; however, linear

IflI &A A*Iy.I. •_.

S.. I I

•.I 5.7 5.5 •.4 •.5 II 1.2

.,a,.t.. V.,.. - *fl P5(5

Figure 1: Residual vs. Predicted DEA Score

regression models must be validated

through diagnostic tests. One assumption in regression analysis underpins the entire methodology - errors are a randomly distributed variable following a normal distribution with mean equal to zero, N(0,a2). Regression analysis in SAS is based on a linear relationship between the dependent and independent variables.

These assumptions can be verified through the examination of the residuals, or errors, of the regression model. A plot of the residuals versus the predicted values, Figure 1, reveals a distinct pattern indicating a non-linear

relationship between the independent and dependent variables. This requires the examination of univariate statistics

7

Page 8: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION — DEA MODEL

for each of the variables in the regression model. A standard normal quantile plot for each of the variables allows the determination of the distribution of each variable.

From the univariate procedures, some of the variables demonstrated an

exponential distribution requiring a log—linear transformation in order to validate the regression model. Net asset value, capital gains, dividends and the actual DEA score follow exponential distributions while beta, load, and expense ratio are all normally distributed. A second data set of the inputs and outputs was created by

taking the natural log of each of the exponentially distributed variables. Once normality was confirmed

through univarjate procedures, a new full factor linear regression model was fitted. The newly formed model resulted in an R2 = 0.8163 and was validated through the residual plot shown in Figure 2. Using the SAS regression report, the regression equation was created using the

coefficients of the significant factors:

beta, expense ratio, net asset value, and dividend.

The next step is the creation of predicted efficiency scores. The

common method for forecast model verification is the prediction of the

past. The linked model is verified by using the input and output data for

1993 in the regression equation created from the 1990 data. The predicted

values are then compared to the actual 1993 DEA scores generated earlier by the DEA code. The forecast errors are evaluated using a significance test on the mean difference. The test is

conducted at the 1% level of significance. H0: mean difference = 0; Ha: mean difference # 0. The resulting test statistic: z = -0.5164 and critical value, z = 2.57 leads to the conclusion that the prediction values are not

significantly different from the actual

DEA scores with 99% confidence.

II,:

•I

,

• .i — •.S P.O AS PfldItt*O V.,.. It LOLA POLO

Figure 2: Residual vs. Predicted DEA Score—after linear traa

8

Page 9: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Fund DEA 1990 Predicted 1993 DEA 1993 Error

FTRNX 1.00000 1.00000 0.62320 -0.37680

PRWAX 1.00000 0.78983 0.29415 -0.49568

G1NLX 1.00000 0.33207 0.23691 -0.09516

ACRNX 1.00000 1.00000 1.00000 0.00000

FDETX 0.41944 0.98900 1.00000 0.01100

CFIMX 0.73415 0.98032 1.00000 0.01968

LOMCX 0.32965 0.68927 1.00000 0.31073Table 1: Efficient MUtUI t-unas

Examination of Results

In analyzing the results of the sample studied, it is important to remember that a mutual fund is mainly used as a long-term investment, providing stable low-risk growth. A

potential shareholder wants to invest in a fund that will be efficient in the future, regardless of its current efficiency. Therefore we focus our examination on the set of funds with efficient scores in either 1990 or 1993.

Table 1 shows the 1990 DEA, Predicted

1993, and actual 1993 DEA scores along with the errors for each prediction (Error = 1993 DEA - Predicted 1993).

Of the first four funds listed (all receiving efficient ratings in 1990), the

future efficiency of three funds was correctly predicted. Our linked model

predicted future behavior of currently efficient funds with 75% accuracy. Although the error for T. Rowe Price New America Fund, PRWAX, is high, we are not concerned with the magnitude of inefficiency, only the

state of its efficiency. Of the last four funds in the table

(all receiving efficient ratings in 1993), three funds were predicted accurately. The fifth and sixth funds, Fidelity Destiny Portfolios: Destiny II, FDETX, and Clipper Fund, CFIMX, both

received a predicted score of approximately 0.98 for 1993. Based on the previous significance test, these values are not significantly different

from 1.00. Thus our model predicts the funds that will be efficient in the future

with 75% accuracy.

Page 10: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Table 2: Investment Simulation Results

Investment Performance Since investors are concerned with

the long-term growth and appreciation of their investments, DEA by itself is not an effective tool to aid in the

development of investment strategies.

The percent of appreciation we

received in the portfolio of 1990 efficient mutual funds was 46.92%, while the appreciation was significantly higher for the portfolio of funds recommended by the linked

However, combined with linear model at 60.41%. regression, DEA gives us the ability to estimate the long-term efficiency of a fund. To illustrate the effectiveness of the linked model relative to DEA alone, we simulated an investment in the efficient funds in 1990 and the funds

predicted to be efficient by our linked model. Since mutual funds allow the purchase of fractional shares, we invested $1000 in each of the mutual

funds. The results are given in Table 2.

10

Page 11: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Conclusion The linked model allows us to

broaden the range of applications for DEA. The linked model enables the

prediction of future efficiency, which is of great benefit to investors because it

provides more insight into the long-term viability of a mutual fund. The model also requires little historical data

for each DM0 unlike other forecasting methods. Another benefit is the relative

compactness of the model; very few predictors are required to explain a high degree of variability and achieve

accurate predictions. Data envelopment analysis by itself is not an accurate predictor of future efficiency of mutual funds. Only one fund that

was efficient in 1990 remained in the efficient set in 1993. Linear regression

and DEA combine to give us an accurate prediction for the future

behavior of DMUs.

An immediate extension to this

research is the definition of an "ideal

type" mutual fund. An optimization algorithm will be used to generate values for the regression factors and

record the combinations which yield and efficient fund prediction. Once this

benchmark is defined a growth mutual fund prospectus can simply be visually inspected to determine managerial efficiency. The next phase of this research would be to compare the accuracy of the linked model with that

of other forecasting methods, such as time series decomposition. Another step in the continued development of this area of research is to improve the accuracy of the predictions through

further modifications to the regression model using a weighted regression

model or the inclusion of adaptive

error control.

11

Page 12: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

References

Barr, R. S., L. M. Seiford, and T. F. Siems (1991), "An Envelopment Analysis Approach to Measuring the Management Quality of Banks," The Annals of Operations Research. J.C. Baltzer AG, Science Publishers.

Dilorio, F. C. (1991), SAS Applications Programming: a Gentle Introduction. Boston: PWS-KENT Publishing Company.

Flatt, L. G. (1994), "FUNDamentals: An Envelopment-Analysis Approach to Measuring the Managerial Quality of Mutual Funds," technical report. Dallas, Texas: Southern Methodist University, Department of Computer Science and Engineering.

Morningstar Mutual Funds User's Guide (1992), Chicago: Morningstar, Inc.

Morningstar Mutual Funds (1993), Chicago: Morningstar, Inc.

Norman, M. and B. Stoker (1991), Data Envelopment Analysis: The Assessment of Performance. New York: John Wiley & Sons.

SAS Institute Inc. (1989), SAS/OR User's Guide, Version 6, First Edition. Cary, NC: SAS Institute Inc.

Sexton, T. R., S. Sleeper, and R. E. Taggart, Jr. (1994), "Improving Pupil Transportation in North Carolina," Interfaces, January-February 1994. Providence, RI: The Institute of Management Sciences.

Thanassoulis, E. (1993), "A Comparison of Regression Analysis and Data Envelopment Analysis as Alternative Methods for Performance Assessments," O.R. Journal of the Operations Research Society. Coventry: Operational Research Society Ltd.

12

Page 13: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Professional References

Barr, Richard S., Ph.D. Associate Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas.

Dula, Jose, Ph.D. Assistant Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas.

Guerra, Rudy, Ph.D. Assistant Professor, Department of Statistical Science. Southern Methodist University, Dallas, Texas.

Welch, David. Senior Systems Analyst II, Bradfield Computer Center. Southern Methodist University, Dallas, Texas.

13

Page 14: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Appendix

DEA Formulation Mat hematical formuation of the DEA model used in the Linked Model.

DEA Code The DEA code is developed using Release 6.0 of the SAS System on an IBM 3090 mainframe. The code presented requires only two input files . One file contains the input and output factor data while the second file contains a listing of the mutualfrnd ticker symbols. The program is construct ed from a series of SAS Macros which fully automate the formulation and solution of the multiple linear programs required in Data Envelopment Analysis.

Forecasting Code Developed as an extension to the DEA code, the forecasting model uses the same computing platform and software. The program reads the same data files as the DEA code as well as the DEA scores generated by the DEA study, The program generates predicted efficiency scores and all necessary statistical procedures for validating the model.

14

Page 15: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

DEA Formulation

A

[X

Inputs

10outputs

+ Slacks =

e

-

A.

Min E)

A-A:!^O A7

e^!o

Mat heinaticalformulation of DEA requires that the inputs and out puts for each DMU are formated in a matrix, A The matrix is multiplied by a vector, A,to construct the linear programming constraints.

Minimize 0

Subject to:

Inputs Beta: X,DMUl+XDMU2+... +xODMU5O-x51(0i'l):5OiO

Load: xIDMUI+xDMU2+... +xo'DMU50-X5(0i1)!90O

E_Pafio: xiDMUi +x2DMU2+... +x5oDMUso-x51(0i l):50 O

Output$ NAV: X1DMUI+XDMU2+... +X5ODMU5O-X51(00) 011

Cap_Gain: xrDMUr +x2 DMU2+ ... +x5ODMU5Ox5(0rO) 0i • 1

Dividend: xrDMUr +)(2 DMU2 + ... + X50 DMUso - X51 (0 0) 1

All Variables 2: 0

Mere a Is current ciecislon-maldng unit (DMU)

Linear program solved by SAS for each DM11.

15

Page 16: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION —DEA MODEL

DEA Code Part 1: Initialization

This section reads the data files into a SAS dataset. Using the SAS Data Management Language datasets are created for the right hand side vectors, as well as the theta vectors. The two datasets are perfectly matched - each right hand side has exactly one corresponding theta vector. There is one right hand side and one theta vector for each DMU in the study. These vectors are merged with the inputs and outputs to formulate the linear program. The final step in the initialization is the renaming of variables to the proper ticker symbols in order to aid in report interpretation.

MICHAEL MCGEE SENIOR DESIGN GROUP '94 DEA ANALYZER

FILENAME NDT MENAME CTAA';

FILENAME DAflAF9O DATAA'; OPTIONS UNESIZE.72;

%LET SYMBOLS = ACRNX LOMCX HRTVX STCSX BEONX GINIX PARNX VCAGX DEVLXAGBBX

FPPTX FDE1X FDESI( FCN1( FBGRX FDVLX SRFCX FRGRX BRW1X SAMYX FDFFX FAGOX MLSVXALTFX ANEFX GPAFX PRWAX CGRWX OVOPX WTMGX

NYVTX SEOGX BVALX BARAXAFGPX FTRNXAMRGX DNLDX HCRPX NBSSX PRGM( INNSI( TWHIX DWM( LMITX JHNX SRSPX CFIMX cWBX BOSSX

DATA TICK; INFILENDT; ARRAY TICKR(50) $5;

INPUT TICKRr);

r MUTUAL FUND TICKER SYMBOLS

DATA ONE; INFILE DAT; ARRAY FUND (50)

INPUT JD_$ FUND() _TYPE_ $ _RHS.;

r GENERATE THETA COLUMNS 1

DATA NEW-COL; SET ONE; ARRAY THETA (SO);

ARRAY FUND(50);

DO I ITO W. IF N_ = I THEN THETA(I) I;

ELSE IF_N== 2 &_N_ <--4 THEN THETA(I .FUNO(I);

ELSETHETA(I) =0;

END DROP I;

P GENERATE RIGHT HAND SIDE VECTORS 1

DATA RANGE; ARRAY RH(50); ARRAY FUND(50);

SET ONE; DO I = ITO SE; IF _N_ = I THEN RH(I) MISSING;

ELSE IF _N_ 4 THEN RH(I)= FUND([),

ELSE RH(I) 0;

END; DROP MISSING I FUNDI.FUND50;

DATA PARAM; MERGE NEW_COL RANGE;

P RENAMES THE FUNDSTO THE PROPER NASDAQTICKER SYMBOL!

PROC DATASETS;

MODIFY ONE; RENAME FUNDI =ACRNX

FUND2=LOMCX

FUND3= HRTVX FUND4 STCSX FUNDS=BEONX FUNDO=GINLX

FUNDT=PARNX FUND8=VCAGA FUND9=DEVLC FUNDIO=AGBBX

FUNDII= FPPIX FUNDI2= FDEIX

16

Page 17: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

FUNDI3 FDESX FUND14 FCNTX FUNDI5= FBGFO( FUND16 FDVLX

FUND17 SRFCX FUNDI8 FRGRX

FUNDI9 BRW1X FUND20SAMVX

FUND21 FDFFX FUND22 FAGOX FUND23=MLSVX FUND24.ALTFX FUND26ANEFX

EUND26 GPAFX FUND27. PRWAX

FUND28=CGRWX

FUND29. QVOPX FUND30.WTMGX FUND3I NWTX FUND SEX3X

FUND33 BVALX FUND34BARAX FUND35=AFOPX FUND36= FTRNX FUND37AMRGX

FUND38= DNLDX

FUND39 HCRPX FUND40=NBSSX FUND41 PRGY( FUND42= INNDX

FUND43 TWHIX FUND44=DWIVX FUND45=LMVTX FUND46=JHNGX FUND47 SRSPX FUND48CFX FUND49 CLMBX

FUND5O= BOSSX;

Part 2: DEA Macro This macro accepts one parameter:

the index of the DMIJ to evaluate. The macro merges the input/output dataset with the appropriate theta and right hand side vectors. This dataset now contains the LP formulation for SAS to analyze. The procedure generates detailed printed reports for each of the

NAME DEA DEC: AUTOMATES THE CREATION OF PROBLEM DATASET.

MERGES THE CURRENT THETAAND RHS INTO THE COEFFICIENT MATRIX AND EXECUTES THE 1?,

PAI*A: NDMU -THE INDEX NUMBER OF THE DMU TO BE ANALYZED

%MACRO DEA(NOMU);

DATA —NULL—; ARRAY TICKR(50) $S; SEE TICK: CALL S'PUT (FUND)(',PUT(TICKR{&NDMU}$S.));

RUN;

DATA CUR—U; ARRAY THETA(50);

ARRAY RH{5O}; IMIDMU; SET PARAM; CTHETA THETA(I);

RHS_ RH{I}; KEEP CTHETA_RHS_;

DATA PROBLEM; MERGE ONE CUR_DMU;

TIELEI DATA ENVELOPMENT ANALYSIS'; TITLE2 CURREN1 E(AU= &FUNDX';

P EXECUTE THE LP' / PROC LP DATAPROBLEM POUT a &FUNDX;

%MEND DEA

17

Page 18: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Part 3: Loop Macro ' In order to iterate the execution of a

SAS procedure without repetitive statements a macro was written to loop around the calls to the DEA macro. The loop macro accepts one parameter: the number of DMUs in study.

Part 4: Extract Macro In addition to the written reports

generated by the DEA macro, datasets were created in memory containing the results of the LPs. This macro combines the reports for each of the DMUs and extracts the DEA scores into a dataset which is then fed to the forecasting model.

NAME: LOOP DEM. ITERATIVELY CALLS THE DEA MACRO PA4: LIMIT - NUMBER OF OMUS TO BE PROCESSED

%MACRO LOOP(LIMIT) %DO INDEX I %TO &LIIrr;

%DEA(&INDEX); %ENE;

%MEND LOOE;

NAME: EXTRACT DEW. EXTRACTS THE OBJECT WE VALUES FROM THE

INDIVIDUAL FUND REPORTS AND CREATES DATASET CONTAINING THE IN FORMATION

%MACRO EXTRACT;

P MERGE LP OUTPUT DATASETS AND REMOVE UNIMPORtANT VARIABLES V

DATA REEl; SEE &SYMBOLS; KEEP _VAR_ _VALUE.

P EXTRACT OBJECTIVE FUNCTION VALUES i

DATA RES2;

SEE RESI; IF _VAft. OBJECr THEN SCORE = -VALUE-;

ELSE DELETE; KEEP SCORE

P GENERATE LISTING OF MUTUAL FUNDS DATA RESE; SEE TICK;

ARRAY TIOKR(50) $5; DO I ITO 50 PCHANGE THIS TO 50 IN PRODUCTION RUN 1

NAME TICKR(I); OUTPUT;

END;

KEEP NAME;

P MERGE CNTASEI' INTO ANAL FORM!

DATA RESULTS;

MERGE RESE RESE;

PROC PRINT DATA=RESULTS;

WEND EXTRACT;

rEXECUTABLE CODE

%LOOP(50); %EXTRACT;

18

Page 19: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Forcasting Code

P Mutual Fund Data Files 1

filename dati 'mfs93 data a'; filename dat2 'mfname data a'; filename dat5 'mfs90 data a';

P DEA Score Data Files 1

filename dat3 'score93 data a'; filename dat4 'score90 data a';

option linesize 72 nonumber nodate;

P Read 1993 Mutual Fund Data */ data one;

mule dati; array val(50); input name $ val{};

proc transpose; P Transpose Matrix: from lO*DMU to DMUIO 1

run;

proc datasets; P Assign proper names to lOs 1

modify data 1; rename colt =Beta

co12 = Load co13 = E_Rabe col4=NAV col5 = Cap-Gain co16 Dividend

_Name_= Fund;

data funds; P Create List of Fund Names / infile dat2; input Fund $;

data scr93; r Read 1993 DEA scores from file 1

infile dat3; input dea93;

tie1 '1993 DEA Analysis';

data comp93; set datal; merge datal funds scr93;

/ Run All possible regression models 1

proc reg; tle2 'All Possible Regression Models';

model dea93=Beta Load E_Ratio NAV Cap-Gain Dividend/ selecben=Rsquare;

/* Full Fit Model */ proc rag;

title2'Full Model'; model dea93=Beta Load E_Ratio NAV Cap-Gain Dividend;

proc rag; tle2 'Minimum Spec Model;

model dea93 = E_Ratio NAV CAP_GAIN;

/* 1990 Data Analysis */

data info9o; infile dat5; array val(50}; input name $ val{};

proc transpose; P Transpose Matrix from IODMU to DMUIO 1

run;

proc datasets; P Assign proper names to lOs modify data2; rename coil = Beta

col2 Load co13 E_Rabo col4=NAV col5 = Cap-Gain coI6 Dividend

_Name_= Fund;

data scr9o; infile dat4; input dea90;

data comp90; merge data2 funds scr90;

tiflel 1990 DEA Analysis';

proc reg data=comp9o; tille2 'All Possible Regressions'; model dea9o= Beta Load E_Ratio NAV Cap-Gain Dividend

/selection rsquare;

proc reg data=cornp9o; tifle2 'Full Model'; model dea90= Beta Load E_Ratio NAV Cap-Gain Dividend;

proc rag data=comp9o; tide2 'Minimum Spec'; model dea90= E_Ratio NAV CAP-GAIN; plot residual.(predicted. E_Ratio NAV Cap-gain);

/* Forecasting Validation Model 1

P Create predicted values using forecast model compare to actual 1993 values.

19

Page 20: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

data predict; model Ldea = Beta Load E_Ratio LNAV LCG LDiv set datal; plot residuat(predicted.); Exp_dea=-0.177220E_Rao + 0.016477NAV +

0.152344Cap_Gain+0.279313; proc reg data = linear; if Exp_dea> 1 then Exp_dea = 1; model Idea = Beta E_Rao LNAV LDIV;

data forecast; merge predict funds scr93 scr9o; / Linear Regression Equation / Err = dea93 - exp_dea; data predict2; keep fund dea9O exp_dea dea93 err; set trans93;

PDEA 0.6874598eta - 0.525658E_Ratio + 0.753563LNAV + O.229900LD1V -2.930810;

Exp_dea = exp(PDEA); tine 'Regression Predictions'; if Exp_dea> I then Exp_dea = 1; tiUe2'Non-Linear Model'; proc pnnt data=forecast; / Compute Forecast Errors

var fund dea90 dec93 exp_dea err; data frcs; merge predict2 funds scr93 scr90;

proc plot; Err = dea93 - exp_dea; plot exp_dea * dea93; keep fund dea90 exp_dea dea93 err;

/ added 4/18/94'1 P Plot 93 Prediction vs 90 Actual data fcas title 'Regression Predictions';

merge scr90 forecast; tille2'Linear Model'; keep exp_dea dea90 dea93 fund; proc print data=frcst2;

var fund dea90 dea93 exp_dea err;

tillel 'Predicted 93 vs. Actual 1990'; tiUe2 'Non-Linear Model'; proc plot data=fcas

plot exp_dea * dea90;

P Linear transform to improve predictions data linear;

set comp90; Ldea = log(dea90); LNAV = log(NAV); it Cap—Gain = 0 then LCG = 0;

else LCG log(Cap_Gain); LEA log(E_Ratio); if Dividend = 0 then LDIV = 0;

else LDIV = LOG(Dividend);

P Transform forecast inputs to linear model /

data trans93; set comp93; Ldea log(dea93); LNAV = log(NAV); if Cap—Gain = 0 then LCG = 0;

else LCG = log(Cap_Gain); LER = log(E_Ratio); if Dividend = 0 then LDIV = 0;

else LDIV = LOG(Dividend);

Title 'Linear Forecast Model'; title2 'Model Verification'; proc rag data = linear;

20

Page 21: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

Case Study: A Linked Linear Regression -

Data Envelopment Analysis Model for the Long-Term Prediction of

Mutual Fund Performance

Cindy Chao, Greg Higgins Michael McGee

Department of Computer Science and Engineering SOUTHERN METHODIST UNIVERSITY

Dallas, Texas

May 14, 1994

C

Page 22: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Contents

Abstract.............................................................................................................................3

Introduction...................................................................................................................... 3

Whatis a Mutual Fund? .................................................................................................4

DEAFactors......................................................................................................................4

Model Development and Validation............................................................................. 6

Examinationof Results ...................................................................................................9

InvestmentPerformance............................................................................................... 10

Conclusion...................................................................................................................... 11

References....................................................................................................................... 12

ProfessionalReferences................................................................................................. 13

Appendix........................................................................................................................ 14

DEAFormulation .............................................................................................. 15

DEACode ........................................................................................................... 16

ForcastingCode ................................................................................................. 19

2

Page 23: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

Case Study: A Linked Linear Regression - Data Envelopment Analysis Model for the Long-Term Prediction of Mutual Fund Performance

Michael McGee Greg Higgins Cindy Chao Department of Computer Science and Engineering Southern Methodist University Dallas, Texas 75275

Data Envelopment Analysis (DEA) can be used to evaluate managerial efficiency of mutual funds. By extending the DEA methodology with linear regression, a new prediction tool for future efficiency is developed. The linked model was tested on a sample of fifty growth mutual funds over a three year time horizon - resulting in accurate predictions of future efficiency with 99% confidence.

The current mutual fund market current assessment of a fund's contains nearly $1.4 trillion management. DEA is a linear

dollars; this rapid growth of mutual funds as a long-term investment instrument has led to an increased demand for evaluation and forecasting techniques. The performance of a mutual fund is directly linked to the performance of the management staff. Data Envelopment Analysis (DEA) can be used to evaluate managerial

programming methodology that computes relative efficiency scores among a group of homogeneous decision-making units (DMU5). Combined with linear regression, DEA becomes a prediction tool for long-term forecasting. The linked model not only evaluates return on investment but also the quality of the fund's

efficiency; however, this is only a managerial team on a long-term basis.

'3

Page 24: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

What is a Mutual Fund? A mutual fund is an investment

company that pools the money of many institutional and individual investors into one professionally managed investment account. The fund manager creates a portfolio for the shareholders by buying a variety of "true" financial instruments, such as

corporate stocks and bonds, and government and municipal securities, in accordance with the fund's particular investment objectives. Fund managers are professional investors who know the financial market and follow it daily with extensive research capabilities to effectively investigate all

investment opportunities. The fund is always monitored, with individual securities constantly being bought, sold, and traded in order to optimize

the investment objectives. The beauty of investing in sometimes twenty

goal of the linked model is forecasting of long-term performance, growth mutual funds were selected because their objectives emphasize the accumulation of capital gains, and they have the greatest long-term growth potential.

DEA Factors The inputs and outputs of the

envelopment analysis were made up of the main characteristics of a mutual fund. They were selected because they are important factors to investors and are directly linked to the performance

of the fund manager. The inputs selected were beta - a measure of risk,

expense ratio, and loads & other fees. The outputs used were net asset value (NAV) per share, dividends, and capital gains.

Inputs Beta is a measure of the fund

different securities lies in the protection portfolio's average volatility, or risk,

of the investment through portfolio relative to the S&P 500. The beta value

diversification. Mutual funds are divided into six

major categories according to their investment objectives, which are outlined in the fund's prospectus, such as goals, level of risk, and the type and frequency of return desired. Since the

of the market is a constant equal to 1.00. A fund with a beta value greater than 1.00 is more volatile than the market. It is expected to outperform the "average fund" in a bull (rising) market and bring in greater returns, while it is expected to fall more than

4

Page 25: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

the average fund in times of a bear brokerage costs, and transaction fees. (declining) market. A fund with a beta This value is a percentage of the value less than 1.00 possesses the purchase price taken off either before characteristics of the converse the money is invested (front-end load) situation. For example, a portfolio with a beta value of 0.75 is expected to be 25% less responsive to the general

or after investment when the shareholder redeems the shares (back-end load). The amount of the load is

economic trend than the average fund established by the managerial company, and return 25% more conservative than but is paid to the sales and transaction the average fund. team. Some funds have a load of 0.00%,

Ll

Expense ratio is a measure which relates expenses to the average net assets for a period. It is calculated by taking the annual expenses paid by the fund (including management and

investment advisory fees, custodial and transfer agency charges, legal fees, and distribution costs) divided by the

average shares outstanding for the period. The greater the expense ratio, the larger the percentage of the shareholder's investment gets paid out directly to the management company. The expense ratio does not include loads or commissions paid for sales or transaction fees.

Loads and other fees represent all

other charges made to the shareholder which are not accounted for in the expense ratio. A load is a portion of the offering price of the fund that pays for

costs such as sales commissions,

meaning that they have the same benefits as load funds, except they are sold directly by the fund company rather than through a sales broker so that no loads or other fees are incurred.

With no-load funds, the investor must seek out and contact the investment firm directly. There is a distinct advantage to investing in no-load

funds. For example, if we had invested $1000 in a particular portfolio with a 7.50% front-end load, $75 would be taken off the top, and we would be left with only $925 worth of shares in the fund. Had this been a no-load fund, the full $1000 would have been utilized.

Outputs Net asset value per share represents

how much each share is worth. It is calculated by taking the total market value of a fund's shares (securities plus

cash plus accrued earnings minus

Page 26: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

liabilities) divided by the number of

shares outstanding. No profits or losses are realized through NAV until the fund is sold. Interest lies in the amount

of increase or decrease in NAV from

time of purchase to time of sale. Dividends are payments declared

by a corporation's board of directors and made to shareholders, usually on a quarterly basis. A fund's dividends come directly from the dividends paid out by the individual securities within

in different ways. Shareholders can

reinvest dividends and capital distributions if they elect to do so.

Model Development and Validation The decision—making units (DMUs)

in this analysis were the 50 growth mutual funds. A prerequisite of DEA is homogeneous DMUs. By selecting the funds from the same sector (growth funds), this requirement is met.

The DEA model uses the dual

the portfolio. They are paid out equally formulation of the linear program as

among the shareholders on a scheduled time frame.

Capital gains on a mutual fund are

the profits made from the sale of a capital asset (such as a security, stock, or bond) within the portfolio. The number used for capital gains is

adjusted for losses and stock splits. The sale of an asset occurs when the manager decides that a security needs to be sold for the benefit of the

described by Charnes, Cooper and Rhodes (1978). The objective function of this form is to minimize the

objective value of the current DMU. If a DMU is efficient the optimal value is 1, while an inefficient DMU is driven towards zero. The first step in the development of the linked model is the generation of DEA scores for the mutual funds being studied. The data collected for the mutual funds is used

portfolio. Capital gains are distributed as the inputs and outputs for the DEA to shareholders when they are realized code which is documented in the

(when a security is sold and a profit is made) and are paid out as capital distributions.

Dividends and capital gains are treated as separate outputs because they are incurred, paid out, and taxed

appendix. The DEA code is written

using the SAS/OR package allowing the DEA scores to be fed directly into the regression phase of the model.

DEA scores are generated for the fifty mutual funds in the study using

Page 27: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

n

.

S

S

S

the 1990 data collected from Morningstar Mutual Funds. At this point

DEA scores are also generated for 1993 and stored for use in validating the model's forecasts. The 1990 DEA scores are then regressed against all of the six

DEA factors previously described. This regression model is unique

from previous research in that the goal of the regression is not to create an efficiency score, but to forecast an efficiency score for the future of a particular DMtJ. Previous work by Thanassoulis (1993) compared the DEA methodology with linear regression as a means of creating efficiency scores. Our model links the two methods into one prediction tool. The full factor model that resulted appeared on the surface to be a good predictor of efficiency (R2 =

0.8239). The full factor model included three non-significant factors at the 5%

level of significance. Beta (p-value = 0.68), load (p-value = 0.65), and dividend (p-value = 0.50) were all eliminated from the model leaving only expense ratio, net asset value, and capital gains as significant factors. The revised model remained significant

with an R2 = 0.8213.

As stated before, this model appears to be a good predictor; however, linear

fl fLAW flED

Figure 1: Residual vs. Predicted DEA Score

regression models must be validated

through diagnostic tests. One assumption in regression analysis underpins the entire methodology - errors are a randomly distributed

variable following a normal distribution with mean equal to zero, N(0,(Y). Regression analysis in SAS is based on a linear relationship between the dependent and independent variables. These assumptions can be verified through the examination of the residuals, or errors, of the regression model. A plot of the residuals versus the predicted values, Figure 1, reveals a distinct pattern indicating a non-linear

relationship between the independent

and dependent variables. This requires the examination of univariate statistics

S

7

0

Page 28: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

S

.

S

S

S

S

for each of the variables in the regression model. A standard normal quantile plot for each of the variables allows the determination of the

distribution of each variable. From the univariate procedures,

some of the variables demonstrated an

exponential distribution requiring a log—linear transformation in order to validate the regression model. Net asset value, capital gains, dividends and the actual DEA score follow exponential distributions while beta, load, and expense ratio are all normally distributed. A second data set of the

inputs and outputs was created by

taking the natural log of each of the exponentially distributed variables. Once normality was confirmed

through univariate procedures, a new full factor linear regression model was fitted. The newly formed model resulted in an R2 = 0.8163 and was

validated through the residual plot shown in Figure 2. Using the SAS

regression report, the regression equation was created using the coefficients of the significant factors: beta, expense ratio, net asset value, and dividend.

The next step is the creation of

predicted efficiency scores. The

Figure 2: Residual vs. Predicted DEA Score—after linear transform

common method for forecast model verification is the prediction of the

past. The linked model is verified by

using the input and output data for 1993 in the regression equation created from the 1990 data. The predicted values are then compared to the actual 1993 DEA scores generated earlier by the DEA code. The forecast errors are evaluated using a significance test on

the mean difference. The test is conducted at the 1% level of significance. H0: mean difference = 0;

Ha : mean difference # 0. The resulting test statistic: z = -0.5164 and critical value, z = 2.57 leads to the conclusion that the prediction values are not

significantly different from the actual DEA scores with 99% confidence.

8

0

Page 29: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Fund flEA 1990 Predicted 1993 flEA 1993 Error FTRNX 1.00000 1.00000 0.62320 -0.37680

PRWAX 1.00000 0.78983 0.29415 -0.49568

GINLX 1.00000 0.33207 0.23691 -0.09516

ACRNX 1.00000 1.00000 1.00000 0.00000

FDETX 0.41944 0.98900 1.00000 0.01100

CFIMX 0.73415 0.98032 1.00000 0.01968

LOMCX 0.32965 0.68927 1.00000 10.31073Table 1: Efficient Mutual Funds

Examination of Results In analyzing the results of the

sample studied, it is important to remember that a mutual fund is mainly used as a long-term investment,

providing stable low-risk growth. A potential shareholder wants to invest in a fund that will be efficient in the future, regardless of its current efficiency. Therefore we focus our examination on the set of funds with efficient scores in either 1990 or 1993. Table 1 shows the 1990 DEA, Predicted

1993, and actual 1993 DEA scores along with the errors for each prediction (Error = 1993 DEA - Predicted 1993).

Of the first four funds listed (all receiving efficient ratings in 1990), the future efficiency of three funds was correctly predicted. Our linked model

predicted future behavior of currently efficient funds with 75% accuracy. Although the error for T. Rowe Price New America Fund, PRWAX, is high, we are not concerned with the

magnitude of inefficiency, only the

state of its efficiency. Of the last four funds in the table

(all receiving efficient ratings in 1993),

three funds were predicted accurately. The fifth and sixth funds, Fidelity Destiny Portfolios: Destiny II, FDETX, and Clipper Fund, CFIMX, both

received' a predicted score of approximately 0.98 for 1993. Based on the previous significance test, these values are not significantly different from 1.00. Thus our model predicts the funds that will be efficient in the future with 75% accuracy.

Page 30: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Table 2: Investment Simulation Results

Investment Performance Since investors are concerned with

the long-term growth and appreciation of their investments, DEA by itself is not an effective tool to aid in the development of investment strategies. However, combined with linear regression, DEA gives us the ability to estimate the long-term efficiency of a

fund. To illustrate the effectiveness of the linked model relative to DEA alone, we simulated an investment in the efficient funds in 1990 and the funds

predicted to be efficient by our linked model. Since mutual funds allow the purchase of fractional shares, we invested $1000 in each of the mutual funds. The results are given in Table 2.

The percent of appreciation we

received in the portfolio of 1990 efficient mutual funds was 46.92%, while the appreciation was significantly higher for the portfolio of

funds recommended by the linked model at 60.41%.

10

Page 31: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Conclusion An immediate extension to this

Q

The linked model allows us to broaden the range of applications for DEA. The linked model enables the

prediction of future efficiency, which is of great benefit to investors because it provides more insight into the long-term viability of a mutual fund. The model also requires little historical data for each DMIJ unlike other forecasting methods. Another benefit is the relative compactness of the model; very few predictors are required to explain a high degree of variability and achieve accurate predictions. Data envelopment analysis by itself is not an accurate predictor of future efficiency

of mutual funds. Only one fund that was efficient in 1990 remained in the efficient set in 1993. Linear regression and DEA combine to give us an accurate prediction for the future behavior of DMUs.

research is the definition of an "ideal type" mutual fund. An optimization algorithm will be used to generate

values for the regression factors and record the combinations which yield and efficient fund prediction. Once this benchmark is defined a growth mutual fund prospectus can simply be visually inspected to determine managerial efficiency. The next phase of this research would be to compare the accuracy of the linked model with that of other forecasting methods, such as time series decomposition. Another step in the continued development of this area of research is to improve the

accuracy of the predictions through further modifications to the regression model using a weighted regression model or the inclusion of adaptive error control.

11

Page 32: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

References

Barr, R. S., L. M. Seiford, and T. F. Siems (1991), "An Envelopment Analysis Approach to Measuring the Management Quality of Banks," The Annals of Operations Research. J.C. Baltzer AG, Science Publishers.

Dilorio, F. C. (1991), SAS Applications Programming: a Gentle Introduction. Boston: PWS-KENT Publishing Company.

Flatt, L. G. (1994), "FUNDamentals: An Envelopment-Analysis Approach to Measuring the Managerial Quality of Mutual Funds," technical report. Dallas, Texas: Southern Methodist University, Department of Computer Science and Engineering.

Morningstar Mutual Funds User's Guide (1992), Chicago: Morningstar, Inc.

Morningstar Mutual Funds (1993), Chicago: Morningstar, Inc.

Norman, M. and B. Stoker (1991), Data Envelopment Analysis: The Assessment of Performance. New York: John Wiley & Sons.

SAS Institute Inc. (1989), SAS/OR User's Guide, Version 6, First Edition. Cary, NC: SAS Institute Inc.

Sexton, T. R., S. Sleeper, and R. E. Taggart, Jr. (1994), "Improving Pupil Transportation in North Carolina," Interfaces, January-February 1994. Providence, RI: The Institute of Management Sciences.

Thanassoulis, E. (1993), "A Comparison of Regression Analysis and Data Envelopment Analysis as Alternative Methods for Performance Assessments," O.R. Journal of the Operations Research Society. Coventry: Operational Research Society Ltd.

12

Page 33: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

[ii Professional References

Barr, Richard S., Ph.D. Associate Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas.

Dula, Jose, Ph.D. Assistant Professor, School of Engineering and Applied Sciences. Southern Methodist University, Dallas, Texas.

Guerra, Rudy, Ph.D. Assistant Professor, Department of Statistical Science. Southern Methodist University, Dallas, Texas.

Welch, David. Senior Systems Analyst II, Bradfield Computer Center. Southern Methodist University, Dallas, Texas.

13

Page 34: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Appendix

DEA Formulation Mat hematicalformuation of the DEA model used in the Linked Model.

DEA Code The DEA code is developed using Release 6.0 of the SAS System on an IBM 3090 mainframe. The code presented requires only two input files. One file contains the input and output factor data while the second file contains a listing of the mutualfund ticker symbols. The program is construct ed from a series of SAS Macros which fully automate the formulation and solution of the multiple linear programs required in Data Envelopment Analysis.

Forecasting Code Developed as an extension to the DEA code, the forecasting model uses the same computing platform and software. The program reads the same data files as the DEA code as well as the DEA scores generated by the DEA study, The program generates predicted efficiency scores and all necessary statistical procedures for validating the model.

14

Page 35: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

DEA Formulation

A

Ex0

Inputs = =

-------- ] S

+ Slacks = outputs

A.

Mine

A7-A!^O A7

e^!o

Mat heinatical formulation of DEA requires that the inputs and out puts for each DMU are formated in a matrix, A The matrix is multiplied by a vector, 2,to construct the linear programming constraints.

Minimize 0 Subject to:

Inputs Beta: xi'DMUi +xDMU2+... +xsoDMUso - xs(E 1)<-E 0

Load: xIDMUI+x'DMU2+...+x5oDMU50-x5I(&1):50O

E_Patio: XIDMUI + X2 'DMU2 +... + x5a • DMU50 - Xsi (& 1)--^0 '0

OutputsNAV: xDMU +X'DMU2+... +XDMU50-x51(00) 01

Cap_Gain: xDMUi + X2 • DMU2 + + Xo DMU5O - Xti (0 0) 0 * 1 Dividend: xiDMUi + X2 • DMU2 +... + x50 DMU5O X51 (e 0) 0 1

All Variables ^! 0

where 01$ current declslon-inaldng unit (OMU)

Linear program solved by SAS for each DM11.

F]

15

Page 36: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

fl

LINKED LINEAR REGRESSION - DEA MODEL

DEA Code S

Part 1: Initialization MICHAEL MCGEE

SENIOR DESIGN GROUP '94

This section reads the data files into DEA ANALYZER

S

a SAS dataset. Using the SAS Data Management Language datasets are created for the right hand side vectors, as well as the theta vectors. The two

FILENAME NOT MFNAME DATAA'; FILENAME DAT F9O DATA A'; OPTIONS LINESIZE72;

%LEI SYMBOLS ACRNX LOMCX HRTVX STCSX BEONX GINIX PARNX VCAGX DEVLXAGBBX FPPTX FDETX FDESX FCND( FBGRX FDVLX SRFCX FRGRX BIRWIX SAMVX FDFFX FAGOX MLSVX ALTFX ANEFX GPAFX PRWAX CGRWI( OVOPX WTMGX NWTX SEOCX BVALX BARANAFGPX FTRNXAMRGX DNLDX HCRPX NBSSX PRGWI( INNDX TWHIX DW1X LMVTX JHNGX SRSPX CFIMX CLMBX BOSSX

rU

datasets are perfectly matched - each DATA TICK; WILE NOT;

right hand side has exactly one ARMY TICKR(so)ss;

INPUT TICKR?);

corresponding theta vector. There is one . MUTUAL FUND TICKER SYMBOLS •1

right hand side and one theta vector for

S

each DMtJ in the study. These vectors are merged with the inputs and outputs to formulate the linear program. The final step in the initialization is the renaming of variables to the proper ticker symbols in order to aid in report interpretation.

DATA ONE; INFILE DAT; ARRAY FUND (50) INPUT _ID_ $

FUND (} _TYPE_ $ _RHS_;

P GENERATE THETACOLUMNS DATA NEW-COL; SET ONE; ARRAY THETA (SO); ARRAY FUND(5O); DOIITO5 IF _N_ = I THEN THETA(I) = I;

ELSE IF _N_ >=2 & _N_ <=4 THEN THETA{I}= -FUND(I);

ELSE THETA(I) 0; END DROP I;

P GENERATE RIGHT HAND SIDE VECTORS DATA RANGE;

ARRAY RH(50); ARRAY FUND(50);

SET ONE; DO I = ITO 5 IF _N_ I THEN RH(I} = MISSING; ELSE IF _N_>4 THEN RH(I)= FUND{I};

ELSE RH(I) 0; END; DROP MISSING I FUNDI-FUND5O;

DATA PARAM; MERGE NEW-COL RANGE;

P RENAMES THE FUNDS TO THE PROPER NASDAQ TICKER SYMBOL / PROC DATASETS; MODIFY ONE; RENAME FUNDI ACRNX

FUNO2 LOMCX

FUNDS HRT\

FUND4 = STCSX FUNDS = BEONX FUND6 = GINLX FUND7= PARNX FUND8 = VCAGA FUNDS = DEVLC FUNDIO= AGBBX FUNDII= FPPT)( FUNDI2= FDETX

iri

Page 37: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

FUNDI3= FDESX FUNDl4 FCN1X FUNDIS= FBGRX FUNDI6= FDVLX FUNDI7= SRFCX FUND118 FRGRX FUNDI9= BRWIX FUND20= SAM'fX FUND2I FDFFX FUND22 FAGOX RJND23= MLSVX FUND24ALTFX FUND25=ANEFX FUND26= GPAFX FUND27= PRWAX FUND28 CGRY( FUND29 QVOPX FUND30= WTMGX FUND3I= NYVTX FUND32 SEcGX FUND33 BVALX FUND34= BARAX FUND35=AFGPX FUND36 FTRNX FUND37 AMRGX FUND38DNLDX FUND39= HCRPX FUND40=NBSSX FUND4I= PRGWX FUND42 INNDX FUND431WHD( FUND44 DWIvX FUND45= LMVD( FUND46= JHNGX FUND47= SRSPX FUND48= CFX FUND49= CLMB)( FUNDSO. BOSSX;

Part 2: DEA Macro 'NAME: DEA

This macro accepts one parameter:DESC: AUTOMATES THE CREATION OF PROBLEM DATASET.

MERGES THE CURRENT THETAAND RHS INTOTHE COEFFICIENT MATRIXAND EXECUTESTHE LP.

the index of the DM1.1 to evaluate. The PAPM:NDMU-THEINDEXNUMBEROFTHEDMUTOBEANALYZED •1

macro merges the input/output dataset %MACRODEA(NDMU); DATA _NULL_;

with the appropriate theta and right ARMY TICKR(So)ss; SErTK;

hand side vectors. This dataset nowCALL SYMPUT ('FUNDX',PUr(TICKR(&NDMU},SS.)); RUN;

contains the LP formulation for SAS to DATA CURU;

analyze. The procedure generatesARRAY THETA{50}; ARRAYRH(5O); h&NDMU;

detailed printed reports for each of the SET PARM; CTHEIA=A TH ErA{I};

Dtv[(Js._RHS_ = RH{I}; KEEP CTHETA_RHS_;

DATA PROBLEM; MERGE ONE CU R_DMU;

TITLEI 'DATA ENVELOPMENT ANALYSIS'; TITLE2 CURRENT OMU= &FUNDX';

/ EXECUTETHE LP '/ PROC LP DATA=PROBLEM POUT &FUNDX;

%MEND DEA;

17

Page 38: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Part 3: Loop Macro PNAME LOOP

In order to iterate the execution of a DESC: ITERATIVELY CALLS THE DEA MACRO PAPM: LIMIT • NUMBER OFDMU5TOBEPROCESSED

SAS procedure without repetitive%WCROLOOLIM

statements a macro was written to loop %DO INDEX I%TO8LAIIT; %DEA(AJNDEX);

around the calls to the DEA macro. The%EN %MENDLOO

loop macro accepts one parameter: the number of DMUs in study.

Part 4: Extract Macro "NAME EXTRACT

In addition to the written reportsDESC: EXTRACTS THE OBJECTIVE VALUES FROM THE

INDIVIDUAL FUND REPORTSANDCREATESA DATASET CONTAINING THE INFORMATION

generated by the DEA macro, datasets were created in memory containing the

-/.MACRO EXTRACT;

P MERGE LP OUTPUT DATASETSAND REMOVE UNIMPORTANT VARIABLES 'I

results of the LPs. This macro combines DATA RESI;

the reports for each of the DMUs andSET &SYMBOLS; KEEP VARVALUE;

extracts the DEA scores into a dataset P EXTRACT OBJECTFIE FUNCTION VALUES /

which is then fed to the forecasting DATARES2;

- -1 1

model.

SET RESI; IF -VAR­ OBJECT'THEN SCORE =_VALUE_;

ELSE DELETE; KEEP SCORE;

P GENERATE LISTING OF MUTUAL FUNDS/ DATA RES3; SET TICK;

ARRAY TICKR{SO} $5;

DO ITO 501, PCHANGE THIS TO 50 IN PRODUCTION RUN 'I NAME TICKR{l); OUTPUT;

END;

KEEP NAME;

P MERGE DATASET INTO FINAL FORM •/ DATA RESULTS; MERGE RES3 RES2;

PROC PRINT DATMRESULTS; %MEND EXTRACT;

P

EXECUTABLE CODE

%LOOP(50); %EXTRACT;

18

Page 39: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

Forcasting Code

1 Mutual Fund Data Files 'I filename datl 'm1s93 data a'; filename daf2 'mfname data a'; filename dat5 'mfsgo data a';

1 DEA Score Data Files 1

filename dat3 'score93 data a'; filename dat4 'score90 data a';

option linesize = 72 nonumber nodate;

/* Read 1993 Mutual Fund Data •/

data one; infile

del I;

array val(50); input name $val{};

proc transpose; / Transpose Matrix: from IODMU to DMU*IO *1

run;

proc datasets; P Assign proper names to lOs 1

modify data 1; rename coil = Beta

co12 = Load co13 = E_Ratio col4=NAV co15 = Cap—Gain co16 = Dividend

_Name—= Fund;

data funds; P Create List of Fund Names 1

infile dat2; input Fund $;

data scr93; / Read 1993 DEA scores from file 1

infile dat3; input dea93;

titlel '1993 DEAAnalysis';

data comp93; set datal; merge data 1 funds scr93;

1 Run All possible regression models 1

proc reg; title2 'All Possible Regression Models'; model dea93=Beta Load E_Ratio NAV Cap_Gain Dividend/

selection=Rsquare;

proc reg; tle2 'Minimum Spec Model';

model dea93 = E_Ratio NAV CAP_GAIN;

1* 1990 Data Analysis /

data info9o; intile dat5; array val(50); input name $ val{};

proc transpose; J* Transpose Matrix: from IODMU to DMUIO I

run;

proc datasets; /* Assign proper names to lOs 1

modify data2; rename coil = Beta

co12 = Load co13 = E_Ratio col4=NAV col5 Cap—Gain co16 = Dividend

_Name—= Fund;

data scr90; infile dat4; input dea9o;

data comp9o; merge data2 funds scr9o;

title 1 '1990 DEA Analysis';

proc reg dala=comp9o; tifle2 'All Possible Regressions'; model dea90= Beta Load E_Ratio NAV Cap —Gain Dividend

/selection = rsquare;

proc reg data=comp90; tie2 'Full Model'; model dea9o= Beta Load E_Ratio NAV Cap —Gain Dividend;

proc reg data=comp90; title2 'Minimum Spec'; model dea9o= E_Ratio NAV CAP_GAIN; plot residual.(predicted. E_Ratio NAV Cap—gain);

1 Full Fit Model 1

proc reg;

P Forecasting Validation Model /

title2'Full Model';

P Create predicted values using forecast model model dea93=Beta Load E_Ratio NAV Cap—Gain Dividend;

compare to actual 1993 values.

19

Page 40: Case Study: A Linked Linear Regression - Data Envelopment ...s2.smu.edu/emis/design/wp-content/uploads/94-02... · Case Study: A Linked Linear Regression - Data Envelopment Analysis

LINKED LINEAR REGRESSION - DEA MODEL

data predict; model Ldea = Beta Load E_Ratio LNAV LCG LDiv; set data 11; plot residual.(predicted.); Exp_dea=-0.1 77220E_Rao + 0.016477NAV +

0. 152344*Cap_Gain+0.279313; proc reg data = linear; if Exp_dea> 1 then Exp_dea = 1; model Idea = Beta E_Rao LNAV LDIV;

data forecast; merge predict funds scr93 scr9o; Err = dea93 - exp_dea; keep fund dea90 exp_dea dea93 err;

title 'Regression Predictions'; title2'Non-Linear Model';

S proc print data=forecast; var fund dea90 dea93 exp_dea err;

proc plot; plot exp_dea dea93;

1 added 4/18/94/ 1 Plot 93 Prediction vs 90 Actual 1 data fcast

merge scr90 forecast; keep exp_dea dea90 dea93 fund;

title 1 'Predicted 93 vs. Actual 1990'; tifle2 'Non-Linear Model'; proc plot data=fcast

plot exp_dea * dea90;

/* Linear transform to improve predictions 1 data linear;

set comp90; Ldea = log(deago); LNAV = log(NAV); if Cap—Gain = 0 then LCG = 0;

else LCG = log(Cap_Gain); LEA = log(E_Ratio); if Dividend = 0 then LDIV = 0;

else LDIV = LOG(Dividend);

/ Transform forecast inputs to linear model */

data trans93; set comp93; Ldea = log(dea93); LNAV log(NAV); if Cap—Gain = 0 then LCG = 0;

else LCG = Iog(Cap_Gain); LEA = log(E_Aatio); if Dividend = 0 then LDIV = 0;

else LDIV LOG(Dividend);

Title 'Linear Forecast Model'; title2 'Model Verification'; proc reg data = linear;

I' Linear Regression Equation 1 data predict2;

set trans93; PDEA = 0.687459Beta - 0.525658E —Ratio + 0.753563LNAV

+ 0.229900LD1V -2.930810; Exp_dea = exp(PDEA); if Exp_dea> 1 then Exp_dea = 1;

1 Compute Forecast Errors 1 data frcs;

merge predict2 funds scr93 scr9o; Err dea93 - exp_dea; keep fund dea90 exp_dea dea93 err;

title Regression Predictions'; title2'Linear Model'; proc print data=frcst2;

var fund dea90 dea93 exp_dea err;

20