Predicting the Stock Market Using Clementine

Eric U. Graubins
Department of Computer Science
Illinois Institute of Technology

Abstract:

The application of KDD (Knowledge Discovery in Databases) techniques to the analysis of stock market data has been validated as an effective method for the identification of high performing stocks. Using a database of firm fundamental information for the years 1988 to 2001 as input, we construct several stock performance prediction models employing SPSS’s Clementine data mining software product.

Exploiting the rich set of data modeling tools available in Clementine, we achieve success rates of over 80%. This paper describes the data, the methodology, and the results. It compares the algorithms used and rates the effectiveness of each. We affirm that it is possible to apply computer analysis to stock market data and arrive at performance predictions. These methods demonstrate an approach that not only has the potential for obtaining outstanding profits from investing, but also shows the value of data mining in extracting knowledge from unstructured data.

Introduction:

The efficacy of performing ex ante stock price computations has always been open to conjecture. Predicting stock prices has been the Holy Grail of market analysis - a crystal ball for the divination of securities prices and the basis for an automaton which produces a flow of cash. Market divination has been an area of interest for as long as securities have been traded. A cursory count of the number of books, seminars, software packages, and newsletters pertaining to this subject attests to this fact.

While the amount of literature in this subject area is indeed copious, the amount of published, serious scientific study in this field is scant. This paper presents such a study. Using a database containing approximately ten years of stock market data, we apply analysis techniques to identify high performing stocks. The methodology and the results are tabulated, and the process is described.

This study utilizes company fundamental data as the basis for research. Broadly, the entire area of stock analysis can be classified into two approaches: fundamental analysis and technical analysis [Hol2000]. This distinction is important and is drawn here. Fundamental analysis uses company accounting data to determine a stock’s value whereas technical analysis uses market stock price fluctuations and actual price history to arrive at a stock’s value.

While the emphasis of these two schools is different, much cross-discipline influence is evident. For example, a company with no profits, such as the high-technology Internet companies of recent memory, finds weak long-range demand for its shares from the investment community. Likewise, a firm may publish admirable fundamentals, yet find that its stock is not in demand, i.e., not in favor. This results in the stock price becoming depressed, which lowers the worth of the firm. This, in turn, makes it more difficult to raise capital for growth and expenses, affecting the welfare of the company.

The remainder of this paper covers current research in the stock analysis field, after which the conducted subject research is presented. This research is divided into data, methodology, and result sections.

Prior Work:

We surveyed recent work on applying knowledge discovery techniques to modeling the U.S. stock market.

As stated earlier, traders and stock market experts may be roughly classified into these two camps. Fundamental analysts base trading decisions on information obtained through the meticulous examination of a firm's financial information. This results in a determination of the fundamental "value" of the stock. If the trading price is lower than the value, the stock is undervalued and there is a strong incentive to buy. Alternatively, if the price is higher than the value, the stock is overpriced and it is desirable to sell it.

Technical traders approach stock prices according to the dictum "Res tantum valet quantum vendi potest", which translates to: a thing is worth as much as it can be sold for. The view is that the purest determinant of value for a stock is its trade price. Technical traders exploit free market forces to execute trades that are profitable. They take advantage of anomalies in the Efficient Market Hypothesis (EMH).

EMH states that markets are efficient if prices fully and instantaneously reflect all available information and no profit opportunities are left unexploited. Even in this wired world, information dissemination is not complete and instantaneous.

Background about the markets is provided by Bass [Bas1999]. A good description of market structure and procedures can be obtained from Dalton [Dal1993].

Bass [Bas1999], in his book, relates the story of a group of physicists who apply chaos theory to effect stock market predictions. The book gives a non-technical overview of the stock market and its various machinations, and argues that the stock market is, to an extent, predictable.

Much of the technical analysis work utilizes chaos theory and complexity theory, while fundamental analysis centers on more traditional data mining approaches applied to static data. Technical analysis is more apt to be applied to a dynamic stream of trade data (OLAP), while fundamental analysis takes place on the underlying firms' accounting data.

Stocks which have a strong fundamental value tend to have more price stability and are subjected to different sorts of analysis. This approach does not take into account the wild, violent price fluctuations which are the hallmarks of a market in flux. Rather, the goal is to single out stocks which can be held for lengthy periods and which can be counted as assets to a portfolio.

The majority of algorithms used to analyze stock performance from a fundamental perspective are more traditional data mining algorithms. These are run against static data, such as a company's quarterly reported financial data. The results usually consist of a ranking of firms whose securities were deemed good investments.

Mandelbrot extends chaos theory to the financial markets [Man1997]. Multifractal processes are derived from a scaling law for stock price time graphs. At each point in a series of finite time intervals, the behavior of the graph is used as an additional input to a model, which ultimately describes the path of the graph.

George H. John, Peter Miller, and Randy Kerber [JMK1996], researchers in the computer science department at Stanford, present a study of the application of a software package named "Recon" to the financial markets. The goal was to identify superior performing stocks using this artificial intelligence based system.

Recon was developed at Lockheed Martin as a software product for discerning patterns in voluminous amounts of data. The team applied Recon to the stock market. They created a database consisting of six years of stock information for 1987-1993. Each tuple had about 100 attributes, containing information such as price-to-earnings ratio, market trend, and market capitalization. Recon was used to identify stocks that were deemed "exceptional". Final examination showed that the stocks selected by Recon had a return of 238% over a 4-year period. By comparison, a team of human experts was able to achieve a return of only 92.5% over the same period.

The previous paper is valuable in that actual empirical performance data is cited. It is unique in that respect, because the vast majority of papers in stock market forecasting and analysis are replete with formulas and theoretical constructs without ever providing proof that the methods perform. This work is detailed in Section 2 and further covered in Section 3.

Five algorithms to identify top performing stocks are given by George T. Albanis and Roy A. Batchelor [AB2000], who describe 21 ways to beat the stock market. A related work by the same author team [AB1999] seems to lay the basis for the later research effort. In their paper, Albanis and Batchelor describe a set of five algorithms that are used to identify outstanding stocks with exceptional returns. The algorithms then "vote" on the projected performance of a stock.

The system was trained on statistical data collected from 700 companies trading on the London Stock Exchange for the period 1993 to 1997. The inputs consisted of company financial information, market economic information, and industry performance. The output was each stock classified as either high or low.

If we draw a sharp division between the fundamental stock analysis school and the technical analysis school, published papers in the former outnumber those in the latter. If a computer model of a trading engine is truly to emulate a human trader, that model needs to utilize a variety of approaches [Dal1993], [Osh1996]. This entails a combination of fundamental analysis and technical analysis [HH1998a]. In short, fundamental analysis is used to identify securities which have the potential for outstanding appreciation, and technical analysis is used to determine when to purchase, when to retain, and when to divest.

Hellstrom and Holmstrom [HH1998a] discuss the melding of fundamental and technical analysis in the implementation of an efficacious trading system. Fundamental data, of course, consists of a firm's financial data and is commonly presented in balance sheets and financial statements.

According to Hellstrom and Holmstrom, and the majority of literature dealing with stock market analysis, the price graph in a Cartesian coordinate system is a time series, where each stock value at a given time is the result of the previous value plus some transform function: p(t) = p(t-1) + f(t).

While simple to express, the equation has a sinister feature: the transform function has zero mean and each of its values is independent of the others in the series. This is what makes stock market predictability such a daunting task. The equations derived from graph fluctuations are complex and are of a higher, non-determinable order. Often, graph properties change, rendering what had been considered a valid equation non-representative of the new market condition.
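
To make the formulation concrete, the following minimal Python sketch (an illustration added for this edition, not code from the original study) simulates a price series in which each value is the previous value plus a zero-mean, independent innovation:

    import random

    def simulate_price_series(p0=100.0, steps=250, sigma=1.0, seed=42):
        """Random-walk model: p(t) = p(t-1) + e(t), where each innovation
        e(t) has zero mean and is independent of the others in the series."""
        random.seed(seed)
        prices = [p0]
        for _ in range(steps):
            prices.append(prices[-1] + random.gauss(0.0, sigma))
        return prices

    print(simulate_price_series()[:5])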

Chen and Tan [CT1996] point out that the measure of complexity for a set of data is the length of the shortest Turing machine program that will generate the data. This measure can be defined, but the problem is not practically computable. For this reason, it is advocated that the Turing approach be replaced by classes of probabilistic models, i.e., stored patterns.

Doyne Farmer, in an article by Kevin Kelly [Kel1994], draws an analogy between catching a baseball and predicting the market. He says that we know how to catch a baseball because we have developed and stored a model in our consciousness that describes how baseballs fly. Although we could calculate the trajectory using Newtonian physics, our brains do not stock up on mechanics equations. The similarity to the stock market is drawn. Hence, we do not need to calculate a stock graph, we just need to recognize the pattern. In logic, such a process is known as induction, in contradistinction to the deduction process that leads to a mathematical formula.

Neural networks are an artificial intelligence construct that mimics the pattern-discriminating ability of the human brain. Kogan [Kog1995] outlines the comparative efficacy of using neural nets in artificial intelligence applications such as medicine, war, genetics, and, lastly, finance. Kogan also cites two other applications of AI-based stock selection systems in industry: the Fidelity StockSelector fund, and LBS Capital.

By contrast, rule induction, the approach used by Recon, is the facile extraction of useful if-then rules from data based on statistical significance. The fundamental approach to rule formulation is through the use of decision trees. A good treatment is given in [HK2001].
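
As an illustration of decision-tree rule induction in this spirit, the following sketch uses scikit-learn's DecisionTreeClassifier rather than Recon itself; the field names and data are hypothetical:

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical training rows: [price_to_earnings, market_cap_millions],
    # labeled by whether the stock was later rated exceptional.
    X = [[8.0, 1200], [35.0, 300], [12.0, 5000], [60.0, 150], [9.5, 800]]
    y = ["exceptional", "ordinary", "exceptional", "ordinary", "exceptional"]

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    # export_text renders the induced if-then rules in readable form.
    print(export_text(tree, feature_names=["price_to_earnings", "market_cap"]))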

Current Research Methodology:

The starting point for the creation of a statistical price prediction system was the gathering of approximately ten years of U.S. company fundamental information. These companies are publicly traded, and the data spanned the period 6/30/88 to 12/31/2001. The data was obtained from 10K annual filings with the U.S. Securities and Exchange Commission. The metadata is in Appendix 1.

Since the number of companies traded on U.S. markets is too unwieldy for a complete study, a selection had to be made to generate a set of firms to operate on. It was decided to use the Standard and Poor's 1300 list as a criterion. This list contains the top 1300 companies traded on U.S. markets and accounts for 87% of U.S. market capitalization.

Standard and Poor's (www.standardandpoors.com) is an independent provider of company rating services. It maintains a number of lists which are held in high regard in the investment community. The S&P 1300 list consists of approximately 1300 companies and is updated periodically.

The company filing data, which was originally in a vertical ASCII text format, was parsed and placed into a horizontal orientation. Each 10K annual filing was reduced to one record and placed in a file. The total number of records was 13726. This file was subsequently loaded into an mSQL relational database on a Unix server.
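
The parsing step might look like the following sketch; the field names and the tab-separated vertical layout are assumptions for illustration, as the original parser is not reproduced in the paper:

    def parse_vertical_filing(lines):
        """Fold a vertical 'FIELD NAME<tab>value' 10K extract into a single
        horizontal record (one dict per annual filing)."""
        record = {}
        for line in lines:
            if not line.strip():
                continue  # skip blank separator lines
            name, _, value = line.partition("\t")
            record[name.strip()] = value.strip()
        return record

    sample = ["TOTAL SALES\t1234.5", "OPERATING PROFIT\t210.7"]
    print(parse_vertical_filing(sample))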

The count of companies by year is given as follows:

Year    Companies
1988        5
1989       23
1990      593
1991      985
1992     1029
1993     1116
1994     1205
1995     1255
1996     1335
1997     1377
1998     1401
1999     1406
2000     1407
2001      493

The smaller number of companies at the extremes of the year range is because of incomplete filing records. Although these incomplete years were not used in the study, they were still included in the database.

It must be emphasized that the S&P 1300 list is not strictly limited to 1300 firms per annum. The numeric moniker is a mere guideline and the actual number of companies on the list may fluctuate by year. This is partially explainable by periodic reevaluations at which time some firms may be added and others deleted. This results in a firm being included in the set even though it was on the actual list for only a part of the year.

A by-product of the selection of this list is that the companies in it tend to be large-cap flagship firms whose stocks are considered value stocks. Value stocks, as opposed to growth stocks, disburse profits to stockholders. Growth stocks, on the other hand, are usually issued by smaller companies which reinvest profits to foster growth. These S&P 1300 listed firms are less dynamic and less susceptible to the vicissitudes of the market. These, then, are the perfect object of study when analyzing fundamental stock information. It is easier to predict the path of a lumbering dinosaur than a fleet gazelle.

The data analysis was performed by SPSS’s (www.spss.com) Clementine enterprise strength data mining package. This product contains an assortment of algorithms and methods for manipulating data. The suite of functions is accessible via a sophisticated graphical user interface which allows the creation of processing streams by dragging and dropping function nodes from a palette. This allows a user to build rather sophisticated pipelines consisting of analysis stages. Connecting Clementine to the mSQL database via a TCP/IP connection made it possible to run trials in a facile manner. The database connection was managed by ODBC.

Of particular interest are the Clementine neural network modeling nodes. We use the Neural Net feature for the primary stage of our stock analysis. This node is sometimes referred to as a multi-layer perceptron. It allows the definition of a number of input fields and the designation of a target output field which the process is trained to generate.
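
Clementine's internals are proprietary, but a comparable multi-layer perceptron can be sketched with scikit-learn; this is an analogy with toy data, not the Train Net node itself:

    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Toy stand-in: rows of fundamental input fields, labels are the
    # target output field the network is trained to generate.
    X = [[100, 10], [500, 80], [50, 2], [700, 120], [80, 5], [600, 90]]
    y = ["M", "H", "M", "H", "M", "H"]

    model = make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(8,),
                                        max_iter=2000, random_state=0))
    model.fit(X, y)
    print(model.predict([[400, 60]]))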

Our ultimate goal was to create a system with which to identify stocks which have higher than average future earning potential. To identify high performing securities we use the “NET EARNINGS PER SHARE” field from the stock database. For each year, we compute the mean of the earnings field. This value is used as the basis for the further classification of firms into “High”, “Moderate”, and “Low” groups. Companies whose earnings were greater than 150% of the computed average were placed into the “H”, or high earning category. These companies had earnings in the top 25% for that year.

Likewise, companies whose earnings were less than 50% of the computed mean earnings were placed into the "L", or low, category. These companies had earnings in the bottom 25% for that year. The remaining unclassified companies were designated M, or moderate earnings, firms. This middle tier of companies had earnings in the 25% to 75% range for that year.
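
The classification rule can be restated as a short pandas sketch; the column names follow Appendix 1, but the code itself is illustrative, not the original implementation:

    import pandas as pd

    def rate_firms(df):
        """Label each firm H/M/L by year: H if NET EARNINGS PER SHARE exceeds
        150% of that year's mean, L if below 50% of the mean, else M."""
        out = df.copy()
        mean = out.groupby("year")["NET EARNINGS PER SHARE"].transform("mean")
        out["rating"] = "M"
        out.loc[out["NET EARNINGS PER SHARE"] > 1.5 * mean, "rating"] = "H"
        out.loc[out["NET EARNINGS PER SHARE"] < 0.5 * mean, "rating"] = "L"
        return out

    df = pd.DataFrame({"year": [1998, 1998, 1998],
                       "NET EARNINGS PER SHARE": [3.2, 1.0, 0.2]})
    print(rate_firms(df)["rating"].tolist())  # ['H', 'M', 'L']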

Our database consisted of company filing tuples with over fifty attributes. Some of these were obtained directly from filing data and others were derived. A culling process was applied to reduce the number of relevant fields for training the neural network down to a manageable number. Generally, we selected fields which had a strong correlation to stock value and return on investment.

The fields selected for training were:

TOTAL SALES
OPERATING PROFIT
PRE-TAX PROFIT
PUBLISHED AFTER TAX PROFIT
EARNED FOR ORDINARY
EBITDA
TOTAL ASSETS
NET CURRENT ASSETS
MARKET VALUE
MARKET TO BOOK VALUE
NET CASH FLOW
Number of Shares, computed as Total Sales / Sales per Share

Refer to Appendix 1 for definitions.

A set of training data consisting of the mentioned fields, for the years 1996, 1997, and 1998, was built. Included with each training data record was the rating for that firm for the following year. For example, the training data for 1998 used the firm rating value for 1999, the 1997 data used the 1998 ratings, and so on.

Subtracted from the three-year training set were records with no rating for the following year. These missing ratings had a variety of explanations, such as: the dissolution or merging of the firm, the firm dropping from the S&P 1300, or the firm's filing being missing. Also removed from the training data were the bottom 25% of performing companies. It was found that excluding these under-performing firms resulted in more accurate predictions.
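
A sketch of the pairing and filtering logic described above (pandas; the frame layout and column names are assumptions consistent with the text):

    import pandas as pd

    def build_training_set(fundamentals, ratings):
        """Join each firm-year's fundamentals with the FOLLOWING year's
        rating, drop records lacking a next-year rating, and exclude the
        bottom-25% ('L' rated) firms, as described in the text."""
        nxt = ratings.copy()
        nxt["year"] = nxt["year"] - 1  # key year t+1 ratings to year t records
        merged = fundamentals.merge(nxt, on=["firm", "year"], how="left")
        merged = merged.dropna(subset=["rating"])  # no rating for next year
        return merged[merged["rating"] != "L"]     # drop under-performers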

The pre-processed data was input to a configured Clementine neural network for the training phase. After a model was constructed, 1999 data was passed through the model and predictions extracted. To reiterate, the ratings for the year 1999 were taken from year 2000. A total of 1333 records were submitted for processing. The results of this process are listed as follows:

Total records input: 1333
Total actual H records: 324
Total actual M records: 1009

Actual H records predicted as H: 262, giving a success rate of 80.9%
Actual H records predicted as M: 62

Actual M records predicted as M: 802, giving a success rate of 79.5%
Actual M records predicted as H: 207

Total correct predictions: 1064, giving a success rate of 79.8%
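
The per-class and overall rates follow directly from the confusion counts; a small arithmetic sketch (values taken from the listing above):

    def success_rates(h_as_h, h_as_m, m_as_m, m_as_h):
        """Per-class and overall success rates from confusion counts."""
        h_total, m_total = h_as_h + h_as_m, m_as_m + m_as_h
        return {"H": h_as_h / h_total,
                "M": m_as_m / m_total,
                "overall": (h_as_h + m_as_m) / (h_total + m_total)}

    print(success_rates(262, 62, 802, 207))
    # H: 0.809, M: 0.795, overall: 0.798 -- matching the rates above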

These results were obtained using one of Clementine’s neural network modeling algorithms. To confirm these results and investigate the suitability of other models, we perform further experiments and glean results.

Another modeling technique in Clementine's repertoire is the Build C5.0 node, a decision tree based rule induction method rather than a neural network. This algorithm functions by splitting the training data on the field that provides the maximum information gain. The previously used algorithm, the "Train Net" node, is the one sometimes referred to as a multi-layer perceptron. It examines individual training records and generates a prediction for each; the weights of each field are adjusted as additional records are read and analyzed.

The same set of data is used as training input to Build C5.0. The results are as follows:

Total records input: 1333
Total actual H records: 324
Total actual M records: 1009

Actual H records predicted as H: 233, giving a success rate of 71.9%
Actual H records predicted as M: 91

Actual M records predicted as M: 853, giving a success rate of 84.5%
Actual M records predicted as H: 156

Total correct predictions: 1086, giving a success rate of 81.5%

The two previous algorithms used for predictions were a neural network and a decision tree method. The next is the logistic regression model, a variation of the linear regression algorithm in which the output field may be symbolic.

Logistic regression, which is also known as nominal regression, classifies training data field values statistically. These values are used to create equations that relate input values to the output field. When the actual data is input, output field probabilities are calculated and the value with the highest probability is output as the predicted value. As in the previous models, we use the same data for training. Then, applying the generated model to the real data, we obtain the following output values:
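
A multinomial logistic regression of this kind can be sketched with scikit-learn; again, this is an analogy to the Clementine node, using toy data:

    from sklearn.linear_model import LogisticRegression

    # Toy stand-in for the fundamentals, with H/M ratings as in the study.
    X = [[100, 10], [500, 80], [50, 2], [700, 120], [80, 5], [600, 90]]
    y = ["M", "H", "M", "H", "M", "H"]

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    # predict_proba yields per-class probabilities; the class with the
    # highest probability is emitted as the predicted value.
    print(clf.predict_proba([[400, 60]]))
    print(clf.predict([[400, 60]]))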

Total records input: 1333
Total actual H records: 324
Total actual M records: 1009

Actual H records predicted as H: 78, giving a success rate of 24.1%
Actual H records predicted as M: 160

Actual M records predicted as M: 800, giving a success rate of 79.3%
Actual M records predicted as H: 34

Unclassifiable records: 261

Total correct predictions: 878, giving a success rate of 65.9%

Whereas the logistic regression model uses a statistical approach to generating predicted values, the "Generated Decision Tree" node uses the input training data to construct a decision tree. This is implemented as a series of decision blocks, which route the actual data to a final, (hopefully) correct predicted value.

The results for the decision tree node are given as follows:

Total records input: 1333
Total actual H records: 324
Total actual M records: 1009

Actual H records predicted as H: 233, giving a success rate of 71.9%
Actual H records predicted as M: 91

Actual M records predicted as M: 853, giving a success rate of 84.5%
Actual M records predicted as H: 156

Total correct predictions: 1086, giving a success rate of 81.5%

Comparing the four models we used, we rank the performance of each:

Model                     Correct Predictions
Build C5.0                81.5%
Generated Decision Tree   81.5%
Train Net                 79.8%
Logistic Regression       65.9%

In conclusion, a scant 1.7 percentage points separate the top three performing models. Given the close finish, Train Net, Build C5.0, and Generated Decision Tree all have utility when predicting the market. The distant fourth place finisher, Logistic Regression, is rated "Not Acceptable".

Using a commercial software package, we have judiciously configured it so that it may be used to predict high-performing stocks with a high degree of accuracy. Using a source of stock fundamental data, an investor may avail himself or herself of this valuable tool to guide investment decisions. Although the stock market is the object of analysis here, the methodology guidelines used in this study may be applied to problems in other areas where knowledge needs to be extracted from unstructured data.

Appendix 1

Data Schema for Statistical Stock Prediction Research project

The following data was obtained from company filings with the SEC (Securities and Exchange Commission).

Fields

01) TOTAL SALES
Sales revenue. The amount of goods and services sold to third parties. Excluded from this figure are taxes and tariffs. Also, this value reflects the sales which are part of the normal activities of the company, and does not include such things as royalty income, rents, etc. This is a good metric indicative of the firm's income, as it is a pre-tax figure.

02) DEPRECIATION
The depreciation in company assets caused by use, wear and tear, and aging of income-producing machinery, such as lathes, trucks, desks, chairs, computers, etc. An expense recorded to reduce the value of a long-term tangible asset. Since it is a non-cash expense, it increases free cash flow while decreasing the amount of a company's reported earnings.

03) DEPRECIATION AND OPERATING PROVISIONS
Value for depreciation and amortization of fixed assets. Also included in this amount are items such as assets sold, bad debt, write-offs, and other operating provisions.

04) OPERATING PROFIT
The profit obtained from core operating activities. The value of this field is exclusive of financial income or expenses, extraordinary provisions, and gains and losses which are outside the principal area of activity for the company.

05) NET INTEREST CHARGES
The net aggregate of interest paid less interest received.

06) EXTRAORDINARY AND SPECIAL ITEMS
A general category for anything that a company feels like placing here. Usually, firms list items that they want to highlight as unusual and that do not correspond to other categories. For example, some firms use this space to list costs of the 9/11 disaster.

07) PRE-TAX PROFIT
Profit or loss before payment of taxes.

08) PUBLISHED AFTER TAX PROFIT
Profit or loss after depletion by taxes, as disclosed by the company.

09) MINORITY INTERESTS (PROFIT BASED)
Portion of the after-tax profit amount which is attributable to outside shareholders in subsidiaries. Not to be confused with item 19.

10) EARNED FOR ORDINARY
Net profit minus tax, disbursements to minority interests, and preference dividends.

Minority interests represent outside but non-controlling ownership of the company. Such payments to outside groups decrease the firm's earnings. Preference dividends are payments to preferred stockholders. These must be paid before common stock disbursements are made.

11) EXTRAORDINARY ITEMS AFTER TAX
The sum of amounts earned from transactions which are outside the scope of the company's normal activities.

12) ORDINARY DIVIDENDS - NET
The total amount proposed as a disbursement for ordinary stock shares.

13) EBIT
Earnings Before Interest and Taxes. This is an indicator of a company's financial performance calculated as:

EBIT = Revenue - Expenses

The "Expenses" variable does not include tax and interest.

Appraisers often use a company's EBIT value to determine its worth. This metric is indicative of the actual cash flow of a firm, which in many cases is much higher than the tax return earnings. This is because many companies are financially operated in such a way as to minimize taxable earnings, thus reducing income tax exposure.

14) EBITDA
Earnings Before Interest, Taxes, Depreciation and Amortization. This is an indicator of a company's financial performance calculated as:

EBITDA = Revenue - Expenses

The "Expenses" variable does not include tax, interest, depreciation, and amortization. This is a valuable metric because it can be used to evaluate core profit trends and compare profitability between companies. It eliminates many of the accounting differences between firms, as well as financing considerations. It is not, however, a measure well suited to predicting cash flow, since EBITDA does not take into account changes in working capital. Cash is of paramount importance since it affects the company's ability to continue operations. The "net cash flow" column is a better indicator of the firm's cash flow.

15) PUBLISHED CASH EARNINGS
Calculated earnings before subtraction of depreciation and amortization. This figure uses the net income value as the basis for calculation.

16) EQUITY CAPITAL AND RESERVES
Stockholders' equity, exclusive of preferred capital, plus a reserve that the company has set aside for the servicing of this stock.

17) PREFERENCE CAPITAL
Capital on which fixed dividends are paid, for example, to holders of preferred stock.

18) TOTAL SHARE CAPITAL AND RESERVES
Total of share capital and reserves, i.e., the equity obtained from issuing shares in return for cash or other consideration.

19) MINORITY INTERESTS (CAPITAL BASED)
The proportion of capital and reserves attributable to outside shareholders in subsidiary companies. Not to be confused with item 09.

20) LONG TERM DEBT
Money owed, or other financial obligations, spanning over a year. Interest is paid on the amount owed.

21) TOTAL CAPITAL EMPLOYED
Sum of all non-current liabilities. This includes such things as long-term loans, deferred taxes, minority interests, and share capital reserves for shareholders.

22) TOTAL FIXED ASSETS - NET
Tangible property, such as machinery, that the company uses in the production of income. Such property is not expected to be sold or otherwise consumed.

23) TOTAL INTANGIBLES
Value includes research and development, goodwill, patents, trademarks, concessions, and similar items.

24) TOTAL STOCK AND WORK IN PROGRESS
This value consists of finished goods in inventory as well as raw material. Note that the "stock" designation in the title does not refer to securities.

25) TRADE DEBTORS
Represents trade receivables due within one year.

26) TOTAL CASH AND EQUIVALENT
This value includes cash, bank balances, and short-term receivables that are included in the "TOTAL CURRENT ASSETS" figure.

27) TOTAL CURRENT ASSETS
This value includes stocks, work in progress, debts owed, inventories, prepaid expenses, cash, and equivalent instruments. Included are items which may be converted to cash.

28) TOTAL ASSETS
Sum of all assets held by the company. Includes tangible and intangible assets and items such as investments, stock, work in progress, cash, IOUs, equipment, hardware, software, notes, and certificates. In short, the sum of all items of value.

29) TRADE CREDITORS
Consists of long-term and short-term trade payables relating to the normal business activities of the firm.

30) BORROWINGS REPAYABLE WITHIN 1 YEAR
Short-term borrowing. This figure represents monies due within one year.

31) TOTAL CURRENT LIABILITIES
Monies owed. This includes debt and other obligations that are payable within one year.

32) NET CURRENT ASSETS
The result of total current assets (item 27) minus total current liabilities (item 31).

33) TOTAL DEBT
The total of all long- and short-term debt.

34) NET DEBT
Calculated as short- and long-term interest-bearing debt minus cash and equivalents. This figure is used to give an overall measure of a company's debt situation because cash is applied against the debt.

35) ENTERPRISE VALUE (EV)
The total market value of issued stock, plus preference capital (item 17) and net debt (item 34) attributable to the issue. Used to value a company which may be a takeover target. Most people use the calculation of market capitalization plus debt and preferred shares, minus cash and its equivalents. This field is often referred to as a company's total market capitalization.

36) MARKET VALUE (MV)
Market value for the firm, computed as the stock price per share of ordinary stock multiplied by the number of shares currently in issue.

37) TOTAL NUMBER OF EMPLOYEES (UNITS)
Average number of employees for the past year.

38) DIVIDENDS PER SHARE
Net dividend per share. Dividends are cash payments, using profits, which are announced by a company's board of directors to be distributed among stockholders. These are usually cash, but may be stock or property. Most secure and stable companies offer dividends to their stockholders. High-growth companies don't offer dividends because all their profits are reinvested to help sustain higher than average growth. Value companies, on the other hand, are more apt to pay dividends.

39) NET EARNINGS PER SHARE
Calculated as:

NEPS = (Net Income - Dividends on Preferred Stock) / Average Outstanding Shares

Net income for a specific period (item 10), divided by the number of outstanding shares. Companies usually use a weighted average number of shares outstanding over the reporting term.

This is the single most popular variable in dictating a share's price - it indicates the profitability of a company.

If the company issues more shares, then EPS is much harder to compare to previous years.

40) PUBLISHED CASH EARNINGS PER SHARE (EPS)
Published earnings per share of ordinary stock. This is the total amount divided by the average number of shares in issue.

41) BOOK VALUE PER SHARE
Computed as:

Book Value per Share = (Stockholders' Equity - Preferred Stock) / Average Outstanding Shares

Somewhat similar to the "Earnings Per Share" figure, but it relates the stockholders' equity to the number of shares, giving the shares a raw value. It is a ratio which can be helpful in determining whether a stock is overpriced or underpriced. It must be emphasized that this metric is merely an accounting value. A more precise measure is the stock's market value. The latter reflects the investment community's expectations, while the former is based on costs and retained earnings. If the market value is less than the book value, it could mean that the stock is underpriced and might be a good buy.

42) MARKET TO BOOK VALUE (MB)
The ratio of the market value per share to the book value per share of a company. The book value is what is shown on a balance sheet. The market value is the value in the free marketplace. For example, for a piece of equipment, the accounting book value is computed as the purchase price minus depreciation. However, if the item is sold, the price obtained does not necessarily reflect the book value.

43) SALES PER SHARE
The total sales for the company divided by the average number of shares in issue.

44) CASH INFLOW - OPERATING ACTIVITIES
A positive or negative value representing cash flow from operating activities.

45) PAYMENTS - FIXED ASSETS
Cash expenditures used for the purchase of fixed assets.

46) CASH OUTFLOW - INVESTING ACTIVITIES
Net cash inflow or outflow arising from investment activities.

47) CASH INFLOW FROM FINANCING
Total cash inflow or outflow as a result of financing.

48) NET CASH FLOW
Changes in net cash before exchange adjustments and after financing expenses or income.

-----------------------------------------------------------------------

Bibliographic Sources

[AACK2000] Monica Adya, J. Scott Armstrong, Fred Collopy, Miles Kennedy, An Application of Rule-based Forecasting to a Situation Lacking Domain Knowledge, 2000. Published in International Journal of Forecasting, 16 (2000), 477-484.

[AB1999] George T. Albanis, Roy A. Batchelor, Combining Heterogeneous Classifiers for Stock Selection, 1999. City University Business School Department of Banking and Finance, Frobisher Crescent, Barbican Centre, London EC2Y 8HB.

[AB2000] George T. Albanis, Roy A. Batchelor, 21 Nonlinear Ways to Beat the Stock Market, 2000. City University Business School Department of Banking and Finance, Frobisher Crescent, Barbican Centre, London EC2Y 8HB.

[AB2000a] Albanis, G. A. and R. A. Batchelor, 2000, Five Classification Algorithms to Predict High Performance Stocks, in C. Dunis (Ed.), Advances in Quantitative Asset Management, Kluwer Academic Publishers, pp. 295-318.

[AB2000b] George T. Albanis, Roy A. Batchelor, Predicting High Performance Stocks Using Dimensionality Reduction Techniques Based on Neural Networks, 2000

[ABDL1999] Torben G. Andersen, Tim Bollerslev, Francis X. Diebold, Paul Labys, The Distribution of Exchange Rate Volatility, The Wharton School, University of Pennsylvania, 1999

[AC1998] J. Scott Armstrong, Fred Collopy, Integration of Statistical Methods and Judgment for Time Series Forecasting: Principles from Empirical Research, 1998

[Ache2001] Steven B. Achelis, Technical Analysis from A to Z, McGraw-Hill, New York, New York, 2001. Initially published in 1995, this text provides an overview, analysis, and mathematical calculations for a number of technical indicators. For practitioners seeking to understand and profit from technical analysis, the book helps readers to recognize trends and charts.

[AHLPT1996] W. Brian Arthur, John H. Holland, Blake LeBaron, Richard Palmer, Paul Taylor, Asset Pricing Under Endogenous Expectation in an Artificial Stock Market, 1996

[AT1999] Linda Alvord, Tama Traberman, When Bears are Blue and Bulls are Red, 1999, Stochastic Services

[Arm2001] J. Scott Armstrong, Principles of Forecasting: A Handbook for Researchers and Practitioners, 2001. Kluwer Academic Publishers.

[BC1990] R.J. Balvers, T.F. Cosimano, B. McDonald, Predicting Stock Returns in an Efficient Market, 1990. From Journal of Finance, 55.

[BC2000] Volker Böhm, Carl Chiarella, Mean Variance Preferences, Expectation Formation, and the Dynamics of Random Asset Prices

[BHY2000] Brynjolfsson, Erik, Lorin M. Hitt and Shinkyu Yang, Intangible Assets: How the Interaction of Computers and Organizational Structure Affects Stock Market Valuations, MIT, 2000

[BI1998] Bernard Bollen, Brett Inder, A General Volatility and the Generalised Historical Volatility Estimator, Department of Econometrics and Business Statistics, Monash University, Clayton, Australia, 1998. Bollen and Inder propose a new approach to the estimation of the geometry of a time series mapping the daily volatility in financial markets. This involves a two-stage approach, with the first phase being the formulation of daily volatility estimates, which are used as a framework for time series graphing of a finer granularity.

[Bas1999] Thomas A. Bass, The Predictors, Henry Holt and Company, New York, New York, 1999.

[Boh2000] Volker Böhm, The Dynamics of Random Economic Models, Workshop Modelli Dinamici in Economia e Finanza, Urbino/ Italia, 2000

[CAW1997] Copeland L., Abhyankar A., Wong W., Uncovering Non-Linear Structure in Real Time Stock Market Indices: The S&P 500, the FTSE 100 and the DAX. Journal of Business and Economic Statistics.

[CFM1997] Laurent Calvet, Adlai Fisher, Benoit Mandelbrot, Large Deviations and the Distribution of Price Changes, 1997. Yale University and IBM T. J. Watson Research Center.

[CLM1999] Shu-Heng Chen, Thomas Lux, Michele Marchesi, Testing for Non-Linear Structure in an Artificial Financial Market, 1999

[COL1996] Tim Chenoweth, Zoran Obradovic, Sauchi Stephen Lee, Embedding Technical Analysis into Neural Network Based Trading Systems, 1996

[CT1992] A.D. Clare, S.H. Thomas, International Evidence for the Predictability of Bond and Stock Returns, 1992. From Economics Letters, Volume 40, Issue 1, 01-September-1992.

[CY1997] Shu-Heng Chen, Chia-Hsuan Yeh, Toward a Computable Approach to the Efficient Market Hypothesis: An Application of Genetic Programming, 1997. From Journal of Economic Dynamics and Control, Volume 21, Issue 6, 06-June-1997.

[Che1996] Ping Chen, A Random Walk or Color Chaos on the Stock Market? - Time-Frequency Analysis of S&P Indexes, 1996

[Con1999] Rama Cont, Statistical properties of financial time series, 1999

[Cor1995] Karen Corcella, Market Prediction Turns the Tables on Random Walk, Wall Street and Technology, Volume 12, No. 13, May 1995. The efficient markets theory is deprecated. The long-held perception that financial markets assimilate information efficiently has been disproved by teams of mathematicians using computer-based analysis techniques. Using powerful software models fed by volumes of trading data, researchers have identified pockets of predictability. Stock prices, it has been determined, do not just exhibit a "random walk", but fluctuate in response to events. Each participant in the market has a frame of reference: a time horizon on which they trade. For example, the time horizon for a market trader oriented toward speculation may be a minute or less, while the time horizon for a large bank may be several years.

[CT1996] Shu-Heng Chen and Ching-Wei Tan, Rissanen's Stochastic Complexity in Financial Markets. Department of Economics, National Chengchi University, Taiwan; from Society of Computational Economics, Second International Conference on Computing in Economics and Finance, Geneva, Switzerland, 26-28 June 1996.

[DA1996] R.T. Daigler, A.J. Aburachis, Biases Associated With International Stock Market Index and Predictability Studies, 1996. From Journal of Multinational Financial Management, Volume 6, Issue 2-3, 01-January-1996.

[DLMRS1998] Gautam Das, King-Ip Li, Heikki Mannila, Gopal Renganathan, Padhraic Smyth, Rule Discovery from a Time Series, 1998. American Association for Artificial Intelligence.

[Dal1993] James M. Dalton, How The Stock Market Works, New York Institute of Finance, 1993. A market professional, the author gives a view of the stock market. A classic work, often cited in literature about the market. Given is a description of how the stock market functions: for example, how trades are made; what the roles of a customer, trader, and broker are; and how settlements are made. It gives no advice, but is, rather, a pure instruction manual for the market.

[DeS1997] G. De Santis, Stock Returns and Volatility in Emerging Financial Markets, 1997, from Journal of International Money and Finance, Volume 16, Issue 4, 01-August-1997.

[Dor1996] Georg Dorffner, Neural Networks for Time Series Processing, 1996

[EB1995] Carl J.G. Evertsz and Kathrin Berkner, Large Deviation and Self-Similarity Analysis of Graphs: DAX Stock Prices, Chaos, Solitons & Fractals, 6, 121-130 (1995).

[EBB1995] C.J.G. Evertsz, K. Berkner, W. Berghorn, A local multiscale characterization of edges applying the wavelet transform, in Proc. Nato A.S.I. Fractal Image Encoding and Analysis, Trondheim (1995)

[EM1992] C.J.G. Evertsz and B.B. Mandelbrot, Multifractal Measures, Appendix B in Chaos and Fractals by H.-O. Peitgen, H. Juergens and D. Saupe, Springer Verlag, New York, 849-881 (1992).

[Ehl2001] John F. Ehlers, Nonlinear Ehlers Filters, 2000, Technical Analysis of Stocks & Commodities, April 2001, Volume 19, Number 4

[Eve1995] C.J.G. Evertsz, Self-Similarity of High-Frequency USD-DEM Exchange Rates. Proc. First Int. Conf. on High-Frequency Data in Finance, and submitted to Fractals (1995).

[Eve1996] C.J.G. Evertsz, Fractal Geometry of Financial Time Series, 1996. Proc. Mandelbrot Symposium Curacao, Eds. C.J.G. Evertsz, H.-O. Peitgen, R.F. Voss, World Scientific, Singapore (1996); also in Fractals.

[FJ2000] J. Doyne Farmer and Shareen Joshi, The Price Dynamics of Common Trading Strategies, 2000.

[FL1999] J. Doyne Farmer, Andrew W. Lo, Frontiers of Finance: Evolution and Efficient Markets, 1999

[FR1999] Sophie Forrester, Renee Ryerson, The Wall Street Journal Guide to Understanding Money and Investing, 1999

[Far1998] J. Doyne Farmer, Market Force, Ecology, and Evolution, Original Version 1994, Update: 1998, Santa Fe Institute, Journal of Economic Behavior and Organization

[Far1999] J. Doyne Farmer, Physicists Attempt to Scale the Ivory Towers of Finance. Santa Fe Institute, 1999

[GAIM2000] Martin Gavrilov, Dragomir Anguelov, Piotr Indyk, Rajeev Motwani, Mining the Stock Market: Which Measure is Best?, Department of Computer Science, Stanford University, KDD, ACM, 2000

[GIP2001] William N. Goetzmann, Roger G. Ibbotson, Liang Peng, A New Historical Database for the NYSE 1815 to 1925: Performance and Predictability, 2001. From Journal of Financial Markets, Volume 4, Issue 1, 01-January-2001. Described are analysis techniques applied to individual stock prices for NYSE stocks over the period 1815 to 1925; the authors use this data to forecast long-term returns.

[Gau1995] Steven Gaukroger, Descartes, An Intellectual Biography, Oxford University Press, 1995

[Gen1998] Ramazan Gençay, The Predictability of Security Returns with Simple Technical Trading Rules, 1998, from Journal of Empirical Finance, Volume 5, Issue 4, 01-October-1998.

[Gle1987] James Gleick, Chaos: Making a New Science, Viking Penguin, 1987. The detection of order in seemingly random events is discussed here. What is random at first glance can have an underlying order. Patterns in this chaos can be extracted by application of fractal analysis techniques.

[Gra1992] Granger, C. W. J., 1992, Forecasting Stock Market Prices: Lessons for Forecasters, International Journal of Forecasting, 8, 3-14.

[Gra2001] J. Orlin Grabbe, Chaos and Fractals in Financial Markets, 2001

[Gul1999] Les Gulko, The Entropic Market Hypothesis, International Journal of Theoretical and Applied Finance, Vol. 2, No. 3 (1999) 293-329

[HH1995] Michael Harries, Kim Horn, Detecting Concept Drift in Financial Time Series Prediction using Symbolic Machine Learning, 1995

[HH1996] Christian Haefke, Christian Helmenstein, Forecasting Stock Market Averages to Enhance Profitable Trading Strategies, 1996

[HH1998a] Thomas Hellstrom, Kenneth Holmstrom, Predicting the Stock Market (1998)

[HH1998b] Thomas Hellstrom, Kenneth Holmstrom, Predictable Patterns in Stock Returns, 1998

[HK2001] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, Academic Press, 2001.

[HKS1999] Rainer Hegger, Holger Kantz, Thomas Schreiber, Practical Implementation of Nonlinear Time Series Methods: The TISEAN Package.

[HKS2000] Rainer Hegger, Holger Kantz, Thomas Schreiber, TISEAN 2.1 (December 2000), Nonlinear Time Series Analysis Software, Institut für Physikalische und Theoretische Chemie, Universität Frankfurt (Main), Max-Planck-Institut für Physik komplexer Systeme, Dresden

[Hol2000] Norman N. Holland, Creativity and the Stock Market, 2000, in PSYART, A Hyperlink Journal for the Psychological Study of the Arts. A psychological perspective on stock prices. Holland explains how the value of a stock is frequently at odds with the price of a stock. He describes the Efficient Market Hypothesis (EMH) and the notion that stock prices truly do a random walk, or a drunkard's walk. Investors make money by guessing what other investors' opinion of the price of a stock will be.

[Iva1998] Per H. Ivarsson, Multivariate Nonlinear Time-Series Prediction of Foreign Exchange Rate Returns Using Genetic Algorithms and Artificial Neural Networks, Institute of Theoretical Physics, Chalmers University of Technology, Gothenborg, Sweden, 1998. Compared is the predictive performance of several algorithms, to wit: univariate linear, multivariate linear, multivariate nonlinear, and univariate nonlinear.

[JB1998] Shareen Joshi, Mark A. Bedau, An Explanation of Generic Behavior in an Evolving Financial Market, 1998

[JMK1996] George H. John, Peter Miller, Randy Kerber, Stock Selection Using Recon, Stanford University

[JPB1999] Shareen Joshi, Jeffrey Parker, Mark A. Bedau, Financial Markets Can Be at Sub-Optimal Equilibria, Santa Fe Institute, 1999

[Joh2001] George John, [email protected], personal communication via email to inquire about details relating to Recon, database schema, data contents, source code availability. March 28, 2001

[KD1999] Mark Kamstra (Simon Fraser University), R. Glen Donaldson (University of British Columbia), The Accuracy of Fundamental Stock Market Price Estimates and a Refinement to the Donaldson-Kamstra Fundamental Estimate. Paper provided by the Society for Computational Economics in its series Computing in Economics and Finance '99, as number 954, 1999.

[KS1997] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, published by Cambridge University Press in the Cambridge Nonlinear Science series.

[KT2000] Catherine Kyrtsou, Michel Terraza, Is It Possible to Study Jointly Chaotic and ARCH Behaviour? Application of a Noisy Mackey-Glass Equation with Heteroskedastic Errors to the Paris Stock Exchange Returns Series, Universitat Pompeu Fabra, 2000.

[Kab1996] M.A. Kaboudan, Nonlinearity and Complexity of Stock Returns, 1996

[Kel1994] Kevin Kelly, Cracking Wall Street. An article that is an excerpt and adaptation from Kevin Kelly's book "Out of Control: The Rise of Neo-Biological Civilization," published by Addison-Wesley in June 1994.

[Ker1992] Kerber, R., ChiMerge: Discretization of Numeric Attributes, in AAAI-92 Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press/MIT Press, pp. 123-128.

[Kog1995] Alexander Kogan, Artificial Intelligence and Finance, Rutgers University, 1995

[Kra2000] Andreas Krause, Microstructure Effects on Daily Return Volatility in Financial Markets, 2000

[Kra2001a] Andreas Krause, An Overview of Asset Pricing Models, University of Bath, School of Management, UK, 2001. Krause draws distinctions between the fundamental value of a security and its price. The concept of price is further dichotomized into a natural price and a market price, the former being a reflection of the fundamental value of a commodity, while the latter is the actual price determined in a transactional scenario. The two may, and quite often do, deviate; ultimately, however, the market price converges on the fundamental value.

[Kra2001b] Andreas Krause, Implicit Collusion in Discounted Stochastic Games with Private State Observations, University of Bath, School of Management, 2001

[Kra2001c] Andreas Krause, The Optimal Control Structure in Dealer Markets, University of Bath, 2001

[Kra2001d] Andreas Krause, Patience and Explicit Collusion in Infinitely Repeated Games, University of Bath, 2001

[LAP1998] Blake LeBaron, W. Brian Arthur, Richard Palmer, Time Series Properties of an Artificial Stock Market, 1999, from Journal of Economic Dynamics and Control, Volume 23, Issue 9-10, 01-September-1999.

[LLS1994] M. Levy, H. Levy and S. Solomon, A microscopic model of the stock market; Cycles, booms and crashes, Economics Letters 45 (1994)

[LLS2000] Haim Levy, Moshe Levy, Sorin Solomon, Microscopic Simulation of Financial Markets, 2000

[MCFR2000] Michele Marchesi, Silvano Cincotti, Sergio Focardi, Marco Raberto, Development and testing of an artificial stock market, 2000

[MS2000] Paul Marriott, Mark Salmon, An Introduction to Differential Geometry in Econometrics, 2000

[MST2000] David McFadzean, Deron Stewart, Leigh Tesfatsion, A Computational Laboratory for Evolutionary Trade Networks, 2000

[MT1999] David McFadzean, Leigh Tesfatsion, A C++ Platform for the Evolution of Trade Networks, 1999.

[Man1963] Benoit B. Mandelbrot, The Variation of Certain Speculative Prices, The Journal of Business of the University of Chicago, 36, 394-419 (1963).

[Man1988] Benoit B. Mandelbrot, The Fractal Geometry of Nature, 1988, W.H. Freeman and Company, New York

[Man1997] Benoit B. Mandelbrot, Fractals and Scaling in Finance, 1997

[McF1998] Daniel McFadden, Rationality for Economists?, 1997. Department of Economics, University of California, Berkeley; Visiting Economist, Santa Fe Institute, Fall 1998. Paper original August 1996 (revised July 1997, September 1998), Journal of Risk and Uncertainty, Special Issue on Preference Elicitation.

[Mod1999] Theodore Modis, Complexity and Competition at the Stock Market, 1999

[Osh1996] James P. O'Shaughnessy, What Works on Wall Street, McGraw-Hill, New York, 1996

[PCFS1979] N.H. Packard, J.P. Crutchfield, J.D. Farmer, and R.S. Shaw, Geometry from a Time Series, Phys. Rev. Lett. 45 (1980) 712-716.

[PJS1992] H.O. Peitgen, H. Juergens and D. Saupe, Chaos and Fractals, Springer Verlag, New York, 849-881 (1992)

[Pre1999] Chris Preist, Commodity Trading Using An Agent-Based Iterated Double Auction, Hewlett Packard Laboratories

[Qi1999] Qi, M., Nonlinear predictability of stock returns using financial and economic variables, 1999, Journal of Business and Economic Statistics, 17, 4, 419-429.

[RW1999] Marcel K. Richter (University of Minnesota), Kam-Chau Wong (Chinese University of Hong Kong), What Can Economists Compute?, 1999.

[SDG2001] Swarm Development Group, SWARM, www.swarm.org

[SFI2000] Artificial Stock Market, Santa Fe Institute 2000, Software System Modeling a Stock Market

[SR1993] Shaun-inn Wu, Ruey-Pyng Lu, Combining Artificial Neural Networks and Statistics for Stock-Market Forecasting, 1993, ACM

[SR2000] Sonia Schulenburg, Peter Ross, An Evolutionary Approach to Modelling the Behaviour of Financial Traders, University of Edinburgh, 2000

[Sch1989] Jose A. Scheinkman, Blake LeBaron, Nonlinear Dynamics and Stock Returns, University of Chicago, Journal of Business, 1989.

[Sch2000] Thomas Schreiber, Nonlinear Time Series Analysis: Measuring Information Transport. Max Planck Institute for the Physics of Complex Systems, Dresden, Germany. Proceedings of the COST P4 workshop "Characterization of Manufacturing Processes: Synergetics and Data Processing Methods", Ljubljana, May 25-26, 2000.

[Sha1989] S. Shaffer, Structural Shifts and the Volatility of Chaotic Markets, 1989. From Journal of Economic Behavior & Organization, Volume 15, Issue 2, 01-March-1991.

[Sha2000] Cosma Shalizi, Why Markets Aren't Rational but Are Efficient, Santa Fe Institute Bulletin, Volume 15, Number 1, Spring 2000. It is proposed that the price of a stock is determined by a combination of luck, greed, and fear, supplemented by testosterone and cocaine. So how can such an improbable methodology deliver a price that's reasonable? That's the question a research team headed by Doyne Farmer attempts to answer. Using time series analysis, they have found that markets tend to return to a state of equilibrium after every trading session. That's because goods, or capital, wind up with those for whom they have the highest utility, i.e., those willing to pay the most. This equilibrium constantly adjusts itself because of new information, causing prices to fluctuate randomly. This random readjustment is termed "efficiency".

[Sin1994] P. Singer, An Integrated Fractional Fourier Transform, Journal of Computational and Applied Mathematics, 54, 221-237 (1994)

[Urb2000] Richard M. A. Urbach, Footprints of Chaos in the Markets, Pearson Education Limited, Edinburgh Gate, 2000, Financial Times - Prentice Hall. A lengthy, in-depth didactic about chaos in the markets. Heavy with higher mathematics and statistics; a companion guide to these fields is required for readers not trained in these disciplines.

[Vos1992] R.F. Voss, 1/f Noise and Fractals in Economic Time Series, 1992. In Fractal Geometry and Computer Graphics, J.L. Encarnacao, H.-O. Peitgen et al. (Eds.), Springer Verlag.

[WG1994] A.S. Weigend, N.A. Gershenfeld (Eds.), Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley, 1994.

[Wil1998] Bill Williams, New Trading Dimensions: How to Profit from Chaos in Stocks, Bonds and Commodities (Wiley Trading Advantage), 1998.

------------------------------------------------------------------------
