# Predicting the Stock Market Using Clementine

Post on 28-Nov-2014


Eric U. Graubins, Department of Computer Science, Illinois Institute of Technology

Abstract: The application of KDD (Knowledge Discovery in Databases) techniques to the analysis of stock market data has been validated as an effective method for the identification of high performing stocks. Using a database of firm fundamental information for the years 1988 to 2001 as input, we construct several stock performance prediction models employing SPSS's Clementine data mining software product. Exploiting the rich set of data modeling tools available in Clementine, we achieve success rates of over 80%. This paper describes the data, the methodology, and the results. It compares the algorithms used and rates the effectiveness of each. We affirm that it is possible to apply computer analysis to stock market data and arrive at performance predictions. These methods demonstrate an approach that not only has the potential for obtaining outstanding profits from investing, but also shows the value of data mining in extracting knowledge from unstructured data.

Introduction: The efficacy of performing ex ante stock price computations has always been open to conjecture. Predicting stock prices has been the Holy Grail of market analysis: a crystal ball for the divination of securities prices and the basis for an automaton which produces a flow of cash. Market divination has been an area of interest for as long as securities have been traded. A cursory count of the number of books, seminars, software packages and newsletters pertaining to this subject attests to this fact. While the amount of literature in this subject area is indeed copious, the amount of published serious scientific study in this field is scant. This paper presents such a study. Using a database containing approximately ten years of stock market data, we apply analysis techniques to identify high performing stocks.
The methodology, as well as the results, is tabulated and the process is described. This study utilizes company fundamental data as the basis for research.

Broadly, the entire area of stock analysis can be classified into two approaches: fundamental analysis and technical analysis [Hol2000]. This distinction is important and is drawn here. Fundamental analysis uses company accounting data to determine a stock's value, whereas technical analysis uses market stock price fluctuations and actual price history to arrive at a stock's value. While the emphasis of these two schools is different, much cross-discipline influence is evident. For example, a company with no profits, such as the high-technology Internet companies of recent memory, finds weak long-range demand for its shares from the investment community. Likewise, a firm may publish admirable fundamentals, yet find that its stock is not in demand, i.e., not in favor. This results in the stock price becoming depressed, which lowers the worth of the firm. This, in turn, makes it more difficult to raise capital for growth and expenses, affecting the welfare of the company.

The remainder of this paper covers current research in the stock analysis field, after which the conducted subject research is presented. This research is divided into data, methodology, and result sections.

Prior Work: We surveyed recent work done on applying knowledge discovery techniques to modeling the U.S. stock market. As stated earlier, traders and stock market experts may be roughly classified into these two camps. Fundamental analysts base trading decisions on information obtained through the meticulous examination of a firm's financial information. This results in a determination of the fundamental value of the stock. If the trading price is lower than the value, the stock is undervalued and there is a strong incentive to buy. Conversely, if the price is higher than the value, the stock is overpriced and it is desirable to sell it.

Technical traders approach stock prices according to the dictum "Res tantum valet quantum vendi potest", which translates to "A thing is worth as much as it can be sold for." The view is that the purest determinant of value for a stock is its trade price. Technical traders exploit free market forces to execute trades that are profitable. They take advantage of anomalies in the Efficient Market Hypothesis (EMH). EMH states that markets are efficient if prices fully and instantaneously reflect all available information and no profit opportunities are left unexploited. Even in this wired world, information dissemination is not complete and instantaneous.
Background about the markets is provided by Bass [Bas1999]. A good description of market structure and procedures can be obtained from Dalton [Dal1993]. Bass [Bas1999], in his book, relates the story of a group of physicists who apply chaos theory to stock market prediction. This book gives a non-technical overview of the stock market and its various machinations and affirms that the stock market is predictable, to an extent.

Much of the technical analysis work utilizes chaos theory and complexity theory, while fundamental analysis is centered on more traditional data mining approaches applied to static data. Technical analysis is more apt to be applied to a dynamic data stream of trade data (OLAP), while fundamental analysis takes place on the underlying firm's accounting data.
Stocks which have a strong fundamental value tend to have more price stability and are subjected to different sorts of analysis. This approach does not take into account the wild, violent price fluctuations which are the hallmarks of a market in flux. Rather, the goal is to identify stocks which can be held for lengthy periods and which can be regarded as assets to a portfolio. The majority of algorithms used to analyze stock performance from a fundamental perspective are more traditional data mining algorithms. These are run against static data, such as a company's quarterly reported financial data. The results usually consist of a ranking of firms whose securities were deemed good investments.

Mandelbrot extends chaos theory to the financial markets. Multifractal processes are derived from a scaling law for stock price time graphs. At each point in a series of finite time intervals, the behavior of the graph is used as an additional input to a model, which ultimately describes the path of the graph.

George H. John, Peter Miller, and Randy Kerber [JMK1996], researchers at the computer science department at Stanford, present a study of the application of a software package named "Recon" to the financial markets. The goal was to identify superior performing stocks using this artificial intelligence based system. Recon was developed at Lockheed Martin as a software product for use in discerning patterns in voluminous amounts of data. The team applied Recon to the stock market. They created a database consisting of six years of stock information for 1987-1993. Each tuple had about 100 attributes, containing information such as price-to-earnings ratio, market trend and market capitalization. Recon was used to identify stocks that were deemed "exceptional". Final examination of the stocks selected by Recon showed that they had a return of 238% over a 4 year period.
By comparison, a team of human experts was only able to achieve a return of 92.5% over the same period. The previous paper is valuable in that actual empirical performance data is cited. It is unique in that respect because the vast majority of papers in stock market forecasting and analysis are replete with formulas and theoretical constructs without ever providing proof that the methods perform. This work is detailed in Section 2, and further covered in Section 3.

Five algorithms to identify top performing stocks are given by George T. Albanis and Roy A. Batchelor [AB2000], who describe 21 ways to beat the stock market. A related work by the same author team, [AB1999], seems to lay the basis for the later research effort. In their paper, Albanis and Batchelor describe a set of five algorithms that are used to identify outstanding stocks with exceptional returns. The algorithms then "vote" as to the projected performance of a stock. The system was trained on statistical data collected from 700 companies trading on the London Stock Exchange for the period 1993 to 1997. The inputs consisted of company financial information, market economic information, and industry performance. The output was a stock classification of either high or low.

If we draw a sharp division between the fundamental stock analysis school and the technical analysis school, published papers in the former outnumber those in the latter. If a computer model of a trading engine is truly to emulate a human trader, that model needs to utilize a variety of approaches [Dal1993], [Osh1996]. This entails a combination of fundamental analysis and technical analysis [HH1998a]. In short, fundamental analysis is used to identify securities which have the potential for outstanding appreciation, and technical analysis is used to determine when to purchase, when to retain, and when to divest.

Hellstrom and Holstrom [HH1998a] discuss the melding of fundamental and technical analysis in the implementation of an efficacious trading system. Fundamental data, of course, consists of a firm's financial data and is commonly presented in balance sheets and financial statements. According to Hellstrom and Holstrom and the majority of literature dealing with stock market analysis, the fluctuations of a stock graph in a Cartesian coordinate system form a time series, where each stock value at a given time is the result of the previous value plus some transform function. While simple to express, the equation has a sinister feature: the value of the transform function has zero mean and each value is independent of the others in the series. This is what makes stock market predictability such a daunting task. The equations derived from graph fluctuations are complex and are of a higher, non-determinable, order. Often, graph properties change, rendering what had been considered a valid equation non-representative of the new market condition.

Chen and Tan [CT1996] point out that the measure of complexity for a set of data is the length of the shortest Turing machine program that will generate the data.
This measure can be defined but the problem is not practically computable. For this reason, it is advocated that the Turing approach be replaced by classes of probabilistic models, i.e., stored patterns.

Doyne Farmer, in an article by Kevin Kelly [Kel1994], draws an analogy between catching a baseball and predicting the market. He says that we know how to catch a baseball because we have developed and stored a model in our consciousness that describes how baseballs fly. Although we could calculate the trajectory using Newtonian physics, our brains do not stock up on mechanics equations. The similarity to the stock market is drawn. Hence, we do not need to calculate a stock graph, we just need to recognize the pattern. In logic, such a process is known as induction, in contradistinction to the deduction process that leads to a mathematical formula.

Neural networks are an artificial intelligence construct that mimics the pattern discriminating ability of the human brain. Kogan [Kog1995] outlines the comparative efficacy of using neural nets in artificial intelligence applications such as medicine, war, genetics, and lastly, finance. Kogan also cites two other applications of AI based stock selection systems in industry: the Fidelity StockSelector fund, and LBS Capital. By contrast, rule induction, the approach used by Recon, is the facile extraction of useful if-then rules from data based on statistical significance. The fundamental approach to rule formulation is through the use of decision trees. A good treatment is given in [HK2001].

Current Research Methodology: The starting point for the creation of a statistical price prediction model was the gathering of approximately ten years of U.S. company fundamental information. These companies are publicly traded and the data spanned the period 6/30/88 to 12/31/2001. The data was obtained from 10-K annual filings with the U.S. Securities and Exchange Commission. The metadata is in Appendix 1.

Since the number of companies traded on U.S. markets is too unwieldy for a complete study, a selection had to be made to generate a set of firms to operate on. It was decided to use the Standard and Poor's 1300 list as a criterion. This list contains the top 1300 companies traded on U.S. markets and accounts for 87% of U.S. market capitalization. Standard and Poor's (www.standardandpoors.com) is an independent provider of company rating services. It maintains a number of lists which are held in high regard in the investment community. The S&P 1300 list consists of approximately 1300 companies and is updated periodically.

The company filing data, which was originally in a vertical ASCII text format, was parsed and placed into a horizontal orientation. Each 10-K annual filing was reduced to one record and placed in a file. The total number of records was 13726. This file was subsequently loaded into an mSQL relational database on a Unix server.
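The vertical-to-horizontal step described above might look roughly like the following Python sketch. The paper does not give the layout of the original ASCII files, so the `NAME=VALUE` line format, the blank line between filings, and the function name `pivot_filings` are all assumptions made purely for illustration.

```python
def pivot_filings(text):
    """Turn blocks of one-field-per-line text into one dict per filing.

    Assumed (hypothetical) input layout: one 'NAME=VALUE' pair per line,
    with a blank line separating consecutive filings.
    """
    records = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the filing collected so far.
            if current:
                records.append(current)
                current = {}
            continue
        name, _, value = line.partition("=")
        current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


sample = """COMPANY=ACME
YEAR=1998
TOTAL SALES=100

COMPANY=BETA
YEAR=1998
TOTAL SALES=250
"""

for record in pivot_filings(sample):
    print(record)
```

Each resulting dictionary corresponds to one horizontal record, which could then be written out as a single row for loading into a relational table.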
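The count of companies by year is given as follows:

| Year | Companies |
| --- | --- |
| 1988 | 5 |
| 1989 | 23 |
| 1990 | 593 |
| 1991 | 985 |
| 1992 | 1029 |
| 1993 | 1116 |
| 1994 | 1205 |
| 1995 | 1255 |
| 1996 | 1335 |
| 1997 | 1377 |
| 1998 | 1401 |
| 1999 | 1406 |
| 2000 | 1407 |
| 2001 | 493 |
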
The smaller number of companies at the extremes of the year range is because of incomplete filing records. Although these incomplete years were not used in the study, they were still included in the database. It must be emphasized that the S&P 1300 list is not strictly limited to 1300 firms per annum. The numeric moniker is a mere guideline and the actual number of companies on the list may fluctuate by year. This is partially explainable by periodic reevaluations, at which time some firms may be added and others deleted. This results in a firm being included in the set even though it was on the actual list for only a part of the year.

A by-product of the selection of this list is that the companies in it tend to be large-cap flagship firms whose stocks are considered value stocks. Value stocks, as opposed to growth stocks, disburse profits to stockholders. Growth stocks, on the other hand, are usually issued by smaller companies which reinvest profits to foster growth. These S&P 1300 listed firms are less dynamic and less susceptible to the vicissitudes of the market. These, then, are the perfect object of study when analyzing fundamental stock information. It is easier to predict the path of a lumbering dinosaur than a fleet gazelle.

The data analysis was performed with SPSS's (www.spss.com) Clementine, an enterprise-strength data mining package. This product contains an assortment of algorithms and methods for manipulating data. The suite of functions is accessible via a graphical user interface which allows the creation of processing streams by dragging and dropping function nodes from a palette. This allows a user to build rather sophisticated pipelines consisting of analysis stages. Connecting Clementine to the mSQL database via a TCP/IP connection made it possible to run trials in a facile manner. The database connection was managed by ODBC. Of particular interest are the Clementine neural network modeling nodes.
We use the Neural Net feature for the primary stage of our stock analysis. This node is sometimes referred to as a multi-layer perceptron. It allows the definition of a number of input fields and the designation of a target output field which the process is trained to generate. Our ultimate goal was to create a system with which to identify stocks which have higher than average future earning potential.

To identify high performing securities we use the NET EARNINGS PER SHARE field from the stock database. For each year, we compute the mean of the earnings field. This value is used as the basis for the further classification of firms into High, Moderate, and Low groups. Companies whose earnings were greater than 150% of the computed average were placed into the H, or high earning, category. These companies had earnings in the top 25% for that year. Likewise, companies whose earnings were less than 50% of the computed mean earnings were placed into the L, or low, category. These companies had earnings in the bottom 25% for that year. The remaining unclassified companies were designated M, or moderate earnings, firms. This middle tier of companies had earnings in the 25% to 75% range for that year.

Our database consisted of company filing tuples with over fifty attributes. Some of these were obtained directly from filing data and others were derived. A culling process was applied to reduce the number of relevant fields for training the neural network down to a manageable number. Generally, we selected fields which had a strong correlation to stock value and return on investment. The fields selected for training were:

- TOTAL SALES
- OPERATING PROFIT
- PRE-TAX PROFIT
- PUBLISHED AFTER TAX PROFIT
- EARNED FOR ORDINARY
- EBITDA
- TOTAL ASSETS
- NET CURRENT ASSETS
- MARKET VALUE
- MARKET TO BOOK VALUE
- NET CASH FLOW
- Number of Shares, computed as Total Sales / Sales per Share

Refer to Appendix 1 for definitions.

A set of training data consisting of the mentioned fields for the years 1996, 1997, and 1998 was built. Included with each training data record was the rating for that firm for the following year. For example, the training data for 1998 used the firm rating value for 1999. The 1997 data used the 1998 ratings, and so on. Removed from the three year training set were records with no rating for the following year. These missing ratings had a variety of explanations, such as the dissolution or merging of the firm, the firm dropping from the S&P 1300, or the firm's filing being missing. Also removed from the training data were the bottom 25% performing companies. It was found that excluding these under-performing firms resulted in more accurate predictions.

The pre-processed data was input to a configured Clementine neural network for the training phase. After a model was constructed, 1999 data was passed through the model and predictions extracted. To reiterate, the ratings for the year 1999 were taken from year 2000. A total of 1333 records were submitted for processing.
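The rating and labeling steps described above might be sketched as follows. This is a minimal Python illustration, not the Clementine stream used in the study. The record layout (`company`, `year`, `eps` keys) and the function names are assumptions, the sketch assumes each yearly mean is positive, and it reads "bottom 25%" as firms whose following-year rating is L.

```python
from collections import defaultdict
from statistics import mean


def rate_by_year(records):
    """Assign H, M or L to each (company, year) from that year's mean EPS."""
    eps_by_year = defaultdict(list)
    for r in records:
        eps_by_year[r["year"]].append(r["eps"])
    yearly_mean = {year: mean(values) for year, values in eps_by_year.items()}

    ratings = {}
    for r in records:
        avg = yearly_mean[r["year"]]
        if r["eps"] > 1.5 * avg:      # above 150% of the yearly mean
            rating = "H"
        elif r["eps"] < 0.5 * avg:    # below 50% of the yearly mean
            rating = "L"
        else:
            rating = "M"
        ratings[(r["company"], r["year"])] = rating
    return ratings


def build_training_set(records, ratings, years=(1996, 1997, 1998)):
    """Attach the following year's rating as the target of each record."""
    rows = []
    for r in records:
        if r["year"] not in years:
            continue
        target = ratings.get((r["company"], r["year"] + 1))
        # Drop records with no rating for the following year, and
        # (assumption) those whose following-year rating is L.
        if target is None or target == "L":
            continue
        rows.append({**r, "target": target})
    return rows


records = [
    {"company": "A", "year": 1998, "eps": 1.0},
    {"company": "B", "year": 1998, "eps": 2.0},
    {"company": "C", "year": 1998, "eps": 3.0},
    {"company": "A", "year": 1999, "eps": 4.0},
    {"company": "B", "year": 1999, "eps": 2.0},
    {"company": "C", "year": 1999, "eps": 0.0},
]
ratings = rate_by_year(records)
for row in build_training_set(records, ratings, years=(1998,)):
    print(row["company"], row["year"], row["target"])
```

In this toy data the 1998 records for A and B are kept with their 1999 ratings as targets, while C is dropped because its 1999 rating is L.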
The results of this process are listed as follows:

| Measure | Records | Success rate |
| --- | --- | --- |
| Total records input | 1333 | |
| Total actual H records | 324 | |
| Total actual M records | 1009 | |
| Actual H records predicted as H | 262 | 80.9% |
| Actual H records predicted as M | 62 | |
| Actual M records predicted as M | 802 | 79.5% |
| Actual M records predicted as H | 207 | |
| Total correct predictions | 1064 | 79.8% |

These results were obtained using one of Clementine's neural network modeling algorithms. To confirm these results and investigate the suitability of other models, we perform further experiments.

Another modeling technique in Clementine's repertoire is the Build C5.0 node, a decision tree (rule induction) algorithm. This algorithm functions by splitting the training data based on the fields that provide the maximum information gain. The previously used algorithm, the Train Net node, is sometimes referred to as a multi-layer perceptron. It examines individual training records and generates a prediction for each. The weights of each field are adjusted as additional records are read and analyzed. The same set of data is used as training input to Build C5.0. The results are as follows:

| Measure | Records | Success rate |
| --- | --- | --- |
| Total records input | 1333 | |
| Total actual H records | 324 | |
| Total actual M records | 1009 | |
| Actual H records predicted as H | 233 | 71.9% |
| Actual H records predicted as M | 91 | |
| Actual M records predicted as M | 853 | 84.5% |
| Actual M records predicted as H | 156 | |
| Total correct predictions | 1086 | 81.5% |

The two previous algorithms used for predictions were a neural network and a decision tree learner. The next one is the logistic regression model. This is a variation of the linear regression algorithm, the difference being that the output field may be symbolic. Logistic regression, which is also known as nominal regression, classifies training data field values statistically. These values are used to create equations that relate input values to the output field. When the actual data is input, output field probabilities are calculated and the value with the highest probability is output as the predicted value. As in the previous models, we use the same data for training.
Then, applying the generated model to the real data, we obtain the following output values:

| Measure | Records | Success rate |
| --- | --- | --- |
| Total records input | 1333 | |
| Total actual H records | 324 | |
| Total actual M records | 1009 | |
| Actual H records predicted as H | 78 | 24.1% |
| Actual H records predicted as M | 160 | |
| Actual M records predicted as M | 800 | 79.3% |
| Actual M records predicted as H | 34 | |
| Unclassifiable records | 261 | |
| Total correct predictions | 878 | 65.9% |

Whereas the logistic regression model uses a statistical approach to generating predicted values, the Generated Decision Tree node uses input training data to construct a decision tree. The tree is implemented as a series of decision blocks, which route the actual data to a final, (hopefully) correct predicted value. The result for the decision tree node is given as follows:

| Measure | Records | Success rate |
| --- | --- | --- |
| Total records input | 1333 | |
| Total actual H records | 324 | |
| Total actual M records | 1009 | |
| Actual H records predicted as H | 233 | 71.9% |
| Actual H records predicted as M | 91 | |
| Actual M records predicted as M | 853 | 84.5% |
| Actual M records predicted as H | 156 | |
| Total correct predictions | 1086 | 81.5% |

We rank the four models by their overall rate of correct predictions:

| Model | Correct predictions |
| --- | --- |
| Build C5.0 | 81.5% |
| Generated Decision Tree | 81.5% |
| Train Net | 79.8% |
| Logistic Regression | 65.9% |

In conclusion, a scant 1.7 percentage points separate the top three performing models. Given the close finish, Train Net, Build C5.0, and Generated Decision Tree all have utility when predicting the market. The distant fourth place finisher, Logistic Regression, is rated Not Acceptable.

Using a commercially available software package, we have judiciously configured it so that it may be used to predict high-performing stocks with a high degree of probability. Using a source of stock fundamental data, an investor may avail him or herself of this valuable tool to guide investment decisions. Although the stock market is the object of analysis here, the methodology guidelines used in this study may be applied to problems in other areas where knowledge needs to be extracted from unstructured data.
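As a cross-check of the arithmetic behind the reported success rates, the short Python sketch below recomputes the per-class and overall rates from the H-as-H and M-as-M counts given for each model. The function name and dictionary layout are illustrative assumptions; only the counts come from the text.

```python
TOTAL_H = 324    # actual H records in the 1999 test data
TOTAL_M = 1009   # actual M records in the 1999 test data


def success_rates(h_as_h, m_as_m, total_h=TOTAL_H, total_m=TOTAL_M):
    """Return per-class and overall success rates as percentages."""
    correct = h_as_h + m_as_m
    return {
        "H": 100.0 * h_as_h / total_h,
        "M": 100.0 * m_as_m / total_m,
        "overall": 100.0 * correct / (total_h + total_m),
        "correct": correct,
    }


# (actual H predicted as H, actual M predicted as M) per model
reported = {
    "Train Net": (262, 802),
    "Build C5.0": (233, 853),
    "Logistic Regression": (78, 800),
    "Generated Decision Tree": (233, 853),
}

for name, (h_as_h, m_as_m) in reported.items():
    r = success_rates(h_as_h, m_as_m)
    print(f"{name}: H {r['H']:.1f}%, M {r['M']:.1f}%, "
          f"overall {r['overall']:.1f}% ({r['correct']} correct)")
```

The overall rate is simply the sum of the two correctly predicted counts divided by the 1333 input records, so unclassifiable records count against the model.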
Appendix 1: Data Schema for Statistical Stock Prediction Research Project

The following data was obtained from company filings with the SEC (Securities and Exchange Commission).

Fields

01) TOTAL SALES: Sales revenue. The amount of goods and services sold to third parties. Excluded from this figure are taxes and tariffs. Also, this value reflects the sales which are part of the normal activities of the company, and does not include such things as royalty income, rents, etc. This is a good metric indicative of the firm's income, as it is a pre-tax figure.

02) DEPRECIATION: The depreciation in company assets caused by use, wear and tear, and aging of income producing machinery, such as lathes, trucks, desks, chairs, computers, etc. An expense recorded to reduce the value of a long-term tangible asset. Since it is a non-cash expense, it increases free cash flow while decreasing the amount of a company's reported earnings.

03) DEPRECIATION AND OPERATING PROVISIONS: Value for depreciation and amortization of fixed assets. Also included in this amount are items such as assets sold, bad debt, write-offs, and other operating provisions.

04) OPERATING PROFIT: This is the profit obtained from core operating activities. The value of this field is exclusive of financial income or expenses, extraordinary provisions, and gains and losses which are outside the principal area of activity for the company.

05) NET INTEREST CHARGES: The net aggregate of interest paid less interest received.

06) EXTRAORDINARY AND SPECIAL ITEMS: A general category for anything that a company feels like placing here. Usually, firms list things that they want to highlight as unusual and that do not correspond to other categories. For example, some firms use this space to list costs of the 9/11 disaster.

07) PRE-TAX PROFIT: Profit or loss before payment of taxes.

08) PUBLISHED AFTER TAX PROFIT: Profit or loss after depletion by taxes, as disclosed by the company.

09) MINORITY INTERESTS (PROFIT BASED): Portion of the after tax profit amount which is attributable to outside shareholders in subsidiaries. Not to be confused with item 19.

10) EARNED FOR ORDINARY: Net profit minus tax, disbursements to minority interests, and preference dividends. Minority interests represent outside but non-controlling ownership of the company. Such payments to outside groups decrease the firm's earnings. Preference dividends are payments to preferred stock holders. These must be paid before common stock disbursements are made.

11) EXTRAORDINARY ITEMS AFTER TAX: The sum of amounts which are income as a result of transactions outside the scope of the company's normal activities.

12) ORDINARY DIVIDENDS - NET: The total amount proposed as a disbursement for ordinary stock shares.

13) EBIT: Earnings Before Interest and Taxes. This is an indicator of a company's financial performance calculated as:

$$\text{EBIT} = \text{Revenue} - \text{Expenses}$$

The Expenses variable does not include tax and interest. Appraisers often use a company's EBIT value to determine the company's worth. This metric is indicative of the actual cash flow of a firm, which in many cases is much higher than the tax return earnings. This is because many companies are financially operated in such a way as to minimize taxable earnings, thus reducing income tax exposure.

14) EBITDA: Earnings Before Interest, Taxes, Depreciation and Amortization. This is an indicator of a company's financial performance calculated as:

$$\text{EBITDA} = \text{Revenue} - \text{Expenses}$$

The Expenses variable does not include tax, interest, depreciation, and amortization. This is a valuable metric because it can be used to evaluate core profit trends and compare profitability between companies. It eliminates many of the accounting differences between firms and also financing considerations. It is not, however, a measure well suited to predicting cash flow, since EBITDA does not take into account changes in working capital. Cash is of paramount importance since it affects the company's ability to continue operations. The net cash flow column is a better indicator of the firm's cash flow.

15) PUBLISHED CASH EARNINGS: Calculated earnings before subtraction of depreciation and amortization. This figure uses the net income value as the basis for calculation.

16) EQUITY CAPITAL AND RESERVES: Stockholders' equity, exclusive of preferred capital, plus a reserve that the company has set aside for the servicing of this stock.

17) PREFERENCE CAPITAL: Capital on which fixed dividends are paid to, for example, holders of preferred stock.

18) TOTAL SHARE CAPITAL AND RESERVES: Total share capital, i.e., the equity obtained from issuing shares in return for cash or other consideration.

19) MINORITY INTERESTS (CAPITAL BASED): The proportion of capital and reserves attributable to outside shareholders in subsidiary companies. Not to be confused with item 09.

20) LONG TERM DEBT: Money owed, or other financial obligations, spanning over a year. Interest is paid on the amount owed.

21) TOTAL CAPITAL EMPLOYED: Sum of all non-current liabilities. This includes such things as long-term loans, deferred taxes, minority interests, and share capital reserve for shareholders.

22) TOTAL FIXED ASSETS - NET: Tangible property, such as machinery, that the company uses in the production of income. Such property is not expected to be sold or otherwise consumed.

23) TOTAL INTANGIBLES: Value includes research and development, goodwill, patents, trademarks, concessions, and similar items.

24) TOTAL STOCK AND WORK IN PROGRESS: This value consists of finished goods in inventory as well as raw material. Note that the stock designation in the title does not refer to securities.

25) TRADE DEBTORS: Represents trade receivables due within one year.

26) TOTAL CASH AND EQUIVALENT: This value includes cash, bank balances, and short-term receivables that are included in the TOTAL CURRENT ASSETS figure.

27) TOTAL CURRENT ASSETS: This value includes stocks, work in progress, debts owed, inventories, prepaid expenses, cash and equivalent instruments. Included are items which may be converted to cash.

28) TOTAL ASSETS: Sum of all assets held by the company. Includes tangible and intangible assets and items such as investments, stock, work in progress, cash, IOUs, equipment, hardware, software, notes and certificates. In short, the sum of all items of value.

29) TRADE CREDITORS: Consists of long-term and short-term trade payables relating to the normal business activities of the firm.

30) BORROWINGS REPAYABLE WITHIN 1 YEAR: Short-term borrowing. This figure represents monies due within one year.

31) TOTAL CURRENT LIABILITIES: Monies owed. This includes debt and other obligations that are payable within one year.

32) NET CURRENT ASSETS: The result of total current assets (item 27) minus total current liabilities (item 31).

33) TOTAL DEBT: The total of all long and short term debt.

34) NET DEBT: Calculated as short and long term interest bearing debt minus cash and equivalents. This figure is used to give an overall measure of a company's debt situation because cash is applied against the debt.

35) ENTERPRISE VALUE (EV): The total market value of issued stock, plus preference capital (item 17) and net debt (item 34) attributable to the issue. Used to value a company that may be a takeover target. Most people use the calculation of market capitalization plus debt and preferred shares minus cash and its equivalents. This field is often referred to as a company's total market capitalization.

36) MARKET VALUE (MV): Market value for the firm, computed as the stock price per share of ordinary stock multiplied by the number of shares currently in issue.

37) TOTAL NUMBER OF EMPLOYEES (UNITS): Average number of employees for the past year.

38) DIVIDENDS PER SHARE: Net dividend per share. Dividends are payments, made from profits, which are announced by a company's board of directors to be distributed among stockholders. These are usually cash, but may be stock or property. Most secure and stable companies offer dividends to their stockholders. High growth companies don't offer dividends because all their profits are reinvested to help sustain higher than average growth. Value companies, on the other hand, are more apt to pay dividends.

39) NET EARNING PER SHARE: Calculated as:

$$\text{NEPS} = \frac{\text{Net Income} - \text{Dividends on Preferred Stock}}{\text{Average Outstanding Shares}}$$

Net income for a specific period (item 10), divided by the number of outstanding shares. Companies usually use a weighted average number of shares outstanding over the reporting term. This is the single most popular variable in dictating a share's price; it indicates the profitability of a company. If the company issues more shares, then EPS is much harder to compare to previous years.

40) PUBLISHED CASH EARNINGS PER SHARE (EPS): Published earnings per share of ordinary stock. This is the total amount divided by the average number of shares in issue.

41) BOOK VALUE PER SHARE: Computed as:

$$\text{Book Value per Share} = \frac{\text{Stockholders' Equity} - \text{Preferred Stock}}{\text{Average Outstanding Shares}}$$

Somewhat similar to the Earnings Per Share figure, but it relates the stockholders' equity to the number of shares, giving the shares a raw value. It is a ratio which can be helpful in determining whether a stock is overpriced or underpriced. It must be emphasized that this metric is merely an accounting value. A more precise measure is the stock's market value. The latter reflects the investment community's expectations, whereas the former is based on costs and retained earnings. If the market value is less than the book value, it could mean that the stock is underpriced and might be a good buy.

42) MARKET TO BOOK VALUE: Market-to-book value (MB). The ratio of the market value per share to the book value per share of a company. The book value is what is shown on a balance sheet. The market value is the value in the free market place. For example, for a piece of equipment, the accounting book value is computed as the purchase price minus depreciation. However, if the item is sold, the price obtained does not necessarily reflect the book value.

43) SALES PER SHARE: The total sales for the company divided by the average number of shares in issue.

44) CASH INFLOW - OPERATING ACTIVITIES: A positive or negative value representing cash flow from operating activities.
45) PAYMENTS - FIXED ASSETS
Cash expenditures used for the purchase of fixed assets.
46) CASH OUTFLOW - INVESTING ACTIVITIES
Net cash inflow or outflow arising from investment activities.
47) CASH INFLOW FROM FINANCING
Total cash inflow or outflow as a result of financing.
48) NET CASH FLOW
Changes in net cash before exchange adjustments and after financing expenses or income.
-----------------------------------------------------------------------
Bibliographic Sources
[AACK2000] Monica Adya, J. Scott Armstrong, Fred Collopy, Miles Kennedy, An Application of Rule-based Forecasting to a Situation Lacking Domain Knowledge, 2000, published in International Journal of Forecasting 16 (2000), 477-484
[AB1999] George T. Albanis, Roy A. Batchelor, Combining Heterogeneous Classifiers for Stock Selection, 1999, City University Business School, Department of Banking and Finance, Frobisher Crescent, Barbican Centre, London EC2Y 8HB
[AB2000] George T. Albanis, Roy A. Batchelor, 21 Nonlinear Ways to Beat the Stock Market, 2000, City University Business School, Department of Banking and Finance, Frobisher Crescent, Barbican Centre, London EC2Y 8HB
[AB2000a] Albanis, G. A. and R. A. Batchelor, 2000, Five Classification Algorithms to Predict High Performance Stocks, in C. Dunis (Ed.), Advances in Quantitative Asset Management, Kluwer Academic Publishers, pp. 295-318.
[AB2000b] George T. Albanis, Roy A. Batchelor, Predicting High Performance Stocks Using Dimensionality Reduction Techniques Based on Neural Networks, 2000
[ABDL1999] Torben G. Andersen, Tim Bollerslev, Francis X. Diebold, Paul Labys, The Distribution of Exchange Rate Volatility, The Wharton School, University of Pennsylvania, 1999
[AC1998] J. Scott Armstrong, Fred Collopy, Integration of Statistical Methods and Judgment for Time Series Forecasting: Principles from Empirical Research, 1998
[Ache2001] Steven B. Achelis, Technical Analysis from A to Z, McGraw-Hill, New York, New York, 2001.
Initially published in 1995, this text provides an overview, analysis and mathematical calculations for a number of technical indicators.
For practitioners seeking to understand and profit from technical analysis, the book helps readers to recognize trends and charts.
[AHLPT1996] W. Brian Arthur, John H. Holland, Blake LeBaron, Richard Palmer, Paul Taylor, Asset Pricing Under Endogenous Expectation in an Artificial Stock Market, 1996
[AT1999] Linda Alvord, Tama Traberman, When Bears are Blue and Bulls are Red, 1999, Stochastic Services
[Arm2001] J. Scott Armstrong, Principles of Forecasting: A Handbook for Researchers and Practitioners, 2001, Kluwer Academic Publishers
[BC1990] R.J. Balvers, T.F. Cosimano, B. McDonald, Predicting Stock Returns in an Efficient Market, 1990, from Journal of Finance 55
[BC2000] Volker Böhm, Carl Chiarella, Mean Variance Preferences, Expectation Formation, and the Dynamics of Random Asset Prices
[BHY2000] Brynjolfsson, Erik, Lorin M. Hitt and Shinkyu Yang, Intangible Assets: How the Interaction of Computers and Organizational Structure Affects Stock Market Valuations, MIT, 2000
[BI1998] Bernard Bollen, Brett Inder, A General Volatility and the Generalised Historical Volatility Estimator, Department of Econometrics and Business Statistics, Monash University, Clayton, Australia, 1998
Bollen and Inder propose a new approach to the estimation of the geometry of a time series mapping the daily volatility in financial markets. This involves a two-stage approach, with the first phase being the formulation of daily volatility estimates, which are used as a framework for time series graphing of a finer granularity.
[Bas1999] Thomas A.
Bass, The Predictors, Henry Holt and Company, New York, New York, 1999
[Boh2000] Volker Böhm, The Dynamics of Random Economic Models, Workshop Modelli Dinamici in Economia e Finanza, Urbino, Italia, 2000
[CAW1997] Copeland L, Abhyankar A, Wong W, Uncovering Non-Linear Structure in Real Time Stock Market Indices: The S&P 500, the FTSE 100 and the DAX, Journal of Business and Economic Statistics
[CFM1997] Laurent Calvet, Adlai Fisher, Benoit Mandelbrot, Large Deviations and the Distribution of Price Changes, 1997, Yale University and IBM T. J. Watson Research Center
[CLM1999] Shu-Heng Chen, Thomas Lux, Michele Marchesi, Testing for Non-Linear Structure in an Artificial Financial Market, 1999
[COL1996] Tim Chenoweth, Zoran Obradovic, Sauchi Stephen Lee, Embedding Technical Analysis into Neural Network Based Trading Systems, 1996
[CT1992] A.D. Clare, S.H. Thomas, International Evidence for the Predictability of Bond and Stock Returns, 1992, from Economics Letters, Volume 40, Issue 1, 01-September-1992
[CY1997] Shu-Heng Chen, Chia-Hsuan Yeh, Toward a Computable Approach to the Efficient Market Hypothesis: An application of Genetic Programming, 1997, from Journal of Economic Dynamics and Control, Volume 21, Issue 6, 06-June-1997
[Che1996] Ping Chen, A Random Walk or Color Chaos on the Stock Market? - Time-Frequency Analysis of S&P Indexes, 1996
[Con1999] Rama Cont, Statistical properties of financial time series, 1999
[Cor1995] Karen Corcella, Market Prediction Turns the Tables on Random Walk, Wall Street and Technology, Volume 12, No. 13, May 1995
The efficient markets theory is deprecated. The long-held perception that financial markets assimilate information efficiently has been disproved by teams of mathematicians using computer-based analysis techniques. Using powerful software models fed by volumes of trading data, they have identified pockets of predictability. Stock prices, it has been determined, do not just exhibit a "random walk", but fluctuate in response to events. Each participant in the market has a frame of reference: the time horizon on which they trade. For example, the time horizon for a market trader oriented toward speculation may be a minute or less, while the time horizon for a large bank may be several years.
[CT1996] Shu-Heng Chen and Ching-Wei Tan, Rissanen's Stochastic Complexity in Financial Markets, Department of Economics, National Chengchi University, Taiwan, from Society of Computational Economics, Second International Conference on Computing in Economics and Finance, Geneva, Switzerland, 26-28 June 1996
[DA1996] R.T. Daigler, A.J.
Aburachis, Biases Associated With International Stock Market Index and Predictability Studies, 1996, from Journal of Multinational Financial Management, Volume 6, Issue 2-3, 01-January-1996
[DLMRS1998] Gautam Das, King-Ip Li, Heikki Mannila, Gopal Renganathan, Padhraic Smyth, Rule Discovery from a Time Series, 1998, American Association for Artificial Intelligence
[Dal1993] James M. Dalton, How The Stock Market Works, New York Institute of Finance, 1993
A market professional, the author gives a view of the stock market. A classic work, often cited in literature about the market. Given is a description of how the stock market functions. For example, descriptions are given of how trades are made, what the roles of a customer, trader, or broker are, and how settlements are made. It gives no advice but is, rather, a pure instruction manual of the market.
[DeS1997] G. De Santis, Stock Returns and Volatility in Emerging Financial Markets, 1997, from Journal of International Money and Finance, Volume 16, Issue 4, 01-August-1997
[Dor1996] Georg Dorffner, Neural Networks for Time Series Processing, 1996
[EB1995] Carl J.G. Evertsz and Kathrin Berkner, Large Deviation and Self-Similarity Analysis of Graphs: DAX Stock Prices, Chaos, Solitons & Fractals, 6, 121-130 (1995)
[EBB1995] C.J.G. Evertsz, K. Berkner, W. Berghorn, A local multiscale characterization of edges applying the wavelet transform, in Proc. Nato A.S.I. Fractal Image Encoding and Analysis, Trondheim (1995)
[EM1992] C.J.G. Evertsz and B.B. Mandelbrot, Multifractal Measures, Appendix B in Chaos and Fractals by H.-O. Peitgen, H. Juergens and D. Saupe, Springer Verlag, New York, 849-881 (1992)
[Ehl2001] John F. Ehlers, Nonlinear Ehlers Filters, 2000, Technical Analysis of Stocks & Commodities, April 2001, Volume 19, Number 4
[Eve1995] C.J.G. Evertsz, Self-Similarity of High-Frequency USD-DEM Exchange Rates, Proc. First Int. Conf. on High-Frequency data in Finance and submitted to Fractals (1995)
[Eve1996] C.J.G. Evertsz, Fractal Geometry of Financial Time Series, 1996, Proc. Mandelbrot Symposium Curacao, Eds. C.J.G. Evertsz, H.-O. Peitgen, R.F. Voss, and in Fractals and Proc. Mandelbrot Symposium Curacao, World Scientific, Singapore (1996)
[FJ2000] J. Doyne Farmer and Shareen Joshi, The Price Dynamics of Common Trading Strategies, 2000
[FL1999] J. Doyne Farmer, Andrew W. Lo, Frontiers of Finance: Evolution and Efficient Markets, 1999
[FR1999] Sophie Forrester, Renee Ryerson, The Wall Street Journal Guide to Understanding Money and Investing, 1999
[Far1998] J. Doyne Farmer, Market Force, Ecology, and Evolution, Original Version 1994, Update: 1998, Santa Fe Institute, Journal of Economic Behavior and Organization
[Far1999] J. Doyne Farmer, Physicists Attempt to Scale the Ivory Towers of Finance.
Santa Fe Institute, 1999
[GAIM2000] Martin Gavrilov, Dragomir Anguelov, Piotr Indyk, Rajeev Motwani, Mining the Stock Market: Which Measure is Best?, Department of Computer Science, Stanford University, KDD, ACM, 2000
[GIP2001] William N. Goetzmann, Roger G. Ibbotson, Liang Peng, A new historical database for the NYSE 1815 to 1925: Performance and predictability, 2001
From Journal of Financial Markets, Volume 4, Issue 1, 01-January-2001
Described are analysis techniques applied to individual stock prices for NYSE stocks over the period 1815 to 1925. The authors use this data to forecast long-term returns.
[Gau1995] Steven Gaukroger, Descartes, An Intellectual Biography, Oxford University Press, 1995
[Gen1998] Ramazan Gençay, The Predictability of Security Returns with Simple Technical Trading Rules, 1998, from Journal of Empirical Finance, Volume 5, Issue 4, 01-October-1998
[Gle1987] James Gleick, Chaos: Making a New Science, Viking Penguin, 1987.
The detection of order in seemingly random events is discussed here. What is random at first glance can have an underlying order. Patterns in this chaos can be extracted by application of fractal analysis techniques.
[Gra1992] Granger, C. W. J., 1992, Forecasting stock market prices: lessons for forecasters, International Journal of Forecasting, 8, 3-14.
[Gra2001] J. Orlin Grabbe, Chaos and Fractals in Financial Markets, 2001
[Gul1999] Les Gulko, The Entropic Market Hypothesis, International Journal of Theoretical and Applied Finance, Vol. 2, No.
3 (1999) 293-329
[HH1995] Michael Harries, Kim Horn, Detecting Concept Drift in Financial Time Series Prediction using Symbolic Machine Learning, 1995
[HH1996] Christian Haefke, Christian Helmenstein, Forecasting Stock Market Averages to Enhance Profitable Trading Strategies, 1996
[HH1998a] Thomas Hellstrom, Kenneth Holmstrom, Predicting the Stock Market (1998)
[HH1998b] Thomas Hellstrom, Kenneth Holmstrom, Predictable Patterns in Stock Returns, 1998
[HK2001] Jiawei Han, Micheline Kamber, Data Mining, Concepts and Techniques, Morgan Kaufmann Publishers, Academic Press, 2001
[HKS1999] Rainer Hegger, Holger Kantz, Thomas Schreiber, Practical Implementation of Nonlinear Time Series Methods: The TISEAN package
[HKS2000] Rainer Hegger, Holger Kantz, Thomas Schreiber, TISEAN 2.1 (December 2000), Nonlinear Time Series Analysis Software, Institut für Physikalische und Theoretische Chemie, Universität Frankfurt (Main), Max-Planck-Institut für Physik komplexer Systeme, Dresden
[Hol2000] Norman N. Holland, Creativity and the Stock Market, 2000, in PSYART, A Hyperlink Journal for the Psychological Study of the Arts.
A psychological perspective on stock prices. Holland explains how the value of a stock is frequently at odds with the price of a stock. He describes EMH, the Efficient Market Hypothesis, and that stock prices
truly do a random walk, or a drunkard's walk. Investors make money by guessing what other investors' opinion will be of the price of a stock.
[Iva1998] Per H. Ivarsson, Multivariate Nonlinear Time-Series Prediction of Foreign Exchange Rate Returns Using Genetic Algorithms and Artificial Neural Networks, Institute of Theoretical Physics, Chalmers University of Technology, Gothenburg, Sweden, 1998
Compared is the predictive performance of several algorithms, to wit: univariate linear, multivariate linear, multivariate nonlinear, and univariate nonlinear.
[JB1998] Shareen Joshi, Mark A. Bedau, An Explanation of Generic Behavior in an Evolving Financial Market, 1998
[JMK1996] George H. John, Peter Miller, Randy Kerber, Stock Selection Using Recon, Stanford University
[JPB1999] Shareen Joshi, Jeffrey Parker, Mark A. Bedau, Financial Markets Can Be at Sub-Optimal Equilibria, Santa Fe Institute, 1999
[Joh2001] George John, gjohn@epiphany.com, personal communication via email to inquire about details relating to Recon, database schema, data contents, source code availability. March 28, 2001
[KD1999] Mark Kamstra (Simon Fraser University), R. Glen Donaldson (University of British Columbia), The Accuracy of Fundamental Stock Market Price Estimates and a Refinement to the Donaldson-Kamstra Fundamental Estimate. Paper provided by Society for Computational Economics in its series Computing in Economics and Finance '99 as number 954, 1999
[KS1997] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, published by Cambridge University Press in the Cambridge Nonlinear Science series.
[KT2000] Catherine Kyrtsou, Michel Terraza, Is it possible to study jointly chaotic and ARCH behaviour? Application of a noisy Mackey-Glass equation with heteroskedastic errors to the Paris Stock Exchange returns series, Universitat Pompeu Fabra, 2000
[Kab1996] M.A.
Kaboudan, Nonlinearity and Complexity of Stock Returns, 1996
[Kel1994] Kevin Kelly, Cracking Wall Street. An article that is an excerpt and adaptation from Kevin Kelly's new book "Out of Control: The Rise of Neo-Biological Civilization," published by Addison-Wesley in June 1994.
[Ker1992] Kerber, R., ChiMerge: Discretization of Numeric Attributes, in AAAI-92 Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press/MIT Press, pp. 123-128
[Kog1995] Alexander Kogan, Artificial Intelligence and Finance, Rutgers University, 1995
[Kra2000] Andreas Krause, Microstructure Effects on Daily Return Volatility in Financial Markets, 2000
[Kra2001a] Andreas Krause, An Overview of Asset Pricing Models, University of Bath, School of Management, UK, 2001
Krause draws distinctions between the fundamental value of a security and its price. The concept of price is further dichotomized into a natural price and a market price. The former is a reflection of the fundamental value of a commodity, while the latter is the actual price determined in a transactional scenario. The twain may, and quite often do, deviate. Ultimately, however, the market price converges on the fundamental value.
[Kra2001b] Andreas Krause, Implicit Collusion in Discounted Stochastic Games with Private State Observations, University of Bath, School of Management, 2001
[Kra2001c] Andreas Krause, The Optimal Control Structure in Dealer Markets, University of Bath, 2001
[Kra2001d] Andreas Krause, Patience and Explicit Collusion in Infinitely Repeated Games, University of Bath, 2001
[LAP1998] Blake LeBaron, W. Brian Arthur, Richard Palmer, Time series properties of an artificial stock market, 1999, from Journal of Economic Dynamics and Control, Volume 23, Issue 9-10, 01-September-1999
[LLS1994] M. Levy, H. Levy and S. Solomon, A microscopic model of the stock market; Cycles, booms and crashes, Economics Letters 45 (1994)
[LLS2000] Haim Levy, Moshe Levy, Sorin Solomon, Microscopic Simulation of Financial Markets, 2000
[MCFR2000] Michele Marchesi, Silvano Cincotti, Sergio Focardi, Marco Raberto, Development and testing of an artificial stock market, 2000
[MS2000] Paul Marriott, Mark Salmon, An Introduction to Differential Geometry in Econometrics, 2000
[MST2000] David McFadzean, Deron Stewart, Leigh Tesfatsion, A Computational Laboratory for Evolutionary Trade Networks, 2000
[MT1999] David McFadzean, Leigh Tesfatsion, A C++ Platform for the Evolution of Trade Networks, 1999
[Man1963] Benoit B.
Mandelbrot, The Variation of Certain Speculative Prices, The Journal of Business of the University of Chicago, 36, 394-419 (1963)
[Man1988] Benoit B. Mandelbrot, The Fractal Geometry of Nature, 1988, W.H. Freeman and Company, New York
[Man1997] Benoit B. Mandelbrot, Fractals and Scaling in Finance, 1997
[McF1998] Daniel McFadden, Rationality for Economists?, 1997, Department of Economics, University of California, Berkeley, Visiting Economist, Santa Fe Institute, Fall 1998, Paper original August 1996
(revised July 1997, September 1998), Journal of Risk and Uncertainty, Special Issue on Preference Elicitation
[Mod1999] Theodore Modis, Complexity and Competition at the Stock Market, 1999
[Osh1996] James P. O'Shaughnessy, What Works on Wall Street, McGraw-Hill, New York, 1996
[PCFS1979] N.H. Packard, J.P. Crutchfield, J.D. Farmer, and R.S. Shaw, A Geometry from a Time Series, Phys. Rev. Lett. 45 (1980) 712-716. ...
[PJS1992] H.O. Peitgen, H. Juergens and D. Saupe, Chaos and Fractals, Springer Verlag, New York, 849-881 (1992)
[Pre1999] Chris Preist, Commodity Trading Using An Agent-Based Iterated Double Auction, Hewlett Packard Laboratories
[Qi1999] Qi, M., Nonlinear predictability of stock returns using financial and economic variables, 1999, Journal of Business and Economic Statistics, 17, 4, 419-429.
[RW1999] Marcel K. Richter (University of Minnesota), Kam-Chau Wong (Chinese University of Hong Kong), What Can Economists Compute?
[SDG2001] Swarm Development Group, SWARM, www.swarm.org
[SFI2000] Artificial Stock Market, Santa Fe Institute 2000, Software System Modeling a Stock Market
[SR1993] Shaun-inn Wu, Ruey-Pyng Lu, Combining Artificial Neural Networks and Statistics for Stock-Market Forecasting, 1993, ACM
[SR2000] Sonia Schulenburg, Peter Ross, An Evolutionary Approach to Modelling the Behaviour of Financial Traders, University of Edinburgh, 2000
[Sch1989] Jose A. Scheinkman, Blake LeBaron, Nonlinear Dynamics and Stock Returns, University of Chicago, Journal of Business, 1989, Part of IDEAS, which uses RePEc data, Authors registers at HoPEc
[Sch2000] Thomas Schreiber, Nonlinear time series analysis: Measuring Information Transport, Max Planck Institute for the Physics of Complex Systems, Dresden, Germany, Proceedings of the COST P4 workshop "Characterization of Manufacturing Processes: Synergetics and Data Processing Methods", Ljubljana, May 25-26, 2000.
[Sha1989] S.
Shaffer, Structural Shifts and the Volatility of Chaotic Markets, 1989, from Journal of Economic Behavior & Organization, Volume 15, Issue 2, 01-March-1991
[Sha2000] Cosma Shalizi, Why Markets Aren't Rational but Are Efficient, Santa Fe Institute Bulletin, Volume 15, Number 1, Spring 2000
It's proposed that the price of a stock is determined by a combination of luck, greed, and fear, supplemented by testosterone and cocaine. So how can such an improbable methodology deliver a price that's reasonable? That's the question a research team headed by Doyne Farmer attempts to answer. Using time series analysis, they have found that markets tend to return to a state of equilibrium after every trading session. That's because goods, or capital, wind up with those for whom they have the highest utility, i.e., those who are willing to pay the most. This equilibrium constantly adjusts itself because of new information, causing prices to fluctuate randomly. This random readjustment is termed "efficiency".
[Sin1994] P. Singer, An Integrated Fractional Fourier Transform, Journal of Computational and Applied Mathematics, 54, 221-237 (1994)
[Urb2000] Richard M. A. Urbach, Footprints of Chaos in the Markets, Pearson Education Limited, Edinburgh Gate, 2000, Financial Times - Prentice Hall
A lengthy, in-depth didactic treatment of chaos in the market. It is heavy with higher mathematics and statistics; a companion guide to these fields is required for readers not trained in these disciplines.
[Vos1992] R.F. Voss, 1/f Noise and Fractals in Economic Time Series, 1992, in Fractal Geometry and Computer Graphics, J.L. Encarnacao, H.-O. Peitgen et al. (Eds.), Springer Verlag
[WG1994] A.S. Weigend, N.A. Gershenfeld (Eds.), Time Series Prediction: Forecasting the Future and Understanding the Past, Addison-Wesley, 1994
[Wil1998] Bill Williams, New Trading Dimensions: How to Profit from Chaos in Stocks, Bonds and Commodities (Wiley Trading Advantage), 1998
------------------------------------------------------------------------
