Predicting Stock Trending in a Financial Market with
Neural Networks and Genetic Algorithms
GRADUATE PROJECT TECHNICAL REPORT
Submitted to the Faculty of
the Department of Computing and Mathematical Sciences
Texas A&M University-Corpus Christi
Corpus Christi, Texas
in Partial Fulfillment of the Requirements for the Degree of
Master of Science in Computer Science
by
Brian H. McCord
Summer 2003
Committee Members
Dr. Michelle Moore
Committee Chairperson
Dr. Mario Garcia
Committee Member
Dr. Dulal Kar
Committee Member
ABSTRACT
The goal of this project was to predict stock trends based on data from a financial
market. This project originated from two ideas: that the human brain has a well-defined
structure; and that a financial market has a state and some rule of evolution. The system
in this project has two neural networks, which use a genetic algorithm to learn concepts.
Neural networks and genetic algorithms are two different optimization methods that
may be used, either separately or together, in many applications where other methods
have less success.
The assumption was that at a moment of time, two things are known: the value of
the data series at that time; and the state of the market given by its history. These values
acted as variables of input for two neural networks: one predicted the next value of the
data series; and the other predicted the next state of the market.
The neural network system was tested on various stocks, where predicted trend
line slopes were compared to actual trend line slopes. The overall results were good and
showed that the accuracy of predictions depended on parameter settings of the genetic
algorithm.
TABLE OF CONTENTS
Abstract
Table of Contents
List of Figures
List of Tables
1. Introduction and Background
2. Predicting Stock Trending with Neural Networks and Genetic Algorithms
   2.1 Technical Analysis in a Stock Market
      2.1.1 Determining an Uptrend
      2.1.2 Determining a Downtrend
   2.2 Architecture of Neural Networks
   2.3 Method of Neural Networks
   2.4 Learning in a Neural Network
   2.5 Summary
3. System Design
   3.1 Financial Data Analysis
      3.1.1 Obtaining Financial Data
      3.1.2 Quality Data
      3.1.3 Smoothing Data
      3.1.4 Data Organization
   3.2 Training by Backpropagation
      3.2.1 Backpropagation Phases
      3.2.2 Backpropagation Through Time
      3.2.3 System Topology of Backpropagation
   3.3 Partitioning the Data Series
   3.4 Determining Weights
   3.5 Managing Prediction Error
   3.6 Initializing Process
   3.7 The Ability of Randomness
   3.8 Programming Environment
      3.8.1 Programming Applications
      3.8.2 Input Files
4. Evaluation and Results
   4.1 Test Methods
   4.2 Theoretical Experimentation
      4.2.1 SINE Data
      4.2.2 Lorenz Data
   4.3 Empirical Experimentation
      4.3.1 Wells Fargo
      4.3.2 Genentech, Incorporated
      4.3.3 IBM Corporation
      4.3.4 Exxon Mobil Corporation
      4.3.5 HCA, Incorporated
      4.3.6 AOL Time Warner
      4.3.7 Wal-Mart Stores
      4.3.8 Intel Corporation
      4.3.9 Microsoft Corporation
      4.3.10 FedEx Corporation
5. Conclusion
6. Future Work
Bibliography and References
Appendix
LIST OF FIGURES
Figure 2.1. Charting an Uptrend
Figure 2.2. Charting a Downtrend
Figure 2.3. Neural Network Connections
Figure 2.4. The Hidden Layer
Figure 2.5. Simple Time Series Prediction
Figure 2.6. Multiple Time Series
Figure 2.7. Noisy Data
Figure 3.1. Flat Trends
Figure 3.2. Spikes Causing Noise
Figure 3.3. Calculating a Moving Average
Figure 3.4. Smoothing Data
Figure 3.5. Recurrence
Figure 3.6. Topology of Backpropagation
Figure 3.7. Partitioned Data
Figure 3.8. Neural Network Architecture
Figure 3.9. Encoded Weight Topology
Figure 3.10. Tournament Selection
Figure 3.11. Single-Point Crossover
Figure 3.12. Evolution Cycle
Figure 3.13. Error During Prediction
Figure 4.1. SINE Input Data
Figure 4.2. SINE Prediction
Figure 4.3. Lorenz Input Data
Figure 4.4. Lorenz Prediction
Figure 4.5. WFC Input Data
Figure 4.6. WFC Best Predicted Trend
Figure 4.7. DNA Input Data
Figure 4.8. DNA Best Predicted Trend
Figure 4.9. IBM Input Data
Figure 4.10. IBM Best Predicted Trend
Figure 4.11. XOM Input Data
Figure 4.12. XOM Best Predicted Trend
Figure 4.13. HCA Input Data
Figure 4.14. HCA Best Predicted Trend
Figure 4.15. AOL Input Data
Figure 4.16. AOL Best Predicted Trend
Figure 4.17. WMT Input Data
Figure 4.18. WMT Best Predicted Trend
Figure 4.19. INTC Input Data
Figure 4.20. INTC Best Predicted Trend
Figure 4.21. MSFT Input Data
Figure 4.22. MSFT Best Predicted Trend
Figure 4.23. FDX Input Data
Figure 4.24. FDX Best Predicted Trend
LIST OF TABLES
Table 3.1. Classes
Table 3.2. Input Data
Table 4.1. Tested Stocks
Table 4.2. WFC Prediction Results
Table 4.3. DNA Prediction Results
Table 4.4. IBM Prediction Results
Table 4.5. XOM Prediction Results
Table 4.6. HCA Prediction Results
Table 4.7. AOL Prediction Results
Table 4.8. WMT Prediction Results
Table 4.9. INTC Prediction Results
Table 4.10. MSFT Prediction Results
Table 4.11. FDX Prediction Results
Table 4.12. Best Tests
1. INTRODUCTION AND BACKGROUND
It is easy to find experts in various aspects of investing from whom to acquire
knowledge, but few regularly offer reliable knowledge to the public. One reason is that
the complexity of making investment decisions is such that the techniques involved are
rarely consistent. In addition, the knowledge of human experts is usually subjective and
limited. To overcome such limitations, knowledge acquisition from a machine-learning
system may be used. Machine learning can be used to automate the creation of
investment rules from inputs associated with a financial market, such as price-earnings
ratios, open prices, and close prices [Lisboa 2000]. Machine learning can also be used to
produce stronger trading rules. Some of the most popular approaches to machine
learning include neural networks and genetic algorithms. However, it should be
remembered that human experts remain an important medium from which one can find
fundamental and analytical knowledge. Therefore, their knowledge will always be
considered more important than machine-learned knowledge.
Neural network technology has some advantages over conventional expert system
approaches in some applications. First, since neural networks do not require that
knowledge be formalized, they are appropriate for domains where knowledge is scarce.
In this sense, a neural network may replace a rule-based system [Lisboa 2000].
The study of artificial neural networks originated from efforts to simulate the
functioning of the human brain in the areas of learning and problem solving. When a
person experiences an unfamiliar event, the brain makes certain considerations and
generalizations within the scope of previous and stored experiences. With this
functionality, attempts can be made to produce educated guesses [Nelson 1991]. An
example of this is demonstrated when an investor observes a stock’s price drop at the
beginning of every year with a strong period of recovery during the remainder of the
year. If the investor makes this observation year after year, and finally purchases the
stock during an early annual price drop, the investor’s expectations from past experiences
will be for the stock price to rise, thereby creating a profitable return later in the year
when the stock is sold.
Neural networks are a form of computer programming inspired by biology. They
are designed to mimic the functions of the human brain. The fundamental idea of neural
networks is based on the nerve cell called a neuron. Each neuron has connective strings
on each end. The connectors on one end are the dendrites, which carry signals into the
neuron, and the axon carries signals out through the connectors on the other end. The
point at which these signals are fired from the axon of one neuron to the dendrites of
another is called a synapse. The signals are binary, where they are either sent or not sent
[Nelson 1991]. Many different signals pass through the synapses. A neuron sums its
incoming signals, and if the sum exceeds a preset threshold value, the neuron fires a
signal across the synapse to another neuron [Welstead 1994].
The human nervous system is composed of a complex network of neurons. The
key to human behavior and thoughts is embedded in these networks. This enables the
human brain to simultaneously perform many tasks, otherwise known as parallel
processing [Baddeley 2000]. This basic structure forms the foundation of neural
networks where software can be programmed with linkages of nodes, inputs, and outputs.
The term node refers to the neural network equivalent of a neuron. Nodes have
input signals, which correspond to dendrites, and one output signal, which corresponds to
the axon. The input signals are assigned weights and summed at the node before
producing output. Initially, input signals may be assigned weights randomly. As the
network learns, these weights may be adjusted [Welstead 1994]. Many methods exist for
making these adjustments, and the procedure is typically known as the summation
function.
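The summation function described above can be sketched in code. The following is a minimal illustration of a single node with a step threshold; the function name and the choice of activation are assumptions for demonstration, not taken from the project:

```python
def node_output(inputs, weights, threshold=0.0):
    """Compute a node's output: a weighted sum of the inputs,
    fired through a simple step activation. The node "fires"
    (outputs 1) only if the weighted sum exceeds the preset
    threshold, mirroring the biological neuron described above."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0
```

For example, with inputs [0.5, 1.0] and weights [0.8, -0.2], the weighted sum is 0.2, which exceeds a zero threshold, so the node fires.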
Compared to the statistical models used by conventional systems, neural networks
differ markedly. They are especially suited for simulating intelligence in pattern
detection, association, and classification activities. They have also gained much interest
in economic decision-making. For example, financial organizations are second only to
the U.S. Department of Defense in sponsoring research in neural networks [Nelson
1991]. Although there are many applications of neural networks, investment problems
remain a difficult challenge. One alluring problem is predicting stock trends in a financial
market.
One of the first systems for predicting a financial market used many statistical
methods. Currently, there are many references on this subject, as well as several
companies that produce these statistical systems to make such predictions. However,
with advances in neural network research, statistical methods are no longer regarded as
the primary technique for predicting financial markets [Lisboa 2000].
Nevertheless, neural networks also have their weaknesses. One weakness is that
they might find some factors to be important for decision-making when those factors are
actually irrelevant or conflict with traditional theories. This can occur because neural
networks are entirely restricted by their data. Since the scope of training is always limited by
economics and time, networks that contradict theory are at risk of functioning well only
on data with structure similar to their training data. Another potential problem is that
most neural networks cannot guarantee an optimal solution to a problem. On the other
hand, if the neural network is properly configured and trained, it usually can provide
correct results [Lisboa 2000].
Neural networks must be provided with a method of learning. Typically, a
learning rule is applied for updating the connection strength between the nodes in a
neural network. One learning method with much left to be explored is the application
of a genetic algorithm. Employing a genetic algorithm is a distinctive alternative and
might provide a more effective means for neural networks to learn.
Genetic algorithms are software procedures modeled after genetics and evolution.
They are designed to efficiently search for optimal solutions to large problems. The
search proceeds in survival-of-the-fittest fashion by gradually manipulating a population
of potential problem solutions until the most superior ones dominate the population.
In living organisms, cells contain important groupings of special material
containing hereditary information. These structures are called chromosomes. Genes are
specific factors that are carried by chromosomes. The basic process of coding
information within genes and chromosomes is fairly simple, but the possible results are
almost limitless in number [Rees 1997].
The genetic makeup of each organism is called its genotype. The genotype
determines and limits many aspects of development and survival such as how an
organism responds to normal and abnormal environmental conditions [Rees 1997].
During the process of mating, the members of each chromosome pair exchange
genetic material in a technique known as crossover. Sometimes the copying of genetic
material results in slight imperfections known as mutation, which leads to additional
diversity in a population [Rees 1997].
These concepts provide an important environment for the specific procedures
used in genetic algorithms. To begin using a genetic algorithm as a problem-solving tool,
the problem must be represented in a manner that a genetic algorithm can work with.
This means representing the parameters of the problem solution using a string of digits,
usually binary. This representation has a biological parallel where the bit string can be
viewed as a chromosome-type structure, in which the 0s and 1s represent genes within
the chromosome.
The steps in a genetic algorithm also have some biological similarities. A genetic
algorithm begins by creating a population of potential problem solutions. Next, the
fitness of each individual in the population is calculated and each is assigned a numerical
value. Better-fit individuals are then selected to become parents of the next generation.
Finally, a second generation is formed from these selected parents by means of a
crossover process such as a random exchange of bits. Mutation is another possible
genetic operation that could be performed, where a random string bit is switched to its
opposite. Obviously, this operation would not be used very often but does introduce
some diversity into the genetic algorithm and decreases the chance for premature
convergence, where all the individuals in a population attain the same level of fitness too
early causing evolution to halt before an acceptable solution is found. The process of a
genetic algorithm repeats from the step of calculating fitness through the process of
mating until the population fully converges. Full convergence occurs when all strings
are identical in all bit positions, a state generally regarded as a near-optimal solution
[Goldberg 1989].
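The generational cycle above can be sketched in code. The population size, string length, selection scheme, and mutation rate below are illustrative choices for demonstration, not values prescribed by the text:

```python
import random

def evolve(fitness, length=16, pop_size=20, generations=50,
           mutation_rate=0.01, seed=42):
    """Sketch of the genetic-algorithm cycle described above:
    evaluate fitness, select better-fit individuals as parents,
    recombine them by single-point crossover, and occasionally
    mutate a bit to preserve diversity."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness-based selection: the better-fit half become parents.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            point = rng.randrange(1, length)      # single-point crossover
            child = a[:point] + b[point:]
            for i in range(length):               # rare mutation: flip a bit
                if rng.random() < mutation_rate:
                    child[i] = 1 - child[i]
            children.append(child)
        pop = children
    return max(pop, key=fitness)

# Example problem: maximize the number of 1-bits in the string.
best = evolve(fitness=sum)
```

With the bit-counting fitness function, the population quickly comes to be dominated by strings that are all, or nearly all, ones.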
Both neural networks and genetic algorithms are patterned after nature. Neural
networks mimic brain functions, and genetic algorithms mimic the process of evolution.
The rationale behind both methods is simply to learn from nature’s efficiency. A neural
network must be trained by presenting it with a test set of data. It then makes predictions,
evaluates the results, and repeats over and over. Genetic algorithms involve multiple
generations of a large population. Calculations for each individual of the population are
performed repeatedly as the population undergoes optimization changes. This work
evaluates how well a neural network can learn from a genetic algorithm and make
accurate predictions in real-world systems.
In this project the real-world system was a financial market. The topic has been
chosen because of the author’s personal and career experiences in finance and economics.
By combining these experiences with knowledge gained from researching neural
networks and genetic algorithms, an opportunity was provided to design a method for
predicting stock trends.
2. PREDICTING STOCK TRENDING WITH NEURAL
NETWORKS AND GENETIC ALGORITHMS
This project involved the development of neural networks and a genetic algorithm
to predict stock trending in a financial market. Because this was an attempt to predict a
financial stock trend line, it essentially became a model for time series forecasting.
Neural networks that performed the prediction were trained using a backpropagation
technique and a genetic algorithm. As a learning error was calculated in one neural
network, a second neural network determined the state of the market at each instance of
input.
2.1 Technical Analysis in a Stock Market
The stock market can be difficult to understand regardless of one’s level of
experience. Stock brokers, advisory letters, experts, and the media are all sources of
information that share their opinions, but have different points of view. Technical
analysts, on the other hand, have devised methods of creating and interpreting stock
charts, which do not use the fundamentals of the sources mentioned above [Pistolese
1994]. In this project, the neural network system was designed and implemented to
predict stock trends, in which technical analysis of the predicted data created trend
patterns similar to technical analysis of the real data.
2.1.1 Determining an Uptrend
The price of any stock fluctuates in small and quick movements, which creates
short-term top and bottom points over a period of time. When stock prices are charted
and the successive bottoms are higher than preceding bottoms, the price is in an uptrend.
This can be seen in Figure 2.1. The graph shows that bottom B is higher than bottom A,
and bottom C is higher than bottom B. After bottom B was established, the uptrend
began. The uptrend was confirmed when bottom C occurred. Once bottoms A and
B have been created on the chart, an uptrend line can be drawn. Whenever subsequent
bottoms occur at or near this line, the uptrend is reconfirmed [Pistolese 1994].
Figure 2.1. Charting an Uptrend
2.1.2 Determining a Downtrend
When charted stock prices show that successive highs are lower than preceding
highs, the stock is in a downtrend, as seen in Figure 2.2. Here it can be seen that point B
is lower than point A, and point C is lower than point B. The downtrend was created as
soon as the downtrend line could be drawn from A to B. Next, point C confirms the
downtrend. When any subsequent top point occurs at, near, or below this trend line, the
downtrend is reconfirmed. In most situations the subsequent tops will not even come
near the line because downward trends tend to accelerate as they proceed [Pistolese
1994].
Figure 2.2. Charting a Downtrend
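The uptrend and downtrend rules above can be expressed as a simple check on a chart's successive extrema. The function below is a hypothetical sketch, not part of the project's system: it is applied to a series of bottoms (for uptrends) or a series of tops (for downtrends), such as points A, B, and C in the figures:

```python
def trend_from_extrema(points):
    """Classify a trend from successive chart extrema: strictly
    rising bottoms indicate an uptrend, and strictly falling
    tops indicate a downtrend (points A, B, C, ...)."""
    rising = all(a < b for a, b in zip(points, points[1:]))
    falling = all(a > b for a, b in zip(points, points[1:]))
    if rising:
        return "uptrend"
    if falling:
        return "downtrend"
    return "no clear trend"
```

For example, successive bottoms of 10, 12, and 15 reconfirm an uptrend, while successive tops of 30, 27, and 22 reconfirm a downtrend.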
2.2 Architecture of Neural Networks
Neural networks are made up of many simple processors, all of which are
programmed to perform the same basic task with each having a small local memory.
Each processor can be referred to as a node. An individual node has one output but more
than one input. Outputs of one node are essentially inputs to other nodes of the network.
In addition, an output may be fed back as an input to the same node. This is known as a
feedback system, which is commonly used in neural networks. On the other hand, a
feedforward network does not contain any feedback connections [Nelson 1991].
In most neural networks the processing that takes place is rather simple. The
process takes a weighted sum of the inputs and calculates an output value that is a
function of that sum. The node’s local memory stores interconnected parameters (or
weights), and when many nodes are linked together a neural network is created. The
strength of the connection between the nodes determines the network's ability to
correctly generalize. The basic model below in Figure 2.3 demonstrates the connectivity
of a simple neural network.
Figure 2.3. Neural Network Connections
In this model, weights are represented as lines connecting the input nodes and the
summation box. First, the input nodes contain values that are presented to the network.
The summation function collects the weight values and the input values. Then the output
node provides a generalization outcome.
The pattern of interconnected nodes is the primary distinguishing factor between
most neural networks. In most patterns, networks are arranged in layers (input, output,
and hidden layers) operating in synchrony with one another. In theory, one
hidden layer is sufficient for expressing a nonlinear relationship between
the input and output nodes. In this project a multilayered feedforward network was used
to deal with the dynamics of a financial market [Hecht-Nielsen 1990].
A hidden layer can be seen in Figure 2.4 below. The nodes in the middle layer
are known as hidden, because they do not receive direct input from the real world and
they do not produce direct output outside the network [Lisboa 2000].
Figure 2.4. The Hidden Layer
Because hidden layers contain some of the knowledge within a network, they are an
important component. There are no general rules concerning an appropriate number of
hidden layers and there are no general rules about the manner in which they should be
connected. They commonly function as filters for noisy data as information moves
through the network [Nelson 1991]. Hidden nodes were vital in this project when
financial input data experienced sharp fluctuations and anomalies. These had to be
recognized and understood in order for the neural networks to determine overall trending.
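The layered arrangement in Figure 2.4 can be sketched as a forward pass through one hidden layer. The sigmoid activation, layer sizes, and weight values below are illustrative assumptions, not the project's actual configuration:

```python
import math

def forward(inputs, hidden_weights, output_weights):
    """Forward pass through a fully connected network with one
    hidden layer. Each hidden node takes a weighted sum of the
    inputs and applies a sigmoid; the single output node does
    the same over the hidden activations."""
    sigmoid = lambda s: 1.0 / (1.0 + math.exp(-s))
    hidden = [sigmoid(sum(x * w for x, w in zip(inputs, ws)))
              for ws in hidden_weights]
    return sigmoid(sum(h * w for h, w in zip(hidden, output_weights)))

# Two inputs, three hidden nodes, one output (all weights illustrative).
y = forward([0.2, 0.7],
            [[0.5, -0.3], [0.1, 0.8], [-0.4, 0.6]],
            [0.9, -0.2, 0.4])
```

The hidden nodes never touch the outside world directly; they only transform what the input layer passes to them, which is what lets them act as filters for noisy data.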
2.3 Method of Neural Networks
The neural networks operated much like a black box where inputs and desired
outputs were described, an initial guess was made at a network structure, and then
experimentation took place. The neural networks had more than one input and exactly
one output. The goal was to predict one point ahead in a single time series. Figure 2.5
shows the typical mapping of how a time series was used with the neural networks for
prediction.
Figure 2.5. Simple Time Series Prediction
The figure demonstrates the use of six adjoining points in a time series to predict
the next point. A training series was used to generate a large number of individual
samples [Welstead 1994]. A sample consisted of seven points: the current point 0; five
historical points of 1 through 5; and the predicted point to be used for directing the
training of the output.
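The seven-point sampling scheme above can be generated by sliding a window over the series. The function name and data structure below are illustrative:

```python
def make_samples(series, history=5):
    """Slide a window over the series: each sample pairs the
    current point plus `history` preceding points with the next
    point, which serves as the training target."""
    samples = []
    for t in range(history, len(series) - 1):
        window = series[t - history:t + 1]  # five historical points + current
        target = series[t + 1]              # the point to be predicted
        samples.append((window, target))
    return samples

samples = make_samples([10, 11, 13, 12, 14, 15, 16, 18])
```

Here the first sample is the six adjoining points [10, 11, 13, 12, 14, 15] paired with the target 16, and a training series much longer than this toy example would yield a large number of such samples.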
Time series predictions are typically made using points from a single time series
as predictors [Welstead 1994]. An example using a stock market model is shown in
Figure 2.6 below. The figure demonstrates how several different time series are
combined into a future price. The series in the project were determined through trial and
error, but the final technique was similar to this example. Here, the series are the close
prices and respective trading volumes of a stock at five different time periods.
Figure 2.6. Multiple Time Series
2.4 Learning in a Neural Network
In a multilayered feedforward network, the output nodes detect the errors. In this
project, these errors were propagated back to the nodes in the previous layer, and the
process was repeated until the input layer was reached. An effective algorithm that learns
in this manner, and was used here, was the backpropagation algorithm. This allowed an
incremental adjustment of weights to reduce the errors of prediction [Nelson 1991].
Once an acceptable set of weights was reached, the network ran in a feedforward mode to
organize new cases and make predictions.
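The incremental adjustment of weights can be illustrated with the delta rule for a single linear node; full backpropagation repeats this update layer by layer, passing the error backward. The learning rate, inputs, and target below are illustrative values, not the project's:

```python
def update_weights(weights, inputs, target, prediction, rate=0.1):
    """Nudge each weight in proportion to the prediction error and
    its input (the delta rule); repeated over many presentations,
    this incrementally reduces the error of prediction."""
    error = target - prediction
    return [w + rate * error * x for w, x in zip(weights, inputs)]

# Repeated adjustment drives a linear node's output toward the target.
weights = [0.0, 0.0]
for _ in range(100):
    prediction = sum(w * x for w, x in zip(weights, [1.0, 2.0]))
    weights = update_weights(weights, [1.0, 2.0], 1.0, prediction)
```

After the loop, the node's output for the input [1.0, 2.0] has converged to the target of 1.0.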
Sometimes a difficult issue with neural networks is that during the learning
process, the measured error might not decrease in a consistent manner. This means that it
is not always easy to decide when to stop the learning phase. This project determined the
stopping point based on performance of the test data [Lisboa 2000].
The neural networks had an error associated with each prediction. The accuracy
of the system was determined by this error. Because future data is unknown, known
historical data was used for the entire neural network process. This historical data was
partitioned into three sections so the neural networks could compute the error of
prediction and learn.
The first set was used to train the neural networks. Evaluating the accuracy of
predictions was performed by the second set. These first two sets were used to compute
the error of prediction. Finally, the third set was used to test the results.
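The three-way partition can be sketched as follows. The split fractions are illustrative assumptions, since the text does not specify the relative sizes of the sections:

```python
def partition(series, train_frac=0.70, eval_frac=0.15):
    """Split known historical data into three sections: one to
    train the networks, one to evaluate prediction accuracy
    during learning, and a final held-out section for testing."""
    n = len(series)
    a = int(n * train_frac)
    b = int(n * (train_frac + eval_frac))
    return series[:a], series[a:b], series[b:]

train, evaluate, test = partition(list(range(100)))
```

The first two sections drive the error-of-prediction computation during learning; the third is never seen until final testing.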
For input, the neural networks used historical values pertaining to stocks. An
important issue considered was that many data series are vulnerable to noise. When this
occurs, prediction results can be adversely affected. In order to produce better
results, real data was sometimes transformed into smooth data. This was extremely
important when dealing with stocks experiencing rapid price changes, or unstable
markets that have suffered as an indirect result of major world events. Figure 2.7 shows
an example of noisy data, but closer observation shows that the overall trend is moving
upwards. Detecting the overall movement was a critical aspect of the success of the
neural network system.
Figure 2.7. Noisy Data
In order to produce high-quality predictions, the connections (or weights) in the
neural network were identified for the system. However, instead of having a basic and
traditional mathematical function for determining weight adjustments for connectors, a
genetic algorithm was used. The genetic algorithm searched for the best possible weights
in order to strengthen connectors and assist the neural network in maintaining a low error
of prediction.
2.5 Summary
In summary, building a neural network involved many steps. The system in this
project was a fully connected, multilayered, feedforward network. In addition, a
backpropagation technique was used for training while a genetic algorithm determined
the best parameter values for the neural networks to produce as little error as possible.
3. SYSTEM DESIGN
3.1 Financial Data Analysis
The first step in conducting this research was to study financial trend patterns to
determine the key factors affecting them. This helped determine the type of input that
needed to be provided to the neural network system. Each issue that was discovered was
an important factor in adjusting input data in order for the neural network to perform.
Incorporating the main issues that cause financial trend lines to change gave the neural
network a better opportunity to produce accurate forecasts.
3.1.1 Obtaining Financial Data
One reason for displaying financial data, such as closing prices, in graphical form
is to make it easier to interpret stock trends and suspected price anomalies. In this
project, all input data was viewed graphically prior to being downloaded. This
pre-analysis of input data helped establish proper testing standards and parameters.
effectively determine stock trends and discover any anomalies, it was advantageous to
obtain historical data for long periods of time [Hecht-Nielsen 1990]. Likewise, the
neural network needed a sufficient amount of data in order to be tested effectively. Daily
data was obtained for several stocks spanning 13 months. This provided the system
with one year of data for training and evaluation, and one month of data for testing the
prediction.
Since reporting systems make use of financial data on individual stocks,
companies, and industries, the data is maintained in many public databases. As a result
of the widespread availability of electronic data sources, investment databases were
accessible for downloading large amounts of financial data. The data source that proved
the easiest to manipulate and the most efficient was the financial database of Yahoo.com
located on the Internet at http://finance.yahoo.com.
3.1.2 Quality Data
It is not necessary to have an approach for dealing with incomplete data because
data gaps do not exist in a securities market. When the market is closed to stock trading,
it is closed to all stocks. Even if a stock is dormant it is still available for trade provided
the market is open. Therefore, seemingly inactive data remains vital to the neural
network where flat trends sometimes occur, because all data is active in a stock market as
long as time is elapsing. A flat trend is illustrated in Figure 3.1.
Figure 3.1. Flat Trends
However, financial market data does have the tendency to quickly spike upward
and downward because price changes around a trend are somewhat random. Such spikes
in the data created a disturbance, otherwise known as noise, in the neural network’s
attempt to train. These spikes can be seen below in Figure 3.2. Predicting these notably
sharp changes around the trend was not possible because the noise can mistakenly be
interpreted as random data and trending cannot be identified [Lisboa 2000]. When the
neural network was tested with noisy data, it was more effective to first smooth any data
spikes.
Figure 3.2. Spikes Causing Noise
3.1.3 Smoothing Data
Moving-averages were used to smooth a data series and make it easier to find
trends amongst noisy data series. Because of the ability to locate trends, moving-
averages are common tools employed by technical analysts of financial markets
[Pistolese 1994]. Price and trading volume data are usually displayed in graphs showing
stock price trend lines, moving-average curves of stock prices, and moving-average
curves of trading volume. A moving average is created by calculating the average price
of a stock over a predetermined number of time periods. When dealing with stock prices,
the closing price is commonly used to compute the moving average. For the stock trends
in this project, data on closing prices and average prices were applied. For example, a
five-day moving average is calculated by adding the closing prices of the last five days
and dividing the total by five. A moving average moves because as the newest period is
added, the oldest period is discarded [Pistolese 1994]. If the next closing price in the
series is 32, then the new period is added, the oldest day, at a price of 22, is removed, and
a new five-day moving average is calculated. Figure 3.3 illustrates the concept of
calculating a moving average.
Figure 3.3. Calculating a Moving Average
The only disadvantage of moving averages in a financial market is that they lag
behind real-time market prices [Pistolese 1994]. Because lagging is a well-known
characteristic of moving averages, financial analysts emphasize them only for smoothing
data series and illustrating trends. As previously discussed, financial markets are
sensitive to many real-world events, which frequently create sharp changes in trend lines.
Day   Daily Close Price   5-day Moving AVG
1     22
2     24
3     26
4     25
5     23                  24
6     32                  26
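The calculation illustrated in Figure 3.3 can be sketched in C++ as follows. This is a minimal illustration; the function name and interface are hypothetical and do not come from the project's code.

```cpp
#include <cstddef>
#include <vector>

// Simple moving average: for each day from (period - 1) onward, average the
// closing prices of the current day and the (period - 1) days before it.
// Earlier days have no value, matching the blank cells in Figure 3.3.
std::vector<double> movingAverage(const std::vector<double>& closes, int period) {
    std::vector<double> avgs;
    double windowSum = 0.0;
    for (std::size_t i = 0; i < closes.size(); ++i) {
        windowSum += closes[i];
        if (i >= static_cast<std::size_t>(period)) {
            windowSum -= closes[i - period];   // drop the oldest period
        }
        if (i + 1 >= static_cast<std::size_t>(period)) {
            avgs.push_back(windowSum / period);
        }
    }
    return avgs;
}
```

Applied to the closing prices 22, 24, 26, 25, 23, and 32 with a five-day window, it produces the averages 24 and 26.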
Because the neural network perceives these sharp changes as noise, the concept of
moving averages was applied to the training of the neural network system whenever
necessary. Neural networks are designed to learn from past data and are therefore
unaffected by lag time, which made moving averages a good choice for smoothing sharp
anomalies. When smoothing was completed, the resulting data appeared more like the
smoother dotted line in Figure 3.4.
Figure 3.4. Smoothing data
3.1.4 Data Organization
It was critical that the organization of data be consistent for all input and output
files in order to always execute the program code successfully [Welstead 1994]. For the
neural network system, a text file was created for input and contained a data series
arranged in columns and separated by white spaces. A text file was also created for the
genetic algorithm. It contained information vital to the genetic algorithm, such as the
population size and mutation rate, and it also contained numbers representing neural
network values for training time, evaluating time, testing time, and the number of
neurons. Also included were data file dimensions with each dimension’s level of
importance.
3.2 Training by Backpropagation
The network system in this project contained several input nodes, one hidden
layer, and a single output node. Input variables were classified as closing prices and
trading volume at daily intervals of time. The output values were produced as a data
series of predicted closing prices with time intervals identical to the input. The neural
networks learned by using a backwards propagation of error known as backpropagation.
Basically, this is a feedforward technique, which calculates the difference between
predicted outputs and desired outputs from each iteration.
3.2.1 Backpropagation Phases
Backpropagation training was composed of three phases. The first phase
provided input to the neural networks and moved forward from the hidden layer to the
output layer. The next phase determined the difference (error) between the desired output
and the output produced by the neural network in the output layer [Nelson 1991]. Third,
the weight of each connection was adjusted in proportion to the error previously
calculated. Therefore, after this third step most weights had a different value. An
explanation of determining weights will appear later in section 3.4.
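The three training phases can be sketched for a small network with one hidden layer. This is an illustrative sketch only: the class name, sizes, learning rate, and sigmoid activation are assumptions for the example, not details taken from the project's code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct TinyNet {
    std::vector<std::vector<double>> wHidden; // weights: input layer -> hidden layer
    std::vector<double> wOut;                 // weights: hidden layer -> output node
    double rate = 0.1;                        // learning rate (illustrative value)

    static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

    // Phase 1: move the input forward from the input layer through the
    // hidden layer to the single output node.
    double forward(const std::vector<double>& in, std::vector<double>& hidden) const {
        hidden.clear();
        for (const std::vector<double>& row : wHidden) {
            double sum = 0.0;
            for (std::size_t i = 0; i < in.size(); ++i) sum += row[i] * in[i];
            hidden.push_back(sigmoid(sum));
        }
        double out = 0.0;
        for (std::size_t h = 0; h < hidden.size(); ++h) out += wOut[h] * hidden[h];
        return sigmoid(out);
    }

    // Phase 2: determine the error between the desired and produced output.
    // Phase 3: adjust every connection weight in proportion to that error.
    double train(const std::vector<double>& in, double desired) {
        std::vector<double> hidden;
        double out = forward(in, hidden);
        double error = desired - out;                      // phase 2
        double deltaOut = error * out * (1.0 - out);
        for (std::size_t h = 0; h < hidden.size(); ++h) {  // phase 3
            double deltaH = deltaOut * wOut[h] * hidden[h] * (1.0 - hidden[h]);
            wOut[h] += rate * deltaOut * hidden[h];
            for (std::size_t i = 0; i < in.size(); ++i)
                wHidden[h][i] += rate * deltaH * in[i];
        }
        return error;
    }
};
```

Repeating `train` on the same instance shrinks the error, which is the behavior the three phases are meant to produce.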
3.2.2 Backpropagation Through Time
Errors were backpropagated even further. This is called backpropagation through
time and is a simple extension of what was previously discussed. Local memory was
created that contained states reflecting structural dependencies and states containing
structural predictions. This technique introduced an architecture that has a local recurrent
nature, but also had an overall global feedforward construction [Yves 1995]. The
property of recurrence that was used is shown below in Figure 3.5.
Figure 3.5. Recurrence
3.2.3 System Topology of Backpropagation
The backpropagation training methods that were performed in this project are
illustrated in Figure 3.6. Using a sliding input window, the diagram shows a 30-day
period of a stock’s close prices and trading volumes. Input for both neural networks
contained the values from the sliding window and the state of the system. Sometimes
price values would differ very little or not at all. Therefore, the second neural network
used the property of recurrence, which introduced a local memory so that states could
function as additional parameters that help distinguish between input instances. In the
first iteration, not depicted in Figure 3.6, states had an input value of zero until an initial
prediction was made. After the input was read, one network produced the predicted
prices and the other network produced the next predicted state. An error was calculated
for the predicted outcome, and connection weights were adjusted in proportion to that
error.
Figure 3.6. Topology of Backpropagation
During the learning process, each data instance was applied to the neural
networks, and output values were computed using the current weights. Then, weights
were adjusted in order to decrease error measures, and another data instance was applied.
After all data instances were applied, the entire process was repeated as long as a
significant amount of error reduction was accomplished. A large number of passes
through the data set were required to reach a stable solution. After these steps were
executed for all the data in one input window, one epoch had been completed.
3.3 Partitioning the Data Series
The data series was partitioned into three disjoint sets. These sets were the
training set, evaluation set, and test set. Traditionally, this method lends the majority of
the data to training, while the remaining data is equally divided between the other two
sets.
The backpropagation training occurred directly on the training set. The neural
networks’ ability to generalize was checked for accuracy on the evaluation set. Finally,
its ability to forecast was measured on the test set. This is more clearly understood with
Figure 3.7 below.
Figure 3.7. Partitioned Data
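The partitioning step can be sketched as follows. The set sizes here are illustrative parameters; the project read the actual counts for each set from its configuration file, and the struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Partition a data series into the three disjoint sets. The majority of the
// data goes to training; the remainder is divided between evaluation and
// testing.
struct Partitions {
    std::vector<double> train, evaluate, test;
};

Partitions partitionSeries(const std::vector<double>& series,
                           std::size_t trainN, std::size_t evalN) {
    Partitions p;
    p.train.assign(series.begin(), series.begin() + trainN);
    p.evaluate.assign(series.begin() + trainN, series.begin() + trainN + evalN);
    p.test.assign(series.begin() + trainN + evalN, series.end());
    return p;
}
```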
3.4 Determining Weights
In order for neural networks to learn they need some type of learning rule [Lisboa
2000]. A learning rule functions as a guideline for performing the weight (connection
strength) updates. In this project, network configurations were created using a genetic
algorithm for learning. Essentially, the genetic algorithm found near optimal weights for
the neural networks’ connections. Its goal was for the error computed on data sets to be
as small as possible. In the genetic algorithm, the population consisted of encoded
weights that represented the individuals. Each individual (weight) had a fitness
associated with it, and individuals with better fitness were considered better solutions.
Between one generation and the next, individuals were selected from which to create
offspring by a crossover operation. The types of individuals consisted of learning rates of
neural network dimensions. The financial indicators, close-price and trading volume,
were used as the dimension types for learning.
The neural network system globally functioned as a feedforward network with a
sliding window of input. Each data window contained input for a two-level network
composed of ten inputs (five close prices and five associated volumes), two units in the
hidden layer (close price and volume), and a single output (close price). There were a
total of twenty weights connecting the input layer to the hidden layer. This architecture is
depicted below in Figure 3.8. The input nodes from the data set are labeled to help
identify their association with the hidden nodes for close price (C) and volume (V).
Figure 3.8. Neural Network Architecture
The use of a genetic algorithm implies a genotypic representation of the
individuals. This allows the genetic operators to modify them without using knowledge
about the individuals’ structure. In this project the genetic algorithm used binary encoded
weights to represent individuals in a population. Each weight was composed of eight
bits. Since each data window consisted of ten inputs all connected to two hidden
neurons, there were a total of twenty weights that were determined by the genetic
algorithm per epoch. The encoded weight (W) topology is better understood by viewing
Figure 3.9. As in the previous figure, the labeling of close price (C) and volume (V)
helps identify placement within the data set.
Figure 3.9. Encoded Weight Topology
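The text does not record exactly how an eight-bit gene was mapped to a real-valued connection weight, so the sketch below makes the common assumption of scaling the unsigned byte value 0..255 linearly onto a weight range, here [-1, 1]; the function name and range are hypothetical.

```cpp
#include <cstdint>

// Decode one eight-bit encoded weight into a real-valued connection
// weight by mapping 0..255 linearly onto the assumed range [-1, 1].
double decodeWeight(std::uint8_t gene) {
    return -1.0 + 2.0 * gene / 255.0;
}
```

A full genotype in this scheme would be twenty such bytes, one per input-to-hidden connection, decoded in the order shown in Figure 3.9.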
Starting from a randomly generated initial population of weights, the network
generator built a network from the genotype. The backpropagation technique for
measuring prediction error was performed on the neural network system, and the error
was delivered to an application for fitness evaluation and scaling. This process calculated
the actual fitness value for each weight according to its performance. A fitness value
represented the quality of an individual, and was used to rank the individual in a
population. The calculation of fitness is specific to the individual problem and is
essentially a driving force for an effective evolutionary search.
In this project the fitness of an individual represented the accuracy of the
prediction computed by the neural network system. Hence, the higher the fitness of an
individual, the lower the prediction error of the neural network system. It is important to
note that high error is bad and high fitness is good. A simple approach is to divide one by
the error: higher errors then give lower fitness, and lower errors give higher fitness.
However, there is a problem with this approach: if the error were ever exactly zero (a
perfect individual), dividing one by it would cause a divide-by-zero math error. The
better approach was to use the fitness formula shown in Eq. (3.1) below.
fitness = |N – total error|, (3.1)
where N is the number of training instances.
More precisely stated, given an individual i, let wi be the weight obtained by assigning to
a connection in the net the corresponding weight encoded in the individual. This yields
the following equations given below in Eq. (3.2) and Eq. (3.3).
fitness of i = |N – total error of wi|, (3.2)
where
total error of wi = sum of all instances of |desired output – prediction|. (3.3)
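Eq. (3.2) and Eq. (3.3) translate directly into code; the function names below are hypothetical, but the arithmetic is exactly the formulas above.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Eq. (3.3): the total error of an individual's weights is the sum of
// |desired output - prediction| over all training instances.
double totalError(const std::vector<double>& desired,
                  const std::vector<double>& predicted) {
    double sum = 0.0;
    for (std::size_t i = 0; i < desired.size(); ++i)
        sum += std::fabs(desired[i] - predicted[i]);
    return sum;
}

// Eq. (3.2): fitness is |N - total error|, where N is the number of
// training instances, avoiding the divide-by-zero risk of a 1/error rule.
double fitness(const std::vector<double>& desired,
               const std::vector<double>& predicted) {
    double n = static_cast<double>(desired.size());
    return std::fabs(n - totalError(desired, predicted));
}
```

A perfect individual over N instances has zero total error and therefore fitness N, the maximum.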
Based on fitness, the genetic operator known as selection determined which
individuals were placed in a mating pool for reproduction. The chosen individuals were
then allowed to mate by means of a genetic crossover operator, which ensures that
offspring are new but still related to their parents. This evolution cycle was iterated for a
fixed number of generations.
The tournament method was chosen as the genetic algorithm's method of selecting
parents from the population to mate. Tournament selection resembles small battles
between individuals of the population to determine which of them is placed in the mating
pool of the next generation. Two tournaments were performed to determine a set of
parents. An example of tournament selection can be seen in Figure 3.10.
Figure 3.10. Tournament Selection
In a tournament, two individuals were chosen at random from the population and
their fitness values were compared. The individual with the better fitness was retained as
one of the two parents for mating. The second parent was chosen by the same method.
Because the focus of this project was to model a neural network system,
concentration was not given to the variety of selection methods that a genetic algorithm
can employ. However, it should be noted that the selection method can have a major
impact on the genetic algorithm. For financial applications, the choice of selection
method may depend on the type of problem being solved. If decisions must be made
quickly, especially decisions in real-time trading environments, then quicker convergence
may be more desirable. Tournament selection is recognized as a quicker methodology.
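Tournament selection as described above can be sketched as follows. This is a minimal illustration using the C standard library's `rand()`; the project's actual class interfaces (for example its `MemRandom` generator) differ.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// One tournament: draw two individuals at random and keep the index of the
// one with the better fitness. Running this twice chooses the two parents.
std::size_t tournament(const std::vector<double>& fitnessValues) {
    std::size_t a = std::rand() % fitnessValues.size();
    std::size_t b = std::rand() % fitnessValues.size();
    return fitnessValues[a] >= fitnessValues[b] ? a : b;
}
```

Fitter individuals win more tournaments on average, but any individual can still occasionally be selected, which preserves diversity in the mating pool.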
The crossover method that was implemented was single-point crossover, one of
the most powerful and popular techniques used in genetic algorithms. After two parents
were selected for mating, a point was randomly chosen where the two strings were to be
cut. The tails of the two strings were then exchanged, which left the
head of the first string with the tail of the second string and vice versa. This method of
crossover is demonstrated in Figure 3.11, where the randomly selected cut is after the
fourth bit.
Figure 3.11. Single-Point Crossover
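A sketch of the operation (the helper name is hypothetical, and the cut point is passed in explicitly so the exchange is easy to follow):

```cpp
#include <cstddef>
#include <string>

// Single-point crossover: join the head of one parent (everything before
// the cut) to the tail of the other (everything from the cut onward).
std::string crossover(const std::string& parent1, const std::string& parent2,
                      std::size_t cut) {
    return parent1.substr(0, cut) + parent2.substr(cut);
}
```

With the parent strings 00001111 and 10011001 and a cut after the fourth bit, the two possible offspring are 00001001 and 10011111.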
The basis for evolution is a set of individuals that establishes a population. The
better an individual adapts itself to the given environment, the greater its chance to
survive and produce offspring. In this project the neural network represented the
environment. Each individual was assigned a fitness value, which reflected its ability to
adapt to the given environment. The pseudocode for this process is:
MAX = preset maximum number of generations
POP = preset population size
IND = individual
BestIND = individual with highest fitness

Generate initial POP of individuals randomly
Evaluate each IND in population to get Fitness(IND)
Iterations = 0
While (Iterations < MAX)
    1. Select individuals for crossover
    2. Produce offspring and replace in population
    3. Mutate small percent of population
    4. Evaluate the fitness for new members
    5. BestCurrent = best fit IND from new population
    6. IF (Fitness(BestCurrent) > Fitness(BestIND))
           THEN BestIND = BestCurrent
    7. BestIND goes to neural network for grading
    8. Neural Network computes prediction error
    9. Adjust Fitness(BestIND) relative to error
    10. Iterations++
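Step 3 of the loop, mutation, can be sketched as flipping each bit of an individual's binary string with a small probability. The function name is hypothetical, and the project read its actual mutation rate from the GA configuration file.

```cpp
#include <cstdlib>
#include <string>

// Mutate a binary-encoded individual: flip each bit independently with
// probability `rate`, keeping the string length unchanged.
std::string mutate(std::string bits, double rate) {
    for (char& b : bits)
        if (static_cast<double>(std::rand()) / RAND_MAX < rate)
            b = (b == '0') ? '1' : '0';   // flip the bit
    return bits;
}
```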
A genetic algorithm is an iterative procedure, where each iteration is called a
generation. The application of elitism allows the most fit individuals from the previous
generation to be carried over into the next generation. During each iteration the two
nature-inspired principles of selection and reproduction are applied to the population.
The selection mechanism determines which individuals are allowed to produce offspring
for the next generation. The probability that an individual is allowed to reproduce, and
the number of offspring it produces, are based upon its fitness. Supervised learning in
neural networks, where predictions are measured for error, provides a measure of fitness
for individuals in a genetic algorithm; hence the need for the backpropagation technique
discussed earlier [Jain 2000]. The evolution cycle is characterized in Figure 3.12.
Figure 3.12.
Figure 3.12. Evolution Cycle
3.5 Managing Prediction Error
A problem that frequently occurs in neural networks is due to the nature of the
errors in the training set. The error in the training set might become lower and lower, but
errors on the second and third sets might be exponential. Because of this, errors cannot
simply be added together because it is possible that the population will suffer premature
convergence [Welstead 1994]. To deal with this, during the first generations of the
genetic algorithm, the error on the training set is more important. The errors from the
evaluation set will not be introduced into the system until after a few generations. The
goal of the error can be visualized in Figure 3.13, where it has been roughly sketched.
Figure 3.13. Error During Prediction
3.6 Initializing Process
Another problem is that neural networks needed random values to initialize their
weights [Welstead 1994]. In order for all evaluations to be of an equivalent level, a
random number generator was required and needed restarting from the same seed every
time training was initiated. This had the potential to also affect the genetic algorithm, so
two random number generators were needed: one for the neural networks; and one for
the genetic algorithm.
3.7 The Ability of Randomness
At several points in this project, randomly driven events occurred. In the mating
process, random selections were performed: a pair of individuals was randomly chosen,
and it was randomly decided where to cut the strings. Randomness, or chance, is one of
the distinguishing characteristics of both genetic algorithms and neural networks.
Although it might seem counterintuitive that random operations are productive, their
history in computer science shows that they are successful.
3.8 Programming Environment
The programming language used was C++ running on a UNIX platform. Data
was read from a text file and output was written to another text file. In order to assist the
visualization of outcome, the research in this project used Microsoft Excel to graphically
represent results.
This system supported multidimensional data (i.e., close prices and trading
volumes), and for every dimension the error of prediction was evaluated for importance.
For example, there were two dimensions (close price and trading volume) in an input file,
but a forecast was produced for only one dimension (close price). Originally it was
thought that high and low prices could be used instead of volume. However, it became
clear that the trading volume of a stock ultimately establishes the value of closing prices
based on the economic concepts of supply and demand. Therefore, trading volumes were
used primarily for strengthening the neural network system.
3.8.1 Programming Applications
In order to develop a successful program, several classes needed to be produced
including a separate application to smooth a data series. Some techniques, classes, and
functions were borrowed, converted, and conformed to this project from A Practical
Guide to Neural Nets [Nelson 1991], and Neural Network and Fuzzy Logic Applications
in C/C++ [Welstead 1994]. The classes that were used in this project are listed in Table
3.1 below and accompanied with brief descriptions.
Table 3.1. Classes
Class                        Description
DataSet                      loads a time series
Genotype                     contains information about genes
MemRandom                    the generator of random numbers
Node                         contains information about a neuron
NeuralNetwork                creates the neural networks' connecting nodes
GA and Environment           abstract classes for the genetic algorithm
ParGA and ParEnvironment     search for the best parameters for the neural networks
3.8.2 Input Files
The input of the system required two separate files. These files are listed and
described below in Table 3.2.
Table 3.2. Input Data
File Name File Description
Data.txt contains data series of all historical input values
GA.txt
contains configuration of system with the following format:
population size,
elitism (best solution in population copied to next generation),
mutation rate,
number of dimensions (close price and volume),
value of importance per dimension,
number of values for training set,
number of values for evaluation set,
number of values for test set,
number of states,
number of historical values in input,
number of neurons on networks
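For illustration only, a GA.txt following this format might look as follows. Every value here is invented for the example, and the labels in parentheses are annotations for the reader rather than part of the file; the report does not reproduce an actual configuration file.

```
50          (population size)
1           (elitism)
0.01        (mutation rate)
2           (number of dimensions)
1.0 0.5     (importance per dimension)
180         (training set values)
40          (evaluation set values)
40          (test set values)
5           (number of states)
30          (historical values in input)
2           (neurons on the networks)
```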
4. EVALUATION AND RESULTS
The goal of this project was to produce a functional model for predicting a stock
trend using a neural network system with a genetic algorithm. When theoretical tests
were conducted, forecasting with only a small error of prediction, such as 1%, was
desirable because theoretical tests exist primarily to assess functionality. When the
system was executed with a real series from a stock market, forecasts were considered
successful when the predicted trends followed the same direction as the trends derived by
financial technical analysts using charting techniques.
4.1 Financial Data Analysis
During the evaluation phase of this study, performance tests were initially
conducted through theoretical experimentation using the SINE trigonometric function
and the Lorenz equations. These tests provided an opportunity to test the neural network
system in a controlled environment, without the volatility and chaotic behavior that can
exist in a stock market. Because this project concentrated on stock trends, the outcomes
of the theoretical tests were graphed in Microsoft Excel and only a simple visual analysis
was conducted.
After conducting theoretical tests, empirical experiments were performed on
real data series from a stock market. To test real data, one year of historical trading
information was retrieved for ten stocks, each from a separate major market. Taking data
from the major markets allowed the neural network system to be tested against multiple
data series behaviors. Table 4.1 below contains a list of the company stocks with their
respective symbols and markets.
Table 4.1. Tested Stocks

Symbol   Company                    Market
WFC      Wells Fargo                Banking
DNA      Genentech, Incorporated    Biotech & Drug
IBM      IBM Corporation            Blue Chip
XOM      Exxon Mobil Corporation    Energy
HCA      HCA, Incorporated          Healthcare
AOL      AOL Time Warner            Internet
WMT      Wal-Mart Stores            Retail
INTC     Intel Corporation          Semiconductor
MSFT     Microsoft Corporation      Technology
FDX      FedEx Corporation          Transportation

Only data regarding closing price and trading volume were used in the empirical
experimentation process, because the volume of stock traded is an important indicator for
determining supply and demand. This helped the neural network system predict whether
the future price of the stock would be higher or lower. The correlation between closing
price and volume maintained the standard principles of popular financial analysis.
In addition to maintaining a constant type of input for each stock, many system
variables were held constant for two reasons. First, maintaining the dimensions of the
neural network system in a constant structure allowed result comparisons of different
stocks to be conducted fairly. Therefore, only genetic algorithm
parameters for population size and generations were adjusted between tests. This led to a
total of four tests per stock. Second, further test variations were desired, but execution of
the neural network system utilized an excessive amount of memory and run-time, which
often exceeded the designated student quota of the operating system.
Data spikes and anomalies were first smoothed, and then the data was fed to the
neural network system. To assess the empirical results, real data and predicted data were
graphed in Microsoft Excel. Next, the trend analysis technique was manually applied to
the testing data set for both the predicted data and the real data. The trend line slopes for
both data types were then calculated, compared for differences, and recorded in a table.
Slope was expressed as the rate of change along a trend line: the vertical distance divided
by the horizontal distance between the first two points that established the trend line. It
is important to note that the neural network system in this project did not attempt to
predict stock prices; it attempted to predict trend lines similar in direction to the actual
trend lines. Therefore, the values that formed the predicted trend lines were expected to
vary somewhat from the values that made up the actual trend lines, but their calculated
slopes were expected to be similar.
4.2 Theoretical Experimentation
4.2.1 SINE Data
Using Microsoft Excel, values were calculated for the trigonometric function
SINE, defined from a circle with a radius of one. A theoretical evaluation with a function
such as SINE allowed for a basic test using a data series without any of the noise,
anomalies, or volatility of the stock market. This kind of test gave the neural network
system an opportunity to be evaluated for simple functionality. Because SINE consists of
an easy, recognizable pattern, any properly running neural network system should be able
to learn and predict it.
Figure 4.1 shows the graphed SINE data that the neural network system had to
learn from. It is divided into the three partitions: train, evaluate, and test. It is structured
much like the data in a time series, except that here the coordinates are labeled in radians
and SIN(x), as opposed to time and price.
Figure 4.1. SINE Input Data
Figure 4.2 is a graph of the test section where the final predictions took place.
The thin line in the graph represents the prediction. The graph reveals a fairly accurate
prediction with a very close resemblance to the actual data. The results are of high
quality because the data followed a repetitious pattern and did not experience any sudden
anomalies.
Figure 4.2. SINE Prediction
4.2.2 Lorenz Data
The Lorenz equations are a system of nonlinear differential equations representing
a time-continuous system. These equations were originally designed to examine the
properties of nonlinear systems, and they are significant because, when graphed, they
demonstrate chaotic behavior in an equation-controlled environment. The chaotic aspect
of this system is that, regardless of the accuracy of the inputs, an advanced state of the
system cannot be predicted. In other words, the system amplifies small changes, like a
ripple effect, until they become significant enough to affect the accuracy of prediction
[Strogatz 2000]. The data for the Lorenz test was calculated and retrieved from the web
site Calculators On-Line Center for Mathematics, located at the following address:
http://www-sci.lib.uci.edu/HSG/RefCalculators2.html#COMP-LOR.
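For reference, the standard form of the Lorenz system, with fixed parameters sigma, rho, and beta (the particular parameter values used by the on-line calculator are not recorded in this report), is:

```latex
\begin{aligned}
\dot{x} &= \sigma\,(y - x), \\
\dot{y} &= x\,(\rho - z) - y, \\
\dot{z} &= x\,y - \beta\,z .
\end{aligned}
```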
The data that was retrieved and tested is graphed below in Figure 4.3. As before,
this data is not labeled with time and price. Lorenz equations simply use X and Y
indicators for a two-dimensional representation.
Figure 4.3. Lorenz Input Data
Figure 4.4 shows the outcome of the neural network’s attempt to predict the
Lorenz equations. Looking at the graph, it becomes obvious that the neural network
system needs some type of secondary input variable to help support its generalizations.
Otherwise, predictions in a chaotic system tend to resemble random guessing. The tests
conducted in this project, for predicting stock trends, used trading volume as a secondary
indicator.
Figure 4.4. Lorenz Prediction
4.3 Empirical Experimentation
For each stock tested, the input data for the closing price was partitioned and graphed.
The genetic algorithm was adjusted to four different combinations of population size (50
and 100) and generations (50 and 100). Results were recorded in a table and shown for
each stock. Each table shows whether the proper trend type (up or down) was predicted,
and contains the slope difference between the actual and predicted trends. A negative
slope difference indicates that the predicted slope was below the actual trend's slope; a
positive slope difference indicates that it was above. From each stock's set of four tests,
the prediction results with the smallest slope difference were graphed.
4.3.1 Wells Fargo
The first empirical test examined the stock of Wells Fargo (WFC). In Figure 4.5,
the input data for close price can be seen with three partitioned sections.
Figure 4.5. WFC Input Data
Results were recorded in Table 4.2, and they show that a higher number of
generations in the genetic algorithm produced better results, while population size may
not matter. The best performance took place in test four, where the slope difference
between the predicted and actual trends was lowest. All four tests predicted an uptrend,
even though the actual uptrend was very slight.
Table 4.2. WFC Prediction Results

Test #   Symbol   Trend Type Predicted?   GA Population   GA Generations   Actual Slope   Predicted Slope   Slope Difference
1        WFC      yes                     50              50               0.003          0.061             0.058
2        WFC      yes                     50              100              0.003          0.028             0.025
3        WFC      yes                     100             50               0.003          0.084             0.081
4        WFC      yes                     100             100              0.003          0.025             0.022

Below in Figure 4.6 is the graph of test four, the most successful test. The results
of all the tests for this stock showed that a higher population size and a higher generation
count produced the most accurate prediction. Actual data is represented by the thick
crooked line, and the actual trend is represented by the thick straight line; their uptrend is
barely visible. The predicted trend is represented by the thin line, and the prediction was
successful in determining an uptrend, varying from the actual trend only slightly in value.

Figure 4.6. WFC Best Predicted Trend: Test #4
4.3.2 Genentech, Incorporated
In Figure 4.7, Genentech’s input data for close price can be seen with its three
partitioned sections.
Figure 4.7. DNA Input Data
In Table 4.3, once again all of the tests successfully predicted that this stock
would perform an uptrend. The results of the tests for this stock showed that a higher
population size and a higher generation count produced the more accurate predictions.
The most successful test was test two, followed closely by test four; both used a higher
number of generations than tests one and three.
[Figure: DNA Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
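The three-way split shown in these input-data figures can be sketched as follows. The split fractions here are illustrative assumptions: the figures suggest roughly 257 trading days divided, in time order, into train, evaluate, and test sections, but the exact boundaries are not stated in this excerpt.

```python
def partition_series(prices, train_frac=0.7, evaluate_frac=0.2):
    """Split a price series into train / evaluate / test sections in time order.

    The fractions are illustrative assumptions, not the report's actual split.
    """
    n = len(prices)
    train_end = int(n * train_frac)
    eval_end = train_end + int(n * evaluate_frac)
    return prices[:train_end], prices[train_end:eval_end], prices[eval_end:]

# Example with 257 hypothetical daily closes.
closes = [50.0 + 0.01 * day for day in range(257)]
train, evaluate, test = partition_series(closes)
```

Splitting in time order matters for time-series forecasting: the test section must lie strictly after the training data, as the figures here show.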
Table 4.3. DNA Prediction Results

Company Symbol: DNA
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               yes                     0.094                0.018                   -0.076
2        50              100              yes                     0.094                0.108                   0.014
3        100             50               yes                     0.094                0.181                   0.087
4        100             100              yes                     0.094                0.109                   0.015

Below in Figure 4.8 is the graph of test two. The actual trend and the predicted
trend appear very similar, and both are established before day 20, when the stock
began to suffer from an anomaly.

Figure 4.8. DNA Best Predicted Trend: Test #2
[Figure: DNA Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]
4.3.3 IBM Corporation
Graphed in Figure 4.9, IBM’s input data for close price is represented with its
three partitioned sections.
Figure 4.9. IBM Input Data
As shown in Table 4.4, the most accurate forecast of IBM's stock came from test four.
Although its slope difference is a negative value, test four's predicted trend most
closely resembles the actual trend line. A negative slope difference simply means
that the predicted trend rose less than the actual trend. Again, a larger generation
size led to the most accurate outcome.
Table 4.4. IBM Prediction Results

Company Symbol: IBM
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               yes                     0.077                0.002                   -0.074
2        50              100              yes                     0.077                0.003                   -0.073
3        100             50               yes                     0.077                0.127                   0.051
4        100             100              yes                     0.077                0.066                   -0.011
[Figure: IBM Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
Below in Figure 4.10 is the graph of test four.

Figure 4.10. IBM Best Predicted Trend: Test #4
[Figure: IBM Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]

4.3.4 Exxon Mobil Corporation
Below in Figure 4.11, Exxon Mobil's input data for close price is shown with its
three partitioned sections.

Figure 4.11. XOM Input Data
[Figure: XOM Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
As shown in Table 4.5, the results of Exxon's stock forecast upheld the idea that a
large generation size is key to success with this neural network system. The results
of test three and test four were very close, with test four predicting slightly more
accurately. Furthermore, all tests correctly indicated an upward trend.
Table 4.5. XOM Prediction Results

Company Symbol: XOM
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               yes                     0.031                0.060                   0.029
2        50              100              yes                     0.031                0.051                   0.019
3        100             50               yes                     0.031                0.017                   -0.014
4        100             100              yes                     0.031                0.019                   -0.013

Below in Figure 4.12, the trend predicted in test four has been graphed. The
forecasted trend line passes through the middle of the actual price values, but as
mentioned before, this research did not attempt to predict stock prices. Here the
predicted trend line is composed of slightly higher values, but its slope appears
very similar to the slope of the actual trend.

Figure 4.12. XOM Best Predicted Trend: Test #4
[Figure: XOM Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]

4.3.5 HCA, Incorporated
In Figure 4.13, HCA's input data for close price can be seen with three partitioned
sections.

Figure 4.13. HCA Input Data
[Figure: HCA Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
As shown in Table 4.6, tests one, two, and three failed to predict the proper trend
type. Test four, on the other hand, prevailed, predicting the correct trend type and
again producing the most accurate prediction with the highest combination of
population and generation sizes.
Table 4.6. HCA Prediction Results

Company Symbol: HCA
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               no                      0.184                -0.225                  -0.409
2        50              100              no                      0.184                -0.079                  -0.263
3        100             50               no                      0.184                -0.038                  -0.222
4        100             100              yes                     0.184                0.252                   0.068

The graph in Figure 4.14 shows a substantial similarity between the predicted and
actual trends. Again, an interesting result occurred in which the two trends were
very similar in slope but many price values apart.

Figure 4.14. HCA Best Predicted Trend: Test #4
[Figure: HCA Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]
4.3.6 AOL Time Warner
In Figure 4.15, AOL Time Warner’s input data for close price was partitioned and
is illustrated below.
Figure 4.15. AOL Input Data
As shown in Table 4.7, the correct trend type was predicted only when the number of
generations was at its highest. As in previous test sets, test four resulted in the
lowest slope difference and predicted with the highest level of accuracy.
Table 4.7. AOL Prediction Results

Company Symbol: AOL
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               no                      0.034                -0.042                  -0.076
2        50              100              yes                     0.034                0.102                   0.068
3        100             50               no                      0.034                -0.079                  -0.113
4        100             100              yes                     0.034                0.043                   0.009
[Figure: AOL Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
The result of test four is graphed below in Figure 4.16. This test happened to be
the most accurate thus far, predicting a trend with values very similar to those of
the real trend.

Figure 4.16. AOL Best Predicted Trend: Test #4
[Figure: AOL Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]

4.3.7 Wal-Mart Stores
Figure 4.17 shows Wal-Mart Stores' input data with its partitioned sections.

Figure 4.17. WMT Input Data
[Figure: WMT Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
As with the previous stock, the neural network system came very close to predicting
the trend perfectly, and once again the most successful test was test four. However,
this stock was the first to demonstrate an actual downtrend. More importantly, every
test successfully predicted that a downtrend would occur. The outcomes are shown
below in Table 4.8.
Table 4.8. WMT Prediction Results

Company Symbol: WMT
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               yes                     -0.305               -0.035                  0.269
2        50              100              yes                     -0.305               -0.045                  0.259
3        100             50               yes                     -0.305               -0.035                  0.270
4        100             100              yes                     -0.305               -0.131                  0.174

The results of test four can be seen below in Figure 4.18.

Figure 4.18. WMT Best Predicted Trend: Test #4
[Figure: WMT Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]
4.3.8 Intel Corporation
In Figure 4.19, Intel’s input data for close price is graphed along with its three
partitioned sections.
Figure 4.19. INTC Input Data
The results of testing with INTC stock were not unusual. Test four produced the best
prediction, and its outcome can be seen in Table 4.9. The experiments in this test
set produced results similar to the earlier tests with HCA: both stocks were in an
uptrend, yet every test other than test four predicted a downtrend.
Table 4.9. INTC Prediction Results

Company Symbol: INTC
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               no                      0.026                -0.031                  -0.057
2        50              100              no                      0.026                -0.008                  -0.033
3        100             50               no                      0.026                -0.007                  -0.032
4        100             100              yes                     0.026                0.055                   0.029
[Figure: INTC Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
A graph showing the outcome of test four can be seen in Figure 4.20. The predicted
trend is hard to see because it lies so close to the actual trend line.

Figure 4.20. INTC Best Predicted Trend: Test #4
[Figure: INTC Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]

4.3.9 Microsoft Corporation
Below is Microsoft's input data, shown with its partitions in Figure 4.21.

Figure 4.21. MSFT Input Data
[Figure: MSFT Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
In this test set, the neural network system had another opportunity to predict a
downtrend. As seen in Table 4.10, each test produced a prediction that trended
downward, but tests two and four proved to be the more accurate attempts. Both used
a higher number of generations, resulting in the lowest slope differences in the
set. Test four generated the best outcome.
Table 4.10. MSFT Prediction Results

Company Symbol: MSFT
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               yes                     -0.143               -0.030                  0.113
2        50              100              yes                     -0.143               -0.041                  0.102
3        100             50               yes                     -0.143               -0.035                  0.107
4        100             100              yes                     -0.143               -0.050                  0.093

Figure 4.22 shows the graphical result of test four. Separated by a slight
difference in values, the two trend lines appear parallel and differ only slightly
in slope.

Figure 4.22. MSFT Best Predicted Trend: Test #4
[Figure: MSFT Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]
4.3.10 FedEx Corporation
In Figure 4.23, FedEx’s input data for close price can be seen in three partitioned
sections.
Figure 4.23. FDX Input Data
In this data set, tests one and two were not able to predict an uptrend, while tests
three and four predicted the proper trend. The success of test four once again
confirmed that the neural network system needs a sufficient amount of learning time;
as in other test sets, test four prevailed because it was provided with a larger
number of generations for learning. These results can be seen below in Table 4.11.
Table 4.11. FDX Prediction Results

Company Symbol: FDX
Test #   GA Population   GA Generations   Predicted Trend Type?   Actual Trend Slope   Predicted Trend Slope   Slope Difference
1        50              50               no                      0.063                -0.008                  -0.071
2        50              100              no                      0.063                -0.076                  -0.139
3        100             50               yes                     0.063                0.267                   0.204
4        100             100              yes                     0.063                0.060                   -0.003
[Figure: FDX Data Partitions of Actual Close Prices. Price ($) vs. Time (days); train, evaluate, and test sections marked]
The graph of test four's prediction is shown in Figure 4.24. Much like the results
from the MSFT test, the actual trend line and the predicted trend line appear
parallel and differ only faintly in slope.

Overall, the results showed that the system depends on population and generation
sizes. Results were best when the number of generations was highest, and improved
further when the population size increased. Table 4.12 shows that tests two and four
performed best in most of the empirical experiments.
Table 4.12. Best Tests

Test #   GA Population   GA Generations   # of Best Tests   # of 2nd Best Tests   Total Best & 2nd Best Tests
1        50              50               0                 0                     0
2        50              100              1                 4                     5
3        100             50               0                 3                     3
4        100             100              9                 1                     10
[Figure: FDX Verify Trend Prediction of Close Prices. Actual vs. predicted trend; Price ($) vs. Time (days)]
5. CONCLUSION
Neural networks originated from efforts to simulate the functioning of the human
brain in the areas of learning and problem solving. A neural network system must be
provided with a method of learning. A learning rule is applied for establishing and
controlling the connections among many pieces of input data. One such learning method
that has not received a great deal of attention is the application of a genetic algorithm.
The goal of this project was to predict a stock trend based on data from a financial
market. Because this was an attempt to predict a financial stock trend line, it essentially
became a model for time series forecasting. The neural networks that performed the
prediction were trained using a backpropagation technique and a genetic algorithm. As a
learning error was calculated in one neural network, a second neural network determined
the state of the market at each instance of input.
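A minimal sketch of this learning scheme, a genetic algorithm searching for network weights that minimize prediction error, might look like the following Python fragment. The network (a single linear neuron), the fitness function, and the selection and mutation scheme are all illustrative assumptions, not the project's actual implementation.

```python
import random

def predict(weights, inputs):
    # A single linear neuron stands in for the network (illustrative only).
    return sum(w * x for w, x in zip(weights, inputs))

def fitness(weights, samples):
    # Mean squared prediction error over (inputs, target) samples; lower is better.
    return sum((predict(weights, x) - t) ** 2 for x, t in samples) / len(samples)

def evolve(samples, n_inputs, population=50, generations=100, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, samples))
        survivors = pop[: population // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < population:
            parent = rng.choice(survivors)
            child = [w + rng.gauss(0, 0.1) for w in parent]   # Gaussian mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda w: fitness(w, samples))

# Toy task: recover target weights [0.5, -0.3] from hypothetical samples.
rng = random.Random(1)
samples = []
for _ in range(30):
    x = [rng.uniform(0, 1), rng.uniform(0, 1)]
    samples.append((x, 0.5 * x[0] - 0.3 * x[1]))
best = evolve(samples, n_inputs=2, population=30, generations=40)
```

Because the best half of the population always survives, the best error never worsens from one generation to the next, which is why larger generation counts tended to help in the tests above.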
Prediction results were recorded and analyzed graphically to calculate slopes of
trend lines for measuring accuracy. The accuracy level of each forecast was measured by
directly comparing its actual historical trend line and its predicted trend line for the same
time period. These tests were conducted using many different stocks from various major
markets.
This project demonstrated that artificial intelligence could be a substitute for
traditional statistical methods in predicting stock trends. Although neural networks are
not perfect in forecasting stock trends, they perform extremely well and offer vast
potential for growth. This project was important because it explored different genetic
algorithm parameters such as population size and generations. Results later showed that
the success of the neural network system is dependent on a sufficient amount of learning
time. In this case learning was aided by a genetic algorithm. When using a genetic
algorithm, the success of the neural network system was dependent on population size
and on the number of generations employed by the genetic algorithm.
Using a genetic algorithm for the process of learning within a neural network did
have a drawback. The computation cost was so high as to almost make the genetic
algorithm impractical. For example, the computation time easily reached between twelve
and eighteen hours for each empirical test. The high number of genetic algorithm
iterations accounted for this high cost of time, where total iterations were in the millions
per test. Given the population size (P), generations (G), and other fixed parameters,
iterations can be approximated with Eq. (5.1) below. The iterations are calculated in
Table 5.1 for the various combinations of population and generation sizes that were
tested. The cost of the numerous iterations, needed to improve the neural network,
quickly became computationally prohibitive.
Table 5.1. Iterations

Test #   GA Population   GA Generations   Total Iterations
1        50              50               10,700,000
2        50              100              21,400,000
3        100             50               21,400,000
4        100             100              42,800,000

Total Iterations = P * G * (2 wt / 1 wt) * (10 input nodes / 1 input node)
                       * (214 data windows / 1 data window) * 1 test        (5.1)

That is, the fixed factors multiply out to 2 * 10 * 214 = 4,280, so each test
performs approximately 4,280 * P * G iterations.
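The totals in Table 5.1 follow directly from Eq. (5.1). A quick check in Python:

```python
# Fixed factors from Eq. (5.1): 2 weight sets per input node,
# 10 input nodes per data window, 214 data windows per test.
ITERATIONS_PER_PG = 2 * 10 * 214  # = 4,280

def total_iterations(population, generations):
    """Approximate GA iterations for one empirical test, per Eq. (5.1)."""
    return population * generations * ITERATIONS_PER_PG

tests = {1: (50, 50), 2: (50, 100), 3: (100, 50), 4: (100, 100)}
totals = {t: total_iterations(p, g) for t, (p, g) in tests.items()}
# totals -> {1: 10700000, 2: 21400000, 3: 21400000, 4: 42800000}
```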
6. FUTURE WORK
Future use of neural networks to predict stock market movement has the potential
to be a growing area of research for a long time to come. As investors strive to
outperform stock markets, there will always be a strong interest in finding innovative
methods to improve investment returns. This driving force will push researchers to
continuously find more interesting results and produce new theories for training financial
neural networks. Financial neural networks require training so they can learn and form
generalizations about data. Training is a very important aspect of neural networks, and
different methods of training could be explored. Various combinations of parameters for
genetic algorithms could be tested.
This project used a genetic algorithm for weight training in a supervised neural
network system. The difficulty with using a genetic algorithm to optimize a neural
network is the high cost of its numerous iterations. Future work could explore
parallelizing a neural network's training in order to speed up learning. This would
involve the study of multiple processors and distributed memory. Implementation
could involve a cluster of computers and a message passing interface.
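As one illustration of this idea, fitness evaluation of a genetic algorithm population could be distributed across local processes with Python's standard multiprocessing module. The fitness function below is only a stand-in for an expensive network evaluation, and a real cluster implementation would use a message passing interface such as MPI instead.

```python
from multiprocessing import Pool

def fitness(weights):
    # Stand-in for an expensive neural-network evaluation.
    return sum(w * w for w in weights)

def evaluate_population(population, processes=4):
    """Score every candidate weight vector in parallel."""
    with Pool(processes) as pool:
        return pool.map(fitness, population)

if __name__ == "__main__":
    population = [[0.1 * i, -0.2 * i] for i in range(8)]
    scores = evaluate_population(population)
```

This parallelizes cleanly because each candidate's fitness is independent of the others; only selection and reproduction require the full population.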
The idea of combining neural networks and expert systems also shows potential.
Additional research could focus on a system that provides reasoning and explanations for
the neural network predictions. However, this might be a difficult challenge knowing
that many financial experts typically characterize stock markets as chaotic systems.
Nevertheless, neural networks seem to be an effective and readily available method for
dealing with nonlinear data series. Continued work on improving neural networks may
provide insights into the nature of chaotic systems.
Other ideas for future work include system features and usability aspects that could
be implemented in a neural network project. A fully functional user system might
allow the user to predict stock prices as opposed to stock trends. Furthermore, it
might predict stocks for a particular week or day, instead of trending a longer time
period such as a month. Because a day and a week are shorter periods of time than a
month, predicting them may provide interesting results and conclusions. Additionally,
a graphical user interface could be implemented to help the user employ these
features and to automatically display numeric results graphically.
APPENDIX
The digital media of this project are contained on the disk provided. The disk
includes a copy of the technical report, the data files, the program, and the
executable files.