NEURAL NETWORK DEVELOPMENT FOR FINANCIAL FORECASTING
By: Lou Mendelsohn
Extensive research has been conducted into the application of neural networks to financial forecasting in today’s globalized trading environment. What makes this particular use of artificial intelligence so attractive to financial analysts and traders? Here, Lou Mendelsohn of Market Technologies highlights some of those issues and establishes goals for training neural networks.
With the advances being made in computer and telecommunication technologies, the world’s major economies and financial markets are becoming increasingly globalized. As this trend accelerates, financial markets are growing more interrelated, and fundamental factors are becoming increasingly critical to financial market analysis. In the global marketplace, the prevailing method of technical analysis, in which a single market is modeled through historical simulation and back-testing of its own past price behavior, is rapidly losing its competitive advantage as institutions and individual traders alike apply artificial intelligence (AI) technologies to financial forecasting. Recent research shows that this nonlinear domain can be modeled more accurately with these technologies than with the linear statistical and single-market methods that have been the mainstay of technical analysis throughout the past decade.
It is because of these factors that the field of AI merits a closer look. The result of these new demands is the emergence of a new analytical method that merges technical and fundamental analysis with the more recent emphasis on intermarket analysis. This combined analytical method is known as synergistic market analysis, or synergistic analysis. Using artificial intelligence tools, it synthesizes technical, intermarket and fundamental data within one analytical framework, resulting in better forecasting capabilities, earlier identification of trend changes and the ability to profit from market inefficiencies in the global markets of the 1990s.
Tools such as neural networks, expert and knowledge-based systems, machine learning, fuzzy logic, wavelets, chaos theory and genetic algorithms are being applied across industries. Neural networks in particular lend themselves to financial forecasting: they have proved powerful and flexible, and they are ideally suited to performing synergistic analysis.
ARTIFICIAL NEURAL NETWORKS
Artificial neural networks are models based on the working of the human brain, utilizing a distributed processing approach to computation. Neural nets are capable of solving a wide range of problems by “learning” a mathematical model for the problem: the model can then be used to map input data to output data. Anything that can be represented as a number can be fed into a neural network. Technical indicators and fundamental and price data related to a single target market, as well as intermarket data affecting the target market, can all be fed into a single neural net and used to predict price and trend directions for the target market.
Artificial neural networks are composed of individual interconnected processing elements (PEs). These PEs are analogous to neurons in the brain and are also referred to as neurons. Each PE sends data to and receives data from other PEs. Mathematically speaking, the standard model is relatively simple. For each individual PE, the input data (I0 through In) is multiplied by the weights (W0 through Wn) associated with the connections to the PE. These products in turn are summed and passed through a transfer function, which converts the sum into a value in a specified interval, for example, between zero and one. The output from a given PE is then multiplied by another separate weight and fed into the next processing element.
If the processing element is in the output layer, as would be the case in FIGURE 1, then the output from the processing element is not multiplied by a weight and instead is an output of the network itself. The selected architecture of the neural network specifies the number of processing elements present in a network and the way in which they connect. It is important to recognize that a single processing element is of little or no use. It is the way in which the individual PEs are organized to model complex nonlinear systems that is important in the application of neural networks to financial modeling and forecasting.
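To make that arithmetic concrete, here is a minimal sketch of a single processing element in Python; the input values, weights and the choice of a sigmoid transfer function are hypothetical and used only for illustration.

```python
import numpy as np

def sigmoid(x):
    """Transfer function that squashes the weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical inputs I0..In and the weights W0..Wn on the connections into one PE.
inputs = np.array([0.25, -0.10, 0.40])
weights = np.array([0.80, -0.30, 0.55])

# The PE multiplies each input by its weight, sums the products and
# passes the sum through the transfer function.
output = sigmoid(np.dot(inputs, weights))
print(output)   # a value between zero and one, passed on to the next layer
```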
PARADIGMATIC VARIATIONS
Neural networks can be applied to many general problem areas, including classification, filtering, pattern association, optimization, conceptualization and prediction. The first step in creating an artificial neural network application involves identifying the category to which the problem in question belongs, which is not necessarily as easy as it may seem, because some neural network paradigms are more appropriate than others for a given application. Because our primary goal here is to forecast prices and trend direction, however, this problem can be classified as a predictive one.
Many different paradigms can be used for predictive purposes. Each has an almost unlimited number of variations, depending on how you select the parameters. It is not as important that you pick the perfect paradigm for your problem — if there is such a thing — as it is to simply choose one that is more or less appropriate. Now examine two well-known paradigms: feed-forward and recurrent back-propagation networks.
FEED-FORWARD BACK-PROPAGATION NETWORKS
A feed-forward multi-layered network, usually referred to as a back-propagation, or back-prop, network, is probably the most commonly used network paradigm. A representative architecture for a back-prop network is shown in FIGURE 2. These networks are composed of an input layer of neurons, some number of hidden layers and an output layer. The layers between the input and the output are called hidden layers because they are essentially hidden from the view of the network developers and users. Although a back-prop net can have any number of hidden layers, one is sufficient to perform any mapping of input to output. This does not mean that a single hidden layer is always the most desirable number, but it is definitely a good place to start.
The operation of the network is relatively straightforward. Input data — for example, changes in price data and moving averages of prices, volume, and open interest — are presented to the network at the input layer. The values associated with each individual input neuron are fed into the first hidden layer. Each hidden neuron receives these values, multiplied by the appropriate weights, sums them, runs the sum through a transfer function and produces an output. The outputs from the hidden layer then feed forward into either the next hidden layer or the output layer. This presentation (FIGURE 2) assumes that the layers are fully connected, meaning that every neuron in the input layer has a connection to every neuron in the hidden layer. The same is true for the connections between the hidden layer and the output layer. This does not have to be the case, but it is yet another decision to be made when designing a network.
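As an illustration of that flow of data, the sketch below pushes a single input vector through a fully connected network with one hidden layer using matrix arithmetic; the layer sizes, random weights and input values are made-up assumptions, not a recommended configuration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)

# Hypothetical input vector: a price change, a moving average of closes
# and a change in volume, all scaled to small values.
x = np.array([0.12, 0.45, -0.07])

# Fully connected layers: every input neuron connects to every hidden
# neuron, and every hidden neuron connects to the single output neuron.
W_hidden = rng.normal(scale=0.5, size=(3, 4))   # 3 inputs -> 4 hidden neurons
W_output = rng.normal(scale=0.5, size=(4, 1))   # 4 hidden neurons -> 1 output

hidden_out = sigmoid(x @ W_hidden)            # each hidden PE: weighted sum + transfer
network_out = sigmoid(hidden_out @ W_output)  # the output layer produces the forecast
print(network_out)
```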
The initial values of the weights are randomly selected at the start of training, and so the first set of input values (often referred to as an input vector) will probably not produce the appropriate output vector. For example, assume that you have designed a network to predict the median price of a stock one day in the future, based on the difference in the highs and the lows for the past two days and a moving average of the closes for the past five days. You would present the network with a single fact, composed of a three-valued input vector, where each value corresponds to one of the inputs mentioned, and a single-valued output vector that represents the next day’s median price.
As a result, the first fact presented and fed through the network is likely to produce an output very different from the desired output. What we want the network to learn is that the input vector it was given contains the factors that should eventually produce the output vector supplied. To learn this, the network determines a measure of the error between its generated output and the desired output. This is done for each neuron in the output layer, which in this simple case is just one. The errors are then fed back through the network, layer by layer, and are used to adjust the weights of the connections between the neurons so as to minimize the total error associated with the output vector. Thereafter, varying germane facts are presented to the network repeatedly during training in an attempt to reduce the error to acceptable levels. In simple problems the error may be reduced to zero, but this is not a realistic expectation in most real-world applications.
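A compressed sketch of that training cycle appears below, written in Python for the three-input, one-output example above. The random facts, targets, layer sizes and learning rate are assumptions made purely for illustration, not the author’s actual configuration; in practice the facts would be built from real price data.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical facts: each row holds the three input values described above;
# each target is the next day's median price, scaled into (0, 1) so the
# sigmoid output layer can reach it.
X = rng.random((200, 3))
y = rng.random((200, 1))

n_in, n_hidden, n_out = 3, 5, 1
W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))  # hidden -> output weights
lr = 0.1                                            # learning rate

for epoch in range(1000):
    # Forward pass through the network.
    hidden = sigmoid(X @ W1)
    output = sigmoid(hidden @ W2)

    # Error at the output layer (desired minus generated).
    error = y - output

    # Feed the errors back layer by layer and adjust the weights so the
    # total error shrinks (averaged over all facts in this sketch).
    delta_out = error * output * (1.0 - output)
    delta_hidden = (delta_out @ W2.T) * hidden * (1.0 - hidden)
    W2 += lr * (hidden.T @ delta_out) / len(X)
    W1 += lr * (X.T @ delta_hidden) / len(X)
```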
RECURRENT BACK-PROPAGATION NETWORKS
This network type consists of a single functional layer of processing elements. FIGURE 3 shows a two-layer representation to make the architecture easier to visualize. Note that the neurons in the first layer are fully connected to the neurons in the second layer. The neurons in the second layer feed back into the first layer with a one-to-one mapping. The second layer represents a time delay for the passage of data through the network. This type of architecture allows the net to learn temporal relationships.
In a feed-forward back-prop network, if you wish to present facts that contain the differences in the high for the past five days, you must first create what is referred to as a snapshot of your data by building a fact with an input vector containing five values (one for each difference) and an output for tomorrow. You would have to do this for each fact/day presented to the network. In a sense, you are encoding the temporal information you want the network to use (data from the past five days) into the input data itself. In the case of a recurrent net, by contrast, you would instead sequentially present each fact/day as a single difference. Because the network can feed back on itself, it can learn the temporal information from the order in which the facts are presented. In this case, you do not need to encode the temporal relationship into the input data. We have done considerable research with both recurrent and feed-forward nets, but most commercially available neural network development packages do not include a recurrent model, so we will focus primarily on back-prop models.
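The snapshot encoding just described for a feed-forward net can be sketched as a simple sliding window; the price series and window length below are hypothetical values chosen only to show the mechanics.

```python
import numpy as np

# Hypothetical daily highs; in practice these would come from your market data.
highs = np.array([102.0, 103.5, 103.0, 104.2, 105.1, 104.8, 106.0, 107.3])
diffs = np.diff(highs)          # day-to-day differences in the high

window = 5                      # encode the past five days into each fact
facts, targets = [], []
for t in range(window, len(diffs)):
    facts.append(diffs[t - window:t])   # input vector: the last five differences
    targets.append(diffs[t])            # output: tomorrow's difference

X = np.array(facts)             # one row per fact/day presented to the net
y = np.array(targets)
print(X.shape, y.shape)
```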
For every recurrent network, a corresponding feed-forward network can be designed with identical behavior, so a recurrent net can be modeled with a feed-forward net.
NETWORK ARCHITECTURE
Network architecture comprises the transfer function, the number and layout of the processing elements and the manner in which they are interconnected. The general requirement for the transfer function in a back-prop network is that it be a nonlinear, continuously differentiable function; that is, its derivative must exist and be continuous at every point, and its nonlinearity is what allows the net to perform nonlinear statistical modeling. The most common transfer functions used are the sigmoid and the hyperbolic tangent; either can be used effectively in a back-prop network, although we have often realized better results with the hyperbolic tangent function.
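For reference, the two transfer functions named above can be written as follows; the sample values are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    """Maps any weighted sum into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def hyperbolic_tangent(x):
    """Maps any weighted sum into the interval (-1, 1)."""
    return np.tanh(x)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))              # approximately [0.119, 0.5, 0.881]
print(hyperbolic_tangent(z))   # approximately [-0.964, 0.0, 0.964]
```

Both functions are nonlinear and continuously differentiable, satisfying the requirement for back-propagation.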
In terms of layout and connectivity, we will concentrate on a three-layer, fully connected architecture, as shown in FIGURE 2. In addition to the transfer function and the number of layers, we must select the number of neurons per layer; with the input and output layers, this is straightforward. Assume that you are trying to predict the change in the close of a particular stock and you want to do so based on a five-day moving average of the close, a five-day moving average of the high and a five-day moving average of the low. Your network would require three input neurons and one output neuron.
For any nonlinear problem, like that of predicting prices of a stock or commodity, the network would need at least one hidden layer. Choosing the number of neurons in the hidden layer(s) of a back-propagation network is just one of the architectural decisions to be made. No hard and fast rules are available to determine the correct number, so we must rely on either rough rules of thumb or experimentation or both. Frequently, more complex problems require a larger number of hidden neurons, but we have found that too many hidden neurons can result in a network that is over-fitted to the training data and is unlikely to perform well on new unseen facts used for testing.
WHERE DO WE GO FROM HERE?
How do we pick a place to start? You might consider several industry guidelines (a rough calculation of each is sketched after this list):
- Start with a number of hidden neurons between half the number of input neurons and two times the number of input neurons.
- Start with the average of the number of inputs and the number of outputs, or simply take the larger of the two.
- Sum the number of input and output neurons and multiply this value by a noise constant that varies with the relative amount of noise in the data; the noisier the data, the higher the constant. The total number of fact/days used for training is then divided by this result to give the number of hidden neurons.
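As a rough illustration, here is how the three guidelines might be turned into numbers for a hypothetical network with ten inputs, one output and 2,000 training fact/days; the noise constant is an arbitrary choice for the example.

```python
# Hypothetical problem size used only to illustrate the three rules of thumb.
n_inputs, n_outputs = 10, 1
n_facts = 2000            # total fact/days available for training
noise_constant = 5        # larger for noisier data

# Rule 1: between half and twice the number of input neurons.
rule1 = (n_inputs // 2, n_inputs * 2)

# Rule 2: the average of inputs and outputs, or simply the larger of the two.
rule2_avg = (n_inputs + n_outputs) / 2
rule2_max = max(n_inputs, n_outputs)

# Rule 3: training fact/days divided by (noise constant * (inputs + outputs)).
rule3 = n_facts / (noise_constant * (n_inputs + n_outputs))

print(rule1, rule2_avg, rule2_max, rule3)   # (5, 20) 5.5 10 36.36...
```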
Picking a starting point for the number of hidden neurons can be subjective. After you do so, you will want to train a number of nets, varying the number and size of the hidden layers. Doing so manually may be advisable for traders just getting started designing their own neural nets, as it will help you become more familiar with the tradeoffs associated with varying training parameter values, including the number of hidden units. As you gain more experience, you will want to automate this process, since finding the optimal number of hidden neurons necessitates a time-consuming systematic search.
When developing neural networks, no single factor, such as the number of hidden units, ultimately controls how well the network will perform. Data selection and quality, data preprocessing techniques, optimization of training parameters and testing procedures all affect network performance. These issues, along with many of the problems that may arise at each stage in the development of a neural network, will be examined in closer detail in later articles. Next, I will delineate the important issues involving the technical, fundamental and intermarket data used in neural networks, and I will also examine methods of raw input selection and how to preprocess these inputs before they enter the network.
Lou Mendelsohn is president of Market Technologies, Wesley Chapel, Fl., a research, development and consulting firm involved in the application of artificial intelligence to synergistic market analysis. He can be reached at 813-973-0496.
REFERENCES
Hecht-Nielsen, R. [1990]. Neurocomputing, Addison-Wesley Publishing Company, Inc.
Mendelsohn, Lou [1991]. “The Basics Of Developing A Neural Trading System,” Technical Analysis of STOCKS & COMMODITIES, Volume 9: June.
Murphy, John J. [1991]. Intermarket Technical Analysis, John Wiley & Sons.
Rumelhart, D. E. & J. L. McClelland [1986]. Parallel Distributed Processing, Volumes 1 and 2, The Massachusetts Institute of Technology.
Reprinted from Technical Analysis of Stocks & Commodities magazine. © 1993 Technical Analysis, Inc., 4757 California Avenue S.W., Seattle, WA 98116-4499, (800) 832-4642.