Artificial Intelligence Applications for Financial Market Analysis

By Lou Mendelsohn

The prevailing method of technical analysis performs linear modeling of one specific market at a time, based on historical back-testing of its past price data. This method, first popularized in the 1980s with the proliferation of technical analysis software for the PC, has lost its appeal as the trend toward global integration of the world’s financial markets accelerates.

Now, new analytics, capable of finding hidden patterns and relationships in global market data, are needed to prosper in today’s globally interrelated environment. I posit that inefficiencies existing in global financial markets can be discerned quantitatively and persist long enough for traders with the right analytic tools to reap substantial profits.

To achieve this, I propose a new method of analysis – synergistic market analysis (SMA) – which quantifies complex, nonlinear relationships between related global markets. In effect, SMA combines the three extant schools of analysis (technical, fundamental, and intermarket) into one coherent analytic framework, using artificial intelligence to find patterns and relationships in otherwise disparate market data.

However, artificial intelligence is not easy for John Q. Trader to develop and implement. Since decisions about network architecture, data selection, preprocessing techniques, training parameter optimization, and testing procedures all affect real-world performance, I will explore these issues in more detail, and point out pitfalls to avoid at each stage of neural network development in the domain of financial market analysis.

Artificial intelligence uses neural networks that “learn” to solve problems by passing information between interconnected neurons and adjusting the strengths of those connections. Network architecture concerns the number of layers of neurons, how many neurons are in each layer, how they are connected, and what transfer function is used. There are numerous learning paradigms, including two often used in the financial domain: a recurrent back propagation network, which learns temporal information from the order in which the facts are presented, and a feed forward back propagation, or “back-prop,” network, which trains by back propagation of error and encodes temporal information into the input data itself through a preprocessed “snapshot” of the data. The latter paradigm will be used to illustrate network architecture.

The back-prop network is composed of an input layer, an output layer, and one or more hidden layers located between them. The layers are typically fully connected, with every neuron in one layer connected to every neuron in an adjacent layer. The values associated with each input neuron are fed forward into each neuron in the first hidden layer, where they are multiplied by the appropriate weights, summed, and passed through a transfer function to produce an output. The outputs from the first hidden layer are then fed forward into either the next hidden layer or, in networks with only one hidden layer, directly into the output layer. The output layer’s output is the output of the network. The number of neurons in each hidden layer is determined through experimentation. For any nonlinear problem, such as prediction of stock or commodity prices, the network needs at least one hidden layer. In addition, the transfer function should be a nonlinear, continuously differentiable function that allows the network to perform nonlinear statistical modeling. Two commonly used transfer functions are the sigmoid and the hyperbolic tangent.
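The forward pass described above can be sketched in a few lines. This is a minimal illustration only; the tiny network, its weights, and the layer sizes are arbitrary stand-ins, not values from any trained system.

```python
import math

def sigmoid(x):
    # Nonlinear, continuously differentiable transfer function
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_hidden, w_output):
    # w_hidden[j][i]: weight from input i to hidden neuron j
    # w_output[k][j]: weight from hidden neuron j to output neuron k
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs)))
              for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in w_output]

# Two inputs, two hidden neurons, one output; weights are arbitrary
out = forward([0.5, -0.2],
              [[0.1, 0.4], [-0.3, 0.2]],
              [[0.6, -0.1]])
```

A sigmoid output always lies between 0 and 1, which is why targets must be scaled into that range during preprocessing.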

“Garbage-in, garbage-out” applies to artificial intelligence. Accurate financial forecasts can only be realized from a well-designed network, based on input data that reflects the breadth and integration of today’s globalized markets. For instance, in the development of a network to forecast Treasury Bond prices, a narrow technical analysis approach would suggest the use of only single-market data on bonds as inputs. A network based solely on fundamental analysis would use data inputs depicting supply and demand factors as well as econometric indicators related to the bond market, such as the Fed Funds rate or the Consumer Price Index. An intermarket analysis approach to neural networks would include data from related markets such as the CRB Index. In today’s global trading environment, none of these individual methods of analysis, when used alone, are sufficient. Synergistic analysis amalgamates these three approaches through the use of neural networks to synthesize global technical, fundamental, and intermarket data for precise market forecasting. Knowledge of the financial markets, coupled with the use of statistical analysis tools to find correlations between global data related to the target market, is mandatory to successful data selection for a neural network application.

Once input data has been selected, it must be preprocessed. By reducing the number of inputs to the network, preprocessing helps it learn more easily. Two widely used preprocessing methods are known as transformation and normalization. Transformation manipulates raw data inputs to create a single input to a net. Normalization transforms a single data input to distribute the data evenly and scale it to match the range of the input neurons.

In most neural network applications, transformations involve simple algebraic or statistical manipulation of the input data. In financial forecasting applications, transforms can also include a wide variety of technical indicators commonly used by traders to interpret market behavior. For a particular target market, such as T-bonds, preprocessed raw inputs might include differences, ratios, and moving averages of the open, high, low, close, volume, and open interest data, as well as transforms on related fundamental and intermarket data. Each neuron in the input layer represents one of the preprocessed raw inputs.
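As an illustration, the kinds of transforms mentioned above (differences, moving averages, and intermarket ratios) might look like the following sketch; the sample close prices and CRB Index values are hypothetical.

```python
def differences(series):
    # Period-over-period changes
    return [b - a for a, b in zip(series, series[1:])]

def ratio(series_a, series_b):
    # e.g. target-market closes relative to an intermarket index
    return [a / b for a, b in zip(series_a, series_b)]

def moving_average(series, n):
    # Simple n-period moving average
    return [sum(series[i - n + 1:i + 1]) / n
            for i in range(n - 1, len(series))]

closes = [100.0, 101.0, 103.0, 102.0, 104.0]   # hypothetical T-bond closes
crb = [210.0, 211.0, 212.0, 211.5, 213.0]      # hypothetical CRB Index values
diffs = differences(closes)
ma3 = moving_average(closes, 3)
bond_to_crb = ratio(closes, crb)
```

Each resulting series would then be normalized and assigned to one input neuron.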

One method of data normalization, which does not alter the distribution of the data, is a simple linear scaling of data into the range used by the input neurons. Another method utilizes a statistical measure of central tendency and variance to increase uniformity, by clipping off the ends of the distribution to remove outliers which might otherwise degrade training. A third approach minimizes the standard deviation of the height of the columns in an initial frequency distribution histogram of the data. Various transformation and normalization methods should be explored, since some are more appropriate than others for specific applications. Once the network architecture has been selected and the inputs have been chosen and preprocessed, data facts must be selected.
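A minimal sketch of the first two normalization methods described above, assuming input neurons that accept values in [-1, 1]; the clipping threshold in standard deviations is an arbitrary choice.

```python
def linear_scale(series, lo=-1.0, hi=1.0):
    # Linear scaling into the input neurons' range; the shape of the
    # distribution is unchanged
    mn, mx = min(series), max(series)
    return [lo + (hi - lo) * (x - mn) / (mx - mn) for x in series]

def clip_outliers(series, n_sd=2.0):
    # Clip values beyond n_sd standard deviations from the mean, so
    # outliers do not degrade training
    mean = sum(series) / len(series)
    sd = (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5
    low, high = mean - n_sd * sd, mean + n_sd * sd
    return [min(max(x, low), high) for x in series]

scaled = linear_scale([2.0, 4.0, 6.0])
clipped = clip_outliers([1.0, 2.0, 3.0, 4.0, 100.0], n_sd=1.0)
```

In practice clipping would typically be applied first, then the clipped series scaled into the neurons' range.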

A fact is represented as a row of related numbers, where the first i numbers correspond to the i network inputs and the last j numbers correspond to the j network outputs. A group of related facts is known as a fact set. If two facts have exactly the same input and output values, only one of them should be included in the fact set. Once the fact set has been defined, in most financial applications it should be randomized and then separated into mutually exclusive training and testing subsets.

However, since it is possible that all facts with an important characteristic could be randomly assigned to just one of the subsets, the fact set’s underlying distribution related to that characteristic should be determined beforehand. Then the fact set can be split, with similar distributions present in both subsets. This can be done through statistical analysis or with clustering algorithms.
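One way to perform the stratified split described above is sketched below; the `key` function defining the important characteristic and the sample fact set are hypothetical.

```python
import random

def stratified_split(facts, key, test_fraction=0.2, seed=42):
    # Group facts by the important characteristic, then split each
    # group so both subsets share a similar distribution
    rng = random.Random(seed)
    strata = {}
    for fact in facts:
        strata.setdefault(key(fact), []).append(fact)
    train, test = [], []
    for group in strata.values():
        rng.shuffle(group)
        cut = int(round(len(group) * test_fraction))
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test

# Hypothetical fact set: (inputs, output); stratify on the output's sign
facts = [((float(i),), 1.0) for i in range(40)] + \
        [((float(i),), -1.0) for i in range(10)]
train, test = stratified_split(facts, key=lambda f: f[1] > 0)
```

Here the minority characteristic (negative outputs) is guaranteed to appear in both subsets in the same 20% proportion, rather than possibly landing entirely in one subset.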

Back propagation networks operate in two modes: a learning mode in which the network uses facts from the training set to modify its internal representation through weight changes, and a recall mode in which the network processes inputs from the test set and utilizes its previously learned representation to generate associated outputs. The relative performance on the test set of trained networks is used to determine which net should be incorporated into the final application.

One training and testing regimen that we have developed splits the fact set into three mutually exclusive subsets, one training set and two testing sets. The second testing set contains examples of facts that are considered most important in judging network performance. This test set is used to evaluate and rank various networks with different architectures, input data, preprocessing, and back propagated training errors.

Once the facts have been selected, they are presented to the network serially during training. At the onset, the weights, which allow the network to adapt its internal representation when modeling a problem, are typically initialized to small random values. If the weights were all initially set to the same value, the network might never learn, since identical weights receive identical error signals and would change in lockstep. For each pass through the training set, the network computes a measure of the error between its generated output and the desired output for each neuron in the output layer. The errors are then propagated backward through the network, layer by layer, to alter the connection weights between the neurons in order to minimize the total error associated with each output.
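A sketch of the small-random-value initialization discussed above; the scale of 0.1, the seed, and the layer sizes are arbitrary choices for illustration.

```python
import random

def init_weights(n_in, n_out, scale=0.1, seed=1):
    # Small random values break the symmetry that equal initial
    # weights could never escape
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]

# Weights for a layer with three inputs and two neurons
w_hidden = init_weights(n_in=3, n_out=2)
```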

Each time the weights change, the network is taking a step on a multidimensional surface representing the overall error space. During training the network travels along this surface, trying to find the lowest point, or minimum error. The weight changes are proportional to a training parameter called the learning rate, which should be selected judiciously to avoid a condition known as oscillation. To illustrate, assume that a network’s position is halfway down a valley on a two-dimensional error surface. If the learning rate is too large, oscillation could occur as the network steps back and forth from one side of the valley to the other, with little or no movement toward the bottom where the minimum error solution lies. Conversely, if the learning rate is too small, movement toward the bottom of the valley occurs, but in small steps that lengthen the training time. Another training parameter, momentum, adds a fraction of the previous weight change to each new change; this damps oscillation, so higher learning rates can find solutions similar to those found with lower rates without extending the training time. Since each application has a unique error surface, learning rates must be adjusted during training to avoid oscillation while optimizing the trade-off between training time and overall error reduction. Many artificial intelligence tools offer other training parameters that can be adjusted during training, including temperature, gain, and noise.
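The learning-rate-plus-momentum update rule can be illustrated on a one-dimensional error surface; the quadratic surface and parameter values below are chosen only to show the descent settling toward the minimum.

```python
def update_weight(w, grad, prev_delta, learning_rate, momentum):
    # New step = gradient step plus a fraction of the previous step
    delta = -learning_rate * grad + momentum * prev_delta
    return w + delta, delta

# Descend a one-dimensional error surface E(w) = w**2 (minimum at 0)
w, prev = 1.0, 0.0
for _ in range(50):
    grad = 2.0 * w                    # dE/dw
    w, prev = update_weight(w, grad, prev, learning_rate=0.05, momentum=0.8)
```

With these settings the weight spirals in toward the minimum; raising the learning rate much further would produce the back-and-forth oscillation described above.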

Due to the number of training parameters, preprocessing methods, and architectural configurations that can be explored, an automated training and testing regimen that integrates testing with training is absolutely necessary to apply this technology effectively to financial market analysis. Tools such as genetic algorithms and simulated annealing can be used to expedite searching these parameter spaces. Genetic algorithms are effective for many parameter optimization tasks. Simulated annealing automates learning rate adjustments during training by including a variable temperature term that affects the learning rate. When the temperature is high, learning is rapid. When the temperature drops, learning slows as the network settles on a solution. Figure 4 shows an example of simulated annealing, in which the step size is reduced to avoid oscillation while finding a minimum point on the error surface.
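A toy illustration of annealing the learning rate with a cooling temperature; the geometric cooling schedule, the parameter values, and the simple error surface are assumptions for demonstration only.

```python
def anneal_descent(grad_fn, w0, base_rate=0.3, temp=1.0, cooling=0.95, steps=100):
    # Effective learning rate is base_rate * temp: large early steps
    # while the temperature is high, shrinking steps as it cools
    w = w0
    for _ in range(steps):
        w -= base_rate * temp * grad_fn(w)
        temp *= cooling
    return w

# Error surface E(w) = (w - 2)**2, minimum at w = 2
w_final = anneal_descent(lambda w: 2.0 * (w - 2.0), w0=10.0)
```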

Overtraining, analogous to curve fitting of rule-based trading systems, is one of the major pitfalls that must be avoided when developing neural networks. Overtraining occurs when a network memorizes the subtleties, and even errors, specific to the training set, lessening its ability to generalize to new data. Such a network might perform well on the training set, but poorly on an out-of-sample test set and later during actual trading. To avoid overtraining, network training should be halted periodically at predetermined intervals and the network run in recall mode on the test set to evaluate its performance against predetermined error criteria. Then, training resumes from the point at which it was halted. This automated process continues iteratively until performance on the test set begins to degrade, suggesting that the network has begun to overtrain. All interim results that met the error criteria are evaluated further.
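The halt-test-resume cycle described above can be sketched as an early-stopping loop; the synthetic sequence of test-set errors below stands in for a real network's recall-mode evaluations.

```python
def train_with_early_stopping(train_step, eval_error, max_epochs=100,
                              check_every=5, patience=2):
    # train_step() runs one pass over the training set;
    # eval_error() returns current test-set error in recall mode
    best_err, best_epoch, strikes = float("inf"), 0, 0
    for epoch in range(1, max_epochs + 1):
        train_step()
        if epoch % check_every == 0:
            err = eval_error()
            if err < best_err:
                best_err, best_epoch, strikes = err, epoch, 0
            else:
                strikes += 1
                if strikes >= patience:   # test error degrading: overtraining
                    break
    return best_epoch, best_err

# Synthetic test-set errors: improving, then degrading (overtraining)
errors = iter([0.9, 0.7, 0.5, 0.45, 0.5, 0.6, 0.7, 0.8])
best_epoch, best_err = train_with_early_stopping(
    lambda: None, lambda: next(errors), check_every=1)
```

Training halts once the test error has failed to improve on two consecutive checks, and the best interim result is retained for further evaluation.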

Another important network design decision concerns the error statistics to be used for training and testing. One measure might be the difference between the market value and the network’s output, calculated for each fact in the test set, summed, and divided by the number of facts in the test set. This is a standard error measure called average error. Other common error measures include average absolute error, sum-of-squares error, and root mean squared error. Still other measures can be used to calculate the accuracy of network predictions of market direction, or how closely the network’s predictions correspond to the shape of the price movement over a specific time period. We have found that real-time trading performance can be substantially improved through the implementation of custom error functions, in both training and testing, that are specifically related to financial market analysis.
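The error measures mentioned above, plus a simple direction-accuracy measure, might be implemented as follows; the sample predictions are hypothetical.

```python
def average_error(pred, actual):
    # Signed errors can cancel, so this mainly reveals bias
    return sum(p - a for p, a in zip(pred, actual)) / len(pred)

def average_absolute_error(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

def root_mean_squared_error(pred, actual):
    return (sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred)) ** 0.5

def direction_accuracy(pred, actual):
    # Fraction of periods in which the predicted change and the
    # actual change have the same sign
    pd = [b - a for a, b in zip(pred, pred[1:])]
    ad = [b - a for a, b in zip(actual, actual[1:])]
    return sum(1 for p, a in zip(pd, ad) if p * a > 0) / len(pd)

pred, actual = [1.0, 2.0, 3.0], [1.0, 3.0, 2.0]
ae = average_error(pred, actual)      # signed errors cancel here
mae = average_absolute_error(pred, actual)
r = root_mean_squared_error(pred, actual)
da = direction_accuracy(pred, actual)
```

Note how the average error is zero even though every prediction after the first is wrong, which is why absolute, squared, and directional measures are usually examined as well.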

Synergistic market analysis, utilizing artificial intelligence, can be used in two types of financial applications: information systems and trading systems. Information systems offer predictive information on the target market of interest. This information might include price predictions, market direction, or forecasts of impending turning points. In this type of system, the trader incorporates the predictive information into his overall trading strategy, using it in conjunction with other analytic methods. Information systems can be designed that use a single network or multiple networks. In a multiple network configuration, each network is trained to forecast an independent output, such as the next day’s high, while other networks predict related market information such as the next day’s low or trend direction. These independently derived outputs can be used to confirm one another. Additionally, with a more complex architecture, each of these networks’ outputs can act as an input to other networks. This type of architecture is referred to as a hierarchical neural network. By encapsulating a certain amount of functionality in each network, it is not necessary to train a single net to produce more than one output.
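The hierarchical configuration might be sketched as follows; the sub-networks here are dummy functions standing in for trained nets, and the combiner is a toy rule, shown only to illustrate how independently derived outputs feed a higher-level network.

```python
def hierarchical_forecast(inputs, subnets, combiner):
    # Each sub-network produces one independent output; those outputs
    # then feed a higher-level combiner
    sub_outputs = [net(inputs) for net in subnets]
    return combiner(sub_outputs)

# Dummy stand-ins for trained networks (a real system would use nets
# trained as described in the text)
high_net = lambda prices: prices[-1] * 1.01          # next day's high
low_net = lambda prices: prices[-1] * 0.99           # next day's low
trend_net = lambda prices: 1.0 if prices[-1] > prices[0] else -1.0

# Toy combiner: midpoint of the high/low forecasts, nudged in the
# direction of the predicted trend
combiner = lambda outs: (outs[0] + outs[1]) / 2 + 0.1 * outs[2]

signal = hierarchical_forecast([100.0, 102.0],
                               [high_net, low_net, trend_net], combiner)
```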

Artificial intelligence neural networks can also be trained to function as trading systems that generate buy, sell, and sideways signals. However, there are difficulties inherent in this approach. Using artificial intelligence to generate signals requires substantial domain knowledge and expertise in neural network development to design and train networks properly for a given application. Since traders have different risk propensities, trading styles, and capitalization, these differences need to be considered in the network design so that the signals generated are tailored to a specific trader’s requirements. What may be short-term to one trader may be long-term to another. Additionally, it is difficult to incorporate risk management factors into a neural network design. To generate trading signals, a hybrid approach using neural networks in conjunction with an expert or knowledge-based system is advisable.

Synergistic market analysis is a new method of analysis that can accurately forecast prices and predict trend direction changes. It accomplishes this by using neural networks to find the market synergy in today’s globally interrelated markets, quantifying hidden patterns and relationships in technical, fundamental, and intermarket data that current, single-market methods of analysis cannot capture.

Other AI technologies, such as genetic algorithms, are also useful in implementing synergistic analysis. These algorithms, which mimic the characteristics associated with evolution, have proven to be robust search mechanisms well-suited to optimization problems, such as optimizing neural network training parameters. The same technology incorporated into genetic algorithms has also been used in classifier systems and genetic programming. Classifier systems perform a type of machine learning that generates rules from examples. Genetic programming goes further, automatically generating a program from a set of primitive constructs. Even fuzzy logic, wavelets, and chaos theory are finding applications in financial forecasting. No doubt other technologies will continue to emerge. It should be understood that none of these tools, individually, will provide the whole answer. Instead, each has its own role to play in putting together the pieces of the puzzle needed to implement a synergistic approach to today’s globalized markets.
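A toy genetic algorithm for optimizing a single training parameter, along the lines described above; the fitness function, population size, and the particular selection, crossover, and mutation operators are illustrative assumptions, not a prescribed design.

```python
import random

def genetic_search(fitness, lo, hi, pop_size=20, generations=30, seed=7):
    # Evolve a population of candidate parameter values by selection,
    # blend crossover, and Gaussian mutation; higher fitness is better
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # selection: keep the fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = (a + b) / 2                  # crossover: blend two parents
            child += rng.gauss(0, (hi - lo) * 0.02)  # mutation
            children.append(min(max(child, lo), hi))
        pop = parents + children
    return max(pop, key=fitness)

# Hypothetical fitness surface whose best "learning rate" is 0.1
best = genetic_search(lambda x: -(x - 0.1) ** 2, 0.0, 1.0)
```

In a real application the fitness function would train a network with the candidate parameter and return a test-set performance score, which is far more expensive to evaluate than this toy surface.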

Lou Mendelsohn is president of Market Technologies, of Wesley Chapel, Fla., an AI research, software development, and consulting firm, which licenses predictive information and trading software, and develops and licenses proprietary systems for institutional clients.
