DATA CONDITIONING FOR NEURAL NETS
Letters to S&C
I have concerns about the “considerable research” done by Lou Mendelsohn in “Neural network development for financial forecasting” (STOCKS & COMMODITIES, September 1993).
About four years ago, when I was experimenting with different transfer functions, I realized that I was wasting a lot of computer time with the hyperbolic tangent function. If you take this function, add 1 to it, then divide by 2, the result is the same as if you used the sigmoid function with twice the input. In other words, the sigmoid and the hyperbolic tangent differ by only a shift and a scale factor. This lies in the realm of data preconditioning.
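The identity described above, (tanh(x) + 1) / 2 = sigmoid(2x), can be checked numerically. The following is a minimal sketch in Python, assuming the standard logistic form 1 / (1 + exp(-x)) for the sigmoid; the function names are illustrative and not taken from the article.

```python
import math

def sigmoid(x):
    """Logistic sigmoid with output range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def shifted_tanh(x):
    """Hyperbolic tangent with 1 added, then divided by 2: range (0, 1)."""
    return (math.tanh(x) + 1.0) / 2.0

def max_identity_error(points):
    """Largest gap between shifted_tanh(x) and sigmoid(2x) over the points."""
    return max(abs(shifted_tanh(x) - sigmoid(2.0 * x)) for x in points)

# Sample inputs from -5.0 to 5.0 in steps of 0.1 (an arbitrary test grid).
points = [i / 10.0 for i in range(-50, 51)]
print("largest difference:", max_identity_error(points))
```

Any difference reported should be at the level of floating-point rounding, which is consistent with the two functions differing only by a shift and a scale factor.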
So I guess my question is: How are we deciding what constitutes good results? If data preconditioning has not been adequately considered, how can we trust judgments such as “we realized better results with hyperbolic”? Maybe I’ll learn in later articles of this series. In the meantime, I’d be interested in comments from Lou.
Lou Mendelsohn replies:
Bob Martin is correct when he identifies the hyperbolic tangent function (TanH) as a shifted and scaled sigmoid function. However, this fact alone does not mean that these two related functions have equivalent effects on training.
When I indicated that we had “realized better results” with the TanH, I was referring to an increase in training efficiency that we have experienced. While most back-prop networks use a sigmoid function in the range of 0 to 1, networks using transfer functions with other ranges, such as the TanH in the range -1 to 1, often converge sooner and so need less training time.
For instance, in back-propagation the change made to a weight is proportional to the input value feeding that weight. When using a sigmoid with inputs in the range 0 to 1, an input value of 0 therefore results in no weight change; however, with a TanH function, an input value of 0, when mapped to -1, does result in a weight change. This speedup in convergence tends to be more pronounced the more pre-mapped input values of 0 there are. Theoretical explanations for this speedup have been proposed in the literature.
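The zero-input point can be illustrated with the basic back-prop update, in which the change to a weight is the learning rate times the error signal times the input value. The sketch below is only an illustration under that assumption; the learning rate, the error signal, and the linear mapping 2x - 1 from the range 0 to 1 onto the range -1 to 1 are hypothetical choices, not values or methods taken from the article.

```python
def weight_change(learning_rate, error_signal, input_value):
    """Basic back-prop update for a single weight: eta * delta * x."""
    return learning_rate * error_signal * input_value

def to_bipolar(x):
    """Linearly map a value in the range 0..1 onto the range -1..1 (assumed mapping)."""
    return 2.0 * x - 1.0

eta = 0.1        # hypothetical learning rate
delta = 0.5      # hypothetical error signal at the receiving node
raw_input = 0.0  # an input value of 0 before any mapping

# Input left in the 0..1 range: the zero input contributes nothing.
unipolar_dw = weight_change(eta, delta, raw_input)

# Input mapped to the -1..1 range: the same input now drives a weight change.
bipolar_dw = weight_change(eta, delta, to_bipolar(raw_input))

print("weight change, 0..1 range: ", unipolar_dw)
print("weight change, -1..1 range:", bipolar_dw)
```

With the 0-to-1 range, every pattern whose input is 0 leaves the corresponding weight untouched, so a data set with many zeros gives those weights few chances to learn; after the mapping, each such pattern contributes an update.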
Further discussion of this issue and closely related topics can be found in Neural Network Learning and Expert Systems (1993) by S.I. Gallant (pages 222-223) and “An improved three-layer, backpropagation algorithm” in Proceedings of the IEEE First International Conference on Neural Networks (1987) by W.S. Stornetta and B.A. Huberman.
Reprinted from Technical Analysis of
Stocks & Commodities Magazine. (C) 1993 Technical Analysis, Inc.,
4757 California Avenue S.W., Seattle, WA 98116-4499, (800) 832-4642.