History Tester Important Factor in Software Selection

By Louis B. Mendelsohn

One of the biggest challenges facing a user of computerized technical trading software is determining the profits the program might produce, and the downside risks it might carry, before using it in real-time trading.

As experienced traders know, claims in promotional literature and performance results of history tests frequently overstate profitability and downplay downside risks.

By nature, commodity markets are not ideally suited for computerization of a valid history tester. Design decisions must address complex issues such as contract expiration, limit moves, expanded limits, execution timing and slippage.

Often, validity is compromised by software developers to meet other objectives or requirements such as faster processing speed, user friendliness, lower development cost, ease of programming and program size constraints related to microcomputer memory or disk capacity. Unfortunately, in too many situations, faulty conceptual design and omissions of key statistical indicators cause validity to be compromised unnecessarily and introduce distortions without offsetting benefits.

The following questions illustrate important conceptual design features and performance indicators that must be incorporated into the history tester if it is to produce valid, realistically achievable trading performance results.

How much risk capital is required to test the entire portfolio?
That means, of course, how much risk capital would be needed to replicate test results in real-time trading. This depends on the specific commodities tested, number of contract months involved, whether pyramiding is part of the system’s architecture, number of contracts executed on a trading signal, margin requirements and maximum draw down.
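As a rough illustration of how these factors combine, the sketch below adds the margin needed to hold every position at once to the worst drawdown seen in testing. The formula, the `Position` fields and the dollar figures are all assumptions made for illustration, not a method prescribed here or an industry standard.

```python
from dataclasses import dataclass


@dataclass
class Position:
    commodity: str
    contracts: int               # contracts per signal, including pyramided units
    margin_per_contract: float   # hypothetical initial margin, in dollars


def required_risk_capital(positions, max_drawdown):
    """Margin to carry every position at once, plus the worst tested drawdown."""
    total_margin = sum(p.contracts * p.margin_per_contract for p in positions)
    return total_margin + max_drawdown


# Hypothetical two-commodity portfolio.
portfolio = [
    Position("corn", contracts=2, margin_per_contract=1000.0),
    Position("gold", contracts=1, margin_per_contract=2500.0),
]
capital = required_risk_capital(portfolio, max_drawdown=6000.0)
print(capital)
```

A trader whose account is smaller than this figure would have to skip some signals, which is exactly the pick-and-choose situation that undermines the test results.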

The reported trading performance of many systems could not be achieved by many traders because of the substantial capital requirements necessary to trade the entire portfolio. Validity is compromised once traders start to pick and choose among trading signals and subjectively decide which ones to execute.

How is T-bill interest handled?
Interest should be excluded from computations of statistical performance indicators, such as cumulative net equity and the Sharpe Ratio.

How should execution slippage and commissions be recognized?
Historical data used by history testers does not include opening and closing ranges but, instead, reports single prices which are used for testing purposes. This distortion, coupled with a slippage factor affecting all price executions, necessitates building an execution slippage allowance into performance results.

Slippage can be a fixed dollar or percentage amount established for all commodities. Or it can vary by commodity according to price/execution variability.

Commissions also must be included. Depending on each trader’s commission cost and the number of transactions, this overhead could represent a significant factor affecting net performance results.

The dollar value of allowances for slippage and commissions should be user selectable. These allowances can be incorporated into the history tester as part of each trade’s profit/loss computation or reported separately in the summary report.
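One way to build the allowances into each trade's profit/loss computation is sketched below. The function name, its parameters and the choice to charge slippage once on entry and once on exit are assumptions made for illustration; a per-commodity slippage table could be passed in the same way.

```python
def net_trade_pnl(entry_price, exit_price, contracts, point_value,
                  direction, slippage_per_side, commission_per_round_turn):
    """Net dollars for one trade after slippage and commission allowances.

    direction is +1 for a long trade and -1 for a short trade.
    slippage_per_side is a dollar allowance per contract, charged on entry
    and again on exit. Both allowances are user-selectable inputs.
    """
    gross = (exit_price - entry_price) * direction * point_value * contracts
    slippage = 2 * slippage_per_side * contracts
    commission = commission_per_round_turn * contracts
    return gross - slippage - commission


# Hypothetical long trade: one contract, 10 points at $100 per point.
long_net = net_trade_pnl(400.0, 410.0, contracts=1, point_value=100.0,
                         direction=1, slippage_per_side=25.0,
                         commission_per_round_turn=50.0)
print(long_net)
```

Setting both allowances to zero reproduces the gross result, which makes it easy to see how much of the reported profit the overhead consumes.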

When will entry/exit signals be triggered?
Because systems typically are updated after the markets close, signals which were triggered by the closing price cannot be executed until the next day’s open. Some history testers report execution on the close, even though it would be impossible to do so in real time without prior knowledge of the signal.

Both of these trading rules, execution on the next day’s open and execution on the same day’s close, introduce unnecessary distortions in the performance results. These could be overcome through the development of software which can forecast closing price signals. Execution timing could then be a user-controllable parameter value which could be tested and optimized.
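Treating execution timing as a parameter might look like the sketch below. The bar layout (a list of dictionaries with "open" and "close" keys) and the timing labels are assumptions made for illustration. Note that the "close" setting is only achievable in real time if the signal can be known before the close.

```python
def fill_price(bars, signal_index, timing="next_open"):
    """Price at which a signal generated on bars[signal_index] is filled.

    timing="close"     fills on the same day's close.
    timing="next_open" fills on the following day's open, or returns None
                       when the data ends before the order can be filled.
    """
    if timing == "close":
        return bars[signal_index]["close"]
    if timing == "next_open":
        if signal_index + 1 >= len(bars):
            return None
        return bars[signal_index + 1]["open"]
    raise ValueError(f"unknown timing: {timing!r}")


# Hypothetical two-day price series.
bars = [
    {"open": 100.0, "close": 102.0},
    {"open": 103.5, "close": 101.0},
]
print(fill_price(bars, 0, "close"), fill_price(bars, 0, "next_open"))
```

Running the same test under both settings shows how much of the reported performance depends on the timing assumption alone.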

Will the history tester have a rollover capability to the next contract month?
A valid history tester must include a rollover capability which rolls forward from an expiring contract month into the next actively traded contract month by putting on a spread between the two contracts. This allows the trading system to be tested over an extended multi-year time period, reflecting varying market conditions, on actual price data representing real contracts.

Many history testers lack this capability. Some can only test single contracts independently over short time periods. Others employ simulated, non-expiring contracts which can be tested independently over longer time periods. Both alternatives to the rollover introduce unnecessary distortions in performance results.

The two most obvious choices for the rollover day would be the first notice day or the last day of the month preceding the expiration month. The last-day-of-the-month approach does not work on all commodities when testing the open entry/exit execution parameter value. Both rules are subject to possible liquidity problems and related execution price distortions due to reduced open interest and daily volume.

One rule which overcomes these limitations rolls forward a predetermined number of days before the last day of the month preceding the expiration month. Another rule would employ a variable rollover based on spread optimization.

The system performs differently on rollover day than on a normal trading day. That’s because the rollover overrides the logic block and forces an offset in an open position in the expiring contract and a re-entry in the following contract month.

An additional rollover slippage allowance should be made because the spread difference upon execution is often more than the net difference in opening or closing prices. The accompanying printout illustrates the incorporation of the rollover into the architecture of the history tester.
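The rule that rolls forward a predetermined number of days before the last day of the month preceding the expiration month can be sketched as follows. The function name, the use of a plain list of trading dates and the sample dates are assumptions made for illustration; the sketch selects only the rollover day and leaves the offset, re-entry and rollover slippage to the rest of the tester.

```python
from datetime import date


def rollover_date(trading_days, expiration_year, expiration_month, days_before):
    """Trading day that falls days_before sessions ahead of the last trading
    day of the month preceding the expiration month."""
    if expiration_month == 1:
        prev_year, prev_month = expiration_year - 1, 12
    else:
        prev_year, prev_month = expiration_year, expiration_month - 1
    in_month = sorted(d for d in trading_days
                      if d.year == prev_year and d.month == prev_month)
    if len(in_month) <= days_before:
        raise ValueError("not enough trading days in the preceding month")
    return in_month[-1 - days_before]


# Hypothetical trading calendar around a December contract.
days = [date(1983, 11, 23), date(1983, 11, 25), date(1983, 11, 28),
        date(1983, 11, 29), date(1983, 11, 30), date(1983, 12, 1)]
roll = rollover_date(days, 1983, 12, days_before=2)
print(roll)
```

With `days_before=0` the function falls back to the plain last-day-of-the-month rule, so the lead time itself can be tested as a parameter.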

How will lock-limit situations be handled?
This is one of the most critical areas in the development of history testers. The organization of commodity markets makes it very difficult to mirror trading reality without introducing some degree of distortion.

Inadequate design of daily limit rules is perhaps the most common cause of distortions in trading performance results. The most blatant example is when limit move days are treated as normal market days and entry/exit executions are recorded even though trading could not have occurred.

Very elaborate and precise combinations of trading rules must be developed to minimize distortions when checking for daily limits under these conditions:

Spot month trading without limits.
Commodities trading with expanded daily limits.
Changes in the size of daily limits over the testing period.
Commodities trade all day, then go lock-limit on the close.
Commodities open lock-limit, trade intraday, then close lock-limit.
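A deliberately conservative check for these conditions is sketched below: any open or close that sits a full daily limit away from the prior settlement is treated as not executable. That rule, the function names and the bar layout are assumptions made for illustration. A price at the limit does not always mean the market was locked, so a production rule set would need more detail than this.

```python
def at_limit(price, prev_settle, daily_limit):
    """True when price is a full daily limit away from the prior settlement.

    daily_limit is passed in per day so that expanded limits and limit
    changes over the test period can be handled; None means no limit
    applies, as in spot month trading.
    """
    if daily_limit is None:
        return False
    return abs(price - prev_settle) >= daily_limit


def can_fill(bar, prev_settle, daily_limit, when):
    """Whether an order could plausibly be filled at the bar's open or close."""
    return not at_limit(bar[when], prev_settle, daily_limit)


# Hypothetical day: trades normally, then goes lock-limit on the close.
bar = {"open": 305.0, "close": 310.0}
print(can_fill(bar, 300.0, 10.0, "open"), can_fill(bar, 300.0, 10.0, "close"))
```

When a fill is refused, the tester would carry the order forward to the next price that passes the check rather than record a trade that could not have occurred.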

What commodities, contracts and market conditions will be tested?
A precise portfolio of commodity contracts must be defined for use with a history tester to reflect trading behavior, avoid liquidity problems and meet system logic and rollover requirements.

The data base should be obtained from a reliable source to assure that errors have been eliminated and that the data is clean. It should be uniform for all commodities tested and should cover a multi-year period reflecting all types of market conditions. This minimizes possible bias and distortions due to choice of test period beginning and ending dates, selection of contracts included in the portfolio, erroneous data and seasonality.

The extent of testing to be performed is limited by data acquisition cost and disk storage capacity considerations. Extensive testing covering more than one or two years on 30 commodities necessitates use of a hard disk system.

What performance indicators will be included?
A properly designed history tester developed for microcomputer use is capable of generating a comprehensive summary of performance indicators.

A review of this printout report reveals some of the key indicators which should be included in the history tester:

Cumulative profit to total realized losses.
Commissions and slippage to cumulative net realized profit.
Average winning to average losing trade.
Maximum number of consecutive losing trades.
Maximum dollars of consecutive loss.
Maximum draw down/dollar equity decrease affecting risk of margin calls.
Profit factor (used to relate risk propensity to system profitability).
Sharpe Ratio (used to compare risk factors of various trading systems).
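Several of these indicators can be computed directly from the list of closed-trade results, as in the sketch below. The function name, the dictionary keys and the sample trades are assumptions made for illustration. Drawdown is measured here on closed-trade equity only, and the profit factor and Sharpe Ratio are left out because their exact definitions are not given here.

```python
def summarize(trade_pnls):
    """Summary indicators from a chronological list of net trade results."""
    wins = [p for p in trade_pnls if p > 0]
    losses = [-p for p in trade_pnls if p < 0]
    gross_profit, gross_loss = sum(wins), sum(losses)

    streak = max_streak = 0
    streak_loss = max_streak_loss = 0.0
    equity = peak = max_drawdown = 0.0
    for p in trade_pnls:
        if p < 0:
            streak += 1
            streak_loss += -p
        else:
            streak, streak_loss = 0, 0.0
        max_streak = max(max_streak, streak)
        max_streak_loss = max(max_streak_loss, streak_loss)
        equity += p
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, peak - equity)

    return {
        "profit_to_loss": gross_profit / gross_loss if gross_loss else None,
        "avg_win_to_avg_loss": ((gross_profit / len(wins)) / (gross_loss / len(losses))
                                if wins and losses else None),
        "max_consecutive_losers": max_streak,
        "max_consecutive_loss": max_streak_loss,
        "max_drawdown": max_drawdown,
    }


# Hypothetical sequence of five closed trades, in dollars.
report = summarize([500.0, -200.0, -300.0, 400.0, -100.0])
print(report)
```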

Unfortunately, many history testers do not provide a comprehensive summary report of this caliber because of their own inadequate design and not because of any inherent limitations of the history tester concept or microcomputers.

What will a good history tester do?
A sophisticated history tester, designed according to industry standards, should become an integral part of all technical trading software. This would:

Give traders more reliable and valid performance results.
Allow benchmark performance comparisons of technical trading software.
Promote competition for clients among software vendors.
Let each trader perform historical testing and optimization.
Allow custom tailoring of trading systems.

Hopefully, this presentation will help stimulate industry-wide discussion and formulation of standards for the history tester — a very important, but frequently neglected, component of technical trading software.

Louis Mendelsohn is president of Market Technologies Corporation, a computer software development firm in Tampa, Fla. His first article, “Picking Software Programs: Know Their Limitations,” appeared in the May issue.