Genetic Programming Filter

One of the last projects I worked on during my time at RMR was a program to evolve algorithms for trading futures contracts.

A Genetic Programming approach seemed like a good fit because the search space was large, we had a fitness metric, and, most importantly, multiple solutions with low correlations in short-duration returns and time-in-market were desirable (in other words, there was value in a process that could be run multiple times and produce a variety of results).

The process I came up with was a relatively simple twist on the basic GP algorithm (like the one you can find here) that I used as a template in grad school to evolve an algebraic function modeling the path of a cannonball in flight. The algorithm used then was the same as the one linked (though Pa would have been 0). I believe the fitness metric was the sum of squared errors between a set of projected and known positions of the cannonball at time t.
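To make that fitness metric concrete, here is a minimal Python sketch of a sum-of-squared-errors fitness applied to GP individuals. The tuple-based expression-tree encoding, the function set, the 20 m/s launch velocity, and the sampled "known" positions are all illustrative assumptions, not the original grad-school code or data.

```python
import operator

# Operators available to evolved expressions (an assumed, minimal function set).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, t):
    """Evaluate an expression tree: the variable "t", a numeric constant, or (op, left, right)."""
    if tree == "t":
        return t
    if isinstance(tree, (int, float)):
        return float(tree)
    op, left, right = tree
    return OPS[op](evaluate(left, t), evaluate(right, t))

def sse_fitness(tree, observations):
    """Sum of squared errors between projected and known heights; lower is better."""
    return sum((evaluate(tree, t) - y) ** 2 for t, y in observations)

# Hypothetical "known" positions: height of a cannonball fired straight up at 20 m/s,
# sampled every 0.1 s for 4 s (illustrative data, not real measurements).
G, V0 = 9.81, 20.0
known = [(i / 10, V0 * (i / 10) - 0.5 * G * (i / 10) ** 2) for i in range(41)]

# Two candidate individuals: the exact physics, and one with a slightly wrong constant.
exact = ("-", ("*", 20.0, "t"), ("*", 4.905, ("*", "t", "t")))
rough = ("-", ("*", 20.0, "t"), ("*", 5.0, ("*", "t", "t")))
print(sse_fitness(exact, known), sse_fitness(rough, known))
```

The exact expression scores essentially zero, while the one with the wrong constant accumulates error that grows with t, which is what lets selection separate them.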

Because of certain constraints, I opted to evolve an algorithm for trade initiation separately from the one for liquidation. The only real trick was deciding on the appropriate form of the solution (chromosome). I opted to use chromosomes that could be translated directly into SQL statements that SELECT against a database of all possible market/date combinations (i.e., all possible initiations).

The market/date key, which numbered around 540,000 rows, was used to map hundreds, potentially thousands, of Indicators, as well as a value that could be used to measure the expectation of an initiation for the market/date of the row key. (This did open the possibility of selecting later generations on a different fitness measure than earlier generations, but this was a road I was not planning to explore; the multiple metrics were present only as a convenience for changing between runs.)

The idea was that the Indicators, in various combinations, would be used as thresholds to select rows in the table. The selected rows would correspond to initiations, and the count of rows selected together with the average of the potential measure for those rows would generate a fitness metric. A function using both the count and the average is preferable to the sum of the potential measure because a higher count indicates less curve-fitting, while a higher average gives higher-confidence trades that better overcome the costs associated with trading.
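The count-and-average fitness idea can be sketched with Python's built-in sqlite3 module. Everything concrete here is hypothetical: the `market_dates` table, the `ind_a`/`ind_b` indicator columns, the `potential` column, the five sample rows, the `min_count` guard, and especially the scoring form `avg * sqrt(count)`, which is just one function that rewards a higher average while also rewarding a larger count with diminishing returns.

```python
import math
import sqlite3

def build_table(conn):
    """Create a tiny hypothetical market/date table: one row per possible initiation."""
    conn.execute(
        "CREATE TABLE market_dates ("
        "market TEXT, trade_date TEXT, ind_a REAL, ind_b REAL, potential REAL, "
        "PRIMARY KEY (market, trade_date))"
    )
    rows = [
        ("ES", "2010-01-04", 0.9, 12.0, 1.5),
        ("ES", "2010-01-05", 0.2, 30.0, -0.7),
        ("CL", "2010-01-04", 0.8, 15.0, 2.0),
        ("CL", "2010-01-05", 0.7, 40.0, 0.4),
        ("GC", "2010-01-04", 0.1, 10.0, -1.2),
    ]
    conn.executemany("INSERT INTO market_dates VALUES (?, ?, ?, ?, ?)", rows)

def fitness(conn, where_clause, min_count=2):
    """Score a candidate filter from the count and average potential of the rows it selects.

    where_clause is assumed to be generated by the GP code itself, never from user input.
    """
    count, avg = conn.execute(
        f"SELECT COUNT(*), AVG(potential) FROM market_dates WHERE {where_clause}"
    ).fetchone()
    if count < min_count:
        return float("-inf")  # too few initiations: treat as curve-fit
    # Assumed form: linear in the average, square-root in the count.
    return avg * math.sqrt(count)

conn = sqlite3.connect(":memory:")
build_table(conn)
print(fitness(conn, "ind_a > 0.5 AND ind_b < 20"))  # 2 rows selected, average 1.75
print(fitness(conn, "ind_a > 0.5"))                  # 3 rows selected, lower average
```

With this particular scoring form, the stricter filter wins here because its higher average outweighs the extra row the looser filter picks up; a plain sum of `potential` would have ranked them the other way.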

The end result was:

Chromosome

  • Methods: Constructor, BuildSQL, EvaluateSelf
  • Properties: Fitness, GeneList()

Gene

  • Methods: Constructor, GetSQL, Mutate
  • Properties: ComparisonType, GeneType, Name, RangeMin, RangeMax, RangeType, TargetValue
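A minimal Python sketch of how a chromosome of threshold genes might translate into SQL, loosely mirroring BuildSQL, GetSQL, and Mutate from the lists above. The field meanings (Name as an indicator column, ComparisonType as an operator string, TargetValue as the threshold), the `market_dates` table name, and the Gaussian mutation are assumptions; GeneType and RangeType are left out, and EvaluateSelf would run the generated SELECT and score it.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Gene:
    """One threshold condition on a single indicator column (hypothetical field meanings)."""
    name: str              # indicator column name
    comparison_type: str   # assumed to be one of "<", "<=", ">", ">="
    target_value: float    # threshold the indicator is compared against
    range_min: float
    range_max: float

    def get_sql(self) -> str:
        """Render this gene as a single SQL predicate, e.g. "ind_a > 0.5"."""
        return f"{self.name} {self.comparison_type} {self.target_value!r}"

    def mutate(self, rng: random.Random, scale: float = 0.1) -> None:
        """Nudge the threshold by Gaussian noise, clamped to [range_min, range_max]."""
        step = rng.gauss(0.0, scale * (self.range_max - self.range_min))
        self.target_value = min(self.range_max, max(self.range_min, self.target_value + step))

@dataclass
class Chromosome:
    genes: list[Gene] = field(default_factory=list)
    fitness: float = float("-inf")

    def build_sql(self, table: str = "market_dates") -> str:
        """AND together every gene's predicate; an empty chromosome selects every row."""
        where = " AND ".join(g.get_sql() for g in self.genes) or "1 = 1"
        return f"SELECT market, trade_date FROM {table} WHERE {where}"

chrom = Chromosome([Gene("ind_a", ">", 0.5, 0.0, 1.0), Gene("ind_b", "<", 20.0, 0.0, 100.0)])
print(chrom.build_sql())
# SELECT market, trade_date FROM market_dates WHERE ind_a > 0.5 AND ind_b < 20.0
chrom.genes[0].mutate(random.Random(42))
print(chrom.genes[0].get_sql())
```

Keeping each gene responsible for its own SQL fragment means crossover can swap whole genes between chromosomes without ever producing an unparseable statement.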

Most of the methods and properties are, hopefully, self-documenting. RangeType is a code that indicates how to treat the range of potential values for the purposes of seeding and mutation.
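One way a RangeType code could drive seeding and mutation is sketched below. The three codes (`linear`, `log`, `integer`) and the perturbation rules are hypothetical stand-ins, since the actual codes are not given; the point is only that the same gene value is drawn and perturbed in a space suited to the indicator (for example, log space for indicators spanning orders of magnitude).

```python
import math
import random

# Hypothetical RangeType codes; the original codes are not described.
LINEAR, LOG, INTEGER = "linear", "log", "integer"

def _clamp(x, lo, hi):
    """Keep a value inside [lo, hi] and return it as a float."""
    return float(min(hi, max(lo, x)))

def seed_value(range_type, lo, hi, rng):
    """Draw an initial threshold for a gene according to its range type."""
    if range_type == LINEAR:
        return _clamp(rng.uniform(lo, hi), lo, hi)
    if range_type == LOG:
        # Uniform in log space so small and large magnitudes are equally likely; requires lo > 0.
        return _clamp(math.exp(rng.uniform(math.log(lo), math.log(hi))), lo, hi)
    if range_type == INTEGER:
        return float(rng.randint(int(lo), int(hi)))
    raise ValueError(f"unknown range type: {range_type}")

def mutate_value(range_type, value, lo, hi, rng, scale=0.1):
    """Perturb a threshold in the space implied by its range type, then clamp to [lo, hi]."""
    if range_type == LINEAR:
        new = value + rng.gauss(0.0, scale * (hi - lo))
    elif range_type == LOG:
        span = math.log(hi) - math.log(lo)
        new = math.exp(math.log(value) + rng.gauss(0.0, scale * span))
    elif range_type == INTEGER:
        new = round(value + rng.gauss(0.0, scale * (hi - lo)))
    else:
        raise ValueError(f"unknown range type: {range_type}")
    return _clamp(new, lo, hi)

rng = random.Random(7)
start = seed_value(LOG, 0.5, 50.0, rng)
print(start, mutate_value(LOG, start, 0.5, 50.0, rng))
```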

The Population and GP objects are as expected.
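For readers who want the "as expected" part spelled out, here is a minimal generational loop of the kind a Population/GP pair typically wraps. It is a generic sketch, not the original objects: individuals are reduced to fixed-length lists of thresholds in [0, 1], and tournament selection, uniform crossover, elitism, and the toy fitness function are all assumptions chosen for brevity.

```python
import random

def tournament(population, fitnesses, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    picks = rng.sample(range(len(population)), k)
    return population[max(picks, key=lambda i: fitnesses[i])]

def crossover(a, b, rng):
    """Uniform crossover on equal-length gene lists: each position comes from either parent."""
    return [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]

def mutate(genes, rng, rate=0.2, scale=0.1):
    """With probability `rate`, nudge each threshold by Gaussian noise, clamped to [0, 1]."""
    return [min(1.0, max(0.0, g + rng.gauss(0.0, scale))) if rng.random() < rate else g
            for g in genes]

def evolve(fitness_fn, n_genes, pop_size=30, generations=40, seed=0):
    """Run a simple generational GA and return (best_genes, best_fitness) seen in any generation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        fitnesses = [fitness_fn(ind) for ind in population]
        elite = max(range(pop_size), key=lambda i: fitnesses[i])
        if best is None or fitnesses[elite] > best[1]:
            best = (population[elite], fitnesses[elite])
        next_gen = [population[elite]]  # elitism: carry the best individual forward unchanged
        while len(next_gen) < pop_size:
            parent_a = tournament(population, fitnesses, rng)
            parent_b = tournament(population, fitnesses, rng)
            next_gen.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = next_gen
    return best

# Toy stand-in for the SQL-based fitness: prefer thresholds near a hypothetical optimum.
TARGET = [0.3, 0.7]

def toy_fitness(genes):
    return -sum((g - t) ** 2 for g, t in zip(genes, TARGET))

best_genes, best_fit = evolve(toy_fitness, n_genes=2)
print(best_genes, best_fit)
```

In the setup described in this post, `toy_fitness` would be replaced by running each chromosome's generated SELECT and scoring the count and average of the selected rows.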
