Monthly Archives: January 2014

Iterative Monte Carlo

This is the more interesting of the two machine learning programs I have written since grad school, and the one for which I do not have a working version handy. It's also one I initially wrote as a private project and later adapted for work, and as such am happy to describe it in more detail. As far as I know, the process is not analogous to anything done elsewhere, but that is probably just a reflection of my lack of awareness of the state of machine learning since my daughter was born in 2003.

This one came about, oddly enough, to settle a disagreement over the proper pricing model for players in rotisserie baseball about ten years ago. I was later able to use it, pretty much unchanged, as a tool for adjusting market/system weighting in a portfolio. I’ll write in terms of the original use case as that seems less likely to add to any confusion.

I am going to assume that whoever is reading this is familiar with rotisserie baseball and how an auction therein would work. A less-than-perfect explanation can be found here for those who have had more productive ways to spend their time.

The objective of the code was to assign an auction value to every player within a specific environment (here, the set of rules for the league). Doing that requires identifying the players that should be selected in the auction and each player's value to a team. This turns out to be a more difficult concept than one might think, and it has spawned some widely used models that fail horribly at the limit cases. Oddly, the limit cases where these models fail most spectacularly are the simplest ones (known player stats and leagues with no roster moves).

Because of the original use case there were some features in the program aimed at accommodating a variety of rule sets and I am going to omit them for now, but would be happy to discuss them later. The algorithm in play here is:

  1. Define the Universe of potential players. This would be anyone that could be reasonably expected to be rostered by one of the teams.
  2. Assign each player a value. For the first iteration this could be done randomly, based on some externally published values, or set so that every player included in Step 1 has the same value. There are multiple ways to do this, but I can only remember trying a variation where each player is assigned what would be the average value. Since we can normalize the values at any point, using an average value is not necessary.
  3. Build (draft) each set (team), adhering to any constraints (for example, salary cap, roster configuration). The constraints alone allow a small amount of variance into the selection; but, suspecting more variance would be welcome, I made the process responsible for building a team assign each draft slot from a pool of similarly valued players. In other words, the team is built through a draft, and with each selection a team randomly chose from a pool of players who were valued within x points of the highest-rated remaining player that could be legally rostered. This worked well, if I recall correctly, and I am not aware of any theoretical objections. There may be players left over after each iteration, even at the end . . . unless that is the termination condition.
  4. Calculate team stats and score each team based upon the league rules. Some consideration here is given to “bench” scoring as that will be important. In general, and in the portfolio-optimization scenario, that will not be an issue.
  5. Make an adjustment to each player on each team based on the relative score of the team to other teams in the league. Specifically, players on teams scoring above the mean had their individual values increased proportionately (and players on teams scoring below the mean had their individual values decreased).
  6. Remove players from the Universe whose value falls below a threshold value. I used zero.

Repeat steps 3 through 6 until a termination condition (e.g., the difference in the score between the first and last place teams is under a threshold value) is met.
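The loop above can be sketched roughly as follows. This is a minimal sketch, not the original code: every name, the default parameters, and the proportional-adjustment rule in step 5 are assumptions.

```python
import random

def draft_teams(values, n_teams, roster_size, pool_window):
    """Step 3: each pick is a random choice among players valued
    within pool_window of the best remaining player."""
    remaining = dict(values)
    teams = [[] for _ in range(n_teams)]
    for _ in range(roster_size):
        for team in teams:
            best = max(remaining.values())
            pool = [p for p, v in remaining.items() if v >= best - pool_window]
            pick = random.choice(pool)
            team.append(pick)
            del remaining[pick]
    return teams

def iterate(values, score_team, n_teams=12, roster_size=23,
            pool_window=1.0, learning_rate=0.1, floor=0.0, tol=0.5):
    """Steps 3-6, repeated until the score spread falls under tol."""
    while True:
        teams = draft_teams(values, n_teams, roster_size, pool_window)
        scores = [score_team(t) for t in teams]               # step 4
        mean = sum(scores) / len(scores)
        for team, s in zip(teams, scores):                    # step 5
            adj = 1 + learning_rate * (s - mean) / mean
            for p in team:
                values[p] *= adj
        values = {p: v for p, v in values.items() if v >= floor}  # step 6
        if max(scores) - min(scores) < tol:                   # termination
            return values
```

The `score_team` callable stands in for whatever league rules are in play; normalization of the values (mentioned in step 2) is omitted for brevity.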

Genetic Programming Filter

One of the last projects I worked on during my time at RMR was a program to evolve algorithms for trading futures contracts.

A Genetic Programming approach seemed like a good fit because the search space was large, we had a fitness metric, and, most importantly, multiple solutions with low correlations in short-duration returns and time-in-market were desirable (in other words, there was value in a process that could be run multiple times and produce a variety of results).

The process I came up with was a relatively simple twist on the basic GP algorithm (like the one you can find here) that I used as a template in grad school to evolve an algebraic function to model the path of a cannonball in flight. The algorithm used then was the same as the one linked (though Pa would have been 0). I believe a sum of squares of the error between a set of projected and known positions for a cannonball in flight at time t was used as the fitness metric.
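That fitness metric is simple enough to sketch; the function and argument names here are hypothetical:

```python
def sse_fitness(candidate, times, observed):
    """Sum of squared errors between a candidate trajectory function
    and the known positions at each time t; lower means fitter."""
    return sum((candidate(t) - y) ** 2 for t, y in zip(times, observed))
```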

Because of certain constraints, I opted to evolve an algorithm for trade initiation separately from the liquidation. The only real trick was deciding on the appropriate form of the solution (chromosome). I opted to use chromosomes that could be translated directly into SQL statements to SELECT against a database of all possible market/date (possible initiation) combinations.

The market/date key, which numbered around 540,000 rows, was used to map hundreds, potentially thousands, of Indicators as well as a value that could be used to measure the expectation of an initiation for the market/date of the row key. (This did open the possibility of selecting later generations on a different fitness measure than earlier generations, but that was a road I was not planning to explore . . . the multiple metrics were present only as a convenience for changing between runs.)

The idea was that the Indicators, in various combinations, would be used as thresholds to select rows in the table. The selected rows would correspond to initiations, and the count of rows selected and the average of the potential measure for those rows would generate a fitness metric. A function using both the count and the average is preferable to the sum of the potential measure because a higher count indicates less curve-fitting, while a higher average gives higher-confidence trades and better overcomes the costs associated with trading.
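The count-and-average trade-off could be combined along these lines; the specific weighting and the minimum-count cutoff are assumptions of mine, not the original function:

```python
def fitness(count, average, min_count=50):
    """Hypothetical fitness combining the count of selected rows and the
    average expectation measure: more rows suggests less curve-fitting,
    while a higher average covers the costs of trading."""
    if count < min_count:          # too few trades: likely overfit
        return 0.0
    return average * count ** 0.5  # assumed weighting, not the original
```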

The end result was:

Chromosome

  • Methods: Constructor, BuildSQL, EvaluateSelf
  • Properties: Fitness, GeneList()

Gene

  • Methods: Constructor, GetSQL, Mutate
  • Properties: ComparisonType, GeneType, Name, RangeMin, RangeMax, RangeType, TargetValue
Most of the methods and properties are, hopefully, self-documenting. RangeType is a code that indicates how to treat the range of potential values for the purposes of seeding and mutating.

The Population and GP objects are as expected.
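A minimal sketch of the gene and chromosome shapes described above might look like this. The method and property names come from the lists; everything else (the class bodies, the seeding rules, the table and measure names, and the fitness combination inside EvaluateSelf) is my assumption:

```python
import random

class Gene:
    """One threshold condition on an Indicator column."""
    def __init__(self, name, comparison_type, range_min, range_max,
                 range_type="continuous", gene_type="indicator"):
        self.Name = name
        self.ComparisonType = comparison_type   # e.g. "<" or ">"
        self.RangeMin = range_min
        self.RangeMax = range_max
        self.RangeType = range_type             # how to seed/mutate TargetValue
        self.GeneType = gene_type
        self.TargetValue = self._seed()

    def _seed(self):
        # RangeType drives how a value is drawn from [RangeMin, RangeMax]
        if self.RangeType == "integer":
            return random.randint(int(self.RangeMin), int(self.RangeMax))
        return random.uniform(self.RangeMin, self.RangeMax)

    def Mutate(self):
        self.TargetValue = self._seed()

    def GetSQL(self):
        return f"{self.Name} {self.ComparisonType} {self.TargetValue}"

class Chromosome:
    """A conjunction of genes, translatable to one SELECT statement."""
    def __init__(self, genes):
        self.GeneList = list(genes)
        self.Fitness = None

    def BuildSQL(self, table="initiations", measure="expectation"):
        where = " AND ".join(g.GetSQL() for g in self.GeneList)
        return f"SELECT COUNT(*), AVG({measure}) FROM {table} WHERE {where}"

    def EvaluateSelf(self, run_query):
        # run_query executes the SQL and returns (count, average);
        # the combination below is an assumed placeholder.
        count, avg = run_query(self.BuildSQL())
        self.Fitness = avg * count ** 0.5
```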