OZmium Sports Betting and Horse Racing Forums

Backtesting

becareful

9th July 2002, 11:32 AM

I have seen quite a few messages here on the dangers of backtesting to develop/prove systems, so I thought I would share the approach I have used, which (in my opinion) allows you to backtest with some confidence.

Firstly, to backtest with any confidence you need a good sample size. I personally have a database with several years' data (although the older data has some limitations, so some of my testing is limited to the more recent data). I think you need at least 6 months' data and preferably a full year.

Divide your sample data into 2 parts. It doesn't really matter how; for simplicity you can put the first half of the data in "Part A" and the other half in "Part B". If you are concerned about seasonal influences, put alternate months in each group (e.g. Jan, Mar, May in "A"; Feb, Apr, Jun in "B").
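A minimal Python sketch of the two ways of splitting described above, assuming each record carries a race date. The function names and the `get_date` accessor are hypothetical, not from any particular package:

```python
from datetime import date

def split_halves(records, get_date):
    """Chronological split: the earlier half goes to Part A, the later half to Part B."""
    ordered = sorted(records, key=get_date)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

def split_alternate_months(records, get_date):
    """Seasonal split: odd months (Jan, Mar, May, ...) go to Part A, even months to Part B."""
    part_a, part_b = [], []
    for record in records:
        if get_date(record).month % 2 == 1:
            part_a.append(record)
        else:
            part_b.append(record)
    return part_a, part_b

# Example: plain dates stand in for full race records here.
races = [date(2002, 1, 5), date(2002, 2, 9), date(2002, 3, 2), date(2002, 6, 1)]
part_a, part_b = split_alternate_months(races, get_date=lambda d: d)
```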

Now use the "Part A" data to develop and test your ideas. When you have come up with something that seems to work, test it against your "Part B" data. When doing this testing you MUST apply the rules of your system as if you were betting with real money, and you should make each bet decision based on the form, etc., without looking at the results first.

If your system shows a similar profit on the "Part B" data, you can have some confidence that it is valid and will hopefully work in the future as well. If it makes a loss, or if the profit is significantly lower than on the "Part A" data, go back to the drawing board and revise your system (again using the "A" data for development and the "B" data for testing).
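The comparison can be sketched as profit on turnover (POT) for level 1-unit stakes on each part. What counts as "significantly lower" isn't defined in the post, so the `holds_up` rule and its 0.5 ratio below are purely hypothetical:

```python
def pot(bets):
    """Profit on turnover for level 1-unit stakes.

    Each bet is (decimal_odds, won); decimal odds include the stake, like a TAB dividend.
    """
    if not bets:
        raise ValueError("need at least one bet")
    returns = sum(odds for odds, won in bets if won)
    return (returns - len(bets)) / len(bets)

def holds_up(pot_a, pot_b, min_ratio=0.5):
    """Hypothetical pass/fail rule: Part B must show a profit and keep at least
    min_ratio of the Part A POT. The 0.5 default is an assumption, not a standard."""
    return pot_b > 0 and pot_b >= min_ratio * pot_a

part_a_bets = [(4.0, True), (2.0, False), (3.0, False), (5.0, True)]
part_b_bets = [(3.5, True), (2.5, False), (4.0, False)]
pot_a, pot_b = pot(part_a_bets), pot(part_b_bets)
```

With these toy bets Part B is still in profit but keeps far less than half of the Part A figure, so under this assumed rule the system would go back to the drawing board.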

One other thing to be aware of: always use the same type of data for developing a system as you are going to bet on. So if you are only going to bet on Saturday Metro meetings, only use Saturday Metro data for development and testing.
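A short sketch of restricting the sample to the kind of meeting you will actually bet on. The `date` and `meeting_type` keys are hypothetical field names:

```python
from datetime import date

SATURDAY = 5  # date.weekday(): Monday is 0, Sunday is 6

def matching_universe(records, weekday=SATURDAY, meeting_type="Metro"):
    """Keep only records from the same kind of meeting you will bet on."""
    return [
        r for r in records
        if r["date"].weekday() == weekday and r["meeting_type"] == meeting_type
    ]

records = [
    {"date": date(2002, 7, 6), "meeting_type": "Metro"},    # Saturday metro: kept
    {"date": date(2002, 7, 6), "meeting_type": "Country"},  # Saturday country: dropped
    {"date": date(2002, 7, 3), "meeting_type": "Metro"},    # Wednesday metro: dropped
]
saturday_metro = matching_universe(records)
```

Applying this filter before splitting into Part A and Part B keeps both halves drawn from the same kind of meeting.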

As always I am happy to try to answer any questions or comments.

Benny

19th March 2004, 03:21 PM

I would like to know more.

Benny

sportznut

19th March 2004, 04:14 PM

Yeah, some good ideas there.

20th March 2004, 06:56 PM

Ditto.

markallan

20th March 2004, 08:22 PM

I think that 6 months' data is irrelevant. Anything can happen within a 6-month period. I prefer to use a minimum of 5 years to determine if something is going to work; this way you have looked at many thousands of selections and many, many thousands of horses. What I do agree with is your pattern of checking. I use 1997 to 2002 as my base data for checking my concepts. If an idea has worked at, say, a 15% profit, I then check how it worked over the full year of 2003. If it still stacks up over that 12-month period, then I think I may be on to something.

markallan

Chrome Prince

21st March 2004, 04:08 PM

markallan,

Obviously, the more data you have access to, the better, but a lot of people don't have 5 years' data to work with, so I think becareful was giving a minimum criterion.

One thing I've been doing is looking at how filters perform across various systems.
If a filter only works or improves things for one system but not another, then perhaps the increase in strike rate or POT (profit on turnover) is coming from something other than the filter itself.
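That cross-check can be sketched by applying one filter to several systems and measuring the change in POT for each. The "barrier 5 or inside" filter and the bet records are hypothetical toy data, there only to exercise the code:

```python
def pot(bets):
    """Profit on turnover for level 1-unit stakes; each bet is a dict with 'odds' and 'won'."""
    returns = sum(b["odds"] for b in bets if b["won"])
    return (returns - len(bets)) / len(bets)

def filter_lift(systems, keep):
    """For each system, the change in POT when only bets passing `keep` are retained.
    Returns None for a system where the filter removes every bet."""
    lifts = {}
    for name, bets in systems.items():
        kept = [b for b in bets if keep(b)]
        lifts[name] = pot(kept) - pot(bets) if kept else None
    return lifts

systems = {
    "system_1": [
        {"odds": 4.0, "won": True, "barrier": 2},
        {"odds": 3.0, "won": False, "barrier": 10},
        {"odds": 3.0, "won": False, "barrier": 3},
    ],
    "system_2": [
        {"odds": 2.0, "won": True, "barrier": 12},
        {"odds": 5.0, "won": False, "barrier": 1},
    ],
}
# Hypothetical filter: barrier 5 or inside.
lifts = filter_lift(systems, keep=lambda b: b["barrier"] <= 5)
```

Here the filter lifts system 1 but hurts system 2, which on this reasoning hints that system 1's gain may not really be down to the filter. Real samples would need to be far larger than these lists.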

One thing I have noticed is that obvious filters and criteria do not boost POT, because every "Harry" with a formguide can use them. For example, horses that won their last start are far less value than horses that were beaten by less than a length; similarly, horses that ran a place are (in general) far less value than horses that ran 4th or 5th but were beaten by less than 2 lengths, etc.
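One way to check a claim like this against your own data is to group bets by last-start category and compare POT per group. A sketch, assuming bet records with hypothetical `last_pos` and `last_margin` fields (a winner's margin recorded as 0). The numbers are invented only to exercise the code and say nothing about real racing:

```python
from collections import defaultdict

def pot(bets):
    """Profit on turnover for level 1-unit stakes."""
    returns = sum(b["odds"] for b in bets if b["won"])
    return (returns - len(bets)) / len(bets)

def pot_by_group(bets, classify):
    """Return {group: (number_of_bets, POT)}."""
    groups = defaultdict(list)
    for b in bets:
        groups[classify(b)].append(b)
    return {g: (len(bs), pot(bs)) for g, bs in groups.items()}

def last_start(b):
    """Hypothetical fields: last_pos = finishing position, last_margin = lengths behind the winner."""
    if b["last_pos"] == 1:
        return "won last start"
    if b["last_margin"] < 1.0:
        return "beaten < 1 length"
    return "other"

bets = [
    {"odds": 2.2, "won": True,  "last_pos": 1, "last_margin": 0.0},
    {"odds": 2.5, "won": False, "last_pos": 1, "last_margin": 0.0},
    {"odds": 6.0, "won": True,  "last_pos": 3, "last_margin": 0.5},
    {"odds": 5.0, "won": False, "last_pos": 2, "last_margin": 0.8},
    {"odds": 9.0, "won": False, "last_pos": 7, "last_margin": 6.0},
]
report = pot_by_group(bets, last_start)
```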

crash

22nd March 2004, 07:14 AM

Howdy critters,

I thought I'd wade in with my 2c worth.

For starters, any form that is more than 12 months old in any horse is pretty much worthless info and meaningless stats, so probably the only stats worth much are those from the last 12 months, and then only for use over the next 12 months while most of the horses they were based on are still in the game.

How a system performs over the next 5 years will have little relationship to how it performed over the last 5 years, because all the horses will be newbies.

The percentage swings on a system that won 10-20% over the past 5 years will be large enough to make even a 10-year average back-fitted performance all but meaningless as far as its performance over the next 12 months goes.

It is worth remembering that this game is [approximately] 70% a game of chance, subject to wild swings in our ability to predict outcomes from available variables.

A back-fitted system that shows, say, 15% POT over the last 5 years may make 15% [or more] over the next 5 years, or lose 15% [or more], but how it performs over the next 12 months [most important] could be anyone's guess.
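The size of those swings can be illustrated with a quick simulation. This sketch assumes a hypothetical system with a genuine long-run edge: a 25% strike rate at a fixed $4.60 dividend (0.25 x 4.60 = 1.15, i.e. +15% POT), with 300 level-stake bets a year. Both figures are assumptions chosen for illustration, not taken from any real system:

```python
import random

def simulate_year_pot(n_bets, strike_rate, odds, rng):
    """POT for one simulated year of level 1-unit bets at a fixed decimal dividend."""
    winners = sum(1 for _ in range(n_bets) if rng.random() < strike_rate)
    return (winners * odds - n_bets) / n_bets

rng = random.Random(2004)  # fixed seed so the run is repeatable
years = [simulate_year_pot(300, 0.25, 4.60, rng) for _ in range(2000)]

mean_pot = sum(years) / len(years)
losing_share = sum(1 for p in years if p < 0) / len(years)
print(f"average POT {mean_pot:+.1%}, worst year {min(years):+.1%}, "
      f"best year {max(years):+.1%}, losing years {losing_share:.0%}")
```

Even though every simulated bet carries the same +15% expectation, individual 12-month blocks range widely and some finish behind, so the long-run average and the next 12 months are different questions.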

The fact that anyone can approach a system that has shown a profit over the last year (or 5 years) with some sort of smug glow of security [radiating downward toward the hand that removes the wallet, which will then remove the notes that are going to do a magic trick of multiplying themselves based on the secure 'wisdom' gleaned from shonky maths applied to shonky facts], believing they are safe from a financial mauling, beggars belief!!!

Cheers.

[ This Message was edited by: crash on 2004-03-22 07:22 ]

kenchar

22nd March 2004, 07:08 PM

Crash,
We've had our differences, but on this I have to totally agree with you.

Every race is different depending on circumstances.

How is it that I can back horses FOR A PLACE that have never been out of a place at the distance or at the track, are down in class from their last start, and have had the same jockey for their last 5 or so starts, and they run a shocker?

The reason is that this is horse racing and there is ALWAYS the unforeseen.

[ This Message was edited by: kenchar on 2004-03-22 19:09 ]

[ This Message was edited by: kenchar on 2004-03-23 08:37 ]

Chrome Prince

23rd March 2004, 10:43 AM

Hi kenchar,

I think the difference lies in what we expect stats to do.

Stats will not predict the outcome of Race 1 at Flemington with any accuracy, although they might point to value.

As you say, every horse is independent, as is every race, condition, jockey, result, etc.

However, I can say with some accuracy that data can predict the OVERALL strike rate or profit, given enough bets.

For example, if one in three households has a computer, that's a 33% strike rate. This does not mean that if I walk into three households one of them MUST have a computer; it means that if I walk into 100 houses, roughly 33 will have one.
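The analogy is a binomial calculation. A small sketch using Python's standard `fractions` and `math` modules:

```python
from fractions import Fraction
from math import sqrt

p = Fraction(1, 3)  # one in three households has a computer

# Walk into 3 households: chance that none of them has one.
p_none_in_3 = (1 - p) ** 3          # (2/3)^3 = 8/27, about 30%

# Walk into 100 households: expected count and its binomial standard deviation.
n = 100
expected = n * p                    # 100/3, about 33.3
std_dev = sqrt(n * p * (1 - p))     # about 4.7
```

So nearly a third of the time three houses turn up no computer at all, while over 100 houses the count usually lands within a handful of 33.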

Forgive the crude analogy.

crash

23rd March 2004, 05:26 PM

You have only one problem, Chrome:

Nobody will give you odds for your money on predicting 'average' outcomes!!!

OK, so we know that 30% of favourites will win over the next 12 months. So what?

No bookie will take your bet on it, but he'll take your money on the favorite of the next race !!!

Cheers.

PS. Hi Kenchar, good to see you around. We probably agree on more things than disagree, so that's all good news.

Chrome Prince

23rd March 2004, 06:44 PM

Crash,

I see your point but we're looking at it from different perspectives.
The point is that if you know the average outcomes and can find an edge, you can then get overlays given what the bookies offer.
That doesn't mean you win on the race, but it means you can win on a series of bets.

As to 30% of favourites - so what?
That means that 70% lose and there is a great deal to play with in regards to filters etc.

The whole concept of value is that the bookie frames his market on that race alone, not on the long-run value of betting on certain kinds of selections.

Example:

If his book is, say, 110%, that means he has a 10% edge, and if he sticks to it and correctly fluctuates his markets according to bets laid, he'll win that much on every race.
What he hasn't factored in (and it doesn't matter to him, because his margin is always there) is the overall strike rate of certain horses under certain conditions.

This means I should hypothetically lose 10% POT no matter what I bet on, BUT... through purely statistical filtering I can get a 10% POT using TAB prices.
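The book percentage is the sum of the implied probabilities (1 / decimal odds) across the field. A sketch with a made-up four-runner market framed to exactly 110%. If money is laid in proportion to those prices, every runner costs the bookmaker the same payout, and his take works out to 1 - 1/1.10, about 9.1% of total stakes, which is what the rough "10% edge" refers to. The proportional rescaling in `fair_probabilities` is only the simplest way of stripping the margin:

```python
def book_percentage(odds):
    """Sum of implied probabilities (1/decimal odds), expressed as a percentage."""
    return 100 * sum(1 / o for o in odds)

def balanced_book_margin(odds):
    """Share of total stakes kept if bets are laid in proportion to the implied
    probabilities, so that every runner would cost the same payout."""
    overround = sum(1 / o for o in odds)
    return 1 - 1 / overround

def fair_probabilities(odds):
    """Implied probabilities rescaled proportionally so they sum to 1."""
    implied = [1 / o for o in odds]
    total = sum(implied)
    return [p / total for p in implied]

market = [2.5, 4.0, 5.0, 4.0]          # hypothetical decimal odds for a 4-horse race
book = book_percentage(market)         # 40 + 25 + 20 + 25 = 110%
margin = balanced_book_margin(market)  # 1 - 1/1.10, about 0.091
```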

crash

24th March 2004, 10:24 AM

Yes, you would get your 10% [eventually, given unlimited time and money to do so].

Consider this Chrome:

Using poker machines as an analogy regarding your example below:

Quote
Example:

If his book is, say, 110%, that means he has a 10% edge, and if he sticks to it and correctly fluctuates his markets according to bets laid, he'll win that much on every race.
What he hasn't factored in (and it doesn't matter to him, because his margin is always there) is the overall strike rate of certain horses under certain conditions.

This means I should hypothetically lose 10% POT no matter what I bet on, BUT... through purely statistical filtering I can get a 10% POT using TAB prices.

End quote.

Stats-wise you are of course correct, but along that line of logic you could put $100 into a poker machine and expect to get at least 85% [whatever the figure is in different states] back!!!

Reality and the nature of the beast just does not operate in the simplistic [hypothetical] way you have described.

And on to another point you have mentioned in your posts here: finding 'value' [nothing more than future prediction based on our personal or computerized reading of variables]. That skill doesn't operate simplistically either and, among other skills, it is the defining one that separates the long-term consistent winners from the rest of us. It is no easy task, whatever the simplistic 'just bet on the value and you'll end up in front' philosophy says.

Spotting 'value' amounts to claiming that one can predict value better than the tote [mass punter opinion, which is uncannily close most of the time] or an individual bookie, often enough to stay consistently in front of both. Neither is an easy task, and our 'value' predictions are more often wrong than those of the public and the bookie.

In other words, our perceived 'overlays' are often in fact, underlays.
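In arithmetic terms, a bet is an overlay when your assessed probability times the decimal odds exceeds 1. A sketch showing how an error in that assessment flips the answer; the probabilities and the $4.00 price are hypothetical:

```python
def expected_return(prob, decimal_odds):
    """Expected profit per 1-unit stake: positive means overlay, negative means underlay."""
    return prob * decimal_odds - 1

price = 4.00
my_estimate = 0.30   # what our form study says (hypothetical)
true_chance = 0.22   # the horse's real chance (unknowable in practice, hypothetical)

perceived = expected_return(my_estimate, price)  # about +0.20: looks like an overlay
actual = expected_return(true_chance, price)     # about -0.12: really an underlay
```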

We end up back at 'subjective' valuing of variables whether we input into puters, do old fashioned form, or place ourselves into the hands of systems.

Unknown variables just cannot be given precise values and, regardless of our best efforts, are often far enough off to keep us behind in the game.

Whether from unfair odds, uncooperative averages [not being average when YOU want them to be] or just unexplainable race outcomes, we are indeed accepting cards from a very 'stacked' deck.

Cheers.

[ This Message was edited by: crash on 2004-03-24 10:45 ]

White Turnip

6th April 2004, 10:33 PM

Very sound strategy for R&D.

I was using a Neural Network programme that split data sets into 3 groups: Train and Test to develop the maths, and Evaluate to confirm results.
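A three-way split can be sketched the same way as the two-way one. This version keeps the data in date order so the Evaluate set is the most recent and is looked at last. That ordering and the 60/20/20 proportions are assumptions; the post doesn't say how the programme divided the data:

```python
from datetime import date, timedelta

def three_way_split(records, get_date, train_frac=0.6, test_frac=0.2):
    """Split records in date order into Train, Test and Evaluate sets.
    Whatever is left after Train and Test (20% by default) becomes Evaluate."""
    ordered = sorted(records, key=get_date)
    n = len(ordered)
    i = round(n * train_frac)
    j = i + round(n * test_frac)
    return ordered[:i], ordered[i:j], ordered[j:]

start = date(2003, 1, 1)
meetings = [start + timedelta(weeks=k) for k in range(10)]
train, test, evaluate = three_way_split(meetings, get_date=lambda d: d)
```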