An intellectual exercise


KennyVictor
24th May 2006, 11:24 AM
This is a little bit computerish but the principles still work if you think of it in terms of pen and paper (it's just quicker with a computer).

Right, you have a database (or a lot of sheets of paper) with thousands of races in it and each race has (say) twelve factors associated with each horse in it. Maybe one for the horse's perceived form before the race, another for, let's say, its jockey, another for the weight it carries, and so on. Each of these factors is a number.
To illustrate with an example I'll use a two horse race and only show two of the factors.

Horse 1 I'M A TRYER Form 3, Jockey Factor 5
Horse 2 ALWAYS FIRST Form 7, Jockey Factor 3
(Horse 1 has a better jockey but it appears horse 2 has had better recent form).

We can take each of these 12 factors and multiply them by a number from 0 to 5. We have to multiply them by the same number in all the races and for all the horses (a bit like the weights in a neural network, really).
In the example above we might choose to multiply the form factor by 2 and the Jockey factor by 5 and we would get:

I'M A TRYER (Form 3 * 2) (Jockey 5 * 5) giving a total of 31
ALWAYS FIRST (Form 7 * 2) (Jockey 3 * 5) giving a total of 29

So if we use 2 for form and 5 for jockey we would predict that I'M A TRYER would win the race.
Now if we had used 3 for form and 4 for jockey we would get:

I'M A TRYER (Form 3 * 3) (Jockey 5 * 4) giving a total of 29
ALWAYS FIRST (Form 7 * 3) (Jockey 3 * 4) giving a total of 33

This would predict that ALWAYS FIRST would win the race.
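
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the two-horse example; the function names `score` and `predict` are made up for the sketch and are not from anyone's actual program.

```python
# Factor values from the two-horse example: (form, jockey).
horses = {
    "I'M A TRYER": (3, 5),
    "ALWAYS FIRST": (7, 3),
}

def score(factors, weights):
    """Weighted sum: each factor times its multiplier, added up."""
    return sum(f * w for f, w in zip(factors, weights))

def predict(horses, weights):
    """Return the name of the horse with the highest total."""
    return max(horses, key=lambda name: score(horses[name], weights))

# Try both sets of multipliers used in the post.
for weights in [(2, 5), (3, 4)]:
    totals = {name: score(f, weights) for name, f in horses.items()}
    print(weights, totals, "->", predict(horses, weights))
```

With multipliers (2, 5) the totals are 31 and 29, and with (3, 4) they are 29 and 33, matching the figures worked out by hand.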

Now the challenge is this -----

We have to pick the numbers that we are going to multiply each factor by (I used 2 and 5 in my first example); that's 12 numbers for our imaginary database. Then we have to go through the thousands of races and multiply each horse's factors by these 12 numbers (just like I did for the two horse race but with more horses and more factors and numbers). Quite a job, but a computer can do it in a few seconds. At the end of all this multiplying and adding we tally up how many times we picked the real winner of the race and, if we'd had a bet on each one, how much we would have won (or lost). Maybe with our first set of 12 numbers we picked 15% of the winners and lost half our stake money.
Well, we think, we can do better than this. So we use a new set of 12 numbers to multiply these factors by. Woohoo, this time we come up with 18% of the winners and only lose a quarter of our stake money. Maybe if we tried every possible combination of those 12 numbers (each one from 0 to 5) we would find a combination that got 30% of the winners and made us a profit.
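
The tallying step might look something like the sketch below. It assumes a level-stakes win bet on the top-scored horse in every race and a dividend that includes the returned stake; the `Runner` record, the `backtest` function and the two toy races are all invented for illustration and are not real data.

```python
from dataclasses import dataclass

@dataclass
class Runner:
    name: str
    factors: tuple   # one number per factor
    dividend: float  # return for a 1-unit win bet, stake included (assumed)
    won: bool

def score(factors, weights):
    """Weighted sum of the factors."""
    return sum(f * w for f, w in zip(factors, weights))

def backtest(races, weights, stake=1.0):
    """Bet one stake on the top-scored runner in every race.

    Returns (strike_rate, profit). Ties go to the first runner listed.
    """
    wins, returned = 0, 0.0
    for race in races:
        pick = max(race, key=lambda r: score(r.factors, weights))
        if pick.won:
            wins += 1
            returned += stake * pick.dividend
    outlay = stake * len(races)
    return wins / len(races), returned - outlay

# Two made-up races with two factors each (illustrative data only).
races = [
    [Runner("A", (3, 5), 2.5, True), Runner("B", (7, 3), 1.8, False)],
    [Runner("C", (6, 1), 4.0, False), Runner("D", (2, 5), 3.0, True)],
]
print(backtest(races, (2, 5)))
print(backtest(races, (3, 4)))
```

Each call to `backtest` is one trial of one set of multipliers, which is the unit of work the rest of the post is counting.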

This is the real challenge ----- (I was just trying to stop you dropping off to sleep last time I said this).
How do we find the combination of 12 numbers, each from 0 to 5, which gives us the best return for our money? Wesmip reckons there are 2,176,782,336 combinations. Without a project like SETI@home we aren't going to be able to try them all even if each takes only a second to run through.
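
As a quick check on that figure: six possible values for each of twelve multipliers gives 6 to the power 12 combinations. The short sketch below also works out how long a brute-force run would take at the one second per combination assumed in the post.

```python
choices_per_factor = 6   # the integers 0, 1, 2, 3, 4, 5
factors = 12
combinations = choices_per_factor ** factors
print(f"{combinations:,}")

# Time to try them all at one second per combination.
seconds_per_year = 60 * 60 * 24 * 365
years = combinations / seconds_per_year
print(f"about {years:.0f} years")
```

So Wesmip's count is right, and the exhaustive search would run for roughly 69 years on a single machine at that speed.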

What do we do?

It's only fair to add that I don't know the answer. However if anyone has suggestions I can possibly try them out (as can Wesmip I suspect) and share the path to the Holy Grail with the winner.

If you got this far, congratulations.

KV

wesmip1
24th May 2006, 11:54 AM
KV,

The scary thing is I know how to do it to come up with a good answer in a short time frame. I also know how to get the best answer but it takes a lot longer.

You are delving into Artificial Intelligence here and this is where it all gets fun (for those of us with a sick mind).

There are numerous methods we can try to come up with an optimum solution within a short time frame. I would suggest using a simple Genetic Algorithm that mutates, using a competitive nature for killing off useless combinations. This is the direction I will be heading, but there are always other possibilities.
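
For readers who have not met the idea, here is a minimal sketch of one common form of genetic algorithm, using tournament selection as the "competitive" step. It is not wesmip's code; the population size, mutation rate and the toy fitness function are all assumptions made for the example. In practice the fitness would be the profit (or strike rate) from running the multipliers over the race database.

```python
import random

N_FACTORS = 12
MAX_WEIGHT = 5  # each multiplier is an integer from 0 to 5

def random_weights(rng):
    return [rng.randint(0, MAX_WEIGHT) for _ in range(N_FACTORS)]

def mutate(weights, rng, rate=0.1):
    """Re-roll each multiplier with probability `rate`."""
    return [rng.randint(0, MAX_WEIGHT) if rng.random() < rate else w
            for w in weights]

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randint(1, N_FACTORS - 1)
    return a[:cut] + b[cut:]

def tournament(population, fitness, rng, k=3):
    """Competitive selection: the fittest of k random candidates survives."""
    return max(rng.sample(population, k), key=fitness)

def evolve(fitness, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    population = [random_weights(rng) for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)  # elitism: never lose the best
        children = [best]
        while len(children) < pop_size:
            mum = tournament(population, fitness, rng)
            dad = tournament(population, fitness, rng)
            children.append(mutate(crossover(mum, dad, rng), rng))
        population = children
    return max(population, key=fitness)

# Stand-in fitness for demonstration only: closeness to a hidden target.
# A real run would score the weights by backtesting them over the races.
TARGET = [2, 5, 0, 1, 3, 4, 4, 0, 2, 5, 1, 3]

def toy_fitness(weights):
    return -sum(abs(w - t) for w, t in zip(weights, TARGET))

best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

The point of the approach is that it scores only a few thousand candidate combinations instead of all two billion, at the cost of no guarantee that the result is the true optimum.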

With over 2 billion combinations it will take a long time to run all the combinations over tens of thousands of races. I only have 200 races in my database at the moment but it would still take several days to do all the combinations. Adding more races will only slow it down further.

Just out of interest what database and language are you using for your testing ?

I am using an OracleXE database with Java as the coding language for my applications.

Thanks

Chrome Prince
24th May 2006, 12:44 PM
We measure the impact values over a set of records and assign the factor multiplier to it.

The data determines the factor.

This will give you the probability or likely winner; it will not give you the best profit.

You can get the best profit by adding a s/r avg dividend impact value.
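
The post does not define its terms, so the sketch below rests on two assumptions. First, "impact value" is taken in its usual racing sense: the share of winners that have a factor divided by the share of all runners that have it. Second, the "s/r avg dividend" figure is taken as strike rate multiplied by average winning dividend, which is how it is described later in the thread. The function names and sample records are invented for illustration.

```python
def impact_value(records, has_factor):
    """Share of winners with the factor divided by share of runners with it.

    records: list of (runner, won) pairs; has_factor: predicate on a runner.
    1.0 means the factor wins its fair share; above 1.0, more than its share.
    """
    runners = [r for r, _ in records]
    winners = [r for r, won in records if won]
    pct_winners = sum(map(has_factor, winners)) / len(winners)
    pct_runners = sum(map(has_factor, runners)) / len(runners)
    return pct_winners / pct_runners

def sr_times_avg_dividend(dividends):
    """Strike rate multiplied by average winning dividend.

    dividends: one entry per bet, 0 for a loser, stake included for a winner.
    """
    wins = [d for d in dividends if d > 0]
    strike_rate = len(wins) / len(dividends)
    avg_dividend = sum(wins) / len(wins) if wins else 0.0
    return strike_rate * avg_dividend

# Made-up records: 8 runners, 2 winners; the factor is "top jockey".
records = (
    [({"top_jockey": True}, True), ({"top_jockey": True}, False),
     ({"top_jockey": False}, True)]
    + [({"top_jockey": False}, False)] * 5
)
print(impact_value(records, lambda r: r["top_jockey"]))

# Made-up bets: 2 winners from 10 at dividends of 4.0 and 6.0.
print(sr_times_avg_dividend([4.0, 0, 0, 0, 6.0, 0, 0, 0, 0, 0]))
```

Under these assumptions the second figure is simply the average return per unit staked, so anything above 1.0 is a profit at level stakes.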

lomaca
24th May 2006, 03:40 PM
KennyVictor wrote: "Without a project like SETI@home we aren't going to be able to try them all even if each takes only a second to run through. What do we do?"

Hi KV!
Analysing the problem and breaking it down into smaller chunks comes to mind.
Systems analysts do that, and you can do it too, provided you don't get into coding mode as soon as you have a keyboard in front of you.
Your reference to pen and paper is a very good one.
It will take some time to run even with optimised code, but I have run one similar to this in only 53 hours over seventy-odd thousand races; yes, RACES, not horses.
Good luck with it.

KennyVictor
24th May 2006, 04:03 PM
wesmip1 wrote: "Just out of interest what database and language are you using for your testing? I am using an OracleXE database with Java as the coding language for my applications."

I use a thing called Powerflex - it's a derivative (and a much improved one at that) of Dataflex (which is probably another one you haven't heard of). Dataflex was sort of like DBase IV.
Powerflex is good in that it also has commands for accessing the net and what have you so I can do everything I need within one program.

I'm interested to know how you collect your data (if it's not a secret). Is it a copy and paste affair, or done programmatically, for want of a better word?

KV

wesmip1
24th May 2006, 06:01 PM
KV,

I haven't heard of those at all ....

I do it programmatically through Java. I haven't written the results part yet, just because I am lazy. I am just updating results by hand to get a few in quickly. Then I will sit down and work out a better way of updating the results.

It's all done through Java so it's pretty simple stuff.
Connect to a web site, Set the cookie info, set the parameters, download the info to a file, Open the file, parse the file and get info, store it in the database.
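
Those steps can be sketched roughly as follows. This is a Python illustration rather than the Java the poster uses, and everything specific in it is assumed: the three-column table layout, the `runner` table, and the function names are invented for the example, and the `download` function is shown but not called because no real site is named in the thread.

```python
import sqlite3
import urllib.parse
import urllib.request
from html.parser import HTMLParser

def download(url, params, cookie, path):
    """Connect, send the cookie and parameters, and save the page to a file."""
    full_url = url + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(full_url, headers={"Cookie": cookie})
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())

class RowParser(HTMLParser):
    """Collect the text of every table cell, one list per table row."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

def parse(html):
    """Pull (number, name, dividend) tuples out of an assumed 3-column table."""
    parser = RowParser()
    parser.feed(html)
    # Check the shape before trusting it: page layouts change without notice.
    rows = [r for r in parser.rows if len(r) == 3 and r[0].isdigit()]
    if not rows:
        raise ValueError("no runner rows found; has the page format changed?")
    return [(int(no), name, float(div)) for no, name, div in rows]

def store(db, runners):
    db.execute("CREATE TABLE IF NOT EXISTS runner "
               "(no INTEGER, name TEXT, dividend REAL)")
    db.executemany("INSERT INTO runner VALUES (?, ?, ?)", runners)
    db.commit()

# A made-up page fragment standing in for a downloaded file.
sample = """<table>
<tr><th>No</th><th>Horse</th><th>Div</th></tr>
<tr><td>1</td><td>I'M A TRYER</td><td>2.50</td></tr>
<tr><td>2</td><td>ALWAYS FIRST</td><td>1.80</td></tr>
</table>"""
db = sqlite3.connect(":memory:")
store(db, parse(sample))
print(db.execute("SELECT * FROM runner").fetchall())
```

The check inside `parse` is a small version of the defensive checks the post goes on to describe: fail loudly when the page no longer looks the way the parser expects.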

Essentially getting the results is exactly the same. The hard bit is working out how to parse the page correctly. You need to have so many checks it isn't funny in case something changes slightly.

I currently collect TVF3 and AAP. I haven't reconciled them together yet in the database but that wouldn't be too hard if I sat for a few hours and worked out the abbrev meeting codes which correspond to each other.

As far as analysing the data goes, I have written a few programs that automatically do my selections. I only just started getting AAP data but I have downloaded every meeting from 1st Jan to the beginning of May.

I would prefer not to share the code as it took me at least a couple of hours to write and it is still full of bugs (lots and lots of bugs which I am slowly fixing; there are no bugs in data collection unless there is a break in connection). But if you want some help with anything then let me know.

Good Luck

woof43
24th May 2006, 07:35 PM
Hi Kenny,
Have you ever tried scatterplot analysis on your 12 factors, using Class of Race and Win Dividend as axes and then overlaying to reach a final result?

I'm pretty sure Punter57 used this approach in developing his method.

Chinbok
25th May 2006, 08:58 AM
Chrome Prince wrote: "You can get the best profit by adding a s/r avg dividend impact value."

Is your impact value the SR multiplied by Av Div?

Is this the same as optimising the factors for POT instead of SR?

KennyVictor
25th May 2006, 09:22 AM
Woof,
I'll try to look up how to do that before I trouble you for more info, but thanks for the suggestion.

Chinbok,
I know it wasn't a question to me, but when analysing data I use s/r sometimes and POT others (although it gives me a headache the next day). The trouble with POT is that a big winner in a small sample causes a lot of trouble. s/r gives a smoother curve. Often I use a meld of the two, multiplied as you suggest, or a ratio, or whatever.
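
To make the small-sample problem concrete, here is a sketch comparing strike rate with POT (profit on turnover, taken here as net profit divided by total staked) on two made-up runs of ten level-stakes bets. The dividends are invented and assumed to include the returned stake.

```python
def strike_rate(dividends):
    """Fraction of bets that won; a losing bet has dividend 0."""
    return sum(d > 0 for d in dividends) / len(dividends)

def pot(dividends, stake=1.0):
    """Profit on turnover: net profit divided by total amount staked."""
    turnover = stake * len(dividends)
    return (stake * sum(dividends) - turnover) / turnover

# Illustrative data only: ten level-stakes bets each.
steady = [0, 2.4, 0, 0, 2.6, 0, 0, 2.5, 0, 0]   # three winners at short prices
lucky = [0, 0, 0, 0, 0, 0, 0, 0, 0, 26.0]       # a single long-priced winner

for name, bets in [("steady", steady), ("lucky", lucky)]:
    print(f"{name}: s/r {strike_rate(bets):.0%}, POT {pot(bets):+.0%}")
```

The "lucky" run shows a POT of +160% against -25% for the "steady" run, purely because of one result, while its strike rate is a third as high. That is the sort of distortion a larger sample, or a blend of the two measures, is meant to damp down.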

KV

KennyVictor
25th May 2006, 09:33 AM
Hi Wesmip,
I do much the same as yourself; I was just interested in how you were working. Yes, the parsing is a problem, especially when sites change their format every five minutes. It's a thing I find I get better at with practice though.
BTW I'm learning my way around Java at the moment and enjoyable as that is it appears to me that it takes three times as many lines of code to do anything as any other language I've used. Is this just beginner's clumsiness or do you find the same?
KV

wesmip1
25th May 2006, 10:04 AM
KV,

It does take 3 times as many lines in Java to write the code. If you get your procedures coded correctly, though, there is little rework required and the reuse is great.

Chrome Prince
25th May 2006, 12:34 PM
Chinbok wrote: "Is your impact value the SR multiplied by Av Div? Is this the same as optimising the factors for POT instead of SR?"

Yes.

No, POT is a whole other kettle of fish, while it's a good indicator, I really never use it in developing systems, the $$$ profit is the main indicator for me, providing the number of selections is more than 500.

wesmip1
25th May 2006, 01:06 PM
Chrome,

I have to agree with you: POT is nothing if you don't have volume.

A POT of 500% looks good but if you only have 2 bets a year then it is a useless system. $$$ is a much better indicator.