Ratings aficionados
Hi all,
Figured I'd spark another thread on here and get some conversation happening. In this thread I want to discuss ratings, and more specifically weighting the individual factors that are taken into account. I'm not really interested in discussing what factors to consider or which ones people think are more or less important. What I'd like to discuss is how people do, or would, approach the weighting of each factor.

For example, you might start out with a ratings approach where every single factor is weighted to 10 points: the horse with the best place % gets 10 points, the horse with the best API gets 10 points, etc., and it scales down from there for each horse in the race. What I'm wondering is how people approach, or what they think is the best way to approach, weighting these factors from a mathematical point of view to get the best out of your factors in a final rating.

The approach I'm taking at the moment is close to the following description, but I'd like to know if others have input and/or ideas:

First I take the strike rate for each factor based on finishing 1st, 2nd, 3rd or 4th, then multiply each strike rate by 4, 3, 2 and 1 respectively. For example, the top Sky rater has the following positional strike rates: 1st - 22.0%, 2nd - 19.2%, 3rd - 16.4%, 4th - 8.9%, giving (0.220 * 4) + (0.192 * 3) + (0.164 * 2) + (0.089 * 1) = 1.87. I do this for all the factors, then divide them all by the highest score so they sit on a 0 to 1 range (i.e. 0% to 100%).

Next I look at the POT of each factor. The majority of factors are negative in their own right, so what I do is take the lowest POT and boost all the other POTs by that amount, i.e. if the lowest is -31%, it is re-calculated to 0%, a -25% POT is re-calculated to 6%, and so on. Once these are all re-calculated I do the same as for the strike rate and adjust them onto a 0% to 100% scale.

Finally I like to look at what the profit divided by the highest dividend for that factor comes out to, as it shows the consistency of that factor. Because the majority of the single-factor profits are negative, I 'boost' them in the same way as the POT calc, using the lowest factor's profit. Once this is done I divide the new profit figure by the highest dividend and adjust to the 0% to 100% scale as well.

Once these 3 scales are completed for SR, POT and Profit/MaxDiv, I simply weight these scores for a total figure, such as: (3 x POT outcome) + (2 x P/MaxDiv outcome) + SR outcome. This gives a final score and you simply weight the factors based on it.

I hope the above isn't too hard to understand (let me know if you have questions). This is just the way I approach adjusting the factors to get a final rating that is more significant than rating all factors the same. My approach doesn't really have any mathematical basis beyond what I believe is more important, so hopefully someone can offer a more structured approach that's more mathematically sound. If not, I'm happy to continue with this method, but I wondered what ideas other people had. Cheers
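For anyone who wants to tinker, here's the above sketched in rough Python - the factor names and all the numbers are made up purely for illustration:

Code:
# Per-factor stats: positional strike rates (1st-4th), POT %, total profit
# and the highest dividend returned by that factor (all invented numbers).
factors = {
    "sky_rating": {"sr": [0.220, 0.192, 0.164, 0.089], "pot": -12.0, "profit": -45.0, "max_div": 21.5},
    "api":        {"sr": [0.180, 0.170, 0.150, 0.100], "pot": -25.0, "profit": -80.0, "max_div": 15.0},
    "place_pct":  {"sr": [0.200, 0.180, 0.140, 0.095], "pot": -31.0, "profit": -95.0, "max_div": 30.0},
}
names = list(factors)

# Step 1: weighted strike rate (1st*4 + 2nd*3 + 3rd*2 + 4th*1), scaled by the best
sr = [sum(r * w for r, w in zip(factors[n]["sr"], (4, 3, 2, 1))) for n in names]
sr_scaled = [s / max(sr) for s in sr]

# Step 2: boost every POT by the lowest POT (so the worst becomes 0),
# then scale by the new maximum
pots = [factors[n]["pot"] - min(f["pot"] for f in factors.values()) for n in names]
pot_scaled = [p / max(pots) for p in pots]

# Step 3: boost profits the same way, divide by each factor's highest
# dividend, then scale by the new maximum
boosted = [factors[n]["profit"] - min(f["profit"] for f in factors.values()) for n in names]
pmd = [b / factors[n]["max_div"] for b, n in zip(boosted, names)]
pmd_scaled = [v / max(pmd) for v in pmd]

# Final factor weight: (3 x POT) + (2 x P/MaxDiv) + SR
for i, n in enumerate(names):
    print(n, round(3 * pot_scaled[i] + 2 * pmd_scaled[i] + sr_scaled[i], 3))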
There are three major parts to each race: the venue, the runners, and the money. Most confuse them. Each should be treated individually first and then combined.
Care to elaborate, beton? You don't have to go into specifics with regards to your own approach if you don't wish to; I'm just trying to wrap my head around your statement, as I'm not entirely sure what you mean by how those 3 major parts apply to factor weightings for ratings.
I.e. do you mean that you might weight factors differently for a race at Caulfield compared to a race at Ascot? Or maybe weight differently for a Class 3 race compared to a Group 2 race? Would also be interested in any input mattio may have (if he's still around) given his personally created ratings. Appreciate the reply beton :)
A 1600m race at Ascot is different to a 1600m race at Randwick. Each track and distance has its own characteristics. The barrier may be the key factor, the leader at the turn may be the key factor, or the jockey in this particular race may be the key factor. What happens on a good surface may not occur on a heavy surface. So you need to get the factors for that distance, at that track, on that surface, for that field size.
Then you need to compare the runners, including the little bloke on the back. Just because it is classed as an A-lister party does not mean it is an A-lister party if only the contestants of Australian Idol turn up. After you have assessed the class of the race against the class of the field and rated them, you must then see how they will suit the race (the track, the distance, the barrier, the field size at that track, etc.). Only then can you look at the money, and that has 2 parts: your estimate and what the market says.
You could use either Multiple (Linear) Regression or (Binomial) Logistic Regression, depending on what sort of output you wanted.
You would use Multiple Regression if you wanted the combination of all your independent variables (Sky Rating, track odds, barrier number, career runs, etc.) - multiplied by individual coefficients - to sum to some total (a dependent variable). For example, looking at your historical data, you could decide you want the dependent variable to be 100 minus 4 * the lengths behind in this race. If a horse won by 2 lengths you'd be determining what combination of coefficients, multiplying your variables and summed together, equates to 108: (100 - 4*(-2)). If a horse lost by 15 lengths you'd be determining how your variables equate to 40. You could then look at upcoming races and find horses whose past results indicate a score greater than 100 - indicating that conditions are good for a win - or are a certain margin greater than those for the other horses in the race.

You would use Binomial Logistic Regression if you wanted the combination of your independent variables - multiplied by individual coefficients, etc. - to determine the probability of an either/or event occurring; in this case, the probability of a horse winning (or not winning).

By applying either method to a large enough dataset you would determine the statistical significance of individual variables contributing to the final result. You may find some variables just don't matter.

The problem with both methods is that they clearly work best with genuinely independent variables (for example, betting odds depend somewhat on barrier draw and jockey, much like Sky Ratings do - they are not independent). While you can still use the methods, they'll be more useful as a rule of thumb than a firm guide. Beton's approach of looking at specific field sizes on specific tracks over specific distances under specific track conditions from specific barriers would likely lead to more accurate models, but you may find yourself in turn limited by the size of your resulting dataset.

There are a bunch more reasons why any model won't be accurate (perhaps you're missing important variables - like blood counts, or training performance; or outliers are skewing your data; or the relationship between the variables isn't linear at all, maybe it's logarithmic/polynomial/exponential; etc.). Still, I certainly feel it's worthwhile applying some mathematical rigour to your processes. You can do it in Excel and there are a heap of how-tos available.
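Outside Excel, a minimal Python sketch of both methods using scikit-learn might look like the following. The CSV file and column names are placeholders for whatever your own data holds (and if you want significance tests/p-values per variable, the statsmodels package reports those, where scikit-learn doesn't):

Code:
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression

df = pd.read_csv("past_races.csv")          # one row per runner per race
features = ["sky_rating", "track_odds", "barrier", "career_runs"]
X = df[features]

# Multiple (linear) regression: dependent variable is 100 - 4 * lengths behind,
# so a 2-length win scores 108 and a 15-length loss scores 40, as above
linear = LinearRegression().fit(X, 100 - 4 * df["lengths_behind"])
print(dict(zip(features, linear.coef_)))    # the fitted coefficients

# Binomial logistic regression: dependent variable is a 1/0 "won" flag;
# predict_proba returns the modelled probability of each runner winning
logistic = LogisticRegression(max_iter=1000).fit(X, df["won"])
print(logistic.predict_proba(X)[:5, 1])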
Depending how big your database is, you could use the approach I used to determine the importance of variables.
I admit it's a rough and ready method, but once you have the results it's easy to refine. The method is simple but takes time: determine your handicapping factors/ideas (I came up with about 30), order them in descending or ascending order according to whether you set the minimum or maximum point as best, then run them through your database one by one and print out the final results - win% and place% - for as many selections as you deem important. I used the top four only. The nuts and bolts depend on what language you use; I use C# in .NET. The outcome is quite interesting: some factors almost win outright but for the TAB takeout, give or take a few %, while some are mere punters' myth. It takes time but it's worth it - you could do worse.
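In rough Python terms the loop looks something like this - the column names are made up, and it assumes one row per runner with a numeric value for each factor plus won/placed result flags:

Code:
import pandas as pd

df = pd.read_csv("runners.csv")
# factor name -> True if a lower value is better, False if higher is better
factors = {"career_win_pct": False, "days_since_run": True}

for factor, low_is_best in factors.items():
    # rank the runners within each race on this factor alone
    ranks = df.groupby("race_id")[factor].rank(ascending=low_is_best, method="first")
    top4 = df[ranks <= 4]  # keep only the top four ranked selections
    print(f"{factor}: win% {top4['won'].mean():.1%}  place% {top4['placed'].mean():.1%}")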
walkermac, firstly, appreciate the reply. Along with beton's post, it's the kind of detail and differing view that I'm after.
At this point in time the size of the dataset I'm using wouldn't be sufficient to draw any significant conclusions about how to weight specific factors based on distance, track, conditions, etc. However, that's not to say I'm discarding this train of thought, as it's something I'll utilize in the future; it's just not something I can implement now.

It sounds like Binomial Logistic Regression is the path I need to investigate. As you say, it would work best with truly independent variables, but I think it would be useful as a guide, or even for comparison's sake against my current approach. The issue with horse racing is you can get into debates about what is truly independent - i.e. is number of wins a good independent stat to utilize, or is that dependent on number of career starts, which then means strike rate % is a better factor, and so on and so forth.

I'll do some googling and mess around with the numbers tonight and see how I go. Having a more solid mathematical approach and background to the factor weightings, rather than my current gut feel / POT & SR approach, is certainly what I'm keen on.
How are you going with this approach, evajb001?
I found what looks to be a useful tool online with all the bells and whistles, along with some explanation: http://www.jeffreysmorrison.net/default.aspx
Hi walkermac,
Given the way I record the data for each race, unfortunately this path wasn't one I could go down, which meant I stuck with my original approach. I've taken a look at that website you posted - unbelievable how much effort the creator of that website would have needed to put in to code that up for anyone to publicly utilize. Once again, with how I record my data it isn't of much use to me, but nonetheless I had a look through and was impressed.

On a side note, I've continued following your NRL and AFL threads. I haven't had a chance to play around with the Massey ratings you linked me to, but it's on my eventual to-do list. The ratings approach I currently use has had a much improved fortnight compared to earlier in the season.
Logistic regression. Now you're talking.
From the moment that I learned that all gambling is about applied maths, horse racing included, I have never looked back. The best tool of all, year in and year out, was the SP market, now superseded by the Betfair market.
Using some form of regression analysis may be the way to determine the importance of various factors, but what a task without a substantial database. Surely this has already been done for the common factors and the results should be somewhere for us to find.
I just use the simplistic approach of finding the average strike rate of each of the top 5 ranks of each factor and comparing the averages to decide which factors to use. With just 2 factors, for the past 25 weeks I have managed a POT that fluctuates between 10 and 12%. Not fantastic, but it has allowed for an enjoyable time punting.

To combine the factors I use a basic probability formula to obtain a rating:

Rating (prob of at least one factor getting a winner) = 1 - (1 - prob of factor 1)(1 - prob of factor 2)(1 - prob of factor 3) ... etc.

and multiply the result by 100. Hope this provides something to think about for those without a large database. Gunny72
Interesting gunny, can you provide an example?
I.e. say you have 4 factors you want to use - you can just call them factors 1, 2, 3, 4 and use fake numbers for all I care; I'd just like to see a worked example if that's cool? Appreciate the reply. Cheers
Factor 1: 25% = 0.25
Factor 2: 15% = 0.15
Factor 3: 12% = 0.12
Factor 4: 8% = 0.08
etc.

Rating = 1 - (1 - 0.25)(1 - 0.15)(1 - 0.12)(1 - 0.08) ... etc.
       = 1 - 0.75 * 0.85 * 0.88 * 0.92
       = 1 - 0.51612 for these four example factors
       = 0.48388, or 48% (to the nearest whole number)

This means there is a 48% chance of at least one of the factors producing a winner. I like to keep things simple. I only use two factors at present and my results agree with the theory. Of course, getting a good price is another matter. gunny72
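The same calculation as a few lines of Python (worth noting the formula treats the factors as independent of one another, which in practice they may not be):

Code:
def rating(probs):
    """Chance that at least one factor produces the winner."""
    miss_all = 1.0
    for p in probs:
        miss_all *= (1 - p)   # probability that every factor misses
    return 1 - miss_all

print(rating([0.25, 0.15, 0.12, 0.08]))  # 0.48388, i.e. 48%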
It's good to see your grey cells are still working, Gunny. As an early member of Ausrace and a contributor to PPM magazine you have earned my respect, and probably contributed to me being a full-time gambler now. Thanks to BF, a comfortable living is relatively easy to obtain. You wrote one of the earliest articles that I ever read on exchange betting and laying horses. It is the way to go.
Thanks for the kind words, Speedy. I actually do this to keep the grey matter going - I will be 70 yo next year. I've also had a couple of good weeks since I last reported my results, and my POT for the last 23 weeks is now 16%! No doubt the run of outs will come. I bet on all BR Saturday races except 2yos. I'm glad you are doing well at the punt - better than I am.
I read an American study that showed that "handicappers" are less likely to suffer from dementia or old timers disease. If only I could remember where I read it.
Speedy, you may be confusing me with Statsman.
Not if your name is Roman.
Does anyone use, or has anyone used, the sectional time and average speed data supplied as part of the form by one of the main Aussie horse racing websites? Their web address starts with P and rhymes with 'hunters'.
I hadn't taken much notice of it before, but I saw it on the weekend and it looks like it has the potential to be useful - particularly for speed mapping and/or determining if a horse has the ability to maintain or increase speed even after settling forward in the race. I understand many use sectional data and that it's difficult to obtain and calculate yourself unless you pay for the data; however, the figures shown on the 'hunters' website seem useful on the surface. So I'm interested: does anyone use it as part of their process, and/or does anyone scrape this info into Excel via web queries?
Roman eh.
You write great articles, so thank you. Cheers
I don't think you can use an Excel Web Query, given how the page is constructed. I discovered PhantomJS recently (http://phantomjs.org), a headless browser which lets you get at the "guts" of a page that a server creates with JavaScript (where the content would otherwise remain hidden if you viewed the HTML source directly). You can call it from Excel using VBA:

Code:
In your test.js file you'd have a script that simply opened the URL you wanted, then output it to a text file. Once it was there you could open that text file in VBA and parse it however you like. The average speed data comes out looking like:

Code:
To get distance-specific sectional speeds is a bit more painful. You can either just cherry-pick from each runner's last 10 races, which are loaded by default, or you'll have to read up a bit more on PhantomJS and learn how to do some Page Automation (selecting the correct item from a dropdown, clicking a button and then saving the output for parsing).

Mid and Late speeds I imagine would depend quite a bit on the particular race circumstance and track layout - to the point where I would question their usefulness (you can select a horse's races at the same distance at the same track, but most races would undoubtedly not have enough info to draw sound conclusions from). Maybe, if you built up enough data, you could find blackbook runners by noting unusually fast late speeds given fast early speeds.

I can see how the Early Speed indicator would be handy in predicting the pace of a race. And by recording the data over a number of years you could determine track/distance combos where Early Speed was particularly advantageous (likely ones with tight/early bends and/or short straights).

I know you raised this on the Communal Ratings thread and called for interest, but I don't really know much about how to apply it. I've used settling positions to predict leaders and pace in the past, and it was accurate often enough to make the exercise feel worthwhile - but probably was not at all helpful just as often.
Thanks for the reply walkermac, that looks reasonably easy to put in place if I follow your instructions. I'd probably stick with the averages, as not every horse would've run at the venue, or at the distance, or a combination of both, in the last 10 runs.
I just get the feeling this data could be pretty useful in helping to determine the better prospects of the race based on where you think they may settle. I.e. if a horse has sub-par late speed compared to other runners and you expect it to settle on the rail midfield or at the back of the field, then it's likely it won't win. Compare that to a horse with good late speed jumping from a decent barrier in a field with mediocre early speed, and you've got a combination that could land you the winner. The examples above are the extremes and are more situational, but even in a typical race I think the speed data could help, combined with settling position predictions.

In terms of the communal ratings proposal: in a previous thread CP showed that certain tracks at certain distances favor leaders much more than others. That way, if you have a relatively accurate pre-race speed map and an understanding of the pace for the race, you can bet on those leader types at particular venues/distances. People stated they were doing this in-running, because at that stage your speed map is 100% correct - you can see who is leading. The issue arises in doing it pre-race and getting an accurate speed map together. I feel like it's possible, and I worked at it for a while, but couldn't find the success I wanted. Still a project I plan to try again in the future, but I need to take a fresh look at it I think.
Just a quick question walkermac,
It's obvious to see now that Excel VBA's days are numbered in the area of (dynamic) web queries, unless MS does what the phantoms and the pythons do, and I doubt it. So do you see it happening, where there may be many "band-aid" type scripts, through to full-on scripting, to compensate for the ever-changing web page structures - things Excel VBA was once able to handle with a simple Record Macro and is now becoming more and more unable to? I ask this because over the past few weeks on the other thread, about the bet sender project, it became obvious this is what we (punters/non-programmers but savvy types) may have to accept: an adaptation to change, or a complement to, VBA.
First, I don't want to paint myself as an expert, as the recent thread you referred to was what spurred my learning and led me to these tools. I agree that Excel isn't likely to implement this as they have Web Queries. As close as it's probably going to get is a third-party module like Selenium (http://florentbr.github.io/SeleniumBasic). According to their blurb, you can record macros from user actions in a browser (like you can record macros in Excel presently).

Code:
It supports PhantomJS, but I'm not sure how you'd interact with a headless browser. ...Maybe you record the macro using a regular browser and then just change the VBA code to use PhantomJS instead of Firefox. That is, in the auto-generated code, just change:

Code:
to:

Code:
Though you can't see the actual data in the HTML code (when you View Source in a regular browser), they still have a named placeholder for it. *shrug* I'm not sure... There's some documentation on Selenium here: https://code.google.com/p/selenium-vba/.
Thanks WM
Yep, learn as tools. Thanks
Hi evajb001

You need to adopt caution when using these types of figures, as they have been produced using the average of a horse's time through a particular section. If, for example, a horse has had 10 runs and they have gone hard early in, say, half of those, then those races (making up 50% of the sample) are going to show it has run slow final sectionals. On the flip side, if 5 of the runs in the sample had slow early sectionals, then it skews that part of the overall sample by showing that it has fast closing sectionals. The problem, and therefore the reality, is that this will most likely lead you down the wrong path if you rely on them.

Fortunately or unfortunately, depending on your view, good quality sectional data requires a significant investment. Personally we invest a substantial 5-figure sum purchasing sectional data from what is essentially a closed shop, and 6 figures when it comes to obtaining all the data that we require. Obviously, though, we feel that the benefits we get from doing so justify the expense.
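To put some toy numbers on that (entirely made up), here is how a 10-run average can end up matching neither of the two tempo patterns it was built from:

Code:
# five runs at a hard early tempo leave nothing late; five at a soft early
# tempo allow a fast finish - the raw 10-run average hides both patterns
runs = [("hard_early", 13.5)] * 5 + [("soft_early", 11.5)] * 5  # late sectional, s/200m

print("raw average:", sum(t for _, t in runs) / len(runs))      # 12.5 - matches neither
for tempo in ("hard_early", "soft_early"):
    times = [t for e, t in runs if e == tempo]
    print(tempo, "average:", sum(times) / len(times))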
"EXACTLY" Taking years to build my own the above is 100 % correct, While not quite in the same category as R2W i found out very quickly. PUT IN NOTHING. TAKE OUT NOTHING.. Whether it be buying data or setting up data, Somewhere along the line you will have to " PAY". Unfortunately this type of data is one of the most expensive. Good Luck with it all. Cheers. |
Quote:
"attempting to determine whether horses take up running positions according to their raw early speed ability or more in keeping with their historical running style." "the table below, displays a predicted settle position using only historical settle position data as the basis for the prediction, versus actual settle position. Those predicted to lead in the race did so on 29.3% of occasions and were in the first 3 in running 62.6% of the time." "the table below shows a predicted settle position using only raw early sectional time speed data (ETRPrR) as the basis for the prediction, versus actual settle position. Those predicted to lead in the race now did so on 25.1% of occasions and were in the first 3 in running 55.1% of the time. This was from the same sample of approx. 1,800 races." "What we can do is look at how the two measures interact with one another. It makes sense that a runner with top ranking on historical run-style and the top ranking on early speed ability should lead races at a higher rate than either factor considered independently. This in fact is the case, with this type of runner leading on 43% of occasions (up from 29% based on one measure only). Likewise, a runner with top ranking on historical run-style and a ranking on early speed ability outside of the top 6, ends up leading races at a rate of 13%, much lower than the single measure, 29% benchmark." |
Interesting post walkermac. When mechanically calculating the predicted settling positions for a race myself, I use the historical settling position approach. My calcs are based on the settling and turn positions from the horse's last 4 starts, and also take barrier into account, weighting each of them differently. The weightings are based on each input's correlation to the settling position in today's race.
I haven't got the stats in front of me for the accuracy of my settling predictions, but from memory my predicted leader actually led approx. 30% of the time, and was in the top 3 65-70% of the time. Greater accuracy could probably be squeezed out of this if I re-worked it and added/tested some other features, but it serves its main purpose for now. In terms of predicting a leader on a favorable leaders' track with not much pace in the race, Tawteen was the perfect example on Saturday at the Valley for easy money.
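A rough sketch of the idea in Python - not my exact formula, and in practice the three weights come from the correlations mentioned above rather than being fixed like this (the horses and numbers are invented):

Code:
def settle_score(last4_settle, last4_turn, barrier,
                 w_settle=0.5, w_turn=0.3, w_barrier=0.2):
    """Lower score = predicted to settle closer to the lead."""
    avg_settle = sum(last4_settle) / len(last4_settle)
    avg_turn = sum(last4_turn) / len(last4_turn)
    return w_settle * avg_settle + w_turn * avg_turn + w_barrier * barrier

# hypothetical field: (name, last 4 settle positions, last 4 turn positions, barrier)
field = [
    ("Horse A", [1, 2, 1, 3], [1, 1, 2, 2], 4),
    ("Horse B", [6, 8, 5, 7], [5, 6, 4, 6], 1),
]
for name, settle, turn, barrier in sorted(field, key=lambda r: settle_score(*r[1:])):
    print(name, round(settle_score(settle, turn, barrier), 2))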
Where do you get your results from that include settle positions?
I've wanted to include this element in my ratings for ages but have never really come up with a solution I was happy with. I derive a "preferred" position from scraping a form guide, but the race results I scrape don't include settle info. Without that, to my mind, it wasn't worth pursuing seriously.
Originally I did my comparison to the results shown on www.racingzone.com.au
Their website went down for a while so I could no longer record their post-race settling positions; however, I just had a look and it seems their website is back up. It's a pretty easy website to scrape the results from as well. I have no idea regarding the accuracy of their settling results, etc., but there you go anyway.
Here is a question for some people to ponder.
I've been testing my new ratings and some staking ideas and was interested in the views of others. If you had the choice of the following, which would you go with?

Option 1: +79.10 units, 48.00% strike rate, 79.10% POT, 11.07 P/MR (Profit / Maximum Return)

Option 2: +68.98 units, 48.00% strike rate, 68.98% POT, 22.59 P/MR

Basically what I'm asking is: would you forgo some profit for consistency of returns? The above options use the same selections; however, one is staked at 1 unit per bet, while the other is staked to use the same number of units overall but adjusted for odds (i.e. more on lower odds, less on higher odds). Thoughts?
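For anyone wanting to experiment, here is a sketch of the two schemes on some made-up bets. The odds-adjusted version shown is just one common way to do it - stakes proportional to 1/odds (targeting a similar collect on every winner), normalised so both schemes outlay the same total units:

Code:
bets = [(2.50, True), (4.00, False), (1.90, True), (6.00, False)]  # (decimal odds, won?)

total_units = len(bets)                      # same overall outlay as 1 unit per bet
inv_sum = sum(1 / odds for odds, _ in bets)

level = sum((odds - 1) if won else -1.0 for odds, won in bets)
adjusted = sum(
    (total_units / (odds * inv_sum)) * ((odds - 1) if won else -1.0)
    for odds, won in bets
)
print(f"level stakes: {level:+.2f}u   odds-adjusted: {adjusted:+.2f}u")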
Option 2 IMHO
I'd go for consistency, so Option 2.