#43, 5th December 2005, 12:11 PM
KennyVictor

Quote:
Originally Posted by Zoe

Next, relying on data that includes at most 30 to 40% known facts [like your example] and 60% or more unknown facts collected from thousands of races, you are really up the creek, because the more unknown facts collected, the greater the chance of the conclusions being incorrect for future prediction.
Zoe

Hi Zoe,
I'm not 100% sure where the 40% known / 60% unknown figures come from, but I assume you are referring to 60% of the factors that determine the final result being attributable to "noise", luck or random events (e.g. being blocked for a run). BTW, I'd be interested to know where you came up with that figure.
If that is the case, I would assume (and I have no mathematical basis for this, just common sense, experience and a garage-sale book on elementary statistics :-) that a larger database would be a distinct advantage. The reason you get a half-presentable curve for something as tenuous (in my opinion) as barrier advantage over 100,000 races is that the random events tend to cancel each other out. In a small database they don't get the chance to cancel out.
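
Just to illustrate what I mean by cancelling out, here's a rough Python sketch. The numbers are made up (a 10% baseline win rate and a 12% win rate for an inside barrier), not real figures from any database, but they show how the estimate only settles near the true figure once the sample gets big:

Code:
import numpy as np

rng = np.random.default_rng(42)
TRUE_WIN_RATE = 0.12   # assumed win rate from the inside barrier (made up)
BASELINE = 0.10        # assumed win rate ignoring barrier (made up)

for n_races in (100, 1_000, 100_000):
    # Each race: 1 if the inside-barrier horse wins, 0 otherwise.
    # Blocked runs, bad luck etc. are all folded into the random draw.
    wins = rng.random(n_races) < TRUE_WIN_RATE
    estimate = wins.mean()
    print(f"{n_races:>7} races: estimated win rate {estimate:.3f} "
          f"(true edge over baseline {TRUE_WIN_RATE - BASELINE:.3f})")

With 100 races the estimate can easily land a few percent either side of the true rate, which swamps a 2% edge; with 100,000 races it sits right on it.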

KV