5th December 2005, 12:11 PM
Member
Join Date: Jan 1970
Location: Mt Tamborine
Posts: 574
Quote:
Originally Posted by Zoe
Next, with relying on data that includes what, 30 to 40% max. known facts [like your example] and up to 60% or even more, unknown facts collected from thousands of races, you are really up the creek because the more the unknown facts collected the greater the chances of the conclusions being incorrect for future prediction.
Zoe
Hi Zoe,
I'm not 100% sure where the 30-40% known / 60% unknown figures come from, but I assume you are referring to 60% of the factors determining the final result being attributable to "noise": luck or random events (being blocked for a run, for example). BTW, I'd be interested to know how you arrived at that figure.
If that's the case, I would assume (with no mathematical basis for this, just common sense, experience and a garage-sale book on elementary statistics :-) that a larger database would be a distinct advantage. The reason you get a half-presentable curve for something as tenuous (in my opinion) as barrier advantage over 100,000 races is that these random events tend to cancel each other out. In a small database they don't get the chance to cancel.
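The cancelling-out effect is easy to demonstrate with a quick simulation. This is just a sketch: I've assumed a hypothetical barrier with a true 55% win rate (against a 50% baseline), and the numbers of races and trials are arbitrary. The point is only that the estimate jumps around wildly on small samples and settles down on big ones:

```python
import random

def estimate_win_rate(true_rate, n_races, rng):
    # Each race is one random draw: the horse wins with probability
    # true_rate; everything else (luck, blocked runs) is the noise.
    wins = sum(1 for _ in range(n_races) if rng.random() < true_rate)
    return wins / n_races

def spread(estimates):
    # How far apart the best and worst estimates landed.
    return max(estimates) - min(estimates)

rng = random.Random(42)
true_rate = 0.55  # hypothetical barrier advantage (assumed, not from data)

# 50 repeated estimates from a small database vs a large one.
small = [estimate_win_rate(true_rate, 200, rng) for _ in range(50)]
large = [estimate_win_rate(true_rate, 20_000, rng) for _ in range(50)]

print("spread over 200-race samples:   ", round(spread(small), 3))
print("spread over 20,000-race samples:", round(spread(large), 3))
```

On the small samples the estimated win rate can land anywhere over a wide band, so a barrier could look brilliant or hopeless purely by luck; on the large samples the random events have cancelled and every estimate sits close to the true 55%.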
KV