Probability percentage from a rating - Page 2 - Horse Racing Forum - PaceAdvantage.Com

raybo · 05-23-2016, 05:48 PM

Quote:

Originally Posted by lansdale

Hi Raybo,

I'm a little confused by what you've said in this thread about whether this is a list of variable weights or a list of horses whose output is projected according to such weights. I'm guessing the latter.

If so, what the range of this data seems to resemble to me, since you've mentioned that your method is in the black, is the $net of a given field based on a few simple factors, which might explain the clustering. Also, since you've mentioned 'top 3' ranking as part of your method, you may be penalizing horses that fall outside this grouping, which would be consistent with this result. Since your description of your method implies that this is what you have sought to maximize, it would seem to make sense.

If this isn't what you've done, you already know that. But if it is, I would suggest simply moving the decimal point two places to the left, testing this against a database (you mentioned you're a client of J. Platt), and seeing how it stands up against a reasonably large sample. BTW, the mean of even this small sample is 79, which would mean a return of .79 across all horses. That is quite close to what I believe is the mean return of all horses bet by the public, so this may be quite accurate.

Cheers,

lansdale

The example ratings are final ratings, after the weights have been applied. Many of the factors involved come directly from the raw data: jockey win and ITM percentages, trainer win and ITM percentages, horse win and ITM percentages, horse age and weight, horse power rating, horse pace and speed figures, etc. A few other ratings don't come directly from the raw data. There are 4 categories/sets of weightings, which is pretty standard for most weighted-factor methods. But there are also user preferences for several factors, and paceline selection preferences for the paceline-related factors.

This method is not currently part of my Black Box. It will initially be a separate method, testable in batch-processing mode against any number of past races, which should help determine which factors are more important and which are not, and what the weightings for the final factor sets should be.

I'm looking ahead with this odds line thing, because we all know that, regardless of the accuracy of one's method, profit comes from win probability versus average price, otherwise known as "value". Sure, I could just produce the weightings method and let the user have at it, but the method will have more value to the user if there is a logical value metric included. So, I'm jumping the gun a bit via this thread, mostly because I know this portion is going to take the most time, and I want to get started now, rather than wait until the rest of the method is complete.

To answer your question, the horses below the top 3 are not penalized at all in this method. All horses will receive equal treatment and live or die according to their data, and according to the factors and weightings each user decides to implement for each of the 4 race-type categories of factor/preference/weight settings.

Dave Schwartz · 05-23-2016, 06:14 PM

Raybo,

If you need to "un-flatten" the ratings, just raise them to a power.

This will exaggerate the differences between the horses.
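A minimal sketch of what "raise them to a power" can look like when applied to raybo's example ratings: raise each rating to an exponent k, then divide by the sum so the shares add up to 100%. With k = 1 this is exactly the "divide each rating by the sum of all the ratings" idea from the original question. The function name `power_line` and the exponents tried are illustrative assumptions; a real k would have to be found by testing against past races.

```python
# Final ratings from raybo's example race (program number -> rating).
RATINGS = {1: 72.65, 2: 101.06, 3: 104.32, 4: 87.72, 5: 89.90, 6: 10.20,
           7: 75.89, 8: 79.96, 9: 105.25, 10: 77.95, 11: 73.84, 12: 69.82}


def power_line(ratings, k):
    """Return {horse: share} where share = rating**k / sum(rating**k).

    k = 1 is plain "divide each rating by the sum of all the ratings";
    larger k widens the gaps between horses. Assumes all ratings are positive.
    """
    powered = {horse: r ** k for horse, r in ratings.items()}
    total = sum(powered.values())
    return {horse: p / total for horse, p in powered.items()}


# Exponents here are arbitrary examples, not tested values.
for k in (1, 5, 10):
    line = power_line(RATINGS, k)
    print(f"k={k:>2}: #9 {line[9]:.1%}  #3 {line[3]:.1%}  #6 {line[6]:.4%}")
```

With k = 1 the top-rated #9 gets only about 11% (105.25 / 948.56), which is why the straight division looks so flat; raising k shifts share toward the top of the field.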

raybo · 05-23-2016, 08:53 PM

Thanks Dave, I'll try that. But I kind of like the idea of looking at expected win rates based on the rating ranks, like the top 3, top 4, etc. Of course, I'll also have to take virtual ties into account, which will add a bit more complexity, or call for a slightly different mindset.

lansdale · 05-23-2016, 09:27 PM

Clearly you're not reverse-engineering the 'black box' model you've described here in the way I thought, so my suggestion wouldn't work. And after reading this post, I have to admit I am more confused than before about exactly what you are doing. If this is a product you're developing for clients, 'user preferences' would only seem to muddy the waters. But whatever course you decide to take, I hope it works out.

Cheers,

lansdale

davew · 05-23-2016, 10:40 PM

Quote:

Originally Posted by raybo

Obviously, at least to me, if the ratings are fairly accurate, then the top 3 horses should win about 60% of the time, over time. Your line has them at 58%; that's very close. Based on that, how did you differentiate between those 3 horses to get their portion of the 58% total probability? It appears that the relationships are not linear, but I haven't a clue how to come up with that non-linear scale/slope.

This set of numbers seems promising, regarding the separation/relationship between the 3 ratings. I found the average and then subtracted that average from each of the 3 ratings.

#2 -- -2.483333333
#3 -- 0.776666667
#9 -- 1.706666667

Based on those numbers, this is what the probability percentages would be:

#2 -- 16.85
#3 -- 20.11
#9 -- 21.04

I wish I could say I had a formula you could put into a spreadsheet, but I do not. I did it by hand estimation. There is a chapter in an old classic, The Odds on Your Side by Mark Cramer, that explains how to fine-tune your own probability line (not to be confused with a morning line, which usually totals 115-120% to account for the track take and an estimate of closing odds) so that it adds up to 100%.
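The three percentages raybo quotes above (16.85, 20.11 and 21.04) appear consistent with a simple rule: give each of the 3 horses an equal third of the 58%, then shift it by that horse's distance from the group's average rating, counting one rating point as one percentage point. The sketch below reproduces them under that assumption. `split_by_deviation` is a hypothetical helper, not something posted in the thread, and the rule can go negative when the spread between ratings is wide.

```python
def split_by_deviation(ratings, group_total):
    """Split a group's total win percentage among its horses.

    Each horse gets an equal share of group_total, shifted by how far its
    rating sits above or below the group's average rating. Assumption: one
    rating point is worth one percentage point.
    """
    mean = sum(ratings.values()) / len(ratings)
    equal_share = group_total / len(ratings)
    return {horse: equal_share + (r - mean) for horse, r in ratings.items()}


top3 = {2: 101.06, 3: 104.32, 9: 105.25}
for horse, pct in split_by_deviation(top3, 58.0).items():
    print(f"#{horse} -- {pct:.2f}")
```

Because the deviations from the mean always sum to zero, the split keeps the group total at exactly 58%.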

I have a problem with the numbers below the average: what do you do with the 10 in your example, which is almost -3 standard deviations?
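To put a number on the point about the 10, here is a short sketch that computes each horse's standard score (z-score) within raybo's example field. Using the population standard deviation (`statistics.pstdev`) is an assumption; the sample version (`statistics.stdev`) gives slightly smaller magnitudes.

```python
from statistics import mean, pstdev

# Final ratings from raybo's example race (program number -> rating).
RATINGS = {1: 72.65, 2: 101.06, 3: 104.32, 4: 87.72, 5: 89.90, 6: 10.20,
           7: 75.89, 8: 79.96, 9: 105.25, 10: 77.95, 11: 73.84, 12: 69.82}


def z_scores(ratings):
    """Standard score of each rating against the whole field (population SD)."""
    values = list(ratings.values())
    mu, sigma = mean(values), pstdev(values)
    return {horse: (r - mu) / sigma for horse, r in ratings.items()}


# Lowest to highest standard score.
for horse, z in sorted(z_scores(RATINGS).items(), key=lambda kv: kv[1]):
    print(f"#{horse:<2} {z:+.2f}")
```

#6 comes out at roughly -2.9 under these assumptions. Because #6 is included in the field's own mean and SD, that one outlier also pulls the mean down and inflates the SD for every other horse.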

I have been thinking about this off and on for a while, as I would like to be able to turn the Bris Prime Power ratings into a probability line.

raybo · 05-23-2016, 11:44 PM

Well, we have, or can obtain, the long term probabilities (hit rates) for individual rankings (by field size probably), and also probabilities (hit rates) for groups of congruent rankings, like top ranked, top 2, top 3, top 4, etc., so I believe there must be a way to use those long term group probabilities to get a better handle on what type of separation, between assigned odds, there should be, in different categories of race types and fields.
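One way to start on this is to tabulate, from past results, how often the winner came from each rating rank, plus the cumulative top-2, top-3, etc. rates, split by field size. A minimal sketch, assuming each past race has already been reduced to a (field_size, winner_rank) pair with virtual ties broken beforehand; `hit_rates_by_rank` is a hypothetical name and the sample results are made up.

```python
from collections import Counter, defaultdict


def hit_rates_by_rank(results):
    """Win rate by rating rank, plus cumulative top-k rates, per field size.

    results: iterable of (field_size, winner_rank) pairs, rank 1 = top-rated.
    """
    counts_by_field = defaultdict(Counter)
    for field_size, winner_rank in results:
        counts_by_field[field_size][winner_rank] += 1

    report = {}
    for field_size, counts in sorted(counts_by_field.items()):
        races = sum(counts.values())
        by_rank = {rank: counts[rank] / races for rank in range(1, field_size + 1)}
        top_k, running = {}, 0.0
        for rank in range(1, field_size + 1):
            running += by_rank[rank]
            top_k[rank] = running  # share of races won by one of the top `rank` horses
        report[field_size] = {"races": races, "by_rank": by_rank, "top_k": top_k}
    return report


# Made-up results for illustration only.
sample = [(8, 1), (8, 3), (8, 2), (8, 1), (8, 6), (12, 1), (12, 4), (12, 2)]
for size, stats in hit_rates_by_rank(sample).items():
    print(size, stats["races"], {k: round(v, 2) for k, v in stats["top_k"].items() if k <= 4})
```

With a large enough sample, the top-k rates give an empirical target (for example, what share the top 3 should get in 8-horse fields) that any odds line could be checked against.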

Don't have any proof of the above, but I think it's worth exploring anyway.

If it doesn't pan out, there is always the ability to test different sets of factors and their scalings, as well as the odds produced by the probabilities. So, what I envision is not just a method that the user can use to select contenders, but more importantly, the ability to test and research the whole ball of wax, without having to pay large amounts of money for a traditional programming language app or traditional database app. It'll take longer in Excel, but it'll be so much simpler for the user, because of the extremely flat learning curve of an automated Excel application.

classhandicapper · 05-24-2016, 09:23 AM

I thought this was a very good article when I read it.

https://betting.betfair.com/horse-ra...22-040810.html

cj · 05-24-2016, 09:47 AM

Quote:

Originally Posted by raybo

Ok you stats gurus, if I have a rating derived from multiple factors, each weighted according to significance, how do I get from that final rating to a projected win probability/percentage?

Let's start with the following final ratings for a race:

#1 -- 72.65
#2 -- 101.06
#3 -- 104.32
#4 -- 87.72
#5 -- 89.90
#6 -- 10.20
#7 -- 75.89
#8 -- 79.96
#9 - 105.25
#10 - 77.95
#11 - 73.84
#12 - 69.82

How do I get from those ratings to a calculated/projected win probability/percentage? I have always thought that you just divide each rating by the sum of all the ratings, but can't seem to find anything related to this type of calculation on the web, and I haven't tried to create a line in a long time, so I'm a bit rusty.

Here is a simple method I posted many years ago that I still use often. It helps to have a cutoff, i.e., a spread below which a horse is not considered a contender for the race. Only you will know what cutoff works best, and it's pretty easy to test if you store your data as I imagine you do. For an example, I'll use 16 for your set. (Of course, all this can be done with a program or spreadsheet, but I'm doing it manually here to show the details.)

The top rating is 105.25, so to be included as a contender a horse needs to be within 16 points of it, i.e., rated 89.25 or higher.

That leaves four horses:

#2 -- 101.06
#3 -- 104.32
#5 -- 89.90
#9 -- 105.25

The minimum points I give a contender are 2. That number could vary for you and others just like the cutoff number. So I take the Min Contender rating, 89.90, deduct 2 = 87.90, and subtract that number from each rating.

#2 -- 13.16
#3 -- 16.42
#5 -- 2.00
#9 -- 17.35

I then add these up for a total of 48.93. Next, divide each adjusted rating by the total to get a percentage:

#2 -- .27
#3 -- .34
#5 -- .04
#9 -- .35

Next, I account for the non-contenders. Some people like to assign a blanket number like 20%. I do it a little differently, but that is individual preference. I take the number of contenders, four in this case, and divide it by the field size, which gives 33.333% here. I then average that with 100%. I find that if I pick only four contenders in a 12-horse field, I'm crazy if I think I'll have the winner 80% of the time. So in this case I use 66.6666% as my contender percentage. I then multiply the win percentages above by this estimate:

#2 -- .27 x .66666 = .18
#3 -- .34 x .66666 = .23
#5 -- .04 x .66666 = .03
#9 -- .35 x .66666 = .23

I then convert to an odds line:

#2 -- .18 = 4.55
#3 -- .23 = 3.34
#5 -- .03 = 32.33
#9 -- .23 = 3.34

That is my fair odds line. I rounded here since I did it manually, will come out a little different if you program this, but that is the gist of it.
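For anyone wanting to program it, here is a minimal sketch of the steps above: contender cutoff, minimum points, normalize, scale by the contender percentage, convert to odds with odds-to-1 = (1 - p) / p. The function name `contender_odds_line` is hypothetical, and the defaults (cutoff 16, minimum 2) are just the values from this example, not recommendations.

```python
# Final ratings from raybo's example race (program number -> rating).
RATINGS = {1: 72.65, 2: 101.06, 3: 104.32, 4: 87.72, 5: 89.90, 6: 10.20,
           7: 75.89, 8: 79.96, 9: 105.25, 10: 77.95, 11: 73.84, 12: 69.82}


def contender_odds_line(ratings, cutoff=16.0, min_points=2.0):
    """Return {horse: (win_probability, fair_odds_to_1)} for contenders only."""
    # 1. Contenders are within `cutoff` points of the top rating.
    top = max(ratings.values())
    contenders = {h: r for h, r in ratings.items() if r >= top - cutoff}

    # 2. Shift so the lowest-rated contender ends up with exactly `min_points`.
    base = min(contenders.values()) - min_points
    points = {h: r - base for h, r in contenders.items()}
    total = sum(points.values())

    # 3. Contender percentage: average of (contenders / field size) and 100%.
    contender_pct = (len(contenders) / len(ratings) + 1.0) / 2.0

    line = {}
    for h, p in points.items():
        prob = p / total * contender_pct        # 4. share of points, scaled down
        line[h] = (prob, (1.0 - prob) / prob)   # 5. odds-to-1 = (1 - p) / p
    return line


for horse, (prob, odds) in sorted(contender_odds_line(RATINGS).items()):
    print(f"#{horse} -- {prob:.3f} = {odds:.2f}")
```

Without the manual rounding the odds come out at roughly 4.58, 3.47, 35.70 and 3.23 for #2, #3, #5 and #9, which is the "little different" result mentioned above; the biggest shift is #5, whose small probability is most sensitive to rounding.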

Oh, one last thing: any horse whose "fair odds line" is greater than its natural odds gets tossed from my contender list, but again, that is just from my experience and numbers. Your mileage may vary.

kingfin66 · 05-24-2016, 10:16 AM

I remember when you posted that method years ago and even printed it out. It is really well explained and very nice of you to share it again.

Dave Schwartz · 05-24-2016, 11:09 AM

CJ, that is just excellent.

Thank you!

GameTheory · 05-24-2016, 12:39 PM

Yeah, you can't just normalize any set of ratings directly and expect them to mean anything (beyond their rankings, which remain the same). You need something like CJ's ad-hoc method, tailor-made for horse racing, or for a more generalized method of turning a single rating on an unknown scale to a probability, use logistic regression. (Which should be easy enough in Excel -- aren't you a spreadsheet guru, Raybo?)
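A minimal sketch of the regression route, assuming the within-race form of logistic regression usually called a conditional (multinomial) logit: P(horse i wins) = exp(beta * r_i) / sum over the field of exp(beta * r_j), with a single coefficient beta fitted by maximum likelihood on past races. The function names, the bisection search and its [-5, 5] bracket are illustrative assumptions, and the "history" below is made up; in practice beta would be fitted on many races.

```python
import math


def race_probs(ratings, beta):
    """Conditional-logit win probabilities: exp(beta*r_i) / sum_j exp(beta*r_j)."""
    top = max(beta * r for r in ratings)  # subtract the max so exp() cannot overflow
    weights = [math.exp(beta * r - top) for r in ratings]
    total = sum(weights)
    return [w / total for w in weights]


def score(beta, races):
    """Slope of the log-likelihood in beta: sum of (winner's rating - expected rating)."""
    slope = 0.0
    for ratings, winner in races:
        probs = race_probs(ratings, beta)
        slope += ratings[winner] - sum(p * r for p, r in zip(probs, ratings))
    return slope


def fit_beta(races, lo=-5.0, hi=5.0, iters=100):
    """Maximum-likelihood beta by bisection; the slope never increases as beta grows.

    Assumes the maximum lies inside [lo, hi]. If the top-rated horse won every
    race, the likelihood keeps rising and there is no finite maximum.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid, races) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Made-up history: (ratings in the race, index of the winner).
history = [([100, 90, 80], 0), ([100, 95, 85], 1), ([90, 88, 70], 0), ([105, 100, 95], 2)]
beta = fit_beta(history)
field = [72.65, 101.06, 104.32, 87.72, 89.90, 10.20, 75.89, 79.96, 105.25, 77.95, 73.84, 69.82]
print(round(beta, 4))
print([round(p, 3) for p in race_probs(field, beta)])
```

The fitted beta plays the same role as the exponent in the power approach: it decides how much one rating point is worth, but it is estimated from results instead of picked by hand.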

We had a long discussion about this once, probably in the thread that CJ originally posted his method. (I posted something similar which involved rescaling the range to something that would normalize better, and then normalizing.)

formula_2002 · 05-24-2016, 12:45 PM

Piece of cake.
You need to establish the dollar odds for each incremental rating.
Using Excel, make one column your rating, another the dollar odds it ran at, and another for whether it won. Simply use a "1" for a win.
(Make it the true dollar odds by adjusting for the total booking percentage.)
You will also need a column for accumulated returns.

Now sort on your rating column. If you find profits within a specific range, test with new data.
At this time, don't be concerned with race type, surface, distance, etc.; the public will take care of that for you.
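The same spreadsheet steps can be sketched in code. This assumes "dollar odds" means final odds-to-1, and that "adjusting for the total booking percentage" means dividing each horse's implied probability, 1 / (odds + 1), by the sum of those probabilities for its race so the race adds up to 100%. The row layout, function names and sample rows are all hypothetical.

```python
from collections import defaultdict


def add_true_odds(rows):
    """Add 'true_odds' to each row: the public's odds with the take removed.

    Each horse's implied probability 1/(odds+1) is divided by its race's
    booking percentage (the sum of implied probabilities), so a race's
    true probabilities add up to 100%.
    """
    booking = defaultdict(float)
    for row in rows:
        booking[row["race"]] += 1.0 / (row["odds"] + 1.0)
    for row in rows:
        fair_prob = (1.0 / (row["odds"] + 1.0)) / booking[row["race"]]
        row["true_odds"] = 1.0 / fair_prob - 1.0
    return rows


def accumulated_returns(rows):
    """Sort by rating, high to low, with a running net for $1 win bets at true odds."""
    running = 0.0
    table = []
    for row in sorted(rows, key=lambda r: r["rating"], reverse=True):
        running += row["true_odds"] if row["won"] else -1.0
        table.append((row["rating"], round(running, 2)))
    return table


# Made-up rows: one per horse, odds are final odds-to-1, won is 1 or 0.
rows = add_true_odds([
    {"race": "A", "rating": 105.0, "odds": 1.0, "won": 1},
    {"race": "A", "rating": 98.0, "odds": 2.0, "won": 0},
    {"race": "A", "rating": 90.0, "odds": 3.0, "won": 0},
    {"race": "B", "rating": 101.0, "odds": 0.8, "won": 0},
    {"race": "B", "rating": 95.0, "odds": 1.5, "won": 1},
    {"race": "B", "rating": 80.0, "odds": 5.0, "won": 0},
])
for rating, net in accumulated_returns(rows):
    print(rating, net)
```

A rating range where the running net climbs steadily is the kind of "profit within a specific range" worth retesting on new data.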

I do this kind of stuff all the time, that's why I don't bet!!!

If your ratings are in a dBase format with a key field for track, race number and date, you can download the results file from Bris and have fun with "what if".

Each data file race card was 25 cents some years ago.

P.S. You may also want to normalize the results for each race and perhaps add a column for that. I would.

classhandicapper · 05-24-2016, 12:57 PM

Quote:

Originally Posted by davew

I wish I could say I had a formula you could put into a spreadsheet, but I do not. I did it by hand estimation.

In the link I posted, the author gives you the information you need to create a spreadsheet that I was able to duplicate and expand on. If you read the article and then the comments below it, you should be able to create a useful model. The key to it is selecting the correct power rating. That more or less defines how much a point on your scale is worth.

What I did was tinker with the power rating until it was producing odds lines that mimicked my thinking as a handicapper. Ultimately, I didn't use the spreadsheet much, but not because it wasn't a useful tool. I just found that I rarely go to the window unless I am fairly sure I actually have value. At that point, I really don't need an odds line. It kind of screams at you, because the horse will generally be misranked by the public (in other words, the horse I make the most likely winner is the 3rd choice in the betting, the horse I make 2nd most likely is the 5th choice, etc.). So putting the rating into the spreadsheet was costing me time for very little gain.

GameTheory · 05-24-2016, 12:58 PM

Here's what I posted 12 years (!) ago for coming up with semi-reasonable probability numbers for a single race using any rating. (Assuming you don't know anything about the rating or haven't done a study of it anyway.)

See my post #7:

http://www.paceadvantage.com/forum/s...ad.php?t=11081

Then there is the convoluted "tennis tournament" method from the old book "Beating the Races with a Computer" (1980), which is mainly a curiosity showing how you might do such a study with the computer power of the late 70s. See this thread and my post #17:

http://www.paceadvantage.com/forum/s...ad.php?t=60199

I'm not reposting it all here since the context of those other threads helps. (The bottom of the second thread has links to even more threads!)

formula_2002 · 05-24-2016, 01:01 PM

One more thing: use another column for the odds at, say, 2 to 4 minutes to post. That's what you really need to determine a bet.
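Once a fair line and the odds a few minutes before post are side by side, deciding on a bet reduces to expected value. A minimal sketch, assuming the fair probabilities are right and ignoring breakage and any further drop in the odds after the snapshot: a $1 win bet at odds-to-1 returns fair_prob * (odds + 1) - 1 on average. The function names and the `min_edge` threshold are hypothetical; the late odds are made up, and the probabilities are cj's hand-rounded line from earlier in the thread.

```python
def win_bet_edge(fair_prob, late_odds):
    """Expected net return per $1 win bet at odds-to-1, if fair_prob is correct."""
    return fair_prob * (late_odds + 1.0) - 1.0


def overlays(fair_probs, late_odds, min_edge=0.0):
    """Horses whose late odds beat the fair line by more than min_edge."""
    return {
        horse: round(win_bet_edge(p, late_odds[horse]), 3)
        for horse, p in fair_probs.items()
        if horse in late_odds and win_bet_edge(p, late_odds[horse]) > min_edge
    }


# cj's hand-rounded fair probabilities; the late odds are made up for illustration.
fair = {2: 0.18, 3: 0.23, 5: 0.03, 9: 0.23}
late = {2: 6.0, 3: 2.5, 5: 20.0, 9: 3.5}
print(overlays(fair, late))
```

A positive edge only means the late odds beat the line you trust; since the final odds can still move after the 2 to 4 minute snapshot, some players require a margin above zero through `min_edge`.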