assigning probabilities to the outcomes of multi-entry competitions


Assigning Probabilities to the Outcomes of Multi-Entry Competitions
Author(s): David A. Harville
Source: Journal of the American Statistical Association, Vol. 68, No. 342 (Jun., 1973), pp. 312-316
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2284068


This content downloaded from 91.229.248.111 on Mon, 16 Jun 2014 19:36:08 PMAll use subject to JSTOR Terms and Conditions

Assigning Probabilities to the Outcomes of Multi-Entry Competitions

DAVID A. HARVILLE*

The problem discussed is one of assessing the probabilities of the various possible orders of finish of a horse race or, more generally, of assigning probabilities to the various possible outcomes of any multi-entry competition. An assumption is introduced that makes it possible to obtain the probability associated with any complete outcome in terms of only the 'win' probabilities. The results were applied to data from 335 thoroughbred horse races, where the win probabilities were taken to be those determined by the public through pari-mutuel betting.

1. INTRODUCTION

A horse player wishes to make a bet on a given horse race at a track having pari-mutuel betting. He has determined each horse's 'probability' of winning. He can bet any one of the entries to win, place (first or second), or show (first, second, or third). His payoff on a successful place or show bet depends on which of the other horses also place or show. Our horse player wishes to make a single bet that maximizes his expected return. He finds that not only does he need to know each horse's probability of winning, but that, for every pair of horses, he must also know the probability that both will place, and, for every three, he must know the probability that all three will show. Our bettor is unhappy. He feels that he has done a good job of determining the horses' probabilities of winning; however, he must now assign probabilities to a much larger number of events. Moreover, he finds that the place and show probabilities are more difficult to assess. Our bettor looks for an escape from his dilemma. He feels that the probability of two given horses both placing or of three given horses all showing should be related to their probabilities of winning. He asks his friend, the statistician, to produce a formula giving the place and show probabilities in terms of the win probabilities.

The problem posed by the bettor is typical of a class of problems that share the following characteristics:

1. The members of some group are to be ranked in order from first possibly to last, according to the outcome of some random phenomena, or the ranking of the members has already been effected, but is unobservable.

2. The 'probability' of each member's ranking first is known or can be assessed.

3. From these probabilities alone, we wish to determine the probability that a more complete ranking of the members will equal a given ranking or the probability that it will fall in a given collection of such rankings.

* David A. Harville is research mathematical statistician, Aerospace Research Laboratories, Wright-Patterson Air Force Base, Ohio 45433. The author wishes to thank the Theory and Methods editor, an associate editor and a referee for their useful suggestions.

Dead heats or ties will be assumed to have zero probability. For situations where this assumption is unrealistic, the probabilities of the various possible ties must be assessed separately.

We assign no particular interpretation to the 'probability' of a given ranking or collection of rankings. We assume only that the probabilities of these events satisfy the usual axioms. Their interpretation will differ with the setting.

Ordinarily, knowledge of the probabilities associated with the various rankings will be of most interest in situations like the horse player's where only the ranking itself, and not the closeness of the ranking, is important. The horse player's return on any bet is completely determined by the horses' order of finish. The closeness of the result may affect his nerves but not his pocketbook.

2. RESULTS

We will identify the $n$ horses in the race or members in the group by the labels $1, 2, \ldots, n$. Denote by $p_k[i_1, i_2, \ldots, i_k]$ the probability that horses or members $i_1, i_2, \ldots, i_k$ finish or rank first, second, ..., $k$th, respectively, where $k \le n$. For convenience, we use $p[i]$ interchangeably with $p_1[i]$ to represent the probability that horse or member $i$ finishes or ranks first. We wish to obtain $p_k[i_1, i_2, \ldots, i_k]$ in terms of $p[1], p[2], \ldots, p[n]$, for all $i_1, i_2, \ldots, i_k$ and for $k = 2, 3, \ldots, n$. In a sense, our task is one of expressing the probabilities of elementary events in terms of the probabilities of more complex events.

Obviously, we must make additional assumptions to obtain the desired formula. Our choice is to assume that, for all $i_1, i_2, \ldots, i_k$ and for $k = 2, 3, \ldots, n$, the conditional probability that member $i_k$ ranks ahead of members $i_{k+1}, i_{k+2}, \ldots, i_n$ given that members $i_1, i_2, \ldots, i_{k-1}$ rank first, second, ..., $(k-1)$th, respectively, equals the conditional probability that $i_k$ ranks ahead of $i_{k+1}, i_{k+2}, \ldots, i_n$ given that $i_1, i_2, \ldots, i_{k-1}$ do not rank first. That is,

$$\frac{p_k[i_1, i_2, \ldots, i_k]}{p_{k-1}[i_1, i_2, \ldots, i_{k-1}]} = \frac{p[i_k]}{q_{k-1}[i_1, i_2, \ldots, i_{k-1}]}, \tag{2.1}$$

© Journal of the American Statistical Association, June 1973, Volume 68, Number 342

Applications Section


where

$$q_k[i_1, i_2, \ldots, i_k] = 1 - p[i_1] - p[i_2] - \cdots - p[i_k],$$

so that, for the sought-after formula, we obtain

$$p_k[i_1, i_2, \ldots, i_k] = \frac{p[i_1]\, p[i_2] \cdots p[i_k]}{q_1[i_1]\, q_2[i_1, i_2] \cdots q_{k-1}[i_1, i_2, \ldots, i_{k-1}]}. \tag{2.2}$$
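The formula for the probability of an ordering can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `ordering_probability` and the three-horse win probabilities are hypothetical and do not come from the paper, and the win probabilities are assumed to lie strictly between 0 and 1 and to sum to one.

```python
from itertools import permutations
from math import isclose

def ordering_probability(win_probs, order):
    """Probability that the entries in `order` finish first, second, ... in that order."""
    prob = 1.0
    remaining = 1.0  # plays the role of q_{k-1}: one minus the win probabilities already used
    for entry in order:
        prob *= win_probs[entry] / remaining
        remaining -= win_probs[entry]
    return prob

# Hypothetical three-horse race (illustrative values, not from the paper).
p = {"A": 0.5, "B": 0.3, "C": 0.2}

# p_2[A, B] = p[A] p[B] / (1 - p[A])
print(ordering_probability(p, ["A", "B"]))

# The complete orders of finish should together account for all of the probability.
total = sum(ordering_probability(p, order) for order in permutations(p))
print(total)
```

Dividing by the running remainder at each step is the same as dividing by the product of the $q$ terms, so the loop needs no separate bookkeeping for the denominator.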

In the particular case $k = 2$, the assumption (2.1) is equivalent to assuming that the event that member $i_2$ ranks ahead of all other members, save possibly $i_1$, is stochastically independent of the event that member $i_1$ ranks first.

The intuitive meaning and the reasonableness of the assumption (2.1) will depend on the setting. In particular, our horse player would probably not consider the assumption appropriate for every race he encounters. For example, in harness racing, if a horse breaks stride, the driver must take him to the outside portion of the track and keep him there until the horse regains the proper gait. Much ground can be lost in this maneuver. In evaluating a harness race in which there is a horse that is an 'almost certain' winner unless he breaks, the bettor would not want to base his calculations on assumption (2.1). For such a horse, there may be no such thing as an intermediate finish. He wins when he doesn't break, but finishes 'way back' when he does.

In many, though not all, cases, there is a variate (other than rank) associated with each member of the group such that the ranking is strictly determined by ordering their values. For example, associated with each horse is its running time for the race. Denote by $X_i$ the variate corresponding to member $i$, $i = 1, 2, \ldots, n$. Clearly, the assumption (2.1) can be phrased in terms of the joint probability distribution of $X_1, X_2, \ldots, X_n$. It seems natural to ask whether there exist other conditions on the distribution of the $X_i$'s which imply (2.1) or which follow from it, and which thus would aid our intuition in grasping the implications of that assumption. The answer in general seems to be no. In particular, it can easily be demonstrated by constructing a counterexample that stochastic independence of the $X_i$'s does not in itself imply (2.1). Nor is the converse necessarily true. In fact, in many situations where assumption (2.1) might seem appropriate, it is known that the $X_i$'s are not independent. For example, we would expect the running times of the horses to be correlated in most any horse race. An even better example is the ordering of $n$ baseball teams according to their winning percentages over a season of play. These percentages are obviously not independent, yet assumption (2.1) might still seem reasonable.
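One possible counterexample of the kind alluded to can be checked by exact enumeration. The sketch below is not taken from the paper: the three independent two-point 'running times' are hypothetical values chosen so that ties cannot occur, and the lower time is taken to rank first.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent running times (lower is better).
# Each horse takes either of its two values with probability 1/2.
times = {1: (0, 5), 2: (1, 4), 3: (2, 3)}

win = {i: Fraction(0) for i in times}
first_two = {}  # (winner, runner-up) -> exact probability
for combo in product(*times.values()):
    order = sorted(times, key=lambda i: combo[i - 1])
    weight = Fraction(1, 2 ** len(times))
    win[order[0]] += weight
    key = (order[0], order[1])
    first_two[key] = first_two.get(key, Fraction(0)) + weight

actual = first_two[(2, 3)]                 # exact probability that 2 wins and 3 is second
implied = win[2] * win[3] / (1 - win[2])   # value that assumption (2.1) would require
print(actual, implied)  # 1/4 1/12
```

Here horse 1 either wins or finishes last, much like the harness horse that breaks stride, so the independent times produce a second-place pattern that the win probabilities alone cannot reproduce.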

The probability that the ranking belongs to any given collection of rankings can be readily obtained in terms of $p[1], p[2], \ldots, p[n]$ by using (2.2) to express the probability of each ranking in the collection in terms of the $p[i]$'s, and by then adding. For example, the horse player can compute the probability that both entry $i$ and entry $j$ place from

$$p_2[i, j] + p_2[j, i] = \frac{p[i]\,p[j]}{1 - p[i]} + \frac{p[j]\,p[i]}{1 - p[j]}.$$
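As a rough numerical illustration of the place formula, the sketch below applies it to the consensus win probabilities listed in Table 1. The function names are hypothetical, and because the tabulated win probabilities are rounded to three decimals, the computed values can differ from the table in the last digit.

```python
def both_place(win_probs, i, j):
    """Probability that entries i and j take the first two positions, in either order."""
    p_i, p_j = win_probs[i], win_probs[j]
    return p_i * p_j / (1 - p_i) + p_j * p_i / (1 - p_j)

def place_probability(win_probs, i):
    """Probability that entry i finishes first or second."""
    return sum(both_place(win_probs, i, j) for j in win_probs if j != i)

# Consensus win probabilities as printed in Table 1 (rounded values).
table1_win = {
    "Moonlander": 0.275, "E'Thon": 0.165, "Golden Secret": 0.035,
    "Antidote": 0.175, "Beviambo": 0.040, "Cedar Wing": 0.118,
    "Little Flitter": 0.085, "Hot and Humid": 0.107,
}

# Table 1 lists .504 as Moonlander's theoretical probability of placing.
print(round(place_probability(table1_win, "Moonlander"), 3))  # 0.504
```

Summing `both_place` over all partners works because a horse places exactly when it shares the first two positions with some other horse.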

A probability of particular interest in many situations is the probability that entry or member $r$ finishes or ranks $k$th or better, for which we write

$$p_k^*[r] = \sum p_k[i_1, i_2, \ldots, i_k], \tag{2.3}$$

where the summation is over all rankings $i_1, i_2, \ldots, i_k$ for which $i_u = r$ for some $u$. If assumption (2.1) holds, then $p_k^*[r] > p_k^*[s]$ if and only if $p[r] > p[s]$. This statement can be proved easily by comparing the terms of the right side of (2.3) with the terms of the corresponding expression for $p_k^*[s]$. Each term of (2.3), whose indices are such that $i_u = r$ and $i_v = s$ for some $u, v$, appears also in the second expression. Thus, it suffices to show that any term $p_k[i_1, i_2, \ldots, i_k]$, for which $i_j \ne s$, $j = 1, 2, \ldots, k$, but $i_u = r$ for some $u$, is made smaller by putting $i_u = s$ if and only if $p[r] > p[s]$. That the latter assertion is true follows immediately from (2.2).
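The sum in (2.3) and the ordering property can be checked by brute-force enumeration on a small example. The sketch below is illustrative only: the helper names and the four-entry win probabilities are hypothetical, and enumerating all partial rankings is practical only for small fields.

```python
from itertools import permutations

def ordering_probability(win_probs, order):
    """Probability of a given order of finish under formula (2.2)."""
    prob, remaining = 1.0, 1.0
    for entry in order:
        prob *= win_probs[entry] / remaining
        remaining -= win_probs[entry]
    return prob

def top_k_probability(win_probs, r, k):
    """Probability that entry r ranks kth or better: the sum in (2.3)."""
    return sum(
        ordering_probability(win_probs, order)
        for order in permutations(win_probs, k)
        if r in order
    )

# Hypothetical four-entry field (illustrative values, not from the paper).
p = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
show = {r: top_k_probability(p, r, 3) for r in p}

ranked_by_win = sorted(p, key=p.get, reverse=True)
ranked_by_show = sorted(show, key=show.get, reverse=True)
print(ranked_by_win == ranked_by_show)  # True
```

The two rankings agree, as the result stated above requires, and the show probabilities sum to three because exactly three entries show in every race.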

3. APPLICATION

In pari-mutuel betting, the payoffs on win bets are determined by subtracting from the win pool (the total amount bet to win by all bettors on all horses) the combined state and track take (a fixed percentage of the pool, generally about 16 percent, but varying from state to state), and by then distributing the remainder among the successful bettors in proportion to the amounts of their bets. (Actually, the payoffs are slightly smaller because of 'breakage,' a gimmick whereby the return on each dollar is reduced to a point where it can be expressed in terms of dimes.) In this section, we take the 'win probability' on each of the $n$ horses to be in inverse proportion to what a successful win bet would pay per dollar, so that every win bet has the same 'expected return.' Note that these 'probabilities' are established by the bettors themselves and, in some sense, represent a consensus opinion as to each horse's chances of winning the race. We shall suppose that, in any sequence of races in which the number of entries and the consensus probabilities are the same from race to race, the horses going off at a given consensus probability win with a long-run frequency equal to that probability. The basis for this supposition is that, once the betting on a race has begun, the amounts bet to win on the horses are flashed on the 'tote' board for all to see and this information is updated periodically, so that, if at some point during the course of the betting the current consensus probabilities do not coincide with the bettors' experience as to the long-run win frequencies for 'similar' races, these discrepancies will be noticed and certain of the bettors will place win bets that have the effect of reducing or eliminating them.
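The construction of the consensus win probabilities can be sketched as follows. This is a simplified illustration, not the paper's computation: the function names and the three-horse win pool are hypothetical, the take of 0.16 is the approximate figure quoted above, and breakage is ignored.

```python
def win_payoff_per_dollar(amounts_bet, take):
    """Payoff per dollar on a successful win bet, ignoring breakage."""
    pool = sum(amounts_bet.values())
    return {horse: (1 - take) * pool / amount for horse, amount in amounts_bet.items()}

def consensus_win_probabilities(payoffs):
    """Win probabilities taken in inverse proportion to the payoff per dollar."""
    inverse = {horse: 1 / pay for horse, pay in payoffs.items()}
    total = sum(inverse.values())
    return {horse: value / total for horse, value in inverse.items()}

# Hypothetical win pool in dollars (illustrative values, not from the paper).
bets = {"A": 5000.0, "B": 3000.0, "C": 2000.0}
payoffs = win_payoff_per_dollar(bets, take=0.16)
probs = consensus_win_probabilities(payoffs)

for horse in bets:
    # Every win bet has the same expected return, namely one minus the take.
    print(horse, round(payoffs[horse], 2), round(probs[horse], 3),
          round(probs[horse] * payoffs[horse], 2))
```

Without breakage each probability reduces to the horse's share of the win pool, which is why the win percentages and win probabilities in Table 1 are nearly identical.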

By adopting assumption (2.1) and applying the results of the previous section, we can compute the long-run frequencies with which any given order of finish is encountered over any sequence of races having the same number of entries and the same consensus win probabilities. In particular, we can compute the 'probability' that any three given horses in a race finish first, second, and third, respectively. As we shall now see, these probabilities are of something more than academic interest, since they are the ones needed to compute the 'expected payoff' for each place bet (a bet that a particular horse will finish either first or second) and each show bet (a bet that the horse will finish no worse than third).

TABLE 1. APPLICATION OF THEORETICAL RESULTS TO THIRD RACE OF SEPTEMBER 6, 1971, AT RIVER DOWNS RACE TRACK

Amounts bet to win, place, and show are given as percentages of the pool totals. Expected payoffs are per dollar.

| Name | Bet to win (%) | Bet to place (%) | Bet to show (%) | Theoretical probability: win | Theoretical probability: place | Theoretical probability: show | Expected payoff: place bet | Expected payoff: show bet |
|---|---|---|---|---|---|---|---|---|
| Moonlander | 27.6 | 20.0 | 22.3 | .275 | .504 | .688 | 1.11 | 1.01 |
| E'Thon | 16.5 | 14.2 | 11.1 | .165 | .332 | .499 | .94 | 1.06 |
| Golden Secret | 3.5 | 4.7 | 6.3 | .035 | .076 | .126 | .58 | .42 |
| Antidote | 17.3 | 18.8 | 20.0 | .175 | .350 | .521 | .80 | .80 |
| Beviambo | 4.0 | 6.2 | 7.8 | .040 | .087 | .144 | .51 | .41 |
| Cedar Wing | 11.9 | 10.4 | 10.4 | .118 | .245 | .382 | .90 | .86 |
| Little Flitter | 8.5 | 11.2 | 9.9 | .085 | .180 | .288 | .62 | .68 |
| Hot and Humid | 10.7 | 14.4 | 12.2 | .107 | .224 | .353 | .62 | .72 |

Like the amounts bet to win, the amounts bet on each horse to place and to show are made available on the 'tote' board as the betting proceeds. The payoff per dollar on a successful place (show) bet consists of the original dollar plus an amount determined by subtracting from the final place (show) pool the combined state and track take and the total amounts bet to place (show) on the first two (three) finishers, and by then dividing a half (third) of the remainder by the total amount bet to place (show) on the horse in question. (Here again, the actual payoffs are reduced by breakage.) By using the probabilities computed on the basis of assumption (2.1) and the assumption that consensus win probabilities equal appropriate long-run frequencies, we can compute the expected payoff per dollar for a given place or show bet on any particular race, where the expectation is taken over a sequence of races exhibiting the same number of entries and the same pattern of win, place, and show betting. If, as the termination of betting on a given race approaches, any of the place or show bets are found to have potential expected payoffs greater than one, there is a possibility that a bettor, by making such place and show bets, can 'beat the races'. Of course, if either assumption (2.1) or the assumption that the consensus win probabilities equal long-run win frequencies for races with similar betting patterns is inappropriate, then this system will not work. It will also fail if there tend to be large last-minute adverse changes in the betting pattern, either because of the system player's own bets or because of the bets of others. However, at a track with considerable betting volume, it is not likely that such changes would be so frequent as to constitute a major stumbling block.
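The place payoff rule and the resulting expected payoff can be sketched with the Table 1 figures. Several things here are assumptions rather than facts from the paper: the take is set to 0.16 (the text says only 'about 16 percent'), breakage is ignored, pool shares are used in place of dollar amounts, and the function names are hypothetical. The results should therefore land near, but not exactly on, the tabulated expected payoffs.

```python
TAKE = 0.16  # assumed combined state and track take

# Table 1: consensus win probabilities and shares of the place pool (rounded values).
win = {
    "Moonlander": 0.275, "E'Thon": 0.165, "Golden Secret": 0.035,
    "Antidote": 0.175, "Beviambo": 0.040, "Cedar Wing": 0.118,
    "Little Flitter": 0.085, "Hot and Humid": 0.107,
}
place_share = {
    "Moonlander": 0.200, "E'Thon": 0.142, "Golden Secret": 0.047,
    "Antidote": 0.188, "Beviambo": 0.062, "Cedar Wing": 0.104,
    "Little Flitter": 0.112, "Hot and Humid": 0.144,
}

def both_place(i, j):
    """Probability that i and j fill the first two positions, in either order."""
    return win[i] * win[j] / (1 - win[i]) + win[j] * win[i] / (1 - win[j])

def place_payoff(horse, other):
    """Payoff per dollar on `horse` to place when `other` also places (no breakage)."""
    remainder = (1 - TAKE) - place_share[horse] - place_share[other]
    return 1 + remainder / 2 / place_share[horse]

def expected_place_payoff(horse):
    """Expected payoff per dollar; outcomes where the horse fails to place pay nothing."""
    return sum(both_place(horse, j) * place_payoff(horse, j) for j in win if j != horse)

# E'Thon and Cedar Wing were the actual first two finishers.
print(round(place_payoff("E'Thon", "Cedar Wing"), 2))  # 3.09

for horse in win:
    print(horse, round(expected_place_payoff(horse), 2))
```

The show calculation follows the same pattern with three finishers, a third of the remainder, and the show pool shares.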

In Table 1, we exemplify our results by applying them to a particular race, the third race of the September 6, 1971, program at River Downs Race Track. The final win, place, and show pools were $45,071, $16,037, and $9,740, respectively. The percentage of each betting pool bet on each horse can be obtained from the table. The table also gives, for each horse, the consensus win probability, the overall probabilities of placing and showing, and the expected payoffs per dollar of place and show bets. The race was won by E'Thon who, on a per-dollar basis, paid $5.00, $3.00, and $2.50 to win, place, and show, respectively; Cedar Wing was second, paying $3.80 and $2.70 per dollar to place and show; and Beviambo finished third, returning $3.20 for each dollar bet to show.

In order to check assumption (2.1) and the assumption that the consensus win probabilities coincide with the long-run win frequencies over any sequence of races having the same number of entries and a similar betting pattern, data were gathered on 335 thoroughbred races from several Ohio and Kentucky race tracks. Data from races with finishes that involved dead heats for one or more of the first three positions were not used. Also, in the pari-mutuel system, two or more horses are sometimes lumped together and treated as a single entity for betting purposes. Probabilities and expectations for the remaining horses were computed as though these 'field' entries consisted of single horses and were included in the data, though these figures are only approximations to the 'true' figures. However, the field entries themselves were not included in the tabulations.

As one check on the correspondence between consensus win probabilities and the long-run win frequencies over races with similar patterns of win betting, the horses were divided into eleven classes according to their consensus win probabilities. Table 2 gives, for each class, the associated interval of consensus win probabilities, the average consensus win probability, the actual frequency of winners, and an estimate of the standard error associated with the actual frequency. The actual frequencies seem to agree remarkably well with the theoretical probabilities, though there seems to be a slight tendency on the part of the bettors to overrate the chances of long-shots and to underestimate the chances of the favorites and near-favorites. Similar results, based on an extensive amount of data from an earlier time period and from different tracks, were obtained by Fabricand [1].

TABLE 2. FREQUENCY OF WINNING: ACTUAL VS. THEORETICAL

| Theoretical probability of winning | Number of horses | Average theoretical probability | Actual frequency of winning | Estimated standard error |
|---|---|---|---|---|
| .00 - .05 | 946 | .028 | .020 | .005 |
| .05 - .10 | 763 | .074 | .064 | .009 |
| .10 - .15 | 463 | .124 | .127 | .016 |
| .15 - .20 | 313 | .175 | .169 | .021 |
| .20 - .25 | 192 | .225 | .240 | .031 |
| .25 - .30 | 114 | .272 | .289 | .042 |
| .30 - .35 | 71 | .324 | .394 | .058 |
| .35 - .40 | 49 | .373 | .306 | .066 |
| .40 - .45 | 25 | .423 | .640 | .096 |
| .45 - .50 | 12 | .464 | .583 | .142 |
| .50 + | 10 | .554 | .700 | .145 |

TABLE 3. FREQUENCY OF FINISHING SECOND: ACTUAL VS. THEORETICAL

| Theoretical probability of finishing second | Number of horses | Average theoretical probability | Actual frequency of finishing second | Estimated standard error |
|---|---|---|---|---|
| .00 - .05 | 776 | .030 | .046 | .008 |
| .05 - .10 | 750 | .074 | .095 | .011 |
| .10 - .15 | 548 | .124 | .128 | .014 |
| .15 - .20 | 426 | .175 | .155 | .018 |
| .20 - .25 | 283 | .223 | .170 | .022 |
| .25 - .30 | 164 | .269 | .226 | .033 |
| .30 + | 11 | .311 | .364 | .145 |
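The paper does not say how the standard errors were estimated. As an assumption, the sketch below uses the usual binomial estimate, the square root of f(1 - f)/n computed from the actual frequency f and the class size n, which comes within about .001 of the Table 2 values. It also expresses each class's deviation from the theoretical probability in standard-error units. The variable and function names are hypothetical.

```python
from math import sqrt

def binomial_standard_error(freq, n):
    """Binomial standard error of an observed frequency (an assumed formula)."""
    return sqrt(freq * (1 - freq) / n)

# Table 2 rows: (number of horses, average theoretical probability,
#                actual frequency of winning, tabulated standard error)
table2 = [
    (946, .028, .020, .005), (763, .074, .064, .009), (463, .124, .127, .016),
    (313, .175, .169, .021), (192, .225, .240, .031), (114, .272, .289, .042),
    (71, .324, .394, .058), (49, .373, .306, .066), (25, .423, .640, .096),
    (12, .464, .583, .142), (10, .554, .700, .145),
]

largest_gap = 0.0
for n, theoretical, actual, tabulated_se in table2:
    se = binomial_standard_error(actual, n)
    z = (actual - theoretical) / se  # deviation in standard-error units
    largest_gap = max(largest_gap, abs(se - tabulated_se))
    print(f"n={n:4d}  se={se:.3f} (table {tabulated_se:.3f})  z={z:+.2f}")

print(largest_gap < 0.001)  # True
```

Negative deviations in the lowest classes and positive ones in the highest correspond to the tendency, noted above, to overrate long-shots and underrate favorites.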

Several checks were also run on the appropriateness of assumption (2.1). These consisted of first partitioning the horses according to some criterion involving the theoretical probabilities of second and third place finishes and then comparing the actual frequency with the average theoretical long-run frequency for each class. Tables 3-6 give the results when the criterion is the probability of finishing second, finishing third, placing, or showing, respectively. In general, the observed frequencies of second and third place finishes are in reasonable accord with the theoretical long-run frequencies, though there seems to be something of a tendency to overestimate the chances of a second or third place finish for horses with high theoretical probabilities of such finishes and to underestimate the chances of those with low theoretical probabilities, with the tendency being more pronounced for third place finishes than for second place finishes. A logical explanation for the conformity of the actual place results to those predicted by the theory which is evident in Table 5 is that those horses with high (low) theoretical probabilities of finishing second generally also have high (low) theoretical probabilities of finishing first, so that the effects of the overestimation (underestimation) of their chances of finishing second are cancelled out by the underestimation (overestimation) of their chances of finishing first. While a similar phenomenon is operative in the show results, the cancellation is less complete and there seems to be a slight tendency to overestimate the show chances of those horses with high theoretical probabilities and to underestimate the chances of those with low theoretical probabilities.

TABLE 4. FREQUENCY OF FINISHING THIRD: ACTUAL VS. THEORETICAL

| Theoretical probability of finishing third | Number of horses | Average theoretical probability | Actual frequency of finishing third | Estimated standard error |
|---|---|---|---|---|
| .00 - .05 | 587 | .032 | .049 | .009 |
| .05 - .10 | 713 | .074 | .105 | .011 |
| .10 - .15 | 691 | .124 | .126 | .013 |
| .15 - .20 | 838 | .175 | .147 | .012 |
| .20 - .25 | 115 | .212 | .130 | .031 |
| .25 + | 14 | .273 | .214 | .110 |

TABLE 5. FREQUENCY OF PLACING: ACTUAL VS. THEORETICAL

| Theoretical probability of placing | Number of horses | Average theoretical probability | Actual frequency of placing | Estimated standard error |
|---|---|---|---|---|
| .00 - .05 | 330 | .034 | .036 | .010 |
| .05 - .10 | 526 | .074 | .091 | .013 |
| .10 - .15 | 404 | .125 | .121 | .016 |
| .15 - .20 | 358 | .174 | .179 | .020 |
| .20 - .25 | 268 | .224 | .257 | .027 |
| .25 - .30 | 240 | .274 | .271 | .029 |
| .30 - .35 | 193 | .326 | .306 | .033 |
| .35 - .40 | 175 | .375 | .354 | .036 |
| .40 - .45 | 117 | .425 | .359 | .044 |
| .45 - .50 | 109 | .472 | .440 | .048 |
| .50 - .55 | 73 | .525 | .425 | .058 |
| .55 - .60 | 51 | .578 | .667 | .066 |
| .60 - .65 | 48 | .623 | .625 | .070 |
| .65 - .70 | 29 | .673 | .621 | .090 |
| .70 - .75 | 22 | .724 | .909 | .095 |
| .75 + | 15 | .808 | .867 | .088 |

TABLE 6. FREQUENCY OF SHOWING: ACTUAL VS. THEORETICAL

| Theoretical probability of showing | Number of horses | Average theoretical probability | Actual frequency of showing | Estimated standard error |
|---|---|---|---|---|
| .00 - .05 | 111 | .038 | .045 | .020 |
| .05 - .10 | 316 | .075 | .092 | .016 |
| .10 - .15 | 328 | .124 | .180 | .021 |
| .15 - .20 | 266 | .174 | .222 | .025 |
| .20 - .25 | 253 | .227 | .257 | .027 |
| .25 - .30 | 243 | .274 | .284 | .029 |
| .30 - .35 | 201 | .326 | .303 | .032 |
| .35 - .40 | 196 | .374 | .439 | .035 |
| .40 - .45 | 169 | .425 | .426 | .038 |
| .45 - .50 | 150 | .477 | .460 | .041 |
| .50 - .55 | 158 | .525 | .468 | .040 |
| .55 - .60 | 137 | .574 | .474 | .043 |
| .60 - .65 | 97 | .625 | .577 | .050 |
| .65 - .70 | 100 | .672 | .500 | .050 |
| .70 - .75 | 67 | .722 | .627 | .059 |
| .75 - .80 | 67 | .777 | .731 | .054 |
| .80 - .85 | 49 | .823 | .816 | .055 |
| .85 - .90 | 30 | .874 | .867 | .062 |
| .90 + | 20 | .930 | 1.000 | .056 |

TABLE 7. PAYOFFS ON PLACE AND SHOW BETS: ACTUAL VS. THEORETICAL

| Expected payoff per dollar | Number of different place and show bets | Average expected payoff per dollar | Average actual payoff per dollar | Estimated standard error |
|---|---|---|---|---|
| .00 - .25 | 80 | .216 | .088 | .062 |
| .25 - .35 | 214 | .303 | .286 | .068 |
| .35 - .45 | 386 | .404 | .609 | .091 |
| .45 - .55 | 628 | .504 | .570 | .071 |
| .55 - .65 | 904 | .601 | .730 | .072 |
| .65 - .75 | 980 | .700 | .660 | .047 |
| .75 - .85 | 958 | .800 | .947 | .066 |
| .85 - .95 | 819 | .898 | .938 | .050 |
| .95 - 1.05 | 546 | .995 | .983 | .090 |
| 1.05 - 1.15 | 286 | 1.090 | .989 | .060 |
| 1.15 - 1.25 | 90 | 1.186 | .974 | .108 |
| 1.25 + | 25 | 1.320 | 1.300 | .258 |

Finally, the possible place and show bets were divided into classes according to the theoretical expected payoffs of the bets as determined from the final betting figures. The average actual payoff per dollar for each class can then be compared with the corresponding average expected payoff per dollar. The necessary figures are given in Table 7. The results seem to indicate that those place and show bets with high theoretical expected payoffs per dollar actually have expectations that are somewhat lower, giving further evidence that our assumptions are not entirely realistic, at least not for some races.

The existence of widely different expected payoffs for the various possible place and show bets implies that either the bettors 'do not feel that assumption (2.1) is entirely appropriate' or they 'believe in assumption (2.1)' but are unable to perceive its implications. Our results indicate that to some small extent the bettors are successful in recognizing situations where assumption (2.1) may not hold and in acting accordingly, but that big differences in the expected place and show payoffs result primarily from 'incorrect assessments' as to when assumption (2.1) is not appropriate or from 'ignorance as to the assumption's implications.'

A further implication of the results presented in Table 7 is that a bettor could not expect to do much better than break even by simply making place and show bets with expected payoffs greater than one.
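The break-even remark can be illustrated by pooling the Table 7 classes whose expected payoffs lie entirely above one. The sketch below is a rough check rather than the author's computation: it weights each class average by its number of bets, and it leaves out the .95 - 1.05 class because that class straddles one. The variable names are hypothetical.

```python
# Table 7 classes with expected payoff per dollar above 1.05:
# (number of bets, average expected payoff, average actual payoff)
favorable_classes = [
    (286, 1.090, 0.989),
    (90, 1.186, 0.974),
    (25, 1.320, 1.300),
]

total_bets = sum(n for n, _, _ in favorable_classes)
pooled_expected = sum(n * expected for n, expected, _ in favorable_classes) / total_bets
pooled_actual = sum(n * actual for n, _, actual in favorable_classes) / total_bets

print(total_bets)                 # 401
print(round(pooled_expected, 3))  # 1.126
print(round(pooled_actual, 3))    # 1.005
```

Under this pooling, the 401 apparently favorable bets returned about $1.005 per dollar on average, against a theoretical expectation of about $1.126.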

[Received January 1972. Revised September 1972.]

REFERENCE

[1] Fabricand, Burton P., Horse Sense, New York: David McKay Company, Inc., 1965.
