ssrn-id2358747

51
Electronic copy available at: http://ssrn.com/abstract=2358747 The Hot Hand Fallacy: Cognitive Mistakes or Equilibrium Adjustments? Evidence from Baseball * Brett Green Jeffrey Zwiebel Current Draft: November 2013 Abstract We test for a “hot hand” (i.e., short-term streakiness in performance) in Major League Baseball using panel data. We find strong evidence for its existence in all ten statisti- cal categories we consider. The magnitudes are significant; being “hot” corresponds to roughly a one quartile increase in the distribution. Our results are in notable contrast to the majority of the hot hand literature, which has found little to no evidence for a hot hand in sports, often employing basketball shooting data. We argue that this difference is attributable to endogenous defensive responses: basketball presents suffi- cient opportunity for defensive responses to equate shooting probabilities across players whereas baseball does not. As such, prior evidence on the absence of a hot hand (de- spite widespread belief in its presence) should not be interpreted as a cognitive mistake – as it typically is in the literature – but rather as an efficient equilibrium adjustment. We provide a heuristic manner for identifying a priori which sports are likely to per- mit an equating endogenous response response and discuss potential implications for identifying the hot hand effect in other settings. * We are grateful to Terrence Hendershott, Przemyslaw Jeziorski, Stefan Nagel, Michael Roberts and Annette Vissing-Jorgensen for helpful comments and discussion. Haas School of Business, University of California, Berkeley, CA 94720, email: [email protected], webpage: http://faculty.haas.berkeley.edu/bgreen/ Graduate School of Business, Stanford University, Stanford, CA 94305, email: [email protected], webpage: http://www.gsb.stanford.edu/users/zwiebel

Upload: semiring

Post on 18-May-2017

215 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: SSRN-id2358747

Electronic copy available at: http://ssrn.com/abstract=2358747

The Hot Hand Fallacy: Cognitive Mistakes or

Equilibrium Adjustments? Evidence from Baseball∗

Brett Green† Jeffrey Zwiebel‡

Current Draft: November 2013

Abstract

We test for a “hot hand” (i.e., short-term streakiness in performance) in Major League

Baseball using panel data. We find strong evidence for its existence in all ten statisti-

cal categories we consider. The magnitudes are significant; being “hot” corresponds to

roughly a one quartile increase in the distribution. Our results are in notable contrast

to the majority of the hot hand literature, which has found little to no evidence for

a hot hand in sports, often employing basketball shooting data. We argue that this

difference is attributable to endogenous defensive responses: basketball presents suffi-

cient opportunity for defensive responses to equate shooting probabilities across players

whereas baseball does not. As such, prior evidence on the absence of a hot hand (de-

spite widespread belief in its presence) should not be interpreted as a cognitive mistake

– as it typically is in the literature – but rather as an efficient equilibrium adjustment.

We provide a heuristic manner for identifying a priori which sports are likely to per-

mit an equating endogenous response response and discuss potential implications for

identifying the hot hand effect in other settings.

∗We are grateful to Terrence Hendershott, Przemyslaw Jeziorski, Stefan Nagel, Michael Roberts andAnnette Vissing-Jorgensen for helpful comments and discussion.†Haas School of Business, University of California, Berkeley, CA 94720, email: [email protected],

webpage: http://faculty.haas.berkeley.edu/bgreen/‡Graduate School of Business, Stanford University, Stanford, CA 94305, email: [email protected],

webpage: http://www.gsb.stanford.edu/users/zwiebel

Page 2: SSRN-id2358747

Electronic copy available at: http://ssrn.com/abstract=2358747

1 Introduction

The absence of a “hot hand” in sports is a frequently cited motivating example in behavioral

economics of common cognitive mistakes. Following Gilovich et al. (1985)’s (GVT) seminal

study of streakiness in basketball shooters, a large literature has examined the hot hand.

Most of this literature follows GVT in purporting to show that sports players do not have a

propensity for hot or cold streaks, contrary to the almost universally held perceptions among

such players, their coaches, and fans alike. That is, temporal outcomes are not found to

exhibit streaks beyond what would follow randomly under independence, and consequently,

widespread perceptions of hot hands are attributed to a basic cognitive mistake of inferring

patterns in random data.1

Much of this hot hand literature, however, is plagued by two critical drawbacks. First, it

often fails to take endogenous strategic responses into account. A hot shooter in basketball

should be defended more intensely, and should take more difficult shots, both of which will

lower his shooting percentage. And second, many of the tests conducted in this literature,

especially tests attempting to address this endogeneity, lack the power to identify plausible

and significant models of hot hands.

To get an intuitive sense of the importance of endogenous responses in sports, consider

field goal shooting in basketball, the focus of GVT and a large amount of the subsequent

literature on the hot hand. One might be inclined to think that “better” players should make

a higher percentage of the field goals they attempt. However, this is not the case. The top

20 leading scorers include most of the widely acknowledged top offensive and highest paid

players in the game; 14 out of 20 were selected for the all star game. Yet, the median (mean)

field goal percentage among these “stars” was 0.447 (.465), which is slightly below, but

statistically indistinguishable from, that of a typical starting player .460 (.466).2 Figure 1(a)

depicts the relationship (or lack thereof) between number of points scored and the likelihood

of success in professional basketball.

There is an obvious and widely accepted explanation for why some NBA scoring leaders

and universally acknowledged stars do not shoot any better than the league average. Even

the most casual basketball observer can readily observe that Kobe Bryant (career shooting

percentage of .454) draws far more defensive attention, and attempts far more difficult shots,

1GVT (p. 311) states: “Evidently, people tend to perceive chance shooting as streak shooting and theyexpect sequences exemplifying chance shooting to contain many more alternations than would actually beproduced by a random (chance) process. Thus, people ‘see’ a positive serial correlation in independentsequences, and they fail to detect a negative serial correlation in alternating sequences.”

2Typical starting player is defined as one who started in at least 50 percent of all regular season games(i.e., 41 out of 82 games).

1

Page 3: SSRN-id2358747

500

1000

1500

2000

2500

Poi

nts

scor

ed

.3 .4 .5 .6 .7

Field goal percentage

(a) Basketball: Successes

050

010

0015

00

Num

ber

of a

ttem

pts

.3 .4 .5 .6 .7

Field goal percentage

(b) Basketball: Attempts

Figure 1 – Relation between points scored, attempts and success percentage in the National BasketballAssociation (NBA) using data from the 2012-13 season. Figures include data from all players who started inat least 50 percent of all regular season games. In neither category is the correlation coefficient statisticallydifferent from zero.

than his teammates. Indeed, a fundamental aspect of defensive strategy in basketball involves

allocating more defensive attention to better shooters and less to weaker shooters. Insofar

as defenses have great flexibility in allocating their defensive attention to cover the five

opposing players on the court, and offenses are free in turn to allocate their shots across

their players on the court, it would be natural to expect, to a first approximation, that such

adjustments would yield shooting percentages that are equated across players on a given

team. If Kobe Bryant’s probability of making a shot was higher than that of teammate Matt

Barnes, defenses should cover Bryant more and Barnes less, and Kobe should shoot more and

Barnes less, until these probabilities were equated.3 Hence, in equilibrium individual shooting

percentages are not likely to be indicative of players’ shooting abilities; good shooters are

likely to shoot more, and raise the overall team shooting percentage, but should not be

expected to have a higher shooting percentage than their teammates. This also implies there

should not be a relationship between the shooting percentage and the number of attempts,

which is consistent with Figure 1(b). Aharoni and Sarig (2012) also provide evidence that

good shooters and hot shooters both shoot more and raise overall team shooting percentages.

Few would dispute the importance of defensive and shooting adjustments when evaluating

field goal percentages across players. No one that we are aware of argues that Kobe Bryant

is but an average shooter, and no one disputes that he is defended more and attempts harder

3There are some important qualifications and limitations to this simple argument, based on specificfeatures of the game – i.e., 3-point shots are worth more, shots by inside players are harder to set-up thanoutside shots, etc. We discuss a number of these qualifications in more detail in Section 2, where we will arguethat i) most of these are likely to be of second order, and consequently, a model of endogenous responsesequating shooting percentages is likely to be a fairly good approximation to reality, and ii) such qualificationslead to deviations from equal percentages in the direction matched by the data.

2

Page 4: SSRN-id2358747

shots than others. The same reasoning, however, should apply if a player is “hot”. A hot

player, who is temporarily shooting better than he normally does, would temporarily exceed

his teammates shooting percentage if he only took his usual shots and received his usual

defensive attention. In response, he should attempt more difficult shots, and he should receive

more defensive attention, until these adjustments lower his marginal shooting percentage to

equate them with his teammates.4 Consequently, for the same reason good shooters are

unlikely to exhibit a higher shooter percentage than average players, “hot” shooters are

unlikely to exhibit a higher future shooting percentage than they themselves typically realize.

As such, it is rather noteworthy that tests finding no evidence that recent shooting success in

basketball predicts future successes have been readily interpreted by the behavioral literature

as compelling evidence for a cognitive mistake by players, coaches, and fans alike (i.e., a false

belief in the presence of hot hands). It seems that a more natural prior interpretation would

be one of efficient equilibrium adjustments by defenses in the face short-term changes in

performance.5

Also problematic is the lack of power in many of the tests that have been employed in

research on the hot hand. This is especially notable in attempts to address the issue of

endogeneity. For example, GVT also consider free throws in Study 4 of their paper. Since

free throws are not defended, and are taken from the same spot of the floor, they do not

suffer from the endogeneity of field goals (that is, of ordinary shots). GVT look at whether

the likelihood of success of a second free-throw is related to the success of a first free-throw.

However, they only consider nine players, with no more than 430 pairs of free throws for the

player who shot the most, and less than 200 pair of free throws for five of the nine players.

Along similar lines, many of the tests by others for endogeneity have looked for evidence of

a hot hand player by player, often relying on a few hundred, or at most, a few thousand

observations per player. We however demonstrate below that under simple specifications for

a hot hand, that seem well in line with common perceptions of streakiness, one would need

a dataset at least an order of magnitude larger than this in order to be able to distinguish

randomness from a hot hand.

In this paper we address both of these drawbacks by looking for evidence of streakiness in

4Huizinga and Weil (2008) have found that shooters who have recently made several shots in a rowshoot more frequently, and proceed to interpret this as evidence that defensive adjustments are limited.This interpretation, however, does not seem to follow from the result, as one might expect both offensiveadjustments (more shots) and defensive adjustments (more defensive attention) in response to a bettershooter. The Huizinga-Weil interpretation here is akin to interpreting Kobe Bryant’s high shooting frequencyas evidence that he does not receive extra defensive attention, which seems highly implausible.

5We speculate further in Section 5 as to whether behavioralists exhibit their own confirmation bias ininterpreting these results as a cognitive mistake rather than an equilibrium adjustment.

3

Page 5: SSRN-id2358747

050

100

150

200

Hits

.15 .2 .25 .3 .35

Batting average

(a) Baseball: Successes

200

300

400

500

600

700

Pla

te a

ppea

ranc

es

.15 .2 .25 .3 .35

Batting average

(b) Baseball: Attempts

Figure 2 – Relation between number successes, number of attempts and success percentage in MajorLeague Baseball using data from the 2013 season. Figures include data from all players who started in atleast 50 percent of all regular season games. In both relationships, the coefficient is positive and statisticallysignificant.

baseball player performance employing a large panel of data. We will argue in detail in Section

2 that the scope for endogenous responses in baseball is likely to be far more limited than in

basketball, and insufficient to equate margins across players as in basketball. This is partially

evidenced by Figure 2, which indicates that good players are both more likely to be successful

and have more attempts. Notice that this lies in stark contrast with analogous plots for

baseball in Figure1 In brief, the argument is that because batters face a pitcher individually

and sequentially, there is relatively little scope for the defensive team to shift resources from

defending one batter to another.6 Consequently, if a hot hand exists among baseball players,

recent past performance should predict future outcomes. Additionally, our tests have high

statistical power, for two reasons. First, there is an abundance of baseball data. There

are roughly 200,000 batter-pitcher matchups per year (the unit of our observation), and

we employ twelve years of data, giving us over two million observations. Second, we look

for general evidence of streakiness over the panel of our data instead of considering each

player individually. Given this, our tests have a high power to discern reasonable models of

streakiness from random outcomes.

We consider ten different statistical categories in baseball. In particular, we consider five

statistics for hitters: batting average, home runs per at bat, on-base percentage, strike outs

per at bat, and walks per plate appearance; and the same five statistics for pitchers: batting

6The most significant adjustment that can be made which transfers defensive resources across hitters isthat a pitcher can “pitch around” a good hitter or a hot hitter, by not giving the hitter a good pitch to hit.However such activity i) will lead to more walks which we can observe, and ii) will not affect batting averageor our other hitting statistics, which are calculated in proportion to at bats which nets out walks from plateappearances. See Section 2 for a detailed discussion on endogenous adjustments in both basketball andbaseball.

4

Page 6: SSRN-id2358747

average allowed, home runs allowed per at bat etc. The first three statistics are “good”

outcomes for hitters and “bad” outcomes for pitchers. The fourth statistic is bad for hitters

and good for pitchers. With all these statistics we hypothesize that recent performance

for a given statistic predicts future performance in the same statistic, for both hitters and

pitchers. The fifth statistic, walks per plate appearance, begets a different interpretation.

Like the first three statistics, a walk is a good outcome for a hitter. However, unlike the

other statistics, walks are likely to increase if a pitcher is pitching around (i.e., trying to

avoid pitching to) a given hitter. Consequently, this is where we expect to find evidence of

an endogenous reaction - good hitters and hot hitters should receive fewer good pitches to

hit, resulting in more walks. Given this, not only do we expect walks to beget more walks

for hitters, but also hot performance in other categories to beget more walks as well.

For each of these statistics we examine whether recent past history is predictive of current

outcomes, for both hitters and pitchers. For all our tests, we control for the ability of the

hitter and pitcher. We also control for other factors that affect outcome, such as ballpark and

platoon differentials (hitters hit worse and pitchers pitch better when facing a same-handed

opponent).

Strikingly, we find recent performance is highly significant in predicting performance in

all ten statistical categories that we examine. In all cases, being “hot” in a statistic makes

one more likely to perform well in the same statistic. A recent history on the order of about

25 at bats, which equals about 5 games or close to one week for the average hitter, has the

most predictive power over the next at bat. Furthermore these effects are of a significant

magnitude: for instance, a 30 percent increase in the number of times that a batter has

gotten on base in the last 25 at bats predicts a 5% increase in the likelihood of getting on

base in the next at bat, after controlling for all other explanatory variables. Moreover, we

also find clear evidence that when a hitter is hot in other statistical categories, he is likely

to draw more walks as well.

To assess the significance of these results and how they compare with common percep-

tions, one needs a model for streakiness. While we remain agnostic as to the specific model

underlying streaks, for some simple intuition we consider a basic three state Markov model,

where players can switch between a normal state, a hot state, and a cold state.7 Note that

in such a model, the probability of a successful outcome, conditional on a good recent his-

tory, will be less than the probability of a success in the good state, because of both, 1) the

possibility that recent successes were due to luck rather than being in the hot state, and 2)

7Two alternative models that strike us as plausible are a setting with a continuous state variable for“hotness”, and a setting where switches between states are determined by outcomes rather than exogenouschanges in the state. We do not pursue such models here, but may consider this in future work.

5

Page 7: SSRN-id2358747

a random switch out of the hot state. This mitigation notwithstanding, we show that the

streakiness exhibited in our data corresponds to a model where the difference between hot

and cold states is highly significant. For instance, our results are consistent with a model

in which a hot batter is 33% (or 9 percentage points) more likely to get a hit than a cold

batter. As we discuss in the conclusion, these results are important enough to rationalize

common coaching decision and fan perceptions with respect to hot players.

The rest of the paper proceeds as follows. In Section 2 we discuss the difference between

endogenous responses available in basketball and baseball in more detail, in order to highlight

the difference in a setting where endogenous responses are likely to equate margins across

players and a setting where such responses are not likely to do so. We follow this discussion

with a brief summary of the literature. Section 3 discusses the data and our methodology,

and Section 4 presents our results. In Section 5, we present a simple model of streakiness

to give an intuitive sense of the magnitudes consistent with our empirical results. Section

6 provides a general discussion and compares the hot-hand in sports literature with the

finance literature, drawing parallels with the efficient market hypothesis, market responses,

and behavioral finance. Section 7 concludes. Empirical results are located in Section A.

2 Endogenous Responses

2.1 Basketball vs. Baseball

It is instructive to discuss the different endogenous responses available in basketball and

baseball, in order to distinguish between a setting where margins are likely to be equated

across players and those where they are not. We first discuss basketball, to detail our

argument that endogenous responses are likely to be sufficient to equate margins there, and

then turn our attention to baseball, where we will argue that endogenous responses are far

more limited.

In basketball all active players are on the court at the same time, and defenses are very

fluid. On each possession, the offensive team must decide who will shoot the ball, and the

defensive team must decide how to allocate their defensive resources across the offensive

players. There are many defensive adjustments that shift defensive attention from one offen-

sive player to another. For example, better defenders are assigned to better offensive players,

good players are double-teamed, and defenders temporarily move off their assigned man to

help out in the coverage of another player. Such adjustments happen nearly continuously

and on practically every possession: switches, double-teams, posts, and other manners of

temporary defensive help are given and withdrawn across players fluidly, in manners small

6

Page 8: SSRN-id2358747

and large, often changing within seconds. A defender may briefly step away from his assigned

man (or assigned zone if playing zone defense) to help double-team the player with the ball

for two seconds if the ball-handler is a dangerous shooter, and for only one second if the

ball-handler is an average shooter. Moreover, such detailed adjustments and switches are

central to a team’s defensive strategy and follow rather detailed and sophisticated schemes.8

Such strategies are frequently adjusted and altered over the course of a game, depending on

what has transpired, the current game situation, and the current players in the game. All

defensive players are continuously engaged in shifting attention across offensive players based

on their ability, recent performance, and current shot opportunities. Indeed, a player’s defen-

sive ability is evaluated to a large extent by his ability to make such adjustments efficiently

and rapidly.

Offensives in turn adjust to these defensive decisions in seeking the “best available shot”

(we discuss below what we mean by this term). On offense, a team passes the ball among

players trying to set up the “best” shot, subject to the constraint of a shot clock, which

requires a shot to be taken in a given amount of time.9 Optimally, a shot should be taken

when it is a better shot than the expected best shot that will be found in the remaining

time before the shot clock expires. This suggests a threshold shot quality, with the threshold

decreasing with the time remaining on the shot chock.

A good shooter or a hot shooter that is more likely than others to make particular shots

will draw more defensive attention, which will in turn make the marginal shot more difficult.

If there is likely to be better marginal shots available for player A than player B, player A

should be shooting more and player B should be shooting less. Defenses should adjust as

well, drawing attention from player B to player A. In such a manner, the quality of a player’s

marginal shots should be equated across players on a team. Good players, or hot players,

should not have better marginal shots than bad players or cold players; both defensive and

offensive adjustments should serve to equate such margins.

To this point we have spoken of the value of the shot as if it is the same as the likely

success of a shot. This is not quite accurate for a number of reasons. Most significantly,

long shots behind the 3-point line count for 3 points, whereas other field goals count for only

2 points.Hence the expected scoring from a shot will be given by the probability the shot

will be made weighted by 2 for an ordinary shot and 3 for a 3-point shot. Not surprisingly,

8For example, offensive player X, a good inside player without outside shooting skills, might be double-teamed 40% the time that he has the ball on the right side of the basket within 6 feet of the basket, but onlywhen he is being covered by Y and not by Z, and only when his teammate W is not in the game, whereas hemight receive very loose coverage from one man at most - who is spending some of his attention shadowingother nearby offensive players - if he is 15 feet away from the basket.

9In the NBA, a team has 24 seconds to shoot, whereas in men’s college basketball, a team has 35 seconds.

7

Page 9: SSRN-id2358747

3-point shots are made less frequently than 2-point shots, and roughly in proportion to

these weights: in the NBA in 2012-13, the average 3-point shot succeeded with probability

.364, and the average 2-point shot succeeded with probability .485.10 While the frequency

of 3-points shots will consequently affect shooting percentages, one can readily verify that

for most players this effect is rather small given the relative infrequency of 3-point shots.

For our purposes it is sufficient to note that one can calculate expected scoring per shot

for each player (weighting 3 point shots by 3 and 2-point shots by two, and accounting for

each player’s shooting percentage for each of the two types of shots), and one still finds star

players with expected scoring percentages indistinguishable from league averages.11

We now turn to baseball, where we presently argue that endogenous defensive responses

are far more limited than basketball, and are unlikely to be sufficient to equate margins

across batters (offensive players).12 There are several reasons for this difference.

Most importantly, in baseball pitchers face batters sequentially rather than simultane-

ously. Unlike in basketball, there is not a fixed defensive resource (the number and defensive

ability of players on the field) that can be allocated and adjusted across all the offensive

players. Rather, each hitter faces the pitcher and his defensive team (his fielders) one at a

time, and the pitcher and all fielders can focus almost all of their defensive attention on the

hitter at hand. Consequently, if batter A is a better hitter than batter B, the opposing team

has little scope to transfer defensive resources from B to A, equating hitting margins. More-

over, even in the unlikely event that they could transfer resources in a significant manner,

it is not clear that this strategy would be optimal: such transfers would only be optimal if

the extra resources expended on better hitters would have a greater impact in lowering their

success rate than expending those same resources on weaker hitters.

Additionally, in contrast to basketball, where the offensive team is free to allocate the

shot to any player in the game, in baseball the offensive team does not choose which hitter

10The number for other years is similar and this has only changed gradually from year to year. Themoderately higher expected points realized from the average 3-point shot (1.09) compared to the average2-point shot (.97) can be readily explained by the fact that failed long shots are significantly more likely tolead to easy scoring opportunities for opponents (on fast breaks).

11Even after adjusting for 3-point shots, there are a number of other reasons that expected points per shotonly approximates shot value. For example, there is a greater chance that the ball will be stolen by opponents(and hence no shot taken) when passing to a teammate that is inside (close to the basket), and consequentlyinside shots should have a compensating higher likelihood of success, as they do. Additional reasons includethe following: Some shots are more likely to draw fouls, yielding additional scoring opportunities; some shotsare taken under greater time pressure (i.e., the shot clock is running out); and some shots are inframarginalwhen the defense breaks down (i.e., fast breaks). These specific details are not important for our argumenthere, but to note that in accord with the hypothesis of sufficient endogeneity to equate marginal value of shotsacross players, that deviations between expected points and shot value are matched, at least qualitatively,by an offsetting difference in scoring percentage.

12We speak of batters here; the argument for pitchers is essentially the same.

8

Page 10: SSRN-id2358747

gets to bat at each moment based on likely success, but rather, each hitter in the line-up

takes their turn at bat, and after hitting each player only gets to bat again after the other

eight players in the line-up have all batted. Consequently, the offensive team has little scope

to equate margins as well.

The general inability to transfer defensive resources across hitters should not to be con-

fused with the ability to make a number of optimal defensive adjustments for each hitter

that do not involve a transfers across players. Pitchers choose their type of pitches (i.e.,

fastball, change-up, slider) and the location of their pitches (i.e., inside, outside, high, low)

based on each hitter’s history at hitting such pitches. And fielders regularly shift in the field,

based on the hitter’s proclivity for hitting to certain spots, the game situation, and the type

of pitches likely to be thrown. These common adjustments, however, do not involve moving

resources from one hitter to another, but rather are made optimally for each hitter. Nor

do they consume a limited defensive resource, thereby restricting the ability to make such

adjustments for subsequent hitters. Consequently, there is no reason to expect such adjust-

ments to equate margins across players: if hitter A is better than hitter B after taking into

account the optimal adjustments for each hitter, there is no significant manner to transfer

some defensive resources from B to A.

There are a few minor adjustments a defensive team can make that may transfer some

defensive resources from one hitter to another, but these are rather limited in scope. The

most significant adjustment a defensive team can make across hitters is to “pitch around” a

good hitter; that is, either to avoid the hitter by intentionally walking him (putting him on

first base without giving him the opportunity to hit), or by not giving him a good pitch to hit

(and consequently, greatly increasing the chance that he will get a walk). Insofar as a walk

is significantly more costly to the defense when first base is already occupied, walking one

hitter makes walking the next hitter more costly, thereby providing a link across hitters.13

However, the ability to “pitch around” a hitter is limited by the fact that in an average

game situation, a walk is a well above average outcome for even the best of hitters (and

correspondingly, a below average outcome for the pitcher). Consequently, such a strategy

is primarily reserved for particular, and fairly uncommon, setting – in particular, when a

strong hitter is at bat, with men on base, with first base open, towards the end of a close

game, and with a weaker hitter to follow. (This is precisely the setting when a base hit will

do the most damage relative to a walk.) Significantly for our purposes, such a strategy, even

if it was far more common than it in fact is, would not equate margins across different hitters

for the statistics we consider. This is because all of our statistics save for the walk and on

base statistics are calculated exclusive of walks. For example, we consider the probability of

13Indeed, batters are almost never intentionally walked when there is a baserunner already on first base.

9

Page 11: SSRN-id2358747

hitting a home run in a given at bat conditional on the recent history of home runs per at

bat, where at bats are defined by plate appearances by the batter exclusive of walks.14 So

when a batter receives a walk, this will not count as an observation with our other three

statistics, and as such, walks will not directly affect our measure of hotness. Moreover, we

can use the prevalence of walks to observe endogenous behavior. It is well known that good

hitters walk more frequently than bad hitters; we would also expect to see hot hitters, as

measured by any of our four other statistics, walked more frequently than cold hitters.

A team can also change pitchers, bringing in a better pitcher, a more rested pitcher,

or a better matched-up pitcher, to face a better hitter. However, such opportunities are

also very limited, and we always control for the quality of the pitcher that the hitter faces.

The scope for such replacements are limited due to the rule that replaced players cannot

re-enter a game. Given limited numbers of pitchers, and the need to rest pitchers across

games appropriately (due to the strain of pitching on the arm), teams are very careful in

their choice to replace pitchers. Typically, a team uses only 2-4 pitchers in a game, with the

starter expected to pitch most of the game, and relief pitchers used in the later innings. Most

pitcher replacements come at predictable times (based on the number of pitches the pitcher

has thrown, the inning in the game, or the use of a pinch hitter) rather than based on the

particular hitter to be faced. One important exception to this is that at critical moments

late in a close game, teams frequently do change pitchers in consideration of the particular

hitter to be faced (and similarly, teams change hitters in consideration of the pitcher to be

faced). The primary consideration for such changes is typically the platoon advantage – it

is generally advantageous (disadvantageous) for the pitcher to be facing a hitter of the same

(opposite) handedness as the batter. We control for the platoon situation (that is, whether

the pitcher and batter are of the same or opposite hands) in all of our tests.

Some commentators sometimes speak of one further possible manner that defensive re-

sources can be transferred across hitters. In particular a pitcher could conserve energy for

one particular hitter; “reaching back for something extra” when facing a better hitter. To

the extent that this is possible, this would serve as a defensive adjustment that transfers

resources across hitters. However, there is little to no evidence that this phenomenon is

important, or even exists.15

Beyond these few adjustments that we have argued are limited in scope, we know of

14Walks (as well as a few rarer outcomes; i.e., hit by pitcher and balks) are excluded from at bats inthe definition of these standard baseball statistics; this does not require any adjustment on our part fromstandard practice.

15Often this is couched in terms of a pitcher throwing harder for key batters. However, radar gun readings,which record pitch velocity for all pitches thrown, do not reveal any difference in the velocity of pitches thatdifferent batters face from a given pitcher. Moreover, as noted above, it is not clear this strategy, even ifpossible, would be optimal.

10

Page 12: SSRN-id2358747

no other defensive strategies that involve a significant transfer of defensive resources across

hitters. Once again, what is critical here is that hitters are faced sequentially, and the defense

can optimize strategically for each hitter, unlike in basketball, where all shooters must be

defended simultaneously. And given this limited scope for transfers of defensive resources

across hitters, there is no reason to expect margins to be equated across hitters. In contrast

with basketball shooters, better hitters and pitchers should realize higher success rates than

weaker hitters and pitchers.

2.2 Other Sports: Endogeneity and Hot Hands in General

We have gone on at some length here detailing the different scope for endogenous responses

in basketball and baseball in order to convey a critical difference with respect to the hot hand

between the two sports: In basketball, endogenous responses are available that are likely to

lead to the equating of shooting success rates across teammates, whereas such endogenous

responses do not exist in baseball. Moreover, if sufficient endogenous responses exist in

basketball to equate margins across players of different long-term abilities, these responses

should also adjust to equate margins across transitory differential ability (i.e., hot hands) as

well. Put simply, under the hypothesis that there are transitory elements of ability across

all sports, one should not expect to find evidence for a hot hand in basketball shooting

percentages, but one should expect to see it in baseball hitting and pitching success rates.

We hypothesize that such transitory components in ability are typical in sports, and that

one should find evidence for a hot hand in a particular sport if and only if there is limited

scope for endogenous responses which transfer defensive resources across players. Thus, we

would predict that golf, bowling, track and field, and shooting (all of which admit no defense),

and baseball and tennis (which have limited or no scope for transferring defensive resources

across players), should all exhibit evidence of a hot hand. Conversely, soccer, football, hockey,

and basketball, all of which involve fluid defenses that choose how to allocate resources across

players, as well as offenses that choose how to allocate scoring opportunities across players,

should find limited or no evidence for hot hands in shooting/scoring probabilities.

Since endogenous responses, when available, should equate margins both across perma-

nent and transitory differences in abilities, we further hypothesize that one straightforward

manner to identify a priori whether a sport admits a sufficient endogenous response to negate

the effect of a hot hand or not would be to look at the unconditional variation in success

rates across players for the sport or statistic in question. If there is little variation, and if

variation is not correlated with other measures of ability (such as, say, market pay), this

is indicative of an endogenous response which equates margins. In contrast, if there is a

11

Page 13: SSRN-id2358747

lot of variation, and this variation is related to other measures of ability, this is indicative

of the absence of such an endogenous response. That is, we argue that one can identify

the importance of endogenous defensive responses from observing differences in long-term

performance. We then hypothesize that this will in turn predict whether there are responses

to short-term variations in performance, i.e., whether one will observe a hot hand or not.

That is, the fact that Kobe Bryant’s shooting percentage (as well as that of other bas-

ketball stars) is not different from the league average, is indicative of the existence of strong

endogenous responses in basketball which equate margins. Consequently, one should see

such responses to perceptions of streakiness in basketball as well. Conversely, the fact that

Albert Pujols’ batting average and home run rate (as well as that of other baseball stars)

are consistently and predictably far above average, year in and year out, is indicative that

margins are not equated in baseball. Consequently, we would predict that one should not

expect to see an endogenous response negate the effect of streakiness either, giving rise to

a hot hand in baseball data. Presently we explore baseball data, and leave this general

hypothesis regarding endogenous responses and hot hands to future analysis.16

2.3 Related Literature

Since GVT, a vast literature on streakiness in sports – primarily conducted by statisticians

and sports enthusiasts – has developed. Much of this research attempts to identify the hot-

hand effect across a variety of sports.17 Bar-Eli et al. (2006) provides a comprehensive review

of this literature. To summarize, they conclude that “the empirical evidence for the existence

of the hot hand is considerably limited” and that while a number of the studies they survey

claim to find evidence in favor of the hot hand, “when one closely examines the results,

demonstrations of hot hands per se are rare and often weak, due to various reasons...”. Our

results lie in stark contrast to these conclusions.

Concerns related to endogenous responses in basketball have been noted from the very

beginning. Yet, they have often operated in the periphery, whereas we argue they are

fundamental to any identification strategy. As mentioned earlier, GVT attempt to address

these concerns by obtaining similar results using free-throw performance and a “controlled”

16The difference between field goal shooting and free-throw shooting in basketball is very suggestive here.As we have already noted, there is no discernable difference between the field goal percentage of the playersconsidered the best offensive players in basketball and average players. In sharp contrast, there is a verysignificant difference in free-throw shooting (which is not defended and which does not permit an endoge-nous response) between players, with players who are considered better outside shooters predictably andpersistently shooting far better in free-throws than those who are not considered good outside shooters.

17See for example Adams (1992, 1995); Vergin (2000); Koehler and Conley (2003); Larkey et al. (1989);Forthofer (1991); Wardrop (1995, 1999); Gilden and Wilson (1995); Klaassen and Magnus (2001); Frameet al. (2003).

12

Page 14: SSRN-id2358747

shooting experiment. However, their attempts to do so lack power and are therefore unable to

reject plausible models of streakiness. Other studies have attempted to address endogeneity

by looking at sports in which endogenous responses are less of a concern. These studies are

also generally underpowered and deliver (at best) weak statistical evidence for a hot-hand

effect and have little to say about the economic magnitudes. Not surprisingly, the most

supportive evidence in favor of the hot hand has been found in horseshoe pitching (Smith,

2003) and bowling (Dorsey-Palmateer and Smith, 2004), where the scope for endogenous

responses is virtually non-existent. We differ from these papers in that: i) we have far

more data and hence far more power to our tests, and ii) we consider the hot hand and

endogenous responses to it from an economic (i.e., equilibrium) perspective, providing an

equlibrium theory of where one should expect to find and where one should not expect to

find evidence for a hot hand.

Our argument – that in equilibrium, one should not expect to see a hot-hand effect in

the basketball shooting data even if it exists – is most closely related to Aharoni and Sarig

(2012). They provide several pieces of indirect evidence that are consistent with both en-

dogenous responses and the hot hand. Namely, that players with strong recent shooting

performance (two or three consecutive successes) stay in the game longer and are more likely

to take the next shot. Further, the presence of a player with strong recent shooting perfor-

mance has a positive influence on the shooting percentage of their teammates. Moreover,

while teammate of a hot shooter exhibit higher shooting percentages (presumably due to the

increased defensive attention on the hot shoooter), the hot shooter’s shooting percentage,

conditional on taking the next shot, remains unchanged. If instead the hot hand were a

cognitive illusion then the shooting percentage of the hot shooter should decrease following

the additional defensive attention.18 We start with a similar motivation – i.e., that defensive

adjustments in basketball will equate shooting margins, thereby concealing shooting streak-

iness, but instead look for direct evidence of streakiness in another sport where we argue

that such adjustments do not exist. As such, we look for, and expect to find, direct evidence

of streakiness in baseball, where Aharoni and Sarig (2012) must look for indirect evidence

in basketball.

Within baseball, statistical analysis has largely failed to identify support for streakiness

(Siwoff, 1988; Gould, 1989; Vergin, 2000; Albert and Bennett, 2003). Most of this analysis

is conducted at the player or player-year level and involves conducting a runs test (or Wald-

18Additional playing time and a higher propensity to shoot the next shot could also be explained byan incorrect perception (and hence suboptimal decision-making) on the part of players and coaches. Forexample, if a coach incorrectly believes that player A on the opposing team is hot and (suboptimally) devotesmore defensive resources to that player, then less defensive resources will be devoted to the remaining playerscausing their shooting percentage to rise even if player A’s streak was purely random.

13

Page 15: SSRN-id2358747

Wolfowitz test). Albright (1993) conducts regressions for the category of hits at the player-

year level. He concludes that actual performance appears to be generated in a manner

consistent with an i.i.d. process. Stern and Morris (1993) point out that Albright’s analysis

is biased against finding a streakiness effect and lacks power. The simulations we conduct in

Section 5 are in agreement with Stern and Morris (1993). The key methodological differences

between our work and the previous literature, is that we estimate regressions across using the

full panel of data and look at both batters and pitchers in a variety of statistical categories.

By doing so, we obtain additional power and overcome the downward bias. Our results

strongly reject an i.i.d. data-generating process and suggest the magnitude of the hot-hand

effect is significant and strategically important.

3 Data and Methodology

3.1 Data

We use data from Major League Baseball for the 2000-2011 seasons, which is publicly avail-

able from Retrosheet.org. Each observation in the Retrosheet data set is an event, defined to

be any occurrence that changes the state of the game, not including the pitch count. Most

of the events correspond to a plate appearance (PA), which occurs when a batter completes

his turn with a hit, walk, out or reaching base on an error.19 Each observation also includes

roughly 90 variables that describe the state of the game prior to and following the event

(e.g., batter, pitcher, score, outs, inning, base-runners, stadium, date, etc.). The data from

2000-2011 contains contains approximately 2 million observations.

3.2 Methodology

Though we do not put forth a specific model of how outcomes are determined in baseball

(or other sports), it will be useful to delve a bit into the notion of “streakiness” in order to

justify the methods used in our analysis.

3.2.1 What is Streakiness?

It is worth discussing exactly what we mean by streakiness as the term is has been used

loosely among sports fans as well as in the literature. We define a streaky player as one

whose (unconditional) likelihood of success varies over time at a relatively high frequency

19Retrosheet distinguishes between a total of 25 different types events. There are several types of events,such as a stolen base or pick off, that do not constitute a plate appearance.

14

Page 16: SSRN-id2358747

(a “streak” lasts perhaps a week to a month).20 This notion seems to capture what is

conventional usage of streakiness or the hot hand effect. For example, in 1997 Joey Cora

batted 0.475 during a 24 game span (roughly 4 weeks) but only 0.262 during the rest of

the season. Here, the frequency (or duration) is crucial to the notion of being streaky.

Lower frequency variation is generally not considered to be indicative of streaky behavior.

For example, many players annual batting average improves during their early years and

declines in years prior to retirement. Yet few (us included), would argue that this is evidence

of streaky behavior. Rather, it is simply long term time-variation in ability. Thereby, in

order to test for the presence of streakiness, it is necessary to distinguish it from this longer

term variation in ability.

In the literature, most tests are performed at the player level. The advantage of perform-

ing tests at the player level is that they do not require an additional distinction between the

player’s ability and whether the player is hot or cold. Instead, under the null hypothesis,

the ability of the player on which the test is performed is constant throughout the sample

period (often a year), and the test is whether recent past performance predicts the outcome

of future attempts. As we will discuss in Section 5.2, due to the small sample size, this

approach generally lacks power to reject simple, yet fairly plausible, models of streakiness.

Using panel data affords a much greater number of observations, which gives us the power

necessary to identify the presence of streakiness. However, it also requires us to distinguish

between a good player and a hot player. That is, we must control for each player’s ability

when trying to detect any deviations from it.

We conceive of ability as consisting of two components. A long-term component that

captures attributes of the player that are unlikely to change over the course of a season (e.g.,

raw talent, skill, speed, strength, hand-eye coordination) and a short-term component that

captures higher frequency variations attributable to factors such as confidence, attitude and

minor adjustments in technique. The question is then whether and by how much this short-

term component influences the outcome of the current attempt. The basic test will look at

how much predictive power recent performance has on the current attempt after controlling

for long-term ability and other observable variables.

3.3 Regression Model

We estimate a (reduced-form) regression model (both logistic and OLS) for ten different

outcomes of particular interest (five for batters and five for pitchers). Namely, whether

the plate appearance resulted in a hit, homerun, strikeout, walk as well as whether the

20Whether recent success has a causal impact is not crucial to, nor excluded from, this definition.

15

Page 17: SSRN-id2358747

batter reached base. When considering each of the first three outcomes (hit, homerun and

strikeout), we drop all plate appearances that did not result in an official at bat. At bats

are plate appearances excluding walks as well as several other rare occurrences in which the

plate appearance terminates without the batter attempting to earn a hit (e.g., when the

batter is hit by a pitch or hits a sacrifice bunt). Letting Yit denote an indicator for whether

the outcome Y occurred in player i’s attempt at time t, we estimate the following OLS and

logistic regression equations:

Yit = α + γStateit + βXit + εit (1)

log

(Pit

1− Pit

)= α + γStateit + βXit (2)

where Pit = Pr(Yit = 1), Stateit, is a measure of how well player i has performed recently

and Xit denotes a vector of control variables, which includes the long term ability of player

i, as well as the ability of the pitcher or batter he is facing. For each outcome, we estimate

the model in which i refers to a batter in a given year and the model in which i refers to

pitcher in a given year.

The coefficient of interest, γ, captures the hot hand effect. Under the null hypothesis that

this effect does not exist, we would expect that γ = 0. A positive coefficient is indicative of

streaky behavior whereas a negative coefficient is indicative of reversals.

When the dependent variable corresponds to whether the batter is walked, γ, can also

be interpreted as capturing an endogenous response of the defense. That is, suppose the

data indicates that a batter, who has performed particularly well recently, is more likely to

be walked (γ > 0). One explanation is that the batter’s ability for discerning balls from

strikes is higher than usual and thus his plate appearance is more likely to result in a walk

(i.e., he is streaky). An alternative story, is that the pitcher responds to the batter’s recent

success by throwing pitches that are more difficult to hit and thus more likely to be outside

the strike zone and result in a walk. Such a response would be most easily justified in

situations where the batter was more likely to get a home run (or extra base hit) rather than

simply a single, in which case he reaches the same base as if he was walked. Thus, when

estimating the regression coefficients for whether the batter gets walked, we will attempt to

distinguish between these explanations by including measures of both long term ability and

recent performance with respect to both hits and home runs (in addition to these measures

with respect to walks themselves).

16

Page 18: SSRN-id2358747

3.4 Measuring State and Ability

In the baseline model, we use the player’s recent success rate (e.g., batting average) to

estimate his current state. More specifically, in our baseline regression model, we estimate

the state of a player using his recent success rate in his last L attempts.

Stateit = RLit ≡

1

L

L∑k=1

Yi,t−k. (3)

In selecting the history length (L), one encounters the following tradeoff. Longer histories

contain less noise but are also less relevant for measuring streakiness (and more relevant for

measuring long-term ability). Shorter histories contain more noise, but are also more relevant

for capturing high frequency variation. For the majority of our analysis we use L = 25 (and

drop the L superscript). This history length corresponds roughly to performance in the 5

most recent games for a batter and takes roughly a week for a “regular” batter. To account

for this, we also require that these 25 most recent attempts occurred within the last 10

days so as to ensure the measure captures “recent” performance. We arrived at L = 25 for

two reasons. First, we feel this history length correctly balances the tradeoff above using

the conventional notion of how long a typical streak lasts (i.e., several weeks to a couple

months). Second, we experimented with a variety of different history lengths using a single

season of data, for which we obtained the stronger results using L = 25, than alternative

history lengths in which we obtained qualitatively similar (though quantitatively weaker)

results. We also report estimates using the full data set for two alternative lengths L=10

and L=40. We also consider several additional methods for measuring a player’s state, (see

Section 3.6.1), for which we obtain similar results.

We consider a variety of methods to control for the player’s ability. In the baseline

model, we do so by calculating the player’s rate of success during the season not including a

“window” of W attempts before and after the current attempt.

Abilityit =1

Ni − 2W×

∑s/∈[t−W,t+W ]

Yis (4)

The window is needed to avoid biasing the estimates for a wide-range of data-generating

processes. To understand the need for the window, consider an attempt in which Rit > Ri,

where Ri is player i’s overall (unwindowed) success rate over the course of the season. By

construction, the success rate in all other attempts (excluding the last L) must be strictly

lower than Ri and therefore it is mechanically less likely that player i is successful in his next

attempt. Thus, the measurement of permanent ability must not overlap with the measure

17

Page 19: SSRN-id2358747

of streakiness. We chose W = 50 in an effort to reduce any lasting effects of the player’s

(potential) streakiness (recall that we use L = 25 for the baseline model), while at the same

time balancing the need to maintain a statistically meaningful estimate of the players ability

based on Ni − W observations.21 In this vain, we drop observations with less than 100

attempts outside the window, though our results are strengthened if these observations are

included. We also consider several other ways to control for ability (see Section 3.6.1), for

which we obtain similar results.

3.5 Control Variables and Endogenous Defensive Responses

In addition to controlling for the state and ability of the player, additional control variables

were included in the regressions. In particular, the ability of the opposing batter/pitcher,

the stadium and whether the pitcher and batter were of the same hand. Ability of the

opposing batter or pitcher (player j) was measured by computing the success rate of all

batters (pitchers) who faced pitcher (batter) j in that year excluding the current at bat. For

example, we use the batting average of batters who faced pitcher j over the course of the

year excluding the current at bat. We control for the stadium by calculating the success rate

of all batters in that stadium over the course of the season. We used a dummy variable to

control for whether the batter and pitcher were of the same hand. The dummy variable is

equal to 1 if the pitcher’s throwing arm is the same as the side from which the batter hit.

As we argued in Section 2, there is a limited set of ways that a defensive baseball team can

shift resources toward better (or hotter) players and thereby equate margins across different

hitters.22 Furthermore, to the extent that they can, most of these responses can be controlled

for in the data. Most significantly, a team may choose to ”pitch around” or intentionally

walk a good hitter, or it may bring in a better pitcher to face a good hitter. In both cases,

these responses are accounted for in our analysis. First, we drop all walks when predicting

hits, homeruns and strikeouts. Second, in the regression analysis, we control for both pitcher

ability and the platoon effect.

Our data set provides a wide range of additional situational variables that one might

expect to predict the success of a particular outcome (e.g., inning of at bat, number of outs,

number of runners on base or in scoring position). However, similar to Albright (1993), these

variables were not found to be statistically significant nor do they alter the predictions of

the model and are omitted from our reported results.

21Using this measure will result in unbiased estimates if the true data generating process is i.i.d. Further,it is consistent asymptotically if as Ni →∞, we let W →∞ at a rate slower than Ni so that Ni−2W →∞.

22This is evidenced by the fact that a batters ability is generally measured by his batting average (orsuccess rate), which is in stark contrast to basketball.

18

Page 20: SSRN-id2358747

3.6 Robustness

In addition to the baseline regression, we consider a number of alternative specifications.

Most of these involve using other methods for measure both both state and ability.

3.6.1 Alternative Measures of State

In the baseline model, the measure of state, will tend to overweight players with higher

ability. To address this, we consider two alternative strategies. The first is to use dummy

variables to indicate whether a player, based on recent performance, appears to be hot, cold

or neutral. To do so, we use an estimate in the form of Stateit = [Hotit Coldit].

• Additive Cutoffs: A player is considered hot (cold) if his recent success rate is A (B)

percentage points above (below) his long term success rate or ability. That is,

Hotit = 1 {Rit ≥ Abilityit + A}

Coldit = 1 {Rit ≤ Abilityit −B}

where A, B are chosen so that the average player is hot, in 5% of all at bats and cold

in 5% of all attempts. For this, and each of the specifications using dummy variables

below, we obtained similar results for alternative using 2.5% and 10% thresholds. We

reported these in the categories of batting average and on base percentage for batters

in Tables A.17 and A.18.23

• Proportional Cutoffs: A player is considered hot (cold) if his recent success rate is

q > 1 (r < 1) times above (below) his ability. That is,

Hotit = 1 {Rit ≥ q · Abilityit}

Coldit = 1 {Rit ≤ r · Abilityit}

where q, r are chosen so that the average player is hot in 5% of all at bats and cold in

5% of all at bats.24

As a convention, we maintain the same criterion across both batters and pitchers, despite

that outcomes which are considered “good” for batters, such as hits and homeruns, are

considered “bad” for pitchers and vice versa (strikeouts). The advantage of this convention

23Results for other categories are available upon request.24In the category of home runs and walks, Rit takes a value of zero more than 5% of the time for the

typical player. Therefore, there does not exist an r > 0 such that the average player is cold 5% of the time.Therefore, in these two categories the category “cold” is omitted.

19

Page 21: SSRN-id2358747

is to maintain the same sign prediction for being hot (or cold) across all outcomes. That is,

regardless of whether an outcome is good or bad for the player, if the outcome has occurred

more frequently recent past, than if players are streaky, we would expect the outcome is

more likely to occur in the current attempt and hence a positive (negative) coefficient on the

hot (cold) dummy variable.

The second strategy we employ is to compute how the players current recent history, Rit,

compares to his overall distribution of histories. In this case, we construct an estimate of the

players state, which corresponds to the fraction of the time that his recent history is above

the distribution of his histories in the opposite half of the season. Letting S(t) ∈ {1, 2}denote whether the attempt t occurred in the first or second half of the season, we let

Stateit =1

|{s /∈ S(t)}|∑s/∈S(t)

1{RLit > RL

is} = freq{RLit > RL

is, ∀s /∈ S(t)} (5)

Making use of this statistic for the player’s state, we first run a a regression of the form

given by equation (1). We then construct dummy estimates of the form given above using

the following method.

• Distribution Based Cutoffs: A player is considered to be hot in attempt t if Rit is in

the right tail (top 5%) of the distribution from the opposite half of his season and cold

if Rit is in the left tail (bottom 5%).25

3.6.2 Alternate Measures of Ability

An alternative way to control for player ability is by using player fixed effects. We estimate

the following regression model:

Yit = αi + γStateit + βXit + εit (6)

Estimating (6) using standard techniques generates a downward bias on γ due to the lagged

dependent variables in the State variable.26 The intuition for the bias is essentially the same

as the logic given above for why a window is needed.27 Therefore, we have also considered

25It is necessary to use the opposite half of the season for this criteria in order to avoid the overlap bias.To see this, let ti denote the time at which i’s recent success rate is at its maximum point and assume forsimplicity that ti is unique. Since Rit+1 < Rit, it must be that Yit = 0. In other words, if a player has beenvery successful recently relative to the rest of the entire season, then in his next attempt the unconditionalprobability of success must be necessarily lower.

26See, for example, Wooldridge (2010, p 373).27In fact, including a player (player-year) fixed effect is identical to using the lifetime (annual) batting

average of the player as an ability control.

20

Page 22: SSRN-id2358747

three additional measures to control for the players ability; the player’s success rate in the

previous year, the entire sample omitting the season in which the observation takes place and

the players’ success rate over the entire sample including the season in which the observation

takes place. For each of these measures, we obtain similar regression estimates.28

3.6.3 Test for Serial Correlation

Though we remain agnostic as to a model of streakiness, most plausible models that come to

mind generate serial correlation in the error terms (after controlling for player ability). Thus,

we test for serial correlation using a Wooldridge test. We run the test for all five categories

using batter-year as the longitudinal variable (i) and including controls for pitcher ability,

the stadium and the platoon effect. As before, in the categories of hits, homeruns and

strikeouts, we omit plate appearances that did not result in an official at bat. In all five of

the categories for batters, we strongly reject the null-hypothesis of no serial correlation. The

same result obtains when restricting to various subsamples of the data (e.g., batter-years

with at least 200 at bats).

4 Results

Regression estimates are located in the Appendix. Standard errors are clustered at the player

level. Tables A.5-A.14 contain the estimates for each of the outcomes of interest, both for

batters and pitchers. The results are quite striking. In each, the estimate for γ is both

consistent with a hot-hand effect and statistically significant at a 1% confidence level when

using the players recent success rate, RLit, as the measure of state. This is true for both the

OLS and logistic specifications (Columns (1) and (2)). For the case of the linear model, the

coefficient can be interpreted as the increase (in percentage points) in the likelihood of the

outcome given a one percentage point increase in RLit. For example, if a players recent on

base percentage increases from say 0.200 to 0.500, then the probability of getting on base

in the current attempt increases by 1.8% (or 18 points). This corresponds to just over a 5%

increase in the on base percentage for a typical batter.

The third column of Tables A.5-A.14 contains the estimates when using the fraction of

attempts for which the current recent history exceeds the distribution of recent histories in

the opposite season half (the proxy for state in (5)). Here again, we obtain statistically

significant results consistent with streakiness.

28While the latter suffers from the overlap bias, the estimator is still consistent and the number of obser-vation is significantly larger than at the season level.

21

Page 23: SSRN-id2358747

Columns (4)-(6) contain the results when using dummy variables as a proxy for whether

the batter or pitcher is hot or cold. For hits, we find that hot batters are roughly 1.6%

(16 batting points) more likely to get a hit than cold batters. Comparing this magnitude

to the summary statistics in Table A.1-A.2, corresponds to approximately a one quartile

increase relative to the population of other batters (i.e., from the 50th percentile to the 75th

percentile). The results are similar for pitchers.

For homeruns, the median batter hits a homerun in 3% of his at bats. Thus, it is not

possible to sort out the left five percent tail using the proportional or distribution based

methods because in more than 10% of all at bats, a batter has zero homeruns in his last

25 at bats. Using the additive method, we estimate that being hot versus cold leads to 1.2

percentage point increase in the likelihood, roughly a 40% increase for the median batter.

This also roughly corresponds to a one quartile increase. A similar result obtains in for both

strikeouts and reaching base.

In the case of walks, see Tables A.13-A.14, we again find roughly a one quartile difference

between a player identified as hot and one identified as cold. The absolute change in difference

in percentage points for pitchers is about half as large as it is for batters, though standard

deviation across pitchers is also lower.

Analyzing walks permits a test for whether a defense endogenously responds to recent

performance by batters. As mentioned earlier, the most significant adjustment a defensive

team can make across hitters is to pitch around a good hitter; that is, either to avoid the

hitter by throwing pitches outside the strike zone or intentionally walking him. It is well

known that good hitters walk more frequently than bad hitters; we would also expect to

see hot hitters walk more frequently. Table A.15 provides evidence of such responses. In

particular, it shows recent home runs by a batter are a strong predictor of whether the batter

gets walked in the current at bat. Such a response is economically justifiable because giving

up a home run is substantially more penal than giving up a walk. There is also evidence that

recent hits are predictive of walks, and further that this effect is concentrated in batters with

recent extra base hits rather than singles. This is also consistent with a defense adjusting

optimally; absent runners on base, a single is equivalent to a walk and thus a defense has

little reason to pitch around a batter who is particularly good or streaky in hitting singles.

The remaining tables contain a number of robustness checks. Specifically, Table A.16

contains estimates for the regression equation 6, which includes batter fixed effects as an

alternative control for ability. In all the five categories, the coefficient on the stateis positive.

Further, it is statistically significant in each category with the exception of hits.

Tables A.17-A.18 use 2.5% and 10% thresholds (as opposed to 5%) in order to classify

a batter as hot or cold (see Section 3.6.1). For parsimony, results are reported for batters

22

Page 24: SSRN-id2358747

in two categories (hits and on base). The statistical significance of the estimates is similar

to the 5% case. When using the 2.5% (10%) threshold, the magnitude of the coefficients

are moderately larger (smaller) than the 5% thresholds. One interpretation is that the 10%

threshold leads to a higher frequency of Type I errors (incorrectly identifying the player as

hot or cold), thereby attenuating the estimated size of the effect.

Finally, Tables A.19 and A.20 contain estimates when using a longer (L = 40) and shorter

(L = 10) history lengths in the estimate of the batter’s state as given by equations (3) and

(5). Here again, we report the results for batters in the categories of hits and on base. The

sign of the estimates is the same for all specifications except when L = 10 and using the

distribution-based estimate of state (i.e., equation (5)). The longer history length (L = 40)

results in similar estimates, while the shorter history length (L = 25) results in weaker ones.

5 Simulations

In this section, we consider a simple Markov switching model of streakiness. Our goal here

is not to propose or test a specific model of streakiness; we remain agnostic here to the

specific form that streaky behavior may take. Rather, we present this simple model to

give some intuitive sense of magnitudes of streaky behavior that are consistent with our

empirical results. That is, we will ask what parameters in the Markov model would generate

streakiness on the scale that we have observed, and we will in turn argue that these appear

to be much in line with common intuitive notions of the hot-hand in sports. In order to

do so, we will use the model to generate data, conduct regressions on simulated data, and

compare our empirical estimates to those generated by the simulated data. As expected,

given our imperfect empirical inference of the underlying state, we find the difference between

performance in high and low states in the model is greater than the difference in performance

based on inferences of the state from recent history. The magnitude of the difference is

surprisingly large. In addition, the simulations demonstrate the need for a significant amount

of data (on the order of 105 observations) in order to have the statistical power needed to

identify streaky behavior.

5.1 A Simple Model of Streakiness

A player’s ability follows a Markov-switching process. There are three underlying states:

ω ∈ {Hot(H),Normal(N),Cold(C)}. In the normal state, the player’s (unconditional) prob-

ability of success is µ. When the player is hot (cold) the probability of success is µ + ∆

(µ−∆). The transition matrix, M , is symmetric, with transition probabilities chosen such

23

Page 25: SSRN-id2358747

that the steady state distribution across states is (κ, 1 − 2κ, κ) with the half life of the H

and C state of T1/2.

Note that ∆ captures the magnitude of the streakiness effect while κ measures its preva-

lence.29 These are the two key parameters of interest. We begin by simulating data for

different values of these two parameters and estimating the least squares regression model

given in (1) using RLit as the measure for the player’s current state and the windowed ability

given by (3).

For each pair of values (∆, κ), we simulate 100 data sets according to this process, each

containing 1,000 players with 2,000 attempts per player (a total of two million observation).

Motivated by the category of hits for batters (see Table A.2), the ability of each player, µ,

is drawn from a distribution with mean 0.270 and standard deviation 2.5%. The results are

displayed in Table 5.1, in which we report the mean estimate for γ and the mean t-statistic

across the 100 simulations. In Table 5.2, we perform the same exercise where batters are

assumed to be homogenous (each with µ = 0.270). For simplicity, we assume that all other

parameters are uniform across all players and fix T1/2 = 25, which corresponds to the length

of the history used to evaluate the players state.

Magnitude of Streakiness

Frequency ∆ = 0.01 ∆ = 0.02 ∆ = 0.04 ∆ = 0.05 ∆ = 0.10

κ = 1%0.0105∗∗∗ 0.0120∗∗∗ 0.0139∗∗∗ 0.0158∗∗∗ 0.0289∗∗∗

(2.98) (3.40) (3.94) (4.48) (8.24)

κ = 5%0.0116∗∗∗ 0.0149∗∗∗ 0.0252∗∗∗ 0.0328∗∗∗ 0.0924∗∗∗

(3.28) (4.22) (7.19) (9.39) (27.36)

κ = 10%0.0122∗∗∗ 0.0184∗∗∗ 0.0389∗∗∗ 0.0543∗∗∗ 0.1606∗∗∗

(3.44) (5.23) (11.18) (15.73) (49.63)*** p<0.01, ** p<0.05, * p<0.1

Table 5.1 – Analysis of simulated data for players of heterogenous ability: µ ∼ N (0.270, 0.025) and T1/2 =25. The first number in each box is the mean estimated γ coefficient across N = 100 simulations from thespecification given in equation (1) with RL

it as the measure for the players current state and the windowedability given by equation (3), the mean t-statistic is reported in parentheses.

Both tables yield similar results. First, as expected, higher ∆ generate larger estimates

for the coefficient and its statistical significance. Increasing κ has a similar effect; the more

frequently a player is hot (cold), the more likely it is that strong (weak) recent performance

is indicative of being hot (cold).

The most notable observation comes from a comparison to the empirical estimates from

the baseball data. The analogous estimates to Tables 5.1 and 5.2 are given in column (1)

29When either ∆ = 0 or κ = 0, the data generating process is i.i.d.

24

Page 26: SSRN-id2358747

Magnitude of Streakiness

Frequency ∆ = 0.01 ∆ = 0.02 ∆ = 0.04 ∆ = 0.05 ∆ = 0.10

κ = 1%0.0007 0.0007 0.0037 0.0037 0.0184∗∗∗

(0.21) (0.20) (1.04) (1.03) (5.22)

κ = 5%0.0017 0.0031 0.0138∗∗∗ 0.0217∗∗∗ 0.0821∗∗∗

(0.47) (0.88) (3.91) (6.18) (24.16)

κ = 10%0.0018 0.0073∗∗ 0.0280∗∗∗ 0.0435∗∗∗ 0.1509∗∗∗

(0.52) (2.07) (7.99) (12.51) (46.36)*** p<0.01, ** p<0.05, * p<0.1

Table 5.2 – Analysis of simulated data for players of homogeneous abilities: µ = 0.270 and T1/2 = 25.The first number in each box is the mean estimated γ coefficient across N = 100 simulations from thespecification given in equation (1) with RL

it as the measure for the players current state and the windowedability given by equation (3), the mean t-statistic is reported in parentheses.

of Tables A.5 and A.6, where we estimated a coefficient of 0.0319 for batters and 0.0468

for pitchers. Notice estimate are consistent with a ∆ between 0.04 and 0.05 (40-50 batting

points) and κ between 5 and 10%.30 This corresponds to roughly a 20% increase in the

likelihood of getting a hit for a hot player compared to an average player and a 40% increase

relative to a cold player. Note that this magnitude for ∆ is 5-10 times larger than the

difference in performance between players that we infer to be hot or cold in our dummy

specifications of Tables A.5 and A.6 (columns (4)-(6)). That is, if the Markov model was the

true underlying model, our dummy specifications are sufficiently noisy in identifying true

hot and cold players, that they only yield a fraction of the underlying difference in ability

based on state.31

Thus, while we found roughly a one quartile difference between a player empirically

identified as hot and cold player in the previous section, in order to generate this result

from our Markov model we would need the true difference between hot and cold players

to be on the order of a three quartile difference in performance (roughly 80 to 100 batting

points). GVT surveyed fans on their perceptions of a hot hand, and argued that they were

implausibly large, even if a small effect existed. However, these responses, which were on

the order of 5-10 percentage points, are very much in line with the differences implied by

our simulations; if respondents to the GVT survey believed they were being asked about

the difference in performance between a correctly identified hot player and a neutral player,

without adjusting defenses (which to us is the most natural reading of the question that

30Similar conclusions obtain when estimating alternative specifications. Results available upon request.31Of course, if we knew this Markov model to be the correct model, we could estimate who is hot and cold

with a more efficient procedure. That is not our intent here, as we do not want to test this one particularmodel of streakiness. Rather our point is that true underlying streakiness will be significantly higher thandifferences in performance from empirically inferred streakiness based on recent performance, due to bothnoise and model misspecification.

25

Page 27: SSRN-id2358747

GVT asked), then they gave responses much in line with the parameters of the Markov

model needed to generate our empirical estimates.

5.2 Power

We have argued above that in sports where defenses can adjust freely, one should not expect

to find short-run predictability in player performance, whether or not a hot hand effect

exists. In contrast, this should not be the case in sports in which defensive adjustments are

sufficiently costly. It is notable then that tests in a number of other sports have not found

evidence of a hot hand. However, many of these tests suffer from a second drawback distinct

from endogeneity, that being the lack of power needed to reject their null hypothesis. In

Table 5.3, we use the Markov model to show the number of observations needed to detect

streakiness using our methodology is on the order of 105, while the previous studies that we

are aware of have used data in which the number of observations is at least one order of

magnitude lower.

Magnitude of Streakiness

Observations ∆ = 0.01 ∆ = 0.02 ∆ = 0.04 ∆ = 0.05 ∆ = 0.10

n = 103−0.0707 −0.0475 0.0132 −0.0084 0.0224

(-0.34) (-0.22) (0.18) (0.02) (0.23)

n = 104−0.0065 0.0024 0.0163 0.0278 0.0890∗

(-0.10) (0.07) (0.35) (0.58) (1.90)

n = 105−0.0000 0.0029 0.0155 0.0234 0.0819∗∗∗

(0.00) (0.19) (1.00) (1.50) (5.44)

*** p<0.01, ** p<0.05, * p<0.1

Table 5.3 – This table illustrates the number of observations required to identify streakiness as it dependson the magnitude of the effect (for fixed κ = 0.05). The first number in each box is the mean estimated γcoefficient across N = 100 simulations from the specification given in equation (1) with RL

it as the measurefor the players current state and the windowed ability given by equation (3), the mean t-statistic is reportedin parentheses.

6 Discussion

In this section we briefly discuss three topics: First, we address endogeneity in other sports,

and hypothesize that whether one should expect to find short term predictability in other

sports will depend on the availability of endogenous defensive responses. Second, we discuss

the magnitude of the effects we find, and argue that at a minimum, they appear consistent

26

Page 28: SSRN-id2358747

with conventional notions of hot-hands. And third, we compare the hot-hand in sports

literature with the finance literature, drawing parallels with the efficient market hypothesis,

market responses, and behavioral finance.

6.1 Other Sports

While we have discussed differences between basketball and baseball in detail, an interesting

question is how to identify which sports have sufficient scope for an endogenous response

so as to equate margins and negate the effect of a hot-hand in a given statistic. In some

sports the answer is easy: sports without defenses (i.e., golf, bowling, track and field, etc.)

trivially lack the opportunity for defensive responses. Other one-on-one sports, such as

boxing, fencing, and chess, would also seem to lack such a response, given the inability to

transfer defenses resources across players. (A boxer should always be maximizing his chance

to score points, win rounds, or knock out his opponent, regardless of how hot his opponent

is.) Then there are sports such as soccer and hockey that share the defensive feature of

basketball, where defensive resources adapt freely across offensive players, and one might

expect margins to be equated in these sports. But beyond looking sport by sport at specific

rules of play, here we propose a simple test with which we hypothesize will identify whether a

sport will exhibit short-term predictability in performance. In particular, if defenses are able

to allocate resources across players in a manner that equates margins for a particular sport,

relatively “good” players and “bad” players should have similar unconditional success ratios.

Other measures of a player’s quality (salaries, awards, etc.) should not correlate strongly

with the statistic, nor should individual player’s performance in this statistic exhibit much

correlation across years.32 On the other hand, if defenses are not able to respond to equate

margins in a given sport, one should expect better players to perform better, and to see

these performance measurements persist across years. Hence baseball all-stars should have

higher batting averages and higher home run percentages and lower earned run averages

than average players, but basketball all-stars should not have higher shooting percentages

than average players.

Applying this notion to other sports, we would predict that if and only if one sees margins

equated unconditionally across players should one expect to see margins equated temporal

within players. That is, if and only if a sports’ rules and optimal strategy yields a sufficient

defensive adjustment so that the success rates of players are equated, should one then also

expect to observe adjustments which equate success rates temporally across any given player

32One might expect to see some small correlation, if the endogenous response is significant but incompletein some settings, or if the statistic does not perfectly measure the performance over which the defense isoptimizing (i.e., equating expected shot value in basketball is not the same as equating shooting percentages).

27

Page 29: SSRN-id2358747

regardless of whether he is hot or cold. Thus, we should find predictability in sports statis-

tics if those same statistics correlate with other measures of players ability, and if player

performance in those statistics is persistent across long periods.

6.2 Magnitude of the Hot-Hand

One common response to our results is that perhaps players do exhibit a hot hand, but

even so, it is likely that common perceptions overestimate the magnitude of this effect

(for example, See Rabin). Since we do not have a measure of the magnitude of hot-hand

perceptions with which to compare our results, we cannot dismiss such a possibility. However,

it is notable that the strength of the hot-hand that we find seems very much in line with our

intuitive notions of a hot-hand, and that a number of coaching decisions based on perceptions

of a hot-hand seem to be consistent with this as well. Take, for example, batting average.

As noted in Section 5, we find that the difference in the likelihood of a hit between what

we empirically identify as a hot and cold hitter (i.e., what could be identified using just

recent outcomes) is on the order of about 15-20 batting average points, which corresponds

approximately to about a quartile increase in ranked performance for an average hitter (i.e.,

the difference between the median hitter and the 75th percentile hitter). This would suggest

that a coach should choose to play an average quality reserve over a somewhat above average

starter if the reserve is particularly hot and/or the starter is particularly cold. At the same

time it also implies that it would not generally make sense to bench a very top hitter for

an average hitter, regardless of how cold/hot the two players are. This is very much in line

with conventional coaching wisdom, where marginal starting hitters are shuffled in and out

of the lineup on a daily basis based on their recent performance and a sense of whether they

are hot or not, whereas top starting hitters are rarely replaced, even when they are very

cold. Similarly, top starting pitchers almost always take their regular turn in the pitching

rotation (typically, once every five games) regardless of recent performance as long as they

are healthy enough to do so, whereas marginal starting pitchers will often gain/lose their

starting spots based on good/bad recent performances.33

Furthermore, it is worth noting that a player or a coach might be able to better identify

their own hot streaks or the hot streaks of players they are coaching more efficiently than

we can using only recent performance outcomes. In particular, hot/cold streaks might also

manifest in other physical or psychological displays such as confidence, harder pitches, minor

injuries, etc. observed by the player/coach but not observed by the econometrician. This

33And it is notable that coaches routinely reference recent performance when justifying such line-up moves,saying things like a certain player is “in a groove”, is “swinging a hot bat”, or has been ”throwing very welllately”.

28

Page 30: SSRN-id2358747

would in turn magnify the difference in performance between times identified as hot and cold.

(Recall that in the Markov model we presented above in Section 6, the performance difference

between actual states was several times larger than the performances differences conditional

on observed hot and cold streaks, insofar as observed streaks are a noisy measure of actual

states.) Hence, to the extent this is true, one would expect to see more active strategic

responses to perceptions of player hotness.

6.3 Efficient Markets, Market Responses, and Behavioral Finance

It is useful to draw a comparison between the hot-hand literature and the finance literature.

A basic organizing principle in much finance research is semi-strong market efficiency: One

should not expect to realize abnormal returns (excess returns after a proper adjustment for

risk) associated with firm characteristics that are publicly known. The market should react

efficiently to any known information, and consequently, past prices, current firm characteris-

tics, firm management, strategic advantages, etc. should all be reflected in the market price

if this information is public. This is not to say that such deviations never occur; indeed,

the finance literature is replete with examples of deviations from market efficiency and these

anomalies are a source of a great deal of research. However, as a whole, this principle does a

good job explaining market movements (or the lack thereof) relating to a wide range of firm

characteristics and actions. Consequently, when research finds that a certain firm character-

istic or action does not predict future abnormal returns, the standard interpretation is that

the market has priced this factor correctly, rather than that the factor has no value. And in

order to determine the value of the factor, researchers instead try to find settings where the

factor changes unexpectedly and exogenously, or where the factor existed but its presence

was not known publicly.

The parallel to the hot-hand literature is that just as markets react efficiently to firm

characteristics and equate expected (risk-adjusted) returns, basketball defenses should react

efficiently and equate shot values across players. Basketball teams have much scope to

allocate defensive resources across offensive players, just as market participants have scope

to allocate their resources across firms. Better players should not have a higher shooting

percentage than weaker players, and hot players should not have a higher shooting percentage

(going forward) than cold players. The first of these two predictions is uncontroversial: it

is well-accepted that most NBA stars lack abnormally high career shooting percentages

precisely because they attract more defensive coverage. But in the exact same vein, it

seems that upon finding a lack of short-term predictability in shooting success, the natural

interpretation should once again be efficient defensive adjustments, and not that shooting is

29

Page 31: SSRN-id2358747

random and perceptions of hot-hands are a cognitive mistake.

We focus our tests on baseball, where the ability to transfer defensive resources across

different hitters is very limited, and consequently one should expect short term predictability

in outcomes if a hot hand exists. That is, the rules and play in baseball do not permit a

sufficient endogenous defensive response to equate margins across different players. Our

focus on baseball instead of basketball is analogous with looking for evidence on the value

of a firm characteristic in financial markets when the markets are constrained from freely

reacting to these characteristics - say, due to limited information. In this present study, we

look for and find strong evidence for a hot hand in baseball across all ten statistical categories

that we examine. Furthermore, the magnitude is significant, and corresponds qualitatively

with conventional perceptions.

More generally, the misattribution of equilibrium outcomes to partial equilibrium moti-

vations that we believe underlies the hot hand literature appears common to us. A number of

behavioral economics findings can readily be reinterpreted as optimal equilibrium responses.

For example, results interpreted as managerial overconfidence are often indistinguishable

from optimal managerial decisions under agency, and results interpreted as attention biases

can be reinterpreted as optimal information processing under uncertainty.

Such concerns are closely related to familiar questions of endogeneity that are central to

empirical work in many areas of economics. The care that must be taken when interpreting

correlations between a choice variable and outcomes is akin to the same care needed in

interpreting results in settings where strategic interactions equate margins. Berk and Green

(2004), to cite one of many examples, note that similar returns across mutual fund managers

might not represent similar investment abilities (the standard interpretation), but rather,

could also reflect different choices of equilibrium scale in a setting of differential investing

abilities and decreasing returns to scale. This observation is quite similar to ours: similar

shooting percentages across time (conditional on recent successes) might represent similar

abilities across time (i.e., the lack of a hot hand), or it might instead represent equilibrium

defensive responses.

7 Conclusion

We test for a “hot hand” (i.e., short-term streakiness in performance) in Major League

Baseball using panel data. We consider ten statistical categories, and find strong evidence of a

hot hand in all of them. Moveover, the magnitudes are significant; strong recent performance

relative to being in a neutral state corresponds to roughly a one quartile increase in the

distribution of present performance. Thus, a 50th percentile hitter will hit like a 75th

30

Page 32: SSRN-id2358747

percentile hitter following strong recent performance.

Our results are in notable contrast to the majority of the hot hand literature, which

has found little to no evidence for a hot hand, often employing basketball shooting data.

We argue that a primary explanation for this difference is endogenous defensive responses:

basketball presents sufficient opportunity for defensive responses to equate shooting prob-

abilities across players whereas baseball does not. As such, much prior evidence on the

absence of a hot hand despite widespread belief in its presence should not be interpreted as

a cognitive mistake—as it typically is in the literature, but rather, as an efficient equilibrium

adjustment.

We also provide a simple heuristic for identifying a priori which sports are likely to

permit such a response and discuss potential implications for identifying the hot hand effect

or ability in other settings. Defenses should treat permanent and transient skill differences of

offensive players in a similar manner: defenses are likely to adjust to equate margins across

hot and cold players if and only if they also adjust to equate margins across good and bad

players. Hence we should expect to see streakiness in a statistic if and only if we also see

permanent quality differences across players in the same statistic.

Another drawback in much of the literature is a lack of power in the tests considered.

Baseball panel data – because it is so extensive – allows us to address such issues. In

simulations we address both the questions of what type of streakiness parameters would be

necessary in order to generate our observed effects, and how much data is necessary in order

to reliably identify or reject the presence of a hot hand. We consider a simple Markov model

and find that our results are consistent with a rather strong hot hand effect, and additionally,

the need for a significant number of observations to effectively distinguish such a model from

random outcomes.

A final remark on the hot-hand as a “behavioral” phenomenon is in order. We opened

this paper noting that the absence of a hot-hand in sports is a common motivating example

of behavioralists. Our results challenge the common behavioral interpretation of the hot

hand as a cognitive mistake, but moreover, we question whether the hot hand bias should

have ever been interpreted as a behavioral phenomenon, even if the standard interpretation

were true. The behavioral mistake that the prior literature purports to show is a tendency

to infer patterns in random data. Conversely, however, behavioral finance research has

focused on challenging the efficient market hypothesis prediction of no predictability in asset

returns.34 Indeed, this literature argues that psychological considerations and inferential

mistakes, together with limits to arbitrage, should lead to predictability in market returns

34Predictability could also be due to time-varying required rates of return, but that is tangential to ourpoint here.

31

Page 33: SSRN-id2358747

(see Shleifer, 2000; Hirshleifer, 2001). Here it is rather interesting that proponents of efficient

markets have argued along the same lines of hot-hand behaviorists, noting the tendency to

infer patterns in random data. And similarly, sports fans, players, and coaches, argue (as do

we) that psychological factors are an important mechanism underlying hot-hands in sports.

Our point in drawing this analogy between the asset pricing literature and the hot hand

literature is not to say that the outcome in these areas need be related in any manner;

it could well be the case the stock market exhibits predictability, yet at the same time

sports players performance does not. What does seem odd to us, however, is that both the

existence of predictability (in the market) and the purported lack of predictability (in sports

performance) are both taken as evidence in favor of a behavioral approach. One can only

wonder whether our results will also be hailed as further behavioral evidence.35

35Psychology affects sports performance!

32

Page 34: SSRN-id2358747

References

Adams, R. M. (1992): “The’Hot Hand’revisited: Successful basketball shooting as a func-

tion of intershot interval,” Perceptual and Motor Skills, 74, 934–934.

——— (1995): “Momentum in the performance of professional tournament pocket billiards

players.” International Journal of Sport Psychology.

Aharoni, G. and O. H. Sarig (2012): “Hot hands and equilibrium,” Applied Economics,

44, 2309–2320.

Albert, J. and J. Bennett (2003): Curve ball: Baseball, statistics, and the role of chance

in the game, Springer.

Albright, S. C. (1993): “A statistical analysis of hitting streaks in baseball,” Journal of

the American Statistical Association, 88, 1175–1183.

Bar-Eli, M., S. Avugos, and M. Raab (2006): “Twenty Years of Hot Hand Research:

Review and Critique,” Psychology of Sports and Excercise, 7, 525–553.

Berk, J. and R. Green (2004): “Mutual fund flows and performance in rational markets,”

Journal of Political Economy, 112, 1269–1295.

Dorsey-Palmateer, R. and G. Smith (2004): “Bowlers’ hot hands,” The American

Statistician, 58, 38–45.

Forthofer, R. (1991): “Streak shooterThe sequel,” Chance, 4, 46–48.

Frame, D., E. Hughson, and J. Leach (2003): “Runs, regimes, and rationality: The

hot hand strikes back,” Tech. rep., Working paper.

Gilden, D. L. and S. G. Wilson (1995): “Streaks in skilled performance,” Psychonomic

Bulletin & Review, 2, 260–265.

Gilovich, T., R. Vallone, and A. Tversky (1985): “The Hot Hand in Basketball: On

The Misperception of Random Sequences,” Cognitive Psychology, 17, 295–314.

Gould, S. J. (1989): “The streak of streaks,” Chance, 2, 10–16.

Hirshleifer, D. (2001): “Investor psychology and asset pricing,” Journal of Finance, 56.

Huizinga, J. and S. Weil (2008): “Hot Hand or Hot Head: The Truth About Heat

Checks in the NBA,” University of Chicago Working Paper.

33

Page 35: SSRN-id2358747

Klaassen, F. J. and J. R. Magnus (2001): “Are points in tennis independent and

identically distributed? Evidence from a dynamic binary panel data model,” Journal of

the American Statistical Association, 96, 500–509.

Koehler, J. and C. Conley (2003): “The’Hot Hand’Myth in Professional Basketball,”

Journal of Sport & Exercise Psychology, 25, 253.

Larkey, P. D., R. A. Smith, and J. B. Kadane (1989): “It’s okay to believe in the

hot hand,” Chance, 2, 22–30.

Shleifer, A. (2000): Inefficient Markets: An Introduction to Behavioral Finance, Claren-

don Lectures in Economics. Oxford Univ Press.

Siwoff, S. (1988): The Elias Baseball Analyst, 1988, MacMillan Publishing Company.

Smith, G. (2003): “Horseshoe pitchers hot hands,” Psychonomic bulletin & review, 10,

753–758.

Stern, H. S. and C. N. Morris (1993): “A statistical analysis of hitting streaks in

baseball: Comment,” Journal of the American Statistical Association, 88, 1189–1194.

Vergin, R. C. (2000): “Winning streaks in sports and the misperception of momentum.”

Journal of Sport Behavior.

Wardrop, R. L. (1995): “Simpson’s paradox and the hot hand in basketball,” The Amer-

ican Statistician, 49, 24–28.

——— (1999): “Statistical tests for the hot-hand in basketball in a controlled setting,”

American Statistician, 1, 1–20.

Wooldridge, J. (2010): Econometric Analysis of Cross Section and Panel Data, The MIT

Press, 2 ed.

34

Page 36: SSRN-id2358747

A Tables

variable mean sd p25 p50 p75 p99 NBatting .273 .0292 .253 .272 .292 .342 3265Homerun .0332 .0195 .0185 .0308 .046 .0856 3265Strikeout .183 .0617 .138 .177 .222 .347 3265OnBase .345 .0377 .32 .343 .368 .444 3265Walk .0994 .0352 .0749 .0945 .12 .193 3265PlateAppearances 492 120 388 498 596 702 3265AtBats 442 107 351 449 533 640 3265Source:

Table A.1 – Batter-year Summary Statistics. This table provides summary statistics of MLB battingstatistics at the batter-year level over the duration of the sample period (2000-2011). Batter-years in whichthe batter had fewer than 300 plate appearances have been excluded from this table.

variable mean sd p25 p50 p75 p99 NBatting .268 .0223 .254 .268 .282 .323 789Homerun .0304 .0169 .0178 .0289 .0415 .0728 789Strikeout .191 .059 .147 .186 .228 .346 789OnBase .339 .0299 .32 .338 .356 .416 789Walk .0966 .0302 .0758 .0939 .113 .18 789PlateAppearances 2035 1726 668 1413 3034 7132 789AtBats 1830 1543 594 1283 2754 6563 789Source:

Table A.2 – Batter Summary Statistics. This table provides summary statistics of MLB battingstatistics at the batter level over the duration of the sample period (2000-2011). Batter-years in which thebatter had fewer than 300 plate appearances have been excluded from this table.

35

Page 37: SSRN-id2358747

variable mean sd p25 p50 p75 p99 NBatting .264 .0309 .245 .266 .285 .336 2607Homerun .0304 .0107 .0232 .0298 .0369 .06 2607Strikeout .191 .0554 .151 .183 .223 .352 2607OnBase .333 .0325 .312 .333 .354 .406 2607Walk .093 .0263 .0744 .091 .109 .162 2607PlateAppearances 570 219 343 560 783 947 2607AtBats 519 204 309 505 717 885 2607Source:

Table A.3 – Pitcher-year Summary Statistics. This table provides summary statistics of MLB pitchingat the pitcher-year level over the duration of the sample period (2000-2011). Pitcher-years in which thepitchers faced fewer than 300 batters have been excluded from this table.

variable mean sd p25 p50 p75 p99 NBatting .264 .028 .247 .268 .283 .327 820Homerun .0307 .00985 .0249 .0304 .0359 .0571 820Strikeout .191 .0538 .153 .182 .223 .345 820OnBase .335 .0284 .319 .336 .353 .401 820Walk .0965 .0232 .0803 .0948 .11 .158 820PlateAppearances 1814 1978 432 979 2370 8624 820AtBats 1650 1818 391 879 2161 7893 820Source:

Table A.4 – Pitcher Summary Statistics. This table provides summary statistics of MLB pitching atthe pitcher level over the duration of the sample period (2000-2011). Pitcher-years in which the pitchersfaced fewer than 300 batters have been excluded from this table.

36

Page 38: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainBatter State 0.0319∗∗∗ 0.160∗∗∗ 0.00564∗∗∗

(6.27) (6.29) (3.90)

Hot 0.00829∗∗∗ 0.00680∗∗∗ 0.00473∗∗

(4.27) (3.90) (2.60)

Cold -0.00741∗∗∗ -0.00930∗∗∗ -0.00388(-3.49) (-4.29) (-1.77)

Batter Ability 0.360∗∗∗ 1.811∗∗∗ 0.387∗∗∗ 0.383∗∗∗ 0.384∗∗∗ 0.381∗∗∗

(15.74) (16.00) (25.63) (16.08) (16.19) (14.81)

Pitcher Ability 0.510∗∗∗ 2.589∗∗∗ 0.513∗∗∗ 0.511∗∗∗ 0.511∗∗∗ 0.515∗∗∗

(33.37) (32.74) (34.72) (33.38) (33.38) (32.43)

Stadium 0.672∗∗∗ 3.357∗∗∗ 0.678∗∗∗ 0.676∗∗∗ 0.676∗∗∗ 0.679∗∗∗

(17.33) (17.45) (20.92) (17.40) (17.39) (16.57)

Samehand -0.0127∗∗∗ -0.0638∗∗∗ -0.0126∗∗∗ -0.0127∗∗∗ -0.0127∗∗∗ -0.0128∗∗∗

(-11.60) (-11.55) (-14.71) (-11.53) (-11.53) (-11.32)

Observations 1192266 1192266 1087227 1192266 1192266 1101466

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.5 – Dependent Variable: Hits (Batters) This table reports estimates for the batter regressionswhere the dependent variable indicates a hit. Column (1) gives the estimates from the OLS regression givenin equation (1) and using the baseline measure of the batter’s state as estimated by equation (3). Column(2) gives estimates for the logistic regression given in equation (2) using the same independent variables.Columns (3)-(6) contain OLS estimates using alternative measures of the batter’s state: column (3) uses theestimate of state given in equation (5); columns (4)-(6) use the additive, proportional and distribution basedcutoffs respectively to measure the batter’s current state (hot, cold or normal). In all columns, the measureof the batter’s state use his 25 most recent at bats (L=25). Note that all plate appearances that did notconstitute an at bat were excluded from the regressions. Standard errors are clustered at the batter level.

37

Page 39: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainPitcher State 0.0468∗∗∗ 0.240∗∗∗ 0.0118∗∗∗

(9.61) (9.58) (7.59)

Hot 0.00776∗∗∗ 0.00740∗∗∗ 0.00387∗

(4.07) (3.98) (2.42)

Cold -0.00842∗∗∗ -0.00871∗∗∗ -0.00855∗∗∗

(-4.64) (-4.53) (-4.20)

Pitcher Ability 0.372∗∗∗ 1.910∗∗∗ 0.451∗∗∗ 0.401∗∗∗ 0.402∗∗∗ 0.420∗∗∗

(15.73) (15.56) (17.43) (16.55) (16.54) (16.51)

Batter Ability 0.552∗∗∗ 2.892∗∗∗ 0.551∗∗∗ 0.553∗∗∗ 0.553∗∗∗ 0.551∗∗∗

(49.41) (48.02) (46.99) (49.44) (49.45) (47.42)

Stadium 0.677∗∗∗ 3.442∗∗∗ 0.652∗∗∗ 0.690∗∗∗ 0.691∗∗∗ 0.682∗∗∗

(19.73) (19.70) (18.03) (19.93) (19.98) (18.85)

Samehand -0.0128∗∗∗ -0.0661∗∗∗ -0.0125∗∗∗ -0.0128∗∗∗ -0.0128∗∗∗ -0.0126∗∗∗

(-10.24) (-10.23) (-9.49) (-10.18) (-10.18) (-9.65)

Constant -0.159∗∗∗ -3.208∗∗∗ -0.168∗∗∗ -0.158∗∗∗ -0.158∗∗∗ -0.161∗∗∗

(-17.82) (-67.75) (-17.95) (-17.50) (-17.50) (-17.16)

Observations 1128156 1128156 998268 1128156 1128156 1032373

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.6 – Dependent Variable: Hits (Pitchers) This table reports estimates for the pitcher regres-sions where the dependent variable indicates a hit. Column (1) gives the estimates from the OLS regressiongiven in equation (1) and using the baseline measure of the pitcher’s state as estimated by equation (3). Col-umn (2) gives estimates for the logistic regression given in equation (2) using the same independent variables.Columns (3)-(6) contain OLS estimates using alternative measures of the pitcher’s state: column (3) usesthe estimate of state given in equation (5); columns (4)-(6) use the additive, proportional and distributionbased cutoffs respectively to measure the batter’s current state (hot, cold or normal). In all columns, themeasure of the pitcher’s state use his 25 most recent at bats (L=25). Note that all plate appearances thatdid not constitute an at bat were excluded from the regressions. Standard errors are clustered at the pitcherlevel.

38

Page 40: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainBatter State 0.0751∗∗∗ 1.926∗∗∗ 0.0730∗∗∗

(15.94) (15.60) (14.59)

Hot 0.00422∗∗∗ 0.00743∗∗∗ 0.00206∗∗∗

(6.05) (8.68) (3.36)

Cold -0.00547∗∗∗ -0.00642∗

(-5.79) (-2.04)

Batter Ability 0.691∗∗∗ 18.62∗∗∗ 0.704∗∗∗ 0.758∗∗∗ 0.763∗∗∗ 0.767∗∗∗

(49.99) (33.79) (49.19) (54.55) (52.28) (52.96)

Pitcher Ability 0.346∗∗∗ 10.29∗∗∗ 0.344∗∗∗ 0.348∗∗∗ 0.347∗∗∗ 0.345∗∗∗

(19.35) (20.83) (18.28) (19.30) (19.35) (18.28)

Stadium 0.664∗∗∗ 19.73∗∗∗ 0.676∗∗∗ 0.680∗∗∗ 0.673∗∗∗ 0.693∗∗∗

(19.41) (20.51) (18.88) (19.44) (19.50) (19.02)

Samehand -0.00438∗∗∗ -0.120∗∗∗ -0.00458∗∗∗ -0.00428∗∗∗ -0.00435∗∗∗ -0.00449∗∗∗

(-8.56) (-7.23) (-8.57) (-8.22) (-8.38) (-8.26)

Observations 1192266 1192266 1101466 1192266 1192266 1101466

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.7 – Dependent Variable: Home runs (Batters) This table reports estimates for the batterregressions where the dependent variable indicates a home run. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the batter’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the batter’s state: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the batter’s current state (hot, cold or normal). In allcolumns, the measure of the batter’s state use his 25 most recent at bats (L=25). Note that all plateappearances that did not constitute an at bat were excluded from the regressions. Standard errors areclustered at the batter level.

39

Page 41: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainPitcher State 0.0334∗∗∗ 1.055∗∗∗ 0.00366∗∗∗

(6.70) (6.69) (4.58)

Hot 0.00347∗∗∗ 0.00345∗∗∗ 0.00326∗∗∗

(4.13) (3.96) (4.58)

Cold -0.000544 0.00238(-0.54) (0.55)

Pitcher Ability 0.253∗∗∗ 8.178∗∗∗ 0.313∗∗∗ 0.278∗∗∗ 0.270∗∗∗ 0.283∗∗∗

(11.07) (11.31) (12.10) (11.84) (11.08) (11.51)

Batter Ability 0.708∗∗∗ 20.82∗∗∗ 0.698∗∗∗ 0.708∗∗∗ 0.708∗∗∗ 0.700∗∗∗

(66.29) (79.31) (60.72) (66.33) (66.33) (61.80)

Stadium 0.610∗∗∗ 19.22∗∗∗ 0.613∗∗∗ 0.621∗∗∗ 0.620∗∗∗ 0.625∗∗∗

(18.92) (19.36) (18.22) (19.01) (19.06) (18.63)

Samehand -0.00389∗∗∗ -0.122∗∗∗ -0.00375∗∗∗ -0.00388∗∗∗ -0.00388∗∗∗ -0.00372∗∗∗

(-8.57) (-7.93) (-7.91) (-8.54) (-8.55) (-7.99)

Observations 1128156 1128156 998571 1128156 1128156 1032373

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.8 – Dependent Variable: Home runs (Pitchers) This table reports estimates for the pitcherregressions where the dependent variable indicates a home run. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the pitcher’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the pitcher’s state: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the batter’s current state (hot, cold or normal). In allcolumns, the measure of the pitcher’s state use his 25 most recent at bats (L=25). Note that all plateappearances that did not constitute an at bat were excluded from the regressions. Standard errors areclustered at the pitcher level.

40

Page 42: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainBatter State 0.0882∗∗∗ 0.592∗∗∗ 0.0205∗∗∗

(17.17) (17.38) (14.49)

Hot 0.0113∗∗∗ 0.0149∗∗∗ 0.0113∗∗∗

(7.08) (7.96) (7.33)

Cold -0.0123∗∗∗ -0.0177∗∗∗ -0.0136∗∗∗

(-7.86) (-9.56) (-7.58)

Batter Ability 0.778∗∗∗ 5.146∗∗∗ 0.878∗∗∗ 0.856∗∗∗ 0.863∗∗∗ 0.868∗∗∗

(74.79) (54.53) (93.49) (87.75) (89.31) (89.52)

Pitcher Ability 0.799∗∗∗ 5.185∗∗∗ 0.794∗∗∗ 0.801∗∗∗ 0.801∗∗∗ 0.797∗∗∗

(63.89) (96.18) (60.77) (63.84) (63.90) (61.02)

Stadium 0.0860∗∗∗ 0.657∗∗∗ 0.0887∗∗∗ 0.0932∗∗∗ 0.0923∗∗∗ 0.0930∗∗∗

(4.26) (4.51) (4.11) (4.54) (4.50) (4.29)

Samehand 0.0173∗∗∗ 0.123∗∗∗ 0.0171∗∗∗ 0.0173∗∗∗ 0.0173∗∗∗ 0.0172∗∗∗

(16.68) (16.34) (15.86) (16.50) (16.56) (15.93)

Observations 1192266 1192266 1088000 1192266 1192266 1101466

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.9 – Dependent Variable: Strikeouts (Batters) This table reports estimates for the batterregressions where the dependent variable indicates a strikeout. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the batter’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the batter’s state: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the batter’s current state (hot, cold or normal). In allcolumns, the measure of the batter’s state use his 25 most recent at bats (L=25). Note that all plateappearances that did not constitute an at bat were excluded from the regressions. Standard errors areclustered at the batter level.

41

Page 43: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainPitcher State 0.122∗∗∗ 0.796∗∗∗ 0.202∗∗∗

(22.27) (23.42) (18.34)

Hot 0.0202∗∗∗ 0.0235∗∗∗ 0.0108∗∗∗

(10.98) (11.97) (7.40)

Cold -0.0179∗∗∗ -0.0219∗∗∗ -0.0186∗∗∗

(-11.11) (-12.96) (-9.87)

Pitcher Ability 0.699∗∗∗ 4.553∗∗∗ 5.430∗∗∗ 0.807∗∗∗ 0.813∗∗∗ 0.817∗∗∗

(47.15) (56.89) (71.86) (49.83) (50.60) (51.55)

Batter Ability 0.811∗∗∗ 5.472∗∗∗ 5.471∗∗∗ 0.813∗∗∗ 0.813∗∗∗ 0.817∗∗∗

(80.59) (102.67) (96.66) (80.12) (80.46) (77.29)

Stadium 0.159∗∗∗ 1.072∗∗∗ 1.084∗∗∗ 0.185∗∗∗ 0.179∗∗∗ 0.183∗∗∗

(6.70) (6.85) (6.48) (7.49) (7.30) (7.10)

Samehand 0.0180∗∗∗ 0.126∗∗∗ 0.121∗∗∗ 0.0180∗∗∗ 0.0180∗∗∗ 0.0176∗∗∗

(10.03) (10.18) (9.37) (9.98) (9.98) (9.35)

Observations 1128156 1128156 998333 1128156 1128156 1032373

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.10 – Dependent Variable: Strikeouts (Pitchers) This table reports estimates for the pitcherregressions where the dependent variable indicates a strikeout. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the pitcher’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the pitcher’s state: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the batter’s current state (hot, cold or normal). In allcolumns, the measure of the pitcher’s state use his 25 most recent at bats (L=25). Note that all plateappearances that did not constitute an at bat were excluded from the regressions. Standard errors areclustered at the pitcher level.

42

Page 44: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainBatter State 0.0640∗∗∗ 0.282∗∗∗ 0.0160∗∗∗

(14.11) (14.26) (10.27)

Hot 0.0151∗∗∗ 0.0156∗∗∗ 0.00768∗∗∗

(7.65) (8.33) (4.58)

Cold -0.0134∗∗∗ -0.0132∗∗∗ -0.00790∗∗∗

(-6.92) (-6.84) (-3.93)

Batter Ability 0.539∗∗∗ 2.376∗∗∗ 0.604∗∗∗ 0.590∗∗∗ 0.589∗∗∗ 0.593∗∗∗

(20.16) (21.48) (21.16) (20.52) (20.51) (19.40)

Pitcher Ability 0.583∗∗∗ 2.599∗∗∗ 0.580∗∗∗ 0.585∗∗∗ 0.585∗∗∗ 0.585∗∗∗

(44.73) (44.59) (43.36) (44.83) (44.81) (43.51)

Stadium 0.508∗∗∗ 2.241∗∗∗ 0.521∗∗∗ 0.516∗∗∗ 0.517∗∗∗ 0.523∗∗∗

(14.05) (14.20) (13.62) (14.11) (14.11) (13.59)

samehand -0.0240∗∗∗ -0.106∗∗∗ -0.0242∗∗∗ -0.0240∗∗∗ -0.0240∗∗∗ -0.0243∗∗∗

(-19.15) (-19.23) (-18.60) (-18.99) (-19.00) (-18.60)

Observations 1458194 1458194 1320665 1458194 1458194 1353797

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.11 – Dependent Variable: On Base (Batters) This table reports estimates for the batterregressions where the dependent variable indicates the batter reached base. Column (1) gives the estimatesfrom the OLS regression given in equation (1) and using the baseline measure of the batter’s state asestimated by equation (3). Column (2) gives estimates for the logistic regression given in equation (2) usingthe same independent variables. Columns (3)-(6) contain OLS estimates using alternative measures of thebatter’s State: column (3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive,proportional and distribution based cutoffs respectively to measure the batter’s current state (hot, cold ornormal). In all columns, the measure of the batter’s state use his 25 most recent plate appearances (L=25).Standard errors are clustered at the batter level.

43

Page 45: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainPitcher State 0.0720∗∗∗ 0.327∗∗∗ 0.0203∗∗∗

(0.00442) (0.0202) (0.00159)

Hot 0.0146∗∗∗ 0.0130∗∗∗ 0.0103∗∗∗

(0.00180) (0.00179) (0.00145)

Cold -0.0163∗∗∗ -0.0137∗∗∗ -0.0103∗∗∗

(0.00181) (0.00188) (0.00190)

Pitcher Ability 0.414∗∗∗ 1.877∗∗∗ 0.509∗∗∗ 0.461∗∗∗ 0.462∗∗∗ 0.463∗∗∗

(0.0229) (0.106) (0.0249) (0.0238) (0.0239) (0.0253)

Batter Ability 0.611∗∗∗ 2.797∗∗∗ 0.612∗∗∗ 0.611∗∗∗ 0.611∗∗∗ 0.611∗∗∗

(0.00976) (0.0450) (0.0103) (0.00976) (0.00976) (0.0101)

Stadium 0.552∗∗∗ 2.474∗∗∗ 0.535∗∗∗ 0.571∗∗∗ 0.573∗∗∗ 0.571∗∗∗

(0.0318) (0.143) (0.0339) (0.0324) (0.0325) (0.0343)

Samehand -0.0219∗∗∗ -0.0994∗∗∗ -0.0215∗∗∗ -0.0219∗∗∗ -0.0219∗∗∗ -0.0217∗∗∗

(0.00127) (0.00578) (0.00135) (0.00128) (0.00128) (0.00133)

Observations 1344753 1344753 1183943 1344753 1344753 1235394

Marginal effects; Standard errors in parentheses(d) for discrete change of dummy variable from 0 to 1∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.12 – Dependent Variable: On Base (Pitchers) This table reports estimates for the pitcherregressions where the dependent variable indicates the batter reached base. Column (1) gives the estimatesfrom the OLS regression given in equation (1) and using the baseline measure of the pitcher’s state asestimated by equation (3). Column (2) gives estimates for the logistic regression given in equation (2) usingthe same independent variables. Columns (3)-(6) contain OLS estimates using alternative measures of thepitcher’s State: column (3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive,proportional and distribution based cutoffs respectively to measure the pitcher’s current state (hot, cold ornormal). In all columns, the measure of the pitcher’s state use his 25 most recent plate appearances (L=25).Standard errors are clustered at the batter level.

44

Page 46: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainBatter State 0.114∗∗∗ 1.141∗∗∗ 0.0210∗∗∗

(24.33) (23.66) (18.47)

Hot 0.0157∗∗∗ 0.0188∗∗∗ 0.0129∗∗∗

(12.48) (13.84) (10.77)

Cold -0.0147∗∗∗ -0.0127∗∗∗

(-11.10) (-7.98)

Batter Ability 0.673∗∗∗ 6.709∗∗∗ 0.803∗∗∗ 0.781∗∗∗ 0.779∗∗∗ 0.784∗∗∗

(37.42) (31.61) (41.97) (39.17) (39.61) (38.67)

Pitcher Ability 0.775∗∗∗ 8.617∗∗∗ 0.776∗∗∗ 0.779∗∗∗ 0.778∗∗∗ 0.779∗∗∗

(60.37) (67.03) (57.17) (60.50) (60.53) (57.85)

Stadium 0.223∗∗∗ 2.604∗∗∗ 0.227∗∗∗ 0.242∗∗∗ 0.236∗∗∗ 0.244∗∗∗

(7.16) (7.18) (6.81) (7.52) (7.41) (7.20)

Samehand -0.0160∗∗∗ -0.184∗∗∗ -0.0165∗∗∗ -0.0162∗∗∗ -0.0161∗∗∗ -0.0164∗∗∗

(-18.79) (-19.17) (-18.36) (-18.61) (-18.66) (-18.14)

Observations 1458194 1458194 1334045 1458194 1458194 1353797

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.13 – Dependent Variable: Walks (Batters) This table reports estimates for the batter regres-sions where the dependent variable indicates a walk. Column (1) gives the estimates from the OLS regressiongiven in equation (1) and using the baseline measure of the batter’s state as estimated by equation (3). Col-umn (2) gives estimates for the logistic regression given in equation (2) using the same independent variables.Columns (3)-(6) contain OLS estimates using alternative measures of the batter’s State: column (3) uses theestimate of state given in equation (5); columns (4)-(6) use the additive, proportional and distribution basedcutoffs respectively to measure the batter’s current state (hot, cold or normal). In all columns, the measureof the batter’s state use his 25 most recent plate appearances (L=25). Standard errors are clustered at thebatter level.

45

Page 47: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5

mainPitcher State 0.0456∗∗∗ 0.554∗∗∗ 0.00673∗∗∗

(8.84) (9.00) (6.00)

Hot 0.00681∗∗∗ 0.00580∗∗∗ 0.00323∗∗∗

(5.72) (4.71) (3.85)

Cold -0.00569∗∗∗ -0.00432∗

(-4.16) (-2.17)

Pitcher Ability 0.655∗∗∗ 7.764∗∗∗ 0.730∗∗∗ 0.695∗∗∗ 0.697∗∗∗ 0.710∗∗∗

(43.53) (35.43) (46.84) (44.91) (44.92) (44.73)

Batter Ability 0.622∗∗∗ 7.090∗∗∗ 0.615∗∗∗ 0.623∗∗∗ 0.623∗∗∗ 0.618∗∗∗

(67.08) (78.53) (62.37) (67.09) (67.12) (64.03)

Stadium 0.305∗∗∗ 3.638∗∗∗ 0.298∗∗∗ 0.318∗∗∗ 0.314∗∗∗ 0.310∗∗∗

(9.84) (9.11) (9.31) (10.09) (10.01) (9.76)

Samehand -0.0137∗∗∗ -0.171∗∗∗ -0.0135∗∗∗ -0.0136∗∗∗ -0.0136∗∗∗ -0.0136∗∗∗

(-18.21) (-17.90) (-16.89) (-18.17) (-18.17) (-17.32)

Constant -0.0528∗∗∗ -4.039∗∗∗ -0.0585∗∗∗ -0.0540∗∗∗ -0.0534∗∗∗ -0.0541∗∗∗

(-18.21) (-94.76) (-18.97) (-18.25) (-18.23) (-18.03)

Observations 1344753 1344753 1188433 1344753 1344753 1235394

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.14 – Dependent Variable: Walks (Pitchers) This table reports estimates for the pitcherregressions where the dependent variable indicates a walk. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the pitcher’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the pitcher’s State: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the pitcher’s current state (hot, cold or normal). In allcolumns, the measure of the pitcher’s state use his 25 most recent plate appearances (L=25). Standarderrors are clustered at the batter level.

46

Page 48: SSRN-id2358747

(1) (2) (3) (4)Continuous Prop5 Add5 Dist5

Batter State (walks) 0.107∗∗∗

(23.59)

Batter State (hits) 0.00593(1.94)

Batter State (hrs) 0.0727∗∗∗

(9.64)

Hot (walks) 0.0147∗∗∗ 0.0177∗∗∗ 0.0116∗∗∗

(12.72) (15.85) (12.14)

Cold (walks) -0.0151∗∗∗ -0.0140∗∗∗ -0.0111∗∗∗

(-12.88) (-11.87) (-8.31)

Hot (hits) 0.00406∗∗∗ 0.00254∗ 0.000903(3.67) (2.11) (0.94)

Cold (hits) -0.000256 -0.0000841 -0.000697(-0.21) (-0.06) (-0.63)

Hot (hrs) 0.00416∗∗∗ 0.00668∗∗∗ 0.00328∗∗∗

(3.68) (8.27) (3.72)

Cold (hrs) -0.00485∗∗∗ -0.00606∗

(-5.62) (-1.97)

Batter Ability (walks) 0.609∗∗∗ 0.731∗∗∗ 0.707∗∗∗ 0.703∗∗∗

(37.96) (89.33) (87.59) (88.52)

Batter Ability (hits) -0.0173 -0.0145 -0.0171∗ -0.0138(-1.77) (-1.76) (-2.08) (-1.67)

Batter Ability (hrs) 0.188∗∗∗ 0.262∗∗∗ 0.276∗∗∗ 0.269∗∗∗

(9.98) (18.04) (18.17) (18.50)

Pitcher Ability 0.775∗∗∗ 0.778∗∗∗ 0.777∗∗∗ 0.778∗∗∗

(60.16) (80.56) (80.49) (80.62)

Stadium 0.221∗∗∗ 0.235∗∗∗ 0.230∗∗∗ 0.240∗∗∗

(6.86) (7.38) (7.23) (7.51)

Samehand -0.0172∗∗∗ -0.0173∗∗∗ -0.0174∗∗∗ -0.0173∗∗∗

(-20.13) (-34.37) (-34.54) (-34.41)

Observations 1449800 1458194 1458194 1458194

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.15 – Endogenous Response: Walks (Batters) This table estimates the endogenous responseof the defense to the batter’s ability and state. The dependent variable indicates a walk. Column (1)incorporates in the batter’s State (as estimated by equation (3)) and ability in both hits and home runs intothe regression estimates from Table A.13 column (1). Columns (2) and (3) use additive cutoffs to measurethe batter’s state in hits, home runs and extra base hits. Estimates for the cold state in columns (3) and (4)are omitted because significantly more than 5% of the time batters have zero walks or home runs in theirprevious 25 attempts. Column (4) includes controls for batter ability. Standard errors are clustered at thebatter level.

47

Page 49: SSRN-id2358747

(1) (2) (3) (4) (5) (6)Hit Hr Strikeout OnBase Walk WalkR

Batter State 0.00524 0.0438∗∗∗ 0.0626∗∗∗ 0.0413∗∗∗ 0.0915∗∗∗ 0.0871∗∗∗

(0.95) (8.08) (10.90) (8.64) (16.32) (15.81)

Pitcher Ability 0.525∗∗∗ 0.393∗∗∗ 0.797∗∗∗ 0.587∗∗∗ 0.796∗∗∗ 0.795∗∗∗

(31.26) (19.02) (51.40) (43.52) (53.44) (53.47)

Stadium 0.801∗∗∗ 0.958∗∗∗ 0.254∗∗∗ 0.721∗∗∗ 0.404∗∗∗ 0.396∗∗∗

(19.28) (21.79) (8.83) (18.59) (9.29) (9.13)

Samehand -0.0182∗∗∗ -0.00777∗∗∗ 0.0243∗∗∗ -0.0333∗∗∗ -0.0233∗∗∗ -0.0234∗∗∗

(-13.18) (-11.77) (16.03) (-20.96) (-18.75) (-18.82)

Batter State (hits) 0.00797∗

(2.34)

Batter State (hrs) 0.0795∗∗∗

(9.44)

Observations 1045160 1045160 1045160 1250826 1250826 1243403

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.16 – Robustness: Measuring State using Batter Fixed Effects This table gives estimatesfor the batter regressions where fixed effects are used to control for batter ability. The dependent variable islisted directly under the column number. Each column gives the estimates from the OLS regression given inequation (6) and using the baseline measure of the batter’s state as estimated by equation (3). Column (6)includes the batter’s state in hit in home runs to capture the endogenous response of the defense. Batterswith fewer than 2,500 attempts in the dataset were excluded from the regressions. Standard errors areclustered at the batter level.

48

Page 50: SSRN-id2358747

(1) (2) (3) (4) (5) (6)Prop10 Prop2p5 Add10 Add2p5 Dist10 Dist2p5

Hot 0.00459∗∗ 0.0122∗∗∗ 0.00567∗∗∗ 0.0133∗∗∗ 0.00240 0.00528∗∗

(3.20) (4.82) (4.05) (4.94) (1.60) (2.58)

Cold -0.00496∗∗∗ -0.00735∗ -0.00454∗∗ -0.00869∗∗ -0.00512∗∗ -0.00569∗

(-3.34) (-2.49) (-2.79) (-2.95) (-2.93) (-2.21)

Observations 1192266 1192266 1192266 1192266 1101466 1101466

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.17 – Robustness to thresholds. Dependent Variable: Hits (Batters) This table reportsthe results for the same regressions as given in Columns (4)-(6) (additive, proportional and distribution basedthresholds for hot and cold) except using 10% and 2.5% thresholds as an alternative to the 5% thresholdsused in Table A.5. The same control variable were included in the regressions but the coefficients are omittedfrom the above table.

(1) (2) (3) (4) (5) (6)Prop10 Prop2p5 Add10 Add2p5 Dist10 Dist2p5

Hot 0.0122∗∗∗ 0.0172∗∗∗ 0.0128∗∗∗ 0.0155∗∗∗ 0.00822∗∗∗ 0.00533∗

(8.72) (6.69) (9.38) (5.72) (5.70) (2.58)

Cold -0.0109∗∗∗ -0.0148∗∗∗ -0.0114∗∗∗ -0.0165∗∗∗ -0.00912∗∗∗ -0.0112∗∗∗

(-7.85) (-5.43) (-7.79) (-6.16) (-5.79) (-3.87)

Observations 1458194 1458194 1458194 1458194 1353797 1353797

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.18 – Robustness to thresholds. Dependent Variable: On Base (Batters) This tablereports the results for the same regressions as given in Columns (4)-(6) (additive, proportional and distribu-tion based thresholds for hot and cold) except using 10% and 2.5% thresholds as an alternative to the 5%thresholds used in Table A.11. The same control variable were included in the regressions but the coefficientsare omitted from the above table.

49

Page 51: SSRN-id2358747

(1) (2) (3) (4) (5) (6)OLS40 Logit40 OLS dist40 OLS10 Logit10 OLS dist10

mainBatter State 0.0526∗∗∗ 0.263∗∗∗ 0.00678∗∗∗ 0.00122∗∗∗ 0.00610∗∗∗ -0.00410

(8.21) (8.26) (4.85) (3.72) (3.73) (-1.18)

Batter Ability 0.357∗∗∗ 1.794∗∗∗ 0.388∗∗∗ 0.370∗∗∗ 1.860∗∗∗ 0.369∗∗∗

(15.88) (16.14) (25.30) (15.85) (16.12) (25.16)

Pitcher Ability 0.513∗∗∗ 2.605∗∗∗ 0.512∗∗∗ 0.514∗∗∗ 2.607∗∗∗ 0.509∗∗∗

(32.97) (32.35) (34.28) (32.98) (32.36) (34.69)

Stadium 0.680∗∗∗ 3.395∗∗∗ 0.693∗∗∗ 0.684∗∗∗ 3.418∗∗∗ 0.688∗∗∗

(17.55) (17.67) (21.17) (17.48) (17.59) (21.37)

Samehand -0.0131∗∗∗ -0.0657∗∗∗ -0.0130∗∗∗ -0.0131∗∗∗ -0.0656∗∗∗ -0.0125∗∗∗

(-11.94) (-11.89) (-14.99) (-11.77) (-11.71) (-14.69)

Observations 1154102 1154102 1068338 1154102 1154102 1101939

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.19 – Robustness to history lengths. Dependent Variable: Hits (Batters) This tablereports the results for the same regressions as given in Columns (1)-(3) of Table A.5 except using alternativehistory lengths of L=40 (Columns (1)-(3) above) and L=10 (Column (4)-(6) above) to measure the batter’scurrent stateas given by equation (3).

(1) (2) (3) (4) (5) (6)OLS40 Logit40 OLS dist40 OLS10 Logit10 OLS dist10

mainBatter State 0.0885∗∗∗ 0.389∗∗∗ 0.0176∗∗∗ 0.00375∗∗∗ 0.0165∗∗∗ 0.00129

(14.86) (15.12) (11.24) (12.63) (12.74) (0.36)

Batter Ability 0.527∗∗∗ 2.321∗∗∗ 0.612∗∗∗ 0.555∗∗∗ 2.446∗∗∗ 0.585∗∗∗

(20.36) (21.70) (20.40) (19.98) (21.28) (18.84)

Pitcher Ability 0.585∗∗∗ 2.610∗∗∗ 0.582∗∗∗ 0.584∗∗∗ 2.604∗∗∗ 0.582∗∗∗

(44.40) (44.30) (42.92) (44.31) (44.20) (43.34)

Stadium 0.515∗∗∗ 2.273∗∗∗ 0.526∗∗∗ 0.515∗∗∗ 2.270∗∗∗ 0.527∗∗∗

(14.20) (14.34) (13.71) (14.15) (14.29) (13.55)

samehand -0.0241∗∗∗ -0.107∗∗∗ -0.0243∗∗∗ -0.0242∗∗∗ -0.108∗∗∗ -0.0242∗∗∗

(-19.28) (-19.37) (-18.77) (-19.11) (-19.19) (-18.47)

Observations 1414753 1414753 1314423 1414753 1414753 1352552

t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001

Table A.20 – Robustness to history lengths. Dependent Variable: On base (Batters) This tablereports the results for the same regressions as given in Columns (1)-(3) of Table A.11 except using alternativehistory lengths of L=40 (Columns (1)-(3) above) and L=10 (Column (4)-(6) above) to measure the batter’scurrent stateas given by equation (3).

50