ssrn-id2358747
TRANSCRIPT
Electronic copy available at: http://ssrn.com/abstract=2358747
The Hot Hand Fallacy: Cognitive Mistakes or
Equilibrium Adjustments? Evidence from Baseball∗
Brett Green† Jeffrey Zwiebel‡
Current Draft: November 2013
Abstract
We test for a “hot hand” (i.e., short-term streakiness in performance) in Major League
Baseball using panel data. We find strong evidence for its existence in all ten statisti-
cal categories we consider. The magnitudes are significant; being “hot” corresponds to
roughly a one quartile increase in the distribution. Our results are in notable contrast
to the majority of the hot hand literature, which has found little to no evidence for
a hot hand in sports, often employing basketball shooting data. We argue that this
difference is attributable to endogenous defensive responses: basketball presents suffi-
cient opportunity for defensive responses to equate shooting probabilities across players
whereas baseball does not. As such, prior evidence on the absence of a hot hand (de-
spite widespread belief in its presence) should not be interpreted as a cognitive mistake
– as it typically is in the literature – but rather as an efficient equilibrium adjustment.
We provide a heuristic manner for identifying a priori which sports are likely to per-
mit an equating endogenous response response and discuss potential implications for
identifying the hot hand effect in other settings.
∗We are grateful to Terrence Hendershott, Przemyslaw Jeziorski, Stefan Nagel, Michael Roberts andAnnette Vissing-Jorgensen for helpful comments and discussion.†Haas School of Business, University of California, Berkeley, CA 94720, email: [email protected],
webpage: http://faculty.haas.berkeley.edu/bgreen/‡Graduate School of Business, Stanford University, Stanford, CA 94305, email: [email protected],
webpage: http://www.gsb.stanford.edu/users/zwiebel
Electronic copy available at: http://ssrn.com/abstract=2358747
1 Introduction
The absence of a “hot hand” in sports is a frequently cited motivating example in behavioral
economics of common cognitive mistakes. Following Gilovich et al. (1985)’s (GVT) seminal
study of streakiness in basketball shooters, a large literature has examined the hot hand.
Most of this literature follows GVT in purporting to show that sports players do not have a
propensity for hot or cold streaks, contrary to the almost universally held perceptions among
such players, their coaches, and fans alike. That is, temporal outcomes are not found to
exhibit streaks beyond what would follow randomly under independence, and consequently,
widespread perceptions of hot hands are attributed to a basic cognitive mistake of inferring
patterns in random data.1
Much of this hot hand literature, however, is plagued by two critical drawbacks. First, it
often fails to take endogenous strategic responses into account. A hot shooter in basketball
should be defended more intensely, and should take more difficult shots, both of which will
lower his shooting percentage. And second, many of the tests conducted in this literature,
especially tests attempting to address this endogeneity, lack the power to identify plausible
and significant models of hot hands.
To get an intuitive sense of the importance of endogenous responses in sports, consider
field goal shooting in basketball, the focus of GVT and a large amount of the subsequent
literature on the hot hand. One might be inclined to think that “better” players should make
a higher percentage of the field goals they attempt. However, this is not the case. The top
20 leading scorers include most of the widely acknowledged top offensive and highest paid
players in the game; 14 out of 20 were selected for the all star game. Yet, the median (mean)
field goal percentage among these “stars” was 0.447 (.465), which is slightly below, but
statistically indistinguishable from, that of a typical starting player .460 (.466).2 Figure 1(a)
depicts the relationship (or lack thereof) between number of points scored and the likelihood
of success in professional basketball.
There is an obvious and widely accepted explanation for why some NBA scoring leaders
and universally acknowledged stars do not shoot any better than the league average. Even
the most casual basketball observer can readily observe that Kobe Bryant (career shooting
percentage of .454) draws far more defensive attention, and attempts far more difficult shots,
1GVT (p. 311) states: “Evidently, people tend to perceive chance shooting as streak shooting and theyexpect sequences exemplifying chance shooting to contain many more alternations than would actually beproduced by a random (chance) process. Thus, people ‘see’ a positive serial correlation in independentsequences, and they fail to detect a negative serial correlation in alternating sequences.”
2Typical starting player is defined as one who started in at least 50 percent of all regular season games(i.e., 41 out of 82 games).
1
500
1000
1500
2000
2500
Poi
nts
scor
ed
.3 .4 .5 .6 .7
Field goal percentage
(a) Basketball: Successes
050
010
0015
00
Num
ber
of a
ttem
pts
.3 .4 .5 .6 .7
Field goal percentage
(b) Basketball: Attempts
Figure 1 – Relation between points scored, attempts and success percentage in the National BasketballAssociation (NBA) using data from the 2012-13 season. Figures include data from all players who started inat least 50 percent of all regular season games. In neither category is the correlation coefficient statisticallydifferent from zero.
than his teammates. Indeed, a fundamental aspect of defensive strategy in basketball involves
allocating more defensive attention to better shooters and less to weaker shooters. Insofar
as defenses have great flexibility in allocating their defensive attention to cover the five
opposing players on the court, and offenses are free in turn to allocate their shots across
their players on the court, it would be natural to expect, to a first approximation, that such
adjustments would yield shooting percentages that are equated across players on a given
team. If Kobe Bryant’s probability of making a shot was higher than that of teammate Matt
Barnes, defenses should cover Bryant more and Barnes less, and Kobe should shoot more and
Barnes less, until these probabilities were equated.3 Hence, in equilibrium individual shooting
percentages are not likely to be indicative of players’ shooting abilities; good shooters are
likely to shoot more, and raise the overall team shooting percentage, but should not be
expected to have a higher shooting percentage than their teammates. This also implies there
should not be a relationship between the shooting percentage and the number of attempts,
which is consistent with Figure 1(b). Aharoni and Sarig (2012) also provide evidence that
good shooters and hot shooters both shoot more and raise overall team shooting percentages.
Few would dispute the importance of defensive and shooting adjustments when evaluating
field goal percentages across players. No one that we are aware of argues that Kobe Bryant
is but an average shooter, and no one disputes that he is defended more and attempts harder
3There are some important qualifications and limitations to this simple argument, based on specificfeatures of the game – i.e., 3-point shots are worth more, shots by inside players are harder to set-up thanoutside shots, etc. We discuss a number of these qualifications in more detail in Section 2, where we will arguethat i) most of these are likely to be of second order, and consequently, a model of endogenous responsesequating shooting percentages is likely to be a fairly good approximation to reality, and ii) such qualificationslead to deviations from equal percentages in the direction matched by the data.
2
shots than others. The same reasoning, however, should apply if a player is “hot”. A hot
player, who is temporarily shooting better than he normally does, would temporarily exceed
his teammates shooting percentage if he only took his usual shots and received his usual
defensive attention. In response, he should attempt more difficult shots, and he should receive
more defensive attention, until these adjustments lower his marginal shooting percentage to
equate them with his teammates.4 Consequently, for the same reason good shooters are
unlikely to exhibit a higher shooter percentage than average players, “hot” shooters are
unlikely to exhibit a higher future shooting percentage than they themselves typically realize.
As such, it is rather noteworthy that tests finding no evidence that recent shooting success in
basketball predicts future successes have been readily interpreted by the behavioral literature
as compelling evidence for a cognitive mistake by players, coaches, and fans alike (i.e., a false
belief in the presence of hot hands). It seems that a more natural prior interpretation would
be one of efficient equilibrium adjustments by defenses in the face short-term changes in
performance.5
Also problematic is the lack of power in many of the tests that have been employed in
research on the hot hand. This is especially notable in attempts to address the issue of
endogeneity. For example, GVT also consider free throws in Study 4 of their paper. Since
free throws are not defended, and are taken from the same spot of the floor, they do not
suffer from the endogeneity of field goals (that is, of ordinary shots). GVT look at whether
the likelihood of success of a second free-throw is related to the success of a first free-throw.
However, they only consider nine players, with no more than 430 pairs of free throws for the
player who shot the most, and less than 200 pair of free throws for five of the nine players.
Along similar lines, many of the tests by others for endogeneity have looked for evidence of
a hot hand player by player, often relying on a few hundred, or at most, a few thousand
observations per player. We however demonstrate below that under simple specifications for
a hot hand, that seem well in line with common perceptions of streakiness, one would need
a dataset at least an order of magnitude larger than this in order to be able to distinguish
randomness from a hot hand.
In this paper we address both of these drawbacks by looking for evidence of streakiness in
4Huizinga and Weil (2008) have found that shooters who have recently made several shots in a rowshoot more frequently, and proceed to interpret this as evidence that defensive adjustments are limited.This interpretation, however, does not seem to follow from the result, as one might expect both offensiveadjustments (more shots) and defensive adjustments (more defensive attention) in response to a bettershooter. The Huizinga-Weil interpretation here is akin to interpreting Kobe Bryant’s high shooting frequencyas evidence that he does not receive extra defensive attention, which seems highly implausible.
5We speculate further in Section 5 as to whether behavioralists exhibit their own confirmation bias ininterpreting these results as a cognitive mistake rather than an equilibrium adjustment.
3
050
100
150
200
Hits
.15 .2 .25 .3 .35
Batting average
(a) Baseball: Successes
200
300
400
500
600
700
Pla
te a
ppea
ranc
es
.15 .2 .25 .3 .35
Batting average
(b) Baseball: Attempts
Figure 2 – Relation between number successes, number of attempts and success percentage in MajorLeague Baseball using data from the 2013 season. Figures include data from all players who started in atleast 50 percent of all regular season games. In both relationships, the coefficient is positive and statisticallysignificant.
baseball player performance employing a large panel of data. We will argue in detail in Section
2 that the scope for endogenous responses in baseball is likely to be far more limited than in
basketball, and insufficient to equate margins across players as in basketball. This is partially
evidenced by Figure 2, which indicates that good players are both more likely to be successful
and have more attempts. Notice that this lies in stark contrast with analogous plots for
baseball in Figure1 In brief, the argument is that because batters face a pitcher individually
and sequentially, there is relatively little scope for the defensive team to shift resources from
defending one batter to another.6 Consequently, if a hot hand exists among baseball players,
recent past performance should predict future outcomes. Additionally, our tests have high
statistical power, for two reasons. First, there is an abundance of baseball data. There
are roughly 200,000 batter-pitcher matchups per year (the unit of our observation), and
we employ twelve years of data, giving us over two million observations. Second, we look
for general evidence of streakiness over the panel of our data instead of considering each
player individually. Given this, our tests have a high power to discern reasonable models of
streakiness from random outcomes.
We consider ten different statistical categories in baseball. In particular, we consider five
statistics for hitters: batting average, home runs per at bat, on-base percentage, strike outs
per at bat, and walks per plate appearance; and the same five statistics for pitchers: batting
6The most significant adjustment that can be made which transfers defensive resources across hitters isthat a pitcher can “pitch around” a good hitter or a hot hitter, by not giving the hitter a good pitch to hit.However such activity i) will lead to more walks which we can observe, and ii) will not affect batting averageor our other hitting statistics, which are calculated in proportion to at bats which nets out walks from plateappearances. See Section 2 for a detailed discussion on endogenous adjustments in both basketball andbaseball.
4
average allowed, home runs allowed per at bat etc. The first three statistics are “good”
outcomes for hitters and “bad” outcomes for pitchers. The fourth statistic is bad for hitters
and good for pitchers. With all these statistics we hypothesize that recent performance
for a given statistic predicts future performance in the same statistic, for both hitters and
pitchers. The fifth statistic, walks per plate appearance, begets a different interpretation.
Like the first three statistics, a walk is a good outcome for a hitter. However, unlike the
other statistics, walks are likely to increase if a pitcher is pitching around (i.e., trying to
avoid pitching to) a given hitter. Consequently, this is where we expect to find evidence of
an endogenous reaction - good hitters and hot hitters should receive fewer good pitches to
hit, resulting in more walks. Given this, not only do we expect walks to beget more walks
for hitters, but also hot performance in other categories to beget more walks as well.
For each of these statistics we examine whether recent past history is predictive of current
outcomes, for both hitters and pitchers. For all our tests, we control for the ability of the
hitter and pitcher. We also control for other factors that affect outcome, such as ballpark and
platoon differentials (hitters hit worse and pitchers pitch better when facing a same-handed
opponent).
Strikingly, we find recent performance is highly significant in predicting performance in
all ten statistical categories that we examine. In all cases, being “hot” in a statistic makes
one more likely to perform well in the same statistic. A recent history on the order of about
25 at bats, which equals about 5 games or close to one week for the average hitter, has the
most predictive power over the next at bat. Furthermore these effects are of a significant
magnitude: for instance, a 30 percent increase in the number of times that a batter has
gotten on base in the last 25 at bats predicts a 5% increase in the likelihood of getting on
base in the next at bat, after controlling for all other explanatory variables. Moreover, we
also find clear evidence that when a hitter is hot in other statistical categories, he is likely
to draw more walks as well.
To assess the significance of these results and how they compare with common percep-
tions, one needs a model for streakiness. While we remain agnostic as to the specific model
underlying streaks, for some simple intuition we consider a basic three state Markov model,
where players can switch between a normal state, a hot state, and a cold state.7 Note that
in such a model, the probability of a successful outcome, conditional on a good recent his-
tory, will be less than the probability of a success in the good state, because of both, 1) the
possibility that recent successes were due to luck rather than being in the hot state, and 2)
7Two alternative models that strike us as plausible are a setting with a continuous state variable for“hotness”, and a setting where switches between states are determined by outcomes rather than exogenouschanges in the state. We do not pursue such models here, but may consider this in future work.
5
a random switch out of the hot state. This mitigation notwithstanding, we show that the
streakiness exhibited in our data corresponds to a model where the difference between hot
and cold states is highly significant. For instance, our results are consistent with a model
in which a hot batter is 33% (or 9 percentage points) more likely to get a hit than a cold
batter. As we discuss in the conclusion, these results are important enough to rationalize
common coaching decision and fan perceptions with respect to hot players.
The rest of the paper proceeds as follows. In Section 2 we discuss the difference between
endogenous responses available in basketball and baseball in more detail, in order to highlight
the difference in a setting where endogenous responses are likely to equate margins across
players and a setting where such responses are not likely to do so. We follow this discussion
with a brief summary of the literature. Section 3 discusses the data and our methodology,
and Section 4 presents our results. In Section 5, we present a simple model of streakiness
to give an intuitive sense of the magnitudes consistent with our empirical results. Section
6 provides a general discussion and compares the hot-hand in sports literature with the
finance literature, drawing parallels with the efficient market hypothesis, market responses,
and behavioral finance. Section 7 concludes. Empirical results are located in Section A.
2 Endogenous Responses
2.1 Basketball vs. Baseball
It is instructive to discuss the different endogenous responses available in basketball and
baseball, in order to distinguish between a setting where margins are likely to be equated
across players and those where they are not. We first discuss basketball, to detail our
argument that endogenous responses are likely to be sufficient to equate margins there, and
then turn our attention to baseball, where we will argue that endogenous responses are far
more limited.
In basketball all active players are on the court at the same time, and defenses are very
fluid. On each possession, the offensive team must decide who will shoot the ball, and the
defensive team must decide how to allocate their defensive resources across the offensive
players. There are many defensive adjustments that shift defensive attention from one offen-
sive player to another. For example, better defenders are assigned to better offensive players,
good players are double-teamed, and defenders temporarily move off their assigned man to
help out in the coverage of another player. Such adjustments happen nearly continuously
and on practically every possession: switches, double-teams, posts, and other manners of
temporary defensive help are given and withdrawn across players fluidly, in manners small
6
and large, often changing within seconds. A defender may briefly step away from his assigned
man (or assigned zone if playing zone defense) to help double-team the player with the ball
for two seconds if the ball-handler is a dangerous shooter, and for only one second if the
ball-handler is an average shooter. Moreover, such detailed adjustments and switches are
central to a team’s defensive strategy and follow rather detailed and sophisticated schemes.8
Such strategies are frequently adjusted and altered over the course of a game, depending on
what has transpired, the current game situation, and the current players in the game. All
defensive players are continuously engaged in shifting attention across offensive players based
on their ability, recent performance, and current shot opportunities. Indeed, a player’s defen-
sive ability is evaluated to a large extent by his ability to make such adjustments efficiently
and rapidly.
Offensives in turn adjust to these defensive decisions in seeking the “best available shot”
(we discuss below what we mean by this term). On offense, a team passes the ball among
players trying to set up the “best” shot, subject to the constraint of a shot clock, which
requires a shot to be taken in a given amount of time.9 Optimally, a shot should be taken
when it is a better shot than the expected best shot that will be found in the remaining
time before the shot clock expires. This suggests a threshold shot quality, with the threshold
decreasing with the time remaining on the shot chock.
A good shooter or a hot shooter that is more likely than others to make particular shots
will draw more defensive attention, which will in turn make the marginal shot more difficult.
If there is likely to be better marginal shots available for player A than player B, player A
should be shooting more and player B should be shooting less. Defenses should adjust as
well, drawing attention from player B to player A. In such a manner, the quality of a player’s
marginal shots should be equated across players on a team. Good players, or hot players,
should not have better marginal shots than bad players or cold players; both defensive and
offensive adjustments should serve to equate such margins.
To this point we have spoken of the value of the shot as if it is the same as the likely
success of a shot. This is not quite accurate for a number of reasons. Most significantly,
long shots behind the 3-point line count for 3 points, whereas other field goals count for only
2 points.Hence the expected scoring from a shot will be given by the probability the shot
will be made weighted by 2 for an ordinary shot and 3 for a 3-point shot. Not surprisingly,
8For example, offensive player X, a good inside player without outside shooting skills, might be double-teamed 40% the time that he has the ball on the right side of the basket within 6 feet of the basket, but onlywhen he is being covered by Y and not by Z, and only when his teammate W is not in the game, whereas hemight receive very loose coverage from one man at most - who is spending some of his attention shadowingother nearby offensive players - if he is 15 feet away from the basket.
9In the NBA, a team has 24 seconds to shoot, whereas in men’s college basketball, a team has 35 seconds.
7
3-point shots are made less frequently than 2-point shots, and roughly in proportion to
these weights: in the NBA in 2012-13, the average 3-point shot succeeded with probability
.364, and the average 2-point shot succeeded with probability .485.10 While the frequency
of 3-points shots will consequently affect shooting percentages, one can readily verify that
for most players this effect is rather small given the relative infrequency of 3-point shots.
For our purposes it is sufficient to note that one can calculate expected scoring per shot
for each player (weighting 3 point shots by 3 and 2-point shots by two, and accounting for
each player’s shooting percentage for each of the two types of shots), and one still finds star
players with expected scoring percentages indistinguishable from league averages.11
We now turn to baseball, where we presently argue that endogenous defensive responses
are far more limited than basketball, and are unlikely to be sufficient to equate margins
across batters (offensive players).12 There are several reasons for this difference.
Most importantly, in baseball pitchers face batters sequentially rather than simultane-
ously. Unlike in basketball, there is not a fixed defensive resource (the number and defensive
ability of players on the field) that can be allocated and adjusted across all the offensive
players. Rather, each hitter faces the pitcher and his defensive team (his fielders) one at a
time, and the pitcher and all fielders can focus almost all of their defensive attention on the
hitter at hand. Consequently, if batter A is a better hitter than batter B, the opposing team
has little scope to transfer defensive resources from B to A, equating hitting margins. More-
over, even in the unlikely event that they could transfer resources in a significant manner,
it is not clear that this strategy would be optimal: such transfers would only be optimal if
the extra resources expended on better hitters would have a greater impact in lowering their
success rate than expending those same resources on weaker hitters.
Additionally, in contrast to basketball, where the offensive team is free to allocate the
shot to any player in the game, in baseball the offensive team does not choose which hitter
10The number for other years is similar and this has only changed gradually from year to year. Themoderately higher expected points realized from the average 3-point shot (1.09) compared to the average2-point shot (.97) can be readily explained by the fact that failed long shots are significantly more likely tolead to easy scoring opportunities for opponents (on fast breaks).
11Even after adjusting for 3-point shots, there are a number of other reasons that expected points per shotonly approximates shot value. For example, there is a greater chance that the ball will be stolen by opponents(and hence no shot taken) when passing to a teammate that is inside (close to the basket), and consequentlyinside shots should have a compensating higher likelihood of success, as they do. Additional reasons includethe following: Some shots are more likely to draw fouls, yielding additional scoring opportunities; some shotsare taken under greater time pressure (i.e., the shot clock is running out); and some shots are inframarginalwhen the defense breaks down (i.e., fast breaks). These specific details are not important for our argumenthere, but to note that in accord with the hypothesis of sufficient endogeneity to equate marginal value of shotsacross players, that deviations between expected points and shot value are matched, at least qualitatively,by an offsetting difference in scoring percentage.
12We speak of batters here; the argument for pitchers is essentially the same.
8
gets to bat at each moment based on likely success, but rather, each hitter in the line-up
takes their turn at bat, and after hitting each player only gets to bat again after the other
eight players in the line-up have all batted. Consequently, the offensive team has little scope
to equate margins as well.
The general inability to transfer defensive resources across hitters should not to be con-
fused with the ability to make a number of optimal defensive adjustments for each hitter
that do not involve a transfers across players. Pitchers choose their type of pitches (i.e.,
fastball, change-up, slider) and the location of their pitches (i.e., inside, outside, high, low)
based on each hitter’s history at hitting such pitches. And fielders regularly shift in the field,
based on the hitter’s proclivity for hitting to certain spots, the game situation, and the type
of pitches likely to be thrown. These common adjustments, however, do not involve moving
resources from one hitter to another, but rather are made optimally for each hitter. Nor
do they consume a limited defensive resource, thereby restricting the ability to make such
adjustments for subsequent hitters. Consequently, there is no reason to expect such adjust-
ments to equate margins across players: if hitter A is better than hitter B after taking into
account the optimal adjustments for each hitter, there is no significant manner to transfer
some defensive resources from B to A.
There are a few minor adjustments a defensive team can make that may transfer some
defensive resources from one hitter to another, but these are rather limited in scope. The
most significant adjustment a defensive team can make across hitters is to “pitch around” a
good hitter; that is, either to avoid the hitter by intentionally walking him (putting him on
first base without giving him the opportunity to hit), or by not giving him a good pitch to hit
(and consequently, greatly increasing the chance that he will get a walk). Insofar as a walk
is significantly more costly to the defense when first base is already occupied, walking one
hitter makes walking the next hitter more costly, thereby providing a link across hitters.13
However, the ability to “pitch around” a hitter is limited by the fact that in an average
game situation, a walk is a well above average outcome for even the best of hitters (and
correspondingly, a below average outcome for the pitcher). Consequently, such a strategy
is primarily reserved for particular, and fairly uncommon, setting – in particular, when a
strong hitter is at bat, with men on base, with first base open, towards the end of a close
game, and with a weaker hitter to follow. (This is precisely the setting when a base hit will
do the most damage relative to a walk.) Significantly for our purposes, such a strategy, even
if it was far more common than it in fact is, would not equate margins across different hitters
for the statistics we consider. This is because all of our statistics save for the walk and on
base statistics are calculated exclusive of walks. For example, we consider the probability of
13Indeed, batters are almost never intentionally walked when there is a baserunner already on first base.
9
hitting a home run in a given at bat conditional on the recent history of home runs per at
bat, where at bats are defined by plate appearances by the batter exclusive of walks.14 So
when a batter receives a walk, this will not count as an observation with our other three
statistics, and as such, walks will not directly affect our measure of hotness. Moreover, we
can use the prevalence of walks to observe endogenous behavior. It is well known that good
hitters walk more frequently than bad hitters; we would also expect to see hot hitters, as
measured by any of our four other statistics, walked more frequently than cold hitters.
A team can also change pitchers, bringing in a better pitcher, a more rested pitcher,
or a better matched-up pitcher, to face a better hitter. However, such opportunities are
also very limited, and we always control for the quality of the pitcher that the hitter faces.
The scope for such replacements are limited due to the rule that replaced players cannot
re-enter a game. Given limited numbers of pitchers, and the need to rest pitchers across
games appropriately (due to the strain of pitching on the arm), teams are very careful in
their choice to replace pitchers. Typically, a team uses only 2-4 pitchers in a game, with the
starter expected to pitch most of the game, and relief pitchers used in the later innings. Most
pitcher replacements come at predictable times (based on the number of pitches the pitcher
has thrown, the inning in the game, or the use of a pinch hitter) rather than based on the
particular hitter to be faced. One important exception to this is that at critical moments
late in a close game, teams frequently do change pitchers in consideration of the particular
hitter to be faced (and similarly, teams change hitters in consideration of the pitcher to be
faced). The primary consideration for such changes is typically the platoon advantage – it
is generally advantageous (disadvantageous) for the pitcher to be facing a hitter of the same
(opposite) handedness as the batter. We control for the platoon situation (that is, whether
the pitcher and batter are of the same or opposite hands) in all of our tests.
Some commentators sometimes speak of one further possible manner that defensive re-
sources can be transferred across hitters. In particular a pitcher could conserve energy for
one particular hitter; “reaching back for something extra” when facing a better hitter. To
the extent that this is possible, this would serve as a defensive adjustment that transfers
resources across hitters. However, there is little to no evidence that this phenomenon is
important, or even exists.15
Beyond these few adjustments that we have argued are limited in scope, we know of
14Walks (as well as a few rarer outcomes; i.e., hit by pitcher and balks) are excluded from at bats inthe definition of these standard baseball statistics; this does not require any adjustment on our part fromstandard practice.
15Often this is couched in terms of a pitcher throwing harder for key batters. However, radar gun readings,which record pitch velocity for all pitches thrown, do not reveal any difference in the velocity of pitches thatdifferent batters face from a given pitcher. Moreover, as noted above, it is not clear this strategy, even ifpossible, would be optimal.
10
no other defensive strategies that involve a significant transfer of defensive resources across
hitters. Once again, what is critical here is that hitters are faced sequentially, and the defense
can optimize strategically for each hitter, unlike in basketball, where all shooters must be
defended simultaneously. And given this limited scope for transfers of defensive resources
across hitters, there is no reason to expect margins to be equated across hitters. In contrast
with basketball shooters, better hitters and pitchers should realize higher success rates than
weaker hitters and pitchers.
2.2 Other Sports: Endogeneity and Hot Hands in General
We have gone on at some length here detailing the different scope for endogenous responses
in basketball and baseball in order to convey a critical difference with respect to the hot hand
between the two sports: In basketball, endogenous responses are available that are likely to
lead to the equating of shooting success rates across teammates, whereas such endogenous
responses do not exist in baseball. Moreover, if sufficient endogenous responses exist in
basketball to equate margins across players of different long-term abilities, these responses
should also adjust to equate margins across transitory differential ability (i.e., hot hands) as
well. Put simply, under the hypothesis that there are transitory elements of ability across
all sports, one should not expect to find evidence for a hot hand in basketball shooting
percentages, but one should expect to see it in baseball hitting and pitching success rates.
We hypothesize that such transitory components in ability are typical in sports, and that
one should find evidence for a hot hand in a particular sport if and only if there is limited
scope for endogenous responses which transfer defensive resources across players. Thus, we
would predict that golf, bowling, track and field, and shooting (all of which admit no defense),
and baseball and tennis (which have limited or no scope for transferring defensive resources
across players), should all exhibit evidence of a hot hand. Conversely, soccer, football, hockey,
and basketball, all of which involve fluid defenses that choose how to allocate resources across
players, as well as offenses that choose how to allocate scoring opportunities across players,
should find limited or no evidence for hot hands in shooting/scoring probabilities.
Since endogenous responses, when available, should equate margins both across perma-
nent and transitory differences in abilities, we further hypothesize that one straightforward
manner to identify a priori whether a sport admits a sufficient endogenous response to negate
the effect of a hot hand or not would be to look at the unconditional variation in success
rates across players for the sport or statistic in question. If there is little variation, and if
variation is not correlated with other measures of ability (such as, say, market pay), this
is indicative of an endogenous response which equates margins. In contrast, if there is a
11
lot of variation, and this variation is related to other measures of ability, this is indicative
of the absence of such an endogenous response. That is, we argue that one can identify
the importance of endogenous defensive responses from observing differences in long-term
performance. We then hypothesize that this will in turn predict whether there are responses
to short-term variations in performance, i.e., whether one will observe a hot hand or not.
That is, the fact that Kobe Bryant’s shooting percentage (as well as that of other bas-
ketball stars) is not different from the league average, is indicative of the existence of strong
endogenous responses in basketball which equate margins. Consequently, one should see
such responses to perceptions of streakiness in basketball as well. Conversely, the fact that
Albert Pujols’ batting average and home run rate (as well as that of other baseball stars)
are consistently and predictably far above average, year in and year out, is indicative that
margins are not equated in baseball. Consequently, we would predict that one should not
expect to see an endogenous response negate the effect of streakiness either, giving rise to
a hot hand in baseball data. Presently we explore baseball data, and leave this general
hypothesis regarding endogenous responses and hot hands to future analysis.16
2.3 Related Literature
Since GVT, a vast literature on streakiness in sports – primarily conducted by statisticians
and sports enthusiasts – has developed. Much of this research attempts to identify the hot-
hand effect across a variety of sports.17 Bar-Eli et al. (2006) provides a comprehensive review
of this literature. To summarize, they conclude that “the empirical evidence for the existence
of the hot hand is considerably limited” and that while a number of the studies they survey
claim to find evidence in favor of the hot hand, “when one closely examines the results,
demonstrations of hot hands per se are rare and often weak, due to various reasons...”. Our
results lie in stark contrast to these conclusions.
Concerns related to endogenous responses in basketball have been noted from the very
beginning. Yet, they have often operated in the periphery, whereas we argue they are
fundamental to any identification strategy. As mentioned earlier, GVT attempt to address
these concerns by obtaining similar results using free-throw performance and a “controlled”
16The difference between field goal shooting and free-throw shooting in basketball is very suggestive here.As we have already noted, there is no discernable difference between the field goal percentage of the playersconsidered the best offensive players in basketball and average players. In sharp contrast, there is a verysignificant difference in free-throw shooting (which is not defended and which does not permit an endoge-nous response) between players, with players who are considered better outside shooters predictably andpersistently shooting far better in free-throws than those who are not considered good outside shooters.
17See for example Adams (1992, 1995); Vergin (2000); Koehler and Conley (2003); Larkey et al. (1989);Forthofer (1991); Wardrop (1995, 1999); Gilden and Wilson (1995); Klaassen and Magnus (2001); Frameet al. (2003).
12
shooting experiment. However, their attempts to do so lack power and are therefore unable to
reject plausible models of streakiness. Other studies have attempted to address endogeneity
by looking at sports in which endogenous responses are less of a concern. These studies are
also generally underpowered and deliver (at best) weak statistical evidence for a hot-hand
effect and have little to say about the economic magnitudes. Not surprisingly, the most
supportive evidence in favor of the hot hand has been found in horseshoe pitching (Smith,
2003) and bowling (Dorsey-Palmateer and Smith, 2004), where the scope for endogenous
responses is virtually non-existent. We differ from these papers in that: i) we have far
more data and hence far more power to our tests, and ii) we consider the hot hand and
endogenous responses to it from an economic (i.e., equilibrium) perspective, providing an
equlibrium theory of where one should expect to find and where one should not expect to
find evidence for a hot hand.
Our argument – that in equilibrium, one should not expect to see a hot-hand effect in
the basketball shooting data even if it exists – is most closely related to Aharoni and Sarig
(2012). They provide several pieces of indirect evidence that are consistent with both en-
dogenous responses and the hot hand. Namely, that players with strong recent shooting
performance (two or three consecutive successes) stay in the game longer and are more likely
to take the next shot. Further, the presence of a player with strong recent shooting perfor-
mance has a positive influence on the shooting percentage of their teammates. Moreover,
while teammate of a hot shooter exhibit higher shooting percentages (presumably due to the
increased defensive attention on the hot shoooter), the hot shooter’s shooting percentage,
conditional on taking the next shot, remains unchanged. If instead the hot hand were a
cognitive illusion then the shooting percentage of the hot shooter should decrease following
the additional defensive attention.18 We start with a similar motivation – i.e., that defensive
adjustments in basketball will equate shooting margins, thereby concealing shooting streak-
iness, but instead look for direct evidence of streakiness in another sport where we argue
that such adjustments do not exist. As such, we look for, and expect to find, direct evidence
of streakiness in baseball, where Aharoni and Sarig (2012) must look for indirect evidence
in basketball.
Within baseball, statistical analysis has largely failed to identify support for streakiness
(Siwoff, 1988; Gould, 1989; Vergin, 2000; Albert and Bennett, 2003). Most of this analysis
is conducted at the player or player-year level and involves conducting a runs test (or Wald-
18Additional playing time and a higher propensity to shoot the next shot could also be explained byan incorrect perception (and hence suboptimal decision-making) on the part of players and coaches. Forexample, if a coach incorrectly believes that player A on the opposing team is hot and (suboptimally) devotesmore defensive resources to that player, then less defensive resources will be devoted to the remaining playerscausing their shooting percentage to rise even if player A’s streak was purely random.
13
Wolfowitz test). Albright (1993) conducts regressions for the category of hits at the player-
year level. He concludes that actual performance appears to be generated in a manner
consistent with an i.i.d. process. Stern and Morris (1993) point out that Albright’s analysis
is biased against finding a streakiness effect and lacks power. The simulations we conduct in
Section 5 are in agreement with Stern and Morris (1993). The key methodological differences
between our work and the previous literature, is that we estimate regressions across using the
full panel of data and look at both batters and pitchers in a variety of statistical categories.
By doing so, we obtain additional power and overcome the downward bias. Our results
strongly reject an i.i.d. data-generating process and suggest the magnitude of the hot-hand
effect is significant and strategically important.
3 Data and Methodology
3.1 Data
We use data from Major League Baseball for the 2000-2011 seasons, which is publicly avail-
able from Retrosheet.org. Each observation in the Retrosheet data set is an event, defined to
be any occurrence that changes the state of the game, not including the pitch count. Most
of the events correspond to a plate appearance (PA), which occurs when a batter completes
his turn with a hit, walk, out or reaching base on an error.19 Each observation also includes
roughly 90 variables that describe the state of the game prior to and following the event
(e.g., batter, pitcher, score, outs, inning, base-runners, stadium, date, etc.). The data from
2000-2011 contains contains approximately 2 million observations.
3.2 Methodology
Though we do not put forth a specific model of how outcomes are determined in baseball
(or other sports), it will be useful to delve a bit into the notion of “streakiness” in order to
justify the methods used in our analysis.
3.2.1 What is Streakiness?
It is worth discussing exactly what we mean by streakiness as the term is has been used
loosely among sports fans as well as in the literature. We define a streaky player as one
whose (unconditional) likelihood of success varies over time at a relatively high frequency
19Retrosheet distinguishes between a total of 25 different types events. There are several types of events,such as a stolen base or pick off, that do not constitute a plate appearance.
14
(a “streak” lasts perhaps a week to a month).20 This notion seems to capture what is
conventional usage of streakiness or the hot hand effect. For example, in 1997 Joey Cora
batted 0.475 during a 24 game span (roughly 4 weeks) but only 0.262 during the rest of
the season. Here, the frequency (or duration) is crucial to the notion of being streaky.
Lower frequency variation is generally not considered to be indicative of streaky behavior.
For example, many players annual batting average improves during their early years and
declines in years prior to retirement. Yet few (us included), would argue that this is evidence
of streaky behavior. Rather, it is simply long term time-variation in ability. Thereby, in
order to test for the presence of streakiness, it is necessary to distinguish it from this longer
term variation in ability.
In the literature, most tests are performed at the player level. The advantage of perform-
ing tests at the player level is that they do not require an additional distinction between the
player’s ability and whether the player is hot or cold. Instead, under the null hypothesis,
the ability of the player on which the test is performed is constant throughout the sample
period (often a year), and the test is whether recent past performance predicts the outcome
of future attempts. As we will discuss in Section 5.2, due to the small sample size, this
approach generally lacks power to reject simple, yet fairly plausible, models of streakiness.
Using panel data affords a much greater number of observations, which gives us the power
necessary to identify the presence of streakiness. However, it also requires us to distinguish
between a good player and a hot player. That is, we must control for each player’s ability
when trying to detect any deviations from it.
We conceive of ability as consisting of two components. A long-term component that
captures attributes of the player that are unlikely to change over the course of a season (e.g.,
raw talent, skill, speed, strength, hand-eye coordination) and a short-term component that
captures higher frequency variations attributable to factors such as confidence, attitude and
minor adjustments in technique. The question is then whether and by how much this short-
term component influences the outcome of the current attempt. The basic test will look at
how much predictive power recent performance has on the current attempt after controlling
for long-term ability and other observable variables.
3.3 Regression Model
We estimate a (reduced-form) regression model (both logistic and OLS) for ten different
outcomes of particular interest (five for batters and five for pitchers). Namely, whether
the plate appearance resulted in a hit, homerun, strikeout, walk as well as whether the
20Whether recent success has a causal impact is not crucial to, nor excluded from, this definition.
15
batter reached base. When considering each of the first three outcomes (hit, homerun and
strikeout), we drop all plate appearances that did not result in an official at bat. At bats
are plate appearances excluding walks as well as several other rare occurrences in which the
plate appearance terminates without the batter attempting to earn a hit (e.g., when the
batter is hit by a pitch or hits a sacrifice bunt). Letting Yit denote an indicator for whether
the outcome Y occurred in player i’s attempt at time t, we estimate the following OLS and
logistic regression equations:
Yit = α + γStateit + βXit + εit (1)
log
(Pit
1− Pit
)= α + γStateit + βXit (2)
where Pit = Pr(Yit = 1), Stateit, is a measure of how well player i has performed recently
and Xit denotes a vector of control variables, which includes the long term ability of player
i, as well as the ability of the pitcher or batter he is facing. For each outcome, we estimate
the model in which i refers to a batter in a given year and the model in which i refers to
pitcher in a given year.
The coefficient of interest, γ, captures the hot hand effect. Under the null hypothesis that
this effect does not exist, we would expect that γ = 0. A positive coefficient is indicative of
streaky behavior whereas a negative coefficient is indicative of reversals.
When the dependent variable corresponds to whether the batter is walked, γ, can also
be interpreted as capturing an endogenous response of the defense. That is, suppose the
data indicates that a batter, who has performed particularly well recently, is more likely to
be walked (γ > 0). One explanation is that the batter’s ability for discerning balls from
strikes is higher than usual and thus his plate appearance is more likely to result in a walk
(i.e., he is streaky). An alternative story, is that the pitcher responds to the batter’s recent
success by throwing pitches that are more difficult to hit and thus more likely to be outside
the strike zone and result in a walk. Such a response would be most easily justified in
situations where the batter was more likely to get a home run (or extra base hit) rather than
simply a single, in which case he reaches the same base as if he was walked. Thus, when
estimating the regression coefficients for whether the batter gets walked, we will attempt to
distinguish between these explanations by including measures of both long term ability and
recent performance with respect to both hits and home runs (in addition to these measures
with respect to walks themselves).
16
3.4 Measuring State and Ability
In the baseline model, we use the player’s recent success rate (e.g., batting average) to
estimate his current state. More specifically, in our baseline regression model, we estimate
the state of a player using his recent success rate in his last L attempts.
Stateit = RLit ≡
1
L
L∑k=1
Yi,t−k. (3)
In selecting the history length (L), one encounters the following tradeoff. Longer histories
contain less noise but are also less relevant for measuring streakiness (and more relevant for
measuring long-term ability). Shorter histories contain more noise, but are also more relevant
for capturing high frequency variation. For the majority of our analysis we use L = 25 (and
drop the L superscript). This history length corresponds roughly to performance in the 5
most recent games for a batter and takes roughly a week for a “regular” batter. To account
for this, we also require that these 25 most recent attempts occurred within the last 10
days so as to ensure the measure captures “recent” performance. We arrived at L = 25 for
two reasons. First, we feel this history length correctly balances the tradeoff above using
the conventional notion of how long a typical streak lasts (i.e., several weeks to a couple
months). Second, we experimented with a variety of different history lengths using a single
season of data, for which we obtained the stronger results using L = 25, than alternative
history lengths in which we obtained qualitatively similar (though quantitatively weaker)
results. We also report estimates using the full data set for two alternative lengths L=10
and L=40. We also consider several additional methods for measuring a player’s state, (see
Section 3.6.1), for which we obtain similar results.
We consider a variety of methods to control for the player’s ability. In the baseline
model, we do so by calculating the player’s rate of success during the season not including a
“window” of W attempts before and after the current attempt.
Abilityit =1
Ni − 2W×
∑s/∈[t−W,t+W ]
Yis (4)
The window is needed to avoid biasing the estimates for a wide-range of data-generating
processes. To understand the need for the window, consider an attempt in which Rit > Ri,
where Ri is player i’s overall (unwindowed) success rate over the course of the season. By
construction, the success rate in all other attempts (excluding the last L) must be strictly
lower than Ri and therefore it is mechanically less likely that player i is successful in his next
attempt. Thus, the measurement of permanent ability must not overlap with the measure
17
of streakiness. We chose W = 50 in an effort to reduce any lasting effects of the player’s
(potential) streakiness (recall that we use L = 25 for the baseline model), while at the same
time balancing the need to maintain a statistically meaningful estimate of the players ability
based on Ni − W observations.21 In this vain, we drop observations with less than 100
attempts outside the window, though our results are strengthened if these observations are
included. We also consider several other ways to control for ability (see Section 3.6.1), for
which we obtain similar results.
3.5 Control Variables and Endogenous Defensive Responses
In addition to controlling for the state and ability of the player, additional control variables
were included in the regressions. In particular, the ability of the opposing batter/pitcher,
the stadium and whether the pitcher and batter were of the same hand. Ability of the
opposing batter or pitcher (player j) was measured by computing the success rate of all
batters (pitchers) who faced pitcher (batter) j in that year excluding the current at bat. For
example, we use the batting average of batters who faced pitcher j over the course of the
year excluding the current at bat. We control for the stadium by calculating the success rate
of all batters in that stadium over the course of the season. We used a dummy variable to
control for whether the batter and pitcher were of the same hand. The dummy variable is
equal to 1 if the pitcher’s throwing arm is the same as the side from which the batter hit.
As we argued in Section 2, there is a limited set of ways that a defensive baseball team can
shift resources toward better (or hotter) players and thereby equate margins across different
hitters.22 Furthermore, to the extent that they can, most of these responses can be controlled
for in the data. Most significantly, a team may choose to ”pitch around” or intentionally
walk a good hitter, or it may bring in a better pitcher to face a good hitter. In both cases,
these responses are accounted for in our analysis. First, we drop all walks when predicting
hits, homeruns and strikeouts. Second, in the regression analysis, we control for both pitcher
ability and the platoon effect.
Our data set provides a wide range of additional situational variables that one might
expect to predict the success of a particular outcome (e.g., inning of at bat, number of outs,
number of runners on base or in scoring position). However, similar to Albright (1993), these
variables were not found to be statistically significant nor do they alter the predictions of
the model and are omitted from our reported results.
21Using this measure will result in unbiased estimates if the true data generating process is i.i.d. Further,it is consistent asymptotically if as Ni →∞, we let W →∞ at a rate slower than Ni so that Ni−2W →∞.
22This is evidenced by the fact that a batters ability is generally measured by his batting average (orsuccess rate), which is in stark contrast to basketball.
18
3.6 Robustness
In addition to the baseline regression, we consider a number of alternative specifications.
Most of these involve using other methods for measure both both state and ability.
3.6.1 Alternative Measures of State
In the baseline model, the measure of state, will tend to overweight players with higher
ability. To address this, we consider two alternative strategies. The first is to use dummy
variables to indicate whether a player, based on recent performance, appears to be hot, cold
or neutral. To do so, we use an estimate in the form of Stateit = [Hotit Coldit].
• Additive Cutoffs: A player is considered hot (cold) if his recent success rate is A (B)
percentage points above (below) his long term success rate or ability. That is,
Hotit = 1 {Rit ≥ Abilityit + A}
Coldit = 1 {Rit ≤ Abilityit −B}
where A, B are chosen so that the average player is hot, in 5% of all at bats and cold
in 5% of all attempts. For this, and each of the specifications using dummy variables
below, we obtained similar results for alternative using 2.5% and 10% thresholds. We
reported these in the categories of batting average and on base percentage for batters
in Tables A.17 and A.18.23
• Proportional Cutoffs: A player is considered hot (cold) if his recent success rate is
q > 1 (r < 1) times above (below) his ability. That is,
Hotit = 1 {Rit ≥ q · Abilityit}
Coldit = 1 {Rit ≤ r · Abilityit}
where q, r are chosen so that the average player is hot in 5% of all at bats and cold in
5% of all at bats.24
As a convention, we maintain the same criterion across both batters and pitchers, despite
that outcomes which are considered “good” for batters, such as hits and homeruns, are
considered “bad” for pitchers and vice versa (strikeouts). The advantage of this convention
23Results for other categories are available upon request.24In the category of home runs and walks, Rit takes a value of zero more than 5% of the time for the
typical player. Therefore, there does not exist an r > 0 such that the average player is cold 5% of the time.Therefore, in these two categories the category “cold” is omitted.
19
is to maintain the same sign prediction for being hot (or cold) across all outcomes. That is,
regardless of whether an outcome is good or bad for the player, if the outcome has occurred
more frequently recent past, than if players are streaky, we would expect the outcome is
more likely to occur in the current attempt and hence a positive (negative) coefficient on the
hot (cold) dummy variable.
The second strategy we employ is to compute how the players current recent history, Rit,
compares to his overall distribution of histories. In this case, we construct an estimate of the
players state, which corresponds to the fraction of the time that his recent history is above
the distribution of his histories in the opposite half of the season. Letting S(t) ∈ {1, 2}denote whether the attempt t occurred in the first or second half of the season, we let
Stateit =1
|{s /∈ S(t)}|∑s/∈S(t)
1{RLit > RL
is} = freq{RLit > RL
is, ∀s /∈ S(t)} (5)
Making use of this statistic for the player’s state, we first run a a regression of the form
given by equation (1). We then construct dummy estimates of the form given above using
the following method.
• Distribution Based Cutoffs: A player is considered to be hot in attempt t if Rit is in
the right tail (top 5%) of the distribution from the opposite half of his season and cold
if Rit is in the left tail (bottom 5%).25
3.6.2 Alternate Measures of Ability
An alternative way to control for player ability is by using player fixed effects. We estimate
the following regression model:
Yit = αi + γStateit + βXit + εit (6)
Estimating (6) using standard techniques generates a downward bias on γ due to the lagged
dependent variables in the State variable.26 The intuition for the bias is essentially the same
as the logic given above for why a window is needed.27 Therefore, we have also considered
25It is necessary to use the opposite half of the season for this criteria in order to avoid the overlap bias.To see this, let ti denote the time at which i’s recent success rate is at its maximum point and assume forsimplicity that ti is unique. Since Rit+1 < Rit, it must be that Yit = 0. In other words, if a player has beenvery successful recently relative to the rest of the entire season, then in his next attempt the unconditionalprobability of success must be necessarily lower.
26See, for example, Wooldridge (2010, p 373).27In fact, including a player (player-year) fixed effect is identical to using the lifetime (annual) batting
average of the player as an ability control.
20
three additional measures to control for the players ability; the player’s success rate in the
previous year, the entire sample omitting the season in which the observation takes place and
the players’ success rate over the entire sample including the season in which the observation
takes place. For each of these measures, we obtain similar regression estimates.28
3.6.3 Test for Serial Correlation
Though we remain agnostic as to a model of streakiness, most plausible models that come to
mind generate serial correlation in the error terms (after controlling for player ability). Thus,
we test for serial correlation using a Wooldridge test. We run the test for all five categories
using batter-year as the longitudinal variable (i) and including controls for pitcher ability,
the stadium and the platoon effect. As before, in the categories of hits, homeruns and
strikeouts, we omit plate appearances that did not result in an official at bat. In all five of
the categories for batters, we strongly reject the null-hypothesis of no serial correlation. The
same result obtains when restricting to various subsamples of the data (e.g., batter-years
with at least 200 at bats).
4 Results
Regression estimates are located in the Appendix. Standard errors are clustered at the player
level. Tables A.5-A.14 contain the estimates for each of the outcomes of interest, both for
batters and pitchers. The results are quite striking. In each, the estimate for γ is both
consistent with a hot-hand effect and statistically significant at a 1% confidence level when
using the players recent success rate, RLit, as the measure of state. This is true for both the
OLS and logistic specifications (Columns (1) and (2)). For the case of the linear model, the
coefficient can be interpreted as the increase (in percentage points) in the likelihood of the
outcome given a one percentage point increase in RLit. For example, if a players recent on
base percentage increases from say 0.200 to 0.500, then the probability of getting on base
in the current attempt increases by 1.8% (or 18 points). This corresponds to just over a 5%
increase in the on base percentage for a typical batter.
The third column of Tables A.5-A.14 contains the estimates when using the fraction of
attempts for which the current recent history exceeds the distribution of recent histories in
the opposite season half (the proxy for state in (5)). Here again, we obtain statistically
significant results consistent with streakiness.
28While the latter suffers from the overlap bias, the estimator is still consistent and the number of obser-vation is significantly larger than at the season level.
21
Columns (4)-(6) contain the results when using dummy variables as a proxy for whether
the batter or pitcher is hot or cold. For hits, we find that hot batters are roughly 1.6%
(16 batting points) more likely to get a hit than cold batters. Comparing this magnitude
to the summary statistics in Table A.1-A.2, corresponds to approximately a one quartile
increase relative to the population of other batters (i.e., from the 50th percentile to the 75th
percentile). The results are similar for pitchers.
For homeruns, the median batter hits a homerun in 3% of his at bats. Thus, it is not
possible to sort out the left five percent tail using the proportional or distribution based
methods because in more than 10% of all at bats, a batter has zero homeruns in his last
25 at bats. Using the additive method, we estimate that being hot versus cold leads to 1.2
percentage point increase in the likelihood, roughly a 40% increase for the median batter.
This also roughly corresponds to a one quartile increase. A similar result obtains in for both
strikeouts and reaching base.
In the case of walks, see Tables A.13-A.14, we again find roughly a one quartile difference
between a player identified as hot and one identified as cold. The absolute change in difference
in percentage points for pitchers is about half as large as it is for batters, though standard
deviation across pitchers is also lower.
Analyzing walks permits a test for whether a defense endogenously responds to recent
performance by batters. As mentioned earlier, the most significant adjustment a defensive
team can make across hitters is to pitch around a good hitter; that is, either to avoid the
hitter by throwing pitches outside the strike zone or intentionally walking him. It is well
known that good hitters walk more frequently than bad hitters; we would also expect to
see hot hitters walk more frequently. Table A.15 provides evidence of such responses. In
particular, it shows recent home runs by a batter are a strong predictor of whether the batter
gets walked in the current at bat. Such a response is economically justifiable because giving
up a home run is substantially more penal than giving up a walk. There is also evidence that
recent hits are predictive of walks, and further that this effect is concentrated in batters with
recent extra base hits rather than singles. This is also consistent with a defense adjusting
optimally; absent runners on base, a single is equivalent to a walk and thus a defense has
little reason to pitch around a batter who is particularly good or streaky in hitting singles.
The remaining tables contain a number of robustness checks. Specifically, Table A.16
contains estimates for the regression equation 6, which includes batter fixed effects as an
alternative control for ability. In all the five categories, the coefficient on the stateis positive.
Further, it is statistically significant in each category with the exception of hits.
Tables A.17-A.18 use 2.5% and 10% thresholds (as opposed to 5%) in order to classify
a batter as hot or cold (see Section 3.6.1). For parsimony, results are reported for batters
22
in two categories (hits and on base). The statistical significance of the estimates is similar
to the 5% case. When using the 2.5% (10%) threshold, the magnitude of the coefficients
are moderately larger (smaller) than the 5% thresholds. One interpretation is that the 10%
threshold leads to a higher frequency of Type I errors (incorrectly identifying the player as
hot or cold), thereby attenuating the estimated size of the effect.
Finally, Tables A.19 and A.20 contain estimates when using a longer (L = 40) and shorter
(L = 10) history lengths in the estimate of the batter’s state as given by equations (3) and
(5). Here again, we report the results for batters in the categories of hits and on base. The
sign of the estimates is the same for all specifications except when L = 10 and using the
distribution-based estimate of state (i.e., equation (5)). The longer history length (L = 40)
results in similar estimates, while the shorter history length (L = 25) results in weaker ones.
5 Simulations
In this section, we consider a simple Markov switching model of streakiness. Our goal here
is not to propose or test a specific model of streakiness; we remain agnostic here to the
specific form that streaky behavior may take. Rather, we present this simple model to
give some intuitive sense of magnitudes of streaky behavior that are consistent with our
empirical results. That is, we will ask what parameters in the Markov model would generate
streakiness on the scale that we have observed, and we will in turn argue that these appear
to be much in line with common intuitive notions of the hot-hand in sports. In order to
do so, we will use the model to generate data, conduct regressions on simulated data, and
compare our empirical estimates to those generated by the simulated data. As expected,
given our imperfect empirical inference of the underlying state, we find the difference between
performance in high and low states in the model is greater than the difference in performance
based on inferences of the state from recent history. The magnitude of the difference is
surprisingly large. In addition, the simulations demonstrate the need for a significant amount
of data (on the order of 105 observations) in order to have the statistical power needed to
identify streaky behavior.
5.1 A Simple Model of Streakiness
A player’s ability follows a Markov-switching process. There are three underlying states:
ω ∈ {Hot(H),Normal(N),Cold(C)}. In the normal state, the player’s (unconditional) prob-
ability of success is µ. When the player is hot (cold) the probability of success is µ + ∆
(µ−∆). The transition matrix, M , is symmetric, with transition probabilities chosen such
23
that the steady state distribution across states is (κ, 1 − 2κ, κ) with the half life of the H
and C state of T1/2.
Note that ∆ captures the magnitude of the streakiness effect while κ measures its preva-
lence.29 These are the two key parameters of interest. We begin by simulating data for
different values of these two parameters and estimating the least squares regression model
given in (1) using RLit as the measure for the player’s current state and the windowed ability
given by (3).
For each pair of values (∆, κ), we simulate 100 data sets according to this process, each
containing 1,000 players with 2,000 attempts per player (a total of two million observation).
Motivated by the category of hits for batters (see Table A.2), the ability of each player, µ,
is drawn from a distribution with mean 0.270 and standard deviation 2.5%. The results are
displayed in Table 5.1, in which we report the mean estimate for γ and the mean t-statistic
across the 100 simulations. In Table 5.2, we perform the same exercise where batters are
assumed to be homogenous (each with µ = 0.270). For simplicity, we assume that all other
parameters are uniform across all players and fix T1/2 = 25, which corresponds to the length
of the history used to evaluate the players state.
Magnitude of Streakiness
Frequency ∆ = 0.01 ∆ = 0.02 ∆ = 0.04 ∆ = 0.05 ∆ = 0.10
κ = 1%0.0105∗∗∗ 0.0120∗∗∗ 0.0139∗∗∗ 0.0158∗∗∗ 0.0289∗∗∗
(2.98) (3.40) (3.94) (4.48) (8.24)
κ = 5%0.0116∗∗∗ 0.0149∗∗∗ 0.0252∗∗∗ 0.0328∗∗∗ 0.0924∗∗∗
(3.28) (4.22) (7.19) (9.39) (27.36)
κ = 10%0.0122∗∗∗ 0.0184∗∗∗ 0.0389∗∗∗ 0.0543∗∗∗ 0.1606∗∗∗
(3.44) (5.23) (11.18) (15.73) (49.63)*** p<0.01, ** p<0.05, * p<0.1
Table 5.1 – Analysis of simulated data for players of heterogenous ability: µ ∼ N (0.270, 0.025) and T1/2 =25. The first number in each box is the mean estimated γ coefficient across N = 100 simulations from thespecification given in equation (1) with RL
it as the measure for the players current state and the windowedability given by equation (3), the mean t-statistic is reported in parentheses.
Both tables yield similar results. First, as expected, higher ∆ generate larger estimates
for the coefficient and its statistical significance. Increasing κ has a similar effect; the more
frequently a player is hot (cold), the more likely it is that strong (weak) recent performance
is indicative of being hot (cold).
The most notable observation comes from a comparison to the empirical estimates from
the baseball data. The analogous estimates to Tables 5.1 and 5.2 are given in column (1)
29When either ∆ = 0 or κ = 0, the data generating process is i.i.d.
24
Magnitude of Streakiness
Frequency ∆ = 0.01 ∆ = 0.02 ∆ = 0.04 ∆ = 0.05 ∆ = 0.10
κ = 1%0.0007 0.0007 0.0037 0.0037 0.0184∗∗∗
(0.21) (0.20) (1.04) (1.03) (5.22)
κ = 5%0.0017 0.0031 0.0138∗∗∗ 0.0217∗∗∗ 0.0821∗∗∗
(0.47) (0.88) (3.91) (6.18) (24.16)
κ = 10%0.0018 0.0073∗∗ 0.0280∗∗∗ 0.0435∗∗∗ 0.1509∗∗∗
(0.52) (2.07) (7.99) (12.51) (46.36)*** p<0.01, ** p<0.05, * p<0.1
Table 5.2 – Analysis of simulated data for players of homogeneous abilities: µ = 0.270 and T1/2 = 25.The first number in each box is the mean estimated γ coefficient across N = 100 simulations from thespecification given in equation (1) with RL
it as the measure for the players current state and the windowedability given by equation (3), the mean t-statistic is reported in parentheses.
of Tables A.5 and A.6, where we estimated a coefficient of 0.0319 for batters and 0.0468
for pitchers. Notice estimate are consistent with a ∆ between 0.04 and 0.05 (40-50 batting
points) and κ between 5 and 10%.30 This corresponds to roughly a 20% increase in the
likelihood of getting a hit for a hot player compared to an average player and a 40% increase
relative to a cold player. Note that this magnitude for ∆ is 5-10 times larger than the
difference in performance between players that we infer to be hot or cold in our dummy
specifications of Tables A.5 and A.6 (columns (4)-(6)). That is, if the Markov model was the
true underlying model, our dummy specifications are sufficiently noisy in identifying true
hot and cold players, that they only yield a fraction of the underlying difference in ability
based on state.31
Thus, while we found roughly a one quartile difference between a player empirically
identified as hot and cold player in the previous section, in order to generate this result
from our Markov model we would need the true difference between hot and cold players
to be on the order of a three quartile difference in performance (roughly 80 to 100 batting
points). GVT surveyed fans on their perceptions of a hot hand, and argued that they were
implausibly large, even if a small effect existed. However, these responses, which were on
the order of 5-10 percentage points, are very much in line with the differences implied by
our simulations; if respondents to the GVT survey believed they were being asked about
the difference in performance between a correctly identified hot player and a neutral player,
without adjusting defenses (which to us is the most natural reading of the question that
30Similar conclusions obtain when estimating alternative specifications. Results available upon request.31Of course, if we knew this Markov model to be the correct model, we could estimate who is hot and cold
with a more efficient procedure. That is not our intent here, as we do not want to test this one particularmodel of streakiness. Rather our point is that true underlying streakiness will be significantly higher thandifferences in performance from empirically inferred streakiness based on recent performance, due to bothnoise and model misspecification.
25
GVT asked), then they gave responses much in line with the parameters of the Markov
model needed to generate our empirical estimates.
5.2 Power
We have argued above that in sports where defenses can adjust freely, one should not expect
to find short-run predictability in player performance, whether or not a hot hand effect
exists. In contrast, this should not be the case in sports in which defensive adjustments are
sufficiently costly. It is notable then that tests in a number of other sports have not found
evidence of a hot hand. However, many of these tests suffer from a second drawback distinct
from endogeneity, that being the lack of power needed to reject their null hypothesis. In
Table 5.3, we use the Markov model to show the number of observations needed to detect
streakiness using our methodology is on the order of 105, while the previous studies that we
are aware of have used data in which the number of observations is at least one order of
magnitude lower.
Magnitude of Streakiness
Observations ∆ = 0.01 ∆ = 0.02 ∆ = 0.04 ∆ = 0.05 ∆ = 0.10
n = 103−0.0707 −0.0475 0.0132 −0.0084 0.0224
(-0.34) (-0.22) (0.18) (0.02) (0.23)
n = 104−0.0065 0.0024 0.0163 0.0278 0.0890∗
(-0.10) (0.07) (0.35) (0.58) (1.90)
n = 105−0.0000 0.0029 0.0155 0.0234 0.0819∗∗∗
(0.00) (0.19) (1.00) (1.50) (5.44)
*** p<0.01, ** p<0.05, * p<0.1
Table 5.3 – This table illustrates the number of observations required to identify streakiness as it dependson the magnitude of the effect (for fixed κ = 0.05). The first number in each box is the mean estimated γcoefficient across N = 100 simulations from the specification given in equation (1) with RL
it as the measurefor the players current state and the windowed ability given by equation (3), the mean t-statistic is reportedin parentheses.
6 Discussion
In this section we briefly discuss three topics: First, we address endogeneity in other sports,
and hypothesize that whether one should expect to find short term predictability in other
sports will depend on the availability of endogenous defensive responses. Second, we discuss
the magnitude of the effects we find, and argue that at a minimum, they appear consistent
26
with conventional notions of hot-hands. And third, we compare the hot-hand in sports
literature with the finance literature, drawing parallels with the efficient market hypothesis,
market responses, and behavioral finance.
6.1 Other Sports
While we have discussed differences between basketball and baseball in detail, an interesting
question is how to identify which sports have sufficient scope for an endogenous response
so as to equate margins and negate the effect of a hot-hand in a given statistic. In some
sports the answer is easy: sports without defenses (i.e., golf, bowling, track and field, etc.)
trivially lack the opportunity for defensive responses. Other one-on-one sports, such as
boxing, fencing, and chess, would also seem to lack such a response, given the inability to
transfer defenses resources across players. (A boxer should always be maximizing his chance
to score points, win rounds, or knock out his opponent, regardless of how hot his opponent
is.) Then there are sports such as soccer and hockey that share the defensive feature of
basketball, where defensive resources adapt freely across offensive players, and one might
expect margins to be equated in these sports. But beyond looking sport by sport at specific
rules of play, here we propose a simple test with which we hypothesize will identify whether a
sport will exhibit short-term predictability in performance. In particular, if defenses are able
to allocate resources across players in a manner that equates margins for a particular sport,
relatively “good” players and “bad” players should have similar unconditional success ratios.
Other measures of a player’s quality (salaries, awards, etc.) should not correlate strongly
with the statistic, nor should individual player’s performance in this statistic exhibit much
correlation across years.32 On the other hand, if defenses are not able to respond to equate
margins in a given sport, one should expect better players to perform better, and to see
these performance measurements persist across years. Hence baseball all-stars should have
higher batting averages and higher home run percentages and lower earned run averages
than average players, but basketball all-stars should not have higher shooting percentages
than average players.
Applying this notion to other sports, we would predict that if and only if one sees margins
equated unconditionally across players should one expect to see margins equated temporal
within players. That is, if and only if a sports’ rules and optimal strategy yields a sufficient
defensive adjustment so that the success rates of players are equated, should one then also
expect to observe adjustments which equate success rates temporally across any given player
32One might expect to see some small correlation, if the endogenous response is significant but incompletein some settings, or if the statistic does not perfectly measure the performance over which the defense isoptimizing (i.e., equating expected shot value in basketball is not the same as equating shooting percentages).
27
regardless of whether he is hot or cold. Thus, we should find predictability in sports statis-
tics if those same statistics correlate with other measures of players ability, and if player
performance in those statistics is persistent across long periods.
6.2 Magnitude of the Hot-Hand
One common response to our results is that perhaps players do exhibit a hot hand, but
even so, it is likely that common perceptions overestimate the magnitude of this effect
(for example, See Rabin). Since we do not have a measure of the magnitude of hot-hand
perceptions with which to compare our results, we cannot dismiss such a possibility. However,
it is notable that the strength of the hot-hand that we find seems very much in line with our
intuitive notions of a hot-hand, and that a number of coaching decisions based on perceptions
of a hot-hand seem to be consistent with this as well. Take, for example, batting average.
As noted in Section 5, we find that the difference in the likelihood of a hit between what
we empirically identify as a hot and cold hitter (i.e., what could be identified using just
recent outcomes) is on the order of about 15-20 batting average points, which corresponds
approximately to about a quartile increase in ranked performance for an average hitter (i.e.,
the difference between the median hitter and the 75th percentile hitter). This would suggest
that a coach should choose to play an average quality reserve over a somewhat above average
starter if the reserve is particularly hot and/or the starter is particularly cold. At the same
time it also implies that it would not generally make sense to bench a very top hitter for
an average hitter, regardless of how cold/hot the two players are. This is very much in line
with conventional coaching wisdom, where marginal starting hitters are shuffled in and out
of the lineup on a daily basis based on their recent performance and a sense of whether they
are hot or not, whereas top starting hitters are rarely replaced, even when they are very
cold. Similarly, top starting pitchers almost always take their regular turn in the pitching
rotation (typically, once every five games) regardless of recent performance as long as they
are healthy enough to do so, whereas marginal starting pitchers will often gain/lose their
starting spots based on good/bad recent performances.33
Furthermore, it is worth noting that a player or a coach might be able to better identify
their own hot streaks or the hot streaks of players they are coaching more efficiently than
we can using only recent performance outcomes. In particular, hot/cold streaks might also
manifest in other physical or psychological displays such as confidence, harder pitches, minor
injuries, etc. observed by the player/coach but not observed by the econometrician. This
33And it is notable that coaches routinely reference recent performance when justifying such line-up moves,saying things like a certain player is “in a groove”, is “swinging a hot bat”, or has been ”throwing very welllately”.
28
would in turn magnify the difference in performance between times identified as hot and cold.
(Recall that in the Markov model we presented above in Section 6, the performance difference
between actual states was several times larger than the performances differences conditional
on observed hot and cold streaks, insofar as observed streaks are a noisy measure of actual
states.) Hence, to the extent this is true, one would expect to see more active strategic
responses to perceptions of player hotness.
6.3 Efficient Markets, Market Responses, and Behavioral Finance
It is useful to draw a comparison between the hot-hand literature and the finance literature.
A basic organizing principle in much finance research is semi-strong market efficiency: One
should not expect to realize abnormal returns (excess returns after a proper adjustment for
risk) associated with firm characteristics that are publicly known. The market should react
efficiently to any known information, and consequently, past prices, current firm characteris-
tics, firm management, strategic advantages, etc. should all be reflected in the market price
if this information is public. This is not to say that such deviations never occur; indeed,
the finance literature is replete with examples of deviations from market efficiency and these
anomalies are a source of a great deal of research. However, as a whole, this principle does a
good job explaining market movements (or the lack thereof) relating to a wide range of firm
characteristics and actions. Consequently, when research finds that a certain firm character-
istic or action does not predict future abnormal returns, the standard interpretation is that
the market has priced this factor correctly, rather than that the factor has no value. And in
order to determine the value of the factor, researchers instead try to find settings where the
factor changes unexpectedly and exogenously, or where the factor existed but its presence
was not known publicly.
The parallel to the hot-hand literature is that just as markets react efficiently to firm
characteristics and equate expected (risk-adjusted) returns, basketball defenses should react
efficiently and equate shot values across players. Basketball teams have much scope to
allocate defensive resources across offensive players, just as market participants have scope
to allocate their resources across firms. Better players should not have a higher shooting
percentage than weaker players, and hot players should not have a higher shooting percentage
(going forward) than cold players. The first of these two predictions is uncontroversial: it
is well-accepted that most NBA stars lack abnormally high career shooting percentages
precisely because they attract more defensive coverage. But in the exact same vein, it
seems that upon finding a lack of short-term predictability in shooting success, the natural
interpretation should once again be efficient defensive adjustments, and not that shooting is
29
random and perceptions of hot-hands are a cognitive mistake.
We focus our tests on baseball, where the ability to transfer defensive resources across
different hitters is very limited, and consequently one should expect short term predictability
in outcomes if a hot hand exists. That is, the rules and play in baseball do not permit a
sufficient endogenous defensive response to equate margins across different players. Our
focus on baseball instead of basketball is analogous with looking for evidence on the value
of a firm characteristic in financial markets when the markets are constrained from freely
reacting to these characteristics - say, due to limited information. In this present study, we
look for and find strong evidence for a hot hand in baseball across all ten statistical categories
that we examine. Furthermore, the magnitude is significant, and corresponds qualitatively
with conventional perceptions.
More generally, the misattribution of equilibrium outcomes to partial equilibrium moti-
vations that we believe underlies the hot hand literature appears common to us. A number of
behavioral economics findings can readily be reinterpreted as optimal equilibrium responses.
For example, results interpreted as managerial overconfidence are often indistinguishable
from optimal managerial decisions under agency, and results interpreted as attention biases
can be reinterpreted as optimal information processing under uncertainty.
Such concerns are closely related to familiar questions of endogeneity that are central to
empirical work in many areas of economics. The care that must be taken when interpreting
correlations between a choice variable and outcomes is akin to the same care needed in
interpreting results in settings where strategic interactions equate margins. Berk and Green
(2004), to cite one of many examples, note that similar returns across mutual fund managers
might not represent similar investment abilities (the standard interpretation), but rather,
could also reflect different choices of equilibrium scale in a setting of differential investing
abilities and decreasing returns to scale. This observation is quite similar to ours: similar
shooting percentages across time (conditional on recent successes) might represent similar
abilities across time (i.e., the lack of a hot hand), or it might instead represent equilibrium
defensive responses.
7 Conclusion
We test for a “hot hand” (i.e., short-term streakiness in performance) in Major League
Baseball using panel data. We consider ten statistical categories, and find strong evidence of a
hot hand in all of them. Moveover, the magnitudes are significant; strong recent performance
relative to being in a neutral state corresponds to roughly a one quartile increase in the
distribution of present performance. Thus, a 50th percentile hitter will hit like a 75th
30
percentile hitter following strong recent performance.
Our results are in notable contrast to the majority of the hot hand literature, which
has found little to no evidence for a hot hand, often employing basketball shooting data.
We argue that a primary explanation for this difference is endogenous defensive responses:
basketball presents sufficient opportunity for defensive responses to equate shooting prob-
abilities across players whereas baseball does not. As such, much prior evidence on the
absence of a hot hand despite widespread belief in its presence should not be interpreted as
a cognitive mistake—as it typically is in the literature, but rather, as an efficient equilibrium
adjustment.
We also provide a simple heuristic for identifying a priori which sports are likely to
permit such a response and discuss potential implications for identifying the hot hand effect
or ability in other settings. Defenses should treat permanent and transient skill differences of
offensive players in a similar manner: defenses are likely to adjust to equate margins across
hot and cold players if and only if they also adjust to equate margins across good and bad
players. Hence we should expect to see streakiness in a statistic if and only if we also see
permanent quality differences across players in the same statistic.
Another drawback in much of the literature is a lack of power in the tests considered.
Baseball panel data – because it is so extensive – allows us to address such issues. In
simulations we address both the questions of what type of streakiness parameters would be
necessary in order to generate our observed effects, and how much data is necessary in order
to reliably identify or reject the presence of a hot hand. We consider a simple Markov model
and find that our results are consistent with a rather strong hot hand effect, and additionally,
the need for a significant number of observations to effectively distinguish such a model from
random outcomes.
A final remark on the hot-hand as a “behavioral” phenomenon is in order. We opened
this paper noting that the absence of a hot-hand in sports is a common motivating example
of behavioralists. Our results challenge the common behavioral interpretation of the hot
hand as a cognitive mistake, but moreover, we question whether the hot hand bias should
have ever been interpreted as a behavioral phenomenon, even if the standard interpretation
were true. The behavioral mistake that the prior literature purports to show is a tendency
to infer patterns in random data. Conversely, however, behavioral finance research has
focused on challenging the efficient market hypothesis prediction of no predictability in asset
returns.34 Indeed, this literature argues that psychological considerations and inferential
mistakes, together with limits to arbitrage, should lead to predictability in market returns
34Predictability could also be due to time-varying required rates of return, but that is tangential to ourpoint here.
31
(see Shleifer, 2000; Hirshleifer, 2001). Here it is rather interesting that proponents of efficient
markets have argued along the same lines of hot-hand behaviorists, noting the tendency to
infer patterns in random data. And similarly, sports fans, players, and coaches, argue (as do
we) that psychological factors are an important mechanism underlying hot-hands in sports.
Our point in drawing this analogy between the asset pricing literature and the hot hand
literature is not to say that the outcome in these areas need be related in any manner;
it could well be the case the stock market exhibits predictability, yet at the same time
sports players performance does not. What does seem odd to us, however, is that both the
existence of predictability (in the market) and the purported lack of predictability (in sports
performance) are both taken as evidence in favor of a behavioral approach. One can only
wonder whether our results will also be hailed as further behavioral evidence.35
35Psychology affects sports performance!
32
References
Adams, R. M. (1992): “The’Hot Hand’revisited: Successful basketball shooting as a func-
tion of intershot interval,” Perceptual and Motor Skills, 74, 934–934.
——— (1995): “Momentum in the performance of professional tournament pocket billiards
players.” International Journal of Sport Psychology.
Aharoni, G. and O. H. Sarig (2012): “Hot hands and equilibrium,” Applied Economics,
44, 2309–2320.
Albert, J. and J. Bennett (2003): Curve ball: Baseball, statistics, and the role of chance
in the game, Springer.
Albright, S. C. (1993): “A statistical analysis of hitting streaks in baseball,” Journal of
the American Statistical Association, 88, 1175–1183.
Bar-Eli, M., S. Avugos, and M. Raab (2006): “Twenty Years of Hot Hand Research:
Review and Critique,” Psychology of Sports and Excercise, 7, 525–553.
Berk, J. and R. Green (2004): “Mutual fund flows and performance in rational markets,”
Journal of Political Economy, 112, 1269–1295.
Dorsey-Palmateer, R. and G. Smith (2004): “Bowlers’ hot hands,” The American
Statistician, 58, 38–45.
Forthofer, R. (1991): “Streak shooterThe sequel,” Chance, 4, 46–48.
Frame, D., E. Hughson, and J. Leach (2003): “Runs, regimes, and rationality: The
hot hand strikes back,” Tech. rep., Working paper.
Gilden, D. L. and S. G. Wilson (1995): “Streaks in skilled performance,” Psychonomic
Bulletin & Review, 2, 260–265.
Gilovich, T., R. Vallone, and A. Tversky (1985): “The Hot Hand in Basketball: On
The Misperception of Random Sequences,” Cognitive Psychology, 17, 295–314.
Gould, S. J. (1989): “The streak of streaks,” Chance, 2, 10–16.
Hirshleifer, D. (2001): “Investor psychology and asset pricing,” Journal of Finance, 56.
Huizinga, J. and S. Weil (2008): “Hot Hand or Hot Head: The Truth About Heat
Checks in the NBA,” University of Chicago Working Paper.
33
Klaassen, F. J. and J. R. Magnus (2001): “Are points in tennis independent and
identically distributed? Evidence from a dynamic binary panel data model,” Journal of
the American Statistical Association, 96, 500–509.
Koehler, J. and C. Conley (2003): “The’Hot Hand’Myth in Professional Basketball,”
Journal of Sport & Exercise Psychology, 25, 253.
Larkey, P. D., R. A. Smith, and J. B. Kadane (1989): “It’s okay to believe in the
hot hand,” Chance, 2, 22–30.
Shleifer, A. (2000): Inefficient Markets: An Introduction to Behavioral Finance, Claren-
don Lectures in Economics. Oxford Univ Press.
Siwoff, S. (1988): The Elias Baseball Analyst, 1988, MacMillan Publishing Company.
Smith, G. (2003): “Horseshoe pitchers hot hands,” Psychonomic bulletin & review, 10,
753–758.
Stern, H. S. and C. N. Morris (1993): “A statistical analysis of hitting streaks in
baseball: Comment,” Journal of the American Statistical Association, 88, 1189–1194.
Vergin, R. C. (2000): “Winning streaks in sports and the misperception of momentum.”
Journal of Sport Behavior.
Wardrop, R. L. (1995): “Simpson’s paradox and the hot hand in basketball,” The Amer-
ican Statistician, 49, 24–28.
——— (1999): “Statistical tests for the hot-hand in basketball in a controlled setting,”
American Statistician, 1, 1–20.
Wooldridge, J. (2010): Econometric Analysis of Cross Section and Panel Data, The MIT
Press, 2 ed.
34
A Tables
variable mean sd p25 p50 p75 p99 NBatting .273 .0292 .253 .272 .292 .342 3265Homerun .0332 .0195 .0185 .0308 .046 .0856 3265Strikeout .183 .0617 .138 .177 .222 .347 3265OnBase .345 .0377 .32 .343 .368 .444 3265Walk .0994 .0352 .0749 .0945 .12 .193 3265PlateAppearances 492 120 388 498 596 702 3265AtBats 442 107 351 449 533 640 3265Source:
Table A.1 – Batter-year Summary Statistics. This table provides summary statistics of MLB battingstatistics at the batter-year level over the duration of the sample period (2000-2011). Batter-years in whichthe batter had fewer than 300 plate appearances have been excluded from this table.
variable mean sd p25 p50 p75 p99 NBatting .268 .0223 .254 .268 .282 .323 789Homerun .0304 .0169 .0178 .0289 .0415 .0728 789Strikeout .191 .059 .147 .186 .228 .346 789OnBase .339 .0299 .32 .338 .356 .416 789Walk .0966 .0302 .0758 .0939 .113 .18 789PlateAppearances 2035 1726 668 1413 3034 7132 789AtBats 1830 1543 594 1283 2754 6563 789Source:
Table A.2 – Batter Summary Statistics. This table provides summary statistics of MLB battingstatistics at the batter level over the duration of the sample period (2000-2011). Batter-years in which thebatter had fewer than 300 plate appearances have been excluded from this table.
35
variable mean sd p25 p50 p75 p99 NBatting .264 .0309 .245 .266 .285 .336 2607Homerun .0304 .0107 .0232 .0298 .0369 .06 2607Strikeout .191 .0554 .151 .183 .223 .352 2607OnBase .333 .0325 .312 .333 .354 .406 2607Walk .093 .0263 .0744 .091 .109 .162 2607PlateAppearances 570 219 343 560 783 947 2607AtBats 519 204 309 505 717 885 2607Source:
Table A.3 – Pitcher-year Summary Statistics. This table provides summary statistics of MLB pitchingat the pitcher-year level over the duration of the sample period (2000-2011). Pitcher-years in which thepitchers faced fewer than 300 batters have been excluded from this table.
variable mean sd p25 p50 p75 p99 NBatting .264 .028 .247 .268 .283 .327 820Homerun .0307 .00985 .0249 .0304 .0359 .0571 820Strikeout .191 .0538 .153 .182 .223 .345 820OnBase .335 .0284 .319 .336 .353 .401 820Walk .0965 .0232 .0803 .0948 .11 .158 820PlateAppearances 1814 1978 432 979 2370 8624 820AtBats 1650 1818 391 879 2161 7893 820Source:
Table A.4 – Pitcher Summary Statistics. This table provides summary statistics of MLB pitching atthe pitcher level over the duration of the sample period (2000-2011). Pitcher-years in which the pitchersfaced fewer than 300 batters have been excluded from this table.
36
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainBatter State 0.0319∗∗∗ 0.160∗∗∗ 0.00564∗∗∗
(6.27) (6.29) (3.90)
Hot 0.00829∗∗∗ 0.00680∗∗∗ 0.00473∗∗
(4.27) (3.90) (2.60)
Cold -0.00741∗∗∗ -0.00930∗∗∗ -0.00388(-3.49) (-4.29) (-1.77)
Batter Ability 0.360∗∗∗ 1.811∗∗∗ 0.387∗∗∗ 0.383∗∗∗ 0.384∗∗∗ 0.381∗∗∗
(15.74) (16.00) (25.63) (16.08) (16.19) (14.81)
Pitcher Ability 0.510∗∗∗ 2.589∗∗∗ 0.513∗∗∗ 0.511∗∗∗ 0.511∗∗∗ 0.515∗∗∗
(33.37) (32.74) (34.72) (33.38) (33.38) (32.43)
Stadium 0.672∗∗∗ 3.357∗∗∗ 0.678∗∗∗ 0.676∗∗∗ 0.676∗∗∗ 0.679∗∗∗
(17.33) (17.45) (20.92) (17.40) (17.39) (16.57)
Samehand -0.0127∗∗∗ -0.0638∗∗∗ -0.0126∗∗∗ -0.0127∗∗∗ -0.0127∗∗∗ -0.0128∗∗∗
(-11.60) (-11.55) (-14.71) (-11.53) (-11.53) (-11.32)
Observations 1192266 1192266 1087227 1192266 1192266 1101466
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.5 – Dependent Variable: Hits (Batters) This table reports estimates for the batter regressionswhere the dependent variable indicates a hit. Column (1) gives the estimates from the OLS regression givenin equation (1) and using the baseline measure of the batter’s state as estimated by equation (3). Column(2) gives estimates for the logistic regression given in equation (2) using the same independent variables.Columns (3)-(6) contain OLS estimates using alternative measures of the batter’s state: column (3) uses theestimate of state given in equation (5); columns (4)-(6) use the additive, proportional and distribution basedcutoffs respectively to measure the batter’s current state (hot, cold or normal). In all columns, the measureof the batter’s state use his 25 most recent at bats (L=25). Note that all plate appearances that did notconstitute an at bat were excluded from the regressions. Standard errors are clustered at the batter level.
37
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainPitcher State 0.0468∗∗∗ 0.240∗∗∗ 0.0118∗∗∗
(9.61) (9.58) (7.59)
Hot 0.00776∗∗∗ 0.00740∗∗∗ 0.00387∗
(4.07) (3.98) (2.42)
Cold -0.00842∗∗∗ -0.00871∗∗∗ -0.00855∗∗∗
(-4.64) (-4.53) (-4.20)
Pitcher Ability 0.372∗∗∗ 1.910∗∗∗ 0.451∗∗∗ 0.401∗∗∗ 0.402∗∗∗ 0.420∗∗∗
(15.73) (15.56) (17.43) (16.55) (16.54) (16.51)
Batter Ability 0.552∗∗∗ 2.892∗∗∗ 0.551∗∗∗ 0.553∗∗∗ 0.553∗∗∗ 0.551∗∗∗
(49.41) (48.02) (46.99) (49.44) (49.45) (47.42)
Stadium 0.677∗∗∗ 3.442∗∗∗ 0.652∗∗∗ 0.690∗∗∗ 0.691∗∗∗ 0.682∗∗∗
(19.73) (19.70) (18.03) (19.93) (19.98) (18.85)
Samehand -0.0128∗∗∗ -0.0661∗∗∗ -0.0125∗∗∗ -0.0128∗∗∗ -0.0128∗∗∗ -0.0126∗∗∗
(-10.24) (-10.23) (-9.49) (-10.18) (-10.18) (-9.65)
Constant -0.159∗∗∗ -3.208∗∗∗ -0.168∗∗∗ -0.158∗∗∗ -0.158∗∗∗ -0.161∗∗∗
(-17.82) (-67.75) (-17.95) (-17.50) (-17.50) (-17.16)
Observations 1128156 1128156 998268 1128156 1128156 1032373
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.6 – Dependent Variable: Hits (Pitchers) This table reports estimates for the pitcher regres-sions where the dependent variable indicates a hit. Column (1) gives the estimates from the OLS regressiongiven in equation (1) and using the baseline measure of the pitcher’s state as estimated by equation (3). Col-umn (2) gives estimates for the logistic regression given in equation (2) using the same independent variables.Columns (3)-(6) contain OLS estimates using alternative measures of the pitcher’s state: column (3) usesthe estimate of state given in equation (5); columns (4)-(6) use the additive, proportional and distributionbased cutoffs respectively to measure the batter’s current state (hot, cold or normal). In all columns, themeasure of the pitcher’s state use his 25 most recent at bats (L=25). Note that all plate appearances thatdid not constitute an at bat were excluded from the regressions. Standard errors are clustered at the pitcherlevel.
38
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainBatter State 0.0751∗∗∗ 1.926∗∗∗ 0.0730∗∗∗
(15.94) (15.60) (14.59)
Hot 0.00422∗∗∗ 0.00743∗∗∗ 0.00206∗∗∗
(6.05) (8.68) (3.36)
Cold -0.00547∗∗∗ -0.00642∗
(-5.79) (-2.04)
Batter Ability 0.691∗∗∗ 18.62∗∗∗ 0.704∗∗∗ 0.758∗∗∗ 0.763∗∗∗ 0.767∗∗∗
(49.99) (33.79) (49.19) (54.55) (52.28) (52.96)
Pitcher Ability 0.346∗∗∗ 10.29∗∗∗ 0.344∗∗∗ 0.348∗∗∗ 0.347∗∗∗ 0.345∗∗∗
(19.35) (20.83) (18.28) (19.30) (19.35) (18.28)
Stadium 0.664∗∗∗ 19.73∗∗∗ 0.676∗∗∗ 0.680∗∗∗ 0.673∗∗∗ 0.693∗∗∗
(19.41) (20.51) (18.88) (19.44) (19.50) (19.02)
Samehand -0.00438∗∗∗ -0.120∗∗∗ -0.00458∗∗∗ -0.00428∗∗∗ -0.00435∗∗∗ -0.00449∗∗∗
(-8.56) (-7.23) (-8.57) (-8.22) (-8.38) (-8.26)
Observations 1192266 1192266 1101466 1192266 1192266 1101466
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.7 – Dependent Variable: Home runs (Batters) This table reports estimates for the batterregressions where the dependent variable indicates a home run. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the batter’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the batter’s state: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the batter’s current state (hot, cold or normal). In allcolumns, the measure of the batter’s state use his 25 most recent at bats (L=25). Note that all plateappearances that did not constitute an at bat were excluded from the regressions. Standard errors areclustered at the batter level.
39
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainPitcher State 0.0334∗∗∗ 1.055∗∗∗ 0.00366∗∗∗
(6.70) (6.69) (4.58)
Hot 0.00347∗∗∗ 0.00345∗∗∗ 0.00326∗∗∗
(4.13) (3.96) (4.58)
Cold -0.000544 0.00238(-0.54) (0.55)
Pitcher Ability 0.253∗∗∗ 8.178∗∗∗ 0.313∗∗∗ 0.278∗∗∗ 0.270∗∗∗ 0.283∗∗∗
(11.07) (11.31) (12.10) (11.84) (11.08) (11.51)
Batter Ability 0.708∗∗∗ 20.82∗∗∗ 0.698∗∗∗ 0.708∗∗∗ 0.708∗∗∗ 0.700∗∗∗
(66.29) (79.31) (60.72) (66.33) (66.33) (61.80)
Stadium 0.610∗∗∗ 19.22∗∗∗ 0.613∗∗∗ 0.621∗∗∗ 0.620∗∗∗ 0.625∗∗∗
(18.92) (19.36) (18.22) (19.01) (19.06) (18.63)
Samehand -0.00389∗∗∗ -0.122∗∗∗ -0.00375∗∗∗ -0.00388∗∗∗ -0.00388∗∗∗ -0.00372∗∗∗
(-8.57) (-7.93) (-7.91) (-8.54) (-8.55) (-7.99)
Observations 1128156 1128156 998571 1128156 1128156 1032373
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.8 – Dependent Variable: Home runs (Pitchers) This table reports estimates for the pitcherregressions where the dependent variable indicates a home run. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the pitcher’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the pitcher’s state: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the batter’s current state (hot, cold or normal). In allcolumns, the measure of the pitcher’s state use his 25 most recent at bats (L=25). Note that all plateappearances that did not constitute an at bat were excluded from the regressions. Standard errors areclustered at the pitcher level.
40
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainBatter State 0.0882∗∗∗ 0.592∗∗∗ 0.0205∗∗∗
(17.17) (17.38) (14.49)
Hot 0.0113∗∗∗ 0.0149∗∗∗ 0.0113∗∗∗
(7.08) (7.96) (7.33)
Cold -0.0123∗∗∗ -0.0177∗∗∗ -0.0136∗∗∗
(-7.86) (-9.56) (-7.58)
Batter Ability 0.778∗∗∗ 5.146∗∗∗ 0.878∗∗∗ 0.856∗∗∗ 0.863∗∗∗ 0.868∗∗∗
(74.79) (54.53) (93.49) (87.75) (89.31) (89.52)
Pitcher Ability 0.799∗∗∗ 5.185∗∗∗ 0.794∗∗∗ 0.801∗∗∗ 0.801∗∗∗ 0.797∗∗∗
(63.89) (96.18) (60.77) (63.84) (63.90) (61.02)
Stadium 0.0860∗∗∗ 0.657∗∗∗ 0.0887∗∗∗ 0.0932∗∗∗ 0.0923∗∗∗ 0.0930∗∗∗
(4.26) (4.51) (4.11) (4.54) (4.50) (4.29)
Samehand 0.0173∗∗∗ 0.123∗∗∗ 0.0171∗∗∗ 0.0173∗∗∗ 0.0173∗∗∗ 0.0172∗∗∗
(16.68) (16.34) (15.86) (16.50) (16.56) (15.93)
Observations 1192266 1192266 1088000 1192266 1192266 1101466
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.9 – Dependent Variable: Strikeouts (Batters) This table reports estimates for the batterregressions where the dependent variable indicates a strikeout. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the batter’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the batter’s state: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the batter’s current state (hot, cold or normal). In allcolumns, the measure of the batter’s state use his 25 most recent at bats (L=25). Note that all plateappearances that did not constitute an at bat were excluded from the regressions. Standard errors areclustered at the batter level.
41
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainPitcher State 0.122∗∗∗ 0.796∗∗∗ 0.202∗∗∗
(22.27) (23.42) (18.34)
Hot 0.0202∗∗∗ 0.0235∗∗∗ 0.0108∗∗∗
(10.98) (11.97) (7.40)
Cold -0.0179∗∗∗ -0.0219∗∗∗ -0.0186∗∗∗
(-11.11) (-12.96) (-9.87)
Pitcher Ability 0.699∗∗∗ 4.553∗∗∗ 5.430∗∗∗ 0.807∗∗∗ 0.813∗∗∗ 0.817∗∗∗
(47.15) (56.89) (71.86) (49.83) (50.60) (51.55)
Batter Ability 0.811∗∗∗ 5.472∗∗∗ 5.471∗∗∗ 0.813∗∗∗ 0.813∗∗∗ 0.817∗∗∗
(80.59) (102.67) (96.66) (80.12) (80.46) (77.29)
Stadium 0.159∗∗∗ 1.072∗∗∗ 1.084∗∗∗ 0.185∗∗∗ 0.179∗∗∗ 0.183∗∗∗
(6.70) (6.85) (6.48) (7.49) (7.30) (7.10)
Samehand 0.0180∗∗∗ 0.126∗∗∗ 0.121∗∗∗ 0.0180∗∗∗ 0.0180∗∗∗ 0.0176∗∗∗
(10.03) (10.18) (9.37) (9.98) (9.98) (9.35)
Observations 1128156 1128156 998333 1128156 1128156 1032373
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.10 – Dependent Variable: Strikeouts (Pitchers) This table reports estimates for the pitcherregressions where the dependent variable indicates a strikeout. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the pitcher’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the pitcher’s state: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the batter’s current state (hot, cold or normal). In allcolumns, the measure of the pitcher’s state use his 25 most recent at bats (L=25). Note that all plateappearances that did not constitute an at bat were excluded from the regressions. Standard errors areclustered at the pitcher level.
42
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainBatter State 0.0640∗∗∗ 0.282∗∗∗ 0.0160∗∗∗
(14.11) (14.26) (10.27)
Hot 0.0151∗∗∗ 0.0156∗∗∗ 0.00768∗∗∗
(7.65) (8.33) (4.58)
Cold -0.0134∗∗∗ -0.0132∗∗∗ -0.00790∗∗∗
(-6.92) (-6.84) (-3.93)
Batter Ability 0.539∗∗∗ 2.376∗∗∗ 0.604∗∗∗ 0.590∗∗∗ 0.589∗∗∗ 0.593∗∗∗
(20.16) (21.48) (21.16) (20.52) (20.51) (19.40)
Pitcher Ability 0.583∗∗∗ 2.599∗∗∗ 0.580∗∗∗ 0.585∗∗∗ 0.585∗∗∗ 0.585∗∗∗
(44.73) (44.59) (43.36) (44.83) (44.81) (43.51)
Stadium 0.508∗∗∗ 2.241∗∗∗ 0.521∗∗∗ 0.516∗∗∗ 0.517∗∗∗ 0.523∗∗∗
(14.05) (14.20) (13.62) (14.11) (14.11) (13.59)
samehand -0.0240∗∗∗ -0.106∗∗∗ -0.0242∗∗∗ -0.0240∗∗∗ -0.0240∗∗∗ -0.0243∗∗∗
(-19.15) (-19.23) (-18.60) (-18.99) (-19.00) (-18.60)
Observations 1458194 1458194 1320665 1458194 1458194 1353797
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.11 – Dependent Variable: On Base (Batters) This table reports estimates for the batterregressions where the dependent variable indicates the batter reached base. Column (1) gives the estimatesfrom the OLS regression given in equation (1) and using the baseline measure of the batter’s state asestimated by equation (3). Column (2) gives estimates for the logistic regression given in equation (2) usingthe same independent variables. Columns (3)-(6) contain OLS estimates using alternative measures of thebatter’s State: column (3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive,proportional and distribution based cutoffs respectively to measure the batter’s current state (hot, cold ornormal). In all columns, the measure of the batter’s state use his 25 most recent plate appearances (L=25).Standard errors are clustered at the batter level.
43
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainPitcher State 0.0720∗∗∗ 0.327∗∗∗ 0.0203∗∗∗
(0.00442) (0.0202) (0.00159)
Hot 0.0146∗∗∗ 0.0130∗∗∗ 0.0103∗∗∗
(0.00180) (0.00179) (0.00145)
Cold -0.0163∗∗∗ -0.0137∗∗∗ -0.0103∗∗∗
(0.00181) (0.00188) (0.00190)
Pitcher Ability 0.414∗∗∗ 1.877∗∗∗ 0.509∗∗∗ 0.461∗∗∗ 0.462∗∗∗ 0.463∗∗∗
(0.0229) (0.106) (0.0249) (0.0238) (0.0239) (0.0253)
Batter Ability 0.611∗∗∗ 2.797∗∗∗ 0.612∗∗∗ 0.611∗∗∗ 0.611∗∗∗ 0.611∗∗∗
(0.00976) (0.0450) (0.0103) (0.00976) (0.00976) (0.0101)
Stadium 0.552∗∗∗ 2.474∗∗∗ 0.535∗∗∗ 0.571∗∗∗ 0.573∗∗∗ 0.571∗∗∗
(0.0318) (0.143) (0.0339) (0.0324) (0.0325) (0.0343)
Samehand -0.0219∗∗∗ -0.0994∗∗∗ -0.0215∗∗∗ -0.0219∗∗∗ -0.0219∗∗∗ -0.0217∗∗∗
(0.00127) (0.00578) (0.00135) (0.00128) (0.00128) (0.00133)
Observations 1344753 1344753 1183943 1344753 1344753 1235394
Marginal effects; Standard errors in parentheses(d) for discrete change of dummy variable from 0 to 1∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.12 – Dependent Variable: On Base (Pitchers) This table reports estimates for the pitcherregressions where the dependent variable indicates the batter reached base. Column (1) gives the estimatesfrom the OLS regression given in equation (1) and using the baseline measure of the pitcher’s state asestimated by equation (3). Column (2) gives estimates for the logistic regression given in equation (2) usingthe same independent variables. Columns (3)-(6) contain OLS estimates using alternative measures of thepitcher’s State: column (3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive,proportional and distribution based cutoffs respectively to measure the pitcher’s current state (hot, cold ornormal). In all columns, the measure of the pitcher’s state use his 25 most recent plate appearances (L=25).Standard errors are clustered at the batter level.
44
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainBatter State 0.114∗∗∗ 1.141∗∗∗ 0.0210∗∗∗
(24.33) (23.66) (18.47)
Hot 0.0157∗∗∗ 0.0188∗∗∗ 0.0129∗∗∗
(12.48) (13.84) (10.77)
Cold -0.0147∗∗∗ -0.0127∗∗∗
(-11.10) (-7.98)
Batter Ability 0.673∗∗∗ 6.709∗∗∗ 0.803∗∗∗ 0.781∗∗∗ 0.779∗∗∗ 0.784∗∗∗
(37.42) (31.61) (41.97) (39.17) (39.61) (38.67)
Pitcher Ability 0.775∗∗∗ 8.617∗∗∗ 0.776∗∗∗ 0.779∗∗∗ 0.778∗∗∗ 0.779∗∗∗
(60.37) (67.03) (57.17) (60.50) (60.53) (57.85)
Stadium 0.223∗∗∗ 2.604∗∗∗ 0.227∗∗∗ 0.242∗∗∗ 0.236∗∗∗ 0.244∗∗∗
(7.16) (7.18) (6.81) (7.52) (7.41) (7.20)
Samehand -0.0160∗∗∗ -0.184∗∗∗ -0.0165∗∗∗ -0.0162∗∗∗ -0.0161∗∗∗ -0.0164∗∗∗
(-18.79) (-19.17) (-18.36) (-18.61) (-18.66) (-18.14)
Observations 1458194 1458194 1334045 1458194 1458194 1353797
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.13 – Dependent Variable: Walks (Batters) This table reports estimates for the batter regres-sions where the dependent variable indicates a walk. Column (1) gives the estimates from the OLS regressiongiven in equation (1) and using the baseline measure of the batter’s state as estimated by equation (3). Col-umn (2) gives estimates for the logistic regression given in equation (2) using the same independent variables.Columns (3)-(6) contain OLS estimates using alternative measures of the batter’s State: column (3) uses theestimate of state given in equation (5); columns (4)-(6) use the additive, proportional and distribution basedcutoffs respectively to measure the batter’s current state (hot, cold or normal). In all columns, the measureof the batter’s state use his 25 most recent plate appearances (L=25). Standard errors are clustered at thebatter level.
45
(1) (2) (3) (4) (5) (6)OLS Logit OLS dist Prop5 Add5 Dist5
mainPitcher State 0.0456∗∗∗ 0.554∗∗∗ 0.00673∗∗∗
(8.84) (9.00) (6.00)
Hot 0.00681∗∗∗ 0.00580∗∗∗ 0.00323∗∗∗
(5.72) (4.71) (3.85)
Cold -0.00569∗∗∗ -0.00432∗
(-4.16) (-2.17)
Pitcher Ability 0.655∗∗∗ 7.764∗∗∗ 0.730∗∗∗ 0.695∗∗∗ 0.697∗∗∗ 0.710∗∗∗
(43.53) (35.43) (46.84) (44.91) (44.92) (44.73)
Batter Ability 0.622∗∗∗ 7.090∗∗∗ 0.615∗∗∗ 0.623∗∗∗ 0.623∗∗∗ 0.618∗∗∗
(67.08) (78.53) (62.37) (67.09) (67.12) (64.03)
Stadium 0.305∗∗∗ 3.638∗∗∗ 0.298∗∗∗ 0.318∗∗∗ 0.314∗∗∗ 0.310∗∗∗
(9.84) (9.11) (9.31) (10.09) (10.01) (9.76)
Samehand -0.0137∗∗∗ -0.171∗∗∗ -0.0135∗∗∗ -0.0136∗∗∗ -0.0136∗∗∗ -0.0136∗∗∗
(-18.21) (-17.90) (-16.89) (-18.17) (-18.17) (-17.32)
Constant -0.0528∗∗∗ -4.039∗∗∗ -0.0585∗∗∗ -0.0540∗∗∗ -0.0534∗∗∗ -0.0541∗∗∗
(-18.21) (-94.76) (-18.97) (-18.25) (-18.23) (-18.03)
Observations 1344753 1344753 1188433 1344753 1344753 1235394
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.14 – Dependent Variable: Walks (Pitchers) This table reports estimates for the pitcherregressions where the dependent variable indicates a walk. Column (1) gives the estimates from the OLSregression given in equation (1) and using the baseline measure of the pitcher’s state as estimated by equation(3). Column (2) gives estimates for the logistic regression given in equation (2) using the same independentvariables. Columns (3)-(6) contain OLS estimates using alternative measures of the pitcher’s State: column(3) uses the estimate of state given in equation (5); columns (4)-(6) use the additive, proportional anddistribution based cutoffs respectively to measure the pitcher’s current state (hot, cold or normal). In allcolumns, the measure of the pitcher’s state use his 25 most recent plate appearances (L=25). Standarderrors are clustered at the batter level.
46
(1) (2) (3) (4)Continuous Prop5 Add5 Dist5
Batter State (walks) 0.107∗∗∗
(23.59)
Batter State (hits) 0.00593(1.94)
Batter State (hrs) 0.0727∗∗∗
(9.64)
Hot (walks) 0.0147∗∗∗ 0.0177∗∗∗ 0.0116∗∗∗
(12.72) (15.85) (12.14)
Cold (walks) -0.0151∗∗∗ -0.0140∗∗∗ -0.0111∗∗∗
(-12.88) (-11.87) (-8.31)
Hot (hits) 0.00406∗∗∗ 0.00254∗ 0.000903(3.67) (2.11) (0.94)
Cold (hits) -0.000256 -0.0000841 -0.000697(-0.21) (-0.06) (-0.63)
Hot (hrs) 0.00416∗∗∗ 0.00668∗∗∗ 0.00328∗∗∗
(3.68) (8.27) (3.72)
Cold (hrs) -0.00485∗∗∗ -0.00606∗
(-5.62) (-1.97)
Batter Ability (walks) 0.609∗∗∗ 0.731∗∗∗ 0.707∗∗∗ 0.703∗∗∗
(37.96) (89.33) (87.59) (88.52)
Batter Ability (hits) -0.0173 -0.0145 -0.0171∗ -0.0138(-1.77) (-1.76) (-2.08) (-1.67)
Batter Ability (hrs) 0.188∗∗∗ 0.262∗∗∗ 0.276∗∗∗ 0.269∗∗∗
(9.98) (18.04) (18.17) (18.50)
Pitcher Ability 0.775∗∗∗ 0.778∗∗∗ 0.777∗∗∗ 0.778∗∗∗
(60.16) (80.56) (80.49) (80.62)
Stadium 0.221∗∗∗ 0.235∗∗∗ 0.230∗∗∗ 0.240∗∗∗
(6.86) (7.38) (7.23) (7.51)
Samehand -0.0172∗∗∗ -0.0173∗∗∗ -0.0174∗∗∗ -0.0173∗∗∗
(-20.13) (-34.37) (-34.54) (-34.41)
Observations 1449800 1458194 1458194 1458194
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.15 – Endogenous Response: Walks (Batters) This table estimates the endogenous responseof the defense to the batter’s ability and state. The dependent variable indicates a walk. Column (1)incorporates in the batter’s State (as estimated by equation (3)) and ability in both hits and home runs intothe regression estimates from Table A.13 column (1). Columns (2) and (3) use additive cutoffs to measurethe batter’s state in hits, home runs and extra base hits. Estimates for the cold state in columns (3) and (4)are omitted because significantly more than 5% of the time batters have zero walks or home runs in theirprevious 25 attempts. Column (4) includes controls for batter ability. Standard errors are clustered at thebatter level.
47
(1) (2) (3) (4) (5) (6)Hit Hr Strikeout OnBase Walk WalkR
Batter State 0.00524 0.0438∗∗∗ 0.0626∗∗∗ 0.0413∗∗∗ 0.0915∗∗∗ 0.0871∗∗∗
(0.95) (8.08) (10.90) (8.64) (16.32) (15.81)
Pitcher Ability 0.525∗∗∗ 0.393∗∗∗ 0.797∗∗∗ 0.587∗∗∗ 0.796∗∗∗ 0.795∗∗∗
(31.26) (19.02) (51.40) (43.52) (53.44) (53.47)
Stadium 0.801∗∗∗ 0.958∗∗∗ 0.254∗∗∗ 0.721∗∗∗ 0.404∗∗∗ 0.396∗∗∗
(19.28) (21.79) (8.83) (18.59) (9.29) (9.13)
Samehand -0.0182∗∗∗ -0.00777∗∗∗ 0.0243∗∗∗ -0.0333∗∗∗ -0.0233∗∗∗ -0.0234∗∗∗
(-13.18) (-11.77) (16.03) (-20.96) (-18.75) (-18.82)
Batter State (hits) 0.00797∗
(2.34)
Batter State (hrs) 0.0795∗∗∗
(9.44)
Observations 1045160 1045160 1045160 1250826 1250826 1243403
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.16 – Robustness: Measuring State using Batter Fixed Effects This table gives estimatesfor the batter regressions where fixed effects are used to control for batter ability. The dependent variable islisted directly under the column number. Each column gives the estimates from the OLS regression given inequation (6) and using the baseline measure of the batter’s state as estimated by equation (3). Column (6)includes the batter’s state in hit in home runs to capture the endogenous response of the defense. Batterswith fewer than 2,500 attempts in the dataset were excluded from the regressions. Standard errors areclustered at the batter level.
48
(1) (2) (3) (4) (5) (6)Prop10 Prop2p5 Add10 Add2p5 Dist10 Dist2p5
Hot 0.00459∗∗ 0.0122∗∗∗ 0.00567∗∗∗ 0.0133∗∗∗ 0.00240 0.00528∗∗
(3.20) (4.82) (4.05) (4.94) (1.60) (2.58)
Cold -0.00496∗∗∗ -0.00735∗ -0.00454∗∗ -0.00869∗∗ -0.00512∗∗ -0.00569∗
(-3.34) (-2.49) (-2.79) (-2.95) (-2.93) (-2.21)
Observations 1192266 1192266 1192266 1192266 1101466 1101466
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.17 – Robustness to thresholds. Dependent Variable: Hits (Batters) This table reportsthe results for the same regressions as given in Columns (4)-(6) (additive, proportional and distribution basedthresholds for hot and cold) except using 10% and 2.5% thresholds as an alternative to the 5% thresholdsused in Table A.5. The same control variable were included in the regressions but the coefficients are omittedfrom the above table.
(1) (2) (3) (4) (5) (6)Prop10 Prop2p5 Add10 Add2p5 Dist10 Dist2p5
Hot 0.0122∗∗∗ 0.0172∗∗∗ 0.0128∗∗∗ 0.0155∗∗∗ 0.00822∗∗∗ 0.00533∗
(8.72) (6.69) (9.38) (5.72) (5.70) (2.58)
Cold -0.0109∗∗∗ -0.0148∗∗∗ -0.0114∗∗∗ -0.0165∗∗∗ -0.00912∗∗∗ -0.0112∗∗∗
(-7.85) (-5.43) (-7.79) (-6.16) (-5.79) (-3.87)
Observations 1458194 1458194 1458194 1458194 1353797 1353797
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.18 – Robustness to thresholds. Dependent Variable: On Base (Batters) This tablereports the results for the same regressions as given in Columns (4)-(6) (additive, proportional and distribu-tion based thresholds for hot and cold) except using 10% and 2.5% thresholds as an alternative to the 5%thresholds used in Table A.11. The same control variable were included in the regressions but the coefficientsare omitted from the above table.
49
(1) (2) (3) (4) (5) (6)OLS40 Logit40 OLS dist40 OLS10 Logit10 OLS dist10
mainBatter State 0.0526∗∗∗ 0.263∗∗∗ 0.00678∗∗∗ 0.00122∗∗∗ 0.00610∗∗∗ -0.00410
(8.21) (8.26) (4.85) (3.72) (3.73) (-1.18)
Batter Ability 0.357∗∗∗ 1.794∗∗∗ 0.388∗∗∗ 0.370∗∗∗ 1.860∗∗∗ 0.369∗∗∗
(15.88) (16.14) (25.30) (15.85) (16.12) (25.16)
Pitcher Ability 0.513∗∗∗ 2.605∗∗∗ 0.512∗∗∗ 0.514∗∗∗ 2.607∗∗∗ 0.509∗∗∗
(32.97) (32.35) (34.28) (32.98) (32.36) (34.69)
Stadium 0.680∗∗∗ 3.395∗∗∗ 0.693∗∗∗ 0.684∗∗∗ 3.418∗∗∗ 0.688∗∗∗
(17.55) (17.67) (21.17) (17.48) (17.59) (21.37)
Samehand -0.0131∗∗∗ -0.0657∗∗∗ -0.0130∗∗∗ -0.0131∗∗∗ -0.0656∗∗∗ -0.0125∗∗∗
(-11.94) (-11.89) (-14.99) (-11.77) (-11.71) (-14.69)
Observations 1154102 1154102 1068338 1154102 1154102 1101939
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.19 – Robustness to history lengths. Dependent Variable: Hits (Batters) This tablereports the results for the same regressions as given in Columns (1)-(3) of Table A.5 except using alternativehistory lengths of L=40 (Columns (1)-(3) above) and L=10 (Column (4)-(6) above) to measure the batter’scurrent stateas given by equation (3).
(1) (2) (3) (4) (5) (6)OLS40 Logit40 OLS dist40 OLS10 Logit10 OLS dist10
mainBatter State 0.0885∗∗∗ 0.389∗∗∗ 0.0176∗∗∗ 0.00375∗∗∗ 0.0165∗∗∗ 0.00129
(14.86) (15.12) (11.24) (12.63) (12.74) (0.36)
Batter Ability 0.527∗∗∗ 2.321∗∗∗ 0.612∗∗∗ 0.555∗∗∗ 2.446∗∗∗ 0.585∗∗∗
(20.36) (21.70) (20.40) (19.98) (21.28) (18.84)
Pitcher Ability 0.585∗∗∗ 2.610∗∗∗ 0.582∗∗∗ 0.584∗∗∗ 2.604∗∗∗ 0.582∗∗∗
(44.40) (44.30) (42.92) (44.31) (44.20) (43.34)
Stadium 0.515∗∗∗ 2.273∗∗∗ 0.526∗∗∗ 0.515∗∗∗ 2.270∗∗∗ 0.527∗∗∗
(14.20) (14.34) (13.71) (14.15) (14.29) (13.55)
samehand -0.0241∗∗∗ -0.107∗∗∗ -0.0243∗∗∗ -0.0242∗∗∗ -0.108∗∗∗ -0.0242∗∗∗
(-19.28) (-19.37) (-18.77) (-19.11) (-19.19) (-18.47)
Observations 1414753 1414753 1314423 1414753 1414753 1352552
t statistics in parentheses∗ p < 0.05, ∗∗ p < 0.01, ∗∗∗ p < 0.001
Table A.20 – Robustness to history lengths. Dependent Variable: On base (Batters) This tablereports the results for the same regressions as given in Columns (1)-(3) of Table A.11 except using alternativehistory lengths of L=40 (Columns (1)-(3) above) and L=10 (Column (4)-(6) above) to measure the batter’scurrent stateas given by equation (3).
50