

The Effects of Information-Based Trading on the Daily Returns and Risks of

Individual Stocks

Xiangkang Yin and Jing Zhao

La Trobe University

First Version: 27 March 2013

This Version: 2 April 2014

Corresponding author, Department of Finance, La Trobe Business School, La Trobe University, Bundoora, Victoria 3086, Australia. Tel: 61-3-9479 3120, Fax: 61-3-9479 1654, Email: [email protected]. The authors would like to thank Xiaozhou, Zhou, Rong Wang and participants of 2014 MFA Annual Conference, 26th Australasian Finance and Banking Conference, and seminars at La Trobe University, Audencia Nantes School of Management and Monash University for constructive comments.


ABSTRACT

This paper investigates the dynamic relation between information-based trading of a stock and its

daily return and risk. It develops a theoretical model to motivate the regression specifications for

empirical analysis. Based on two samples of stocks, we demonstrate that the expected trading

imbalances of a stock determine its daily return while the expected trades determine its volatility.

Trading imbalance arising from private information plays a dominant role in determining return

but trading due to disputable public information is the dominant contributor to risk. Public-

information trading is closely associated with idiosyncratic risk rather than systematic risk.

JEL Classification: D82, G12, G14

Keywords: Information-based trading, Return volatility, Systematic risk, Idiosyncratic risk


The important role played by information in securities trading is well recognized. A common

theme of information-based trading is the adverse selection issue caused by private information

as analyzed in the seminal works of Grossman (1976), Glosten and Milgrom (1985), and Kyle (1985).

Because uninformed market players are at the risk of trading with privately informed speculators,

they require an information risk premium to compensate for their potential loss to informed

traders. In this sense, information risk is likely to be a risk factor determining asset return

differentials in a cross-section. Easley, Kiefer, O’Hara, and Paperman (1996) develop the

concept of PIN (Probability of INformed trading) to measure trading motivated by privately

informed investors. Since then, the PIN measure has been widely adopted in the literature, and Easley, Hvidkjaer and O’Hara (2002, 2010) show that PIN is priced and can explain the cross-sectional difference of securities returns.1 Duarte and Young (2009) introduce Symmetric Order-flow Shocks (SOSs) in securities trading into the original PIN model and define PSOS (Probability of SOS) to isolate the illiquidity component of PIN.2 It is found that the PSOS measure is also a risk factor explaining the cross-sectional difference of stock returns.

This paper studies a related but different issue. It focuses on the effects of information-

based trading of an individual stock on its daily return and risk. Similar to Duarte and Young

(2009), we consider two types of information-based trading. The first type of trading originates from privately informed traders, who observe private signals about the stock and thus have superior information over other market players. They exploit this informational advantage to buy or sell the stock to maximize their profits. The second type is caused by

1 Mohanram and Rajgopal (2009) replicate Easley, Hvidkjaer and O’Hara’s (2002) work and show that although PIN is priced for the sample period 1984-1988 it does not constitute a risk factor for the period of 1998-2002.

2 Symmetric Order-flow Shock (SOS) has two explanations in Duarte and Young (2009). One cause of an SOS is the occurrence of a public information event. Because traders have different opinions or interpretations about the public information, both buy and sell orders increase when the public information is released. The other cause of an SOS is that traders simply coordinate trading at particular times to reduce trading costs. This paper inclines to the first interpretation. We will call trading activities induced by disagreement on a piece of public information disputable-public-information-induced trading or simply SOS trading.


different opinions or beliefs about a public information event such as an earnings announcement or disclosure of a new investment opportunity. Investors who are optimistic about the news buy the asset while those who are pessimistic sell it. Our main findings can be summarized as follows. First, the expected amounts of net buy orders (i.e., the expected trading imbalances

from various sources) determine daily stock return, while the expected amounts of trades

(including both buys and sells) determine return risk. Second, privately informed trading

dominates SOS trading in both marginal and total effects on stock return,3 while SOS trading

dominates privately informed trading in return risk. Third, SOS trading or trading induced by

disputable public information has a significant effect on total risk (return variance) of individual

stocks and idiosyncratic risk but its effect on systematic risk (market beta of stock return) is

mild.4 Fourth, the size of a firm matters in the sense that smaller firms provide stronger supporting evidence for the first three findings.

The intuitions behind the findings, particularly the first finding, are straightforward. For

a trading day with good (bad) news about a stock, privately informed speculators will quietly buy (sell) it, which pushes the stock price up (down) and leaves the market with a high (low) daily return. Such one-sided trading moves price in one direction and does not necessarily cause excess volatility of return. On the other hand, a piece of disputable information can trigger a surge in both buy and sell order flows. Such symmetric order-flow shocks lead price and return to fluctuate but they do not substantially move return in one direction. We formalize these ideas in a theoretical model based on Brennan, Chordia, Subrahmanyam, and Tong (2012), which

3 The marginal effect is defined as the change of dependent variable caused by one unit change of explanatory variable. The total effect is measured by the average of the absolute values of the marginal effect times the daily explanatory variable over the estimation window divided by the average of the absolute values of dependent variable over the same time period.

4 Idiosyncratic risk is measured by the standard deviation of return residuals obtained through a market model of intraday return.


builds up a relation between order flow and price change. We enrich the model by assuming that

order-flows arrive at the market following independent Poisson processes (Easley, Hvidkjaer and

O’Hara (2002), and Roşu (2009)). Following the model’s predictions, our empirical analysis

concentrates on the effects of expected trading imbalances caused by privately informed traders

and SOS traders on return and the effects of expected trades from these traders on stock risk.

Inspired by the concepts of PIN and PSOS,5 we use expected numbers of buy and sell

orders from different types of traders to characterize information-based trading. We use two sets

of measures. The first set includes expected numbers of net buy orders from informed traders

and SOS traders and expected numbers of trades from these traders. The second set adopts

relative measures; that is, all these expected numbers are scaled by the expected number of total

orders submitted by all investors.6 The measures based on expected numbers perfectly match

variables used in the theoretical model and they in general lead to better performance of

regression models. But the scaled measures are unit-free, which facilitates comparison across stocks. Standard PIN and PSOS are time invariant over the estimation window, such as a period of a couple of months to a year. Since we intend to investigate stock return and risk at the daily level, we

require dynamic or daily measures of information-based trading. To this end, we adopt the

Hidden Markov Model (HMM) approach developed by Yin and Zhao (2014), which can estimate

various expected numbers of buy orders and sell orders at daily level or even at a higher

frequency. Their simulation and empirical analyses demonstrate the HMM approach can

generate quite accurate estimates measuring information-based trading.

5 PIN is usually defined as the ratio of the expected number of buy and sell orders stemming from informed traders to the expected total trades, while PSOS is defined as the ratio of the expected number of trades from SOS traders to the expected total number of trades. Thus, they are our scaled measures of total orders from privately informed trading and SOS trading, respectively.

6 Our theoretical model and empirical analysis include a third type of trading, which is motivated by liquidity needs and is irrelevant to private or public information.


This paper is closely related to the growing literature on the PIN model and its variants as

we have mentioned. But instead of the effects of information on the cross-sectional differentials

of expected stock returns, we focus on how information drives the evolution of daily return and

risk of each individual stock. Our premise is that stock price is continuously adjusted to private

and/or disputable public information through trading dynamics. This evolution dictates the level

and variation of daily return of each stock. Although informed trading may lift average return over a long period because of adverse selection, negative private information should drive return down in the short run. More importantly, it can be shown that it is disputable public information rather than private information that dominates the effect on the total risk and idiosyncratic risk of stock return. This is to a certain extent consistent with Duarte and Young’s (2009) argument that it is the symmetric order-flow shock (SOS) rather than private information that is decisive to the differential of expected returns across stocks.

This paper is also related to Chordia and Subrahmanyam (2004), who investigate the

relation between order imbalance and daily return of individual risky assets. Their findings are

consistent with ours in the effect of trading imbalance on daily return, although they are more

interested in revealing how market makers dynamically accommodate autocorrelated imbalances

emanating from large traders, in order to explore the positive relation between lagged imbalances

and return. Their key explanatory variable, order imbalance, is daily observation of the

difference between buy and sell orders, which differs from our measures in two ways. First, our

order imbalance is measured by the difference of expected buy and sell orders rather than

observed numbers. This treatment enables us to filter the noise in daily observations. Second,

we decompose order imbalance according to three trading motives while Chordia and

Subrahmanyam (2004) do not consider such decomposition. By separating trading types, we are


able to specify which type of trading is the dominant contributor to the level of daily return.

Moreover, Chordia and Subrahmanyam (2004) do not consider the risk of stock return. Instead,

we address total risks, systematic risks and idiosyncratic risks of individual stocks.

Our finding of the strong link between dispersed beliefs on stock value and total risk supports Banerjee and Kremer’s (2010) prediction that a jump in the difference of opinions leads to an increase in return volatility. On the other hand, our finding of the

insignificant relation between SOS trading and systematic risk for most sample stocks is related

to Patton and Verardo (2012). They propose a model where investors can use public information

to extract information of the aggregate economy and find that daily realized market beta is higher

on earnings announcement days but it declines on post-announcement days to a level below its

non-event average. Thus, the relation between market beta and trading due to disputable public

news over a period could be quite ambiguous, which is consistent with our finding. Rees and

Thomas (2010) note that forecast dispersion proxies for idiosyncratic uncertainty about future

cash flows during earnings announcements, which is in line with our empirical result that

disagreement on public news is positively related to idiosyncratic risk.

The remainder of this paper is organized as follows. Section I develops a theoretical

model of the relation between return dynamics and trading activities. Section II introduces the

approach of estimating the daily measures of information-based trading. Data and samples are

described in Section III. Section IV examines the effects of different types of trades on daily

return and total risk, while Section V studies the dynamic relationship between information-based trading and systematic or idiosyncratic risk. Further tests and robustness checks are briefly reported in Section VI. Concluding remarks are provided in the last section.


I. A Theoretical Model and Its Specifications for Empirical Analysis

To motivate our empirical analysis, we develop a dynamic model associating asset return

and its volatility with information-based trading, based on the formulation of Brennan, Chordia,

Subrahmanyam, and Tong (2012), which originates from Glosten and Harris (1988). Let $m_{t,\tau}$ denote the expected value of a risky asset, conditional on the public information available immediately after the $\tau$th transaction of day $t$. Similar to Brennan et al. (2012), we assume that $m_{t,\tau}$ evolves according to

$$m_{t,\tau} = m_{t,\tau-1} + \theta q_{t,\tau} + y_{t,\tau}, \qquad (1)$$

where $q_{t,\tau}$ is the order size of the $\tau$th transaction of day $t$, with $q_{t,\tau} > 0$ corresponding to a buyer-initiated trade and $q_{t,\tau} < 0$ to a seller-initiated trade, and $\theta$ is the (inverse) market depth parameter. Thus, $\theta q_{t,\tau}$ in (1) reflects the revision in expectations upon an order arrival. It captures the adverse selection effect of a transaction on price because the transaction can potentially contain private information unknown to the market. The term $y_{t,\tau}$ is the unobservable innovation between the $(\tau-1)$th and $\tau$th transactions due to the arrival of public information, of which all market players have a unanimous view. Let $D_{t,\tau}$ denote the direction of the $\tau$th transaction of day $t$, i.e., $D_{t,\tau} = 1$ if $q_{t,\tau} > 0$ and $D_{t,\tau} = -1$ if $q_{t,\tau} < 0$. Brennan et al. (2012) further consider a fixed component of transaction costs, $\psi$, to account for inventory holding costs and fixed costs. Therefore, the transaction price of the $\tau$th trade of day $t$, $p_{t,\tau}$, can be written as

$$p_{t,\tau} = m_{t,\tau} + \psi D_{t,\tau}. \qquad (2)$$

Using (1) and (2), the price change between two transactions is equal to

$$\Delta p_{t,\tau} \equiv p_{t,\tau} - p_{t,\tau-1} = \theta q_{t,\tau} + \psi (D_{t,\tau} - D_{t,\tau-1}) + y_{t,\tau}. \qquad (3)$$

If there are $N_t$ transactions on day $t$, the total return of that day can be represented in terms of aggregate trade-by-trade price changes, i.e., $R_t = \sum_{\tau=1}^{N_t} \Delta p_{t,\tau}$. Our focus is how the expected return and volatility are affected by information-based trading. For simplicity of exposition, we assume that the order size of each transaction is constant and normalize it to one share so that $q_{t,\tau} = D_{t,\tau}$. Such a simplifying assumption is widely adopted in theoretical analysis (see, for example, Glosten and Milgrom (1985)). It is also consistent with the concepts of PIN and PSOS measures in the empirical analysis, where the number of transactions rather than trading volume is considered. Based on the trade directions of order flows, we can obtain the number of buyer-initiated orders on day $t$, $B_t = \sum_{\tau=1}^{N_t} \max(D_{t,\tau}, 0)$, and the number of seller-initiated orders, $S_t = -\sum_{\tau=1}^{N_t} \min(D_{t,\tau}, 0)$.
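The bookkeeping in (1)-(3) can be illustrated with a minimal Python sketch that accumulates trade-by-trade price changes for one day under the unit-order-size normalization. The function name `simulate_day`, its argument names, and the use of `theta` for the inverse market depth parameter are illustrative assumptions, not notation taken from the model; `d_prev` stands for the direction of the last trade before the day starts, which the text does not specify.

```python
def simulate_day(directions, theta, psi, public_news, d_prev=1):
    """Accumulate the price changes of equation (3) with unit order size (q = D).

    directions  : list of +1 (buy) / -1 (sell) trade directions for the day
    theta       : inverse market depth parameter (hypothetical name)
    psi         : fixed component of transaction costs
    public_news : list of public-information innovations, one per trade
    d_prev      : assumed direction of the last trade before the day starts
    """
    total_return = 0.0
    for d, y in zip(directions, public_news):
        # Equation (3): adverse-selection term, change in fixed-cost term, news term
        total_return += theta * d + psi * (d - d_prev) + y
        d_prev = d
    # Numbers of buyer- and seller-initiated orders implied by the directions
    buys = sum(max(d, 0) for d in directions)
    sells = -sum(min(d, 0) for d in directions)
    return total_return, buys, sells
```

Because the fixed-cost terms telescope, the daily total in this sketch depends on the trade directions only through the net number of buys, the last direction of the day, and `d_prev`.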

In the literature, it is widely assumed that the arrivals of buy orders and sell orders are independent Poisson processes (see, for example, Easley, Hvidkjaer and O’Hara (2002) and Roşu (2009)). Thus, we assume that $B_t$ and $S_t$ follow Poisson distributions with parameters $\lambda_{B;t}$ and $\lambda_{S;t}$,7 respectively. A key innovation of our model is that it allows the distributions of buy and sell order flows to vary from day to day, as reflected by the time-varying parameters $\lambda_{B;t}$ and $\lambda_{S;t}$. The total number of trades on day $t$, $N_t = B_t + S_t$, is also Poisson distributed with time-varying mean and variance $\lambda_{B;t} + \lambda_{S;t}$. We then obtain the probability of an arriving order being buyer-initiated, $\Pr(D_{t,\tau} = 1) = \lambda_{B;t} / (\lambda_{B;t} + \lambda_{S;t})$, and the probability of it being seller-initiated, $\Pr(D_{t,\tau} = -1) = \lambda_{S;t} / (\lambda_{B;t} + \lambda_{S;t})$. The first two moments of the distribution of the order flow become $E(D_{t,\tau}) = (\lambda_{B;t} - \lambda_{S;t}) / (\lambda_{B;t} + \lambda_{S;t})$ and $E(D_{t,\tau}^2) = 1$, so that $\mathrm{Var}(D_{t,\tau}) = 1 - \left[(\lambda_{B;t} - \lambda_{S;t}) / (\lambda_{B;t} + \lambda_{S;t})\right]^2$. Applying these results, we obtain the following proposition by routine computation.

7 The Poisson distribution is a one-parameter distribution with its mean (arrival rate) equal to variance. We will use order arrival rate and the expected number of orders interchangeably.


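Proposition 1. If the price response to order flows follows (3) and buy and sell orders arrive following independent Poisson processes with parameters $\lambda_{B;t}$ and $\lambda_{S;t}$, respectively, the expected total return and its variance on trading day $t$ are

$$E(R_t) = \theta (\lambda_{B;t} - \lambda_{S;t}) + E(Y_t), \qquad (4)$$

$$\mathrm{Var}(R_t) = \theta^2 (\lambda_{B;t} + \lambda_{S;t}) + 2\psi^2 \left[1 - \left(\frac{\lambda_{B;t} - \lambda_{S;t}}{\lambda_{B;t} + \lambda_{S;t}}\right)^2\right] + \mathrm{Var}(Y_t), \qquad (5)$$

where $Y_t \equiv \sum_{\tau=1}^{N_t} y_{t,\tau}$.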

Proof: See Appendix A.

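Since $\lambda_{B;t}$ and $\lambda_{S;t}$ are arrival rates and positive, there is $\left|(\lambda_{B;t} - \lambda_{S;t}) / (\lambda_{B;t} + \lambda_{S;t})\right| < 1$. Therefore, $\left[(\lambda_{B;t} - \lambda_{S;t}) / (\lambda_{B;t} + \lambda_{S;t})\right]^2$ is a higher-order small term and is negligible in comparison to other terms. On the other hand, $\psi$ reflects the fixed costs, which are small and are not the focal point of this study. If we set it to zero, the second term on the right-hand side of (5) disappears. With these considerations, we rewrite (5) as

$$\mathrm{Var}(R_t) = \theta^2 (\lambda_{B;t} + \lambda_{S;t}) + \mathrm{Var}(Y_t) + \text{Constant and higher-order terms}. \qquad (6)$$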
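As a rough numerical check of the moments in Proposition 1, the sketch below evaluates the closed-form expressions and compares them with a small Monte Carlo simulation for the case of zero fixed costs. Everything here is an illustrative assumption: the function names, the name `theta` for the inverse market depth parameter, the normally distributed public-news term, and the use of Knuth's multiplication method to draw Poisson counts with the standard library only.

```python
import math
import random

def expected_return(theta, lam_b, lam_s, mean_y=0.0):
    # Expected daily return: depth parameter times expected trade imbalance, plus news
    return theta * (lam_b - lam_s) + mean_y

def return_variance(theta, psi, lam_b, lam_s, var_y=0.0):
    # Daily return variance: total-trade term, fixed-cost term, and news variance
    ratio = (lam_b - lam_s) / (lam_b + lam_s)
    return theta ** 2 * (lam_b + lam_s) + 2 * psi ** 2 * (1 - ratio ** 2) + var_y

def poisson_draw(lam, rng):
    # Knuth's multiplication method; adequate for small arrival rates only
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_moments(theta, lam_b, lam_s, sigma_y, n_days=20000, seed=0):
    # Simulate daily returns with psi = 0 and return their sample mean and variance
    rng = random.Random(seed)
    returns = []
    for _ in range(n_days):
        buys, sells = poisson_draw(lam_b, rng), poisson_draw(lam_s, rng)
        returns.append(theta * (buys - sells) + rng.gauss(0.0, sigma_y))
    mean = sum(returns) / n_days
    var = sum((r - mean) ** 2 for r in returns) / (n_days - 1)
    return mean, var
```

With the fixed-cost parameter set to zero, the simulated mean and variance should be close to `expected_return` and `return_variance` evaluated at the same inputs, up to sampling error.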

In (4)-(6), $Y_t$ represents the effects of non-disputable public news on day $t$, which is not

associated with abnormal trading (see, for example, Llorente, Michaely, Saar and Wang (2002)).

An important insight offered by the model is that the expected daily return of a risky asset is related to the expected trade imbalance, while the volatility of return is positively related to the expected total trade if the higher-order effects are ignored. Moreover, expected daily return has direction in the sense that whether it is positive or negative depends on whether the expected amount of net buy orders of the day is positive or negative. This distinguishes the model from asset pricing models, where the focus is on the relation between expected long-term return and risk factors but return direction is not a concern.

To study information-based trading, we should separate different types of transactions.

We consider three trading motives as we mentioned in the Introduction. The first is the liquidity

needs when investors want to adjust their portfolios to hedge their risks or rebalance their

portfolios due to some exogenous shocks. The second type of trading activities is generated by

speculative investors who have private information on the fundamental value of the asset. The

third type of trades comes from symmetric order flow shocks, such as a disputable public news

event, of which the occurrence induces some investors to buy the asset but others to sell. In

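Proposition 1, the numbers of buy orders $B_t$ and sell orders $S_t$ aggregate these three types of orders. Let $\varepsilon_{B;t}$ ($\varepsilon_{S;t}$) be the arrival rate of liquidity buys (sells) on trading day $t$, $\mu_{B;t}$ ($\mu_{S;t}$) the arrival rate of privately informed buys (sells), and $\delta_{B;t}$ ($\delta_{S;t}$) the arrival rate of buys (sells) due to symmetric order flow shocks. Then, the means of $B_t$ and $S_t$, $\lambda_{B;t}$ and $\lambda_{S;t}$, can be decomposed into three components

$$\lambda_{B;t} = \varepsilon_{B;t} + \mu_{B;t} + \delta_{B;t}, \qquad \lambda_{S;t} = \varepsilon_{S;t} + \mu_{S;t} + \delta_{S;t}. \qquad (7)$$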

Since liquidity trading constitutes the base of each day’s trading activities, the associated arrival rates $\varepsilon_{B;t}$ and $\varepsilon_{S;t}$ are strictly positive for all trading days. The remaining two components in $\lambda_{B;t}$ and $\lambda_{S;t}$ can be zero if the trading day has no private information and/or disputable public information. Moreover, we require that $\mu_{B;t}\,\mu_{S;t} = 0$ since a private signal is either positive or negative, which induces informed traders either to buy or to sell. By contrast, $\delta_{B;t}$ and $\delta_{S;t}$ are either both equal to zero if there is no disputable news event or both positive if there is such an event. Barclay and Warner (1993), Hasbrouck (1995), and Chakravarty (2001) document evidence of a disproportionately greater price impact that is attributable to informed trading. Alexander and Peterson (2007) also note that trades resulting in greater proportional price impacts are more likely to have been made by informed traders than noise or liquidity traders. Therefore, we modify the impact of trading on price, $\theta q_{t,\tau}$, by considering these three different types of orders in the empirical analysis. Furthermore, $Y_t$ can be treated as an independent error term in the empirical analysis. In light of Proposition 1, we estimate the following regression relations for each individual stock:

$$R_t = a_0 + a_1 (\varepsilon_{B;t} - \varepsilon_{S;t}) + a_2 (\mu_{B;t} - \mu_{S;t}) + a_3 (\delta_{B;t} - \delta_{S;t}) + a_4 R_{t-1} + e_t, \qquad (8)$$

$$RV_t = b_0 + b_1 (\varepsilon_{B;t} + \varepsilon_{S;t}) + b_2 (\mu_{B;t} + \mu_{S;t}) + b_3 (\delta_{B;t} + \delta_{S;t}) + b_4 RV_{t-1} + u_t, \qquad (9)$$

where we use $RV_t$ to denote realized volatility to proxy for the variance of return. We include a lagged term in the regressions to capture the persistence of dependent variables.

The variables we use in measuring trading activities are expected numbers of buy and sell orders on a trading day. For instance, $\delta_{B;t} - \delta_{S;t}$ and $\delta_{B;t} + \delta_{S;t}$ are the expected order imbalance and the expected orders originating from SOS traders. Such raw measures are not convenient for cross-sectional comparison if some stocks are heavily traded while others are very lightly traded. To facilitate cross-sectional comparison, we scale these measures by the expected number of total trades (or the arrival rate of all orders), $\lambda_{B;t} + \lambda_{S;t}$. Such scaling actually leads us to the PIN developed by Easley et al. (1996) and PSOS introduced by Duarte and Young (2009):

$$\mathit{PIN}_t = \frac{\mu_{B;t} + \mu_{S;t}}{\lambda_{B;t} + \lambda_{S;t}}, \qquad \mathit{PSOS}_t = \frac{\delta_{B;t} + \delta_{S;t}}{\lambda_{B;t} + \lambda_{S;t}}. \qquad (10)$$

However, we would like to point out that the original PIN and PSOS measures are constant over the estimation window. The PIN and PSOS used in this paper are estimated by the Hidden Markov Model (HMM) approach (see the next section) and vary from day to day. We further define the ratio of the expected number of net buys due to private information to the expected total number of trades as PNbIN (probability of net buys due to private information) and the ratio of the expected number of net buys due to SOSs to the expected total number of trades as PNbSOS (probability of net buys due to symmetric order flow shocks):

$$\mathit{PNbIN}_t = \frac{\mu_{B;t} - \mu_{S;t}}{\lambda_{B;t} + \lambda_{S;t}}, \qquad \mathit{PNbSOS}_t = \frac{\delta_{B;t} - \delta_{S;t}}{\lambda_{B;t} + \lambda_{S;t}}. \qquad (11)$$

Then, we can estimate an alternative version of regression models (8)-(9):

$$R_t = c_0 + c_1 \mathit{PNbIN}_t + c_2 \mathit{PNbSOS}_t + c_3 R_{t-1} + e_t, \qquad (12)$$

$$RV_t = d_0 + d_1 \mathit{PIN}_t + d_2 \mathit{PSOS}_t + d_3 RV_{t-1} + u_t. \qquad (13)$$

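In these regressions, measures of liquidity trading are not explicitly included for two reasons. First, $(\varepsilon_{B;t} + \varepsilon_{S;t}) / (\lambda_{B;t} + \lambda_{S;t}) + \mathit{PIN}_t + \mathit{PSOS}_t \equiv 1$, so that $(\varepsilon_{B;t} + \varepsilon_{S;t}) / (\lambda_{B;t} + \lambda_{S;t})$ should not be included in (13). Second, because the expected number of liquidity buys is expected to be similar to that of liquidity sells on average, $(\varepsilon_{B;t} - \varepsilon_{S;t}) / (\lambda_{B;t} + \lambda_{S;t})$ is almost zero and negligible.8 To make (12) in the same format as (13), the expected net liquidity buy, $(\varepsilon_{B;t} - \varepsilon_{S;t}) / (\lambda_{B;t} + \lambda_{S;t})$, is also dropped in (12).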
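The scaled measures and the adding-up identity used to justify dropping the liquidity share can be sketched in a few lines of Python. The function name `scaled_measures`, the argument names, and the dictionary keys are illustrative assumptions; the inputs are the six daily arrival rates of liquidity, privately informed, and SOS buys and sells.

```python
def scaled_measures(eps_b, eps_s, mu_b, mu_s, delta_b, delta_s):
    """Scale expected trades and net buys by the expected total number of trades."""
    total = eps_b + eps_s + mu_b + mu_s + delta_b + delta_s
    if total <= 0:
        raise ValueError("the total arrival rate must be positive")
    return {
        "PIN": (mu_b + mu_s) / total,          # share of privately informed trades
        "PSOS": (delta_b + delta_s) / total,   # share of SOS trades
        "PNbIN": (mu_b - mu_s) / total,        # net buys due to private information
        "PNbSOS": (delta_b - delta_s) / total, # net buys due to SOSs
        "liquidity_share": (eps_b + eps_s) / total,
        "net_liquidity": (eps_b - eps_s) / total,
    }
```

By construction the liquidity share, PIN, and PSOS sum to one, which is why only two of the three can enter a regression that also has an intercept.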

To preserve tractability, we analyze price change (total return) in the theoretical model,

which is standard practice in the microstructure literature on informed trading (see for example,

Hong and Stein (1999); Chordia and Subrahmanyam (2004)). However, in the empirical

analysis, in order to preserve comparability in the cross-section, we analyze simple daily returns

and report the associated results since the implications of the theoretical model hold for both price

changes and simple returns. Total return and log return, as alternative measures to simple return,

are considered in the robustness checks.

8 Some empirical studies actually presume $\varepsilon_{B;t} \equiv \varepsilon_{S;t}$; see, for example, Easley, Engle, O’Hara and Wu (2008).


II. Daily Measures of Information-Based Trading

In order to empirically test the effects of information-based trading on daily return and

risk of an individual asset, we need to estimate the daily order arrival rates according to their

trading motives to measure information-based trading. Although the estimation of the standard

PIN and PSOS measures of Easley, Hvidkjaer, and O’Hara (2002) and Duarte and Young (2009)

has been discussed thoroughly in the literature, these conventional measures are constant over the whole estimation period, say a quarter, and it is difficult to use them to capture short-term variations in information-based trading. To overcome this difficulty, Yin and Zhao (2014)

develop a new Hidden Markov Model (HMM) approach, which can estimate daily measures with

satisfactory accuracy. This section briefly outlines the approach and the estimation process.

The core of this approach is a Hidden Markov Model (HMM), which links the observable

trading activities to the unobservable information state of the market of a risky asset. The

information state is used to describe whether private and/or public information events of the asset

occur or not, and if they occur, how intense they are.9 More specifically, it is characterized by

the expected numbers of buyer-initiated and seller-initiated orders arriving at the market. Each

trading day is associated with a distribution of information states, and its evolution portrays the trading process of the risky asset. Formally, the HMM consists of two parts: a two-dimensional unobservable stochastic process of states $\{Z_t \equiv (Z_{B;t}, Z_{S;t}) : t = 1, \ldots, T\}$, satisfying the Markov property; and a bivariate state-dependent trading process $\{X_t \equiv (B_t, S_t) : t = 1, \cdots, T\}$. In this model, $T$ is the time horizon being considered, $Z_t$ indicates the hidden state at time $t$, and $B_t$ and $S_t$ represent the observable time series of buyer- and seller-initiated trades, respectively. The

9 For instance, informed investors may receive an extremely good signal about a company or one that is just slightly better than expected. The private signal can be observed by either a very limited number of investors or a relatively large number of investors. For a public information event, divergence in opinions can be either profound or mild.


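distributions of $B_t$ and $S_t$ depend only on the current state and not on previous states or trades, i.e.,

$$\Pr(Z_t \mid Z^{(t-1)}) = \Pr(Z_t \mid Z_{t-1}) \quad \text{and} \quad \Pr(X_t \mid X^{(t-1)}, Z^{(t)}) = \Pr(X_t \mid Z_t),$$

where $Z^{(t)} \equiv (Z_1, Z_2, \ldots, Z_t)$ and $X^{(t)} \equiv (X_1, X_2, \ldots, X_t)$. Although the Markov property implies that conditioning on the history of the process up to time $t$ is equivalent to conditioning only on the most recent value of $Z_t$, there exists a dependence structure in the evolution of hidden states. The transition matrix of this 2-dimensional Markov chain can be written as

$$\Gamma = \begin{pmatrix}
\gamma_{1,1;1,1} & \gamma_{1,1;1,2} & \cdots & \gamma_{1,1;m,n-1} & \gamma_{1,1;m,n} \\
\gamma_{1,2;1,1} & \gamma_{1,2;1,2} & \cdots & \gamma_{1,2;m,n-1} & \gamma_{1,2;m,n} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma_{m,n-1;1,1} & \gamma_{m,n-1;1,2} & \cdots & \gamma_{m,n-1;m,n-1} & \gamma_{m,n-1;m,n} \\
\gamma_{m,n;1,1} & \gamma_{m,n;1,2} & \cdots & \gamma_{m,n;m,n-1} & \gamma_{m,n;m,n}
\end{pmatrix},$$

where

$$\gamma_{i,j;k,l} \equiv \Pr(Z_{B;t+1} = k, Z_{S;t+1} = l \mid Z_{B;t} = i, Z_{S;t} = j)$$

is the probability that the state is $(k, l)$ at time $t+1$ conditional on it being $(i, j)$ at time $t$, and $m$ and $n$ are the ranges of the two components of the hidden state. The unconditional probability of the hidden state being in state $(i, j)$ at time $t$, $\pi_{i,j;t} \equiv \Pr(Z_{B;t} = i, Z_{S;t} = j)$, is a key variable of any HMM. Denoting these probabilities by the row vector

$$\boldsymbol{\pi}_t \equiv (\pi_{1,1;t}, \pi_{1,2;t}, \ldots, \pi_{1,n;t}, \ldots, \pi_{m,1;t}, \ldots, \pi_{m,n-1;t}, \pi_{m,n;t}),$$

we can deduce the distribution of states at time $t+1$ from its distribution at time $t$ by $\boldsymbol{\pi}_{t+1} = \boldsymbol{\pi}_t \Gamma$. Moreover, the distribution of future information states, over a forecast horizon of $h$ days, can be calculated by $\boldsymbol{\pi}_{t+h} = \boldsymbol{\pi}_t \Gamma^h$.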
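The forecast of the state distribution amounts to repeated multiplication of a row vector by the transition matrix, which the following Python sketch spells out with plain lists. The function names `propagate` and `flatten_state`, and the row-major ordering of joint states (1,1), (1,2), ..., (m,n), are assumptions made for illustration.

```python
def propagate(pi, gamma, horizon=1):
    """Multiply the row vector pi by the transition matrix gamma `horizon` times."""
    current = list(pi)
    size = len(current)
    for _ in range(horizon):
        # One step ahead: entry k collects the probability flowing into state k
        current = [sum(current[i] * gamma[i][k] for i in range(size))
                   for k in range(size)]
    return current

def flatten_state(i, j, n):
    """Map the joint state (i, j), counted from 1, to a 0-based vector position,
    assuming the states are ordered (1,1), (1,2), ..., (m,n)."""
    return (i - 1) * n + (j - 1)
```

For example, with two joint states and `gamma = [[0.9, 0.1], [0.5, 0.5]]`, starting from `[1.0, 0.0]` the two-day-ahead distribution is approximately `[0.86, 0.14]`.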

16

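Consistent with prior literature and the assumption in Section I, buy and sell order flows arrive at the market according to a bivariate independent Poisson process for state $(i, j)$.10 Thus, given the state being $(i, j)$, the probability of observing $b_t$ buy orders and $s_t$ sell orders at time $t$, $\Pr(B_t = b_t, S_t = s_t \mid Z_{B;t} = i, Z_{S;t} = j)$, is $p_{B;i}(b_t)\, p_{S;j}(s_t)$, where

$$p_{B;i}(b_t) = \frac{e^{-\lambda_{B;i}} \lambda_{B;i}^{b_t}}{b_t!} \quad \text{and} \quad p_{S;j}(s_t) = \frac{e^{-\lambda_{S;j}} \lambda_{S;j}^{s_t}}{s_t!}.$$

In the above expressions, $\lambda_{B;i}$ and $\lambda_{S;j}$ are the arrival rates of buys and sells, respectively, when the buy state is $i$ and the sell state is $j$. The marginal distribution of observing $(b_t, s_t)$ at time $t$ can be calculated by

$$\Pr(B_t = b_t, S_t = s_t) = \sum_{i=1}^{m} \sum_{j=1}^{n} \Pr(B_t = b_t, S_t = s_t \mid Z_{B;t} = i, Z_{S;t} = j) \Pr(Z_{B;t} = i, Z_{S;t} = j) = \boldsymbol{\pi}_t P(b_t, s_t) \mathbf{1}',$$

where the $mn$-diagonal matrix $P(b_t, s_t)$ is defined by

$$P(b_t, s_t) \equiv \begin{pmatrix} p_{B;1}(b_t)\, p_{S;1}(s_t) & & 0 \\ & \ddots & \\ 0 & & p_{B;m}(b_t)\, p_{S;n}(s_t) \end{pmatrix} \quad \text{and} \quad \mathbf{1}' \equiv \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}.$$

The HMM also yields a probability distribution of states for each day, conditional on the history of observed trades:

$$\Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)}), \quad \text{for } t = 1, 2, \ldots, T. \qquad (14)$$

The parameters of the model include the initial distribution of states $\boldsymbol{\pi}_1$, the transition matrix $\Gamma$ and the order arrival rates $\lambda_{B;i}$ and $\lambda_{S;j}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$). They can be estimated by maximizing the following likelihood function as shown by Yin and Zhao (2014):11

$$L(\Theta) = \boldsymbol{\pi}_1 P(b_1, s_1)\, \Gamma P(b_2, s_2)\, \Gamma \cdots \Gamma P(b_T, s_T)\, \mathbf{1}'.$$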
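A likelihood of this matrix-product form is usually evaluated with a scaled forward recursion to avoid numerical underflow. The Python sketch below is one such evaluation under stated assumptions: joint states are ordered (1,1), (1,2), ..., (m,n), emissions are independent Poisson counts, and the function and argument names are hypothetical. It is not the estimation routine of Yin and Zhao (2014), which relies on the EM algorithm; it only computes the log-likelihood for given parameters.

```python
import math

def log_poisson(k, lam):
    """Log of the Poisson probability of k arrivals with rate lam > 0."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)

def log_likelihood(buys, sells, pi1, gamma, lam_b, lam_s):
    """Log-likelihood of daily (buys, sells) counts via a scaled forward recursion.

    pi1   : initial distribution over the m*n joint states
    gamma : (m*n) x (m*n) transition matrix, rows summing to one
    lam_b : buy arrival rates for buy states 1..m
    lam_s : sell arrival rates for sell states 1..n
    """
    # Arrival-rate pair attached to each joint state, in the assumed ordering
    rates = [(lb, ls) for lb in lam_b for ls in lam_s]
    size = len(rates)
    alpha = list(pi1)
    total = 0.0
    for t, (b, s) in enumerate(zip(buys, sells)):
        if t > 0:
            # Move the filtered state distribution one day ahead
            alpha = [sum(alpha[i] * gamma[i][k] for i in range(size))
                     for k in range(size)]
        log_emit = [log_poisson(b, lb) + log_poisson(s, ls) for lb, ls in rates]
        peak = max(log_emit)  # factor out the largest emission to limit underflow
        alpha = [a * math.exp(le - peak) for a, le in zip(alpha, log_emit)]
        scale = sum(alpha)
        total += peak + math.log(scale)
        alpha = [a / scale for a in alpha]
    return total
```

Factoring out the largest log emission each day keeps the recursion usable when daily trade counts are in the thousands, although a state distribution that puts zero weight on every plausible state would still make `scale` vanish.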

10 Although buys and sells are independent in a specific state, the observed daily numbers of buys and sells are contemporaneously and serially correlated because of correlation between states.

11 The details of parameter estimation of the HMM based on the Expectation-Maximization algorithm (see Baum, Petrie, Soules, and Weiss (1970)) are given in Appendix B.1.


The numbers of buy and sell states, m and n, are determined in model selection according to an information criterion, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC).
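Model selection by information criterion can be sketched as follows. The standard definitions of AIC and BIC are used; the function names, the `candidates` structure, and the idea of passing the number of free parameters in by hand are assumptions for illustration, since the text does not state how parameters are counted for each (m, n).

```python
import math

def aic(log_likelihood, n_params):
    # Akaike information criterion: smaller is better
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    # Bayesian information criterion: penalizes parameters more as n_obs grows
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def select_states(candidates, n_obs, criterion="bic"):
    """Pick the (m, n) pair with the smallest criterion value.

    candidates : dict mapping (m, n) -> (maximized log-likelihood, number of
                 free parameters), both supplied by the caller
    """
    def score(item):
        loglik, n_params = item[1]
        if criterion == "bic":
            return bic(loglik, n_params, n_obs)
        return aic(loglik, n_params)
    return min(candidates.items(), key=score)[0]
```

In practice each candidate's log-likelihood would come from fitting the HMM with that number of buy and sell states to the same sample of trading days.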

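After obtaining $\lambda_{B;i}$, $\lambda_{S;j}$ and $\Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)})$ in the process of estimating the HMM, Yin and Zhao (2014) further develop a two-step approach to decompose the order arrival rates, $\lambda_{B;i}$ and $\lambda_{S;j}$, into three components

$$\lambda_{B;i} = \varepsilon_{B;i} + \mu_{B;i} + \delta_{B;i}, \qquad \lambda_{S;j} = \varepsilon_{S;j} + \mu_{S;j} + \delta_{S;j}. \qquad (15)$$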

The first step applies k-means clustering together with the jump method (see Sugar and James (2003)) to all observed trade imbalances $|b_t - s_t|$, $t = 1, 2, \ldots$, in order to identify the arrival rates of trades due to private information for each hidden state. They argue that the states belonging to the cluster with the smallest mean of trade imbalances do not contain trades with private information. The remaining states contain private information revealed by their substantial expected trade imbalances. After partitioning the states in this way, the estimates of arrival rates of trades due to private information, $\mu_{B;i}$ and $\mu_{S;j}$, can be easily obtained as specified in Appendix B.2. The second step conducts a 2-means clustering analysis on the observations of balanced trades over $t = 1, 2, \ldots$ to separate states with disputable public information from states without disputable public information. The estimation of arrival rates of trades due to disputable public information, $\delta_{B;i}$ and $\delta_{S;j}$, is detailed in Appendix B.2. Therefore, we can obtain the estimates of the arrival rates of different types of trades at trading day $t$ in the framework of the HMM approach,


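$$\begin{aligned}
\varepsilon_{B;t} &= \sum_{i,j} \varepsilon_{B;i} \Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)}), & \varepsilon_{S;t} &= \sum_{i,j} \varepsilon_{S;j} \Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)}), \\
\mu_{B;t} &= \sum_{i,j} \mu_{B;i} \Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)}), & \mu_{S;t} &= \sum_{i,j} \mu_{S;j} \Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)}), \\
\delta_{B;t} &= \sum_{i,j} \delta_{B;i} \Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)}), & \delta_{S;t} &= \sum_{i,j} \delta_{S;j} \Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)}),
\end{aligned} \qquad (16)$$

where the conditional probability of the hidden state $\Pr(Z_{B;t} = i, Z_{S;t} = j \mid X^{(T)})$ is available after the estimation of the HMM as detailed in Appendix B.1.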
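The daily estimates are probability-weighted averages of state-level arrival rates, which the following short Python sketch makes explicit. The function name `daily_rate` and the dictionary layout keyed by joint state (i, j) are assumptions for illustration; the state-level rates and the conditional state probabilities are taken as given inputs from a previously estimated HMM.

```python
def daily_rate(state_rates, state_probs):
    """Probability-weighted average of a state-level arrival rate for one day.

    state_rates : dict mapping joint state (i, j) -> arrival rate of one trade
                  type (e.g. privately informed buys) in that state
    state_probs : dict mapping joint state (i, j) -> probability of that state
                  on the day, conditional on the observed trades
    """
    # States missing from state_probs are treated as having zero probability
    return sum(rate * state_probs.get(state, 0.0)
               for state, rate in state_rates.items())
```

Applying the function once per trade type and side gives the six daily arrival rates, from which expected net buys and expected total trades of each type follow by subtraction and addition.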

III. Data and Sample Description

We use two samples of stocks for our empirical analysis. The first dataset is a sample of

120 stocks that were traded on the New York Stock Exchange (NYSE) in 2010 and 2011. It consists of 40 stocks randomly selected from each of the S&P 500 Index, the S&P MidCap 400 Index, and the S&P SmallCap 600 Index. The ticker symbols of these sample stocks are detailed in Panel A of Table I. This dataset has been used by Yin and Zhao (2014), who demonstrate that the HMM approach can effectively measure information-based trading for all the sample stocks and performs better than prevailing approaches. In particular, both positive contemporaneous correlation between buys and sells and serial dependence of order flows are captured with high accuracy. Because the sample firms are selected from three indexes, they are representative of a variety of industries and market capitalizations. This sample includes only NYSE stocks to

avoid possible variation caused by differences in trading protocols.

INSERT TABLE I HERE

The second sample consists of all constituent stocks of the S&P 500 Index in 2010 and 2011. We exclude the stocks added to or removed from the index over the sample period, and the final sample contains 451 stocks. S&P 500 stocks are arguably the most actively traded stocks, covering 75% of U.S. equities in terms of market capitalization. This sample presents a more comprehensive picture of the market, particularly for large stocks. Forty stocks appear in both samples and serve as a bridge between them.

Transaction data for all sample stocks are taken from the Thomson Reuters Tick History (TRTH) transaction database over a two-year period from January 1, 2010 through December 31, 2011. For each sample stock, transactions and quotes that occur before and at the open are excluded, as are those at and after the close. Quotes with zero bid or ask prices, quotes for which the bid-ask spread is greater than 50% of the price, and transactions with zero prices are also excluded to eliminate possible data errors. Data for November 26, 2010 and November 25, 2011 are removed because of the early close on the day after Thanksgiving. The Lee-Ready (1991) algorithm is applied to the TRTH transaction data to determine the daily numbers of buys and sells.
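The trade-signing step can be illustrated with a simplified sketch of the Lee-Ready logic: a trade above the prevailing quote midpoint is signed as a buy, one below as a sell, and a trade at the midpoint is signed by a tick test against the most recent different trade price. The function names and the sample trades are hypothetical, and the sketch assumes each trade is already matched to its prevailing quote; it omits the quote-timing adjustments of the original algorithm and is not the authors' implementation.

```python
def classify_trade(price, bid, ask, earlier_prices):
    """Return +1 for a buy, -1 for a sell, 0 if the trade cannot be signed."""
    midpoint = (bid + ask) / 2.0
    if price > midpoint:
        return 1
    if price < midpoint:
        return -1
    # Trade at the midpoint: fall back on the tick test, comparing with
    # the most recent earlier trade price that differs from this one
    for previous in reversed(earlier_prices):
        if price > previous:
            return 1
        if price < previous:
            return -1
    return 0


def count_buys_sells(trades):
    """trades: time-ordered (price, bid, ask) tuples for one stock-day."""
    buys = sells = 0
    earlier_prices = []
    for price, bid, ask in trades:
        sign = classify_trade(price, bid, ask, earlier_prices)
        if sign > 0:
            buys += 1
        elif sign < 0:
            sells += 1
        earlier_prices.append(price)
    return buys, sells


# Hypothetical trades with the prevailing bid and ask attached
trades = [(10.5, 10.0, 10.5), (10.0, 10.0, 10.5),
          (10.25, 10.0, 10.5), (10.25, 10.0, 10.5)]
buys, sells = count_buys_sells(trades)
```

The daily trade imbalance is then the difference between the two counts, and the number of balanced trades follows from the smaller of the two.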

For each stock, we measure firm size by the average daily market capitalization over the sample period. Panel B of Table I summarizes the characteristics of the first sample of 120 stocks and its three size-based groups. It indicates a positive relation between firm size and both the average daily total number of trades and the average daily absolute trade imbalance. Consistent with prior studies, the subsample of S&P SmallCap 600 constituents has the largest sample mean of average percentage effective bid-ask spread and the smallest sample mean of average daily turnover (measured by the ratio of the number of shares traded to the number of shares outstanding). The summary statistics of the characteristics of the second sample of 451 S&P 500 constituent stocks are provided in Panel C of Table I.

For each sample stock i, we apply the HMM approach to estimate the arrival rates of different types of trades on each trading day as specified by (16) in the previous section. We also estimate the scaled daily measures of information-based trading, i.e., PIN_{i,t}, PSOS_{i,t}, PNbIN_{i,t}, and PNbSOS_{i,t}, using (10) and (11).

We consider two types of return series here, i.e., open-to-close daily returns and close-to-

close daily returns. The close-to-close daily returns are obtained from the Center for Research in

Security Prices (CRSP), while the open-to-close daily returns are calculated from the opening and closing prices taken from the Thomson Reuters Tick History (TRTH) transaction database.

Total risk of returns is defined as return variance. As a common indicator of return variance,

realized variance provides a relatively accurate measure (Andersen and Bollerslev (1998)) and

reflects time variations in total risk. For a particular trading day, the realized variance is

calculated as the sum of squared intraday returns. As the sampling frequency of intraday returns

approaches infinity, realized variance is free from measurement errors (Andersen, Bollerslev,

Diebold, and Labys (2001)). We choose 15-minute and 10-minute sampling frequencies for

intraday returns to balance the desire for reduced measurement error with the need to avoid the

microstructure biases that arise at the highest frequencies.
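As a minimal sketch of the realized variance calculation described above, the following sums squared intraday returns over one trading day. The use of log returns, the helper `sample_every`, and the assumption that prices are recorded on an evenly spaced one-minute grid are illustrative choices, not details given in the paper.

```python
import math


def realized_variance(prices):
    """Sum of squared intraday log returns for one trading day."""
    log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]
    return sum(r * r for r in log_returns)


def sample_every(prices, step):
    """Keep every step-th price, e.g. step=10 on a one-minute grid."""
    return prices[::step]


# Hypothetical one-minute price grid for a single day (31 observations)
one_minute_prices = [100.0 + 0.01 * k for k in range(31)]
rv_10min = realized_variance(sample_every(one_minute_prices, 10))
rv_15min = realized_variance(sample_every(one_minute_prices, 15))
```

A coarser grid uses fewer, larger returns, which is the trade-off between measurement error and microstructure bias mentioned in the text.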

IV. The Effects of Information-based Trading on Return and Total Risk

In this section, we empirically examine the effects of information-based trading on stock

returns and their total risks.

A. Descriptive Statistics and Nonparametric Tests

To preserve comparability in the cross-section, Table II presents descriptive statistics only for the scaled daily measures of information-based trading, i.e., PIN, PSOS, PNbIN, and PNbSOS, because their non-scaled counterparts are hardly comparable across stocks. In Table II, PIN, PSOS, PNbIN, and PNbSOS are the averages of their underlying daily measures over the sample period. As Panel A shows, the cross-sectional sample means of PIN and PSOS are 0.136 and 0.358, respectively, for the first sample of 120 stocks, reflecting the existence of substantial information-based trading in the market. On the other hand, the cross-sectional means of PNbIN and PNbSOS are close to zero. While each daily PNbSOS_{i,t} is likely to be very small and close to zero, leading to a close-to-zero mean, a small PNbIN indicates that privately informed buys and sells offset each other over a long period when a sell is treated as a negative buy.

INSERT TABLE II HERE

In order to examine the daily return’s relation with information-based trading, we

separately calculate the averages of scaled daily measures for days with a positive return and for

days with a negative return.12 The results of their cross-sectional sample means are given in

Panel A of Table II as well. Consistent with the prediction of our theoretical model, the cross-sectional mean of PNbIN on trading days with a positive return is 0.041, while that on trading days with a negative return is -0.051. This demonstrates that positive (negative) daily returns are associated with net privately informed buys (sells). In contrast, the difference between the means of PNbSOS on positive and negative return days is much smaller and equal to 0.003 (0.002 - (-0.001)). In order to further explore the contemporaneous link between return and information-based trading for the sample stocks, we test the equality of the average PIN_{i,t} (PSOS_{i,t}, PNbIN_{i,t}, or PNbSOS_{i,t}) on days with a positive return and that on days with a negative return. Using the 5% significance level,13 we find only 13.33% of stocks whose PINs are significantly

12 We use close-to-close daily return here. In the regression analysis in the next subsection, both close-to-close and open-to-close returns are used.
13 Throughout the paper, statistical significance is assessed at the 5% level unless otherwise specified.


different on positive and negative return days, and the corresponding figure for PSOS is 17.75%. These figures indicate that PIN and PSOS are not good proxies for the determinants driving daily return. Although the PIN measure may be priced in long-term asset pricing tests as a risk factor, it does not distinguish privately informed buys from privately informed sells and has an ambiguous effect on contemporaneous daily returns. In particular, the cross-sectional mean of PIN is 0.136 on days with a positive return, which is very close to that on days with a negative return (0.135). Similarly, the cross-sectional mean of PSOS on days with a positive return is close to that on days with a negative return (0.350 vs. 0.365). However, there are 94.17% and 49.17% of stocks, respectively, whose PNbIN and PNbSOS on days with a positive return are significantly different from those on days with a negative return. This implies that positive daily returns are significantly driven by contemporaneous net privately informed buys but to a lesser extent by net buys due to public information. This result is consistent with Alexander and Peterson

(2007), who argue that trades resulting in greater proportional price impacts are more likely, on

average, to have been made by informed traders than noise or liquidity traders.
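The per-stock comparison of a measure on positive-return days versus negative-return days can be sketched as below. The paper does not state which difference-in-means test is used, so Welch's unequal-variance t-statistic is an assumption here, and the function names and sample data are hypothetical.

```python
import math
from statistics import mean, variance


def split_by_return_sign(measure, returns):
    """Split a daily measure into positive-return and negative-return days."""
    positive = [m for m, r in zip(measure, returns) if r > 0]
    negative = [m for m, r in zip(measure, returns) if r < 0]
    return positive, negative


def welch_t_statistic(a, b):
    """Two-sample t-statistic that does not assume equal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Hypothetical daily values of a scaled net-buy measure and daily returns
daily_measure = [0.25, 0.5, 0.75, -0.25, -0.5, -0.75, 0.125]
daily_return = [0.01, 0.02, 0.03, -0.01, -0.02, -0.03, 0.0]
up, down = split_by_return_sign(daily_measure, daily_return)
t_stat = welch_t_statistic(up, down)
```

Days with a zero return fall into neither group in this sketch; the statistic would then be compared with the critical value at the chosen significance level.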

For total risk, we sort the trading days of each sample stock into quintiles by realized variance.14 The cross-sectional mean of PSOS is 0.446 for the trading days within the largest realized-variance quintile, while it is 0.273 for the trading days within the smallest quintile. The difference-in-means test shows that PSOS on trading days within the largest quintile is significantly different from that within the smallest quintile for almost all sample stocks (i.e., 96.67% of the 120 stocks), but the corresponding figure for PIN drops to 66.67%. For the other two measures, i.e., PNbIN and PNbSOS, the cross-sectional means for the trading days within the largest quintile are close to their counterparts within the smallest quintile. This implies that excess total risk of returns is mainly determined by the number of

14 Realized variance is calculated based on a time interval of 10 minutes. Both 10- and 15-minute frequencies are used in the regression analysis in the next subsection.


orders rather than order imbalance and it is more profoundly related to disputable public

information than private information.

Panel B of Table II presents descriptive statistics of the four measures for the three size-based subsamples of the 120 stocks and for the second sample of 451 S&P 500 constituent stocks. The cross-sectional means of both PIN and PSOS, obtained by averaging over all trading days, decrease with firm size, which implies that information-based trading is more prevalent in the market for small firms than in that for large firms. The results of Panel A hold for all three size-based subsamples and the second sample. In particular, for almost all sample stocks, PNbIN on days with a positive return is significantly different from that on days with a negative return, while PSOS on days within the largest realized-variance quintile is significantly different from that on days within the smallest quintile. Monotonicity in firm size can also be seen in some of the difference-in-means tests, both when days are sorted by daily return and when they are sorted by realized variance. We also note that although the descriptive statistics and nonparametric test results of the large-size subsample are similar to those of the S&P 500 sample, differences between the two still exist. These differences may reflect the fact that the S&P 500 includes both NYSE and NASDAQ stocks, while the large-size subsample consists of 40 NYSE stocks included in the S&P 500 Index.

In order to further investigate return and total risk in a simple nonparametric way, we sort the trading days of each sample stock in another way, i.e., into quintiles according to the value of one of the daily measures of information-based trading. We then take the averages of daily return and realized variance of each quintile over time and across stocks. The results are documented in Table III. Panel A shows that the average daily return is higher for the trading days with a larger measure of PNbIN_{i,t}, consistent with the theoretical prediction of our model.


This phenomenon is strongest for small, less frequently traded stocks, where the average daily return is 1.1% for days within the smallest PNbIN_{i,t} quintile and 1.2% for trading days within the largest quintile. Meanwhile, the relationship of PIN_{i,t}, PSOS_{i,t}, or PNbSOS_{i,t} with the contemporaneous daily return is ambiguous. Although PNbSOS_{i,t} distinguishes buys from sells, SOS traders do not possess private information about the value of the stock, so the profitability of their trading activities cannot be assured in general. Panel B shows that the average daily realized variance is higher for trading days with a larger measure of PSOS_{i,t}, which is again consistent with our theoretical prediction. The other three scaled measures of information-based trading, i.e., PIN_{i,t}, PNbIN_{i,t}, and PNbSOS_{i,t}, do not have clear and consistent effects on the contemporaneous total risk of returns.

INSERT TABLE III HERE

In summary, both Table II and Table III show that net information-based buys are

associated with positive contemporaneous daily returns and such effect is largely driven by

privately informed trading rather than SOS trading. The conventional measures of PIN and

PSOS do not distinguish buys from sells and thus cannot effectively reveal the short-term

association of returns with information-based trading. Excess total risk of return can be induced

by both buys and sells due to information arrivals, where the effect of trading due to disputable

public information is much stronger than that of privately informed trading. It implies the

differential effects of two-sided shocks and one-sided shocks on total risk.

B. The Effects of Information-Based Trading on Daily Return

To quantify the relation between information-based trading and daily return, we examine

regression model (8), where the dependent variable is measured by either the open-to-close or the close-to-close daily return


on day t. The averages of regression coefficients of (8) and autocorrelation-corrected t-statistics

are reported in Panel A of Table IV. To see the direction of the effect of each explanatory

variable, we count and report the percentage of sample stocks with regression coefficient being

significantly positive or negative. While the coefficient of each explanatory variable proxies its

marginal effect, we are also interested in the total effect of the variable. In order to examine the

normalized total effect of each regressor in individual regressions, we consider a measure of

effect size, which is the ratio of average of the absolute total effects to the average of absolute

values of the dependent variable. For instance, the effect size of a regressor on the daily return is calculated as the sum over trading days of the absolute product of the estimated coefficient and the regressor, divided by the sum over trading days of the absolute daily return. Panel A presents the results for the first sample of 120 stocks, its three size-based subsamples, and the second sample of 451 S&P 500 constituent stocks, for both open-to-close and close-to-close daily returns. Because the results are robust to the choice of daily return, we focus the discussion on open-to-close returns.
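The effect-size measure defined above can be written out in a few lines. This is a sketch of the verbal definition only (ratio of the average absolute total effect to the average absolute value of the dependent variable); the function name and the numbers are hypothetical.

```python
def effect_size(coefficient, regressor, dependent):
    """Ratio of summed |coefficient * x_t| to summed |y_t| over trading days.

    Both series cover the same days, so the ratio of sums equals the
    ratio of averages used in the verbal definition.
    """
    if len(regressor) != len(dependent):
        raise ValueError("series must cover the same trading days")
    total_effect = sum(abs(coefficient * x) for x in regressor)
    return total_effect / sum(abs(y) for y in dependent)


# Hypothetical estimated coefficient, regressor values and daily returns
size = effect_size(0.5, [2.0, -2.0, 4.0], [4.0, -4.0, 8.0])
```

Because absolute values are taken before summing, the measure captures the magnitude of a regressor's contribution irrespective of its sign on any given day.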

First of all, the marginal impacts of different trades are quite different. In particular, the

regression coefficient of the expected number of net buys due to private information is positive and significant for the majority of sample stocks. In contrast, the coefficients of the expected numbers of net buys due to disputable public information and liquidity needs are mostly positive, but they are significant for no more than 32.50% of sample stocks. In terms of magnitude, the marginal effect of informed trading is also the largest among the three types of trades, as implied by its largest average regression coefficient. The explanatory power of SOS trading or liquidity trading for daily return is relatively low. Previous literature (see, for example, Chakravarty (2001); Alexander and Peterson (2007)) notes that trades

resulting in greater proportional price impacts are more likely, on average, to have been made by


informed traders than noise or liquidity traders.15 Our findings are consistent with this claim, which in turn supports the validity of the two-step approach, detailed in Appendix B, to identifying the arrivals of different types of trades. The difference in total effects among the three types of trading is even more pronounced because of the sizes of their expected net buys. The total effect of expected net buys due to private information ranges from 26.6% to 39.3%. On the other hand, the total effect of expected net buys due to disputable public information ranges from 9.96% to 13.3% and that of liquidity trading ranges from 7.6% to 12.9%.

INSERT TABLE IV HERE

The information-based trading could affect the contemporaneous returns of large and

frequently traded stocks differently from small and infrequently traded stocks. We compare the

results of the three size-based subsamples of the 120 stocks reported in Panel A to explore such

possibility. The size-stratified results demonstrate that both the marginal effect and the total effect of privately informed trading on daily return decrease with firm size. This implies that the price impact of private information depends on the size of a stock's market capitalization. All other things being equal, private information moves price less effectively for a stock with larger market capitalization. The intuition behind this is straightforward. Large stocks usually trade more frequently, and private information is more easily hidden by the high transaction traffic. Thus, informed trades are less likely to be followed by other investors in the market.

Moreover, the sheer size of market capitalization of large stocks means that the resources owned

by informed speculators are relatively small so that their role played in these markets is relatively

small. Different from privately informed trading, neither marginal effects nor total effects of

15 Some papers in the literature (see for example Barclay and Warner (1993); Hasbrouck (1995); Chakravarty (2001); and Alexander and Peterson (2007)) document the presence of stealth trading by institutional investors and find that medium-sized trades, more likely to be attributable to informed traders, tend to have a disproportionately greater aggregate price impact.


both liquidity trading and public-information induced trading appear to be monotonic in firm size.

Our model seems to perform better for small firms, as the average R2 of the regressions decreases with firm size from over 10% to less than 6%.

The coefficient of lagged return is negative on average, and the stocks with a significantly negative coefficient far outnumber those with a significantly positive coefficient. This implies that stock returns are more likely to reverse than to continue their trends. The average effect size of lagged return is smaller than that of each of the three expected net-buy regressors for both samples and the three subsamples, which demonstrates the dominant role played by trading activities in explaining return dynamics.

The results of testing regression (12) are documented in Panel B of Table IV in the same

format as Panel A. For more than 90% of the sample stocks, daily return is positively and significantly associated with the probability of net buys due to private information, PNbIN. There is no sample stock whose regression coefficient of PNbIN or PNbSOS is significantly negative for either open-to-close or close-to-close return. The average regression coefficient of PNbIN always exceeds that of PNbSOS by a substantial margin, implying a larger marginal effect of private information than of disputable public information on daily return. Regarding total effect, the average effect size of PNbIN is larger than 28.3% for all samples and subsamples considered, while that of PNbSOS is less than 8.31%. Although the scaled measures of information-based trading are adopted in (12) to explain daily return instead of their non-scaled counterparts, the results of Panel A discussed above hold qualitatively here. However, for seven out of 10 cases the average adjusted R2 in Panel B is slightly smaller than that in Panel A, indicating that regression specification (8) conforms marginally better to the prediction of the theoretical model. But when scaled measures of information-based trading are adopted as regressors in (12),


we can compare the marginal effects of the regressors in the cross-section by performing a simple difference-in-means test of the equality of the coefficients of PNbIN and PNbSOS for each sample and subsample. As reported in the last row of Panel B, all the hypothesis tests yield a p-value less than 0.05. This shows that the marginal effect of PNbIN on daily return is significantly larger than that of PNbSOS at the 5% level.

C. The Effects of Information-Based Trading on Total Risk of Return

For total risk, we regress (9) using realized variance of intraday returns sampled at the

15- or 10-minute frequency. The results are presented in Panel A of Table V in the same fashion

as Panel A of Table IV. Let us first look at the first sample of 120 stocks and the case where

realized return variance is computed at the 15-minute frequency. The regression coefficient of the expected number of SOS trades is positive and significant for almost all the stocks (95%), showing the substantial effect of SOS trading on the total risk of return. Meanwhile, the regression coefficients of the other two regressors are significant and positive for only 16.67% and 8.33% of the sample stocks. In terms of magnitude, the average regression coefficient of the expected number of SOS trades is also the largest, while the average coefficients of the other two regressors differ only marginally. These results are consistent with the prior literature, which finds that information arrival may induce excess volatility. Our results, however, further show that the marginal effects of the two types of information-based trading are different. The strong relation between the expected number of SOS trades and realized variance reflects the substantial effect of belief divergence over public news on the volatility of stock

price. When a public information event occurs, say an announcement of profitability outlook of

a firm, investors may disagree about the implication of the event. Those with a positive view

actively buy the stock and push its price up, while those with a negative view (the announcement


may not be as good as expected) actively sell it and push its price down, which makes the stock return volatile in a dynamic trading process. This evidence that trading motivated by heterogeneous beliefs amplifies total risk of return is consistent with the findings in the literature. For instance, Shalen (1993) shows that belief dispersion gives a measure of excess price variability and is related to price volatility. On the other hand, for the majority of sample

stocks, the marginal effect of the expected number of trades due to private information is

insignificant. It reflects the one-sided impact of private information on stock price. If informed

traders obtain a signal of stock price, it is either high or low and induces them to either buy or

sell. Thus, stock price moves in one direction without substantial fluctuations. Regression

coefficient of the lagged realized variance is also significantly positive for the majority of sample

stocks. It demonstrates that the persistence of the realized variance cannot be fully explained by

information-based trading.

INSERT TABLE V HERE

Panel A also reports the average effect sizes measuring the total effect of each

explanatory variable. The total effect of the expected number of SOS trades on return volatility is dominant and larger than that of informed trading or liquidity trading. About 49.7% of volatility can be explained by the expected number of SOS trades, 29.9% by the expected number of liquidity trades, and 9.12% by the expected number of privately informed trades. Liquidity trading, as two-sided trading, contributes more to realized variance than the one-sided trading of the privately informed. In addition, liquidity trading

exists in the market for all trading days while private information occurs less frequently. Admati

and Pfleiderer (1988) note that informed traders attempt to disguise their trades by placing them

during times of abnormally heavy trading, which helps to explain the relatively small total effect


of private-information trading. Overall, the regression in (9) exhibits pronounced explanatory

power, with an average adjusted R2 of 37.09% for the first sample of 120 stocks.

In order to examine the possible variation in the relation between information-based

trading and total risk, we also report the results of the three size-based subsamples in Panel A.

The earlier results obtained from the whole sample hold for all three subsamples. However, we

notice the increase of average effect size of ; ; as firm becomes larger and the overall

explanatory power of regression (9), measured by the average adjusted R2, also increases with

firm size. We further extend our analysis to the second sample of 451 stocks constituting

S&P500 Index, of which the results are similar to those of the large size group of the first sample.

To ensure that our results are not sensitive to the particular choice of return frequency in estimating realized volatility, we also measure realized variance using the 10-minute intraday returns and report the results of regression (9) in Panel A of Table V as well. The findings about the association between information-based trading and total risk of return under the 15-minute frequency remain unchanged. However, the average adjusted R2 across sample stocks is higher than that based on the 15-minute intraday returns. The building block

of our theoretical model is (3) that specifies a trade-by-trade price impact. A higher frequency

measurement for return variance is closer to the theoretical setting and it can better capture the

risk driven by intraday trading activities. Thus, regression model (9) has more explanatory

power.

In the regression specification, we exclude the higher-order term associated with trades due to private information. When there is no private information on trading day t, this term is expected to be negligibly small, and thus its omission yields almost no error. If there are substantial trade


imbalances due to private information, the higher-order term contributes negatively to the total risk of return, while the coefficient of the expected number of trades due to private information is always positive, as shown in (5). Since private information is either positive or negative, trades due to private information fall predominantly on one side of the market on a given day, and the regression coefficient of the expected number of such trades may become insignificant or even significantly negative when the higher-order term is omitted from the regression specification. Panel A of Table V shows that there exist a small number of sample stocks whose regression coefficient of the expected number of trades due to private information is negative and significant, which further validates our theoretical prediction in Proposition 1.

In addition to expected numbers of trades generated by different trading motives, we

further test the theoretical prediction using the relative measures of expected trades, PIN and

PSOS, by running regression (13) for each sample stock. We document the results in Panel B of Table V, with realized variance measured using 15-minute and 10-minute intraday returns, respectively. As expected, PSOS is a dominant contributor to the return variance for almost all stocks in the sample, and its marginal and total effects are more profound than those of PIN for all the samples and subsamples considered. This demonstrates that more dispersed beliefs are likely to be associated with higher return risk, while the impact of private-information trading tends to be less influential. Compared with Panel A, the average adjusted R2 is reduced by a noticeable margin when these scaled measures of information-based trading are adopted. In the last row of Panel B, we perform a difference-in-means test for the equality of the marginal effects of PIN and PSOS, i.e., the equality of their coefficients in (13), for each sample and subsample. All of them yield a p-value less than 0.01.

It leads us to conclude that the effect of SOS trading on total risk is significantly larger than

private-information trading.


V. The Effects of Information-based Trading on Systematic Risk and Idiosyncratic Risk

The variance of daily return in (10) represents total risk, including both systematic risk and non-systematic (idiosyncratic) risk. Financial analysis pays close attention to the composition of total risk because the former is supposed to be priced while the latter is not. In

this section, we further identify the effects of information-based trading on systematic risk and

idiosyncratic risk.

A. The Estimation of Daily Beta and Idiosyncratic Risk

An important concept for evaluating an asset's exposure to systematic risk is market beta

of the asset. We consider an intraday market model to obtain estimates of the beta and

idiosyncratic risk for individual stocks on each trading day. This enables us to analyze the time-

varying association of information-based trading with beta and idiosyncratic risk. In particular,

for each sample stock, we consider the following intraday market model on trading day t

[Equation (17): the intraday market model relating the j-th intraday return of the stock on trading day t to the j-th intraday market return through the daily beta, plus the j-th realized return residual.] (17)

Here the intraday returns are sampled at a certain frequency, and the daily beta measures the systematic risk of the asset. The corresponding daily idiosyncratic risk is defined as the standard deviation of the realized residuals. Our first sample consists of 120 stocks randomly selected from three size-based indexes, and we use an exchange-traded fund (ETF) tracking the total market return, as in Todorov and Bollerslev (2010). More specifically, the Vanguard Total Stock Market ETF with ticker symbol VTI is used, and its intraday prices are available in the Thomson Reuters Tick History database. Our second sample consists of the constituents of the S&P 500 Index; we therefore adopt the SPDR S&P 500 ETF with ticker symbol SPY to measure the


corresponding market return. Both ETFs are actively traded in the market and arbitrage

opportunities ensure that their prices do not deviate considerably from the values of their

underlying indexes. Similar to the analysis of total risk, we consider intraday stock and market

returns sampled at 15-minute and 10-minute frequency.
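A daily beta and idiosyncratic risk of the kind described above could be estimated by ordinary least squares on one day's intraday returns, as sketched below. Whether the market model includes an intercept and whether the residual standard deviation uses n or n - 1 are not recoverable from the text, so both are assumptions here (the `intercept` flag and the n - 1 divisor); the function name and sample returns are hypothetical.

```python
import math


def daily_beta_and_idio_risk(stock, market, intercept=True):
    """OLS slope of stock on market intraday returns for one trading day,
    and the sample standard deviation of the fitted residuals."""
    n = len(stock)
    # With intercept=False the series are not demeaned (regression through 0)
    mean_stock = sum(stock) / n if intercept else 0.0
    mean_market = sum(market) / n if intercept else 0.0
    s_xy = sum((m - mean_market) * (s - mean_stock)
               for s, m in zip(stock, market))
    s_xx = sum((m - mean_market) ** 2 for m in market)
    beta = s_xy / s_xx
    alpha = mean_stock - beta * mean_market
    residuals = [s - alpha - beta * m for s, m in zip(stock, market)]
    mean_residual = sum(residuals) / n
    idio_risk = math.sqrt(
        sum((e - mean_residual) ** 2 for e in residuals) / (n - 1))
    return beta, idio_risk


# Hypothetical intraday returns: the stock moves twice as much as the market
market = [0.01, -0.02, 0.03, 0.0]
stock = [0.001 + 2.0 * m for m in market]
beta, idio_risk = daily_beta_and_idio_risk(stock, market)
```

Repeating the calculation for every trading day yields the daily series of beta and idiosyncratic risk that serve as dependent variables in the subsequent regressions.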

B. The Effects of Expected Amounts of Various Buy and Sell Orders on Market Beta and

Idiosyncratic Risk

To explore the effects of information-based trading, we run the following two regressions

for each sample stock respectively,

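[Equations (18) and (19): regressions of the daily beta and of the daily idiosyncratic risk, respectively, on the expected numbers of the three types of trades (private information, disputable public information, and liquidity).] (18), (19)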

Table VI displays the averages of regression coefficients and autocorrelation-corrected t-

statistics of individual stock regressions, and the percentage of the sample stocks with regression

coefficient being significantly positive or negative for each explanatory variable. Panel A adopts

intraday returns sampled at the 15-minute frequency to estimate the daily beta and idiosyncratic

risk. There are several stylized facts. First, consistent with the results for total risk, the marginal and total effects of SOS trading on both systematic risk and idiosyncratic risk are much larger than those of the other two types of trading. The coefficient of the expected number of SOS trades is not only larger on average but is also significantly positive far more often and is associated with a much larger average effect size. Take the S&P 500 sample for example. The average effect size of the expected number of SOS trades is 5.18 and 6.57 times those of the other two regressors. Second, the marginal effect

of ; ; on systematic and idiosyncratic risks decreases in firm size although the coefficient

of beta regression is not significant on average for medium and large subsamples. Third, all


three types of trades have much greater marginal and total effects on idiosyncratic risk than on

systematic risk for all samples and subsamples. The t-statistics of these three regressors are small and insignificant for most samples and subsamples in the beta regressions, but they are significant on average in the regressions of idiosyncratic risk. Fourth, corresponding to the previous point, the average adjusted R2 of the idiosyncratic risk regressions, which is no smaller than 40%, is also larger than its counterpart in the beta regressions, which is no larger than 5%. Fifth, the average adjusted R2 of the idiosyncratic risk regressions increases with firm size, which means that our model has more explanatory power for the idiosyncratic risk of large stocks.

INSERT TABLE VI HERE

Overall, our results demonstrate that information-based trading is more closely associated

with idiosyncratic risk rather than systematic risk. When there are more trading order flows

induced by investors’ dispersed beliefs on public information, the stock’s idiosyncratic risk

increases significantly. Traders willingly bear idiosyncratic risk when they perceive the asset is

mispriced. Therefore, dispersed beliefs lead to excessive idiosyncratic risk. For the link of

disputable public information and beta, our finding is related to prior empirical studies, which

document significant evidence of variation in beta typically associated with stock fundamentals.

Patton and Verardo (2012) propose a model where investors can extract public information of the

aggregate economy and find that daily realized beta increases on earnings announcement days

but declines on post-announcement days. Since an earnings announcement can trigger more SOS trading activity because the announcement may be interpreted differently by investors (see Yin and Zhao (2014)), this is likely to explain the link between beta and the expected number of SOS trades, averaged over a two-year period. This is more relevant to small firms, though.


In Panel B of Table VI, we use 10-minute intraday returns instead to estimate market beta

and idiosyncratic risk through (17). The results shown by Panel A are qualitatively unchanged,

although the overall explanatory power of (18) or (19) increases when the sampling frequency of

intraday return becomes higher. Once again, the better performance of the higher-frequency

analysis is consistent with the setting of our theoretical model, which is based on the trade-by-

trade impact on price.
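To illustrate how a daily beta and a daily idiosyncratic risk measure could be extracted from intraday returns, the following is a minimal sketch. It assumes that the intraday market model (17) takes the standard form of an OLS regression of the stock's intraday return on the market's intraday return within one trading day, and that idiosyncratic risk is measured by the residual variance; the function name and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def intraday_market_model(stock_ret, market_ret):
    """Return (beta, idiosyncratic variance) for one trading day.

    Assumed form: stock_ret = alpha + beta * market_ret + error,
    estimated by OLS on that day's intraday returns.
    """
    x = np.asarray(market_ret, dtype=float)
    y = np.asarray(stock_ret, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Residual variance with a degrees-of-freedom correction (two parameters).
    idio_var = resid @ resid / (len(y) - 2)
    return coef[1], idio_var

# Hypothetical data: 39 ten-minute returns in a 6.5-hour trading session.
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.001, size=39)
stock = 1.2 * market + rng.normal(0.0, 0.002, size=39)
beta, idio_var = intraday_market_model(stock, market)
```

Repeating this for every trading day would yield the daily time series of beta and idiosyncratic risk that serve as dependent variables; a higher sampling frequency simply gives more observations per day.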

C. The Effects of Daily PIN and PSOS on Market Beta and Idiosyncratic Risk

Although PIN and PSOS are considered proxies for risk factors in the literature, no study directly examines their relationships with systematic or idiosyncratic risk. To explore these relations, we run the following two regression models for each sample stock:

(20)

(21)

Panels C and D of Table VI report the results of the individual regressions when 15-minute and 10-minute intraday returns, respectively, are used in estimating the intraday market model (17). Regarding systematic risk, the overall explanatory power is low, with an average adjusted R2 below 7%, and the measures of information-based trading may contribute positively or negatively to daily beta. With regard to idiosyncratic risk, the average adjusted R2 is more than 37% for all samples and subsamples considered. For all sample stocks, daily PSOS is positively and significantly associated with idiosyncratic risk. This implies that trading due to disputable public information generates substantial idiosyncratic risk. Meanwhile, for no more than 60% of the sample stocks, daily PIN is positively and significantly associated with idiosyncratic risk. Both the marginal and total effects of PSOS on systematic risk or non-systematic risk are much larger than those of PIN. This is consistent with the dominating role played by the expected number of SOS trades, as we have seen previously in Panels A and B. Moreover, PIN contributes more significantly to non-systematic risk than to systematic risk, as evidenced by the larger percentage of sample stocks whose PIN coefficient is significant in explaining idiosyncratic risk than in explaining beta. This is consistent with the theoretical prediction of Hughes, Liu and Liu (2007) that private signals at the firm level are generally understood to be far more informative of idiosyncratic shocks than of systematic factors. Our study empirically examines the dynamic relationship between systematic or non-systematic risk and information risk and obtains qualitatively similar results. This implies that information risk may be subsumed by existing risk factors, especially idiosyncratic risk.

In the last row of Panels C and D, we perform a difference-in-means test of the equality of the coefficients of daily PIN and daily PSOS for each sample and subsample considered. For the null hypothesis of equal coefficients in (20), which concerns the marginal effects on daily systematic risk, the p-values are all larger than 0.1, so the equality of the marginal effects of PIN and PSOS on daily beta cannot be rejected at the 10% level. As shown by the individual stock regressions in Panels C and D, the marginal effects of both PIN and PSOS on daily beta are relatively weak and close to zero. Thus, they are not significantly different. For the null hypothesis of equal coefficients in (21), the p-values are all smaller than 0.001, confirming that PIN and PSOS have significantly different effects on daily idiosyncratic risk.
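The text does not spell out the exact form of the difference-in-means test. As one possible reading, the sketch below compares the cross-sectional means of two sets of per-stock regression coefficients with a Welch-type t statistic and a normal approximation for the two-sided p-value. The function name and the choice of test are assumptions made for illustration only.

```python
import math
from statistics import mean, variance

def diff_in_means_test(coef_a, coef_b):
    """Welch t statistic and two-sided p-value (normal approximation).

    coef_a, coef_b: per-stock coefficient estimates, e.g. the coefficients
    of PIN and of PSOS collected across all stocks in a sample.
    """
    n_a, n_b = len(coef_a), len(coef_b)
    # Standard error of the difference in means with unequal variances.
    se = math.sqrt(variance(coef_a) / n_a + variance(coef_b) / n_b)
    t_stat = (mean(coef_a) - mean(coef_b)) / se
    # erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)) for a standard normal.
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value
```

With several hundred stocks per sample the normal approximation should be close to the exact t distribution; for the small subsamples of 40 stocks a t distribution would be somewhat more conservative.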


VI. Further Tests and Robustness Checking

To examine the validity of the theoretical model's predictions and the robustness of our empirical findings, we have performed various further tests. This section briefly reports the major tests and their conclusions.

A. The Significance of the Effects of Expected Amounts of Trades on Daily Return

Although whether PIN or PSOS is priced as a risk factor in long-term cross-sectional analysis is heavily debated (see, for example, Duarte and Young (2009); Easley, Hvidkjaer and O’Hara (2010)), no study examines how daily measures of PIN and PSOS explain contemporaneous returns. According to Proposition 1, stock returns are determined by the expected numbers of net buy orders (order imbalances) arising from different trading motives, while return volatility is determined by the expected numbers of trades from different investors. These predictions imply that the expected numbers of trades generated by private information signals and disputable public information should have weak and negligible impacts on daily return. To examine this implication, we replace the three expected numbers of net buys in (8) with the corresponding expected numbers of trades, replace the scaled measures PNbIN and PNbSOS in (12) with PIN and PSOS, and rerun both regressions for each sample stock.

The average adjusted R2 falls below 1% for all the samples and subsamples considered, and the coefficients of PIN and PSOS are insignificant in all individual regressions. This shows the importance of distinguishing information-based buys from information-based sells in analyzing their associations with contemporaneous returns. It is PNbIN and PNbSOS, not PIN and PSOS, that determine daily return. These findings, together with the results reported in Table IV, demonstrate not only the validity of the theoretical predictions but also the effectiveness of the dynamic measures of information-based trading derived from the HMM approach.
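The per-stock time-series regressions discussed here can be sketched as follows. This is a minimal illustration, assuming OLS with an intercept and autocorrelation-robust t-statistics of the Newey-West type with a Bartlett kernel and a fixed, user-chosen lag length (the automatic lag selection of Newey and West (1994) is not implemented). The function name and the simulated regressors are hypothetical.

```python
import numpy as np

def ols_newey_west(y, X, lags):
    """OLS coefficients, Newey-West t-statistics and adjusted R2.

    y: daily returns of one stock, shape (n,).
    X: regressors without intercept, shape (n, k), e.g. two scaled measures.
    lags: fixed truncation lag of the Bartlett kernel (an assumption here).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    u = X * resid[:, None]              # x_t * e_t for every day t
    S = u.T @ u                         # lag-0 term
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1.0)    # Bartlett weight
        G = u[lag:].T @ u[:-lag]
        S += w * (G + G.T)
    cov = xtx_inv @ S @ xtx_inv         # sandwich covariance estimator
    t_stats = beta / np.sqrt(np.diag(cov))
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    adj_r2 = 1.0 - (ss_res / (n - k)) / (ss_tot / (n - 1))
    return beta, t_stats, adj_r2

# Hypothetical example: 500 trading days, two regressors.
rng = np.random.default_rng(42)
regressors = rng.normal(size=(500, 2))
returns = 0.001 + 0.02 * regressors[:, 0] + rng.normal(0.0, 0.02, size=500)
beta, t_stats, adj_r2 = ols_newey_west(returns, regressors, lags=5)
```

Running the same function once with net-buy measures and once with total-trade measures as regressors, and comparing the adjusted R2 and t-statistics, mirrors the comparison described in this subsection.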

B. Measures of Stock Return

In the empirical analysis, we adopt the simple rate of return, while the theoretical model is developed in terms of total return for tractability. To ensure that our findings are not sensitive to the particular choice of return measure, we repeat the analysis of Table IV for the first sample of 120 stocks using total returns and log returns as alternatives to simple returns. The results of Table IV continue to hold qualitatively.

C. Higher Order Terms

The theoretical model specifies the effects of higher-order terms of information-based trading on the variance of stock return. The regression results reported thus far are from linear regressions, which ignore the second-order terms of information-based trading. We have also run regressions of total risk that include these second-order terms. The modified models perform better in the sense that the coefficients of the linear regressors are more significant with the correct sign and/or more stocks have significant coefficients of the linear regressors with the correct sign. The average adjusted R2 also improves marginally.


VII. Concluding Remarks

This paper aims to gain a better understanding of the relation between trading activities and the returns and risks of individual assets. We develop a theoretical model that sheds light on the distinct effects of trading activities on stock return and risk in a dynamic fashion. The expected net buys arising from different trading motives are expected to be positively related to daily return, while the corresponding expected numbers of buy and sell orders are related to return risk. Motivated by the theoretical predictions of the model, we run daily time-series regressions for individual stocks. Using a sample of 120 NYSE stocks and another sample of 451 stocks constituting the S&P 500 Index, we show that expected net buys due to private information contribute significantly to daily returns, with marginal and total effects larger than those of expected net buys due to disputable public information. On the other hand, the expected number of orders due to disputable public information plays a dominant role in determining return risk.

We further examine the effects of daily information-based trading on systematic and idiosyncratic risk. We find a strong relation to idiosyncratic risk, and the overall explanatory power of the regression models of idiosyncratic risk is substantial. However, the association of information-based trading with systematic risk is quite weak. For all sample stocks, the effects of SOS trading on both types of risk dominate those of private information trading. This suggests, on the one hand, that there is some risk factor related to information-based trading that is not captured by market beta but may be subsumed by idiosyncratic risk. On the other hand, it is SOS trading rather than private information trading that determines this risk factor. These claims are in line with Duarte and Young’s (2009) argument that PSOS, but not PIN, is priced as a risk factor. We leave the investigation of why the effects on systematic risk are insignificant for future research.


Appendix A: Proof of Proposition 1

Recalling Ε ,; ;

; ;, Var , 1 ; ;

; ;, and Ε Var ;

; , we have

E , E E , Ε Ε , Ε ; ;

; ;

; ; .

Var , Ε Var , Var E ,

Ε Var , Var E ,

Ε Var , Var E , Ε ; ; .

Therefore, after writing ∑ , as we have

E Ε , , ,

; ; E

Var Var , , ,

Var , Var , , Var

2Cov , , , ,

; ; Var , Var 2 Var ,

; ; 2 1 ; ;

; ;Var .


Appendix B: Estimation of the HMM and ; , ; , ; and ;

B.1. Estimation of the HMM by the Expectation and Maximization Algorithm

We apply the Baum-Welch algorithm (see Baum, Petrie, Soules, and Weiss (1970)) in the

estimation of the HMM. In particular, we regard the hidden states as missing data while the

Complete-Data Log-Likelihood (CDLL) is the log-likelihood of the parameter set based on

observed time series of buy and sell order flows and the unobservable time series of states, i.e.,

log Pr , | , where is a time series realization of state variable Ht

with t ranging from 1 to T. For forward probabilities η whose 1 -th element is

defined as η , ; Pr , ; , ; , we have η and η η Γ

for 2, 3, … , . To apply the Expectation and Maximization (EM) algorithm, we also define

ζ Γ ζ as the vector of backward probabilities for 1, 2, … , 1 with ζ ′,

where the 1 -th element of is

ζ , ; Pr , , … , | ; , ; .

Further, let , ; and , ; , ; be zero-one variables that

, ; 1ifandonlyif ; , ; ,

, ; , ; 1ifandonlyif ; , ; , ; , ; .

With this notation, the CDLL of the HMM is given by

log Pr ,

, ; log , ; , ; , ; log , ; , , ; log ,

,,

.

We use the EM algorithm to estimate the HMM as follows:

E Step: Compute the conditional expectations of the missing data, given the observations

and the current estimate of . Specifically, conditional expectations of , ; and


, ; , ; are estimated by:

, ; Pr ; , ; ,η , ; ζ . ;

|,

, ; , ; Pr ; , ; , ; , ; | ,η , ; , ; , , ζ , ;

|.

M Step: Maximize the CDLL, where the missing data are replaced by their conditional

expectations, to determine the estimate of . Thus, replace all , ; and , ; , ; in CDLL

by their conditional means , ; and , ; , ; , and maximize it with respect to , Γ, and

λ ; and λ ; . The solution to the maximization problem consists of

, ; , ; , , ; ,∑ , ; , ;

∑ ∑ ∑ , ; , ;,

,

∑ ∑ , ;∑ ∑ , ;

and ,∑ ∑ , ;∑ ∑ , ;

.

The above E and M steps are repeated until a convergence criterion is satisfied, for instance until the improvement in the CDLL is less than 10^-6. This EM algorithm

provides us with three sets of parameter estimates: , Γ, and λ ; and λ ; . Once and Γ are

estimated, we have Γ .

Applying Bayes’ rule, the posterior distribution of states in (14) can be calculated by

Pr ; , ;, ; , ; η , ; ζ , ;

| , ; .
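The E and M steps described above can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: the hidden states are flattened into a single index 0, ..., m-1; daily buys and sells are independent Poisson counts given the state; `alpha` and `beta` play the roles of the (scaled) forward and backward probabilities η and ζ; and convergence is checked on the observed-data log-likelihood rather than on the CDLL. All function and variable names are hypothetical.

```python
import numpy as np
from math import lgamma

def poisson_logpmf(counts, rates):
    """Log Poisson pmf for counts (T,) and state rates (m,); result is (T, m)."""
    counts = np.asarray(counts, dtype=float)
    logfact = np.array([lgamma(c + 1.0) for c in counts])
    return counts[:, None] * np.log(rates)[None, :] - rates[None, :] - logfact[:, None]

def baum_welch(buys, sells, lam_b, lam_s, Gamma, delta, n_iter=200, tol=1e-6):
    """EM for an m-state HMM with independent Poisson buys and sells per state."""
    buys, sells = np.asarray(buys), np.asarray(sells)
    lam_b, lam_s = np.array(lam_b, dtype=float), np.array(lam_s, dtype=float)
    Gamma, delta = np.array(Gamma, dtype=float), np.array(delta, dtype=float)
    T, m = len(buys), len(delta)
    prev_ll = -np.inf
    for _ in range(n_iter):
        logp = poisson_logpmf(buys, lam_b) + poisson_logpmf(sells, lam_s)
        shift = logp.max(axis=1)
        P = np.exp(logp - shift[:, None])     # state-dependent probabilities, rescaled per day
        # E step: scaled forward (alpha) and backward (beta) recursions.
        alpha, beta, c = np.zeros((T, m)), np.ones((T, m)), np.zeros(T)
        a = delta * P[0]
        c[0] = a.sum()
        alpha[0] = a / c[0]
        for t in range(1, T):
            a = (alpha[t - 1] @ Gamma) * P[t]
            c[t] = a.sum()
            alpha[t] = a / c[t]
        for t in range(T - 2, -1, -1):
            beta[t] = Gamma @ (P[t + 1] * beta[t + 1]) / c[t + 1]
        ll = np.log(c).sum() + shift.sum()    # log-likelihood at the current parameters
        u = alpha * beta                      # Pr(state at t | all data); rows sum to one
        # v[t, j, k] = Pr(state j at t and state k at t+1 | all data)
        v = (alpha[:-1, :, None] * Gamma[None, :, :]
             * (P[1:] * beta[1:])[:, None, :] / c[1:, None, None])
        # M step: closed-form updates.
        delta = u[0]
        Gamma = v.sum(axis=0)
        Gamma /= Gamma.sum(axis=1, keepdims=True)
        lam_b = u.T @ buys / u.sum(axis=0)
        lam_s = u.T @ sells / u.sum(axis=0)
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return lam_b, lam_s, Gamma, delta, u, ll
```

The returned matrix `u` contains the posterior state probabilities for each day, which is the quantity obtained from Bayes' rule in the last display above. The rescaling by `c` and `shift` is only a numerical device to avoid underflow over long samples.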

B.2. Estimation of ; , ; , ; and ;

Based on clustering analysis, Yin and Zhao (2014) introduce a two-step approach to

decompose the state-dependent order arrival rates λ ; and λ ; into the three components in

Equation (15).


Step One: Partitioning hidden states depending on whether they contain private information or not.

We perform K-means clustering on the observed trading imbalances over the whole estimation window, i.e., | | for 1, 2, … , , and determine the number of clusters by the jump method of Sugar and James (2003). If there is only one cluster,16 we infer that

observed trading imbalances are similar and there is no significant evidence for the existence of

private information during the period. Therefore, we have ; ; 0 for all hidden states.

The rationale behind such a claim is the common “wisdom” that trading due to liquidity needs or

disputable public information is symmetric and only generates small trade imbalance, while

privately informed trading is often associated with substantial trade imbalance. If there is

privately informed trading, the daily trade imbalances cannot be consistent over time.
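A minimal sketch of this clustering step is given below. It assumes one-dimensional data (daily trade imbalances), a plain Lloyd-type K-means with quantile-based starting centers, and the jump method in its basic form: the distortion d_K is the average squared distance to the nearest center, the transformation power is Y = p/2 = 1/2 for p = 1 dimension, d_0^(-Y) is set to zero, and the chosen K maximizes the jump d_K^(-Y) - d_(K-1)^(-Y). The function names, the initialization, and the upper bound `max_k` are illustrative assumptions.

```python
import numpy as np

def kmeans_1d(x, k, n_iter=100):
    """Lloyd's algorithm on 1-D data with quantile-based initial centers."""
    x = np.asarray(x, dtype=float)
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Keep the old center if a cluster happens to be empty.
        new = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    distortion = np.mean((x - centers[labels]) ** 2)
    return centers, labels, distortion

def jump_method(x, max_k=6, power=0.5):
    """Number of clusters with the largest jump in transformed distortion."""
    transformed = [0.0]                      # convention: d_0^(-Y) = 0
    for k in range(1, max_k + 1):
        _, _, d = kmeans_1d(x, k)
        transformed.append(d ** (-power))    # assumes a strictly positive distortion
    jumps = np.diff(transformed)
    return int(np.argmax(jumps)) + 1
```

If `jump_method` returns 1, the imbalances form a single cluster and, following the argument above, no state would be treated as containing private information. Otherwise the centers from `kmeans_1d` with the selected K can be used to assign each hidden state to its nearest cluster.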

If the clustering analysis indicates that there are multiple clusters with different centers of

trade imbalances, we treat λ ; λ ; as an out-of-sample observation and assign it to the

cluster whose center is the closest to it. If λ ; λ ; belongs to the cluster with the smallest

center, state , is identified as the one involving no privately informed trading because the

small trade imbalances can only be caused by liquidity trading and/or SOS trading. We have

; ; 0 and use to denote the set consisting of states in this cluster. If λ ; λ ;

belongs to one of the other clusters, state , contains private information, and we have

; λ ; λ ; λ ; # λ ; # and ; 0 if λ ; λ ; ,

; 0 and ; λ ; λ ; λ ; # λ ; # if λ ; λ ; ,

16 In our simulation and sample data analysis, the case of one cluster has never occurred.  


where #, # is a matching state of state , , which is a state in with balanced trade the

closest to the balanced trade of state , .17 The matching state is used to proxy the small trade

imbalance caused by liquidity needs or disputable public information in state , .

Step Two: Classifying hidden states into two sets depending on whether they contain disputable public information or not.

Clustering analysis together with the jump method is now applied to the observed

balanced trades over the whole estimation window, i.e., | | for 1, 2, … , . If

there is only one cluster,18 it implies that investors have very similar interpretations for public

information in the market and all balanced orders are generated by liquidity traders, because

disagreement on a public information disclosure should induce a considerable increase in both

buy and sell orders. Therefore, we have ; ; 0 for all , .

If more than one cluster is detected, we use 2-means clustering on observed balanced

trades to form two clusters. For state , , we treat the expected number of balanced trades

λ ; λ ; λ ; λ ; as an out-of-sample observation, and assign it to one of the two clusters

with the closer center to it. We use to denote the set consisting of the states in the cluster with

a smaller center and these states are considered not associated with disputable public information.

The remaining states constitute set , which involves trading due to disputable public information because their balanced trades are much larger than those in set . If state , belongs to set , its arrival rates of public information-driven buys and sells are zero, and therefore

17 Mathematically, #, # ∗, ∗ ∈ λ ; λ ; λ ; λ ; λ ; ∗ λ ; ∗ .

18 This can be the case when we use the EHO or EEOW model to generate simulation data although it is hard to

imagine there are no diverse interpretations of all public information disclosed over a considerably long period. In

the application of real data, the trading of all sample stocks implies more than one cluster.


; ; 0. If state , belongs to set , we can determine its expected buys and sells

triggered by public information as follows

; λ ; ; max#, # ∈ ∩

λ ; #

; λ ; ; max#, # ∈ ∩

λ ; # .

where ; and ; are obtained in the first step. The last terms in the above equations proxy ; ,

and ; in (1), respectively. Set includes both liquidity trading and privately informed trading

while set includes both liquidity trading and public-information trading. Their intersection, i.e.,

set ∩ , includes states that involve only liquidity trading. We use the largest arrival rates of

buy and sell orders in ∩ to subtract liquidity order arrival rates from the aggregate buy and

sell order arrival rates to ensure that the arrival rates of buy and sell orders driven by disputable

public information are not exaggerated.


REFERENCES

Admati, Anat R., and Paul Pfleiderer, 1988. A theory of intraday patterns: volume and price variability, Review of Financial Studies 1, 3–40.

Alexander, Gordon, and Mark A. Peterson, 2007. An analysis of trade-size clustering and its relation to stealth trading, Journal of Financial Economics 84, 435–471.

Andersen, Torben G., and Tim Bollerslev, 1998. Answering the skeptics: Yes, standard volatility models do provide accurate forecasts, International Economic Review 39, 885–905.

Andersen, Torben G., Tim Bollerslev, Francis X. Diebold, and Paul Labys, 2001. The distribution of realized exchange rate volatility, Journal of the American Statistical Association 96, 42–55.

Banerjee, Snehal, and Ilan Kremer, 2010. Disagreement and learning: dynamic patterns of trade, Journal of Finance 65, 1269–1302.

Barclay, Michael J., and Jerold B. Warner, 1993. Stealth trading and volatility: Which trades move prices? Journal of Financial Economics 34, 281–306.

Baum, Leonard E., Ted Petrie, George Soules, and Norman Weiss, 1970. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains, Annals of Mathematical Statistics 41, 164–171.

Brennan, Michael J., Tarun Chordia, Avanidhar Subrahmanyam, and Qing Tong, 2012. Sell-order liquidity and the cross-section of expected stock returns, Journal of Financial Economics 105, 523–541.

Chakravarty, Sugato, 2001. Stealth trading: Which trader’s trades move prices? Journal of Financial Economics 61, 289–307.

Chordia, Tarun, and Avanidhar Subrahmanyam, 2004. Order imbalance and individual stock return: Theory and evidence, Journal of Financial Economics 72, 485–518.

Duarte, Jefferson, and Lance Young, 2009, Why is PIN priced? Journal of Financial Economics 91, 119–138.

Easley, David, Robert F. Engle, Maureen O’Hara, and Liuren Wu, 2008. Time-varying arrival rates of informed and uninformed trades, Journal of Financial Econometrics 6, 171–207.

Easley, David, Soeren Hvidkjaer, and Maureen O’Hara, 2002, Is information risk a determinant of asset returns? Journal of Finance 57, 2185–2221.

Easley, David, Soeren Hvidkjaer, and Maureen O’Hara, 2010. Factoring information into returns, Journal of Financial and Quantitative Analysis 45, 293–309.

Glosten, Lawrence R., and Lawrence E. Harris, 1988. Estimating the components of the bid-ask spread, Journal of Financial Economics 21, 123–142.

Glosten, Lawrence R., and Paul R. Milgrom, 1985. Bid, ask and transaction prices in a specialist market with heterogeneously informed traders, Journal of Financial Economics 14, 71–100.

Grossman, Sanford, 1976. On the efficiency of competitive stock markets where traders have diverse information, Journal of Finance 31, 573–85.

Hasbrouck, Joel, 1995. One security, many markets: determining the contributions to price discovery, Journal of Finance 50, 1175–1199.

Hughes, John S., Jing Liu, and Jun Liu, 2007. Information asymmetry, diversification, and cost of capital, The Accounting Review 82, 3, 705–729.

Kyle, Albert S., 1985. Continuous auctions and insider trading, Econometrica 53, 1315–1326.

Lee, Charles M.C., Belinda Mucklow, and Mark J. Ready, 1993. Spreads, depths, and the impact of earnings information: An intraday analysis, Review of Financial Studies 6, 345–374.


Llorente, Guillermo, Roni Michaely, Gideon Saar, and Jiang Wang, 2002. Dynamic volume-return relation of individual stocks, Review of Financial Studies 15, 1005–1047.

Mohanram, Partha, and Shiva Rajgopal, 2009. Is PIN priced risk? Journal of Accounting and Economics 47, 226–243.

Newey, Whitney K., and Kenneth D. West, 1994. Automatic lag selection in covariance matrix estimation, Review of Economic Studies 61, 631–653.

Patton, Andrew J., and Michela Verardo, 2012. Does beta move with news? Firm-specific information flows and learning about profitability, Review of Financial Studies 25, 2789–2839.


Rees, Lynn, and Wayne Thomas, 2010. The stock price effects of changes in dispersion of investor beliefs during earnings announcements, Review of Accounting Studies 15, 1–31.

Roşu, Ioanid, 2009, A dynamic model of the limit order book, Review of Financial Studies, 22, 4601–4641.

Shalen, Catherine T., 1993. Volume, volatility and the dispersion of beliefs, Review of Financial Studies 6, 405–434.

Sugar, Catherine A., and Gareth M. James, 2003. Finding the number of clusters in a dataset, Journal of the American Statistical Association 98, 750–763.

Todorov, Viktor, and Tim Bollerslev, 2010. Jumps and betas: A new framework for disentangling and estimating systematic risks, Journal of Econometrics 157, 220–35.

Yin, Xiangkang and Jing Zhao, 2014, A hidden Markov model approach to Information-based trading: Theory and applications, http://ssrn.com/abstract=2412321.

Zucchini, Walter, and Iain L. MacDonald, 2009. Hidden Markov models for time series: An introduction using R (Chapman & Hall, CRC Press).


Table I Ticker symbols of the 120 sample stocks and summary statistics of the sample characteristics

The first sample includes 120 stocks that traded on the NYSE in 2010 and 2011. They are randomly selected from S&P 500 Index, S&P MidCap 400 Index, and S&P SmallCap 600 Index with 40 stocks from each index. Panel A lists the ticker symbols of the sample stocks. Panel B presents descriptive statistics of stock characteristics for the entire sample of 120 stocks and three size-based groups. For each stock, AvgCap is the average daily market capitalization over the sample period, AvgTurn the average daily turnover (the number of shares traded divided by the number of shares outstanding), AvgNT the average daily total number of trades, AvgAIMB the average daily absolute trade imbalance measured by the difference between numbers of buyer- and seller-initiated orders, and AvgEfspd the average daily percentage effective bid-ask spread. The second sample consists of stocks constituting S&P 500 Index in 2010 and 2011, of which the descriptive statistics of characteristics are reported in Panel C.

Panel A: Ticker symbols of the 120 sample stocks

S&P 500 constituents    S&P MidCap 400 constituents    S&P SmallCap 600 constituents
ADM  CSC  JBL  RAI      AAP  HNI  CVD  SM              ABM  FOR  CCC  ORB
AVP  DE   LUK  ROK      ADS  IEX  DCI  SON             AHS  FUL  CRY  POL
BAX  ECL  MWV  SVU      AJG  JLL  DLX  SXT             AIT  HHS  CUZ  PPS
BA   EL   NEM  SWK      AYI  KEX  DRC  TKR             ALE  IVC  DAR  RT
BIG  EQT  NKE  TSO      BCO  LNT  ESI  TRN             AXE  KRG  DIN  SCL
C    ETR  NOC  UTX      BGC  MAN  FLO  URS             AZZ  KWR  DRH  SNX
CBS  FHN  NOV  VFC      BKH  MLM  FRT  VCI             BDC  LDL  DW   SUP
CEG  HAR  NU   WEC      CBT  NYT  GES  WRB             BGS  LTC  EIG  UBA
CF   HOT  PBI  WMT      CRL  PNM  GGG  WSO             BMI  MED  EXP  UNF
CFN  IVZ  PCG  XEL      CSL  ROL  GHL  XEC             CBR  ONB  FIX  ZEP

Panel B: Characteristics of the 120 sample stocks and its three size-based subsamples

                 AvgCap (million $)   AvgTurn (%)   AvgNT      AvgAIMB   AvgEfspd (%)
Entire sample
  Mean           10582.85             0.312         2008.61    169.29    0.041
  Median         2558.76              0.284         1185.43    117.16    0.031
  Std. Dev.      34317.1              0.147         2590.98    271.24    0.033
  Minimum        148.43               0.086         101.85     0.0029    0.008
  Maximum        295774.9             0.958         19689.07   2393.64   0.191
Small size group: S&P SmallCap 600 constituents
  Mean           793.60               0.268         548.65     67.79     0.073
  Median         713.54               0.245         468.52     63.83     0.061
  Std. Dev.      422.18               0.122         349.93     36.67     0.039
Medium size group: S&P MidCap 400 constituents
  Mean           2811.65              0.333         1189.93    115.63    0.030
  Median         2603.26              0.339         1101.29    107.20    0.031
  Std. Dev.      1242.59              0.139         487.30     38.62     0.009
Large size group: S&P 500 constituents
  Mean           28143.3              0.335         4287.22    324.46    0.019
  Median         10119.11             0.294         3025.99    223.44    0.016
  Std. Dev.      55813.6              0.17          3453.29    428.29    0.013

Panel C: Characteristics of the 451 sample stocks constituting S&P 500

                 AvgCap (million $)   AvgTurn (%)   AvgNT      AvgAIMB   AvgEfspd (%)
  Mean           24042                0.347         5048       324.16    0.020
  Median         11174.82             0.31          3690.07    264.69    0.016
  Std. Dev.      40391.58             0.17          4436.59    236.63    0.027


Table II Descriptive statistics and nonparametric tests of measures of information-based trading

This table presents descriptive statistics and nonparametric tests of the four daily scaled measures of information-based trading, i.e., PIN, PSOS, PNbIN, and PNbSOS. Descriptive statistics focus on the cross-sectional distributions of average daily measures over selected days, where time-series average is denoted by a bar over a variable. The nonparametric tests examine whether these daily measures on certain days statistically differ from those on other days. The percentage of stocks with a p-value less than 5% in a difference-in-means test is reported. For each sample stock, trading days are sorted into two groups by the sign of its daily return or into quintiles by its realized variance . Panel A is of the first sample of 120 stocks while Panel B is of the three size-based subsamples of the 120 stocks and the second sample of 451 stocks constituting S&P500 Index.

PanelA:Theentiresampleof120stocks Mean 25% Median 75% Mean 25% Median 75%

Alltradingdays 0.135 0.100 0.131 0.155 0.358 0.300 0.337 0.391Dayswith 0 0.136 0.100 0.129 0.153 0.350 0.292 0.321 0.390Dayswith 0 0.135 0.096 0.131 0.157 0.366 0.307 0.347 0.401Dayswith ∈ 0.115 0.081 0.112 0.144 0.446 0.394 0.427 0.489Dayswith ∈ 0.151 0.113 0.143 0.172 0.273 0.199 0.258 0.321 Mean 25% Median 75% Mean 25% Median 75%

Alltradingdays ‐0.004 ‐0.012 0.001 0.009 0.001 ‐0.012 0.001 0.019Dayswith 0 0.041 0.022 0.040 0.061 0.002 ‐0.011 0.002 0.021Dayswith 0 ‐0.051 ‐0.069 ‐0.044 ‐0.022 ‐0.001 ‐0.013 0.000 0.018Dayswith ∈ ‐0.009 ‐0.022 ‐0.007 0.004 0.001 ‐0.011 0.002 0.017Dayswith ∈ 0.006 ‐0.013 0.011 0.032 0.001 ‐0.013 0.001 0.022Hypothesistest , , , ,

PercentageofstockswithH0thatthemeasuresondayswith0and 0areequalrejected 13.33% 17.50% 94.17% 49.17%

PercentageofstockswithH0thatthemeasuresondayswithinthesmallestandlargestquintilesof areequal rejected. 66.67% 96.67% 9.17% 18.33%


PanelB:Thethreesize‐basedsubsamplesofthe120samplestocksandthesampleofS&P500constituents Cross‐sectionalmeanof Cross‐sectionalmeanof Smallsize Mediumsize Largesize S& 500 Smallsize Medium

size Largesize S& 500

Alltradingdays 0.170 0.131 0.105 0.092 0.384 0.355 0.335 0.334Dayswith 0 0.168 0.131 0.109 0.094 0.380 0.346 0.324 0.324Dayswith 0 0.173 0.131 0.101 0.091 0.389 0.365 0.345 0.345Dayswith ∈ 0.149 0.114 0.082 0.075 0.457 0.441 0.439 0.444Dayswith ∈ 0.187 0.145 0.122 0.107 0.313 0.277 0.228 0.220 Cross‐sectionalmeanof Cross‐sectionalmeanof Smallsize Mediumsize Largesize S& 500 Smallsize Medium

size Largesize S& 500

Alltradingdays ‐0.009 0.000 ‐0.004 ‐0.003 0.003 0.000 0.002 0.000Dayswith 0 0.057 0.045 0.020 0.024 0.004 0.001 0.002 0.001Dayswith 0 ‐0.077 ‐0.048 ‐0.029 ‐0.030 0.003 ‐0.001 ‐0.001 0.000Dayswith ∈ ‐0.009 ‐0.010 ‐0.010 ‐0.011 0.003 0.000 0.001 0.000Dayswith ∈ ‐0.001 0.012 0.008 0.008 0.003 ‐0.001 0.001 0.000 , , Smallsize Mediumsize Largesize S& 500 Smallsize Medium

size Largesize S& 500

PercentageofstockswithH0thatthemeasuresondayswith 0 and 0 areequalrejected 15.00% 2.50% 22.50% 16.63% 7.50% 25.00% 20.00% 20.40%

PercentageofstockswithH0thatthemeasuresondayswithinthesmallestandlargestquintilesof areequalrejected. 57.50% 60.00% 82.50% 83.15% 95.00% 95.00% 100% 99.78% , , Smallsize Mediumsize Largesize S& 500 Smallsize Medium

size Largesize S& 500

PercentageofstockswithH0thatthemeasuresondayswith 0 and 0 areequalrejected 100% 95.00% 87.50% 93.13% 57.50% 40.00% 50.00% 53.88%

PercentageofstockswithH0thatthemeasuresondayswithinthesmallestandlargestquintilesof areequalrejected. 10.00% 5.00% 12.50% 19.51% 15.00% 17.50% 22.50% 12.86%


Table III Daily return and total risk conditional on measures of information-based trading

For each sample stock, we sort its trading days by one of its daily measures of information-based trading, i.e., PIN, PSOS, PNbIN, or PNbSOS, into quintiles and then calculate the average daily return and realized variance for each quintile. Panel A presents the cross-sectional sample mean of average daily returns and Panel B presents results of realized variance. The difference between the quintiles is also reported.

Panel A: Daily return

                          Quintile                                            Comparison
                          1 (Smallest)  2        3        4        5 (Largest)  "5-1"    "5-3"    "3-1"
Sorted by PIN
  Entire 120 stocks       0.001         0.001    0.000    0.001    0.001        -0.001   0.001    -0.001
  Small size group        0.001         0.001    0.000    0.001    0.000        -0.001   -0.001   0.000
  Medium size group       0.001         0.001    0.000    0.000    0.001        0.000    0.001    -0.001
  Large size group        0.003         0.000    0.000    0.000    0.002        -0.001   0.002    -0.003
  S&P 500 constituents    0.001         0.000    0.000    0.000    0.001        0.000    0.001    -0.001
Sorted by PSOS
  Entire 120 stocks       0.001         0.001    0.001    0.001    -0.001       -0.003   -0.002   0.000
  Small size group        0.001         0.001    0.001    0.001    -0.001       -0.002   -0.002   0.000
  Medium size group       0.002         0.001    0.001    0.001    -0.002       -0.004   -0.003   -0.001
  Large size group        0.002         0.001    0.001    0.001    -0.001       -0.003   -0.002   -0.001
  S&P 500 constituents    0.002         0.002    0.001    0.000    -0.002       -0.004   -0.003   -0.001
Sorted by PNbIN
  Entire 120 stocks       -0.008        -0.002   0.001    0.004    0.009        0.017    0.008    0.009
  Small size group        -0.011        -0.004   0.001    0.005    0.012        0.023    0.011    0.012
  Medium size group       -0.007        -0.002   0.001    0.003    0.008        0.016    0.007    0.008
  Large size group        -0.006        -0.001   0.001    0.003    0.007        0.013    0.006    0.007
  S&P 500 constituents    -0.006        -0.002   0.001    0.003    0.007        0.013    0.006    0.006
Sorted by PNbSOS
  Entire 120 stocks       0.000         0.000    0.001    0.001    0.002        0.002    0.001    0.001
  Small size group        0.000         0.001    0.000    0.000    0.002        0.002    0.002    0.000
  Medium size group       0.000         0.000    0.000    0.001    0.002        0.002    0.003    0.000
  Large size group        0.001         0.000    0.002    0.000    0.003        0.002    0.001    0.001
  S&P 500 constituents    0.001         0.000    -0.001   0.001    0.001        0.000    0.003    -0.002

Panel B: Daily realized variance

                          Quintile                                            Comparison
                          1 (Smallest)  2        3        4        5 (Largest)  "5-1"    "5-3"    "3-1"
Sorted by PIN
  Entire 120 stocks       0.0003        0.0004   0.0004   0.0004   0.0003       0.0000   -0.0001  0.0001
  Small size group        0.0004        0.0005   0.0005   0.0005   0.0005       0.0000   -0.0001  0.0001
  Medium size group       0.0003        0.0003   0.0004   0.0003   0.0003       0.0000   -0.0001  0.0001
  Large size group        0.0003        0.0003   0.0003   0.0003   0.0002       0.0000   -0.0001  0.0000
  S&P 500 constituents    0.0002        0.0003   0.0003   0.0002   0.0002       0.0000   -0.0001  0.0001
Sorted by PSOS
  Entire 120 stocks       0.0002        0.0003   0.0003   0.0004   0.0007       0.0005   0.0004   0.0001
  Small size group        0.0003        0.0004   0.0004   0.0005   0.0009       0.0006   0.0004   0.0002
  Medium size group       0.0002        0.0002   0.0003   0.0003   0.0006       0.0004   0.0003   0.0001
  Large size group        0.0001        0.0002   0.0002   0.0003   0.0005       0.0004   0.0003   0.0001
  S&P 500 constituents    0.0001        0.0002   0.0002   0.0003   0.0005       0.0004   0.0003   0.0001
Sorted by PNbIN
  Entire 120 stocks       0.0003        0.0004   0.0003   0.0004   0.0003       0.0000   0.0000   0.0000
  Small size group        0.0005        0.0005   0.0004   0.0005   0.0005       0.0000   0.0001   0.0000
  Medium size group       0.0003        0.0004   0.0003   0.0003   0.0003       0.0000   0.0000   0.0000
  Large size group        0.0002        0.0003   0.0002   0.0003   0.0002       0.0000   0.0000   0.0000
  S&P 500 constituents    0.0003        0.0003   0.0002   0.0003   0.0002       0.0000   0.0000   0.0000
Sorted by PNbSOS
  Entire 120 stocks       0.0003        0.0003   0.0005   0.0004   0.0003       0.0001   -0.0002  0.0002
  Small size group        0.0004        0.0004   0.0007   0.0005   0.0004       0.0000   -0.0003  0.0004
  Medium size group       0.0003        0.0003   0.0004   0.0003   0.0003       0.0001   -0.0001  0.0002
  Large size group        0.0002        0.0003   0.0004   0.0003   0.0003       0.0001   -0.0001  0.0002
  S&P 500 constituents    0.0002        0.0003   0.0004   0.0003   0.0003       0.0001   -0.0001  0.0002


Table IV The effects of information-based trading on daily return

This table reports the results of following regression models for individual stocks: ; ; ; ; ; ; where denotes the daily return on trading day t. In the first regression, ; ; , ; ; and ; ; are the expected numbers of net buys due to private information, disputable public information, and liquidity needs, respectively. Scaled measures and in the second regression are the probabilities of net buys due to private information and disputable public information, respectively. The table displays average regression coefficients, average t-statistics (in parentheses) corrected for autocorrelation, the percentage of sample stocks with the regression coefficient being significantly positive or negative at the 5% level, average and adjusted of the individual regressions across the first sample of 120 stocks, its three size-based groups, and the second sample of 451 stocks constituting S&P500 Index. Effect size measures the normalized total effect of an explanatory variable. For instance, the effect size of ; ; is calculated by ∑ ; ; ∑ | | .⁄ Average effect sizes across the sample stocks are reported in brackets. The p-values of testing that the mean of coefficients of is equal to that of are reported in Panel B.

Panel A: Regressions of daily return on arrival rates of net buys of different types

Columns, left to right: open-to-close daily return for Entire 120 stocks, Small size group, Medium size group, Large size group and S&P500; then close-to-close daily return for the same five samples.

Explanatory variable 1 [symbol illegible]
Average coefficient -0.051 -0.083 -0.027 -0.043 -0.039 -0.062 -0.091 -0.041 -0.053 -0.060
Average t-statistics -0.992 -1.656 -0.546 -0.775 -0.721 -1.100 -1.670 -0.741 -0.889 -1.069
Coefficient > 0 significant 0.83% 0.00% 2.50% 0.00% 1.33% 1.67% 0.00% 5.00% 0.00% 0.22%
Coefficient < 0 significant 25.83% 45.00% 15.00% 17.50% 14.19% 23.33% 45.00% 20.00% 5.00% 18.63%
Average effect size 6.57% 8.85% 5.08% 5.39% 5.60% 7.30% 9.39% 6.49% 6.03% 6.58%

Explanatory variable 2 [symbol illegible]
Average coefficient -1.00e-5 -2.05e-5 -2.91e-5 1.96e-5 1.80e-5 -1.44e-6 7.27e-5 -4.43e-5 3.27e-5 1.11e-5
Average t-statistics 1.245 1.602 1.062 1.069 0.895 1.175 1.545 0.850 1.130 0.928
Coefficient > 0 significant 26.67% 32.50% 22.50% 25.00% 19.29% 27.50% 32.50% 20.00% 30.00% 21.29%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.89% 0.83% 0.00% 0.00% 2.50% 2.22%
Average effect size 10.0% 12.9% 7.60% 9.47% 7.63% 10.1% 12.3% 7.42% 10.6% 8.28%

Explanatory variable 3 [symbol illegible]
Average coefficient 5.13e-5 1.01e-4 3.98e-5 3.33e-5 2.20e-5 5.64e-5 1.08e-4 9.38e-5 2.70e-5 1.55e-5
Average t-statistics 4.452 5.029 4.487 3.842 3.946 4.182 4.894 3.893 3.759 4.018
Coefficient > 0 significant 86.67% 90.00% 85.00% 85.00% 84.48% 80.83% 87.50% 80.00% 75.00% 82.26%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 33.6% 39.3% 35.3% 26.2% 27.4% 32.8% 38.2% 32.8% 27.3% 29.5%

Explanatory variable 4 [symbol illegible]
Average coefficient 4.30e-5 3.41e-5 3.68e-5 2.80e-5 2.01e-5 4.35e-5 3.40e-5 6.87e-5 -2.33e-6 1.21e-5
Average t-statistics 1.147 1.168 1.189 1.085 0.808 1.077 1.059 1.198 0.975 0.896
Coefficient > 0 significant 23.33% 22.50% 22.50% 25.00% 17.07% 22.50% 20.00% 22.50% 25.00% 19.73%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 1.55% 0.00% 0.00% 0.00% 0.00% 1.11%
Average effect size 11.5% 13.2% 9.96% 11.3% 10.2% 11.8% 12.5% 10.4% 13.3% 11.3%

Average adjusted R2 8.18% 10.52% 8.88% 5.14% 5.51% 7.87% 10.28% 7.68% 5.66% 6.31%
Average R2 8.91% 11.23% 9.61% 5.90% 6.27% 8.61% 10.99% 8.42% 6.41% 7.06%

Panel B: Regressions of daily return on daily measures of PIN and PSOS

Columns, left to right: open-to-close daily return for Entire 120 stocks, Small size group, Medium size group, Large size group and S&P500; then close-to-close daily return for the same five samples.

Explanatory variable 1 [symbol illegible]
Average coefficient -0.049 -0.082 -0.024 -0.040 -0.037 -0.060 -0.091 -0.040 -0.050 -0.057
Average t-statistics -0.942 -1.641 -0.453 -0.73 -0.664 -1.059 -1.628 -0.685 -0.863 -0.995
Coefficient > 0 significant 0.83% 0.00% 2.50% 0.00% 1.55% 1.67% 0.00% 5.00% 0.00% 0.22%
Coefficient < 0 significant 23.33% 45.00% 10.00% 15.00% 13.30% 20.00% 42.50% 12.50% 5.00% 14.41%
Average effect size 6.41% 8.72% 4.78% 5.73% 5.52% 7.09% 9.32% 6.18% 5.78% 6.34%

Explanatory variable 2 [symbol illegible]
Average coefficient 0.044 0.050 0.041 0.042 0.043 0.053 0.057 0.047 0.055 0.058
Average t-statistics 5.835 0.050 5.88 4.710 4.788 5.605 6.697 5.385 4.734 5.081
Coefficient > 0 significant 95.83% 100% 95.00% 92.50% 93.13% 95.00% 100% 95.00% 90.00% 93.35%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 37.4% 45.8% 37.9% 28.3% 28.7% 37.0% 44.8% 36.6% 29.6% 31.1%

Explanatory variable 3 [symbol illegible]
Average coefficient 0.033 0.032 0.014 0.051 0.039 0.033 0.042 0.025 0.033 0.052
Average t-statistics 0.793 0.628 0.89 0.862 0.584 0.774 0.568 0.948 0.808 0.653
Coefficient > 0 significant 12.50% 12.50% 15.00% 10.00% 11.53% 12.50% 7.50% 17.50% 12.50% 14.63%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.89% 0.00% 0.00% 0.00% 0.00% 1.11%
Average effect size 6.54% 6.23% 6.45% 6.93% 6.24% 7.18% 6.22% 7.01% 8.31% 6.57%

Average adjusted R2 8.06% 10.90% 8.47% 4.81% 4.98% 7.80% 10.79% 7.57% 5.05% 5.64%
Average R2 8.61% 11.43% 9.02% 5.38% 5.55% 8.35% 11.32% 8.13% 5.62% 6.20%

Hypothesis test: The means of regression coefficients of [symbol illegible] and [symbol illegible] are equal.
p-value 1.05e-7 3.12e-2 1.5e-4 0.002 0.011 2.18e-4 0.011 0.019 0.037 1.12e-6


Table V The effects of information-based trading on total risk of return

This table reports the results of two regression models estimated for individual stocks, in which the dependent variable is the realized variance of intraday returns sampled at the 10-minute or 15-minute interval. In the first regression, the explanatory variables include the expected numbers of trades due to liquidity needs, private information and disputable public information, respectively, on day t. The second regression uses scaled measures, namely the probabilities of privately informed trading and SOS trading. The table displays average regression coefficients, average t-statistics (in parentheses) corrected for autocorrelation, the percentage of sample stocks whose regression coefficient is significantly positive or negative at the 5% level, and the average R2 and adjusted R2 of the individual regressions across the first sample of 120 stocks, its three size-based groups, and the second sample of 451 constituent stocks of the S&P500 Index. Effect size measures the normalized total effect of an explanatory variable and is calculated as a ratio of two sums. Average effect sizes across the sample stocks are reported in brackets. The p-values for testing that the mean coefficients of the two scaled measures are equal are reported in Panel B.
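For readers unfamiliar with the dependent variable, the following is a minimal sketch of a daily realized variance, computed as the sum of squared intraday log returns between equally spaced price observations. The use of log returns, the function name `realized_variance` and the price path are assumptions for illustration. The paper's exact construction, for example its treatment of the overnight return, may differ.

```python
import math


def realized_variance(prices):
    """Sum of squared log returns between consecutive intraday prices.

    prices is a list of prices sampled at a fixed interval within one
    trading day, e.g. every 10 or 15 minutes (assumed input format).
    """
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return sum(r * r for r in returns)


# Hypothetical intraday price path (not taken from the paper).
prices = [100.0, 100.5, 100.2, 100.9]
rv = realized_variance(prices)
print(rv)
```

Sampling the same day more finely, as in the 10-minute columns, uses more return observations per day than the 15-minute columns.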

Panel A: Regressions of daily realized variance on arrival rates of trades of different types

Columns, left to right: 15-minute sampling for Entire 120 stocks, Small size group, Medium size group, Large size group and S&P500; then 10-minute sampling for the same five samples.

Explanatory variable 1 [symbol illegible]
Average coefficient 0.290 0.326 0.332 0.211 0.194 0.303 0.345 0.327 0.237 0.224
Average t-statistics 3.894 4.611 4.169 2.902 2.596 4.248 4.687 4.641 3.415 3.072
Coefficient > 0 significant 74.17% 85.00% 82.50% 55.00% 51.22% 74.17% 85.00% 75.00% 62.50% 61.20%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.22% 0.00% 0.00% 0.00% 0.00% 0.67%
Average effect size 29.1% 32.7% 33.2% 21.4% 20.7% 30.6% 34.4% 32.9% 24.5% 23.4%

Explanatory variable 2 [symbol illegible]
Average coefficient 2.79e-7 8.22e-7 8.79e-9 5.92e-9 7.51e-9 2.92e-7 8.27e-7 4.38e-8 6.31e-9 1.19e-8
Average t-statistics 0.675 1.085 0.476 0.465 0.677 0.932 1.288 0.625 0.882 1.112
Coefficient > 0 significant 16.67% 25.00% 12.50% 12.50% 19.07% 28.33% 37.50% 22.50% 25.00% 27.05%
Coefficient < 0 significant 0.83% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 29.9% 36.2% 26.8% 26.7% 37.9% 30.3% 36.9% 24.5% 29.5% 37.4%

Explanatory variable 3 [symbol illegible]
Average coefficient 2.98e-7 7.63e-7 9.09e-8 4.03e-8 2.98e-8 2.58e-7 6.24e-7 1.15e-7 3.50e-8 2.90e-8
Average t-statistics 0.571 0.864 0.384 0.466 0.293 0.611 0.787 0.511 0.535 0.299
Coefficient > 0 significant 8.33% 12.50% 7.50% 5.00% 7.10% 6.67% 7.50% 5.00% 7.50% 6.21%
Coefficient < 0 significant 0.83% 0.00% 0.00% 2.50% 3.55% 0.00% 0.00% 0.00% 0.00% 3.33%
Average effect size 9.12% 11.6% 6.49% 9.26% 9.36% 8.00% 9.31% 6.77% 7.93% 8.47%

Explanatory variable 4 [symbol illegible]
Average coefficient 5.44e-7 1.16e-7 3.40e-7 1.32e-7 1.21e-7 5.64e-7 1.23e-6 3.17e-7 1.40e-7 1.18e-7
Average t-statistics 3.740 3.449 3.934 3.837 3.975 4.071 3.746 4.090 4.379 4.533
Coefficient > 0 significant 95.00% 90.00% 95.00% 100% 97.56% 94.17% 92.50% 92.50% 97.50% 96.90%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 49.7% 43.8% 46.3% 59.1% 61.1% 47.4% 42.3% 44.9% 54.9% 55.8%

Average adjusted R2 37.09% 32.91% 39.10% 39.25% 40.42% 40.82% 37.76% 41.11% 43.59% 45.03%
Average R2 37.59% 33.44% 39.59% 39.73% 40.90% 41.29% 38.25% 41.58% 44.04% 45.47%

Panel B: Regressions of daily realized variance on daily measures of PIN and PSOS

Columns, left to right: 15-minute sampling for Entire 120 stocks, Small size group, Medium size group, Large size group and S&P500; then 10-minute sampling for the same five samples.

Explanatory variable 1 [symbol illegible]
Average coefficient 0.342 0.361 0.374 0.289 0.278 0.354 0.381 0.140 0.312 0.302
Average t-statistics 4.663 5.080 4.867 4.044 4.048 5.057 5.538 0.550 4.508 4.199
Coefficient > 0 significant 89.17% 95.00% 90.00% 82.50% 78.49% 85.83% 87.50% 87.50% 82.50% 76.05%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.22%
Average effect size 34.1% 36.9% 37.3% 28.9% 27.9% 35.4% 38.1% 36.9% 31.2% 30.3%

Explanatory variable 2 [symbol illegible]
Average coefficient 2.48e-4 4.39e-4 1.50e-4 1.55e-4 1.60e-4 2.40e-4 3.89e-4 1.66e-4 1.65e-4 1.64e-4
Average t-statistics 1.377 1.832 1.145 1.152 0.950 1.442 1.841 1.289 1.197 0.965
Coefficient > 0 significant 32.50% 45.00% 25.00% 27.50% 22.17% 30.00% 40.00% 30.00% 20.00% 19.96%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.89% 0.00% 0.00% 0.00% 0.00% 1.55%
Average effect size 9.84% 13.9% 8.27% 7.34% 7.62% 9.14% 12.1% 8.20% 7.05% 7.05%

Explanatory variable 3 [symbol illegible]
Average coefficient 6.45e-4 8.07e-4 5.46e-4 5.82e-4 6.24e-4 6.46e-4 8.12e-4 5.46e-4 5.80e-4 6.22e-4
Average t-statistics 4.346 4.291 4.467 4.280 4.570 4.624 4.461 4.613 4.799 5.126
Coefficient > 0 significant 100% 100% 100% 100% 99.33% 99.17% 97.50% 100% 100% 98.45%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 53.7% 49.7% 50.2% 61.3% 62.6% 51.5% 47.7% 48.9% 57.8% 59.1%

Average adjusted R2 30.43% 26.32% 32.76% 31.06% 31.63% 33.86% 31.84% 34.17% 35.56% 36.84%
Average R2 30.85% 37.50% 33.16% 31.47% 32.04% 34.25% 32.25% 34.56% 35.95% 37.22%

Hypothesis test: The means of regression coefficients of [symbol illegible] and [symbol illegible] are equal.
p-value 2.95e-6 0.004 0.002 5.64e-3 2.20e-9 2.11e-6 0.001 0.003 6.26e-2 8.02e-10


Table VI The effects of information-based trading on systematic risk and idiosyncratic risk

This table reports the results of four regression models estimated for individual stocks. The dependent variable is either a stock's market beta on day t, estimated from an intraday market model, or its daily idiosyncratic risk. Each regression also includes a lagged term. In the first two regressions, the explanatory variables include the expected numbers of trades due to liquidity needs, private information and disputable public information, respectively. The last two regressions use scaled measures, namely the probabilities of privately informed trading and SOS trading. The table displays average regression coefficients, average t-statistics (in parentheses) corrected for autocorrelation, the percentage of sample stocks whose regression coefficient is significantly positive or negative at the 5% level, and the average R2 and adjusted R2 of the individual regressions across the first sample of 120 stocks, its three size-based groups, and the second sample of 451 constituent stocks of the S&P500 Index. Effect size measures the normalized total effect of an explanatory variable and is calculated as a ratio of two sums. Average effect sizes across the sample stocks are reported in brackets. The p-values for testing that the mean coefficients of the two scaled measures are equal are reported in Panels C and D.
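As a rough illustration of the two dependent variables, the sketch below fits a one-day market model by ordinary least squares on intraday returns. Beta is the slope on the market return, and idiosyncratic risk is taken here to be the sum of squared residuals. That residual-based definition, the function name `market_model` and the return series are assumptions, and the paper's exact estimator may differ.

```python
def market_model(stock, market):
    """Return (beta, idiosyncratic risk) from an intraday market model.

    stock and market are equal-length lists of intraday returns for
    one trading day. Idiosyncratic risk is assumed to be the sum of
    squared OLS residuals.
    """
    n = len(stock)
    mean_s = sum(stock) / n
    mean_m = sum(market) / n
    cov = sum((s - mean_s) * (m - mean_m) for s, m in zip(stock, market))
    var = sum((m - mean_m) ** 2 for m in market)
    if var == 0:
        raise ValueError("market returns have zero variance")
    beta = cov / var
    alpha = mean_s - beta * mean_m
    residuals = [s - alpha - beta * m for s, m in zip(stock, market)]
    return beta, sum(e * e for e in residuals)


# Hypothetical intraday returns (not taken from the paper).
market = [0.0010, -0.0020, 0.0015, 0.0005]
stock = [0.0025, -0.0038, 0.0027, 0.0012]
beta, idio_risk = market_model(stock, market)
print(beta, idio_risk)
```

Repeating the calculation for every trading day would yield the daily series of beta and idiosyncratic risk that serve as dependent variables in the four regressions.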

Panel A: Regressions of daily beta and idiosyncratic risk sampled at a 15-minute frequency

Columns, left to right: daily beta for Entire 120 stocks, Small size group, Medium size group, Large size group and S&P500; then daily idiosyncratic risk for the same five samples.

Lagged term
Average coefficient 0.121 0.095 0.126 0.143 0.126 0.373 0.402 0.408 0.309 0.287
Average t-statistics 2.290 1.825 2.321 2.724 2.370 6.264 7.037 6.872 4.883 4.528
Coefficient > 0 significant 55.83% 45.00% 60.00% 62.50% 60.09% 95.83% 97.50% 100% 90.00% 84.92%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 12.3% 9.81% 12.7% 14.5% 12.7% 37.3% 40.2% 40.7% 30.9% 28.7%

Explanatory variable 1 [symbol illegible]
Average coefficient 1.70e-4 3.47e-7 -1.28e-4 1.57e-5 -4.41e-5 1.06e-5 2.48e-5 4.78e-6 2.06e-6 1.83e-6
Average t-statistics 0.456 0.002 -0.550 0.605 -0.458 2.776 2.703 2.282 3.342 3.478
Coefficient > 0 significant 5.00% 12.50% 0.00% 2.50% 2.22% 70.83% 70.00% 57.50% 85.00% 83.59%
Coefficient < 0 significant 6.67% 7.50% 7.50% 5.00% 8.65% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 5.72% 5.36% 3.20% 3.46% 2.83% 24.9% 25.3% 21.1% 28.4% 30.3%

Explanatory variable 2 [symbol illegible]
Average coefficient -4.62e-5 4.95e-4 -8.19e-7 -1.15e-5 1.01e-5 3.85e-6 9.04e-6 1.67e-6 8.43e-7 4.91e-7
Average t-statistics -0.251 0.901 -0.139 -0.204 0.494 0.701 0.958 0.435 0.711 0.418
Coefficient > 0 significant 5.00% 27.50% 7.50% 2.50% 15.52% 10.83% 12.50% 7.50% 12.50% 9.53%
Coefficient < 0 significant 7.50% 2.50% 10.00% 7.50% 4.21% 0.00% 0.00% 0.00% 0.00% 2.88%
Average effect size 3.76% 9.03% 4.67% 2.73% 3.59% 2.92% 3.38% 2.33% 3.04% 2.62%

Explanatory variable 3 [symbol illegible]
Average coefficient 9.60e-4 2.46e-3 3.34e-4 8.66e-5 6.58e-5 1.07e-5 1.59e-5 5.86e-6 2.47e-6 2.18e-6
Average t-statistics 1.158 2.043 0.882 0.549 0.857 5.116 4.627 5.147 5.572 5.926
Coefficient > 0 significant 25.83% 40.00% 22.50% 15.00% 19.51% 97.50% 95.00% 97.50% 100% 99.33%
Coefficient < 0 significant 0.83% 2.50% 0.00% 0.00% 0.22% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 28.5% 49.1% 20.3% 16.2% 18.6% 18.5% 16.2% 17.9% 21.5% 22.2%

Average adjusted R2 3.81% 4.76% 2.91% 3.75% 3.13% 45.97% 41.32% 46.17% 50.41% 50.66%
Average R2 4.58% 5.53% 3.69% 4.52% 3.91% 46.40% 41.79% 46.60% 50.81% 51.05%

Panel B: Regressions of daily beta and idiosyncratic risk sampled at a 10-minute frequency

Columns, left to right: daily beta for Entire 120 stocks, Small size group, Medium size group, Large size group and S&P500; then daily idiosyncratic risk for the same five samples.

Lagged term
Average coefficient 0.136 0.102 0.140 0.167 0.151 0.392 0.423 0.419 0.334 0.315
Average t-statistics 2.603 2.162 2.606 3.040 2.819 6.683 7.581 7.206 5.263 5.081
Coefficient > 0 significant 65.83% 57.50% 62.50% 77.50% 71.18% 94.17% 97.50% 95.00% 90.00% 88.03%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 13.9% 10.8% 14.1% 16.8% 15.2% 39.2% 42.3% 41.9% 33.4% 31.5%

Explanatory variable 1 [symbol illegible]
Average coefficient 1.03e-4 4.13e-4 -3.97e-5 -6.32e-5 -3.40e-5 1.10e-5 2.55e-5 5.33e-6 2.11e-6 1.87e-6
Average t-statistics -0.097 0.453 -0.379 -0.365 -0.542 3.031 2.837 2.542 3.714 3.785
Coefficient > 0 significant 4.17% 10.00% 2.50% 0.00% 1.77% 72.50% 67.50% 65.00% 85.00% 85.37%
Coefficient < 0 significant 4.17% 0.00% 7.50% 5.00% 11.97% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 3.81% 4.96% 3.31% 3.15% 2.69% 25.0% 24.6% 22.3% 28.0% 30.1%

Explanatory variable 2 [symbol illegible]
Average coefficient 1.66e-4 4.87e-4 1.82e-5 -6.35e-6 8.42e-6 3.35e-6 7.33e-6 2.01e-6 7.17e-7 4.77e-7
Average t-statistics 0.382 1.329 0.013 -0.196 0.421 0.710 0.891 0.584 0.656 0.400
Coefficient > 0 significant 20.00% 47.50% 7.50% 5.00% 16.63% 10.00% 7.50% 15.00% 7.50% 7.98%
Coefficient < 0 significant 9.17% 7.50% 12.50% 7.50% 6.21% 0.00% 0.00% 0.00% 0.00% 3.77%
Average effect size 6.11% 11.3% 3.92% 3.08% 3.18% 2.60% 2.89% 2.35% 2.57% 2.39%

Explanatory variable 3 [symbol illegible]
Average coefficient 9.51e-4 2.04e-3 6.91e-4 1.23e-4 6.57e-5 1.17e-5 1.65e-5 5.56e-6 2.47e-6 2.12e-6
Average t-statistics 1.870 2.753 1.929 0.928 0.947 5.508 4.900 5.509 6.114 6.478
Coefficient > 0 significant 41.67% 57.50% 47.50% 20.00% 23.50% 97.50% 95.00% 97.50% 100% 99.33%
Coefficient < 0 significant 0.83% 0.00% 0.00% 2.50% 1.33% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 38.2% 57.2% 36.7% 20.7% 18.5% 17.7% 15.7% 17.0% 20.3% 20.8%

Average adjusted R2 5.71% 7.16% 4.93% 5.04% 4.37% 49.16% 44.95% 48.57% 53.97% 54.03%
Average R2 6.46% 7.90% 5.69% 5.80% 5.14% 49.57% 45.39% 48.98% 54.34% 54.40%


Panel C: Regressions of daily beta and idiosyncratic risk sampled at a 15-minute frequency

Columns, left to right: daily beta for Entire 120 stocks, Small size group, Medium size group, Large size group and S&P500; then daily idiosyncratic risk for the same five samples.

Lagged term
Average coefficient 0.124 0.098 0.129 0.143 0.128 0.409 0.428 0.436 0.365 0.344
Average t-statistics 2.331 1.888 2.367 2.738 2.393 7.098 7.604 7.374 6.317 5.997
Coefficient > 0 significant 58.33% 47.50% 62.50% 65.00% 60.53% 98.33% 97.50% 100% 97.50% 95.12%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 12.5% 10.2% 12.9% 14.46% 12.8% 40.9% 42.7% 43.5% 36.4% 34.3%

Explanatory variable 1 [symbol illegible]
Average coefficient 0.015 0.201 -0.132 -0.025 -0.124 0.005 0.007 0.003 0.005 0.004
Average t-statistics 0.049 0.601 -0.399 -0.055 -0.386 1.619 2.126 1.276 1.456 1.150
Coefficient > 0 significant 5.00% 12.50% 0.00% 2.50% 1.11% 40.00% 50.00% 32.50% 37.50% 25.72%
Coefficient < 0 significant 5.83% 2.50% 10.00% 5.00% 6.21% 0.00% 0.00% 0.00% 0.00% 0.22%
Average effect size 3.85% 5.73% 3.11% 2.72% 2.99% 4.13% 5.77% 3.55% 3.08% 2.95%

Explanatory variable 2 [symbol illegible]
Average coefficient 0.192 0.430 0.016 0.131 0.098 0.012 0.013 0.011 0.013 0.014
Average t-statistics 1.282 2.468 0.245 1.133 0.892 6.271 6.034 6.108 6.671 6.997
Coefficient > 0 significant 34.17% 65.00% 7.50% 30.00% 25.50% 100% 100% 100% 100% 100%
Coefficient < 0 significant 4.17% 0.00% 10.00% 2.50% 3.33% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 9.24% 17.3% 5.04% 5.36% 5.22% 23.9% 22.1% 22.6% 27.1% 28.0%

Average adjusted R2 3.38% 3.92% 2.43% 3.77% 2.87% 41.71% 37.66% 42.27% 45.20% 45.29%
Average R2 3.96% 4.50% 3.02% 4.35% 3.45% 42.06% 38.04% 42.61% 45.53% 45.62%

Hypothesis test: The means of regression coefficients of [symbol illegible] and [symbol illegible] are equal.
p-value 0.965 0.292 0.110 0.828 0.528 1.17e-12 1.85e-6 1.19e-4 8.13e-5 2.22e-11


Panel D: Regressions of daily beta and idiosyncratic risk sampled at a 10-minute frequency

Columns, left to right: daily beta for Entire 120 stocks, Small size group, Medium size group, Large size group and S&P500; then daily idiosyncratic risk for the same five samples.

Lagged term
Average coefficient 0.140 0.107 0.145 0.168 0.154 0.428 0.448 0.447 0.387 0.366
Average t-statistics 2.649 2.211 2.699 3.036 2.837 7.563 8.394 7.818 6.478 6.280
Coefficient > 0 significant 66.67% 57.50% 67.50% 75.00% 72.51% 99.17% 100% 100% 97.50% 94.90%
Coefficient < 0 significant 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 14.2% 11.2% 14.6% 16.9% 15.4% 42.7% 44.8% 44.6% 38.7% 36.5%

Explanatory variable 1 [symbol illegible]
Average coefficient 0.11 0.333 0.007 0.010 -0.130 0.005 0.006 0.004 0.005 0.004
Average t-statistics 0.352 1.138 0.018 -0.101 -0.457 1.653 2.103 1.440 1.415 1.147
Coefficient > 0 significant 10.83% 17.50% 12.50% 2.50% 1.55% 38.33% 55.00% 35.00% 25.00% 25.28%
Coefficient < 0 significant 5.83% 0.00% 12.50% 5.00% 10.64% 0.00% 0.00% 0.00% 0.00% 0.22%
Average effect size 4.85% 8.17% 3.92% 2.47% 2.74% 3.87% 5.26% 3.52% 2.83% 2.74%

Explanatory variable 2 [symbol illegible]
Average coefficient 0.26 0.540 0.134 0.106 0.083 0.012 0.013 0.011 0.013 0.014
Average t-statistics 1.961 3.665 1.267 0.952 0.859 6.455 6.143 6.187 7.033 7.479
Coefficient > 0 significant 42.50% 72.50% 35.00% 20.00% 25.06% 100% 100% 100% 100% 100%
Coefficient < 0 significant 0.83% 0.00% 0.00% 2.50% 5.76% 0.00% 0.00% 0.00% 0.00% 0.00%
Average effect size 11.7% 23.4% 7.23% 4.50% 4.92% 23.0% 21.5% 21.7% 25.8% 26.8%

Average adjusted R2 5.10% 6.28% 4.14% 4.89% 4.04% 44.82% 41.20% 44.39% 48.87% 49.23%
Average R2 5.67% 6.85% 4.71% 5.46% 4.61% 45.15% 41.55% 44.72% 49.18% 49.54%

Hypothesis test: The means of regression coefficients of [symbol illegible] and [symbol illegible] are equal.
p-value 0.996 0.695 0.450 0.881 0.132 9.87e-13 7.47e-7 2.56e-4 3.03e-4 2.22e-9