

Research Article

Received 1 April 2012, Revised 20 November 2012, Accepted 8 January 2013 Published online 19 February 2013 in Wiley Online Library

(wileyonlinelibrary.com) DOI: 10.1002/asmb.1968

Parameter estimation of an agent-based stock price model

Mine Çağlar^{a,*,†}, Nihal Bahtiyar^{b} and İpek Altıntaş^{a}

The influence of the behavior and strategies of traders on stock price formation has attracted much interest. It is assumed that there is a positive correlation between the total net demand and the price change. A buy order is expected to increase the price, whereas a sell order is assumed to decrease it. We perform data analysis based on a recently proposed stochastic model for stock prices. The model involves long-range dependence, self-similarity, and the no-arbitrage principle, as observed in real data. The arrival times of orders, their quantity, and their duration are created by a Poisson random measure. The aggregation of the effect of all orders based on these parameters yields the log-price process. By scaling the parameters, a fractional Brownian motion or a stable Lévy process can be obtained in the limit. In this paper, our aim is twofold: first, to devise statistical methodology to estimate the model parameters with an application on high-frequency price data, and second, to validate the model by simulations with the estimated parameters. We find that the statistical properties of agent level behavior are reflected on the stock price, and can affect the entire process. Moreover, the price model is suitable for prediction through simulations when the parameters are estimated from real data. The methods developed in the present paper can be applied to frequently traded stocks in general. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: stock price; long-range dependence; self-similarity; Hurst parameter

1. Introduction

Recently, the question of how the behavior and strategies of traders affect stock price formation has attracted much interest. Researchers in economics, statistical physics and probability have approached the problem from different perspectives and proposed various models. Empirical studies motivated by the observation of non-Gaussian return distributions with fat tails are supported by such theoretical models. In this paper, we estimate the parameters of an agent-based stock price process using high-frequency price data.

Several studies in economics and statistical physics have attempted to represent the dynamics, in addition to the statistical properties, of real price data. See for example [1, 2] and references therein for modeling the effect of agent behavior on price formation. The availability of high-frequency data over the last decade has also triggered such studies [3]. In [2], all agents can be placed into one of two different states, possibly according to their trading behavior as fundamentalists or chartists. The dynamics can be represented by a Markov chain or Langevin process where agents are able to interact. The price changes are modeled as endogenous responses of the market to imbalances between demand and supply. Another stream of agent-based models has been introduced in [4], adapting the framework of percolation models from statistical physics. In these models, traders are situated in a lattice and form clusters of agents with the same strategy: buy, sell, or inactive. The power-law fluctuations are captured by drawing upon the critical behavior of percolation models at a certain threshold of the connectivity level. In [5], deterministic functions are used to define the strategies of different types of agents. According to this approach, each trading strategy is a signal processing element using past price information and current net order. Then, a random price formation process is provided by adding white noise to the model. In [6], an agent-based model is constructed by agents' decisions among three states: buying, selling, or staying idle. In their decisions, all agents take advantage of public information characterized by a sequence of independent and identically distributed Gaussian random variables.

a Department of Mathematics, Koc University, Rumeli Feneri Yolu 34450 Sariyer, Istanbul, Turkey
b Industrial Engineering Department, Koc University, Rumeli Feneri Yolu 34450 Sariyer, Istanbul, Turkey
* Correspondence to: Mine Çağlar, Department of Mathematics, Koc University, Rumeli Feneri Yolu 34450 Sariyer, Istanbul, Turkey.
† E-mail: [email protected]

Copyright © 2013 John Wiley & Sons, Ltd. Appl. Stochastic Models Bus. Ind. 2014, 30 227–239


Probabilists use a Lévy process or a fractional Brownian motion, which exactly model the statistical properties of long-range dependence and self-similarity observed in real data. Various other models based on semi-martingales, purely continuous, pure jump, or a mixture of both are also available. For model selection, a class of estimators is provided to infer the fine structure of the underlying process in [7]. Recently, stochastic processes based on Poisson shot-noise [8] and semi-Markov processes [1] have been suggested as stock price models. In [1], two different semi-Markov processes are constructed to represent active and inert investors. The model involves additional effects of both inert and active investors in the log-return process. The relation between the presence of inert investors and long-range dependence is identified.

To represent the agent behavior in detail, we consider a new model inspired by the semi-Markov model and based on a Poisson random measure [9]. It also involves long-range dependence and self-similarity with no arbitrage, as observed in real data. In our model, variables such as the arrival times of orders, their quantity, and the duration of the trade are all created by a Poisson random measure. The aggregation of the effect of all orders based on these parameters yields the log-return process. By scaling the parameters, a fractional Brownian motion or a stable Lévy process is obtained in the limit.

In this paper, the statistical estimation of the model parameters is performed by comparing the stochastic model with real price-time data. Statistical estimation methods along with least squares function estimation are used. The estimation is followed by the simulation of the stochastic process for validation and prediction purposes. It is found that the model is suitable for prediction when the parameters are estimated from real data. The methods developed in the present paper can be applied to frequently traded stocks in general.

The remainder of this article is organized as follows. Section 2 gives an account of the available transaction data and reviews the price process. Section 3 focuses on parameter estimation for this model. Simulation and validation of the price process are performed in Section 4. Finally, Section 5 gives the conclusions.

2. Data and model

2.1. High frequency transactions data

The data used in this paper are retrieved from Istanbul Stock Exchange, including high frequency trading data in granularity of seconds between February 2007 and December 2009. We select one of the most liquid stocks from the banking sector in Turkey for analysis, aiming to demonstrate the model and its estimation methods. Figure 1 shows the price series for this stock. The data have been supplied by Istanbul Stock Exchange for research purposes. The original data files contain not only the price information but also order type, order quantity, transaction quantity, order time, order date, and transaction time. These provide the required information to estimate all parameters of the price model. On the other hand, 'Monthly Price and Return Data' for 1986–2011 are available on the web page [10] for all stocks in Istanbul Stock Exchange. An analysis of several stocks from various sectors would be valuable for comparison purposes.

Figure 1. Price series of a stock from the banking sector. [Plot of stock price (TL), on an axis from 0 to 12, against time (in sec), March 2007 to December 2009.]

We expand the trading data to a continuous price process by using the tick-by-tick interpolation method [3]. This amounts to assuming the most current price at the interpolated ticks of seconds. It is indicated in [3] that the time scales from 1 s to 10 min are useful to observe the price formation process, whereas data of time scale from 10 min to 1 h are convenient for seasonality analysis. We analyze seasonality, periodicity, and trend using daily prices, whereas we use data in seconds in order to fit more delicate parameters.
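The interpolation step, carrying the most recent traded price forward to every second, can be sketched as follows. This is only an illustration under assumed inputs: the function name `previous_tick` and the toy trade times and prices are hypothetical and are not taken from the Istanbul Stock Exchange files.

```python
from bisect import bisect_right

def previous_tick(times, prices, grid):
    """Return the most recent traded price at each grid time.

    times  -- sorted transaction times in seconds (hypothetical input)
    prices -- traded price at each transaction time
    grid   -- times (in seconds) at which a price is wanted
    """
    out = []
    for t in grid:
        i = bisect_right(times, t)  # number of trades at or before t
        # Before the first trade no price is known yet.
        out.append(prices[i - 1] if i > 0 else None)
    return out

# Toy data: three trades, interpolated onto a one-second grid.
trade_times = [0, 3, 7]
trade_prices = [5.10, 5.15, 5.05]
grid = list(range(0, 10))
series = previous_tick(trade_times, trade_prices, grid)
```

Log-returns at a coarser granularity, such as 1 min, could then be taken as differences of the logarithm of every 60th value of such a series.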

We first check for periodicity in the data set by studying the periodogram of the log-returns. The smoothed periodogram is computed by SAS (SAS Institute, Inc., Cary, NC, USA). There are several peaks in the periodogram, but none of them corresponds to meaningful periods such as days or weeks. Besides, no characteristic peak is observed. Later, a linear model is tested to detect the presence of any seasonality. Let $X_t$ denote the log-return on the interval $[t-1, t]$. ANOVA is used to test daily and monthly effects in $X_t$ as well as the effect of their interaction. Clearly, such a test assumes mean 0 and independent Gaussian error terms, which is not the case in this paper. Nevertheless, the F-test yields a P-value of 0.10, which is not significant. Neither the separate effects of days and months nor their interaction turns out to be significant as a result of ANOVA. Therefore, we omit any seasonality component in the data.

2.2. An agent-based model for transactions

We consider a stochastic process $Y = \{Y(t) : t \in \mathbb{R}_+\}$ for a stock price given by

$$Y(t) = Y(0)\exp(mt + Z(t)) \qquad (1)$$

where $Z(t)$ is called the log-price process. Our approach is along the lines of a (fractional) Black–Scholes model, but with $Z$ replacing a (fractional) Brownian motion [11]. In the present work, the log-price $Z(t)$ is modeled as an aggregation of the effects of orders placed by all active agents in $[0, t]$, where $Z(0) = 0$ [9]. The effect of each order is assumed to be proportional to its duration and volume. Another assumption of the model is that there is a positive correlation between the total net demand and the price change. Thereby, we expect that a buy order of an agent increases the price, whereas a sell order decreases it.

We assume that each agent's arrival is independent and identically distributed and occurs according to a Poisson process. We use the triplet $(S_i, U_i, R_i)$ to represent a single order from an agent, where $S_i$ is the arrival time of order $i$, $U_i$ is the duration of its effect on the price, and $R_i$ is the rate of effect. The rate variable $R$ is a converter to monetary units, taking positive values in case of a buy order and negative values in case of a sell order. These triplets compose the Poisson random measure $N(ds, du, dr)$ with mean measure $\mu(ds, du, dr)$.

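Let $(\Omega, \mathcal{F}, P)$ be a probability space. Let $\mathcal{B}_{\mathbb{R}}$ denote the Borel $\sigma$-algebra on $\mathbb{R}$. Let $N$ be a Poisson random measure on $(\mathbb{R} \times \mathbb{R}_+ \times \mathbb{R},\ \mathcal{B}_{\mathbb{R}} \otimes \mathcal{B}_{\mathbb{R}_+} \otimes \mathcal{B}_{\mathbb{R}})$ with mean measure

$$\mu(ds, du, dr) = \lambda\, ds\, \nu(du)\, \gamma(dr)$$

where $\lambda > 0$ is the arrival rate of the underlying Poisson process, $1 < \delta < 2$, $\gamma$ is the distribution of a random variable $R$, and $\nu$ is a probability measure satisfying

$$\int_u^\infty \nu(dy) \sim \frac{h(u)\, u^{-\delta}}{\delta} \quad \text{as } u \to \infty \qquad (2)$$

where $h$ is a slowly varying function at infinity.

The log-price process $Z(t)$ is given by an integral with respect to the random measure $N(ds, du, dr)$ as

$$Z(t) = \int_{-\infty}^{\infty} \int_{0}^{\infty} \int_{-\infty}^{\infty} r\,u \left[ f\!\left(\frac{t-s}{u}\right) - f\!\left(\frac{-s}{u}\right) \right] N(ds, du, dr). \qquad (3)$$

In this setting, $f : \mathbb{R} \to \mathbb{R}$ is a Lipschitz continuous function representing the effect of an order, and $f(x) = 0$ if $x \le 0$. Note that the log-price process defined in (3) is in fact a summation. Because the Poisson random measure is atomic, the integral can be written as a series

$$Z(t) = \sum_{i:\, S_i \le t} R_i U_i \left[ f\!\left(\frac{t-S_i}{U_i}\right) - f\!\left(\frac{-S_i}{U_i}\right) \right] \qquad (4)$$

where $i$ is used as an index to enumerate the atoms of $N$. Because $f(x) = 0$ for $x \le 0$, only the orders that have arrived before time $t$ count.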
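To make the series (4) concrete, the following is a minimal simulation sketch rather than the authors' implementation. Several choices are assumptions made only for illustration: the infinite past is truncated to a finite burn-in window, durations are drawn from a plain Pareto law with tail index `delta`, the rate takes the values `+r_scale` and `-r_scale` with equal probability, and the effect function is capped at 1 once an order has run its full duration. All function and parameter names are hypothetical.

```python
import math
import random

def effect(x, a=1.2542):
    """Effect function: 0 before arrival, rising to 1 at the end of the order.

    The exponential shape and the capping at 1 for x >= 1 are assumptions.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return (math.exp(a * x) - 1.0) / (math.exp(a) - 1.0)

def simulate_orders(lam, delta, r_scale, t_end, burn_in, rng):
    """Draw the atoms (S_i, U_i, R_i) with arrival times in [-burn_in, t_end]."""
    orders = []
    s = -burn_in
    while True:
        s += rng.expovariate(lam)               # exponential interarrival times
        if s > t_end:
            break
        u = rng.paretovariate(delta)            # P(U > u) = u**(-delta), u >= 1
        r = r_scale * rng.choice((-1.0, 1.0))   # buy (+) or sell (-), assumed symmetric
        orders.append((s, u, r))
    return orders

def log_price(t, orders):
    """Evaluate the series for Z(t): sum over orders that arrived by time t."""
    z = 0.0
    for s, u, r in orders:
        if s <= t:
            z += r * u * (effect((t - s) / u) - effect(-s / u))
    return z

rng = random.Random(0)
orders = simulate_orders(lam=0.5, delta=1.5, r_scale=1e-4,
                         t_end=600.0, burn_in=3600.0, rng=rng)
y0, m = 5.0, 0.0
# Price Y(t) = Y(0) exp(m t + Z(t)) sampled once per minute.
path = [y0 * math.exp(m * t + log_price(float(t), orders)) for t in range(0, 601, 60)]
```

By construction the two effect terms cancel at $t = 0$, so the simulated path starts at `y0`, in line with $Z(0) = 0$.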

Two limiting processes emerge when the parameters of $Z(t)$ are scaled properly: a fractional Brownian motion (fBm) or a stable Lévy process [9]. In simplified versions of the scaling theorems, one obtains a fractional Brownian motion when trading occurs more frequently, but in smaller quantities. We obtain a stable Lévy process when the trading occurs less frequently, but with higher effect rate. The primary sufficient condition for approximation of the limit processes and the long-range dependence aimed at in the model is that the distribution of duration has a regularly varying tail as given in (2).


3. Parameter estimation

3.1. Interarrival distribution

In the price model, the order arrivals are assumed to form a Poisson process. A Poisson process is essentially generated by arrivals that occur independently and one at a time. Estimations in this part are based on another characterization, which is derived from independent and exponentially distributed interarrival times.

We use order times for fitting an exponential distribution. The data for each month are analyzed separately. We give the details of August 2008 for illustration. The order data are classified into buy/sell, and day/session types. The session type orders are valid only during one session, whereas a day type order remains valid for both sessions during the same day. Out of 272,271 total orders in August 2008, there are 64,232 day type, 147,554 session type, and 60,485 canceled orders.

An exponential distribution is fit to buy and sell orders separately. The parameter $\lambda$ of the exponential distribution has the units s$^{-1}$ to match the arrival rate of orders. It is estimated as the reciprocal of the mean of the interarrival times of orders. The results are given in Table I, where we see that a new order of each type arrives every 3–4 s. The overall mean of all interarrivals is approximately 2 s. The only situation that we neglect is that, especially at the beginning of sessions, more than one order may arrive at the same time. The observed means and standard deviations are nearly equal to each other in both the buy and the sell order cases. Although this fact strengthens our exponential distribution hypothesis, the Kolmogorov–Smirnov test rejects our hypothesis because of the curse of dimension effect. For shorter sequences of length 1000, the fit is significant with P-value less than 0.05.
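The estimate described above, the reciprocal of the mean interarrival time together with a comparison of mean and standard deviation, can be sketched as follows. The data here are synthetic draws with an assumed rate, not the exchange data, and the helper name `fit_exponential` is hypothetical.

```python
import random
import statistics

def fit_exponential(interarrivals):
    """Return (rate estimate, sample mean, sample standard deviation)."""
    mean = statistics.fmean(interarrivals)
    std = statistics.stdev(interarrivals)
    return 1.0 / mean, mean, std

# Synthetic interarrival times (seconds) with a known rate, for illustration only.
rng = random.Random(1)
true_rate = 0.30
sample = [rng.expovariate(true_rate) for _ in range(20000)]
rate_hat, mean, std = fit_exponential(sample)
ratio = std / mean   # close to 1 for exponentially distributed data
```

For an exponential distribution the mean and the standard deviation coincide, so a ratio near 1 is the informal check used in the text.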

The probability density function of the exponential distribution is illustrated along with the histogram for sell-order interarrivals in Figure 2. The day and session types are plotted separately. The buy order case is similar. Because of the visual fit and the closeness of the estimates of the mean and the standard deviation in Table I, as expected in an exponential distribution, we conclude that orders arrive according to a Poisson process. The parameter $\lambda$ is also estimated for each month after August 2008 until the end of 2009, which is a sufficiently long period to observe the general pattern and conveniently has the same data format. The results are similar, and $\lambda$ ranges from 0.32 to 1.29 s$^{-1}$ for combined arrivals of buy and sell type.

3.2. Hurst parameter and order durations

In this subsection, the Hurst parameter, which governs long-range dependence of the process, will be estimated and a heavy-tailed distribution will be fit to the order durations. The two are closely related because the Hurst parameter can be found both directly from the time series and from the tail parameter of the duration distribution according to the price model.

Table I. Interarrival time of buy and sell orders: mean, standard deviation, and parameter estimate as well as counts of session and day type orders.

              Mean (s)    Std. Dev.   \hat{\lambda}   Session type   Day type
Buy orders    3.295762    3.295855    0.303420        76,037         27,862
Sell orders   4.184979    4.184979    0.238950        71,517         36,370

Figure 2. Interarrival times for sell orders. [Two panels: density histograms of the interarrival times (in sec) of day type and of session type sell orders, each with the fitted exponential distribution.]

Various methods exist for estimating the Hurst parameter H from a long-range dependent time series. Recently, results of a comparative analysis of the six most frequently used estimators of H were reported in [12]. These belong to three main classes of estimation techniques of H: (i) based on data analysis in the time domain; (ii) based on data analysis in the frequency domain; and (iii) based on wavelet transform of data. In the first group are the rescaled adjusted range estimator or R/S-statistic, the variance-time estimator, and the index of dispersion for counts estimator, which are all nonparametric. In the second group, the Daniell periodogram-based estimator and the Whittle maximum likelihood estimator are considered. The last one is the wavelet estimation method, which is a semi-parametric method. This study confirms and complements the previous work while providing a more systematic analysis. Simulated series of fractional Brownian motion are used to estimate H by using various methods. Then, Hurst parameter estimators are compared for bias and variance. The experimental results have confirmed that the wavelet estimator and the Whittle maximum likelihood estimator of H are the least biased. Although the Whittle maximum likelihood estimator has significantly smaller variance and can be applied to shorter data samples than the wavelet estimator, the wavelet estimator is computationally simpler and faster than the Whittle maximum likelihood estimator.

We use the wavelet estimation method of Veitch and Abry [13], for which the MATLAB (MathWorks, Natick, MA, USA) codes are available in [14]. It gives an asymptotically unbiased and efficient estimator, at least under the normality assumption. Let $d(j, k)$, $k = 1, \ldots, n_j$, $j = 1, \ldots, J$, be the details of a stationary series $x_k$, $k = 1, \ldots, N$, obtained with the discrete wavelet transform, where $J$ is such that $2^{J+1} \le N \le 2^{J+2}$. Here, $n_j$ denotes the number of coefficients in octave $j$. In the wavelet estimation method [13], the means of the details at scale $2^j$, $j = 1, \ldots, J$, are found after transformation of the time series with Daubechies wavelets. These mean values are used to estimate the theoretical expected values of the details, denoted by $E d_j$, and are arithmetic averages of $n_j$ values because $k = 1, \ldots, n_j$ for each $j$. The relationship with the Hurst parameter H is approximately as follows, for large $j$ values:

$$E d_j^2 \propto 2^{j(2H-1)} \qquad (5)$$

On the basis of this fact, the means of the $d(j, k)$ values obtained from the data are plotted against the $j$ values at log–log scale. In essence, the slope of the line is used for estimation of H for higher octaves, say $j_1$ to $j_2$. The superior properties of the wavelet estimator are based on the fact that the long-range dependence present in the time domain representation is completely absent in the wavelet coefficient plane $(j, k)$. It is also found to be robust with respect to trend and seasonality. Therefore, we apply it directly to log-return data, as it has also been found free of seasonality under uncorrelated noise assumptions with ANOVA as previously mentioned.
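The log-scale regression just described can be illustrated with a simplified stand-in. This sketch is not the Veitch–Abry code of [14]: it swaps in the Haar wavelet for the Daubechies family, weights each octave simply by its number of coefficients, and omits the bias corrections and confidence intervals of the full method. It assumes the slope relation $\alpha = 2H - 1$ for a stationary series, and all names are hypothetical.

```python
import math
import random

def haar_energies(x, max_octave):
    """Return [(j, n_j, mean of squared Haar details at octave j)]."""
    approx = list(x)
    out = []
    s = math.sqrt(2.0)
    for j in range(1, max_octave + 1):
        n = len(approx) // 2
        if n < 2:
            break
        details = [(approx[2 * k] - approx[2 * k + 1]) / s for k in range(n)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / s for k in range(n)]
        out.append((j, n, sum(d * d for d in details) / n))
    return out

def hurst_from_energies(energies, j1, j2):
    """Weighted least-squares slope of log2(energy) against j over octaves j1..j2."""
    pts = [(j, n, math.log2(e)) for j, n, e in energies if j1 <= j <= j2]
    w = sum(n for _, n, _ in pts)
    jm = sum(n * j for j, n, _ in pts) / w
    ym = sum(n * y for _, n, y in pts) / w
    slope = (sum(n * (j - jm) * (y - ym) for j, n, y in pts)
             / sum(n * (j - jm) ** 2 for j, n, _ in pts))
    return (slope + 1.0) / 2.0   # assumes slope = 2H - 1

# Sanity check on white noise, for which H = 1/2 is expected.
rng = random.Random(2)
noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
energies = haar_energies(noise, 10)
h_hat = hurst_from_energies(energies, j1=3, j2=10)
```

Uncorrelated noise has a flat energy profile across octaves, so the fitted slope should be near zero and the estimate near 0.5; persistent series would give a positive slope.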

The Hurst parameter is estimated for the entire data set, together with a 95% CI, by using log-returns in 1, 5 and 10 min granularity, for which 15, 13, and 12 octaves are available, respectively. These octaves are used for $j_2$, and $j_1$ is taken as 7, 6 and 5, respectively. Time scales smaller than 1 min, in particular seconds, yield Hurst parameter values less than 1/2, representing negative correlations. This is accounted for by the reactions of buyers and sellers against each other, as observed by other studies as well, for example, [3, 15]. When the time scale is increased to 1 min, the reactional up and down movements are replaced by a more persistent price variability. On the other hand, if time intervals longer than 10 min are used, the CIs are large, because we have shorter sequences of aggregate data. Therefore, Hurst parameter estimation is performed only for 1, 5 and 10-min intervals. The estimates and the CIs are 0.546 [0.521, 0.572], 0.591 [0.548, 0.634], and 0.596 [0.553, 0.639], respectively, which indicate that the Hurst parameter is significantly greater than 0.5. A Hurst parameter around 0.6 has also been reported for several other price indexes or spot prices, for example, [1, 16].

To check whether the Hurst parameter varies with respect to the years, we analyze 3 years separately as given in Table II. The Hurst parameter estimates for 2008 are significantly less than those for 2007 and 2009. The estimates for 2007 and 2008 do not even lie in the CI of 2009. This is considered as a signal of non-stationary data, and the analysis is repeated for 1/2, 1/4, 1/8 and 1/16 of the data [16] to detect the non-stationary part and the reason for this difference.

Table II. Estimates of Hurst parameter and 95% CIs.

          2007                    2008                    2009
1 min     0.622 [0.544, 0.701]    0.421 [0.348, 0.494]    0.530 [0.462, 0.599]
5 min     0.622 [0.528, 0.717]    0.335 [0.249, 0.421]    0.535 [0.455, 0.615]
10 min    0.574 [0.479, 0.669]    0.513 [0.427, 0.600]    0.481 [0.402, 0.561]

We start by dividing the data into halves and repeat the estimation with the same initial octave used in the analysis of the entire set, namely $j_1 = 7$. The first half of the data consists of log-returns between 02/02/2007 and 08/16/2008, whereas the second half corresponds to 08/17/2008 through 12/31/2009; the two halves have close H estimates, especially for the most granular 1-min data. Because there is a more prominent difference in yearly estimates, it is necessary to repeat the analysis for quarters of the data set. The estimated Hurst parameter of the second quarter is found to be significantly lower than the rest when the CIs are considered. For the most granular series of 1-min ticks, the estimates are 0.550, 0.436, 0.558, and 0.495, respectively. We have further split the data into eight parts and pinpointed the part that has the lowest H. It is between 04/07/2008 and 10/08/2008, where the Late-2000s Global Financial Crisis begins. Because differentiation in H starts with four pieces, we proceed with three different stationary parts in the data, namely with respect to years.

The Hurst parameter is directly related to the duration of orders, which is assumed to have a heavy-tailed distribution in the model. We illustrate the analysis for obtaining the parameters of a suitable distribution with August 2008. The data pertaining to the first six business days of a month are used because of the 16-digit storage limitation for ID numbers in the Istanbul Stock Exchange system. The IDs of the orders that have arrived after this period exceed 16 digits. These ID numbers are crucial to link the order and transactions data files, which contain relevant information for calculating the duration. As the system zeroes its counter at the beginning of each month, the analysis is repeated for each month.

Order durations are calculated by taking the difference between the order arrival time and the last transaction time for orders with the same order ID. Figure 3 shows the histogram for session sell orders and the corresponding empirical CDF. The histograms for session buy orders and day orders are similar, except for the fact that the duration of session type orders is shorter on average than that of day orders, as the former are valid in only one session. The session orders form about 74% of all realized orders, which are 47,621 in total. All distributions appear to be heavy-tailed, as predicted by the model. To satisfy the condition (2), a Pareto distribution would be the simplest choice. We have chosen a generalized Pareto distribution, which has more parameters for greater flexibility in the fitting process. It is a built-in function in the distribution fitting tool of the Statistics Toolbox of MATLAB. The density function for the generalized Pareto distribution is given by

$$f(x \mid k, \sigma, \theta) = \frac{1}{\sigma}\left(1 + k\,\frac{x - \theta}{\sigma}\right)^{-1 - \frac{1}{k}} \qquad (6)$$

where $\sigma$ is the scale parameter, $\theta$ is the threshold parameter, and $k$ is the shape parameter, which is effective on the tail in particular. The parameter $\delta$ that we have used in the model coincides with $1/k$ here. Hence, we can find the Hurst index to be $(3 - 1/k)/2$. Table III gives the parameters of the generalized Pareto distribution fitted on the order durations. Here, the threshold parameter is not constrained to be positive in order to fit the tail better. In Figure 3, both the density fit and the fit to the tail of the CDF are provided. The procedures of the "paretotails" function of MATLAB are followed to fit the tail with general location parameter. The duration data for a single month has a length of 35,000. The Kolmogorov–Smirnov statistic is approximately $1.22/\sqrt{n}$ for large $n$, which is 0.0065 for $n = 35{,}000$, and 0.04 for $n = 1000$. The fit is significant with a P-value less than 0.05 for shorter sequences in the order of 1000.

Figure 3. Histogram and CDF of the duration of session type sell orders together with a generalized Pareto fit. [Two panels against the duration of session type sell orders (in sec): density histogram with the fitted generalized Pareto distribution, and empirical cdf with the Pareto tail.]

Table III. Estimates for generalized Pareto distribution fit to order durations.

Orders               k           \sigma     \theta   H
Day type, buy        0.695319    1359.58    -600     0.781
Day type, sell       0.572437    1103.84    -480     0.627
Session type, buy    0.643222    266.13     -120     0.723
Session type, sell   0.616000    215.81     -90      0.688

From Table III, we see that the tail parameter estimates of the order duration distributions yield larger values for the Hurst parameter than those found by estimation from the price series given above. Such discrepancy between the real data and an idealized model is not unusual. On the other hand, it is consistent that the order realization times have a heavy-tailed distribution, just as the model predicts. The Hurst parameter is estimated from order durations in each month after August 2008 as well. A Pareto tail is fitted so that H takes values in $(0.5, 1)$ as previously mentioned. The results are similar, and H ranges from 0.63 to 0.81.
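As a small worked check of the relation between the fitted shape parameter and the Hurst index, the sketch below recomputes the last column of Table III from the reported $k$ values via $\delta = 1/k$ and $H = (3 - \delta)/2$, and evaluates the $1.22/\sqrt{n}$ threshold quoted for the Kolmogorov–Smirnov statistic. The function names are hypothetical.

```python
import math

def hurst_from_shape(k):
    """Hurst index implied by the generalized Pareto shape k, with delta = 1/k."""
    delta = 1.0 / k
    return (3.0 - delta) / 2.0

def ks_threshold(n, c=1.22):
    """Large-sample Kolmogorov-Smirnov threshold c / sqrt(n) quoted in the text."""
    return c / math.sqrt(n)

# Shape estimates from Table III.
shapes = {
    "day, buy": 0.695319,
    "day, sell": 0.572437,
    "session, buy": 0.643222,
    "session, sell": 0.616000,
}
hurst = {name: round(hurst_from_shape(k), 3) for name, k in shapes.items()}
thresholds = (ks_threshold(35000), ks_threshold(1000))
```

Since $1 < \delta < 2$ in the model, only shape values with $1/2 < k < 1$ map into the range $(0.5, 1)$ for H, which is the case for all four fits.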

3.3. Estimation of the effect function and the effect rate

In this section, we exploit the same data set used in the estimation of the duration distribution for demonstrating the estimation of the effect function and effect rate. We do not account for orders that are fulfilled immediately upon arrival, as they are realized instantaneously, and have no effect on the price according to our model. If incorporation of these into the model is desired, then a more general jump process should be constructed to take into account the instantaneous jumps. Out of 28,357 orders, 76% had no transaction for a while and were then realized all at once, whereas only 24% were transacted in parts throughout a session.

Because our price model does not account for orders that are transacted completely at once after a waiting period, we propose an exponential function that fits such dynamics best. It is given by

$$f(x) = \frac{\exp(ax) - 1}{\exp(a) - 1} \tag{7}$$

where the parameter $a > 0$ models the effect rate nonlinearly. In this way, we represent the jump at the last moment with the closest continuous function, by choosing $a$ as large as possible. For a given order, the change in time of the amount that is transacted is represented by this function. The standard form of the function $f$ will be shifted according to the order arrival time and order duration, and will be scaled with the effect parameter $r$, as dictated by the model. Here, we will use a single parameter $r$ to convert the amount units to monetary units. Nonetheless, in the end, this parameter will be multiplied by the order amount, and will take a different value for each order.

It is evident that our approach here alters the original model for the purpose of representing the behavior in real data. We have distinguished the orders, not by $r$, which is a parameter that appears linearly in the model, but by the random variable $a$. By using the lsqcurvefit function in MATLAB, we have fitted $f$ to the cumulative transaction function that is formed by the realization of orders. For this, upon dividing by the total number of transactions, we have scaled the observed function to the interval [0,1]. Furthermore, we have shifted its interval of definition to the interval [0,1] by scaling with the transaction duration. Hence, the observed function and the model function have the same domain and the same range. The order dynamics and the fitted effect function, where $a$ is estimated to be 1.2542, are shown in Figure 4.
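The fit described above relied on MATLAB's lsqcurvefit. As a rough, dependency-free illustration of the same idea, the sketch below fits the single parameter $a$ of Equation (7) to a normalized cumulative transaction curve by minimizing the sum of squared errors with a golden-section search. This search method is a substitute chosen for brevity, not the algorithm used by lsqcurvefit, and it assumes the error has a single minimum on the search interval. The data are synthetic (generated from $a = 1.2542$), and all function names are hypothetical.

```python
import math


def effect_function(x, a):
    """Equation (7): f(x) = (exp(a*x) - 1) / (exp(a) - 1) for x in [0, 1], a > 0."""
    # expm1 keeps precision for small a. exp(a) overflows double precision
    # slightly above a = 709, which is in line with an upper limit of 700.
    return math.expm1(a * x) / math.expm1(a)


def sum_squared_error(a, xs, ys):
    """Sum of squared differences between f(x, a) and the observed curve."""
    return sum((effect_function(x, a) - y) ** 2 for x, y in zip(xs, ys))


def fit_effect_rate(xs, ys, a_min=1e-6, a_max=700.0, tol=1e-9):
    """Golden-section search for the a in [a_min, a_max] with the least squared error.

    Assumption of this sketch: the error is unimodal in a on the interval.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = a_min, a_max
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc = sum_squared_error(c, xs, ys)
    fd = sum_squared_error(d, xs, ys)
    while hi - lo > tol:
        if fc < fd:
            # The minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = sum_squared_error(c, xs, ys)
        else:
            # The minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = sum_squared_error(d, xs, ys)
    return (lo + hi) / 2.0


# Synthetic, noise-free curve on the scaled interval [0, 1].
scaled_times = [i / 20.0 for i in range(21)]
scaled_quantities = [effect_function(x, 1.2542) for x in scaled_times]
a_hat = fit_effect_rate(scaled_times, scaled_quantities)
```

For an order that is transacted all at once at the end of its duration, the observed curve is close to a step at $x = 1$, so the squared error keeps decreasing as $a$ grows and the search ends at the upper bound.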

Figure 5 shows the frequency distribution of $a$ estimated from the data. The accumulation observed around the value 700 corresponds to the orders that are all transacted at once, because the upper limit that we use in the least squares method is 700. This is even steeper than the $a = 100$ curve of Figure 4. The numeric precision in MATLAB limits us to 700 at most. The accumulation here is in accordance with the fact that 76% of the 28,357 orders are transacted all at once rather than piecewise. We cannot fit a single distribution because the plot in Figure 5 is bimodal. By considering the orders that are realized at the last moment separately, a heavy-tailed distribution such as Pareto can be found suitable for the first part. Another option is to draw random numbers from the observed distribution via the bootstrap method instead of fitting a parametric distribution.
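The bootstrap option mentioned above amounts to sampling with replacement from the estimated values of $a$. A minimal sketch follows; the list of observed values is hypothetical and only mimics the bimodal shape (a few gradual orders and a cluster at the cap of 700), and the function name is illustrative.

```python
import random


def bootstrap_effect_rates(observed_a, size, seed=None):
    """Draw `size` values of a with replacement from the observed estimates."""
    rng = random.Random(seed)
    return rng.choices(observed_a, k=size)


# Hypothetical estimates of a; not taken from the data set of the paper.
observed_a = [1.2542, 3.1, 8.7, 42.0, 700.0, 700.0, 700.0, 700.0]
resampled_a = bootstrap_effect_rates(observed_a, size=1000, seed=0)
```

Sampling from the empirical distribution preserves the bimodality automatically, at the cost of never producing a value of $a$ that was not observed.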


Figure 4. An order realization curve with estimated $a = 1.2542$. [Axes: (scaled) duration on the x-axis, (scaled) quantity on the y-axis.]

Figure 5. Distribution of parameter $a$. [Axes: estimate of $a$ on the x-axis, frequency on the y-axis.]

On the basis of the model that is written in the form (4), the effect of order $i$ can be shown by

$$r_i u_i f\left(\frac{t - s_i}{u_i}\right) \tag{8}$$

where $s_i$, $u_i$, $r_i$ denote the realizations of the arrival time $S_i$, the duration $U_i$, and the effect rate $R_i$, respectively. Note that the effect is directly proportional to the transaction duration. When fitting the effect function, we have first normalized it via division by total transaction quantity. This implies that $r_i$ not only serves the purpose of converting the transaction duration into monetary units in (8) but also has to include the effect of the quantity of transaction $i$, denoted by $q_i$. Therefore, we define $r_i = r q_i$ and estimate $r$ as a single conversion parameter. So, the overall effect of an order $i$ upon conversion into monetary units is given by $r_i u_i = r q_i u_i$. Thinking of buy-orders as positive, and sell-orders as negative, $r_i$ becomes a real-valued random variable. With the function $f$ that we have fitted, the intermediate values of the change of the effect over the transaction period of a valid order can be obtained.
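To make the roles of $s_i$, $u_i$, $q_i$, $r$ and $a$ concrete, here is a small Python sketch of the contribution of a single order at time $t$ according to (8) with $r_i = r q_i$. The clamping outside the interval $[s_i, s_i + u_i]$ (zero before arrival, full effect after completion) is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def effect_function(x, a):
    """Equation (7): f(x) = (exp(a*x) - 1) / (exp(a) - 1) for x in [0, 1]."""
    return math.expm1(a * x) / math.expm1(a)


def order_effect(t, s, u, q, r, a):
    """Contribution r*q*u*f((t - s)/u) of one order at time t, as in (8).

    s: arrival time, u: duration, q: signed quantity (buy > 0, sell < 0),
    r: conversion parameter, a: effect rate of Equation (7).
    Assumption: the contribution is 0 before arrival and stays at r*q*u
    once the order has been completed.
    """
    x = (t - s) / u
    x = min(max(x, 0.0), 1.0)  # clamp to the domain [0, 1] of f
    return r * q * u * effect_function(x, a)
```

For example, an order with $s = 1$, $u = 2$, $q = 3$ and $r = 0.5$ contributes nothing at $t = 0$ and its full effect $r q u = 3$ from $t = 3$ onward; with a large $a$ almost all of that effect appears just before completion.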

It is relevant to check the independence of the random variables effect rate $R$ and duration $U$ at this stage. From the earlier discussion, we need to consider the quantity $Q$ instead of $R$ as the only randomness in the effect rate. In Figure 6, a scatterplot of $U$ and $Q$ is given. There is no significant dependence between the variables, as is also evident from the estimated correlation coefficient $-0.0732$.
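The correlation coefficient used in this check is the usual sample (Pearson) correlation. A minimal sketch is given below with hypothetical duration and quantity values, not the data of the paper. Note that a correlation near zero rules out linear dependence only, so it supports rather than proves independence.

```python
import math


def pearson_correlation(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical order durations (sec) and quantities, for illustration only.
durations = [120.0, 900.0, 45.0, 3000.0, 600.0]
quantities = [5000.0, 1200.0, 800.0, 2500.0, 10000.0]
sample_correlation = pearson_correlation(durations, quantities)
```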

All transaction amounts are given in the histogram in Figure 7, where buy-orders are positive, and sell-orders are negative. When a normal distribution is fitted to this histogram, the mean is found to be 1039.7 and the standard deviation 35,442. In the calculations later, we continue with a normal distribution with mean 0.1 and standard deviation 3.54 in order to work with small numbers and keep track of the scaling 1/10,000.


Figure 6. Scatterplot of quantity versus duration. [Axes: duration (sec) on the x-axis, quantity on the y-axis.]

Figure 7. The distribution of order quantity; buy (positive) and sell (negative) orders. [Axes: quantity on the x-axis, frequency on the y-axis.]

Let $A$ be the set of indices of all buy-orders on a specified day, and let $S$ be the set of indices of all sell-orders. Let $q_i$ be the transaction magnitude of order $i$. Because all transactions end at the end of the day, for simplicity, we can use the transaction magnitudes of all buy and sell-orders during an entire day to arrive at a common $r$ value. Assuming that buy-orders increase the price, and sell-orders decrease the price, at the end of the day, we have the following relation:

$$\sum_{i \in A} r q_i u_i - \sum_{i \in S} r q_i u_i = r \left[ \sum_{i \in A} q_i u_i - \sum_{i \in S} q_i u_i \right] \propto \log P_1 - \log P_0. \tag{9}$$

In the relation shown earlier, $P_0$ and $P_1$ stand for the opening and closing prices for a day, respectively. We can interpret the left hand side of (9) as the net effect of buy and sell-orders on the price. Because the cumulative effect of orders is considered, the intermediate values of the effect function $f$ do not have any role in this calculation. For the six-day period from which data are drawn, we calculate the terms on the left hand side and right hand side of (9) excluding $r$, and estimate $r$ from the slope by fitting a least squares line.

For the six-day period, when the values of $\sum_{i \in A} q_i u_i - \sum_{i \in S} q_i u_i$ are plotted on the x-axis, and the daylong logarithmic change in the price is plotted on the y-axis, we have not been able to observe the predicted linear relation. The effect of the transactions is not reflected on the instantaneous prices on that day. Moreover, the slope of the regression line is not significantly different from 0, and the correlation coefficient is very low. Therefore, we have plotted the right hand side of (9) with a delay of 1 and 2 days, suspecting that the effect may be reflected on the prices after a certain period of time. In these calculations, we have scaled the quantity with 1/10,000 and duration with 1/3600 to maintain the numerical precision used by the computer. The 2-day shifted price change is shown in Figure 8. Here, a positive linear relationship with a high correlation coefficient of 0.9258 is observed. The conversion factor $r$ is found to be $1.93 \times 10^{-5}$. Although it


Figure 8. Daily log-return versus daily net contribution with delay of two days. [Axes: daily net contribution, with delay of two days, on the x-axis; daily log-return on the y-axis.]

is a very small number, it is significant, with a p-value of 0.008. As a result of the regression, the y-intercept is found to be $-0.013$. The associated p-value is 0.03. Even though it is statistically significant to some degree, this parameter will be taken to be 0 in order for $r$ to act as a multiplication factor. A shift in time does not fundamentally change the nature of the model, but just means that the price sequence will emerge as a result of the transactions only after a certain time delay.
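The estimation of $r$ described here is an ordinary least squares fit of the daily log-return on the daily net contribution, with the returns shifted by a number of days. The sketch below illustrates the mechanics with hypothetical numbers that are constructed so that the return two days later equals $2 \times 10^{-5}$ times the net contribution; they are not the data of the paper, and the function names are illustrative.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lagged_pairs(net_contributions, log_returns, lag):
    """Pair the net contribution of day d with the log-return of day d + lag."""
    if lag == 0:
        return list(net_contributions), list(log_returns)
    return list(net_contributions[:-lag]), list(log_returns[lag:])


# Hypothetical daily net contributions (scaled quantity times scaled duration).
net_contributions = [-800.0, 300.0, 1500.0, -200.0, 900.0, 100.0]
# Hypothetical daily log-returns, built to respond with a delay of two days.
log_returns = [0.0, 0.0] + [2e-5 * x for x in net_contributions[:4]]

xs, ys = lagged_pairs(net_contributions, log_returns, lag=2)
r_hat, intercept = least_squares_line(xs, ys)
```

Shifting by two days removes two of the six daily observations, so such a regression rests on very few points; this is worth keeping in mind when reading the reported p-values.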

Similar results are obtained for the rest of the data, in which the estimates of $r$ range from $7.9 \times 10^{-8}$ to $4.7 \times 10^{-5}$. To be used at the fitting stage, we assume that the factor $r_i$ has a normal distribution, because we have previously observed that the quantity distribution is normal. We work back from the scalings previously mentioned and recover the correct parameters. In view of the scaling for the amount, we had calculated the mean to be 0.1 and the standard deviation to be 3.54. Because the scaling for time is equivalent to converting seconds into hours, we can assume that the coefficients $r_i$ come from a normal distribution with mean $(1.93 \times 10^{-5})(0.1) = 1.93 \times 10^{-6}$ and standard deviation $(1.93 \times 10^{-5})(3.54) = 6.8322 \times 10^{-5}$, and that time is measured in units of hours. If one measures time in units of seconds, one needs to make the mean and the standard deviation even smaller. The reason for obtaining numbers as small as those previously mentioned is that the effect at the end is proportional to the logarithm of the price.
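The unit bookkeeping in this paragraph can be retraced in a few lines. The sketch below recomputes the per-hour mean and standard deviation of $r_i$ and then divides by 60 to express them per minute. The division by 60 is an assumption of this sketch about how the change of time unit is applied; it reproduces the values $3.2 \times 10^{-8}$ and $1.14 \times 10^{-6}$ that are used as effect rate parameters in the simulations of Section 4.2.

```python
R_HAT = 1.93e-5        # conversion factor from the 2-day delayed regression
QUANTITY_MEAN = 0.1    # mean of the quantity after scaling by 1/10,000
QUANTITY_STD = 3.54    # standard deviation of the quantity after the same scaling

# Parameters of the normal distribution of r_i with time measured in hours.
mean_per_hour = R_HAT * QUANTITY_MEAN
std_per_hour = R_HAT * QUANTITY_STD

# Assumption: one hour is 60 minutes, so the per-minute rates are 60 times smaller.
mean_per_minute = mean_per_hour / 60.0
std_per_minute = std_per_hour / 60.0
```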

4. Simulation and validation

4.1. Simulation method

The simulation method is based on the approximation of the series (4). The only difficulty is that the time has an infinite past. In order to reflect long-range dependence through time, one needs to simulate 10 units of past for every unit of time after 0 in practice. This can be overcome by the simplification obtained from the conditional distribution with a Pareto distribution [17]. The Pareto probability density function is given by $g(x) = \delta \theta^{\delta} x^{-\delta - 1}$ for $x > \theta$, where $\theta > 0$ is the threshold parameter and $\delta$ is the shape parameter. Several threshold parameters are considered for simulating orders arriving before time 0. For example, among the orders that arrive in period $[-t, -t + 1]$ with $-t < 0$, only those durations that exceed $t$ will be effective, so it is enough to create only the orders with long durations by using a threshold parameter of $t$ [17].
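The simplification rests on the fact that a Pareto duration, conditioned on exceeding a level $t$ at or above the threshold, is again Pareto with the same shape and threshold $t$. The sketch below draws such durations by inverting the distribution function and also computes the thinned arrival rate of orders whose duration exceeds $t$. It is an illustration under these stated facts, not the implementation of [17]; the function names are hypothetical, and the numerical values (arrival rate 18 per minute, threshold 8.15 min, shape 1.76) are the simulation parameters quoted in Section 4.2.

```python
import random


def sample_pareto(threshold, delta, rng):
    """Inverse-CDF draw from g(x) = delta * threshold**delta * x**(-delta - 1), x > threshold."""
    u = rng.random()  # uniform on [0, 1), so 1 - u lies in (0, 1]
    return threshold * (1.0 - u) ** (-1.0 / delta)


def long_order_rate(arrival_rate, t, delta, base_threshold):
    """Arrival rate of orders whose Pareto duration exceeds t (Poisson thinning)."""
    survival = min(1.0, (base_threshold / t) ** delta)
    return arrival_rate * survival


def sample_past_durations(t, count, delta, base_threshold, rng):
    """Durations of `count` orders arriving in [-t, -t + 1] that last longer than t."""
    threshold = max(t, base_threshold)
    return [sample_pareto(threshold, delta, rng) for _ in range(count)]


rng = random.Random(0)
# Orders arriving 20 minutes before time 0 that are still relevant afterwards.
past_durations = sample_past_durations(
    t=20.0, count=5, delta=1.76, base_threshold=8.15, rng=rng
)
```

Because the thinned rate decays like $t^{-\delta}$, only a small number of orders needs to be generated for the distant past, which is what makes the infinite past tractable in practice.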

4.2. Simulation results

Because the year 2008 was the year of crisis, simulations are performed on the basis of the parameters estimated for 2007 and 2009, which were more stable years compared with 2008. The price series for these years are shown in Figure 9.

The parameters chosen for the simulation are based on estimations that were performed in previous sections. The parameters common to 2007 and 2009 are $\lambda = 0.3$ s$^{-1}$, equivalently 18 min$^{-1}$, $\theta = 489$ s, equivalently 8.15 min, $a = 50$ for $f$ in (7), and the effect rate parameters, namely a normal distribution with mean $3.2 \times 10^{-8}$ and standard deviation $1.14 \times 10^{-6}$. The shape parameter is estimated as $\delta = 1.76$, equivalently $H = 0.62$, for 2007, and $\delta = 1.94$, equivalently $H = 0.53$, for 2009. Here, we have used a Pareto distribution with shape parameter obtained from Hurst parameter estimations from the price time series with the wavelet method. The threshold parameter is estimated by the smallest observation in the order durations data set. This estimate is close to an estimate that would be obtained from the mean, namely 1010 s, using the moment estimation method. In the latter case, the mean is $\theta\delta/(\delta - 1)$, which can be used to estimate $\theta$ after $\delta$ is set by the Hurst parameter estimates.
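The two conversions used in this paragraph are easy to retrace: the Hurst index follows from the shape parameter through $H = (3 - \delta)/2$, and the moment estimate of the threshold solves the Pareto mean formula for the threshold. A minimal sketch with illustrative function names:

```python
def hurst_from_shape(delta):
    """Hurst index H = (3 - delta) / 2 for a Pareto shape parameter delta in (1, 2)."""
    return (3.0 - delta) / 2.0


def threshold_from_mean(mean_duration, delta):
    """Solve mean = threshold * delta / (delta - 1) for the threshold (needs delta > 1)."""
    return mean_duration * (delta - 1.0) / delta


hurst_2007 = hurst_from_shape(1.76)
hurst_2009 = hurst_from_shape(1.94)
# Moment estimate of the threshold from the mean duration of 1010 s.
moment_threshold_2009 = threshold_from_mean(1010.0, 1.94)
```

With the 2009 shape parameter, the moment estimate comes out at roughly 489 s, which agrees with the value obtained from the smallest observation.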

The process is created by employing minutes as time units in the simulation. Then, the price process $Y$ is obtained by using

$$Y(t) = Y(0)\, e^{mt + cZ(t)}$$

which is similar to Equation (1). Here, we do not need the parameter $c$, because the conversion coefficient $r$ that is estimated above converts the product of ‘duration’ and ‘quantity’ into price change directly, and the standard deviation for $Z$ is not fixed to 1. On the other hand, the data analysis showed that $m$ is not significantly different from 0 in our data. That is why we have used $m = 0$ and $c = 1$ in the simulation.
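Given a simulated path of $Z$ on a time grid, the price path follows by exponentiation. The sketch below uses the choices $m = 0$ and $c = 1$ as defaults; the values of $Z$ and the initial price are hypothetical placeholders, since generating $Z$ itself requires the full order simulation.

```python
import math


def price_path(initial_price, times, z_values, m=0.0, c=1.0):
    """Evaluate Y(t) = Y(0) * exp(m*t + c*Z(t)) on a grid of times."""
    return [
        initial_price * math.exp(m * t + c * z)
        for t, z in zip(times, z_values)
    ]


# Hypothetical values of Z at minutes 0, 1, 2, 3 and a starting price in TL.
times = [0.0, 1.0, 2.0, 3.0]
z_values = [0.0, 0.002, -0.001, 0.004]
prices = price_path(5.0, times, z_values)
```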

We have observed that on small scales, the similarity between the simulation results and real data is not sufficient. The variation in the simulated series at small scales is less than that in the real price series. In order to balance this, the standard deviation of the effect rate $r$ is taken to be on the order of $10^{-4}$ instead of $10^{-6}$. Afterwards, it is made to fall into the same interval as real data, by exploiting the self-similarity property of the process $Z$, through scaling in space and time. We can see a limitation of the interpretation of each $i$ as an order in the model: the longest time scale is a day in that case, which is compensated by self-similarity scaling. This conversion also compensates for the alteration performed in the standard deviation of $R$. The simulated time series is given in Figure 10. The values of the obtained price sequence vary within an interval of a few TL, as in real data. Simulated and real paths show visual resemblance.

Figure 9. Real price series for 2007 and 2009. [Two panels: price (TL) from February to December 2007, and price (TL) from January to December 2009.]

Figure 10. Simulated trajectories. [Two panels: price (TL) from February to December 2007, and price (TL) from January to December 2009.]


5. Comparison with related models

Because price is generated by the demand of market participants, microeconomic models of interacting agents have been exploited in several models as reviewed in [18]. These are conducted by physicists in collaboration with economists, combining the physicists’ familiarity with computer simulations and the experience of economists, who are more interested in exact mathematical solutions. Their aim is to reproduce the real data characteristics, at least, fat tails and correlations observed in relation to the price series. The interest still continues; see, for example, [19] where trade and order waiting time distributions are considered empirically and by simulations. Our study, which aims at reproducing the empirical findings, long-range dependence in particular, is along the same lines. In [18], the practical significance of the agent-based models is questioned by considering their predictions, which are later confirmed. In the published work, two of the three cases are found to hold. Although this is hard to measure exactly, a good fitting model is promising for predicting future events.

On the probability side, the models not only aim at an exact formulation of the statistical properties but also provide analytical results on option pricing specific to the model. They are based on semi-martingales, Poisson random measures, semi-Markov processes, or fractional Brownian motion. The statistical estimation methods are also studied separately with these models (e.g., [7]). Therefore, an analysis with stochastic processes provides a more complete account of market fluctuations when price process details are considered. Stochastic processes based on Levy processes, fractional Brownian motion and semi-martingales replicate the statistical features of the price data with a few parameters. On the other hand, agent-based stochastic models are superior, as they could have predictive power from measurable features in agent behavior. Therefore, we compare our model with that of [1] as it also includes the agent behavior explicitly. Both are analytical approaches, which are shown to have a limiting Gaussian distribution under certain scaling. They can easily be simulated for further properties, which are hard to show analytically.

In [1], inert agents’ behavior is shown to cause the long-range dependence observed in the log-price series. For each agent, a semi-Markov process is associated with the trading activity. The underlying Markov chain is used to determine the intention of an agent such as sell, buy or stay inactive. The so-called inert agents are assumed to remain in the inactive state for a time period with a heavy-tailed distribution that is sufficient to prove that the scaling limit is a fractional Brownian motion. Estimation or model validation results based on the model of [1] are not available. Their arguments on empirical evidence of inertia are somewhat qualitative. It is known that some traders have long inert periods, but this is not enough to conclude that such periods follow a heavy-tailed distribution.

There are many more parameters to be estimated in the model of [1] as it was not devised from a practical point of view, but from a theoretical perspective. First, an important stage of estimation involves generating $\Psi(t)$, which corresponds to the size of a typical trade at time $t$. Second, the number of investors at any time is fixed, which is not realistic. Estimating this number in a piecewise constant fashion to overcome this problem requires the estimation of such intervals as well. Then, there are the parameters of the Markov chain and sojourn times associated with the semi-Markov process of each agent. These are, at the least, the transition probabilities between the buy, sell and inactive states, and the distribution of the times spent in each state. In comparison, the contributions of our model go beyond reproducing a long-range dependent sequence, or capturing the covariance structure and volatility, as follows. The number of investors varies randomly over time, whereas there is a fixed number of investors at any time in [1]. The number of parameters is neither as small as in idealized limiting models, nor as large as in the semi-Markov model of [1]. The price model is robust, with a spectrum of distributions between a Gaussian limit and a Levy motion, which is possibly skewed. The correlation between trade quantity and price is intrinsic in the model, implying the volume and price correlation observed in real data (e.g., [19]). The correlation between trade waiting times and returns (e.g., [19]) appears implicitly in the model through order durations, which support at least one transaction directly affecting the price.

These properties demonstrate that our model is rich enough to be applied in practice. When the price follows this model, we can provide a formula to identify a martingale measure analytically as in [8] or find the maximum drawdown as a measure of risk by simulations [20]. In brief, our model predicts the stock price with an adequate number of parameters, a robust distribution, and a capacity to predict the prices from the current demand.

6. Conclusions

The results in this paper have two-way implications between the theoretical price process and the statistical properties of real price data. We have noted the following:

• The log-price data show long-range dependence in the scales of 1 min and larger. The Hurst parameter is about 0.6, which is consistent with previous studies.
• The arrival process of orders is Poisson.
• The time until the transaction of an order has a heavy-tailed distribution.

The last two observations are crucial assumptions of the price model used in this paper.

We conclude that the statistical properties of agent-level behavior are reflected on the stock price and can affect the entire process. The price model is suitable for prediction when the parameters are estimated from real data. The methods developed in the present paper can be applied to frequently traded stocks in general. As future work, an analysis of stocks from various sectors can be considered for comparison purposes.

Acknowledgements

This work was supported by TUBITAK Project 109T665. The data were provided by IMKB, Istanbul Stock Exchange Market for research purposes. The authors would like to thank anonymous referees for their valuable comments, which improved the presentation of the results.

References

1. Bayraktar E, Horst U, Sircar R. A limit theorem for financial markets with inert investors. Mathematics of Operations Research 2006; 31:789–810. November.
2. Alfarano S, Lux T, Wagner F. Time variation of higher moments in a financial market with heterogeneous agents: an analytical approach. Journal of Economic Dynamics and Control 2008; 32:101–136.
3. Dacorogna MM, Gencay R, Muller U, Olsen RB, Pictet OV. An Introduction to High Frequency Finance. Academic Press: San Diego, 2001.
4. Cont R, Bouchaud J. Herd behaviour and aggregate fluctuations in financial markets. Macroeconomic Dynamics 2000; 4:170–196.
5. Farmer JD, Joshi S. The price dynamics of common trading strategies. Journal of Economic Behavior and Organization 2002; 49:149–171.
6. Ghoulmie F, Cont R, Nadal J-P. Heterogeneity and feedback in an agent-based market model. Journal of Physics: Condensed Matter 2005; 17:S1259–S1268.
7. Woerner JHC. Analyzing the fine structure of continuous time stochastic processes. Progress in Probability 2010; 63:473–492. Birkhauser.
8. Kluppelberg C, Kuhn C. Fractional Brownian motion as a weak limit of Poisson shot noise processes – with applications to finance. Stochastic Processes and their Applications 2004; 113:333–351.
9. Çağlar M. Stock Price processes with infinite source poisson agents, 2011. submitted, arXiv:1106.6300v1 (Available from: http://arxiv.org/abs/1106.6300v1 [Accessed on 20 November, 2012]).
10. (Available from: http://www.ise.org/Data/StocksData.aspx [Accessed on 20 November, 2012]).
11. Akçay Z. Fractional Brownian motion in finance from arbitrage point of view. MSc thesis, Koc University, August 2007.
12. Jeong H-D J, Lee J-S R, McNickle D, Pawlikowski K. Comparison of various estimators in simulated FGN. Simulation Modelling Practice and Theory 2007; 15:1173–1191.
13. Veitch D, Abry P. A wavelet-based joint estimator of the parameters of long-range dependence. IEEE Transactions on Information Theory 1999; 45:878–897.
14. (Available from: http://www.cubinlab.ee.unimelb.edu.au/~darryl/ [Accessed on 20 November, 2012]).
15. Tsay RS. Analysis of Financial Time Series. Wiley: New York, 2002.
16. Willinger W, Taqqu MS, Teverovsky V. Stock market prices and long-range dependence. Finance and Stochastics 1999; 3:1–13.
17. Çağlar M. Simulation of fractional Brownian motion with micropulses. Advances in Performance Analysis 2004; 3:43–69.
18. Samanidou E, Zschischang E, Stauffer D, Lux T. Agent-based models of financial markets. Reports on Progress in Physics 2007; 70:409–450.
19. Scalas E, Raberto M, Cincotti S. Statistical analysis and agent-based microstructure modeling of high-frequency financial trading. IEEE Journal of Selected Topics in Signal Processing 2012; 6:381–387.
20. Vardar C. On the correlation of the supremum and the infimum and of maximum gain and maximum loss of Brownian motion with drift. Journal of Computational and Applied Mathematics. DOI: 10.1016/j.cam.2013.01.010.
