robust and adaptive algorithms for online portfolio selection
Quantitative Finance
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/rquf20

Robust and adaptive algorithms for online portfolio selection
Theodoros Tsagaris a, Ajay Jasra b & Niall Adams c
a Tudor Capital Europe LLP, UK
b Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546
c Department of Mathematics, Imperial College London, London SW7 2AZ, UK
Published online: 18 Jun 2012.

To cite this article: Theodoros Tsagaris, Ajay Jasra & Niall Adams (2012) Robust and adaptive algorithms for online portfolio selection, Quantitative Finance, 12:11, 1651-1662, DOI: 10.1080/14697688.2012.691175
To link to this article: http://dx.doi.org/10.1080/14697688.2012.691175
Quantitative Finance, Vol. 12, No. 11, November 2012, 1651–1662
Robust and adaptive algorithms for online
portfolio selection
THEODOROS TSAGARIS*†, AJAY JASRA‡ and NIALL ADAMS§
†Tudor Capital Europe LLP, UK
‡Department of Statistics and Applied Probability, National University of Singapore, Singapore 117546
§Department of Mathematics, Imperial College London, London SW7 2AZ, UK
(Received 5 October 2011; in final form 26 April 2012)
We present an online approach to portfolio selection. The motivation is within the context of algorithmic trading, which demands fast and recursive updates of portfolio allocations as new data arrives. In particular, we look at two online algorithms: Robust-Exponentially Weighted Least Squares (R-EWRLS) and a regularized Online minimum Variance algorithm (O-VAR). Our methods use simple ideas from signal processing and statistics, which are sometimes overlooked in the empirical financial literature. The two approaches are evaluated against benchmark allocation techniques using four real data sets. Our methods outperform the benchmark allocation techniques in these data sets in terms of both computational demand and financial performance.
Keywords: Adaptive systems; Quantitative trading strategies; Statistics; Portfolio allocation
JEL Classification: G1, G10, G11
1. Introduction
In portfolio allocation problems, investors aim to optimize the return of the invested capital based on some cost function by allocating a fraction of the capital to a number of different assets. In the long established mean–variance theory (Markowitz 1952) for asset allocation, the fraction of the capital invested in each asset is known as the portfolio weight, and all weights together form a linear combination (portfolio) that is optimal when the expected return of the portfolio is maximized for a fixed level of variance of the portfolio. The approach argues that maximization of expected returns does not guarantee that the portfolio will have the smallest variance. Hence, a trade-off between the expected return and the variance of the portfolio provides a more effective diversification of investors’ funds. Investors are considered risk averse and would prefer the portfolio with the smallest risk when expected returns are equal. Moreover, a portfolio with smaller variance is a desirable attribute, as investors could leverage by increasing the capital allocation, so that the portfolio would achieve higher return on capital.
Although the mean–variance analysis theory initially generated little interest, it is now a mainstream theory whose principles are constantly visited and re-invented. We also wish to clarify that the terms assets and instruments are used interchangeably in this text; both refer to available investment vehicles.

However, in mean–variance optimization it is well known that the portfolio weights can be highly unstable. This is due to the difficulty of estimating expected returns (Merton 1980). As a result, there has been a substantial amount of recent interest in improving estimation procedures, including Jagannathan and Ma (2003), Ledoit and Wolf (2003, 2004), Fabozzi et al. (2007, 2010), Baltutis (2009), DeMiguel and Nogales (2009) and DeMiguel et al. (2009a,b). This work ranges from imposing constraints on the optimization function, to robust portfolio estimation procedures. Whilst these publications are vital for an understanding of portfolio allocation, they are mainly concerned with batch procedures, which require historical observations, as opposed to online techniques, which are equipped with recursive estimation mechanisms. Batch procedures are not necessarily designed to be computationally efficient and to address the streaming nature of financial data, nor to handle the high dimensionality of the available assets for allocation.

*Corresponding author. Email: [email protected]
Quantitative Finance ISSN 1469–7688 print/ISSN 1469–7696 online © 2012 Taylor & Francis
http://www.tandfonline.com
http://dx.doi.org/10.1080/14697688.2012.691175
We approach the asset allocation problem from the algorithmic trading perspective, that is, when investment decisions regarding allocations are taken automatically through investment allocation algorithms as soon as data arrive. Algorithmic trading, otherwise known as automated or systematic trading, refers to the use of algorithms to conduct trading without any human intervention. As an example, in 2006, one-fifth of global equity trading was administered through algorithmic techniques (Keehner 2007). Such transactions are executed within a few milliseconds and any latency can make the difference between a profitable and a loss-making trade. In this context, batch algorithms are unsuitable and we must consider online procedures. One such consideration is the implementation of algorithms relating to portfolio optimization.

In this study we use ideas from mean–variance theory to automate the process regarding portfolio optimization. In particular, we make use of the algebraic link of the classic mean–variance theory to Ordinary Least Squares (OLS) to allocate capital among various assets. We construct these algorithms bearing in mind certain considerations with regard to efficiency of trading and the characteristics of financial data. These algorithms may account for one or more of the following attributes.

• Adaptive: They have the ability to adapt to non-stationary market environments. By dynamically incorporating new information into the portfolio weights, one is likely to improve the financial performance of the resulting algorithms.
• Robust: They are able to counter the adverse effect of outliers in estimation.
• Regularized: They have mechanisms to reduce the high level of noise exhibited in financial data, either through direct regularization or dimensionality reduction techniques.
• Efficient: They are sequential, one-pass methods to suit the nature of the problem, that is, to process information fast in order to exploit investment opportunities as they occur.

The above considerations, together with ideas of asset allocation using regression, enable us to devise suitable techniques for algorithmic trading. We will use these techniques on real data sets and compare them against established and well-documented asset allocation methods.
1.1. Contribution and structure
Online or multi-period portfolio optimization has been investigated in the literature; a non-exhaustive list includes Smith (1967), Helmbold et al. (1998), Frauendorfer and Seide (2000), Li and Ng (2000), Agarwal et al. (2006), Chapados and Bengio (2007) and Kuhn et al. (2009). Montana et al. (2008, 2009) investigated online algorithms for statistical arbitrage trading strategies. Some of the (computationally fast) portfolio optimization techniques originate in computer science/machine learning and they are algorithmically distinct from the standard mean–variance-type procedures that are often found in the empirical finance literature (as exemplified by the above list). As such, one of the main objectives of this article is to bridge the efficient algorithmic techniques found in various disciplines with the long-established portfolio selection literature in finance. The online algorithms are developed here for three reasons.

• For their importance from an applied perspective.
• To cross-fertilize financial ideas with ideas from signal processing, statistics, and computer science and lead to more efficient techniques.
• To illustrate the potential improvements in financial performance.

There are a substantial number of ideas in the listed financial literature that can improve the current allocation techniques. However, they are seldom used in empirical finance and our objective is to provide a simple exposure to these ideas. For example, the constraints typically used in mean–variance problems (e.g., DeMiguel et al. (2009)) correspond to standard Tikhonov regularization and are well-understood in signal processing as helping to guard against instability induced by ill-conditioned matrices. Ill-conditioned matrices are often encountered in mean–variance theory because of the multi-collinearity of asset log-returns, which may lead to rank deficient problems (Hansen 1996). Moreover, as mentioned above, adaptive algorithms have the ability to adapt their estimates to the underlying data and they are naturally more suitable for non-stationary environments, such as those in finance. In the sequel, we construct two algorithms that are related to the batch mean–variance and minimum variance methodology. These two methods use simple ideas from signal processing and statistics to construct fast and robust approaches to portfolio selection.

The paper is structured as follows. Section 2 introduces the mean–variance theory and our online framework. Section 3 develops the computation for our two methods. In section 4 our methods are applied to four real data sets and, finally, in section 5 we conclude the paper by discussing possible avenues for future work.
1.2. Notation and setup
In this paper, the following notation is adopted. All vectors are column vectors, and we denote the transpose by the prime symbol, i.e. $(\cdot')' = \cdot$. The column vector of $d$ ones is written $I_d$ and the $d \times d$ identity matrix is written $I_{d \times d}$. Given a collection of $d$-vectors $x_j, \dots, x_T$, $1 \le j < T$ say, the $(T - j + 1) \times d$ matrix composed of the concatenation of these vectors is written $X_{j:T}$. Denote the Hölder $p$-norm of a $d$-vector $\omega$ by $\|\omega\|_p = \left(\sum_{i=1}^{d} |\omega_i|^p\right)^{1/p}$. The trace of a matrix $A$ is written $\mathrm{tr}(A)$.
2. Portfolio selection
In this section we introduce the problem and describe our framework. The log-returns of $d$ financial instruments are observed at times $1, 2, \dots, T$: $x_1, \dots, x_T$, with $x_n = (x_{1,n}, \dots, x_{d,n})'$ for $n \in \{1, \dots, T\}$. An investor seeks to construct a portfolio by optimally (in some sense) allocating funds to a collection of $d$ instruments.
2.1. Batch portfolio selection
Most portfolio selection problems are stated in a static or batch manner. For completeness we describe the mean–variance theory (Markowitz 1952). Denote the mean vector and covariance matrix of the log-returns as $\mu$ and $\Sigma$, respectively. Then the objective is to solve the problem

$$\max_{\omega} \left\{ \omega'\mu - \tfrac{1}{2}\,\omega'\Sigma\,\omega \right\}, \quad \text{s.t. } \omega' I_d = 1,$$

where $\omega$ is the $d$-vector of portfolio weights. This optimization problem is straightforwardly solved via Lagrange multipliers. In practical situations, the estimated mean and covariance are substituted into the optimization problem, leading to a data-dependent solution. Intrinsically, many of the portfolio optimization problems that are considered in the literature may be written as

$$\max_{\omega} \left\{ f(X_{1:T}; \omega) + \gamma\,[\omega' I_d - 1] \right\} \quad \text{or} \quad \min_{\omega} \left\{ f(X_{1:T}; \omega) + \gamma\,[\omega' I_d - 1] \right\},$$

for some function $f$, Lagrange multiplier $\gamma$ and matrix of log-returns $X_{1:T}$. For example, one of the problems from DeMiguel et al. (2009b),

$$\min_{\omega} \left\{ \omega'\hat{\Sigma}\,\omega + \delta \|\omega\|_1 + \gamma\,[\omega' I_d - 1] \right\},$$

corresponds to a minimum variance portfolio with L1 constraints, where the ‘hat’ notation refers to an estimated quantity. We note that this approach involves constructing a covariance matrix and subsequently computing its inverse to arrive at a solution. As mentioned above, $d$ can be very large and this often leads to computational delays. These computational delays can be detrimental in algorithmic trading, where tick data are streaming and decisions about allocation need to be taken instantly based on the latest information.

Another known portfolio allocation technique that is used throughout this article is the naive strategy, which assigns equal constant portfolio weights to all instruments in the portfolio (i.e. $I_d / d$). This simple allocation technique is of practical importance as it has been shown in an empirical study (DeMiguel et al. 2009b) to outperform many other complicated allocation techniques.
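The Lagrange-multiplier step for the mean–variance problem stated earlier in this section can be written out as a short worked derivation. This is standard algebra rather than a formula quoted from the paper; it writes $\omega$ for the weights, $\mu$ and $\Sigma$ for the mean and covariance, assumes $\Sigma$ is invertible, and uses a multiplier $\gamma$ whose sign convention is chosen here for convenience.

```latex
\begin{align*}
L(\omega,\gamma) &= \omega'\mu - \tfrac{1}{2}\,\omega'\Sigma\,\omega - \gamma\,(\omega' I_d - 1), \\
\frac{\partial L}{\partial \omega} = 0 \;&\Longrightarrow\; \omega = \Sigma^{-1}(\mu - \gamma I_d), \\
\omega' I_d = 1 \;&\Longrightarrow\; \gamma = \frac{I_d'\Sigma^{-1}\mu - 1}{I_d'\Sigma^{-1} I_d}.
\end{align*}
```

Both steps require $\Sigma^{-1}$, which is the inversion cost the text identifies as a bottleneck when $d$ is large.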
2.2. Online portfolio selection
The simple extension studied in this paper is to consider

$$\min_{\omega_n} \left\{ f_n(X_{\tau_n:n}; \omega_n) + \gamma_n\,[\omega_n' I_d - 1] \right\}, \quad \tau_n = 1 \vee (n - W + 1), \quad n \in \{1, \dots, T\}, \qquad (1)$$

where $\gamma_n$ is a Lagrange multiplier and $W$ is a fixed window of data. That is, the parameters are now estimated over a sliding window $W$, rather than using all available data. Note that when $W = 1$, then $\tau_n = n$, i.e. $X_{n:n}$, which is the vector $x_n$. The window is chosen to ensure that our algorithms are of approximately fixed computational complexity per time-step (see section 4.4 for a discussion on window length selection). Note that the larger the sliding window, the more data are used for estimation. Conversely, the smaller the sliding window, the more weight is given to more recent data. Equation (1) includes some interesting special cases, such as

$$f_n(X_{\tau_n:n}; \omega_n) = \| I_W - X_{\tau_n:n}\,\omega_n \|_2^2 + \delta_n \|\omega_n\|_2^2, \qquad (2)$$

which could be considered a sequential ridge-regression, with $\delta_n$ being the regularization parameter. This latter formulation is equivalent to a mean–variance problem (see section 2.1) with L2 constraints (see Britten-Jones (1999) for details). Note also that the function of Helmbold et al. (1998) ($F$ in their notation) also falls into the above framework.

The reason for giving (2) is to provide a link between mean–variance theory and recursive estimation algorithms. As such, we are able to devise recursive asset allocation algorithms through the use of recursive least squares for dealing with streaming data and take advantage of the number of regularization methods developed for regression to deal with the inherent instability of the portfolio solution to estimation error.
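To make the regression link concrete, the ridge special case (2) has a closed-form minimizer once the sum-to-one constraint is set aside. The derivation below is a standard ridge-regression result, not a formula taken from the paper; $X$ abbreviates the windowed returns matrix, $I_W$ is the vector of $W$ ones used as the response, and $\delta_n$ is the regularization parameter.

```latex
\begin{align*}
\nabla_{\omega_n}\!\left( \| I_W - X\omega_n \|_2^2 + \delta_n \|\omega_n\|_2^2 \right)
  &= -2\,X'(I_W - X\omega_n) + 2\,\delta_n\,\omega_n = 0 \\
\Longrightarrow \quad \omega_n &= \left( X'X + \delta_n I_{d \times d} \right)^{-1} X' I_W .
\end{align*}
```

The term $\delta_n I_{d \times d}$ keeps the matrix invertible even when $X'X$ is rank deficient, which is the stabilizing effect of regularization discussed in the introduction.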
2.2.1. Objective functions. The first case we propose is

$$f_n(X_{\tau_n:n}; \omega_n) = \sum_{i=1}^{n} \lambda^{n-i}\,\rho(r_i(\omega_n)), \qquad (3)$$

with $W$ equal to the size of all available observations, $\rho: \mathbb{R} \to \mathbb{R}^{+}$ differentiable and $r_i(\omega_n) = \sum_{j=1}^{d} \left[ (1 - x_{j,i}\,\omega_{j,n}) / \sigma_{j,i} \right]$. The parameter $\sigma$ is a scale parameter estimate that is used to standardize the residual error $(1 - x_{j,i}\,\omega_{j,n})$; we use a robust scale parameter defined later in section 3.1.1. The parameter $\lambda$ is a forgetting factor; this is a well-known tool in adaptive filtering (e.g., Haykin (1996)). The choice of 1 in $r_i(\omega_n)$ follows the work of Britten-Jones (1999). A heuristic explanation is as follows: setting the response variable equal to a positive constant implies that our portfolio is minimized against an ideal portfolio that has positive returns for each time step and is riskless (a vector constant has zero variance).

The objective function (3) corresponds to a sequential form of M estimation (see, e.g., Deng (2008) for related ideas). Equation (3) follows the recent trend in portfolio optimization of using robust statistical procedures to estimate parameters of interest (see, e.g., DeMiguel and Nogales (2009)). For reasons that will become apparent, the approach associated with (3) is termed robust-exponentially weighted recursive least squares (R-EWRLS).
The second case is

$$f_n(X_{\tau_n:n}; \omega_n) = \tfrac{1}{2} \left[ \omega_n' \left( X_{\tau_n:n}'\,F_{\lambda_n}\,X_{\tau_n:n} \right) \omega_n + \delta_n \|\omega_n\|_2^2 \right], \qquad (4)$$

where $F_\lambda = \mathrm{diag}(\lambda^{W}, \lambda^{W-1}, \dots, 1)$. The task of estimating the parameters $\lambda$ and $\delta$ is discussed later in section 3.2.1. This corresponds to an online minimum-variance-type algorithm with L2 constraints (termed online minimum-variance (O-VAR) throughout). The matrix $F_\lambda$ introduces a forgetting factor into the optimization scheme. The estimated second moment is used instead of the covariance for computational reasons; we did not find a substantial discrepancy (in terms of financial performance) when compared with using the covariance matrix. Note that a more standard recursive estimate can be obtained using the function

$$f_n(X_{1:n}; \omega_n) = \tfrac{1}{2} \left[ \omega_n' (x_n' x_n)\,\omega_n + \delta_n \|\omega_n - \omega_{n-1}\|_2^2 \right],$$

but it is not considered here due to the relationship of (4) to the standard minimum-variance approach.

The batch version of (4) was studied by DeMiguel et al. (2009b). The L2 constraints correspond to the L2 distance from the naive allocation strategy. The naive approach to allocation bypasses estimation of the sample mean and one would expect relatively stable portfolio weights.

Note that, for both procedures, there are unknown parameters $\lambda$, $\delta$ and $\sigma$. The next section discusses how these parameters may be set, in addition to a recursive formulation of the proposed optimizations.
3. Updating schemes
In this section we introduce our recursive updating approaches. This section is core to the development of the adaptive allocation algorithms as it formulates efficient regression techniques appropriate for the nature of algorithmic trading.
3.1. R-EWRLS
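Let us introduce some notation:

$$\tilde{x}_n = (x_{1,n}/\sigma_{1,n}, \dots, x_{d,n}/\sigma_{d,n})', \qquad \eta_n = \sum_{j=1}^{d} \frac{1}{\sigma_{j,n}}, \qquad q(x) = \frac{1}{x}\,\frac{d\rho}{dx}(x).$$

Then, ignoring the Lagrange multiplier (the result can be renormalized), we minimize (3). Differentiating, it follows that the optimal $\omega_n$ solves

$$\sum_{i=1}^{n} \lambda^{n-i}\,q(r_i(\omega_n))\,\eta_i\,\tilde{x}_i = \sum_{i=1}^{n} \lambda^{n-i}\,q(r_i(\omega_n)) \left[ \sum_{j=1}^{d} \frac{x_{j,i}}{\sigma_{j,i}}\,\omega_{j,n} \right] \tilde{x}_i.$$

Since this equation is often nonlinear, we use the approximation $r_n(\omega_n) = r_n(\omega_{n-1})$, with $\omega_{n-1}$ given (i.e. by the previous step, or by initialization). Now, let $z_n$ denote the L.H.S. and $\Psi_n = \sum_{i=1}^{n} \lambda^{n-i}\,q(r_i(\omega_n))\,\tilde{x}_i \tilde{x}_i'$, then we solve

$$z_n = \Psi_n\,\omega_n.$$

As $\Psi_n = \lambda \Psi_{n-1} + q(r_n(\cdot))\,\tilde{x}_n \tilde{x}_n'$, and writing $P_n = \Psi_n^{-1}$, it follows via the Sherman–Morrison formula (e.g., Haykin (1996)) that

$$P_n = \lambda^{-1} P_{n-1} - \lambda^{-1} \kappa_n\,\tilde{x}_n' P_{n-1},$$

with

$$\kappa_n = \frac{q(r_n(\omega_{n-1}))\,\lambda^{-1} P_{n-1}\,\tilde{x}_n}{1 + q(r_n(\omega_{n-1}))\,\lambda^{-1}\,\tilde{x}_n' P_{n-1}\,\tilde{x}_n}.$$

Using $z_n = \lambda z_{n-1} + q(r_n(\omega_n))\,\eta_n\,\tilde{x}_n$, we thus have the recursion

$$\omega_n = \omega_{n-1} + q(r_n(\omega_{n-1}))\,\eta_n P_n\,\tilde{x}_n - \kappa_n\,\tilde{x}_n'\,\omega_{n-1}.$$

We have presented a recursive least-squares procedure whose algebraic equivalence to the Kalman filter is well known and understood (see chapter 12 of Sayed (2003)). It should be remarked that related ideas have appeared in the work of Cipra and Romera (1991) and our approach is similar to robust filters (Masreliez 1975, Martin 1979, Schick and Mitter 1994).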
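The recursion of this section can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it takes the loss to be log cosh (the choice reported in section 4.2), expects the scale estimates `sigma` to be supplied by the caller, assumes `P` is symmetric, and uses the Sherman–Morrison identity that the weighted gain equals `q * P_n * x_tilde`, so the weight update collapses to the previous weights plus the gain times the residual. The function and variable names are hypothetical.

```python
import math

def q_weight(r):
    """Weight q(r) = rho'(r) / r for rho(r) = log(cosh(r)); tends to 1 as r -> 0."""
    return 1.0 if abs(r) < 1e-8 else math.tanh(r) / r

def rewrls_step(omega, P, x, sigma, lam):
    """One recursive update. Returns (new weights, new P, weight q used)."""
    d = len(x)
    xt = [x[j] / sigma[j] for j in range(d)]            # scaled returns
    eta = sum(1.0 / s for s in sigma)                   # sum_j 1 / sigma_j
    r = eta - sum(xt[j] * omega[j] for j in range(d))   # residual at previous weights
    q = q_weight(r)
    Px = [sum(P[i][j] * xt[j] for j in range(d)) for i in range(d)]
    denom = 1.0 + q * sum(xt[i] * Px[i] for i in range(d)) / lam
    kappa = [q * Px[i] / (lam * denom) for i in range(d)]   # gain vector
    # P_n = (P_{n-1} - kappa * xt' P_{n-1}) / lam, with xt' P = (P xt)' for symmetric P
    P_new = [[(P[i][j] - kappa[i] * Px[j]) / lam for j in range(d)] for i in range(d)]
    omega_new = [omega[i] + kappa[i] * r for i in range(d)]
    return omega_new, P_new, q

# Hypothetical usage with two assets and a diffuse initial P
omega, P = [0.5, 0.5], [[100.0, 0.0], [0.0, 100.0]]
omega, P, q = rewrls_step(omega, P, x=[0.01, -0.02], sigma=[0.01, 0.015], lam=0.98)
```

Each step costs on the order of d squared operations and never forms a matrix inverse, which is the property that makes the scheme suitable for streaming data. The returned weights would still need to be renormalized to sum to one.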
3.1.1. Robust recursive scale estimate. The calculation of the scale parameter $\sigma$ is now detailed. Our approach uses robust statistics. First, we note that the Median Absolute Deviation (MAD) (e.g., Huber (2004)) estimate of scale is given by

$$\mathrm{MAD}_V(X_{n-V+1:n}) = \mathrm{med}_j\left( | x_{i,j} - \mathrm{med}_l(x_{i,l}) | \right), \quad j, l \in \{1 \vee n - V + 1, \dots, n\}, \quad i \in \{1, \dots, d\},$$

where $V$ is a chosen data window and $\mathrm{med}(\cdot)$ is the median function. Recent research has pointed to efficient techniques to compute the median with $O(V)$ average complexity using recursive binning schemes (Tibshirani 2008).

Second, an exponentially weighted recursive median absolute deviation (EWMAD) estimator is considered,

$$\hat{\sigma}^{(\mathrm{med})}_{i,n} = \alpha\,\hat{\sigma}^{(\mathrm{med})}_{i,n-1} + c\,(1 - \alpha)\,\mathrm{med}_j\left( | x_{i,j} - \hat{\mu}^{(\mathrm{med})}_{i,n} | \right), \quad j \in \{1 \vee n - V + 1, \dots, n\},$$

where $\alpha$ is another forgetting factor and $\hat{\mu}^{(\mathrm{med})}_{i,n}$ is an EWMED (Exponentially Weighted Recursive Median), given by

$$\hat{\mu}^{(\mathrm{med})}_{i,n} = \alpha\,\hat{\mu}^{(\mathrm{med})}_{i,n-1} + (1 - \alpha)\,\mathrm{med}_j(x_{i,j}), \quad j \in \{1 \vee n - V + 1, \dots, n\},$$

where $c = 1/\Phi^{-1}(3/4) \approx 1/0.6745$ is a correction factor to make MAD consistent with the normal distribution (e.g., Huber (2004)). The EWMED is similar to the well-documented EWMA (e.g., Hamilton (1994)) with the only difference being that the EWMED estimator replaces the latest information $x_n$ by its median estimate over the sliding window. On the basis of much preliminary investigation on specific data sets, we have arbitrarily set $V = 20$ and $\alpha = 0.99$ for all of the applications. Due to the robust nature of the above estimation, this method is termed robust-exponentially weighted recursive least squares.
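The two recursions above can be illustrated with a minimal sketch for a single asset. The function name, argument names and the symbol `alpha` for the second forgetting factor are assumptions made for this example; the constant 0.6745 and the default value 0.99 come from the text.

```python
from statistics import median

C = 1.0 / 0.6745  # correction factor, approximately 1 / Phi^{-1}(3/4)

def ewmad_update(window, mu_prev, sigma_prev, alpha=0.99):
    """One EWMED / EWMAD step for one asset.

    window     : the most recent V log-returns of the asset
    mu_prev    : previous exponentially weighted median
    sigma_prev : previous exponentially weighted MAD scale
    """
    mu = alpha * mu_prev + (1.0 - alpha) * median(window)
    mad = median(abs(x - mu) for x in window)   # deviations around the new EWMED
    sigma = alpha * sigma_prev + C * (1.0 - alpha) * mad
    return mu, sigma

# Hypothetical usage: the outlier 100 barely moves either estimate
mu, sigma = ewmad_update([1.0, 2.0, 3.0, 4.0, 100.0], mu_prev=3.0, sigma_prev=1.0)
```

Because both updates go through a median, a single extreme return inside the window has little influence on the scale, which is what keeps the standardized residuals meaningful when outliers occur.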
3.1.2. Dealing with noisy data. As discussed above, financial data are inherently noisy and exhibit a high degree of dependence. The noise hampers the ability to forecast accurately, and the dependence structure of assets accentuates the problem, as pointed out in the introduction; this is via the instability of portfolio weights caused by potential rank deficiency. To alleviate these problems we adopt a low rank matrix approximation of $X_{\tau_n:n}$, $W > 1$, in order to eliminate those components of the data that contain most of the noise. This approach aims to optimally approximate, with respect to some norm, a matrix by one of lower rank while retaining the same dimension. It is well known that the best low rank approximation under the Frobenius norm can be found by Singular Value Decomposition (SVD) (see, e.g., Stewart (1993)). The approach is as follows.

Let $1 \le \tilde{r} < n \wedge d$ be given and denote the singular value decomposition (SVD) of the returns matrix by $X_{\tau_n:n} = U_n \Sigma_n V_n'$. Consider the truncated SVD (Hansen 1987)

$$\Sigma_n^{\tilde{r}} = \begin{pmatrix} \Lambda_n & 0 \\ 0 & 0 \end{pmatrix}, \qquad \Lambda_n = \begin{pmatrix} \varsigma_{1,n} & 0 & \cdots \\ 0 & \ddots & \cdots \\ \vdots & \vdots & \varsigma_{\tilde{r},n} \end{pmatrix},$$

then set $X_{\tau_n:n} = U_n \Sigma_n^{\tilde{r}} V_n'$. We replace $x_n$ in the recursions in section 3.1 with the final row of $X_{\tau_n:n}$. The value of $\tilde{r}$ is set during training. Note that the SVD of $X_{\tau_n:n}$ can be updated incrementally using the methods of Bunch and Nielsen (1978).
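As a small illustration of the low rank idea, the sketch below computes the best rank-one approximation of a returns window, i.e. the truncated SVD with truncation rank one. Power iteration is swapped in here as a dependency-free stand-in for a full SVD routine; it is not the incremental method of Bunch and Nielsen (1978), and the function name is hypothetical.

```python
def rank_one_approx(X, iters=500):
    """Best rank-1 approximation (X v) v' of X in Frobenius norm, via power iteration on X'X."""
    n, d = len(X), len(X[0])
    # Starting direction; assumed not orthogonal to the leading right singular vector
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        u = [sum(X[i][j] * v[j] for j in range(d)) for i in range(n)]   # X v
        w = [sum(X[i][j] * u[i] for i in range(n)) for j in range(d)]   # X' X v
        norm = sum(t * t for t in w) ** 0.5
        if norm == 0.0:          # X is the zero matrix (or v is in its null space)
            break
        v = [t / norm for t in w]
    u = [sum(X[i][j] * v[j] for j in range(d)) for i in range(n)]       # singular value times left vector
    return [[u[i] * v[j] for j in range(d)] for i in range(n)]

# Hypothetical usage: the final row is the denoised stand-in for the latest return vector
X_window = [[0.010, 0.020], [0.020, 0.041], [0.030, 0.059]]
x_denoised = rank_one_approx(X_window)[-1]
```

For a truncation rank above one, a full SVD routine would be used in place of the power iteration; the way the final row feeds the recursion is unchanged.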
3.2. Online minimum variance
The minimum-variance scheme is somewhat less involved. Suppose $\lambda$, $\delta_n$ and $W$ are given. It is straightforward to show that, at time $n$, the solution of the optimization problem (1), with $f_n$ as in (4), is

$$\omega_n = \frac{\left( (X_{\tau_n:n}' F_\lambda X_{\tau_n:n}) + \delta_n I_{d \times d} \right)^{-1} I_d}{I_d' \left( (X_{\tau_n:n}' F_\lambda X_{\tau_n:n}) + \delta_n I_{d \times d} \right)^{-1} I_d}. \qquad (5)$$

The main objective here is to calculate this quantity quickly. Suppose we are given the eigen-decomposition of $X_{\tau_n:n}' F_\lambda X_{\tau_n:n}$, i.e. $X_{\tau_n:n}' F_\lambda X_{\tau_n:n} = Q_n \Lambda_n Q_n'$, then the inverse in (5) is equal to

$$\left( (X_{\tau_n:n}' F_\lambda X_{\tau_n:n}) + \delta_n I_{d \times d} \right)^{-1} = Q_n (\Lambda_n + \delta_n I_{d \times d})^{-1} Q_n',$$

that is, one need only calculate the inverse of a diagonal matrix. The recursive calculation of the eigen-decomposition can be achieved using the methods of Yu (1991) in $O(d^2)$ instead of the standard $O(d^{2+\epsilon})$ ($\epsilon > 0$) for matrix inversion. More specifically, the method of Yu (1991) is to re-calculate the new eigen-decomposition when moving from a matrix $R$ to an updated matrix $R'$ of the form

$$R' = R + \nu_1 \nu_1' - \nu_2 \nu_2',$$

with $\nu_1$, $\nu_2$ vectors of the appropriate dimension. In our case we have

$$\lambda\,X_{\tau_{n+1}:n+1}' F_\lambda X_{\tau_{n+1}:n+1} = X_{\tau_n:n}' F_{\lambda_n} X_{\tau_n:n} - \lambda^{W} x_{\tau_n} x_{\tau_n}' + \lambda^{-1} x_{n+1} x_{n+1}',$$

so the same ideas may be applied. Note that the incremental SVD mentioned above could also be used.
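For intuition, equation (5) can be evaluated directly on a small window by solving one linear system and normalizing. The sketch below is only a reference computation under stated assumptions: it uses Gaussian elimination rather than the recursive eigen-decomposition of Yu (1991), and it assumes the newest row receives weight 1 and the oldest weight `lam ** (W - 1)`, which may differ from the paper's exact indexing of the forgetting matrix. All names are hypothetical.

```python
def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (M[i][n] - sum(M[i][k] * y[k] for k in range(i + 1, n))) / M[i][i]
    return y

def ovar_weights(X, lam, delta):
    """Regularized minimum-variance weights for a W x d window X (rows oldest to newest)."""
    W, d = len(X), len(X[0])
    # Exponentially weighted second-moment matrix plus ridge term delta * I
    S = [[sum(lam ** (W - 1 - i) * X[i][a] * X[i][b] for i in range(W))
          + (delta if a == b else 0.0) for b in range(d)] for a in range(d)]
    y = solve(S, [1.0] * d)          # (S + delta I)^{-1} applied to the vector of ones
    total = sum(y)
    return [v / total for v in y]    # normalize so the weights sum to one

# Hypothetical usage
weights = ovar_weights([[0.01, 0.0], [0.0, 0.02]], lam=1.0, delta=0.0)
```

With a very large `delta` the ridge term dominates and the weights approach the equal-weight allocation, which illustrates the link between the L2 constraint and the naive strategy.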
3.2.1. Adaptive calculation of $\delta_n$ and $\lambda$. There are still two free parameters to be set, $\delta_n$ and $\lambda$.

First, consider $\delta_n$. Lacking an analytical solution, we investigate $\delta$ numerically based on an initial training data period. To investigate the effect of $\delta$ perturbations on portfolio returns, we choose a short initial training sequence of data to calculate $\mathrm{tr}(X_{\tau_n:n}' F_\lambda X_{\tau_n:n})$ for a given $\lambda$. Then we select a collection of $G$ equally spaced points between $\mathrm{tr}(X_{\tau_n:n}' F_\lambda X_{\tau_n:n})/d$ and $\mathrm{tr}(X_{\tau_n:n}' F_\lambda X_{\tau_n:n})$. The algorithm is initialized at any of those points. At re-balancing times (the times when the allocation is altered) we compute the portfolio returns over the training period for each of the $G$ points and select the one that generates the largest portfolio return. The range of the grid is based upon the recommendations of Ledoit and Wolf (2004). We found our results to be extremely robust to the initial value of $\delta$.

Second, consider $\lambda$. In this scenario, we only recalculate $\lambda$ at re-balancing times, which incurs the cost of re-computing the eigen-decomposition of $X_{\tau_n:n}' F_\lambda X_{\tau_n:n}$. We follow a similar procedure to that in adaptive filtering. An attractive criterion for portfolio selection is to minimize

$$\| I_W - X_{\tau_n:n}\,\omega_n \|_2^2$$

(Britten-Jones 1999). As a result, at the $m$th re-balancing time, the following stochastic approximation-type update is used:

$$\lambda_m = \lambda_{m-1} + \frac{1}{ml} \sum_{j=(m-1)l+1}^{ml} \mathrm{sgn}\left( \frac{\partial}{\partial \lambda} \left[ \| I_W - F_\lambda X_{\tau_j:j}\,\omega_j \|_2^2 \right] \right).$$

See, for example, chapter 14 of Haykin (1996) for similar self-tuning approaches for recursive filtering. Note that, if $\lambda_m \notin (0, 1)$, then we set $\lambda_m = \lambda_{m-1}$.
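The grid of candidate regularization values described in section 3.2.1 can be sketched as follows. The helper takes the weighted second-moment matrix as input and returns G equally spaced points between its trace divided by d and its trace; the function name is hypothetical, and the step of scoring each candidate by training-period portfolio return is left out.

```python
def delta_grid(S, G):
    """G equally spaced candidate values between tr(S)/d and tr(S), for a d x d matrix S."""
    d = len(S)
    trace = sum(S[i][i] for i in range(d))
    low, high = trace / d, trace
    if G == 1:
        return [low]
    step = (high - low) / (G - 1)
    return [low + k * step for k in range(G)]

# Hypothetical usage with a 2 x 2 second-moment matrix
candidates = delta_grid([[2.0, 0.0], [0.0, 4.0]], G=4)
```

At each re-balancing time, one would evaluate the training-period portfolio return for every candidate and keep the best one, as the text describes.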
3.3. Discussion
The two methods described here have some complementary aspects. First, from the perspective of dealing with noisy data, the methods use separate, but well-known, procedures. R-EWRLS uses the truncated SVD, whilst the O-VAR uses a form of Tikhonov regularization via the L2 constraint. Secondly, the R-EWRLS method accounts for outliers by down-weighting them through a by-product weighting quantity ($q$; see section 3.1) of the robust cost function. On the other hand, O-VAR does not have an embedded mechanism to account for outliers as they occur.

Thirdly, the O-VAR is adaptive to non-stationary environments and accounts for variability in the underlying environment through the self-tuning forgetting factor $\lambda$. However, the rank d needs to be set during training. In the R-EWRLS case, $\lambda$ needs to be calibrated in advance and such calibration needs to take place every time a shift occurs in the underlying environment. Also, the rank $\tilde{r}$ of the low rank approximation (section 3.1.2) needs to be set in advance.

One procedure is likely to be preferred over the other depending on the scenario. For example, when the data are subject to a change in the economic cycle, one would expect the O-VAR to perform significantly better; however, O-VAR does not take into consideration the expected returns. In that respect, O-VAR may be more suitable for assets that are expected to grow in the future. For instance, it may be suitable for funds whose underlying investments have positive expectation and that wish to allocate robustly. Alternatively, it could be suitable for an algorithmic trading system that allocates between allocation strategies in an adaptive and efficient way. Finally, the R-EWRLS is linked to mean–variance theory and should be suitable for any asset class and as a stand-alone allocation strategy. Note that O-VAR is similar to a more efficient version of the function of DeMiguel et al. (2009b).
4. Application
The techniques described in section 3 are applied to four data sets. Financial performance is compared with standard methods. Note that a zero-rate risk-free interest rate is assumed throughout.
4.1. Data description
We perform our analysis on four data sets: spot Foreign Exchange (FX), constituents of DJ Euro Stoxx, portfolios of the NYSE, NASDAQ, and AMEX, and constituents of the FTSE-100 (see figure 1). As mentioned below, we will include a risk-free asset.

Our first data set consists of 19 spot currencies quoted against the American dollar. For ease of interpretation, we use the convention ‘USD/. . .’, where USD is always the base rate and is read as ‘units of foreign currency per 1 USD’. The data set covers a period of approximately 5½ years of daily data, from 01/10/2002 to 12/03/2008. The spot data were obtained from the ‘FXHistory’ functionality of OANDA (http://www.oanda.com).

The second consists of 43 constituents of DJ Euro Stoxx 50, of approximately 5 years of closing prices, from 21/10/2002 to 13/09/2007. The data were obtained from Yahoo (http://uk.finance.yahoo.com/) and were adjusted for discontinuities related to financial events, such as stock splits and bonus issues.

The third data set is the daily returns on 25 portfolios formed on size and book-to-market from the NYSE, NASDAQ and AMEX. The data are from 01/07/63 to 31/12/08. The data were obtained from http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.

Our final data set consists of six constituents (BA, Barclays, Lloyds TSB, M&S, RBS, Tescos) of the FTSE-100 share index. The daily data are the adjusted closing prices taken from 17/07/04 to 17/07/09 and were also obtained from Yahoo. These particular data will be of interest to observe the performance of relatively simple allocation schemes during two financial crises: the sell-off in 2006 caused by algorithmic trading and the sub-prime mortgage crisis in 2008.
4.2. The allocation strategies
In our comparison, in addition to the methods developed in section 3, we consider three standard batch strategies.

• NAIVE: This encompasses allocating funds in equal amounts to each asset. As noted by DeMiguel et al. (2009a), this strategy provides an important benchmark despite its simplicity.
• Mean–Variance (M-VAR): This is the standard Mean–Variance portfolio. To remove any numerical difficulties with inversion, as noted in the introduction, the covariance matrix is replaced by $\Sigma + \delta I_{d \times d}$. The regularization parameter is chosen as $\delta = \mathrm{tr}(\Sigma)$, similar to Ledoit and Wolf (2004).
• Minimum Variance (VAR): Standard minimum variance with the covariance replaced as for M-VAR.

For R-EWRLS, $\rho(x) = \log\{\cosh(x)\}$. If one could interpret the procedure as a regression, this would imply a hyperbolic secant error distribution (Benesty and Gansler 2001). We experimented with a more standard $\rho$ (e.g., Huber’s loss function (Huber 2004)), but did not find that this significantly affected our conclusions. Note that we implemented the method of Helmbold et al. (1998), but did not find a significant difference with the NAIVE strategy.
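The effect of the log cosh loss on the weighting quantity of section 3.1 can be seen from a short calculation. This is elementary calculus added for illustration, with $\rho$ the loss and $q$ the weight $q(x) = \rho'(x)/x$.

```latex
\begin{align*}
\rho(x) = \log\cosh(x) \;&\Longrightarrow\; \frac{d\rho}{dx}(x) = \tanh(x), \qquad q(x) = \frac{\tanh(x)}{x}, \\
\lim_{x \to 0} q(x) &= 1, \qquad q(x) \le \frac{1}{|x|} \quad \text{for } x \neq 0 .
\end{align*}
```

Small residuals are therefore treated as in ordinary least squares, while a residual of size $|x|$ receives weight at most $1/|x|$, so large outliers are progressively down-weighted.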
4.3. Comparison criteria
In order to compare and investigate our strategies, we consider various criteria. The basic idea is to initialize all of the strategies in some way; the first 2 years (504 data points) of each data set are used for training (i.e. omitted afterwards). In particular, the M-VAR and VAR strategies use the first 2 years of data to estimate the portfolio weights and these are used until the first re-balancing instant. The re-balancing instant is then determined going forward by the re-balancing window $W$, i.e. re-balancing every 250 data points. Then the data in the time up to the last re-balancing period is used to re-estimate the weights. The weights are initialized as 1 and are employed from day 2 onwards. Note that the actual weights used to compute portfolio returns are only based upon those calculated at re-balancing times. That is to say, we update the weights for the online methods, but only employ new weights at re-balancing times. As such, trading is infrequent and the transaction cost associated with these allocation strategies is negligible. Therefore, we refrain from using transaction costs, as this would have introduced another layer of assumptions since transaction costs are often substantially different from firm to firm given their ‘bargaining’ power to negotiate down trading commissions.
The criteria employed are standard in financial applications. The returns for each day are calculated and we consider the annualized returns and volatility, the Sharpe ratio, the percent average daily gain and loss, the percent of winning trades (WT), the certainty equivalent (CE) (DeMiguel et al. 2009a), the maximum draw-down (MDD) and the turnover (TO). Of these, perhaps the last two need a little explanation. The maximum draw-down is equal to

$$-\min\{v_1,\; v_1 + v_2,\; \ldots,\; v_1 + \cdots + v_T,\; v_2,\; v_2 + v_3,\; \ldots,\; v_2 + \cdots + v_T,\; \ldots,\; v_T\},$$

where $v_t$ is the percentage return at period $t$. In other words, it constitutes the maximum movement from peak to trough of the cumulative returns in percentage terms. The turnover is a measurement of the frequency of trading. It is the average of the absolute difference of the portfolio weights between re-balancing times. For the Sharpe ratio, we subtract the risk-free rate, assumed to be 0.5% (annualized). In addition, the risk-free asset is tradeable.
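The three criteria explained above can be sketched in a few lines. The maximum draw-down follows the definition as the negative of the smallest sum over any run of consecutive returns, computed in a single pass. The rest rests on assumptions made for this illustration: absolute weight changes are summed across assets before averaging, returns are annualized arithmetically with 252 periods per year (consistent with 504 data points for 2 years), and the function names are hypothetical.

```python
import numpy as np

def max_drawdown(v):
    """Negative of the minimum sum over all runs of consecutive returns."""
    v = np.asarray(v, dtype=float)
    best = cur = v[0]
    for x in v[1:]:
        cur = min(x, cur + x)   # smallest sum of a run ending at this period
        best = min(best, cur)
    return -best

def turnover(weights_at_rebalance):
    """Average absolute weight change between consecutive re-balancing times.

    Assumption: absolute changes are summed over assets.
    """
    w = np.asarray(weights_at_rebalance, dtype=float)
    return np.abs(np.diff(w, axis=0)).sum(axis=1).mean()

def sharpe_ratio(v, rf_annual=0.5, periods_per_year=252):
    """Annualized Sharpe ratio of percentage returns, net of the risk-free rate (in %)."""
    v = np.asarray(v, dtype=float)
    ann_ret = periods_per_year * v.mean()
    ann_vol = np.sqrt(periods_per_year) * v.std(ddof=1)
    return (ann_ret - rf_annual) / ann_vol

daily = [1.0, -2.0, -3.0, 4.0, -1.0]
print(max_drawdown(daily))
print(turnover([[0.5, 0.5], [1.0, 0.0], [0.5, 0.5]]))
print(sharpe_ratio(daily))
```

In the example, the worst run of consecutive returns is the pair (-2, -3), so the draw-down is 5 percentage points.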
4.4. Initialization
We now discuss the selection of parameters for the R-EWRLS approach. We explore the Sharpe ratio for the spot FX and DJ Euro Stoxx 50 data sets over a grid of
Figure 1. Price data (four panels, (a)–(d), each plotting price against time). Note that the DJ Stoxx prices have been scaled by 10.
equally spaced values of the parameter $\tilde{r}$ and the forgetting factor $\lambda$. The results of the exploratory analysis are depicted by means of contour plots in figure 2.

For the R-EWRLS allocation strategy using the equities data, we note in figure 2 that the Sharpe ratio is positive throughout the parameter space. There is an evident pattern that lower values of $\tilde{r}$ exhibit higher Sharpe ratios and the difference becomes more pronounced for larger values of $\lambda$. For the FX data set, we note (figure 2) that there are evident structures of higher Sharpe ratio regions in the parameter space, suggesting dependence on the parameters $\lambda$ and $\tilde{r}$. In particular, the best performance is achieved for values of $\lambda$ approximately between 0.94 and 0.98, when $\tilde{r}$ is greater than 4. On the basis of such plots, we select the values of $\tilde{r}$ and $\lambda$.
The choice of W is important for both of our methods and, to an extent, conflicts with $\lambda$. That is to say, instead of making $\lambda$ large, W can be made smaller, and vice versa. However, the choice of W is also a computational issue; we may only want to attribute a set memory to the storage associated with the data. This is the line we follow and set W = 250 (approximately 1 year of trading), which is not too large for computational purposes and does not interfere substantially with the data memory profile implied by $\lambda$ for the purposes of portfolio selection. That is, the exponential decay profile would only be truncated for W greater than 250. Then the role of $\lambda$ is far clearer with respect to the forgetting of the data.
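The interplay between the window and the forgetting factor can be quantified with a small calculation. Assuming that an observation at lag $k$ receives weight $\lambda^k$ (the usual exponential forgetting profile; the function names below are hypothetical), the share of the total weight that falls at lags of W or more is $\lambda^W$, by the geometric series.

```python
def weight_beyond_window(lam, window):
    """Share of total exponential weight lam**k (k = 0, 1, ...) at lags k >= window."""
    return lam ** window

def effective_memory(lam):
    """Sum of the weights lam**k over all lags, i.e. 1 / (1 - lam)."""
    return 1.0 / (1.0 - lam)

W = 250
for lam in (0.75, 0.8, 0.94, 0.98):
    print(f"lambda={lam}: effective memory {effective_memory(lam):.1f} points, "
          f"weight beyond W={W}: {weight_beyond_window(lam, W):.2e}")
```

For the forgetting factors listed, the truncated share is below 1%, which is consistent with the remark that W = 250 does not interfere substantially with the memory profile. As $\lambda$ approaches 1 the truncated share is no longer negligible.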
4.5. Numerical results
The algorithms were run with re-balancing performed every 50, 150 and 250 days. On the basis of training, the R-EWRLS used $\tilde{r} = 5$ for the first two data sets, $\tilde{r} = 23$ for the third and $\tilde{r} = 2$ for the fourth; respectively $\lambda \in \{0.8, 0.8, 0.8, 0.75\}$. For O-VAR, G = 100 (see section 3.2.1) and the initial $\lambda = 0.05$. Note that, in each instance, the forgetting factor converged close to 1 (implying very little forgetting) when there were sufficient re-balancing periods. To study the impact of the training on performance, we also investigated the in-sample performance (see table 4).
We conducted a computational speed comparison between the batch mean–variance optimization approach and our methods. We coded the methodologies in Matlab (version 7.4) using an Intel Core 2 CPU machine running at 2.00 GHz with 2 GB of RAM. In a data matrix of dimension 1000 × 500, we found that an iteration needs approximately 15 milliseconds compared with 2 seconds for the batch mean–variance computation. In a separate experiment, we increased the number of rows from 1000 to 5000. The batch approach computation time increased to 6 seconds.

The results can be found in tables 1–3. Some of the annualized volatilities of the strategies shown in the tables would be rather high and unrealistic for an investor, but the results remain valid as we compare the Sharpe ratio, which adjusts for the volatility of the underlying strategy. However, one needs to be cautious when comparing the maximum draw-down of allocation strategies, as this depends on the volatility of the underlying strategy. Let us consider each data set in turn. The discussion below relates to the out-of-sample performance.
4.5.1. Spot FX results. Our first observation is that only the R-EWRLS method consistently produces positive returns. Indeed, this is true with respect to different re-balancing periods. This partly suggests (also from figure 2) that, for particularly noisy data, the truncated SVD has a beneficial outcome with respect to portfolio weight computation; this is in contrast to M-VAR, whose performance is sensitive to changes in W, a result in line with the literature (see the Introduction for a discussion on the instability of M-VAR weights). We also note that the portfolio weights of R-EWRLS are more ‘active’, as indicated by the turnover. This could also imply that R-EWRLS adapts better to the underlying environment, given that it delivers consistently better performance than M-VAR. The O-VAR, due to its similarity to the NAIVE strategy, is unable to provide
Figure 2. Sharpe ratio under perturbation of the parameters of the R-EWRLS method for DJ Stoxx (left panel, titled 'Sharpe REWRLS - EQ') and spot FX (right panel, titled 'Sharpe REWRLS - FX'). Each panel is a contour plot over $\lambda$ (0.7 to 1) and $\tilde{r}$ (1 to 41 for DJ Stoxx, 1 to 19 for spot FX), with a Sharpe ratio scale of 0.8 to 1.3 for DJ Stoxx and 0 to 0.4 for spot FX. The parameter $\lambda$ is the forgetting factor and the parameter $\tilde{r}$ is the rank of the low rank data approximation. We set W = 250.
positive returns; the latter exhibits particularly bad performance here. This is because the NAIVE allocation strategy implies only long positions and is expected to benefit from the long-term growth typically exhibited in equities, but not necessarily for FX spot prices.
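As an illustration of the truncated SVD referred to in the spot FX discussion, the sketch below forms a rank-limited approximation of a data window by keeping only the largest singular values. It is a batch computation on synthetic data, not the recursive update used by R-EWRLS; the function name is hypothetical, and the window length of 250 and rank of 5 merely mirror values quoted in the text.

```python
import numpy as np

def truncated_svd_approximation(X, rank):
    """Best rank-`rank` approximation of X in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the leading singular triplets and discard the rest as noise.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank, :]

rng = np.random.default_rng(1)
X = rng.normal(size=(250, 20))           # synthetic window: 250 periods, 20 assets
X5 = truncated_svd_approximation(X, rank=5)
singular_values = np.linalg.svd(X, compute_uv=False)
print(np.linalg.matrix_rank(X5))
print(np.linalg.norm(X - X5), np.sqrt((singular_values[5:] ** 2).sum()))
```

The two printed norms agree: the approximation error equals the root sum of squares of the discarded singular values, so the rank controls how much of the data variation is treated as noise.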
4.5.2. DJ Euro Stoxx 50 results. Moving to the second data set, a more familiar pattern (i.e. as often reported in the literature) is displayed. The NAIVE and VAR strategies perform relatively well, with quite favourable Sharpe ratios, given the simplicity of the strategies. The O-VAR method performs marginally better than the VAR strategy, but with a noticeable increase in turnover. R-EWRLS also delivers satisfactory performance and outperforms M-VAR.
4.5.3. Portfolio data. The portfolio data provide some very interesting results. In this case the O-VAR provides the most impressive results from a financial perspective, but performance tends to decrease as the re-balancing time increases. The success of the O-VAR method is linked to a wide variety of factors. First, due to its similarity to NAIVE, this method is likely to fare very well; see figure 1 and the remarks in section 4.5.1. Secondly, the O-VAR method should fare well because all parameters are adaptive to the data. However, we note that R-EWRLS is only trained on the first 2 years of data. Since the data are 45 years long, 2 years is clearly insufficient to train the algorithm. Although this is a little unfair (e.g., the parameters can be retrained every 5 years, as would be the case in practice), it highlights a small deficiency of the R-EWRLS method. Thirdly, against the
Table 1. Algorithm performance across data sets. The portfolios are re-balanced every 50 days. See section 4.3 for details.

Data  Method    % gain  % loss      MDD   % WT      TO  Ann. r.  Ann. v.  Sharpe       CE
1     O-VAR       0.18   -0.16    14.25  48.12    0.54    -0.72     3.18   -0.23    -2.31
      VAR         0.21   -0.22    17.91  49.71   0.072    -0.99     3.37   -0.30    -2.68
      R-EWRLS     0.48   -0.49     8.51  53.12   19.21     6.54     9.91    0.66     1.59
      M-VAR       1.56   -1.64   217.38  48.99   13.18   -20.32    40.83   -0.50   -40.74
      NAIVE       0.23   -0.22    17.64  49.89            -1.79     4.01   -0.45    -3.80
2     O-VAR       0.56   -0.53     9.89  55.43    0.82    15.22    11.24    1.35     9.60
      VAR         0.59   -0.57    10.64  55.91    0.08    15.16    12.20    1.24     9.06
      R-EWRLS     0.63   -0.68    12.45  56.29  160.31    13.86    13.23    1.05     7.25
      M-VAR       1.69   -1.50    49.24  49.54   10.18    12.09    33.21    0.36    -4.52
      NAIVE       0.61   -0.58    11.07  54.73            15.75    12.40    1.27     9.55
3     O-VAR       0.39   -0.37    52.79  58.84    1.68    16.19     8.87    1.82    11.76
      VAR         0.57   -0.64    62.82  56.99    0.08    12.48    13.98    0.89     5.49
      R-EWRLS     4.70   -3.98  1221.21  50.83   73.46   107.45   415.37    0.26  -100.24
      M-VAR       1.59   -1.59   232.52  55.99   12.32    43.89    41.99    1.04    23.34
      NAIVE       0.59   -0.64    67.64  56.79            11.47    14.26    0.80     4.34
4     O-VAR       1.74   -1.72    65.71  49.45    1.01    -2.59    38.09   -0.07   -22.15
      VAR         1.73   -1.16   139.70  50.20    0.09    -9.01    40.43   -0.22   -29.23
      R-EWRLS     9.28   -8.50   827.36  47.97   15.03    -0.52   323.58  -0.001  -162.31
      M-VAR       2.36   -2.34   178.38  47.89    4.50   -26.89    50.36   -0.53   -52.07
      NAIVE       1.92   -1.94   152.03  50.59           -10.35    42.23   -0.25   -31.47
Table 2. Algorithm performance across data sets. The portfolios are re-balanced every 150 days. See section 4.3 for details.

Data  Method    % gain  % loss      MDD   % WT      TO  Ann. r.  Ann. v.  Sharpe       CE
1     O-VAR       0.22   -0.22    19.20  49.35    0.27    -1.42     3.87   -0.37    -3.36
      VAR         0.22   -0.21    18.94  49.78    0.07    -1.78     4.01   -0.44    -3.79
      R-EWRLS     0.58   -0.61    20.89  53.89   30.01     5.99    12.37    0.48    -0.20
      M-VAR       0.65   -0.66    24.02  51.74    5.13     0.97    13.99    0.07    -6.03
2     O-VAR       0.60   -0.56     9.89  55.39    0.38    16.29    12.01    1.36    10.29
      VAR         0.59   -0.58    10.14  55.48    0.07    16.01    12.12    1.32     9.95
      R-EWRLS     0.77   -0.82    14.47  56.48   42.29    17.28    16.24    1.06     9.16
      M-VAR       1.03   -1.00    37.28  52.90    4.99    15.39    22.03    0.70     4.38
3     O-VAR       0.41   -0.40    66.28  59.24    2.37    16.89     9.98    1.69    11.90
      VAR         0.56   -0.61    67.01  56.89    0.07    12.67    13.64    0.93     5.85
      R-EWRLS     8.47   -7.08  2210.29  50.83  133.83   195.07   654.79    0.30  -132.33
      M-VAR       1.09   -1.10   243.29  56.70    7.31    31.05    30.97    1.00    15.57
4     O-VAR       2.27   -2.27    93.38  51.84    1.68     9.99    57.59    0.17   -18.81
      VAR         1.79   -1.84   138.29  49.98    0.07   -11.01    41.01   -0.27   -31.52
      R-EWRLS     5.58   -5.13   348.39  48.48   12.28   -12.12   158.45   -0.07   -91.35
      M-VAR       1.88   -1.74    87.62  49.79    2.16     0.01    28.32  0.0001   -14.15
VAR method, the smoothness of the portfolio weights is regulated by the Tikhonov regularization. This may have a beneficial outcome for the performance through better estimation (see the Introduction for a discussion of rank deficiency).
4.5.4. FTSE-100. The final data provide an interesting set of results. Due to a variety of economic-, cultural- (business-wise) and investor-related factors, many quantitative equity hedge funds have performed poorly during the current financial crisis. As a result, it is of interest from an applied perspective to observe the results of our models in such a difficult trading period. Rather unsurprisingly, many of the strategies perform badly. However, in two instances, both of our online methods provide positive returns. This is encouraging, as to an extent it suggests that the ability to process data as they arrive and adapt our strategies accordingly is more useful in practice than standard batch methods.
4.5.5. General comments. On the basis of our investigations, we make the following observations.

(1) The R-EWRLS method can be successful (positive returns) for noisy data. However, when the initial training period is insufficient/unreliable, very unstable results are obtained. In addition, high turnovers were observed for this method.
(2) The online and adaptive nature of the O-VAR method, coupled with its link to the NAIVE strategy, leads to consistently strong performance in comparison with the methods tested here.
Table 4. Algorithm performance across data sets (in-sample). The portfolios are re-balanced every 50 days. See section 4.3 for details. Note that the naive strategy results do not change from table 1 and are thus omitted.

Data  Method    % gain  % loss      MDD   % WT      TO  Ann. r.  Ann. v.  Sharpe       CE
1     O-VAR       0.19   -0.09     9.29  49.29    0.60     0.34     2.08    0.16    -0.70
      VAR         0.20   -0.17    12.21  49.19   0.074    -0.80     2.59   -0.31    -2.10
      R-EWRLS     0.55   -0.49     7.29  53.99   18.52     6.79     9.98    0.68     1.80
      M-VAR       1.56   -1.28   201.10  49.26   12.98   -20.10    40.12   -0.50   -40.16
2     O-VAR       0.56   -0.52     9.21  55.29    0.78    15.79    11.12    1.42    10.23
      VAR         0.57   -0.52    10.11  55.01    0.08    15.19    12.00    1.26     9.19
      R-EWRLS     0.63   -0.66    12.10  56.79  161.19    13.99    13.01    1.07     7.49
      M-VAR       1.69   -1.50    49.04  49.76    9.19    12.29    30.41    0.40    -2.92
3     O-VAR       0.41   -0.32    51.18  59.10    1.69    16.83     8.27    2.03    12.70
      VAR         0.57   -0.60    63.10  57.29    0.08    13.29    13.18    1.01     6.70
      R-EWRLS     4.75   -3.37  1220.29  51.82   68.96   109.29   418.29    0.26   -99.86
      M-VAR       1.59   -1.60   220.19  56.28   11.29    44.19    41.01    1.08    23.69
4     O-VAR       1.76   -1.62    61.38  49.99    0.96     0.01    29.20  0.0001   -14.59
      VAR         1.76   -1.08   129.27  51.90    0.09    -2.50    20.31   -0.12   -12.67
      R-EWRLS     3.32   -2.13   129.19  49.28    2.55    29.20    66.29    0.44    -3.95
      M-VAR       2.39   -2.18   160.22  48.28    3.28   -14.29    30.06   -0.48   -29.23
Table 3. Algorithm performance across data sets. The portfolios are re-balanced every 250 days. See section 4.3 for details.

Data  Method    % gain  % loss      MDD   % WT      TO  Ann. r.  Ann. v.  Sharpe       CE
1     O-VAR       0.22   -0.22    20.01  49.58    0.31    -1.89     3.99   -0.49    -3.89
      VAR         0.21   -0.21    18.61  49.61    0.07    -1.87     4.10   -0.46    -3.92
      R-EWRLS     0.46   -0.49    11.34  53.71   16.48     5.62     7.17    0.78     2.04
      M-VAR       0.51   -0.52    32.83  49.97    4.89    -0.37    11.47   -0.03    -6.12
2     O-VAR       0.58   -0.54     9.12  55.29    0.69    16.47    11.88    1.39    10.53
      VAR         0.59   -0.56    10.10  55.17    0.06    15.99    12.06    1.33     9.96
      R-EWRLS     0.75   -0.78    14.01  55.97   26.02    17.68    15.76    1.12     9.80
      M-VAR       0.92   -0.90    22.09  54.08    3.40    19.57    19.03    1.03    10.06
3     O-VAR       0.45   -0.46    59.48  58.29    2.40    15.97    10.21    1.56    10.87
      VAR         0.58   -0.63    64.38  56.97    0.06    12.49    13.38    0.93     5.80
      R-EWRLS     1.99   -1.93   212.48  52.27   28.45    25.80    56.79    0.45    -2.60
      M-VAR       0.86   -0.92   109.34  56.01    5.79    18.50    21.79    0.85     7.61
4     O-VAR       1.73   -1.82   152.39  50.99    1.19   -11.41    41.69   -0.27   -32.26
      VAR         1.74   -1.82   144.29  49.79    0.09   -12.01    43.78   -0.27   -33.90
      R-EWRLS     3.27   -2.53   148.60  48.91    3.15    28.06    70.08    0.40   -63.10
      M-VAR       1.69   -1.64    76.28  48.99    1.58    -2.67    36.58   -0.07   -20.96
In terms of the first point, the R-EWRLS approach is related to M-VAR procedures, which can work well when there is a detectable drift signal in the data. When combined with the robust scale computation and noise reduction, a potentially superior method is derived. However, there are a number of free parameters, which are to be set. As a result, significant training is required and hence the success of the method is reliant on this latter procedure.
The second point is clearly reflected in tables 1–3. The drawbacks of the R-EWRLS method are alleviated, but with the potential deficiency of being related to the NAIVE strategy, which makes a naive assumption for the direction of the market by having long-only positions. This can lead to poor performance, e.g. for the FX spot data.
It is remarked that, from table 4, the two introduced methods perform reasonably well. Whilst, due to the look-ahead bias, these results are not indicators of the performance of our training method, they indicate that, when well-calibrated, our procedures can produce positive returns in many data scenarios.
5. Summary
We have derived two efficient methods to compute portfolio weights online without the need for matrix inversion. We compared the two methods with existing techniques in portfolio optimization using four data sets. We have shown that our strategies predominantly outperform the benchmarks when performance is measured by the Sharpe ratio (note that this includes the method of Helmbold et al. (1998)).
Future research could focus on extending our approach to include transaction costs (bid–ask spread and commission) as a function of the portfolio weight, as well as considering adaptive re-balancing strategies (e.g., Baltutis (2009)). For example, the O-VAR method does not explicitly incorporate previous weights in its estimate and, as such, can lead to high turnovers. In addition, future work could be directed to making R-EWRLS fully adaptive. This requires the online selection of the number of singular values and lies on the interface of statistics, finance, signal processing and computer science. Finally, one of the drawbacks of the O-VAR method is its relation to the NAIVE allocation strategy. This could be removed by using L1-type constraints, for example, leading to an online lasso (Tibshirani 1996) method (see, e.g., Anagnostopoulos et al. (2008)). In this context, as it is not required that the portfolio weights sum to one (i.e. standard path-wise co-ordinate optimization (Friedman et al. 2007)), we are left with an online quadratic programming problem. To our knowledge, with the exception of Zhang and Li (2009), there is little methodology for this problem; we are currently working towards a solution. Our work also opens up interesting theoretical questions; e.g., to investigate the sensitivity of the portfolio weights (as in DeMiguel and Nogales (2009)) of online algorithms.
Acknowledgements
The second author is supported by an MOE Singapore grant. The views of Theodoros Tsagaris presented here reflect solely the author’s opinion; this work was completed whilst a PhD student at Imperial College London.
References
Agarwal, A., Hazan, E., Kale, S. and Schapire, R.E., Algorithms for portfolio management based on the Newton method. Paper presented at the 23rd International Conference on Machine Learning, 2006.
Anagnostopoulos, C., Tasoulis, D., Adams, N.M. and Hand, D.J., Online optimization for variable selection in data streams. Paper presented at the 18th European Conference on Artificial Intelligence (ECAI 2008), 2008, pp. 132–136.
Baltutis, M., Non-stationary stock returns and time to revise the optimal portfolio. Technical Report, University of Vienna, 2009.
Benesty, J. and Gansler, T., A robust fast recursive least squares adaptive algorithm. Paper presented at the Acoustics, Speech, and Signal Processing Conference (Proceedings ICASSP’01), 2001.
Britten-Jones, M., The sampling error in estimates of mean–variance efficient portfolio weights. J. Finance, 1999, 54, 655–671.
Bunch, J.R. and Nielsen, J.R., Updating the singular value decomposition. Numer. Math., 1978, 31, 111–129.
Chapados, N. and Bengio, Y., Noisy k best paths for approximated dynamic programming with application to portfolio optimization. J. Comput., 2007, 2, 12–19.
Cipra, T. and Romera, R., Robust Kalman filter and its application to time series analysis. Kybernetika, 1991, 27, 481–494.
DeMiguel, V. and Nogales, F.J., Portfolio selection with robust estimation. Oper. Res., 2009, 57, 560–577.
DeMiguel, V., Garlappi, L. and Uppal, R., Optimal versus naive diversification: How inefficient is the 1/N strategy. Rev. Financ. Stud., 2009a, 22, 1915–1953.
DeMiguel, V., Garlappi, L., Nogales, F.J. and Uppal, R., A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Mgmt. Sci., 2009b, 55, 798–812.
Deng, G., Sequential and adaptive learning algorithms for M-estimation. EURASIP J. Adv. Sig. Proc., 2008, Article ID 459586.
Fabozzi, F., Kolm, P.N., Pachamanova, D.A. and Focardi, S.M., Robust Portfolio Optimization and Management, 2007 (Wiley: Hoboken, NJ).
Fabozzi, F., Huang, D. and Zhou, G., Robust portfolios: Contributions from operations research and finance. Ann. Oper. Res., 2010, 176, 191–220.
Frauendorfer, K. and Siede, H., Portfolio selection using multi-stage stochastic programming. Centr. Eur. J. Oper. Res., 2000, 7, 277–290.
Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R., Pathwise coordinate optimization. Ann. Appl. Statist., 2007, 2, 302–332.
Hamilton, J.D., Time Series Analysis, 1994 (Princeton University Press: Princeton).
Hansen, P.C., The truncated SVD as a method for regularization. BIT Numer. Math., 1987, 27, 534–553.
Hansen, P.C., Rank-Deficient and Discrete Ill-Posed Problems, 1996 (Polyteknisk Forlag).
Haykin, S., Adaptive Filter Theory, 1996 (Prentice Hall: Englewood Cliffs, NJ).
Helmbold, D.P., Schapire, R.E., Singer, Y. and Warmuth, M.K., On-line portfolio selection using multiplicative updates. Math. Finance, 1998, 8, 325–347.
Huber, P.J., Robust Statistics, 2004 (Wiley: New York).
Jagannathan, R. and Ma, T., Risk reduction in large portfolios: Why imposing the wrong constraints helps. J. Finance, 2003, 58, 1651–1684.
Keehner, J., Milliseconds are focus in algorithmic trades. Reuters, 2007. Available online at: http://www.reuters.com/.
Kuhn, D., Parpas, P., Rustem, B. and Fonseca, R., Dynamic mean–variance portfolio analysis under model risk. J. Comp. Finance, 2009, 12, 91–115.
Ledoit, O. and Wolf, M., Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empir. Finance, 2003, 10, 603–621.
Ledoit, O. and Wolf, M., A well-conditioned estimator for large dimensional covariance matrices. J. Multivar. Anal., 2004, 88, 365–411.
Li, D. and Ng, W.L., Optimal dynamic portfolio selection: Multi-period mean–variance formulation. Math. Finance, 2000, 10, 387–406.
Markowitz, H., Mean–variance analysis in portfolio choice and capital markets. J. Finance, 1952, 7, 77–91.
Martin, R.D., Approximate conditional-mean type smoothers and interpolators. Paper presented at the IEEE Conference on Dec. Contr., 1979.
Masreliez, C., Approximate non-Gaussian filtering with linear state and observation relations. IEEE Trans. Aut. Contr., 1975, 20, 107–110.
Merton, R.C., On estimating the expected return on a market: An exploratory investigation. J. Financ. Econ., 1980, 8, 323–361.
Sayed, A.H., Fundamentals of Adaptive Filtering, 2003 (IEEE: Piscataway Township, NJ).
Schick, I.C. and Mitter, S.K., Robust recursive estimation in the presence of heavy tailed observation noise. Ann. Statist., 1994, 22, 1045–1080.
Smith, K.F., A transition model for portfolio revision. J. Finance, 1967, 22, 425–439.
Stewart, G.W., On the early history of the singular value decomposition. SIAM Rev., 1993, 35, 551–566.
Tibshirani, R., Regression shrinkage and selection via the lasso. J. R. Statist. Soc. Ser. B, 1996, 58, 267–288.
Tibshirani, R., Fast computation of the median by successive binning. Preprint, Stanford University, 2008.
Yu, K.B., Recursive updating the eigenvalue decomposition of a covariance matrix. IEEE Trans. Sig. Proc., 1991, 39, 1136–1145.
Zhang, Y. and Li, Z., Zhang neural network for online solution of time-varying convex quadratic program subject to time-varying linear-equality constraints. Phys. Lett. A, 2009, 373, 1639–1643.