What Does Big Data Mean?
Data Management Issues
Machine Learning
Some Applications in Asset Management
Conclusion
Big Data in Asset Management1
Thierry Roncalli†‡
†Professor of Economics, Évry University, France
‡Head of Research & Development, Lyxor Asset Management, France
ESMA/CEMA/GEA meeting, CNMV, Madrid, Spain
November 13, 2014
1 The opinions expressed in this presentation are those of the author and are not meant to represent the opinions or official positions of Lyxor Asset Management. I would like to thank Antoine Frachot, Head of GENES, Kamel Gadouche, Head of CASD, Sébastien Roussel and Alain Viénot from CASD for their helpful comments and for providing the materials on the CASD technology. I also thank David Bessis from TINYCLUES for providing some charts used in this presentation.
Thierry Roncalli Workshop on Big Data in Asset Management 1 / 80
Outline
1 What Does Big Data Mean?
  Definition
  Analysis of Big Data Problems
  The Missing Factor
2 Data Management Issues
  Some Examples
  Managing Data Protection
  Illustration with CASD
3 Machine Learning
  Machine Learning and Econometrics
  Some Lessons About Machine Learning
4 Some Applications in Asset Management
  Lasso Approach
  Nonnegative Matrix Factorization
  Measuring the Liquidity of ETFs
  The Art of Backtesting (or Data Mining)
5 Conclusion
What Does Big Data Mean?
Definition
Analysis of Big Data Problems
The Missing Factor
A very hot topic
McKinsey Global Institute
Big data: The next frontier for innovation, competition, and productivity
June 2011
Nature, Science, The Economist, McKinsey.
Obama, White House, etc.
New York Times (The Age of Big Data), Wall Street Journal (Meet the New Boss: Big Data), Financial Times (Acute ‘Big Data’ Challenges Facing Asset Managers), etc.
Davos World Economic Forum (Big Data, Big Impact): Big data = new asset class.
Le Monde, Libération, Le Figaro, l’Agefi, 20 Minutes, France Inter, etc.
Canal+ (Big Data : les nouveaux devins).
Lauvergeon report ‘Innovation 2030’.
Definition
McKinsey Global Institute (2011)
Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.

⇒ There are certainly few big data problems.

Various aspects
Large dataset (Megabyte, Gigabyte, Terabyte, Petabyte, Exabyte)
Unstructured data (networked data but fuzzy relationships)
Data-driven research, business & decisions
High skills (IT, statistics, etc.)

⇒ Big data problems differ from one sector to another.
Big data ≠ unified science
Application domains of big data
Web analytics
Pattern recognition
Personal location tracking
Text analysis
Public sector administration (government)
Health care
Scientific research (human genome, etc.)
Targeted marketing
Retail/customer behavior
Credit scoring
Banking transactions
Finance
Big data blurs the frontiers between sciences
What is the link between mathematics and biology?
[Figure: cover of the Notices of the American Mathematical Society, ISSN 0002-9920, Volume 58, Number 4, April 2011. About the cover: Collective behavior and individual rules.]
Big data blurs the frontiers between sciences
The answer is finance:
QUANTITATIVE FINANCE
RENAISSANCE TECHNOLOGIES, a quantitatively based financial management firm, has openings for research and programming positions at its Long Island, NY research center.
Research & Programming Opportunities
We are looking for highly trained professionals who are interested in applying advanced methods to the modeling of global financial markets. You would be joining a group of roughly one hundred fifty people, half of whom have Ph.D.s in scientific disciplines. We have a spectrum of opportunities for individuals with the right scientific and computing skills. Experience in finance is not required. The ideal research candidate will have:
• A Ph.D. in Computer Science, Mathematics, Physics, Statistics, or a related discipline
• A demonstrated capacity to do first-class research
• Computer programming skills
• An intense interest in applying quantitative analysis to solve difficult problems
The ideal programming candidate will have:
• Strong analytical and programming skills
• An In depth knowledge of software development in a C++ Unix environment
Compensation is comprised of a base salary and a bonus tied to company-wide performance.
Send a copy of your resume to: [email protected] No telephone inquiries.
An equal opportunity employer.
Architecture of big data problems
Figure: Supervised / unsupervised learning problems
[Diagram: Large raw data → Structured data → Statistical learning / Machine learning → Fitting/Prediction model or Classification model]
The big challenge (1/2)
Gartner’s 3V model of big data:
Volume (amount of data)
Velocity (speed of data)
Variety (data types)
⇓
4V model:
+ Veracity/Value

The challenge
How to transform raw data into structured (informative) data?

1 Heterogeneous variables ⇒ comparable and workable data
2 Heterogeneous variables ⇒ new variables
3 Heterogeneous variables ⇒ valuable variables
4 Heterogeneous variables ⇒ model variables
The big challenge (2/2)
[Diagram: Raw data X → Model variables Y = f(X) → Prediction Z = g(Y)]

The most difficult step is transforming X into Y:
Averaging, Averaging², Averaging³, etc.
Cutting X1 into Y1, Y2, etc.
Aggregating X1, X2, etc. into Y1
Creating classes from X1, X2, etc.
Conditioning X1 by X2, etc.
Dummy variables everywhere!

⇒ The Y variables are more important than the model g itself!
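The transformations listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the raw variables (a daily traded volume and a bid-ask spread), the thresholds and the helper names `moving_average` and `cut` are hypothetical and do not come from the slides. "Averaging²" is read here as an average of averages, which is one possible interpretation.

```python
from statistics import mean

def moving_average(xs, window):
    """Averaging: trailing mean over at most `window` observations."""
    return [mean(xs[max(0, i - window + 1): i + 1]) for i in range(len(xs))]

def cut(x, thresholds):
    """Cutting X1 into Y1, Y2, ...: one 0/1 dummy per interval."""
    bucket = sum(x >= t for t in thresholds)
    return [1 if k == bucket else 0 for k in range(len(thresholds) + 1)]

# Hypothetical raw data X: daily traded volume and bid-ask spread (in bps)
volume = [120.0, 80.0, 200.0, 150.0, 90.0, 300.0]
spread = [5.0, 12.0, 3.0, 4.0, 15.0, 2.0]

# Averaging and "averaging squared" (assumed to mean an average of averages)
y_avg = moving_average(volume, 3)
y_avg2 = moving_average(y_avg, 3)

# Cutting the spread into three classes encoded as dummy variables
y_spread_class = [cut(s, [4.0, 10.0]) for s in spread]

# Aggregating X1 and X2 into a single liquidity flag Y1
y_liquid = [int(v >= 150.0 and s < 5.0) for v, s in zip(volume, spread)]

# Conditioning X1 by X2: average volume on tight-spread days only
y_cond = mean(v for v, s in zip(volume, spread) if s < 5.0)
```

Any model g is then fitted on the Y variables, not on the raw X, which is why the choice of these transformations matters so much.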
The missing factor
The missing factor is the construction of the database:
The structure of databases is generally difficult to change;
Most of the time, data are located in several databases;
Missing items have a big impact;
Etc.

⇒ It is not a big data issue, but a challenge for the information system (IS).

BIG DATA = DATA + . . .
Responsibilities and data management
Goal
Building a comprehensive database of the hedge fund industry.

One solution is to merge existing commercial databases (HFR, EurekaHedge, BarclayHedge, Morningstar, Lipper TASS, etc.).

⇒ Not simple.

How to complete this database with Newcits funds?

Who manages the project?

IT?
Experts?
The chicken-and-egg problem
An illustration
January 2001: Second Consultative Paper for a New Basel Capital Accord
2001: A race for building operational risk loss databases (internal & external)
Frachot and Roncalli (2002, 2003) document reporting biases in loss data and show that the computation of the value-at-risk needs data collection thresholds.
Big impact on the design of operational risk loss databases

What is the puzzle?

You need data in order to test the model (data ⇒ model)
You need to test the model before designing the database (model ⇒ data)
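The reporting bias mentioned above can be illustrated with a small simulation. The sketch assumes exponentially distributed losses and a hypothetical reporting threshold; neither assumption comes from the slides, and the function names are illustrative. For an exponential distribution, the excess over the threshold of a reported loss has the same mean as the original loss, so an estimator that knows the threshold can undo the bias.

```python
import random

def simulate_losses(n, mean_loss, seed=0):
    """Draw n exponential losses with the given mean (assumed loss model)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0 / mean_loss) for _ in range(n)]

def naive_mean(reported):
    """Ignores the threshold: treats reported losses as the full sample."""
    return sum(reported) / len(reported)

def threshold_corrected_mean(reported, threshold):
    """Uses the threshold: mean excess equals the true mean for exponential losses."""
    return sum(x - threshold for x in reported) / len(reported)

true_mean = 10.0
threshold = 5.0   # hypothetical data collection threshold
losses = simulate_losses(100_000, true_mean)
reported = [x for x in losses if x >= threshold]   # only large losses are recorded

biased = naive_mean(reported)
corrected = threshold_corrected_mean(reported, threshold)
print(f"naive: {biased:.2f}, corrected: {corrected:.2f}, true: {true_mean:.2f}")
```

The naive estimate overstates the mean loss by roughly the threshold, which is why the threshold has to be recorded in the database before the model can be estimated properly.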
What is a data scientist?
1970-1990: data are managed by statisticians (light projects)
1990-2010: data are managed by IT people (heavy projects)

⇒ Now, data are managed by data scientists (computer science, modeling, statistics and analytics).

Why?
Quick-and-dirty → slow-and-robust
“Data first, then model” is WRONG
Don’t get lost in the IS
Liability of the project
Relationships between the industry and academics
Fiction
Consider an asset management company that has a comprehensive dataset of order books (European markets, stocks, futures, ETFs, options, etc.).

This asset manager would like to sponsor a research chair and give academics access to the database in order to study the ETF industry (liquidity, volatility, micro-structure feedback, systemic risks, etc.).

How to proceed?

1 FTP?
2 Internships?
3 Academic consultants?
4 Other solutions?
Objective
Value of the data
Data means information
Data may have a cost
Data may be sensitive or strategic
Data may be confidential

⇒ Data, data everywhere, but data are poorly exploited.

We would rather see data go unused than get out of our control.

The objective is then to control data access, maximize outputs and minimize dissemination risk.
The CASD technology
Remark
The following slides concerning CASD have been kindly provided by the CASD team.

For more information:
Kamel Gadouche, Director of CASD, [email protected]
Antoine Frachot, Head of GENES, [email protected]
The CASD technology
Groupe des Ecoles Nationales d’Economie et Statistique (GENES)
An institution of higher education and research in the field of economics, finance, etc. (ENSAE, ENSAI, CREST, CEPE).
Under the umbrella of the French Ministry of the Economy.
The following governmental directorates are represented in the GENES Board: French Treasury, Bank of France, INSEE, Ministry of Finance, Ministry of Industry, Ministry of Higher Education.

2007-2008: GENES started developing a remote safe center for confidential administrative data (Centre d’Accès Sécurisé aux Données or CASD).

⇒ CASD is the official platform which hosts ‘sovereign’ (and therefore confidential) data coming from the French administration and dedicated to research in economics, social sciences and public policies.
The CASD technology
The SD Box
[Figure: the SD-Box, a self-contained unit wholly dedicated to remote access. Labelled parts: smart card reader, biometric reader, screen and on-site setting pushbuttons, RJ45 connectors, VGA connectors, USB connectors.]
The CASD technology
How does it work?
No sensitive data stored in SD-Box
Physically & Digitally Locked to limit file retrievals
Rigorous & Imperative Authentication
Dedicated to securely accessing sensitive data
Easy Installation
Internet Connection Required
Simple to Configure
Screen, Keyboard & Mouse Needed
No Effects on the Rest of the IS
Standardized Units
Simple & Economical Set-Up
Very Limited Need for Assistance
Reliability (Unique, Stable & Tested Configuration)
The CASD technology
The infrastructure
[Figure: the CASD infrastructure, with servers, applications and sensitive data inside the Bubble]
Hermetic Bubble: a group of tightly sealed, secured servers.
Sensitive data is hosted only within the Bubble.
User applications and processing are executed strictly within the Bubble.
SD-Boxes are the only means of access to the Bubble. Access occurs via the Internet by encrypted channels.
Data insertions/extractions are controlled. Users have no Internet access from their workspace.
A Hadoop cluster is available for handling big data.
The CASD technology
For what purpose?

The objective
The Secure Data Access Centre (CASD) has been designed to address the issues of sensitive data dissemination.

Confidential data / Official data
Collaboration with CNIL (the French administrative regulator in charge of data privacy) to address privacy issues

Some examples
Fiscal data
Health data (potentially 200 TB of highly sensitive data)
National statistics (Eurostat)
Environment

⇒ 600 researchers located in Europe are working on these data.
The CASD technology
The users
The CASD technology
Some projects

London School of Economics, London (Distribution of high income households between France and UK)
University of London, London (Labor market: working hours as an adjustment mechanism)
CREST, Paris (Absenteeism of Math and French teachers and success at middle school certificate)
Sciences Po, Paris (Impact of the ZUS, ZRU, ZFU systems)
INSERM, Kremlin Bicetre (Determining social and economic factors for causes of deaths)
University of Paris 1/Banque de France (Over-indebtedness and wages’ evolution in response to the economic cycle)
GREQAM, Marseille (Impact of the resort to the support granted to restaurant owners)
HEC, Jouy-en-Josas (Factors of success in new business start-up)
The CASD technology
The CASD technology is now available for private companies (banks, asset managers, industrials):

On-site use (data are located in the company)
Hosting at GENES
http://www.casd.eu/
Machine Learning
Machine Learning and Econometrics
Some Lessons About Machine Learning
What can economists learn about big data?
“I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?” (Varian, 2009).

Big Data: New Tricks for Econometrics
“I believe that these methods have a lot to offer and should be more widely known and used by economists. In fact, my standard advice to graduate students these days is go to the computer science department and take a class in machine learning” (Varian, 2013).
⇒ Machine learning vs econometrics (classical statistics)
The new framework
Econometrics
The 3 Pillars:
Economic theory
Parametric model
Statistical inference
The statistical tools:
Linear regression, Maximum likelihood & GMM
Logit, Probit, Tobit, etc.
ARMA, VAR, Cointegration, VECM, ARCH
⇒ Parsimony, goodness of fit, theoretical consistency, etc.

Machine learning
The 3 Pillars:
Data & features
Non-parametric model
Cross-validation
The statistical tools:
Shrinkage regression (ridge, lasso, LARS, elastic net, spike and slab)
Ensemble learning (boosting, bagging)
Random forests, neural nets, support vector machines, deep learning, etc.
⇒ Training/test sets, sparsity, success rate, etc.
Econometrics = probability
Linear regression:

$$y = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \ldots + u$$

The variables $x_1$, $x_2$ and $x_3$ are given by the theory (standard transformation = log, lag)
Probability distribution: $(Y, X)$ is a Gaussian vector
$R^2$ is the appropriate statistic to assess the goodness of fit
Most studies are interested in $\beta$ and not in $y$!
F test, t statistic, p value, Wald test, etc.

Valid cases:         83      Dependent variable:     PUB6
Missing cases:       71      Deletion method:        Listwise
Total SS:      2027.807      Degrees of freedom:     80
R-squared:        0.543      Rbar-squared:           0.531
Residual SS:    927.317      Std error of est:       3.405
F(2,80):         47.470      Probability of F:       0.000

                         Standard               Prob    Standardized   Cor with
Variable     Estimate       Error     t-value   >|t|        Estimate    Dep Var
-------------------------------------------------------------------------------
MMALE       -1.083609    1.455261   -0.744615   0.459            ---        ---
PUB3         0.800628    0.082676    9.683964   0.000       0.741746   0.709659
JOB          1.144342    0.437637    2.614821   0.011       0.200283   0.081449

Two lessons:
1 Focus on parameters!
2 The estimated model is elegant
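The summary statistics in the regression output above can be cross-checked from the two sums of squares and the degrees of freedom alone. The sketch below uses only figures printed in that output; the variable names are illustrative, and the formulas are the standard textbook definitions, assumed (not confirmed by the slides) to be the ones the software applied.

```python
from math import sqrt

# Figures copied from the regression output above
n_obs = 83                # valid cases
dof = 80                  # residual degrees of freedom
total_ss = 2027.807
residual_ss = 927.317
f_numerator_dof = 2       # first argument of F(2,80)

# Goodness of fit
r2 = 1.0 - residual_ss / total_ss
rbar2 = 1.0 - (1.0 - r2) * (n_obs - 1) / dof

# Standard error of the estimate and F statistic
std_err = sqrt(residual_ss / dof)
f_stat = ((total_ss - residual_ss) / f_numerator_dof) / (residual_ss / dof)

# t-value of PUB3 = estimate / standard error
t_pub3 = 0.800628 / 0.082676

print(round(r2, 3), round(rbar2, 3), round(std_err, 3),
      round(f_stat, 2), round(t_pub3, 2))
```

Every quantity here describes the parameters or the in-sample fit, which is the point of the slide: none of them measures how well y is predicted on new data.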
Machine learning = a mix of statistics, computer science, economics, etc.

Non-parametric model
The important quantity is y!
It is not a probability model (statistical inference doesn’t matter)
The cross-validation step is very important (training set vs test set vs probe set)
The solution may not be elegant
A famous big data competition
The Netflix competition (http://en.wikipedia.org/wiki/Netflix_Prize)
Netflix is a DVD rental/VOD company.

The Netflix prize (1 MUSD) was an open competition for the best collaborative filtering algorithm to predict user ratings for films, based on previous ratings.

Training data = (user, movie, date, rating)
Test data = (user, movie, date)
Probe data = unknown

The winners are the joint team BellKor’s Pragmatic Chaos.
Solution = 154 pages to describe the model(s)!
A new way of thinking
Face recognition
Human approach
Face: square, oval, circle
Hair: length, color, type
Typical patterns: eye, mouth, lips, ears, nose, eyelash, eyelid
Beard, mustache → male
Long hair → female

Machine learning approaches
Holistic: vector quantization (VQ), k-NN
Semi-holistic: PCA, ICA, MAA, SVM
Parts-based approaches: NN, Non-negative matrix factorization (NMF)
Source: Lee and Seung (1999).
Main discoveries
Natural language processing (pattern recognition, neural networks)
Classification (collaborative filtering, scoring, boosting, bagging, k-means)
Lasso (loss function penalization)
A critical look at machine learning

2001: Publication of The Elements of Statistical Learning

Frenzied impact on quants and HF managers

Disappointing results on investment strategies
Some Applications in Asset Management
Lasso Approach
Nonnegative Matrix Factorization
Measuring the Liquidity of ETFs
The Art of Backtesting (or Data Mining)
High potential
According to McKinsey Global Institute, Finance & insurance “are positioned to benefit very strongly from big data as long as barriers to its use can be overcome”.
Source: McKinsey Global Institute (2011).
Examples of big data problems in asset management
Measuring the impact of high-frequency trading
Measuring the liquidity
Identification of systemic risks
Computing the market portfolio
Issues on collateral
Etc.
Examples of big data problems in asset management
and of course building trading strategies...
Source: Lyxor Investment Conference, Grand Palais, 4 November 2014.
Lasso regression
Lasso = a variant of ridge regression with the L1 norm penalty (Tibshirani, 1996):

$$\hat{\beta} = \arg\min \, (Y - X\beta)^\top (Y - X\beta) \quad \text{u.c.} \quad \sum_{j=1}^{m} |\beta_j| \le \tau$$

This problem is easy to solve using the quadratic programming framework and the parametrization $\beta = \beta^{+} - \beta^{-}$ with $\beta^{+} \ge 0$ and $\beta^{-} \ge 0$.
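As an illustration of the estimator above, the following sketch solves the penalized (Lagrangian) form of the lasso, min ½‖Y − Xβ‖² + λ Σ|β_j|, by cyclic coordinate descent with soft-thresholding. This is a swapped-in technique, not the quadratic programming approach mentioned on the slide; each bound τ of the constrained form corresponds to some penalty λ. The function names, the toy design matrix and the value of λ are assumptions chosen for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam; exactly zero when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * sum(|b_j|) one coefficient at a time."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    resid = list(y)                      # residual y - X beta, starting from beta = 0
    for _ in range(n_iter):
        for j in range(m):
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes factor j
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

# Toy data with orthogonal columns: y = 2*x1 - 1*x2 + 0.1*x3
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [1.1, 3.1, 0.9, 2.9]

beta_ols = lasso_coordinate_descent(X, y, lam=0.0)    # no penalty: least squares
beta_lasso = lasso_coordinate_descent(X, y, lam=1.0)  # weak third factor is dropped
```

With the penalty switched on, the small coefficient on the third column is set exactly to zero while the two large ones are only shrunk, which is the selection behaviour the lasso is used for.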
Lasso regression
[Figure: two panels, Ridge / Tikhonov regularization and Lasso / L1 regularization]

Advantages of the lasso approach:
Sparse model
Selection model
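Why the lasso yields a sparse model while ridge does not can be seen in the textbook special case of an orthonormal design ($X^\top X = I$), an assumption made here only for illustration. Writing $b = X^\top Y$ for the OLS estimate and $\lambda \ge 0$ for the penalty weight (the scaling of the two objectives is a convention, not taken from the slides):

```latex
\begin{aligned}
\text{Ridge:}\quad
& \min_\beta \tfrac{1}{2}\,\lVert Y - X\beta \rVert^2 + \tfrac{\lambda}{2}\,\lVert \beta \rVert_2^2
&&\Longrightarrow\quad
\beta_j^{\text{ridge}} = \frac{b_j}{1 + \lambda} \\[4pt]
\text{Lasso:}\quad
& \min_\beta \tfrac{1}{2}\,\lVert Y - X\beta \rVert^2 + \lambda\,\lVert \beta \rVert_1
&&\Longrightarrow\quad
\beta_j^{\text{lasso}} = \operatorname{sign}(b_j)\,\max\bigl(\lvert b_j \rvert - \lambda,\; 0\bigr)
\end{aligned}
```

Ridge rescales every coefficient by the same factor, so none becomes exactly zero; the lasso subtracts a fixed amount and sets to zero every coefficient with $\lvert b_j \rvert \le \lambda$.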
Applications of the lasso approach
Here are some applications (Roncalli, 2014):
Leverage
Transaction costs
Turnover
Covariance matrix regularization
Information matrix regularization
Sparse portfolio optimization
Sparse Kalman filtering
Hedge fund replication
1 Estimation step at time t

$$R^{HF}_t = \sum_{i=1}^{m} \beta_{i,t} R^{i}_t + \varepsilon_t$$

where $R^{HF}_t$ is the hedge funds return, $R^{i}_t$ is the return of the i-th factor, $\beta_{i,t}$ is the exposure of hedge funds in the i-th factor and $\varepsilon_t$ is a noise process.

2 Investment step for the time period [t, t+1]

$$R^{Tracker}_{t+1} = \sum_{i=1}^{m} \beta_{i,t} R^{i}_{t+1}$$

where $R^{Tracker}_{t+1}$ is the return of the tracker.

The key issue is the selection of the factors (Roncalli and Weisang, 2009).
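The two steps above can be sketched as a rolling procedure. The estimation method is an assumption: the sketch uses a rolling-window least-squares regression without intercept, whereas the slides do not say how the exposures are estimated. The window length, the synthetic factor returns and all function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols_exposures(factors, fund):
    """Estimation step: no-intercept OLS of fund returns on factor returns."""
    m = len(factors[0])
    xtx = [[sum(row[a] * row[b] for row in factors) for b in range(m)] for a in range(m)]
    xty = [sum(row[a] * r for row, r in zip(factors, fund)) for a in range(m)]
    return solve(xtx, xty)

def tracker_returns(factors, fund, window):
    """Investment step: exposures estimated on the previous `window` periods
    are applied to the factor returns of the next period."""
    out = []
    for t in range(window, len(fund)):
        beta = ols_exposures(factors[t - window:t], fund[t - window:t])
        out.append(sum(b * f for b, f in zip(beta, factors[t])))
    return out

# Synthetic example: the fund is an exact mix of two factors
factors = [[0.01 * ((7 * t) % 5 - 2), 0.01 * ((3 * t) % 7 - 3)] for t in range(30)]
fund = [0.5 * f1 + 0.2 * f2 for f1, f2 in factors]
tracker = tracker_returns(factors, fund, window=12)
```

On this noiseless data the tracker reproduces the fund exactly; with real returns the quality of the replication depends on which factors are included, which is the selection problem the slide points to.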
Hedge fund replication
We consider the following set of 12 factors:
1 an equity exposure in the S&P 500 index (SPX)
2 a long/short position between Russell 2000 index and S&P 500 index (RTY)
3 a long/short position between DJ Eurostoxx 50 index and S&P 500 index (SX5E)
4 a long/short position between TOPIX index and S&P 500 index (TPX)
5 a long/short position between MSCI EM index and S&P 500 index (MSCI EM)
6 an exposure in the 10Y US Treasury bond position (UST)
7 a FX position between Euro and US Dollar (EUR/USD)
8 a FX position between Yen and US Dollar (JPY/USD)
9 an exposure in high yield (HY)
10 an exposure in emerging bonds (EMBI)
11 an exposure in commodities (GSCI)
12 an exposure in gold (GOLD)
Hedge fund replication
OLS tracker
The performance depends on the definition of the universe of factors
Choice of the factors = in-sample

Lasso tracker
Cross-validation based on the tracking error (Test set = 20%, 1000 bootstrap simulations)
Choice of the factors = out-of-sample
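The out-of-sample selection idea can be sketched as follows. The exact procedure behind the slide is not described, so this sketch makes several assumptions: the "bootstrap simulations" are read as repeated random 80/20 splits, the tracking error is measured as the root mean squared difference between fund and tracker returns on the test set, and the candidate models are two hypothetical single-factor trackers instead of lasso fits. All names and the synthetic data are illustrative.

```python
import random

def fit_slope(x, y):
    """No-intercept OLS slope of y on a single factor x."""
    sxx = sum(a * a for a in x)
    return sum(a * b for a, b in zip(x, y)) / sxx if sxx > 0 else 0.0

def tracking_error(x, y, beta):
    """Root mean squared difference between y and the tracker beta * x."""
    return (sum((b - beta * a) ** 2 for a, b in zip(x, y)) / len(y)) ** 0.5

def cv_tracking_error(x, y, test_share=0.2, n_sims=1000, seed=0):
    """Average out-of-sample tracking error over repeated random splits."""
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, int(round(test_share * n)))
    total = 0.0
    for _ in range(n_sims):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        beta = fit_slope([x[i] for i in train], [y[i] for i in train])
        total += tracking_error([x[i] for i in test], [y[i] for i in test], beta)
    return total / n_sims

# Synthetic monthly returns: the fund loads on factor A only
data_rng = random.Random(42)
factor_a = [data_rng.gauss(0.0, 0.02) for _ in range(60)]
factor_b = [data_rng.gauss(0.0, 0.02) for _ in range(60)]
fund = [0.6 * a + data_rng.gauss(0.0, 0.002) for a in factor_a]

te_a = cv_tracking_error(factor_a, fund)
te_b = cv_tracking_error(factor_b, fund)
best = "A" if te_a < te_b else "B"
```

The same loop can score any candidate, for example lasso fits at different penalty levels, and the candidate with the lowest average test-set tracking error is retained.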
Hedge fund replication
Figure: Number of selected factors (Lasso tracker, 2000-2012)
Portfolio optimization
The mean-variance optimization framework

We consider the following Markowitz optimization problem:

$$x^\star(\gamma) = \arg\min \, \tfrac{1}{2} x^\top \Sigma x - \gamma \, x^\top \mu$$

The solution is:

$$x^\star(\gamma) = \gamma \, \Sigma^{-1} \mu$$

The important quantity in mean-variance optimization is the information matrix $I = \Sigma^{-1}$.
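The solution follows from the first-order condition of the unconstrained problem, assuming $\Sigma$ is positive definite so that the objective is strictly convex:

```latex
\begin{aligned}
f(x) &= \tfrac{1}{2}\, x^\top \Sigma x - \gamma\, x^\top \mu \\
\nabla f(x) &= \Sigma x - \gamma \mu = 0 \\
\Longrightarrow\quad x^\star(\gamma) &= \gamma\, \Sigma^{-1} \mu
\end{aligned}
```

The expected returns $\mu$ enter only through the product $\Sigma^{-1}\mu$, so any estimation error in $\Sigma^{-1}$ is passed directly into the weights.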
Portfolio optimization
Interpretation of MVO portfolios

Stevens (1998) considers the following regression:

$$R_{i,t} = \beta_0 + \beta_i^\top R^{(-i)}_t + \varepsilon_{i,t}$$

where $R^{(-i)}_t$ denotes the vector of asset returns $R_t$ excluding the i-th asset and $\varepsilon_{i,t} \sim \mathcal{N}(0, s_i^2)$. Stevens (1998) shows that:

$$I_{i,i} = \frac{1}{\sigma_i^2 \left(1 - R_i^2\right)}, \qquad I_{i,j} = -\frac{\beta_{i,j}}{\sigma_i^2 \left(1 - R_i^2\right)} \qquad \text{and} \qquad x_i^\star(\gamma) = \gamma \, \frac{\mu_i - \beta_i^\top \mu^{(-i)}}{s_i^2}$$

where $s_i^2 = \sigma_i^2 \left(1 - R_i^2\right)$. We deduce the following conclusions:

1 The better the hedge, the higher the exposure. This is why highly correlated assets produce unstable MVO portfolios.
2 The long-short position is defined by the sign of $\mu_i - \beta_i^\top \mu^{(-i)}$. If the expected return of the asset is lower than the conditional expected return of the hedging portfolio, the weight is negative.
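The identities above can be checked numerically in the two-asset case, where the hedging regression of asset 1 on asset 2 has slope cov/σ₂² and R² equal to the squared correlation. The volatilities, correlation and expected returns below are hypothetical values chosen for illustration, and the function name is not from the source.

```python
def stevens_check(sigma1, sigma2, rho, mu1, mu2, gamma=1.0):
    """Compare the direct inverse of a 2x2 covariance matrix with the
    regression-based expressions for asset 1."""
    cov = rho * sigma1 * sigma2
    var1, var2 = sigma1 ** 2, sigma2 ** 2
    det = var1 * var2 - cov ** 2

    # Direct route: information matrix = inverse covariance matrix
    info_11 = var2 / det
    info_12 = -cov / det
    x1_direct = gamma * (info_11 * mu1 + info_12 * mu2)

    # Regression route: hedge asset 1 with asset 2
    beta_12 = cov / var2              # slope of the hedging regression
    r2_1 = rho ** 2                   # R-squared with a single regressor
    s2_1 = var1 * (1.0 - r2_1)        # residual variance of the hedge

    return {
        "info_11": (info_11, 1.0 / s2_1),
        "info_12": (info_12, -beta_12 / s2_1),
        "x1": (x1_direct, gamma * (mu1 - beta_12 * mu2) / s2_1),
    }

result = stevens_check(sigma1=0.20, sigma2=0.10, rho=0.80, mu1=0.08, mu2=0.04)
for name, (direct, via_regression) in result.items():
    print(name, round(direct, 4), round(via_regression, 4))
```

Raising the correlation towards 1 shrinks the residual variance of the hedge and inflates both the information matrix and the weight, which is the instability described in the first conclusion.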
Portfolio optimization
Hedging portfolios with the empirical covariance matrix
Table: OLS hedging portfolios (in %) at the end of 2006
             SPX    SX5E     TPX     RTY      EM   US HY    EMBI     EUR     JPY    GSCI
SPX                 58.6     6.0   150.3   -30.8    -0.5     5.0    -7.3    15.3   -25.5
SX5E         9.0            -1.2    -1.3    35.2     0.8     3.2    -4.5    -5.0    -1.5
TPX          0.4    -0.6            -2.4    38.1     1.1    -3.5    -4.9    -0.8    -0.3
RTY         48.6    -2.7   -10.4            26.2    -0.6     1.9     0.2    -6.4     5.6
EM          -4.1    30.9    69.2    10.9             0.9     4.6     9.1     3.9    33.1
US HY       -5.0    53.5   160.0   -18.8    69.5            95.6    48.4    31.4  -211.7
EMBI        10.8    44.2  -102.1    12.3    73.4    19.4            -5.8    40.5    86.2
EUR         -3.6   -14.7   -33.4     0.3    33.8     2.3    -1.4            56.7    48.2
JPY          6.8   -14.5    -4.8    -8.8    12.7     1.3     8.4    50.4           -33.2
GSCI        -1.1    -0.4    -0.2     0.8    10.7    -0.9     1.8     4.2    -3.3
s_i          0.3     0.7     0.9     0.5     0.7     0.1     0.2     0.4     0.4     1.2
R_i^2       83.0    47.7    34.9    82.4    60.9    39.8    51.6    42.3    43.7    12.1
Portfolio optimization
Hedging portfolios with the L1 covariance matrix
Table: Lasso hedging portfolios (in %) at the end of 2006
Columns: SPX, SX5E, TPX, RTY, EM, US HY, EMBI, EUR, JPY, GSCI (each asset row lists only its nonzero weights, in column order)
SPX      49.2  146.8  5.0  -3.2
SX5E     5.1  32.3  3.2
TPX      37.4  0.8  -3.1
RTY      46.8  -3.1  10.4  1.9
EM       25.0  61.3  6.5  0.8  4.3  2.6  22.4
US HY    82.2  65.9  93.8  19.1  20.7
EMBI     24.9  -70.3  71.6  17.5  33.8
EUR      -23.7  33.6  1.4  51.9  13.8
JPY      9.3  1.1  7.6  48.9
GSCI     10.2  -0.3  1.6  2.9
s_i      0.3  0.7  1.0  0.5  0.7  0.1  0.2  0.4  0.4  1.3
R_i^2    82.0  44.9  33.9  82.1  60.4  38.0  51.5  39.7  41.5  7.8
Portfolio optimization
Backtest of the S&P 100 minimum-variance strategy
Bruder et al. (2013) consider the following backtest:
Universe = S&P 100
January 2000 - December 2011
Monthly rebalancing
Table: Performance of OLS-MV and Lasso-MV portfolios
             µ(x)     σ(x)    SR(x)       MDD   Turnover
OLS-MV      3.60%   14.39%     0.25   −39.71%       19.4
Lasso-MV    5.00%   13.82%     0.36   −35.42%        5.9
Remark
In December 2011, Google is hedged by 99 stocks if we consider the OLS-MV portfolio. Using the L1 norm, Google is hedged by 13 stocks [a].
[a] They are Boeing (4.6%), United Technologies (1.1%), Schlumberger (1.8%), Williams Cos. (1.8%), Microsoft (13.7%), Honeywell Intl. (2.7%), Caterpillar (0.9%), Apple (25.0%), Mastercard (2.5%), Devon Energy (2.9%), Nike (1.2%), Amazon (6.7%) and Apache (8.7%).
Nonnegative matrix factorization
Let $A$ be a nonnegative $m \times p$ matrix. We define the NMF decomposition as follows:
$$A = BC$$
where $B$ and $C$ are two nonnegative matrices with respective dimensions $m \times n$ and $n \times p$.
In the case where the objective function is to minimize the Frobenius norm of $A - BC$, Lee and Seung (2001) propose to use the multiplicative update algorithm:
$$B_{(t+1)} = B_{(t)} \odot \left(A C_{(t)}^\top\right) \oslash \left(B_{(t)} C_{(t)} C_{(t)}^\top\right)$$
$$C_{(t+1)} = C_{(t)} \odot \left(B_{(t+1)}^\top A\right) \oslash \left(B_{(t+1)}^\top B_{(t+1)} C_{(t)}\right)$$
where $\odot$ and $\oslash$ are respectively the element-wise multiplication and division operators. We have $B = B_{(\infty)}$ and $C = C_{(\infty)}$.
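The multiplicative update rules translate almost line by line into NumPy. The sketch below is illustrative: the function name `nmf_multiplicative`, the random initialization, the fixed number of iterations and the small constant `eps` added to the denominators (to avoid divisions by zero) are assumptions that do not appear on the slide.

```python
import numpy as np

def nmf_multiplicative(A, n, n_iter=500, eps=1e-12, seed=0):
    """Factorize a nonnegative (m x p) matrix A into B (m x n) and C (n x p)."""
    rng = np.random.default_rng(seed)
    m, p = A.shape
    B = rng.uniform(0.1, 1.0, size=(m, n))   # strictly positive starting point
    C = rng.uniform(0.1, 1.0, size=(n, p))
    errors = []
    for _ in range(n_iter):
        # B update uses the current C; C update uses the freshly updated B
        B = B * (A @ C.T) / (B @ C @ C.T + eps)
        C = C * (B.T @ A) / (B.T @ B @ C + eps)
        errors.append(np.linalg.norm(A - B @ C))   # Frobenius norm of the residual
    return B, C, errors

# Illustrative input: a nonnegative matrix with an exact rank-2 factorization.
rng = np.random.default_rng(1)
A = rng.uniform(0.0, 1.0, size=(6, 2)) @ rng.uniform(0.0, 1.0, size=(2, 8))
B, C, errors = nmf_multiplicative(A, n=2)
print("first and last residual norms:", errors[0], errors[-1])
```

Because the updates only multiply nonnegative quantities, B and C stay nonnegative without any projection step, and the residual norm does not increase from one iteration to the next.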
Interpretation of NMF
We consider the linear factor model $Y_t = \beta F_t + \varepsilon_t$ where $Y_t$ is an $n \times 1$ vector and $F_t$ is an $m \times 1$ vector. We note $Y = (Y_1, \ldots, Y_T)$ and $F = (F_1, \ldots, F_T)$.
Principal component analysis:
$$\Sigma = V \Lambda V^\top$$
where $\Sigma$ is the covariance matrix of $Y = (Y_1, \ldots, Y_T)$.
Nonnegative matrix factorization:
$$A = BC$$
where $A = Y^\top$, $B = \beta$ and $C = F^\top$.
⇒ We may interpret $B$ as a matrix of weights and $C$ as a matrix of factors.
Remark
If $A = BC$, then $A^\top = C^\top B^\top$. The choice of the parametrization depends on the nature of the problem: analysis by observations (e.g. by trading dates) or by variables (e.g. by stocks).
Differences between NMF and PCA
PCA: positive and negative weights = long-short portfolios
NMF: positive weights = long-only portfolios ⇒ regularization & noise reduction
⇒ This implies that the cumulated variance explained by the first j PCA factors is always higher than the cumulated variance explained by the first j NMF factors.
An example with 4 stocks.
Sensitivity to bull/bear markets
Figure: NMF decomposition of the 100 largest stocks of the EURO STOXX index
Stock classification
NMF may consider heterogeneous data (prices, volumes, Fama-French risk factors, etc.).
K-means classifier
Number of stocks by sector and cluster (#1 to #10); each sector row lists only its nonzero counts, in cluster order, followed by the row total.
sector   counts                        total
0001     5                             5
1000     3  6  1                       10
2000     4  2  1  7  1  1              16
3000     5  1  3  1  7                 17
4000     4  1                          5
5000     4  2  3  1                    10
6000     6                             6
7000     6                             6
8000     4  8  3  4                    19
9000     6                             6
total    35  9  8  9  19  2  3  3  8  4    100
NMF classifier
Number of stocks by sector and cluster (#1 to #10); each sector row lists the counts shown in the table, in cluster order, followed by the row total.
sector   counts                        total
0001     1  2  0  1  1                 5
1000     4  1  1  2  1  1              10
2000     3  1  3  1  1  2  4  1        16
3000     5  3  1  4  2  1  1           17
4000     1  3  1                       5
5000     1  2  1  2  4                 10
6000     3  3                          6
7000     1  4  1                       6
8000     3  2  1  3  4  6              19
9000     1  2  1  2                    6
total    17  7  14  10  10  9  7  4  12  10    100
Sectors ≠ right clusters ⇒ Some sectors are more heterogeneous than others (e.g. Financials: Banks, Insurance, Real Estate, Financial Services)
A big data problem?
Objective: measuring the liquidity of ETFs using limit order books in European markets.
Difficulties:
24 European exchanges
Cross-listing
Number of ETFs
Table: EURO STOXX 50 ETFs, 2012
ETF      Total card(E)   Total card(T)
#1            27753229           10580
#2(a)         26356092           19464
#2(b)         15684679           12621
#3(a)         52489407           87111
#3(b)         15254878           42910
#4            50423195           95539
Index       2033090600       107720987
Computing the liquidity spread
Figure: Boxplot [2] of the liquidity spread (EURO STOXX 50, 2012)
[2] For each notional, the ends of the whiskers correspond to the minimum and maximum values. The bottom and top of the box are the 1st and 3rd quartiles, whereas the median corresponds to the line inside the box.
Computing the liquidity spread
Table: Median liquidity spread of MSCI World ETFs (in bps)
N (in MEUR)    #1    #2    #3    #4    #5    #6    #7    #8    #9
0.0            12     9    12     9    13    40    32    22    35
0.1            13    11    13     9    17    41    33    23    35
0.3            14    11    13    10    19    45    36    23    62
0.5            16    12    14    10    22    48    42    24    91
0.8            23    13    15    13    25    51    51    25   145
1.0            26    14    15    14    27    69    61    26   181
1.3            30    15    16    15    30    98    74    27   218
1.5            36    16    17    16    73   136    92    29   272
1.8            43    17    17    19   111   172   110    32   326
2.0            48    18    18    20   130   194   123    33   363
The liquidity spread may differ substantially from the bid-ask spread.
Measuring the liquidity improvement
The liquidity ratio is defined as:
$$LR_t(N) = \frac{S_t^{\mathrm{Index}}(N)}{S_t^{\mathrm{ETF}}(N)}$$
Figure: Boxplot of the intraday liquidity ratio LRt (N) (EURO STOXX 50)
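The slides do not spell out how the liquidity spread $S_t(N)$ is computed from the limit order book. The sketch below assumes one common convention: the spread for a notional N is the difference between the average price paid to buy N and the average price received to sell N, divided by the mid price. The function names, the order book format (a list of (price, quantity) levels, best price first) and the sample quotes are all hypothetical.

```python
def average_price(levels, notional):
    """Average execution price for trading `notional` (in currency units)
    against `levels`, a list of (price, quantity) with the best price first.
    Returns None when the visible book cannot absorb the notional."""
    if notional <= 0:
        return levels[0][0]                 # best quote
    remaining, shares = notional, 0.0
    for price, qty in levels:
        take = min(remaining, price * qty)  # notional executed at this level
        shares += take / price
        remaining -= take
        if remaining <= 1e-9:
            return notional / shares
    return None

def liquidity_spread(bids, asks, notional):
    """Relative round-trip cost of trading `notional` (assumed definition)."""
    buy = average_price(asks, notional)
    sell = average_price(bids, notional)
    if buy is None or sell is None:
        return None
    mid = 0.5 * (asks[0][0] + bids[0][0])
    return (buy - sell) / mid

def liquidity_ratio(spread_index, spread_etf):
    """LR_t(N): liquidity spread of the index divided by that of the ETF."""
    return spread_index / spread_etf

# Hypothetical snapshot of an order book.
bids = [(99.9, 1000), (99.8, 5000)]
asks = [(100.1, 1000), (100.2, 5000)]
for n in (0, 50_000, 300_000):
    print(n, liquidity_spread(bids, asks, n))
```

With this convention the spread at N = 0 coincides with the quoted bid-ask spread and widens as the notional consumes deeper levels of the book, which is the pattern visible in the table of median liquidity spreads.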
What works / what doesn’t
Columns: Bond Scoring, Stock Picking, Trend Filtering, Mean Reverting, Index Tracking, HF Tracking, Stock Class., Technical Analysis
(each row lists its marks in column order; empty cells are omitted)
Lasso: + + + - +
NMF: + -
Boosting: + + +
Bagging: + + +
Random forests: + - -
Neural nets: + - -
SVM: + - - - -
Sparse Kalman: - +
K-NN: -
K-means: + +
Testing protocols [3]: + + + + +
+ = encouraging results
- = disappointing results
[3] Cross-validation, training/test/probe sets, K-fold, etc.
Backtesting and Sharpe ratio
We consider a universe of $n$ assets. Let $\mu$ and $\Sigma$ be the vector of expected returns and the covariance matrix of asset returns. We note $x = (x_1, \ldots, x_n)$ the portfolio.
The tangency portfolio is:
$$x^\star = \frac{\Sigma^{-1} (\mu - r \mathbf{1})}{\mathbf{1}^\top \Sigma^{-1} (\mu - r \mathbf{1})}$$
where $r$ is the risk-free rate.
If we consider the ex-post tangency portfolio of $n$ correlated Brownian motions $(W_1(t), \ldots, W_n(t))$, we can show that:
$$\mathrm{sh}(x^\star; T) \sim \frac{\sqrt{\chi_n^2}}{\sqrt{T}}$$
where $T$ is the time horizon used to estimate $\mu$ and $\Sigma$.
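A small Monte Carlo experiment illustrates this result. The sketch makes simplifying assumptions that are not stated on the slide: the assets have zero true drift and unit volatility, the risk-free rate is zero, the correlation is uniform, and the covariance matrix is treated as known (the continuous-time limit), so that only the drift is estimated over the horizon T. The function name `simulate_expost_sharpe` is hypothetical.

```python
import numpy as np

def simulate_expost_sharpe(n, T, rho=0.3, n_sims=20000, seed=0):
    """In-sample Sharpe ratio of the tangency portfolio built on n driftless,
    correlated Brownian motions observed over a horizon of T years."""
    rng = np.random.default_rng(seed)
    Sigma = np.full((n, n), rho) + (1.0 - rho) * np.eye(n)   # unit variances
    L = np.linalg.cholesky(Sigma)
    W_T = np.sqrt(T) * rng.standard_normal((n_sims, n)) @ L.T  # W(T) ~ N(0, T * Sigma)
    mu_hat = W_T / T                                         # estimated drifts
    z = np.linalg.solve(Sigma, mu_hat.T).T                   # Sigma^{-1} mu_hat
    sh2 = np.einsum("ij,ij->i", mu_hat, z)                   # squared maximum Sharpe ratio
    return np.sqrt(sh2)

n, T = 10, 10.0
sh = simulate_expost_sharpe(n, T)
print("mean of T * sh^2:", np.mean(T * sh ** 2), "(theory: n =", n, ")")
print("average ex-post Sharpe ratio:", sh.mean())
```

Although no asset has any expected return, T times the squared ex-post Sharpe ratio averages to about n, so the backtested Sharpe ratio grows with the number of assets and shrinks only slowly with the length of the sample.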
Backtesting and Sharpe ratio
Figure: Sharpe ratio of Markowitz portfolios
It is easy to find a blue line above the red line...
The example of the factor zoo
Factor investing
Factor investing is the second form of smart beta. It consists in investing in common risk factors (or new betas) that explain the variance of expected returns.
Examples of risk factors are: size, value, momentum, quality, short-term reversal, low beta, low volatility, liquidity, etc.
The example of the factor zoo
Figure: Harvey et al. (2014)
“Now we have a zoo of new factors” (Cochrane, 2011).
The example of the factor zoo
“Standard predictive regressions fail to reject the hypothesis that the party of the U.S. President, the weather in Manhattan, global warming, El Niño, sunspots, or the conjunctions of the planets, are significantly related to anomaly performance. These results are striking, and quite surprising. In fact, some readers may be inclined to reject some of this paper’s conclusions solely on the grounds of plausibility. I urge readers to consider this option carefully, however, as doing so entails rejecting the standard methodology on which the return predictability literature is built.” (Novy-Marx, 2014).
⇒ Do you think that they are risk factors or risk premia?
False discoveries in biology and medicine
“90% of the world’s data was created in the last two years” (IBM).
Conclusion
How can Big Data impact ESMA?
General I
The Economist. Data, Data Everywhere. February 2010.
McAfee A., Brynjolfsson E. Big Data: The Management Revolution. Harvard Business Review, 2012.
McKinsey Global Institute. Big Data: The Next Frontier for Innovation, Competition and Productivity. 2011.
McKinsey Global Institute. Disruptive Technologies: Advances that will Transform Life, Business, and the Global Economy. 2013.
General II
Nature. Big Data. September 2008.
Science. Dealing with Data. February 2011.
Varian H. Big Data: New Tricks for Econometrics. SSRN, 2013.
The Netflix prize
Koren Y. The BellKor Solution to the Netflix Grand Prize. www.netflixprize.com/assets/GrandPrize2009_BPC_BellKor.pdf, 2009.
Töscher A., Jahrer M., Bell R. The BigChaos Solution to the Netflix Grand Prize. www.netflixprize.com/assets/GrandPrize2009_BPC_BigChaos.pdf, 2009.
Piotte M., Chabbert M. The Pragmatic Theory Solution to the Netflix Grand Prize. www.netflixprize.com/assets/GrandPrize2009_BPC_PragmaticTheory.pdf, 2009.
Statistical learning I
Breiman L. Bagging Predictors. Machine Learning, 24, 1996.
Efron B., Hastie T., Johnstone I., Tibshirani R. Least Angle Regression. Annals of Statistics, 32(2), 2004.
Freund Y. Boosting a Weak Learning Algorithm by Majority. Information and Computation, 121(2), 1995.
Freund Y., Schapire R.E. Experiments with a New Boosting Algorithm. Machine Learning: Proceedings of the Thirteenth International Conference, Morgan Kaufmann, 1996.
Statistical learning II
Friedman J.H. Multivariate Adaptive Regression Splines. Annals of Statistics, 19(1), 1991.
Friedman J.H., Hastie T., Tibshirani R. Additive Logistic Regression: A Statistical View of Boosting. Annals of Statistics, 28(2), 2000.
Guyon I., Elisseeff A. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3, 2003.
Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning. Second edition, Springer, 2009.
Lee D.D., Seung H.S. Learning the Parts of Objects by Non-negative Matrix Factorization. Nature, 401, 1999.
Statistical learning III
Schapire R.E. The Strength of Weak Learnability. Machine Learning, 5(2), 1990.
Tibshirani R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society B, 58(1), 1996.
Tropp J.A., Gilbert A.C. Signal Recovery from Random Measurements via Orthogonal Matching Pursuit. IEEE Transactions on Information Theory, 53(12), 2007.
Vapnik V. Statistical Learning Theory. John Wiley and Sons, 1998.
Zou H., Hastie T., Tibshirani R. On the “Degrees of Freedom” of the Lasso. Annals of Statistics, 35(5), 2007.
Backtesting I
Cochrane J.H. Presidential Address: Discount Rates. Journal of Finance, 66(4), pp. 1047-1108, 2011.
Harvey C.R., Liu Y., Zhu H. ... and the Cross-Section of Expected Returns. SSRN, www.ssrn.com/abstract=2249314, 2014.
Novy-Marx R. Predicting Anomaly Performance with Politics, The Weather, Global Warming, Sunspots, and The Stars. Journal of Financial Economics, 112(2), pp. 137-146, 2014.
Cazalet Z., Roncalli T. Facts and Fantasies About Factor Investing. Lyxor Research Paper, 112 pages, 2014.
Finance I
d’Aspremont A., Luss R. Predicting Abnormal Returns From News Using Text Classification. Quantitative Finance, 2012.
d’Aspremont A., Bach F., El Ghaoui L. Optimal Solutions for Sparse Principal Component Analysis. Journal of Machine Learning Research, 9, 2008.
Basak D., Pal S., Patranabis D.J. Support Vector Regression. Neural Information Processing, 11, 2007.
Belloni A., Chen D., Chernozhukov V., Hansen C. Sparse Models and Methods for Optimal Instruments with An Application to Eminent Domain. Econometrica, 80(6), 2012.
Finance II
Brodie J., Daubechies I., De Mol C., Giannone D., Loris I. Sparse and Stable Markowitz Portfolios. Proceedings of the National Academy of Sciences, 106(30), 2009.
Bruder B., Dao T-L., Richard J-C., Roncalli T. Trend Filtering Methods for Momentum Strategies. Lyxor White Paper Series, 8, 2011.
Bruder B., Gaussel N., Richard J-C., Roncalli T. Regularization of Portfolio Allocation. Lyxor White Paper Series, 10, 2013.
Carrasco M. A Regularization Approach to the Many Instruments Problem. Journal of Econometrics, 170(2), 2012.
Carrasco M., Noumon N. Optimal Portfolio Selection using Regularization. Working Paper, 2012.
Finance III
DeMiguel V., Garlappi L., Nogales F.J., Uppal R. A Generalized Approach to Portfolio Optimization: Improving Performance by Constraining Portfolio Norms. Management Science, 55(5), 2009.
Fan J., Zhang J., Yu K. Vast Portfolio Selection with Gross-exposure Constraints. Journal of the American Statistical Association, 107(498), 2012.
Giamouridis D., Paterlini S. Regular(ized) Hedge Fund Clones. Journal of Financial Research, 33(3), 2010.
Kalaba R., Tesfatsion L. Time-varying Linear Regression via Flexible Least Squares. Computers & Mathematics with Applications, 17(8), 1989.
Finance IV
Kim S-J., Koh K., Boyd S., Gorinevsky D. ℓ1 Trend Filtering. SIAM Review, 51(2), 2009.
Montana G., Triantafyllopoulos K., Tsagaris T. Flexible Least Squares for Temporal Data Mining and Statistical Arbitrage. Expert Systems with Applications, 36(2), 2009.
Roncalli T. Introduction to Risk Parity and Budgeting. Chapman & Hall, 2013.
Roncalli T., Weisang G. Tracking Problems, Hedge Fund Replications and Alternative Beta. Journal of Financial Transformation, 31, 2011.
Scherer B. Portfolio Construction & Risk Budgeting. Third edition, Risk Books, 2007.
Finance V
Sonneveld P., van Kan J.J.I.M., Huang X., Oosterlee C.W. Nonnegative Matrix Factorization of a Correlation Matrix. Linear Algebra and its Applications, 431(3-4), 2009.
Stevens G.V.G. On the Inverse of the Covariance Matrix in Portfolio Analysis. Journal of Finance, 53(5), 1998.
Wu L., Yang Y., Liu H. Nonnegative-Lasso and Application in Index Tracking. Computational Statistics and Data Analysis, 70, 2013.
Yen Y.M., Yen T.J. Solving Norm Constrained Portfolio Optimization via Coordinate-wise Descent Algorithms. Computational Statistics and Data Analysis, 2013.