Making data science pay:
Mastering the challenges of analytics operations
Michel Debiche
Guest Analyst, STAC
13/06/2018
Copyright © 2018 Securities Technology Analysis Center LLC
Cognitive Reset Part 1: Window management technology
Cognitive Reset Part 2: Window management context
Investment Process
[Diagram. Stages: Gather information, Digest information, Make decisions, Execute decisions. Flow labels: Info, Actions, Results.]
Pressures
• Scale
• Volume
• Variety
• Density
• Computational complexity
• Velocity of innovation
• Cost
• Regulation
Dimensions of scale
• Scale
  • Volume
  • Variety
    • Kinds of data: structured, unstructured, text, binary
    • Data entities: millions of time series
  • Density
    • Transactions in microseconds
    • Simultaneous transactions on multiple channels
  • Computational complexity
    • NLP, image processing, AI
• Velocity of innovation
  • Competitive pressures: new datasets, new models, new technologies
  • Evolving opportunities
  • Feedback loops
Responses
• DevOps
• Data Lake
• Open Source
• Big Data
• Data Science
• AI
Issues
• Model Factories: Hundreds of models with nowhere to go
• Redundant engineering
• Open source interoperation and upgrade nightmares
• Murky, expensive data lakes contributing little value
• Skills mismatches
• User resistance to new technologies
• Data lineage, audit trails
Goals
• Maximize returns
• Minimize risk
  • Market risk
  • Model risk
  • Systems risk
  • Data risk
  • Operational risk (people)
• Maximize productivity
Principles
• Optimize use of resources
  • People
  • Time
  • Data
  • Technology
• End-to-end process design
• Agility
• Constant improvement
Industrial Engineering
• Similar challenges and goals
• Eventually came to software engineering as DevOps
• Need to carry paradigm over to full data-to-decision pipeline
• Why is it so hard?
DevOps: Elegant Concept
DevOps: More complicated to implement
So let’s think about QuantOps™
Investment Process
[Diagram. Stages: Gather information, Digest information, Make decisions, Execute decisions. Flow labels: Info, Actions, Results.]
Investment Process, Expanded
[Diagram. Components:
• Research data, develop and test models
• DevOps for data prep, analytical functions, API
• Production pipeline: data to curated feature
• Model scoring engine
• Model testing manager
Stores and flow labels: Data, Core, Research Data, Ad hoc data ingestion, Ideas, Function definition, Function library, Feature definition, Feature preparation code, Feature updates, Features, Model definition, Model Repository, Model suite updates, Test backlog, Scores, Results.]
A Unifying Paradigm: QuantOps as a DAG
• Standardize the connections
• Carefully define the data APIs
• Then all the technology is pluggable
• Makes it possible to efficiently address:
  • Orchestration
  • Data lineage
  • Monitoring
  • Audit trails
  • Automated code generation and testing
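
The DAG idea above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not part of the presentation: the stage names (ingest, features, score), the made-up price data, and the convention that each stage is a function taking a dict of its upstream outputs. It shows how a single standardized connection between stages lets one generic runner handle ordering (orchestration) and record which inputs fed each stage (a simple form of data lineage).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical stages. Each one follows the same illustrative "data API":
# it receives a dict {upstream_stage_name: upstream_output} and returns a value.
def ingest(inputs):
    return [101.0, 102.5, 101.5, 103.0]  # made-up prices, not real data

def features(inputs):
    prices = inputs["ingest"]
    # Feature: successive price changes.
    return [b - a for a, b in zip(prices, prices[1:])]

def score(inputs):
    changes = inputs["features"]
    # Toy "model": the mean price change.
    return sum(changes) / len(changes)

STAGES = {"ingest": ingest, "features": features, "score": score}

# The DAG itself: each stage mapped to the set of stages it depends on.
DEPENDS_ON = {"ingest": set(), "features": {"ingest"}, "score": {"features"}}

def run_pipeline(stages, depends_on):
    """Run every stage after its dependencies and record simple lineage."""
    outputs, lineage = {}, []
    for name in TopologicalSorter(depends_on).static_order():
        upstream = {dep: outputs[dep] for dep in depends_on[name]}
        outputs[name] = stages[name](upstream)
        lineage.append((name, sorted(depends_on[name])))
    return outputs, lineage

outputs, lineage = run_pipeline(STAGES, DEPENDS_ON)
```

Because the runner only knows the stage signature and the dependency map, any stage can be swapped for a different implementation without touching the others, which is the sense in which the technology becomes pluggable. Monitoring and audit logging could hook into the same loop.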
Where does STAC fit in?
• Implementing analytics ops is a big commitment with big payoffs
• Biggest challenge: effective communication, change management
• Design needs to be process-oriented and based on user needs
• Technology needs to respond to process requirements, not vice versa
• Emerging STAC roles:
  • Facilitate dialogue & training on analytics ops challenges & best practices
  • Accelerate technology selection based on community-source standards driven by process-oriented model of the investment process
• Let us know if you want to be involved!