Using MongoDB as a Tick Database

Posted on 15-Jan-2015


DESCRIPTION

Learn how you can enjoy the developer productivity, low TCO, and horizontal scale of MongoDB as a tick database for capturing, analyzing, and acting on opportunities in tick data. This presentation illustrates how MongoDB can easily and quickly store variable data formats, such as top and depth of book, multiple asset classes, and even news and social networking feeds. It explores aggregating and analyzing tick data in real time for automated trading, or in batch for research and analysis, and shows how auto-sharding lets MongoDB scale out on commodity hardware to meet growing storage and performance requirements.

TRANSCRIPT

Sr. Solution Architect, MongoDB

Matt Kalan

How Capital Markets Firms Use MongoDB as a Tick Database

Agenda

• MongoDB One-Slide Overview

• Financial Services (FS) Use Cases

• Writing/Capturing Market Data

• Reading/Analyzing Market Data

• Performance, Scalability, & High Availability

• Q&A

MongoDB Technical Benefits

• Horizontally scalable: sharding

• Agile & flexible

• High performance: indexes, RAM

• Highly available: replica sets

{ name: "John Smith", date: "2013-08-01", address: "10 3rd St.", phone: [ { home: 1234567890 }, { mobile: 1234568138 } ] }

db.cust.insert({…})
db.cust.find({ name: "John Smith" })

Most Common FS Use Cases

1. Tick Data Capture & Analysis

2. Reference Data Management

3. Risk Analysis & Reporting

4. Trade Repository

5. Portfolio Reporting

Writing and Capturing Tick Data

Tick Data Capture & Analysis Requirements

• Capture real-time market data (multi-asset, top of book, depth of book, even news)

• Load historical data

• Aggregate data into bars and daily or monthly intervals

• Enable queries & analysis on raw ticks or aggregates

• Drive backtesting or automated signals

Tick Data Capture & Analysis: Why MongoDB?

• High throughput => can capture real-time feeds for all products/asset classes needed

• High scalability => all data and depth for all historical time periods can be captured

• Flexible & range-based indexing => fast querying on time ranges and any fields

• Aggregation Framework => can shape raw data into aggregates (e.g. ticks to bars)

• Map-reduce capability (native MR or Hadoop Connector) => batch analysis looking for patterns and opportunities

• Easy to use => native language drivers and JSON expressions that you can apply to most operational database needs as well

• Low TCO => low software license cost and commodity hardware

High Level Trading Architecture (diagram components: Exchanges/Markets/Brokers; News & social networking sources; Feed Handler; Market Data; Capturing Application; Low Latency Applications; Higher Latency Trading Applications; Backtesting and Analysis Applications; Cached Static & Aggregated Data; Orders; Trades/metrics)
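The Capturing Application in this architecture has to keep up with bursts from the feed. One common approach is to buffer incoming ticks and write them in batches. This is offered as an assumption, not as the presenter's design. The hypothetical `TickBatcher` class below flushes whenever a batch-size threshold is reached. Its `sink` callback stands in for a real write, such as the Node.js driver's `collection.insertMany(docs)`.

```javascript
// Minimal sketch: buffer ticks from a feed handler and flush them in batches.
// TickBatcher and its sink callback are hypothetical; in a real capturing
// application the sink might call collection.insertMany(batch).
class TickBatcher {
  constructor(batchSize, sink) {
    this.batchSize = batchSize; // flush once this many ticks are buffered
    this.sink = sink;           // receives each batch as an array of documents
    this.buffer = [];
  }

  // Add one tick document; flush automatically when the batch is full.
  add(tick) {
    this.buffer.push(tick);
    if (this.buffer.length >= this.batchSize) {
      this.flush();
    }
  }

  // Hand off whatever is buffered (e.g. on a timer or at shutdown).
  flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.sink(batch);
  }
}

// Usage: capture five top-of-book ticks with a batch size of 2.
const written = [];
const batcher = new TickBatcher(2, (batch) => written.push(batch));
for (let i = 0; i < 5; i++) {
  batcher.add({ symbol: "DIS", timestamp: new Date(Date.UTC(2013, 1, 15, 10, 0, i)), bidPrice: 55.37 });
}
batcher.flush(); // flush the final partial batch
console.log(written.map((b) => b.length)); // [ 2, 2, 1 ]
```

Batching trades a little latency for fewer round trips. A production capturer would typically also flush on a timer so that a quiet market does not leave ticks sitting in the buffer.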


Data Types

• Top of book

• Depth of book

• Multi-asset

• Derivatives (e.g. strips)

• News (text, video)

• Social networking

{ _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrice: 55.37,
  offerPrice: 55.58,
  bidQuantity: 500,
  offerQuantity: 700
}

> db.ticks.find( {symbol: "DIS", bidPrice: {$gt: 55.36} } )

Top of Book [e.g. equities]

{ _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
  bidQuantities: [500, 1000, 2000],
  offerQuantities: [1000, 2000, 3000]
}

> db.ticks.find( {bidPrices: {$gt: 55.36} } )
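A query such as `{bidPrices: {$gt: 55.36}}` against an array field matches a document when at least one element satisfies the condition. It does not require every price level to exceed 55.36. The plain JavaScript sketch below mimics that rule in memory to make it concrete. `matchesAnyGt` is a hypothetical helper and is not part of any MongoDB API.

```javascript
// Sketch of how MongoDB evaluates a comparison against an array field:
// the document matches if ANY array element satisfies the predicate.
// matchesAnyGt is a hypothetical helper used only for illustration.
function matchesAnyGt(doc, field, threshold) {
  const value = doc[field];
  const values = Array.isArray(value) ? value : [value];
  return values.some((v) => v > threshold);
}

const depthTick = {
  symbol: "DIS",
  bidPrices: [55.37, 55.36, 55.35],
  offerPrices: [55.58, 55.59, 55.60],
};

console.log(matchesAnyGt(depthTick, "bidPrices", 55.36)); // true  (55.37 qualifies)
console.log(matchesAnyGt(depthTick, "bidPrices", 55.37)); // false (no level above 55.37)
```

To require that every level exceed the threshold, you can negate the opposite condition, e.g. `{bidPrices: {$not: {$lte: 55.36}}}`. Note that `$not` also matches documents that lack the field.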

Depth of Book

{ _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  bids: [ {price: 55.37, amount: 500}, {price: 55.37, amount: 1000}, {price: 55.37, amount: 2000} ],
  offers: [ {price: 55.58, amount: 1000}, {price: 55.58, amount: 2000}, {price: 55.59, amount: 3000} ]
}

> db.ticks.find( {"bids.price": {$gt: 55.36} } )
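With an array of embedded documents, conditions on two different fields, such as `{"bids.price": {$gt: 55.36}, "bids.amount": {$gte: 2000}}`, can be satisfied by different elements of `bids`. To require a single price level that meets both conditions, MongoDB provides `$elemMatch`, e.g. `{bids: {$elemMatch: {price: {$gt: 55.36}, amount: {$gte: 2000}}}}`. The in-memory sketch below contrasts the two behaviours. Both helper functions are hypothetical, and the price levels are illustrative.

```javascript
// Contrast two ways of querying an array of embedded documents.
// Both helpers are hypothetical in-memory stand-ins for MongoDB's behaviour.
const bookTick = {
  symbol: "DIS",
  bids: [
    { price: 55.37, amount: 500 },
    { price: 55.36, amount: 1000 },
    { price: 55.35, amount: 2000 },
  ],
};

const priceOk = (level) => level.price > 55.36;
const amountOk = (level) => level.amount >= 2000;

// Like {"bids.price": {$gt: 55.36}, "bids.amount": {$gte: 2000}}:
// each condition may be met by a different element.
function dottedMatch(doc) {
  return doc.bids.some(priceOk) && doc.bids.some(amountOk);
}

// Like {bids: {$elemMatch: {price: {$gt: 55.36}, amount: {$gte: 2000}}}}:
// a single element must meet both conditions.
function elemMatch(doc) {
  return doc.bids.some((level) => priceOk(level) && amountOk(level));
}

console.log(dottedMatch(bookTick)); // true
console.log(elemMatch(bookTick));   // false
```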

Or However Your App Uses It

{ _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  timestamp: ISODate("2013-02-15 10:00"),
  spreadPrice: 0.58,
  leg1: {symbol: "CLM13", price: 97.34},
  leg2: {symbol: "CLK13", price: 96.92}
}

> db.ticks.find( { "leg1.symbol": "CLM13", "leg2.symbol": "CLK13", spreadPrice: {$gt: 0.50} } )

Synthetic Spreads

{

_id : ObjectId("4e2e3f92268cdda473b628f6"),

symbol : "DIS",

timestamp: ISODate("2013-02-15 10:00"),

title: "Disney Earnings…",

body: "Walt Disney Company reported…",

tags: ["earnings", "media", "walt disney"]

}

News

{

_id : ObjectId("4e2e3f92268cdda473b628f6"),

timestamp: ISODate("2013-02-15 10:00"),

twitterHandle: "jdoe",

tweet: "Heard @DisneyPictures is releasing…",

usernamesIncluded: ["DisneyPictures"],

hashTags: ["movierumors", "disney"]

}

Social Networking

{ _id: ObjectId("4e2e3f92268cdda473b628f6"),
  symbol: "DIS",
  openTS: ISODate("2013-02-15 10:00"),
  closeTS: ISODate("2013-02-15 10:05"),
  open: 55.36,
  high: 55.80,
  low: 55.20,
  close: 55.70
}

Aggregates (Bars, Daily, etc.)

Querying/Analyzing Tick Data

Architecture for Querying Data (diagram: Higher Latency Trading Applications, Backtesting Applications, and Research & Analysis Applications drawing on ticks, bars, and other analysis)

// Compound indexes
> db.ticks.ensureIndex( {symbol: 1, timestamp: 1} )

// Index on arrays
> db.ticks.ensureIndex( {bidPrices: -1} )

// Index at any depth
> db.ticks.ensureIndex( {"bids.price": 1} )

// Full-text search
> db.ticks.ensureIndex( {tweet: "text"} )

Index Any Fields: Arrays, Nested, etc.
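The compound index `{symbol: 1, timestamp: 1}` keeps keys ordered by symbol and then by time. As a result, a query with an equality on `symbol` and a range on `timestamp` becomes one contiguous scan. The sketch below models this with a sorted array and a binary search. It is a simplified illustration: MongoDB indexes are B-trees. The function names are hypothetical.

```javascript
// Simplified model of a compound index on { symbol: 1, timestamp: 1 }.
// Real MongoDB indexes are B-trees; a sorted array is enough to show why an
// equality on symbol plus a range on timestamp is one contiguous scan.
function compareKeys(a, b) {
  if (a.symbol !== b.symbol) return a.symbol < b.symbol ? -1 : 1;
  return a.timestamp - b.timestamp; // timestamps are epoch milliseconds
}

// Position of the first entry whose key is >= target (binary search).
function lowerBound(entries, target) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(entries[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Same idea as find({ symbol, timestamp: { $gte: start, $lt: end } }).
function rangeScan(entries, symbol, start, end) {
  const results = [];
  for (let i = lowerBound(entries, { symbol, timestamp: start }); i < entries.length; i++) {
    const entry = entries[i];
    if (entry.symbol !== symbol || entry.timestamp >= end) break; // past the range
    results.push(entry);
  }
  return results;
}

const indexEntries = [
  { symbol: "VIA", timestamp: Date.UTC(2013, 0, 10) },
  { symbol: "DIS", timestamp: Date.UTC(2013, 1, 9) },
  { symbol: "DIS", timestamp: Date.UTC(2013, 0, 15) },
  { symbol: "CBS", timestamp: Date.UTC(2013, 0, 5) },
  { symbol: "DIS", timestamp: Date.UTC(2013, 0, 31) },
  { symbol: "DIS", timestamp: Date.UTC(2013, 0, 2) },
].sort(compareKeys);

const january = rangeScan(indexEntries, "DIS", Date.UTC(2013, 0, 1), Date.UTC(2013, 1, 1));
console.log(january.length); // 3
```

Because `symbol` comes first in the key, all of Disney's entries sit together. This is why the order of fields in a compound index matters for time-range queries.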

// Ticks for last month (January) for media companies
> db.ticks.find({ symbol: {$in: ["DIS", "VIA", "CBS"]}, timestamp: {$gte: new ISODate("2013-01-01"), $lt: new ISODate("2013-02-01")} })

// Ticks when Disney’s bid breached 55.50 this month
> db.ticks.find({ symbol: "DIS", bidPrice: {$gt: 55.50}, timestamp: {$gte: new ISODate("2013-02-01")} })

Query for Ticks by Time and Price Threshold
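Repeating a key such as `timestamp` in a filter document is an easy mistake. In a JavaScript object literal, the later property silently replaces the earlier one, so only one bound reaches the server. Putting both range operators inside a single sub-document keeps both bounds. The plain JavaScript below shows the difference, and no MongoDB connection is needed.

```javascript
// Two properties with the same name in one object literal: the last one wins.
const start = new Date("2013-01-01T00:00:00Z");
const end = new Date("2013-02-01T00:00:00Z");

const brokenFilter = {
  symbol: "DIS",
  timestamp: { $gt: start },
  timestamp: { $lt: end }, // silently replaces the $gt condition above
};

// Correct form: both range operators live in one sub-document.
const fixedFilter = {
  symbol: "DIS",
  timestamp: { $gte: start, $lt: end },
};

console.log(Object.keys(brokenFilter.timestamp)); // [ '$lt' ]
console.log(Object.keys(fixedFilter.timestamp));  // [ '$gte', '$lt' ]
```

Using `$gte` on the first instant of the month and `$lt` on the first instant of the next month covers the whole month, including ticks late on its last day.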

Analyzing/Aggregating Options

• Custom application code: run your queries, compute your results

• Aggregation framework: declarative, pipeline-based approach

• Native Map/Reduce in MongoDB: JavaScript functions distributed across the cluster

• Hadoop Connector: offline batch processing/computation

// Aggregate minute bars for Disney for February
db.ticks.aggregate([
  { $match: { symbol: "DIS", timestamp: { $gte: new ISODate("2013-02-01"), $lt: new ISODate("2013-03-01") } } },
  { $sort: { timestamp: 1 } },
  { $project: { year: {$year: "$timestamp"}, month: {$month: "$timestamp"}, day: {$dayOfMonth: "$timestamp"},
                hour: {$hour: "$timestamp"}, minute: {$minute: "$timestamp"}, timestamp: 1, price: 1 } },
  { $group: { _id: { year: "$year", month: "$month", day: "$day", hour: "$hour", minute: "$minute" },
              open: {$first: "$price"}, high: {$max: "$price"}, low: {$min: "$price"}, close: {$last: "$price"} } }
])

Aggregate into Minute Bars
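The following in-memory JavaScript sketch shows what the sort and `$group` stages compute. It buckets ticks by UTC minute and takes the first, highest, lowest, and last `price` in each bucket, matching the `$first`, `$max`, `$min`, and `$last` accumulators. Like the pipeline, it assumes each tick carries a numeric `price` field. `toMinuteBars` is a hypothetical name.

```javascript
// In-memory sketch of the minute-bar pipeline: sort by timestamp, group by
// the UTC minute, then take open (first), high (max), low (min), close (last).
function toMinuteBars(ticks) {
  const sorted = [...ticks].sort((a, b) => a.timestamp - b.timestamp);
  const bars = new Map(); // key: minute start in epoch ms -> bar document

  for (const { timestamp, price } of sorted) {
    const minute = Math.floor(timestamp.getTime() / 60000) * 60000;
    const bar = bars.get(minute);
    if (!bar) {
      bars.set(minute, { minute: new Date(minute), open: price, high: price, low: price, close: price });
    } else {
      bar.high = Math.max(bar.high, price);
      bar.low = Math.min(bar.low, price);
      bar.close = price; // ticks are sorted, so the latest price closes the bar
    }
  }
  return [...bars.values()];
}

const ts = (min, sec) => new Date(Date.UTC(2013, 1, 15, 10, min, sec));
const ticks = [
  { timestamp: ts(0, 40), price: 55.40 },
  { timestamp: ts(0, 5), price: 55.36 },
  { timestamp: ts(0, 20), price: 55.80 },
  { timestamp: ts(1, 10), price: 55.70 },
  { timestamp: ts(1, 50), price: 55.20 },
];

console.log(toMinuteBars(ticks).map(({ open, high, low, close }) => [open, high, low, close]));
// [ [ 55.36, 55.8, 55.36, 55.4 ], [ 55.7, 55.7, 55.2, 55.2 ] ]
```

The sort is what makes open and close meaningful. Without it, `$first` and `$last` would simply return whichever ticks happened to arrive first and last in the group.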

// ...then count the number of down bars by appending these stages to the pipeline above
{ $project: { downBar: {$lt: ["$close", "$open"]}, open: 1, high: 1, low: 1, close: 1 } },
{ $group: { _id: "$downBar", sum: {$sum: 1} } }

Add Analysis on the Bars
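These two stages flag each bar whose close is below its open, then count the bars for each flag value. MongoDB returns one document per group, shaped like `{_id: true, sum: …}`. The equivalent in-memory computation is a short loop. The bar values here are made up for illustration, and `countDownBars` is a hypothetical name.

```javascript
// Sketch of the $project/$group pair: flag down bars (close < open),
// then count how many bars fall into each group (true / false).
function countDownBars(bars) {
  const counts = { true: 0, false: 0 };
  for (const bar of bars) {
    const downBar = bar.close < bar.open;
    counts[downBar] += 1; // boolean key is converted to "true" / "false"
  }
  return counts;
}

const sampleBars = [
  { open: 55.36, high: 55.80, low: 55.20, close: 55.70 }, // up bar
  { open: 55.70, high: 55.75, low: 55.40, close: 55.45 }, // down bar
  { open: 55.45, high: 55.50, low: 55.30, close: 55.31 }, // down bar
];

console.log(countDownBars(sampleBars)); // { true: 2, false: 1 }
```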

var mapFunction = function () {
  emit(this.symbol, this.bidPrice);
};

var reduceFunction = function (symbol, priceList) {
  return Array.sum(priceList);
};

> db.ticks.mapReduce(mapFunction, reduceFunction, {out: "tickSums"})

MapReduce Example: Sum
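MongoDB may call the reduce function more than once for the same key, feeding earlier reduce results back in alongside new values. For that reason, reduce must return the same type that map emits, and it must give the same answer however the values are grouped. Summing satisfies this. The sketch below emulates the emit, group, and reduce flow in plain JavaScript, then checks that re-reducing a partial sum gives the same total. Two details are assumptions for illustration. First, `arraySum` stands in for the shell's `Array.sum`. Second, `emit` is passed to map as an argument rather than provided as a global.

```javascript
// Emulation of the map/reduce flow: map emits (symbol, bidPrice) pairs,
// values are grouped per key, and reduce sums each group.
// arraySum stands in for the mongo shell's Array.sum helper.
const arraySum = (values) => values.reduce((acc, v) => acc + v, 0);

function mapReduce(docs, mapFn, reduceFn) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) mapFn.call(doc, emit); // map sees the document as `this`
  const out = {};
  for (const [key, values] of groups) out[key] = reduceFn(key, values);
  return out;
}

function mapFunction(emit) {
  emit(this.symbol, this.bidPrice);
}

function reduceFunction(symbol, priceList) {
  return arraySum(priceList);
}

// Integer prices keep the arithmetic exact for the example.
const docs = [
  { symbol: "DIS", bidPrice: 55 },
  { symbol: "DIS", bidPrice: 56 },
  { symbol: "CBS", bidPrice: 40 },
  { symbol: "DIS", bidPrice: 57 },
];

const result = mapReduce(docs, mapFunction, reduceFunction);
console.log(result); // { DIS: 168, CBS: 40 }

// Re-reduce check: reducing a partial result together with the remaining value
// must give the same answer, because MongoDB may call reduce more than once.
const partial = reduceFunction("DIS", [55, 56]);
console.log(reduceFunction("DIS", [partial, 57]) === result.DIS); // true
```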

Process Data in Hadoop

• MongoDB’s Hadoop Connector

• Supports Map/Reduce, Streaming, Pig

• MongoDB as input/output storage for Hadoop jobs: no need to go through HDFS

• Leverage the power of the Hadoop ecosystem against operational data in MongoDB

Performance, Scalability, and High Availability

Why MongoDB Is Fast and Scalable

• Better data locality (diagram: relational vs. MongoDB)

• In-memory caching

• Auto-sharding: read/write scaling

Auto-sharding for Horizontal Scale (diagram: for read/write scalability, the Symbol key range A…Z on a single mongod is split into A…J and K…Z across two mongods, then into A…F, G…J, K…O, and P…Z across four)

Sharding (diagram: four shards, each a replica set of one Primary and two Secondaries, holding the key ranges Symbol A…F, G…J, K…O, and P…Z plus Time; the Application connects through three MongoS routers)
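The diagram's ranges on a compound shard key of symbol and time can be modelled as chunks. Each chunk has an inclusive lower bound and an exclusive upper bound, and the router sends each document to the shard that owns the matching chunk. In a real cluster, you declare the key (for example with `sh.shardCollection("<database>.ticks", { symbol: 1, timestamp: 1 })`, where `<database>` is a placeholder), and MongoDB then creates, splits, and balances the chunks itself. The chunk table and shard names below are hypothetical and only mirror the A…F / G…J / K…O / P…Z split in the slide.

```javascript
// Hypothetical chunk table for a compound shard key { symbol: 1, timestamp: 1 }.
// Lower bounds are inclusive, upper bounds exclusive, as with MongoDB chunks.
// Timestamps are epoch milliseconds; ±Infinity, "" and "\uffff" stand in for
// the MinKey/MaxKey bounds a real cluster would use.
function compareShardKeys(a, b) {
  if (a.symbol !== b.symbol) return a.symbol < b.symbol ? -1 : 1;
  if (a.timestamp === b.timestamp) return 0;
  return a.timestamp < b.timestamp ? -1 : 1;
}

const key = (symbol, timestamp) => ({ symbol, timestamp });
const chunks = [
  { shard: "shard-A-F", min: key("", -Infinity), max: key("G", -Infinity) },
  { shard: "shard-G-J", min: key("G", -Infinity), max: key("K", -Infinity) },
  { shard: "shard-K-O", min: key("K", -Infinity), max: key("P", -Infinity) },
  { shard: "shard-P-Z", min: key("P", -Infinity), max: key("\uffff", Infinity) },
];

// What a router does conceptually: find the chunk whose range contains the key.
function routeTick(tick) {
  const k = key(tick.symbol, tick.timestamp.getTime());
  const chunk = chunks.find(
    (c) => compareShardKeys(k, c.min) >= 0 && compareShardKeys(k, c.max) < 0
  );
  return chunk ? chunk.shard : undefined;
}

const at = new Date(Date.UTC(2013, 1, 15, 10, 0));
for (const symbol of ["CBS", "DIS", "GE", "KO", "VIA"]) {
  console.log(symbol, routeTick({ symbol, timestamp: at }));
}
// CBS shard-A-F
// DIS shard-A-F
// GE shard-G-J
// KO shard-K-O
// VIA shard-P-Z
```

Including the timestamp in the key is what lets MongoDB split a single busy symbol's range further by time. Without it, every tick for that symbol would have to stay in one chunk.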

Summary

• MongoDB delivers high performance for tick data

• Scales horizontally through auto-sharding

• Fast, flexible querying, analysis, & aggregation

• Dynamic schema can handle any data type

• MongoDB offers all these features with low TCO

• We can support you with anything discussed today

Questions?


Thank You
